In Defense of Purist Skill Rating

Weds. Jun 21st:

Intro:

This essay will defend a vastly simpler implementation of Skill Rating adjustment than currently exists in Overwatch’s Ranked Matchmaking. I will suggest that removing all influencers of Skill Rating besides winning & losing (adjusted to game difficulty) will result in a number of improvements to the Ranked Matchmaking experience, especially with an eye towards the OWL and the eSports possibilities for Overwatch in general.

Incentives & Behavior:

Most game theoretic models begin with a simple assumption termed ‘rational self interest’, or the idea that individuals will take the course of action which most benefits themselves. This assumption is imperfect, as humans have been repeatedly shown to exhibit altruistic and pay-to-punish behavior patterns in empirical studies. However, broadly speaking, the notion that people will act in service of their own goals is a plausible one. It is especially so in an online context that lacks face-to-face empathic accountability.

Beginning from rational self interest, then, we can understand and predict the behavior patterns of players in Overwatch by examining the incentive structures that they face. Furthermore, alterations to these incentive structures have the power to dramatically change the decisions players make and even the mindset with which individuals approach the game.

The most clear and impactful incentive that Overwatch players (or at least those that choose to play Ranked Matchmaking) face is Skill Rating (hereinafter ‘SR’). Rising through the ranks feels satisfying and validating, placing in a top division can be a status symbol, and a high top-500 placement might even land you tryouts to play professionally. Naturally, then, many players are highly incentivized to seek to maximize their SR.

Skill Rating Maximization:

SR maximization will always be an incentivized behavior pattern. People want to be highly skilled, but more than that they want to appear to be highly skilled. This distinction seems small but is in fact very important. Crucially then, the key motivation for many (especially for the vast majority of players who will never compete in an eSports context) is to reach the highest SR that they can. This should be juxtaposed against the incentive to become the best player one can be: seeking to have the maximum impact upon a given team’s win probability (i.e. the eSports motivation).

Ideally then, the SR system should be set up such that ‘SR maximization behavior’ guides players to make the sort of decisions that positively impact the community and create the best gameplay environment possible. In my judgement, such an ideal system would align the SR maximization behavior with the eSports motivation, especially with an eye towards the Overwatch League. The current system fails to accomplish this alignment.

One Trick Players (OTPs):

While ‘one-tricking’ is not a behavior that I think should be actively discouraged or disallowed, I contend that it’s also a behavior that shouldn’t be specifically incentivized. In my view, the ideal system would be entirely neutral towards OTPs.

Consider a hypothetical Mercy OTP (anecdotally the most commonly one-tricked hero, although I don’t have data that support this) who has reached a very high SR with essentially no other heroes played.

The current SR system rewards players who are playing at a high skill percentile compared to other players on that hero. This comparison is drawn not within one game instance, but rather across the entire dataset of all Ranked Matchmaking time played on that hero. What this means for our hypothetical Mercy OTP is that, so long as he/she plays better than other Mercy players, lost games will net a smaller SR drop and won games will net a larger SR gain. This impact is so significant that winning vs. losing is in fact a secondary concern to the ‘Mercy percentile’ our OTP is playing at.

We’ll get back to our hypothetical OTP in a moment, but now let’s take a step back to examine the bigger picture. The current SR system is crucially problematic for many reasons, but I’ll focus on two: (1) statistical judgements of skill are weak (for some heroes more than others) and (2) it leads different players to have different incentive structures.

(1) Statistical Judgements of Skill Are Weak:

The strength of this proposition is such that I’ll use the best counterexample as my own starting point: McCree. He is a hero with extremely low utility, extremely low survivability, and extremely high damage potential. A player with high accuracy, high damage per minute, and few deaths per minute is very likely to be a higher impact player than someone with weaker statistics. Such a player is minimizing McCree’s weaknesses (i.e. avoiding death) while playing to his strengths (high damage output). It is very likely that such a player is contributing more to an average game than a player with worse statistics. Even for McCree, though, these statistics are imperfect. Is a given player’s damage relevant? How often is he/she spamming enemy heroes without any plausible follow up (i.e. feeding ultimate charge to enemy supports)? A player who hits a few precise shots to pick a key player at a key moment (e.g. a support at the beginning of the fight or a DPS who is preparing to ult) is inarguably much more impactful to securing wins than one who merely sits in the back making poor focus decisions, yet the latter player would be statistically superior by the previously stated standards.

We can apply this same analysis to quite a few heroes, revealing that statistical judgements of skill become weaker and weaker as we move from the most mechanically demanding heroes in the roster to those with very little ‘traditional FPS skill’ requirements. Even a hero such as Roadhog demands a deeper statistical evaluation to really get at skill. One must weigh damage per minute and survivability against damage taken, as a great Roadhog knows how to minimize his exposure and with it the rate at which he feeds the enemy team ultimate. There is no magic formula to successfully achieve such a balancing act. How can one statistically capture the impact of a Whole Hog that prevents a Dragonblade and a Primal Rage from destroying one’s backline (while doing very little damage and earning no kills)? In a game as complex and decision-rich as Overwatch, I don’t see a way that these judgements can be made accurately and reliably by a predetermined formula.

The ultimate example of how useless statistical measurements of skill are–and how bad percentile-based SR adjustment can be–is of course my favorite foil Mercy. The impact of virtually every aspect of Mercy’s kit is poorly captured by statistical measurements. Hitting a 5 player Resurrection that is responded to by a 6 player Earth Shatter or Graviton Surge is in fact game losing. The statistics show a high ‘resurrected players per ultimate cast’ while the reality in game is that the enemy team just farmed MULTIPLE new ultimates. The entire HP pool of your composition just went into the enemy team’s ultimate bank TWO TIMES OVER. I can’t really overstate how bad it is to make a poor decision about using Resurrection. In these cases, not only would it have been better to save one’s own ultimate, but also it would have been better to disconnect from the server and let your team play 5v6 because at least then you would have had a chance to swing Ultimate tempo. Even if there is no immediate Ult-response to a big Resurrection, if your team fails to win the fight the situation is the same: massive Ultimate tempo swing to the opposing team. Very often, the most impactful Resurrections are instant casts to revive one key player that just died (because the opposing team has often expended cooldowns and cannot kill them again). Thus, playing to maximize the statistical measurements of Resurrection (i.e. waiting for a big Res) is in fact seriously detrimental to the success of the team.

Resurrection is furthermore a relatively weak support ultimate because it requires your teammates’ deaths instead of preventing them as all of the others do (once again Symmetra is not a support). Thus a very smart Mercy player actually chooses not to heal in many scenarios so that her support partner can get his/her ultimate faster. Heals per minute is therefore a fickle statistic whose maximization does not reliably communicate skillful or intelligent play.

Low deaths per minute and high damage boosted are the only statistical measurements of Mercy play that I see as actually meaningful, as these statistics communicate intelligent play and impact maximization. Solo kills with the pistol are also probably quite meaningful, but of course a Mercy player who seeks these out at poor times would be called a thrower. It’s not that Mercy is a ‘no-skill hero’; the key problem is that skillful Mercy play is almost never communicated by impressive stats. Even the statistics I mention as meaningful fall far short of telling the whole story of player skill and game impact.

(2) Failure to Align Incentives:

Not only are OTPs highly incentivized by the current SR system to continue one-tricking and to play for statistical maximization over wins and losses, these incentives are crucially opposed to the incentive structure that flexible players face. A flex player knows that he/she won’t be playing at the far right tail of his/her heroes’ skill distributions because his/her mastery of the game is spread across many heroes and many situations. The flex player seeks to achieve a high SR by playing the perfect hero imperfectly while the OTP seeks to achieve a high SR by playing the imperfect hero perfectly. While I don’t think that either of these strategies is deserving of punishment, I think that it’s important that the system not prioritize one over the other at any echelon of SR.

In the current system, the flexible player must maintain a higher win percentage (abstracting away from game difficulty) to reach the same SR as the OTP. This is deeply problematic in my eyes, as I see hero swapping as a fundamental part of the game. If an OTP doesn’t wish to engage with hero swapping as a part of gameplay, that’s fine, but their SR should reflect that choice. The same goes for players who don’t wish to engage with communication as a fundamental part of the game: you don’t have to talk, but if you lose games because of it then that is on you and ought to be reflected in your Skill Rating. A truly great player has the knowledge, intelligence, and decisiveness to pick the right hero for the right situation, filling in the gaps of his/her team composition while at the same time countering opposing composition decisions. Not every player has to aspire to be the greatest player of all time, but in my view the entire purpose of having a Skill Rating system to begin with is to measure and validate that very pursuit of greatness.

Suggestion:

Incentive alignment is a goal well worth pursuing. When all players have the same goals, the potential for toxicity is greatly diminished (though certainly not eliminated). I personally find it quite frustrating to queue into Ranked Matchmaking with the goal of winning games, only to find other players do not share the same incentives. At the very top of the Skill Rating system, one should find other players that want to win games, not those that wish to engage in roleplay. This isn’t to say that OTPs can’t be good or impactful to winning games; my argument is rather that OTPs should be judged by their wins and losses rather than by the extent to which they engage in one-tricking. The current system punishes adaptation and experimentation vastly more than it needs to.

There is only one way to guarantee that every player has the same incentive: strip away all of the hidden formulas and percentile adjustments. Only when each player has only one incentive–to win–will incentive alignment truly come about. The only thing that should impact the SR consequences of a win or a loss is the relative skill of each team. Win a hard game and you should clearly be rewarded more than for winning an easy game, vice versa for losses.
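To make "adjusted to game difficulty" concrete, here is a minimal sketch of a pure win-loss adjustment in the style of the classic Elo formula. It is only an illustration of the idea: the function names, the 400-point scale, and the K-factor of 50 are my own assumptions, not Blizzard's (or Valve's) actual numbers, and a real system would first have to decide how to collapse six players into one team rating.

```python
def expected_win_probability(team_sr: float, opponent_sr: float,
                             scale: float = 400.0) -> float:
    """Elo-style logistic estimate of the chance that team_sr beats opponent_sr.

    The 400-point scale is an assumed constant borrowed from chess Elo.
    """
    return 1.0 / (1.0 + 10.0 ** ((opponent_sr - team_sr) / scale))


def sr_change(team_sr: float, opponent_sr: float, won: bool,
              k: float = 50.0) -> float:
    """SR gained (positive) or lost (negative) from a single match.

    Only the result and the relative strength of the two teams matter;
    no individual performance statistics enter the calculation.
    The K-factor of 50 is a hypothetical value chosen for illustration.
    """
    expected = expected_win_probability(team_sr, opponent_sr)
    actual = 1.0 if won else 0.0
    return k * (actual - expected)


if __name__ == "__main__":
    # Hypothetical average team ratings: an even game, then an underdog game.
    for own, opp in [(3000, 3000), (2900, 3100)]:
        print(own, opp,
              round(sr_change(own, opp, won=True), 1),
              round(sr_change(own, opp, won=False), 1))
```

Under these assumptions an even game is worth the same amount win or lose, while the underdog gains more for a win and loses less for a loss. Nothing about hero choice or per-hero percentiles appears anywhere in the calculation, which is the whole point.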

The meaningfulness of Skill Rating is especially important as it is the only clearly available measurement of player skill outside of actual eSports experience. With the Overwatch League on the horizon, the time is now to restructure the system such that the very best rise to the top and have a fair shot at becoming professionals. Right now, the only way to scout talent is to do it on an individual, observational basis. Look at Dota 2: you will see fresh talent rising out of Ranked Matchmaking and being given a shot at a professional career simply for reaching the very top of the ladder. That’s because their MMR system answers exactly one question: ‘how good are you at winning difficult games?’

If I worked at Blizzard, I’d be demanding a HARD Skill Rating reset at the end of this season and an entirely purified win-loss SR adjustment regime going forward. If Blizzard really wants the best of the best to get their chance at fame and fortune in eSports, then there really is only one way.

Counterarguments:

Percentile SR adjustment exists primarily, in my understanding, to combat smurfing (the purchasing of new accounts to play at a lower level than one’s true skill). Want to get serious about smurfing, Blizzard? IP & MAC check new accounts and tag them for evaluation while adding a report option for suspected smurfs to cross reference: if you can statistically target and punish throwers then there is no reason you can’t statistically target and adjust smurf accounts. It’s fine if statistical adjustments are used in exceptional and targeted cases, just get rid of them as the default for the entire player base.

“But I wanna one trick!” Go right ahead. No one can (or should) stop you. But if you lose games because of it, don’t expect special treatment. OTPs don’t deserve punishment, but they certainly don’t deserve specific rewards over players who choose to engage with hero-swapping as a fundamental and crucially necessary mechanic in Overwatch. This is especially the case as Blizzard is beginning to employ SR as a way to qualify for tournaments (see: OW Open) and they seem to be considering it as a potential scouting mechanic for new talent once the scene is more established.

To Blizzard: fix it now, or condemn the eSports potential of Overwatch in the long run.

 

EDIT: An earlier version of this article referenced Contenders as an example of an SR-gated tournament. This is inaccurate, as Contenders was never SR restricted. Rather it is the Overwatch Open that Blizzard is requiring a certain SR for.

The Fundamentals of Balance

Wed. June 14th:

Intro

This essay will focus on the balancing philosophy expressed in Overwatch. It will assess whether the Overwatch team’s apparent goal of balancing all skill brackets simultaneously is achievable, and make a few suggestions to this end.

A Philosophical Problem

Before one evaluates the success and failure of specific Overwatch patches, one must establish a clear value–a metric by which to gauge the degree to which a change makes the game better or worse.

Jeff Kaplan, in an AMA three months ago, described the Blizzard approach as a ‘triangle’. “I feel like there are 3 key factors that guide us: The players, statistics and… us… our own feelings as players.” He continued on to add that “Internally, we have a ‘competitive’ playtest that’s helpful to get good feedback from Diamond+ players who work here […] None of this is perfect… but we try hard to listen to feedback and keep the game balanced.”

Ultimately, the system Jeff describes here (also confirmed by other Dev posts on the battle.net forums) is one that seeks to achieve relative balance throughout the skill spectrum. All three points of his triangle reflect this reality: player feedback, developer intuition, and even statistics to some extent abstract away from player skill. Keeping a sharp eye on professional pickrates would be revealing, but at the very least it isn’t clear that this is happening. The notion of balance-for-all seems nice enough prima facie, but further analysis reveals a considerable challenge to successfully implementing this broad balance goal.

This fundamental challenge is skill curve differential. Different heroes in Overwatch have remarkably different rates of return on skill growth investment; this is to say that they have significantly distinct skill curves. I use ‘skill curve’ here to mean the rate at which performance (i.e. game impact) increases with constant skill growth.

To illustrate the skill curve differential problem, consider two heroes: Genji and Junkrat. Now consider two players of each hero: one in the 10th percentile of skill (worse than 90% of players) and one in the 90th percentile of skill (worse than just 10% of players). The 90th percentile Junkrat is certainly more impactful than the 10th percentile Junkrat player, but the gulf between the 90th and 10th percentile Genji players is vastly larger. The 10th percentile Genji player is a glorified rock-slinger. Unable to consistently leverage dash resets or find high value reflects, he or she has far less game impact than the 10th percentile Junkrat player. When we reach the right tail of the skill distribution, however, exactly the opposite situation holds. Against strong opponents, Junkrat lacks high-level outplay options and is ultimately left to punish misplays or exploit weak links in the opposing team. At the very highest levels of professional play, this is why he is essentially unplayable outside a very small niche. For our high level Genji player, it is a different story. The design of the character yields exponential gains to game impact resultant from skill growth: as accuracy, speed, and aggression increase so do mobility and longevity in a positive feedback cycle.
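The shape of the argument can be sketched with two toy curves. To be loud about it: every coefficient below is invented purely for illustration and is not fitted to any real data. The only thing the sketch is meant to show is how a high-floor, shallow curve (my stand-in for Junkrat) and a low-floor, compounding curve (my stand-in for Genji) cross somewhere between the 10th and 90th percentiles.

```python
import math


def junkrat_impact(skill: float) -> float:
    """Hypothetical game impact for a high-floor, shallow-slope hero.

    `skill` is a percentile expressed as a fraction in [0, 1].
    The coefficients are made up for illustration only.
    """
    return 0.35 + 0.30 * skill


def genji_impact(skill: float) -> float:
    """Hypothetical game impact for a low-floor hero with compounding returns.

    The exponential form mirrors the 'positive feedback cycle' described
    above; the coefficients are made up for illustration only.
    """
    return 0.10 * math.exp(2.2 * skill)


if __name__ == "__main__":
    for pct in (0.10, 0.50, 0.90):
        print(f"{int(pct * 100)}th percentile: "
              f"Junkrat {junkrat_impact(pct):.2f}, Genji {genji_impact(pct):.2f}")
```

With these made-up numbers the Junkrat player is ahead at the 10th percentile and behind at the 90th, and the gap between the two Genji players is much wider than the gap between the two Junkrat players. A flat statistical buff shifts one curve up everywhere, which is exactly why it cannot fix one end of the distribution without disturbing the other.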

Every character has a skill curve of some slope, that is, there is no hero which can honestly be said to require ‘no skill’. However, one can see the skill curve differential problem even embedded in core hero statistics. Ana and Mercy are both powerful single target healers (comparing two different heroes will always be comparing apples to oranges, but hopefully this example is nonetheless illustrative).

Heal Rates: (source: owinfinity.com)

ANA: 75 healed per shot * 1.3 shots per second * (% accuracy) = Effective Heals per second

MERCY: 60 = Effective Heals per second

These Effective Heals per second values equalize when the Ana player’s accuracy reaches ~61.5% (60 / (75 * 1.3) = 60 / 97.5). That is to say that an Ana player with accuracy lower than that value will heal less per second than a Mercy and an Ana with higher accuracy will heal more per second. The point of equalization isn’t particularly important, but the fact that Ana is able to do her central job as a healer (healing) faster and more reliably the higher her accuracy gets reveals that her skill curve is steeper than that of Mercy. This cuts both ways; at the far left tail of the skill distribution (where % accuracy values are generally much lower) Mercy outputs more heals per second than Ana. Mercy has important decisions to make in order to maximize her survivability, but Ana’s self defense options are no less complex or skill dense (in fact they, like her healing rate, are significantly more responsive to skill increases).
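For anyone who wants to check the arithmetic or plug in their own accuracy, here is a small sketch using only the heal values quoted above. The function and constant names are mine, and accuracy is treated as a simple fraction of shots that land on a teammate, which ignores reloads, overhealing, and line of sight.

```python
ANA_HEAL_PER_SHOT = 75.0       # from the figures quoted above
ANA_SHOTS_PER_SECOND = 1.3     # from the figures quoted above
MERCY_HEAL_PER_SECOND = 60.0   # from the figures quoted above


def ana_effective_hps(accuracy: float) -> float:
    """Ana's effective heals per second at a given accuracy (0.0 to 1.0)."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be a fraction between 0 and 1")
    return ANA_HEAL_PER_SHOT * ANA_SHOTS_PER_SECOND * accuracy


def break_even_accuracy() -> float:
    """Accuracy at which Ana's effective healing equals Mercy's constant rate."""
    return MERCY_HEAL_PER_SECOND / (ANA_HEAL_PER_SHOT * ANA_SHOTS_PER_SECOND)


if __name__ == "__main__":
    print(f"break-even accuracy: {break_even_accuracy():.1%}")
    for accuracy in (0.40, 0.60, 0.80):
        print(f"Ana at {accuracy:.0%}: {ana_effective_hps(accuracy):.2f} HPS "
              f"vs Mercy {MERCY_HEAL_PER_SECOND:.0f} HPS")
```

The Mercy line is flat while the Ana line scales linearly with accuracy, which is the skill curve differential in its simplest possible form.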

Ana and Mercy, Genji and Junkrat: contrasting these pairs reveals the central difficulty of simultaneously satisfying players across the entire skill distribution. Professional players lament that Junkrat is meme-trash-tier in organized competitive play while he simultaneously reigns as the uncontested King of Brawl Winnin’ and The Silver Division. Ana’s winrate, meanwhile, steadily climbs with skill tier from a tragic 38.9% in Bronze to a respectable 51.9% in Grandmaster.

Nowhere are the consequences of the skill curve differentials more apparent than when comparing Ranked Matchmaking (of any level) to organized professional play (hereinafter ‘eSports’). Mercy, statistically speaking, performs well (above 50% winrate) all the way up to the top few percentiles of Ranked Matchmaking with a remarkably high pick rate. In professional play, she goes virtually untouched outside of the Pharah + Mercy combo.

This difference is a consequence of an added challenge of balancing popular online games that are also eSports. True coordination (in composition choice and game style) radically changes the way the game is played. Because Resurrect is fundamentally reactive, high level teams will often simply not allow an unsupported Mercy to garner value from her ultimate. She will be hunted by flanking DPS while the rest of the team intentionally staggers kills or saves ultimates to reduce the effectiveness of any Resurrect the Mercy is able to cast. As someone who has spent every season of Ranked Matchmaking at the very highest level of play, I can attest that these sorts of plays are rarely if ever made there, regardless of the skill level of players on either team. I contend that an important reason Ana is so weak in low-tier play is that she demands coordinated protection to fully leverage her abilities (coordination that is virtually nonexistent at low level play). Likewise Mercy is incredibly punishing of undirected or uncoordinated play. Fail to hunt her down at the proper time or forget to save a key ultimate to counterplay Resurrect and a teamfight is quickly lost.

So what can we do? How can Overwatch feel fresh and full of optionality in an eSports context while also remaining balanced and enjoyable to play for those further to the left on the skill distribution?

Moving Forward

Skill curve differential isn’t going anywhere, and in my opinion it shouldn’t. Blizzard intelligently marketed Overwatch much more widely than the traditional first person shooter target audience. This wasn’t just a marketing strategy though; the game design purposely features heroes, for instance Mercy, that aren’t so demanding of traditional arena shooter skills and rather allow positioning and decision making to determine game impact. In the long run, I think that this is a good thing. Purity is the enemy of innovation while community stagnancy is in direct opposition to promotion to a wider audience (something absolutely critical to achieving a public perception of legitimacy for eSports and even gaming as a hobby).

The only important question that remains is how to rise to the challenge of balancing for diverse skill tiers simultaneously. The approach that I’d like to see taken more often is the differentiation of mechanical changes and statistical changes.

Sometimes a number gets into the game that is simply broken. Bastion’s 35% value for his Ironclad passive springs to mind as a classic example of “utterly fucking busted”. Sometimes a character just doesn’t have the stats to compare favorably against his/her/its closest substitutes; pre-buff Soldier 76 is a good example. I don’t have date-accurate statistics for the strength of these heroes across skill tiers, but I contend that pre-buff Soldier 76 was probably too weak at every point on the skill distribution and pre-nerf Bastion was probably too strong at every point on the skill distribution. For these kinds of across-the-board balance issues, statistical adjustments are warranted as they will have similar impacts on players of all skill levels.

These are the easy variety of balance problems. For the more complex varieties, a mechanical change in isolation or a combination of mechanical & statistical changes is necessary.

A strong example of a very good combination buff is the recent (live) patch to Hanzo. Hanzo felt a little too weak across the board, but at a high level aggressive compositions came to render him nearly obsolete. The 10% charge time buff to Hanzo is significant, but I would argue that even more impactful for high level players is the ability to hold a charged arrow while wall-climbing and to spawn your Dragonstrike early if the arrow collides with a wall. These changes make the space of options for Hanzo players significantly wider and enable much more aggressive and independent play. However, this kind of freedom doesn’t aid those who aren’t ready to use it. The change in totality made Hanzo players of all skill levels slightly stronger but had a significantly greater impact on expert players who can most creatively leverage the new mechanics. Widening the space of options doesn’t make a big difference to players who weren’t already pushing the boundaries of how a hero can be effectively played.

We can use this same mechanical vs statistical differentiation to better examine the past Genji nerf that removed his ability to triple jump in one continuous airtime via wall climbing. For low tier players who weren’t even aware of this possibility, the change had virtually zero impact. For high tier players who were exploiting it as often as possible to maximize mobility and survivability, the change had real consequences to Genji’s overall strength and playability. My honest assessment of the Overwatch development team is that they never thought about these differential effects and instead saw the triple jump as just an unintended bug to be patched out. The ledge-dash-super-jump mechanic was probably thought of similarly, and patching it out only really affected the few hundred (I doubt it was really this many) players who could hit it reliably enough to implement it as part of their play style. The important lesson here is that these pure-mechanical patches had radically different impacts on players of different skill levels.

These two examples provide a powerful blueprint for the formulation of balance adjustments that demand different impacts upon different skill tiers:

If a hero is in a good place for low-skill players but too weak for high-skill players: widen the option space by loosening mechanical restrictions and let creativity and talent shine through as increased game impact by high-skill players.

If a hero is popular and strong in the hands of high-skill players but a bit weak when used by newer players, combine a statistical buff with a restriction of option space. Make the hero more narrowly defined and yet more powerful within that narrow role. This variety of change must be done most carefully, though, as elite players will always seek to exploit any statistical buffs to their maximum potential even if it requires playing the hero in a radically different way (see the most recent attempt at nerfing Lucio).

That’s the theory, but here are my resultant suggestions for real balance changes. Feel free to leave feedback on the article as a whole or just these ideas! My twitter is @jake_overwatch 🙂

Suggestions:

Bastion is a worse choice than Soldier 76 virtually 100% of the time in high level play, but has a comfortable niche at median and below skill. Remove the self-stun upon Tank Transformation and when returning to Recon mode to allow for more aggressive initiations and the option to use Tank Form as effective counterplay in a fight. Also remove or adjust the Self-Repair animations that block the crosshair (that shit’s just annoying, yo). Average players will play just like they always have, but those smart enough to leverage these adjustments into a much more aggressive style will reap the rewards.

Widowmaker has felt incredibly map-dependent across the skill spectrum even after her charge-time buff. Decrease Grappling Hook cooldown (I suggest by 2-3 seconds) to increase mobility and escape options versus the dive composition that has come to define the meta. It is very dangerous to buff this hero with pure DPS, but giving her a slightly less narrow role might help her pick and win rates with skilled players.

Junkrat is an effective spammer that applies a ton of pressure to slow team compositions. His ultimate is reasonably effective against newer players but rarely finds sufficient utility in high level play to justify what is very often a suicide play. Give the Rip Tire a new ability (activated with whatever key is bound to Ability 1) that allows it to hop into a drift (yes I do mean cart-racer style) with a short cooldown. This will give stronger players options to bait out counterplays and reasonably juke players with moderate aiming skill while being difficult to abuse by those lower-tier players that don’t have a precise understanding of which counterplays they need to bait and which enemies they need to juke.

(maybe I want to roleplay Junkenstein)

Quality over Quantity: Revisited

Sun. June 11th

 

I read the comment thread on /r/competitiveoverwatch and thought I should make this update to discuss two really crucial lines of argument that I noticed throughout the reddit thread and in people’s responses to my twitter.

Criticism 1: Jake, your system is idealistic. Players who refuse to play healers or tanks in game will still check those boxes just to get faster queue times. This will leave their teammates in the lurch and ultimately ruin the system.

Criticism 2: Jake, your system would encourage one-tricking and diminish the rewards accrued to versatile players.

In response to C1, I argue that the motivation not to abuse the system is inherent to its design. Let’s say I’m a Bastion-only one-trick. I won’t play anything besides Bastion for any reason. I could check the healer & tank checkboxes–clearly justified in this instance because Bastion is both :)–but I won’t because I want the system to work. No matter what role you want to play, you have a better chance of a competitive and fun ranked match if your team has a more balanced composition.

For those who worry that true griefers/trolls (those trying to lose from the outset of the match) might abuse the system to maximize trolling potential, I would argue that the added impact is relatively small. If I were to pick Roadhog and self-heal in front of the enemy team every time I spawn, my team isn’t going to win. It really doesn’t matter what our composition is or who is willing to flex to what. I’ve actually done this exact thing to ensure that a hacker on my team loses. Even an aimbot isn’t enough to 5v6 a team that gets fed 900 hp of ult-charge per Roadhog spawn. Overwatch is a team game; one person aggressively trying to lose is more than enough to achieve that goal in the current system. Personally, I have not seen many players truly griefing in this extreme sense. Most ‘griefing’ comes from people being tilted about team composition or teammate performance in my experience.

C2 is a bit more tricky, though I would argue my system is nonetheless well designed to encourage versatility. Anyone who has played Overwatch for a significant amount of time can recognize that one healer and one tank is not the strongest team foundation in nearly any scenario. Even if your team already has one of each guaranteed by the matchmaker, there is still tremendous room to increase the strength of your composition by adding a second healer or more tanks. Versatile players can still accrue value from their diverse abilities under the role-queue system I defend.

Regarding the encouragement of one-tricking, I think of the system as a response to the prevalence of one-tricking rather than a cause of it. In the status quo, I already see a very high incidence of one-tricking a hero or, even more commonly, a role. It is the rare player who plays DPS, Tanks, and Healers all at an equivalent level. The vast majority of people, in my experience, have a significant majority of their playtime spent in one role (if not one hero). If the current system is punishing one-tricking relative to the system I propose, then it’s really doing a terrible job.

There is a deeper philosophical question here, though. Is one-tricking an acceptable way to play Ranked Matchmaking in Overwatch? Should it be discouraged? I would argue that, regardless of the answers to these questions, it cannot be stopped without a tremendous cost to the creativity that makes hero-driven shooters so fun. One-tricking happens in every game with character or weapon selection: it’s human nature to have preferences and some people really love to maximize their skill in a very narrow category rather than experience all the possibilities the game has to offer. If you can only play one hero, you don’t have much hope of going pro (except Lucio, but maybe Blizz will figure out how not to buff that hero someday). In my view, that’s OK. Not everyone aspires to play professionally; people come to the game for really different reasons, even at the far right tail of the skill distribution.

The best way to design the system, in my view, requires accepting that there are many different types of players with many different motivations. Fighting to change people is a losing battle. Why not build a system that offers fun matches whether you want to one-trick or flex every role as your team needs it?

 

 

P.S. Shoutout to /r/competitiveoverwatch for the great feedback and response! I’ll be back next week with another article, although I’m not sure exactly which topic to pick just yet. Tweet me some suggestions! (@jake_overwatch)

P.P.S. Some people suggested a DPS check box in addition to my suggestion. My main resistance to this suggestion is that Blizzard has done a really poor job with the hero classifications in the DPS role. Hanzo is, at least at a high level, unpickable on defense but sometimes viable when attacking. Many of the defense characters are like this: due to their one-dimensionality they are easily counterpicked and so are poor choices when actually defending. On offense, though, they can exploit weaknesses in defending teams locked into their composition. Clearly, not every team needs one Offense character and one Defense character in the same way that every team needs one Healer and one Tank. In my view, the problems with hero classification in the DPS role need to be solved before a system could be implemented that would specifically indict the failures of this existing hero classification system.

 

Quality over Quantity

Sat. June 10th

Intro

This essay will defend a limited role queueing system for improving the overall experience of (exclusively) Ranked Matchmaking in Overwatch. I recognize the importance of minimizing queue times, but hold that the concessions to such a system, properly designed and well implemented, are quite small relative to the potential gains in match quality and the ability of the Skill Rating system to accurately track skill.

The Fundamentals

I don’t defend any system that enforces the specificities of the metagame. By metagame here, I mean the current team compositions and play styles found to be the strongest in a given patch by the esport side of Overwatch. However, I do think that there are some propositions so fundamental to the design of the game itself that they remain constant across all metagames, patches, and skill levels.

P1: A team composition that features at least one healer hero will be significantly more effective than any composition with zero healing heroes.

P2: A team composition that features at least one tank hero will be significantly more effective than any composition with zero tank heroes.

P1 is, in my judgement, very plausibly true at every skill level. If one were to measure the power of a team at any given moment in game, total health pool as a percentage of maximum would be the most impactful variable in the formula. Support characters (except Symmetra, who has been misclassified since her rework) hold the vast majority of the responsibility for the regeneration of this absolutely crucial resource.

P2 is slightly more controversial. It is slightly harder to see the truth of P2 because tanks primarily contribute to the relative aggregated resource pool of their team by stymieing opposing attempts to diminish it rather than by increasing it directly or reducing that of opponents. Nonetheless, show me a team composition with no tanks and I’ll show you a composition that has a directly superior counterpart.

The Suggestion

I hear and respect the concerns voiced by the Overwatch developers themselves and by the community at large. I would hate to see any system implemented that would hinder the ability of players to be creative. That’s why my implementation is simple, unobtrusive, and sharply limited in scope.

The User Interface requirement of the role queue system I imagine would be two check boxes. These boxes would be marked ‘Healer’ and ‘Tank’ respectively. Players could check neither, one, or both of these options depending on their predilection for different roles. The matchmaker would then ensure that any potential match includes at least one player who has checked ‘Healer’ and at least one (distinct) player who has checked ‘Tank’ on each team.

That’s it. No hero restrictions, no indication to teammates of who has checked which box(es), no metagame enforcement beyond the one-tank, one-healer minimums.
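
For the programmers in the audience, the rule above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names and the choice to represent each player as the set of boxes they checked are hypothetical, and I have no knowledge of how Blizzard's matchmaker is actually built.

```python
from itertools import permutations

# Hypothetical labels for the two check boxes described above.
HEALER = "Healer"
TANK = "Tank"


def team_meets_minimums(team):
    """Return True if the team has a Healer-checker and a *distinct* Tank-checker.

    `team` is a list with one entry per player; each entry is the set of
    boxes that player checked (possibly empty).
    """
    # permutations(..., 2) yields every ordered pair of distinct player indices,
    # so a single player who checked both boxes cannot fill both slots alone.
    return any(
        HEALER in team[h] and TANK in team[t]
        for h, t in permutations(range(len(team)), 2)
    )


def match_is_acceptable(red_team, blue_team):
    """A candidate match passes only if both teams meet the minimums."""
    return team_meets_minimums(red_team) and team_meets_minimums(blue_team)
```

Note that a lone player who checks both boxes does not satisfy the rule alone, since the healer and the tank must be different people. Nothing here restricts hero picks once the match starts; the check only filters which candidate matches the matchmaker is allowed to create.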

The Argument

Overwatch is radically deep and continually surprising. To this day I continue to see creative players play and win the game in ways previously unimagined. Sometimes even your healers should respawn as Mei or Tracer for a game-changing point contest. The game should never limit the options of players to make these kinds of creative decisions on the fly. Furthermore, any proposed systemic change that affects the matchmaker must weigh the impact on queue times against the theoretical match quality improvement.

I would contend that such a system would have a very small negative impact on queue times. In the majority of my games, propositions one and two are never questioned because both teams virtually always have team compositions that satisfy them. Thanks to Overwatch’s quality game design, Tanks and Healers are fun and rewarding to play compared to other games’ implementations of these roles. This diversity in player preferences means that most matches the matchmaker considers will likely already satisfy the one healer + one tank minimums, thereby having no impact on the queue times of these matches.

In some games, though, these compositions are a result of mutual understanding rather than natural player preference. In others, natural player preference trumps desire to win and extremely poor team compositions are fielded. It is in these games that I would argue the matchmaker has erred.

Consider a hypothetical matchup: Red vs Blue. The variables looked good enough: the players had been waiting for some time and of course the matchmaker hates to delay. Excitedly, it assembled two teams of 6 that, oh joy, had equivalent average MMR! In the eyes of the matchmaker, this is a perfect 50/50 game: the best possible way to measure the relative ability of the players of Red Team versus those of Blue Team. This time, though, something is wrong. Blue Team has tragically found itself with six Mercy-Only roleplayers! While Red Team readies its aggressive dive composition (featuring Winston and Lucio as its core enablers) and prepares a strategy to assault the first point, Blue Team is mired in an extended discussion of who has not yet polluted his/her career profile with non-Mercy play time.

I believe that the players on Blue Team, roughly 60 seconds after the gates open, would prefer to have waited a bit longer in the queue so that they could each find a team that would permit them to victoriously fulfill their healer fantasies. Perhaps Red Team enjoys such a matchup, but even they fall victim to an artificial inflation of their skill rating. Were Blue Team distributed across a few different matches, each could have used their talent for supporting to defend their teammates against the assault of the Red Team and perhaps emerge victorious. It is not the ability of Blue Team that has led to their defeat; it is simply their misfortune to have specifically been placed together. In this sense, all 12 players’ Skill Ratings have been distorted from their ‘true value’ as a consequence of the randomness inherent to the current system, and the match was bad from a system-wide perspective. I would contend that the playerbase is intelligent enough to desire the defense of such small guardrails.

Some might worry that this argument could theoretically be extended to defend even a full metagame enforcement system. However, the concerns of creativity in exigent circumstances and the incredible diversity of situations in Overwatch would make such a system disastrous. The extremely limited implementation of the one-healer, one-tank minimums that I defend is intended to slightly increase the lower bound on match quality without significantly negatively impacting queue times or player freedom. I would argue that those extra seconds spent in queue when a ‘quality match’ under the current system is rejected for the failure to meet the one-tank, one-healer minimums are worth it from a player perspective. No one wants to play without a tank to peel for them; no one wants to play without a healer to restore their HP and enable their plays.

Conclusions

My suggestion rewards players who are most willing to adapt to the needs of their team with shorter queue times, while ensuring that those who are less willing nonetheless find themselves in ‘winnable’ games. Skill Rating distortion would be reduced, yielding further gains to the quality of the matchmaker’s judgement and thereby the quality of the player experience. The cost is small, the gains are high. In my judgement, this system or something similar would positively impact the player experience in Ranked Matchmaking in Overwatch.

Leave a comment with your thoughts (or feel free to mock my spelling and/or grammar)!