Something to Strive For

Or: Rewards for dedication are severely lacking in Overwatch

Rancor is in the air. Many are calling Season 5 of Ranked Matchmaking in Overwatch the worst since S1’s disastrous coin flips. Due to the total lack of transparency from Blizzard, it is unclear if the Matchmaking algorithm has in fact changed (and led to genuinely worse matchmaking) or if tensions are simply reaching a boiling point. Either way, the potential for disruption in the market for competitive Overwatch matchmaking has never been greater.

The reasons are obvious: one-tricking remains a behavior officially accepted by Blizzard, the punishment system feels toothless, and the Skill Rating algorithm is embarrassingly manipulable. On a good day, one in ten Ranked games feels competitive and interesting.

Even if these glaring failures are rectified, the prioritization of queue time minimization has left striving for the top of the ladder feeling deeply unrewarding. Overwatch, from a fundamental game-design perspective, is the eSport with the greatest demand for constant coordination. Games like CS:GO, Dota 2, and League of Legends reward coordinated executions and smart team play, but Overwatch demands them constantly. True 1v1s are incredibly rare and virtually every fight is decided with crucial contributions from many players. As individual SR presses past 4300, however, wins and losses are decided by carry play and team coordination goes out the window. When a 4600-4700 rated player solo-queues into a game, it is virtually impossible that his/her teammates will be able and willing to keep up. Although queue times stay relatively fast with this system, it feels as if the matchmaker asks only the question of which team will more effectively stymie the efforts of their one or two carry-players. There’s no value in a brief queue time if the majority of matches are poor quality.

As a player at this skill range, I find these sorts of games incredibly frustrating. Although Ranked Matchmaking will never perfectly simulate an organized competitive environment, its power to shine the spotlight on new talent (as in other eSports titles) is directly correlated to the degree of similarity it can achieve. One of the most compelling parts of eSports is its accessibility. There is a sort of egalitarian charm to the idea that anyone can make a name online and earn a chance to be rewarded for their dedication and skill. Overwatch is failing terribly in this respect.

A competitor to Ranked Matchmaking (similar to the offerings of Faceit or ESEA in other games) may be the best path forward. There is tremendous demand for a more meaningful proxy to true competitive Overwatch, both from established professional players and from those who wish for a legitimate arena in which to display their potential. Something as simple as a captain’s draft system or a classical Elo measurement would yield a product far superior to what Blizzard has produced.

Beyond prizes and external motivations, I know that I would personally pay for a subscription just to guarantee a consistently serious and competitive mindset among my teammates. Ranked in its present state is a remarkably poor environment in which to practice the most important skill of Overwatch: team play. I had hoped Blizzard would act faster, but the deterioration of the past few seasons makes one thing strikingly clear: Blizzard’s game development priorities seem to put Quickplay on par with Top 500. For the organic growth of the eSport in the long term, the need for something to strive for is greater than ever.

 

 

P.S. My apologies for the delay between articles. I’m taking college courses online now in order to finish my degree (on top of World Cup practice), so my time is a bit more constrained than usual. As always, let me know what you think in the comments and on Twitter at @jake_overwatch.

The Perfect Meta-game

And how to achieve it.

It’s time to go there. This piece will be structured with relatively simple contentions, the defense of which aims to construct a coherent set of guidelines for how to balance Overwatch most efficiently and effectively.

For those with a bit less eSports savvy: in Overwatch, the ‘meta-game’ is the sum of expectations about which team compositions are strong in certain situations.

Firstly, the standard:

Claim 1: The degree of freedom that a meta-game instantiates is the best available standard by which to evaluate its quality.

I contend that the ideal meta-game consists of the maximum number of competitively viable team compositions and styles of play. There is no objective way to measure what makes a game fun; however, I would argue that novelty is a close proxy and a goal intrinsically worthy of pursuit. Novelty is best measured through variety, counterplay, and creative potential; in other words, the degree of freedom that a meta-game instantiates.

We can compare this standard against the emotive responses of the playerbase to help evaluate its usefulness as a metric for meta-game quality. The infamous Quad-Tank meta (under which my team, then Bird Noises, made its name) was nearly universally despised. What my team discovered in this patch was that there was no need to run any other composition in any circumstance; the only counter to Quad-Tank was to play Quad-Tank more aggressively than your opponent. This would be an example of a meta-game with an extremely low degree of freedom; only one composition and one play style is viable. Here my standard would concur with community sentiment of the time; a meta-game with fewer choices is less fun.

Comparing this meta to the post-Dva-&-Ana-nerfs meta, nearly everyone would agree that the ‘quality’ of the meta-game went up relative to Quad-Tank. Out of the three plausible off-tanks in this meta-game, different teams chose different pairs out of the set of Dva/Zarya/Roadhog. Some teams (e.g. Selfless) bucked even the name of the meta-game and chose instead to play a 2-2-2 style with a very high degree of success. While virtually the rest of the world continued to play Rein-centric compositions, Rogue impressed everyone paying any attention with a Triple-DPS dive comp that took the competitive scene by storm, proving its viability with undeniably dominant results. Again, my standard matches the sentiment of the community in declaring this meta much more fun than the one preceding it.

If asked to evaluate the present Counter-Dive meta, most would call it a regression from what was previously achieved (although perhaps better than metas like Quad-Tank). Once more my standard concurs with this sentiment, since the spectrum of viable compositions and play styles has grayed drastically over the past few patches. Presently, Dva/Winston/Tracer/Lucio are approaching perma-run status with a few exceptions on exceptionally enclosed or flank-less map locations. The choice between Zen/Ana and Soldier/Genji (or Pharah+Mercy) with the occasional and situational Sombra flex is essentially all that is available to competitive teams. Apex results seem to show that even Rogue’s unmatched mastery of the Triple-DPS play style was insufficient to overcome the dominance of the 2-2-2 meta. Those stubborn teams that have stuck to Rein-centric compositions have been consistently trampled underfoot by one very angry scientist.

From these instances, I conclude that what makes a meta-game good or bad is the degree to which teams can convert their unique individual styles and ideas about the game into genuinely competitive strategies. Fostering creativity as a means to victory is a powerful way to elevate Overwatch above the aim-duels that are lent such primacy in mirror matches. As a side note, I believe that diminishing the importance of these extremely mechanical aim-duels and elevating the importance of team-composition makes Overwatch vastly more entertaining and watchable from a spectator’s perspective. The narrative of one team outsmarting the other is much more compelling in my eyes than that of the more skilled players dismantling their weaker counterparts.

The immediate next question to ask once one accepts this standard is ‘how does one best achieve the maximum degree of freedom in a meta-game?’ This question is slightly more complex, yet no less answerable:

Claim 2: At their core, Overwatch’s meta-games and overall balance are about team composition.

Winning or losing a game of Overwatch depends entirely on a team’s ability to successfully attack and defend various objectives within a roughly given timeframe. As tempting as it is to consider a hero’s balance in a vacuum, such a hero-centric approach to balancing is doomed to failure.

It seems quite plausible that the vast availability of statistics regarding hero play in Ranked Matchmaking has tempted the OW dev-team to think of each hero as an island. When a hero seems to be winning or losing a little too often it seems a prime candidate for a nerf or a buff, respectively. This logic misses what was in front of our eyes the whole time, that one hero choice is only strong or weak relative to other options and the team composition that surrounds and opposes it. Heroes don’t win games, compositions do.

Consider Genji. In Triple-Tank his role is essentially to farm Dragonblade as quickly as possible to participate in combo play with his primary enablers: Lucio, Ana, Zarya, Rein, etc. In dive compositions, however, Genji acts as the secondary initiator alongside Winston and Tracer. Dive seeks to enable the Genji to maximize dash resets while the primacy of Dragonblade is significantly reduced relative to Triple-Tank Genji play. The shift in team composition fundamentally alters the role of the Genji player as his primary ‘partner heroes’ become fellow damage-dealers rather than defensive enablers. This is a crucial distinction to recognize. Hypothetically, were Genji oppressively strong, composition-defining, and thus demanding of a nerf it would be very important to change him in the right way so as to properly affect the meta-monopolizing composition without fully eroding his general viability.

Dva can benefit from a similar analysis, sans hypotheticals. After her originally massive buff was toned down, she didn’t feel oppressively overpowered in tank compositions. Her mobility wasn’t so incredibly useful in slower compositions, yet it felt like she had a good place in countering spam-centric opposing team comps and enabling more aggressive DPS choices in Triple-Tank (like Genji). Without any changes directly to Dva, the massive buffs to Winston, Lucio, and Zenyatta combined with Rein & Roadhog nerfs have left her feeling oppressively strong. The Zenyatta buffs and the Lucio rework established a much more cohesive backline than had ever existed in Rein-less compositions. Dva fit the niche of peeling for this backline perfectly while also soft-countering Discord Orb and often preventing the all-important Dash-resets of Genji comps. This instance reveals that hero balance cannot be examined in a vacuum, even with statistical evaluation; Dva shifted from ‘viable-yet-unpopular’ to ‘must-have’ without a single direct change to her kit.

Herein lies the biggest problem to successfully balancing Overwatch. The above paragraphs are significantly less true if we are considering Ranked Matchmaking rather than organized competitive eSports. In Ranked, the near total lack of coordination greatly diminishes the importance of full compositions and lends much more credence to claims that a hero is strong or weak in a vacuum. Without fixing Ranked play (see my earlier blog posts on the subject) I can’t imagine a solution to this dilemma, except to plead with all my heart that Blizzard prioritize balance for those who dedicate their dreams, careers, and lives to Overwatch.

Playing eSports doesn’t make you better or more valuable than a casual player, but I believe that that kind of dedication is deserving of the respect and priority of the dev team. If a character is a bit too strong in low-skill public games, some casual players will have an infinitesimally more difficult Ranked experience. If the Overwatch eSports meta becomes stagnant and/or unenjoyable to watch, careers and lives are potentially ruined. The best of the best will find success regardless, but it is the scale of the eSports scene upon which those on the margins of top play depend. Furthermore, I would argue that balancing for eSports will ultimately benefit the whole playerbase, although that’s a topic for another article.

The world could always use more heroes.

Claim 3: Presently, the game is more defined by choice of Main Tank than by any other role. Choosing Winston or Rein will dictate more strategy than almost any other role selection.

With the heroes presently available in Overwatch, the degrees of freedom available in terms of composition and strategy selection are almost entirely dictated by Main Tank selection. When a team selects Winston, more than half of the heroes in the selection screen might as well be blacked out for how weak and non-viable they are in aggressive dive compositions. Reinhardt hero selection acts in a similar way, except that he fully ‘blacks out’ fewer heroes and rather simply demands that a significant portion of his teammates’ heroes are devoted primarily to his defense (a role for which there are only a few meaningful choices).

Under this situation, then, ensuring the viability of both Rein-centric and Winston-centric compositions (as close to a 50/50 as possible) is what will result in the most variable and creatively adaptable meta-game. In the short term, this is the only solution to stagnant meta-games that prevent individual and team flavors from expressing themselves in team-composition choice.

Ideally though, I’d like to see heroes that either add a third option to the Rein/Winston dichotomy or allow the game to potentially be played in a way that isn’t so fundamentally tank-centric (although this may simply be a reality for Overwatch in the medium term). I’m looking at you, Doomfist…

 

If you read this far, don’t hesitate to give me feedback in the comments or on twitter at @jake_overwatch. This article was pretty intensely theoretical, so if you made it all the way through I appreciate your dedication.

I’d also like to thank Wojtek for his instrumental assistance in refining this piece and also for inspiring its focus.

 

Elegy for a Swine

or: Constructive Feedback on Roadhog’s Balancing

Roadhog was the first character in Overwatch that I really wanted to master. Coming from TF2 as an avid MGE (My Gaming Edge is a popular 1v1 practice mod in TF2) Soldier player, discovering the balance between the Rocket Launcher and the Shotgun lent incredible depth to a character of such apparent simplicity and formed the foundation of my passion for competitive gaming. Diving into the nuances of Roadhog, I felt just the same as I had in the early stages of mastering MGE’s Soldier duels: perfect play was so clearly possible and yet always tantalizingly out of reach.

Missing or landing a hook almost never felt like a game of chance, and a perfect understanding of range was a must-have to do battle while Hook was on cooldown. The character rewarded skill with unique carry-potential and punished mistakes and poor positioning with significant contributions to enemy ultimate tempo. Then came the hook 2.0 update.

It only took a few minutes of playing the new Roadhog for me to recognize the gravity of what was lost. Impressive hooks were often broken by odd geometry or falling opponents; only occasionally did this new mechanic yield the sense that your target had truly outplayed the hook. On the receiving end, I felt the same. Once in a while I truly intended to sidestep a hook and broke it with my cover, but more often than not my response was to say a quick prayer to RNGesus.

The mechanical change to the way the pull itself occurred was also rather shockingly bad. Characters hooked off of high ground or from height were not brought straight to the Road player, but rather in a diagonal trajectory that put the two players on level ground. As someone who practiced with the original hook mechanics for hundreds of hours, this change was both annoying and counter-intuitive while having no discernible impact on the balance of the character. Some heroes were originally quite difficult to consistently one-shot combo as Roadhog, most notably Ana. Prior to these changes, only a very small minority of players could truly achieve a very reliable maximum damage combo. The change to pull consistency was perhaps well intended yet in my view only achieved a ‘dumbing down’ of the character’s fundamental mechanics.

I don’t contend that no nerf was deserved, but rather that changes to Roadhog have been poorly designed. Roadhog as he was on release was most certainly overpowered. His one-shot potential was simply too high and counterplay options were sharply limited by his low cooldowns. The hook was also apparently designed for the lowest common denominator of players with a hitbox nearly the size of a payload cart. Despite these problems, at the end of the day the Pig was a hell of a lot of fun to play because the hook was a hell of a lot of fun to use.

The initial changes proved insufficient to properly balance the hero, so the devs turned next to a 33% increase to Hook’s cooldown combined with buffs to the spread of the Scrap Gun and a decrease in pull distance to compensate. The intent was apparently to make him less reliant on his role-defining cooldown and remake him in the style of a classic DPS character. Philosophical problems with this kind of change aside, this patch led to a significant decrease in Roadhog’s vulnerability during Hook cooldown and a significant increase to his ability to drop enemies into death-pits.

In the most recent balance patch, our beloved swine was finally driven into competitive irrelevancy with Ranked win rates approaching 40% and a near total lack of playtime in professional play. The developers had the following to say: “The Scrap Gun changes reduce the power of his hook combo and alternate fire burst damage potential while still keeping his DPS roughly the same.” (For those unaware, the recent patch decreased Hog’s damage by 33% while increasing rate of fire by 30% and clip size by 25%).

The notion that Roadhog could still realistically output the same amount of DPS as pre-patch is pretty laughable. As soon as I saw these patch notes on the PTR I knew Roadhog was destined for the garbage bin if they went live (and live they went). In a game with healing as cheap and effective as it is in Overwatch, burst damage is vastly stronger than damage dealt over time. Just because Roadhog has about the same ability to break a Rein shield (under perfect conditions) as before doesn’t mean that his meaningful DPS will be anywhere close to what it was. Furthermore, dealing the same DPS requires landing more shots than before and thus exposing oneself more than before.
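The patch figures quoted above can be checked with a little arithmetic. The sketch below uses only the percentages given in this piece (33% less damage per shot, 30% faster rate of fire, 25% larger clip). The function and variable names are my own, and reload time and enemy healing are ignored, so treat it as a rough illustration rather than an exact model of the patch.

```python
def relative_output(damage_change, fire_rate_change, clip_change):
    """Compare post-patch Scrap Gun output to pre-patch output.

    Inputs are fractional changes (e.g. -0.33 for a 33% decrease).
    Returns (sustained DPS ratio, damage-per-shot ratio, damage-per-clip ratio).
    Reload time and healing are deliberately ignored in this sketch.
    """
    per_shot = 1.0 + damage_change
    dps = per_shot * (1.0 + fire_rate_change)   # damage per shot x shots per second
    per_clip = per_shot * (1.0 + clip_change)   # damage per shot x shots per clip
    return dps, per_shot, per_clip


dps, per_shot, per_clip = relative_output(-0.33, 0.30, 0.25)
print(f"sustained DPS:   {dps:.0%} of pre-patch")
print(f"damage per shot: {per_shot:.0%} of pre-patch")
print(f"damage per clip: {per_clip:.0%} of pre-patch")
```

By these figures sustained DPS lands at roughly 87% of its old value before healing is even considered, while each individual shot does only about two thirds of its former damage. The second number is the one that matters for the hook combo.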

If the developers had written that ‘Roadhog was much too strong and that these changes were intended to bring him in line’, I would disagree with their assessment but agree with the means by which they responded to it. What is really upsetting is that, judging from the developer comment above, they apparently didn’t see these changes as a significant nerf at all. Perhaps, given the decrease to his critical hitbox size, they even saw this update as a buff. The reality, however, is that these changes are perhaps the most significant nerf any character has received in Overwatch’s history, except perhaps all of the Ana nerfs combined into one patch. Worse yet, this brutal blow from the nerf-bat was delivered to a character that was already fading out of competitive viability. What that says about the developers’ understanding of their own game is up to the reader to decide.

I am happy and sad at the same time with these changes. Happy because Roadhog stopped being fun for me with the first iteration of Hook 2.0 and now he is so competitively irrelevant that I’ll never need to touch him again barring radical re-balancing. Sad because Roadhog was the character that first made me want to become great at Overwatch.

All along, the changes Roadhog needed were so simple. Were I balancing Overwatch, the next balance patch would do the following.

  1. Reset Roadhog to exactly as he was on release
  2. Hook cooldown to 9 seconds
  3. Hook hitbox size decreased by 33%
  4. Take a Breather healing down to 250 from 300 (or a 1-2 second increase in cooldown)

Being deleted in one shot isn’t very fun. For newer players it is probably quite frustrating since they don’t understand the game well enough to really engage with counterplay options. These changes will make successfully landing hooks much harder, remove the original ability of the hook to pull players who were completely out of line of sight, and increase the size of the vulnerability window that Roadhog creates when he uses Chain Hook. Reverting the spread changes will push the character back into his original role of a space-denying tank and defender of back lines and further open up counterplay options to reward players who successfully bait out a Hook. If Blizzard remains really resistant to reverting the hook-break mechanic, the hook should instantly stop the motion of its target from the moment of connection through the completed pull so that skillful and ‘legit’ hooks are at least more rarely broken by gravity or odd map geometry.

 

P.S. This essay was more in a narrative structure than my previous pieces because I felt that it served the point I was trying to make better. Let me know what you thought in the comments; more pieces and potentially more new styles coming soon.

In Defense of Purist Skill Rating

Weds. Jun 21st:

Intro:

This essay will defend a vastly simpler implementation of Skill Rating adjustment than currently exists in Overwatch’s Ranked Matchmaking. I will suggest that removing all influencers of Skill Rating besides winning & losing (adjusted to game difficulty) will result in a number of improvements to the Ranked Matchmaking experience, especially with an eye towards the OWL and the eSports possibilities for Overwatch in general.

Incentives & Behavior:

Most game-theoretic models begin with a simple assumption termed ‘rational self-interest’, or the idea that individuals will take the course of action which most benefits themselves. This assumption is imperfect, as humans have been repeatedly shown to exhibit altruistic and pay-to-punish behavior patterns in empirical studies. However, broadly speaking, the notion that people will act in service of their own goals is a plausible one. It is especially so in an online context that lacks face-to-face empathic accountability.

Beginning from rational self-interest, then, we can understand and predict the behavior patterns of players in Overwatch by examining the incentive structures that they face. Furthermore, alterations to these incentive structures have the power to dramatically change the decisions players make and even the mindset with which individuals approach the game.

The most clear and impactful incentive that Overwatch players (or at least those that choose to play Ranked Matchmaking) face is Skill Rating (hereinafter ‘SR’). Rising through the ranks feels satisfying and validating, placing in a top division can be a status symbol, and a high top-500 placement might even land you tryouts to play professionally. Naturally, then, many players are highly incentivized to seek to maximize their SR.

Skill Rating Maximization:

SR maximization will always be an incentivized behavior pattern. People want to be highly skilled, but more than that they want to appear to be highly skilled. This distinction seems small but is in fact very important. Crucially then, the key motivation for many (especially for the vast majority of players who will never compete in an eSports context) is to reach the highest SR that they can. This should be juxtaposed against the incentive to become the best player one can be: seeking to have the maximum impact upon a given team’s win probability (i.e. the eSports motivation).

Ideally then, the SR system should be set up such that ‘SR maximization behavior’ guides players to make the sort of decisions that positively impact the community and create the best gameplay environment possible. In my judgement, such an ideal system would align the SR maximization behavior with the eSports motivation, especially with an eye towards the Overwatch League. The current system fails to accomplish this alignment.

One Trick Players (OTPs):

While ‘one-tricking’ is not a behavior that I think should be actively discouraged or disallowed, I contend that it’s also a behavior that shouldn’t be specifically incentivized. In my view, the ideal system would be entirely neutral towards OTPs.

Consider a hypothetical Mercy OTP (anecdotally the most commonly one-tricked hero, although I don’t have data that support this) who has reached a very high SR with essentially no other heroes played.

The current SR system rewards players who are playing at a high skill percentile compared to other players on that hero. This comparison is drawn not within one game instance, but rather across the entire dataset of all Ranked Matchmaking time played on that hero. What this means for our hypothetical Mercy OTP is that, so long as he/she plays better than other Mercy players, lost games will net a smaller SR drop and won games will net a larger SR gain. This impact is so significant that winning vs. losing is in fact a secondary concern to the ‘Mercy percentile’ our OTP is playing at.

We’ll get back to our hypothetical OTP in a moment, but now let’s take a step back to examine the bigger picture. The current SR system is crucially problematic for many reasons, but I’ll focus on two: (1) statistical judgements of skill are weak (for some heroes more than others) and (2) it leads different players to have different incentive structures.

(1) Statistical Judgements of Skill Are Weak:

The strength of this proposition is such that I’ll use the best counterexample as my own starting point: McCree. He is a hero with extremely low utility, extremely low survivability, and extremely high damage potential. A player with high accuracy, high damage per minute, and few deaths per minute is very likely to be a higher impact player than someone with weaker statistics. Such a player is minimizing McCree’s weaknesses (i.e. avoiding death) while playing to his strengths (high damage output). It is very likely that such a player is contributing more to an average game than a player with worse statistics. Even for McCree, though, these statistics are imperfect. Is a given player’s damage relevant? How often is he/she spamming enemy heroes without any plausible follow up (i.e. feeding ultimate charge to enemy supports)? A player who hits a few precise shots to pick a key player at a key moment (e.g. a support at the beginning of the fight or a DPS who is preparing to ult) is inarguably much more impactful to securing wins than one who merely sits in the back making poor focus decisions, yet the latter player would be statistically superior by the previously stated standards.

We can apply this same analysis to quite a few heroes, revealing that statistical judgements of skill become weaker and weaker as we move from the most mechanically demanding heroes in the roster to those with very little ‘traditional FPS skill’ requirements. Even a hero such as Roadhog demands a deeper statistical evaluation to really get at skill. One must weigh damage per minute and survivability against damage taken, as a great Roadhog knows how to minimize his exposure and with it the rate at which he feeds the enemy team ultimate. There is no magic formula to successfully achieve such a balancing act. How can one statistically capture the impact of a Whole Hog that prevents a Dragonblade and a Primal Rage from destroying one’s backline (while doing very little damage and earning no kills)? In a game as complex and decision-rich as Overwatch, I don’t see a way that these judgements can be made accurately and reliably by a predetermined formula.

The ultimate example of how useless statistical measurements of skill are–and how bad percentile-based SR adjustment can be–is of course my favorite foil Mercy. The impact of virtually every aspect of Mercy’s kit is poorly captured by statistical measurements. Hitting a 5 player Resurrection that is responded to by a 6 player Earth Shatter or Graviton Surge is in fact game losing. The statistics show a high ‘resurrected players per ultimate cast’ while the reality in game is that the enemy team just farmed MULTIPLE new ultimates. The entire HP pool of your composition just went into the enemy team’s ultimate bank TWO TIMES OVER. I can’t really overstate how bad it is to make a poor decision about using Resurrection. In these cases, not only would it have been better to save one’s own ultimate, but also it would have been better to disconnect from the server and let your team play 5v6 because at least then you would have had a chance to swing Ultimate tempo. Even if there is no immediate Ult-response to a big Resurrection, if your team fails to win the fight the situation is the same: massive Ultimate tempo swing to the opposing team. Very often, the most impactful Resurrections are instant casts to revive one key player that just died (because the opposing team has often expended cooldowns and cannot kill them again). Thus, playing to maximize the statistical measurements of Resurrection (i.e. waiting for a big Res) is in fact seriously detrimental to the success of the team.

Resurrection is furthermore a relatively weak support ultimate because it requires your teammates’ deaths instead of preventing them as all of the others do (once again Symmetra is not a support). Thus a very smart Mercy player actually chooses not to heal in many scenarios so that her support partner can get his/her ultimate faster. Heals per minute is therefore a fickle statistic whose maximization does not reliably communicate skillful or intelligent play.

Low deaths per minute and high damage boosted are the only statistical measurements of Mercy play that I see as actually meaningful, as these statistics communicate intelligent play and impact maximization. Solo kills with the pistol are also probably quite meaningful, but of course a Mercy player who seeks these out at poor times would be called a thrower. It’s not that Mercy is a ‘no-skill hero’; the key problem is that skillful Mercy play is almost never communicated by impressive stats. Even the statistics I mention as meaningful fail to come close to telling the whole story of player skill and game impact.

(2) Failure to Align Incentives:

Not only are OTPs highly incentivized by the current SR system to continue one-tricking and to play for statistical maximization over wins and losses, but these incentives are also crucially opposed to the incentive structure that flexible players face. A flex player knows that he/she won’t be playing at the far right tail of his/her heroes’ skill distributions because his/her mastery of the game is spread across many heroes and many situations. The flex player seeks to achieve a high SR by playing the perfect hero imperfectly while the OTP seeks to achieve a high SR by playing the imperfect hero perfectly. While I don’t think that either of these strategies is deserving of punishment, I think that it’s important that the system not prioritize one over the other at any echelon of SR.

In the current system, the flexible player must maintain a higher win percentage (abstracting away from game difficulty) to reach the same SR as the OTP. This is deeply problematic in my eyes, as I see hero swapping as a fundamental part of the game. If an OTP doesn’t wish to engage with hero swapping as a part of gameplay, that’s fine, but their SR should reflect that choice. The same goes for players who don’t wish to engage with communication as a fundamental part of the game: you don’t have to talk, but if you lose games because of it then that is on you and ought to be reflected in your Skill Rating. A truly great player has the knowledge, intelligence, and decisiveness to pick the right hero for the right situation, filling in the gaps of his/her team composition while at the same time countering opposing composition decisions. Not every player has to aspire to be the greatest player of all time, but in my view the entire purpose of having a Skill Rating system to begin with is to measure and validate that very pursuit of greatness.
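To make the asymmetry concrete, here is a toy calculation. Blizzard’s actual formula is not public, so everything in it is an assumption made for illustration: the base swing of 25 SR, the idea of a single ‘performance modifier’ that inflates wins and softens losses, and the specific modifier values are all hypothetical.

```python
BASE_SWING = 25.0  # hypothetical SR gained or lost in an evenly matched game


def sr_swings(performance_modifier):
    """Hypothetical model: a positive modifier inflates wins and softens losses."""
    gain_on_win = BASE_SWING * (1.0 + performance_modifier)
    loss_on_loss = BASE_SWING * (1.0 - performance_modifier)
    return gain_on_win, loss_on_loss


def break_even_win_rate(gain_on_win, loss_on_loss):
    """Win rate at which the expected SR change per game is exactly zero.

    Solves p * gain - (1 - p) * loss = 0 for p.
    """
    return loss_on_loss / (gain_on_win + loss_on_loss)


otp = break_even_win_rate(*sr_swings(0.20))    # OTP at a high hero percentile
flex = break_even_win_rate(*sr_swings(-0.10))  # flex player slightly below hero averages
pure = break_even_win_rate(*sr_swings(0.0))    # win/loss only, no modifier

print(f"OTP holds SR at a {otp:.0%} win rate")
print(f"Flex player needs {flex:.0%}")
print(f"Pure win/loss system: {pure:.0%} for everyone")
```

Under these made-up numbers the OTP can hold SR while losing more games than he/she wins, and the flex player has to win well over half of his/her games to stand still. The exact values don’t matter. What matters is that any per-hero modifier pulls the two break-even points apart.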

Suggestion:

Incentive alignment is a goal well worth pursuing. When all players have the same goals, the potential for toxicity is greatly diminished (though certainly not eliminated). I personally find it quite frustrating to queue into Ranked Matchmaking with the goal of winning games, only to find other players do not share the same incentives. At the very top of the Skill Rating system, one should find other players that want to win games, not those that wish to engage in roleplay. This isn’t to say that OTPs can’t be good or impactful to winning games; my argument is rather that OTPs should be judged by their wins and losses rather than by the extent to which they engage in one-tricking. The current system punishes adaptation and experimentation vastly more than it needs to.

There is only one way to guarantee that every player has the same incentive: strip away all of the hidden formulas and percentile adjustments. Only when each player has only one incentive–to win–will incentive alignment truly come about. The only thing that should impact the SR consequences of a win or a loss is the relative skill of each team. Win a hard game and you should clearly be rewarded more than for winning an easy game, vice versa for losses.
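To make the idea concrete, here is a minimal sketch of what a pure win-loss adjustment could look like. It borrows the classic Elo formula as a stand-in; this is my assumption for illustration, not Blizzard’s actual algorithm. The function names, the K-factor of 32, and the 400-point scale are all hypothetical choices.

```python
def expected_win_probability(team_sr: float, opponent_sr: float,
                             scale: float = 400.0) -> float:
    """Elo-style estimate of the chance that the team rated team_sr wins.

    The 400-point scale is the conventional Elo value, assumed here
    purely for illustration.
    """
    return 1.0 / (1.0 + 10 ** ((opponent_sr - team_sr) / scale))


def sr_change(team_sr: float, opponent_sr: float, won: bool,
              k: float = 32.0) -> float:
    """SR delta that depends only on the result and the relative team ratings.

    No per-hero statistics or percentile adjustments enter the formula.
    The K-factor (maximum swing per game) is a hypothetical tuning knob.
    """
    expected = expected_win_probability(team_sr, opponent_sr)
    actual = 1.0 if won else 0.0
    return k * (actual - expected)


# Winning as the underdog pays more than winning as the favorite,
# and losing as the underdog costs less than losing as the favorite.
underdog_win = sr_change(4000, 4200, won=True)
favorite_win = sr_change(4200, 4000, won=True)
underdog_loss = sr_change(4000, 4200, won=False)
favorite_loss = sr_change(4200, 4000, won=False)
```

The only inputs are the two team ratings and the result, so every player on a team faces exactly the same incentive: win the game.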

The meaningfulness of Skill Rating is especially important as it is the only clearly available measurement of player skill outside of actual eSports experience. With the Overwatch League on the horizon, the time is now to restructure the system such that the very best rise to the top and have a fair shot at becoming professionals. Right now, the only way to scout talent is to do it on an individual, observational basis. Look at Dota 2 and you will see fresh talent rising out of Ranked Matchmaking and being given a shot at a professional career simply for reaching the very top of the ladder. That’s because their MMR system answers exactly one question: ‘how good are you at winning difficult games?’

If I worked at Blizzard, I’d be demanding a HARD Skill Rating reset at the end of this season and an entirely purified win-loss SR adjustment regime going forward. If Blizzard really wants the best of the best to get their chance at fame and fortune in eSports, then there really is only one way.

Counterarguments:

The existence of percentile SR adjustment is primarily, in my understanding, to combat smurfing (or the purchasing of new accounts to play at a lower level than one’s true skill). Want to get serious about smurfing, Blizzard? IP & MAC check new accounts and tag them for evaluation while adding a report option for suspected smurfs to cross reference: if you can statistically target and punish throwers then there is no reason you can’t statistically target and adjust smurf accounts. It’s fine if statistical adjustments are used in exceptional and targeted cases, just get rid of them as the default for the entire player base.

“But I wanna one trick!” Go right ahead. No one can (or should) stop you. But if you lose games because of it, don’t expect special treatment. OTPs don’t deserve punishment, but they certainly don’t deserve specific rewards over players who choose to engage with hero-swapping as a fundamental and crucially necessary mechanic in Overwatch. This is especially the case as Blizzard is beginning to employ SR as a way to qualify for tournaments (see: OW Open) and they seem to be considering it as a potential scouting mechanic for new talent once the scene is more established.

To Blizzard: fix it now, or condemn the eSports potential of Overwatch in the long run.

 

EDIT: An earlier version of this article referenced Contenders as an example of an SR-gated tournament. This is inaccurate, as Contenders was never SR restricted. Rather, it is the Overwatch Open for which Blizzard is requiring a certain SR.

The Fundamentals of Balance

Wed. June 14th:

Intro

This essay will focus on the balancing philosophy expressed in Overwatch. It will analyze the achievability of the Overwatch team’s apparent desire to achieve balance in all skill brackets simultaneously, and make a few suggestions to this end.

A Philosophical Problem

Before one evaluates the success and failure of specific Overwatch patches, one must establish a clear value–a metric by which to gauge the degree to which a change makes the game better or worse.

Jeff Kaplan, in an AMA three months ago, described the Blizzard approach as a ‘triangle’. “I feel like there are 3 key factors that guide us: The players, statistics and… us… our own feelings as players.” He continued on to add that “Internally, we have a ‘competitive’ playtest that’s helpful to get good feedback from Diamond+ players who work here […] None of this is perfect… but we try hard to listen to feedback and keep the game balanced.”

Ultimately, the system Jeff describes here (also confirmed by other Dev posts on the battle.net forums) is one that seeks to achieve relative balance throughout the skill spectrum. All three points of his triangle betray this reality: player feedback, developer intuition, and even statistics to some extent abstract away from player skill. Keeping a sharp eye on professional pickrates would be importantly revealing, but at the very least it isn’t clear that this is happening. The notion of balance-for-all seems nice enough prima facie, but further analysis reveals a considerable challenge to successfully implementing this broad balance goal.

This fundamental challenge is skill curve differential. Different heroes in Overwatch have remarkably different rates of return on skill growth investment; this is to say that they have significantly distinct skill curves. I use ‘skill curve’ here to mean the rate at which performance (i.e. game impact) increases with constant skill growth.

To illustrate the skill curve differential problem, consider two heroes: Genji and Junkrat. Now consider two players for each of these heroes: one in the 10th percentile of skill (worse than 90% of players) and one in the 90th percentile of skill (worse than just 10% of players). The 90th percentile Junkrat is certainly more impactful than the 10th percentile Junkrat player, but the gulf between the 90th and 10th percentile Genji players is vastly larger. The 10th percentile Genji player is a glorified rock-slinger. Unable to consistently leverage dash resets or find high value reflects, he or she has far less game impact than the 10th percentile Junkrat player. When we reach the right tail of the skill distribution, however, exactly the opposite situation holds. Against strong opponents, Junkrat lacks high-level outplay options and is ultimately left to punish misplays or exploit weak links in the opposing team. At the very highest levels of professional play, this is why he is essentially unplayable outside a very small niche. For our high level Genji player, it is a different story. The design of the character yields exponential gains to game impact resultant from skill growth: as accuracy, speed, and aggression increase so do mobility and longevity in a positive feedback cycle.

Every character has a skill curve of some slope, that is, there is no hero which can honestly be said to require ‘no skill’. However, one can see the skill curve differential problem even embedded in core hero statistics. Ana and Mercy are both powerful single target healers (comparing two different heroes will always be comparing apples to oranges, but hopefully this example is nonetheless illustrative).

Heal Rates: (source: owinfinity.com)

ANA: 75 healed per shot * 1.3 shots per second * (% accuracy) = Effective Heals per second

MERCY: 60 = Effective Heals per second

These Effective Heals per second values equalize when the Ana player’s accuracy reaches ~61.5%. That is to say that an Ana player with accuracy lower than that value will heal less per second than a Mercy and an Ana with higher accuracy will heal more per second. The point of equalization isn’t particularly important, but the fact that Ana is able to do her central job as a healer (healing) faster and more reliably the higher her accuracy gets reveals that her skill curve is steeper than that of Mercy. This cuts both ways; at the far left tail of the skill distribution (where % accuracy values are generally much lower) Mercy outputs more heals per second than Ana. Mercy has important decisions to make in order to maximize her survivability, but Ana’s self defense options are no less complex or skill dense (in fact they, like her healing rate, are significantly more responsive to skill increases).
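The break-even point can be checked with a few lines of Python. This is just a sketch of the arithmetic above, using the heal values quoted from owinfinity.com; the function and constant names are my own, and the model deliberately ignores reloads, range, and everything else that happens in a real fight.

```python
ANA_HEAL_PER_SHOT = 75.0      # healing per landed shot (value quoted above)
ANA_SHOTS_PER_SECOND = 1.3    # fire rate (value quoted above)
MERCY_HEAL_PER_SECOND = 60.0  # constant beam healing (value quoted above)


def ana_effective_hps(accuracy: float) -> float:
    """Ana's effective heals per second at a given accuracy (0.0 to 1.0)."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0.0 and 1.0")
    return ANA_HEAL_PER_SHOT * ANA_SHOTS_PER_SECOND * accuracy


def break_even_accuracy() -> float:
    """Accuracy at which Ana's effective healing equals Mercy's."""
    return MERCY_HEAL_PER_SECOND / (ANA_HEAL_PER_SHOT * ANA_SHOTS_PER_SECOND)


# Compare the two healers at a few sample accuracy values.
for accuracy in (0.40, 0.60, 0.80):
    ana = ana_effective_hps(accuracy)
    better = "Ana" if ana > MERCY_HEAL_PER_SECOND else "Mercy"
    print(f"{accuracy:.0%} accuracy: Ana {ana:.1f} HPS vs Mercy 60.0 HPS -> {better}")

print(f"Break-even accuracy: {break_even_accuracy():.1%}")
```

Mercy’s output is a flat line while Ana’s scales linearly with accuracy, which is the skill curve differential in its simplest form.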

Ana and Mercy, Genji and Junkrat: contrasting these pairs reveals the central difficulty of simultaneously satisfying players across the entire skill distribution. Professional players lament that Junkrat is meme-trash-tier in organized competitive play while he simultaneously reigns as the uncontested King of Brawl Winnin’ and The Silver Division. Ana’s winrate, meanwhile, steadily climbs with skill tier from a tragic 38.9% in Bronze to a respectable 51.9% in Grandmaster.

Nowhere are the consequences of the skill curve differentials more apparent than when comparing Ranked Matchmaking (of any level) to organized professional play (hereinafter ‘eSports’). Mercy, statistically speaking, performs well (above 50% winrate) all the way up to the top few percentiles of Ranked Matchmaking with a remarkably high pick rate. In professional play, she goes virtually untouched outside of the Pharah + Mercy combo.

This difference is a consequence of an added challenge of balancing popular online games that are also eSports. True coordination (in composition choice and game style) radically changes the way the game is played. Because Resurrect is fundamentally reactive, high level teams will often simply not allow an unsupported Mercy to garner value from her ultimate. She will be hunted by flanking DPS while the rest of the team intentionally staggers kills or saves ultimates to reduce the effectiveness of any Resurrect the Mercy is able to cast. As someone who has spent every season of Ranked Matchmaking at the very highest level of play, I can attest that these sorts of plays are rarely if ever made regardless of the skill level of players on either team. I contend that an important reason Ana is so weak in low-tier play is that she demands coordinated protection to fully leverage her abilities (coordination that is virtually nonexistent at low level play). Likewise, Mercy is incredibly punishing of undirected or uncoordinated play. Fail to hunt her down at the proper time or forget to save a key ultimate to counterplay Resurrect and a teamfight is quickly lost.

So what can we do? How can Overwatch feel fresh and full of optionality in an eSports context while also remaining balanced and enjoyable to play for those further to the left on the skill distribution?

Moving Forward

Skill curve differential isn’t going anywhere, and in my opinion it shouldn’t. Blizzard intelligently marketed Overwatch much more widely than the traditional first person shooter target audience. This wasn’t just a marketing strategy though; the game design purposely features heroes, for instance Mercy, that aren’t so demanding of traditional arena shooter skills and rather allow positioning and decision making to determine game impact. In the long run, I think that this is a good thing. Purity is the enemy of innovation while community stagnancy is in direct opposition to promotion to a wider audience (something absolutely critical to achieving a public perception of legitimacy for eSports and even gaming as a hobby).

The only important question that remains is how to rise to the challenge of balancing for diverse skill tiers simultaneously. The approach that I’d like to see taken more often is the differentiation of mechanical changes and statistical changes.

Sometimes a number gets into the game that is simply broken. Bastion’s 35% value for his Ironclad passive springs to mind as a classic example of “utterly fucking busted”. Sometimes a character just doesn’t have the stats to compare favorably against his/her/its closest substitutes; pre-buff Soldier 76 is a good example. I don’t have date-accurate statistics for the strength of these heroes across skill tiers, but I contend that pre-buff Soldier 76 was probably too weak at every point on the skill distribution and pre-nerf Bastion was probably too strong at every point on the skill distribution. For these kinds of across-the-board balance issues, statistical adjustments are warranted as they will have similar impacts on players of all skill levels.

These are the easy variety of balance problems. For the more complex varieties, a mechanical change in isolation or a combination of mechanical & statistical changes is necessary.

A strong example of a very good combination buff is the recent (live) patch to Hanzo. Hanzo felt a little too weak across the board, but at a high level aggressive compositions came to render him nearly obsolete. The 10% charge time buff to Hanzo is significant, but I would argue that even more impactful for high level players is the ability to hold a charged arrow while wall-climbing and to spawn your Dragonstrike early if the arrow collides with a wall. These changes make the space of options for Hanzo players significantly wider and enable much more aggressive and independent play. However, this kind of freedom doesn’t aid those who aren’t ready to use it. The change in totality made Hanzo players of all skill levels slightly stronger but had a significantly greater impact on expert players who can most creatively leverage the new mechanics. Widening the space of options doesn’t make a big difference to players who weren’t already pushing the boundaries of how a hero can be effectively played.

We can use this same mechanical vs statistical differentiation to better examine the past Genji nerf that removed his ability to triple jump in one continuous airtime via wall climbing. For low tier players who weren’t even aware of this possibility, the change had virtually zero impact. For high tier players who were exploiting it as often as possible to maximize mobility and survivability, the change had real consequences to Genji’s overall strength and playability. My honest assessment of the Overwatch development team is that they never thought about these differential effects and instead saw the triple jump as just an unintended bug to be patched out. The ledge-dash-super-jump mechanic was probably thought of similarly, and patching it out only really affected the few hundred (I doubt it was really this many) players who could hit it reliably enough to implement it as part of their play style. The important lesson here is that these pure-mechanical patches had radically different impacts on players of different skill levels.

These two examples provide a powerful blueprint for the formulation of balance adjustments that demand different impacts upon different skill tiers:

If a hero is in a good place for low-skill players but too weak for high-skill players: widen the option space by loosening mechanical restrictions and let creativity and talent shine through as increased game impact by high-skill players.

If a hero is popular and strong in the hands of high-skill players but a bit weak when used by newer players, combine a statistical buff with a restriction of option space. Make the hero more narrowly defined and yet more powerful within that narrow role. This variety of change must be done most carefully, though, as elite players will always seek to exploit any statistical buffs to their maximum potential even if it requires playing the hero in a radically different way (see the most recent attempt at nerfing Lucio).

That’s the theory, but here are my resultant suggestions for real balance changes. Feel free to leave feedback on the article as a whole or just these ideas! My twitter is @jake_overwatch 🙂

Suggestions:

Bastion is a worse choice than Soldier 76 virtually 100% of the time in high level play, but has a comfortable niche at median and below skill. Remove the self-stun upon Tank Transformation and when returning to Recon mode to allow for more aggressive initiations and the option to use Tank Form as effective counterplay in a fight. Also remove or adjust the Self-Repair animations that block the crosshair (that shit’s just annoying, yo). Average players will play just like they always have, but those smart enough to leverage these adjustments into a much more aggressive style will reap the rewards.

Widowmaker has felt incredibly map-dependent across the skill spectrum even after her charge-time buff. Decrease Hookshot cooldown (I suggest by 2-3 seconds) to increase mobility and escape options versus the dive composition that has come to define the meta. It is very dangerous to buff this hero with pure DPS, but giving her a slightly less narrow role might help her pick and win rates with skilled players.

Junkrat is an effective spammer that applies a ton of pressure to slow team compositions. His ultimate is reasonably effective against newer players but rarely finds sufficient utility in high level play to justify what is very often a suicide play. Give the Rip Tire a new ability (activated with whatever key is bound to Ability 1) that allows it to hop into a drift (yes I do mean cart-racer style) with a short cooldown. This will give stronger players options to bait out counterplays and reasonably juke players with moderate aiming skill while being difficult to abuse by those lower-tier players that don’t have a precise understanding of which counterplays they need to bait and which enemies they need to juke.

(maybe I want to roleplay Junkenstein)

Quality over Quantity: Revisited

Sun. June 11th

 

I read the comment thread on /r/competitiveoverwatch and thought I should make this update to discuss two really crucial lines of argument that I noticed throughout the reddit thread and in people’s responses to my twitter.

Criticism 1: Jake, your system is idealistic. Players who refuse to play healers or tanks in game will still check those boxes just to get faster queue times. This will leave their teammates in the lurch and ultimately ruin the system.

Criticism 2: Jake, your system would encourage one-tricking and diminish the rewards accrued to versatile players.

In response to C1, I argue that the motivation not to abuse the system is inherent to its design. Let’s say I’m a Bastion-only one-trick. I won’t play anything besides Bastion for any reason. I could check the healer & tank checkboxes–clearly justified in this instance because Bastion is both :)–but I won’t because I want the system to work. No matter what role you want to play, you have a better chance of a competitive and fun ranked match if your team has a more balanced composition.

For those who worry that true griefers/trolls (those trying to lose from the outset of the match) might abuse the system to maximize trolling potential, I would argue that the added impact is relatively small. If I were to pick Roadhog and self-heal in front of the enemy team every time I spawn, my team isn’t going to win. It really doesn’t matter what our composition is or who is willing to flex to what. I’ve actually done this exact thing to ensure that a hacker on my team loses. Even an aimbot isn’t enough to 5v6 a team that gets fed 900 hp of ult-charge per Roadhog spawn. Overwatch is a team game; one person aggressively trying to lose is more than enough to achieve that goal in the current system. Personally, I have not seen many players truly griefing in this extreme sense. In my experience, most ‘griefing’ comes from people being tilted about team composition or teammate performance.

C2 is a bit more tricky, though I would argue my system is nonetheless well designed to encourage versatility. Anyone who has played Overwatch for a significant amount of time can recognize that one healer and one tank is not the strongest team foundation in nearly any scenario. Even if your team already has one of each guaranteed by the matchmaker, there is still tremendous room to increase the strength of your composition by adding a second healer or more tanks. Versatile players can still accrue value from their diverse abilities under the role-queue system I defend.

Regarding the encouragement of one-tricking, I think of the system as a response to the prevalence of one-tricking rather than a cause of it. In the status quo, I already see a very high incidence of one-tricking a hero or, even more commonly, a role. It is the rare player who plays DPS, Tanks, and Healers all at an equivalent level. The vast majority of people, in my experience, have a significant majority of their playtime spent in one role (if not one hero). If the current system is punishing one-tricking relative to the system I propose, then it’s really doing a terrible job.

There is a deeper philosophical question here, though. Is one-tricking an acceptable way to play Ranked Matchmaking in Overwatch? Should it be discouraged? I would argue that, regardless of the answers to these questions, it cannot be stopped without a tremendous cost to the creativity that makes hero-driven shooters so fun. One-tricking happens in every game with character or weapon selection: it’s human nature to have preferences and some people really love to maximize their skill in a very narrow category rather than experience all the possibilities the game has to offer. If you can only play one hero, you don’t have much hope of going pro (except Lucio, but maybe Blizz will figure out how not to buff that hero someday). In my view, that’s OK. Not everyone aspires to play professionally; people come to the game for really different reasons, even at the far right tail of the skill distribution.

The best way to design the system, in my view, requires accepting that there are many different types of players with many different motivations. Fighting to change people is a losing battle, so why not build a system that offers fun matches whether you want to one-trick or flex every role as your team needs it?

 

 

P.S. Shoutout to /r/competitiveoverwatch for the great feedback and response! I’ll be back next week with another article, although I’m not sure exactly which topic to pick just yet. Tweet me some suggestions! (@jake_overwatch)

P.P.S. Some people suggested a DPS check box in addition to my suggestion. My main resistance to this suggestion is that Blizzard has done a really poor job with the hero classifications in the DPS role. Hanzo is, at least at a high level, unpickable on defense but sometimes viable when attacking. Many of the defense characters are like this: due to their one-dimensionality they are easily counterpicked and so are poor choices when actually defending. On offense, though, they can exploit weaknesses in defending teams locked into their composition. Clearly every team does not need one Offense character and one Defense character in the same way that every team does need one Healer and one Tank. In my view, the problems with hero classification in the DPS role need to be solved before a system could be implemented that would specifically indict the failures of this existing hero classification system.

 

Quality over Quantity

Sat. June 10th

Intro

This essay will defend a limited role queueing system for improving the overall experience of (exclusively) Ranked Matchmaking in Overwatch. I recognize the importance of minimizing queue times, but hold that the concessions to such a system–properly designed and well implemented–are quite small relative to the potential gains in match quality and the ability of the Skill Rating system to accurately track skill.

The Fundamentals

I don’t defend any system that enforces the specificities of the metagame. By metagame here, I mean the current team compositions and play styles found to be the strongest in a given patch by the esport side of Overwatch. However, I do think that there are some propositions so fundamental to the design of the game itself that they remain constant across all metagames, patches, and skill levels.

P1: A team composition that features at least one healer hero will be significantly more effective than any composition with zero healing heroes.

P2: A team composition that features at least one tank hero will be significantly more effective than any composition with zero tank heroes.

P1 is, in my judgement, very plausibly true at every skill level. If one were to measure the power of a team at any given moment in game, total health pool as a percentage of maximum would be the most impactful variable in the formula. Support characters (except Symmetra, who has been misclassified since her rework) hold the vast majority of the responsibility for the regeneration of this absolutely crucial resource.

P2 is slightly more controversial. It is slightly harder to see the truth of P2 because tanks primarily contribute to the relative aggregated resource pool of their team by stymieing opposing attempts to diminish it rather than by increasing it directly or reducing that of opponents. Nonetheless, show me a team composition with no tanks and I’ll show you a composition that has a directly superior counterpart.

The Suggestion

I hear and respect the concerns voiced by the Overwatch developers themselves and by the community at large. I would hate to see any system implemented that would hinder the ability of players to be creative. That’s why my implementation is simple, unobtrusive, and sharply limited in scope.

The User Interface requirement of the role queue system I imagine would be two check boxes. These boxes would be marked ‘Healer’ and ‘Tank’ respectively. Players could check neither, one, or both of these options depending on their predilection for different roles. The matchmaker would then ensure that any potential match includes at least one player who has checked ‘Healer’ and at least one (distinct) player who has checked ‘Tank’ on each team.

That’s it. No hero restrictions, no indication to teammates of who has checked which box(es), no metagame enforcement beyond the one tank one healer minimums.
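The check the matchmaker would need is small enough to sketch in a few lines. The code below is a hypothetical illustration of the rule described above, not anything Blizzard has published; `QueuedPlayer`, `team_meets_minimums`, and `match_is_acceptable` are names I made up for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class QueuedPlayer:
    """A player in queue and the (hypothetical) boxes they checked."""
    name: str
    healer: bool = False
    tank: bool = False


def team_meets_minimums(team: Sequence[QueuedPlayer]) -> bool:
    """True if one player checked 'Healer' and a distinct player checked 'Tank'.

    A distinct pair exists exactly when both groups are non-empty and
    together they cover at least two different players.
    """
    healers = {i for i, player in enumerate(team) if player.healer}
    tanks = {i for i, player in enumerate(team) if player.tank}
    return bool(healers) and bool(tanks) and len(healers | tanks) >= 2


def match_is_acceptable(red: Sequence[QueuedPlayer],
                        blue: Sequence[QueuedPlayer]) -> bool:
    """A candidate match passes only if both teams meet the minimums."""
    return team_meets_minimums(red) and team_meets_minimums(blue)


# One flex player who checked both boxes cannot fill both slots alone.
flex = QueuedPlayer("flex", healer=True, tank=True)
tank_main = QueuedPlayer("tank_main", tank=True)
dps = QueuedPlayer("dps")

lonely_flex_team = [flex] + [dps] * 5
balanced_team = [flex, tank_main] + [dps] * 4
```

Note that nothing here looks at hero picks. The check happens once, when the match is assembled, and players remain free to swap to anything after the gates open.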

The Argument

Overwatch is radically deep and continually surprising. To this day I continue to see creative players play and win the game in ways previously unimagined. Sometimes even your healers should respawn as Mei or Tracer for a game-changing point contest. The game should never limit the options of players to make these kinds of creative decisions on the fly. Furthermore, any proposed systemic change that affects the matchmaker must weigh the impact on queue times against the theoretical match quality improvement.

I would contend that such a system would have a very small negative impact on queue times. In the majority of my games, propositions one and two are never questioned because both teams virtually always have team compositions that satisfy them. Thanks to Overwatch’s quality game design, Tanks and Healers are fun and rewarding to play compared to other games’ implementations of these roles. This diversity in player preferences means that most matches the matchmaker considers will likely already satisfy the one healer + one tank minimums, thereby having no impact on the queue times of these matches.

In some games, though, these compositions are a result of mutual understanding rather than natural player preference. In others, natural player preference trumps desire to win and extremely poor team compositions are fielded. It is in these games that I would argue the matchmaker has erred.

Consider a hypothetical matchup: Red vs Blue. The variables looked good enough: the players had been waiting for some time and of course the matchmaker hates to delay. Excitedly, it assembled two teams of 6 that, oh joy, had equivalent average MMR! In the eyes of the matchmaker, this is a perfect 50/50 game. The best possible way to measure the relative ability of the players of Red Team versus those of Blue Team. This time, though, something is wrong. Blue Team has tragically found itself with six Mercy-Only roleplayers! While Red Team readies its aggressive dive composition (featuring Winston and Lucio as its core enablers) and prepares a strategy to assault the first point, Blue Team is mired in an extended discussion of who has not yet polluted his/her career profile with non-Mercy play time.

I believe that the players on Blue Team, roughly 60 seconds after the gates open, would prefer to have waited a bit longer in the queue so that they could each find a team that would permit them to victoriously fulfill their healer fantasies. Perhaps Red Team enjoys such a matchup, but even they fall victim to an artificial inflation of their skill rating. Were Blue Team distributed across a few different matches, each could have used their talent for supporting to defend their teammates against the assault of the Red Team and perhaps emerge victorious. It is not the ability of Blue Team that has led to their defeat; it is simply their misfortune to have been placed together. In this sense, all 12 players’ Skill Ratings have been distorted from their ‘true value’ as a consequence of the randomness inherent to the current system, and the match was bad from a system-wide perspective. I would contend that the playerbase is intelligent enough to desire such small guardrails.

Some might worry that this argument could theoretically be extended to defend even a full metagame enforcement system. However, the concerns of creativity in exigent circumstances and the incredible diversity of situations in Overwatch would make such a system disastrous. The extremely limited implementation of the one healer one tank minimums that I defend is intended to slightly increase the lower bound on match quality without significantly negatively impacting queue times or player freedom. I would argue that those extra seconds spent in queue when a ‘quality match’ under the current system is rejected for the failure to meet the one tank one healer minimums are worth it from a player perspective. No one wants to play without a tank to peel for them, no one wants to play without a healer to restore their HP and enable their plays.

Conclusions

My suggestion rewards players who are most willing to adapt to the needs of their team with shorter queue times, while ensuring that those who are less willing nonetheless find themselves in ‘winnable’ games. Skill Rating distortion would be reduced, yielding further gains to the quality of the matchmaker’s judgement and thereby the quality of the player experience. The cost is small, the gains are high. In my judgement, this system or something similar would positively impact the player experience in Ranked Matchmaking in Overwatch.

Leave a comment with your thoughts (or feel free to mock my spelling and/or grammar)!