Wed. June 14th:
This essay focuses on the balancing philosophy behind Overwatch. It examines whether the Overwatch team’s apparent goal of balancing the game across all skill brackets simultaneously is achievable, and makes a few suggestions to that end.
A Philosophical Problem
Before one evaluates the success or failure of specific Overwatch patches, one must establish a clear value: a metric by which to gauge the degree to which a change makes the game better or worse.
Jeff Kaplan, in an AMA three months ago, described the Blizzard approach as a ‘triangle’. “I feel like there are 3 key factors that guide us: The players, statistics and… us… our own feelings as players.” He continued on to add that “Internally, we have a ‘competitive’ playtest that’s helpful to get good feedback from Diamond+ players who work here […] None of this is perfect… but we try hard to listen to feedback and keep the game balanced.”
Ultimately, the system Jeff describes here (also confirmed by other developer posts on the battle.net forums) is one that seeks to achieve relative balance throughout the skill spectrum. All three points of his triangle reflect this: player feedback, developer intuition, and even statistics to some extent abstract away from player skill. Keeping a sharp eye on professional pick rates would be revealing, but it isn’t clear that this is happening. The notion of balance-for-all seems nice enough prima facie, but further analysis reveals a considerable challenge to implementing this broad balance goal successfully.
This fundamental challenge is skill curve differential. Different heroes in Overwatch have remarkably different rates of return on skill growth investment; this is to say that they have significantly distinct skill curves. I use ‘skill curve’ here to mean the rate at which performance (i.e. game impact) increases with constant skill growth.
To illustrate the skill curve differential problem, consider two heroes: Genji and Junkrat. Now consider two players of each hero: one in the 10th percentile of skill (worse than 90% of players) and one in the 90th percentile of skill (worse than just 10% of players). The 90th percentile Junkrat is certainly more impactful than the 10th percentile Junkrat, but the gulf between the 90th and 10th percentile Genji players is vastly larger. The 10th percentile Genji player is a glorified rock-slinger. Unable to consistently leverage dash resets or find high value reflects, he or she has far less game impact than the 10th percentile Junkrat player. When we reach the right tail of the skill distribution, however, exactly the opposite holds. Against strong opponents, Junkrat lacks high-level outplay options and is ultimately left to punish misplays or exploit weak links in the opposing team. This is why, at the very highest levels of professional play, he is essentially unplayable outside a very small niche. For our high level Genji player, it is a different story. The design of the character yields compounding gains in game impact as skill grows: as accuracy, speed, and aggression increase, so do mobility and longevity, in a positive feedback cycle.
Every character has a skill curve of some slope; that is, there is no hero that can honestly be said to require ‘no skill’. However, one can see the skill curve differential problem embedded even in core hero statistics. Ana and Mercy are both powerful single target healers (comparing two different heroes will always be comparing apples to oranges, but hopefully this example is nonetheless illustrative).
Heal Rates: (source: owinfinity.com)
ANA: 75 healed per shot * 1.3 shots per second * (% accuracy) = Effective Heals per second
MERCY: 60 = Effective Heals per second
These Effective Heals per second values equalize when the Ana player’s accuracy reaches ~61.5% (60 / (75 * 1.3) = 60 / 97.5). That is to say, an Ana player with accuracy lower than that value will heal less per second than a Mercy, and an Ana with higher accuracy will heal more per second. The point of equalization isn’t particularly important, but the fact that Ana is able to do her central job as a healer (healing) faster and more reliably the higher her accuracy gets reveals that her skill curve is steeper than that of Mercy. This cuts both ways; at the far left tail of the skill distribution (where % accuracy values are generally much lower) Mercy outputs more heals per second than Ana. Mercy has important decisions to make in order to maximize her survivability, but Ana’s self defense options are no less complex or skill-dense (in fact they, like her healing rate, are significantly more responsive to skill increases).
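For anyone who wants to check the arithmetic, here is a minimal Python sketch of the comparison. It uses only the three figures quoted above; the function and constant names are my own, and it assumes (as the formula above does) that a missed shot heals nothing and that Mercy's rate is constant.

```python
# Figures quoted in the Heal Rates comparison above.
ANA_HEAL_PER_SHOT = 75.0
ANA_SHOTS_PER_SECOND = 1.3
MERCY_HEAL_PER_SECOND = 60.0


def ana_effective_hps(accuracy: float) -> float:
    """Ana's effective heals per second at a given accuracy (0.0 to 1.0)."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return ANA_HEAL_PER_SHOT * ANA_SHOTS_PER_SECOND * accuracy


def breakeven_accuracy() -> float:
    """Accuracy at which Ana's effective healing matches Mercy's constant rate."""
    return MERCY_HEAL_PER_SECOND / (ANA_HEAL_PER_SHOT * ANA_SHOTS_PER_SECOND)


if __name__ == "__main__":
    print(f"Breakeven accuracy: {breakeven_accuracy():.1%}")
    # Compare the two healers at a few sample accuracy values.
    for accuracy in (0.40, 0.60, 0.80):
        ana = ana_effective_hps(accuracy)
        print(f"{accuracy:.0%} accuracy: Ana {ana:.1f} vs Mercy {MERCY_HEAL_PER_SECOND:.1f}")
```

Ana's output is a straight line in accuracy while Mercy's is flat, which is the whole point: one healer's core job scales with mechanical skill and the other's does not.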
Ana and Mercy, Genji and Junkrat: contrasting these pairs reveals the central difficulty of simultaneously satisfying players across the entire skill distribution. Professional players lament that Junkrat is meme-trash-tier in organized competitive play while he simultaneously reigns as the uncontested King of Brawl Winnin’ and The Silver Division. Ana’s winrate, meanwhile, steadily climbs with skill tier from a tragic 38.9% in Bronze to a respectable 51.9% in Grandmaster.
Nowhere are the consequences of the skill curve differentials more apparent than when comparing Ranked Matchmaking (of any level) to organized professional play (hereinafter ‘eSports’). Mercy, statistically speaking, performs well (above 50% winrate) all the way up to the top few percentiles of Ranked Matchmaking with a remarkably high pick rate. In professional play, she goes virtually untouched outside of the Pharah + Mercy combo.
This difference is a consequence of an added challenge of balancing popular online games that are also eSports. True coordination (in composition choice and game style) radically changes the way the game is played. Because Resurrect is fundamentally reactive, high level teams will often simply not allow an unsupported Mercy to get value from her ultimate. She will be hunted by flanking DPS while the rest of the team intentionally staggers kills or saves ultimates to reduce the effectiveness of any Resurrect the Mercy is able to cast. As someone who has spent every season of Ranked Matchmaking at the very highest level of play, I can attest that these sorts of plays are rarely if ever made there, regardless of the skill level of players on either team. I contend that an important reason Ana is so weak in low-tier play is that she demands coordinated protection to fully leverage her abilities (coordination that is virtually nonexistent in low level play). Likewise, Mercy is incredibly punishing of undirected or uncoordinated play. Fail to hunt her down at the proper time or forget to save a key ultimate to counter Resurrect, and a teamfight is quickly lost.
So what can we do? How can Overwatch feel fresh and full of optionality in an eSports context while also remaining balanced and enjoyable to play for those further to the left on the skill distribution?
Skill curve differential isn’t going anywhere, and in my opinion it shouldn’t. Blizzard intelligently marketed Overwatch much more widely than the traditional first person shooter target audience. This wasn’t just a marketing strategy though; the game design purposely features heroes, for instance Mercy, that aren’t so demanding of traditional arena shooter skills and rather allow positioning and decision making to determine game impact. In the long run, I think that this is a good thing. Purity is the enemy of innovation while community stagnancy is in direct opposition to promotion to a wider audience (something absolutely critical to achieving a public perception of legitimacy for eSports and even gaming as a hobby).
The only important question that remains is how to rise to the challenge of balancing for diverse skill tiers simultaneously. The approach that I’d like to see taken more often is the differentiation of mechanical changes and statistical changes.
Sometimes a number gets into the game that is simply broken. Bastion’s 35% value for his Ironclad passive springs to mind as a classic example of “utterly fucking busted”. Sometimes a character just doesn’t have the stats to compare favorably against his/her/its closest substitutes; pre-buff Soldier 76 is a good example. I don’t have date-accurate statistics for the strength of these heroes across skill tiers, but I contend that pre-buff Soldier 76 was probably too weak at every point on the skill distribution and pre-nerf Bastion was probably too strong at every point on the skill distribution. For these kinds of across-the-board balance issues, statistical adjustments are warranted, as they will have similar impacts on players of all skill levels.
These are the easy variety of balance problems. For the more complex varieties, a mechanical change in isolation or a combination of mechanical & statistical changes is necessary.
A strong example of a very good combination buff is the recent (live) patch to Hanzo. Hanzo felt a little too weak across the board, but at a high level aggressive compositions came to render him nearly obsolete. The 10% charge time buff to Hanzo is significant, but I would argue that even more impactful for high level players is the ability to hold a charged arrow while wall-climbing and to spawn your Dragonstrike early if the arrow collides with a wall. These changes make the space of options for Hanzo players significantly wider and enable much more aggressive and independent play. However, this kind of freedom doesn’t aid those who aren’t ready to use it. The change in totality made Hanzo players of all skill levels slightly stronger but had a significantly greater impact on expert players who can most creatively leverage the new mechanics. Widening the space of options doesn’t make a big difference to players who weren’t already pushing the boundaries of how a hero can be effectively played.
We can use this same mechanical vs statistical differentiation to better examine the past Genji nerf that removed his ability to triple jump in one continuous airtime via wall climbing. For low tier players who weren’t even aware of this possibility, the change had virtually zero impact. For high tier players who were exploiting it as often as possible to maximize mobility and survivability, the change had real consequences to Genji’s overall strength and playability. My honest assessment of the Overwatch development team is that they never thought about these differential effects and instead saw the triple jump as just an unintended bug to be patched out. The ledge-dash-super-jump mechanic was probably thought of similarly, and patching it out only really affected the few hundred (I doubt it was really this many) players who could hit it reliably enough to implement it as part of their play style. The important lesson here is that these pure-mechanical patches had radically different impacts on players of different skill levels.
These two examples provide a powerful blueprint for the formulation of balance adjustments that demand different impacts upon different skill tiers:
If a hero is in a good place for low-skill players but too weak for high-skill players: widen the option space by loosening mechanical restrictions and let creativity and talent shine through as increased game impact by high-skill players.
If a hero is popular and strong in the hands of high-skill players but a bit weak when used by newer players, combine a statistical buff with a restriction of option space. Make the hero more narrowly defined and yet more powerful within that narrow role. This variety of change must be done most carefully, though, as elite players will always seek to exploit any statistical buffs to their maximum potential even if it requires playing the hero in a radically different way (see the most recent attempt at nerfing Lucio).
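Taken together with the across-the-board cases discussed earlier, the blueprint can be written down as a small decision rule. The Python below is only a sketch of my own reasoning, not anything Blizzard uses: the three-level `Strength` rating and the `suggest_adjustment` function are hypothetical names, and reducing a hero's standing to "weak / ok / strong" at just two skill levels is a deliberate oversimplification.

```python
from enum import Enum


class Strength(Enum):
    """Coarse, hypothetical rating of a hero within one skill bracket."""
    WEAK = "weak"
    OK = "ok"
    STRONG = "strong"


def suggest_adjustment(low_skill: Strength, high_skill: Strength) -> str:
    """Map a hero's standing at low and high skill to the kind of change argued for in this essay."""
    # Across-the-board problems: a plain numbers change hits every tier similarly.
    if low_skill is Strength.WEAK and high_skill is Strength.WEAK:
        return "statistical buff"
    if low_skill is Strength.STRONG and high_skill is Strength.STRONG:
        return "statistical nerf"
    # Fine for newer players, too weak for experts: loosen mechanical restrictions.
    if low_skill is Strength.OK and high_skill is Strength.WEAK:
        return "mechanical change: widen the option space"
    # Strong for experts, weak for newer players: narrower role, stronger numbers.
    if low_skill is Strength.WEAK and high_skill is Strength.STRONG:
        return "statistical buff plus a narrower option space"
    return "no rule from this essay applies"


print(suggest_adjustment(Strength.OK, Strength.WEAK))
```

The fallthrough case is intentional: the essay only argues for these four situations, and anything else needs its own analysis.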
That’s the theory, but here are my resultant suggestions for real balance changes. Feel free to leave feedback on the article as a whole or just these ideas! My Twitter is @jake_overwatch 🙂
Bastion is a worse choice than Soldier 76 virtually 100% of the time in high level play, but has a comfortable niche at median and below skill. Remove the self-stun upon Tank Transformation and when returning to Recon mode to allow for more aggressive initiations and the option to use Tank Form as effective counterplay in a fight. Also remove or adjust the Self-Repair animations that block the crosshair (that shit’s just annoying, yo). Average players will play just like they always have, but those smart enough to leverage these adjustments into a much more aggressive style will reap the rewards.
Widowmaker has felt incredibly map-dependent across the skill spectrum even after her charge-time buff. Decrease the Grappling Hook cooldown (I suggest by 2-3 seconds) to increase mobility and escape options versus the dive composition that has come to define the meta. It is very dangerous to buff this hero with pure DPS, but giving her a slightly less narrow role might help her pick and win rates with skilled players.
Junkrat is an effective spammer that applies a ton of pressure to slow team compositions. His ultimate is reasonably effective against newer players but rarely finds sufficient utility in high level play to justify what is very often a suicide play. Give the Rip Tire a new ability (activated with whatever key is bound to Ability 1) that allows it to hop into a drift (yes I do mean cart-racer style) with a short cooldown. This will give stronger players options to bait out counterplays and reasonably juke players with moderate aiming skill while being difficult to abuse by those lower-tier players that don’t have a precise understanding of which counterplays they need to bait and which enemies they need to juke.
(maybe I want to roleplay Junkenstein)