A look at advanced analytics in CS:GO

Author’s note: This was written to be viewed in RES so that you can see images within the post. Otherwise, reader experience may vary.

TLDR: I made some new cool statistics for CSGO. Check out all the pretty pictures.

I’m yohg, and I want to help change the way we think about statistics and their place in CSGO. Video games offer an amazing opportunity for statistical analysis that can’t be done in other sports. The “demo” in CSGO, for those unfamiliar, is a replay file that contains every action performed by every player throughout the game. The demo is basically a perfect replay (it might be fairer to call a video recording of a football game a “bad” demo file).

This wealth of well-formatted information means that it should be relatively easy to write a parser for the demofile which extracts all of the interesting information we might want to analyze. In fact, Valve wrote some rudimentary software which does just that. Unfortunately, this software is pretty immature, in the sense that it naively dumps all game events in a pseudo-JSON format for some messages and a home-grown format for others. Basically, it’s a good starting place for understanding how to parse a demo, but it needs a bunch more work.

I used Valve’s starting place to write a demo parser that fit my needs - i.e., one that dumps out all relevant grenade, inventory, movement, round, position, gunfire, and damage events. Then I wrote a Python-based analysis utility which does some additional data processing and outputs various kinds of analytics. The hope is to continue building the Python code and demo parser into an analytics platform for understanding CSGO strategy in a way that hasn’t been seen before.

I’ve built a wide-ranging tool set already, but I want to focus on three different questions about Counter-Strike and examine how we can answer them with help from demos.

Hypothesis: Some flashes are better than others.

This is an uncontroversial statement on the whole. To take an extreme case, throwing away a flash in spawn is almost definitely a waste of utility relative to throwing the flash at almost any other place on the map. However, it’s currently quite difficult to quantify how effective a particular flash lineup is.

Why is this difficult? Aside from the problems obtaining the dataset, the main issue is that each thrown grenade is slightly different in terms of x, y, z, and thrown pitch & yaw. Even if someone religiously practices a lineup, the throw will be slightly different each time, since the game is far more precise than a player could hope to be. So we have to find a way to determine if two flashes are ultimately the “same flash”.

Statistical Aside: We can figure out “similar” flashes by borrowing some techniques from machine learning called “clustering” algorithms. Here’s a slideshow from Stanford introducing some of these techniques. Clustering works very well when you have some seemingly obvious structure in your data but don’t want to go through the process of hand-defining it. I had to do some additional cleaning to remove superfluous flashbangs; I used standard descriptive statistics to guess which flashbangs might be outliers.
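My actual clustering pipeline isn’t shown here, but as a minimal sketch of the idea, here’s a greedy leader clustering over throw feature vectors (origin plus weighted angles). The radius and angle weight are made-up values, and a proper algorithm (DBSCAN, hierarchical clustering, etc.) would handle awkward cases better:

```python
import math

def throw_features(throw):
    # Feature vector: origin position plus throw angles.
    # Angles are weighted so a few degrees of aim matters about as
    # much as a few dozen units of position (weight is arbitrary).
    x, y, z, pitch, yaw = throw
    ANGLE_WEIGHT = 10.0
    return (x, y, z, pitch * ANGLE_WEIGHT, yaw * ANGLE_WEIGHT)

def cluster_throws(throws, radius=150.0):
    """Greedy leader clustering: each throw joins the first cluster
    whose leader is within `radius` in feature space, otherwise it
    starts a new cluster. Returns one cluster index per throw."""
    leaders = []
    labels = []
    for t in throws:
        f = throw_features(t)
        for i, leader in enumerate(leaders):
            if math.dist(f, leader) <= radius:
                labels.append(i)
                break
        else:
            leaders.append(f)
            labels.append(len(leaders) - 1)
    return labels

# Two nearly identical lineups and one flash thrown from elsewhere.
throws = [
    (100.0, 200.0, 64.0, -25.0, 90.0),
    (103.0, 198.0, 64.0, -24.5, 90.5),
    (900.0, -50.0, 64.0, -10.0, 180.0),
]
print(cluster_throws(throws))  # -> [0, 0, 1]
```

The practiced lineup and its slightly-off repeat land in the same cluster; the unrelated throw gets its own.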

Knowing this information isn’t just fun, it’s also incredibly useful. Since a team can throw a max of 10 flashes a round, you want to pick the flashes that are going to be most likely to blind your opponents. After we figure out which flashes are “the same” (belong to the same cluster), we can figure out the average aggregate opponent blind time. So we add up how many total seconds we blinded our opponents for, and then average across each flashbang in the cluster. This roughly tells us how effective a particular flashbang is.
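Once every flash carries a cluster label, the scoring step described above is simple aggregation. A sketch, with made-up numbers:

```python
from collections import defaultdict

def rank_clusters(flashes):
    """flashes: list of (cluster_id, total_enemy_blind_seconds)
    pairs, one per thrown flashbang. Returns clusters sorted by
    average aggregate opponent blind time, best first."""
    per_cluster = defaultdict(list)
    for cluster_id, blind_seconds in flashes:
        per_cluster[cluster_id].append(blind_seconds)
    averages = {c: sum(v) / len(v) for c, v in per_cluster.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)

flashes = [(0, 3.2), (0, 2.8), (1, 0.4), (1, 0.0), (2, 5.1)]
print(rank_clusters(flashes))
```

Cluster 2 ranks first here on a single throw, which is exactly why a minimum cluster size (like the minimum of 4 used below) matters.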

It would also be possible to use some other metrics to determine flash effectiveness such as probability of getting a kill. I’ll explore alternative scoring mechanisms at some later date.

Here’s what all the flashbangs on a map look like. Pretty hard to see any structure, right? The green dots are any players that were blinded by a flashbang in the cluster, to give you an idea of where you might expect to blind someone with a particular flash. The green includes flashed teammates, but the blind time calculation does not.

What happens after we cluster it and rank by flashbang effectiveness?

Here are the top 40 flashbang clusters on Train, minimum 4 flashbangs.

This definitely gives us some insight into the best flashbangs on Train. Some particularly interesting insights include:

  • If you have to choose between flashing close or far when throwing a flash to A site from the garbage bins, throw the close flash. You’ll get an additional half a second of blind time on average (Cluster 0 vs 107). It’s even worse to throw a flash from cluster 241.

  • Cluster 7 is probably the best way to flash T-site if you just have a single flash.

  • There are better ways than others to pre-flash T connector. (Cluster 2 vs 28)

There are definitely some improvements that can be made here. First, the clustering algorithm isn’t perfect and could use some tidying up. There are occasionally flashes that depart more from the rest of the cluster than we’d like (consider cluster 54). Likewise, 4 flashbangs is probably too few to assume that we have an accurate measure on likely blind time.

However, I really think this is exciting. It should be possible to do something like load Adren’s flash script with the information from a particular flash, so an aspiring IGL could look at an effective flash, click its cluster, and see how the flash is actually thrown in game, all from the demo info. It would also be possible to build something like a flashbang lookup script, where a user clicks the start and end location of a flash and is shown a couple of effective ways to flash from that combination of locations.

It is, of course, possible to do the same thing for HE grenades relatively easily. If I were going to decide to nade stack somewhere, either the ramp entrance to B or Ivy at A would be good decisions (as opposed to say, nading T connector).

Hypothesis: Terrorists are more likely to win a round if they have more map control.

On the surface, most people probably believe this to be true. It’s certainly intuitive -- the more you know about the map, the more you know about the location of your enemy. The more you know about your enemy, the more likely you are to win the round!

But how do we evaluate a claim like this empirically? We have to have some notion of map control. While we have access to the demo data, the demo isn’t really complete without more information about the map. My approach to the problem is pretty basic. In theory, we have access to the entire map (via the Valve map .bsp), and so we could do something super fancy. But I haven’t gone that deep… yet!

Instead, I take a look at each map’s accompanying .nav file, which is used to help bots navigate the map. It also includes the callout names that are displayed in game (though these aren’t as fine-grained as I’d prefer). Matthew Razza released a .nav parser written in Go which was extremely helpful for quickly getting at the data in these files. Each nav file represents a map as a list of places (which are typically named). Each place consists of a set of non-overlapping rectangular (forgetting the third dimension for a moment) areas.

Here’s what the map areas look like in Train, overlaid on top of a simpleradar minimap. As you can see, it isn’t perfect - there are some untracked spots, but largely it covers every important area on the map.

We’ll use these areas as a rough way of tracking the walkable area of the map. To determine map control at any particular tick, we can follow the player’s movement from the start of the round until the given tick. If a player comes within x units of an area, we assign that player’s team control of the area until another player comes within x units of it. If both teams are within x units, we don’t assign either team control. Aggregate team map control is then the sum of the areas of each team-controlled area.

This algorithm is a rough proxy for how map control actually works. It means, for example, that if a CT player pushes out Ivy and walks halfway towards T spawn, then turns around, his team will be assigned control of that area until a T comes back to Ivy. The idea here is that the CTs know Ivy will be clear, even if no CT remains there, so they have “map control” in that sense.
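The control-assignment rule above can be sketched in a few lines. This is a simplification of what I actually run - it checks distance to area centroids only, and the area layout, radius, and tick data below are invented for illustration:

```python
def assign_control(areas, ticks, radius=200.0):
    """areas: {area_id: (cx, cy, sq_units)} nav-area centroids and sizes.
    ticks: iterable of per-tick lists of (team, x, y) player positions.
    An area's controller flips to the last team to come within `radius`
    of its centroid; simultaneous presence of both teams clears it."""
    control = {a: None for a in areas}
    for players in ticks:
        for a, (cx, cy, _sq) in areas.items():
            near = {team for team, x, y in players
                    if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2}
            if len(near) == 1:
                control[a] = near.pop()
            elif len(near) == 2:
                control[a] = None  # both teams present: contested
    return control

def team_area(areas, control, team):
    # Aggregate control: sum of square units of team-controlled areas.
    return sum(areas[a][2] for a, t in control.items() if t == team)

areas = {"ivy": (0.0, 0.0, 900.0), "ramp": (1000.0, 0.0, 1200.0)}
ticks = [
    [("CT", 50.0, 10.0), ("T", 1020.0, -5.0)],   # each team takes an area
    [("CT", 400.0, 10.0), ("T", 1020.0, -5.0)],  # CT leaves Ivy, keeps it
]
control = assign_control(areas, ticks)
print(control, team_area(areas, control, "CT"))
```

Note how the CT retains Ivy after walking away - control only changes hands when someone from the other team shows up, matching the "they know it's clear" intuition.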

There are many things we could do to improve this technique. We use our buffer of “x” units as kind of a proxy for vision. Ideally, we’d want to have a better idea of where a player can see and give players control based on vision. This really requires a knowledge of the objects on the map. It would be possible to either learn a vision representation based on many, many maps worth of data, or calculate it directly using the BSP file. However, this simple technique should allow us to say some interesting things about the value of map control.

Here are some examples of assigned map control towards the end of a couple of rounds to give you some perspective on what this algorithm looks like in action.

To determine the utility of map control to a terrorist, we are going to examine map control 30 seconds prior to the bomb plant (or end of the round, if there was no plant). Map control is measured in square units of map controlled. We then calculate something I call the control ratio, which is (square units of area controlled by Ts) / (square units of area controlled by CTs). There’s often some contested area, which we don’t count for either team. I can then graph the distribution of plants vs no-plants to visualize some statistical tendencies and run some other statistical tests.
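The control ratio itself is a one-liner, but the no-CT-area case (e.g. very early in a round) needs a decision; here’s a sketch where I treat it as infinite T control, which is one reasonable convention:

```python
def control_ratio(t_sq_units, ct_sq_units):
    """(square units of area controlled by Ts) / (square units of
    area controlled by CTs); contested area is excluded from both."""
    if ct_sq_units == 0:
        # Convention: no CT-controlled area means total T control.
        return float("inf")
    return t_sq_units / ct_sq_units

print(control_ratio(3200.0, 1600.0))  # -> 2.0
print(control_ratio(500.0, 0.0))      # -> inf
```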

Statistical Aside: My goal here is to compare the likelihood of winning a round with the amount of map control the team had y seconds prior to planting the bomb (or winning the round). I use a tool called the Mann-Whitney U test. I don’t want to delve into the exact stats, but Mann-Whitney will tell us if it’s more likely that a winning terrorist team had more map control than less map control. It’s similar in method to t-testing for anyone who has taken basic statistics. This lets us evaluate if our evidence of deviation between the distributions is “statistically significant” in a particular way. I use a 95% confidence threshold for all of these results, so if the value in the table is < 0.05, it is statistically significant.
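To make the test concrete, here’s a from-scratch sketch of Mann-Whitney U with a two-sided normal-approximation p-value (no tie correction - for real data you’d reach for scipy.stats.mannwhitneyu). The control ratios below are made up:

```python
import math
from statistics import NormalDist

def mann_whitney_u(sample_a, sample_b):
    """Mann-Whitney U statistic for sample_a, with a two-sided
    p-value from the normal approximation (no tie correction)."""
    n1, n2 = len(sample_a), len(sample_b)
    # U = number of (a, b) pairs with a > b, counting ties as half.
    u = sum((a > b) + 0.5 * (a == b) for a in sample_a for b in sample_b)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p

winners = [1.8, 2.1, 2.5, 1.9, 2.4, 2.2]  # control ratios, plant rounds
losers  = [0.9, 1.1, 1.4, 1.0, 1.2, 1.3]  # control ratios, no-plant rounds
u, p = mann_whitney_u(winners, losers)
print(u, p < 0.05)
```

With every winning-round ratio above every losing-round ratio, U hits its maximum and the difference comes out significant at the 0.05 threshold used in the tables.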

Here are the results. I tested a couple of time horizons and a couple of different types of situations. First, I thought it made sense to look at just buy rounds, since the dynamics of map control likely change a lot in an eco. Second, I wanted to exclude the possibility of huge man advantages leading to complete changes in play, so if at our time horizon, there was greater than a one man advantage, I ignored the round. I also did both together.

This table has links to the distributions, which include an estimated distribution by means of kernel density estimation, as well as an actual histogram. The histogram itself is a bit misleading as it displays raw frequencies. Red is any round in which the terrorists get a plant, and blue is any round in which they don’t get a plant. In the table, each cell contains the p-value, and SS if the difference was statistically significant and N if it was not.

In general, this is a good way to begin to evaluate the question of whether map control is important to getting a bomb plant. I’ve also done the same thing with round wins. Obviously, the terrorists can win the round by eliminating the opposition, so it’s also fair to ask if the difference in map control in rounds where terrorists win and rounds where they lose is statistically significant.

In general, we have statistical significance on nearly every test at the ten-second horizon and none at the twenty-second. I think this is interesting, as it suggests that some critical development may take place in the last 10 seconds before a plant, but not in the prior 10. However, I hesitate to draw any strong conclusions. This method is still in its infancy and requires quite a bit of calibration. It may be the case that a different model captures map control in a way that better reflects the true dynamics of the game. For example, particular areas may be far more important than others and should be weighted by their importance rather than simple square area.

But this does demonstrate that we can capture some of the dynamics of map control using an automatic analysis of a demo. It could also serve as a good starting point for evaluating how good a particular team is at taking map control, by comparing them at various points throughout a game against the distribution of all teams playing the same map.

Hypothesis: “Better” players have better crosshair placement

Any relatively serious CSGO player has probably spent some time working on his or her crosshair placement. It’s hard not to imagine that there’s some kind of relationship between a player’s crosshair placement and how many kills they get (or other metrics of player performance, such as rating). Let’s examine this hypothesis in depth.

But how do we quantify crosshair placement? Every player has a view-angle that can be described by two angle measurements, pitch and yaw. Roughly, a player with good crosshair placement is one that moves their crosshair less in the time right before a kill. So, roughly, we can track how far a player moves their crosshair in the seconds before a kill to get a distribution of how “good” a player’s crosshair placement is. Like golf, the lower the better.
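A sketch of that measurement, assuming we sample (yaw, pitch) over the pre-kill window. Combining the two angle deltas Euclidean-style is a simplification (it ignores the cos(pitch) scaling of yaw), but the yaw wraparound at ±180 degrees is the part that genuinely bites:

```python
import math

def angle_delta(yaw1, pitch1, yaw2, pitch2):
    """Angular distance (degrees) between two view directions,
    handling yaw wraparound at +/-180 degrees."""
    dyaw = (yaw2 - yaw1 + 180.0) % 360.0 - 180.0
    dpitch = pitch2 - pitch1
    return math.hypot(dyaw, dpitch)

def crosshair_delta(view_angles):
    """Total crosshair movement over a pre-kill window of
    (yaw, pitch) samples; lower is better, like golf."""
    return sum(angle_delta(*a, *b)
               for a, b in zip(view_angles, view_angles[1:]))

# A flick from yaw 179 to yaw -177 is 4 degrees, not 356.
samples = [(179.0, 0.0), (-177.0, 0.0), (-177.0, 3.0)]
print(crosshair_delta(samples))  # -> 7.0
```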

The dataset I am using for this analysis is all of PGL Major Krakow, plus a couple of matches before that (Cologne Finals, some DH Valencia, as well as some ECS matches are included).

Here’s what crosshair placement looks like for all players across all of the demos in our dataset. Relatively interesting, and the distribution makes a lot of sense. We see way larger flicks in the horizontal direction than the vertical, which is expected since CSGO players aren’t often shooting surprise enemies above or below them. But I don’t think this plot really does professional CSGO players justice, as there are a ton more dots in the center… Here’s what this data looks like as a heatmap. Damn.

This data is super sensitive to outliers, so to get an idea of what the average crosshair degree delta is, we take the median instead of the mean. For this dataset, the median crosshair degree delta is just 19.66 degrees.

Statistical Aside: The mean here is 31.57 degrees, which is quite far from the median, so this is a very skewed distribution! This isn’t surprising. Most kills are on point, but you’ve got to have some outliers to get the people going. It’s provocative. The median does give us the better estimate of centrality in this case. Half of all kills in the scatterplots are within the grey circle.

Obviously, these numbers differ quite significantly based on which weapon is being used. To give you a better idea of what that looks like, I’ve generated the same scatters across three weapon types (snipers, rifles, pistols) and two engagement modes (duel, all engagements).

I wanted to understand how crosshair deltas changed when a player was forced to take an engagement, as opposed to getting an easy lurk kill. To do this, I looked at the view information in the demo. I considered a “duel” to be a death in which the killer was in the victim’s POV, and a “lurk” kill to be any in which the killer was outside the victim’s POV. For example, here’s a duel and here’s a lurk. The displayed POV is that of the victim.
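The post doesn’t pin down how wide the POV is, so here’s a sketch assuming a 90-degree horizontal field of view and ignoring pitch and occlusion by map geometry (a real classifier would need both):

```python
import math

def is_duel(victim_x, victim_y, victim_yaw, killer_x, killer_y, fov=90.0):
    """True if the killer sits inside the victim's horizontal field
    of view (`fov` degrees total, an assumed value) at the moment of
    death - a 'duel'. Anything else counts as a 'lurk' kill."""
    bearing = math.degrees(math.atan2(killer_y - victim_y,
                                      killer_x - victim_x))
    # Signed angle between victim's aim and the killer, wrapped to
    # (-180, 180] so looking-direction comparisons work near +/-180.
    off_center = (bearing - victim_yaw + 180.0) % 360.0 - 180.0
    return abs(off_center) <= fov / 2

# Victim aiming down the +x axis (yaw 0): killer ahead vs behind.
print(is_duel(0, 0, 0.0, 500, 100))   # -> True  (duel)
print(is_duel(0, 0, 0.0, -500, 100))  # -> False (lurk)
```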

Here’s what all the scatters and heatmaps look like across all of our categories.

This is kind of interesting, but I’m more interested in how players compare across these different categories. To visualize this, I’ve taken each of these categories and built a plot which shows the median crosshair delta for all players in our dataset. It also shows the standard error, which approximates our uncertainty in the data by combining sample standard deviation and sample size. I’m not using it here in a terribly scientific fashion, but it serves as a useful metric for understanding our certainty (based on data size) combined with a player’s consistency.
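For concreteness, here’s the per-player summary as I’d sketch it, assuming the standard error of the mean (s / sqrt(n)) is the uncertainty proxy described above - it isn’t a formal error bar on a median, which is exactly the "not terribly scientific" caveat:

```python
import math
from statistics import median, stdev

def median_with_se(deltas):
    """Median crosshair delta plus the standard error of the mean,
    the latter used as a rough certainty/consistency proxy rather
    than a proper confidence interval on the median."""
    return median(deltas), stdev(deltas) / math.sqrt(len(deltas))

deltas = [12.0, 15.0, 18.0, 22.0, 95.0]  # one big-flick outlier
m, se = median_with_se(deltas)
print(round(m, 2), round(se, 2))
```

The outlier barely moves the median but inflates the error bar - a spray-and-pray flicker and a small sample look uncertain in the same way, which is the intended behavior.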

Here’s what all of those charts look like!

We see some interesting trends here. Let’s look at the weapon-specific charts, as I don’t think we can get too much out of looking at all kills or all duels, since playstyles per weapon will likely make drawing a strong conclusion quite difficult. Each weapon-specific chart includes the number of kills for each player to give you an idea of how large the dataset was.

Looking at Sniper Rifles, /u/AdreNMostConsistent will be happy to see that AdreN has incredibly consistently good aim with sniper rifles. Given his weapon history this is perhaps a bit surprising! He only had about 10 sniper rifle kills in the dataset, but his aim when using them was quite consistently good. Looking to AWPers with greater than 30 kills, keev was certainly a standout, and GuardiaN also had relatively small crosshair deltas.

In the rifling world, kRYSTAL is king, but many mad-fraggers are close behind, like ScreaM, s1mple, and Xizt. AdreN, the HLTV MVP of the finals, is also pretty high up there (with a way larger dataset than most players here). Interestingly, once we look at just duel kills, kRYSTAL drops significantly in the rating, suggesting that there is some statistical value in understanding the situation around a kill.

For pistols, we see oskar, Xyp9x, Xizt, WorldEdit, and Dosia up there on the leaderboards.

Understanding how well this correlates with other metrics of player stats will require a bit more analysis, such as evaluating the correlation between crosshair delta and rating. I’m not convinced that high crosshair delta necessarily implies a weak player. It’s certainly possible that the best players are those that can flick very well, likely raising their average CHD. However, I think this foray into understanding crosshair deltas has served as an indicator that there is some value in quantifying this aspect of player performance.

Conclusion

I really think this is only the tip of the iceberg in terms of what can be done with demos and analytics in CSGO. I’ve already done some less in-depth things like conditional heatmaps, early round movement heatmaps, round GIF replays, and automatic detection of executes from grenades and player movement. These tools have the potential both to make the spectator experience better (by providing better visualizations of strategies and metrics by which to evaluate players) and to make the game even more competitive than it already is. If you’re a professional team, why watch hundreds of hours of demos to learn opponent tendencies when a computer can figure out those things for you?

If you want, follow me at my brand new Twitter @yohgcsgo

