Expected Goals and Elite NHL Players

23 Jan

The craze over Expected Goals (or xG) in hockey analytics is one that close followers of the movement likely haven’t seen since tracking Corsi first became popular years ago. For those who are only generally aware of how expected goals are calculated, xG is a formula that attempts to account for the goal expectancy of each shot based on factors such as shot type, distance, angle, situation, and strength (a more detailed write-up from Emmanuel Perry can be found here).

xGF% has generally overtaken CF% as the most accurate predictor of future success in hockey, and for good reason: rather than simply accounting for shot quantity, it also accounts for shot quality, which undeniably affects the probability that an attempt will result in a goal. Although this concept has proven relatively accurate when it comes to predicting future team success, the same cannot be said about its ability to predict future scoring by elite players.

When examining how xG performs in relation to star players, it is important to first establish the relationship between xG and (actual) goals for the majority of players in the NHL. The plot below includes data from 2014 to present and shows that the relationship is generally 1:1, with more players under-performing their expected goals than over-performing them (this is because scoring in the league is declining overall, and xG is calculated using historic scoring expectancies that do not weight the recent scoring drought as heavily):

[Figure: R plot of goals against expected goals (xG), 2014 to present]

The two variables yield a correlation coefficient of 0.913, meaning that they are very strongly correlated in this set. However, there are still many serious outliers that drastically outperform their expected goal totals (look to the area above and to the left of the red x = y line in the plot above to see the skaters that beat their projections).

A look at the top 30 of these overachievers provides us with the following list:

[Table: top 30 skaters by difference between goals and expected goals]
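For readers who want to reproduce this kind of comparison, the correlation and the overachiever ranking above can be sketched in a few lines of Python. This is only an illustration: the skater names and totals are made up, the field names `goals` and `xg` are assumptions, and the original analysis was done in R on data from Corsica.Hockey.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def overachievers(skaters, top_n=30):
    """Rank skaters by goals minus xG, largest surplus first."""
    ranked = sorted(skaters, key=lambda s: s["goals"] - s["xg"], reverse=True)
    return ranked[:top_n]

# Made-up totals for illustration only; not taken from the data set in this post.
skaters = [
    {"name": "Skater A", "goals": 30, "xg": 20.0},
    {"name": "Skater B", "goals": 18, "xg": 19.5},
    {"name": "Skater C", "goals": 10, "xg": 12.0},
    {"name": "Skater D", "goals": 25, "xg": 22.0},
]

r = pearson_r([s["xg"] for s in skaters], [s["goals"] for s in skaters])
top = overachievers(skaters, top_n=2)
print(round(r, 3), [s["name"] for s in top])
```

Sorting by the raw difference favors high-volume shooters; sorting by the ratio of goals to xG instead would surface players who beat expectations by the largest percentage.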

The primary value in xG lies not in its use as an evaluative measure but in its ability to predict future performance: a player who outperforms their xG is expected to regress in the future. But with players such as Kane and Kucherov exceeding expectations by over 50% (and hardly slowing down), we have to begin asking what xG isn’t telling us.

The list above also raised another significant question for me, given the obvious skill level of most of its members (perhaps minus Shawn Matthias): is there a skill or set of skills that allows each of these players to exceed expectations so dramatically? For players with elite shooting abilities like Steven Stamkos, Brent Burns, and Alex Ovechkin, the answer to their success seems to lie in their ability to simply pick a corner and fire shots past goaltenders on a regular basis, but what about the rest of the players on this list?

David Johnson (@hockeyanalysis) cited Tom Awad in a recent piece, noting that a player’s “finishing ability” is a large part of their actual value. It is possible that this non-quantifiable factor is the one responsible for the gap between goals and xG in our list above, even though accounting for it is difficult.

@dtmaboutheart used past shooting percentage in his xG calculation (which is outlined here) in an attempt to account for shooter talent, but issues with this include over-sensitivity to high/low percentages in the short term and the fact that shooting percentage shouldn’t really factor into xG as a whole because that percentage could be skewed based on shot location. For instance, a player could have a low shooting percentage but still be exceeding their expected goals if they take only long shots from the point and score on a higher percentage of those shots than the average shooter.
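The point-shot scenario above is easy to check with arithmetic. The sketch below uses invented numbers: a hypothetical skater who takes 200 shots from the point, each assumed to carry an xG of 0.03, and who scores 10 goals, compared against an assumed league-average shooting percentage of 9%.

```python
def shooting_pct(goals, shots):
    """Goals divided by shots taken."""
    return goals / shots

def goals_above_expected(goals, shot_xg_values):
    """Actual goals minus the sum of per-shot expected goals."""
    return goals - sum(shot_xg_values)

# All values below are hypothetical and chosen only to illustrate the argument.
LEAGUE_AVG_SHOOTING_PCT = 0.09   # assumed league-wide shooting percentage
point_shots = [0.03] * 200       # 200 long shots, each assumed to be worth 0.03 xG
goals = 10

pct = shooting_pct(goals, len(point_shots))
surplus = goals_above_expected(goals, point_shots)

print(f"shooting %: {pct:.3f} (league average assumed at {LEAGUE_AVG_SHOOTING_PCT:.3f})")
print(f"goals above expected: {surplus:.1f}")
```

Under these assumptions the skater shoots 5%, well below the assumed league average, yet finishes about four goals ahead of an expected total of six. A model that used shooting percentage as its talent input would mark this player down, while the player is in fact beating xG.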

One potential fix is using a player’s regressed average of goals scored above expected in place of shooting percentage to account for shooter skill. One downside of such a technique is that it would double-count the effect of exceeding xG so that future regression appears less likely, which would likely make the model inaccurate for all but a handful of the league’s very elite players (it contains the fallacy that because a player exceeded xG in the past they will continue to do so in the future, which is untrue for the vast majority of players). My knowledge of statistics probably isn’t sufficient to conduct accurate tests on this idea, but I encourage anyone reading this with the skill to do so to give it a try.
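As a rough starting point for anyone who wants to test the idea, one common way to regress a rate toward the league average is to pad each player’s sample with a fixed number of league-average shots. The sketch below applies that to goals above expected per shot. The function name and the `prior_shots` value of 500 are assumptions for illustration; the constant would need to be estimated from real data, and this has not been validated.

```python
def regressed_gax_per_shot(goals, xg, shots, prior_shots=500):
    """Goals above expected per shot, shrunk toward zero (league average).

    prior_shots is a hypothetical regression constant: the number of
    league-average shots (zero surplus) added to every player's sample.
    """
    return (goals - xg) / (shots + prior_shots)

# Invented examples: the same raw surplus rate of 0.02 goals per shot,
# observed over a small sample and a large one.
small_sample = regressed_gax_per_shot(goals=4, xg=2.0, shots=100)
large_sample = regressed_gax_per_shot(goals=40, xg=20.0, shots=1000)

print(f"small sample: {small_sample:.4f}")
print(f"large sample: {large_sample:.4f}")
```

The design choice here is that a surplus built over many shots is trusted more than the same rate over a handful of shots, which limits (but does not remove) the double-counting problem described above.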

However, the bottom line is that both methods ignore the factors that could cause a player to deliver results that exceed xG. For example, see the comparison below between Patrick Kane and Marian Hossa from our data set:

[Table: comparison of Patrick Kane and Marian Hossa, including goals, xG, and iCF, 2014 to 2017]

As you can see, Kane doubled Hossa’s actual goal total, but he was slightly behind his teammate in the xG category. So why the massive discrepancy? For one, Hossa lagged far behind his expected totals, largely thanks to a dismal 2015-16 season when he couldn’t seem to buy a goal. But aside from Hossa under-delivering on his xG, Kane’s huge margin between actual and expected goals also stands out.

This is obviously an extreme example that is made possible by the polar opposite seasons the two forwards experienced during 2015-16, but we can begin to explain the difference in results despite the closeness in expected goals by looking at the same table above.

To the right of the xG column is the iCF (individual Corsi For¹) column, which shows how many shot attempts each player took from 2014 to 2017. As you can see, Hossa outshot Kane by only 3 attempts during that time span, seemingly giving each of them a roughly equal number of opportunities to score. Because they each had a similar number of attempts and took them from mostly similar scoring areas, they both finished with around the same xG tally (remember the inputs to the xG equation I outlined at the start of this piece).

I say “seemingly” in that last paragraph because the truth is that scoring opportunities exist in more plays than only the ones resulting in shots. Watching Kane is a great way to pick up on this, especially since he has been joined by Artemi Panarin in Chicago.

The two are often likened to the Harlem Globetrotters because of their ability to pass the puck through tight windows from across the ice. The amount of time their line spends in the offensive zone is astounding, but their puck possession skills sometimes fly under the radar because of their pass-first style. They often forgo shots in favor of riskier passes to open up better scoring angles. Passing up those shots lowers their CF totals, which skews the best proxy for possession we currently have.

Kane is not the only player like this: Kucherov, Barkov, and Gaudreau are also excellent examples that appeared on the list of overperformers above. Therefore, the task becomes quantifying factors other than shots in a way that allows the refining of xG calculations for individual players. I will admit that I am entirely unsure of how to do that, but I can offer a few ideas that may aid the improvement of expected goals equations in the future.

To begin with, utilizing passing data (like the kind that Ryan Stimson has helped popularize) can build a more inclusive xG calculation by weighting plays that involve a pass through the “royal road” (essentially from one side of the ice to the other) and plays with a shorter time between pass reception and shot release.
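To make the passing idea a little more concrete, here is a minimal sketch of how a per-shot xG value could be nudged upward when the shot follows a royal-road pass or a quick release. It is not how any existing xG model works: the bonus sizes and the one-second threshold are placeholder assumptions, and in practice such weights would be fit to tracked passing data. The adjustment is made on the log-odds scale so the result always remains a valid probability.

```python
from math import exp, log

# Hypothetical weights; real values would have to be estimated from passing data.
ROYAL_ROAD_BONUS = 0.8        # log-odds added when the pass crossed the royal road
QUICK_RELEASE_BONUS = 0.4     # log-odds added for a fast shot after the pass
QUICK_RELEASE_SECONDS = 1.0   # assumed cutoff for a "quick" release

def logit(p):
    """Convert a probability strictly between 0 and 1 to log-odds."""
    return log(p / (1 - p))

def inv_logit(z):
    """Convert log-odds back to a probability."""
    return 1 / (1 + exp(-z))

def pass_adjusted_xg(base_xg, crossed_royal_road, seconds_since_pass):
    """Adjust a shot's base xG for the pass that preceded it.

    base_xg must be strictly between 0 and 1. Pass None for
    seconds_since_pass when the shot did not follow a pass.
    """
    z = logit(base_xg)
    if crossed_royal_road:
        z += ROYAL_ROAD_BONUS
    if seconds_since_pass is not None and seconds_since_pass <= QUICK_RELEASE_SECONDS:
        z += QUICK_RELEASE_BONUS
    return inv_logit(z)

# The same location-based xG with and without a cross-ice one-timer.
unassisted = pass_adjusted_xg(0.10, crossed_royal_road=False, seconds_since_pass=None)
one_timer = pass_adjusted_xg(0.10, crossed_royal_road=True, seconds_since_pass=0.5)
print(f"unassisted: {unassisted:.3f}, royal-road one-timer: {one_timer:.3f}")
```

With an adjustment like this, two shots from the same spot no longer receive the same xG, which is the kind of difference the passing data is meant to capture.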

Additionally, as technology improves and more expansive automated tracking is adopted at the league level, it becomes more likely that goalie screens and player speed and direction at the time of a shot can be taken into consideration, giving statisticians more of the information they need to refine xG calculations. The best part about the potential in this tracking technology is that it might not be more than a year or two away from being implemented. All of this additional information will help quantify the factors that allow the best players to capitalize on opportunities at a greater rate than the rest of the league.

The repeatability of these statistics will play a large role in whether or not they add real value to any expected goals equation, but there is no denying that they could be useful under the right circumstances.²


  1. Individual Corsi – The number of Corsi events a player takes (shots + shot attempts that missed the net or were blocked)

  2. Data manipulation and visualization with R, data from Corsica.Hockey 

About the author: David Tews

David is a sport management student at UMass Amherst who one day hopes to work in athlete representation. Keep up to date with his writing and other interesting sports news by following him on Twitter via @DavidTews13.
