The lesson that everyone who works with numbers learns early and often is that things do not always turn out as planned. My original intent for this post was to test a very simple model with Team Corsi Percent as the independent variable, success (measured in terms of how far a team goes in the playoffs) as the dependent variable, and save percentage of a team’s starting goaltender as a moderator. This would, I reasoned, follow up nicely on my post on mediators and moderators with a concrete example. The model looks like this:
Things did not go as planned. Save percentage did not moderate the relationship between Corsi% and making it into the playoffs, or winning in the first round. I decided to run the model on the second round even though I only had 88 cases (logistic regression works best with at least 50 cases per IV), and save percentage did act as a moderator. The problem was that the relationship between Corsi% and success was no longer significant.
I knew that my model had failed about halfway through coding the data. However, some very smart people on Twitter encouraged me to continue anyway. This makes sense, because a failed model can still contribute to our understanding. After all, the analytics movement is still fairly new. Any new information is good information.
I will start this post by discussing the variables that I used (Corsi %, success, and save %). I’ll then provide data from logistic regressions (dummy coded DV to isolate rounds, reverse likelihood ratio method for entry to avoid suppressor effects) for each round in which the model is significant. My focus will be on providing as much information as possible, on the chance that others may want to use this information in their own work. I will close it out with a discussion of the moderator that almost worked.
Before I start, I just want to note that I am not partial to logistic regression. It is clearly the most appropriate method to use with this data, which is why I selected it, but I find its logic somewhat counterintuitive, and the way output is organized in SPSS leaves a lot to be desired. As a result, it is very easy, at least for me, to fudge things up. I have tried to be exceptionally careful, and hopefully it is all clear and well laid out.
This analysis covers 11 NHL seasons: 2002/03 to 2013/14 (note the 2004/05 season was cancelled due to a labour dispute). Data of interest was collected for each team, leading to an initial total of 11 seasons * 30 teams = 330 cases. The n of the sample dropped from 330 to 329 after an outlier was identified on the basis of a studentized residual greater than 2.0 (Josh Harding became the starter of record for the 2012-13 Wild after an injury, and lost in the first round of the playoffs after posting a standardized regular season save percentage of 86.1).
The fact that I am using a “complete” set of data for a given time frame has implications in terms of using inferential statistics. Without going into too much detail, the essence of the concern is that if a complete population is used then inferential statistics are inappropriate (i.e. the standard error no longer has meaning, and crosstabulations and correlations provide answers rather than estimates, etc.). However, a) seasons prior to 2002/03 are not considered, and b) I am hoping to uncover underlying processes in order to make inferences that can be used to help predict future cases. An analogy would be using a complete set of hands dealt during a long poker tournament, such as the World Series of Poker Main Event, in order to estimate the probabilities of making each type of hand (e.g. a pair, two pair, a flush, etc.). I just wanted to flag this, because some statisticians will disagree with this assumption and argue that drawing sets of subsamples is more appropriate. It is an issue that I believe needs to be discussed/debated more within hockey analytics.
Three main variables were used in this analysis: team Corsi %, team success, and save percentage of starting goaltender. The following is a brief overview of each of these variables.
Team Corsi Percent (IV): This data is drawn directly from War on Ice, which has team-by-team Corsi% statistics from 2002/03 forward. A value of 50% indicates an equality between total Corsi For and total Corsi Against in a given season. Scores above 50% are desirable. Corsi is an important variable within the analytics community because it is viewed as a proxy for possession. This is mostly correct. I prefer to refer to Corsi as “productive possession” because it counts positive events (directing shots toward the opposing net) rather than strictly possession, which may or may not lead to a shot directed toward the opposing net. Team Corsi % is a continuous variable that is normally distributed.
Team Success (DV): Team success was initially coded as follows: 0-did not make the playoffs (n=154), 1-lost in first playoff round (n=88), 2-lost in second playoff round (n=44), 3-lost in third playoff round (n=22), 4-lost in finals (n=11), 5-won Stanley Cup (n=11). My plan for this stage of the analysis was to use logistic regression, which is appropriate when the DV is binary or multinomial. For ease of interpretation, and to avoid false results stemming from only 11 cases being coded as 4 (finalist) or 5 (Stanley Cup champion), I focused on binomial analysis for each round. To this end, I dummy coded a new set of variables (0-missed playoffs, 1-made playoffs; or 0-lost a given playoff series, 1-won that playoff series). The dummy coding ceased at the third round due to concerns about the number of cases per category. As a rule of thumb, logistic regressions become unreliable when there are fewer than 50 cases per IV in the analysis. I elected to run an analysis up to the second playoff round despite having only 88 cases. Faced with the choice between waiting two years to add the appropriate number of cases (8 cases per season will be added) or running the analysis with disclaimers, I opted for the latter. My justification is that this is meant to be exploratory at this point, and as much information (even “interpret with caution” information) as possible should be gleaned from each analysis.
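For anyone who wants to reproduce the coding scheme, the dummy coding described above can be sketched in a few lines of Python. The function and key names here are illustrative, not taken from my actual coding sheet:

```python
def dummy_code(success):
    """Convert the ordinal success code (0=missed playoffs, 1=lost R1,
    2=lost R2, 3=lost R3, 4=lost final, 5=won Cup) into per-round
    binary DVs. None means the team never played that round, so the
    case is excluded from that round's regression."""
    return {
        "made_playoffs": 1 if success >= 1 else 0,
        "won_round_1": (1 if success >= 2 else 0) if success >= 1 else None,
        "won_round_2": (1 if success >= 3 else 0) if success >= 2 else None,
    }
```

Note that coding a non-qualifier as `None` rather than 0 matters: a team that missed the playoffs is not a "first-round loser," and including it would contaminate the round-by-round samples.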
Save Percentage of Starting Goaltender (Moderator): Coding of this variable began with identifying the starting goaltender for each team in the dataset. The first criterion was number of regular season games played. In instances where two goaltenders were close (close was defined as within 10 games played of each other) the number of playoff appearances was used to identify the starting goaltender. More than 90% of the cases were easily determined by these criteria. The cases that caused the most difficulty were instances when a goaltender who was clearly the starter was injured prior to the playoffs, and thus did not play a single playoff game in that season. I deferred to the hypothesis, which measured the impact of Corsi% upon team success (playoff rounds) moderated by starting goaltender regular season save %. In the small number (fewer than 10) of instances when this happened, the goaltender who became the starter in the playoffs was identified as the starting goaltender, and his statistics were used. Note that this is different from instances in which a starting goaltender plays in the playoffs but is pulled due to poor play. This latter example is viewed as simply being one dimension of the larger story of save percentages and team success.
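The starter-selection rule above can be sketched as follows. The `(name, regular-season GP, playoff GP)` tuple layout is an assumption for illustration; the actual coding was done case by case:

```python
def starting_goaltender(goalies):
    """Pick the starter from a list of (name, regular_season_gp,
    playoff_gp) tuples: most regular-season games played, with playoff
    appearances breaking ties when the top two are within 10 games."""
    by_gp = sorted(goalies, key=lambda g: g[1], reverse=True)
    top = by_gp[0]
    if len(by_gp) > 1:
        runner_up = by_gp[1]
        if top[1] - runner_up[1] <= 10:
            # "Close" case: defer to playoff games played.
            return max((top, runner_up), key=lambda g: g[2])[0]
    return top[0]
```

The injured-starter edge case described above would still need manual handling, since it depends on context the raw games-played numbers don't capture.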
Save percentages fluctuate from season to season, which could potentially confound the results. To correct for this, a pooled save percentage was calculated for starting goaltenders in each season (total saves by starting goaltenders divided by total shots faced by starting goaltenders in that season), along with an overall mean across all seasons. These figures were used to create an adjustment that standardized the seasonal fluctuations. Each adjustment was validated using the following formula: seasonal save percentage * adjustment = validation figure, which should equal the overall mean. The following shows the grouped save percentage, adjustment, and validation for each season.
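In code, the adjustment and validation steps look roughly like this. The 0.915 overall mean and the 0.905 seasonal figure are placeholders for illustration, not the actual values from the dataset:

```python
OVERALL_MEAN = 0.915  # assumed cross-season mean; the real value differs

def season_adjustment(seasonal_pct):
    """Multiplier that maps one season's pooled starter save % onto
    the overall mean, removing season-to-season fluctuation."""
    return OVERALL_MEAN / seasonal_pct

def standardize(goalie_pct, seasonal_pct):
    """Adjusted (standardized) save % for one goaltender-season."""
    return goalie_pct * season_adjustment(seasonal_pct)

# Validation step from the text: the season's own pooled percentage,
# multiplied by its adjustment, should recover the overall mean.
adj = season_adjustment(0.905)
assert abs(0.905 * adj - OVERALL_MEAN) < 1e-9
```

So a goaltender in a low-scoring-on-goalies season (pooled 0.905) gets his save percentage nudged upward, and vice versa in a high-save-percentage season.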
The season-by-season trend in save percentages among starting goaltenders looks like this (green = unstandardized, purple = standardized):
After successful validation, the save percentage for each starting goaltender in a given season was multiplied by the adjustment for that season, which led to an adjusted save percentage. For the sake of comparison, I will include descriptives for both unadjusted (Save%) and standardized (StSave%) save percentages. It should be noted, however, that for purposes of analysis only standardized figures were used. Removal of the outlier that was discussed at the start of this section meant that the distribution for save percentage became normal (using the Kolmogorov-Smirnov Test and the Shapiro-Wilk Test) after initially being moderately leptokurtic.
The standardized save percentage is a normally distributed continuous variable.
The model used in this analysis was significant in two instances: whether a team made the playoffs (0-missed playoffs, 1-made playoffs), and whether a team won the opening (1st) round. I will start each section with a discussion of the results, which will be followed by tables that provide an overview of important statistics. Unfortunately, providing a description of what each statistic means, and how to interpret it, would at least double and probably triple the length of this post. So if you are not familiar with interpreting logistic regression output I suggest that you read the written summaries and skip the tables.
One quick note on examining residuals before I move on to the results. In all cases Cook’s Distance was within range (less than 1), and after removal of the one case discussed earlier both studentized and standardized residuals were well within accepted standards. No leverage points were found. However, several DFBeta for Constant scores were above 1, which was a cause for concern. After considering the options, I decided the best approach was to leave such cases in the analysis. In my view, samples of 330 cases for the first analysis and 176 cases in the second were large enough to minimize the effect of the “pull” of these cases, especially since none were identified as leverage points.
Logistic Regression 1: DV=Make the Playoffs? (N=0/Y=1)
Summary: This logistic regression was conducted to predict whether a team would make the playoffs using starting goaltender save percentage and team Corsi % as predictors. A test of the full model against a constant-only model was statistically significant, indicating that the predictors as a set reliably distinguished between teams that made the playoffs and teams that did not (χ² = 128.559, p < 0.001, DF = 2).
Nagelkerke’s R² of 0.432 suggests that the model performed moderately well with respect to fitting the data. This is supported by the Hosmer and Lemeshow test, which shows no significant difference between predicted and actual scores (χ² = 10.019, p = 0.264, DF = 8), as well as by the prediction success rate, which rose from 53.2% (baseline) to 73.9%.
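For readers who want to see where Nagelkerke’s R² comes from, it can be computed from the null and full model log-likelihoods. The log-likelihood values below are approximations implied by the base rate and the reported chi-square, not the actual SPSS internals:

```python
import math

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke's R²: the Cox & Snell R² rescaled by its maximum
    attainable value, computed from log-likelihoods."""
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)
    return cox_snell / max_cox_snell

# A null log-likelihood of roughly -228 is implied by the ~53% base
# rate on ~330 cases; the model log-likelihood then follows from the
# reported chi-square, since χ² = -2 * (ll_null - ll_model) = 128.559.
r2 = nagelkerke_r2(-228.0, -228.0 + 128.559 / 2, 330)  # ≈ 0.43
```

Plugging in those approximate values lands close to the reported 0.432, which is a reassuring sanity check on the output.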
The Wald criterion demonstrates that both save percentage (Wald = 36.242, p < 0.001, DF = 1) and Corsi percentage (Wald = 59.752, p < 0.001, DF = 1) are significant predictors of whether a team will make the playoffs. The EXP(B) value for save percentage indicates that when save percentage is raised by one unit (e.g. 91.5% to 92.5%) the odds of making the playoffs become 2.688 times as large. Similarly, the EXP(B) value for Corsi% indicates that when Corsi% is raised by one unit (e.g. 50.6% to 51.6%) the odds of making the playoffs become 1.622 times as large.
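The multiplicative nature of EXP(B) can be shown directly. Only the two odds ratios below come from the model; the function itself is just a sketch of how the coefficients combine:

```python
# Odds ratios (EXP(B)) reported by the make-the-playoffs model.
OR_SAVE, OR_CORSI = 2.688, 1.622

def odds_multiplier(d_save, d_corsi):
    """Factor by which the odds of making the playoffs change when
    save % moves by d_save units and Corsi % by d_corsi units,
    holding everything else constant."""
    return OR_SAVE ** d_save * OR_CORSI ** d_corsi
```

So, for example, gaining one point of save percentage and one point of Corsi % together multiplies the odds by 2.688 × 1.622, because logistic coefficients add on the log-odds scale and therefore multiply on the odds scale.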
It is important to recognize that the range of values for Corsi % is greater than the range of values for save %, which makes interpretation difficult. When running the same logistic regression with one IV at a time I found that using Corsi % alone leads to correct classification of cases 71.5% of the time, while using save percentage alone leads to correct classification of cases 65.0% of the time. This shows that although save % improves the model to a statistically significant degree, the size of the effect of including save % in a model that already has Corsi % is really quite small.
The following tables provide a summary of the model discussed in this section.
Logistic Regression 2: DV=Win 1st Round of the Playoffs? (N=0/Y=1)
Summary: This logistic regression was conducted to predict whether a team would win or lose in the first round of the playoffs using starting goaltender save percentage and team Corsi % as predictors. A test of the full model against a constant-only model was statistically significant, indicating that the predictors as a set reliably distinguished between teams that won their first-round series and teams that did not (χ² = 11.556, p < 0.01, DF = 2). However, the level of significance and the chi-square statistic were not as impressive as they were in the previous model. This can be partly attributed to the drop in sample size from 330 to 176.
Nagelkerke’s R² of 0.085 suggests that the model did not perform particularly well with respect to fitting the data. However, the model is still functional, as illustrated by the Hosmer and Lemeshow test, which showed no significant difference between predicted and actual scores (χ² = 5.039, p = 0.753, DF = 8). The prediction success rate was 60.0%, which was only a marginal improvement over the baseline of 50.3%.
The Wald criterion demonstrates that both save percentage (Wald = 7.030, p < 0.01, DF = 1) and Corsi percentage (Wald = 5.426, p < 0.05, DF = 1) are significant predictors of whether a team will win in the first round. The EXP(B) value for save percentage indicates that when save percentage is raised by one unit (e.g. 91.5% to 92.5%) the odds of winning the series become 2.247 times as large. Similarly, the EXP(B) value for Corsi% indicates that when Corsi% is raised by one unit (e.g. 50.6% to 51.6%) the odds of winning the series become 1.304 times as large.
By all measures, this model of success in the first round of the playoffs did not perform as well as the previous model, which focused on identifying whether or not a team would make the playoffs. The following tables provide a summary of the model discussed in this section.
The sample size was sufficient to test whether Corsi % and Save % were statistically significant predictors of success in the second round of the playoffs, but the model failed (p=0.270, ns.) and could not improve upon the 50% prediction success rate that one would have with a simple coin toss. Running a logistic regression with each IV separately also produced failed models.
One interesting result occurred when running a moderation model with second round playoff data (n=88), using Corsi % as the IV and success (series win) as the DV. In this case the moderation test was positive, which made me quite excited. The Johnson-Neyman regions-of-significance test identified a cutoff point of 91.2417 (please keep in mind that this is a standardized figure, and would have to be adjusted to give an accurate value for each season). My heart sank, however, when I realized that the IV was no longer significant, which meant that I was essentially running a correlation between the moderator and the DV with no underlying meaning to it. At least none that I could statistically show.
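For those curious about the mechanics, the quantity the Johnson-Neyman procedure probes is the conditional slope of the IV at each level of the moderator. The coefficients below are invented placeholders, not the model’s actual estimates; only the general form of the interaction model comes from the analysis:

```python
def conditional_corsi_effect(save_pct, b_corsi, b_interaction):
    """In logit(win) = b0 + b1*corsi + b2*save + b3*(corsi*save),
    the slope of Corsi % at a given save % is b1 + b3*save. The
    Johnson-Neyman cutoff is the save % where this conditional
    effect crosses the significance threshold."""
    return b_corsi + b_interaction * save_pct
```

With a negative main effect and a positive interaction, the Corsi % slope flips sign as the standardized save % rises, which is the kind of pattern a cutoff near 91.24 would reflect.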
When I looked more closely at the variables (recoding save % using the cutoff point), I was pleased to discover that a pattern was emerging. Teams with a Corsi % above 50% had an almost identical success rate in the second round of the playoffs regardless of the save % (please note once again that this is a standardized regular season save %, not playoff save %). However, teams with a Corsi% below 50% had a 1-4 series record in the second round when the save percentage was below 91.2417, while those equal to or above 91.2417 had a series record of 13-7. This is worth tracking as 8 new cases are added each year.
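The recoding and tallying described above can be sketched as follows. The sample rows are invented for illustration; only the 91.2417 cutoff comes from the analysis:

```python
CUTOFF = 91.2417  # Johnson-Neyman cutoff on standardized save %

def tally(teams):
    """teams: list of (corsi_pct, st_save_pct, won_series) tuples for
    sub-50% Corsi teams in the second round, with won_series as 1/0.
    Returns (wins, losses) records above and below the cutoff."""
    above = [t for t in teams if t[1] >= CUTOFF]
    below = [t for t in teams if t[1] < CUTOFF]
    record = lambda group: (sum(t[2] for t in group),
                            sum(1 - t[2] for t in group))
    return {"above": record(above), "below": record(below)}
```

Rerunning this tally each spring, as the 8 new second-round cases arrive, would show whether the 13-7 versus 1-4 split holds up or regresses.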
Thank you for reading this post. Even though it is technically in the category of “failed research,” hopefully some of you will be able to take away something from this overview. Please note that there are many numbers in this post, and a lot of description and analysis of those numbers. If you see a mistake in either the numbers or my wording/the way I presented the information, please just let me know. The toughest part of writing this type of post is that I don’t have a second set of eyes handy to proofread and/or point out places where I got it wrong. So you are those eyes.