I started the Integrating Hockey Analysis project about a year ago. The core idea behind it was that both the burgeoning analytics movement, which focuses on statistical analyses, and old school understandings of the game, which tend to rely on personal evaluations that are collected under the rubric of “the eye test,” can each bring valuable information to the table. From the outset, Integrating Hockey Analysis was about attempting to correct what I interpreted as unproductive discourse that was taking place across social media. It seemed to me that some were too invested in a particular approach, which led to a dialogue that featured far too much “talking at” rather than “listening to.” Worse still, individuals with different perspectives were viewed as being personally flawed in some way (e.g. “loves spreadsheets more than watching the games” or “doesn’t know how to count”). The cherry on top was the internet trolls on both sides of the debate, who provided a steady stream of easy examples suggesting that stereotypes of the “other side” were the rule rather than the exception. Hockey fans seemed to be divided between “pro-analytics” and “anti-analytics,” with little meaningful dialogue taking place between the two camps.
I wanted to shift the terms of the debate to focus more upon an “and” rather than an “either/or” approach. To accomplish this, I started with an inclusive set of assumptions: yes, members of the analytics crowd actually do watch hockey games, just as members of the old school hockey crowd know how to do math. I also started with the assumption that each side of the analytics/old school debate comes up with valuable insights. The relationship between Corsi and winning, for example, is a real thing, just as the importance of leadership within hockey teams is a real thing. Finally, I drew upon my investment in triangulation (i.e. using multiple sources of data or methods for data analysis) as the best road to travel on if we want comprehensive data that we can trust. To use an analogy, focusing exclusively on one approach or another is, in my opinion, like putting a patch over one eye before looking at the Grand Canyon. You may get a lot of information that seems pretty good, but you also miss important parts of the larger picture.
With those starting assumptions in place, I set out to work through the challenges involved in bringing two completely distinct sets of knowledge, each with differing assumptions about “what counts,” together into an integrated whole. If one accepts that the two distinct types of hockey knowledge each have something to offer, the next logical question is how to bring the information together in a way that makes sense. From the outset I planned to talk about three main approaches that I believed would work: mixed methods analysis, statistical models with latent variables, and complex systems. As I was writing I started to think about what is being integrated, and this became a secondary theme that ran through the last half of the articles in this series.
Mixed methods analysis combines qualitative and quantitative information. The core premise is that while statistics provide an excellent account of patterns and trends, they are weak in terms of identifying what the social situations under observation mean to those involved. Conversely, qualitative analysis is good at identifying how individuals interpret given situations, but poor with respect to identifying larger trends and patterns. The advantage of mixed methods approaches is that the strengths of one method address the weaknesses of the other, which leads to a more nuanced and complete account of whatever is being studied. A secondary advantage is that it does not attempt to fundamentally change one type of data into another (i.e. it does not attempt to artificially quantify qualitative data).
A key limitation of mixed methods analysis is that it is almost impossible to give equal weight to quantitative and qualitative information. In practice, either quantitative data is used to double check what you know from qualitative observation and analysis (e.g. the argument that stats should be used in the same way that a drunk uses a lamppost: for support but not illumination), or qualitative information is used to provide examples or counterpoints to trends shown in the quantitative analysis. I strongly suspect that many NHL executives are firmly embedded in the first type of mixed methods analysis (using quantitative data to double check what your eye test is telling you). This would not be a problem if the qualitative analysis were rigorous and systematic. However, it is probably neither of these things. The most likely scenario is that accepted, well-worn narratives reign supreme over objective interpretations of qualitative information, while quantitative data is not used to its full potential. In the end, this is where old school approaches come off looking the worst.
Statistical Models with Latent Variables
Statistical models that feature latent variables combine manifest variables, such as shot attempts, with latent variables, such as leadership, within a single model. The main advantages of this type of approach are: it is empirical, it does not artificially favor one type of data over another, and it lends itself to mapping out complex interactions between the variables (e.g. moderating and mediating effects). The two major limitations of this approach are: 1) a large number of the manifest variables that are currently used are based upon shot attempts, meaning care has to be taken to avoid collinearity and/or endogeneity; and 2) unless we are working directly with a team and collecting the right data, we do not have access to information that could measure latent variables such as leadership.
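To make the collinearity concern a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the numbers are randomly generated rather than real NHL data, the variable names are hypothetical, and the second shot-based measure is built Fenwick-style as shot attempts minus blocked attempts. It simply shows how two measures derived from the same shot attempts end up almost perfectly correlated, and how a variance inflation factor (VIF) flags that before both are placed in the same model.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def vif_two_predictors(r):
    """VIF when a model has exactly two predictors with correlation r."""
    return 1.0 / (1.0 - r ** 2)


# Synthetic, made-up per-game numbers (NOT real data).
random.seed(0)
shot_attempts = [random.gauss(50, 8) for _ in range(200)]
blocked = [random.gauss(12, 2) for _ in range(200)]
# Second metric is derived from the first: attempts minus blocked attempts.
unblocked_attempts = [a - b for a, b in zip(shot_attempts, blocked)]

r = pearson(shot_attempts, unblocked_attempts)
vif = vif_two_predictors(r)

print(f"correlation between the two shot-based metrics: {r:.3f}")
print(f"variance inflation factor: {vif:.1f}")
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a warning sign. In a case like this, one option is to keep only one of the two shot-based measures in the model.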
An analyst could put together an impressive, multi-faceted, and very useful model using this type of approach. However, the construction of such models is not possible for fans not directly connected to a hockey team because much of the relevant information that could be used to construct solid latent variables is either not collected or, if it is collected, is proprietary team knowledge. I doubt that such models are being constructed on a team level, because analysts hired by teams specialize in measurement of manifest variables and most likely do not have the knowledge or skill set, or direction from the team, to put together models that include latent variables. I also want to quickly touch on an unfortunate spinoff of this lack of access to data, which is that many members of the contemporary analytics movement completely disregard, or deny the importance of, the “intangibles” that latent variables measure. This has led to outlandish statements like “a leader is just the best player on the ice,” or my favorite question: “if player x is such a great leader why does the Corsi of other players not go up when he is on the ice?” The fact that they have not bothered to look in scholarly books or journals that are readily available online is not acceptable in a movement based upon research and empirically-based evidence. This is where the contemporary analytics movement comes off looking the worst.
Complex systems analyses typically use an interdisciplinary approach to construct models that highlight the interconnections that exist between components of complex organizations such as hockey teams. A key insight of this approach is that changing one part of a complex system often leads to a ripple effect among the other components of that system that can lead to unanticipated (and often counterintuitive) results. One example of this approach is the Behavioral Engineering Model (BEM), which contains environmental and individual factors along one axis, and information, instrumentation, and motivation on another, to form a model of complex organizations that has six rank-ordered components. Other models are far more complex either in terms of the number of components, or the types of paths and/or feedback loops within the model. This fully integrated approach (i.e. it uses manifest and latent variables) can provide insight into how and why hockey organizations succeed and fail at particular points in time by eschewing simplistic answers that commonly fall along the lines of “having good possession numbers is a good thing, therefore a team will get better by replacing player x with player y who has better possession stats, lather-rinse-repeat, plan the Stanley Cup parade.” A second strength of this approach is that it includes analysis of important factors commonly missed by other approaches, such as team budget, the flow of information between coaches and players, the structure of player contracts, team leadership, having unified messaging throughout the organizational structure, and cohesion among individuals.
The promise of this approach rests in the fact that it has been used to great effect, measured in increased productivity and efficiency, by complex non-hockey organizations.
The starting position for adopting this approach would be that hockey organizations are workplaces, and can be examined using the same means as any other place of business. In practice the researcher needs to be fully embedded within an organization for a period of time, and to have full access to key pieces of information, in order to identify important interconnections between the core components of the organization. Furthermore, there has to be complete buy-in on an organizational level to adopt suggested changes based on the research. There is no indication on any level that NHL teams are motivated to hire individuals with the highly specialized skill set required to do this type of analysis, or to provide the required level of internal information to analysts. Also, the track record of NHL teams for fully buying into empirical research, and making sweeping changes across the organization in light of empirically-based findings, is less than stellar, to put it mildly. There may be a point in the future when dysfunctional hockey organizations are willing to bring in external analysts to help in this way, but that future is not even a small blip on the radar at this point in time.
Sources of Knowledge that Could, and Should, be Integrated
My plan for this blog was to outline different ways of integrating old school and contemporary analytics knowledge. I quickly realized that this framework was limiting, and that integration of other types of knowledge also had to happen. I’ll quickly list a few that became important as I was writing this blog in the order in which they came up.
It is common to discover things while you are in the middle of a project that you had not really considered at the outset. Far and away my biggest shock was the lack of reference to social science research and methods in contemporary hockey analytics. Sociologists commonly look at the impact of social forces, often in the form of culture, upon individuals. Psychologists tend to focus on individual factors, and have established metrics to measure latent variables that have been validated over hundreds, and sometimes thousands, of empirical trials. These perspectives are persona non grata within contemporary hockey analytics, where a pejorative “intangibles” label is commonly hung upon anything that is not directly, and externally, observed. Each discipline brings insights as well as limitations to the table, and ideally a mix of perspectives can correct for lingering disciplinary biases that add unnecessary limitations. The strong “hard sciences” and math biases in contemporary analytics really stand out in good ways, in the sense that the techniques and methods have often been solid, and in bad ways, in the sense that studying humans as rational actors who are capable of interpretation and who exist in the context of social systems sometimes requires different techniques and starting assumptions.
A very important strength of statistics is the ability to look at trends and associations that are not always apparent to the naked eye. For this reason, it makes perfect sense that those who are a part of the contemporary analytics movement favor manifest variables that are relatively easy to quantify, such as shot counts, saves, and results of different methods of zone entry. However, valuing manifest measures above all else had the indirect consequence of marginalizing experience-based knowledge that comes from hockey players and/or coaches. The best, but far from only, example of this is the shot quality debate. I remember having very long and drawn out debates prior to starting this blog with individuals who insisted that shot quality did not matter. The core assumption was that over the course of a long season shot quality washed out. Players scoffed at the notion that shot quality does not matter (e.g. Bobby Ryan’s scathing comments about the difference between shooting from the boards and shooting from the slot). If shot quality is a thing then save quality is probably also a thing. To its credit, the analytics movement did eventually accept that shot quality matters. However, this acceptance was based on resources such as War On Ice recently adding shot location data to their databases, rather than on what players had been saying all along.
Women have been largely marginalized both as hockey fans in general and within the contemporary analytics movement in particular. If I had decided to extend this project another year I would have addressed the integration of women into hockey analyses in more detail, and I apologize for giving such an important topic so little space in this blog. Getting back to the topic, women, as is so often the case, are caught in a bit of a catch-22. If the inclusion of women in the contemporary analytics movement is simply about fairness in representation, then it makes sense that having more female hockey bloggers, and more females running numbers, is the way to go. However, going down this road in criminology led to what is mockingly referred to as the “add women and stir” approach. The idea here is that simply “adding women” often does nothing to change the central narrative, methods, and conclusions. Instead, in an “add women and stir” approach the only thing that changes is that women are the ones producing a product that is essentially left intact (i.e. the topics, methods, and findings are identical regardless of whether the research is conducted by a male or a female). As an alternative, one can start either by assuming that women have something different to say that can (and must) be included in the discussion, or by fundamentally changing existing analyses by placing gender at the center. This makes sense if you consider that the experience of growing up playing hockey, of going to hockey games, of participating in hockey discussions on social media, and even of writing hockey commentary and analysis, is often completely different for men and women. However, this latter approach can be, and is, criticized on the grounds that it a) reinforces claims that men and women are essentially different, and b) opens the door to a niche sub-genre within hockey analytics that can be targeted, and often dismissed, by those in the mainstream.
To make matters even more complicated, there is no (and will never be any) agreement, and the decision on which path to take is ultimately a personal one.
New sources of data:
The NHL is in the process of implementing a new player tracking system that promises to provide new types of data drawn from chips embedded in pucks and, possibly down the road if the NHLPA agrees, in player uniforms. The potential is exciting largely because it will lead to new metrics that are not shot based, and that will lend themselves to analyses of passes, speed (player and puck), as well as offensive and defensive systems. However, the amount and type of information that will be made available to fans is still very much an open question. In the best case scenario, we will have access to geospatial information that lends itself to types of analysis commonly used in fields such as geography. There is potential here to make huge contributions with respect to player safety, and to contribute to the movement to decrease the number of concussions experienced by players and to improve the protection offered by equipment worn by players.
I’ve come to the conclusion that there are two distinct final notes that I can use to conclude this post and this project. On one hand I am proud of what I accomplished as a whole. Even though, if I am honest with myself, the posts were often uneven and my lack of experience with blogging really showed at times (particularly at the start), when I step back and look at the big picture of this project, I believe I made a decent case for what could be done if we decided to try to integrate different types of hockey knowledge. On the other hand, going through the process of collecting the type of data needed to make any of the approaches outlined here work would be arduous, and to the best of my knowledge no one is going down that road. If it is not applied, then at the end of the day it was a theoretical, or epistemological, or occasionally even a critical exercise. I know full well, though, that the practical wins at the end of the day.
I did not expect to, or try to, be heard over the cacophony of hundreds of other hockey blogs that are out there. It was always more of a personal project to me. In fact, I remember telling myself at the start that if I ever reached 100 Twitter followers it would be a clear sign that I sold out and abandoned my critical roots. Having said this, if I did not care at all about having others find and read my material I would have simply written it all down in a personal diary instead. The act of starting a blog or a website means that you want people to visit what you have created, and to react to it in some way. Over the course of the past year many people have read this blog, and I have had a decent number of great conversations rooted in some of the topics I have written about here. In the end, I think the biggest positive that came out of this project is that I met a lot of cool people along the way, many of whom provided much-needed moral support as I went through all of the ups and downs you would expect with trying to sort through a project like this. The connections I made with people through this blog that will continue after this site goes dark are the most important thing I gained from the experience. But, then again, those types of connections are “just” intangibles, so what do I know.