Just a quick note before I get into the main topic of this post. When I look around at other blogs and other hockey sites, they pretty much all either provide data, analyze data, or provide some sort of commentary about a particular team. As this site progressed it kind of developed into a hockey theory/epistemology blog. To the best of my knowledge this is the only blog of its kind out there, which I think is kind of cool.
My original plan was to jump into a discussion of regression analysis, and I began a two-part blog entry covering how and why we use regression, followed by the importance and meaning of the error terms. While I was slowly plugging away at that topic, I stumbled across a couple of conversations featuring arguments that went something along the lines of: “intangibles aren’t real. If leadership was a real factor in hockey it would come out in the numbers.” This type of statement is usually followed by people patting one another on the back and poking fun at the dinosaurs who still believe that leadership and determination are “things.” So I decided to add an extra entry on latent variables, which is this one. The next two entries will focus on regression analysis.
Observed (Manifest) Variables and Latent Variables (Factors)
I come from a social science background, which provides a very different starting point from what you may find from someone coming out of math, statistics, or engineering. One of the primary differences between the social and natural sciences is that social sciences deal with complex human phenomena that often cannot be directly measured. For example, psychology has concepts like resilience and coping, sociology has anomie and social facts, economics has socio-economic status and capitalism, and education has teacher quality and executive functioning.
I have to be up front and honest here. When I first read Tweets or messages in online hockey forums where individuals blithely stated that leadership and determination are not real, I just wrote it off as the lunatic fringe saying their piece. I was blown away when individuals for whom I had gained some measure of respect parroted those same nonsensical (from my perspective) arguments. The result was confusion on my end. Would those same people argue that socio-economic status does not really exist, and that income is the be-all and end-all? I mean, socio-economic status predicts outcomes far better than easily measurable variables such as income or wealth ever could.
It is critically important to distinguish between two very different types of variables. Latent variables, also known as factors, are abstract phenomena that cannot be measured directly. Leadership and determination would be two examples of latent variables. Observed variables, also known as manifest variables, are those that can be observed and directly measured. Corsi, Fenwick, and all of the other fancy stats you see out there are manifest variables. Both types of variables may be used in quantitative analyses. The difference is that latent variables come into play when complex social factors are involved. Hockey is very much a social phenomenon on every discernible level, as players interact with each other, with coaches, with the outside world, and even with their inner motivations and drives.
If I had to take a guess (and this is only a guess), I believe that the analytics movement may have been started by individuals trained in the natural sciences. This matters, because the natural sciences do not tend to deal with latent variables, while the social sciences do. For example, let’s say an expensive car and an inexpensive beater weigh the same. When they drive over a bridge, the bridge does not care about the cost of the respective cars. The only things that matter are the load-bearing qualities of the bridge materials and the design. However, the experience of driving those two cars would be completely different. That experience could not be measured directly. Instead, to try to get at this variable called “driving experience,” a researcher would have to look at a host of factors such as comfort, safety, fun, etc. Each of those variables tells only part of the story, but by looking at all of them together it is possible to uncover this abstract phenomenon and even use it in an analysis (e.g. does driving experience affect the likelihood that a person will buy a particular car?).
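To make the “driving experience” idea a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the three indicator names, the noise level, the sample size, and the seed. It also uses a plain average of the indicators as a crude stand-in for the factor models a researcher would actually fit. The only point is that several noisy observed variables, taken together, track the unobserved factor better than any one of them does alone.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def simulate_indicators(n=2000, noise=1.0, seed=42):
    """Draw a hidden factor and three noisy observed indicators of it."""
    rng = random.Random(seed)
    # The latent variable: in real data we would never get to see this column.
    latent = [rng.gauss(0, 1) for _ in range(n)]
    # Each manifest variable is the latent value plus its own measurement noise.
    indicators = {
        name: [value + rng.gauss(0, noise) for value in latent]
        for name in ("comfort", "safety", "fun")  # hypothetical indicators
    }
    return latent, indicators


def composite_score(indicators):
    """Average the indicators row by row (a crude proxy for a factor score)."""
    return [statistics.fmean(row) for row in zip(*indicators.values())]


latent, indicators = simulate_indicators()
composite = composite_score(indicators)

single_r = {name: pearson(latent, values) for name, values in indicators.items()}
composite_r = pearson(latent, composite)

for name, r in single_r.items():
    print(f"{name:8s} vs latent: r = {r:.2f}")
print(f"composite vs latent: r = {composite_r:.2f}")
```

In this toy setup the composite lines up with the hidden factor more closely than comfort, safety, or fun does individually, because the separate measurement errors partly cancel out when the indicators are combined.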
Leadership and Determination
Bar none, the absolute worst argument I have seen in hockey discussions is “if leadership and determination are real, why don’t they show up in the numbers? Haha intangibles suck.” This shows an almost criminal lack of knowledge. There are, without exaggeration, thousands of quantitative studies that measure leadership in general, types of leadership styles, the impact of leadership upon outcomes, etc. Honestly, from an intellectual point of view, arguing that leadership is not a real thing is similar to arguing that the Earth is flat. Furthermore, this mountain of peer-reviewed quantitative research spans pretty much all of the disciplines. In all honesty, the question really is not whether leadership is a real thing in hockey. Instead, the question should be turned around to read: “why do you believe that hockey is the only social activity where leadership does not come into play in important ways?”
Moving beyond academic research, common sense should also come into play at some point. Imagine a kid waking up at 5 on weekends to play hockey. He competes from a young age, and hangs out less with kids at school because he is practicing and working out. He tries to eat as healthy as possible, and skips the junk food quite a bit, but not completely because he is still just a kid. As his peers fall away, he keeps plugging away and chasing his dream. His parents are walking zombies from the early mornings, and he feels bad about it, but he refuses to let the dream die. He hears increasingly harsh criticism from coaches, and gets snide remarks from peers, as he gets older and plays at increasingly high levels. As a junior he is forced to live away from his parents, and has to get education through tutors while he is on the road. When he finally makes it to the big leagues he knows there are many players out there who are more naturally skilled. He makes his way by continually working harder than the next guy. But oops, his RelCorsi is .47 so he kind of sucks, the more skilled guy who doesn’t give a shit has a RelCorsi of .52 so he must be better. Oh and by the way, determination does not exist. This is the exact point where criticism from people who play high level sports about “geek with a pocket calculator sitting in front of a computer” should hit home (and probably sting a bit), but in reality those criticisms are simply dismissed.
So Where are the Numbers?
I went off on a bit of a tangent to rant a bit, which doesn’t make me feel great because I promised myself I would avoid doing a lot of ranting in this blog. Onwards and upwards.
The question left behind is: if leadership and determination are real, why don’t they come out in the numbers? My answer is that the impact of leadership and determination is already there in two spots (three if you count acting as a mediator, which I get to below). First off, the fancy stats in common use are only okay at predicting wins and losses. In his MA thesis on forecasting success in NHL games, Josh Weissbock wrote:
We found it interesting that regardless of which features we used in our model, we were not able to increase the accuracy much higher than 60%. We compared the observed win% of teams in the NHL to many simulated leagues and found that there appears to be a theoretical upper bound of approximately 62% for single game prediction in the NHL.
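The arithmetic behind an upper bound like this can be pictured with a toy model. This is not the method used in the thesis, just a sketch under stated assumptions: in each game, luck decides the result with probability `luck_share` (a coin flip), and otherwise the stronger team wins. The value 0.76 is an assumption picked so that the toy numbers land near the quoted 62%. Under those assumptions, even a forecaster who always knows which team is stronger can only be right a fixed fraction of the time.

```python
import random


def expected_accuracy(luck_share):
    """Best possible hit rate when a share of games is decided by a coin flip."""
    # Skill-decided games are always called correctly; luck-decided games half the time.
    return (1 - luck_share) + 0.5 * luck_share


def oracle_accuracy(luck_share, n_games=100_000, seed=1):
    """Simulate a forecaster who always picks the truly stronger team."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_games):
        if rng.random() < luck_share:
            # Luck decides this game: the stronger team wins only half the time.
            stronger_wins = rng.random() < 0.5
        else:
            # Skill decides this game: the stronger team wins.
            stronger_wins = True
        correct += stronger_wins
    return correct / n_games


luck_share = 0.76  # assumed for illustration only
expected = expected_accuracy(luck_share)
simulated = oracle_accuracy(luck_share)

print(f"expected ceiling:  {expected:.3f}")
print(f"simulated ceiling: {simulated:.3f}")
```

Note that this toy model says nothing about what the non-luck share is made of. Skill, leadership, and determination would all sit inside it.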
The 60% success rate for regular season games sounds pretty good until you realize that a random coin toss will correctly predict wins and losses 50% of the time. (Just a note so I do not misrepresent his work: he points out that the prediction success rate for playoff series is much better, at about 75%.) If we start off with a random chance of predicting wins and losses that is set at 50/50, and analytics currently lifts the prediction rate to about 60%, what is happening with the other 40%? Random luck makes up a part of this, to be sure, but there is clearly quite a bit missing in the current analysis. Perhaps leadership and determination make up a part of that missing 40%. The second possibility is that they already play into the current fancy stats. For example, a guy with great determination may have trained harder and honed his skills more, resulting in better possession numbers than he would have had if his level of determination were only average. To put it into stats terms, determination may mediate the relationship between skill and possession.
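The second possibility can be pictured with another toy simulation. Again, this is a sketch under assumptions, not an analysis of real players: the variable names, the coefficients, the noise level, and the seed are all invented. In this made-up world, determination never shows up as a column in anyone's spreadsheet, but it feeds into honed skill through training, and honed skill drives the possession number.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def simulate_players(n=5000, seed=7):
    """Hypothetical players whose determination acts only through honed skill."""
    rng = random.Random(seed)
    determination, honed_skill, possession = [], [], []
    for _ in range(n):
        talent = rng.gauss(0, 1)
        drive = rng.gauss(0, 1)  # latent determination
        skill = talent + 0.8 * drive  # training turns drive into skill
        share = 0.50 + 0.02 * skill + rng.gauss(0, 0.02)  # possession share
        determination.append(drive)
        honed_skill.append(skill)
        possession.append(share)
    return determination, honed_skill, possession


def residuals(xs, ys):
    """What is left of ys after removing a straight-line fit on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


determination, honed_skill, possession = simulate_players()

total_r = pearson(determination, possession)
leftover_r = pearson(determination, residuals(honed_skill, possession))

print(f"determination vs possession:              r = {total_r:.2f}")
print(f"determination vs possession, net of skill: r = {leftover_r:.2f}")
```

In this setup determination is clearly related to possession, yet once honed skill is accounted for there is essentially nothing left for it to explain. Its effect is real, but it is already baked into the observed number.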
Absence of Evidence is not Evidence of Absence
As hockey fans, we are not privy to leadership qualities outside of what the media feeds us. We do not see what happens in the locker room, or how hard each player trains, or any of the other wonderful behind-the-scenes stuff that is a part of being a professional hockey player. With no access to this type of data, the analytics community has chosen to focus on the numbers they have. I have no beef with this at all. However, just because the data is not accessible does not mean it does not exist. And it absolutely does not mean that the variable the missing data measures does not come into play in important ways during the season. Over the past few weeks I think I have typed out this old maxim from law several dozen times because it applies here: “absence of evidence is not evidence of absence.” To use an example, we cannot measure player speed accurately throughout a game right now. We cannot reason that player speed does not exist because it is “intangible” or “immeasurable.” Instead, we just have to defer it until we get the right data. When SportVU comes around, speed will quickly become a part of everyone’s analysis. To add a last example, possession did not magically come into being the day the analytics crowd started to measure it. These phenomena exist, and have a real impact, regardless of whether they are measured.
Leadership, determination, and other “intangibles” identified as important by people who have been around the game for a long time are actually real things that lead to real, and measurable, results. We just don’t have access to the right data at this point in time.
Coming up Next:
My next blog entry will be about regression analysis. Specifically, it will deal with what it is, what it does, and (most importantly) the assumptions you cannot violate if you want to end up with a meaningful answer.
The third entry in this series will focus on error terms, what they mean, and the implications for hockey analytics.
Thanks for reading!