The testing of one or more hypotheses is at the foundation of the scientific method. The general idea is that the researcher starts with a theory about a given phenomenon, and then goes about testing the theory by collecting and analyzing objective data relating to that phenomenon. Conclusions are not considered valid until they can be reproduced by others, and published research goes through a brutal process of validation and criticism from the scientific community. What does not work is tossed aside in favour of new theories. A standard of excellence is established.
I don’t want to make it sound as if the letter of the scientific method is followed in all scientific research. In reality, the role of theory is less pronounced in some fields than others. For example, criminology evolved as a theory-driven empirical discipline. In the past few decades a branch of criminology (particularly within criminal justice) has largely discarded theory in favour of systemic measures (i.e. who is being processed for what crime, processing times, and other systemic efficiency measures). Epidemiology is another example of a field that has largely become atheoretical. Instead of testing established theories, epidemiologists often measure prevalence of certain conditions and attempt to correlate those measures with demographic variables.
Theory and Hockey Analysis
How does theory fit in with contemporary hockey analytics? There is no easy answer to this question. I would argue that a current of basic theory runs through much of the foundational work of the analytics movement. For example, an overt form of theory testing was used when Corsi was put forward as a useful predictor. The theory that goals were the most important stat for predicting wins was tested against Corsi as a competing predictor, and it was found that Corsi was a more accurate predictor of wins and losses over a long period of time. The keys were that the theory was tested in a way that could be reproduced by others, and that an objective analysis of the data showed the alternative theory was, in fact, more compelling.
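To show the shape of that kind of test, here is a minimal sketch in Python using made-up data. Everything in it is an assumption for illustration: the teams are simulated, the noise levels are invented, and second-half goal share stands in for "results". Because I assume up front that shot-attempt share is a less noisy signal of team quality than goal share, the outcome is built in. It shows how two competing predictors can be compared in a reproducible way, and it is not evidence about real hockey.

```python
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate_half_seasons(n_teams=2000, seed=1):
    """Simulate first-half Corsi share, first-half goal share and
    second-half goal share for hypothetical teams.

    Assumed (not measured) noise levels: shot-attempt share tracks team
    quality closely, goal share tracks it loosely.
    """
    rng = random.Random(seed)
    corsi_first, goals_first, goals_second = [], [], []
    for _ in range(n_teams):
        talent = rng.gauss(0, 0.03)  # hypothetical true team quality
        corsi_first.append(0.5 + talent + rng.gauss(0, 0.01))
        goals_first.append(0.5 + talent + rng.gauss(0, 0.05))
        goals_second.append(0.5 + talent + rng.gauss(0, 0.05))
    return corsi_first, goals_first, goals_second

corsi_first, goals_first, goals_second = simulate_half_seasons()

# The two competing predictors are scored against the same later outcome.
r_corsi = pearson(corsi_first, goals_second)
r_goals = pearson(goals_first, goals_second)

print(f"first-half Corsi share vs later results: r = {r_corsi:.2f}")
print(f"first-half goal share vs later results:  r = {r_goals:.2f}")
```

The important part is the design: both predictors are measured before the outcome, both are scored against the same outcome, and anyone can rerun the script with the same seed and get the same answer.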
Once the groundwork was laid, systematic theory testing was often discarded in favour of cross-sectional snapshots of data. For example, after a game it is common to read Corsi scores or other such measures for individual players. This process of collecting random bits of information and sorting them into systematic patterns is sometimes referred to as “data mining”. While such analyses may set out to answer simplistic questions such as “is Player X a good hockey player?”, and set an arbitrary threshold for good players (e.g. a good player has a Corsi > x), it is a stretch to include such analyses under the rubric of theory testing. The reason is that theory is not at the core of the analysis. Instead, numbers are crunched and then an ad hoc interpretation of those numbers is presented to the reader. It is fine, and hockey fans gobble it up, but it has serious limitations.
Why Bother with Theory?
At this point it is reasonable to question why we should bother with theory at all. If the numbers generated by bloggers on an ad hoc basis show who is playing well and who is not at a given moment, then why go through a larger process of theory development and testing? My view is that leaving out theory takes away both the potential to expand analytics and the ability to use certain types of analyses. Pedhazur (1982), who is one of the foremost authorities on quantitative regression models and structural equation modeling (SEM), argues that:
It is the ideas about the data that count, it is they that provide the cement, the integration…No amount of fancy statistical acrobatics will undo the harm that may result by using an ill-conceived theory, or a caricature of a theory.
These very strong words reflect the very real fact that theory guides important decisions that are critical in many higher level forms of statistical analysis.
When you are running a regression analysis, for example, selecting a dependent variable (DV) and running every bit of data against it as independent variables (IVs) is considered shoddy practice. The reason why such “data dredging” is frowned upon is that statistical correlations sometimes exist between variables that are not really related to each other. For example, a very good professor I once had told the story of when he was in grad school. Mainframe computers were in their infancy, and you had to book time to use the lone centralized computer that was housed on campus. He and his buddies decided to pluck random data from an encyclopedia and feed it via punchcards into the computer, which was set to run a correlation matrix (i.e. every variable was run against every other variable entered). The analysis was left to run, and because computers were so slow it took the entire weekend at an estimated cost (mostly electrical and tubing) that ran into thousands of dollars. The results showed that the most highly correlated variables, which ran at close to a perfect 1:1 relationship, were crime rates in Alberta and the expansion and contraction of a bridge in a large Chinese city. Even though the correlation was over .95, the variables clearly have nothing to do with each other. One of many clear advantages of running theory-testing models is that they avoid gibberish results such as this.
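The same effect is easy to reproduce today in a fraction of a second. The sketch below is my own illustration, not the professor's data: it fills 50 hypothetical variables with pure random noise, 10 observations each, runs every variable against every other variable, and reports the strongest correlation it finds. The variable count, the sample size and the seed are arbitrary choices.

```python
import random
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def dredge(n_variables=50, n_observations=10, seed=42):
    """Correlate every pair of unrelated random variables and return
    the pair with the largest absolute correlation."""
    rng = random.Random(seed)
    # Each variable is independent noise, so no pair is truly related.
    data = [[rng.gauss(0, 1) for _ in range(n_observations)]
            for _ in range(n_variables)]
    best_pair, best_r = None, 0.0
    for i, j in combinations(range(n_variables), 2):
        r = pearson(data[i], data[j])
        if abs(r) > abs(best_r):
            best_pair, best_r = (i, j), r
    return best_pair, best_r

best_pair, best_r = dredge()
n_pairs = 50 * 49 // 2  # number of pairs checked with the default settings

print(f"checked {n_pairs} pairs of unrelated variables")
print(f"strongest correlation: variables {best_pair}, r = {best_r:.2f}")
```

With more than a thousand pairs and only ten observations per variable, a strong-looking correlation is almost guaranteed to turn up by chance alone. Picking that pair out after the fact and writing a story around it is exactly what data dredging is.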
What Types of Analysis are Possible?
Let’s say we want to test whether old school measures of leadership are more closely associated with wins than new fancy stats that focus on measures of skill. (This of course assumes we can get data regarding leadership, which is individually collected by teams.) What follows is completely hypothetical and just shows what such a model looks like. I will pull all of this apart and re-build it for real over the next series of blog entries.
We can put together a latent growth curve model using 5 years of data that sorts through how closely associated winning is with “leadership” (grit, determination, points, defensive zone faceoff wins, and penalty minutes) and with being what analytics people refer to as “good at hockey” (Corsi, shooting percentage, o-zone faceoff wins, PDO, and points per 60 minutes). Wins and losses become a time-invariant predictor. “Leadership” and “Good at Hockey” are latent variables, which means they are measured indirectly through a series of other variables that are directly measured. Latent variables always have an error term attached to them, because measurement is never perfect and different types of error enter into the equation. The error term is outside of the latent variable, which leaves the latent variable error free. Covariances are shown using curved lines connecting the error terms. These indicate measures where being good at hockey and leadership are correlated (we can test these and take out the ones that are not associated later on).
This model is for illustration only. If it were real, and we plugged in the appropriate data and did all of the necessary checks to ensure everything is cleaned and meets established criteria, we could test whether analytics or old school measures are more closely associated with wins and losses (note that linearity cannot be tested with SEM). We could also see whether the accuracy levels are static or change over time, and check how well individual measures perform within the model. It is very cool stuff.
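If the idea of a latent variable feels abstract, a small simulation may help. The sketch below is my own toy example and not the model described above: it invents a hidden “good at hockey” score for 5,000 hypothetical players, then builds three observed measures as that hidden score times a loading, plus measurement error. The loadings (0.8, 0.7, 0.6) and the indicator names attached to them are assumptions chosen for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate_indicators(loadings, n_players=5000, seed=7):
    """Generate one observed column per loading from a shared latent score.

    Each indicator = loading * latent + error, with the error variance set
    to 1 - loading**2 so every indicator has unit variance.
    """
    rng = random.Random(seed)
    columns = [[] for _ in loadings]
    for _ in range(n_players):
        latent = rng.gauss(0, 1)  # the unobserved "good at hockey" score
        for column, loading in zip(columns, loadings):
            error_sd = (1 - loading ** 2) ** 0.5
            column.append(loading * latent + rng.gauss(0, error_sd))
    return columns

loadings = [0.8, 0.7, 0.6]  # assumed values, for illustration only
corsi, shooting_pct, points_per_60 = simulate_indicators(loadings)

r_observed = pearson(corsi, shooting_pct)
r_implied = loadings[0] * loadings[1]  # what the one-factor model predicts

print(f"observed correlation between indicators: {r_observed:.2f}")
print(f"correlation implied by the loadings:     {r_implied:.2f}")
```

The latent score never appears in the output data, yet it is the only reason the observed measures correlate with each other. SEM software works in the opposite direction: it starts from the observed correlations and estimates the loadings and error terms that would best reproduce them.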
The Role of Theory
A structural equation model, like the one shown above, is driven by theory. Theory helps us select appropriate measures that are then tested using a confirmatory factor analysis. When things go weird with the model, and they almost always do, theory can help us to determine whether to add extra covariances between specific error terms. Keep in mind that this is only one type of advanced statistical procedure among many. Adopting a theory-testing paradigm opens up a lot of doors to analyses that could really help our understanding of the game of hockey through the use of numbers.
My next few blog entries will try to develop a sounder model. Once I am happy with my model I’ll collect the data I need and see how well it all works. Comments and criticisms are essential to this type of endeavour, so please feel free to knock what I am doing at any stage of this process.