Metrics are created to measure and evaluate the performance, progress, and success of a particular process or system. Metrics can provide valuable insights into different aspects of the process or system being measured, including its strengths, weaknesses, and in the case of athletes, opportunities for improvement. In the last DS4S post, we explored how we can deconstruct a metric to evaluate its accuracy and relevance. In this post, we will explore the process of formulating new metrics from a theoretical and analytical perspective.
There are three main approaches to constructing new metrics in sporting applications:
1. Simple aggregations
2. Deductive modelling
3. Inductive modelling
The simplest approach to creating a metric is to aggregate events or outcomes over a time period. Examples include Goals Scored and Points Per Game. These metrics are useful for assessing the offensive performance of individual players or teams and for making absolute comparisons with others, but they often fail to capture important contextual detail and performance dynamics, so they tend to be augmented or combined with other, more detailed metrics.
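As a minimal sketch of a simple aggregation, the snippet below computes Points Per Game from a small game log. The player names, the scores, and the `points_per_game` helper are all made up for illustration and do not come from any real dataset.

```python
from collections import defaultdict

# Hypothetical game log: (player, points scored in one game).
# All values are invented purely for illustration.
game_log = [
    ("Player A", 24),
    ("Player A", 18),
    ("Player B", 30),
    ("Player A", 12),
    ("Player B", 10),
]


def points_per_game(log):
    """Sum points and count games per player, then divide."""
    totals = defaultdict(int)
    games = defaultdict(int)
    for player, points in log:
        totals[player] += points
        games[player] += 1
    return {player: totals[player] / games[player] for player in totals}


ppg = points_per_game(game_log)
for player, value in ppg.items():
    print(f"{player}: {value:.1f} points per game")
```

The result allows an absolute comparison between the two players, but it says nothing about opposition quality, minutes played, or game state, which is the missing context described above.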
These more detailed metrics are typically constructed using approaches 2 and 3. Deductive models, also known as top-down models, start with a set of general principles and then use logic or foundational knowledge to derive specific conclusions or predictions. These models begin with a theory or hypothesis and then use deductive reasoning and experimentation to test and validate it. A good example of this in sports science is the Banister Dose-Response model. This model was formulated to quantify the effects of physical training on changes in athletic performance, working under the hypothesis that an athlete's response to a training stimulus (or lack of one) could be roughly modelled as a simple first-order differential equation:
$$\frac{dp(t)}{dt} = -\frac{1}{\tau}\,p(t) + w(t)$$

where:
dp(t)/dt is the rate of change of p(t) with respect to time t
1/τ is the decay rate of p(t), where τ is a positive constant known as the time constant
p(t) is the change in athlete performance over time
w(t) is the training stimulus that drives the behaviour of p(t).

The model is no longer used in this form, but the general hypothesis has held up to this day and the model is still prominent (check out our paper for details on how we improved the Banister Dose-Response model if you are interested in this topic).
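To make the behaviour of such a model concrete, here is a rough sketch that integrates a first-order equation numerically. It assumes the form dp(t)/dt = -(1/τ) p(t) + w(t), which is consistent with the terms defined above, and it uses a simple forward-Euler step. The function name, the load values, and the choice of τ = 10 are illustrative assumptions rather than fitted parameters.

```python
def simulate_response(loads, tau, dt=1.0, p0=0.0):
    """Forward-Euler integration of dp/dt = -(1/tau) * p(t) + w(t).

    loads: training stimulus w(t), sampled every dt (hypothetical units)
    tau:   time constant, must be positive
    p0:    initial performance change p(0)
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    p = [p0]
    for w in loads:
        dp_dt = -p[-1] / tau + w  # decay term plus training stimulus
        p.append(p[-1] + dt * dp_dt)
    return p


# Made-up input: 14 days of unit load followed by 14 days of rest.
daily_loads = [1.0] * 14 + [0.0] * 14
response = simulate_response(daily_loads, tau=10.0)
```

In this sketch the response climbs while the stimulus is applied and decays exponentially once it is removed, at a rate set by τ. Forward Euler is only a reasonable approximation when dt is small relative to τ.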
Inductive models, also known as bottom-up models, start with a set of specific observations or empirical data and use statistical learning methods to derive general principles or predictions. These models use inductive reasoning to identify patterns and trends in data. An example of inductive modelling in sports analytics is the expected goals metric (xG). This metric was developed by training a model to predict the likelihood of a shot resulting in a goal. The model primarily uses information derived from spatial coordinates as its inputs, and its output is the probability of scoring, a value between zero and one.
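The inductive workflow can be sketched with a toy xG-style model. The example below fits a one-feature logistic regression by gradient descent, using distance to goal derived from shot coordinates. The pitch coordinates, the shot data, the single feature, and the choice of logistic regression are all assumptions made for illustration. Production xG models are trained on far more shots and typically use richer inputs.

```python
import math

GOAL_X, GOAL_Y = 100.0, 50.0  # hypothetical pitch coordinates of the goal centre

# Made-up shots: (x, y, scored). Not real match data.
shots = [
    (95, 50, 1), (92, 45, 1), (90, 55, 1), (88, 50, 0),
    (85, 40, 1), (80, 60, 0), (78, 50, 0), (75, 35, 0),
    (70, 50, 0), (94, 52, 1), (82, 48, 1), (72, 62, 0),
]


def distance_feature(x, y):
    """Distance from the shot location to the goal centre, scaled by 1/10."""
    return math.hypot(GOAL_X - x, GOAL_Y - y) / 10.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, labels, lr=0.1, epochs=5000):
    """Fit intercept b0 and slope b1 by batch gradient descent on log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for f, y in zip(features, labels):
            err = sigmoid(b0 + b1 * f) - y
            g0 += err
            g1 += err * f
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


features = [distance_feature(x, y) for x, y, _ in shots]
labels = [scored for _, _, scored in shots]
b0, b1 = fit_logistic(features, labels)


def expected_goal(x, y):
    """Predicted probability (0 to 1) that a shot from (x, y) is scored."""
    return sigmoid(b0 + b1 * distance_feature(x, y))


close_xg = expected_goal(95, 50)
far_xg = expected_goal(70, 50)
```

Nothing in this sketch encodes a theory of scoring. The relationship between distance and scoring probability is learned entirely from the observations, which is what makes the approach inductive.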
The main difference between inductive and deductive models is the direction of reasoning. Deductive models start with a general principle or theory and use logic to derive specific predictions, while inductive models start with specific observations and use statistical learning methods to derive general principles or predictions. In addition, deductive models are often more precise and accurate because they tend to be rooted in strong prior knowledge, while inductive models tend to be more flexible in capturing complex relationships in the data.
Overall, the choice between inductive and deductive modelling approaches to construct new metrics in sporting applications depends on the specific use case, the available data, and the goals of the analysis. Both approaches have their strengths and limitations and can be used effectively in different contexts.
MC
Hi Mark, nice article! Really enjoyed it. Thanks for sharing. I'd like to hear your take on the application of the Banister dose-response model to team sports.