FanGraphs Prep: Is Context King?
Overview: A short unit centered on understanding the difference between context-neutral stats and context-specific stats. The two tell us very different things about what happens on the field. What’s the difference between them, and how do we use them?
- Identify and apply a run-expectancy matrix.
- Explain the difference between context-specific and context-neutral statistics.
- Evaluate which type of statistic to use in a given situation.
Target Grade-Level: 9-10
Daily Activities:
Day 1
At the end of 2019, Pete Alonso led all of baseball with 53 home runs. But not all of those home runs were created equal. Thirty-one of them came with the bases empty, while the remaining 22 were hit with at least one runner on base. Should those two- and three-run home runs count for more than all those solo shots? That’s the question at the center of our lesson today: Should we take the game context into account when evaluating players? Not to spoil anything, but the answer is both yes and no.
Before we dive into that question in earnest, we need to define what we mean by context. Every at-bat during a game takes place in a specific context, defined by how many outs there are in the inning and which bases are occupied. This is called the base-out state, and there are 24 different states (three out states, for zero, one, or two outs, multiplied by eight base states, since each of the three bases is either empty or occupied).
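To see where the 24 comes from, here is a minimal Python sketch that simply enumerates every base-out state. The names `BASE_STATES`, `OUT_STATES`, and `BASE_OUT_STATES` are illustrative only, not part of any FanGraphs tool.

```python
from itertools import product

# Each of the three bases is either empty or occupied: 2 * 2 * 2 = 8 base states.
# Tuples are (runner on first, runner on second, runner on third).
BASE_STATES = list(product((False, True), repeat=3))

# Zero, one, or two outs; the third out ends the inning, so it is not a state.
OUT_STATES = (0, 1, 2)

# Every pairing of an out state with a base state is one base-out state.
BASE_OUT_STATES = [(outs, bases) for outs in OUT_STATES for bases in BASE_STATES]

print(len(BASE_STATES), len(OUT_STATES), len(BASE_OUT_STATES))  # 8 3 24
```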
Using the base-out state, we can calculate the run expectancy for each of these states: in other words, how many runs a team can expect to score, on average, from that state through the end of the inning. For any given base-out state, such as a runner on first with none out, we find every time that state occurred and count the runs scored from that point through the end of that inning. We total up those runs over a season (or multiple seasons), then divide by the number of times the state occurred. That’s a lot of math that we won’t ask you to do yourself. Instead, we’ll provide this basic run expectancy matrix.
Runners | 0 outs | 1 out | 2 outs |
---|---|---|---|
__ __ __ | 0.481 | 0.254 | 0.098 |
1B __ __ | 0.859 | 0.509 | 0.224 |
__ 2B __ | 1.100 | 0.664 | 0.319 |
1B 2B __ | 1.437 | 0.884 | 0.429 |
__ __ 3B | 1.350 | 0.950 | 0.353 |
1B __ 3B | 1.784 | 1.130 | 0.478 |
__ 2B 3B | 1.964 | 1.376 | 0.580 |
1B 2B 3B | 2.292 | 1.541 | 0.752 |
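The averaging described above (total runs scored from a state through the end of the inning, divided by the number of times that state occurred) can be sketched in Python. This is a sketch under assumptions: the input layout, the `(bases, outs)` state labels, the function name `run_expectancy_table`, and the two innings of play-by-play are all made up for illustration. The matrix above comes from full seasons of games, not from toy data like this.

```python
from collections import defaultdict

def run_expectancy_table(innings):
    """Estimate run expectancy for each base-out state.

    `innings` is a list of innings (hypothetical format); each inning is a list
    of plate appearances given as (state_before_the_play, runs_scored_on_the_play).
    """
    total_runs = defaultdict(float)
    occurrences = defaultdict(int)
    for inning in innings:
        for i, (state, _) in enumerate(inning):
            # Runs scored from this state through the end of the inning.
            runs_rest_of_inning = sum(runs for _, runs in inning[i:])
            total_runs[state] += runs_rest_of_inning
            occurrences[state] += 1
    # Average runs per occurrence of each state.
    return {state: total_runs[state] / occurrences[state] for state in occurrences}

# Two made-up innings. States are (bases, outs), with "_" for an empty base.
innings = [
    # Leadoff single, two-run homer, then three outs.
    [(("___", 0), 0), (("1__", 0), 2), (("___", 0), 0), (("___", 1), 0), (("___", 2), 0)],
    # Three up, three down.
    [(("___", 0), 0), (("___", 1), 0), (("___", 2), 0)],
]

table = run_expectancy_table(innings)
print(round(table[("___", 0)], 3))  # 0.667
```

Bases empty with none out occurred three times here, and two runs followed only one of them, so its estimate is 2 / 3.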
With a runner on first and none out, there’s a run expectancy of .859. Let’s say in the next at-bat, the batter hits a single and the runner advances to third. The run expectancy now changes to 1.784, so the batter generated .925 runs of run expectancy (1.784 – .859 = .925) with his at-bat. This change in run expectancy from one at-bat to the next, plus any runs that score on the play, gives us a stat called RE24, which can be written as a formula:
RE24 = run expectancy end state – run expectancy beginning state + runs scored.
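The formula translates directly into code. Below is a minimal Python sketch that reproduces the .925 example above. It assumes a hypothetical `re24` helper and a string encoding of the bases (first, second, third, with `_` for an empty base), and it treats run expectancy as zero once the third out is made, since the inning is over.

```python
# Run expectancy matrix from the table above. Keys spell the bases in order
# (first, second, third), with "_" for an empty base; values are for 0, 1, 2 outs.
RE = {
    "___": (0.481, 0.254, 0.098),
    "1__": (0.859, 0.509, 0.224),
    "_2_": (1.100, 0.664, 0.319),
    "12_": (1.437, 0.884, 0.429),
    "__3": (1.350, 0.950, 0.353),
    "1_3": (1.784, 1.130, 0.478),
    "_23": (1.964, 1.376, 0.580),
    "123": (2.292, 1.541, 0.752),
}

def run_expectancy(bases, outs):
    # Once the third out is made the inning is over, so no more runs are expected.
    return 0.0 if outs >= 3 else RE[bases][outs]

def re24(start, end, runs_scored=0):
    """RE24 = run expectancy end state - run expectancy beginning state + runs scored.

    `start` and `end` are (bases, outs) pairs, e.g. ("1__", 0).
    """
    return run_expectancy(*end) - run_expectancy(*start) + runs_scored

# Runner on first, none out; the batter singles and the runner takes third.
print(f"{re24(('1__', 0), ('1_3', 0)):.3f}")  # 0.925
```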
Continuing our example inning (runners on first and third, none out), try finding the new base-out state and the RE24 for each batter below.
- Sacrifice fly, runner scores from third.
- Double, runner advances to third.
- Walk.
- Fielder’s choice, runner scores from third, runner on second advances to third, runner out at second.
- Flyout.
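To check answers for the exercise, the same calculation can be chained across the whole inning. This sketch starts from runners on first and third with none out. The end state listed for each event is one reading of the descriptions above (for example, that the double leaves runners on second and third), so adjust them if your class interprets a play differently. The names `inning_re24`, `START`, and `EVENTS` are hypothetical.

```python
# Run expectancy matrix from the table above: bases (first, second, third,
# "_" = empty) -> run expectancy with 0, 1, and 2 outs.
RE = {
    "___": (0.481, 0.254, 0.098), "1__": (0.859, 0.509, 0.224),
    "_2_": (1.100, 0.664, 0.319), "12_": (1.437, 0.884, 0.429),
    "__3": (1.350, 0.950, 0.353), "1_3": (1.784, 1.130, 0.478),
    "_23": (1.964, 1.376, 0.580), "123": (2.292, 1.541, 0.752),
}

def run_expectancy(bases, outs):
    # With three outs the inning is over, so no further runs are expected.
    return 0.0 if outs >= 3 else RE[bases][outs]

# Assumed reading of the exercise: (description, end state, runs scored on the play).
START = ("1_3", 0)
EVENTS = [
    ("Sacrifice fly", ("1__", 1), 1),
    ("Double", ("_23", 1), 0),
    ("Walk", ("123", 1), 0),
    ("Fielder's choice", ("1_3", 2), 1),
    ("Flyout", ("___", 3), 0),
]

def inning_re24(start, events):
    """Return (description, RE24) for each plate appearance, in order."""
    results = []
    state = start
    for description, end, runs in events:
        value = run_expectancy(*end) - run_expectancy(*state) + runs
        results.append((description, value))
        state = end  # this batter's end state is the next batter's beginning state
    return results

for description, value in inning_re24(START, EVENTS):
    print(f"{description}: {value:+.3f}")
```

Because each batter’s end state is the next batter’s beginning state, the five values telescope: their sum equals the runs scored (2) minus the starting run expectancy (1.784), or 0.216.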
Day 2
The run expectancy matrix isn’t used only to calculate RE24; it’s also the foundation for calculating wOBA. Rather than using run expectancy to measure how valuable each individual event was in its particular game situation, wOBA uses linear weights to estimate the average run value of a walk, hit, home run, etc. To find the linear weight for home runs, for example, we add up the RE24 of every home run hit in a season and divide by the total number of home runs. That gives us a generalized value of a home run hit in any given situation.
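The averaging step can be sketched the same way. In this illustrative Python sketch, `linear_weights` and its input format are hypothetical, and the five plays are made up; their RE24 values were computed from the run expectancy matrix above (a solo homer with none out, a two-run homer with one out, a grand slam with two outs, and walks with the bases empty at zero and two outs). A real calculation would average over every such event in a season, which is why these toy averages don’t match the linear weights table that follows.

```python
from collections import defaultdict

def linear_weights(plays):
    """Average RE24 for each event type.

    `plays` is an iterable of (event, re24) pairs, e.g. one per plate
    appearance in a season (hypothetical input format).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event, value in plays:
        totals[event] += value
        counts[event] += 1
    # Total RE24 for the event divided by the number of times it happened.
    return {event: totals[event] / counts[event] for event in counts}

# Made-up plays; RE24 values derived from the run expectancy matrix.
plays = [
    ("HR", 1.000),  # solo homer, bases empty, 0 outs
    ("HR", 1.745),  # two-run homer, runner on first, 1 out
    ("HR", 3.346),  # grand slam, 2 outs
    ("BB", 0.378),  # walk, bases empty, 0 outs
    ("BB", 0.126),  # walk, bases empty, 2 outs
]

for event, weight in linear_weights(plays).items():
    print(f"{event}: {weight:.3f}")
```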
Generally, linear weights for the most common batting events look something like this:
Event | Linear Weight |
---|---|
Walk | 0.29 |
HBP | 0.31 |
Single | 0.44 |
Double | 0.74 |
Triple | 1.01 |
Home Run | 1.39 |
This tells us that, on average, a single is worth about 0.44 runs and a home run is worth about 1.39 runs. These linear weights are built from run expectancy, just like RE24, but instead of depending on the game context, they remove the context from the equation. If we multiply a batter’s total walks, HBP, singles, doubles, triples, and home runs by their respective linear weights and add it all up, we get a simplified version of the batter’s weighted runs above average (wRAA). wRAA is closely related to wOBA: it converts a hitter’s wOBA into the number of runs he produced above what a league-average hitter would in the same number of plate appearances. wRAA is a context-neutral statistic, while RE24 is a context-specific statistic. Both are useful, but they tell us very different things.
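Here is a minimal sketch of the simplified wRAA recipe described above, using the linear weights from the table. The function name `simple_wraa` and the event abbreviations are hypothetical. Like the lesson, it only sums the positive events; FanGraphs’ published wRAA is derived from wOBA, so plate appearances that end in outs also count against a hitter.

```python
# Linear weights from the table above (average runs per event).
LINEAR_WEIGHTS = {"BB": 0.29, "HBP": 0.31, "1B": 0.44, "2B": 0.74, "3B": 1.01, "HR": 1.39}

def simple_wraa(counts):
    """Multiply each event count by its linear weight and add them up.

    `counts` maps event abbreviations (keys of LINEAR_WEIGHTS) to totals.
    This is the simplified recipe from the lesson: it ignores outs.
    """
    return sum(LINEAR_WEIGHTS[event] * total for event, total in counts.items())

# Pete Alonso's 53 home runs in 2019, counted on their own.
print(f"{simple_wraa({'HR': 53}):.2f}")  # 73.67
```

Because every home run gets the same 1.39, this total depends only on how many home runs he hit, not on who was on base.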
Here is a play log for Pete Alonso showing all of his home runs in 2019. RE24 is on the far right. Try calculating the total RE24 he accumulated by belting 53 home runs. Now calculate the total wRAA he accumulated with those homers. What’s the difference? What does this tell us about the total value of his home runs? How would you explain a player whose total RE24 was much higher than his total wRAA? What if it was much lower?
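One way into the RE24-versus-wRAA question: a home run empties the bases without changing the number of outs, and every runner plus the batter scores. This sketch (the helper name `home_run_re24` is hypothetical) uses the run expectancy matrix from Day 1 to compute a home run’s RE24 in every base-out state, which you can compare with the context-neutral weight of 1.39.

```python
# Run expectancy matrix from Day 1: bases (first, second, third,
# "_" = empty) -> run expectancy with 0, 1, and 2 outs.
RE = {
    "___": (0.481, 0.254, 0.098), "1__": (0.859, 0.509, 0.224),
    "_2_": (1.100, 0.664, 0.319), "12_": (1.437, 0.884, 0.429),
    "__3": (1.350, 0.950, 0.353), "1_3": (1.784, 1.130, 0.478),
    "_23": (1.964, 1.376, 0.580), "123": (2.292, 1.541, 0.752),
}

def home_run_re24(bases, outs):
    """RE24 of a home run hit with the given runners on base and outs."""
    runners = sum(base != "_" for base in bases)
    # End state: bases empty, same number of outs; runners plus batter score.
    return RE["___"][outs] - RE[bases][outs] + runners + 1

for bases in RE:
    values = [f"{home_run_re24(bases, outs):.3f}" for outs in range(3)]
    print(bases, values)
```

A solo homer is always worth exactly 1.000 under RE24, less than the 1.39 linear weight, while a grand slam with two outs is worth 3.346. Most homers with runners on base come out above 1.39, so a hitter whose homers mostly came with the bases empty will tend to earn less RE24 from them than wRAA, and one who hit many with runners on will tend to earn more.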
Day 3
Here’s another way to look at the relationship between context-neutral and context-specific stats. When evaluating a batter’s contribution, should he get credit for the actions of his teammates, or do we try to isolate his individual contribution? If Pete Alonso comes up with two men on base and hits a home run, a context-specific stat will give him credit for the actions of those two baserunners. He had no control over whether they would reach base ahead of him, but he took advantage of the situation and drove them in.
But if we wanted to simply find Alonso’s individual contribution to the team, we’d have to ignore the actions of his teammates. We’d remove the context. Then each of his home runs would count the same towards his overall contribution.
Why would you want to use a context-specific stat? What would it tell you that a context-neutral stat wouldn’t? What are some of the limitations of a context-specific stat?
Why would you want to use a context-neutral stat? What would it tell you that a context-specific stat wouldn’t? What are some of the limitations of a context-neutral stat?
When we’re using different statistics to evaluate a player, we should remember that every statistic tells a different story. Context-specific statistics do an excellent job of telling us what happened in the past but do a very poor job of telling us how a player might perform in the future. Context-neutral statistics can help us evaluate a player without the influence of his teammates but don’t really satisfy our desire to know how a player did when he had a chance to drive in some runs.
For the following situations, explain why a context-specific or a context-neutral statistic would be more beneficial to use:
- Comparing the performance of two players.
- Evaluating a player’s performance in high-leverage situations.
- Creating a historical leaderboard.
- Showing which player contributed the most to a team win.
Jake Mailhot is a contributor to FanGraphs. A long-suffering Mariners fan, he also writes about them for Lookout Landing. Follow him on Twitter @jakemailhot.