FanGraphs Prep: How Many Runs Should Have Scored?
Overview: A short unit centered on understanding the concept of expected runs and sequencing. In one of our earlier lessons, we learned about the relationship between runs and wins. Now, we’ll take that concept a step further and learn about expected runs and how they can tell us more about a team’s true talent.
Learning Objectives:
- Use logic to determine all possible sequences of given events.
- Use algebra to solve multiple equations.
- Identify the effects of event sequencing in baseball.
- Identify and apply the Pythagorean Expectation.
- Explain the relationship between expected runs and wins.
- Explain the uses of the Pythagorean Expectation using different inputs.
Target Grade-Level: 9-10
Daily Activities:
Day 1
In baseball, sequencing is the idea that the order of events on the field affects how many runs score. This concept is sometimes called cluster luck because teams that cluster hits together appear more “lucky” than teams that don’t. It is easy to demonstrate. Say a team collects three singles and one home run in a given inning. The order of those events can lead to very different outcomes. If the team hits the three singles before the home run, the inning will likely produce four runs. But if the home run comes first and the three singles follow, the likely result is fewer runs, perhaps as few as one.
Based on the following lists of events, determine all the possible sequences in a single inning and the number of runs you would expect to score for each sequence.
- 1 single, 1 double, 1 triple, 1 home run, 3 strikeouts
- 3 singles, 1 home run, 1 ground out, 2 fly outs
- 1 single, 1 double, 2 groundouts, 1 strikeout
- 2 singles, 1 walk, 1 home run, 1 ground out, 1 fly out, 1 strikeout
- 2 triples, 1 walk, 1 fly out, 2 strikeouts
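One way to check a hand-built list of sequences is to let a computer enumerate them. The sketch below is only an illustration and rests on assumptions that are not part of the activity: every runner advances exactly as many bases as the batter, a walk moves only forced runners, outs never advance or remove runners (no sacrifice flies or double plays), and the third out is always the last event so that every listed event takes place. The event codes and the function names runs_in_inning, distinct_sequences, and run_distribution are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Bases gained by the batter and (by assumption) by every runner on a hit.
HIT_BASES = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}
OUTS = {"K", "GO", "FO"}  # strikeout, ground out, fly out


def runs_in_inning(sequence):
    """Count runs for one ordering of events under the stated assumptions."""
    bases = [False, False, False]  # first, second, third
    runs = 0
    for event in sequence:
        if event in OUTS:
            continue  # assumption: outs never move or erase runners
        if event == "BB":
            # A walk only moves runners who are forced.
            if bases[0] and bases[1] and bases[2]:
                runs += 1
            elif bases[0] and bases[1]:
                bases[2] = True
            elif bases[0]:
                bases[1] = True
            bases[0] = True
            continue
        advance = HIT_BASES[event]
        new_bases = [False, False, False]
        for i, occupied in enumerate(bases):
            if occupied:
                if i + advance >= 3:
                    runs += 1  # runner reaches home
                else:
                    new_bases[i + advance] = True
        if advance == 4:
            runs += 1  # the batter scores on a home run
        else:
            new_bases[advance - 1] = True
        bases = new_bases
    return runs


def distinct_sequences(events):
    """Distinct orderings that end on an out (assumes exactly three outs in the list)."""
    return {seq for seq in permutations(events) if seq[-1] in OUTS}


def run_distribution(events):
    """Map each run total to the number of distinct sequences that produce it."""
    return Counter(runs_in_inning(seq) for seq in distinct_sequences(events))


# Second list from the activity: 3 singles, 1 home run, 1 ground out, 2 fly outs.
events = ["1B", "1B", "1B", "HR", "GO", "FO", "FO"]
for runs, count in sorted(run_distribution(events).items()):
    print(f"{runs} run(s): {count} sequence(s)")
```

Under these assumptions, three singles followed by the home run score four runs, while the home run followed by three singles scores one. Changing an assumption, such as letting a runner score from second on a single, changes the counts, which is a useful discussion point for Day 2.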
Day 2
In the previous activity, did you find that you had to make some assumptions about how base runners acted once they reached base? For instance, we didn’t take into account stolen bases and we didn’t clarify if those fly outs could be counted as sacrifice flies. What if there had been a base runner on first and the next event was a groundout? Should we assume that the groundout resulted in a double play?
These sorts of assumptions are addressed by the BaseRuns formula. BaseRuns is a context-neutral statistic that estimates the number of runs a team would be expected to score (or allow) given its underlying performance. The inputs are fairly simple, but the formula requires a few steps of algebra before we arrive at a final answer. The nice thing about BaseRuns is that we can use the same formula to calculate expected runs scored and expected runs allowed. We’ll use the version of the BaseRuns formula used here at FanGraphs, but we’ll ignore the league adjustment for simplicity’s sake.
Here’s the formula:
A = H + BB + HBP – (0.5 * IBB) – HR
B = 1.1 * (1.4 * TB – 0.6 * H – 3 * HR + 0.1 * (BB + HBP – IBB) + 0.9 * (SB – CS – GDP))
C = PA – BB – SF – SH – HBP – H + CS + GDP
D = HR
BaseRuns = ((A * B) / (B + C)) + D
Based on their inputs, can you figure out what the A, B, C, and D terms represent? Are there certain events that aren’t covered by the BaseRuns formula?
Using the box scores from yesterday’s games, calculate the expected runs scored and expected runs allowed for each team using the BaseRuns formula (it may be easier to use a spreadsheet where you can enter the data and calculate the result automatically with a formula).
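For classes that prefer code to a spreadsheet, the same calculation can be written as a short function. This is a minimal sketch of the formula above without the league adjustment. The function name base_runs, the lowercase argument names, and the sample stat line are made up for illustration and are not real box score data.

```python
def base_runs(pa, h, tb, hr, bb, hbp=0, ibb=0, sb=0, cs=0, gdp=0, sf=0, sh=0):
    """Estimate runs from a stat line using the A, B, C, D terms of BaseRuns.

    Argument names are lowercase versions of the abbreviations in the formula
    (pa = PA, tb = TB, and so on). No league adjustment is applied.
    """
    a = h + bb + hbp - 0.5 * ibb - hr
    b = 1.1 * (
        1.4 * tb - 0.6 * h - 3 * hr
        + 0.1 * (bb + hbp - ibb)
        + 0.9 * (sb - cs - gdp)
    )
    c = pa - bb - sf - sh - hbp - h + cs + gdp
    d = hr
    return (a * b) / (b + c) + d


# Hypothetical single-game stat line, invented for this example.
expected = base_runs(pa=38, h=9, tb=15, hr=2, bb=3, gdp=1)
print(f"Expected runs: {expected:.2f}")
```

To get expected runs allowed, pass in the opposing team's batting line for the same game.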
Which teams scored more runs than expected? Which teams scored fewer runs than expected? Based on the expected runs scored and allowed for each team, would the result of any of the games change? Go back and look at the play logs and see if you can find instances where teams sequenced their events in a beneficial way. What about instances where teams had poor sequencing?
Day 3
Now let’s combine what we learned in our previous lesson about the relationship between runs and wins with our new ability to calculate expected runs scored and expected runs allowed. In that previous lesson, we learned that this relationship can be put into an equation: win% = runs scored^2 / (runs scored^2 + runs allowed^2). If we substitute the expected runs scored and expected runs allowed calculated using the BaseRuns formula, we can find an expected win%.
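The equation can be checked with a few lines of Python. In this sketch the function name pythagorean_win_pct and the run totals are hypothetical; substitute BaseRuns estimates or actual totals as needed.

```python
def pythagorean_win_pct(runs_scored, runs_allowed):
    """Pythagorean Expectation with both run totals squared, as in this unit."""
    scored_sq = runs_scored ** 2
    allowed_sq = runs_allowed ** 2
    return scored_sq / (scored_sq + allowed_sq)


# Hypothetical season totals, invented for this example.
win_pct = pythagorean_win_pct(750, 650)
print(f"Expected win%: {win_pct:.3f}")
print(f"Expected wins over 162 games: {win_pct * 162:.1f}")
```

A team that scores exactly as many runs as it allows comes out at .500, which is a quick sanity check on any spreadsheet or program.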
Using the FanGraphs team leaderboards, calculate expected runs scored and expected runs allowed for every team, then use the Pythagorean Expectation to calculate an expected win% for every team.
Now compare your calculated expected win% with the actual win% of each team. Which teams had a higher expected win%? Which teams had a lower expected win%? How would our comparison change if we calculated each team’s Pythagorean Expectation using actual runs scored and runs allowed and compared it to our BaseRuns expected win%?
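One way to organize the comparison is to compute all three winning percentages side by side for each team. The sketch below uses a made-up team, not real leaderboard data, and the function name win_pct_levels is hypothetical.

```python
def pythagorean_win_pct(runs_scored, runs_allowed):
    """Pythagorean Expectation with the exponent of 2 used in this unit."""
    return runs_scored ** 2 / (runs_scored ** 2 + runs_allowed ** 2)


def win_pct_levels(wins, losses, runs_scored, runs_allowed, exp_scored, exp_allowed):
    """Return actual win%, win% from actual runs, and win% from BaseRuns estimates."""
    return {
        "actual": wins / (wins + losses),
        "pythagorean": pythagorean_win_pct(runs_scored, runs_allowed),
        "baseruns": pythagorean_win_pct(exp_scored, exp_allowed),
    }


# Hypothetical team: 90-72 record, 700 runs scored, 680 allowed,
# with BaseRuns estimates of 720 scored and 660 allowed.
levels = win_pct_levels(90, 72, 700, 680, 720, 660)
for name, value in levels.items():
    print(f"{name:12s} {value:.3f}")
```

Lining the three numbers up this way makes it easier to see which gap comes from turning runs into wins and which comes from turning events into runs.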
We now have the ability to calculate two different levels of expected win% to compare to actual win%. What are some of the strengths of using BaseRuns expected win%? What are some of the strengths of the Pythagorean Expectation based on actual runs scored and runs allowed?