baseball series probabilities

Mood: curious and mathy

Posted on 2006-06-30 09:36:00

Tags: baseball math

Words: 790

So when Rice was in the NCAA baseball tournament and College World Series, I started thinking that one baseball game, or even a three-game series, isn't really enough to determine which team is better. That's presumably why MLB playoff series are 5 or 7 games, which certainly seems like enough to tell which team is better. To figure out how true that is, I modeled it mathematically:

So, let's be formal here: the Astros are playing the Blue Jays, and the Astros have probability *x* (with 0 <= *x* <= 1) of beating the Blue Jays in any given game. Assume that we have no prior knowledge of *x*, i.e. *x* is uniformly distributed from 0 to 1. Given that the Astros beat the Blue Jays, what is the probability that *x* >= .5 (that is, that the Astros are the "better" team)?

Since the probability that the Astros beat the Blue Jays in one game is simply *x*, and the prior on *x* is flat, Bayes' rule says the probability we want is the share of the area under the curve *y* = *x* that lies to the right of .5. That area is of course the integral of *x*, so the probability that *x* >= .5 is just (the integral of *x* from .5 to 1)/(the integral of *x* from 0 to 1) = (1/2 - 1/8)/(1/2) = 3/4.
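One way to sanity-check the 3/4 figure is a quick simulation. This is only a sketch: the function name, the trial count, and the seed are my own choices, not part of the original analysis. It draws *x* uniformly, plays one game, and looks only at the games the Astros won.

```python
import random


def simulate_single_game(trials=200_000, seed=0):
    """Estimate P(x >= 0.5 | Astros win one game) by simulation."""
    rng = random.Random(seed)
    wins = 0          # games the Astros won
    wins_better = 0   # wins where the Astros really had x >= 0.5
    for _ in range(trials):
        x = rng.random()        # uniform prior on x (the model's assumption)
        if rng.random() < x:    # the Astros win with probability x
            wins += 1
            if x >= 0.5:
                wins_better += 1
    return wins_better / wins


print(simulate_single_game())
```

The printed estimate should land close to .75, with the usual sampling noise.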

(hmm, this would look a lot nicer in LaTeX)

The nice thing about this method is that it easily generalizes - let's say that the Astros beat the Blue Jays two times. Since the probability that the Astros win both games is *x^2*, the probability that *x* >= .5 is (the integral of *x^2* from .5 to 1)/(the integral of *x^2* from 0 to 1) = (1/3 - 1/24)/(1/3) = 7/8.
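The same ratio of integrals can be done exactly for any number of straight wins. Here is a minimal sketch using exact fractions; the function name `prob_better_after_sweep` is made up for illustration, and it assumes the same uniform prior on *x*.

```python
from fractions import Fraction


def prob_better_after_sweep(n):
    """P(x >= 1/2 | Astros win n games out of n), uniform prior on [0, 1]."""
    half = Fraction(1, 2)
    # The antiderivative of x^n is x^(n+1)/(n+1).
    upper = (1 - half ** (n + 1)) / (n + 1)   # integral from 1/2 to 1
    total = Fraction(1, n + 1)                # integral from 0 to 1
    return upper / total


for n in (1, 2, 3):
    print(n, prob_better_after_sweep(n))
```

The (n+1) factors cancel, so the answer simplifies to 1 - (1/2)^(n+1): 3/4 for one win and 7/8 for two, matching the numbers above.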

And let's say the Astros beat the Blue Jays in a best-of-3 series. The probability that the Astros win the series is *x^2* + 2(1 - *x*)*x^2* = 3*x^2* - 2*x^3*. Integrating that from 0 to 1 gives 1/2, and from 0 to .5 gives 3/32, so the probability that *x* >= .5 is (1/2 - 3/32)/(1/2) = 13/16, which is .8125. (That is less than 7/8, which smells wrong to me, but sort of makes sense because a best-of-3 series always chooses a winner, while "winning 2 games" doesn't. Feel free to start checking my math at this point, though :-) )

If the Astros beat the Blue Jays in a best-of-5 series, the probability that the Astros win the series is *x^3* + 3(1 - *x*)*x^3* + 6(1 - *x*)^2 *x^3* = 10*x^3* - 15*x^4* + 6*x^5*. Integrating from 0 to 1 again gives 1/2, and from 0 to .5 gives 5/64, so the probability that *x* >= .5 is (1/2 - 5/64)/(1/2) = 27/32, which is about .844.

Finally, for a best-of-7 series, the probability that the Astros win the series is *x^4* + 4(1 - *x*)*x^4* + 10(1 - *x*)^2 *x^4* + 20(1 - *x*)^3 *x^4* = 35*x^4* - 84*x^5* + 70*x^6* - 20*x^7*. The integral from 0 to 1 is 1/2 once more, and from 0 to .5 it is 35/512, so the probability that *x* >= .5 is (1/2 - 35/512)/(1/2) = 221/256, which is about .863.
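For anyone who wants to check the polynomials and fractions above, here is a sketch that builds the series-win polynomial for any odd series length and integrates it exactly. The helper names (`series_win_coeffs`, `integrate`, `prob_better`) are hypothetical, and the uniform prior on *x* is the same assumption as before.

```python
from fractions import Fraction
from math import comb


def series_win_coeffs(games):
    """Coefficients c[k] of x^k in P(Astros win a best-of-`games` series)."""
    need = (games + 1) // 2                  # wins needed to clinch
    coeffs = [Fraction(0)] * (games + 1)
    for losses in range(need):
        # The series ends on a win, so the losses are spread over the
        # first need-1+losses games: C(need-1+losses, losses) orderings.
        ways = comb(need - 1 + losses, losses)
        # Expand (1 - x)^losses * x^need with the binomial theorem.
        for j in range(losses + 1):
            coeffs[need + j] += ways * comb(losses, j) * (-1) ** j
    return coeffs


def integrate(coeffs, lo, hi):
    """Exact integral of the polynomial from lo to hi."""
    def antiderivative(x):
        return sum(c * x ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))
    return antiderivative(Fraction(hi)) - antiderivative(Fraction(lo))


def prob_better(games):
    """P(x >= 1/2 | Astros win the series), uniform prior on [0, 1]."""
    coeffs = series_win_coeffs(games)
    return integrate(coeffs, Fraction(1, 2), 1) / integrate(coeffs, 0, 1)


for games in (1, 3, 5, 7):
    print(games, prob_better(games), float(prob_better(games)))
```

For 3, 5, and 7 games this gives 13/16, 27/32, and 221/256, the same fractions as the hand calculation.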

Games in series | Probability winner is "better" team |
---|---|
1 | .75 |
3 | .8125 |
5 | .844 |
7 | .863 |

So, this is all well and good, but the results seem a little unrealistic - I find it hard to believe that a single game picks the better team 3 out of 4 times. Let's try to remove some of the simplifying assumptions.

In the real world, the Astros beating the Blue Jays 100% of the time is just not going to happen. If we look at the final MLB standings from 2005, no team had a winning percentage below .3 or above .7, so let's try using .3 <=

Games in series | Probability winner is "better" team |
---|---|
1 | .6 |
3 | .646 |
5 | .678 |
7 | .702 |

These probabilities seem a bit more realistic.
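The sentence introducing this table is cut off, so the following is a sketch under an assumption of mine: that *x* is uniformly distributed between .3 and .7. It swaps the exact integrals for a simple midpoint-rule numerical integration; the function names and the `steps` parameter are made up for illustration.

```python
from math import comb


def series_win_prob(x, games):
    """P(Astros win a best-of-`games` series) given per-game probability x."""
    need = (games + 1) // 2
    return sum(comb(need - 1 + losses, losses) * (1 - x) ** losses * x ** need
               for losses in range(need))


def prob_better(games, lo=0.3, hi=0.7, steps=20_000):
    """P(x >= 0.5 | series win) for x uniform on [lo, hi] (assumed range)."""
    width = (hi - lo) / steps
    upper = total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width      # midpoint of the i-th slice
        p = series_win_prob(x, games)
        total += p
        if x >= 0.5:
            upper += p
    return upper / total


for games in (1, 3, 5, 7):
    print(games, round(prob_better(games), 3))
```

Under that assumed range, the results come out close to the table's .6, .646, .678, and .702.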

Finally, we've been assuming that

Games in series | Probability winner is "better" team |
---|---|
1 | .567 |
3 | .598 |
5 | .621 |
7 | .639 |

So there you have it - under this model, even a 7-game series will only pick the better team 64% of the time. The probability function may have been a little too harsh here, so the 70% in table 2 might be a better guideline.

To make this more accurate, we should recognize that even if the Astros have an

Thanks for reading this far! Comments (especially pointing at mistakes) are most welcome.

1 comment

Comment from djedi:

2006-06-30T16:21:20+00:00

Good analysis! The thing to consider unsimplifying though is the definition of "better" team. How often is a team better by 1%? Maybe a better definition would include a larger spread in abilities (i.e. significantly better rather than within margin of error).
