
baseball series probabilities

Mood: curious and mathy

Posted on 2006-06-30 09:36:00

Tags: baseball math

Words: 790

So when Rice was in the NCAA baseball tournament and College World Series, I got to thinking that one baseball game, or even a three-game series, isn't really enough to determine which team is better. That's presumably why MLB playoff series are 5 or 7 games, which certainly seems like enough to tell for sure which team is better. Trying to figure out how true that was, I modeled it mathematically:

So, let's be formal here: the Astros are playing the Blue Jays, and the Astros have a probability *x* of beating the Blue Jays in any given game (for 0 <= *x* <= 1). Assume that we have no prior knowledge of *x* - i.e. *x* is evenly distributed from 0 to 1. Given that the Astros beat the Blue Jays, what is the probability that *x* >= .5 (that is, that the Astros are "better" than the Blue Jays)?

Since the probability that the Astros beat the Blue Jays in one game is simply *x*, we're looking at the area under the curve *y* = *x*, which is of course the integral of *x*. So the probability that *x* >= .5 is just (the integral of *x* from .5 to 1)/(the integral of *x* from 0 to 1) = (1/2 - 1/8)/(1/2) = 3/4.

(hmm, this would look a lot nicer in LaTeX)
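For what it's worth, the one-game calculation above might be typeset roughly like this. The P(... | ...) notation is an assumption added here for illustration, and the snippet needs the `amsmath` package for `\text` and `\tfrac`:

```latex
\[
  P\bigl(x \ge \tfrac12 \mid \text{Astros win one game}\bigr)
    = \frac{\int_{1/2}^{1} x \, dx}{\int_{0}^{1} x \, dx}
    = \frac{\tfrac12 - \tfrac18}{\tfrac12}
    = \frac34
\]
```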

The nice thing about this method is that it easily generalizes - let's say that the Astros beat the Blue Jays two times. Since the probability that the Astros win both games is *x^2*, the probability that *x* >= .5 is (the integral of *x^2* from .5 to 1)/(the integral of *x^2* from 0 to 1) = (1/3 - 1/24)/(1/3) = 7/8.
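A quick way to check the straight-wins arithmetic is a few lines of Python with exact fractions. This is only a sketch: the function name is made up for illustration, and it assumes the same flat distribution of *x* from 0 to 1.

```python
from fractions import Fraction


def prob_better_after_straight_wins(n):
    """P(x >= 1/2 | Astros won n games out of n), with x uniform on [0, 1]."""
    half = Fraction(1, 2)

    def integral(lo, hi):
        # The antiderivative of x**n is x**(n + 1) / (n + 1).
        return (hi ** (n + 1) - lo ** (n + 1)) / (n + 1)

    return integral(half, Fraction(1)) / integral(Fraction(0), Fraction(1))


print(prob_better_after_straight_wins(1))  # 3/4
print(prob_better_after_straight_wins(2))  # 7/8
```

In general this works out to 1 - (1/2)^(n+1) for n straight wins.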

And let's say the Astros beat the Blue Jays in a best out of 3 series. The probability that the Astros win the series is *x^2* + 2(1-*x*)*x^2* = 3*x^2* - 2*x^3*, so the probability *x* >= .5 is (1/2 - 3/32)/(1/2) = 13/16, which is .8125. (which is less than 7/8, which smells wrong to me, but sort of makes sense because a best of 3 series always chooses a winner, while "winning 2 games" doesn't. Feel free to start checking my math at this point, though :-) )

If the Astros beat the Blue Jays in a best out of 5 series, the probability that the Astros win the series is *x^3* + 3(1-*x*)*x^3* + 6(1-*x*)^2 *x^3* = 10*x^3* - 15*x^4* + 6*x^5*. So the probability that *x* >= .5 is (1/2 - 5/64)/(1/2) = 27/32, which is about .844.

Finally, for a best out of 7 series, the probability that the Astros win the series is *x^4* + 4(1-*x*)*x^4* + 10(1-*x*)^2 *x^4* + 20(1-*x*)^3 *x^4* = 35*x^4* - 84*x^5* + 70*x^6* - 20*x^7*. So the probability that *x* >= .5 is (1/2 - 35/512)/(1/2) = 221/256, which is about .863.

| # games in series | Probability winner is "better" team |
|---|---|
| 1 | .75 |
| 3 | .8125 |
| 5 | .844 |
| 7 | .863 |
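The series numbers can be double-checked by building each series-win polynomial from binomial coefficients and integrating it exactly. The sketch below assumes *x* is evenly distributed on [0, 1]; the function names are hypothetical, chosen just for this example.

```python
from fractions import Fraction
from math import comb


def series_win_poly(wins_needed):
    """Coefficients (index = power of x) of P(win a first-to-`wins_needed` series)."""
    k = wins_needed
    coeffs = [Fraction(0)] * (2 * k)
    for losses in range(k):
        # Clinching in exactly k + losses games:
        # C(k - 1 + losses, losses) * x^k * (1 - x)^losses
        ways = comb(k - 1 + losses, losses)
        for j in range(losses + 1):
            # Expand (1 - x)^losses with the binomial theorem.
            coeffs[k + j] += ways * comb(losses, j) * (-1) ** j
    return coeffs


def integrate(coeffs, lo, hi):
    """Exact integral of the polynomial from lo to hi."""
    return sum(c * (hi ** (p + 1) - lo ** (p + 1)) / (p + 1)
               for p, c in enumerate(coeffs))


def prob_winner_is_better(wins_needed):
    """P(x >= 1/2 | series won), with x uniform on [0, 1]."""
    poly = series_win_poly(wins_needed)
    top = integrate(poly, Fraction(1, 2), Fraction(1))
    return top / integrate(poly, Fraction(0), Fraction(1))


for k in (1, 2, 3, 4):
    p = prob_winner_is_better(k)
    print(2 * k - 1, p, float(p))
# 1 3/4 0.75
# 3 13/16 0.8125
# 5 27/32 0.84375
# 7 221/256 0.86328125
```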

So, this is all well and good, but the results seem a little unrealistic - I find it hard to believe that the better team wins 3 out of 4 times in just a single game. Let's try to remove some of the simplifying assumptions.

In the real world, the Astros beating the Blue Jays 100% of the time is just not going to happen. If we look at the final MLB standings from 2005, no team had a winning percentage below .3 or above .7, so let's try using .3 <= *x* <= .7:

| # games in series | Probability winner is "better" team |
|---|---|
| 1 | .6 |
| 3 | .646 |
| 5 | .678 |
| 7 | .702 |

These probabilities seem a bit more realistic.
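As a sanity check on the second table, the same question can be attacked by brute force instead of integrals. The sketch below is a Monte Carlo simulation, which is a swapped-in technique rather than the method used above. It assumes *x* is drawn evenly from .3 to .7; the function name, trial count, and seed are arbitrary choices for illustration.

```python
import random


def simulate(series_length, lo=0.3, hi=0.7, trials=100_000, seed=0):
    """Estimate P(x >= .5 | Astros win the series) by simulation.

    ASSUMPTION: x is drawn uniformly from [lo, hi] for each simulated series.
    """
    rng = random.Random(seed)
    wins_needed = series_length // 2 + 1
    series_won = 0
    better_and_won = 0
    for _ in range(trials):
        x = rng.uniform(lo, hi)  # the Astros' per-game win chance
        wins = losses = 0
        while wins < wins_needed and losses < wins_needed:
            if rng.random() < x:
                wins += 1
            else:
                losses += 1
        if wins == wins_needed:
            series_won += 1
            if x >= 0.5:
                better_and_won += 1
    return better_and_won / series_won


for games in (1, 3, 5, 7):
    print(games, round(simulate(games), 3))
```

With this many trials the estimates should land within a hundredth or so of the table; more trials tighten that up.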

Finally, we've been assuming that

| # games in series | Probability winner is "better" team |
|---|---|
| 1 | .567 |
| 3 | .598 |
| 5 | .621 |
| 7 | .639 |

So there you have it - under this model, even a 7-game series will only pick the better team 64% of the time. The probability function may have been a little too harsh here, so the 70% in table 2 might be a better guideline.
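The probability function behind the third table isn't spelled out above. One guess that lands very close to those figures is a triangular distribution for *x* on .3 to .7, peaking at .5 and dropping linearly to zero at both ends. The sketch below is built entirely on that guess: the prior, the function names, and the step count are assumptions, and the integral is a plain midpoint-rule approximation.

```python
from math import comb


def series_win_prob(x, wins_needed):
    """P(win a first-to-`wins_needed` series) given per-game win chance x."""
    return sum(
        comb(wins_needed - 1 + losses, losses) * x ** wins_needed * (1 - x) ** losses
        for losses in range(wins_needed)
    )


def triangular_prior(x, lo=0.3, hi=0.7):
    """ASSUMED prior (a guess): peak at the midpoint, linear to 0 at lo and hi.

    Left unnormalized, since the constant cancels in the final ratio.
    """
    mid = (lo + hi) / 2
    return max(0.0, (hi - lo) / 2 - abs(x - mid))


def prob_winner_is_better(wins_needed, prior, lo=0.3, hi=0.7, steps=20_000):
    """Midpoint-rule estimate of P(x >= .5 | series won) under `prior`."""
    width = (hi - lo) / steps
    total = 0.0
    upper = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        weight = prior(x) * series_win_prob(x, wins_needed)
        total += weight
        if x >= 0.5:
            upper += weight
    return upper / total


for k in (1, 2, 3, 4):
    print(2 * k - 1, round(prob_winner_is_better(k, triangular_prior), 3))
```

Passing a different `prior` function makes it easy to see how sensitive the answer is to that choice. For example, `lambda x: 1.0` gives the flat .3 to .7 case from table 2.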

To make this more accurate, we should recognize that even if the Astros have an

Thanks for reading this far! Comments (especially pointing at mistakes) are most welcome.
