German math problem

Mood: geeky

Posted on 2010-11-20 23:39:00

Tags: math

Words: 992

As I mentioned during the Germany trip recap, visiting Allianz Arena inspired a math problem. Here it is!

Allianz Arena is home to two German soccer teams - Bayern München (hereafter "B") and TSV 1860 München (hereafter "T"). Let's assume there are 2n weeks in the season, and one game per week, and half a team's games are at home and half on the road. Presumably B and T would have to work out who gets to play at home which week, but let's say they didn't and just randomly picked which weeks they play where. What is the expected value of the number of "conflict weeks" - i.e. both B and T are scheduled to play at home?

My first step was to try this out with n=1. Then you get the following possibilities (here I just show just the weeks the team plays at home):

Week 1 | TB | T | B | |
---|---|---|---|---|

Week 2 | B | T | TB | |

# conflicts | 1 | 0 | 0 | 1 |

So the expected value of the number of conflicts is

Now, I took a guess at what the expected value was in general. As a rough guess, each team has to choose n weeks, so you might expected about half of them to overlap. But it seemed to me that once you do have a conflict week, the chances of another one would go down (since you've "used up" one game from both teams), so I thought it might be a bit less than n/2.

So - onwards to discovering the answer! It seemed fairly easy to write this out as a recurrence relation with the following parameters:

c(w,b,t) = expected number of conflict weeks if there are w weeks in the season, b home games for B left, t home games for T left.

Base cases: c(w,b,0) = c(w,0,t) = 0 [no games left for one team means no conflicts]

c(w,w,t) = t and c(w,b,w) = b [if one team has to play all their games in the remaining weeks, however many the other team has left is the number of conflicts]

And we write the induction step by looking at the first week:

c(w,b,t) for 0 < b,t < w is:

(w-b)/w * (w-t)/w * c(w-1,b,t) | [chance that b and t don't play a game this week times what happens for the rest of the season] |

+ b/w * (w-t)/w * c(w-1,b-1,t) | [chance b plays a game this week and t doesn't] |

+ (w-b)/w * t/w * c(w-1,b,t-1) | [chance t plays a game this week and b doesn't] |

+ b/w * t/w * (1 + c(w-1,b-1,t-1)) | [chance both play a game this week, so the expected value goes up by 1] |

This looked correct, but was a little discouraging because it seemed very not obvious what an explicit formula would be. But it was easy enough to code a quick Python script to get a hint to what an answer would be.

The results were highly suggestive: c(2,1,1) = .5 (this is the case we worked out by hand above), c(4,2,2) = 1, c(6,3,3) = 1.5, c(8,4,4) = 2, c(10,5,5) = 2.5, etc. So it sure looked like

But I really wanted to prove it, but dealing with this kinda ugly recurrence relation didn't seem like the way to go. Also, this seemed like a combinatorial problem and it seemed like there should be a nice combinatorial way to express c(2n,n,n).

After some thought I came up with a good way of thinking about it. Let's say without loss of generality that B's home games are the first n of the season. Then you're just choosing T's home games and seeing how many overlap with the first n. This leads fairly naturally to the explicit formula:

Here i represents the number of conflicts, and the chance of getting that many conflicts is the same as choosing i games in the first n weeks of the season (which will be conflicts), and the remaining n-i in the second n weeks of the season. In total, you're choosing n weeks out of 2n, which is where we get the denominator from.

This didn't seem like a huge improvement at first, but we can actually simplify this a lot. First of all, (n choose i) = (n choose (n-i)), so we get

c(2n,n,n) = (sum_{i=0}^n [(n choose i)^2 * i])/(2n choose n)

Now, notice that (sum_{i=0}^n [(n choose i)^2]) = (2n choose n)). The right side is the number of ways of counting choosing n objects out of 2n choices, and the left side is the same - first you choose which i are in the first half, and then which (n-i) are in the second half (since (n choose i)=(n choose (n-i)). So this will help out in a minute.

Now let's consider two cases:

Let's just look at the numerator of c(2n,n,n) and split it up in half to get

sum_{i=0}^k [(n choose i)^2 * i] + sum_{j=k+1}^n [(n choose j)^2 * j]

Here we can pair up i's and j's that sum to n - since 0+n=n and k+(k+1)=2k+1=n, this covers all of them. Since i+j=n, then (n choose i)=(n choose j), so we'll end up with

sum_{i=0}^k [(n choose i)^2 * i + (n choose i)^2 * (n-i)], or

sum_{i=0}^k [(n choose i)^2 * n]

Now we can unsplit this back into the i's and j's, giving n/2 to each side to get

sum_{i=0}^k [(n choose i)^2 * (n/2)] + sum_{j=k+1}^n [(n choose j)^2 * (n/2)]

which simplifies down to

sum_{i=0}^n [(n choose i)^2 * (n/2)]

(n/2) * sum_{i=0}^n [(n choose i)^2]

Since we showed above that this sum is equal to (2n choose n), this means that

This is the same as the previous case, except we have the additional value i=k which doesn't pair up with anything. But, the value that we're summing for i=k is ((n choose k)^2 * k) and k does equal n/2, so it reduces the same way!

In short, I love combinatorics!

2 comments

Comment from mathjoker:

2010-11-22T20:25:39+00:00

I spent a couple of days thinking that there must surely be an easier way to do this problem. I think I've found it.

By symmetry, the expected number of home conflicts is equal to the expected number of "road conflicts" (weeks in which both teams choose to play on the road). So it suffices to find the expected number of weeks in which both teams will choose the same thing (both home, or both road).

Suppose that each team represents its choices as a string of H's and A's (for Home and Away). I'll use n = 4 as an example.

B: HAAHAHHA

T: AAAHHHHA

In this case, there is a total of 6 conflicts. But there is a complementary, and equally likely, outcome in which T's choices lead to 8 - 6 conflicts:

B: HAAHAHHA

T: HHHAAAAH

More generally, if we fix B's choices, for any choice T makes that leads to k conflicts, there is a complementary choice that T can make that will lead to 2n - k conflicts. So the average over each pair of complementary choices is n conflicts, which means that the average over all pairs is n conflicts.

So the expected number of home conflicts is one-half that, or n/2.

Comment from gregstoll:

2010-11-22T23:46:34+00:00

Ah! I think the simplified algebra is doing essentially the same thing (pairing up schedules with i conflicts with those with n-i conflicts), but this is a much nicer combinatorial way of thinking about it. Thanks!

This backup was done by LJBackup.