Let’s do a problem together.
Problem. There are 100 pieces of string placed in front of you. You randomly grab two free ends and tie them together. You continue to do this until there are no free ends left. Now, there are just loops. How many loops do you expect to have in front of you?
I’ve said it before and I’ll say it again: scale your problems down. So let’s come up with an easier version of the same problem.
In this case, the obvious way to scale down is to work with fewer pieces of string.
So let’s consider the same problem but for one piece of string. This is trivial as there are only two free ends (the two different ends of the same string) and so the only thing you can do in this case is to tie these two ends together to make a single loop.
Therefore, the answer to our problem for the simple case of one string is one loop.
Being mathsy folk, we should create some notation to help with our record-keeping. Let’s write to mean the expected number of loops in the -string scenario.
So far, we have worked out that . The original problem is asking us to determine the value of .
The one-string scenario was not particularly illuminating, so let’s step it up.
What about if we have two pieces of string? Here, we can follow one of two paths:
- You grab two ends of the same piece of string and then tie them together. This makes a loop. The rest of this scenario is locked in as you have to grab the remaining two free ends of the other piece of string and tie them together to make a loop. This gives you two loops.
- You grab two free ends, each belonging to a different piece of string. You tie these ends together and this leaves you with a single long piece of string. Then your next and only move is to tie the ends of this long piece of string together. This gives you one loop.
There are two paths to follow in the two-string experiment and they each give a different result.
To calculate the expected value , we need to multiply the probability of each path with the result of that path. We could write something like this:
In the above equation, means the probability that we go down path 1, which is where we start by grabbing two ends of the same string. We use to denote the number of loops we end up with when we go down this path (we already know that .)
Similiarly, and denote the probability and the number of resulting loops when we go down the second path (we already know that .
Really, we just need to work out the probabilities and then we can calculate our expected value of loops for the two-string case.
What is the probability , that is, what is the probability that we grab two ends of the same piece of string?
I like to live my maths problems, so I’m going to imagine that my left hand has grabbed the end of a piece of string. I want to know the probability that my right hand grabs the other end of that same piece of string. There is a chance of this happening, as there are three free ends remaining and only one of those is the free end on the same piece of string. Therefore, .
Immediately, we also know that . And so we have that
The two-string problem is now solved!
At this point, it might feel like we are grasping at straws (or strings!) to find an idea that will help us solve the 100-string problem.
But it is there. Think about the two paths we could follow when we were sitting there with two free pieces of string. After one step, we would either have one long piece of string, or a loop and a piece of string.
This idea scales quite well to the case with 100 pieces of string.
Consider this. You are sitting there with 100 pieces of string. Just like in the two-string experiment, one of two things can happen when you tie two ends together.
- You grab two ends of the same piece of string and then tie them together. This makes a loop. Now you have one loop and 99 pieces of string.
- You grab two free ends, each belonging to a different piece of string. You tie these ends together to make a longer piece of string. Now you have 99 pieces of string.
Something smells like recursion. From looking at the above two paths, it’s clear that a solution to the 99 string problem will yield a solution to the 100 string problem.
As there are two paths we can go down, again we may write something like this:
Just like before, means the probability that we go down path and we use to denote the number of loops we end up with when we go down this path.
Note that for path 1, we end up with 1 loop plus 99 pieces of string. Therefore, we have that .
For path 2, we end up with 99 pieces of string, and so we have that .
We can calculate and in a similiar way to the 2-string problem. You can double check my working, but I get that and .
We now have all the information we need to substitute this into the equation
A smidge of expansion gives us that
Now, we need to find . It’s looking pretty likely that we can express in terms of . Indeed, if you go through all the working yourself, you will surely find that
We can throw the previous two equations into the oven together to get:
At this point, you are likely seeing the problem open up before your own eyes. You might even guess that
and certainly you would be correct.
You can do some mathematical induction, or you can just live on the wild side like me and jump to the bit where
Fortunately, we have already figured out , and so we have that
or, in shorthand:
So we have our answer in the form of a sum.
You can punch this into a computer or (if you think all maths should be done by hand) you can use the integral
as an approximation. The integral gives you an answer of 2.65 but the real answer from the sum is actually 3.28.
In any case, it’s quite a small number of expected loops!
Lovely problem, I certainly wouldn’t have guessed harmonic sums had much to do with it. I would like to point out that there is a lovely little unstated use of the tower rule of expectation somewhere in here, in a form I don’t think I’ve ever seen it before. Bonus points if you can spot it.
I will level with you. I had to Google the tower rule of expectation. When I saw the wiki page and definition I was like “oh it’s that thing that I just assume to be true”. Proof is very quick though, once you write everything as sums.
Yep, like most “laws” in probability and statistics, they’re true because they damn well should be, and the ‘should be’ usually pops out of a few basic properties of sums/integrals/measures(take your pick given the sample space at hand).