## Sunday, 1 May 2016

### The Maths of The Code - Part 1/2

Hello. We've been watching The Code (not this), the interesting new BBC quiz, with he-was-quite-good-on-the-exit-list, Matt "are you" Allwright, and she-wants-to-be-Richard-Osman-well-I-mean-who-on-earth-wouldn't Lesley-Anne "will probably read this" Brewis.
Our thoughts on the show: It's pretty good. But today on Sata we're not going to talk about The Code as Television, but as The Code maths.

(And I promise that's the last time I do that joke.)

A warning: some of the following will be a bit mathsy, and maybe not accessible to those without a bit of intuition for these things. However, I've tried to keep the amount of 'technical' language to a minimum (readers who have actually studied maths may even find this annoying) and generally present as much of this as possible in an easy-to-follow way for a non-mathsy audience.

My reasons for doing this are twofold: it makes a (hopefully) more interesting read, and it means you are more likely to believe what I am saying (and this is good because what I am saying is true).

There are going to be two parts to this analysis. In the first I'll focus on the small scale: the sets of 3 questions that appear on The Code. My main aim is to produce a formula for the chance of surviving a set (which we'll need for part 2), and along the way I'll show (surprisingly) that Round 2 is no harder than Round 1. Then in the second, I'll talk about the game as a whole, the lengths of the rounds, and a team's overall chance of victory.

I'm not going to recite The Code's rules; one can watch an episode or check the comments here. I'll refer to the sets of 3 questions and 3 answers as "sets", and the time spent getting each number of the code as the 1st, 2nd and 3rd "round".

For the sake of reasonable maths, I have made the following assumptions, some of which may be dubious.

1. Aside from the fact that "you don't see all the questions", the sets don't get more difficult from round to round. (NB: this is confirmed untrue)
2. Identifying a correct q/a pair is equally difficult to ruling out an incorrect pair.
3. For each question, "you either know it or you don't". There are no hunches: if you don't know 2 questions (or 3), you have a 1/2 (or 1/3) chance of picking the right one - there is no advantage to comparing two uncertain questions.
4. On round 3, there is no advantage to "picking the order" - that is, the contestants are no more likely to find a question which they know first in the running than last.

Under these assumptions, I would like to present the claim that Round 2 is no harder than Round 1 (and Round 3 only slightly harder). We start with a wordsy argument to that effect:

If you know the correct question/answer pair (regardless of what else you know), you will certainly clear a set in R1. In R2 you'll either see the correct pair immediately, or discard a wrong pair and then see it. So in this case R2=R1.
If you don't know the correct pair, but do know that both of the other pairs are wrong, then by elimination you're also certain to clear a set in R1, and again in R2 - you'll discard a wrong pair, then be left with a wrong pair and a mystery pair (which you deduce to be correct).
Suppose you know one of the wrong pairs, but nothing else. In R1 you have a 1/2 chance of guessing correctly. In R2, if you see the known wrong pair you eliminate it, leaving a 1/2 guess; if you don't see it, you have a 1/2 guess at which pair to eliminate, but then a safe choice between the remaining two options if you guessed correctly.
Finally, suppose you know nothing about any of the pairs. Then you have a 1/3 chance of guessing correctly in either round. Hence, in all cases R1=R2.
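
The two guessing cases can be written out as short sums. This is only a sketch, assuming that the pair hidden at the start of R2 is equally likely to be any of the three and that blind guesses are uniform:

```latex
\begin{align*}
\text{one wrong pair known:}\quad
P(\text{R2}) &= \underbrace{\tfrac{2}{3}}_{\text{known-wrong pair on show}} \cdot \tfrac{1}{2}
  \;+\; \underbrace{\tfrac{1}{3}}_{\text{known-wrong pair hidden}} \cdot \tfrac{1}{2} \cdot 1
  \;=\; \tfrac{1}{2} \;=\; P(\text{R1}) \\[4pt]
\text{nothing known:}\quad
P(\text{R2}) &= \underbrace{\tfrac{2}{3}}_{\text{discarded pair is wrong}} \cdot \tfrac{1}{2}
  \;=\; \tfrac{1}{3} \;=\; P(\text{R1})
\end{align*}
```
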
Next, we'll briefly consider the best tactics for Round 3. When faced with the first pair in Round 3, if the players have no idea whether it is right or wrong, we say they should reject it and move along. The best argument for this is that if the contestants take the answer they have a 1/3 chance of being correct, whereas if they reject it they have a 2/3 chance of surviving, then a better-than-1/2 chance of either knowing or guessing the remaining pairs, making a better-than-1/3 chance overall. If they find they don't know the second pair either, it then doesn't matter what they do - they have a 1/2 chance whatever.
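
To put numbers on "better than 1/2", here is a rough worked version. It assumes the contestant knows any given pair with probability p (the symbol used later in this post), and that after rejecting the first pair they take the second only if they know it is right, otherwise carrying on to the third:

```latex
\begin{align*}
P(\text{win} \mid \text{accept unknown pair 1}) &= \tfrac{1}{3} \\[4pt]
P(\text{win} \mid \text{reject unknown pair 1})
  &= \underbrace{\tfrac{2}{3}}_{\text{pair 1 was wrong}}
     \Bigl( \underbrace{\tfrac{1}{2}\, p}_{\text{right pair is 2nd, and known}}
          + \underbrace{\tfrac{1}{2}}_{\text{right pair is 3rd}} \Bigr)
   = \tfrac{2}{3} \cdot \tfrac{1+p}{2}
   = \tfrac{1+p}{3} \;\ge\; \tfrac{1}{3}
\end{align*}
```

The bracket is the "better-than-1/2" chance, (1+p)/2, and the two options only tie when p = 0.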

To justify these two arguments more carefully, consider the following work-through of all 24 combinations of question order and knowledge. This will also give us the formula for the chances of getting a set correct in each round, which we'll need later on.

Since we might as well (and it lines up with optimal tactics for R3), we'll number the questions in each set 1,2,3 based on the order they are opened, and assume that if contestants are reduced to guessing, they'll always reject the earliest question and accept the latest question that they don't know about.

We use an x to represent a wrong pair, and a tick (ok, a square root symbol) to represent the right pair. We use brackets (e.g. (x)) to mean the contestants know what the pair is, and the lack of brackets to mean "no idea". These six cases are obvious: if the contestants know everything, they always clear the set, and if they know nothing they need to hope they guess correctly.

Cases 7 through 12 cover all the cases where the contestant has partial information, but will still guess correctly. In these cases all 3 rounds look the same, because luck favours the contestant.

In cases 15 and 18, being unable to open the last question means the R3 contestant fails, whereas the R1 and R2 contestants deduce (or guess, in the case of 15) the correct answer. This is the first time R3 looks harder than R1 or R2.
Notice in case 15 that the R2 contestant only survives to see the last pair by luck - guessing to reject Q1 rather than Q2. However, the R1 contestant only picks Q2 over Q1 by luck anyway.

In cases 20, 21 and 23 the R3 contestant's tactics cause them to reject the correct pair from the start. However, in 20 and 21 the R1 and R2 contestants guess wrongly anyway. R3 does, however, see a disadvantage when they would have known both the following pairs were wrong (19).

Notice that in all the cases, R1=R2. It seems R3 is a bit more difficult than the others, though: there are 3 more cases (out of 24) where R3 loses, although the 24 cases aren't all equally likely.
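
For anyone who'd rather check the 24 cases by machine, here is a minimal Python sketch. It is my own model of the rules as described in this post, not anything official: the names `view`, `choose`, `play_r1`, `play_r2` and `play_r3` are made up for illustration, pairs are numbered 0, 1, 2 in opening order, R2 is assumed to show pairs 0 and 1 with pair 2 appearing after a discard, and guesses follow the "reject the earliest, accept the latest unknown" rule.

```python
from itertools import product

def view(correct, known):
    """What the contestant believes about each pair: 'right', 'wrong' or None (no idea)."""
    return [("right" if i == correct else "wrong") if known[i] else None
            for i in range(3)]

def choose(v, candidates):
    """Take a pair known to be right; failing that, the latest pair we have no idea about."""
    for i in candidates:
        if v[i] == "right":
            return i
    return [i for i in candidates if v[i] is None][-1]

def play_r1(v):
    # Round 1: all three pairs are on show.
    return choose(v, [0, 1, 2])

def play_r2(v):
    # Round 2 (as modelled here): pairs 0 and 1 are on show, pair 2 appears after a discard.
    for i in (0, 1):
        if v[i] == "right":
            return i
    known_wrong = [i for i in (0, 1) if v[i] == "wrong"]
    discard = known_wrong[0] if known_wrong else 0  # otherwise reject the earliest unknown pair
    return choose(v, [i for i in (0, 1, 2) if i != discard])

def play_r3(v):
    # Round 3: one pair at a time, no going back; take the last unless an earlier one is known right.
    for i in (0, 1):
        if v[i] == "right":
            return i
    return 2

# 3 positions for the right pair x 2^3 patterns of knowledge = 24 cases.
rows = []
for correct, known in product(range(3), product([False, True], repeat=3)):
    v = view(correct, known)
    rows.append(tuple(play(v) == correct for play in (play_r1, play_r2, play_r3)))

print(len(rows))                              # number of cases
print(all(r1 == r2 for r1, r2, _ in rows))    # does R1 match R2 in every case?
print(sum(not r1 for r1, _, _ in rows))       # cases lost in R1 (and R2)
print(sum(not r3 for _, _, r3 in rows))       # cases lost in R3
```

Under this model the script reports 24 cases, agreement between R1 and R2 in every one, and 5 losing cases for R1/R2 against 8 for R3, which is the 3-case gap. The case order it uses is arbitrary and won't match the numbering in the table.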

Now we have to break out more proper maths to assess the likelihood of each of these 24 cases. In the diagram below I've written the probability of each state, based on a contestant knowing a proportion p of all the questions in the show's repository. We use this to calculate a total probability for surviving a single set, as a function of p.

We also include the odds of survival if the contestant guesses randomly rather than always going for the last question. One can check this gives the same total in each case. (*The R3 'reject the first one if unknown' tactic is still necessarily used, leading to some of the strangeness in this column.)

We use our table above to calculate a total probability for surviving a set, as a function of p.
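
The same totals can be sketched in a few lines of Python. This is my own reconstruction under the assumptions listed earlier, not a copy of the table. It assumes the right pair is equally likely to be in each position, so each case gets weight (1/3) × p for every known pair × (1 - p) for every unknown pair. It also assumes an R1/R2 contestant clears the set exactly when the right pair is known or is the latest unknown pair, and an R3 contestant exactly when it is known or comes last. The function name `survival` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def survival(p):
    """Return (chance of clearing a set in R1/R2, chance in R3) for knowledge rate p."""
    p = Fraction(p)
    early = late = Fraction(0)  # early = Rounds 1 and 2, late = Round 3
    for correct, known in product(range(3), product([False, True], repeat=3)):
        # Weight of this case: position of the right pair, then each pair known or not.
        weight = Fraction(1, 3)
        for k in known:
            weight *= p if k else 1 - p
        last_unknown = max((i for i in range(3) if not known[i]), default=None)
        if known[correct] or correct == last_unknown:
            early += weight
        if known[correct] or correct == 2:
            late += weight
    return early, late

for p in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)):
    early, late = survival(p)
    print(p, early, late)
```

Working the sum through by hand, I get (1 + 3p - p³)/3 for R1 and R2, and (1 + 2p)/3 for R3; treat those closed forms as my own algebra rather than gospel. At p = 1/2 they give 19/24 and 2/3 respectively.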

Ok, so, what does this mean in terms of real numbers? Here are your chances of clearing a set in a certain round as a function of your question rate, p.

"Oh, that's surprising!" says the mathematician in the audience. "The R3 one looks suspiciously like a straight line to me." Indeed it does, and upon some algebra, we can confirm:

P(clearing a set in R3) = 1/3 + (2/3)p

Is there a good explanation for this? Yes, and in fact we have one. Suppose the R3 player commits to taking the last question if they don't spot one they know is correct among the first 2. With probability 1/3 the correct pair lurks at the end, and the player is going to pick it regardless. Otherwise, with probability 2/3, the contestant needs to notice the correct question when it appears - they do so with probability p.
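
If you don't trust the algebra, a quick simulation makes a reasonable sanity check. The sketch below plays the committed R3 strategy many times, assuming the right pair is equally likely to be in each position and each pair is known independently with probability p. The function name `simulate_r3`, the number of trials and the seed are arbitrary choices of mine.

```python
import random

def simulate_r3(p, trials=200_000, seed=2016):
    """Estimate the chance of clearing an R3 set by playing it many times."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        correct = rng.randrange(3)                    # where the right pair sits
        known = [rng.random() < p for _ in range(3)]  # which pairs the contestant knows
        # Take pair 0 or 1 only if it is known and right; otherwise end up on pair 2.
        picked = next((i for i in (0, 1) if known[i] and i == correct), 2)
        wins += picked == correct
    return wins / trials

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, round(simulate_r3(p), 3), round(1 / 3 + 2 * p / 3, 3))
```

The second column is the simulated survival rate and the third is the straight line 1/3 + (2/3)p. With this many trials the two should agree to about two decimal places.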

Next time, we'll look at the overall game, including the argument that guessing the correct numbers is not always good for you.

----Part 2 ----