Breaking Codes with Excel (Part 1)

Breaking Codes with Excel (Part 1)

As a young teenager (many decades ago, and long before the internet of anything) one of ways I'd while away a dull rainy afternoon with friends would be to try to break their "secret codes" (and they mine). We'd each invent a secret symbol or new letter to represent each letter of the alphabet, and use these to re-write (encode) a private message. The message was often a paragraph or two from a book or newspaper. We'd pass on the encoded messages to each other, but keep the details of the code we used to ourselves.

The challenge was then to figure out what the message originally said...

---

This may sound impossible, but it's merely difficult. The English language has patterns and statistics which aren't altered by this simple letter or symbol-replacement method (more correctly known as a substitution cipher), and we can use this to our advantage.

To solve one of these ciphers, you start by listing each symbol in a table and counting how many times it occurs. The more frequently occurring symbols in the message are likely to be one of the more frequently occurring letters in the English language - such as E,T or A. Getting started then involves a little guesswork and a lot of trial and error. You might try replacing that square symbol with an E, and the circle symbol with a T everywhere in your message. If you're lucky, you might then see a couple of 3-letter words like T ? E which might be THE. You might also see some 1-letter words, which are very likely to be an I or an A, which might lead you to find words like IS, AS, IN, etc. And if you see a word ending in IN? then it might suggest the word ends with ING. And so on.

Usually however, you make lots of wrong guesses, and have to backtrack. A pencil and eraser were once indispensable tools for a young code-breaker.

I only mention the above because I was recently challenged to break a code like this by someone on Facebook. I haven't done anything like this for a long time, but it sounded like fun. It struck me that in 2020 you'd be crazy to do this with a pencil and paper, and so i set about building an Excel spreadsheet to do the grunt-work.

Here's the approach I took.

So let's assume I was given this message to decode....

The first thing I would need would be a table of all the symbols in the message, some space where I could write my guesses at the plain text equivalent, and a count of the number of times each letter occurs. Each of these things can be prepared manually, but it's useful to write some code (which I did) to perform the counting automatically.

The next thing I'd want is some workspace where I can view the original message and see my deciphered version develop. One simple way would be to split the message up into separate letters, and to use an HLOOKUP() function on the upper ciphertext rows to reveal plaintext letters in the row underneath. Again this can be done manually, but it's a lot easier to use code to generate this table to the right size, and to pre-populate it with letters from the given cipher.

You can see that I've guessed that G=e and I=t in the above as these are the most commonly occurring letters in the coded message, and in the English language. It's difficult to tell whether those guesses are right or wrong at the moment, but the lone P in the top row, strongly suggests it might be the letter 'a' or possibily the personal pronoun 'I', but of course it might just be a T cell or a B grade too. Life isn't always fair.

I didn't say this was easy, but at least we have a tool to take some of the more tedious work out of the decoding. Any guesses or changes we want to make are applied without effort.

To add a little finesse, I then added some sort buttons and code to the alphabet look-ups to allow them to be viewed either in alphabetical order (to help spot any obvious patterns), or indeed in letter frequency order (so we can more easily work with the most popular letters first), as shown here. I've also added a visual reminder (in italics) of the ordering of the most common to least common letters in English text as an aide-memoire.

If we're right about G=e, then perhaps we should consider which letters might follow this. Statistically, the most likely letter to follow an e is an r. (19.6% of the time) or an n (13.9%). So what follows G most often in the ciphertext?

That's what the next addition supports (and yes we're heavily into code now). Here are the 2-letter patterns (Bigrams) in the ciphertext, listed in descending frequency order. We can see that G is most often followed by E, so that if we're right about G=e, then there's a reasonable chance that E=r (or possibly E=n) too.

So you get the idea. I've also added similar analyses for 3 and 4 letter sequences, from which you may gather other ideas to try - for example is ICG=the? or is that too far down the order, when UCG (also ending in e) is right near the top - should we reconsider I=t?

So that's basically how you start, and how a spreadsheet (particularly with some useful code support) can help you on your way. With persistence and care, things should progress to the point where you can begin to make out words and grammar, at which point the problem begins to almost solve itself.

So you may be asking - where did I get all this data about bigram, trigram and quadgram frequencies? The answer is from Google computer scientist - Peter Norvig, who has  analysed gigabytes of Google Ngram data, produced extensive and detailed statistics on such things, and made it all available here - http://norvig.com/mayzner.html. This project wouldn't have been anywhere near as 'well informed' without his unwitting help.

So what's the decoded message (I hear you ask)? Well I'm going to keep you in suspense until my next blog article, when I consider whether all this manual experimentation and guesswork could be done, you know - automatically?

Kevin Pretorius

August 2020

Excel Wizardry
Excel Wizardry Logo
Working Magic with MS Excel

Excel Wizardry

The Laurels, Meadow Close,
Blackwater,
CAMBERLEY,
Surrey
GU17 9DB

Tel: 07766 492 991
Email: kevin@excel-wizardry.co.uk
We use cookies

We use cookies on our website. Some of them are essential for the operation of the site, while others help us to improve this site and the user experience (tracking cookies). You can decide for yourself whether you want to allow cookies or not. Please note that if you reject them, you may not be able to use all the functionalities of the site.