# Decrypting substitution ciphers using MCMC

Table of Contents

My obsession with Cryptography and MCMC algorithms led me across several articles regarding applications of MCMC to decrypt different kinds of cyphers [1]. Among these, substitution cyphers are the simplest. A substitution cypher is basically every symbol in some arbitrary text (alphabets, numerals, punctuation marks, blank spaces) mapped bijectively over a set of arbitrary symbols. This set can be the same as the domain set or can be different. If the sets are same, the text would look scrambled. Depending on the cardinality C of the domain set, the actual key can be any of the C! combinations. Finding a key by brute force on a computer, would still require some sort of evaluation function to check whether the key is correct.

As you could have guessed, these challenges are insurmountable if one goes about doing it naively. Statistics comes to our rescue in almost a magical way. Here is a Shiny application I built, that demonstrates decryption of a demo text using a MCMC algorithm [2]. This app could also be used to decrypt your own text, however you would need to make sure the text is more than \(1500\) words to have sufficient sample size (Weak law of large numbers, blah blah…) and uses only letters, numbers and commonly used punctuations. Also decryption with reasonable accuracy takes 15K iterations at minimum, so you may need to be patient. To know more about how this algorithm works, keep reading.

#### Shiny Application

If the embedded application below is invisible, you may try this link.

#### Idea and framework

Let’s define the true key as a bijective function \(f\) : {code space} \(\rightarrow\) {usual alphabet} and the space of all possible keys as \(\mathcal{F}\) such that \(f \in \mathcal{F}\). Here I use a very standard approach [3, 4] to decrypting by using statistics of written English to guess at probable choices for \(f\).

To get the statistics, I downloaded a pdf of “War and Peace” and recorded the first order transitions i.e. the proportion of consecutive text symbols from one character to the consecutive next one. This gives a matrix \(M(x, y)\) of

transitions for all possible pairs in the alphabet space. To define an evaluation criteria, I use plausibility of \(f\) being the right key as:

$$Pl(f) = \prod_{i\in \mathcal{S}}M (f(s_i), f(s_i+1))$$

where \(s_i\) runs over the space of consecutive symbols \((\mathcal{S})\) in the encrypted text. Functions \(f\) which have high values of \(Pl(f)\)

are good candidates for decryption. Space \(\mathcal{F}\) can be searched for \(f’s\) that maximizes \(Pl(f)\) iteratively by running the following Markov chain Monte Carlo (MCMC) algorithm:

#### Algorithm

- Start with a preliminary guess, say \(f\).
- Compute \(Pl(f)\).
- Change to \(f^*\) by making a random transposition of the values \(f\) assigns to two symbols.
- Compute \(Pl(f^∗)\); if this is larger than \(Pl(f)\), accept \(f^∗\).
- If not, flip a \(Pl(f^∗)/Pl(f)\) coin; if it comes up heads, accept \(f^∗\).
- If the coin toss comes up tails, stay at \(f\).
- Keep repeating steps 1 to 6 until \(Pl(f)\) stops increasing.

The algorithm tries to improve the current \(f\) by making random transpositions. The coin tosses allow it to go to less plausible \(f’s\), and keep it from getting stuck in local maxima.

#### References

```
@book{stinson2005cryptography,
title={Cryptography: theory and practice},
author={Stinson, Douglas R},
year={2005},
publisher={CRC press}
}
```

[Bibtex]

```
@article{chen2012decrypting,
title={Decrypting classical cipher text using Markov chain Monte Carlo},
author={Chen, Jian and Rosenthal, Jeffrey S},
journal={Statistics and Computing},
volume={22},
number={2},
pages={397--413},
year={2012},
publisher={Springer}
}
```

[Bibtex]

```
@article{diaconis2009markov,
title={The markov chain monte carlo revolution},
author={Diaconis, Persi},
journal={Bulletin of the American Mathematical Society},
volume={46},
number={2},
pages={179--205},
year={2009}
}
```

[Bibtex]

```
@article{conner2003simulation,
title={Simulation and solving substitution codes},
author={Conner, S},
journal={Master's thesis, Department of Statistics, University of Warwick},
year={2003}
}
```