Cryptanalysis Using n-Gram Probabilities
May 1st, 2010 | Published in Uncategorized
One of my favorite programmers is Peter Norvig who is currently Director of Research at Google. This summer I picked up a book called Beautiful Data in which Norvig contributed a chapter called “Natural Language Corpus Data” in which he outlined a number of very cool things you can do with n-grams in the google corpus. It covers some of the things you’d imagine that it would cover: spelling correction, word segmentation, etc. The one item covered that I had never considered was in the area of cryptanalysis.
The cool thing is that Google will give you their corpus to download. The only problem is that it contains “1,024,908,267,229 words of running text” and is 24 GB compressed in size. This is a bit impractical to run on your dev box. Enter Microsoft – the Microsoft Web N-gram Service just went Beta and is now available to Professors and Students so I immediately signed up and I have to say that it pretty cool!
So I wanted to try out the new service using one of Norvig’s examples in his book – specifically using n-gram probabilities and character shifting. This is a very simple example and fairly basic type of encryption where the if the user types an ‘a’ it gets shifted to ‘n’ or whatever. So you simply run through all 26 possibilities and use the individual words combined probabilities to determine the answer to the encoded message.
This project has a Service Refrence connected to Microsoft’s n-Gram Service. The service requires an n-gram model and a user id which you get when you sign up (see their quickstart tutorial). So let’s take a look at some code:
using System; using System.Collections.Generic; using System.Configuration; using System.Linq; using MicrosoftNGramTest.NGramService; namespace MicrosoftNGramTest.classes { internal class Shift { #region Variables private readonly string _alphabet = "abcdefghijklmnopqrstuvwxyz"; private readonly string _ngramModel = ConfigurationManager.AppSettings.Get("ngramModel"); private readonly string _userToken = ConfigurationManager.AppSettings.Get("userToken"); #endregion #region Run The Test /// <summary> /// Runs the test /// </summary> public void Test() { // Print title Console.WriteLine("Character Shift Cryptanalysis"); Console.WriteLine("#############################"); // Local Variables const string phrase = "Yvfgra, qb lbh jnag gb xabj n frperg?"; string[] words = phrase.ToLower().Split(' '); var newPhrase = new string[26]; var client = new LookupServiceClient(); var result = new Dictionary<string, int>(); try { // Loop the word variations foreach (string s in words) { char[] currentWord = s.ToCharArray(); foreach (char c in currentWord) { for (int i = 0; i < 26; i++) { newPhrase[i] += CharShift(c, i); } } for (int i = 0; i < newPhrase.Count(); i++) { newPhrase[i] += " "; } } // Print phrases with probabilities foreach (string s in newPhrase) { string[] newWords = s.Split(' '); double prob = 0; foreach (string word in newWords) { prob += client.GetProbability(_userToken, _ngramModel, word); } Console.WriteLine(s + " " + Convert.ToInt32(prob)); result.Add(s, Convert.ToInt32(prob)); } // Print answer Console.WriteLine(); Console.WriteLine("The answer is:"); KeyValuePair<string, int> q = (from t in result orderby t.Value descending select t).FirstOrDefault(); Console.WriteLine(q.Key + " " + q.Value); } finally { client.Close(); } } #endregion #region Shifting /// <summary> /// Gets the alphabet array. /// </summary> /// <returns></returns> private char[] GetAlphabetArray() { return _alphabet.ToCharArray(); } /// <summary> /// Gets the current char array position. /// </summary> /// <param name="c">The c.</param> /// <returns></returns> private int GetCurrentCharArrayPosition(char c) { int position = 0; int count = 0; foreach (char letter in GetAlphabetArray()) { if (letter == c) { position = count; } count++; } return position; } /// <summary> /// Shifts the character. /// </summary> /// <param name="c">The c.</param> /// <param name="increase">The increase.</param> /// <returns></returns> private char CharShift(char c, int increase) { const int numOfLetters = 26; char[] alphabet = GetAlphabetArray(); int currentArrayPosition = GetCurrentCharArrayPosition(c); char letter = c; if (IsCharInArray(c)) { if ((currentArrayPosition + increase) < numOfLetters) { letter = alphabet[currentArrayPosition + increase]; } else { int newPosition = (currentArrayPosition + increase) - numOfLetters; letter = alphabet[newPosition]; } } return letter; } /// <summary> /// Determines whether the char is in the array. /// </summary> /// <param name="c">The c.</param> /// <returns> /// <c>true</c> if [is char in array] [the specified c]; otherwise, <c>false</c>. /// </returns> private bool IsCharInArray(char c) { bool isCharInArray = false; IEnumerable<char> q = (from t in GetAlphabetArray() where t == c select t); if (q.Count() > 0) { isCharInArray = true; } return isCharInArray; } #endregion } }
And here is the result!
Related Posts
Starting a Ph.D. in Computer ScienceMonte Carlo Simulations in C#
Apriori Algorithm
Benford’s Law and Trailing Digit Tests
K-Means Document Clustering