<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>eric.ness.net &#187; Natural Language Processing</title>
	<atom:link href="http://eric.ness.net/archives/tag/natural-language-processing/feed/" rel="self" type="application/rss+xml" />
	<link>http://eric.ness.net</link>
	<description>...I never learned to read.</description>
	<lastBuildDate>Sat, 21 Jan 2012 05:27:48 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Cryptanalysis Using n-Gram Probabilities</title>
		<link>http://eric.ness.net/archives/cryptanalysis-using-n-gram-probabilities/</link>
		<comments>http://eric.ness.net/archives/cryptanalysis-using-n-gram-probabilities/#comments</comments>
		<pubDate>Sat, 01 May 2010 09:35:31 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Natural Language Processing]]></category>

		<guid isPermaLink="false">http://eric.ness.net/?p=472</guid>
		<description><![CDATA[Cryptanalysis Using Microsoft Web N-Gram Service]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone" src="/wp-content/uploads/2010/05/cryptanalysis.jpg" alt="" width="577" height="360" /></p>
<p>One of my favorite programmers is <a href="http://norvig.com/">Peter Norvig</a>, who is currently Director of Research at Google. This summer I picked up a book called <a href="http://oreilly.com/catalog/9780596157128">Beautiful Data</a>, to which Norvig contributed a chapter called &#8220;Natural Language Corpus Data&#8221;. In it he outlines a number of very cool things you can do with n-grams from the Google corpus. It covers some of the things you&#8217;d expect: spelling correction, word segmentation, etc. The one application I had never considered was cryptanalysis.</p>
<p>The cool thing is that Google will give you their corpus to <a href="http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html">download</a>. The only problem is that it contains &#8220;1,024,908,267,229 words of running text&#8221; and is 24 GB compressed. This is a bit impractical to run on your dev box. Enter Microsoft &#8211; the <a href="http://web-ngram.research.microsoft.com/info/">Microsoft Web N-gram Service</a> just went into beta and is now available to professors and students, so I immediately signed up, and I have to say that it is pretty cool!</p>
<p>So I wanted to try out the new service using one of Norvig&#8217;s examples from the book &#8211; specifically, using n-gram probabilities to break a character-shift cipher. This is a very simple and fairly basic type of encryption in which each letter is shifted a fixed number of places along the alphabet, so that with a shift of 13, for example, an &#8216;a&#8217; becomes an &#8216;n&#8217;. To decode a message you simply run through all 26 possible shifts and use the combined probabilities of the individual words to pick the most likely candidate.</p>
<p>This project has a Service Reference connected to <a href="http://web-ngram.research.microsoft.com/info/">Microsoft&#8217;s n-Gram Service</a>. The service requires an n-gram model and a user ID, which you get when you sign up (<a href="http://web-ngram.research.microsoft.com/info/quickstart.htm">see their quickstart tutorial</a>). So let&#8217;s take a look at some code:</p>
<pre class="brush: jscript; title: ; notranslate">
using System;
using System.Collections.Generic;
using System.Configuration;
using System.Linq;
using MicrosoftNGramTest.NGramService;

namespace MicrosoftNGramTest.classes
{
    internal class Shift
    {
        #region Variables

        private readonly string _alphabet = &quot;abcdefghijklmnopqrstuvwxyz&quot;;
        private readonly string _ngramModel = ConfigurationManager.AppSettings.Get(&quot;ngramModel&quot;);
        private readonly string _userToken = ConfigurationManager.AppSettings.Get(&quot;userToken&quot;);

        #endregion

        #region Run The Test

        /// &lt;summary&gt;
        /// Runs the test
        /// &lt;/summary&gt;
        public void Test()
        {
            // Print title
            Console.WriteLine(&quot;Character Shift Cryptanalysis&quot;);
            Console.WriteLine(&quot;#############################&quot;);

            // Local Variables
            const string phrase = &quot;Yvfgra, qb lbh jnag gb xabj n frperg?&quot;;
            string[] words = phrase.ToLower().Split(' ');
            var newPhrase = new string[26];
            var client = new LookupServiceClient();
            var result = new Dictionary&lt;string, int&gt;();

            try
            {
                // Loop the word variations
                foreach (string s in words)
                {
                    char[] currentWord = s.ToCharArray();

                    foreach (char c in currentWord)
                    {
                        for (int i = 0; i &lt; 26; i++)
                        {
                            newPhrase[i] += CharShift(c, i);
                        }
                    }

                    for (int i = 0; i &lt; newPhrase.Count(); i++)
                    {
                        newPhrase[i] += &quot; &quot;;
                    }
                }

                // Print phrases with probabilities
                foreach (string s in newPhrase)
                {
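                    // Drop the empty entry left by the trailing space so it is not sent to the service
                    string[] newWords = s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);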
                    double prob = 0;
                    foreach (string word in newWords)
                    {
                        prob += client.GetProbability(_userToken, _ngramModel, word);
                    }
                    Console.WriteLine(s + &quot; &quot; + Convert.ToInt32(prob));
                    result.Add(s, Convert.ToInt32(prob));
                }

                // Print answer
                Console.WriteLine();
                Console.WriteLine(&quot;The answer is:&quot;);
                KeyValuePair&lt;string, int&gt; q = (from t in result
                                               orderby t.Value descending
                                               select t).FirstOrDefault();
                Console.WriteLine(q.Key + &quot; &quot; + q.Value);
            }
            finally
            {
                client.Close();
            }
        }

        #endregion

        #region Shifting

        /// &lt;summary&gt;
        /// Gets the alphabet array.
        /// &lt;/summary&gt;
        /// &lt;returns&gt;&lt;/returns&gt;
        private char[] GetAlphabetArray()
        {
            return _alphabet.ToCharArray();
        }

        /// &lt;summary&gt;
        /// Gets the current char array position.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;c&quot;&gt;The c.&lt;/param&gt;
        /// &lt;returns&gt;&lt;/returns&gt;
        private int GetCurrentCharArrayPosition(char c)
        {
            int position = 0;
            int count = 0;

            foreach (char letter in GetAlphabetArray())
            {
                if (letter == c)
                {
                    position = count;
                }
                count++;
            }
            return position;
        }

        /// &lt;summary&gt;
        /// Shifts the character.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;c&quot;&gt;The c.&lt;/param&gt;
        /// &lt;param name=&quot;increase&quot;&gt;The increase.&lt;/param&gt;
        /// &lt;returns&gt;&lt;/returns&gt;
        private char CharShift(char c, int increase)
        {
            const int numOfLetters = 26;
            char[] alphabet = GetAlphabetArray();
            int currentArrayPosition = GetCurrentCharArrayPosition(c);
            char letter = c;

            if (IsCharInArray(c))
            {
                // Wrap around the end of the alphabet with a modulo
                letter = alphabet[(currentArrayPosition + increase) % numOfLetters];
            }
            return letter;
        }

        /// &lt;summary&gt;
        /// Determines whether the char is in the array.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;c&quot;&gt;The c.&lt;/param&gt;
        /// &lt;returns&gt;
        /// 	&lt;c&gt;true&lt;/c&gt; if [is char in array] [the specified c]; otherwise, &lt;c&gt;false&lt;/c&gt;.
        /// &lt;/returns&gt;
        private bool IsCharInArray(char c)
        {
            return _alphabet.IndexOf(c) &gt;= 0;
        }

        #endregion
    }
}
</pre>
<p>And here is the result!<br />
<img src="/wp-content/uploads/2010/05/crypt_results.jpg" alt="Results" width="577" /></p>
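<p>For anyone without access to the service, the same idea can be sketched with a pluggable scoring function. This is only a minimal sketch and not the code that produced the screenshot above: <code>ShiftSketch</code>, <code>ShiftText</code>, <code>Decode</code> and <code>toyScore</code> are names made up for illustration, and the toy scorer (a small word list with arbitrary penalties) is an assumption standing in for real n-gram probabilities.</p>
<pre class="brush: jscript; title: ; notranslate">
using System;
using System.Collections.Generic;
using System.Linq;

internal static class ShiftSketch
{
    // Shifts lowercase a-z by the given amount, wrapping around; other characters pass through.
    public static string ShiftText(string text, int shift)
    {
        var chars = text.Select(c =&gt;
            c &gt;= 'a' &amp;&amp; c &lt;= 'z' ? (char)('a' + (c - 'a' + shift) % 26) : c);
        return new string(chars.ToArray());
    }

    // Tries all 26 shifts and returns the candidate with the highest total word score.
    public static string Decode(string cipherText, Func&lt;string, double&gt; scoreWord)
    {
        string best = null;
        double bestScore = double.NegativeInfinity;
        for (int shift = 0; shift &lt; 26; shift++)
        {
            string candidate = ShiftText(cipherText.ToLower(), shift);
            // Strip basic punctuation before scoring each word
            double score = candidate
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Sum(w =&gt; scoreWord(w.Trim(',', '?', '.', '!')));
            if (score &gt; bestScore)
            {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }

    public static void Main()
    {
        // Toy scorer (an assumption): known words get a mild penalty, unknown words a heavy one.
        var known = new HashSet&lt;string&gt; { &quot;listen&quot;, &quot;do&quot;, &quot;you&quot;, &quot;want&quot;, &quot;to&quot;, &quot;know&quot;, &quot;a&quot;, &quot;secret&quot; };
        Func&lt;string, double&gt; toyScore = w =&gt; known.Contains(w) ? -3.0 : -12.0;

        // Prints: listen, do you want to know a secret?
        Console.WriteLine(Decode(&quot;Yvfgra, qb lbh jnag gb xabj n frperg?&quot;, toyScore));
    }
}
</pre>
<p>Passing the scorer in as a delegate keeps the shifting logic testable on its own; a call to the n-gram service could be swapped in for <code>toyScore</code> without touching <code>Decode</code>.</p>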
]]></content:encoded>
			<wfw:commentRss>http://eric.ness.net/archives/cryptanalysis-using-n-gram-probabilities/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Empirical Methods in Natural Language Processing Course</title>
		<link>http://eric.ness.net/archives/empirical-methods-in-natural-language-processing-course/</link>
		<comments>http://eric.ness.net/archives/empirical-methods-in-natural-language-processing-course/#comments</comments>
		<pubDate>Sun, 12 Apr 2009 19:05:11 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Natural Language Processing]]></category>

		<guid isPermaLink="false">http://eric.ness.net/?p=170</guid>
		<description><![CDATA[I came across this course in Natural Language Processing today while doing some research.]]></description>
			<content:encoded><![CDATA[<p><a href="http://eric.ness.net/wp-content/uploads/2009/04/nlpcourse.jpg"><img class="alignnone size-full wp-image-171" title="nlpcourse" src="http://eric.ness.net/wp-content/uploads/2009/04/nlpcourse.jpg" alt="" width="577" height="360" /></a></p>
<p>I came across this course in Natural Language Processing today while doing some research. It is currently being taught at The University of Edinburgh School of Informatics.</p>
<blockquote><p>This course is an introduction to data-driven methods applied to natural language processing. The emphasis is on methods, but we will survey applications such as syntactic parsing, text classification, information extraction, tagging, summarization. The final lectures will deal with statistical machine translation.</p></blockquote>
<p>See the lecture notes <a href="http://www.inf.ed.ac.uk/teaching/courses/emnlp/">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://eric.ness.net/archives/empirical-methods-in-natural-language-processing-course/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Contextual Valence Shifting</title>
		<link>http://eric.ness.net/archives/contextual-valence-shifting/</link>
		<comments>http://eric.ness.net/archives/contextual-valence-shifting/#comments</comments>
		<pubDate>Tue, 24 Mar 2009 20:45:41 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Natural Language Processing]]></category>

		<guid isPermaLink="false">http://eric.ness.net/?p=154</guid>
		<description><![CDATA[I have been reading up on Contextual Valence Shifting and I came across two interesting papers I thought I would share.]]></description>
			<content:encoded><![CDATA[<p><a href="http://eric.ness.net/wp-content/uploads/2009/03/contextualvalenceshifting.jpg"><img class="alignnone size-full wp-image-156" title="contextualvalenceshifting" src="http://eric.ness.net/wp-content/uploads/2009/03/contextualvalenceshifting.jpg" alt="" width="577" height="360" /></a></p>
<p>I have been reading up on Contextual Valence Shifting and I came across two interesting papers I thought I would share.</p>
<p>In a nutshell, valence shifting helps determine whether a given sentence has a positive or negative tone.</p>
<blockquote><p>In addition to describing facts and events, texts often communicate information about the attitude of the writer or various participants towards an event being described. Salient clues about attitude are provided by the lexical choice of the writer but, as discussed below, the organization of the text also contributes critical information for attitude assessment.</p></blockquote>
<p><a href="http://www.aaai.org/Papers/Symposia/Spring/2004/SS-04-07/SS04-07-020.pdf">Contextual Valence Shifters</a> by Livia Polanyi and Annie Zaenen [pdf]<br />
<a href="http://www.tacoma.washington.edu/tech/docs/research/gradresearch/ldillard.pdf">&#8220;I Can&#8217;t Recommend This Paper Highly Enough&#8221;: Valence-Shifted Sentences in Sentiment Classification</a> by Logan Dillard</p>
]]></content:encoded>
			<wfw:commentRss>http://eric.ness.net/archives/contextual-valence-shifting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

