<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>eric.ness.net &#187; Programming</title>
	<atom:link href="http://eric.ness.net/archives/category/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://eric.ness.net</link>
	<description>...I never learned to read.</description>
	<lastBuildDate>Fri, 23 Jul 2010 05:22:06 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Cryptanalysis Using n-Gram Probabilities</title>
		<link>http://eric.ness.net/archives/cryptanalysis-using-n-gram-probabilities/</link>
		<comments>http://eric.ness.net/archives/cryptanalysis-using-n-gram-probabilities/#comments</comments>
		<pubDate>Sat, 01 May 2010 09:35:31 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Natural Language Processing]]></category>

		<guid isPermaLink="false">http://eric.ness.net/?p=472</guid>
		<description><![CDATA[Cryptanalysis Using Microsoft Web N-Gram Service]]></description>
			<content:encoded><![CDATA[


<p>One of my favorite programmers is <a href="http://norvig.com/">Peter Norvig</a>, who is currently Director of Research at Google. This summer I picked up a book called <a href="http://oreilly.com/catalog/9780596157128">Beautiful Data</a>, to which Norvig contributed a chapter called &#8220;Natural Language Corpus Data&#8221;. In it he outlines a number of very cool things you can do with n-grams from the Google corpus. It covers the things you&#8217;d expect, such as spelling correction and word segmentation. The one application I had never considered was cryptanalysis.</p>
<p>The cool thing is that Google will give you their corpus to <a href="http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html">download</a>. The only problem is that it contains &#8220;1,024,908,267,229 words of running text&#8221; and is 24 GB compressed, which is a bit impractical to run on your dev box. Enter Microsoft: the <a href="http://web-ngram.research.microsoft.com/info/">Microsoft Web N-gram Service</a> just went into beta and is now available to professors and students, so I immediately signed up, and I have to say that it is pretty cool!</p>
<p>I wanted to try out the new service using one of Norvig&#8217;s examples from the book, specifically breaking a character-shift cipher with n-gram probabilities. This is a very simple and fairly basic type of encryption in which every letter is shifted by a fixed number of places, so an &#8216;a&#8217; might become an &#8216;n&#8217;. To break it, you simply run through all 26 possible shifts and use the combined probabilities of the individual words in each candidate to pick the most likely plaintext.</p>
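<p>To make the brute-force step concrete, here is a minimal, self-contained sketch that only generates the 26 candidate plaintexts using modular arithmetic. The names ShiftSketch, ShiftChar and ShiftText are made up for illustration and are not part of the project, and the scoring step is left out because it depends on the n-gram service.</p>
<pre class="brush: jscript;">
using System;
using System.Linq;

internal static class ShiftSketch
{
    // Shifts a lowercase letter by the given number of places, wrapping past 'z'.
    // Any other character (spaces, punctuation) is returned unchanged.
    private static char ShiftChar(char c, int shift)
    {
        if (c &lt; 'a' || c &gt; 'z')
        {
            return c;
        }
        return (char)('a' + (c - 'a' + shift) % 26);
    }

    // Applies the same shift to every character of the text.
    private static string ShiftText(string text, int shift)
    {
        return new string(text.Select(c =&gt; ShiftChar(c, shift)).ToArray());
    }

    private static void Main()
    {
        const string cipherText = &quot;yvfgra, qb lbh jnag gb xabj n frperg?&quot;;

        // Print all 26 candidates; each one would then be scored word by word.
        for (int shift = 0; shift &lt; 26; shift++)
        {
            Console.WriteLine(&quot;{0,2}: {1}&quot;, shift, ShiftText(cipherText, shift));
        }
    }
}
</pre>
<p>For the phrase used in this post, the candidate at shift 13 reads &#8220;listen, do you want to know a secret?&#8221;.</p>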
<p>This project has a Service Reference connected to <a href="http://web-ngram.research.microsoft.com/info/">Microsoft&#8217;s n-Gram Service</a>. The service requires an n-gram model and a user id which you get when you sign up (<a href="http://web-ngram.research.microsoft.com/info/quickstart.htm">see their quickstart tutorial</a>). So let&#8217;s take a look at some code:</p>
<pre class="brush: jscript;">
using System;
using System.Collections.Generic;
using System.Configuration;
using System.Linq;
using MicrosoftNGramTest.NGramService;

namespace MicrosoftNGramTest.classes
{
    internal class Shift
    {
        #region Variables

        private readonly string _alphabet = &quot;abcdefghijklmnopqrstuvwxyz&quot;;
        private readonly string _ngramModel = ConfigurationManager.AppSettings.Get(&quot;ngramModel&quot;);
        private readonly string _userToken = ConfigurationManager.AppSettings.Get(&quot;userToken&quot;);

        #endregion

        #region Run The Test

        /// &lt;summary&gt;
        /// Runs the test
        /// &lt;/summary&gt;
        public void Test()
        {
            // Print title
            Console.WriteLine(&quot;Character Shift Cryptanalysis&quot;);
            Console.WriteLine(&quot;#############################&quot;);

            // Local Variables
            const string phrase = &quot;Yvfgra, qb lbh jnag gb xabj n frperg?&quot;;
            string[] words = phrase.ToLower().Split(' ');
            var newPhrase = new string[26];
            var client = new LookupServiceClient();
            var result = new Dictionary&lt;string, int&gt;();

            try
            {
                // Loop the word variations
                foreach (string s in words)
                {
                    char[] currentWord = s.ToCharArray();

                    foreach (char c in currentWord)
                    {
                        for (int i = 0; i &lt; 26; i++)
                        {
                            newPhrase[i] += CharShift(c, i);
                        }
                    }

                    for (int i = 0; i &lt; newPhrase.Count(); i++)
                    {
                        newPhrase[i] += &quot; &quot;;
                    }
                }

                // Print phrases with probabilities
                foreach (string s in newPhrase)
                {
                    string[] newWords = s.Split(' ');
                    double prob = 0;
                    foreach (string word in newWords)
                    {
                        prob += client.GetProbability(_userToken, _ngramModel, word);
                    }
                    Console.WriteLine(s + &quot; &quot; + Convert.ToInt32(prob));
                    result.Add(s, Convert.ToInt32(prob));
                }

                // Print answer
                Console.WriteLine();
                Console.WriteLine(&quot;The answer is:&quot;);
                KeyValuePair&lt;string, int&gt; q = (from t in result
                                               orderby t.Value descending
                                               select t).FirstOrDefault();
                Console.WriteLine(q.Key + &quot; &quot; + q.Value);
            }
            finally
            {
                client.Close();
            }
        }

        #endregion

        #region Shifting

        /// &lt;summary&gt;
        /// Gets the alphabet array.
        /// &lt;/summary&gt;
        /// &lt;returns&gt;&lt;/returns&gt;
        private char[] GetAlphabetArray()
        {
            return _alphabet.ToCharArray();
        }

        /// &lt;summary&gt;
        /// Gets the current char array position.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;c&quot;&gt;The c.&lt;/param&gt;
        /// &lt;returns&gt;&lt;/returns&gt;
        private int GetCurrentCharArrayPosition(char c)
        {
            int position = 0;
            int count = 0;

            foreach (char letter in GetAlphabetArray())
            {
                if (letter == c)
                {
                    position = count;
                }
                count++;
            }
            return position;
        }

        /// &lt;summary&gt;
        /// Shifts the character.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;c&quot;&gt;The c.&lt;/param&gt;
        /// &lt;param name=&quot;increase&quot;&gt;The increase.&lt;/param&gt;
        /// &lt;returns&gt;&lt;/returns&gt;
        private char CharShift(char c, int increase)
        {
            const int numOfLetters = 26;
            char[] alphabet = GetAlphabetArray();
            int currentArrayPosition = GetCurrentCharArrayPosition(c);
            char letter = c;

            if (IsCharInArray(c))
            {
                // Wrap around the end of the alphabet.
                letter = alphabet[(currentArrayPosition + increase) % numOfLetters];
            }
            return letter;
        }

        /// &lt;summary&gt;
        /// Determines whether the char is in the array.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;c&quot;&gt;The c.&lt;/param&gt;
        /// &lt;returns&gt;
        /// 	&lt;c&gt;true&lt;/c&gt; if [is char in array] [the specified c]; otherwise, &lt;c&gt;false&lt;/c&gt;.
        /// &lt;/returns&gt;
        private bool IsCharInArray(char c)
        {
            // IndexOf returns -1 when the character is not in the alphabet.
            return _alphabet.IndexOf(c) &gt;= 0;
        }

        #endregion
    }
}
</pre>
<p>And here is the result!<br />
<img src="/wp-content/uploads/2010/05/crypt_results.jpg" width="577" alt="Results" /></p>
]]></content:encoded>
			<wfw:commentRss>http://eric.ness.net/archives/cryptanalysis-using-n-gram-probabilities/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Apriori Algorithm</title>
		<link>http://eric.ness.net/archives/apriori-algorithm/</link>
		<comments>http://eric.ness.net/archives/apriori-algorithm/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 00:43:31 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://eric.ness.net/?p=445</guid>
		<description><![CDATA[Review of Apriori algorithm and changes.]]></description>
			<content:encoded><![CDATA[


<p>I&#8217;ve been meaning to get into the <a href="http://datamining.codeplex.com/">Data Mining SDK</a> on CodePlex for a while, as it has a couple of good items in it. The one I was really interested in was the <a href="http://en.wikipedia.org/wiki/Apriori_algorithm">Apriori algorithm</a>.</p>
<p>Wikipedia describes Apriori:</p>
<blockquote><p>In computer science and data mining, Apriori is a classic algorithm for learning association rules. Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of a website frequentation). Other algorithms are designed for finding association rules in data having no transactions (Winepi and Minepi), or having no timestamps (DNA sequencing).</p></blockquote>
<p>The classic example: if you own a store and someone buys milk, what is the probability that they will also buy bread and eggs? Or, if a voter in one state voted for one issue, what is the chance that they also voted for another? The applications for this approach are pretty much limitless.</p>
<p>The code in the SDK is pretty good, with a couple of exceptions: there is little documentation, and it only supports XML files and OleDb data connections. I have reworked it so that it will also connect to an MSSQL database.</p>
<p>For this test application I created a simple C# Console Application and imported the &#8220;APriori&#8221; project into the solution. You will need to add the following two methods to classes in the APriori project.</p>
<p>Add this method to DataAccessLayer.cs:</p>
<pre class="brush: jscript;">
	// Requires: using System.Data.SqlClient;
	public Data GetTransactionsData(string rdbmsConnectionString, string dataSource)
        {
            myDatabase = new Data();
            // dataSource is concatenated into the query, so it must be a trusted table name.
            string query = &quot;SELECT * FROM &quot; + dataSource;

            using (var myConn = new SqlConnection(rdbmsConnectionString))
            using (var myDBAdapter = new SqlDataAdapter(query, myConn))
            {
                // Fill opens the connection if it is closed and closes it again afterwards.
                myDBAdapter.Fill(myDatabase, &quot;TransactionTable&quot;);
            }
            return myDatabase;
        }
</pre>
<p>Add this method to DataMining.cs</p>
<pre class="brush: jscript;">
public Data MarketBasedAnalysis(double supportCount, double minimumConfidence, string connectionString, string dataSource)
        {

            Database database = new Database();
            ItemsetCandidate Item = new ItemsetCandidate();

            this.AP = new APriori.Apriori();
            this.AP.ProgressMonitorEvent += new ProgressMonitorEventHandler(this.OnProgressMonitoringCompletedEvent);
            this.dataBase = database.GetTransactionsData(connectionString, dataSource);
            database.Transactions = this.dataBase;
            this.transactionsCount = this.dataBase.TransactionTable.Count;

            supportCount = ((supportCount / 100) * this.transactionsCount);

            minimumConfidence = (minimumConfidence / 100);

            string support = &quot;SupportCount &gt;= &quot; + supportCount + &quot; AND Level &gt; 1&quot;;

            string sort = &quot;SupportCount, Level&quot;;
            ItemsetCandidate uniqueItems = AP.CreateOneItemsets(database);
            AP.AprioriGenerator(uniqueItems, database, Convert.ToInt32(supportCount));
            ItemsetArrayList[] keys = database.GetItemset(support, sort);
            string msg = &quot;Creating Frequent Subsets for Items&quot;;
            ProgressMonitorEventArgs e = new ProgressMonitorEventArgs(1, 100, 95, &quot;DataMining.MarketBasedAnalysis(3)&quot;, msg);
            this.OnProgressMonitorEvent(e);

            for (int counter = 0; counter &lt; keys.Length; counter++)
            {
                AP.CreateItemsetSubsets(0, keys[counter], null, database);
            }

            msg = &quot;Completed C#.NET Data Mining Market Based Analysis&quot;;
            e = new ProgressMonitorEventArgs(1, 100, 100, &quot;DataMining.MarketBasedAnalysis(3)&quot;, msg);
            this.OnProgressMonitorEvent(e);

            //Set the public properties of the class
            this.minimumSupportCount = supportCount;
            this.minimumConfidence = minimumConfidence;
            this.connectionString = connectionString;
            this.dataSource = dataSource;
            // this.dataSourceCommand is left unchanged: this method takes no dataSourceCommand parameter.

            //return the database of transactions
            return this.dataBase;

        }
</pre>
<p>Here is my class in my console application</p>
<pre class="brush: jscript;">
using System;
using System.Data;
using VISUAL_BASIC_DATA_MINING_NET;
using VISUAL_BASIC_DATA_MINING_NET.CustomEvents;

namespace APr2.classes
{
    internal class testrun
    {
        private Data _dataAnalysis;
        public event ProgressMonitorEventHandler ProgressMonitorEvent;

        /// &lt;summary&gt;
        /// Runs the Apriori.
        /// &lt;/summary&gt;
        public void RunApriori()
        {
            // Create Data Mining Object
            var myDM = new DataMining();

            // Register Event
            myDM.ProgressMonitorEvent += OnProgressMonitorEvent;

            // Connect To Data Base &amp; Process Items
            _dataAnalysis = myDM.MarketBasedAnalysis(2,             // Support Count
                                                     2,             // Minimum Confidence
                                                     @&quot;Data Source=(local);Initial Catalog=Apriori;Integrated Security=True;&quot;, // Connection String
                                                     &quot;Example&quot;);    // Table in db

            // Copy to Data View
            var dataView = new ViewData();
            _dataAnalysis.Tables.Add(dataView.CreateViewRulesTable(2, _dataAnalysis).Copy());
            _dataAnalysis.Tables.Add(dataView.CreateViewSubsetTable(_dataAnalysis).Copy());

            // Spacer Line
            Console.WriteLine();

            // Print Items
            foreach (DataRow row in dataView.ViewDataSet.Tables[1].Rows)
            {
                double per = Convert.ToDouble(row.ItemArray[2].ToString().Substring(0, (row.ItemArray[2].ToString().Length -1)));
                Console.WriteLine(row.ItemArray[0] + &quot;\t&quot; + row.ItemArray[1] + &quot;\t&quot; + String.Format(&quot;{0:###.##%}&quot;, (per/100)));
            }
        }

        /// &lt;summary&gt;
        /// Called when [progress monitor event].
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;sender&quot;&gt;The sender.&lt;/param&gt;
        /// &lt;param name=&quot;e&quot;&gt;The &lt;see cref=&quot;VISUAL_BASIC_DATA_MINING_NET.CustomEvents.ProgressMonitorEventArgs&quot;/&gt; instance containing the event data.&lt;/param&gt;
        public void OnProgressMonitorEvent(object sender, ProgressMonitorEventArgs e)
        {
            // Prints Event Messages
            Console.Write(&quot;\r&quot; + e.EventMessage);
        }
    }
}
</pre>
<p>The MSSQL code to create the table is:</p>
<pre class="brush: jscript;">
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Example](
	[TransactionID] [int] IDENTITY(1,1) NOT NULL,
	[Transactions] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 CONSTRAINT [PK_Example] PRIMARY KEY CLUSTERED
(
	[TransactionID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
</pre>
<p>And these records:</p>
<pre class="brush: jscript;">
1	Books, CD, Video
2	CD, Games
3	CD, DVD
4	Books, CD, Games
5	Books, DVD
6	CD, DVD
7	Books, DVD
8	Books, CD, DVD, Video
9	Books, CD, DVD
10	Books, Games
11	Games, Lasers
</pre>
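<p>As a rough illustration of the two measures behind the analysis, the sketch below counts support and confidence by hand for a single rule (Books implies CD) over the records above. It is a standalone example that uses the usual textbook definitions; the class name SupportSketch is made up, and none of this is the SDK&#8217;s own code.</p>
<pre class="brush: jscript;">
using System;
using System.Linq;

internal static class SupportSketch
{
    private static void Main()
    {
        // The eleven example transactions listed above.
        string[] rows =
        {
            &quot;Books, CD, Video&quot;, &quot;CD, Games&quot;, &quot;CD, DVD&quot;, &quot;Books, CD, Games&quot;,
            &quot;Books, DVD&quot;, &quot;CD, DVD&quot;, &quot;Books, DVD&quot;, &quot;Books, CD, DVD, Video&quot;,
            &quot;Books, CD, DVD&quot;, &quot;Books, Games&quot;, &quot;Games, Lasers&quot;
        };

        // Split each row into its individual items.
        string[][] transactions = rows
            .Select(r =&gt; r.Split(',').Select(item =&gt; item.Trim()).ToArray())
            .ToArray();

        int total = transactions.Length;
        int withBooks = transactions.Count(t =&gt; t.Contains(&quot;Books&quot;));
        int withBoth = transactions.Count(t =&gt; t.Contains(&quot;Books&quot;) &amp;&amp; t.Contains(&quot;CD&quot;));

        // Support: share of all transactions that contain both items.
        double support = (double)withBoth / total;
        // Confidence of Books =&gt; CD: share of Books transactions that also contain CD.
        double confidence = (double)withBoth / withBooks;

        Console.WriteLine(&quot;Books and CD together: {0} of {1} transactions&quot;, withBoth, total);
        Console.WriteLine(&quot;Support:    {0:F3}&quot;, support);
        Console.WriteLine(&quot;Confidence: {0:F3}&quot;, confidence);
    }
}
</pre>
<p>With these records, Books and CD appear together in 4 of the 11 transactions, and Books appears in 7, so the support is 4/11 and the confidence is 4/7.</p>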
<p>Run the RunApriori() method in my class and you should get the results below. Have fun.</p>
<p><a href="http://eric.ness.net/wp-content/uploads/2010/03/ap_full.jpg"><img class="alignnone size-full wp-image-448" title="ap_full" src="http://eric.ness.net/wp-content/uploads/2010/03/ap_full.jpg" alt="" width="577" height="369" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://eric.ness.net/archives/apriori-algorithm/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Benford&#8217;s Law and Trailing Digit Tests</title>
		<link>http://eric.ness.net/archives/befords-law-and-trailing-digit-tests/</link>
		<comments>http://eric.ness.net/archives/befords-law-and-trailing-digit-tests/#comments</comments>
		<pubDate>Wed, 11 Nov 2009 14:23:50 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://eric.ness.net/?p=371</guid>
		<description><![CDATA[Looking at Benford's Law and Trailing Digit Test]]></description>
			<content:encoded><![CDATA[


<p>As of late I&#8217;ve been coming across <a href="http://eric.ness.net/archives/benfords-law/">Benford&#8217;s Law</a> all over the place, so I thought I would revisit the topic. Benford&#8217;s Law essentially states &#8220;that in lists of numbers from many (but not all) real-life sources of data, the leading digit is distributed in a specific, non-uniform way&#8221;. More specifically, you should expect the leading digit &#8216;1&#8217; to appear about 30.1% of the time, &#8216;2&#8217; about 17.6%, and so on [<a href="http://en.wikipedia.org/wiki/Benford%27s_law#Mathematical_statement">see table</a>]. One classic use of Benford&#8217;s Law is fraud detection.</p>
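<p>The percentages quoted above come from the standard statement of Benford&#8217;s Law, in which the expected share of leading digit d is log10(1 + 1/d). The short sketch below simply evaluates that formula for the digits 1 through 9; the class name BenfordExpected is made up for illustration.</p>
<pre class="brush: jscript;">
using System;

internal static class BenfordExpected
{
    private static void Main()
    {
        // Expected share of each leading digit d: log10(1 + 1/d).
        for (int d = 1; d &lt;= 9; d++)
        {
            double expected = Math.Log10(1.0 + 1.0 / d);
            Console.WriteLine(&quot;{0}: {1:F3}&quot;, d, expected);
        }
    }
}
</pre>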
<p>One of the reasons I wanted to revisit this topic is that after having a conversation with a friend about this very topic, I came across a series of articles written by Nate Silver regarding the polling firm Strategic Vision. Strategic Vision released a poll focusing on <a href="http://www.fivethirtyeight.com/2009/09/are-oklahoma-students-really-this-dumb.html">Oklahoma students</a> and it <a href="http://www.fivethirtyeight.com/2009/09/strategic-vision-polls-exhibit-unusual.html">motivated Silver to ask a series of questions about the firm</a>.</p>
<p>I don&#8217;t really want to focus on all the Strategic Vision stuff, but I do want to talk about a method Silver used to detect anomalies in their polling. As Silver correctly suggests, polling is not a good candidate for Benford&#8217;s Law (take the last Presidential race, where for the most part the two candidates were going back and forth in the 40~50% range, so nearly every leading digit would be a 4 or a 5). However, there is another method that might give you a little insight, as Silver explains:</p>
<blockquote><p><span id="fullpost">For each question, I recorded the <span style="font-style: italic;">trailing digit</span> for each candidate or line item. For instance, if Strategic Vision had Barack Obama beating John McCain 48-43 in a particular state, I&#8217;d record a tally in the 8 column and another in the 3 column. Or if they had voters opposing a particular policy 50-45, I&#8217;d record a tally in the 0 column (for 50) and another in the 5 column (for 45). </span></p></blockquote>
<p><span>What Silver says, essentially, is that if you look at the last digit you should see a roughly </span>uniform distribution. Put another way, if I have roughly 200 4s I would expect roughly 200 8s too. Silver also notes that, in some cases, deviations from this distribution might be due to rounding errors or a specific mathematical method. The trailing digit test is clearly not a surefire way to detect fraud, but it is another useful tool for seeing whether the data passes the smell test.</p>
<p>Silver&#8217;s insight prompted me to write some code and play around with these two methods just to see what comes up. Because I am fairly familiar with the United Nations&#8217; World Health Database, I thought I would run some tests against it. Here are the results:</p>
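<p>A minimal sketch of the trailing digit tally might look like the following. The values are hypothetical stand-ins for whole-number data, and the class name TrailingDigitSketch is made up; the idea is simply to count the ones digit of each value and compare every bin with the count a uniform distribution would give.</p>
<pre class="brush: jscript;">
using System;

internal static class TrailingDigitSketch
{
    private static void Main()
    {
        // Hypothetical whole-number values standing in for poll results or similar data.
        int[] values = { 48, 43, 50, 45, 52, 41, 47, 46, 39, 55 };

        // Tally how often each digit 0-9 appears in the ones place.
        var bins = new int[10];
        foreach (int value in values)
        {
            int lastDigit = Math.Abs(value) % 10;
            bins[lastDigit]++;
        }

        // Under a uniform distribution every bin is expected to hold the same count.
        double expectedPerBin = values.Length / 10.0;

        for (int digit = 0; digit &lt; 10; digit++)
        {
            Console.WriteLine(&quot;{0}: {1} (expected about {2:F1})&quot;, digit, bins[digit], expectedPerBin);
        }
    }
}
</pre>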
<p><strong>Gross Domestic Product (GDP)</strong></p>
<p>As you can see, GDP generally follows Benford&#8217;s Law and passes the trailing digit test. One thing to note on the trailing digit results: the average for each number (bin) is 23.7, and the counts of 1s and 5s are roughly equidistant from that average, even though it seems a little odd that a GDP value ending in 1 is almost twice as likely as one ending in 5.</p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/11/gdp.jpg"><img class="alignnone size-full wp-image-374" title="gdp" src="http://eric.ness.net/wp-content/uploads/2009/11/gdp.jpg" alt="gdp" width="600" height="250" /></a></p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/11/gdp-trail.jpg"><img class="alignnone size-full wp-image-375" title="gdp trail" src="http://eric.ness.net/wp-content/uploads/2009/11/gdp-trail.jpg" alt="gdp trail" width="600" height="250" /></a></p>
<p><strong>Life expectancy at birth (years)</strong></p>
<p>Now here is an example where Benford&#8217;s Law will not work. The reason is that life expectancy across all countries ranges from 40 to 83 years, so the leading digit can only be 4 through 8. We are going to have to focus on the trailing digit test.</p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/11/life-exp.jpg"><img class="alignnone size-full wp-image-378" title="life exp" src="http://eric.ness.net/wp-content/uploads/2009/11/life-exp.jpg" alt="life exp" width="600" height="250" /></a></p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/11/life-exp-trail.jpg"><img class="alignnone size-full wp-image-379" title="life exp trail" src="http://eric.ness.net/wp-content/uploads/2009/11/life-exp-trail.jpg" alt="life exp trail" width="600" height="250" /></a></p>
<p>You will notice that there is something weird here with the trailing digits. First, the average value is 19.3, and we have 0s and 2s appearing almost 3 times more often than 7s. One thing that might account for this disparity is that a little over 20 countries fall in the range of 80-83, which would tilt the counts for 0, 1, 2 and 3 a little higher than normal. The number of countries in the 40s is also a little sparse. So I removed both groups from the set and re-ran the test. Here are the results.</p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/11/life-exp-norm-benford.jpg"><img class="alignnone size-full wp-image-382" title="life exp norm benford" src="http://eric.ness.net/wp-content/uploads/2009/11/life-exp-norm-benford.jpg" alt="life exp norm benford" width="600" height="250" /></a></p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/11/life-exp-norm-trail.jpg"><img class="alignnone size-full wp-image-383" title="life exp norm trail" src="http://eric.ness.net/wp-content/uploads/2009/11/life-exp-norm-trail.jpg" alt="life exp norm trail" width="600" height="250" /></a></p>
<p>Even after cleaning up the data, a 3 is still almost three times as likely to appear as a 7 in what should be fairly naturally occurring numbers. The average count per digit for this set is <span id="Label1">14.8.</span></p>
<p><span>It would be inappropriate for me to draw any hard conclusions about the life expectancy data, other than to say that some form of rounding has likely occurred, since the values are all whole numbers.</span></p>
<p><span>That said though, you can see how these two simple/practical tests can assist in determining whether there has been some human manipulation of the data. Here is the code that I used:</span></p>
<pre class="brush: jscript;">
using System;
using System.Linq;

namespace BenfordsLaw
{
    /// &lt;summary&gt;
    /// Benfords Law Class
    /// &lt;/summary&gt;
    public class Benfords
    {
        /// &lt;summary&gt;
        /// Counts how often each leading digit (1-9) occurs in the data.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;data&quot;&gt;The values to analyse.&lt;/param&gt;
        /// &lt;returns&gt;Nine counts; index 0 holds the count for digit 1.&lt;/returns&gt;
        public double[] CalculateBenfordsDistribution(double[] data)
        {
            if (data.Count() == 0)
            {
                throw new ArgumentException(&quot;Error: There are no items in your data array.&quot;);
            }

            // Create Benford bins to hold counts
            var benfordsContainer = new double[9];

            // Loop through array
            foreach (double number in data)
            {
                // Get absolute value of number
                double num = Math.Abs(number);

                // Zero, NaN and infinity have no leading digit, so skip them
                if (num == 0 || double.IsNaN(num) || double.IsInfinity(num))
                    continue;

                // Scale values below 1 up, and values of 10 or more down,
                // until a single leading digit sits in front of the decimal point
                while (num &lt; 1)
                    num *= 10;
                while (num &gt;= 10)
                    num /= 10;

                PackageNumberInBenfordBin(benfordsContainer, num);
            }

            return benfordsContainer;
        }

        /// &lt;summary&gt;
        /// Counts how often each trailing digit (0-9) occurs in the data.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;data&quot;&gt;The values to analyse.&lt;/param&gt;
        /// &lt;returns&gt;Ten counts; index 0 holds the count for digit 0.&lt;/returns&gt;
        public double[] TrailingDigitCheck(double[] data)
        {
            if (data.Count() == 0)
            {
                throw new ArgumentException(&quot;Error: There are no items in your data array.&quot;);
            }

            // Create trailing-digit bins to hold counts
            var trailingContainer = new double[10];

            // Loop through array
            foreach (double number in data)
            {
                // Get absolute value of number
                double currentNumber = Math.Abs(number);
                string numTemp = currentNumber.ToString();
                string numTemp2 = numTemp.Substring(numTemp.Length - 1);
                PackageNumberInTrailingBin(trailingContainer, Convert.ToDouble(numTemp2));
            }

            return trailingContainer;
        }

        /// &lt;summary&gt;
        /// Packages the number in benford bin.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;myContainer&quot;&gt;My container.&lt;/param&gt;
        /// &lt;param name=&quot;num&quot;&gt;The num.&lt;/param&gt;
        private static void PackageNumberInBenfordBin(double[] myContainer, double num)
        {
            // Update container totals
            int myNum = Convert.ToInt32(Math.Floor(num));
            myContainer[myNum-1] += 1;
        }

        /// &lt;summary&gt;
        /// Packages the number in trailing bin.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;myContainer&quot;&gt;My container.&lt;/param&gt;
        /// &lt;param name=&quot;num&quot;&gt;The num.&lt;/param&gt;
        private static void PackageNumberInTrailingBin(double[] myContainer, double num)
        {
            // Update container totals
            int myNum = Convert.ToInt32(Math.Floor(num));
            myContainer[myNum] += 1;
        }
    }
}
</pre>
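<p>To judge how closely the leading-digit counts follow Benford&#8217;s Law, they can be compared with the counts the law predicts. Here is a minimal sketch of that comparison. The BenfordExpected class and its method names are assumptions made for illustration and are not part of the code above. It uses the standard Benford probability P(d) = log10(1 + 1/d) and a Pearson chi-square statistic.</p>
<pre class="brush: jscript;">
using System;

// Hypothetical helper, not part of the Benfords class above.
public static class BenfordExpected
{
    // Expected count for each leading digit 1..9 under Benford's Law:
    // P(d) = log10(1 + 1/d), scaled by the number of observations.
    public static double[] ExpectedCounts(int total)
    {
        var expected = new double[9];
        for (int d = 1; d &lt;= 9; d++)
        {
            expected[d - 1] = total * Math.Log10(1.0 + 1.0 / d);
        }
        return expected;
    }

    // Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
    public static double ChiSquare(double[] observed, double[] expected)
    {
        if (observed.Length != expected.Length)
        {
            throw new ArgumentException(&quot;observed and expected must have the same length&quot;);
        }

        double sum = 0.0;
        for (int i = 0; i &lt; observed.Length; i++)
        {
            double diff = observed[i] - expected[i];
            sum += diff * diff / expected[i];
        }
        return sum;
    }
}
</pre>
<p>The observed array would be the nine counts returned by CalculateBenfordsDistribution, and total the number of values counted. With nine bins there are 8 degrees of freedom, so a chi-square value above roughly 15.51 suggests a departure from Benford&#8217;s Law at the 5% level. The same ChiSquare method could be applied to the trailing digit counts by passing an expected array of ten equal values.</p>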
]]></content:encoded>
			<wfw:commentRss>http://eric.ness.net/archives/befords-law-and-trailing-digit-tests/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>K-Means Document Clustering</title>
		<link>http://eric.ness.net/archives/k-means-document-clustering/</link>
		<comments>http://eric.ness.net/archives/k-means-document-clustering/#comments</comments>
		<pubDate>Fri, 06 Nov 2009 17:35:48 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Visualization]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://eric.ness.net/?p=357</guid>
		<description><![CDATA[K-Means Document Clustering in C#]]></description>
			<content:encoded><![CDATA[


<p>Using our <a href="http://eric.ness.net/archives/plotting-documents-words-using-latent-semantic-indexing/">previous example</a> as a basis, let&#8217;s move on to the next step: using the <a href="http://en.wikipedia.org/wiki/K-means_clustering">K-Means</a> clustering algorithm to group the documents into their appropriate categories.</p>
<p>In the paper &#8220;<a href="http://lsa.colorado.edu/papers/JASIS.lsi.90.pdf">Indexing by Latent Semantic Analysis</a>&#8221; (Deerwester et al.), the authors give an example of 9 paper titles grouped into two categories: &#8220;human computer interaction&#8221; and &#8220;graphs &amp; trees&#8221;. So far, we&#8217;ve used <a href="http://eric.ness.net/archives/singular-value-decomposition/">Singular Value Decomposition</a> (SVD) and <a href="http://eric.ness.net/archives/latent-semantic-indexing/">Latent Semantic Indexing</a> (LSI) to better understand the relationship of words and documents. In the <a href="http://eric.ness.net/archives/plotting-documents-words-using-latent-semantic-indexing/">last blog post</a> we then took the LSI results and plotted words and documents on a two-dimensional Cartesian plane.</p>
<p>All of this is pretty interesting in and of itself; however, the next step is to see which documents belong in each group. One way to do this is by using K-Means clustering.</p>
<blockquote><p>Simply speaking k-means clustering is an algorithm to classify or to group your objects based on attributes/features into K number of group. K is positive integer number. The grouping is done by minimizing the sum of squares of distances between data and the corresponding cluster centroid. Thus the purpose of K-mean clustering is to classify the data. [<a href="http://people.revoledu.com/kardi/tutorial/kMean/WhatIs.htm">Kardi Teknomo</a>]</p></blockquote>
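<p>To make the &#8220;sum of squares of distances&#8221; in that definition concrete, here is a minimal sketch of the quantity K-Means tries to minimize. The KMeansObjective class, its SumOfSquares method, and the jagged-array layout are assumptions made for illustration; the project code stores its points differently and can also be run with other distance measures.</p>
<pre class="brush: jscript;">
using System;

// Hypothetical helper for illustration only.
public static class KMeansObjective
{
    // Sum, over all points, of the squared Euclidean distance between the
    // point and the centroid of the cluster it is assigned to.
    // assignment[i] is the index into centroids for points[i].
    public static double SumOfSquares(double[][] points, int[] assignment, double[][] centroids)
    {
        double total = 0.0;
        for (int i = 0; i &lt; points.Length; i++)
        {
            double[] centroid = centroids[assignment[i]];
            for (int j = 0; j &lt; centroid.Length; j++)
            {
                double diff = points[i][j] - centroid[j];
                total += diff * diff;
            }
        }
        return total;
    }
}
</pre>
<p>For example, two points at (0, 0) and (2, 0) that are both assigned to a single centroid at (1, 0) give a total of 1 + 1 = 2. A lower total means the points sit closer to their cluster centers.</p>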
<p>A big chunk of the code is built on the same project we have been working on. I am using <a href="http://sites.google.com/site/docaresh/">Aresh Saharkhiz</a>&#8217;s K-Means implementation in the project, with some minor changes and refactoring done by me.</p>
<p>Let&#8217;s take a look at the code!</p>
<p>This first part is the display (an ASP.NET app).</p>
<pre class="brush: jscript;">
&lt;%@ Page Language=&quot;C#&quot; AutoEventWireup=&quot;true&quot; CodeBehind=&quot;Default.aspx.cs&quot; Inherits=&quot;LSITest._Default&quot; %&gt;
&lt;%@ Register Assembly=&quot;DundasWebChart&quot; Namespace=&quot;Dundas.Charting.WebControl&quot; TagPrefix=&quot;DCWC&quot; %&gt;
&lt;!DOCTYPE html PUBLIC &quot;-//W3C//DTD XHTML 1.0 Transitional//EN&quot; &quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd&quot;&gt;

&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot; &gt;
&lt;head runat=&quot;server&quot;&gt;
    &lt;title&gt;LSI Test&lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
    &lt;form id=&quot;form1&quot; runat=&quot;server&quot;&gt;
    &lt;div&gt;
        &lt;DCWC:Chart ID=&quot;Chart1&quot; runat=&quot;server&quot; Height=&quot;400px&quot; Width=&quot;400px&quot;
            ImageType=&quot;Jpeg&quot;&gt;
            &lt;Legends&gt;
                &lt;DCWC:Legend Name=&quot;Default&quot; Alignment=&quot;Center&quot; Docking=&quot;Bottom&quot;&gt;&lt;/DCWC:Legend&gt;
            &lt;/Legends&gt;
            &lt;Titles&gt;
                &lt;DCWC:Title Name=&quot;Title1&quot;&gt;
                &lt;/DCWC:Title&gt;
            &lt;/Titles&gt;
            &lt;Series&gt;
                &lt;DCWC:Series Name=&quot;Series1&quot; ChartType=&quot;Point&quot; MarkerBorderColor=&quot;64, 64, 64&quot;
                    ShadowOffset=&quot;1&quot;&gt;
                &lt;/DCWC:Series&gt;
                &lt;DCWC:Series Name=&quot;Series2&quot; ChartType=&quot;Point&quot; MarkerBorderColor=&quot;64, 64, 64&quot;
                    ShadowOffset=&quot;1&quot;&gt;
                &lt;/DCWC:Series&gt;
                &lt;DCWC:Series Name=&quot;Series3&quot; ChartType=&quot;Point&quot; MarkerBorderColor=&quot;64, 64, 64&quot;
                    ShadowOffset=&quot;1&quot;&gt;
                &lt;/DCWC:Series&gt;
            &lt;/Series&gt;
            &lt;ChartAreas&gt;
                &lt;DCWC:ChartArea Name=&quot;Series2&quot;&gt;
                    &lt;axisy interval=&quot;0.5&quot; maximum=&quot;2&quot; minimum=&quot;-1&quot;&gt;
                        &lt;majorgrid linecolor=&quot;Gray&quot; linestyle=&quot;Dash&quot; /&gt;
                    &lt;/axisy&gt;
                    &lt;axisx interval=&quot;0.5&quot; maximum=&quot;2.5&quot; minimum=&quot;-0.5&quot;&gt;
                        &lt;majorgrid linecolor=&quot;Gray&quot; linestyle=&quot;Dash&quot; /&gt;
                    &lt;/axisx&gt;
                &lt;/DCWC:ChartArea&gt;
            &lt;/ChartAreas&gt;
        &lt;/DCWC:Chart&gt;
    &lt;/div&gt;
    &lt;/form&gt;
&lt;/body&gt;
&lt;/html&gt;
</pre>
<p>This is the code-behind for the ASP.NET page. Because we are only dealing with two known categories, K-Means plots just two cluster centers. If you wanted more categories, you would have to rewrite the ColorCodeDocuments function, which only compares each document against the first two centers.</p>
<pre class="brush: jscript;">
using System;
using System.Data;
using System.Drawing;
using System.Web.UI;
using Dundas.Charting.WebControl;

namespace LSITest
{
    public partial class _Default : Page
    {
        /// &lt;summary&gt;
        /// Handles the Load event of the Page control.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;sender&quot;&gt;The source of the event.&lt;/param&gt;
        /// &lt;param name=&quot;e&quot;&gt;The &lt;see cref=&quot;System.EventArgs&quot;/&gt; instance containing the event data.&lt;/param&gt;
        protected void Page_Load(object sender, EventArgs e)
        {
            // Perform LSI
            var mylsi = new lsi();
            mylsi.LSITest();
            double[,] myDocs = mylsi.MyDocs;

            // Plot Documents and the k-means
            const string distanceType = &quot;manhattan&quot;;
            PlotDocuments(myDocs, mylsi.MyDocRowCount);
            PlotKMeansPoints(myDocs, 2, distanceType);
            ColorCodeDocuments(distanceType);

            // If you want to plot the words just un-comment the next two lines
            //double[,] myWords = mylsi.MyWords;
            //PlotWords(myWords, mylsi.MyWordsRowCount);

            // comment this line out to show words in legend
            Chart1.Series[&quot;Series2&quot;].ShowInLegend = false;
        }

        /// &lt;summary&gt;
        /// Plots the words.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;myWords&quot;&gt;My words.&lt;/param&gt;
        /// &lt;param name=&quot;myWordsRowCount&quot;&gt;My words row count.&lt;/param&gt;
        private void PlotWords(double[,] myWords, int myWordsRowCount)
        {
            for (int i = 0; i &lt; myWordsRowCount; i++)
            {
                Chart1.Series[&quot;Series2&quot;].Points.AddXY(myWords[i, 0], myWords[i, 1]);
            }

            // Set point colors and shapes
            Chart1.Series[&quot;Series2&quot;].LegendText = &quot;Words&quot;;
            Chart1.Series[&quot;Series2&quot;].Color = Color.Gray;
            Chart1.Series[&quot;Series2&quot;].MarkerStyle = MarkerStyle.Circle;
            Chart1.Series[&quot;Series2&quot;].MarkerSize = 6;
        }

        /// &lt;summary&gt;
        /// Plots the documents.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;myDocs&quot;&gt;My docs.&lt;/param&gt;
        /// &lt;param name=&quot;myDocRowCount&quot;&gt;My doc row count.&lt;/param&gt;
        private void PlotDocuments(double[,] myDocs, int myDocRowCount)
        {
            // Load documents
            for (int i = 0; i &lt; myDocRowCount; i++)
            {
                Chart1.Series[&quot;Series1&quot;].Points.AddXY(myDocs[i, 0], myDocs[i, 1]);
            }

            // Set point colors and shapes
            Chart1.Series[&quot;Series1&quot;].LegendText = &quot;Documents&quot;;
            Chart1.Series[&quot;Series1&quot;].Color = Color.Red;
            Chart1.Series[&quot;Series1&quot;].MarkerStyle = MarkerStyle.Diamond;
            Chart1.Series[&quot;Series1&quot;].MarkerSize = 12;
        }

        /// &lt;summary&gt;
        /// Plots the K means points.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;items&quot;&gt;The items.&lt;/param&gt;
        /// &lt;param name=&quot;k&quot;&gt;The k.&lt;/param&gt;
        /// &lt;param name=&quot;distanceType&quot;&gt;&lt;/param&gt;
        private void PlotKMeansPoints(double[,] items, int k, string distanceType)
        {
            ClusterCollection clusters = kmeans.ClusterDataSet(k, items, distanceType);

            for (int i = 0; i &lt; clusters.Count; i++)
            {
                Chart1.Series[&quot;Series3&quot;].Points.AddXY(clusters[i].ClusterMean[0], clusters[i].ClusterMean[1]);
            }

            // Set point colors and shapes
            Chart1.Series[&quot;Series3&quot;].LegendText = &quot;Cluster&quot;;
            Chart1.Series[&quot;Series3&quot;].Color = Color.Gold;
            Chart1.Series[&quot;Series3&quot;].MarkerStyle = MarkerStyle.Star6;
            Chart1.Series[&quot;Series3&quot;].MarkerSize = 18;
        }

        /// &lt;summary&gt;
        /// Colors each document according to which of the two cluster centers is closer.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;distanceType&quot;&gt;Type of the distance.&lt;/param&gt;
        private void ColorCodeDocuments(string distanceType)
        {
            var myDist = new similarity();

            // Extract data
            DataSet myDocs = Chart1.DataManipulator.ExportSeriesValues(&quot;Series1&quot;);
            DataSet myKMeansPoints = Chart1.DataManipulator.ExportSeriesValues(&quot;Series3&quot;);

            // Document counter
            int count = 0;

            // Get co-ordinates for k-means points
            double firstKMeansX = Convert.ToDouble(myKMeansPoints.Tables[0].Rows[0][&quot;X&quot;]);
            double firstKMeansY = Convert.ToDouble(myKMeansPoints.Tables[0].Rows[0][&quot;Y&quot;]);
            double secondKMeansX = Convert.ToDouble(myKMeansPoints.Tables[0].Rows[1][&quot;X&quot;]);
            double secondKMeansY = Convert.ToDouble(myKMeansPoints.Tables[0].Rows[1][&quot;Y&quot;]);

            foreach (DataRow docRow in myDocs.Tables[0].Rows)
            {
                // get co-ordinates for current doc
                double currentDocX = Convert.ToDouble(docRow[&quot;X&quot;]);
                double currentDocY = Convert.ToDouble(docRow[&quot;Y&quot;]);

                // load in to arrays: the document point and the two cluster centers
                double[] docPoint = {currentDocX, currentDocY};
                double[] firstCentroid = {firstKMeansX, firstKMeansY};
                double[] secondCentroid = {secondKMeansX, secondKMeansY};

                // find the distance from the document to each cluster center
                double firstDist = myDist.FindDistance(docPoint, firstCentroid, distanceType);
                double secondDist = myDist.FindDistance(docPoint, secondCentroid, distanceType);

                // Color accordingly
                Chart1.Series[&quot;Series1&quot;].Points[count].Color = firstDist &lt; secondDist ? Color.Blue : Color.Gray;
                count++;
            }
        }
    }
}
</pre>
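<p>ColorCodeDocuments above is hard-wired to two cluster centers. If you wanted to color documents for any number of clusters, one approach is to find the index of the nearest center for each document and use that index to pick a color. Here is a minimal sketch of the lookup, assuming Manhattan distance and jagged arrays. The NearestCentroid class and IndexOfNearest method are hypothetical names, not part of the project.</p>
<pre class="brush: jscript;">
using System;

// Hypothetical helper for illustration only.
public static class NearestCentroid
{
    // Returns the index of the centroid closest to the point, using
    // Manhattan distance (the sum of absolute coordinate differences).
    // Ties go to the centroid with the lower index.
    public static int IndexOfNearest(double[] point, double[][] centroids)
    {
        int best = 0;
        double bestDistance = double.MaxValue;

        for (int c = 0; c &lt; centroids.Length; c++)
        {
            double distance = 0.0;
            for (int j = 0; j &lt; point.Length; j++)
            {
                distance += Math.Abs(point[j] - centroids[c][j]);
            }

            if (distance &lt; bestDistance)
            {
                bestDistance = distance;
                best = c;
            }
        }

        return best;
    }
}
</pre>
<p>The returned index could then select an entry from an array of colors, one per cluster, instead of the two-way choice between Color.Blue and Color.Gray.</p>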
<p>This is the K-Means class written by Aresh Saharkhiz, with my changes:</p>
<pre class="brush: jscript;">
/// Most of this code was written by Aresh Saharkhiz
/// Re-organized by me
/// See Code Project: http://www.codeproject.com/KB/recipes/K-Mean_Clustering.aspx
using System;
using System.Collections;
using System.Data;
using System.Diagnostics;

namespace LSITest
{
    public class kmeans
    {
        /// &lt;summary&gt;
        /// Calculates The Mean Of A Cluster OR The Cluster Center
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;cluster&quot;&gt;
        /// A two-dimensional array containing a dataset of numeric values
        /// &lt;/param&gt;
        /// &lt;returns&gt;
        /// Returns an Array Defining A Data Point Representing The Cluster Mean or Centroid
        /// &lt;/returns&gt;
        public static double[] ClusterMean(double[,] cluster)
        {
            int rowCount = cluster.GetUpperBound(0) + 1;
            int fieldCount = cluster.GetUpperBound(1) + 1;
            var dataSum = new double[1,fieldCount];
            var centroid = new double[fieldCount];

            for (int j = 0; j &lt; fieldCount; j++)
            {
                for (int i = 0; i &lt; rowCount; i++)
                {
                    dataSum[0, j] = dataSum[0, j] + cluster[i, j];
                }

                centroid[j] = (dataSum[0, j]/rowCount);
            }

            return centroid;
        }

        /// &lt;summary&gt;
        /// Separates a dataset into clusters or groups with similar characteristics
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;clusterCount&quot;&gt;The number of clusters or groups to form&lt;/param&gt;
        /// &lt;param name=&quot;data&quot;&gt;An array containing data that will be clustered&lt;/param&gt;
        /// &lt;param name=&quot;type&quot;&gt;The distance measure to use, e.g. manhattan&lt;/param&gt;
        /// &lt;returns&gt;A collection of clusters of data&lt;/returns&gt;
        public static ClusterCollection ClusterDataSet(int clusterCount, double[,] data, string type)
        {
            int rowCount = data.GetUpperBound(0) + 1;
            int fieldCount = data.GetUpperBound(1) + 1;
            int stableClustersCount = 0;
            double[] dataPoint;
            var random = new Random();
            Cluster cluster;
            var clusters = new ClusterCollection();
            var clusterNumbers = new ArrayList(clusterCount);
            var myDist = new similarity();

            while (clusterNumbers.Count &lt; clusterCount)
            {
                // Random.Next excludes the upper bound, so pass rowCount to allow every row
                int clusterNumber = random.Next(0, rowCount);

                if (!clusterNumbers.Contains(clusterNumber))
                {
                    cluster = new Cluster();
                    clusterNumbers.Add(clusterNumber);
                    dataPoint = new double[fieldCount];

                    for (int field = 0; field &lt; fieldCount; field++)
                    {
                        dataPoint.SetValue((data[clusterNumber, field]), field);
                    }

                    cluster.Add(dataPoint);
                    clusters.Add(cluster);
                }
            }

            while (stableClustersCount != clusters.Count)
            {
                stableClustersCount = 0;
                ClusterCollection newClusters = ClusterDataSet(clusters, data, type);

                for (int clusterIndex = 0; clusterIndex &lt; clusters.Count; clusterIndex++)
                {
                    if ((myDist.FindDistance(newClusters[clusterIndex].ClusterMean, clusters[clusterIndex].ClusterMean, type)) == 0)
                    {
                        stableClustersCount++;
                    }
                }

                clusters = newClusters;
            }

            return clusters;
        }

        /// &lt;summary&gt;
        /// Separates a dataset into clusters or groups with similar characteristics
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;clusters&quot;&gt;A collection of data clusters&lt;/param&gt;
        /// &lt;param name=&quot;data&quot;&gt;An array containing data to be clustered&lt;/param&gt;
        /// &lt;param name=&quot;type&quot;&gt;The distance measure to use, e.g. manhattan&lt;/param&gt;
        /// &lt;returns&gt;A collection of clusters of data&lt;/returns&gt;
        public static ClusterCollection ClusterDataSet(ClusterCollection clusters, double[,] data, string type)
        {
            double[] dataPoint;
            double firstClusterDistance = 0.0;
            int rowCount = data.GetUpperBound(0) + 1;
            int fieldCount = data.GetUpperBound(1) + 1;
            int position = 0;
            var myDist = new similarity();

            // create a new collection of clusters
            var newClusters = new ClusterCollection();

            for (int count = 0; count &lt; clusters.Count; count++)
            {
                var newCluster = new Cluster();
                newClusters.Add(newCluster);
            }

            if (clusters.Count &lt;= 0)
            {
                throw new SystemException(&quot;Cluster Count Cannot Be Zero!&quot;);
            }

            for (int row = 0; row &lt; rowCount; row++)
            {
                dataPoint = new double[fieldCount];

                for (int field = 0; field &lt; fieldCount; field++)
                {
                    dataPoint.SetValue((data[row, field]), field);
                }

                for (int cluster = 0; cluster &lt; clusters.Count; cluster++)
                {
                    double[] clusterMean = clusters[cluster].ClusterMean;

                    if (cluster == 0)
                    {
                        firstClusterDistance = myDist.FindDistance(dataPoint, clusterMean, type);
                        position = cluster;
                    }
                    else
                    {
                        double secondClusterDistance = myDist.FindDistance(dataPoint, clusterMean, type);

                        if (firstClusterDistance &gt; secondClusterDistance)
                        {
                            firstClusterDistance = secondClusterDistance;
                            position = cluster;
                        }
                    }
                }

                newClusters[position].Add(dataPoint);
            }

            return newClusters;
        }

        /// &lt;summary&gt;
        /// Converts the data table to array.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;table&quot;&gt;The table.&lt;/param&gt;
        /// &lt;returns&gt;&lt;/returns&gt;
        public static double[,] ConvertDataTableToArray(DataTable table)
        {
            int rowCount = table.Rows.Count;
            int fieldCount = table.Columns.Count;

            var dataPoints = new double[rowCount,fieldCount];

            for (int rowPosition = 0; rowPosition &lt; rowCount; rowPosition++)
            {
                DataRow row = table.Rows[rowPosition];

                for (int fieldPosition = 0; fieldPosition &lt; fieldCount; fieldPosition++)
                {
                    double fieldValue;
                    try
                    {
                        fieldValue = double.Parse(row[fieldPosition].ToString());
                    }
                    catch (Exception ex)
                    {
                        Debug.WriteLine(ex.ToString());
                        throw new InvalidCastException(&quot;Invalid row at &quot; + rowPosition + &quot; and field &quot; + fieldPosition,
                                                       ex);
                    }

                    dataPoints[rowPosition, fieldPosition] = fieldValue;
                }
            }

            return dataPoints;
        }
    }

    /// &lt;summary&gt;
    /// A class containing a group of data with similar characteristics (cluster)
    /// &lt;/summary&gt;
    [Serializable]
    public class Cluster : CollectionBase
    {
        private double[] _clusterMean;
        private double[] _clusterSum;

        /// &lt;summary&gt;
        /// The sum of all the data in the cluster
        /// &lt;/summary&gt;
        public double[] ClusterSum
        {
            get { return _clusterSum; }
        }

        /// &lt;summary&gt;
        /// The mean of all the data in the cluster
        /// &lt;/summary&gt;
        public double[] ClusterMean
        {
            get
            {
                for (int count = 0; count &lt; this[0].Length; count++)
                {
                    _clusterMean[count] = (_clusterSum[count]/List.Count);
                }

                return _clusterMean;
            }
        }

        /// &lt;summary&gt;
        /// Returns the one dimensional array data located at the index
        /// &lt;/summary&gt;
        public virtual double[] this[int index]
        {
            get
            {
                //return the data point at List[index]
                return (double[]) List[index];
            }
        }

        /// &lt;summary&gt;
        /// Adds a single dimension array data to the cluster
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;data&quot;&gt;A 1-dimensional array containing data that will be added to the cluster&lt;/param&gt;
        public virtual void Add(double[] data)
        {
            List.Add(data);

            if (List.Count == 1)
            {
                _clusterSum = new double[data.Length];

                _clusterMean = new double[data.Length];
            }

            for (int count = 0; count &lt; data.Length; count++)
            {
                _clusterSum[count] = _clusterSum[count] + data[count];
            }
        }
    }

    /// &lt;summary&gt;
    /// A collection of Cluster objects or Clusters
    /// &lt;/summary&gt;
    [Serializable]
    public class ClusterCollection : CollectionBase
    {
        /// &lt;summary&gt;
        /// Returns the Cluster at this index
        /// &lt;/summary&gt;
        public virtual Cluster this[int index]
        {
            get
            {
                //return the Cluster at List[index]
                return (Cluster) List[index];
            }
        }

        /// &lt;summary&gt;
        /// Adds a Cluster to the collection of Clusters
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;cluster&quot;&gt;A Cluster to be added to the collection of clusters&lt;/param&gt;
        public virtual void Add(Cluster cluster)
        {
            List.Add(cluster);
        }
    }
}
</pre>
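<p>The clustering code takes the name of a distance measure, so it helps to see how the common measures relate. Here is a minimal sketch of the standard Minkowski distance of order p, written as a standalone helper. The MinkowskiSketch class is a hypothetical name, and this is the textbook definition rather than the project&#8217;s own implementation. With p = 1 it gives the Manhattan distance and with p = 2 the Euclidean distance.</p>
<pre class="brush: jscript;">
using System;

// Hypothetical helper for illustration only.
public static class MinkowskiSketch
{
    // Standard Minkowski distance of order p:
    // (sum of |x_i - y_i|^p) raised to the power 1/p.
    public static double Minkowski(double[] x, double[] y, double p)
    {
        if (x.Length != y.Length)
        {
            throw new ArgumentException(&quot;x and y must have the same number of elements&quot;);
        }

        double sum = 0.0;
        for (int i = 0; i &lt; x.Length; i++)
        {
            sum += Math.Pow(Math.Abs(x[i] - y[i]), p);
        }

        return Math.Pow(sum, 1.0 / p);
    }
}
</pre>
<p>For the points (0, 0) and (3, 4), order 1 gives 3 + 4 = 7 and order 2 gives the square root of 9 + 16, which is 5 (up to floating-point rounding). The Chebyshev distance is the limiting case as p grows, which is simply the largest coordinate difference, here 4.</p>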
<p>Here is the similarity class that can calculate Euclidean, Manhattan, Chebyshev, and Minkowski distances:</p>
<pre class="brush: jscript;">
/// Most of this code was written by Aresh Saharkhiz
/// Re-organized by me
/// See Code Project: http://www.codeproject.com/KB/recipes/Quantitative_Distances.aspx
using System;

namespace LSITest
{
    public class similarity
    {
        /// &lt;summary&gt;
        /// Finds the distance between two points using the named distance measure.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;x&quot;&gt;The first point.&lt;/param&gt;
        /// &lt;param name=&quot;y&quot;&gt;The second point.&lt;/param&gt;
        /// &lt;param name=&quot;distanceType&quot;&gt;One of: euclidean, manhattan, minkowski, chebyshev (case-insensitive).&lt;/param&gt;
        /// &lt;returns&gt;The distance between x and y.&lt;/returns&gt;
        public double FindDistance(double[] x, double[] y, string distanceType)
        {
            double distance;

            switch (distanceType.ToLower())
            {
                case &quot;euclidean&quot;:
                    distance = EuclideanDistance(x, y);
                    break;
                case &quot;manhattan&quot;:
                    distance = ManhattanDistance(x, y);
                    break;
                case &quot;minkowski&quot;:
                    distance = MinkowskiDistance(x, y, 1);
                    break;
                case &quot;chebyshev&quot;:
                    distance = ChebyshevDistance(x, y);
                    break;
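                default:
                    // An unknown name would otherwise report a distance of 0 for every pair
                    throw new ArgumentException(&quot;Unknown distance type: &quot; + distanceType);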
            }

            return distance;
        }

        /// &lt;summary&gt;
        /// Finds the Euclidean distance.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;x&quot;&gt;The x.&lt;/param&gt;
        /// &lt;param name=&quot;y&quot;&gt;The y.&lt;/param&gt;
        /// &lt;returns&gt;&lt;/returns&gt;
        public double EuclideanDistance(double[] x, double[] y)
        {
            double sum = 0.0;

            if (x.GetUpperBound(0) != y.GetUpperBound(0))
            {
                throw new ArgumentException(&quot;the number of elements in x must match the number of elements in y&quot;);
            }

            int count = x.Length;

            for (int i = 0; i &lt; count; i++)
            {
                sum += Math.Pow(Math.Abs(x[i] - y[i]), 2);
            }

            double distance = Math.Sqrt(sum);
            return distance;
        }

        /// &lt;summary&gt;
        /// Finds Manhattan distance.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;x&quot;&gt;The x.&lt;/param&gt;
        /// &lt;param name=&quot;y&quot;&gt;The y.&lt;/param&gt;
        /// &lt;returns&gt;&lt;/returns&gt;
        public double ManhattanDistance(double[] x, double[] y)
        {
            double sum = 0.0;

            if (x.GetUpperBound(0) != y.GetUpperBound(0))
            {
                throw new ArgumentException(&quot;the number of elements in x must match the number of elements in y&quot;);
            }

            int count = x.Length;

            for (int i = 0; i &lt; count; i++)
            {
                sum += Math.Abs(x[i] - y[i]);
            }

            double distance = sum;
            return distance;
        }

        /// &lt;summary&gt;
        /// Finds the Chebyshev distance.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;x&quot;&gt;The x.&lt;/param&gt;
        /// &lt;param name=&quot;y&quot;&gt;The y.&lt;/param&gt;
        /// &lt;returns&gt;&lt;/returns&gt;
        public static double ChebyshevDistance(double[] x, double[] y)
        {
            if (x.GetUpperBound(0) != y.GetUpperBound(0))
            {
                throw new ArgumentException(&quot;the number of elements in x must match the number of elements in y&quot;);
            }
            int count = x.Length;
            var newData = new double[count];

            for (int i = 0; i &lt; count; i++)
            {
                newData[i] = Math.Abs(x[i] - y[i]);
            }
            double max = double.MinValue;

            foreach (double num in newData)
            {
                if (num &gt; max)
                {
                    max = num;
                }
            }
            return max;
        }

        /// &lt;summary&gt;
        /// Finds the Minkowski distance.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;x&quot;&gt;The x.&lt;/param&gt;
        /// &lt;param name=&quot;y&quot;&gt;The y.&lt;/param&gt;
        /// &lt;param name=&quot;order&quot;&gt;The order.&lt;/param&gt;
        /// &lt;returns&gt;&lt;/returns&gt;
        public double MinkowskiDistance(double[] x, double[] y, double order)
        {
            double sum = 0.0;

            if (x.GetUpperBound(0) != y.GetUpperBound(0))
            {
                throw new ArgumentException(&quot;the number of elements in x must match the number of elements in y&quot;);
            }
            int count = x.Length;

            for (int i = 0; i &lt; count; i++)
            {
                sum = sum + Math.Pow(Math.Abs(x[i] - y[i]), order);
            }

            double distance = Math.Pow(sum, (1 / order));
            return distance;
        }
    }
}
</pre>
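<p>As a quick sanity check on how these metrics relate, here is a minimal sketch using a hypothetical <code>DistanceDemo</code> helper (my own illustration, not part of the original project or of Aresh Saharkhiz&#8217;s code). It shows that the Minkowski distance with order 1 is the Manhattan distance and with order 2 is the Euclidean distance, which is why the &quot;minkowski&quot; case in FindDistance above returns the same value as &quot;manhattan&quot;.</p>
<pre class="brush: jscript;">
using System;

namespace LSITest
{
    // Hypothetical helper for illustration only; not part of the original project.
    public static class DistanceDemo
    {
        // Minkowski distance of the given order between two equal-length vectors.
        public static double Minkowski(double[] x, double[] y, double order)
        {
            if (x.Length != y.Length)
            {
                throw new ArgumentException(&quot;x and y must have the same number of elements&quot;);
            }

            double sum = 0.0;
            for (int i = 0; i &lt; x.Length; i++)
            {
                sum += Math.Pow(Math.Abs(x[i] - y[i]), order);
            }
            return Math.Pow(sum, 1.0 / order);
        }

        public static void Run()
        {
            double[] a = { 0, 0 };
            double[] b = { 3, 4 };

            // order 1 is the Manhattan distance: |3| + |4| = 7
            Console.WriteLine(Minkowski(a, b, 1));

            // order 2 is the Euclidean distance: sqrt(9 + 16) = 5
            Console.WriteLine(Minkowski(a, b, 2));
        }
    }
}
</pre>
<p>As the order grows, the result approaches the Chebyshev distance (the largest single difference, 4 in this example), so if you want a Minkowski result that differs from the other three metrics you would pass an order other than 1 or 2.</p>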
<p>And finally the same LSI class used in the previous examples.</p>
<pre class="brush: jscript;">
using System;
using SmartMathLibrary;

namespace LSITest
{
    public class lsi
    {
        // ToPrint (below) holds the formatted HTML results
        public int MyDocColumnCount;
        public int MyDocRowCount;
        public double[,] MyDocs;
        public double[,] MyWords;
        public int MyWordsColumnCount;
        public int MyWordsRowCount;
        public string ToPrint;

        /// &lt;summary&gt;
        /// Runs the LSI test.
        /// &lt;/summary&gt;
        public void LSITest()
        {
            //Create Matrix
            var testArray = new double[,]
                                {
                                    {1, 0, 0, 1, 0, 0, 0, 0, 0},
                                    {1, 0, 1, 0, 0, 0, 0, 0, 0},
                                    {1, 1, 0, 0, 0, 0, 0, 0, 0},
                                    {0, 1, 1, 0, 1, 0, 0, 0, 0},
                                    {0, 1, 1, 2, 0, 0, 0, 0, 0},
                                    {0, 1, 0, 0, 1, 0, 0, 0, 0},
                                    {0, 1, 0, 0, 1, 0, 0, 0, 0},
                                    {0, 0, 1, 1, 0, 0, 0, 0, 0},
                                    {0, 1, 0, 0, 0, 0, 0, 0, 1},
                                    {0, 0, 0, 0, 0, 1, 1, 1, 0},
                                    {0, 0, 0, 0, 0, 0, 1, 1, 1},
                                    {0, 0, 0, 0, 0, 0, 0, 1, 1}
                                };

            // Load array in to Matrix
            var a = new Matrix(testArray);

            // print original matrix
            PrintMatrix(a);

            // perform Latent Semantic Indexing
            GetDocumentWordPlots(a);
        }

        /// &lt;summary&gt;
        /// Prints the matrix.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;myMatrix&quot;&gt;My matrix.&lt;/param&gt;
        private void PrintMatrix(IMatrix myMatrix)
        {
            ToPrint += &quot;&lt;br /&gt;&lt;br /&gt;&quot;;

            for (int i = 0; i &lt; myMatrix.Rows; i++)
            {
                for (int j = 0; j &lt; myMatrix.Columns; j++)
                {
                    ToPrint += String.Format(&quot;{0:0.##}&quot;, myMatrix.MatrixData[i, j]) + &quot;\t&quot;;
                }
                ToPrint += &quot;&lt;br /&gt;&quot;;
            }
        }

        /// &lt;summary&gt;
        /// Gets the document word plots.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;myMatrix&quot;&gt;My matrix.&lt;/param&gt;
        private void GetDocumentWordPlots(Matrix myMatrix)
        {
            // Run singular value decomposition
            var svd = new SingularValueDecomposition(myMatrix);
            svd.ExecuteDecomposition();

            // Put components into individual matrices
            Matrix wordVector = svd.U.Copy();
            Matrix sigma = svd.S.ToMatrix();
            Matrix documentVector = svd.V.Copy();

            // get value of k
            // you can also manually set the value of k
            var k = (int) Math.Floor(Math.Sqrt(myMatrix.Columns));

            // reduce the vectors
            Matrix reducedWordVector = CopyMatrix(wordVector, wordVector.Rows, k - 1);
            Matrix reducedSigma = CreateSigmaMatrix(sigma, k - 1, k - 1);
            Matrix reducedDocumentVector = CopyMatrix(documentVector, documentVector.Rows, k - 1);

            // Recalculate the matrix
            Matrix docs = reducedDocumentVector*reducedSigma;
            Matrix words = reducedWordVector*reducedSigma;

            // Fill doc plot locations
            MyDocs = new double[docs.Rows,docs.Columns];
            for (int i = 0; i &lt; docs.Rows; i++)
            {
                for (int j = 0; j &lt; docs.Columns; j++)
                {
                    MyDocs[i, j] = docs.MatrixData[i, j];
                }
            }

            // Fill word plot locations
            MyWords = new double[words.Rows,words.Columns];
            for (int i = 0; i &lt; words.Rows; i++)
            {
                for (int j = 0; j &lt; words.Columns; j++)
                {
                    MyWords[i, j] = words.MatrixData[i, j];
                }
            }

            // Set counts for charts
            MyDocRowCount = docs.Rows;
            MyWordsRowCount = words.Rows;

            PrintMatrix(docs);
            PrintMatrix(words);
        }

        /// &lt;summary&gt;
        /// Creates the sigma matrix.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;matrix&quot;&gt;The matrix.&lt;/param&gt;
        /// &lt;param name=&quot;rowEnd&quot;&gt;The row end.&lt;/param&gt;
        /// &lt;param name=&quot;columnEnd&quot;&gt;The column end.&lt;/param&gt;
        /// &lt;returns&gt;&lt;/returns&gt;
        private static Matrix CreateSigmaMatrix(IMatrix matrix, int rowEnd, int columnEnd)
        {
            var copyMatrix = new Matrix(rowEnd, columnEnd);

            for (int i = 0; i &lt; columnEnd; i++)
            {
                copyMatrix.MatrixData[i, i] = matrix.MatrixData[i, 0];
            }

            return copyMatrix;
        }

        /// &lt;summary&gt;
        /// Copies the matrix.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;myMatrix&quot;&gt;My matrix.&lt;/param&gt;
        /// &lt;param name=&quot;rowEnd&quot;&gt;The row end.&lt;/param&gt;
        /// &lt;param name=&quot;columnEnd&quot;&gt;The column end.&lt;/param&gt;
        /// &lt;returns&gt;&lt;/returns&gt;
        private static Matrix CopyMatrix(IMatrix myMatrix, int rowEnd, int columnEnd)
        {
            var copyMatrix = new Matrix(rowEnd, columnEnd);

            for (int i = 0; i &lt; rowEnd; i++)
            {
                for (int j = 0; j &lt; columnEnd; j++)
                {
                    copyMatrix.MatrixData[i, j] = myMatrix.MatrixData[i, j];
                }
            }

            return copyMatrix;
        }
    }
}
</pre>
<p>And what do the results look like?</p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/11/kmeansresults.jpg"><img class="alignnone size-full wp-image-361" style="margin-left: 100px; margin-right: 100px;" title="kmeansresults" src="http://eric.ness.net/wp-content/uploads/2009/11/kmeansresults.jpg" alt="kmeansresults" width="400" height="400" /></a></p>
<p>As you can see, the K-Means clustering algorithm correctly grouped the documents into the appropriate categories.</p>
<p>Thanks go to <a href="http://www.codeproject.com/KB/recipes/K-Mean_Clustering.aspx">Aresh Saharkhiz</a> for sharing his implementation of K-Means Clustering, which is also recommended reading.</p>
]]></content:encoded>
			<wfw:commentRss>http://eric.ness.net/archives/k-means-document-clustering/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Plotting Documents &amp; Words: Using Latent Semantic Indexing</title>
		<link>http://eric.ness.net/archives/plotting-documents-words-using-latent-semantic-indexing/</link>
		<comments>http://eric.ness.net/archives/plotting-documents-words-using-latent-semantic-indexing/#comments</comments>
		<pubDate>Fri, 06 Nov 2009 00:32:22 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://eric.ness.net/?p=338</guid>
		<description><![CDATA[Plotting Documents &#038; Words from LSI results]]></description>
			<content:encoded><![CDATA[


<p>In the <a href="http://eric.ness.net/archives/latent-semantic-indexing/">last blog post</a> we looked over a couple of great papers on using <a href="http://eric.ness.net/archives/singular-value-decomposition/">Singular Value Decomposition</a> (SVD) to do <a href="http://eric.ness.net/archives/latent-semantic-indexing/">Latent Semantic Indexing</a> (LSI), implemented with the <a href="http://smartmathlibrary.codeplex.com/">SmartMathLibrary</a>. Now that we have the results, we should plot them to get a sense of where these words and documents lie on a two-dimensional Cartesian plane.</p>
<p style="text-align: left;">Jennifer Flynn&#8217;s presentation &#8220;<a href="http://www.soe.ucsc.edu/classes/cmps290c/Spring07/proj/Flynn_talk.pdf">Latent Semantic Indexing Using SVD and Riemannian SVD</a>&#8221; goes on to tell us how to do this. The process is essentially the same as before, except that k must equal 2. We ended up with k = 2 in our previous example; in larger examples, however, k will more than likely be a different number. Regardless, here we want to end up with matrices that have two columns, giving us our (x,y) pairs. If you wanted to plot these items in a three-dimensional space you would use k = 3, and if you find an awesome way to plot where k = 5, e-mail me. After we have performed SVD, the formulas we use are as follows.</p>
<blockquote>
<p style="text-align: left;"><strong>Documents = V*&#931;</strong></p>
<p style="text-align: left;"><strong>Words = U*&#931;</strong></p>
</blockquote>
<p style="text-align: left;">The resulting matrices give us the (x,y) co-ordinates that we can then plot. I have been using the Dundas charting library for over two years now, but the library is expensive. Since Microsoft acquired it, you can get a free version of the library <a href="http://www.microsoft.com/downloads/details.aspx?FamilyID=130f7986-bf49-4fe5-9ca8-910ae6ea442c&amp;DisplayLang=en">here</a> instead. And again, for simplicity&#8217;s sake, this project is just a simple ASP.NET application.</p>
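<p style="text-align: left;">To make the formulas above concrete, here is a minimal sketch that uses plain arrays in place of the SmartMathLibrary Matrix type. The <code>PlotCoordinates</code> class and its <code>Scale</code> method are hypothetical names of my own, and the sketch assumes the SVD has already been computed elsewhere. Because the sigma matrix is diagonal, multiplying by it simply scales each column by the matching singular value.</p>
<pre class="brush: jscript;">
using System;

namespace LSITest
{
    // Hypothetical helper for illustration only; plain arrays stand in for the Matrix type.
    public static class PlotCoordinates
    {
        // Multiplies the first k columns of a vector matrix (U or V) by the
        // matching singular values. Since sigma is diagonal, the matrix product
        // reduces to scaling column j by singularValues[j].
        public static double[,] Scale(double[,] vectors, double[] singularValues, int k)
        {
            int rows = vectors.GetLength(0);
            var result = new double[rows, k];

            for (int i = 0; i &lt; rows; i++)
            {
                for (int j = 0; j &lt; k; j++)
                {
                    result[i, j] = vectors[i, j] * singularValues[j];
                }
            }
            return result;
        }
    }
}
</pre>
<p style="text-align: left;">With made-up numbers, calling <code>Scale</code> on the rows (0.5, -0.25, 0.125) and (0.25, 0.5, 0.75) with singular values (4, 2, 1) and k = 2 gives the points (2, -0.5) and (1, 1): one (x,y) pair per row, with the third column dropped.</p>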
<p style="text-align: left;">The LSI Class:</p>
<p style="text-align: left;">Please note that this is almost exactly the same as in the previous post. The difference is that at the end of the GetDocumentWordPlots function we use the formulas mentioned above to load the co-ordinates of the words and documents into double arrays that we will ultimately pass to the chart.</p>
<pre class="brush: jscript;">
using System;
using SmartMathLibrary;

namespace LSITest
{
    public class lsi
    {
        // ToPrint (below) holds the formatted HTML results
        public int MyDocColumnCount;
        public int MyDocRowCount;
        public double[,] MyDocs;
        public double[,] MyWords;
        public int MyWordsColumnCount;
        public int MyWordsRowCount;
        public string ToPrint;

        /// &lt;summary&gt;
        /// Runs the LSI test.
        /// &lt;/summary&gt;
        public void LSITest()
        {
            //Create Matrix
            var testArray = new double[,]
                                {
                                    {1, 0, 0, 1, 0, 0, 0, 0, 0},
                                    {1, 0, 1, 0, 0, 0, 0, 0, 0},
                                    {1, 1, 0, 0, 0, 0, 0, 0, 0},
                                    {0, 1, 1, 0, 1, 0, 0, 0, 0},
                                    {0, 1, 1, 2, 0, 0, 0, 0, 0},
                                    {0, 1, 0, 0, 1, 0, 0, 0, 0},
                                    {0, 1, 0, 0, 1, 0, 0, 0, 0},
                                    {0, 0, 1, 1, 0, 0, 0, 0, 0},
                                    {0, 1, 0, 0, 0, 0, 0, 0, 1},
                                    {0, 0, 0, 0, 0, 1, 1, 1, 0},
                                    {0, 0, 0, 0, 0, 0, 1, 1, 1},
                                    {0, 0, 0, 0, 0, 0, 0, 1, 1}
                                };

            // Load array in to Matrix
            var a = new Matrix(testArray);

            // print original matrix
            PrintMatrix(a);

            // perform Latent Semantic Indexing
            GetDocumentWordPlots(a);
        }

        /// &lt;summary&gt;
        /// Prints the matrix.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;myMatrix&quot;&gt;My matrix.&lt;/param&gt;
        private void PrintMatrix(IMatrix myMatrix)
        {
            ToPrint += &quot;&lt;br /&gt;&lt;br /&gt;&quot;;

            for (int i = 0; i &lt; myMatrix.Rows; i++)
            {
                for (int j = 0; j &lt; myMatrix.Columns; j++)
                {
                    ToPrint += String.Format(&quot;{0:0.##}&quot;, myMatrix.MatrixData[i, j]) + &quot;\t&quot;;
                }
                ToPrint += &quot;&lt;br /&gt;&quot;;
            }
        }

        /// &lt;summary&gt;
        /// Gets the document word plots.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;myMatrix&quot;&gt;My matrix.&lt;/param&gt;
        private void GetDocumentWordPlots(Matrix myMatrix)
        {
            // Run singular value decomposition
            var svd = new SingularValueDecomposition(myMatrix);
            svd.ExecuteDecomposition();

            // Put components into individual matrices
            Matrix wordVector = svd.U.Copy();
            Matrix sigma = svd.S.ToMatrix();
            Matrix documentVector = svd.V.Copy();

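            // set k to 2 so each word and document gets an (x, y) pair
            var k = 2;

            // reduce the vectors to their first k columns
            // (using k - 1 here would leave a single column, and the chart code reads column index 1)
            Matrix reducedWordVector = CopyMatrix(wordVector, wordVector.Rows, k);
            Matrix reducedSigma = CreateSigmaMatrix(sigma, k, k);
            Matrix reducedDocumentVector = CopyMatrix(documentVector, documentVector.Rows, k);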

            // Recalculate the matrix
            Matrix docs = reducedDocumentVector*reducedSigma;
            Matrix words = reducedWordVector*reducedSigma;

            // Fill doc plot locations
            MyDocs = new double[docs.Rows,docs.Columns];
            for (int i = 0; i &lt; docs.Rows; i++)
            {
                for (int j = 0; j &lt; docs.Columns; j++)
                {
                    MyDocs[i, j] = docs.MatrixData[i, j];
                }
            }

            // Fill word plot locations
            MyWords = new double[words.Rows,words.Columns];
            for (int i = 0; i &lt; words.Rows; i++)
            {
                for (int j = 0; j &lt; words.Columns; j++)
                {
                    MyWords[i, j] = words.MatrixData[i, j];
                }
            }

            // Set counts for charts
            MyDocRowCount = docs.Rows;
            MyWordsRowCount = words.Rows;

            PrintMatrix(docs);
            PrintMatrix(words);
        }

        /// &lt;summary&gt;
        /// Creates the sigma matrix.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;matrix&quot;&gt;The matrix.&lt;/param&gt;
        /// &lt;param name=&quot;rowEnd&quot;&gt;The row end.&lt;/param&gt;
        /// &lt;param name=&quot;columnEnd&quot;&gt;The column end.&lt;/param&gt;
        /// &lt;returns&gt;&lt;/returns&gt;
        private static Matrix CreateSigmaMatrix(IMatrix matrix, int rowEnd, int columnEnd)
        {
            var copyMatrix = new Matrix(rowEnd, columnEnd);

            for (int i = 0; i &lt; columnEnd; i++)
            {
                copyMatrix.MatrixData[i, i] = matrix.MatrixData[i, 0];
            }

            return copyMatrix;
        }

        /// &lt;summary&gt;
        /// Copies the matrix.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;myMatrix&quot;&gt;My matrix.&lt;/param&gt;
        /// &lt;param name=&quot;rowEnd&quot;&gt;The row end.&lt;/param&gt;
        /// &lt;param name=&quot;columnEnd&quot;&gt;The column end.&lt;/param&gt;
        /// &lt;returns&gt;&lt;/returns&gt;
        private static Matrix CopyMatrix(IMatrix myMatrix, int rowEnd, int columnEnd)
        {
            var copyMatrix = new Matrix(rowEnd, columnEnd);

            for (int i = 0; i &lt; rowEnd; i++)
            {
                for (int j = 0; j &lt; columnEnd; j++)
                {
                    copyMatrix.MatrixData[i, j] = myMatrix.MatrixData[i, j];
                }
            }

            return copyMatrix;
        }
    }
}
</pre>
<p>Here is the code-behind for the web page that displays the chart and the matrices. Essentially we iterate through the double arrays pulled from the LSI class and load each row into a chart series as an (x,y) point.</p>
<pre class="brush: jscript;">
using System;
using System.Drawing;
using System.Web.UI;
using Dundas.Charting.WebControl;

namespace LSITest
{
    public partial class _Default : Page
    {
        /// &lt;summary&gt;
        /// Handles the Load event of the Page control.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;sender&quot;&gt;The source of the event.&lt;/param&gt;
        /// &lt;param name=&quot;e&quot;&gt;The &lt;see cref=&quot;System.EventArgs&quot;/&gt; instance containing the event data.&lt;/param&gt;
        protected void Page_Load(object sender, EventArgs e)
        {
            var mylsi = new lsi();
            mylsi.LSITest();
            Label1.Text = mylsi.ToPrint;
            double[,] myDocs = mylsi.MyDocs;
            double[,] myWords = mylsi.MyWords;

            // Load documents
            for (int i = 0; i &lt; mylsi.MyDocRowCount; i++)
            {
                Chart1.Series[&quot;Series1&quot;].Points.AddXY(myDocs[i, 0], myDocs[i, 1]);
            }

            // Load words
            for (int i = 0; i &lt; mylsi.MyWordsRowCount; i++)
            {
                Chart1.Series[&quot;Series2&quot;].Points.AddXY(myWords[i, 0], myWords[i, 1]);
            }

            // Set legend text
            Chart1.Series[&quot;Series1&quot;].LegendText = &quot;Documents&quot;;
            Chart1.Series[&quot;Series2&quot;].LegendText = &quot;Words&quot;;

            // Set point colors and shapes
            Chart1.Series[&quot;Series1&quot;].Color = Color.Red;
            Chart1.Series[&quot;Series1&quot;].MarkerStyle = MarkerStyle.Diamond;
            Chart1.Series[&quot;Series1&quot;].MarkerSize = 12;
            Chart1.Series[&quot;Series2&quot;].Color = Color.Gray;
            Chart1.Series[&quot;Series2&quot;].MarkerStyle = MarkerStyle.Circle;
            Chart1.Series[&quot;Series2&quot;].MarkerSize = 6;
        }
    }
}
</pre>
<p>And finally here is the ASP.NET web page.</p>
<pre class="brush: jscript;">
&lt;%@ Page Language=&quot;C#&quot; AutoEventWireup=&quot;true&quot; CodeBehind=&quot;Default.aspx.cs&quot; Inherits=&quot;LSITest._Default&quot; %&gt;
&lt;%@ Register Assembly=&quot;DundasWebChart&quot; Namespace=&quot;Dundas.Charting.WebControl&quot; TagPrefix=&quot;DCWC&quot; %&gt;
&lt;!DOCTYPE html PUBLIC &quot;-//W3C//DTD XHTML 1.0 Transitional//EN&quot; &quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd&quot;&gt;

&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot; &gt;
&lt;head runat=&quot;server&quot;&gt;
    &lt;title&gt;LSI Test&lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
    &lt;form id=&quot;form1&quot; runat=&quot;server&quot;&gt;
    &lt;div&gt;
        &lt;DCWC:Chart ID=&quot;Chart1&quot; runat=&quot;server&quot; Height=&quot;400px&quot; Width=&quot;400px&quot;
            ImageType=&quot;Jpeg&quot;&gt;
            &lt;Legends&gt;
                &lt;DCWC:Legend Name=&quot;Default&quot; Alignment=&quot;Center&quot; Docking=&quot;Bottom&quot;&gt;&lt;/DCWC:Legend&gt;
            &lt;/Legends&gt;
            &lt;Titles&gt;
                &lt;DCWC:Title Name=&quot;Title1&quot;&gt;
                &lt;/DCWC:Title&gt;
            &lt;/Titles&gt;
            &lt;Series&gt;
                &lt;DCWC:Series Name=&quot;Series1&quot; ChartType=&quot;Point&quot; MarkerBorderColor=&quot;64, 64, 64&quot;
                    ShadowOffset=&quot;1&quot;&gt;
                &lt;/DCWC:Series&gt;
                &lt;DCWC:Series Name=&quot;Series2&quot; ChartType=&quot;Point&quot; MarkerBorderColor=&quot;64, 64, 64&quot;
                    ShadowOffset=&quot;1&quot;&gt;
                &lt;/DCWC:Series&gt;
            &lt;/Series&gt;
            &lt;ChartAreas&gt;
                &lt;DCWC:ChartArea Name=&quot;Series2&quot;&gt;
                    &lt;axisy interval=&quot;0.5&quot; maximum=&quot;2&quot; minimum=&quot;-1&quot;&gt;
                        &lt;majorgrid linecolor=&quot;Gray&quot; linestyle=&quot;Dash&quot; /&gt;
                    &lt;/axisy&gt;
                    &lt;axisx interval=&quot;0.5&quot; maximum=&quot;2.5&quot; minimum=&quot;-0.5&quot;&gt;
                        &lt;majorgrid linecolor=&quot;Gray&quot; linestyle=&quot;Dash&quot; /&gt;
                    &lt;/axisx&gt;
                &lt;/DCWC:ChartArea&gt;
            &lt;/ChartAreas&gt;
        &lt;/DCWC:Chart&gt;
        &lt;br /&gt;
        &lt;asp:Label ID=&quot;Label1&quot; runat=&quot;server&quot; Text=&quot;&quot;&gt;&lt;/asp:Label&gt;
    &lt;/div&gt;
    &lt;/form&gt;
&lt;/body&gt;
&lt;/html&gt;
</pre>
<p>So let&#8217;s see the result!</p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/11/documents.jpg"><img class="alignnone size-full wp-image-341" style="margin-left: 100px; margin-right: 100px;" title="documents" src="http://eric.ness.net/wp-content/uploads/2009/11/documents.jpg" alt="documents" width="400" height="400" /></a></p>
<p>Now, obviously, you could (and probably should) write this a different way, but it gets you to where you need to be.</p>
<p>I would also recommend reading the part of Flynn&#8217;s presentation on how to compare two words or documents using the dot product of two row vectors. One could also use the <a href="http://eric.ness.net/archives/euclidean-distance-score/">Euclidean Distance Score</a>. If you are interested in more, I would recommend Sujit Pal&#8217;s blog post &#8220;<a href="http://sujitpal.blogspot.com/2008/10/ir-math-in-java-cluster-visualization.html">IR Math in Java : Cluster Visualization</a>&#8221; for additional reading.</p>
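<p>As a rough sketch of the dot product comparison, the hypothetical <code>RowComparison</code> helper below (my own illustration, not taken from Flynn&#8217;s presentation) pulls two rows out of a co-ordinate array such as MyDocs or MyWords and compares them. The cosine variant is an addition of mine: it divides the dot product by the lengths of both vectors so that long and short vectors can be compared on the same scale.</p>
<pre class="brush: jscript;">
using System;

namespace LSITest
{
    // Hypothetical helper for illustration only; not part of the original project.
    public static class RowComparison
    {
        // Copies one row of a two-dimensional array into a one-dimensional array.
        public static double[] GetRow(double[,] matrix, int row)
        {
            int columns = matrix.GetLength(1);
            var result = new double[columns];
            for (int j = 0; j &lt; columns; j++)
            {
                result[j] = matrix[row, j];
            }
            return result;
        }

        // Dot product of two equal-length vectors.
        public static double Dot(double[] x, double[] y)
        {
            if (x.Length != y.Length)
            {
                throw new ArgumentException(&quot;x and y must have the same number of elements&quot;);
            }

            double sum = 0.0;
            for (int i = 0; i &lt; x.Length; i++)
            {
                sum += x[i] * y[i];
            }
            return sum;
        }

        // Cosine similarity: the dot product divided by the product of the vector lengths.
        // The result is NaN if either vector is all zeros.
        public static double Cosine(double[] x, double[] y)
        {
            return Dot(x, y) / (Math.Sqrt(Dot(x, x)) * Math.Sqrt(Dot(y, y)));
        }
    }
}
</pre>
<p>For example, <code>RowComparison.Dot(RowComparison.GetRow(mylsi.MyDocs, 0), RowComparison.GetRow(mylsi.MyDocs, 1))</code> would compare the first two documents plotted above. A larger value suggests the two documents point in a more similar direction.</p>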
]]></content:encoded>
			<wfw:commentRss>http://eric.ness.net/archives/plotting-documents-words-using-latent-semantic-indexing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Latent Semantic Indexing</title>
		<link>http://eric.ness.net/archives/latent-semantic-indexing/</link>
		<comments>http://eric.ness.net/archives/latent-semantic-indexing/#comments</comments>
		<pubDate>Sun, 01 Nov 2009 15:48:26 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://eric.ness.net/?p=309</guid>
		<description><![CDATA[Latent Semantic Indexing in C#]]></description>
			<content:encoded><![CDATA[


<p>Latent Semantic Indexing (LSI) is commonly described as an &#8220;indexing and retrieval method that uses a mathematical technique called <a href="http://eric.ness.net/archives/singular-value-decomposition/">Singular Value Decomposition</a> (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text.&#8221; Sujit Pal has one of the best descriptions of what LSI is and how it works:</p>
<blockquote><p>Latent Semantic Indexing attempts to uncover latent relationships among documents based on word co-occurence. So if document A contains (w1,w2) and document B contains (w2,w3), we can conclude that there is something common between documents A and B. LSI does this by decomposing the input raw term frequency matrix (A, see below) into three different matrices (U, S and V) using Singular Value Decomposition (SVD). Once that is done, the three vectors are &#8220;reduced&#8221; and the original vector rebuilt from the reduced vectors. Because of the reduction, noisy relationships are suppressed and relations become very clearly visible.</p></blockquote>
<p><strong>So how is this done?</strong></p>
<p>To start with, let&#8217;s use the example in &#8220;<a href="http://lsa.colorado.edu/papers/JASIS.lsi.90.pdf">Indexing by Latent Semantic Analysis</a>&#8221; (Deerwester et al.), because you see it repeated in a number of places on the web. The paper looks at the titles of 9 papers that fall into two categories: &#8220;human computer interaction&#8221; and &#8220;graphs &amp; trees&#8221;.</p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/11/listofwords.jpg"><img class="alignnone size-full wp-image-310" style="margin-left: 100px; margin-right: 100px;" title="listofwords" src="http://eric.ness.net/wp-content/uploads/2009/11/listofwords.jpg" alt="listofwords" width="400" height="505" /></a></p>
<p>In this example the matrix is made up of the word counts in each document, with one row per term and one column per title. The next step is to take this matrix and break it down into its component parts using SVD. The result looks like this:</p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/11/svd.jpg"><img class="alignnone size-full wp-image-320" title="svd" src="http://eric.ness.net/wp-content/uploads/2009/11/svd.jpg" alt="svd" width="600" /></a></p>
<p>After you have performed SVD on the original matrix you then reduce the individual matrices, keeping only the largest k singular values and the matching columns of U and V. What the reduction does is get rid of some of the &#8220;noise&#8221; &#8211; exposing the relationship between words and documents.</p>
<p>One question that arises is how far to reduce the matrices, that is, how many dimensions (often called k) to keep. There seems to be no hard and fast rule for this, as different papers report different results with different values. In Sujit Pal&#8217;s <a href="http://sujitpal.blogspot.com/2008/09/ir-math-with-java-tf-idf-and-lsi.html">post</a> he takes the square root of the number of columns of the original matrix (m), rounds it down and subtracts 1, which I think is a good method to use. For the 9 columns here that gives k=2, which also happens to be the value used in the Deerwester paper. The following picture shows what this looks like (please see Jennifer Flynn&#8217;s presentation <a href="http://www.soe.ucsc.edu/classes/cmps290c/Spring07/proj/Flynn_talk.pdf">Latent Semantic Indexing Using SVD and Riemannian SVD</a> for a more elaborate example):</p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/11/reduce.jpg"><img class="alignnone size-full wp-image-323" title="reduce" src="http://eric.ness.net/wp-content/uploads/2009/11/reduce.jpg" alt="reduce" width="600" /></a></p>
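<p>As a rough sketch of that rule of thumb for picking k, here is a tiny helper. The <code>RankHeuristic</code> class and <code>ChooseK</code> method are names I made up for this illustration, and the lower bound of 1 is my own assumption so that very small matrices still keep one dimension:</p>
<pre class="brush: jscript;">
using System;

namespace LsiSketch
{
    // Hypothetical helper, not part of any library
    public static class RankHeuristic
    {
        // k = floor(sqrt(number of columns)) - 1, never less than 1 (assumed lower bound)
        public static int ChooseK(int columns)
        {
            if (columns &lt; 1)
            {
                throw new ArgumentOutOfRangeException(&quot;columns&quot;);
            }

            var k = (int) Math.Floor(Math.Sqrt(columns)) - 1;
            return Math.Max(1, k);
        }
    }
}
</pre>
<p>For the nine documents in the Deerwester example, <code>ChooseK(9)</code> works out to floor(sqrt(9)) - 1 = 2.</p>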
<p>After the matrices have been reduced, all that is left to do is multiply them back together again (U times S times the transpose of V) and that is it. See the result:</p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/11/lsi1.jpg"><img class="alignnone size-full wp-image-326" title="lsi" src="http://eric.ness.net/wp-content/uploads/2009/11/lsi1.jpg" alt="lsi" width="600" /></a></p>
<p>[<strong>Update</strong>: as one of the readers (Jorge) noted, v is not exactly correct; it should be V.Transpose. Please check out the “Indexing by Latent Semantic Analysis” (Deerwester et al.) paper, starting on page 26, for the correct values of the matrices: <a rel="nofollow" href="http://lsa.colorado.edu/papers/JASIS.lsi.90.pdf">http://lsa.colorado.edu/papers/JASIS.lsi.90.pdf</a>. I will try to update this here shortly.]</p>
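<p>To make the transpose point concrete, here is a minimal sketch of the rebuild step written with plain arrays instead of a matrix library. The <code>LowRank</code> class and <code>Rebuild</code> method are hypothetical names for this illustration, and it assumes the decomposition hands back V with one row per document, so the transpose shows up as swapped indices:</p>
<pre class="brush: jscript;">
using System;

namespace LsiSketch
{
    // Hypothetical helper, not part of any library
    public static class LowRank
    {
        // Rebuilds U_k * S_k * V_k^T from the first k singular triplets.
        // u is (rows x r), s holds r singular values, v is (cols x r).
        public static double[,] Rebuild(double[,] u, double[] s, double[,] v, int k)
        {
            int rows = u.GetLength(0);
            int cols = v.GetLength(0);
            var result = new double[rows, cols];

            for (int i = 0; i &lt; rows; i++)
            {
                for (int j = 0; j &lt; cols; j++)
                {
                    for (int t = 0; t &lt; k; t++)
                    {
                        // reading v[j, t] rather than v[t, j] is what applies the transpose
                        result[i, j] += u[i, t]*s[t]*v[j, t];
                    }
                }
            }

            return result;
        }
    }
}
</pre>
<p>Only the first k columns of u and v and the first k entries of s are ever read, which is the same thing as cutting the matrices down before multiplying.</p>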
<p>So let&#8217;s take a look at the code. It follows the example outlined in the Deerwester paper. Please keep in mind that this is just a little class I put together in an ASP.NET test app that shows an HTML-formatted matrix of the original (m) and the LSI result:</p>
<pre class="brush: jscript;">
using System;
using SmartMathLibrary;

namespace LSITest
{
    public class lsi
    {
        // holds the formatted HTML results
        public string ToPrint;

        /// &lt;summary&gt;
        /// Runs the LSI test on the Deerwester example matrix.
        /// &lt;/summary&gt;
        public void LSITest()
        {
            //Create Matrix
            var testArray = new double[,]
                                {
                                    {1, 0, 0, 1, 0, 0, 0, 0, 0},
                                    {1, 0, 1, 0, 0, 0, 0, 0, 0},
                                    {1, 1, 0, 0, 0, 0, 0, 0, 0},
                                    {0, 1, 1, 0, 1, 0, 0, 0, 0},
                                    {0, 1, 1, 2, 0, 0, 0, 0, 0},
                                    {0, 1, 0, 0, 1, 0, 0, 0, 0},
                                    {0, 1, 0, 0, 1, 0, 0, 0, 0},
                                    {0, 0, 1, 1, 0, 0, 0, 0, 0},
                                    {0, 1, 0, 0, 0, 0, 0, 0, 1},
                                    {0, 0, 0, 0, 0, 1, 1, 1, 0},
                                    {0, 0, 0, 0, 0, 0, 1, 1, 1},
                                    {0, 0, 0, 0, 0, 0, 0, 1, 1}
                                };

            // Load array in to Matrix
            var a = new Matrix(testArray);

            // print original matrix
            PrintMatrix(a);

            // perform Latent Semantic Indexing
            Transform(a);
        }

        /// &lt;summary&gt;
        /// Prints the matrix.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;myMatrix&quot;&gt;My matrix.&lt;/param&gt;
        private void PrintMatrix(IMatrix myMatrix)
        {
            ToPrint += &quot;&lt;br /&gt;&lt;br /&gt;&quot;;

            for (int i = 0; i &lt; myMatrix.Rows; i++)
            {
                for (int j = 0; j &lt; myMatrix.Columns; j++)
                {
                    ToPrint += String.Format(&quot;{0:0.##}&quot;, myMatrix.MatrixData[i, j]) + &quot;\t&quot;;
                }
                ToPrint += &quot;&lt;br /&gt;&quot;;
            }
        }

        /// &lt;summary&gt;
        /// Transforms the specified my matrix.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;myMatrix&quot;&gt;My matrix.&lt;/param&gt;
        private void Transform(Matrix myMatrix)
        {
            // Run singular value decomposition
            var svd = new SingularValueDecomposition(myMatrix);
            svd.ExecuteDecomposition();

            // Put components into individual matrices
            Matrix wordVector = svd.U.Copy();
            Matrix sigma = svd.S.ToMatrix();
            Matrix documentVector = svd.V.Copy();

            // starting value for k: floor(sqrt(number of columns));
            // the reduction below keeps k - 1 dimensions (you can also set this manually)
            var k = (int) Math.Floor(Math.Sqrt(myMatrix.Columns));

            // reduce the vectors
            Matrix reducedWordVector = CopyMatrix(wordVector, wordVector.Rows, k - 1);
            Matrix reducedSigma = CreateSigmaMatrix(sigma, k - 1, k - 1);
            Matrix reducedDocumentVector = CopyMatrix(documentVector, documentVector.Rows, k - 1);

            // re-compute matrix
            Matrix a = reducedWordVector*reducedSigma*reducedDocumentVector.Transpose();

            // print result
            PrintMatrix(a);
        }

        /// &lt;summary&gt;
        /// Creates the sigma matrix.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;matrix&quot;&gt;The matrix.&lt;/param&gt;
        /// &lt;param name=&quot;rowEnd&quot;&gt;The row end.&lt;/param&gt;
        /// &lt;param name=&quot;columnEnd&quot;&gt;The column end.&lt;/param&gt;
        /// &lt;returns&gt;&lt;/returns&gt;
        private static Matrix CreateSigmaMatrix(IMatrix matrix, int rowEnd, int columnEnd)
        {
            var copyMatrix = new Matrix(rowEnd, columnEnd);

            for (int i = 0; i &lt; columnEnd; i++)
            {
                copyMatrix.MatrixData[i, i] = matrix.MatrixData[i, 0];
            }

            return copyMatrix;
        }

        /// &lt;summary&gt;
        /// Copies the matrix.
        /// &lt;/summary&gt;
        /// &lt;param name=&quot;myMatrix&quot;&gt;My matrix.&lt;/param&gt;
        /// &lt;param name=&quot;rowEnd&quot;&gt;The row end.&lt;/param&gt;
        /// &lt;param name=&quot;columnEnd&quot;&gt;The column end.&lt;/param&gt;
        /// &lt;returns&gt;&lt;/returns&gt;
        private static Matrix CopyMatrix(IMatrix myMatrix, int rowEnd, int columnEnd)
        {
            var copyMatrix = new Matrix(rowEnd, columnEnd);

            for (int i = 0; i &lt; rowEnd; i++)
            {
                for (int j = 0; j &lt; columnEnd; j++)
                {
                    copyMatrix.MatrixData[i, j] = myMatrix.MatrixData[i, j];
                }
            }

            return copyMatrix;
        }
    }
}
</pre>
<p>With much thanks to Sujit Pal, Jennifer Flynn and Deerwester for their excellent explanations.</p>
<p><strong>Recommended Reading</strong></p>
<p><a href="http://sujitpal.blogspot.com/2008/09/ir-math-with-java-tf-idf-and-lsi.html">IR Math with Java : TF, IDF and LSI</a> &#8211; Sujit Pal</p>
<p><a href="http://lsa.colorado.edu/papers/JASIS.lsi.90.pdf">Indexing by Latent Semantic Analysis</a> &#8211; Deerwester et al</p>
<p><a href="http://www.soe.ucsc.edu/classes/cmps290c/Spring07/proj/Flynn_talk.pdf">Latent Semantic Indexing Using SVD and Riemannian SVD</a> &#8211; Jennifer Flynn</p>
]]></content:encoded>
			<wfw:commentRss>http://eric.ness.net/archives/latent-semantic-indexing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Singular Value Decomposition</title>
		<link>http://eric.ness.net/archives/singular-value-decomposition/</link>
		<comments>http://eric.ness.net/archives/singular-value-decomposition/#comments</comments>
		<pubDate>Mon, 26 Oct 2009 16:00:06 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://eric.ness.net/?p=299</guid>
		<description><![CDATA[Singular Value Decomposition using the SmartMathLibrary in C#]]></description>
			<content:encoded><![CDATA[


<p>Singular Value Decomposition is something I&#8217;ve been wanting to wrap my head around for a while, now that I am getting really into Machine Learning. Unfortunately, a lot of the material out there is hard to understand and, believe it or not, there are few libraries available for .NET.</p>
<p>So what is singular value decomposition (SVD)? Probably the best description I&#8217;ve run across is:</p>
<blockquote><p>Singular Value Decomposition is a way of factoring matrices into a series of linear approximations that expose the underlying structure of the matrix. SVD is extraordinarily useful and has many applications such as data analysis, signal processing, pattern recognition, image compression, weather prediction, and Latent Semantic Analysis. [<a href="http://puffinwarellc.com/index.php/news-and-articles/articles/30-singular-value-decomposition-tutorial.html">iMetaSearch</a>]</p></blockquote>
<p>The SVD formula is:</p>
<p style="text-align: center;"><strong>M=U&#931;V*</strong></p>
<p style="text-align: left;">M is simply an m-by-n matrix. The columns of U form a set of orthonormal &#8220;output&#8221; basis vector directions for M. &#931; holds the singular values, which can be thought of as scalar &#8220;gain controls&#8221; by which each corresponding input is multiplied to give a corresponding output. The columns of V (V* is its conjugate transpose) form a set of orthonormal &#8220;input&#8221; or &#8220;analysing&#8221; basis vector directions for M. The best walk through I&#8217;ve come across is over at iMetaSearch <a href="http://puffinwarellc.com/index.php/news-and-articles/articles/30-singular-value-decomposition-tutorial.html">here</a>.</p>
<p style="text-align: left;"><strong>A lack of good .NET Libraries.</strong></p>
<p style="text-align: left;">I tried out four different libraries: <a href="http://smartmathlibrary.codeplex.com/">SmartMathLibrary</a>, <a href="http://latoolnet.codeplex.com/">LatoolNet</a>, <a href="http://www.alglib.net/">ALGLIB</a> and <a href="http://www.codeproject.com/KB/recipes/psdotnetmatrix.aspx?msg=2345970">DotNetMatrix</a>. Out of these four I could only get two of them completely working and I ultimately came to the conclusion that <a href="http://smartmathlibrary.codeplex.com/">SmartMathLibrary</a> was the best for doing SVD.</p>
<p style="text-align: left;"><strong>The Code</strong></p>
<p style="text-align: left;">Here is the code to replicate the tutorial over at <a href="http://web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm">MIT</a>.</p>
<pre class="brush: jscript;">
using System;
using SmartMathLibrary;

namespace MatrixTest2
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            SVDTest();
            Console.ReadLine();
        }

        private static void SVDTest()
        {
            // Create/load array
            var holeDifficulty = new double[,]
                                     {
                                         {2, 1, 0, 0},
                                         {4, 3, 0, 0}
                                     };

            // Load in to Matrix
            var a = new Matrix(holeDifficulty);

            // Singular Value Decomposition
            var SVD = new SingularValueDecomposition(a);
            SVD.ExecuteDecomposition();

            // Get the singular values as a vector
            GeneralVector s = SVD.S;

            // Display the transposed input, the singular values, then U and V
            Console.WriteLine(a.Transpose().ToString());
            Console.WriteLine();
            Console.WriteLine(s.ToString());
            Console.WriteLine();
            Console.WriteLine(SVD.U.ToString());
            Console.WriteLine();
            Console.WriteLine(SVD.V.ToString());
        }
    }
}
</pre>
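<p>If you want to sanity check the singular values without any library, here is a minimal sketch that only handles matrices with exactly two columns. It relies on the fact that the singular values are the square roots of the eigenvalues of the matrix multiplied by its own transpose. The <code>TwoColumnSvd</code> class and <code>SingularValues</code> method are hypothetical names for this illustration and have nothing to do with SmartMathLibrary:</p>
<pre class="brush: jscript;">
using System;

namespace SvdSketch
{
    // Hypothetical helper, not part of any library
    public static class TwoColumnSvd
    {
        // Singular values of a matrix with exactly two columns, largest first.
        // They are the square roots of the eigenvalues of the 2x2 matrix (A^T)A.
        public static double[] SingularValues(double[,] a)
        {
            if (a.GetLength(1) != 2)
            {
                throw new ArgumentException(&quot;Expected exactly two columns.&quot;);
            }

            // Entries of the symmetric matrix (A^T)A = [p q; q r]
            double p = 0, q = 0, r = 0;
            for (int i = 0; i &lt; a.GetLength(0); i++)
            {
                p += a[i, 0]*a[i, 0];
                q += a[i, 0]*a[i, 1];
                r += a[i, 1]*a[i, 1];
            }

            // Eigenvalues of a symmetric 2x2 matrix from its trace and determinant
            double halfTrace = (p + r)/2;
            double det = p*r - q*q;
            double root = Math.Sqrt(Math.Max(0.0, halfTrace*halfTrace - det));

            return new[]
                       {
                           Math.Sqrt(halfTrace + root),
                           Math.Sqrt(Math.Max(0.0, halfTrace - root))
                       };
        }
    }
}
</pre>
<p>Feeding it the transposed array from above, {2, 4}, {1, 3}, {0, 0}, {0, 0}, gives roughly 5.465 and 0.366. A matrix and its transpose share the same non-zero singular values, so the decomposition above should report the same two numbers.</p>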
<p><strong>Additional Resources</strong></p>
<p><a href="http://puffinwarellc.com/index.php/news-and-articles/articles/30-singular-value-decomposition-tutorial.html">iMetaSearch</a></p>
<p><a href="http://sujitpal.blogspot.com/2008/09/ir-math-with-java-tf-idf-and-lsi.html">IR Math with Java : TF, IDF and LSI</a></p>
<p><a href="http://alias-i.com/lingpipe/demos/tutorial/svd/read-me.html">SVD Tutorial</a></p>
]]></content:encoded>
			<wfw:commentRss>http://eric.ness.net/archives/singular-value-decomposition/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pearson&#8217;s Correlation Coefficient</title>
		<link>http://eric.ness.net/archives/pearsons-correlation-coefficient/</link>
		<comments>http://eric.ness.net/archives/pearsons-correlation-coefficient/#comments</comments>
		<pubDate>Sun, 25 Oct 2009 19:33:25 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://eric.ness.net/?p=272</guid>
		<description><![CDATA[Pearson's Correlation Coefficient walk through]]></description>
			<content:encoded><![CDATA[


<p>In Toby Segaran&#8217;s book &#8220;Programming Collective Intelligence&#8221;, one additional method used to determine the similarity between people&#8217;s interests is Pearson&#8217;s correlation coefficient. In statistics, Pearson&#8217;s correlation coefficient is often symbolized simply as r. I also covered Toby&#8217;s Euclidean Distance Score <a title="http://eric.ness.net/archives/euclidean-distance-score/" href="http://eric.ness.net/archives/euclidean-distance-score/">here</a>.</p>
<p><img src="http://eric.ness.net/wp-content/uploads/2009/10/hl_correl_frm_r.png" alt="hl_correl_frm_r" width="197" height="102" /></p>
<p>The formula above is how r is calculated.</p>
<p>And here is some sloppy source code to get you going:</p>
<pre class="brush: jscript;">
using System;
using System.Linq;

namespace PearsonTest
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            var myP = new Correlation();

            var lisaRose = new double[] {0, 2, 4, 6, 8, 10, 12};
            var jackMatthews = new[] {2.1, 5, 9, 12.6, 17.3, 21, 24.7};

            double score = myP.PearsonCorrelation(lisaRose, jackMatthews);

            Console.WriteLine(score);
            Console.ReadLine();

            // The answer is 0.99887956534852
        }
    }

    internal class Correlation
    {
        public double PearsonCorrelation(double[] x, double[] y)
        {
            double result;
            double xMean = 0;
            double yMean = 0;
            double xDenom = 0;
            double yDenom = 0;
            double denominator;
            double numerator = 0;
            double n;

            // Make sure arrays are the same size and not empty
            if ((x.Count() == y.Count()) &amp;&amp; (x.Count() &gt;= 1))
            {
                n = x.Count();
            }
            else
            {
                result = 0;
                return result;
            }

            // Find Means
            for (int i = 0; i &lt;= n - 1; i++)
            {
                xMean += x[i];
                yMean += y[i];
            }
            xMean = xMean/n;
            yMean = yMean/n;

            // Calculate numerator and denominator
            for (int i = 0; i &lt;= n - 1; i++)
            {
                // Calculate numerator
                double numX = x[i] - xMean;
                double numY = y[i] - yMean;
                numerator += numX*numY;

                // Calculate denominator parts
                xDenom += Math.Pow(numX, 2);
                yDenom += Math.Pow(numY, 2);
            }

            // Calculate denominator
            denominator = Math.Sqrt(xDenom*yDenom);

            // Check for division by zero
            if (denominator == 0)
            {
                result = 0;
            }
            else
            {
                result = numerator/denominator;
            }

            return result;
        }
    }
}
</pre>
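<p>For comparison, here is a more compact sketch of the same calculation using LINQ. The <code>CompactCorrelation</code> class and its <code>Pearson</code> method are names made up for this illustration, and like the code above it returns 0 whenever r is undefined, which is a convenience assumption rather than anything statistics dictates:</p>
<pre class="brush: jscript;">
using System;
using System.Linq;

namespace PearsonSketch
{
    // Hypothetical helper, not part of any library
    public static class CompactCorrelation
    {
        // Sum of co-deviations divided by the square root of the
        // product of the summed squared deviations
        public static double Pearson(double[] x, double[] y)
        {
            if (x.Length != y.Length || x.Length == 0)
            {
                return 0;
            }

            double xMean = x.Average();
            double yMean = y.Average();

            double numerator = Enumerable.Range(0, x.Length)
                .Sum(i =&gt; (x[i] - xMean)*(y[i] - yMean));
            double xDenom = x.Sum(a =&gt; (a - xMean)*(a - xMean));
            double yDenom = y.Sum(b =&gt; (b - yMean)*(b - yMean));
            double denominator = Math.Sqrt(xDenom*yDenom);

            // A constant series has no spread, so r is undefined; report 0
            return denominator == 0 ? 0 : numerator/denominator;
        }
    }
}
</pre>
<p>A quick way to convince yourself it behaves: any y that is a straight-line function of x with a positive slope, such as y = 2x + 1, should score 1 (give or take floating point rounding), the same line with a negative slope should score -1, and a constant y falls into the guard and scores 0.</p>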
]]></content:encoded>
			<wfw:commentRss>http://eric.ness.net/archives/pearsons-correlation-coefficient/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>SQL Server Migration Assistant for Access Walk Through.</title>
		<link>http://eric.ness.net/archives/sql-server-migration-assistant-for-access-walk-through/</link>
		<comments>http://eric.ness.net/archives/sql-server-migration-assistant-for-access-walk-through/#comments</comments>
		<pubDate>Fri, 16 Oct 2009 10:07:47 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://eric.ness.net/?p=242</guid>
		<description><![CDATA[SQL Server Migration Assistant for Access Walk Through.]]></description>
			<content:encoded><![CDATA[


<p>I am currently traveling and do not have MS SQL Server Workgroup edition loaded up on my laptop, and I needed to import some data from a Microsoft Access database. In doing a quick search I came across a great utility called SQL Server Migration Assistant for Access, which you can <a href="http://www.microsoft.com/downloads/details.aspx?familyid=D842F8B4-C914-4AC7-B2F3-D25FFF4E24FB&amp;displaylang=en">download for free here</a>. It does everything I need and is pretty easy to use, but I took some screen shots to show you how it&#8217;s done.</p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/10/1.jpg"><img class="alignnone size-full wp-image-246" title="1" src="http://eric.ness.net/wp-content/uploads/2009/10/1.jpg" alt="1" width="562" height="366" /></a></p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/10/2.jpg"><img class="alignnone size-full wp-image-246" title="1" src="http://eric.ness.net/wp-content/uploads/2009/10/2.jpg" alt="1" width="562" height="366" /></a></p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/10/3.jpg"><img class="alignnone size-full wp-image-246" title="1" src="http://eric.ness.net/wp-content/uploads/2009/10/3.jpg" alt="1" width="562" height="366" /></a></p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/10/4.jpg"><img class="alignnone size-full wp-image-246" title="1" src="http://eric.ness.net/wp-content/uploads/2009/10/4.jpg" alt="1" width="562" height="366" /></a></p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/10/5.jpg"><img class="alignnone size-full wp-image-246" title="1" src="http://eric.ness.net/wp-content/uploads/2009/10/5.jpg" alt="1" width="562" height="366" /></a></p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/10/6.jpg"><img class="alignnone size-full wp-image-246" title="1" src="http://eric.ness.net/wp-content/uploads/2009/10/6.jpg" alt="1" width="562" height="366" /></a></p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/10/7.jpg"><img class="alignnone size-full wp-image-246" title="1" src="http://eric.ness.net/wp-content/uploads/2009/10/7.jpg" alt="1" width="562" height="366" /></a></p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/10/8.jpg"><img class="alignnone size-full wp-image-246" title="1" src="http://eric.ness.net/wp-content/uploads/2009/10/8.jpg" alt="1" width="562" height="366" /></a></p>
<p><a href="http://eric.ness.net/wp-content/uploads/2009/10/9.jpg"><img class="alignnone size-full wp-image-246" title="1" src="http://eric.ness.net/wp-content/uploads/2009/10/9.jpg" alt="1" width="562" height="366" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://eric.ness.net/archives/sql-server-migration-assistant-for-access-walk-through/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>My toolbox: A little list of programming tools.</title>
		<link>http://eric.ness.net/archives/my-toolbox-a-little-list-of-programming-tools/</link>
		<comments>http://eric.ness.net/archives/my-toolbox-a-little-list-of-programming-tools/#comments</comments>
		<pubDate>Sat, 06 Jun 2009 17:05:53 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://eric.ness.net/?p=213</guid>
		<description><![CDATA[A little list of tools I use.]]></description>
			<content:encoded><![CDATA[


<p>List of tools I use:</p>
<h2><strong>Add-ins</strong></h2>
<p><strong>ReSharper:</strong> One of those add-ins that, once you start using it, is hard to give up: it suggests fixes for programming errors and helps with refactoring, unit testing, code formatting, and cleanup. [<a title="Resharper" href="http://www.jetbrains.com/resharper/" target="_blank">link</a>]<br />
<strong>VisualSVN:</strong> You need a plug-in that connects your solution to some form of version control. I&#8217;ve always been a fan of Subversion [free], and VisualSVN helps me connect to it. <a title="ankhsvn" href="http://ankhsvn.open.collab.net/" target="_blank">AnkhSVN</a> is also a great free alternative. I would also recommend that you check out <a title="Visual SVN Server" href="http://www.visualsvn.com/server/" target="_blank">VisualSVN Server</a>, as it lets you configure and manage Subversion for your whole team, and it is free. [<a title="VisualSVN" href="http://www.visualsvn.com/" target="_blank">link</a>]<br />
<strong>GhostDoc:</strong> This free extension automatically generates documentation comments for your code. A must. [<a title="ghostdoc" href="http://submain.com/products/ghostdoc.aspx" target="_blank">link</a>]<br />
<strong>Clone Detective: </strong>This add-in tells you where you are repeating code. Very helpful, but it sometimes does not play well with other add-ins. [<a title="Clone Detective" href="http://www.codeplex.com/CloneDetectiveVS" target="_blank">link</a>]</p>
<h2><strong>Controls</strong></h2>
<p><strong>Dundas:</strong> The best graphing framework on .NET, period. The only negative is that it is not cheap, but you can accomplish most things with the ASP.NET Charting Controls. [<a title="Dundas" href="http://dundas.com/" target="_blank">link</a>]<br />
<strong>Virtual Earth:</strong> A fun control for using Virtual Earth. [<a title="Virtual Earth" href="http://msdn.microsoft.com/en-us/library/dd877180.aspx" target="_blank">link</a>]<br />
<strong>Telerik RadControls:</strong> Great controls and a must-have, but like Dundas, not cheap. [<a title="telerik" href="http://www.telerik.com/" target="_blank">link</a>]<br />
<strong>ASP.NET Ajax Framework:</strong> A must if you are still building Web Forms pages. [<a title="ASP.NET Ajax" href="http://www.asp.net/ajax/">link</a>]<br />
<strong>ASP.NET Ajax Toolkit:</strong> The bells and whistles for the framework. [<a title="ASP.NET Ajax Toolkit" href="http://www.asp.net/ajax/AjaxControlToolkit/Samples/" target="_blank">link</a>]</p>
]]></content:encoded>
			<wfw:commentRss>http://eric.ness.net/archives/my-toolbox-a-little-list-of-programming-tools/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
