Euclidean Distance Score

October 30th, 2008  |  Published in Machine Learning, Statistics

I am currently reading Toby Segaran’s book “Programming Collective Intelligence”, and one of the first topics it covers is how to determine how similar two people are.

One approach is to use the Euclidean Distance Score: treat each person’s ratings as a point and measure the straight-line distance between the points, so that a smaller distance means more similar people. Arun Vijayan C has an excellent PowerPoint presentation on this topic, “Finding more people like you”.

I then wrote up some quick code in C# that uses the values in Arun’s presentation:

// Euclidean Distance Score

using System;
using System.Collections.Generic;

namespace ConsoleApplication1
{
    internal class Program
    {
        private static void Main()
        {
            // Load People and Values
            var myP = new List<People>
                          {
                              new People("John", 1.5, 4),
                              new People("Ravi", 4.5, 1.5),
                              new People("Kiran", 1, 3.5),
                              new People("Deepti", 3, 5)
                          };

            // Print header
            Console.WriteLine("People And Scores");
            Console.WriteLine("###################");
            Console.WriteLine();

            // Loop through people and values
            foreach (People people in myP)
            {
                Console.WriteLine(people.Name + "\t" + people.xScore + "\t" + people.yScore);
            }

            // Print Distance And Value Headers
            Console.WriteLine();
            Console.WriteLine("Distance Comparison");
            Console.WriteLine("###################");
            Console.WriteLine();

            // Loop through people and scores
            int myCount = 1;
            for (int i = 0; i < myP.Count; i++)
            {
                for (int j = myCount; j < myP.Count; j++)
                {
                    // Euclidean Distance Score
                    // Sqrt( (x1-x2)^2 + (y1-y2)^2 )
                    Console.WriteLine(myP[i].Name + "\t" + myP[j].Name + ":\t" +
                                      Math.Sqrt(Math.Pow(myP[i].xScore - myP[j].xScore, 2) +
                                                Math.Pow(myP[i].yScore - myP[j].yScore, 2)).ToString("0.##"));
                }

                // Skip to the next guy
                myCount++;
            }

            // Print Closer
            Console.WriteLine();
            Console.WriteLine("Press enter to continue...");
            Console.ReadLine();
        }
    }

    internal class People
    {
        public string Name;
        public double xScore;
        public double yScore;

        public People(string _Name, double _xScore, double _yScore)
        {
            Name = _Name;
            xScore = _xScore;
            yScore = _yScore;
        }
    }
}
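
As a quick cross-check on the numbers the C# program prints, here is a minimal Python sketch (Python being the language the book itself uses) that computes the same pairwise distances. The names `people`, `euclidean_distance`, and `pairwise_distances` are my own choices for illustration and do not come from the book or the presentation; the scores are simply copied from the listing above.

```python
from itertools import combinations
from math import hypot

# Scores copied from the C# listing above: name -> (x, y).
people = {
    "John": (1.5, 4.0),
    "Ravi": (4.5, 1.5),
    "Kiran": (1.0, 3.5),
    "Deepti": (3.0, 5.0),
}


def euclidean_distance(p, q):
    """Straight-line distance between two 2D points: sqrt((x1-x2)^2 + (y1-y2)^2)."""
    return hypot(p[0] - q[0], p[1] - q[1])


def pairwise_distances(scores):
    """Distance for every unordered pair of names, each pair visited once."""
    return {
        (a, b): euclidean_distance(scores[a], scores[b])
        for a, b in combinations(scores, 2)
    }


distances = pairwise_distances(people)
for (a, b), d in distances.items():
    print(f"{a}\t{b}:\t{d:.2f}")
```

`combinations(scores, 2)` plays the role of the nested `i`/`j` loops in the C# version, so each pair is compared only once. With these values the closest pair is John and Kiran (about 0.71 apart) and the farthest is Ravi and Kiran (about 4.03 apart).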



Responses

  1. Soriano says:

    June 6th, 2009 at 10:03 pm

    Hi,
    I'm reading the same book, and I have a question.
    In “Euclidean Distance Score” the author uses
    a simple function:

    >> sqrt(pow(5-4,2)+pow(4-1,2))
    3.1622776601683795

    The author says the values used in the function
    calculate the distance between Toby and LaSalle
    in the chart in figure 2-1.

    But in the chart Toby gives Snakes 4.5 and Dupree 1.0,
    and LaSalle gives Snakes 4.0 and Dupree 2.0.

    My question is:
    Why didn't he use these values, as below?

    D(Toby,LaSalle) =
    >> sqrt(pow(1.0-2.0,2)+pow(4.5-4.0,2))
    1.118033988749895

    Regards,

    Soriano from Brazil

  2. Eric says:

    June 7th, 2009 at 12:26 am

    Hey Soriano,

    I think you are correct in pointing out this error. My guess is that it got missed during editing.

    Wikipedia states it as follows:

    Two-dimensional distance

    For two 2D points, P = (p_x, p_y) and Q = (q_x, q_y), the distance is computed as:

    \sqrt{(p_x - q_x)^2 + (p_y - q_y)^2}

    http://en.wikipedia.org/wiki/Euclidean_distance

    So in Python it should read as you’ve written it:

    >> sqrt(pow(1.0-2.0,2)+pow(4.5-4.0,2))
    1.11803398875

    Good catch!

    Eric

  3. Nadya says:

    August 5th, 2009 at 8:30 am

    Eric,

    Thank you very much for your answer. I spent a day trying to understand why my calculations for Toby and LaSalle did not match the values in the book.

    Soriano,
    Thank you very much for raising this issue so readers can find an answer on this page.

    Nadya

  4. Radski says:

    September 1st, 2009 at 6:45 pm

    Thanks very much, same book same error…

  5. Aresh says:

    November 17th, 2009 at 7:47 am

    Interesting implementation!

  6. salma says:

    April 17th, 2010 at 1:02 am

    Hi, I need source code for a k-means clustering algorithm for documents, in C#. If anyone has this code, please send it to my email. Thanks in advance.

  7. Eric says:

    April 17th, 2010 at 1:18 am

    Hey Salma – The k-means class on http://eric.ness.net/archives/k-means-document-clustering/ is in C#. Or over at http://www.codeproject.com/KB/recipes/K-Mean_Clustering.aspx. Let me know if you have any other questions.
