Euclidean Distance Score
October 30th, 2008 | Published in Machine Learning, Statistics
I am currently reading Toby Segaran’s book “Programming Collective Intelligence”, and one of the first topics it covers is how to determine how similar two people are.
One approach is to use the Euclidean Distance Score. Arun Vijayan C has an excellent PowerPoint presentation on this topic, “Finding more people like you”.
I then wrote up some quick code in C# that uses the values in Arun’s presentation:
// Euclidean Distance Score
using System;
using System.Collections.Generic;

namespace ConsoleApplication1
{
    internal class Program
    {
        private static void Main()
        {
            // Load People and Values
            var myP = new List<People>
            {
                new People("John", 1.5, 4),
                new People("Ravi", 4.5, 1.5),
                new People("Kiran", 1, 3.5),
                new People("Deepti", 3, 5)
            };

            // Print header
            Console.WriteLine("People And Scores");
            Console.WriteLine("###################");
            Console.WriteLine();

            // Loop through people and values
            foreach (People people in myP)
            {
                Console.WriteLine(people.Name + "\t" + people.xScore + "\t" + people.yScore);
            }

            // Print Distance And Value Headers
            Console.WriteLine();
            Console.WriteLine("Distance Comparison");
            Console.WriteLine("###################");
            Console.WriteLine();

            // Loop through every unordered pair of people once
            int myCount = 1;
            for (int i = 0; i < myP.Count; i++)
            {
                for (int j = myCount; j < myP.Count; j++)
                {
                    // Euclidean Distance Score
                    // Sqrt( (x1-x2)^2 + (y1-y2)^2 )
                    Console.WriteLine(myP[i].Name + "\t" + myP[j].Name + ":\t" +
                        Math.Sqrt(Math.Pow(myP[i].xScore - myP[j].xScore, 2) +
                                  Math.Pow(myP[i].yScore - myP[j].yScore, 2)).ToString("0.##"));
                }
                // Skip to the next guy
                myCount++;
            }

            // Print Closer
            Console.WriteLine();
            Console.WriteLine("Press enter to continue...");
            Console.ReadLine();
        }
    }

    // Simple holder for a person's name and two scores
    internal class People
    {
        public string Name;
        public double xScore;
        public double yScore;

        public People(string _Name, double _xScore, double _yScore)
        {
            Name = _Name;
            xScore = _xScore;
            yScore = _yScore;
        }
    }
}
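For comparison, here is a minimal Python sketch of the same pairwise loop, since the book's own examples are in Python. It assumes the same four people and scores as the C# listing; the names `PEOPLE`, `euclidean_distance` and `pairwise_distances` are made up for illustration and are not from the book or the presentation.

```python
import math
from itertools import combinations

# Same (name, x, y) values as the C# listing.
PEOPLE = [
    ("John", 1.5, 4.0),
    ("Ravi", 4.5, 1.5),
    ("Kiran", 1.0, 3.5),
    ("Deepti", 3.0, 5.0),
]


def euclidean_distance(p, q):
    """Return sqrt((x1 - x2)^2 + (y1 - y2)^2) for two (name, x, y) tuples."""
    return math.sqrt((p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2)


def pairwise_distances(people):
    """Map each unordered pair of names to its Euclidean distance."""
    return {
        (p[0], q[0]): euclidean_distance(p, q)
        for p, q in combinations(people, 2)
    }


if __name__ == "__main__":
    # Print one line per pair, rounded to two decimals.
    for (a, b), d in pairwise_distances(PEOPLE).items():
        print(f"{a}\t{b}:\t{d:.2f}")
```

`combinations(people, 2)` visits each pair exactly once, which is what the `myCount` counter does in the C# version.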
June 6th, 2009 at 10:03 pm
Hi,
I’m reading the same book, and I have a question.
In the “Euclidean Distance Score” section the author uses
a simple function:
>> sqrt(pow(5-4,2)+pow(4-1,2))
3.1622776601683795
The author says these values calculate the distance
between Toby and LaSalle in the chart in
Figure 2-1.
But in the chart Toby rates Snakes 4.5 and Dupree 1.0,
and LaSalle rates Snakes 4.0 and Dupree 2.0.
My question is:
Why didn’t he use those values, as below?
D(Toby, LaSalle) =
>> sqrt(pow(1.0-2.0,2)+pow(4.5-4.0,2))
1.118033988749895
Regards,
Soriano from Brazil
June 7th, 2009 at 12:26 am
Hey Soriano,
I think you are correct in pointing out this error. My guess is that it got missed during editing.
Wikipedia states it as follows:
Two-dimensional distance
For two 2D points, P = (p_x, p_y) and Q = (q_x, q_y), the distance is computed as:
\sqrt{(p_x - q_x)^2 + (p_y - q_y)^2}
http://en.wikipedia.org/wiki/Euclidean_distance
So in Python it should read as you’ve written it:
>> sqrt(pow(1.0-2.0,2)+pow(4.5-4.0,2))
1.11803398875
Good catch!
Eric
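To double-check the two calculations discussed in this thread, here is a small Python sketch. It assumes the ratings quoted in the comment above (Toby: Snakes 4.5, Dupree 1.0; LaSalle: Snakes 4.0, Dupree 2.0); the dictionary layout and the `distance` helper are made up for illustration and are not the book's code.

```python
from math import sqrt

# Ratings as quoted in the comment thread (variable names are hypothetical).
toby = {"Snakes": 4.5, "Dupree": 1.0}
lasalle = {"Snakes": 4.0, "Dupree": 2.0}


def distance(a, b):
    """Euclidean distance over the items that both people rated."""
    shared = [item for item in a if item in b]
    return sqrt(sum((a[item] - b[item]) ** 2 for item in shared))


# The expression as printed in the book.
book_value = sqrt(pow(5 - 4, 2) + pow(4 - 1, 2))
# The same formula applied to the values read off the chart.
chart_value = distance(toby, lasalle)

print(round(book_value, 4))   # 3.1623
print(round(chart_value, 4))  # 1.118
```

The two results differ because the expression in the book uses different numbers from the chart, not because the formula is different.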
August 5th, 2009 at 8:30 am
Eric,
Thank you very much for your answer. I spent a day trying to understand why the book’s calculation for Toby and LaSalle did not match the values I got.
Soriano,
Thank you very much for raising this issue so readers can find an answer on this page.
Nadya
September 1st, 2009 at 6:45 pm
Thanks very much, same book same error…
November 17th, 2009 at 7:47 am
Interesting implementation!
April 17th, 2010 at 1:02 am
Hi, I need source code for a k-means clustering algorithm for documents, in C#. If anyone has this code, please send it to my mail. Thanks in advance.
April 17th, 2010 at 1:18 am
Hey Salma – The k-means class on http://eric.ness.net/archives/k-means-document-clustering/ is in C#. Or over at http://www.codeproject.com/KB/recipes/K-Mean_Clustering.aspx. Let me know if you have any other questions.