http://inside.mines.edu/~ckarlsso/mining_portfolio/similarity.html
1 of 4
8/28/12 1:48 PM
            si[item] = 1
    # If they have no ratings in common, return 0
    if len(si) == 0: return 0
    # Add up the squares of all the differences
    sum_of_squares = sum([pow(prefs[person1][item] - prefs[person2][item], 2)
                          for item in si])
    # The plain Euclidean distance would be:
    #   return sqrt(sum_of_squares)
    # To give a higher value when two points are close, add one and take the inverse
    return 1 / (1 + sqrt(sum_of_squares))
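Since the listing above starts partway through the function, here is a minimal self-contained sketch of the same inverse-distance score. The function name `euclidean_similarity` and the `ratings` data are made up for illustration and are not taken from the original page.

```python
from math import sqrt

def euclidean_similarity(prefs, person1, person2):
    """Inverse-distance score in (0, 1]; 0 if nothing is rated in common."""
    # Items rated by both people
    shared = [item for item in prefs[person1] if item in prefs[person2]]
    if not shared:
        return 0
    # Sum of squared rating differences over the shared items
    sum_of_squares = sum((prefs[person1][item] - prefs[person2][item]) ** 2
                         for item in shared)
    # Add one and invert so that closer points score higher
    return 1 / (1 + sqrt(sum_of_squares))

# Hypothetical ratings, for illustration only
ratings = {
    'Ann': {'Film A': 1.0, 'Film B': 2.0},
    'Bob': {'Film A': 4.0, 'Film B': 6.0},
    'Cat': {'Film C': 3.0},
}

ann_bob = euclidean_similarity(ratings, 'Ann', 'Bob')  # distance 5, score 1/6
ann_ann = euclidean_similarity(ratings, 'Ann', 'Ann')  # distance 0, score 1
ann_cat = euclidean_similarity(ratings, 'Ann', 'Cat')  # nothing in common
```

Adding one before inverting also avoids a division by zero when the two people gave identical ratings.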
Cosine Similarity:
Cosine similarity is often used when comparing two documents against each
other. It measures the cosine of the angle between the two vectors. If the
value is zero, the angle between the two vectors is 90 degrees and they share
no terms. If the value is 1, the two vectors point in the same direction and
differ only in magnitude. Cosine is used when the data is sparse and
asymmetric, that is, when two items both lacking a characteristic should not
count as evidence of similarity.
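The two boundary cases described above can be checked on small term-count vectors. This is a sketch only: the helper `cosine` and the three toy documents are assumptions introduced for illustration, with each document stored as a dict of term counts.

```python
from math import sqrt

def cosine(a, b):
    """Cosine of the angle between two sparse term-count vectors (dicts)."""
    # Only terms present in both documents contribute to the dot product
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    # An empty or all-zero vector has no direction
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy documents, for illustration only
doc1 = {'data': 2, 'mining': 1}
doc2 = {'data': 4, 'mining': 2}   # doc1 with every count doubled
doc3 = {'cooking': 3}             # no terms in common with doc1

same_direction = cosine(doc1, doc2)  # approximately 1: differs only in magnitude
no_overlap = cosine(doc1, doc3)      # 0: the vectors are at 90 degrees
```

Terms missing from both documents never enter the sums, which is why shared absences do not raise the score.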
from math import sqrt

# Returns the Cosine Similarity Score for p1 and p2
def sim_cosine(prefs, p1, p2):
    # Get the list of mutually rated items
    si = {}
    for item in prefs[p1]:
        if item in prefs[p2]: si[item] = 1
    # If they have no ratings in common, return 0
    if len(si) == 0: return 0
    # Calculate the dot product and the norm of each vector
    num_p = sum([prefs[p1][it] * prefs[p2][it] for it in si])
    norm_p1 = sqrt(sum([pow(prefs[p1][it], 2) for it in si]))
    norm_p2 = sqrt(sum([pow(prefs[p2][it], 2) for it in si]))
    # An all-zero vector has no direction, so return 0 instead of dividing by zero
    if norm_p1 == 0 or norm_p2 == 0: return 0
    # Calculate the Cosine Similarity Score: the dot product divided by
    # the product of the norms
    s_cos = num_p / (norm_p1 * norm_p2)
    return s_cos
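One consequence of computing the score over mutually rated items only is worth seeing on data. The sketch below uses a compact stand-in, `cosine_on_shared`, and made-up ratings, both of which are assumptions for illustration. When two people share a single item and both ratings are positive, the score is 1 regardless of the ratings themselves.

```python
from math import sqrt

def cosine_on_shared(prefs, p1, p2):
    """Cosine score restricted to the items both people rated."""
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    if not shared:
        return 0.0
    dot = sum(prefs[p1][it] * prefs[p2][it] for it in shared)
    norm1 = sqrt(sum(prefs[p1][it] ** 2 for it in shared))
    norm2 = sqrt(sum(prefs[p2][it] ** 2 for it in shared))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)

# Hypothetical ratings, for illustration only
ratings = {
    'Ann': {'Film A': 5.0, 'Film B': 1.0},
    'Bob': {'Film A': 1.0, 'Film C': 4.0},
    'Cat': {'Film A': 4.0, 'Film B': 2.0, 'Film C': 1.0},
}

# Ann and Bob share only Film A: (5 * 1) / (5 * 1) = 1, although they disagree
one_shared = cosine_on_shared(ratings, 'Ann', 'Bob')
# Ann and Cat share Film A and Film B: 22 / sqrt(26 * 20), roughly 0.96
two_shared = cosine_on_shared(ratings, 'Ann', 'Cat')
```

In practice it may be worth requiring a minimum number of shared items before trusting the score, though that threshold is a design choice the original page does not discuss.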