
Comparing sparse data using cosine similarity

When a data set has many empty fields, comparing items with the Manhattan or Euclidean distance can give skewed results. Cosine similarity instead measures how closely two vectors are oriented with each other. For example, the vectors (82, 86) and (86, 82) point in nearly the same direction. Their cosine similarity is equal to the cosine similarity between (41, 43) and (43, 41). This is because the second pair is the first pair scaled by one half, and scaling a vector does not change its direction. A cosine similarity of 1 corresponds to vectors that point in exactly the same direction, and 0 corresponds to vectors that are orthogonal to each other.

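Cosine similarity depends only on the angle between two vectors, so any two pairs separated by the same angle have the same cosine similarity. The Manhattan and Euclidean distances also depend on the vectors' magnitudes. The Euclidean distance between (82, 86) and (86, 82) is √32 ≈ 5.66, twice the √8 ≈ 2.83 between (41, 43) and (43, 41). Likewise, the Manhattan distances are 8 and 4, even though both pairs are oriented identically.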
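The comparison above can be reproduced with a short, self-contained program. This is a minimal sketch that assumes vectors are plain Haskell lists of Double. The helper names cosineSim, euclidean, manhattan and norm are hypothetical, introduced only for this illustration. For each pair, the program prints the cosine similarity alongside the two distance metrics.

```haskell
-- Hypothetical helpers for illustration; not part of the recipe's Main.hs.
dot :: [Double] -> [Double] -> Double
dot a b = sum (zipWith (*) a b)

-- Euclidean length (magnitude) of a vector.
norm :: [Double] -> Double
norm a = sqrt (dot a a)

-- Cosine similarity: depends only on the angle between the vectors.
cosineSim :: [Double] -> [Double] -> Double
cosineSim a b = dot a b / (norm a * norm b)

-- Euclidean distance: square root of the summed squared differences.
euclidean :: [Double] -> [Double] -> Double
euclidean a b = sqrt (sum [(x - y) * (x - y) | (x, y) <- zip a b])

-- Manhattan distance: sum of absolute differences.
manhattan :: [Double] -> [Double] -> Double
manhattan a b = sum [abs (x - y) | (x, y) <- zip a b]

main :: IO ()
main = do
  -- The second pair is the first pair scaled by one half.
  let pairs = [([82, 86], [86, 82]), ([41, 43], [43, 41])]
  -- Print (cosine similarity, Euclidean distance, Manhattan distance) per pair.
  mapM_ (\(a, b) -> print (cosineSim a b, euclidean a b, manhattan a b)) pairs
```

Running it prints one triple per pair. The cosine values agree at about 0.99887, up to floating-point rounding in the last digits. Both distances for the first pair are twice those for the second.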

The cosine similarity between two vectors is their dot product divided by the product of their magnitudes (Euclidean lengths).
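Written out for two n-dimensional vectors a and b, the formula is shown below, followed by a worked value for the pair (82, 86) and (86, 82). The symbols a, b, a_i, b_i and θ are chosen here for illustration.

```latex
\cos\theta
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}

\frac{82 \cdot 86 + 86 \cdot 82}{\sqrt{82^2 + 86^2}\,\sqrt{86^2 + 82^2}}
  = \frac{14104}{14120} \approx 0.99887
```

Halving every component divides both the numerator and the denominator by 4 (3526 / 3530), so the value is unchanged.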

How to do it...

Create a new file, which we will call Main.hs, and perform the following steps:

  1. Begin main by defining the two lists of numbers to compare.
    main :: IO ()
    main = do
      -- Two vectors with several zero (empty) entries
      let d1 = [3.5, 2, 0, 4.5, 5, 1.5, 2.5, 2]
      let d2 = [  3, 0, 0,   5, 4, 2.5,   3, 0]
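  2. Compute the cosine similarity as the dot product divided by the product of the Euclidean lengths, and print it.
      -- cosine similarity = dot product / (length of d1 * length of d2)
      let similarity = dot d1 d2 / (eLen d1 * eLen d2)
      print similarity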
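  3. Define the dot product and Euclidean length helper functions at the top level.
    dot :: [Double] -> [Double] -> Double
    dot a b = sum $ zipWith (*) a b  -- sum of element-wise products
    eLen :: [Double] -> Double
    eLen a = sqrt $ dot a a          -- Euclidean length (L2 norm)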
  4. Run the code to print the cosine similarity.
    $ runhaskell Main.hs
    
    0.924679432210068
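The recipe's formula has two edge cases. If either list is all zeros, which is a real possibility with sparse data, it divides 0 by 0 and produces NaN. Separately, zipWith silently truncates lists of different lengths. The following is a minimal sketch of a more defensive variant, assuming Maybe is an acceptable way to report these cases. The name safeCosine is hypothetical and is not part of the recipe.

```haskell
-- Same helpers as in the recipe.
dot :: [Double] -> [Double] -> Double
dot a b = sum (zipWith (*) a b)

eLen :: [Double] -> Double
eLen a = sqrt (dot a a)

-- Hypothetical variant: Nothing instead of NaN or a silently truncated result.
safeCosine :: [Double] -> [Double] -> Maybe Double
safeCosine a b
  | length a /= length b = Nothing   -- zipWith would drop the extra elements
  | la == 0 || lb == 0   = Nothing   -- a zero vector has no direction
  | otherwise            = Just (dot a b / (la * lb))
  where
    la = eLen a
    lb = eLen b

main :: IO ()
main = do
  let d1 = [3.5, 2, 0, 4.5, 5, 1.5, 2.5, 2]
      d2 = [3, 0, 0, 5, 4, 2.5, 3, 0]
  print (safeCosine d1 d2)                   -- Just the value the recipe computes
  print (safeCosine d1 (map (const 0) d1))   -- Nothing: zero vector
  print (safeCosine d1 (take 3 d2))          -- Nothing: length mismatch
```

Returning Maybe forces the caller to decide what an undefined similarity means. For example, the caller might skip the item or treat it as 0. The plain Double version, by contrast, lets NaN propagate quietly through later comparisons.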
    

See also

If the data set is not sparse, consider using the Manhattan or Euclidean distance metrics instead, as detailed in the recipes Computing the Manhattan distance and Computing the Euclidean distance.
