Clustering Refresher
Chapter 1, Introduction to Clustering, covered both the high-level intuition and the finer details of one of the most basic clustering algorithms: k-means. While it is a simple approach, do not dismiss it; it will remain a valuable part of your toolkit as you continue exploring the world of unsupervised learning. In many real-world use cases, companies make groundbreaking discoveries using the simplest methods, such as k-means or, for supervised learning, linear regression. As a refresher, let's quickly walk through what clusters are and how k-means finds them:

Figure 2.1: The attributes that separate supervised and unsupervised problems
If you were given a random collection of data without any guidance, you would likely start your exploration with basic statistics – for example, the mean, median, and mode of each feature. Remember that whether a dataset calls for supervised or unsupervised learning is not a property of the data itself; it is determined by the goals you have set for yourself or that were set by your manager. If you determine that one of the features is actually a label and you want to see how the remaining features in the dataset influence it, the task becomes a supervised learning problem. However, if after initial exploration you realize that the data is simply a collection of features with no target in mind (such as a collection of health metrics or purchase invoices from a web store), then you can analyze it with unsupervised methods.
A classic example of unsupervised learning is finding clusters of similar customers in a collection of invoices from a web store. Your hypothesis is that by understanding which people are most similar, you can create more granular marketing campaigns that appeal to each cluster's interests. One way to achieve these clusters of similar users is through k-means.
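As a minimal sketch of this idea (assuming scikit-learn is available; the customer features below are made up for illustration), clustering customers summarized from their invoices might look like this:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical per-customer features derived from invoices:
# [total spend, number of orders, average basket value]
customers = np.array([
    [500.0, 12, 41.7],
    [48.0,   2, 24.0],
    [1500.0, 30, 50.0],
    [60.0,   3, 20.0],
    [700.0, 15, 46.7],
    [30.0,   1, 30.0],
])

# Fit k-means with k=2 to split the customers into two groups
model = KMeans(n_clusters=2, random_state=42, n_init=10)
labels = model.fit_predict(customers)

print(labels)                  # cluster assignment for each customer
print(model.cluster_centers_)  # centroid of each cluster
```

Each resulting cluster could then be examined (for example, its average spend) to decide which marketing campaign fits it best.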
k-means Refresher
k-means clustering works by finding "k" clusters in your data through pairwise Euclidean distance calculations. "K" points (also called centroids) are randomly initialized in your data, and the distance from each data point to each centroid is calculated. The smallest of these distances designates which cluster a data point belongs to. Once every point has been assigned to a cluster, the mean of each cluster's data points is calculated and used as its new centroid. This process is repeated until the newly calculated centroids no longer change position.
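For illustration, here is a minimal NumPy sketch of that loop (not the book's implementation; the function name and defaults are only illustrative):

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Plain k-means sketch: random initialization, assignment by Euclidean
    distance, centroids recomputed as cluster means, stop when stable."""
    rng = np.random.default_rng(seed)
    # Randomly pick k data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    for _ in range(max_iters):
        # Pairwise Euclidean distances from every point to every centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Each point joins the cluster of its nearest centroid
        labels = distances.argmin(axis=1)
        # New centroid = mean of the points assigned to that cluster
        # (keep the old centroid if a cluster happens to be empty)
        new_centroids = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return labels, centroids
```

In practice, you would normally rely on a library implementation such as sklearn.cluster.KMeans, which adds smarter initialization and multiple restarts, but the loop above mirrors the steps described in this refresher.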