Cluster assumption

This assumption is closely linked to the previous one and is probably easier to accept. It can be expressed as a chain of interdependent conditions. Clusters are high-density regions; therefore, if two points are close, they are likely to belong to the same cluster and to share the same label. Low-density regions are separation spaces; therefore, samples belonging to a low-density region are likely to be boundary points, and their classes can differ. To better understand this concept, it's useful to think about a supervised SVM: only the support vectors should lie in low-density regions. Let's consider the following bidimensional example:

In a semi-supervised scenario, we may not know the label of a point belonging to a high-density region; however, if it is close enough to a labeled point that we can build a ball around them in which all the points have the same average density, we are allowed to predict the label of our test sample. If instead we move to a low-density region, the process becomes harder, because two points can be very close and yet have different labels. We are going to discuss the semi-supervised, low-density separation problem at the end of this chapter.
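The ball-based reasoning can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and does not come from the text: the synthetic two-cluster dataset, the helper names local_density and predict_by_cluster, and the radius and min_density values. In particular, a simple neighbor count stands in for the "same average density" condition, so this is a simplification of the idea rather than a full algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2D dataset: two dense clusters separated by a low-density gap
centers = np.array([[-2.0, 0.0], [2.0, 0.0]])
X_unlabeled = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in centers])

# One labeled sample per cluster, placed at the cluster center for simplicity
X_labeled = centers.copy()
y_labeled = np.array([0, 1])


def local_density(x, X, radius):
    """Count the samples of X lying inside the ball of the given radius centered on x."""
    return int(np.sum(np.linalg.norm(X - x, axis=1) <= radius))


def predict_by_cluster(x, radius=1.0, min_density=10):
    """Return the label shared by the labeled points in the ball, or -1 if undecidable."""
    x = np.asarray(x, dtype=float)
    # Low-density region: the cluster assumption gives no guidance here
    if local_density(x, X_unlabeled, radius) < min_density:
        return -1
    # High-density region: borrow the label only if it is unambiguous inside the ball
    inside = np.linalg.norm(X_labeled - x, axis=1) <= radius
    labels = np.unique(y_labeled[inside])
    return int(labels[0]) if labels.size == 1 else -1
```

With this setup, a query such as predict_by_cluster([-1.8, 0.1]) falls inside the first dense region and should inherit the label of the nearby labeled point, while predict_by_cluster([0.0, 0.0]) lies in the gap between the clusters and is left undecided (-1). Returning -1 instead of guessing reflects the point made above: in a low-density region, proximity alone is not a reliable indicator of the class.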