- Mastering Machine Learning Algorithms
- Giuseppe Bonaccorso
- 2021-06-25 22:07:34
Example of label spreading
We can test this algorithm using the scikit-learn implementation. Let's start by creating a very dense dataset:
from sklearn.datasets import make_classification
nb_samples = 5000
nb_unlabeled = 1000
X, Y = make_classification(n_samples=nb_samples, n_features=2, n_informative=2, n_redundant=0, random_state=100)
Y[nb_samples - nb_unlabeled:nb_samples] = -1
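We can train a LabelSpreading instance with a clamping factor alpha=0.2. In scikit-learn, alpha is the relative amount of label information that a sample adopts from its neighbors rather than from its initial label, so this value preserves 80% of the original label information while still allowing a smooth solution: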
from sklearn.semi_supervised import LabelSpreading
ls = LabelSpreading(kernel='rbf', gamma=10.0, alpha=0.2)
ls.fit(X, Y)
Y_final = ls.predict(X)
The result is shown, as usual, together with the original dataset:
Original dataset (left). Dataset after a complete label spreading (right)
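A numeric check can complement the plot. The following is only a minimal sketch and not part of the original example: it repeats the steps above on a smaller dataset (500 samples, an assumption made purely to keep the run fast), keeps a copy of the true labels in a hypothetical Y_true array, and then counts how many originally labeled samples end up with a different label after spreading, together with the accuracy on the samples that were marked as unlabeled.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelSpreading

# Assumption: a smaller dataset than in the text, only to keep the sketch fast
nb_samples = 500
nb_unlabeled = 100

X, Y_true = make_classification(n_samples=nb_samples, n_features=2,
                                n_informative=2, n_redundant=0,
                                random_state=100)

# Keep the true labels (hypothetical helper array) and hide the last ones
Y = Y_true.copy()
Y[nb_samples - nb_unlabeled:] = -1

ls = LabelSpreading(kernel='rbf', gamma=10.0, alpha=0.2)
ls.fit(X, Y)
Y_final = ls.predict(X)

labeled = Y != -1

# Labeled samples whose predicted label differs from the one supplied
n_changed = int(np.sum(Y_final[labeled] != Y[labeled]))

# Agreement with the hidden true labels on the unlabeled samples
unlabeled_acc = float(np.mean(Y_final[~labeled] == Y_true[~labeled]))

print("Labeled samples changed: {} / {}".format(n_changed, int(labeled.sum())))
print("Accuracy on unlabeled samples: {:.3f}".format(unlabeled_acc))
```

A nonzero count of changed labels is the expected effect of soft clamping: labeled points surrounded by neighbors of the other class can be reassigned. The exact numbers depend on the dataset size, gamma, and alpha, so they are not reported here.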
As the first figure (left) shows, the central part of the cluster (x ∈ [-1, 0]) contains a region of points drawn as circles. With hard clamping, this region would remain unchanged, violating both the smoothness and the clustering assumptions. Setting α > 0 avoids this problem, because labeled samples are then allowed to change according to their neighborhood.

Of course, the right choice of α depends strongly on the specific problem. If we know that the original labels are absolutely correct, allowing the algorithm to change them can be counterproductive. In that case, it would be better to preprocess the dataset, filtering out all the samples that violate the semi-supervised assumptions. If, instead, we are not sure that all samples are drawn from the same data-generating distribution p_data, and spurious elements may be present, a higher α value can smooth the dataset without any additional operation.
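The effect of α can be explored empirically. The sketch below is an illustration under stated assumptions rather than a procedure from the original text: it uses a reduced dataset of 500 samples and an arbitrary grid of α values, fits one LabelSpreading model per value, and uses the fitted transduction_ attribute to count how many labeled samples received a label different from the one supplied.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelSpreading

# Assumption: a reduced dataset, only to keep the comparison fast
nb_samples = 500
nb_unlabeled = 100

X, Y = make_classification(n_samples=nb_samples, n_features=2,
                           n_informative=2, n_redundant=0,
                           random_state=100)
Y[nb_samples - nb_unlabeled:] = -1

labeled = Y != -1

# Assumption: an arbitrary grid; LabelSpreading requires 0 < alpha < 1
alphas = (0.05, 0.2, 0.5, 0.8)

changed_by_alpha = {}
for alpha in alphas:
    ls = LabelSpreading(kernel='rbf', gamma=10.0, alpha=alpha)
    ls.fit(X, Y)
    # transduction_ holds the label assigned to each training sample
    changed = np.sum(ls.transduction_[labeled] != Y[labeled])
    changed_by_alpha[alpha] = int(changed)

for alpha, n_changed in changed_by_alpha.items():
    print("alpha={:.2f}: {} labeled samples changed".format(alpha, n_changed))
```

If the supplied labels are trusted, a small α that changes few or no labeled samples is the safer choice. If label noise is suspected, comparing these counts across α values gives a rough indication of how aggressively the model is overriding the original labels.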