
Atom extraction and dictionary learning

Dictionary learning is a technique that allows rebuilding a sample starting from a sparse dictionary of atoms (similar to principal components). In Mairal J., Bach F., Ponce J., Sapiro G., Online Dictionary Learning for Sparse Coding, Proceedings of the 26th International Conference on Machine Learning, 2009, there's a description of the same online strategy adopted by scikit-learn, which can be summarized as a double optimization problem where:

$X = \{\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n\}$ (with $\bar{x}_i \in \mathbb{R}^m$) is an input dataset, and the target is to find both a dictionary $D \in \mathbb{R}^{m \times k}$ (whose $k$ columns are the atoms) and a weight vector $\bar{\alpha}_i \in \mathbb{R}^k$ for each sample.

After the training process, an input vector can be computed as:

$\bar{x}_i = D\bar{\alpha}_i$

The optimization problem (which involves both $D$ and the $\bar{\alpha}$ vectors) can be expressed as the minimization of the following loss function:

$L(D, \bar{\alpha}) = \sum_i \left( \frac{1}{2} \left\| \bar{x}_i - D\bar{\alpha}_i \right\|_2^2 + c \left\| \bar{\alpha}_i \right\|_1 \right)$

Here the parameter $c$ controls the level of sparsity (which is proportional to the strength of the L1 penalty). The problem can be solved by alternating minimization: keeping one variable ($D$ or the weights $\bar{\alpha}_i$) fixed while solving a least-squares problem for the other, and repeating until a stable point is reached.
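As a rough illustration of this alternating scheme (a minimal sketch, not scikit-learn's actual implementation), the weights can be found with an L1-penalized regression (Lasso) while the dictionary update is a plain least-squares step. All names here (X, D, alpha, n_atoms, c) are illustrative and mirror the symbols above:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)

# Toy data: 50 samples with 16 features (the dimensions are arbitrary)
X = rng.randn(50, 16)

n_atoms = 8
c = 0.1  # sparsity coefficient (the 'c' of the loss function)

# Random initial dictionary with unit-norm atoms (one atom per row)
D = rng.randn(n_atoms, 16)
D /= np.linalg.norm(D, axis=1, keepdims=True)

for _ in range(10):
    # Step 1: with D fixed, find the sparse weights alpha by solving an
    # L1-penalized least-squares problem (one regression per sample)
    lasso = Lasso(alpha=c, fit_intercept=False, max_iter=5000)
    lasso.fit(D.T, X.T)
    alpha = lasso.coef_  # shape: (50, n_atoms)

    # Step 2: with alpha fixed, update D by ordinary least squares,
    # then renormalize the atoms
    D = np.linalg.lstsq(alpha, X, rcond=None)[0]
    D /= np.linalg.norm(D, axis=1, keepdims=True) + 1e-12

# Sparse reconstruction of the whole dataset
X_rec = alpha @ D
```

After a few iterations, each row of `alpha` holds the sparse code of the corresponding sample, and `X_rec` approximates `X` as a combination of the learned atoms.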

In scikit-learn, we can implement such an algorithm with the DictionaryLearning class (applied, as in the previous examples, to the scikit-learn digits dataset), where n_components, as usual, determines the number of atoms:

from sklearn.datasets import load_digits
from sklearn.decomposition import DictionaryLearning

>>> digits = load_digits()
>>> dl = DictionaryLearning(n_components=36, fit_algorithm='lars', transform_algorithm='lasso_lars')
>>> X_dict = dl.fit_transform(digits.data)

A plot of each atom (component) is shown in the following figure:
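A grid like that can be produced with matplotlib, for example (a minimal sketch; the 6×6 layout matches n_components=36, while the sample subset, random_state, and file name are arbitrary choices of this example):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the script runs headless
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import DictionaryLearning

digits = load_digits()

# Fit on a small subset to keep the runtime reasonable
dl = DictionaryLearning(n_components=36, fit_algorithm='lars',
                        transform_algorithm='lasso_lars', random_state=1000)
dl.fit(digits.data[:50])

# Each atom is a 64-dimensional vector; reshape it to 8x8 to view it as an image
fig, axes = plt.subplots(6, 6, figsize=(8, 8))
for atom, ax in zip(dl.components_, axes.ravel()):
    ax.imshow(atom.reshape(8, 8), cmap='gray')
    ax.axis('off')
fig.savefig('atoms.png')
```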

This process can take a very long time on low-end machines. In such a case, I suggest limiting the number of samples to 20 or 30.
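A quick check of the sparse reconstruction $\bar{x}_i = D\bar{\alpha}_i$ might look like this (a sketch; the subset of 50 samples is an arbitrary choice that keeps the fit fast while still exceeding the number of atoms):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import DictionaryLearning

digits = load_digits()
X = digits.data[:50]  # limit the samples to keep the runtime low

dl = DictionaryLearning(n_components=36, fit_algorithm='lars',
                        transform_algorithm='lasso_lars', random_state=1000)
X_dict = dl.fit_transform(X)

# The rows of components_ are the atoms; each sample is rebuilt as the
# combination of atoms weighted by its sparse code
X_rec = X_dict @ dl.components_

print(X_dict.shape)          # (50, 36)
print(dl.components_.shape)  # (36, 64)
print(X_rec.shape)           # (50, 64)
```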