How it works...

We begin by reading in our dataset and then standardizing it, as in the recipe on standardizing data (steps 1 and 2). (It is necessary to work with standardized data before applying PCA.) We then instantiate a new PCA transformer instance and use fit_transform to both learn the transformation and apply it to the dataset (step 3). In step 4, we analyze our transformation. In particular, note that the elements of pca.explained_variance_ratio_ indicate how much of the variance is accounted for in each direction. The ratios sum to 1, indicating that all of the variance is accounted for if we consider the full space in which the data lives. However, by taking just the first few directions, we can account for a large portion of the variance while limiting our dimensionality. In our example, the first 40 directions account for 90% of the variance:

sum(pca.explained_variance_ratio_[0:40])

This produces the following output:

0.9068522354673663

This means that we can reduce our number of features from 78 to 40 while preserving 90% of the variance. This implies that many of the features of the PE header are closely correlated, which is understandable, as they are not designed to be independent.
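The workflow above can be sketched end to end on synthetic data standing in for the PE-header features (the dataset itself, its loading code, and the column counts here are stand-ins, not the recipe's actual files). The low-rank structure in the synthetic data mimics the correlated features the recipe observes:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in for the dataset: 1000 samples, 78 correlated features
# built from a handful of hidden factors plus a little noise
latent = rng.normal(size=(1000, 5))
mixing = rng.normal(size=(5, 78))
X = latent @ mixing + 0.1 * rng.normal(size=(1000, 78))

# Steps 1-2: standardize before PCA
X_std = StandardScaler().fit_transform(X)

# Step 3: learn and apply the PCA transformation in one call
pca = PCA()
X_pca = pca.fit_transform(X_std)

# Step 4: the ratios sum to 1 over the full space
ratios = pca.explained_variance_ratio_
print(ratios.sum())

# Smallest number of directions accounting for 90% of the variance
cumulative = np.cumsum(ratios)
k = int(np.searchsorted(cumulative, 0.90)) + 1
print(k, cumulative[k - 1])

# Reduce the feature count to k while keeping 90% of the variance
X_reduced = PCA(n_components=k).fit_transform(X_std)
print(X_reduced.shape)
```

Because the synthetic data is built from only a few hidden factors, a small k suffices here; on the real PE-header features the recipe finds k = 40 out of 78.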