How it works...

We started by loading the #Anonops text dataset (step 1). The Anonops IRC channel has been affiliated with the Anonymous hacktivist group. In particular, chat participants have in the past planned and announced their future targets on Anonops. Consequently, a well-engineered ML system trained on such data could potentially learn to predict cyber attacks. In step 2, we instantiated a hashing vectorizer. The hashing vectorizer gave us counts of the 1- and 2-grams in the text, in other words, single words (tokens) and pairs of consecutive words in the messages. We then applied a tf-idf transformer to weight the counts produced by the hashing vectorizer, so that n-grams that appear in many messages count for less than distinctive ones. Our final result is a large, sparse matrix representing the occurrences of 1- and 2-grams in the texts, weighted by importance. Finally, we examined the first few entries of the sparse matrix representation of our featurized data in SciPy.
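The pipeline described above can be sketched with scikit-learn's `HashingVectorizer` and `TfidfTransformer`. This is a minimal illustration, not the recipe's exact code: the three-message `corpus` is a hypothetical stand-in for the #Anonops chat log, and the choices `n_features=2**18`, `alternate_sign=False`, and `norm=None` are assumptions made here so that the hashing step yields plain, non-negative counts before tf-idf weighting.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer

# Hypothetical stand-in messages; the recipe itself reads the #Anonops chat text.
corpus = [
    "we will target the site tomorrow",
    "the site is down",
    "tomorrow we meet on the channel",
]

# Hash 1- and 2-grams into a fixed-size feature space (no vocabulary is stored).
# Assumption: alternate_sign=False and norm=None keep raw occurrence counts.
hashing_vectorizer = HashingVectorizer(
    ngram_range=(1, 2), n_features=2**18, alternate_sign=False, norm=None
)
hashed_counts = hashing_vectorizer.fit_transform(corpus)

# Reweight the hashed counts with tf-idf (default: smooth idf, l2-normalized rows).
tfidf_transformer = TfidfTransformer()
X = tfidf_transformer.fit_transform(hashed_counts)

# X is a SciPy sparse matrix: one row per message, one column per hash bucket.
print(type(X), X.shape, X.nnz)
print(X[0])  # nonzero (row, column) entries of the first message
```

Because the vectorizer hashes n-grams instead of building a vocabulary, memory use stays fixed no matter how large the chat log grows; the trade-off is that feature indices can no longer be mapped back to the original words.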

主站蜘蛛池模板: 丹棱县| 柞水县| 高雄县| 富民县| 宾阳县| 宜兰市| 信阳市| 门源| 卢氏县| 和静县| 南京市| 中山市| 贵南县| 余姚市| 雅江县| 潮州市| 铜陵市| 嘉义市| 喀什市| 闻喜县| 烟台市| 诸暨市| 福贡县| 中牟县| 台安县| 舞阳县| 陈巴尔虎旗| 东海县| 花莲县| 永安市| 迁西县| 广西| 湘西| 永修县| 乌审旗| 峨眉山市| 龙海市| 渭南市| 五峰| 沂源县| 泸州市|