How it works...

Leveraging the dataset we built in the Scraping GitHub for files of a specific type recipe, we place the files in separate directories based on their file type, and then specify the paths in preparation for building our classifier (step 1). The code for this recipe assumes that the "JavascriptSamples" directory and the others contain the samples directly, with no subdirectories. We read all the files into a corpus and record their labels (step 2). We train-test split the data and prepare a pipeline that performs basic NLP on the files, followed by a random forest classifier (step 3). The choice of classifier here is for illustrative purposes, rather than to imply that it is the best choice for this type of data. Finally, we perform the basic but important steps of creating a machine learning classifier: fitting the pipeline to the training data and then assessing its performance on the testing set by measuring its accuracy and confusion matrix (step 4).
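Steps 2 through 4 can be sketched as follows. This is a minimal, self-contained illustration, not the recipe's exact code: a tiny inline corpus stands in for the files that the recipe reads from "JavascriptSamples" and its sibling directories, and the vectorizer settings and `random_state` values are assumptions chosen for reproducibility.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Tiny inline corpus standing in for the files read from the sample
# directories; in the recipe, corpus holds file contents and labels
# holds each file's type (here, 0 = JavaScript, 1 = Python).
corpus = [
    "function add(a, b) { return a + b; }",
    "var x = document.getElementById('main');",
    "console.log('hello'); let y = [];",
    "for (let i = 0; i < 10; i++) { total += i; }",
    "def add(a, b):\n    return a + b",
    "import os\nprint(os.getcwd())",
    "for i in range(10):\n    total += i",
    "class Foo:\n    pass",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Step 3: train-test split, then a pipeline that performs basic NLP
# (token hashing plus TF-IDF weighting) followed by a random forest.
X_train, X_test, y_train, y_test = train_test_split(
    corpus, labels, test_size=0.5, random_state=11, stratify=labels
)
pipeline = Pipeline([
    ("vect", HashingVectorizer(n_features=2**10)),
    ("tfidf", TfidfTransformer()),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# Step 4: fit on the training data, then assess on the testing set.
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```

The same pipeline works unchanged on a real corpus read from disk; only the corpus and labels lists would be populated by walking the sample directories instead of being declared inline.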
