
Feature extraction and pipeline

Once your features and datasets have been obtained, the next step is to perform feature extraction. Depending on the size of your dataset and the features involved, feature extraction can be one of the most time-consuming parts of the model-building process.

For example, let's say that the aforementioned fictitious John Doe County Election Poll received 40,000 responses, each captured from a web form and stored in a SQL database. Suppose you then run a SQL query and export all of the returned data to a CSV file, which is used to train your model. At a high level, this is your feature extraction and pipeline. For more complex scenarios, such as predicting malicious web content or classifying images, the extraction may involve reading specific bytes out of binary files. Storing the extracted data properly is crucial to iterating quickly, because it avoids having to re-run the extraction (assuming the features have not changed).
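The SQL-to-CSV step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the book's own pipeline: the table name `poll_responses`, its columns, the sample rows, and the helper names `extract_to_csv` and `build_demo_db` are all hypothetical, and an in-memory SQLite database stands in for whatever SQL server actually holds the responses. The helper skips the query when the CSV already exists, which is one simple way to avoid re-running an extraction.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def extract_to_csv(conn, query, out_path, force=False):
    """Run `query` and write the rows to `out_path` as CSV.

    Returns True if the extraction ran, or False if an existing file
    was reused (pass force=True to re-extract, e.g. after the features change).
    """
    out_path = Path(out_path)
    if out_path.exists() and not force:
        return False  # reuse the stored extraction

    cursor = conn.execute(query)
    header = [column[0] for column in cursor.description]
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)   # column names from the query
        writer.writerows(cursor)  # stream rows straight from the cursor
    return True


def build_demo_db():
    """Create an in-memory database with a hypothetical poll table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE poll_responses ("
        "id INTEGER PRIMARY KEY, age INTEGER, party TEXT, vote TEXT)"
    )
    conn.executemany(
        "INSERT INTO poll_responses (age, party, vote) VALUES (?, ?, ?)",
        [(34, "A", "yes"), (58, "B", "no"), (41, "A", "no")],  # made-up rows
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    query = "SELECT age, party, vote FROM poll_responses"
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "poll_features.csv"
        print(extract_to_csv(conn, query, target))  # first call runs the query
        print(extract_to_csv(conn, query, target))  # second call reuses the file
```

Keying the cache on the file's existence is deliberately simple. If the query or the feature definitions change, the stored file is stale and should be regenerated with `force=True` or written under a new name.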

In Chapter 11, Training and Building Production Models, we will take a deep dive into ways to version your feature-extracted data and maintain control over it, especially as your dataset grows in size.
