
Analyzing the data and/or applying machine learning to the data

In this phase, quite a bit of analysis takes place as the data scientist (driven by a high level of scientific curiosity and experience) attempts to shape a story based on an observation or on their interpretation of the data so far. The data scientist continues to slice and dice the data, using analytics or BI packages such as Tableau or Pentaho, or an open source solution such as R or Python, to create a concrete data storyline. Based on these analysis results, the data scientist may again elect to go back to a prior phase, pulling new data, processing and reprocessing it, and creating additional visualizations. At some point, when appropriate progress has been made, the data scientist may decide that the data has reached the point where deeper analysis, in the form of machine learning, can begin.

Machine learning (defined further later in this chapter) has evolved over time from being more of an exercise in pattern recognition to now being defined as using a selected statistical method to dig deeper: it takes the data and the results of the analysis in this phase, learns from them, and makes a prediction on the project data.
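As a rough illustration of the two activities described above, the following Python sketch first slices a small dataset by one column and then fits a simple statistical model to make a prediction. The dataset, the column meanings (region, advertising spend, units sold), and the helper names `slice_by`, `fit_line`, and `predict` are all hypothetical, and ordinary least squares stands in here for whichever statistical method a real project might select.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical project data: (region, advertising spend, units sold).
records = [
    ("north", 1.0, 12.0),
    ("north", 2.0, 14.0),
    ("south", 3.0, 16.0),
    ("south", 4.0, 18.0),
]


def slice_by(rows, key_index, value_index):
    """Group rows by one column and average another (a basic slice and dice)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[value_index])
    return {key: mean(values) for key, values in groups.items()}


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(model, x):
    """Apply a fitted (intercept, slope) pair to a new x value."""
    intercept, slope = model
    return intercept + slope * x


# Analysis step: average units sold per region.
avg_units = slice_by(records, 0, 2)

# Learning step: fit on spend versus units, then predict for an unseen spend.
model = fit_line([r[1] for r in records], [r[2] for r in records])
forecast = predict(model, 5.0)

print(avg_units)
print(model, forecast)
```

The grouping step corresponds to the exploratory analysis, while the fit and predict steps correspond to learning from the data and making a prediction. In practice a package such as R or a Python data library would usually replace these hand-written helpers.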

The ability of a data scientist to extract a quantitative result from data through machine learning and express it as something that everyone (not just other data scientists) can understand immediately is an invaluable skill, and we will talk more about this throughout this book.
