Data science - an iterative process

The process flow of many big data projects is iterative: there is a lot of back-and-forth testing of new ideas, adding new features, tweaking various hyper-parameters, and so on, all with a fail-fast attitude. The end result of these projects is usually a model that can answer the question being posed. Notice that we didn't say accurately answer the question being posed! A common pitfall for data scientists is failing to build a model that generalizes to new data: they overfit the training data, so the model gives poor results on data it has not seen. Accuracy is extremely task-dependent and is usually dictated by the business needs, with some sensitivity analysis done to weigh the costs and benefits of the model outcomes. However, there are a few standard accuracy measures that we will go over throughout this book so that you can compare various models and see how changes to a model impact the result.
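
To make the overfitting pitfall concrete, here is a minimal sketch in plain Python. It is not taken from this book's code. The synthetic data, the 20% label noise, the two toy models, and every function name are assumptions chosen for illustration. A 1-nearest-neighbor model memorizes the noisy training labels, while a simple threshold model (whose cut at 0.5 we assume is known) ignores the noise.

```python
import random

def make_data(n, rng, noise=0.2):
    """Points x in [0, 1); the true label is x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = x > 0.5
        if rng.random() < noise:
            label = not label  # simulate a mislabeled example
        data.append((x, label))
    return data

def predict_1nn(train, x):
    """Memorizing model: return the label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def predict_threshold(x):
    """Simple model: a fixed cut at 0.5 (assumed known, for illustration only)."""
    return x > 0.5

def accuracy(predict, data):
    """Fraction of examples whose predicted label matches the given label."""
    return sum(predict(x) == y for x, y in data) / len(data)

rng = random.Random(42)          # fixed seed so the run is repeatable
train = make_data(200, rng)      # data the models are allowed to see
test = make_data(200, rng)       # new data, drawn the same way

train_acc_1nn = accuracy(lambda x: predict_1nn(train, x), train)
test_acc_1nn = accuracy(lambda x: predict_1nn(train, x), test)
train_acc_thr = accuracy(predict_threshold, train)
test_acc_thr = accuracy(predict_threshold, test)

print(f"1-NN      train={train_acc_1nn:.2f}  test={test_acc_1nn:.2f}")
print(f"threshold train={train_acc_thr:.2f}  test={test_acc_thr:.2f}")
```

Each training point is its own nearest neighbor, so the memorizing model scores perfectly on the training set, yet it has also memorized the flipped labels and does worse on the new data. Comparing training and held-out accuracy in this way is one simple check of whether a model generalizes.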

H2O regularly gives meetup talks and invites others to give machine learning meetups around the US and Europe. The slides from each meetup or conference are available on SlideShare (http://www.slideshare.com/0xdata) or YouTube. Both sites are great sources of information, not only about machine learning and statistics but also about distributed systems and computation. For example, one of the most interesting presentations highlights the "Top 10 pitfalls in a data scientist job" (http://www.slideshare.net/0xdata/h2o-world-top-10-data-science-pitfalls-mark-landry).