- Machine Learning Quick Reference
- Rahul Kumar
- 2021-08-20 10:05:05
Statistical modeling – the two cultures of Leo Breiman
Whenever we analyze data in order to make a prediction, there are two approaches to consider. Both were described by Leo Breiman, a Berkeley professor, in his 2001 paper Statistical Modeling: The Two Cultures.
Any analysis needs data. An analysis can be pictured as follows:

X (features) -> nature -> Y (response)

A vector of input features X goes into a box that represents nature, and a response Y comes out. The nature box stands for the unknown mechanism that relates X to Y. An analysis of this kind typically has two goals:
- Prediction: to predict the response for future input features
- Information: to understand how the response is associated with the input variables that drive it
Breiman states that, when it comes to solving business problems, there are two distinct approaches:
- The data modeling culture: In this approach, the nature box is assumed to be a stochastic model whose parameters are estimated from the data. Linear regression, logistic regression, and the Cox model are the usual choices for what sits inside the nature box. The analyst observes the pattern in the data and designs an approximation of what is being observed. The scientist or statistician decides which model to use based on experience. The model therefore comes before the problem and the data, and the conclusions say more about the model's structure than about nature. Breiman argues that over-reliance on this approach keeps statisticians from addressing a diverse set of problems. For problems such as earthquake prediction, rain prediction, and identifying the causes of global warming, it does not give accurate results, because it puts fitting and interpreting the assumed model ahead of predictive accuracy.
- The algorithmic modeling culture: In this approach, pre-designed algorithms are used to obtain a better approximation. The inside of the nature box is treated as unknown, and an algorithm, often built on complex mathematics, stands in for it to map X to Y. With more computing power, such a model can keep training until it has learned the pattern that drives the outcome. This approach makes more complex problems tractable and puts the emphasis on accuracy. As more data becomes available, it can give much better results than the data modeling culture.
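The contrast between the two cultures can be illustrated with a small sketch. Everything in it is an assumption made for illustration and does not come from Breiman's paper: the "nature box" is a hypothetical quadratic mechanism with Gaussian noise, the data model is a straight line fitted by least squares, and a simple k-nearest-neighbours average stands in for the algorithmic approach. The function names are likewise made up for this example.

```python
import random


def make_data(n, rng):
    """Draw (x, y) pairs from a hypothetical nonlinear 'nature box'."""
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    # Assumed true mechanism (unknown to the analyst): y = x^2 + noise.
    ys = [x * x + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


def fit_linear(xs, ys):
    """Data modeling: assume y = a + b*x + noise, estimate a and b by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def knn_predict(train_x, train_y, x, k=5):
    """Algorithmic modeling: average the responses of the k nearest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(predictions, targets):
    """Mean squared error between predictions and observed responses."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(100, rng)

# Data modeling culture: the parameters a and b are the object of interest.
a, b = fit_linear(train_x, train_y)
linear_mse = mse([a + b * x for x in test_x], test_y)

# Algorithmic modeling culture: no assumed form, judged by predictive accuracy.
knn_mse = mse([knn_predict(train_x, train_y, x) for x in test_x], test_y)

print(f"linear model: intercept={a:.2f}, slope={b:.2f}, test MSE={linear_mse:.2f}")
print(f"5-nearest neighbours: test MSE={knn_mse:.2f}")
```

Because the assumed linear form does not match the quadratic mechanism, the fitted slope says little about how X drives Y, and the held-out error stays high. The nearest-neighbours predictor makes no claim about the mechanism and is judged only on held-out error. This is a toy setting chosen to make the point visible. When the assumed model is close to the true mechanism, the data modeling approach can do just as well and is easier to interpret.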