Data science

Data science is the discipline of extracting actionable knowledge from data of various forms. The name data science emerged quite recently: it was invented by DJ Patil and Jeff Hammerbacher and popularized in 2012 by the article Data Scientist: The Sexiest Job of the 21st Century. The discipline itself, however, had existed for quite a while before that and was known by other names, such as data mining or predictive analytics. Data science, like its predecessors, is built on statistics and machine learning algorithms for knowledge extraction and model building.

The science part of the term data science is no coincidence: if we look up science, its definition can be summarized as the systematic organization of knowledge in terms of testable explanations and predictions. This is exactly what data scientists do. By extracting patterns from available data, they can make predictions about future unseen data, and they make sure the predictions are validated beforehand.
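To make the idea of validating predictions beforehand more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the observations are made up, the helper names `fit_line` and `mean_abs_error` are hypothetical, and a simple least-squares line stands in for whatever model a real project would use. The pattern is extracted from one part of the data and then checked against a held-out part that the model has never seen.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_abs_error(model, xs, ys):
    """Average absolute difference between predictions and true values."""
    slope, intercept = model
    errors = [abs(slope * x + intercept - y) for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# Made-up observations that roughly follow y = 2x + 1 (an assumption
# for illustration only, not data from any real use case).
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 12.9, 15.2, 16.8, 19.1]

# Learn the pattern from the first seven points only...
train_x, train_y = xs[:7], ys[:7]
# ...and keep the last three aside as "future unseen data".
valid_x, valid_y = xs[7:], ys[7:]

model = fit_line(train_x, train_y)
validation_error = mean_abs_error(model, valid_x, valid_y)

print("slope and intercept:", model)
print("mean absolute error on held-out data:", validation_error)
```

If the error on the held-out points is small, we have some evidence that the learned pattern generalizes beyond the data it was extracted from; if it is large, the prediction has failed its test and the model needs to be revisited.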

Nowadays, data science is used across many fields, including (but not limited to):

  • Banking: Risk management (for example, credit scoring), fraud detection, trading
  • Insurance: Claims management (for example, accelerating claim approval), risk and loss estimation, and fraud detection
  • Health care: Predicting diseases (such as strokes, diabetes, cancer) and relapses
  • Retail and e-commerce: Market basket analysis (identifying products that go well together), recommendation engines, product categorization, and personalized searches

This book covers the following practical use cases:

  • Predicting whether a URL is likely to appear on the first page of a search engine
  • Predicting how fast an operation will be completed given the hardware specifications
  • Ranking text documents for a search engine
  • Checking whether there is a cat or a dog in a picture
  • Recommending friends in a social network
  • Processing large-scale textual data on a cluster of computers

In all these cases, we will use data science to learn from data and use the learned knowledge to solve a particular business problem.

We will also use a running example throughout the book: building a search engine. We will use it to illustrate many data science concepts, such as supervised machine learning, dimensionality reduction, text mining, and learning-to-rank models.
