Chapter 2. Data Pipelines

The first chapter introduced some basic concepts of data processing, clustering, and classification.

This chapter is dedicated to the creation and maintenance of a flexible end-to-end workflow to train models and classify data. The first section of the chapter introduces a data-centric (functional) approach to creating number-crunching applications, followed by a description of a configurable workflow computation model. The chapter concludes with an overview of different model validation techniques.

You will learn how to do the following:

  • Apply the concept of monadic design to create dynamic workflows
  • Leverage some of Scala's advanced patterns, such as the cake pattern, to build portable computational workflows
  • Take into account the bias-variance trade-off in selecting a model
  • Overcome overfitting in modeling
  • Break down data into training, test, and validation sets
  • Implement model validation in Scala using precision, recall, and F score