
Designing a Machine Learning System

In this chapter, we will design a high-level architecture for an intelligent, distributed machine learning system that uses Spark as its core computation engine. The problem we will focus on is taking the existing architecture of a web-based business and redesigning it so that automated machine learning systems power key areas of the business.

Before we dig deeper into our scenario, we will spend some time understanding what machine learning is.

Then we will:

  • Introduce a hypothetical business scenario
  • Provide an overview of the current architecture
  • Explore various ways in which machine learning systems can enhance or replace certain business functions
  • Provide a new architecture based on these ideas

A machine learning system in a modern large-scale data environment should meet the following requirements:

  • It must integrate with the other components of the system, especially with data collection and storage systems, analytics and reporting, and frontend applications
  • It should be easily scalable and independent of the rest of the architecture. Ideally, it should scale both horizontally and vertically
  • It should compute efficiently for the workloads it is designed for, that is, machine learning and iterative analytics applications
  • If possible, it should support both batch and real-time workloads
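
The iterative-computation and batch/real-time requirements can be made concrete with a small sketch. This is plain Python, not Spark code, and the toy linear model and the function names (`gradient_step`, `train_batch`, `train_stream`) are hypothetical illustrations rather than part of Spark's API or of the design in this chapter. The batch path re-scans an in-memory dataset on every iteration, which is the access pattern that Spark's in-memory caching is designed to make cheap; the streaming path applies the same update to small batches as they arrive, loosely mirroring Spark Streaming's micro-batch model.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # (feature x, label y)


def gradient_step(w: float, batch: List[Point], lr: float) -> float:
    """One gradient-descent step for the toy model y ~ w * x under squared loss."""
    if not batch:
        return w  # nothing to learn from an empty micro-batch
    grad = sum((w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad


def train_batch(data: List[Point], iterations: int, lr: float) -> float:
    """Batch mode: the full dataset stays in memory and is re-scanned on
    every iteration (the workload Spark's caching is aimed at)."""
    w = 0.0
    for _ in range(iterations):
        w = gradient_step(w, data, lr)
    return w


def train_stream(micro_batches: Iterable[List[Point]], w: float, lr: float) -> float:
    """Real-time mode: reuse the same update logic, one small batch at a time."""
    for batch in micro_batches:
        w = gradient_step(w, batch, lr)
    return w


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # synthetic points on y = 2x
    w = train_batch(data, iterations=100, lr=0.05)
    print(f"batch estimate: {w:.3f}")
    # Newly arriving synthetic points, processed as two micro-batches.
    stream = [[(4.0, 8.0)], [(5.0, 10.0), (6.0, 12.0)]]
    w = train_stream(stream, w, lr=0.01)
    print(f"after streaming updates: {w:.3f}")
```

Keeping a single update function for both modes is the point of the sketch: the model logic is written once, and only the way data is fed to it changes.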

As a framework, Spark meets these criteria. However, we must also ensure that the machine learning systems we build on Spark meet them. There is no point in implementing an algorithm that introduces bottlenecks that cause our system to fail one or more of these requirements.
