
Spark machine learning

Running a machine-learning algorithm is difficult when the data is distributed across multiple machines. A calculation may depend on a data point that is stored or processed on a different executor. Data can be shuffled across executors or workers, but shuffling comes at a heavy cost. Spark provides a way to avoid repeated shuffling: caching. Spark's ability to keep a large amount of data in memory makes it easier to write machine-learning algorithms, many of which pass over the same data again and again.

Spark MLlib and ML are Spark's packages for working with machine-learning algorithms. They provide the following:

  • Built-in machine-learning algorithms for classification, regression, clustering, and more
  • Features such as pipelining, vector creation, and more

These algorithms and features are optimized to limit data shuffling and to scale across the cluster.
