Spark computing for machine learning

With its innovations in RDDs (resilient distributed datasets) and in-memory processing, Apache Spark has made distributed computing readily accessible to data scientists and machine learning professionals. According to the Apache Spark team, Spark runs on the Mesos cluster manager, letting it share resources with Hadoop and other applications. Spark can also read from any Hadoop input source, such as HDFS.

For the reasons above, the Apache Spark computing model is well suited to distributed computing for machine learning. It is an especially strong fit for rapid interactive machine learning, parallel computing, and complicated modelling at scale.

According to the Spark development team, Spark's philosophy is to make life easy and productive for data scientists and machine learning professionals. To that end, Apache Spark offers:

  • Well-documented, expressive APIs
  • Powerful domain-specific libraries
  • Easy integration with storage systems
  • Caching to avoid data movement

Per the introduction by Patrick Wendell, co-founder of Databricks, Spark is built specifically for large-scale data processing. It supports agile data science through rapid iteration, and it can be integrated easily with IBM and other solutions.
