Spark MLlib

Apache Spark is an open-source platform for processing large datasets. It is well suited to iterative machine learning tasks because it keeps data in memory, using structures such as resilient distributed datasets (RDDs). MLlib is Spark's machine learning library. It provides a range of learning algorithms, both supervised and unsupervised, and includes various statistical and linear algebra optimizations. Because MLlib ships with Apache Spark, it needs no separate installation, unlike some other libraries. MLlib supports several high-level languages: Scala, Java, Python, and R. It also provides a high-level API for building machine learning pipelines.

MLlib's integration with Spark brings several benefits. Spark is designed for iterative computation, which makes it an efficient platform for implementing large-scale machine learning algorithms, since these algorithms are themselves iterative.

Any improvement in Spark's data structures translates directly into gains for MLlib. Contributions from Spark's large community have also helped bring new algorithms to MLlib faster.

Spark also has other APIs, such as the Pipeline API and GraphX, which can be used in conjunction with MLlib and make it easier to build interesting use cases on top of it.
