- Artificial Intelligence for Big Data
- Anand Deshpande Manish Kumar
- 2021-06-25 21:57:12
The Spark MLlib library
Spark MLlib is a library of machine learning algorithms and utilities designed to make machine learning easy to use and able to run in parallel. It covers regression, classification, clustering, and collaborative filtering. MLlib exposes two APIs in two packages: spark.mllib, which is built on top of RDDs, and spark.ml, which is built on top of DataFrames. The primary machine learning API for Spark is now the DataFrame-based API in the spark.ml package. Using spark.ml with the DataFrame API is more versatile and flexible, and it brings the benefits of DataFrames, such as the Catalyst optimizer. The RDD-based spark.mllib API is expected to be removed in the future.
Machine learning applies to many data types, including text, images, structured data, and vectors. To support these data types under a unified dataset concept, Spark ML adopts the Spark SQL DataFrame as its dataset abstraction. This makes it easy to combine various algorithms in a single workflow, or pipeline.
The following sections will give you a detailed view of a few key concepts in the Spark ML API.