The Spark MLlib library

Spark MLlib is a library of machine learning algorithms and utilities designed to make machine learning easy and to run it in parallel. It covers tasks such as regression, collaborative filtering, classification, and clustering. MLlib provides two APIs in separate packages: spark.mllib, which is built on top of RDDs, and spark.ml, which is built on top of DataFrames. The DataFrame-based API in the spark.ml package is now the primary machine learning API for Spark. Using spark.ml with DataFrames is more versatile and flexible, and it gains the benefits that DataFrames provide, such as the Catalyst optimizer. The RDD-based spark.mllib API is in maintenance mode and is expected to be removed in the future.

Machine learning can be applied to many data types, including text, images, structured data, and vectors. To support these types under a unified dataset concept, Spark ML uses the DataFrame from Spark SQL. Because each algorithm takes a DataFrame as input and produces a DataFrame as output, it is easy to combine several algorithms into a single workflow, or pipeline.

The following sections will give you a detailed view of a few key concepts in the Spark ML API.

主站蜘蛛池模板: 昆山市| 景洪市| 平邑县| 铜陵市| 富顺县| 沈丘县| 灵川县| 英吉沙县| 海宁市| 平安县| 楚雄市| 阿城市| 中超| 香河县| 广宗县| 宜章县| 江源县| 天津市| 吴旗县| 准格尔旗| 舞钢市| 西乌| 黑河市| 巧家县| 洛川县| 嘉义县| 阜南县| 绥宁县| 习水县| 雅安市| 靖安县| 北宁市| 巴楚县| 琼海市| 茂名市| 营山县| 海林市| 那坡县| 东丽区| 盐城市| 桂东县|