Book title: Machine Learning in Java
Authors: AshishSingh Bhatia, Bostjan Kaluza
Last updated: 2021-06-10 19:30:07

Apache Mahout

The Apache Mahout project aims to build a scalable machine learning library. It is built atop scalable, distributed architectures such as Hadoop and uses the MapReduce paradigm, an approach for processing and generating large datasets with a parallel, distributed algorithm running on a cluster of servers.
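
To make the MapReduce paradigm concrete, the following is a minimal single-machine sketch of its three phases (map, shuffle, reduce) applied to word counting. It is plain Java and uses neither Hadoop nor Mahout; the class name `MapReduceSketch` and its methods are hypothetical names chosen for illustration. On a real cluster, the map and reduce calls would run in parallel on different nodes.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the MapReduce idea; NOT Hadoop or Mahout code.
final class MapReduceSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<String, Integer>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by their key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse all values of one key into a single result.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three phases sequentially over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            // On a cluster, each input split would be mapped on a different node.
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }
}
```

Calling `MapReduceSketch.wordCount` with a list of lines returns a map from each lowercase word to its number of occurrences. Because each phase only sees its own slice of the data, the same structure can be spread across many machines.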

Mahout provides a console interface and a Java API to scalable algorithms for clustering, classification, and collaborative filtering. It can address three business problems:

Item recommendation: Recommending items, in the style of "People who liked this movie also liked ..."
Clustering: Sorting text documents into groups of topically related documents
Classification: Learning which topic to assign to an unlabelled document
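
The item recommendation problem above can be sketched with a simple co-occurrence count: for a given item, count how often each other item was liked by the same users. This is a minimal plain-Java illustration, not Mahout's API; the class `AlsoLikedSketch`, its method, and the in-memory `Map` input format are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java illustration of "people who liked this also liked"; NOT Mahout code.
final class AlsoLikedSketch {

    // Returns up to topN items most often liked together with the given item.
    static List<String> alsoLiked(Map<String, Set<String>> likesByUser, String item, int topN) {
        // Count, for every other item, how many users liked both it and the given item.
        Map<String, Integer> coCounts = new TreeMap<>();
        for (Set<String> liked : likesByUser.values()) {
            if (!liked.contains(item)) {
                continue; // this user says nothing about the given item
            }
            for (String other : liked) {
                if (!other.equals(item)) {
                    coCounts.merge(other, 1, Integer::sum);
                }
            }
        }
        // Keys start in alphabetical order (TreeMap); the stable sort keeps that order for ties.
        List<String> ranked = new ArrayList<>(coCounts.keySet());
        ranked.sort((a, b) -> Integer.compare(coCounts.get(b), coCounts.get(a)));
        return new ArrayList<>(ranked.subList(0, Math.min(topN, ranked.size())));
    }
}
```

Raw co-occurrence counts favor popular items, so practical recommenders usually normalize them with a similarity measure. The sketch keeps raw counts only to show the core idea.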

Mahout is distributed under a commercially friendly Apache license, which means that you can use it as long as you keep the Apache license included and display it in your program's copyright notice.

Mahout features the following packages:

org.apache.mahout.cf.taste: These are collaborative filtering algorithms, including user-based and item-based approaches and matrix factorization with alternating least squares (ALS)
org.apache.mahout.classifier: These are in-memory and distributed implementations, including logistic regression, Naive Bayes, random forest, hidden Markov models (HMM), and multilayer perceptron
org.apache.mahout.clustering: These are clustering algorithms such as canopy clustering, k-means, fuzzy k-means, streaming k-means, and spectral clustering
org.apache.mahout.common: These are utility methods for algorithms, including distances, MapReduce operations, iterators, and so on
org.apache.mahout.driver: This implements a general-purpose driver to run main methods of other classes
org.apache.mahout.ep: This implements evolutionary optimization using recorded-step mutation
org.apache.mahout.math: These are various math utility methods and implementations in Hadoop
org.apache.mahout.vectorizer: These are classes for data representation, manipulation, and MapReduce jobs
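
As an illustration of what a clustering algorithm such as k-means does, here is a minimal in-memory sketch of Lloyd's algorithm with squared Euclidean distance. It is plain Java and does not use the classes in org.apache.mahout.clustering; the class `KMeansSketch`, its method names, and the choice of caller-supplied initial centroids are assumptions made for the example.

```java
import java.util.Arrays;

// Plain-Java illustration of k-means (Lloyd's algorithm); NOT Mahout code.
final class KMeansSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the centroid closest to the point.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Returns the cluster index of every point; centroids are updated in place.
    static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        if (points.length == 0) {
            return assignment;
        }
        int dims = points[0].length;
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int c = nearest(points[p], centroids);
                if (c != assignment[p]) {
                    assignment[p] = c;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched cluster
            }
            // Update step: move each centroid to the mean of its points.
            double[][] sums = new double[centroids.length][dims];
            int[] counts = new int[centroids.length];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[p]][d] += points[p][d];
                }
            }
            for (int c = 0; c < centroids.length; c++) {
                if (counts[c] > 0) { // leave empty clusters where they are
                    for (int d = 0; d < dims; d++) {
                        centroids[c][d] = sums[c][d] / counts[c];
                    }
                }
            }
        }
        return assignment;
    }
}
```

The sketch holds all points in memory and takes the initial centroids from the caller, which keeps it deterministic. A distributed implementation would instead split the assignment step across machines and combine the partial sums.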
