- Machine Learning in Java
- AshishSingh Bhatia, Bostjan Kaluza
- 2021-06-10 19:30:07
Apache Mahout
The Apache Mahout project aims to build a scalable machine learning library. It is built atop scalable, distributed architectures, such as Hadoop, and uses the MapReduce paradigm: an approach to processing and generating large datasets with a parallel, distributed algorithm that runs on a cluster of servers.
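To make the MapReduce paradigm concrete, here is a minimal single-machine sketch of its three phases (map, shuffle, reduce) applied to word counting. It uses only the Java standard library, not Hadoop or Mahout; the class and method names (`MapReduceSketch`, `map`, `shuffle`, `reduce`, `wordCount`) are illustrative assumptions. On a real cluster, each phase would be spread across many servers.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative in-memory sketch of MapReduce; not the Hadoop API.
class MapReduceSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by their key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the values of one key into a single sum.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three phases in sequence over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Mahout runs on Hadoop", "Hadoop runs MapReduce jobs");
        System.out.println(wordCount(lines));
    }
}
```

Because each `map` call sees only one line and each `reduce` call sees only one key, both phases can run in parallel on separate machines, which is what makes the paradigm scale.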
Mahout provides a console interface and a Java API to scalable algorithms for clustering, classification, and collaborative filtering. It can address three kinds of business problems:
- Item recommendation: Recommending items, for example, "People who liked this movie also liked..."
- Clustering: Sorting text documents into groups of topically related documents
- Classification: Learning which topic to assign to an unlabelled document
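The item recommendation problem can be illustrated with a small co-occurrence count: for a given item, tally how often every other item was liked by the same users and return the most frequent ones. This is a minimal sketch in plain Java, not Mahout's own implementation; the class `AlsoLikedSketch`, the method `alsoLiked`, and the sample data are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative "people who liked this also liked" sketch; not the Mahout API.
class AlsoLikedSketch {

    // Returns up to topN items that most often share a user with the given item.
    static List<String> alsoLiked(Map<String, Set<String>> likesByUser, String item, int topN) {
        // TreeMap keeps keys alphabetical, so ties are broken deterministically.
        Map<String, Integer> coCounts = new TreeMap<>();
        for (Set<String> liked : likesByUser.values()) {
            if (!liked.contains(item)) {
                continue; // this user says nothing about the target item
            }
            for (String other : liked) {
                if (!other.equals(item)) {
                    coCounts.merge(other, 1, Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(coCounts.keySet());
        // Stable sort by descending co-occurrence count.
        ranked.sort((a, b) -> Integer.compare(coCounts.get(b), coCounts.get(a)));
        return new ArrayList<>(ranked.subList(0, Math.min(topN, ranked.size())));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> likes = new LinkedHashMap<>();
        likes.put("alice", new HashSet<>(Arrays.asList("Alien", "Matrix", "Blade Runner")));
        likes.put("bob", new HashSet<>(Arrays.asList("Alien", "Matrix")));
        likes.put("carol", new HashSet<>(Arrays.asList("Matrix", "Titanic")));
        System.out.println(alsoLiked(likes, "Alien", 2));
    }
}
```

Raw co-occurrence counts favor popular items; production recommenders typically replace them with a normalized similarity measure.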
Mahout is distributed under the commercially friendly Apache license, which means that you can use it as long as you include a copy of the Apache license and retain its copyright notices in what you distribute.
Mahout is organized into the following packages:
- org.apache.mahout.cf.taste: These are collaborative filtering algorithms, including user-based and item-based approaches and matrix factorization with alternating least squares (ALS)
- org.apache.mahout.classifier: These are in-memory and distributed implementations, including logistic regression, Naive Bayes, random forest, hidden Markov models (HMM), and multilayer perceptron
- org.apache.mahout.clustering: These are clustering algorithms such as canopy clustering, k-means, fuzzy k-means, streaming k-means, and spectral clustering
- org.apache.mahout.common: These are utility methods for algorithms, including distances, MapReduce operations, iterators, and so on
- org.apache.mahout.driver: This implements a general-purpose driver to run main methods of other classes
- org.apache.mahout.ep: This implements evolutionary optimization using recorded-step mutation
- org.apache.mahout.math: These are various math utility methods and implementations in Hadoop
- org.apache.mahout.vectorizer: These are classes for data representation and manipulation, along with the related MapReduce jobs
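Two ideas from this list, turning text into vectors and measuring distances between them, underpin both clustering and classification. The following is a minimal sketch in plain Java, assuming a fixed vocabulary and simple term-frequency weighting; it does not use Mahout's classes, and the names `VectorizeSketch`, `termFrequencies`, and `cosineDistance` are illustrative.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative vectorization and distance sketch; not the Mahout API.
class VectorizeSketch {

    // Counts how often each vocabulary term occurs in the text.
    static double[] termFrequencies(String text, List<String> vocabulary) {
        double[] vector = new double[vocabulary.size()];
        for (String token : text.toLowerCase().split("\\W+")) {
            int index = vocabulary.indexOf(token);
            if (index >= 0) {
                vector[index] += 1.0;
            }
        }
        return vector;
    }

    // Cosine distance: 0 for vectors pointing the same way, 1 for orthogonal ones.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 1.0; // convention chosen here: an empty vector is maximally distant
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        List<String> vocabulary = Arrays.asList("hadoop", "mahout", "spark");
        double[] first = termFrequencies("Mahout runs on Hadoop", vocabulary);
        double[] second = termFrequencies("Spark replaces Hadoop jobs", vocabulary);
        System.out.println(Arrays.toString(first));
        System.out.println(cosineDistance(first, second));
    }
}
```

A clustering algorithm such as k-means would group documents whose vectors lie close together under a distance like this one.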