
  • Machine Learning in Java
  • AshishSingh Bhatia, Bostjan Kaluza
  • 2021-06-10 19:30:08

MALLET

The Machine Learning for Language Toolkit (MALLET) is a large library of natural language processing algorithms and utilities. It can be used for a variety of tasks, such as document classification, document clustering, information extraction, and topic modelling. It offers both a command-line interface and a Java API for several algorithms, including Naive Bayes, hidden Markov models (HMMs), latent Dirichlet allocation (LDA) topic models, logistic regression, and conditional random fields.

MALLET is available under the Common Public License 1.0, which means that you can even use it in commercial applications. It can be downloaded from http://mallet.cs.umass.edu. A MALLET instance is represented by a name, a label, data, and a source. There are two ways to import data into the MALLET format:

  • Instance per file: Each file or document corresponds to an instance and MALLET accepts the directory name for the input.
  • Instance per line: Each line corresponds to an instance and is assumed to have the format instance_name label tokens, that is, the instance name, then its label, then the text. The data becomes a feature vector consisting of the distinct words that appear as tokens and their occurrence counts.
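To make the instance-per-line format concrete, the following minimal sketch parses one such line into a name, a label, and a map of word counts using only the Java standard library. It does not use MALLET's own importer; the class names InstanceLineParser and ParsedInstance are hypothetical, and splitting on whitespace and lowercasing the tokens are assumptions made for the illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: these classes are NOT part of MALLET.
final class InstanceLineParser {

    /** Hypothetical holder mirroring the four parts of an instance. */
    static final class ParsedInstance {
        final String name;
        final String label;
        final Map<String, Integer> data; // distinct token -> occurrence count
        final String source;             // the original, unparsed line

        ParsedInstance(String name, String label,
                       Map<String, Integer> data, String source) {
            this.name = name;
            this.label = label;
            this.data = data;
            this.source = source;
        }
    }

    /** Parses a line of the form "instance_name label token token ...". */
    static ParsedInstance parse(String line) {
        // Split into at most three parts: name, label, and the remaining text.
        String[] parts = line.trim().split("\\s+", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected: name label tokens...");
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        if (parts.length == 3) {
            // Assumption: tokens are whitespace-separated and lowercased.
            for (String token : parts[2].toLowerCase().split("\\s+")) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return new ParsedInstance(parts[0], parts[1], counts, line);
    }

    public static void main(String[] args) {
        ParsedInstance inst = parse("doc1 spam buy cheap pills buy now");
        // prints: doc1 [spam] {buy=2, cheap=1, pills=1, now=1}
        System.out.println(inst.name + " [" + inst.label + "] " + inst.data);
    }
}
```

The map keeps insertion order so that the printed counts follow the order in which words first appear; a real feature vector would typically map words to integer indices through a shared alphabet instead.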

The library comprises the following packages:

  • cc.mallet.classify: These are algorithms for training classifiers and classifying instances, including AdaBoost, bagging, C4.5 and other decision tree models, multivariate logistic regression, Naive Bayes, and Winnow2.
  • cc.mallet.cluster: These are unsupervised clustering algorithms, including greedy agglomerative, hill climbing, k-best, and k-means clustering.
  • cc.mallet.extract: This implements tokenizers, document extractors, document viewers, cleaners, and so on.
  • cc.mallet.fst: This implements sequence models, including conditional random fields, HMM, maximum entropy Markov models, and corresponding algorithms and evaluators.
  • cc.mallet.grmm: This implements graphical models and factor graphs, together with algorithms for inference, learning, and testing, such as loopy belief propagation and Gibbs sampling.
  • cc.mallet.optimize: These are optimization algorithms for finding the maximum of a function, such as gradient ascent, limited-memory BFGS, stochastic meta ascent, and so on.
  • cc.mallet.pipe: These are processing steps (pipes) that are chained into pipelines to convert raw data into MALLET instances.
  • cc.mallet.topics: These are topic modelling algorithms, such as latent Dirichlet allocation, four-level pachinko allocation, hierarchical PAM, DMRT, and so on.
  • cc.mallet.types: This implements fundamental data types such as dataset, feature vector, instance, and label.
  • cc.mallet.util: These are miscellaneous utility functions such as command-line processing, search, math, test, and so on.
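The pipeline idea behind cc.mallet.pipe, a chain of small transformations applied in a fixed order, can be sketched with plain Java functions. This is a conceptual illustration rather than MALLET's API: the class PipelineSketch, its three steps, and the tiny stop-word list are all hypothetical. In MALLET itself, the equivalent steps are pipe objects from cc.mallet.pipe combined in sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Conceptual sketch only: this mimics the idea of chained pipes
// with java.util.function.Function and does NOT use MALLET classes.
final class PipelineSketch {

    // Hypothetical stop-word list, kept tiny for the example.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "of", "is"));

    // Step 1: normalize case.
    static final Function<String, String> LOWERCASE = s -> s.toLowerCase();

    // Step 2: split the text into runs of letters.
    static final Function<String, List<String>> TOKENIZE = text -> {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    };

    // Step 3: drop tokens that appear in the stop-word list.
    static final Function<List<String>, List<String>> REMOVE_STOP_WORDS = tokens -> {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    };

    // The pipeline applies the steps in order, each consuming the previous output.
    static final Function<String, List<String>> PIPELINE =
            LOWERCASE.andThen(TOKENIZE).andThen(REMOVE_STOP_WORDS);

    public static void main(String[] args) {
        // prints: [toolkit, library, algorithms]
        System.out.println(PIPELINE.apply("The Toolkit is a Library of Algorithms"));
    }
}
```

Keeping each step as a separate function makes it easy to reorder, replace, or omit a step, which is the main reason for structuring preprocessing as a pipeline.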