- Machine Learning in Java
- AshishSingh Bhatia, Bostjan Kaluza
- 2021-06-10 19:30:09
Big data application architecture
Big data, such as documents, weblogs, social network data, sensor data, and so on, is stored in a NoSQL database such as MongoDB or in a distributed filesystem such as HDFS. For structured data, we can add database capabilities with systems such as Cassandra, or HBase, which is built atop Hadoop. Data processing follows the MapReduce paradigm, which breaks a data processing problem into smaller subproblems and distributes the resulting tasks across processing nodes. Finally, machine learning models are trained with machine learning libraries such as Mahout and Spark.
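To make the MapReduce idea concrete, here is a minimal single-machine sketch in plain Java. It does not use the Hadoop API; the class name `MapReduceSketch` and its methods are hypothetical names chosen for illustration. The map step turns each input line into (word, 1) pairs, and the reduce step groups the pairs by word and sums the counts. In a real cluster, each `map` call could run on a different node, with the framework handling the grouping in between.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a local imitation of the map and reduce phases,
// not the Hadoop MapReduce API.
public class MapReduceSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle and reduce phase: group pairs by key and sum their values.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Driver: each line is an independent subproblem for the map phase.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String line : lines) {
            allPairs.addAll(map(line)); // on a cluster, these calls run in parallel
        }
        return reduce(allPairs);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data big models", "data nodes");
        System.out.println(wordCount(lines));
    }
}
```

The key design point is that `map` looks at one record at a time and keeps no shared state, which is what lets a framework such as Hadoop spread the work across processing nodes.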
MongoDB is a NoSQL database that stores documents in a JSON-like format. You can read more about it at https://www.mongodb.org . Hadoop is a framework for the distributed processing of large datasets across a cluster of computers. It includes its own distributed filesystem, HDFS; a job scheduling framework, YARN; and an implementation of the MapReduce approach for parallel data processing. You can learn more about Hadoop at http://hadoop.apache.org/ . Cassandra is a distributed database management system that was built to provide fault-tolerant, scalable, and decentralized storage. More information is available at http://cassandra.apache.org/ . HBase is another distributed database, focused on random read/write access to distributed storage. More information is available at https://hbase.apache.org/ .