- Machine Learning in Java
- AshishSingh Bhatia, Bostjan Kaluza
- 2021-06-10 19:30:09
Big data application architecture
Big data, such as documents, blogs, social network data, sensor data, and other sources, is stored in a NoSQL database, such as MongoDB, or in a distributed filesystem, such as HDFS. When we deal with structured data, we can add database capabilities using systems such as Cassandra or HBase; HBase is built atop Hadoop. Data processing follows the MapReduce paradigm, which breaks a data processing problem into smaller subproblems and distributes the resulting tasks across processing nodes. Finally, machine learning models are trained with machine learning libraries such as Mahout and Spark.
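To make the MapReduce idea concrete, here is a minimal word-count sketch in plain Java. It does not use the Hadoop API: the class `MapReduceSketch` and its methods `map`, `reduce`, and `run` are hypothetical names chosen for illustration, and a parallel stream on a single machine stands in for the cluster of processing nodes.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative only: a single-JVM imitation of the MapReduce phases.
class MapReduceSketch {

    // Map phase: turn one input record into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String document) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : document.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: combine all values that share one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> documents) {
        // Each document is mapped independently, so the work can be split up.
        // Grouping the emitted pairs by key plays the role of the shuffle step.
        Map<String, List<Integer>> grouped = documents.parallelStream()
                .flatMap(doc -> map(doc).stream())
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

        // Apply the reducer once per key; TreeMap keeps the output sorted.
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((key, values) -> result.put(key, reduce(key, values)));
        return result;
    }

    public static void main(String[] args) {
        List<String> documents = Arrays.asList(
                "big data needs big storage",
                "data moves to storage");
        // Prints {big=2, data=2, moves=1, needs=1, storage=2, to=1}
        System.out.println(run(documents));
    }
}
```

In a real Hadoop job the map and reduce functions would run on different machines and the framework would handle the grouping step, but the division of labor between the phases is the same.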
MongoDB is a NoSQL database that stores documents in a JSON-like format. You can read more about it at https://www.mongodb.org. Hadoop is a framework for the distributed processing of large datasets across a cluster of computers. It includes its own filesystem, HDFS, and a job scheduling framework, YARN, and it implements the MapReduce approach for parallel data processing. You can learn more about Hadoop at http://hadoop.apache.org/. Cassandra is a distributed database management system that was built to provide fault-tolerant, scalable, and decentralized storage. More information is available at http://cassandra.apache.org/. HBase is another database, one that focuses on random read/write access to distributed storage. More information is available at https://hbase.apache.org/.