Big data application architecture

Big data, such as documents, weblogs, social network data, sensor readings, and others, is stored in a NoSQL database, such as MongoDB, or in a distributed filesystem, such as HDFS. When we deal with structured data, we can add database capabilities using systems such as HBase, which is built atop Hadoop, or Cassandra, which runs as an independent distributed database. Data processing follows the MapReduce paradigm, which breaks a data processing problem into smaller subproblems and distributes the resulting tasks across processing nodes. Finally, machine learning models are trained with libraries such as Mahout and Spark.
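To make the MapReduce idea concrete, the following is a minimal single-process sketch in plain Python, not Hadoop's actual API. It counts words across two input splits. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample data are assumptions chosen for illustration; in a real cluster, each map and reduce call would run as a separate task on a processing node.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document stands in for an input split handled by one mapper.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    # Each key could be handed to a different reducer.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input: two splits of a larger text corpus.
splits = ["big data big models", "data nodes process data"]
counts = word_count(splits)
print(counts)
```

The map and reduce steps share no state, which is what would allow a framework to distribute them across nodes; only the shuffle step needs to bring together values that belong to the same key.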

MongoDB is a NoSQL database that stores documents in a JSON-like format. You can read more about it at https://www.mongodb.org. Hadoop is a framework for the distributed processing of large datasets across a cluster of computers. It includes its own distributed filesystem, HDFS, and a job scheduling framework, YARN, and it implements the MapReduce approach for parallel data processing. We can learn more about Hadoop at http://hadoop.apache.org/. Cassandra is a distributed database management system that was built to provide fault-tolerant, scalable, and decentralized storage. More information is available at http://cassandra.apache.org/. HBase is another database that focuses on random read/write access for distributed storage. More information is available at https://hbase.apache.org/.
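As a rough illustration of what a JSON-like document looks like, the sketch below builds one nested record and round-trips it through Python's standard `json` module. It does not connect to MongoDB, and the field names (`sensor_id`, `location`, `measurements`) are hypothetical, chosen only to show that a single document can hold nested objects and arrays without a fixed table schema.

```python
import json

# Hypothetical sensor reading; the field names are illustrative only.
sensor_reading = {
    "sensor_id": "s-17",
    "timestamp": "2016-01-01T12:00:00Z",
    "location": {"lat": 46.05, "lon": 14.51},  # nested object
    "measurements": [                          # array of sub-documents
        {"type": "temperature", "value": 21.5},
        {"type": "humidity", "value": 0.43},
    ],
}

# Serialize to JSON text and parse it back into Python objects.
serialized = json.dumps(sensor_reading)
restored = json.loads(serialized)
print(restored["measurements"][0]["type"])
```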