Big data application architecture

Big data, such as documents, weblogs, social network data, sensor readings, and other sources, is stored in a NoSQL database, such as MongoDB, or in a distributed filesystem, such as HDFS. When we deal with structured data, we can deploy database capabilities using systems such as Cassandra or HBase. HBase runs on top of Hadoop's HDFS, while Cassandra is a standalone distributed database that can integrate with Hadoop. Data processing follows the MapReduce paradigm, which breaks a data processing problem into smaller subproblems and distributes the resulting tasks across processing nodes. Finally, machine learning models are trained with machine learning libraries such as Mahout and Spark.
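The MapReduce idea described above can be illustrated with a minimal, single-machine sketch in plain Python. It does not use Hadoop or any cluster API. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample chunks are hypothetical and chosen only for illustration. In a real deployment, each chunk would typically be handled by a different processing node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_chunks):
    """Shuffle step: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # Each chunk is an independent subproblem; here they are mapped one
    # after another, whereas a cluster would map them in parallel.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical input, already split into chunks as a distributed
# filesystem might split a large file into blocks.
chunks = ["big data big models", "data nodes", "big nodes"]
counts = word_count(chunks)
print(counts)
```

Because the map step looks at one chunk at a time and the reduce step looks at one key at a time, both can be spread across machines without changing the logic, which is the property the paradigm relies on.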

MongoDB is a NoSQL database that stores documents in a JSON-like format. You can read more about it at https://www.mongodb.org. Hadoop is a framework for the distributed processing of large datasets across a cluster of computers. It includes its own distributed filesystem, HDFS, and a job scheduling framework, YARN, and it implements the MapReduce approach for parallel data processing. We can learn more about Hadoop at http://hadoop.apache.org/. Cassandra is a distributed database management system that was built to provide fault-tolerant, scalable, and decentralized storage. More information is available at http://cassandra.apache.org/. HBase is another database that focuses on random read/write access for distributed storage. More information is available at https://hbase.apache.org/.