  • Machine Learning in Java
  • AshishSingh Bhatia, Bostjan Kaluza
  • 2021-06-10 19:30:09

Big data application architecture

Big data, such as documents, web blogs, social network data, and sensor data, is stored in a NoSQL database, such as MongoDB, or in a distributed filesystem, such as HDFS. When we deal with structured data, we can add database capabilities using systems such as Cassandra or HBase; HBase is built atop Hadoop, while Cassandra runs as a standalone distributed database. Data processing follows the MapReduce paradigm, which breaks a data processing problem into smaller subproblems and distributes the resulting tasks across processing nodes. Finally, machine learning models are trained with machine learning libraries such as Mahout and Spark.
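To make the MapReduce idea concrete, the following is a minimal sketch in plain Java that counts words across a few lines of text. It does not use the Hadoop API; the class name `MapReduceSketch`, the helper methods `map` and `wordCount`, and the sample input are all assumptions made for illustration. A parallel stream stands in for the cluster: each input line plays the role of a split handled by a map task, and grouping by key plays the role of the shuffle and reduce steps.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MapReduceSketch {

    // Map phase (hypothetical helper): turn one input split, here a line
    // of text, into a stream of (word, 1) pairs.
    public static Stream<Map.Entry<String, Integer>> map(String line) {
        return Stream.of(line.toLowerCase().split("\\W+"))
                .filter(word -> !word.isEmpty())
                .map(word -> Map.entry(word, 1));
    }

    // Shuffle and reduce phases (hypothetical helper): group the pairs by
    // word and sum the counts for each word. The parallel stream only
    // imitates distribution across nodes; everything runs in one JVM.
    public static Map<String, Integer> wordCount(List<String> lines) {
        return lines.parallelStream()
                .flatMap(MapReduceSketch::map)
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        TreeMap::new,
                        Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> splits = List.of(
                "big data needs big clusters",
                "data moves to clusters");
        // Prints {big=2, clusters=2, data=2, moves=1, needs=1, to=1}
        System.out.println(wordCount(splits));
    }
}
```

In a real Hadoop job, the map and reduce steps would run as separate tasks on different machines and exchange data through the framework, but the division of work follows the same pattern as this single-process sketch.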

MongoDB is a NoSQL database that stores documents in a JSON-like format. You can read more about it at https://www.mongodb.org. Hadoop is a framework for the distributed processing of large datasets across a cluster of computers. It includes its own distributed filesystem, HDFS, and a job scheduling framework, YARN, and it implements the MapReduce approach for parallel data processing. You can learn more about Hadoop at http://hadoop.apache.org/. Cassandra is a distributed database management system that was built to provide fault-tolerant, scalable, and decentralized storage. More information is available at http://cassandra.apache.org/. HBase is another database that focuses on random read/write access to distributed storage. More information is available at https://hbase.apache.org/.