  • Machine Learning in Java
  • AshishSingh Bhatia, Bostjan Kaluza
  • 2021-06-10 19:30:09

Big data application architecture

Big data, such as documents, web blogs, social networks, sensor data, and others, are stored in a NoSQL database, such as MongoDB, or in a distributed filesystem, such as HDFS. If we deal with structured data, we can add database capabilities using systems such as Cassandra or HBase; HBase is built atop Hadoop's HDFS, while Cassandra manages its own distributed storage. Data processing follows the MapReduce paradigm, which breaks a data processing problem into smaller subproblems and distributes the tasks across processing nodes. Finally, machine learning models are trained with machine learning libraries such as Mahout and Spark (MLlib).
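To make the map, shuffle, and reduce steps concrete, here is a minimal sketch of the MapReduce idea as a word count, written in plain Java. It assumes a single JVM: the threads of an `ExecutorService` stand in for the processing nodes of a cluster, and the class and method names (`MapReduceSketch`, `map`, `reduce`, `wordCount`) are hypothetical illustrations, not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical single-JVM simulation of MapReduce (word count).
// Pool threads stand in for cluster nodes; this is NOT the Hadoop API.
class MapReduceSketch {

    // Map: turn one input split into intermediate (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : split.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Reduce: combine all values emitted for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Block until a task finishes, rethrowing failures as unchecked exceptions.
    private static <T> T await(Future<T> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    static Map<String, Integer> wordCount(List<String> splits, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // 1. Distribute: one map task per input split.
            List<Future<List<Map.Entry<String, Integer>>>> mapped = new ArrayList<>();
            for (String split : splits) {
                mapped.add(pool.submit(() -> map(split)));
            }
            // 2. Shuffle: group intermediate values by key (kept sorted by key).
            Map<String, List<Integer>> grouped = new TreeMap<>();
            for (Future<List<Map.Entry<String, Integer>>> task : mapped) {
                for (Map.Entry<String, Integer> pair : await(task)) {
                    grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                           .add(pair.getValue());
                }
            }
            // 3. Reduce: one reduce task per key, also run on the pool.
            Map<String, Future<Integer>> reduced = new TreeMap<>();
            for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
                reduced.put(group.getKey(),
                        pool.submit(() -> reduce(group.getKey(), group.getValue())));
            }
            Map<String, Integer> counts = new TreeMap<>();
            for (Map.Entry<String, Future<Integer>> r : reduced.entrySet()) {
                counts.put(r.getKey(), await(r.getValue()));
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> splits = List.of(
                "big data is stored in HDFS",
                "MapReduce splits big problems",
                "big clusters process data");
        System.out.println(wordCount(splits, 3));
    }
}
```

Each input string plays the role of one data split, so the map tasks can run independently. Only the shuffle step needs to see every intermediate pair before reduction starts. In Hadoop itself, the map and reduce logic would live in `Mapper` and `Reducer` subclasses, and the framework would take care of the shuffle, task scheduling, and recovery from node failures, none of which this sketch attempts.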

MongoDB is a NoSQL database that stores documents in a JSON-like format. You can read more about it at https://www.mongodb.org. Hadoop is a framework for the distributed processing of large datasets across a cluster of computers. It includes its own filesystem, HDFS, and its own job scheduling and resource management framework, YARN, and it implements the MapReduce approach for parallel data processing. You can learn more about Hadoop at http://hadoop.apache.org/. Cassandra is a distributed database management system built to provide fault-tolerant, scalable, and decentralized storage. More information is available at http://cassandra.apache.org/. HBase is another distributed database; it focuses on random read/write access to data in distributed storage. More information is available at https://hbase.apache.org/.