Practical Big Data Analytics
Nataraj Dasgupta
Big Data With Hadoop
Hadoop has become the de facto standard in the world of big data, especially over the past three to four years. Hadoop started as a subproject of Apache Nutch in 2006 and introduced two key capabilities: a distributed filesystem and a distributed computing framework, known as MapReduce, both of which caught on very rapidly in the open source community. Today, thousands of products have been developed that leverage the core features of Hadoop, and it has evolved into a vast ecosystem of more than 150 related major products. Arguably, Hadoop was one of the primary catalysts of the big data and analytics industry.
In this chapter, we will discuss the background and core concepts of Hadoop, the components of the Hadoop platform, and delve deeper into the major products in the Hadoop ecosystem. We will learn the core concepts of distributed filesystems and distributed processing, as well as optimizations that improve the performance of Hadoop deployments. We'll conclude with real-world hands-on exercises using the Cloudera Distribution of Hadoop (CDH). The topics we will cover are:
- The basics of Hadoop
    - The core components of Hadoop
    - Hadoop 1 and Hadoop 2
    - The Hadoop Distributed File System
    - Distributed computing principles with MapReduce
- The Hadoop ecosystem
    - Overview of the Hadoop ecosystem
    - Hive, HBase, and more
- Hadoop Enterprise deployments
    - In-house deployments
    - Cloud deployments
- Hands-on with Cloudera Hadoop
    - Using HDFS
    - Using Hive
    - MapReduce with WordCount (see the sketch after this list)
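
As a preview of the hands-on section, below is a minimal sketch of the classic WordCount job written against Hadoop's org.apache.hadoop.mapreduce API: the mapper emits a (word, 1) pair for every token in its input split, and the reducer (also reused as a combiner) sums the counts per word. The class names and the command-line input/output paths are illustrative, not taken from this book's exercises.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts for each word across all mappers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // combiner cuts shuffle volume
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory (illustrative)
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory; must not exist yet
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, such a job would typically be submitted with `hadoop jar wordcount.jar WordCount <hdfs-input-dir> <hdfs-output-dir>`, where the output directory must not already exist.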