- Apache Hadoop 3 Quick Start Guide
- Hrishikesh Vijay Karambelkar
- 2021-06-10 19:18:44
Organizational data growth
Although Hadoop lets you add and remove nodes dynamically in an on-premise cluster, doing so is not a day-to-day task. When you size a cluster, you must therefore account for data growth over the years. For example, if you are building a cluster to process social media analytics, and the organization expects to add x pages a month for processing, the sizing needs to be computed accordingly. You can start by estimating the data generated in each year with the following formula:
Data Generated in Year X = Data Generated in Year (X-1) × (1 + % Growth) + Data coming from additional sources in Year X
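The formula can be applied year over year to project data volumes. The following is a minimal sketch in Python, not part of the book's calculator: the function name `project_data_growth`, the terabyte units, and the sample figures (100 TB generated last year, 50% annual growth, 50 TB from a new source in the second year) are all assumptions chosen for illustration.

```python
def project_data_growth(base_year_data_tb, growth_rates, additional_sources_tb):
    """Project the data generated in each future year.

    Applies: data[x] = data[x-1] * (1 + growth[x]) + additional[x]

    base_year_data_tb     -- data generated in the year before the projection starts
    growth_rates          -- expected growth per year as fractions (0.5 means 50%)
    additional_sources_tb -- data expected from new sources in each year
    All names and units here are hypothetical, chosen for illustration.
    """
    if len(growth_rates) != len(additional_sources_tb):
        raise ValueError("growth_rates and additional_sources_tb must have the same length")

    projections = []
    previous = base_year_data_tb
    for growth, additional in zip(growth_rates, additional_sources_tb):
        current = previous * (1 + growth) + additional
        projections.append(current)
        previous = current  # this year's figure feeds next year's estimate
    return projections


# Illustrative figures only: 100 TB last year, 50% growth each year,
# and an extra 50 TB from a new source in the second year.
projected = project_data_growth(100.0, [0.5, 0.5, 0.5], [0.0, 50.0, 0.0])
for year, size_tb in enumerate(projected, start=1):
    print(f"Year {year}: {size_tb} TB")
```

Each year's result is fed back in as the previous year's value, so an early change in growth rate or a new data source carries through to every later year.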
The following image shows a cluster sizing calculator, which can be used to compute the size of your cluster based on data growth (Excel attached). In this case, for the first year, last year's data can provide an initial size estimate:

While we work through storage sizing, it is worth pointing out another difference between Hadoop and traditional storage systems: Hadoop does not require RAID servers. RAID adds little value here, primarily because HDFS already replicates data at the file system level and provides scalability and high availability on its own.
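Because HDFS replication takes the place of RAID, the replicated copies have to be counted when translating a data estimate into raw disk capacity. The sketch below assumes the HDFS default replication factor of 3 (the `dfs.replication` setting). The function name `raw_storage_tb` and the 150 TB input are hypothetical, and the calculation ignores other overheads such as temporary or intermediate data.

```python
def raw_storage_tb(logical_data_tb, replication_factor=3):
    """Estimate raw disk capacity needed to hold replicated HDFS data.

    logical_data_tb    -- size of the data before replication (hypothetical input)
    replication_factor -- number of copies HDFS keeps of each block;
                          3 is the HDFS default (dfs.replication)
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return logical_data_tb * replication_factor


# Illustrative figure only: 150 TB of data stored with three replicas.
raw = raw_storage_tb(150.0)
print(f"Raw capacity needed: {raw} TB")
```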