Organizational data growth

Although Hadoop allows you to add and remove nodes dynamically in an on-premises cluster, doing so is never a day-to-day task. So, when you approach sizing, you must account for data growth over the years. For example, if you are building a cluster to process social media analytics, and the organization expects to add x pages a month for processing, the sizing needs to be computed accordingly. You can start by computing the data generated in each year with the following formula:

Data Generated in Year X = Data Generated in Year (X-1) × (1 + % Growth) + Data coming from additional sources in Year X
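The formula can be applied year by year, feeding each result into the next. The following is a minimal sketch of that calculation, not the calculator from the book. The function name project_data_growth, the terabyte unit, and the sample figures (100 TB last year, 20% growth, 10 TB from a new source in year 2) are all assumptions made for illustration.

```python
def project_data_growth(base_tb, growth_rate, additional_tb_per_year):
    """Project the data generated in each future year, in TB.

    base_tb: data generated in the year before the projection starts
             (hypothetical input; the unit is an assumption).
    growth_rate: yearly growth as a fraction, for example 0.20 for 20%.
    additional_tb_per_year: data from additional sources, one entry per
                            projected year.
    """
    projections = []
    previous = base_tb
    for extra in additional_tb_per_year:
        # Year X = Year (X-1) * (1 + growth) + additional sources in year X
        current = previous * (1 + growth_rate) + extra
        projections.append(current)
        previous = current
    return projections


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the text.
    yearly = project_data_growth(100.0, 0.20, [0.0, 10.0, 0.0])
    for year, tb in enumerate(yearly, start=1):
        print(f"Year {year}: {tb:.1f} TB")
```

With these sample inputs, the projection is roughly 120 TB, 154 TB, and 184.8 TB for years 1 to 3. Passing the additional sources as a list keeps one-off events, such as onboarding a new data feed, separate from the steady growth rate.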

The following image shows a cluster sizing calculator, which can be used to compute the size of your cluster based on data growth (Excel attached). In this case, for the first year, last year's data can provide an initial size estimate:

While we work through storage sizing, it is worth pointing out another interesting difference between Hadoop and traditional storage systems: Hadoop does not require RAID servers. RAID adds little value here, primarily because HDFS already replicates data across the cluster and provides scalability and high availability on its own.
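Because HDFS replication takes the place of RAID, the replication factor feeds directly into the raw disk estimate. The sketch below shows that relationship under simple assumptions: the function name raw_hdfs_capacity_tb is hypothetical, the default of 3 matches the HDFS default replication factor, and the calculation ignores temporary data, compression, and operating system overhead.

```python
def raw_hdfs_capacity_tb(logical_tb, replication_factor=3):
    """Estimate the raw disk capacity (TB) needed to hold logical_tb of data.

    Each HDFS block is stored replication_factor times, so the raw
    footprint is the logical size multiplied by that factor. This is a
    simplification: it ignores intermediate data, compression, and
    non-HDFS disk usage.
    """
    if logical_tb < 0:
        raise ValueError("logical_tb must be non-negative")
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return logical_tb * replication_factor


if __name__ == "__main__":
    # Illustrative figure only: 120 TB of logical data.
    print(raw_hdfs_capacity_tb(120.0))
```

For 120 TB of logical data and the default factor of 3, the estimate is 360 TB of raw disk across the cluster.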