
Organizational data growth

Although Hadoop lets you add and remove nodes dynamically in an on-premises cluster, doing so is never a day-to-day task. When you approach sizing, you must therefore account for data growth over the coming years. For example, if you are building a cluster to process social media analytics and the organization expects to add x pages a month for processing, the sizing needs to reflect that growth. You can start by computing the data generated in each year with the following formula:

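$$D_X = D_{X-1} \times (1 + g) + A_X$$

Here, D_X is the data generated in year X, D_{X-1} is the data generated in the previous year, g is the annual growth rate expressed as a fraction (for example, 0.2 for 20% growth), and A_X is the data coming from additional sources in year X.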
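As a minimal sketch, the yearly growth formula translates directly into a function. The function name, its parameters, and the use of terabytes as the unit are illustrative assumptions, not part of any Hadoop tool:

```python
def data_generated(prev_year_tb: float, growth_rate: float, new_sources_tb: float) -> float:
    """Data generated in year X, in TB (hypothetical helper).

    prev_year_tb   -- data generated in year X-1
    growth_rate    -- annual growth as a fraction, e.g. 0.20 for 20%
    new_sources_tb -- data from additional sources that start in year X
    """
    # D_X = D_{X-1} * (1 + g) + A_X
    return prev_year_tb * (1 + growth_rate) + new_sources_tb


if __name__ == "__main__":
    # Example: 100 TB last year, 20% organic growth, 10 TB from a new source
    print(data_generated(100.0, 0.20, 10.0))
```

With these example inputs, the result is about 130 TB: 120 TB from organic growth plus 10 TB from the new source.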

The following image shows a cluster sizing calculator (an Excel workbook is attached) that computes the size of your cluster based on data growth. For the first year, the previous year's data provides the initial size estimate.
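The attached calculator's exact layout is not reproduced here. As a rough sketch under stated assumptions, a multi-year projection can apply the growth formula year by year. It then multiplies the total by the HDFS replication factor to estimate raw disk capacity. The assumptions are: every projected year's data is retained on the cluster, and the replication factor is the HDFS default of 3 (`dfs.replication`). All function names here are hypothetical:

```python
from typing import List

# Assumption: HDFS default replication factor (dfs.replication); change it to match your cluster
HDFS_REPLICATION = 3


def project_yearly_data(base_tb: float, growth_rate: float,
                        new_sources_tb: List[float]) -> List[float]:
    """Project data generated per year (hypothetical helper).

    base_tb        -- previous year's data, used as the initial estimate
    growth_rate    -- annual growth as a fraction, e.g. 0.20 for 20%
    new_sources_tb -- extra data from new sources, one entry per projected year
    """
    yearly: List[float] = []
    prev = base_tb
    for extra in new_sources_tb:
        # D_X = D_{X-1} * (1 + g) + A_X
        current = prev * (1 + growth_rate) + extra
        yearly.append(current)
        prev = current
    return yearly


def raw_capacity_tb(yearly_tb: List[float], replication: int = HDFS_REPLICATION) -> float:
    """Raw capacity needed if all projected years are retained and replicated."""
    return sum(yearly_tb) * replication


if __name__ == "__main__":
    years = project_yearly_data(base_tb=100.0, growth_rate=0.20,
                                new_sources_tb=[10.0, 0.0, 5.0])
    for i, tb in enumerate(years, start=1):
        print(f"Year {i}: {tb:.1f} TB")
    print(f"Raw capacity needed: {raw_capacity_tb(years):.1f} TB")
```

This sketch counts only the projected years. If the base year's data also stays on the cluster, add it to the sum before applying replication. A real sizing exercise would also reserve headroom for intermediate and temporary data.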

While we work through storage sizing, it is worth pointing out another difference between Hadoop and traditional storage systems: Hadoop does not require RAID for its data storage. RAID adds little value here, primarily because HDFS already replicates data blocks across nodes and provides its own scalability and high availability.
