
Velocity of data and other factors

The velocity at which data is generated and transferred to the Hadoop cluster also affects cluster sizing. Consider two data ingestion scenarios, with volume measured in GB per minute, as shown in the following diagram:

In the preceding diagram, both scenarios generate the same total amount of data each day, but at different velocities. In the first scenario, data arrives in spikes, whereas the second sees a consistent flow. Scenario 1 needs more hardware than scenario 2, with additional CPUs or GPUs and more storage, because the cluster has to be sized for the peaks rather than for the daily average.

Many other parameters can influence the sizing of the cluster. For example, the type of data affects the compression ratio you can achieve. Compression can be done with gzip, bzip2, and other compression utilities, and textual data usually compresses better than other kinds of data. Similarly, intermediate storage typically adds another 25% to 35% to the storage requirement. Intermediate storage is used by MapReduce tasks to store the intermediate results of processing. You can access an example Hadoop sizing calculator here.
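The effect of these factors can be illustrated with a rough back-of-the-envelope calculation. The following Python sketch is not the sizing calculator mentioned above. The function names, the sample ingest rates, the 30-day retention period, and the 4:1 compression ratio are all hypothetical values chosen for illustration. Only the 25% to 35% intermediate storage range comes from the text, and the sketch assumes that this overhead is applied on top of the compressed data size.

```python
def peak_to_average_ratio(gb_per_minute):
    """Return the busiest sample divided by the mean ingest rate.

    A ratio near 1.0 indicates a steady flow; a larger ratio indicates
    spikes that the cluster must be sized to absorb.
    """
    mean = sum(gb_per_minute) / len(gb_per_minute)
    return max(gb_per_minute) / mean


def estimate_storage_gb(daily_raw_gb, retention_days, compression_ratio,
                        intermediate_overhead=0.30):
    """Estimate the storage needed, in GB, under simplifying assumptions.

    compression_ratio is raw size divided by compressed size (assumed).
    intermediate_overhead is the extra fraction reserved for MapReduce
    intermediate results (0.25 to 0.35 according to the text).
    """
    compressed_gb = daily_raw_gb * retention_days / compression_ratio
    return compressed_gb * (1 + intermediate_overhead)


# Hypothetical samples: both scenarios deliver 24 GB in total.
scenario_1 = [0, 0, 12, 0, 0, 12]   # spiky ingest
scenario_2 = [4, 4, 4, 4, 4, 4]     # steady ingest

print(peak_to_average_ratio(scenario_1))
print(peak_to_average_ratio(scenario_2))

# Hypothetical workload: 1,000 GB of raw text per day, kept for 30 days,
# compressed 4:1, with 30% reserved for intermediate results.
print(estimate_storage_gb(1000, 30, 4.0, intermediate_overhead=0.30))
```

Both scenarios have the same total volume, yet the spiky one has a peak three times its average rate, which is what drives the additional hardware. The storage estimate leaves out other factors, such as data replication, that a real sizing exercise would also need to account for.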
