
Erasure coding

HDFS has been a fundamental component of Hadoop since its inception. In both Hadoop 1.x and Hadoop 2.x, a typical HDFS installation uses a replication factor of three.

Erasure coding (EC) is probably the biggest change to HDFS in years. Compared to the default replication factor of three, it effectively doubles the usable capacity for many datasets by bringing the storage overhead down from 3x to about 1.4x. Let's now understand what EC is all about.
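
To see where the 1.4 figure comes from, consider a Reed-Solomon layout with 10 data cells and 4 parity cells (HDFS 3 ships an RS-10-4 policy, among others). The quick calculation below is only an illustrative sketch of that arithmetic:

```java
public class EcOverhead {
    public static void main(String[] args) {
        int dataCells = 10;   // data cells in the assumed RS(10,4) layout
        int parityCells = 4;  // parity cells in the assumed RS(10,4) layout

        // Replication stores every block three times: 3.0x raw storage.
        double replicationFactor = 3.0;

        // EC stores (data + parity) cells for every 'dataCells' cells of user data.
        double ecFactor = (double) (dataCells + parityCells) / dataCells;

        System.out.printf("3x replication: %.1fx raw storage%n", replicationFactor);
        System.out.printf("RS(%d,%d): %.1fx raw storage%n",
                dataCells, parityCells, ecFactor); // prints 1.4x
    }
}
```

With the default RS(6,3) policy, the same calculation gives (6 + 3) / 6 = 1.5x, still roughly half the raw storage needed by 3x replication.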

EC is a method of data protection in which data is broken into fragments, expanded and encoded with redundant parity pieces, and stored across a set of different locations or storage media. If data is lost at some point due to corruption or failure, it can be reconstructed from the information stored elsewhere. Although EC is more CPU intensive, it greatly reduces the storage needed to store large amounts of data reliably. HDFS has traditionally used replication to provide reliable storage, which is expensive: it typically requires three copies of the data to be stored, causing a 200% overhead in storage space.
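
To make the fragment-and-parity idea concrete, the sketch below uses the simplest possible erasure code, a single XOR parity (the same idea behind HDFS's XOR-2-1 policy): two data fragments produce one parity fragment, and any one lost fragment can be rebuilt from the other two. This is purely illustrative; HDFS's Reed-Solomon policies generalize the idea to more data and parity cells:

```java
import java.util.Arrays;

public class XorParitySketch {
    // Compute the parity fragment as the byte-wise XOR of two data fragments.
    static byte[] xor(byte[] a, byte[] b) {
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (byte) (a[i] ^ b[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] d1 = "HADOOP".getBytes();
        byte[] d2 = "ECDEMO".getBytes();

        // Store d1, d2, and the parity on three different nodes or disks.
        byte[] parity = xor(d1, d2);

        // Suppose the node holding d1 fails: rebuild it from d2 and the parity.
        byte[] recovered = xor(parity, d2);

        System.out.println(Arrays.equals(recovered, d1)); // true
    }
}
```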
