
Resilient Distributed Datasets

The core of Spark is a concept called the Resilient Distributed Dataset (RDD). An RDD is a collection of records (strictly speaking, objects of some type) that is distributed, or partitioned, across many nodes in a cluster (in Spark's local mode, the single multithreaded process can be thought of in the same way). An RDD in Spark is fault-tolerant: if a given node or task fails for some reason other than erroneous user code (hardware failure, loss of communication, and so on), the RDD can be reconstructed automatically on the remaining nodes and the job will still be completed.
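The two properties described above, partitioning and automatic reconstruction, can be illustrated with a small toy model. The sketch below is plain Python and does not use Spark at all; `ToyRDD`, `lose_partition`, and the round-robin split are hypothetical names and choices made for illustration only. It assumes that a lost partition can be rebuilt by reapplying a known transformation to the corresponding slice of the source data, which mimics the idea of recomputing rather than restoring from a backup, and it should not be read as Spark's actual implementation.

```python
from typing import Callable, Dict, List


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's API)."""

    def __init__(self, source: List[int], num_partitions: int,
                 transform: Callable[[int], int]) -> None:
        self.source = source
        self.num_partitions = num_partitions
        self.transform = transform
        # Computed partitions, keyed by partition index.
        self.cache: Dict[int, List[int]] = {}

    def _source_slice(self, index: int) -> List[int]:
        # Round-robin split of the source records across partitions
        # (an assumption of this sketch).
        return self.source[index::self.num_partitions]

    def compute_partition(self, index: int) -> List[int]:
        # Rebuild the partition from its source slice if it is missing.
        if index not in self.cache:
            self.cache[index] = [
                self.transform(x) for x in self._source_slice(index)
            ]
        return self.cache[index]

    def lose_partition(self, index: int) -> None:
        # Simulate a node failure that drops one computed partition.
        self.cache.pop(index, None)

    def collect(self) -> List[int]:
        # Gather all partitions in index order, recomputing any that are gone.
        result: List[int] = []
        for i in range(self.num_partitions):
            result.extend(self.compute_partition(i))
        return result


rdd = ToyRDD(list(range(10)), num_partitions=3, transform=lambda x: x * x)
before = rdd.collect()   # computes and caches all three partitions
rdd.lose_partition(1)    # one partition disappears
after = rdd.collect()    # the missing partition is rebuilt on demand
print(before == after)
```

The job still produces the same result after the simulated failure because only the missing partition is recomputed; the other partitions are reused as they are.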
