
How to do it...

This section lists common Apache Spark RDD transformations along with code snippets. More complete lists are available in the RDD programming guide at https://spark.apache.org/docs/latest/rdd-programming-guide.html#transformations, in the PySpark RDD API reference at https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD, and in the visual guide at https://training.databricks.com/visualapi.pdf.

The transformations cover the following common tasks:

  • Removing the header line from your text file: zipWithIndex(), followed by filter() on the index
  • Selecting columns from your RDD: map()
  • Running a WHERE (filter) clause: filter()
  • Getting the distinct values: distinct()
  • Getting the number of partitions: getNumPartitions()
  • Determining the size of your partitions (that is, the number of elements within each partition): mapPartitionsWithIndex()