
Book: PySpark Cookbook
Authors: Denny Lee, Tomasz Drabas
Updated: 2021-06-18 19:06:37

How to do it...

In this section, we list common Apache Spark RDD transformations and code snippets. A more complete list can be found at https://spark.apache.org/docs/latest/rdd-programming-guide.html#transformations, https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD and https://training.databricks.com/visualapi.pdf.

The snippets cover the following common tasks, each listed with the RDD method it relies on:

Removing the header line from your text file: zipWithIndex(), followed by a filter() on the resulting index
Selecting columns from your RDD: map()
Running a WHERE (filter) clause: filter()
Getting the distinct values: distinct()
Getting the number of partitions: getNumPartitions()
Determining the size of your partitions (that is, the number of elements within each partition): mapPartitionsWithIndex()
