- PySpark Cookbook
- Denny Lee, Tomasz Drabas
- 2021-06-18 19:06:37
How to do it...
This section lists common Apache Spark RDD transformations along with code snippets. More complete lists can be found at https://spark.apache.org/docs/latest/rdd-programming-guide.html#transformations, https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD, and https://training.databricks.com/visualapi.pdf.
The snippets cover the following common tasks, each paired with the RDD method used for it:
- Removing the header line from your text file: zipWithIndex()
- Selecting columns from your RDD: map()
- Running a WHERE (filter) clause: filter()
- Getting the distinct values: distinct()
- Getting the number of partitions: getNumPartitions()
- Determining the size of your partitions (that is, the number of elements within each partition): mapPartitionsWithIndex()