- PySpark Cookbook
- Denny Lee, Tomasz Drabas
- 2021-06-18 19:06:36
.take(...) method
Now that you have created your RDD (myRDD), we will execute our first RDD action, take(), to return its values to the console (or notebook cell). Actions are covered in more detail in subsequent recipes. A common approach in PySpark is to use collect(), which returns all the values in your RDD from the Spark worker nodes to the driver. With a large amount of data this has performance implications: the entire dataset must be transferred from the worker nodes to the driver and must fit in the driver's memory. For small amounts of data, such as in this recipe, collect() is perfectly fine. As a matter of habit, however, you should generally use the take(n) method instead, which returns only the first n elements of the RDD rather than the whole dataset. take(n) is more efficient because it first scans a single partition and then uses the number of elements found there to estimate how many additional partitions it must scan to return the results.
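To make the partition-scanning idea concrete, here is a simplified, pure-Python model. It is not Spark's actual source. Each inner list stands in for one RDD partition, and the hypothetical function simulated_take does three things. It scans one partition first. If nothing has been found yet, it grows the batch aggressively. Otherwise it uses the items-per-partition yield so far to estimate how many more partitions to scan, overestimating by 50% and capping growth at four times the partitions already scanned. These constants are an assumption made to approximate PySpark's behavior. In a real session you would simply call myRDD.take(5), whereas myRDD.collect() would bring every partition back to the driver.

```python
# Simplified model of how take(n) can avoid scanning every partition.
# Illustration only: NOT Spark's implementation; constants are assumptions.
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def simulated_take(partitions: List[List[T]], n: int) -> Tuple[List[T], int]:
    """Return the first n elements and the number of partitions scanned.

    `partitions` is a hypothetical stand-in for an RDD's partitions.
    """
    items: List[T] = []
    scanned = 0
    total = len(partitions)
    while len(items) < n and scanned < total:
        if scanned == 0:
            # First pass: scan a single partition.
            to_try = 1
        elif not items:
            # Nothing found yet: grow the batch aggressively (assumed 4x).
            to_try = scanned * 4
        else:
            # Estimate partitions still needed from the yield so far,
            # overestimating by 50% and capping growth at 4x (assumed).
            to_try = int(1.5 * n * scanned / len(items)) - scanned
            to_try = min(max(to_try, 1), scanned * 4)
        for part in partitions[scanned:scanned + to_try]:
            # Take only as many elements as are still needed.
            items.extend(part[: n - len(items)])
        scanned = min(scanned + to_try, total)
    return items[:n], scanned


# Ten partitions of 100 elements each, a stand-in for a distributed dataset.
parts = [list(range(i * 100, (i + 1) * 100)) for i in range(10)]
first, scanned = simulated_take(parts, 5)
print(first, scanned)  # [0, 1, 2, 3, 4] 1
```

Asking for five elements touches only 1 of the 10 partitions. A collect()-style pull would read all 10, and that difference is the saving take(n) is meant to capture.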