- Mastering Apache Spark 2.x(Second Edition)
- Romeo Kienzler
- 2021-07-02 18:55:31
RDDs versus DataFrames versus Datasets
To be clear, we discourage you from using RDDs unless you have a strong reason to do so, for the following reasons:
- In terms of abstraction level, RDDs are the equivalent of assembler or machine code in systems programming
- RDDs express how to do something and not what is to be achieved, leaving no room for optimizers
- RDDs have proprietary syntax; SQL is more widely known
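
The difference between expressing how and expressing what can be sketched in Scala. This is a minimal illustration, not code from the book: the sample data, the column names `dept` and `salary`, the view name `salaries`, and the object and method names are all assumptions made for the example, and it assumes Spark 2.x running in local mode.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

object RddVersusDataFrame {
  // Hypothetical sample data: (department, salary) pairs.
  val salaries: Seq[(String, Double)] =
    Seq(("eng", 100.0), ("eng", 120.0), ("ops", 80.0))

  // RDD style: every step of *how* to compute the average is spelled out
  // by hand, and Spark runs these closures as written.
  def averageWithRdd(spark: SparkSession): Map[String, Double] =
    spark.sparkContext.parallelize(salaries)
      .mapValues(salary => (salary, 1))
      .reduceByKey { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
      .mapValues { case (sum, count) => sum / count }
      .collect().toMap

  // DataFrame style: only *what* is wanted is stated; the optimizer
  // decides how to execute it.
  def averageWithDataFrame(spark: SparkSession): Map[String, Double] = {
    import spark.implicits._
    salaries.toDF("dept", "salary")
      .groupBy("dept").agg(avg("salary").as("avg_salary"))
      .collect().map(row => row.getString(0) -> row.getDouble(1)).toMap
  }

  // The same request in plain SQL against a temporary view.
  def averageWithSql(spark: SparkSession): Map[String, Double] = {
    import spark.implicits._
    salaries.toDF("dept", "salary").createOrReplaceTempView("salaries")
    spark.sql("SELECT dept, AVG(salary) AS avg_salary FROM salaries GROUP BY dept")
      .collect().map(row => row.getString(0) -> row.getDouble(1)).toMap
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this illustration.
    val spark = SparkSession.builder()
      .appName("rdd-vs-dataframe").master("local[*]").getOrCreate()
    println(averageWithRdd(spark))
    println(averageWithDataFrame(spark))
    println(averageWithSql(spark))
    spark.stop()
  }
}
```

All three methods return the same averages. Only the DataFrame and SQL versions hand Spark a declarative query that it can plan and optimize; calling `explain()` on either DataFrame prints the plan Spark chose.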
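Whenever possible, use Datasets: their static typing lets the compiler catch type errors before a job ever runs, and, unlike RDDs, they still benefit from Spark's optimizer. The typed Dataset API is only available in statically typed languages such as Java and Scala. In other languages, such as Python or R, you have to stick with DataFrames.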
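
What static typing buys you can be sketched as follows. This is a minimal example under stated assumptions: the `Employee` case class, its fields, the sample rows, and the object and method names are hypothetical and exist only for this illustration.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type, defined at top level so Spark can derive an encoder.
case class Employee(name: String, dept: String, salary: Double)

object TypedDatasetSketch {
  // Hypothetical sample rows.
  val staff: Seq[Employee] =
    Seq(Employee("ann", "eng", 120.0), Employee("bob", "ops", 80.0))

  // Dataset: field names and types are checked by the Scala compiler.
  def wellPaidTyped(spark: SparkSession): Seq[String] = {
    import spark.implicits._
    val ds: Dataset[Employee] = staff.toDS()
    // ds.filter(e => e.salery > 100.0)  // misspelled field: does not compile
    ds.filter(e => e.salary > 100.0).map(_.name).collect().toSeq
  }

  // DataFrame: columns are looked up by name when the query is analyzed,
  // so a misspelled column name only fails at run time.
  def wellPaidUntyped(spark: SparkSession): Seq[String] = {
    import spark.implicits._
    val df: DataFrame = staff.toDS().toDF()
    df.filter($"salary" > 100.0).select("name")
      .collect().map(_.getString(0)).toSeq
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this illustration.
    val spark = SparkSession.builder()
      .appName("typed-dataset-sketch").master("local[*]").getOrCreate()
    println(wellPaidTyped(spark))
    println(wellPaidUntyped(spark))
    spark.stop()
  }
}
```

One design point to keep in mind: a Scala lambda passed to `filter` is opaque to the optimizer, whereas a column expression such as `$"salary" > 100.0` is not. The two styles can be mixed on the same Dataset.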