- Mastering Apache Spark 2.x (Second Edition)
- Romeo Kienzler
- 2021-07-02 18:55:28
Understanding the DataSource API
The DataSource API was introduced in Apache Spark 1.2 and has been extended continuously since then. You have already used it, perhaps without realizing it, whenever you read or wrote data through SparkSession or DataFrames.
The DataSource API provides an extensible framework for reading data from, and writing data to, many different data sources in various formats. There is built-in support for Hive, Avro, JSON, JDBC, Parquet, and CSV, and a large number of third-party plugins add support for, for example, MongoDB, Cassandra, Apache CouchDB, Cloudant, or Redis.
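The plug-in idea can be illustrated with a small toy model in plain Python. This is only a sketch of the general pattern (a format name is looked up in a registry and mapped to a reader), not Spark's actual implementation. The names `register_format`, `read`, and the `kv` format are hypothetical and exist only for this illustration.

```python
import csv
import io
import json
from typing import Callable, Dict, List

# Hypothetical registry (not a Spark class): format name -> reader function.
Reader = Callable[[str], List[dict]]
_READERS: Dict[str, Reader] = {}


def register_format(name: str, reader: Reader) -> None:
    """Make a data format available under a short, case-insensitive name."""
    _READERS[name.lower()] = reader


def read(fmt: str, text: str) -> List[dict]:
    """Look up the reader registered for `fmt` and parse `text` into rows."""
    try:
        reader = _READERS[fmt.lower()]
    except KeyError:
        raise ValueError(f"no data source registered for format {fmt!r}") from None
    return reader(text)


# Two "built-in" formats: one JSON object per line, and CSV with a header row.
register_format(
    "json",
    lambda text: [json.loads(line) for line in text.splitlines() if line.strip()],
)
register_format("csv", lambda text: list(csv.DictReader(io.StringIO(text))))


# A "third-party" format plugs in the same way; read() itself does not change.
def read_key_value(text: str) -> List[dict]:
    # One record per line, fields written as key=value pairs separated by ';'.
    return [
        dict(pair.split("=", 1) for pair in line.split(";"))
        for line in text.splitlines()
        if line.strip()
    ]


register_format("kv", read_key_value)

json_rows = read("json", '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}')
csv_rows = read("csv", "id,name\n1,a\n2,b")
kv_rows = read("kv", "id=1;name=a\nid=2;name=b")
```

The caller only names a format. Which code parses the data is decided by the registry, which is why new sources can be added without touching the calling code.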
You rarely use classes from the DataSource API directly, because they are wrapped behind the read method of SparkSession and the write method of the DataFrame or Dataset. Schema discovery is likewise hidden from the user.
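To make the idea of schema discovery more concrete, here is a minimal sketch in plain Python that derives column names and types by scanning JSON records. It is a toy model under simplifying assumptions (flat records, a handful of types, a hypothetical `infer_schema` helper) and does not reproduce how Spark infers schemas internally.

```python
import json
from typing import Dict, List


def infer_schema(rows: List[dict]) -> Dict[str, str]:
    """Derive a column -> type-name mapping by scanning all records."""
    schema: Dict[str, str] = {}
    for row in rows:
        for column, value in row.items():
            if value is None:
                # A null tells us the column exists but not its type.
                schema.setdefault(column, "null")
                continue
            seen = type(value).__name__
            known = schema.get(column)
            if known in (None, "null", seen):
                schema[column] = seen
            elif {known, seen} == {"int", "float"}:
                schema[column] = "float"  # widen int to float
            else:
                schema[column] = "str"  # conflicting types: fall back to string
    return schema


lines = [
    '{"id": 1, "price": 2, "tag": null}',
    '{"id": 2, "price": 3.5, "tag": "x"}',
    '{"id": 3, "price": 4, "note": true}',
]
rows = [json.loads(line) for line in lines]
schema = infer_schema(rows)
```

Here `price` ends up as `float` because one record contains a fractional value, `tag` is resolved once a non-null value appears, and `note` is picked up even though only one record contains it.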