
Understanding the DataSource API

The DataSource API was introduced in Apache Spark 1.1 and has been extended continuously since then. You have already used the DataSource API, perhaps without knowing it, whenever you read or write data using SparkSession or DataFrames.
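As a minimal sketch of that implicit usage (the file paths below are placeholders), a plain read or write call like the following already goes through the DataSource API under the hood:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("DataSourceApiExample")
  .getOrCreate()

// Reading JSON uses the built-in JSON data source behind the scenes
val df = spark.read.json("/path/to/input.json")   // placeholder path

// Writing Parquet uses the built-in Parquet data source
df.write.parquet("/path/to/output.parquet")       // placeholder path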

The DataSource API provides an extensible framework to read and write data to and from an abundance of different data sources in various formats. There is built-in support for Hive, Avro, JSON, JDBC, Parquet, and CSV, and numerous third-party plugins that support, for example, MongoDB, Cassandra, Apache CouchDB, Cloudant, or Redis.
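All of these formats are addressed through the same generic interface. The following sketch (reusing the spark session from above, with placeholder paths) shows how a built-in format is selected explicitly; third-party connectors plug in the same way by passing their format name, provided the corresponding package is on the classpath:

// Select the data source explicitly via format(); "csv" is built in
val csvDf = spark.read
  .format("csv")
  .option("header", "true")          // first line contains column names
  .load("/path/to/input.csv")        // placeholder path

// The write side uses the same mechanism
csvDf.write
  .format("parquet")
  .mode("overwrite")
  .save("/path/to/output.parquet")   // placeholder path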

Usually, you never use classes from the DataSource API directly, as they are wrapped behind the read method of SparkSession and the write method of a DataFrame or Dataset. Another thing that is hidden from the user is schema discovery.
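From the user's perspective, schema discovery looks roughly like the following sketch (paths are placeholders): Spark can infer a schema from the data itself, or you can supply one explicitly to skip the inference pass.

import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

// Schema inference: Spark samples the data to derive column names and types
val inferred = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/path/to/people.csv")        // placeholder path
inferred.printSchema()

// Explicit schema: no inference pass over the data is needed
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true)
))
val typed = spark.read
  .option("header", "true")
  .schema(schema)
  .csv("/path/to/people.csv")        // placeholder path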
