- Spark Cookbook
- Rishi Yadav
Introduction
Spark provides a unified runtime for big data. HDFS, Hadoop's filesystem, is the most widely used storage platform for Spark because it provides cost-effective storage for unstructured and semi-structured data on commodity hardware. Spark is not limited to HDFS, however, and can work with any Hadoop-supported storage.
Hadoop-supported storage means any storage format that works with Hadoop's InputFormat and OutputFormat interfaces. InputFormat is responsible for creating InputSplits from the input data and dividing each split further into records; OutputFormat is responsible for writing to storage.
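As a minimal sketch of the InputFormat side (written for the spark-shell, where sc is already defined; the HDFS path is a placeholder), any Hadoop InputFormat can be passed to newAPIHadoopFile:

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Load records through an explicit Hadoop InputFormat.
// TextInputFormat splits the input into (byte offset, line) pairs;
// the HDFS path below is a placeholder.
val records = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat](
  "hdfs://localhost:9000/user/hduser/words")

// Convert the Hadoop Writables into plain Strings before further processing.
val lines = records.map { case (_, text) => text.toString }
lines.take(5).foreach(println)
```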
We will start with writing to the local filesystem and then move on to loading data from HDFS. In the Loading data from HDFS recipe, we will cover the most common file format: regular text files. In the next recipe, we will cover how to use any InputFormat interface to load data in Spark. We will also explore loading data stored in Amazon S3, a leading cloud storage platform.
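As a rough preview of those recipes (again assuming the spark-shell's predefined sc; every path and the bucket name below are placeholders), sc.textFile accepts local, HDFS, and S3 URIs alike:

```scala
// Local filesystem: the file must be reachable from every worker node.
val localLines = sc.textFile("file:///home/hduser/words")

// HDFS: the namenode host and port are placeholders.
val hdfsLines = sc.textFile("hdfs://localhost:9000/user/hduser/words")

// Amazon S3: the bucket name is a placeholder; credentials are picked up
// from the Hadoop configuration or the environment (s3n on older Hadoop).
val s3Lines = sc.textFile("s3a://my-bucket/words")

// Each call returns an RDD[String] with one element per line.
println(hdfsLines.count())
```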
We will then explore loading data from Apache Cassandra, a NoSQL database. Finally, we will look at loading data from a relational database.
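For orientation, here is a loose sketch of those two remaining sources. It assumes the spark-shell's predefined sc, the DataStax spark-cassandra-connector on the classpath, and a MySQL JDBC driver; the keyspace, table, URL, credentials, and id bounds are all placeholders:

```scala
import java.sql.{DriverManager, ResultSet}
import org.apache.spark.rdd.JdbcRDD
import com.datastax.spark.connector._

// Cassandra: keyspace and table names are placeholders.
val persons = sc.cassandraTable("people", "person")
println(persons.count())

// Relational database via JdbcRDD: the query must contain two '?' markers,
// which JdbcRDD fills with the partition bounds (1 to 1000 over 3 partitions here).
val rows = new JdbcRDD(
  sc,
  () => DriverManager.getConnection("jdbc:mysql://localhost:3306/test", "user", "password"),
  "SELECT first_name FROM person WHERE person_id >= ? AND person_id <= ?",
  1, 1000, 3,
  (rs: ResultSet) => rs.getString("first_name"))
println(rows.count())
```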