- Frank Kane's Taming Big Data with Apache Spark and Python
- Frank Kane
What is Spark?
According to Apache, Spark is a fast and general engine for large-scale data processing. This is actually a really good summary of what it's all about. If you have a really massive dataset that can represent anything - weblogs, genomics data, you name it - Spark can slice and dice that data up. It can distribute the processing among a huge cluster of computers, taking a data analysis problem that's just too big to run on one machine and dividing and conquering it by splitting it up among multiple machines.