官术网_书友最值得收藏!

Big data patterns

It may seem confusing that lambda can refer to both AWS Lambda functions as well as a pattern in and of itself. The lambda pattern was born from the need to analyze large amounts of data in real time. Before this, the big data movement, where large batch jobs would run to calculate and recalculate things, was in full swing. The problem faced by this movement was that these batch jobs, in order to get the latest results, would need to spend the majority of their computing resources recalculating metrics on data that hadn't changed.

The lambda pattern, which we will discuss in Chapter 7, Data Processing Using the Lambda Pattern, creates two parallel planes of computation, a batch layer, and a speed layer. The naming of these layers should give you an idea of what they're responsible for.

MapReduce is another well-known and tested paradigm that has been popular in the software world for some time now. Hadoop, arguably the most famous framework for MapReduce, helped to bring this pattern front and center after Google's original MapReduce paper in 2004.

As amazing as Hadoop is as a software system, there are substantial hurdles to overcome in running a production Hadoop cluster of your own. Due to this, systems such as Amazon's Elastic MapReduce (EMR) were developed, which provide on-demand Hadoop jobs to the developer. Still, authoring Hadoop jobs and managing the underlying computing resources can be non-trivial. We'll walk through writing your serverless MapReduce system in Chapter 8, The MapReduce Pattern.

主站蜘蛛池模板: 遂川县| 汶上县| 兴海县| 堆龙德庆县| 双辽市| 宁强县| 兰西县| 神木县| 尉犁县| 衢州市| 江达县| 洞口县| 清水县| 蓬安县| 屏东县| 康马县| 老河口市| 托克逊县| 关岭| 浪卡子县| 建德市| 唐河县| 钦州市| 城市| 侯马市| 彰化县| 尉氏县| 齐齐哈尔市| 二连浩特市| 尖扎县| 广昌县| 卢龙县| 区。| 九龙县| 西丰县| 道孚县| 阿勒泰市| 巴东县| 克什克腾旗| 台北县| 缙云县|