  • Scala for Data Science
  • Pascal Bugnion
  • 189 words
  • 2021-07-23 14:33:07

Chapter 4. Parallel Collections and Futures

Data science often involves processing medium or large amounts of data. The once-exponential growth in the speed of individual CPUs has slowed down, and the amount of data continues to increase. Using computers effectively must therefore entail parallel computation.

In this chapter, we will look at ways of parallelizing computation and data processing on a single computer. Virtually all new computers have more than one processing unit, and distributing a calculation over these cores can be an effective way of speeding up medium-sized calculations.
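To make the idea of distributing a calculation over cores concrete, here is a minimal sketch. It splits a sum of squares into roughly one chunk per available processor, sums each chunk inside a `Future` from the Scala standard library, and then combines the partial results. The object name `ChunkedSum`, the helper `parallelSumOfSquares`, and the one-chunk-per-core heuristic are hypothetical choices made for illustration. They are not code from this chapter.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Hypothetical example object: spreads a sum of squares across the machine's cores.
object ChunkedSum {

  def parallelSumOfSquares(xs: Vector[Long]): Long = {
    // Number of processing units the JVM reports for this machine.
    val nCores = Runtime.getRuntime.availableProcessors()
    // Round the chunk size up so that we get at most nCores chunks (and never a size of 0).
    val chunkSize = math.max(1, (xs.size + nCores - 1) / nCores)
    // One Future per chunk. Each one runs on the global execution context's thread pool.
    val partials: List[Future[Long]] =
      xs.grouped(chunkSize).toList.map { chunk =>
        Future { chunk.map(x => x * x).sum }
      }
    // Combine the partial sums once every chunk has finished.
    val total: Future[Long] = Future.sequence(partials).map(_.sum)
    // Block until the result is available (acceptable in a small demo).
    Await.result(total, Duration.Inf)
  }

  def main(args: Array[String]): Unit = {
    val xs = Vector.tabulate(1000000)(_.toLong)
    println(parallelSumOfSquares(xs))
  }
}
```

Summing the same vector sequentially gives the same result. The parallel version only pays off when each chunk holds enough work to outweigh the cost of scheduling the futures.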

Parallelizing calculations on a single chip is suitable for calculations involving gigabytes, or a few terabytes, of data. For larger data flows, we must distribute the computation over several computers in parallel. We will discuss Apache Spark, a framework for parallel data processing, in Chapter 10, Distributed Batch Processing with Spark.

In this book, we will look at three common ways of leveraging parallel architectures in a single machine: parallel collections, futures, and actors. We will consider the first two in this chapter, and leave the study of actors to Chapter 9, Concurrency with Akka.
