
Introduction

In general, designing and creating a computer system is a balancing act. We're constantly trying to add features and capabilities while keeping the code simple and the system's performance reasonable. In this respect, data analysis systems are no different. In fact, they may be worse. Often data is only partially consistent, and we need to employ a variety of strategies to extract usable data before we can even begin its analysis. Each added strategy adds a little more complexity, weight, and bloat to the code, each of which makes it a little harder to maintain. This can get out of hand.

Clojure has a number of features and libraries that help us manage our systems' complexity. One of the most powerful of these is its support for concurrent programming, which lets us conceptualize our programs differently, in ways that help keep that complexity under control. Instead of writing monolithic blocks of code that do many things and have direct, tight dependencies, we can structure our programs more modularly by composing many independent modules, each of which does one thing. These modules communicate using simple, well-defined protocols, but they all work independently and concurrently (that is, at the same time).

Note

Much of Clojure's concurrency support is built around its Software Transactional Memory (STM) system, which is described at http://clojure.org/refs. This system takes the semantics of database transactions, which most developers are familiar with, and applies them to the computer's memory: the changes a transaction makes are committed all together or not at all, and other threads never see a transaction's partial results.
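As a minimal sketch of what such a transaction looks like, the following code creates two references with `ref` and moves an amount between them inside `dosync`. The names `checking`, `savings`, and `transfer!` are hypothetical, chosen only for illustration. Because both `alter` calls run in one transaction, the total across the two refs is preserved even when many threads run transfers at once.

```clojure
;; Two refs holding the (hypothetical) balances of two accounts.
(def checking (ref 100))
(def savings (ref 50))

(defn transfer!
  "Moves amount between two refs inside a single transaction.
  Other threads see either the state before it or the state after it,
  never the state in between."
  [from to amount]
  (dosync
    (alter from - amount)
    (alter to + amount)))

;; Run ten transfers of 5 concurrently, each on its own future.
(let [workers (doall (repeatedly 10 #(future (transfer! checking savings 5))))]
  (doseq [w workers] @w)) ; block until every future has finished

(println @checking @savings) ; prints 50 100

(shutdown-agents) ; lets a standalone script exit promptly; omit this at the REPL
```

If a transaction conflicts with another one, the STM retries it automatically, which is why the function passed to `alter` should be free of side effects.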

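Clojure also has a concurrent message-processing system, its agents, which works together with the STM (for example, any actions sent to an agent from inside a transaction are held until that transaction commits). An agent holds some state, and we send it functions as messages; the agent applies each function to its current state asynchronously, one action at a time. Together, the STM and agents give us a way to structure programs that keeps them maintainable and easy to understand.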
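Here is a minimal sketch of sending function messages to an agent. The agent name `processed` is hypothetical; imagine it counting records as they are handled. `send` returns immediately, and `await` blocks until every action sent so far from the current thread has been applied.

```clojure
;; An agent holding a (hypothetical) count of processed records.
(def processed (agent 0))

;; Each send queues a function; the agent applies it to its current
;; state on another thread, one action at a time.
(dotimes [_ 100]
  (send processed inc))

;; Wait until all of the actions sent above have run.
(await processed)
(println @processed) ; prints 100

(shutdown-agents) ; lets a standalone script exit promptly; omit this at the REPL
```

Because the agent serializes its own actions, none of the 100 increments is lost, even though the caller never takes a lock.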

Both of these work well because all native Clojure data structures are immutable. They cannot be changed. Because it's working with immutable data, the STM can provide guarantees about the consistency and safety of its transactions, even in a highly concurrent environment. These guarantees are good for us because they help us think and reason about our data and our program, and they help us manage the complexity of the systems we're building.
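The following short example illustrates what immutability means in practice; the names `readings`, `row`, and `updated` are hypothetical. Functions such as `conj` and `assoc` return new values and leave the originals untouched, so one thread can hand a value to another without worrying that it will change underneath it.

```clojure
;; A vector of (hypothetical) sample readings.
(def readings [3 1 4])

;; conj returns a new vector; the original is unchanged.
(def more-readings (conj readings 1))

(println readings)      ; prints [3 1 4]
(println more-readings) ; prints [3 1 4 1]

;; The same holds for maps: assoc returns a new map.
(def row {:city "Boston" :temp 12})
(def updated (assoc row :temp 15))

(println (:temp row) (:temp updated)) ; prints 12 15
```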

Note that concurrency describes how a program is structured. It may also result in some speedup, but that is not its main goal. Each thread may be doing something different; concurrency is often just a good way to organize a program. It separates and decouples the parts of a program that are engaged in different tasks but that still coordinate or interact with each other in some way. If you're doing the same thing over and over, where each item is independent of all the others, and you want to do it faster, that's parallelism. We'll look at recipes related to that in Chapter 4, Improving Performance with Parallel Programming.
