官术网_书友最值得收藏!

How it works...

In this recipe, we implement a Command-Line Interface (CLI) program that reads events from the data lake S3 bucket and sends them to a specific AWS Lambda function. When replaying events, we do not re-publish the events because this would broadcast the events to all subscribers. Instead, we want to replay events to a specific function to either repair the specific service or seed a new service.

When executing the program, we provide the name of the data lake bucket and the specific path prefix as arguments. The prefix allows us to replay only a portion of the events, such as a specific month, day, or hour. The program uses functional reactive programming with the Highland.js library. We use a generator function to page through the objects in the bucket and push each object down the stream. Backpressure is a major advantage of this programming approach, as we will discuss in Chapter 8, Designing for Failure. If we retrieved all the data from the bucket in a loop, as we would in the imperative programming style, then we would likely run out of memory and/or overwhelm the Lambda function and receive throttling errors.

Instead, we pull data through the stream. When downstream steps are ready for more work they pull the next piece of data. This triggers the generator function to paginate data from S3 when the program is ready for more data.

When storing events in the data lake bucket, Kinesis Firehose buffers the events until a maximum amount of time is reached or a maximum file size is reached. This buffering maximizes the write performance when saving the events. When transforming the data for these files, we delimited the events with an EOL character. Therefore, when we get a specific file, we leverage the Highland.js split function to stream each row in the file one at a time. The split function also supports backpressure.

For each event, we invoke the function specified in the command-line arguments. These functions are designed to listen for events from a Kinesis stream. Therefore, we must wrap each event in the Kinesis input format that these functions are expecting. This is one reason why we included the Kinesis metadata when saving the events to the data lake in the Creating a data lake recipe. To maximize throughput, we invoke the Lambda asynchronously with the Event InvocationType, provided that the payload size is within the limits. Otherwise, we invoke the Lambda synchronously with the RequestReponse InvocationType. We also leverage the Lambda DryRun feature so that we can see what events might be replayed before actually effecting the change.

主站蜘蛛池模板: 图们市| 云霄县| 石阡县| 花垣县| 南康市| 淮阳县| 邹平县| 安丘市| 尚志市| 沙田区| 海宁市| 凌云县| 新营市| 万全县| 湟源县| 三亚市| 汉川市| 南投县| 图们市| 芜湖市| 长垣县| 监利县| 进贤县| 徐闻县| 孝昌县| 颍上县| 青神县| 独山县| 福海县| 古浪县| 随州市| 句容市| 榆社县| 剑阁县| 泸水县| 卢氏县| 河津市| 四川省| 抚松县| 和政县| 牟定县|