
How it works...

The most important characteristic of a data lake is that it stores data in perpetuity. The only practical way to meet this requirement is to use object storage, such as AWS S3. S3 provides 11 nines of durability; in other words, it is designed for 99.999999999% durability of objects over a given year. It is also fully managed and provides lifecycle management features to age objects into cold storage. Note that the bucket is defined with DeletionPolicy set to Retain, so that even if the stack is deleted, this valuable data is not inadvertently deleted along with it.
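To put 11 nines in perspective, the arithmetic can be sketched as follows. This is a back-of-the-envelope calculation that treats the durability figure as an independent annual loss probability per object, which is a simplifying assumption, and the object count is a hypothetical example.

```python
from fractions import Fraction

# 11 nines of durability: 99.999999999% of objects survive a given year.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year, assuming independent losses."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    objects = 10_000_000  # hypothetical number of objects in the data lake
    print(float(expected_annual_losses(objects)))
    print(int(years_per_lost_object(objects)))
```

Under these assumptions, a data lake holding 10 million objects would expect to lose a single object roughly once every 10,000 years.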

We are using Kinesis Firehose because it performs the heavy lifting of writing the events to the bucket. It provides buffering based on time and size, as well as compression, encryption, and error handling. To simplify this recipe, I did not use compression or encryption, but it is recommended that you use these features.
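The idea of buffering on time and size can be illustrated with a toy model. This is only a sketch of the concept, not how Firehose is implemented: the class name, thresholds, and method names are hypothetical, and the sketch only checks the clock when a record arrives, whereas a real service would also flush on a timer.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BufferSketch:
    """Toy model of size- and time-based buffering (hypothetical, not the Firehose API)."""

    size_limit_bytes: int
    interval_seconds: float
    _records: List[bytes] = field(default_factory=list)
    _bytes: int = 0
    _opened_at: Optional[float] = None

    def add(self, record: bytes, now: float) -> Optional[List[bytes]]:
        """Buffer a record; return a batch when either threshold is reached."""
        if self._opened_at is None:
            self._opened_at = now  # the clock starts with the first buffered record
        self._records.append(record)
        self._bytes += len(record)

        size_hit = self._bytes >= self.size_limit_bytes
        time_hit = now - self._opened_at >= self.interval_seconds
        if size_hit or time_hit:
            return self.flush()
        return None

    def flush(self) -> List[bytes]:
        """Hand over the buffered records and reset the buffer."""
        batch, self._records = self._records, []
        self._bytes = 0
        self._opened_at = None
        return batch
```

Whichever threshold is reached first triggers the write, so a busy stream produces objects bounded by size while a quiet stream still delivers its events within the configured interval.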

This recipe defines one delivery stream because, in this cookbook, our stream topology consists of only one stream, referenced as ${cf:cncb-event-stream-${opt:stage}.streamArn}. In practice, your topology will consist of multiple streams, and you will define one Firehose delivery stream per Kinesis stream to ensure that the data lake is capturing all events. We set prefix to ${cf:cncb-event-stream-${opt:stage}.streamName}/ so that we can easily distinguish the events in the data lake by their stream.
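As a rough illustration of why the prefix helps, the sketch below groups object keys by their leading path segment. It assumes keys of the form <prefix>YYYY/MM/dd/HH/<object name>, that is, a time-based path appended after the configured prefix. The function names and the example stream names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def stream_of(key: str) -> str:
    """Return the leading path segment, which the prefix sets to the stream name."""
    return key.split("/", 1)[0]


def group_by_stream(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Group object keys by the stream that produced them."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        groups[stream_of(key)].append(key)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical keys for two streams in a multi-stream topology.
    keys = [
        "stream-a/2018/04/01/12/object-1",
        "stream-b/2018/04/01/12/object-2",
        "stream-a/2018/04/01/13/object-3",
    ]
    for stream, members in group_by_stream(keys).items():
        print(stream, len(members))
```

Because the stream name is the first segment of every key, listing or replaying the events of a single stream reduces to a prefix query on the bucket.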

Another important characteristic of a data lake is that the data is stored in its raw format, without modification. To this end, the transformer function leaves the event itself untouched and adorns it with all available metadata about the specific Kinesis stream and Firehose delivery stream, to ensure that all available information is collected. In the Replaying events recipe, we will see how this metadata can be leveraged. Also, note that transformer appends the end-of-line character (\n) to each record to facilitate future processing of the data.
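A minimal sketch of such a transformer is shown below, written in Python for illustration rather than as the recipe's own implementation. The input and response fields (records, recordId, data, result, kinesisRecordMetadata) follow the Firehose data-transformation contract. The envelope layout and its field names are assumptions made for this example.

```python
import base64
import json
from typing import Any, Dict


def transform(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    """Firehose data-transformation handler: wrap each event with its metadata."""
    output = []
    for record in event["records"]:
        # Firehose delivers each payload base64-encoded.
        payload = json.loads(base64.b64decode(record["data"]))

        # Envelope layout is illustrative; the raw event itself is left unmodified.
        envelope = {
            "event": payload,
            "kinesisRecordMetadata": record.get("kinesisRecordMetadata"),
            "firehoseRecordMetadata": {
                "deliveryStreamArn": event.get("deliveryStreamArn"),
                "region": event.get("region"),
                "invocationId": event.get("invocationId"),
                "recordId": record["recordId"],
                "approximateArrivalTimestamp": record.get("approximateArrivalTimestamp"),
            },
        }

        # Append "\n" so that each stored object holds one JSON document per line.
        line = json.dumps(envelope) + "\n"
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(line.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

Writing one JSON document per line means that downstream tools can split an object on newlines and parse each line independently, without needing to know where one event ends and the next begins.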
