Book title: Building Data Streaming Applications with Apache Kafka
Authors: Manish Kumar, Chanchal Singh
Chapter word count: 321
Last updated: 2022-07-12 10:38:11

Kafka origins

Most of you have probably used LinkedIn at some point in your professional career. Kafka was originally built by LinkedIn's engineering team. LinkedIn had built a software metrics collection system using custom in-house components, with some support from existing open source tools. The system collected user activity data on the portal, which LinkedIn used to show relevant information to each user. It was originally built as a traditional XML-based logging service whose output was later processed using various Extract Transform Load (ETL) tools. However, this arrangement did not hold up for long, and the team began running into various problems. To solve these problems, they built a system called Kafka.

LinkedIn built Kafka as a distributed, fault-tolerant, publish/subscribe system. It records messages organized into topics; applications can produce messages to topics and consume messages from them. All messages are stored as logs on persistent filesystems. Kafka is a write-ahead logging (WAL) system: it writes every published message to log files before making it available to consumer applications. Subscribers (consumers) can then read these stored messages whenever they need them. Kafka was built with the following goals in mind:

Loose coupling between message Producers and message Consumers
Persistence of message data to support a variety of data consumption scenarios and failure handling
Maximum end-to-end throughput with low latency components
Managing diverse data formats and types using binary data formats
Scaling servers linearly without affecting the existing cluster setup

While we will introduce Kafka in more detail in upcoming sections, you should understand that one of Kafka's common uses is in stream processing architectures. Its reliable message delivery semantics help it consume events at high rates. Moreover, it provides message replay capabilities along with support for different types of consumers.

These capabilities further help make streaming architectures fault-tolerant and support a variety of alerting and notification services.
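The log-based model described in this section, where producers append messages to a topic and each consumer reads at its own pace and can replay earlier messages, can be illustrated with a small in-memory toy. This is a hypothetical sketch, not Kafka's client API: `ToyLog`, `ToyConsumer`, `produce`, `poll`, and `seek` are illustrative names, and persistence to disk, partitions, and replication are deliberately omitted.

```python
from collections import defaultdict


class ToyLog:
    """Hypothetical in-memory stand-in for Kafka-style topics (not the Kafka API).

    Each topic is an append-only list; a message's index is its offset.
    A real broker would persist the log to disk; this toy keeps it in memory.
    """

    def __init__(self):
        self.topics = defaultdict(list)

    def produce(self, topic, message):
        # Append the message to the end of the topic's log.
        self.topics[topic].append(message)
        return len(self.topics[topic]) - 1  # offset of the new message


class ToyConsumer:
    """Hypothetical consumer that tracks its own read offset."""

    def __init__(self, log, topic):
        self.log = log
        self.topic = topic
        self.offset = 0  # position of the next message to read

    def poll(self, max_messages=10):
        # Read up to max_messages starting at this consumer's offset.
        start = self.offset
        records = self.log.topics[self.topic][start:start + max_messages]
        self.offset += len(records)
        return records

    def seek(self, offset):
        # Replay: move this consumer's position; the log itself is unchanged.
        self.offset = offset


log = ToyLog()
for event in ["login", "view_profile", "logout"]:
    log.produce("user-activity", event)

fast = ToyConsumer(log, "user-activity")
slow = ToyConsumer(log, "user-activity")
print(fast.poll())   # ['login', 'view_profile', 'logout']
print(slow.poll(1))  # ['login']
fast.seek(0)         # replay from the beginning
print(fast.poll(2))  # ['login', 'view_profile']
```

Because each consumer only holds an offset into a shared, unchanged log, consumers stay decoupled from producers and from one another, and replaying is simply a matter of moving an offset back.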
