
Kafka origins

Most of you have probably used the LinkedIn portal at some point in your professional career. Kafka was first built by the LinkedIn technical team. LinkedIn had constructed a software metrics collection system from custom in-house components, with some support from existing open source tools. The system collected user activity data on the portal, and LinkedIn used this data to show relevant information to each user. It was originally built as a traditional XML-based logging service whose output was later processed with different Extract Transform Load (ETL) tools. However, this arrangement did not hold up well over time, and the team started running into various problems. To solve these problems, they built a system called Kafka.

LinkedIn built Kafka as a distributed, fault-tolerant, publish/subscribe system. It records messages organized into topics, and applications can produce messages to topics or consume messages from them. All messages are stored as logs on persistent filesystems. Kafka is a write-ahead logging (WAL) system: it writes every published message to a log file before making it available to consumer applications. Subscribers/consumers can then read these written messages whenever they need them. Kafka was built with the following goals in mind:

  • Loose coupling between message Producers and message Consumers
  • Persistence of message data to support a variety of data consumption scenarios and failure handling
  • Maximum end-to-end throughput with low-latency components
  • Managing diverse data formats and types using binary data formats
  • Scaling servers linearly without affecting the existing cluster setup

While we will introduce Kafka in more detail in upcoming sections, you should understand that one of its common uses is in stream processing architectures. With its reliable message delivery semantics, it helps in consuming events that arrive at high rates. Moreover, it provides message replay capabilities along with support for different types of consumers.

These capabilities further help in making a streaming architecture fault-tolerant, and they support a variety of alerting and notification services.
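
To make the ideas of topics, persisted logs, independent consumers, and replay more concrete, here is a minimal in-memory sketch in Python. It is not Kafka's API and does not use any Kafka client library; the class name TinyLog, its methods, the topic name, and the consumer names are all hypothetical and exist only for illustration. It also keeps everything in memory, whereas Kafka writes its logs to disk.

```python
from collections import defaultdict


class TinyLog:
    """Toy, in-memory stand-in for a topic-based, append-only message log.

    Hypothetical illustration only: this is not Kafka and not its API.
    """

    def __init__(self):
        # topic name -> list of messages, kept in arrival order
        self._topics = defaultdict(list)
        # (consumer name, topic name) -> offset of the next message to read
        self._offsets = defaultdict(int)

    def produce(self, topic, message):
        """Append a message to the end of a topic and return its offset."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def consume(self, consumer, topic, max_messages=10):
        """Return the next unread messages for a consumer and advance its offset."""
        start = self._offsets[(consumer, topic)]
        batch = self._topics[topic][start:start + max_messages]
        self._offsets[(consumer, topic)] = start + len(batch)
        return batch

    def seek(self, consumer, topic, offset):
        """Move a consumer's read position, for example back to 0 to replay."""
        self._offsets[(consumer, topic)] = offset


log = TinyLog()

# The producer only knows the topic name, not who will read the messages.
log.produce("page-views", b"user=1 page=/home")
log.produce("page-views", b"user=2 page=/jobs")

# Reading does not remove messages from the log.
first = log.consume("alerts", "page-views")

# A second consumer has its own offset and reads at its own pace.
one_for_reports = log.consume("reports", "page-views", max_messages=1)

# Replay: rewind the first consumer and read the same messages again.
log.seek("alerts", "page-views", 0)
replayed = log.consume("alerts", "page-views")
```

Because messages stay in the log after they are read, each consumer only needs to track its own position. That is what lets the two consumers in this sketch progress independently and lets one of them rewind and reprocess earlier messages.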
