Stream Processing

Transmitting Event Streams

Messaging Systems

What happens if the producers send messages faster than the consumers can process them?

  • Drop messages
  • Buffer messages in a queue
  • Apply back pressure (flow control)

Message loss

Whether message loss is acceptable depends very much on the application. For example, with sensorreadings and metrics that are transmitted periodically, an occasional missing data point is perhapsnot important, since an updated value will be sent a short time later anyway

Traditional messaging systems

No message loss.

Traditional view of message brokers, which is encapsulated in standards like JMS and AMQP and implemented in softwarelike RabbitMQ, ActiveMQ, HornetQ, Qpid, TIBCO Enterprise Message Service, IBM MQ, Azure ServiceBus, and Google Cloud Pub/Sub.

In order toensure that the message is not lost, message brokers use acknowledgments: a client mustexplicitly tell the broker when it has finished processing a message so that the broker can removeit from the queue

Log-based messaging systems

tail -f like message queues

Apache Kafka, Amazon Kinesis Streams, and Twitter’s DistributedLog are log-based messagebrokers that work like this. Google Cloud Pub/Sub is architecturally similar but exposes aJMS-style API rather than a log abstraction.

Replicate changes made to the System of record (source of truth), consistently across cache, search indexes and more: Dual writes (write in SoR and then Search Index or anything else) has race condition

Streams analytics

Many open source distributed stream processing frameworks are designed with analytics in mind: forexample, Apache Storm, Spark Streaming, Flink, Concord, Samza, and Kafka Streams.Hosted services include Google Cloud Dataflow and Azure Stream Analytics.

Consuming messages

  • Load balancing: First consumer gets the message which then gets deleted
  • Fan out: All consumers get the message

Which clock to follow?

Which timestamp to trust: User/Sensor = Device, Server, Other? Use at least 3!

  • The time at which the event occurred, according to the device clock
  • The time at which the event was sent to the server, according to the device clock
  • The time at which the event was received by the server, according to the server clock

Summary

  • AMQP/JMS-style message broker (traditional)
    • The broker assigns individual messages to consumers, and consumers acknowledge individual messages when they have been successfully processed. Messages are deleted from the broker once they have been acknowledged.
  • Log-based message broker
    • The broker assigns all messages in a partition to the same consumer node, and always delivers messages in the same order. Parallelism is achieved through partitioning, and consumers track their progress by checkpointing the offset of the last message they have processed. The broker retains messages on disk, so it is possible to jump back and reread old messages if necessary.

results matching ""

    No results matching ""