Uber Cadence

Uber Cadence is a fault-tolerant, stateful workflow orchestrator. Workflows provide primitives that allow application developers to express complex business logic as code, while the underlying platform abstracts scalability, reliability, and availability concerns away from individual developers and teams. Cadence is open source, developed by Uber, and written in Go. You define workflows in code, and Cadence makes sure the whole workflow executes to completion no matter what!

Cadence is a distributed, scalable, durable, and highly available orchestration engine developed at Uber Engineering to execute asynchronous, long-running business logic in a scalable and resilient way. The unit of work that you execute with a Cadence client is the workflow. This is essentially a Go function that describes the main flow, precedence, branching, or iteration of actions. However, there are some rules you have to follow to get correct and reliable behavior. A workflow encapsulates the orchestration of activities and child workflows.

The workflow is the implementation of the coordination logic. The Cadence programming framework (aka client library) allows you to write this coordination logic as simple procedural code using standard Go data modeling. The client library takes care of communication between the worker service and the Cadence service, and ensures state persistence between events even in the case of worker failures. An activity is the implementation of a particular task in the business logic. Activities are implemented as functions, and data can be passed directly to an activity via function parameters.
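A minimal sketch of what this looks like with the Go client (go.uber.org/cadence, linked below). The `ComposeGreeting` activity and `GreetingWorkflow` names are illustrative, not from any official sample; the code will not compile without the cadence-client library on the module path.

```go
package main

import (
	"context"
	"time"

	"go.uber.org/cadence/workflow"
)

// ComposeGreeting is an activity: an ordinary function implementing one
// task of the business logic. Data is passed in via function parameters.
func ComposeGreeting(ctx context.Context, name string) (string, error) {
	return "Hello, " + name + "!", nil
}

// GreetingWorkflow is the coordination logic. It runs inside the Cadence
// framework, takes a workflow.Context, and must stay deterministic.
func GreetingWorkflow(ctx workflow.Context, name string) (string, error) {
	ao := workflow.ActivityOptions{
		ScheduleToStartTimeout: time.Minute,
		StartToCloseTimeout:    time.Minute,
	}
	ctx = workflow.WithActivityOptions(ctx, ao)

	var greeting string
	// The client library dispatches the activity to a worker and records
	// the result in the workflow history, so state survives failures.
	err := workflow.ExecuteActivity(ctx, ComposeGreeting, name).Get(ctx, &greeting)
	if err != nil {
		return "", err
	}
	return greeting, nil
}
```

Both functions must also be registered with a worker before they can run; see the cadence-samples repository linked in the references for a complete setup.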

What makes Cadence different? 

What makes Cadence more than a sophisticated distributed task queue manager? You can define workflows in code and let Cadence handle state, timeouts, history and all the other little necessities. Then, those workflows can be inspected after submission, so you can check their progress and results, or send them external signals as needed.

Cadence's fault-oblivious stateful programming model preserves the complete state of a multithreaded application, including thread stacks with local variables, across hardware and software failures. This greatly simplifies the coding of complex stateful distributed applications. At the same time, it is scalable and robust enough to power hundreds of critical use cases.

Microservice Orchestration and Saga:

The Saga Pattern is a microservices architectural pattern to implement a transaction that spans multiple services. A saga is a sequence of local transactions. Each service in a saga performs its own transaction and publishes an event. The other services listen to that event and perform the next local transaction.

It is common for business processes to be implemented as multiple microservice calls, and the implementation must guarantee that all of the calls eventually succeed, even in the face of prolonged downstream service failures. In some cases, instead of trying to complete the process by retrying for a long time, compensation (rollback) logic should be executed. The Saga pattern is one way to standardize compensation APIs.
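The core of the pattern can be sketched in plain Go, independent of any framework (the step names here are made up for illustration): each local transaction registers its compensation as it succeeds, and on failure the compensations run in reverse order.

```go
package main

import (
	"errors"
	"fmt"
)

// Saga collects compensation functions as each local transaction succeeds.
type Saga struct {
	compensations []func()
}

// AddCompensation registers the rollback for a completed step.
func (s *Saga) AddCompensation(f func()) {
	s.compensations = append(s.compensations, f)
}

// Compensate runs the registered rollbacks in reverse (LIFO) order.
func (s *Saga) Compensate() {
	for i := len(s.compensations) - 1; i >= 0; i-- {
		s.compensations[i]()
	}
}

// PlaceOrder runs two local transactions; if the second fails, the
// compensation for the first is executed. It returns the event log.
func PlaceOrder(paymentFails bool) []string {
	var log []string
	var saga Saga

	// Step 1: reserve inventory (succeeds); register its rollback.
	log = append(log, "reserve inventory")
	saga.AddCompensation(func() { log = append(log, "release inventory") })

	// Step 2: charge payment; on failure, roll back everything so far.
	if paymentFails {
		if err := errors.New("payment declined"); err != nil {
			saga.Compensate()
		}
		return log
	}
	log = append(log, "charge payment")
	return log
}

func main() {
	fmt.Println(PlaceOrder(true))
}
```

Cadence's Java client ships a ready-made `Saga` helper on this shape (linked below); with Cadence the compensations additionally survive process crashes because they are recorded in the workflow history.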

Cadence is a perfect fit for such scenarios. It guarantees that workflow code eventually completes, has built-in support for unlimited exponential activity retries, and simplifies coding of the compensation logic. It also gives full visibility into the state of each workflow, in contrast to an orchestration based on queues, where getting the current status of each individual request is practically impossible.


ref:

Uber Cadence - https://cadenceworkflow.io/

Cadence github - https://github.com/uber/cadence

Cadence Get started - https://cadenceworkflow.io/docs/get-started/

Cadence usecases - https://cadenceworkflow.io/docs/use-cases/

Cadence Go Client - https://github.com/uber-go/cadence-client

Cadence Go client samples - https://github.com/uber-common/cadence-samples

Introduction to Cadence - https://banzaicloud.com/blog/introduction-to-cadence/  

Uber Cadence overview - https://eng.uber.com/open-source-orchestration-tool-cadence-overview/

Cadence workflow orchestrator - https://blog.usejournal.com/cadence-the-only-workflow-orchestrator-you-will-ever-need-ea8f74ed5563

Building your first Cadence Workflow - https://medium.com/stashaway-engineering/building-your-first-cadence-workflow-e61a0b29785

Saga pattern - https://microservices.io/patterns/data/saga.html 

Cadence implementation of Saga pattern - https://github.com/uber/cadence-java-client/blob/master/src/main/java/com/uber/cadence/workflow/Saga.java

Misc -

    1. Uber Cadence workflow orchestrator engine - https://uber-cadence.blogspot.com/2019/12/how-do-i-set-up-uber-cadence-workflow.html

    2. Building Reliable Workflows: Cadence as a Fallback for Event-Driven Processing - https://doordash.engineering/2020/08/14/workflows-cadence-event-driven-processing/

    3. Uber Cadence & Kubernetes - https://hub.kubeapps.com/charts/banzaicloud-stable/cadence

Temporal workflow - https://temporal.io/

Temporal github - https://github.com/temporalio, https://github.com/temporalio/temporal

Cadence and Temporal Workflow Engines - https://github.com/firdaus/awesome-cadence-temporal-workflow

Temporal documentation - https://docs.temporal.io/docs/get-started/

Apache Kafka

Apache Kafka is an open-source distributed event streaming platform: a unified, high-throughput, low-latency platform for handling real-time data feeds (high-performance data pipelines), streaming analytics, data integration, and mission-critical applications. It is developed by the Apache Software Foundation and written in Scala and Java.

Apache Kafka was built with the vision of becoming the central nervous system that makes real-time data available to all the applications that need it, with use cases ranging from stock trading and fraud detection to transportation, data integration, and real-time analytics. It is a distributed streaming platform with plenty to offer, from redundant storage of massive data volumes to a message bus capable of throughput reaching millions of messages per second. These capabilities make Kafka a solution tailor-made for processing streaming data from real-time applications.

Kafka is essentially a commit log with a very simple data structure; it just happens to be an exceptionally fault-tolerant and horizontally scalable one. The commit log provides a persistent, ordered data structure: records cannot be deleted or modified in place, only appended to the log. The Kafka cluster creates and updates a partitioned commit log for each topic, and all messages sent to the same partition are stored in the order they arrive. Because of this, the sequence of records within the commit log is ordered and immutable. Kafka also assigns each record a sequential ID known as an "offset," unique within its partition, which is used to retrieve data.
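A toy model of a single partition's commit log, assuming nothing beyond the description above: records are only appended, each gets the next sequential offset, and reads retrieve by offset.

```go
package main

import "fmt"

// Record is a simplified Kafka record: a key/value pair. (Real records
// also carry metadata such as a timestamp.)
type Record struct {
	Key, Value string
}

// Partition is an append-only ordered log. Records are never modified
// or deleted in place; an offset is a record's position in the log.
type Partition struct {
	log []Record
}

// Append adds a record to the end of the log and returns its offset.
func (p *Partition) Append(r Record) int64 {
	p.log = append(p.log, r)
	return int64(len(p.log) - 1)
}

// Read retrieves the record stored at a given offset, if it exists.
func (p *Partition) Read(offset int64) (Record, bool) {
	if offset < 0 || offset >= int64(len(p.log)) {
		return Record{}, false
	}
	return p.log[offset], true
}

func main() {
	var p Partition
	p.Append(Record{Key: "user-1", Value: "login"})  // offset 0
	off := p.Append(Record{Key: "user-1", Value: "logout"}) // offset 1
	r, _ := p.Read(off)
	fmt.Println(off, r.Value)
}
```

The real log is of course segmented on disk, replicated across brokers, and subject to retention policies, but the read/append contract is the same.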



Kafka Terminology:

Kafka uses its own terminology for its basic building blocks and key concepts, and the usage of these terms may differ from other technologies. The following list defines the most important concepts:

Broker
A broker is a server that stores messages sent to the topics and serves consumer requests.

Topic
A topic is a queue of messages written by one or more producers and read by one or more consumers.

Producer
A producer is an external process that sends records to a Kafka topic.

Consumer
A consumer is an external process that reads records from topics in a Kafka cluster.

Client
Client is a term used to refer to either producers or consumers.

Record
A record is a publish-subscribe message. A record consists of a key/value pair and metadata including a timestamp.

Partition
Kafka divides records into partitions. Partitions can be thought of as a subset of all the records for a topic.
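How a record is assigned to a partition is not covered in the list above. A common scheme (and what Kafka's default partitioner does for keyed records) is to hash the key, so that all records with the same key land in the same partition and therefore keep their relative order. A minimal sketch, using stdlib FNV hashing rather than the murmur2 hash Kafka actually uses:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor picks a partition for a record key by hashing it. All
// records with the same key map to the same partition, which is what
// gives per-key ordering. (FNV is used here only to keep the sketch
// self-contained; Kafka's default partitioner uses murmur2.)
func partitionFor(key string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32()) % numPartitions
}

func main() {
	// Repeated keys always hash to the same partition.
	for _, key := range []string{"user-1", "user-2", "user-1"} {
		fmt.Printf("%s -> partition %d\n", key, partitionFor(key, 3))
	}
}
```

One consequence worth noting: changing the number of partitions changes the key-to-partition mapping, which is why per-key ordering guarantees only hold while the partition count stays fixed.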

ref:

Apache Kafka - https://kafka.apache.org/

Kafka github - https://github.com/apache/kafka 

Confluent(Real-time streams powered by Apache Kafka) github - https://github.com/confluentinc

zeromq vs kafka - https://www.educba.com/zeromq-vs-kafka/

Kafka architecture -

    1. https://www.instaclustr.com/apache-kafka-architecture/

    2. https://data-flair.training/blogs/kafka-architecture/

    3. https://dzone.com/articles/kafka-architecture

    4. https://docs.cloudera.com/runtime/7.2.7/kafka-overview/topics/kafka-overview-architecture.html

Fundamentals of Apache Kafka - https://www.confluent.io/online-talks/fundamentals-for-apache-kafka

How Kafka works - https://www.confluent.io/online-talks/how-apache-kafka-works-on-demand/

Confluent Platform Reference Architecture for Kubernetes - https://www.confluent.io/resources/confluent-platform-reference-architecture-kubernetes/

How to download Kafka code - https://kafka.apache.org/code 

Kafka go clients -   

    1. Uber Kafka client - https://github.com/uber-go/kafka-client 

    2. Confluent Kafka client - https://github.com/confluentinc/confluent-kafka-go

    3. Segmentio Kafka client - https://github.com/segmentio/kafka-go

Kafka Python Client - https://github.com/dpkp/kafka-python

Kafka examples - 

    1. kafka examples - https://github.com/confluentinc/examples

    2. kafka stream examples - https://github.com/confluentinc/kafka-streams-examples