Unlocking Performance With Batches
Infinitic 0.16.2's new batch processing feature can boost throughput by up to 10x in high-volume scenarios, making it ideal for high-scale systems.
Infinitic 0.16.2 introduces batch processing, a feature that optimizes performance by consuming, processing, and acknowledging messages in groups rather than one at a time.
By reducing the frequency of network calls and database operations, batching enhances the performance of both Infinitic's internal processes (e.g., database updates) and user-defined tasks (e.g., bulk API calls).
How Batch Processing Works in Infinitic
Message Consumption
Infinitic components can consume multiple messages in a single batch from the message broker. This reduces the number of network calls required to retrieve messages.
For instance, instead of making 1,000 individual network calls to consume 1,000 messages, Infinitic retrieves them all in one call, minimizing latency and network overhead, particularly in high-volume environments.
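This pattern is easy to picture in code. The following Kotlin sketch is purely illustrative (Infinitic's consumers are internal and rely on the broker's own client library); it simply contrasts one network round-trip per message with a single round-trip per batch:

```kotlin
// Hypothetical broker client, for illustration only.
interface BrokerClient {
    fun receive(): String                     // one network round-trip per message
    fun receiveBatch(max: Int): List<String>  // one round-trip for up to `max` messages
}

// In-memory stand-in so the sketch compiles and runs.
class FakeBroker(messages: List<String>) : BrokerClient {
    private val queue = ArrayDeque(messages)
    override fun receive(): String = queue.removeFirst()
    override fun receiveBatch(max: Int): List<String> {
        val batch = mutableListOf<String>()
        while (batch.size < max && queue.isNotEmpty()) batch += queue.removeFirst()
        return batch
    }
}

fun main() {
    val broker = FakeBroker(List(1_000) { "msg-$it" })
    val batch = broker.receiveBatch(1_000)  // one call instead of 1,000
    println("Consumed ${batch.size} messages in a single call")
}
```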
Message Processing
Once consumed, messages are processed in batches, optimizing performance across two areas:
Internal Components: Infinitic's internal engines, such as the Workflow State Engine and Service Tag Engine, consolidate multiple database operations into a single transaction.
User-defined Tasks: Tasks are grouped and processed together, enabling efficient bulk operations such as executing a batch of API calls (see the sketch below).
Batching also applies to message sending, reducing the number of network calls required for publishing messages. For example, instead of sending 1,000 messages one at a time, a single network call can send them all.
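To make the user-defined side concrete, here is a minimal sketch of the kind of bulk operation batched tasks enable. The EmailRequest type and EmailClient interface are hypothetical (see the official documentation for how Infinitic actually exposes batched task methods); the point is that a batch of inputs can be served by one bulk call instead of many unit calls:

```kotlin
// Hypothetical domain type and third-party client, for illustration only.
data class EmailRequest(val to: String, val body: String)

interface EmailClient {
    fun send(request: EmailRequest)             // one HTTP call per email
    fun sendBulk(requests: List<EmailRequest>)  // one HTTP call for the whole batch
}

// Unbatched: one network call per task.
fun processOneByOne(client: EmailClient, requests: List<EmailRequest>) =
    requests.forEach(client::send)

// Batched: the whole group is handed to the bulk endpoint in a single call.
fun processAsBatch(client: EmailClient, requests: List<EmailRequest>) =
    client.sendBulk(requests)
```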
Message Acknowledgment
Similarly, message acknowledgments (or negative acknowledgments) are batched. Rather than sending an acknowledgment for each message, Infinitic groups and sends acknowledgments collectively, further reducing network traffic.
For example, successfully processing 1,000 messages requires just one batch acknowledgment instead of 1,000 individual ones.
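The idea can be sketched with a hypothetical acknowledgment API (Infinitic's real one lives inside its transport layer): outcomes are partitioned once, then each group is confirmed with a single call:

```kotlin
// Hypothetical acknowledgment API, for illustration only.
interface Acknowledger {
    fun ackBatch(messageIds: List<Long>)   // one call for all successes
    fun nackBatch(messageIds: List<Long>)  // one call for all failures
}

// Outcome of processing a single message.
sealed interface Outcome { val messageId: Long }
data class Success(override val messageId: Long) : Outcome
data class Failure(override val messageId: Long, val reason: String) : Outcome

fun acknowledgeAll(acknowledger: Acknowledger, outcomes: List<Outcome>) {
    val (successes, failures) = outcomes.partition { it is Success }
    if (successes.isNotEmpty()) acknowledger.ackBatch(successes.map { it.messageId })
    if (failures.isNotEmpty()) acknowledger.nackBatch(failures.map { it.messageId })
}
```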
Executors: Comparing Processing Modes
Without batching
Messages are processed individually and in parallel based on the concurrency setting. Each message goes through three sequential phases:
Deserialization - The message is deserialized from its transport format
Processing - The actual logic is executed
Acknowledgment - The message broker is notified of successful processing
These phases happen independently for each message, as shown in the diagram below:
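In code, the unbatched pipeline looks roughly like this sketch (all types and callbacks are illustrative, not Infinitic internals): each message walks through the three phases on its own, with a fixed-size pool providing the concurrency:

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Illustrative sketch of the unbatched executor loop.
fun runWithoutBatching(
    rawMessages: List<ByteArray>,
    concurrency: Int,
    deserialize: (ByteArray) -> String,
    process: (String) -> Unit,
    acknowledge: (String) -> Unit,
) {
    val pool = Executors.newFixedThreadPool(concurrency)
    rawMessages.forEach { raw ->
        pool.submit {
            val message = deserialize(raw)  // phase 1: deserialization
            process(message)                // phase 2: processing
            acknowledge(message)            // phase 3: one ack per message
        }
    }
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.MINUTES)
}
```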
With batching
Messages are processed in batches and in parallel. Incoming messages are collected into batches based on the batch.maxMessages setting (e.g., 1,000 messages per batch). These batches are then processed in parallel according to the concurrency setting (e.g., 10 concurrent batches).
For example, with concurrency = 10 and batch.maxMessages = 1000, the system can hold up to 10,000 messages in memory and process them simultaneously.
For Service Executors specifically, messages within each batch can be further regrouped using a custom batchKey. This allows related messages to be processed together efficiently, as shown in the diagram below:
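Conceptually, the regrouping is a groupBy over the consumed batch. In this illustrative sketch, the TaskMessage type and the default bucket are assumptions rather than Infinitic's actual types:

```kotlin
// Illustrative message type; batchKey is optional metadata set at dispatch time.
data class TaskMessage(val taskId: String, val batchKey: String?)

// Messages sharing a batchKey are grouped for joint processing; messages
// without one fall into a single default bucket.
fun regroupByBatchKey(batch: List<TaskMessage>): Map<String, List<TaskMessage>> =
    batch.groupBy { it.batchKey ?: "default" }

fun main() {
    val batch = listOf(
        TaskMessage("t1", batchKey = "eu-invoices"),
        TaskMessage("t2", batchKey = "us-invoices"),
        TaskMessage("t3", batchKey = "eu-invoices"),
    )
    val grouped = regroupByBatchKey(batch).mapValues { (_, msgs) -> msgs.map { it.taskId } }
    println(grouped)  // {eu-invoices=[t1, t3], us-invoices=[t2]}
}
```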
Workflow State and Tag Engines: Special Considerations
The Workflow State and Tag Engines require special handling to maintain data consistency:
The Workflow State Engine must process messages for a given workflow instance sequentially to maintain workflow state integrity. For example, if workflow A has two pending messages, they must be processed one after another, not simultaneously.
Similarly, the Workflow Tag Engine must process messages for a given workflow tag sequentially to avoid race conditions.
Without batching
Messages are first sharded by their key (workflow ID for the State Engine, workflow tag for the Tag Engine) to ensure sequential processing within each key. Messages are then processed individually and in parallel based on the concurrency setting, but messages with the same key are never processed simultaneously. This is illustrated in the diagram below:
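In code, this sharding idea can be sketched as follows (illustrative, not Infinitic's engine code): routing every message with a given key to the same single-threaded executor guarantees that two messages for one workflow never run at the same time, while different keys proceed in parallel:

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Illustrative message keyed by workflow ID (State Engine) or workflow tag (Tag Engine).
data class KeyedMessage(val key: String, val payload: String)

fun runShardedWithoutBatching(
    messages: List<KeyedMessage>,
    concurrency: Int,
    handle: (KeyedMessage) -> Unit,
) {
    // One single-threaded executor per shard preserves ordering within each key.
    val shards = List(concurrency) { Executors.newSingleThreadExecutor() }
    messages.forEach { message ->
        val shard = Math.floorMod(message.key.hashCode(), concurrency)
        shards[shard].submit { handle(message) }  // still one database read/write per message
    }
    shards.forEach { it.shutdown() }
    shards.forEach { it.awaitTermination(1, TimeUnit.MINUTES) }
}
```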
With batching
Messages are first collected into batches based on the batch.maxMessages setting (e.g., 1,000 messages per batch). These batches are then sharded by their key (workflow ID for the State Engine, workflow tag for the Tag Engine) to group related messages together. The resulting key-based batches are processed in parallel according to the concurrency setting, while sequential processing is preserved within each key.
This approach optimizes throughput while maintaining data consistency by:
Processing multiple independent keys in parallel
Batching related messages with the same key together
Preserving sequential processing order within each key
Minimizing database transactions through batch processing (only one read and one write per batch, as sketched below)
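Putting those four points together, the batched engine flow can be sketched as follows. Everything here is illustrative (the KeyedMessage type, the state held as a plain string, and the use of parallelStream are assumptions, not Infinitic's implementation):

```kotlin
// Illustrative engine message and state handling.
data class KeyedMessage(val workflowId: String, val payload: String)

fun processEngineBatch(
    batch: List<KeyedMessage>,
    readState: (String) -> String,
    writeState: (String, String) -> Unit,
) {
    batch
        .groupBy { it.workflowId }  // shard the consumed batch by key
        .entries
        .parallelStream()           // independent keys processed in parallel
        .forEach { entry ->
            val workflowId = entry.key
            var state = readState(workflowId)            // one read per key batch
            entry.value.forEach { state += it.payload }  // sequential within the key
            writeState(workflowId, state)                // one write per key batch
        }
}
```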
Real-World Performance Gains
While Infinitic hasn’t yet undergone third-party benchmarking, internal testing highlights its potential:
Local Testing: Up to 10x throughput improvement with batching enabled.
Scale Benefits: More pronounced at larger scales, especially under high-concurrency workloads.
In one test, Infinitic processed 100,000 simple workflows (2 tasks each) in just 1 minute (roughly 1,700 workflows per second) with only 2 workers on a single MacBook.
Get Started with Infinitic 0.16.2
The addition of batch processing underscores Infinitic’s commitment to delivering scalable, high-performance solutions. Whether you’re managing high-throughput systems or orchestrating complex workflows, this feature is a game-changer for efficiency.
Explore the official documentation to start integrating batch processing into your applications today!