Real-Time Analytics Architecture: When to Use Streaming and When Batch Is Still the Right Answer
Key takeaways
- The primary criterion for choosing streaming is the cost of data staleness, not how important the data feels.
- Micro-batch covers a large share of real-time requirements at significantly lower operational complexity.
- Streaming jobs run continuously and accumulate cost around the clock, even during low-volume periods, unlike batch jobs that release resources once complete.
- Migration from batch to streaming works best as a progressive, reversible shift, with both pipelines running in parallel before any cutover.
Every week, a team somewhere decides to rebuild a working batch pipeline as a streaming system. The trigger is usually a stakeholder request for real-time data. The definition of real-time and its actual cost rarely get discussed before the rebuild starts. Six months later, the team is managing Kafka offsets and watermark configurations for a dashboard that gets reviewed once each morning.
Streaming is a legitimate tool for specific problems, not a maturity milestone that signals a more sophisticated team. Picking batch when batch meets the requirement is sound engineering. Confusing the two has real consequences: higher infrastructure spend, more complex on-call rotations, and failures that are harder to debug.
The starting question is not which architecture sounds more advanced. It is how fresh the data needs to be, and what staleness costs when the data falls short of that.
What real-time covers, and what it usually does not require
The phrase "real-time" covers a range wide enough to be nearly meaningless without qualification. At one end are sub-second requirements: fraud checks on a card transaction, dynamic pricing engines, risk systems that must react before an action completes. Stale data in these cases has a direct and measurable cost. In the middle sit second-to-minute windows, which serve live operations dashboards and inventory screens that a small team actively monitors. At the other end, hours-level freshness covers most business intelligence and reporting work, where hourly or even daily updates are entirely adequate.
The useful question to ask early in any design conversation is simple: how does a decision change if the data is five minutes old instead of one hour old? Often the honest answer is that it does not change. The dashboard is reviewed once in the morning, the decision it informs plays out over days, and slightly fresher numbers would not have shifted the outcome. In those cases, the desire is for the dashboard to feel responsive rather than for the business to genuinely react faster.
The cost of staleness is the main criterion. If old data causes missed fraud, a wrong price, or a bad inventory call, latency matters and the operational overhead of streaming is justified. If it does not, the latency advantage of a real streaming system is mostly a cost line with no corresponding benefit.
How much ground micro-batch covers
A large share of requirements labelled real-time are fully satisfied by running batch more frequently. The same SQL transforms, the same orchestration, and the same test suite keep working. Engines like Spark Structured Streaming can operate in a triggered micro-batch mode, making a few-minute refresh available without rebuilding a stack around continuous processing.
For a person looking at a BI dashboard, a one-minute refresh is difficult to distinguish from a live stream. Some teams move a scheduled batch job to a continuously running micro-batch model not to achieve per-record latency, but to remove scheduling gaps and make the pipeline more predictable. This near-real-time processing comes at significantly lower operational complexity than full streaming. The capability is commonly underestimated in architecture conversations, partly because micro-batch sounds less compelling than streaming, and partly because the people pushing for real-time rarely sit with the team that will run it at 3 a.m.
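The micro-batch idea can be sketched in a few lines. This is a minimal illustration, not any engine's API: an in-memory list stands in for the source table, and `MicroBatchState`, `transform`, `run_micro_batch`, and the `high_water_mark` field are hypothetical names. Each run applies the same transform a nightly job would use, but only to rows that arrived since the previous run.

```python
from dataclasses import dataclass, field


@dataclass
class MicroBatchState:
    # Highest source row id already processed (hypothetical bookkeeping).
    high_water_mark: int = 0
    # Running revenue per region, updated by every micro-batch.
    totals: dict = field(default_factory=dict)


def transform(rows):
    """The same aggregation a nightly batch job would run: revenue per region."""
    out = {}
    for row in rows:
        out[row["region"]] = out.get(row["region"], 0) + row["amount"]
    return out


def run_micro_batch(source, state):
    """Process only rows newer than the high-water mark, then advance it.

    Returns the number of rows processed in this run.
    """
    new_rows = [r for r in source if r["id"] > state.high_water_mark]
    if not new_rows:
        return 0
    for region, amount in transform(new_rows).items():
        state.totals[region] = state.totals.get(region, 0) + amount
    state.high_water_mark = max(r["id"] for r in new_rows)
    return len(new_rows)
```

In this sketch an existing scheduler would call `run_micro_batch` every minute or so. The transform itself is unchanged from the batch version, which is the point: the tests and SQL-equivalent logic carry over.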
What a production Kafka and Flink stack costs to run
A production streaming setup has two layers. The transport layer, most often Kafka, is a durable, append-only log that moves and stores events. The processing layer, whether Flink, Kafka Streams, or Spark Structured Streaming, filters, transforms, and aggregates those events.
The element that gets overlooked in architecture discussions is the always-on nature of streaming jobs. A batch job runs, completes, and releases its resources. A streaming job runs indefinitely, consuming resources around the clock even during low-volume periods. Any system that never stops has more ways to fail than one that starts, does its work, and halts.
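The always-on difference can be made concrete with simple arithmetic. The sketch below counts worker-hours per day for each model. The function names and every number in the usage note are illustrative assumptions, not measurements from any real system.

```python
def batch_worker_hours_per_day(runs_per_day, minutes_per_run, workers):
    """Worker-hours a batch job holds per day: it only pays while a run is active."""
    return runs_per_day * minutes_per_run * workers / 60


def streaming_worker_hours_per_day(workers):
    """Worker-hours a streaming job holds per day: it never releases its workers."""
    return 24 * workers
```

With made-up inputs, an hourly batch job that runs for 10 minutes on 4 workers holds `batch_worker_hours_per_day(24, 10, 4)` = 16 worker-hours a day, while a streaming job on only 2 workers holds `streaming_worker_hours_per_day(2)` = 48. The smaller cluster still costs more because it is never idle.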
Consumer lag is the metric that surfaces streaming failures in practice. It measures how far the processing consumer is behind the data source, usually as the number of records between the latest offset written to a partition and the last offset the consumer has committed. Streaming failures tend to be quiet. A batch job that fails does not complete, and the absence of results is visible. A streaming job can degrade silently, with lag growing steadily while nobody notices. At that point, the supposedly real-time pipeline has effectively become a batch system, unintentionally and without warning.
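A minimal sketch of lag monitoring, assuming the end offsets and committed offsets have already been fetched from the broker by some other means. The function names are hypothetical, and treating a partition with no committed offset as offset 0 is an assumption of this sketch.

```python
def partition_lag(end_offsets, committed_offsets):
    """Lag per partition: log end offset minus the group's committed offset.

    Assumption: a partition with no committed offset is treated as offset 0.
    """
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


def total_lag(end_offsets, committed_offsets):
    """Sum of lag across all partitions."""
    return sum(partition_lag(end_offsets, committed_offsets).values())


def lag_is_growing(samples, min_samples=3):
    """True when total lag rose between every pair of the last `min_samples` readings.

    A single spike is normal; a sustained rise is the quiet failure worth paging on.
    """
    if len(samples) < min_samples:
        return False
    recent = samples[-min_samples:]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

Alerting on the trend rather than on a fixed threshold is a design choice here: a pipeline that is slowly falling behind triggers the check before the lag becomes hours deep.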
The technical complexity of correct streaming
Speed is rarely the difficult part of streaming. Correctness over data that never stops arriving is the harder problem. The core issue is the gap between event time, when something happened, and processing time, when the system saw it. An event from 12:00:01 can reach the processor after an event from 12:00:03 because of network delays. If aggregations are keyed on processing time, the results quietly diverge from reality.
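The divergence is easy to reproduce. In this sketch, times are plain integers (seconds since an arbitrary start) and the three events are made up for illustration. One event happens at second 59 but is only seen at second 63, after the one-minute window boundary.

```python
from collections import Counter

# Hypothetical events: the second one is delayed in transit across the
# minute boundary at t=60.
EVENTS = [
    {"event_time": 58, "processing_time": 59},
    {"event_time": 59, "processing_time": 63},
    {"event_time": 61, "processing_time": 62},
]


def count_per_window(events, time_field, window_seconds=60):
    """Count events per tumbling window, keyed by the window's start time."""
    counts = Counter()
    for event in events:
        timestamp = event[time_field]
        window_start = timestamp - timestamp % window_seconds
        counts[window_start] += 1
    return dict(counts)


by_event_time = count_per_window(EVENTS, "event_time")            # {0: 2, 60: 1}
by_processing_time = count_per_window(EVENTS, "processing_time")  # {0: 1, 60: 2}
```

Both results total three events, so nothing looks lost. Only the per-window counts are wrong under processing time, which is why this kind of error tends to go unnoticed.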
Watermarks handle this. A watermark tells the system to stop waiting for events older than a specific point in time, allowing time windows to close and produce results. A common way to generate one is bounded out-of-orderness: take the latest event time seen so far and subtract a configured tolerance. That tolerance is a trade-off between completeness and latency. Set it too tight, and late-arriving events are dropped. Set it too loose, and every result is delayed while the system keeps waiting. Late events will occur regardless of tuning: a source outage, a data backfill, a device that was offline. The safe pattern is to allow some lateness within the window and route anything that arrives later to a side output, capturing it rather than losing it silently.
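The sketch below puts those pieces together for a tumbling-window count. It is a simplified model, not a real engine's API: the class and attribute names are hypothetical, event times are integers, and a window emits its result once, only after the watermark has passed the window end plus the allowed lateness. Production engines handle firing and state cleanup with more nuance.

```python
from collections import defaultdict


class TumblingWindowCounter:
    def __init__(self, window_seconds, max_out_of_orderness, allowed_lateness=0):
        self.window_seconds = window_seconds
        self.max_out_of_orderness = max_out_of_orderness
        self.allowed_lateness = allowed_lateness
        self.max_event_time = None
        self.open_windows = defaultdict(int)  # window start -> running count
        self.emitted = []                     # (window start, final count)
        self.side_output = []                 # events too late for their window

    @property
    def watermark(self):
        """Bounded out-of-orderness: latest event time seen minus the tolerance."""
        if self.max_event_time is None:
            return float("-inf")
        return self.max_event_time - self.max_out_of_orderness

    def process(self, event_time):
        window_start = event_time - event_time % self.window_seconds
        window_end = window_start + self.window_seconds
        # The window is already closed: capture the event instead of dropping it.
        if window_end + self.allowed_lateness <= self.watermark:
            self.side_output.append(event_time)
            return
        self.open_windows[window_start] += 1
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        self._close_expired()

    def _close_expired(self):
        """Emit every window whose end plus allowed lateness is behind the watermark."""
        for start in sorted(self.open_windows):
            if start + self.window_seconds + self.allowed_lateness <= self.watermark:
                self.emitted.append((start, self.open_windows.pop(start)))
```

With 10-second windows and a tolerance of 2, feeding event times 1, 3, 12 closes the first window with a count of 2. A straggler at time 9 then lands in `side_output`. Setting `allowed_lateness=5` keeps that window open long enough for the same straggler to be counted, at the price of a later result.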
Processing guarantees deserve the same attention. Exactly-once processing is expensive to implement and operate, and in many systems it is not needed. The right choice depends on the sink. If the sink is idempotent, meaning writing the same record multiple times does not change its final state, at-least-once processing is sufficient. Exactly-once is necessary for payment flows, billing records, and card charging, where duplicate processing causes direct financial errors.
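A small simulation shows why the sink decides the guarantee. The classes and the `deliver_at_least_once` helper are hypothetical stand-ins: one sink upserts by event id, the other appends, and the helper sends selected events twice, as a retry or a restart from an older checkpoint would.

```python
class KeyedUpsertSink:
    """Idempotent: writes are keyed by event id, so a replay overwrites itself."""

    def __init__(self):
        self.rows = {}

    def write(self, event):
        self.rows[event["event_id"]] = event["amount"]

    def total(self):
        return sum(self.rows.values())


class AppendSink:
    """Not idempotent: every write adds a row, so a replay is counted twice."""

    def __init__(self):
        self.rows = []

    def write(self, event):
        self.rows.append(event["amount"])

    def total(self):
        return sum(self.rows)


def deliver_at_least_once(events, sink, redeliver_ids):
    """Simulate at-least-once delivery: events in `redeliver_ids` arrive twice."""
    for event in events:
        sink.write(event)
        if event["event_id"] in redeliver_ids:
            sink.write(event)
```

Delivering two events worth 10 and 20, with the second one duplicated, leaves the upsert sink at 30 and the append sink at 50. If the append sink were a ledger of card charges, that extra 20 would be a double charge, which is the case where paying for exactly-once makes sense.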
The operational model nobody plans for at design time
Infrastructure is half the cost of streaming. People and the on-call model make up the other half. Streaming has substantially more moving parts than batch: offsets, state stores, checkpoints, watermarks, and schema changes on a live system. These are not set-and-forget configurations.
Teams that operate streaming well treat it as a platform with good abstractions, a schema registry, an SQL interface, and shared monitoring, rather than a collection of individual jobs stitched together. A small group can keep such a system running without requiring one specialist to babysit each pipeline. Distributing operational responsibility across a team is more resilient than concentrating it. The failure mode of concentrated ownership is familiar: a single engineer who understands the streaming jobs gets sick or leaves, and the on-call rotation has no depth.
Failures in streaming are inevitable and can happen at any hour. Leaving a degraded pipeline until morning has consequences. Planning for this from the start, rather than as a reaction after the first production incident, changes the architecture conversation in ways most initial design sessions never surface.
Migrating from batch to streaming: parallel first, cutover later
If streaming is genuinely justified, the migration works far better as a progressive shift than as a single cutover. A sensible starting point is a small set of pipelines where freshness clearly delivers value, chosen specifically to exclude the most critical and most complex jobs. Migrating from batch to streaming does far more than make the same logic run faster: it changes the failure model and the operational model at the same time. Treating it as a speed upgrade is how teams find themselves debugging watermark configuration in production at a bad hour.
The approach that works is to run both systems in parallel and compare their outputs. Start with a few pipelines, keep the batch system running, and have the streaming system write to a separate output. Validate results against each other before switching any consumers to the streaming output. This parallel-run phase is the most important step before decommissioning the batch pipeline, and it is the step most commonly skipped.
The parallel run is particularly important for catching late-arriving events. Daily aggregates often look correct during normal operation and only diverge after a source outage or backfill, when the streaming numbers stop agreeing with the batch numbers. Without the parallel run, that disagreement never surfaces, and the wrong numbers reach production.
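The comparison step of a parallel run can be as simple as diffing the two outputs by key. This sketch assumes both pipelines produce a mapping from an aggregate key (for example a day) to a numeric value. The function name, the `tolerance` parameter, and the sample keys in the usage note are illustrative.

```python
def compare_outputs(batch, streaming, tolerance=0.0):
    """Return the keys where the two pipelines disagree.

    A key is reported when it exists on one side only, or when the values
    differ by more than `tolerance`. The result maps each such key to a
    (batch value, streaming value) pair, with None for a missing side.
    """
    mismatches = {}
    for key in sorted(set(batch) | set(streaming)):
        batch_value = batch.get(key)
        streaming_value = streaming.get(key)
        if (
            batch_value is None
            or streaming_value is None
            or abs(batch_value - streaming_value) > tolerance
        ):
            mismatches[key] = (batch_value, streaming_value)
    return mismatches
```

For example, comparing `{"day-1": 100, "day-2": 120}` from batch with `{"day-1": 100, "day-2": 117, "day-3": 5}` from streaming reports `day-2` and `day-3`. Running a check like this on a schedule, and keeping it running through at least one outage or backfill, is what makes the cutover decision evidence-based.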
FAQ
When should a team choose streaming over batch for analytics?
Streaming is justified when the cost of data staleness is concrete and immediate: missed fraud, a wrong price, an inventory error that propagates before correction. If the decision driven by the data plays out over hours or days, batch or micro-batch at a shorter interval is almost always the lower-cost and lower-risk option.
What is the difference between micro-batch and real-time streaming?
Micro-batch runs the same processing logic on a short, recurring interval, typically seconds to minutes, and preserves existing tooling. Real-time streaming processes each event continuously as it arrives, achieving lower latency but requiring offsets, state management, watermarks, and an always-on operational model with dedicated on-call coverage.
How does watermarking work in stream processing?
A watermark tells the engine to stop waiting for events older than a specific point in time, allowing time windows to close and produce results. Setting the tolerance too tight drops late events. Setting it too loose delays results. Events that arrive after the threshold should be routed to a side output rather than silently discarded.
When is exactly-once processing required in streaming?
Exactly-once is required when the sink is not idempotent and duplicate writes produce incorrect outcomes, with payment processing, billing systems, and card charging as the primary examples. For idempotent sinks, at-least-once processing is sufficient and considerably cheaper to implement and operate.
What is consumer lag and why does it matter for streaming pipelines?
Consumer lag measures how many records the processing consumer is behind the data source. Growing lag is the primary signal of a degraded or failing pipeline. Streaming degradation is not always visible immediately: a pipeline with hours of accumulated lag has effectively become a batch system. Continuous lag monitoring is essential.
How should a team approach migration from batch to streaming?
Start with low-criticality pipelines where data freshness delivers clear value. Run the streaming pipeline in parallel with the batch system, writing to a separate output, and compare results before switching any consumers. Keep the batch pipeline running until the streaming output has been validated through normal operation and at least one source-disruption event.