Real-Time vs. Batch Analytics: Making the Right Architecture Choice
- Rahul Ramanujam
- 5 days ago
- 10 min read
In my recent posts about building autonomous analytics systems and first-party data strategy, I've focused on what to build and why. Now let's talk about a critical architectural decision that affects cost, complexity, and capabilities: real-time versus batch processing.
Everyone wants real-time analytics. The promise is compelling: see what's happening right now, respond instantly to changes, make decisions with the most current data possible. But here's what I've learned after building both types of systems: most organizations don't actually need real-time analytics, and many that implement it end up with expensive infrastructure solving problems they don't have.
This post is about making the right choice for your specific situation, understanding the trade-offs, and knowing when each approach makes sense.
Understanding the Spectrum
First, let's clarify what we're actually talking about, because "real-time" means different things to different people.
Batch Processing
Batch processing analyzes data at scheduled intervals—typically hourly, daily, or weekly. Data accumulates, then gets processed all at once.
Example: Your GA4 data exports to BigQuery once daily at 3 AM. Your dashboard updates every morning with yesterday's complete data.
Characteristics:
Processes complete datasets
Scheduled execution (daily, hourly)
Optimized for throughput, not latency
Lower cost and complexity
Data is always somewhat "stale"
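To make the pattern concrete, here's a minimal sketch of what a daily batch job like this can look like in Python against BigQuery. The project, dataset, and table names are hypothetical placeholders, and a real pipeline would add error handling and orchestration (cron, Cloud Scheduler, Airflow, or similar).

```python
# A minimal daily batch job: aggregate one day's GA4 export table in BigQuery.
# Project, dataset, and table names below are hypothetical placeholders.
from google.cloud import bigquery


def run_daily_batch(run_date: str) -> None:
    """Aggregate one day's events into a reporting table (run_date as YYYYMMDD)."""
    client = bigquery.Client()
    query = f"""
        CREATE OR REPLACE TABLE `my-project.reporting.daily_summary_{run_date}` AS
        SELECT
          event_date,
          event_name,
          COUNT(*) AS event_count,
          COUNT(DISTINCT user_pseudo_id) AS users
        FROM `my-project.analytics_123456.events_{run_date}`
        GROUP BY event_date, event_name
    """
    # Compute is consumed only while this query runs; schedule it once per day
    # (e.g. 3 AM) after the GA4 export lands.
    client.query(query).result()


if __name__ == "__main__":
    run_daily_batch("20240115")
```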
Near Real-Time (Micro-Batch)
Near real-time processes data in small, frequent batches—typically every few minutes to every 15 minutes.
Example: Your system checks for new events every 5 minutes, processes them together, and updates metrics.
Characteristics:
Small batches processed frequently
Minutes of latency, not hours
Balances freshness with efficiency
Moderate cost and complexity
Good enough for most "real-time" needs
True Real-Time (Streaming)
True real-time processes each event individually as it arrives, typically with sub-second latency.
Example: Click happens → processed immediately → dashboard updates instantly → alerts fire within seconds.
Characteristics:
Individual event processing
Sub-second to a few seconds of latency
Optimized for latency over throughput
High cost and complexity
Necessary for specific use cases
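For contrast, here's a minimal sketch of per-event processing using Google Cloud Pub/Sub in Python. The project and subscription names are hypothetical; a production deployment would also need autoscaling workers, dead-letter handling, and redundancy, which is where the cost and complexity come from.

```python
# A minimal per-event streaming consumer using Google Cloud Pub/Sub.
# Project and subscription names are hypothetical placeholders.
import json

from google.cloud import pubsub_v1


def handle_event(message: pubsub_v1.subscriber.message.Message) -> None:
    event = json.loads(message.data.decode("utf-8"))
    # Each event is handled individually the moment it arrives:
    # update a counter, score it for fraud, fire an alert, etc.
    print(f"processed {event.get('event_name')}")
    message.ack()


subscriber = pubsub_v1.SubscriberClient()
subscription = subscriber.subscription_path("my-project", "clickstream-sub")

# subscribe() returns a streaming pull future; this worker stays up 24/7,
# which is where the always-on infrastructure cost comes from.
streaming_pull = subscriber.subscribe(subscription, callback=handle_event)
streaming_pull.result()
```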
The key insight: what most people call "real-time" is actually near real-time, and that's usually all they need.
The Real-Time Hype: When It's Justified
Let's start with when you actually do need real-time or near real-time capabilities, because these use cases are real and important.
Fraud Detection and Security
When someone's trying to use a stolen credit card or compromise an account, seconds matter. You need to detect suspicious patterns and block transactions before they complete.
Latency requirement: Sub-second to a few seconds
Why batch doesn't work: Fraud happens fast. By tomorrow morning, the damage is done.
Architecture needed: True real-time streaming
Operational Monitoring and Alerting
If your website goes down, you need to know immediately, not in tomorrow's daily report. Same for critical infrastructure failures, performance degradation, or security incidents.
Latency requirement: Seconds to minutes
Why batch doesn't work: Hours of downtime before you notice is unacceptable.
Architecture needed: Near real-time to real-time, depending on criticality
Live Personalization
Showing different content based on what someone just clicked requires processing that interaction before they navigate to the next page.
Latency requirement: Sub-second
Why batch doesn't work: The moment has passed by the time batch processing runs.
Architecture needed: True real-time for in-session personalization
High-Frequency Trading and Bidding
Ad exchanges, financial trading, and similar environments where milliseconds affect outcomes require true real-time processing.
Latency requirement: Milliseconds to sub-second
Why batch doesn't work: Opportunities vanish instantly.
Architecture needed: True real-time with extreme optimization
Real-Time Dashboards for Operations
Customer service teams, operations centers, or logistics coordinators who make decisions based on current state need fresh data.
Latency requirement: Minutes
Why batch doesn't work: Decisions need to be based on current reality, not yesterday's snapshot.
Architecture needed: Near real-time is usually sufficient
Notice what's not on this list: most analytics and reporting use cases. That's intentional.
When Batch Processing Is Actually Better
Here's the uncomfortable truth: for most analytics use cases, batch processing is not only sufficient—it's often superior.
Daily Business Reporting
Your executive reviewing yesterday's revenue doesn't need sub-second updates. They need accurate, complete data for making strategic decisions.
Why batch wins:
Complete data (all events processed and reconciled)
Lower cost (efficient bulk processing)
Simpler architecture (fewer failure modes)
More reliable (proven, stable patterns)
What you lose with real-time: Nothing meaningful for this use case
Marketing Performance Analysis
Understanding which campaigns drove results over the past week doesn't require real-time processing. You're analyzing trends and patterns, not reacting to individual events.
Why batch wins:
Can run complex calculations efficiently
Attribution models need complete journey data
Joins across multiple data sources easier
Statistical significance requires accumulation
What you lose with real-time: Unnecessary complexity and cost
Cohort and Retention Analysis
Analyzing user cohorts over weeks or months is inherently backward-looking. Real-time processing adds no value.
Why batch wins:
Large-scale aggregations more efficient in batch
Historical comparisons natural in batch processing
Complete data ensures accurate calculations
What you lose with real-time: Nothing
Machine Learning Model Training
Training predictive models on historical data doesn't benefit from real-time processing. You need complete, clean datasets.
Why batch wins:
Can process massive datasets efficiently
Complex feature engineering easier
Validation and testing require complete data
Version control and reproducibility simpler
What you lose with real-time: Nothing for training (inference is different)
Financial Reconciliation
Month-end closes, financial reporting, and reconciliation require complete, accurate data. Speed is less important than correctness.
Why batch wins:
Can ensure all transactions processed
Complex calculations and checks feasible
Audit trails easier to maintain
Regulatory requirements often specify batch processing
What you lose with real-time: Nothing—and you avoid risk of incomplete data
The Cost Reality Nobody Talks About
Let's talk about what real-time analytics actually costs, because this often gets glossed over in the sales pitch.
Infrastructure Costs
Batch processing:
Run compute only when needed
Can use spot instances or preemptible VMs (much cheaper)
Scale down or shut off between runs
Optimize for throughput (efficient resource use)
Real-time processing:
Always-on infrastructure (24/7 costs)
Must provision for peak load (can't scale down)
Requires redundancy for reliability
Optimized for latency (less efficient resource use)
Cost multiplier: 3-10x for equivalent processing volume
Operational Complexity
Batch processing:
Well-understood failure modes
Can retry entire batches if something fails
Debugging with complete logs
Scheduled maintenance windows
Real-time processing:
Complex failure scenarios (what if the stream falls behind?)
Can't easily replay without complex checkpointing
Debugging requires trace sampling
No maintenance windows (must maintain uptime)
Engineering time multiplier: 2-5x for building and maintaining
Data Quality Trade-Offs
Batch processing:
Can validate entire datasets before processing
Easy to implement complex quality checks
Reprocess if errors found
Complete data ensures accurate results
Real-time processing:
Limited validation per event
Late-arriving data complicates analysis
Difficult to correct errors after processing
May need separate reconciliation process
Hidden cost: Data quality issues often require batch reconciliation anyway
Actual Cost Example
Let's make this concrete with a real scenario: processing 100 million events per day.
Batch approach:
Daily BigQuery batch job: ~$50/month in compute
Storage: ~$20/month
Monitoring and orchestration: ~$10/month
Total: ~$80/month
Real-time approach:
Streaming infrastructure (Kafka/Pub-Sub): ~$500/month
Always-on processing workers: ~$800/month
Storage (still need it): ~$20/month
Monitoring and observability: ~$200/month
Total: ~$1,520/month
That's a 19x cost increase for real-time processing of the same data volume. Is your use case worth more than $17,000 per year in additional costs?
The Middle Ground: Near Real-Time
For many use cases that seem to require real-time processing, near real-time is actually the better answer.
What Near Real-Time Provides
Data fresher than daily batch (minutes instead of hours)
Significantly lower cost than true real-time
Simpler architecture than streaming
Good enough for most monitoring and alerting needs
Implementation Patterns
Micro-batch processing:
Run batch jobs every 5-15 minutes instead of daily
Use the same batch processing code and patterns
Much easier to implement than streaming
Dramatically lower cost than real-time
Example: My anomaly detection system runs every hour. That's near real-time enough—I don't need to know about an anomaly in the next 60 seconds, but I do want to know within a few hours.
Incremental processing:
Process only new data since last run
Maintain state about what's been processed
Combine with batch infrastructure
Update results incrementally (see the sketch below)
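Here's a minimal sketch of how micro-batch and incremental processing can combine: the same job reads a watermark, processes only events that arrived since the last run, and can be scheduled every few minutes or once a day without changing the code. The table names and the file-based watermark are hypothetical; in practice the watermark often lives in a small metadata table.

```python
# A minimal incremental micro-batch job: process only events that arrived
# since the last run, then advance the watermark. Table names and the
# file-based watermark are hypothetical placeholders.
import json
from datetime import datetime, timezone
from pathlib import Path

from google.cloud import bigquery

STATE_FILE = Path("watermark.json")


def load_watermark() -> datetime:
    if STATE_FILE.exists():
        return datetime.fromisoformat(json.loads(STATE_FILE.read_text())["last_processed"])
    return datetime(1970, 1, 1, tzinfo=timezone.utc)


def save_watermark(ts: datetime) -> None:
    STATE_FILE.write_text(json.dumps({"last_processed": ts.isoformat()}))


def run_incremental_batch() -> None:
    client = bigquery.Client()
    watermark = load_watermark()
    now = datetime.now(timezone.utc)

    # Same batch-style SQL whether this runs every 5 minutes or once a day;
    # only the schedule changes.
    query = """
        INSERT INTO `my-project.reporting.rolling_metrics` (window_end, event_name, event_count)
        SELECT @now, event_name, COUNT(*)
        FROM `my-project.raw.events`
        WHERE ingestion_time > @watermark AND ingestion_time <= @now
        GROUP BY event_name
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("watermark", "TIMESTAMP", watermark),
            bigquery.ScalarQueryParameter("now", "TIMESTAMP", now),
        ]
    )
    client.query(query, job_config=job_config).result()
    save_watermark(now)


if __name__ == "__main__":
    run_incremental_batch()
```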
Hybrid approach:
Batch for complete, accurate daily reporting
Near real-time for monitoring and alerting
Best of both worlds at reasonable cost
This hybrid pattern is what most organizations actually need but often don't consider because they're focused on the extremes.
Making the Right Choice: A Decision Framework
Here's a practical framework for deciding which approach to use.
Ask These Questions
1. What's the actual decision latency?
How quickly do humans or systems need to act on this data?
Sub-second: True real-time
Minutes: Near real-time
Hours: Batch is fine
Days: Definitely batch
2. What's the cost of being wrong?
What happens if you decide based on data that's a few hours old?
Catastrophic: Real-time justified
Significant: Consider near real-time
Minor: Batch is appropriate
3. Do you need complete data?
Is accuracy more important than freshness?
Yes: Batch provides better accuracy
No: Real-time might be acceptable
4. What's your budget and team size?
Can you afford the infrastructure and engineering costs?
Limited budget, small team: Start with batch
Healthy budget, experienced team: Can consider real-time for key use cases
Unlimited budget: Still should be selective about real-time
5. What's the data volume?
Low volume (<1M events/day): Either approach works
Medium volume (1M-100M/day): Cost differential becomes significant
High volume (>100M/day): Real-time very expensive, batch more efficient
Decision Matrix
| Use Case | Decision Latency | Complete Data Needed | Volume | Recommendation |
|---|---|---|---|---|
| Daily revenue reporting | Daily | Yes | High | Batch |
| Fraud detection | Seconds | No | High | Real-time |
| Marketing dashboard | Hourly | Somewhat | Medium | Near real-time |
| Anomaly alerting | Minutes | No | Medium | Near real-time |
| Financial reconciliation | Daily | Yes | High | Batch |
| System monitoring | Seconds | No | High | Real-time |
| Customer analytics | Daily | Yes | High | Batch |
| A/B test analysis | Daily | Yes | High | Batch |
| Real-time personalization | Sub-second | No | Very high | Real-time |
| BI dashboards | Hourly | Somewhat | Medium | Near real-time or Batch |
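If it helps to see the framework as logic rather than a table, here's a rough sketch that encodes the same questions as a small function. The thresholds and returned labels are illustrative assumptions on my part, not hard rules.

```python
# A rough, illustrative encoding of the decision framework above.
def recommend_architecture(
    decision_latency_seconds: float,
    needs_complete_data: bool,
    events_per_day: int,
) -> str:
    # Accuracy-first use cases with relaxed latency belong in batch.
    if needs_complete_data and decision_latency_seconds >= 3600:
        return "batch"
    if decision_latency_seconds < 60:
        return "real-time streaming"
    if decision_latency_seconds < 3600:
        return "near real-time (micro-batch)"
    # At high volumes, batch becomes increasingly attractive on cost alone.
    return "batch" if events_per_day > 100_000_000 else "batch or near real-time"


# Examples mirroring rows from the matrix:
print(recommend_architecture(86_400, True, 150_000_000))  # daily revenue reporting -> batch
print(recommend_architecture(2, False, 200_000_000))      # fraud detection -> real-time streaming
print(recommend_architecture(300, False, 20_000_000))     # anomaly alerting -> near real-time
```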
Common Mistakes and How to Avoid Them
Having seen organizations implement both approaches, here are the mistakes to watch for.
Mistake 1: Premature Real-Time
Building streaming infrastructure before you have working batch processing and know what you actually need.
Why it happens: Real-time sounds better, engineers want to learn streaming tech, vendors push it.
Better approach: Start with batch, identify pain points, add real-time only where truly needed.
Mistake 2: Real-Time Everything
Assuming all use cases need the same latency and building everything as streaming.
Why it happens: Simplicity of a single architecture, "while we're building streaming..."
Better approach: Match architecture to use case requirements. It's okay to have both batch and streaming.
Mistake 3: Ignoring Data Quality
Prioritizing speed over accuracy, then being surprised when nobody trusts the data.
Why it happens: Real-time requirements push for fast processing, while quality checks take time.
Better approach: Define quality requirements upfront, implement validation even in streaming, accept that some use cases need batch for accuracy.
Mistake 4: Underestimating Operational Complexity
Building real-time systems without adequate monitoring, alerting, and on-call procedures.
Why it happens: Focus is on building the system; operational needs only become apparent after deployment.
Better approach: Plan for operations from the start. Real-time systems need real-time monitoring and support.
Mistake 5: No Reconciliation Process
Running real-time processing without periodic batch reconciliation to catch errors.
Why it happens: Seems redundant to run both real-time and batch for the same data.
Better approach: Even with real-time, run periodic batch reconciliation to ensure accuracy and catch processing errors.
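As a sketch of what that reconciliation can look like, here's a daily check that recomputes counts from the raw events in batch and compares them against what the streaming path produced. The table names and the 0.5% tolerance are hypothetical placeholders.

```python
# A minimal daily reconciliation check: recompute counts from raw events in
# batch and compare them with what the streaming path wrote. Table names and
# the 0.5% tolerance are hypothetical placeholders.
from datetime import date

from google.cloud import bigquery


def reconcile(run_date: date, tolerance: float = 0.005) -> None:
    client = bigquery.Client()
    query = """
        SELECT
          s.event_name,
          s.event_count AS streaming_count,
          b.event_count AS batch_count
        FROM `my-project.streaming.daily_counts` AS s
        JOIN (
          SELECT event_name, COUNT(*) AS event_count
          FROM `my-project.raw.events`
          WHERE DATE(event_timestamp) = @run_date
          GROUP BY event_name
        ) AS b USING (event_name)
        WHERE s.event_date = @run_date
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("run_date", "DATE", run_date)]
    )
    for row in client.query(query, job_config=job_config).result():
        drift = abs(row.streaming_count - row.batch_count) / max(row.batch_count, 1)
        if drift > tolerance:
            # In practice: page someone, open a ticket, or trigger a reprocess.
            print(f"MISMATCH {row.event_name}: "
                  f"streaming={row.streaming_count} batch={row.batch_count}")


if __name__ == "__main__":
    reconcile(date(2024, 1, 15))
```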
Real-World Examples
Let me share some concrete examples from organizations I've worked with.
E-commerce Company: Right-Sized Their Approach
Initial state: Everything in batch, daily updates
Pain point: Customer service couldn't see today's orders in dashboards
What they built: Near real-time dashboard updating every 15 minutes for customer service, kept batch for everything else
Result: Solved the problem for less than 10% of the cost of full real-time, and customer service is happy
Lesson: Most "real-time" needs are actually "more frequent batch" needs.
SaaS Platform: Where Streaming Mattered
Initial state: Batch processing, daily aggregations
Pain point: System outages not detected until next morning, costing money and reputation
What they built: Real-time monitoring with second-level alerting for critical systems
Result: Issues caught and fixed in minutes instead of hours, which fully justified the cost
Lesson: When latency directly impacts operations, real-time is worth it.
Media Company: Hybrid Success
Initial state: Trying to build everything as streaming, bogged down in complexity
Pain point: Six months in, still didn't have working analytics
What they built: Batch for all reporting and analysis, real-time only for content recommendation engine
Result: Got analytics working in 6 weeks with batch, added streaming for recommendations later
Lesson: Don't let perfect (real-time everything) be the enemy of good (working batch).
Financial Services: Batch for Compliance
Initial state: Wanted real-time financial reporting
Pain point: Regulatory requirements demanded complete, auditable daily reconciliation
What they built: Batch for compliance reporting, near real-time for operational monitoring
Result: Met regulatory requirements while giving operations teams current visibility
Lesson: Sometimes batch isn't just cheaper, it's required for correctness.
Connecting to Your Broader Strategy
This decision about batch versus real-time connects directly to the broader data and AI strategy I've been discussing in recent posts.
First-party data strategy: The architecture you choose affects how quickly you can activate your first-party data. Real-time personalization requires streaming. Strategic reporting works fine with batch.
Anomaly detection: My autonomous anomaly detection system runs hourly, as noted above, because that matches the use case. System monitoring anomalies would need real-time. Match the architecture to the decision latency.
Team capabilities: Real-time systems require different skills than batch. Consider your team's current capabilities and learning curve when choosing architecture.
Cost and sustainability: Real-time has ongoing costs. Ensure the value justifies the expense before committing to streaming infrastructure.
The right architecture choice enables your analytics and AI systems to deliver value efficiently. The wrong choice creates expensive complexity that doesn't solve actual problems.
Making Your Decision
Here's my recommendation for most organizations reading this:
Start with batch. Get your data pipelines working, your warehouse set up, your dashboards delivering value. This is the foundation everything else builds on.
Identify specific pain points where batch latency is genuinely causing problems. Document them clearly. Quantify the impact.
Try near real-time first. Run your batch jobs more frequently. This solves many "real-time" needs at a fraction of the cost and complexity.
Implement real-time streaming only for use cases where:
The value clearly justifies the cost
Decision latency genuinely requires seconds or minutes
You have the team and budget to support it properly
Near real-time isn't sufficient
Maintain both. Even with streaming, keep batch processing for reconciliation, complex analysis, and as a fallback when streaming has issues.
The goal isn't to have the most sophisticated architecture. It's to have the right architecture for your specific needs—one that delivers value at reasonable cost with acceptable complexity.
Real-time analytics sounds impressive. But solving real business problems efficiently is what actually matters.
What's your experience with real-time versus batch processing? I'm particularly interested in cases where near real-time turned out to be the right answer, or where switching from real-time back to batch improved things. The architecture landscape is evolving, and learning from each other's experiences helps everyone make better decisions.


