Pipeline Architecture System Design
🧵 Most People Overcomplicate This System Design Question
"Design a pipeline: Process 1 → Process 2 → Final Output"
I got asked this. Here's everything I learned, including where I went wrong, what the interviewer was actually probing, and the mental models that finally made it click.
First: What the Interviewer Is Really Testing
Before you touch queues or workers, the interviewer expects you to ask:
- Is Process 2 dependent on Process 1 for the same item?
- Does order matter?
- Can Process 1 and 2 run in parallel on different items?
- What happens on failure or retry?
Most people skip this. That's the first mistake.
Good system design starts with clarifying the problem, not announcing a solution.
My Initial Answer: What Was Right
My instinct was: "Use a queue-based, write-heavy architecture. Assign workers to each process."
That's not wrong.
Queue-based architecture is the correct category here. Users initiate requests, messages are stored and queued, and workers pull messages from the queue, process them, and store results. The core benefits:
- Backpressure handling
- Retry and failure isolation
- Scalability
- Write-heavy friendliness
So the instinct was right. The problem was the next part.
Where It Got Weak: Single Queue
I said: "Single queue, with workers assigned to the processes."
The interviewer pushed: "Single or multiple queues?"
I said single queue. And that's where I under-explained.
A single queue without explicit state management implies:
Queue
├── Job A (P1? P2? Who knows?)
├── Job B
└── Job C
Workers must:
- Check which stage the job is in
- Branch their logic
- Manage ordering carefully
- Risk running P2 before P1
This works, but it pushes complexity into code, not architecture.
What the Interviewer Wanted to Hear
Option A (The Expected Answer): Multi-Queue Pipeline
Queue_P1 → Workers_P1 → Queue_P2 → Workers_P2 → Output
This is the textbook answer, and it's textbook for good reason.
Pipeline architecture breaks work into stages, like an assembly line. Each stage is independent: it reads from an input queue, transforms data, and writes to an output queue. While Stage 1 processes new incoming events, Stage 2 is processing the previous batch simultaneously, improving throughput.
Why interviewers love multi-queue:
Where is the job? → Which queue it's in tells you
What stage is it in? → The queue IS the stage
How do you retry? → Retry within that queue only
How do you scale? → Add workers per queue independently
The insight that finally clicked:
Multi-queue = workflow in architecture. Single queue = workflow in code.
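Here's a minimal in-process sketch of that multi-queue shape. Python's `queue` and `threading` stand in for a real broker like SQS or RabbitMQ, and `process_1`/`process_2` are placeholder stage logic, not anything from the original question:

```python
# Two-stage pipeline: each stage reads its own queue, transforms,
# and hands off to the next stage's queue. In-process sketch only.
import queue
import threading

queue_p1: "queue.Queue[dict]" = queue.Queue()
queue_p2: "queue.Queue[dict]" = queue.Queue()
results: "queue.Queue[dict]" = queue.Queue()

def process_1(job: dict) -> dict:
    return {**job, "p1_done": True}   # placeholder stage-1 transform

def process_2(job: dict) -> dict:
    return {**job, "p2_done": True}   # placeholder stage-2 transform

def worker_p1() -> None:
    while True:
        job = queue_p1.get()
        if job is None:               # poison pill: shut down cleanly
            break
        queue_p2.put(process_1(job))  # hand off to the next stage's queue

def worker_p2() -> None:
    while True:
        job = queue_p2.get()
        if job is None:
            break
        results.put(process_2(job))

t1 = threading.Thread(target=worker_p1)
t2 = threading.Thread(target=worker_p2)
t1.start(); t2.start()

queue_p1.put({"job_id": "abc123"})
queue_p1.put(None)   # drain stage 1 first...
t1.join()
queue_p2.put(None)   # ...then stage 2
t2.join()
```

Notice the workers know nothing about stages other than their own: the handoff between queues is the workflow.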
Option B: Single Queue, But Only with Explicit State
If you say single queue, you must immediately add the state model:
{
"job_id": "abc123",
"stage": "PROCESS_1"
}
Worker logic:
Pull message
Check stage
→ If PROCESS_1: do P1 → re-enqueue with PROCESS_2
→ If PROCESS_2: do P2 → finalize
This works. But now:
- Workers are smarter (more complex)
- Failures are harder to isolate
- Observability requires extra tooling
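For contrast, here's a minimal sketch of that single-queue version. The message shape mirrors the JSON above; the in-process queue and function names are stand-ins for illustration:

```python
# Single queue: every message carries a "stage" field, and one worker
# loop branches on it, re-enqueueing after stage 1. The workflow now
# lives in this code, not in the topology.
import queue

q: "queue.Queue[dict]" = queue.Queue()
completed: list = []

def worker_step() -> bool:
    """Pull one message; return False once the queue is drained."""
    try:
        job = q.get_nowait()
    except queue.Empty:
        return False
    if job["stage"] == "PROCESS_1":
        # ...do P1 work, then re-enqueue for stage 2
        q.put({**job, "stage": "PROCESS_2"})
    elif job["stage"] == "PROCESS_2":
        # ...do P2 work, then finalize
        completed.append({**job, "stage": "DONE"})
    return True

q.put({"job_id": "abc123", "stage": "PROCESS_1"})
while worker_step():
    pass
```

Every bullet above shows up here: the worker branches, manages state, and must be trusted never to mangle the stage field.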
The Queue vs. Scheduler Question
Midway through, I asked: "Can we use a scheduler for this?"
Good question to ask. Wrong tool for this use case.
In an event-driven pipeline, the pipeline processes events immediately as they occur. A scheduler is designed for time-based execution, running code at fixed times or intervals.
Your pipeline is event-driven, not time-driven:
Queue → reacts to events → for Process 1 → Process 2
Scheduler → reacts to time → for retries, batch, nightly jobs
Where a scheduler makes sense alongside the pipeline:
Queue_P1 → Workers → Queue_P2 → Workers → Output
              ↑
Scheduler (retries + batch jobs only)
When to use the scheduler:
- Re-run failed jobs after N minutes
- Batch expensive LLM calls at off-peak hours
- Nightly reprocessing jobs
Pipelines use queues. Schedulers are for time-based edge cases.
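A sketch of how that division of labor might look: failed jobs get parked with a retry timestamp, and a periodic, time-driven sweep re-enqueues the ones that are due. The job shape and `retry_after_s` parameter are illustrative assumptions, not a real scheduler API:

```python
# Queue handles events; a time-driven sweep handles "retry after N".
import time
import queue

queue_p1: "queue.Queue[dict]" = queue.Queue()
failed: list = []   # (retry_at, job) pairs parked for later

def park_for_retry(job: dict, retry_after_s: float) -> None:
    """Event-driven side: a worker parks a failed job here."""
    failed.append((time.monotonic() + retry_after_s, job))

def sweep_retries() -> int:
    """Time-driven side: re-enqueue every parked job whose time came."""
    now = time.monotonic()
    due = [j for t, j in failed if t <= now]
    failed[:] = [(t, j) for t, j in failed if t > now]
    for job in due:
        queue_p1.put(job)
    return len(due)

park_for_retry({"job_id": "abc123"}, retry_after_s=0.01)
time.sleep(0.02)        # pretend the scheduler interval elapsed
n = sweep_retries()     # the sweep runs on a timer, not on an event
```

The pipeline never waits on the clock; only the retry sweep does.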
The Scaling Question: Where I Was Right
Later I said: "If Process 1 needs more computation, I'd give more workers to that process."
That's correct. But it answered a different layer of the question.
The interviewer was asking about user visibility: how does the user know it's happening?
I was answering infrastructure.
Here's how to connect both:
"Because each stage can scale independently, I'd also expose
per-stage progress to the user β for example, showing that
Process 1 is complete and Process 2 is still running."
During a big sale, if thousands of orders come in per minute, they just queue up and all the downstream services process at whatever rate they can. The user immediately gets an order confirmation page because the front-end isn't waiting on all those processes to finish.
That's the user experience model to follow.
User Visibility: The Follow-Up Question
After the pipeline design, they asked:
"How would you tell the user this is being done?"
This is about state exposure, not queues.
Step 1: Create a Job record when user triggers the request
POST /generate-file → returns job_id
Step 2: Update job status as workers complete stages
QUEUED
  ↓
PROCESS_1_RUNNING (30%)
  ↓
PROCESS_1_DONE
  ↓
PROCESS_2_RUNNING (70%)
  ↓
COMPLETED (100%)
Step 3: User polls or receives push updates
GET /jobs/{job_id}/status
→ { "status": "PROCESS_2_RUNNING", "progress": 70 }
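A minimal sketch of that state-exposure layer: a job record that workers update and a status endpoint reads. The in-memory dict stands in for a real DB table, and the stage-to-progress mapping is an assumption for illustration:

```python
# State exposure: workers write job state; the API reads it.
# Users never see the queues, only this record.
jobs: dict = {}   # stands in for a real DB table

PROGRESS = {      # assumed stage -> percent mapping
    "QUEUED": 0,
    "PROCESS_1_RUNNING": 30,
    "PROCESS_1_DONE": 50,
    "PROCESS_2_RUNNING": 70,
    "COMPLETED": 100,
}

def create_job(job_id: str) -> dict:
    """Step 1: created when the user triggers the request."""
    jobs[job_id] = {"status": "QUEUED", "progress": 0}
    return jobs[job_id]

def update_status(job_id: str, status: str) -> None:
    """Step 2: called by workers as each stage starts/finishes."""
    jobs[job_id] = {"status": status, "progress": PROGRESS[status]}

def get_status(job_id: str) -> dict:
    """Step 3: what GET /jobs/{job_id}/status would return."""
    return jobs[job_id]

create_job("abc123")
update_status("abc123", "PROCESS_2_RUNNING")
```

The key separation: queues move the work, this record is what the user polls.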
Three valid approaches:
| Approach | When to use |
|---|---|
| Polling | Most common, simplest, always acceptable |
| WebSockets / SSE | Long-running tasks, real-time UX matters |
| Notification on completion | Very long jobs (email, push notification) |
What candidates miss: failure. Always mention it:
{
"status": "FAILED",
"error": "Process 2 timed out",
"retry_available": true
}
You never expose queues to users. You expose state.
Why Multi-Queue Simplifies Everything
The key insight, visually:
Single queue:
GroceryOrder ──┐
FoodOrder ─────┼──▶ Workers check stage → branch logic → manage state
MedicineOrder ─┘    (workflow lives in code)
Multi-queue:
Queue_P1 → Workers_P1 (do one thing) → Queue_P2 → Workers_P2
(workflow lives in architecture)
Design your pipeline to handle failures gracefully without compromising data integrity: implement retries, dead-letter queues, and circuit breakers.
With multi-queue, all of this becomes stage-isolated:
- Failure in P2 → retry within Queue_P2 only
- P1 is untouched
- Dead-letter queue per stage
- Scale each stage independently
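A sketch of that stage-isolated retry logic: the attempt count rides on the message itself, and after a maximum number of attempts the job moves to that stage's dead-letter queue. The names and the attempt-counter convention are assumptions for illustration:

```python
# Stage-isolated failure handling for Queue_P2: retry within the
# stage's own queue, then park in its dead-letter queue. P1 untouched.
import queue

queue_p2: "queue.Queue[dict]" = queue.Queue()
dlq_p2: "queue.Queue[dict]" = queue.Queue()
MAX_ATTEMPTS = 3

def handle_p2(job: dict, process) -> None:
    try:
        process(job)
    except Exception:
        attempts = job.get("attempts", 0) + 1
        job = {**job, "attempts": attempts}
        if attempts < MAX_ATTEMPTS:
            queue_p2.put(job)   # retry within Queue_P2 only
        else:
            dlq_p2.put(job)     # park for manual inspection

def always_fails(job: dict) -> None:
    raise RuntimeError("Process 2 timed out")

handle_p2({"job_id": "abc123"}, always_fails)
# Drain retries until the job lands in the DLQ.
while not queue_p2.empty():
    handle_p2(queue_p2.get(), always_fails)
```

Real brokers give you most of this for free (SQS redrive policies, RabbitMQ dead-letter exchanges); the point is that the blast radius stays inside one stage.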
The Trade-Off Table (What Interviewers Want)
| Dimension | Multi-Queue | Single Queue |
|---|---|---|
| Complexity | In architecture | In code |
| Retries | Stage-isolated | Manual state management |
| Scaling | Per-queue, obvious | Filtered by worker type |
| Observability | Queue depth = progress | Needs external state |
| Infra overhead | Higher | Lower |
| Good for | Complex pipelines | Simple, 2-stage flows |
Start with multi-queue. Move to single queue only when infra cost matters more than clarity.
The Interview-Ready Answer (Say This Next Time)
"I'd default to multiple queues forming a pipeline: one for Process 1 and one for Process 2. This keeps stages isolated, simplifies retries, and lets me scale each step independently based on its resource profile.
A single queue could work if each message carries explicit stage metadata, but that moves complexity into worker logic and makes failure handling harder.
For user visibility, I'd track job state in a DB, expose a status endpoint, and let the frontend poll or listen via WebSocket."
That answer covers infrastructure, trade-offs, and UX in one go.
Key Takeaways
1. Queues move work
State tracks progress
Users see state, not queues
2. Multi-queue = workflow in architecture
Single queue = workflow in code
3. Pipelines use queues (event-driven)
Schedulers are for time-based work
4. Scale workers per stage, not globally
5. Always mention:
✓ Retries
✓ Dead-letter queues
✓ Idempotency
✓ User-facing status
Common Mistakes to Avoid
❌ Using a scheduler for real-time pipeline steps
❌ Single queue without explaining stage/state
❌ Not mentioning failure handling
❌ Answering infra without connecting to user experience
❌ Over-engineering before clarifying the problem
Further Reading
- Pipeline Architecture Patterns (SystemOverflow): excellent assembly line analogy for stage-based pipelines
- Message Queue Architecture for System Design (DesignGurus): covers Kafka vs RabbitMQ vs SQS trade-offs
- Queue-Based Architecture on AWS (Particular Docs): real implementation with autoscaling workers
- Event-Driven vs Scheduled Pipelines (Prefect): clear breakdown of when to use each
- Data Pipeline Orchestration Guide (Mage AI): idempotency, retries, and monitoring in production
If you're preparing for system design interviews, remember: the gap is rarely knowledge. It's connecting infrastructure decisions to user experience. That's the layer interviewers are always probing.