How we transformed our MCP-as-a-Service platform from fragile OS process management to a robust, containerized architecture that improves reliability and enables future growth.
📑 Read the code
The complete set of changes for the Docker rewrite is on our GitHub repository: See the full changeset →
The Challenge: Architectural Dead End
Our original proof-of-concept architecture relied on managing multiple OS processes directly on the host system:
- Xvfb – Virtual display server for headless browser environments
- Playwright MCP – Node.js-based MCP server for browser automation
- x11vnc – VNC server for remote desktop access
- websockify – WebSocket proxy for browser-based VNC access
While this approach worked for initial testing and development, it revealed fundamental architectural limitations that made it unsuitable for serious production use:
- Dynamic port management – Each instance required unique host ports, leading to port conflicts and complex allocation logic
- Process fragility – OS processes could crash silently, requiring complex monitoring and restart logic
- Weak resource isolation – Little separation between instances, with real potential for resource contention
- Deployment complexity – Difficult to ensure consistent environments across development and production
- Scaling limitations – Adding new MCP instance types required significant infrastructure changes
- Architectural dead end – The approach fundamentally couldn’t provide the reliability and scalability needed for production
Our Solution: Complete Architectural Overhaul
Rather than patching a fundamentally flawed approach, we redesigned the architecture from the ground up, replacing the entire system with a Docker-centric, subdomain-based solution that provides proper isolation, reliability, and scalability.
Container Design
Each MCP instance container encapsulates all the necessary components (a Dockerfile sketch follows the list):
- Base Image – Node.js 22 on Debian Bookworm for optimal Playwright compatibility
- Process Management – Supervisord to orchestrate the four core processes in the correct order
- Fixed Internal Ports – MCP (8080), VNC (5900), noVNC (6080) – no more dynamic host port mapping
- Resource Limits – 1GB memory limit per container with automatic restart policies
- Health Checks – Built-in Docker health checks for noVNC endpoint availability
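To make this concrete, here is a minimal Dockerfile sketch of that layout. The package names, MCP launch command, and health-check details are our assumptions rather than the exact production image; the 1GB memory limit and restart policy are applied at container start rather than baked into the image:

```dockerfile
# Illustrative sketch only -- package names and paths are assumptions
FROM node:22-bookworm

# Dependencies for the four core processes, plus curl for the health check
RUN apt-get update && apt-get install -y --no-install-recommends \
        xvfb x11vnc websockify novnc supervisor curl \
    && rm -rf /var/lib/apt/lists/*

# Playwright MCP server and a browser for it to drive
RUN npm install -g @playwright/mcp \
    && npx -y playwright install --with-deps chromium

# Supervisord config orchestrating Xvfb, MCP, x11vnc, and websockify
COPY supervisord.conf /etc/supervisor/conf.d/mcp-instance.conf

# Fixed internal ports: MCP (8080), VNC (5900), noVNC (6080)
EXPOSE 8080 5900 6080

# Container reports healthy once the noVNC endpoint responds
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
    CMD curl -fsS http://localhost:6080/ || exit 1

CMD ["supervisord", "-n", "-c", "/etc/supervisor/supervisord.conf"]
```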
Networking Revolution with Traefik
The most significant improvement came from replacing nginx-based routing with Traefik (a configuration sketch follows the list):
- Subdomain-based routing – Each instance gets predictable URLs: mcp-{id}.mcp-as-a-service.com and vnc-{id}.mcp-as-a-service.com
- Dynamic discovery – Traefik automatically discovers containers via Docker labels, eliminating manual nginx configuration
- TLS termination – Centralized SSL/TLS handling with existing wildcard certificates
- ForwardAuth integration – Bearer token validation at the edge for MCP endpoints
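As a rough sketch, the static Traefik configuration behind this setup could look like the following; the file paths and network name are assumptions, and the wildcard certificate itself is registered through the referenced dynamic configuration file:

```yaml
# Illustrative traefik.yml sketch -- paths and network name are assumptions
entryPoints:
  web:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: websecure
          scheme: https
  websecure:
    address: ":443"

providers:
  docker:
    # Discover MCP instance containers via their labels;
    # only route containers that opt in explicitly
    exposedByDefault: false
    network: mcp-network
  file:
    # Dynamic config that registers the existing wildcard certificate
    filename: /etc/traefik/dynamic.yml
```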
Technical Implementation Details
Supervisor Process Orchestration
Inside each container, Supervisord manages the process lifecycle with proper start ordering (a supervisord.conf sketch follows the list):
- Xvfb (Priority 10) – Virtual display server starts first
- Playwright MCP (Priority 20) – Waits for Xvfb, starts MCP server on port 8080
- x11vnc (Priority 30) – Waits for Xvfb, starts VNC server on port 5900
- websockify (Priority 40) – Waits for x11vnc, proxies VNC to port 6080 for browser access
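A sketch of the corresponding supervisord.conf is below. The commands, flags, and display number are assumptions; note also that Supervisord priorities only order startup (lower numbers start first), so a production config would typically add startsecs values or small wait scripts for true readiness dependencies:

```ini
; Illustrative sketch -- commands and the display number are assumptions
[supervisord]
nodaemon=true

; Virtual display server starts first (lowest priority)
[program:xvfb]
priority=10
command=/usr/bin/Xvfb :99 -screen 0 1280x720x24
autorestart=true

; MCP server, pointed at the Xvfb screen via DISPLAY
[program:playwright-mcp]
priority=20
environment=DISPLAY=":99"
command=npx @playwright/mcp --host 0.0.0.0 --port 8080
autorestart=true

; VNC server exposing the same display
[program:x11vnc]
priority=30
command=/usr/bin/x11vnc -display :99 -forever -nopw -rfbport 5900
autorestart=true

; WebSocket bridge serving the noVNC client on port 6080
[program:websockify]
priority=40
command=/usr/bin/websockify --web /usr/share/novnc 6080 localhost:5900
autorestart=true
```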
Traefik Label Configuration
Each container automatically configures its own routing via Docker labels:
```
# MCP endpoint routing
traefik.http.routers.mcp-{instance}.rule=Host(`mcp-{instance}.mcp-as-a-service.com`)
# Attach the ForwardAuth middleware so the bearer check actually runs
traefik.http.routers.mcp-{instance}.middlewares=mcp-{instance}-auth
traefik.http.services.mcp-{instance}.loadbalancer.server.port=8080
traefik.http.middlewares.mcp-{instance}-auth.forwardauth.address=https://app.mcp-as-a-service.com/auth/mcp-bearer-check

# VNC endpoint routing
traefik.http.routers.vnc-{instance}.rule=Host(`vnc-{instance}.mcp-as-a-service.com`)
traefik.http.services.vnc-{instance}.loadbalancer.server.port=6080
```
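When the platform provisions an instance, these labels are attached at container-creation time. A hypothetical invocation for an instance with ID abc123 (the ID, image name, and network name are placeholders) could look like this:

```bash
docker run -d --name mcp-abc123 \
  --network mcp-network \
  --memory 1g \
  --restart unless-stopped \
  --label 'traefik.enable=true' \
  --label 'traefik.http.routers.mcp-abc123.rule=Host(`mcp-abc123.mcp-as-a-service.com`)' \
  --label 'traefik.http.routers.mcp-abc123.middlewares=mcp-abc123-auth' \
  --label 'traefik.http.services.mcp-abc123.loadbalancer.server.port=8080' \
  --label 'traefik.http.middlewares.mcp-abc123-auth.forwardauth.address=https://app.mcp-as-a-service.com/auth/mcp-bearer-check' \
  --label 'traefik.http.routers.vnc-abc123.rule=Host(`vnc-abc123.mcp-as-a-service.com`)' \
  --label 'traefik.http.services.vnc-abc123.loadbalancer.server.port=6080' \
  mcp-instance:latest
```

Note what is absent: there are no -p port mappings. Traefik reaches the container's fixed internal ports over the shared Docker network, so no host ports are consumed at all.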
ForwardAuth Security
We implemented a custom ForwardAuth endpoint in Symfony that validates MCP bearer tokens (a controller sketch follows the list):
- Edge enforcement – Authentication happens before requests reach the MCP container
- Instance identification – Extracts instance ID from subdomain for token validation
- Performance optimized – 5-minute in-memory cache for static bearer tokens
- Security hardened – Constant-time comparison and rate limiting
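A simplified sketch of such a controller, assuming a recent Symfony version, is shown below; the class, route, and lookup helper are illustrative placeholders rather than our actual implementation:

```php
<?php
// Illustrative sketch -- names and the token lookup are placeholders

namespace App\Controller;

use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;
use Symfony\Component\Routing\Attribute\Route;

class McpBearerCheckController
{
    #[Route('/auth/mcp-bearer-check', methods: ['GET'])]
    public function check(Request $request): Response
    {
        // Traefik's ForwardAuth passes the original host along,
        // e.g. "mcp-abc123.mcp-as-a-service.com"
        $host = (string) $request->headers->get('X-Forwarded-Host', '');
        if (!preg_match('/^mcp-([a-z0-9]+)\./', $host, $m)) {
            return new Response('', Response::HTTP_FORBIDDEN);
        }

        $authorization = (string) $request->headers->get('Authorization', '');
        $token = str_starts_with($authorization, 'Bearer ')
            ? substr($authorization, 7)
            : '';

        // Expected token lookup (served from an in-memory cache in production)
        $expected = $this->expectedTokenFor($m[1]);

        // hash_equals() performs the constant-time comparison
        if ($expected === null || $token === '' || !hash_equals($expected, $token)) {
            return new Response('', Response::HTTP_UNAUTHORIZED);
        }

        // Any 2xx response tells Traefik to forward the request
        return new Response('', Response::HTTP_NO_CONTENT);
    }

    private function expectedTokenFor(string $instanceId): ?string
    {
        // Placeholder: resolve the instance's bearer token from storage
        return null;
    }
}
```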
Production Architecture
Our production deployment uses a hybrid approach that maximizes performance while maintaining reliability:
```
Internet (80/443) → Traefik Container → {
    app.mcp-as-a-service.com     → nginx:8090 (native Symfony webapp)
    mcp-{id}.mcp-as-a-service.com → mcp-instance-container:8080
    vnc-{id}.mcp-as-a-service.com → mcp-instance-container:6080
}
```
- Native webapp – Symfony application runs natively for optimal performance
- Containerized MCP instances – Isolated, scalable browser environments
- Traefik reverse proxy – Handles all HTTP traffic and TLS termination
- Shared Docker network – Enables secure communication between Traefik and MCP containers (see the Compose sketch below)
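As a sketch, the edge of this topology fits in a small Docker Compose file; the image tag, file paths, and network name are assumptions:

```yaml
# Illustrative sketch -- names and paths are assumptions
services:
  traefik:
    image: traefik:v3
    ports:
      - "80:80"
      - "443:443"
    volumes:
      # Read-only Docker socket for label-based container discovery
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./traefik.yml:/etc/traefik/traefik.yml:ro
    extra_hosts:
      # Lets the containerized Traefik reach the native nginx/Symfony
      # webapp listening on the host (port 8090)
      - "host.docker.internal:host-gateway"
    networks:
      - mcp-network

networks:
  mcp-network:
    # Shared with dynamically created MCP instance containers
    name: mcp-network
```

MCP instance containers join the same mcp-network when they are created, which is how Traefik reaches their fixed internal ports without publishing anything to the host.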
Results and Technical Improvements
Reliability Improvements
- Process stability – Container isolation prevents cascading failures
- Automatic recovery – Docker restart policies handle crashes gracefully
- Health monitoring – Built-in health checks provide real-time status
- Predictable lifecycle – Start/stop/restart operations are atomic and reliable
Operational Improvements
- Simplified deployment – No more dynamic port allocation or nginx configuration
- Better observability – Docker logs and metrics provide clear visibility
- Easier debugging – Container isolation makes issues easier to reproduce and fix
- Scaling foundation – Easy to add new MCP instance types and scale horizontally
Developer Experience Improvements
- Consistent environments – Development and production use identical container images
- Local testing – Docker Compose setup enables full local development
- Predictable URLs – Subdomain-based routing eliminates port confusion
- Simplified CLI – Docker commands replace complex process management
The Docker rewrite has transformed the MCP-as-a-Service platform from a proof-of-concept into a robust, scalable foundation. By recognizing the architectural dead end early and making the decision to completely redesign the system, we’ve created a platform that not only meets our current needs but also provides a solid base for future development.