Deployment Guide
Everything you need to deploy, manage, and scale the Vantrexia RPM platform in production. This guide covers the full production stack, CI/CD pipeline, SSL configuration, secrets management, cost optimization, and disaster recovery planning.
Production Architecture Overview
Vantrexia runs on an automated CI/CD pipeline that deploys containerized services to an AWS EC2 instance. The deployment flow follows a fully automated path from code merge to production traffic.
The production environment runs on an AWS EC2 t4g.medium instance (ARM64 Graviton2 processor) providing a strong balance of cost efficiency and performance. All services are containerized using Docker and orchestrated with Docker Compose, making the entire stack reproducible and portable.
Key architectural decisions:
- ARM64 (Graviton2) — 20% better price-performance vs. x86 instances
- Docker Compose — Simple orchestration without Kubernetes overhead, suitable for current scale
- Single-host deployment — All application containers on one EC2 instance; database on managed RDS
- GHCR (GitHub Container Registry) — Docker images stored alongside source code for unified access control
Production Stack
The production environment consists of six core services running as Docker containers, plus a managed PostgreSQL database on AWS RDS:
| Service | Technology | Role | Port |
|---|---|---|---|
| Nginx | Nginx 1.25 (Alpine) | Reverse proxy, SSL termination, static file serving, rate limiting | 80, 443 |
| Frontend | React 18 SPA (Vite build) | Provider dashboard, patient management UI, billing interface | 3000 |
| Backend | Django 4.2 / DRF 3.15 | REST API, business logic, FHIR integration, authentication | 8000 |
| Celery Worker | Celery 5.3 | Async task processing: eCW sync, notifications, report generation | — |
| Celery Beat | Celery 5.3 | Periodic task scheduler: billing cycles, compliance audits, data sync | — |
| Redis | Redis 7 (Alpine) | Celery broker, caching layer, session store, rate limiting backend | 6379 |
| PostgreSQL | AWS RDS (db.t4g.micro) | Primary database with automated backups, encryption at rest | 5432 |
Docker Compose Production
The production Docker Compose file defines all application services with health checks, resource limits, restart policies, and logging configuration. The database runs on AWS RDS and is connected via environment variables.
version: "3.8"

services:
  nginx:
    image: ghcr.io/highlandpc/vantrexia/nginx:${TAG:-latest}
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./docker/nginx/nginx.prod.conf:/etc/nginx/nginx.conf:ro
      - ./docker/nginx/ssl:/etc/nginx/ssl:ro
      - static_volume:/app/staticfiles:ro
      - media_volume:/app/mediafiles:ro
    depends_on:
      backend:
        condition: service_healthy
      frontend:
        condition: service_started
    restart: always
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  frontend:
    image: ghcr.io/highlandpc/vantrexia/frontend:${TAG:-latest}
    expose:
      - "3000"
    environment:
      - VITE_API_BASE_URL=${API_URL}
    restart: always

  backend:
    image: ghcr.io/highlandpc/vantrexia/backend:${TAG:-latest}
    expose:
      - "8000"
    env_file:
      - .env
    volumes:
      - static_volume:/app/staticfiles
      - media_volume:/app/mediafiles
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health/"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    restart: always
    depends_on:
      redis:
        condition: service_healthy

  celery:
    image: ghcr.io/highlandpc/vantrexia/backend:${TAG:-latest}
    command: celery -A config worker -l info --concurrency=4
    env_file:
      - .env
    depends_on:
      backend:
        condition: service_healthy
    restart: always

  celery-beat:
    image: ghcr.io/highlandpc/vantrexia/backend:${TAG:-latest}
    command: celery -A config beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler
    env_file:
      - .env
    depends_on:
      backend:
        condition: service_healthy
    restart: always

  redis:
    image: redis:7-alpine
    expose:
      - "6379"
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: always

volumes:
  static_volume:
  media_volume:
  redis_data:
The .env file is never committed to source control; it is deployed via the CI/CD pipeline from GitHub Secrets and AWS Secrets Manager. See the Secrets Management section below.
Zero-Downtime Deployment
Vantrexia uses a rolling deployment strategy to ensure zero downtime during production releases. The entire process completes in 3–5 minutes:
Scale Up
A new container is started alongside the existing running container using docker compose up -d --no-deps --scale backend=2 backend. The old container continues serving traffic while the new one initializes.
Health Check
The CI/CD pipeline polls the new container's /health/ endpoint every 5 seconds for up to 60 seconds. The health check verifies database connectivity, Redis availability, and Celery worker responsiveness.
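The /health/ endpoint aggregates the individual dependency checks into a single pass/fail response. A minimal, framework-agnostic sketch of that pattern (the check functions here are stubs; the real endpoint pings the database, Redis, and the Celery workers):

```python
from typing import Callable, Dict, Tuple

def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Run each named check; any failure turns the response into HTTP 503."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "fail"
        except Exception as exc:  # a crashing check is reported, not raised
            results[name] = f"error: {exc}"
    healthy = all(v == "ok" for v in results.values())
    return (200 if healthy else 503), {
        "status": "healthy" if healthy else "unhealthy",
        "checks": results,
    }

# Stubbed example: real checks would hit Postgres, Redis, and the Celery workers.
status, body = run_health_checks({
    "database": lambda: True,
    "redis": lambda: True,
    "celery": lambda: True,
})
```

Returning 503 on any failed dependency is what lets both the Docker health check and the CI/CD poll treat the container as unhealthy with a single request.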
Switch Traffic
Nginx upstream configuration is reloaded to route all new requests to the healthy new container. In-flight requests on the old container are allowed to complete with a 30-second drain period.
Run Migrations
Database migrations are executed against production RDS: python manage.py migrate --noinput. Migrations are designed to be backward-compatible so the old container can continue processing during this step.
Smoke Tests
Five automated smoke tests validate the deployment:
- Health endpoint — GET /health/ returns 200 OK
- Auth flow — Token acquisition and refresh cycle completes
- Patient API — GET /api/v1/patients/ returns a valid JSON response
- Static assets — Frontend index.html loads with the correct bundle hash
- SSL verification — HTTPS certificate is valid and not expiring within 30 days
Cleanup
The old container is stopped and removed. Unused Docker images are pruned to free disk space. Total deployment time: 3–5 minutes.
#!/bin/bash
set -euo pipefail
TAG="${GITHUB_SHA:-latest}"
# Pull latest images
docker compose -f docker-compose.prod.yml pull
# Scale up new container alongside old
docker compose -f docker-compose.prod.yml up -d --no-deps --scale backend=2 backend
# Wait for the new container to pass its health check (12 × 5 s = 60 s)
healthy=0
for i in {1..12}; do
  if curl -sf http://localhost:8000/health/ > /dev/null; then
    echo "✓ New container healthy"
    healthy=1
    break
  fi
  sleep 5
done
if [ "$healthy" -ne 1 ]; then
  echo "✗ Health check failed after 60s; aborting deployment" >&2
  exit 1
fi
# Run migrations
docker compose -f docker-compose.prod.yml exec backend python manage.py migrate --noinput
# Scale back down (removes old container)
docker compose -f docker-compose.prod.yml up -d --no-deps --scale backend=1 backend
# Run smoke tests
./scripts/smoke-tests.sh
# Cleanup
docker image prune -f
echo "✓ Deployment complete"
SSL Configuration
SSL/TLS is terminated at the Nginx reverse proxy using a Cloudflare Origin Certificate. This provides end-to-end encryption between Cloudflare's edge network and the origin server, with additional benefits of DDoS protection and CDN caching.
The SSL certificate and private key are stored as GitHub Secrets (CLOUDFLARE_CERT and CLOUDFLARE_KEY) and deployed to the server during CI/CD. The Nginx configuration enforces TLS 1.2+ and uses modern cipher suites:
server {
    listen 443 ssl http2;
    server_name api.vantrexia.com;

    ssl_certificate     /etc/nginx/ssl/cloudflare-origin.pem;
    ssl_certificate_key /etc/nginx/ssl/cloudflare-origin.key;

    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers off;
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 1d;
    ssl_session_tickets off;

    # HSTS (handled by Cloudflare, but also set at origin)
    add_header Strict-Transport-Security "max-age=63072000" always;
}
Cost Breakdown
Vantrexia's production infrastructure is optimized for a small-to-medium RPM practice. The total monthly cost is approximately $57/month, providing a HIPAA-compliant, fully managed production environment:
| Service | Tier / Size | Purpose | Monthly Cost |
|---|---|---|---|
| EC2 | t4g.medium (2 vCPU, 4 GB RAM) | Application host (all Docker containers) | ~$30.00 |
| RDS | db.t4g.micro (2 vCPU, 1 GB RAM) | PostgreSQL 15, automated backups, encryption | ~$14.00 |
| S3 | Standard | Database backups, media files, static assets | ~$3.00 |
| CloudWatch | Basic + custom metrics | Logs, alarms, performance monitoring | ~$5.00 |
| Route 53 | Hosted zone + health checks | DNS management, failover routing | ~$5.00 |
| Total Estimated Monthly Cost | | | ~$57.00 |
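As a quick sanity check, the line items above do sum to the stated total:

```python
# Estimated monthly line items from the cost table (USD)
line_items = {
    "EC2 t4g.medium": 30.00,
    "RDS db.t4g.micro": 14.00,
    "S3": 3.00,
    "CloudWatch": 5.00,
    "Route 53": 5.00,
}
total = sum(line_items.values())  # 57.00
```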
Secrets Management
Vantrexia uses a two-tier secrets management strategy to keep credentials secure while enabling automated deployments:
- AWS Secrets Manager — Source of truth for all production secrets. Secrets are rotated on a 90-day cycle and accessed at runtime by the application.
- GitHub Secrets — CI/CD pipeline secrets used during build and deploy. Includes SSH keys, registry credentials, and deployment configuration.
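At runtime the backend reads its secrets from AWS Secrets Manager. A hedged sketch of a small caching wrapper around that lookup (in production the fetch callable would wrap boto3's Secrets Manager get_secret_value call; that integration is assumed, and a counting stub keeps the sketch self-contained):

```python
import time
from typing import Callable, Dict, Tuple

class SecretCache:
    """Cache secret lookups so each secret is fetched at most once per TTL."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 300.0):
        self._fetch = fetch          # e.g. a boto3-backed lookup in production
        self._ttl = ttl_seconds
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, name: str) -> str:
        now = time.monotonic()
        hit = self._cache.get(name)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]            # still fresh; no round trip to AWS
        value = self._fetch(name)
        self._cache[name] = (now, value)
        return value

# Stubbed fetch that records each round trip
calls = []
cache = SecretCache(lambda name: (calls.append(name), f"value-of-{name}")[1])
first = cache.get("DATABASE_URL")
second = cache.get("DATABASE_URL")  # served from cache
```

Caching keeps Secrets Manager API calls (and their cost) bounded while still letting the 90-day rotation take effect once the TTL expires.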
Key Production Secrets
| Secret Name | Store | Description |
|---|---|---|
| DATABASE_URL | AWS Secrets Manager | PostgreSQL connection string for RDS (includes credentials) |
| REDIS_URL | AWS Secrets Manager | Redis connection URI for Celery broker and caching |
| SECRET_KEY | AWS Secrets Manager | 256-bit secret key for Django session signing and CSRF tokens |
| ECW_CLIENT_ID | AWS Secrets Manager | eClinicalWorks FHIR API client identifier |
| ECW_CLIENT_SECRET | AWS Secrets Manager | eClinicalWorks API client secret for OAuth 2.0 authentication |
| CLOUDFLARE_CERT | GitHub Secrets | Cloudflare Origin Certificate (PEM) for SSL termination |
| CLOUDFLARE_KEY | GitHub Secrets | Cloudflare Origin Certificate private key |
| EC2_SSH_KEY | GitHub Secrets | SSH private key for CI/CD deployment to EC2 |
| GHCR_TOKEN | GitHub Secrets | Personal access token for pushing images to GHCR |
CI/CD logs use GitHub Actions' ::add-mask:: workflow command to redact secret values. AWS Secrets Manager access is restricted via IAM roles with least-privilege policies.
CI/CD Pipeline
The production CI/CD pipeline runs on GitHub Actions and is triggered by pushes to the main branch. The pipeline consists of five sequential stages, with automatic rollback on failure:
Lint & Test
Runs in parallel: ruff linting for Python, eslint for TypeScript/React, pytest with 85%+ coverage requirement, and vitest for frontend unit tests. Fails fast if any check does not pass.
Build Docker Images
ARM64 Docker images (linux/arm64) are built for the backend, frontend, and nginx services using Docker Buildx. Images are tagged with the Git commit SHA and latest.
Push to GHCR
Built images are pushed to GitHub Container Registry (ghcr.io/highlandpc/vantrexia/*). Image layers are cached across builds for faster subsequent deploys.
Deploy to EC2 via SSH
The pipeline SSHs into the production EC2 instance, pulls the new images, and executes the zero-downtime deployment script described above. Environment variables are written from GitHub Secrets.
Health Check & Smoke Tests
Automated post-deploy verification confirms all services are healthy. If smoke tests fail, the pipeline executes an automatic rollback to the previous image tag. Deployment status is reported to the GitHub commit and Slack.
name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run backend tests
        run: |
          cd backend
          pip install -r requirements.txt
          pytest --cov --cov-fail-under=85
      - name: Run frontend tests
        run: |
          cd frontend
          npm ci && npm run test

  build-and-push:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Login to GHCR
        run: echo "${{ secrets.GHCR_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
      - name: Build and push images
        run: |
          docker buildx build --platform linux/arm64 \
            -t ghcr.io/highlandpc/vantrexia/backend:${{ github.sha }} \
            -t ghcr.io/highlandpc/vantrexia/backend:latest \
            --push ./backend

  deploy:
    needs: build-and-push
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to EC2
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.EC2_HOST }}
          username: ubuntu
          key: ${{ secrets.EC2_SSH_KEY }}
          script: |
            cd /opt/vantrexia
            export TAG=${{ github.sha }}
            ./scripts/deploy.sh

  smoke-test:
    needs: deploy
    runs-on: ubuntu-latest
    steps:
      - name: Run smoke tests
        run: |
          curl -sf https://api.vantrexia.com/health/ || exit 1
          curl -sf https://app.vantrexia.com/ || exit 1
          echo "✓ All smoke tests passed"
Disaster Recovery
Vantrexia's disaster recovery plan is designed for a healthcare platform where data integrity and availability are critical for patient safety and HIPAA compliance.
Recovery Objectives
| Metric | Target | Description |
|---|---|---|
| RTO (Recovery Time Objective) | 4 hours | Maximum time to restore full service after a catastrophic failure |
| RPO (Recovery Point Objective) | 24 hours | Maximum acceptable data loss window (worst case: 1 day of data) |
| Availability Target | 99.9% | Allows approximately 8.77 hours of downtime per year |
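The 99.9% availability target translates directly into an annual downtime budget:

```python
def downtime_budget_hours(availability: float, hours_per_year: float = 8766.0) -> float:
    """Annual downtime allowed at a given availability target (8766 h ≈ 365.25 days)."""
    return (1.0 - availability) * hours_per_year

budget = downtime_budget_hours(0.999)  # about 8.77 hours per year
```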
Backup Strategy
| Backup Type | Frequency | Retention | Storage |
|---|---|---|---|
| RDS Automated Snapshots | Daily | 7 days | AWS RDS (same region) |
| Custom pg_dump Backups | Daily | 30 days | S3 (encrypted, versioned) |
| Weekly Full Backups | Weekly (Sunday 2 AM UTC) | 90 days | S3 (cross-region replica) |
| EBS Snapshots | Daily | 14 days | AWS EBS Snapshots |
| Audit Log Archive | Monthly | 7 years (HIPAA) | S3 Glacier Deep Archive |
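The retention windows above imply a pruning rule for each backup tier. A minimal sketch of selecting expired backups (the real job would list the S3 objects and delete the ones returned here; that part is assumed):

```python
from datetime import date, timedelta
from typing import List

def expired_backups(backups: List[date], retention_days: int, today: date) -> List[date]:
    """Return backup dates strictly older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(b for b in backups if b < cutoff)

# With the 30-day retention for pg_dump copies, a 40-day-old backup is pruned.
stale = expired_backups(
    [date(2024, 1, 1), date(2024, 2, 1), date(2024, 2, 9)],
    retention_days=30,
    today=date(2024, 2, 10),
)
```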
Recovery Procedures
- Application failure — Docker containers auto-restart via the restart: always policy. If a container fails repeatedly, CloudWatch alarms fire and the on-call engineer is notified via PagerDuty.
- EC2 instance failure — Launch a replacement t4g.medium from the latest EBS snapshot, then run docker compose up -d to restore all services. Estimated recovery: 30–60 minutes.
- Database corruption — Restore from the most recent RDS automated snapshot or S3 backup using point-in-time recovery. Estimated recovery: 1–2 hours.
- Complete region failure — Restore the S3 cross-region backup to a new RDS instance in the failover region, deploy the application stack to a new EC2 instance, and update Route 53 DNS. Estimated recovery: 2–4 hours.
Backup jobs run under the vantrexia-backup-role IAM role. Backup restoration is tested quarterly and documented in compliance audit logs.