Production Architecture Overview

Vantrexia runs on an automated CI/CD pipeline that deploys containerized services to an AWS EC2 instance. The deployment flow follows a fully automated path from code merge to production traffic:

Deployment Flow: GitHub Actions CI/CD → Build & Push to GHCR → SSH Deploy to EC2 t4g.medium (ARM64) → Docker Compose orchestration → Automated health checks & smoke tests.

The production environment runs on an AWS EC2 t4g.medium instance (ARM64 Graviton2 processor), which balances cost efficiency and performance. All services are containerized with Docker and orchestrated with Docker Compose, making the entire stack reproducible and portable.

Key architectural decisions:

  • ARM64 (Graviton2) — 20% better price-performance vs. x86 instances
  • Docker Compose — Simple orchestration without Kubernetes overhead, suitable for current scale
  • Single-host deployment — All application containers on one EC2 instance; database on managed RDS
  • GHCR (GitHub Container Registry) — Docker images stored alongside source code for unified access control

Production Stack

The production environment consists of six core services running as Docker containers, plus a managed PostgreSQL database on AWS RDS:

| Service | Technology | Role | Port |
| --- | --- | --- | --- |
| Nginx | Nginx 1.25 (Alpine) | Reverse proxy, SSL termination, static file serving, rate limiting | 80, 443 |
| Frontend | React 18 SPA (Vite build) | Provider dashboard, patient management UI, billing interface | 3000 |
| Backend | Django 4.2 / DRF 3.15 | REST API, business logic, FHIR integration, authentication | 8000 |
| Celery Worker | Celery 5.3 | Async task processing: eCW sync, notifications, report generation | n/a |
| Celery Beat | Celery 5.3 | Periodic task scheduler: billing cycles, compliance audits, data sync | n/a |
| Redis | Redis 7 (Alpine) | Celery broker, caching layer, session store, rate limiting backend | 6379 |
| PostgreSQL | AWS RDS (db.t4g.micro) | Primary database with automated backups, encryption at rest | 5432 |

Docker Compose Production

The production Docker Compose file defines all application services with health checks, resource limits, restart policies, and logging configuration. The database runs on AWS RDS and is connected via environment variables.

docker-compose.prod.yml
version: "3.8"

services:
  nginx:
    image: ghcr.io/highlandpc/vantrexia/nginx:${TAG:-latest}
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./docker/nginx/nginx.prod.conf:/etc/nginx/nginx.conf:ro
      - ./docker/nginx/ssl:/etc/nginx/ssl:ro
      - static_volume:/app/staticfiles:ro
      - media_volume:/app/mediafiles:ro
    depends_on:
      backend:
        condition: service_healthy
      frontend:
        condition: service_started
    restart: always
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  frontend:
    image: ghcr.io/highlandpc/vantrexia/frontend:${TAG:-latest}
    expose:
      - "3000"
    environment:
      - VITE_API_BASE_URL=${API_URL}
    restart: always

  backend:
    image: ghcr.io/highlandpc/vantrexia/backend:${TAG:-latest}
    expose:
      - "8000"
    env_file:
      - .env
    volumes:
      - static_volume:/app/staticfiles
      - media_volume:/app/mediafiles
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health/"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    restart: always
    depends_on:
      redis:
        condition: service_healthy

  celery:
    image: ghcr.io/highlandpc/vantrexia/backend:${TAG:-latest}
    command: celery -A config worker -l info --concurrency=4
    env_file:
      - .env
    depends_on:
      backend:
        condition: service_healthy
    restart: always

  celery-beat:
    image: ghcr.io/highlandpc/vantrexia/backend:${TAG:-latest}
    command: celery -A config beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler
    env_file:
      - .env
    depends_on:
      backend:
        condition: service_healthy
    restart: always

  redis:
    image: redis:7-alpine
    expose:
      - "6379"
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: always

volumes:
  static_volume:
  media_volume:
  redis_data:

Important: The .env file is never committed to source control. It is deployed via the CI/CD pipeline from GitHub Secrets and AWS Secrets Manager. See the Secrets Management section below.
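
As an illustration, the deploy step that renders .env on the host might look like the following sketch. The variable names mirror the secrets table in the Secrets Management section; the file path and exact set of variables are assumptions.

Rendering .env on the host (illustrative sketch)
#!/bin/bash
set -euo pipefail

# Values arrive as masked environment variables injected by the CI/CD runner.
umask 077  # new files readable by owner only
cat > /opt/vantrexia/.env <<EOF
DATABASE_URL=${DATABASE_URL}
REDIS_URL=${REDIS_URL}
SECRET_KEY=${SECRET_KEY}
ECW_CLIENT_ID=${ECW_CLIENT_ID}
ECW_CLIENT_SECRET=${ECW_CLIENT_SECRET}
EOF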

Zero-Downtime Deployment

Vantrexia uses a rolling deployment strategy to ensure zero downtime during production releases. The entire process completes in 3–5 minutes:

Step 1: Scale Up

A new container is started alongside the existing running container using docker compose up -d --no-deps --no-recreate --scale backend=2 backend. The old container continues serving traffic while the new one initializes.

Step 2: Health Check

The CI/CD pipeline polls the new container's /health/ endpoint every 5 seconds for up to 60 seconds. The health check verifies database connectivity, Redis availability, and Celery worker responsiveness.
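
A minimal poll loop equivalent to what the pipeline runs is sketched below; the per-component JSON response shape is an assumption, not a documented contract.

Health check poll (illustrative sketch)
# Poll /health/ every 5 seconds for up to 60 seconds; fail on timeout.
# Assumes a response body like {"database": "ok", "redis": "ok", "celery": "ok"}.
for i in {1..12}; do
  if curl -sf https://api.vantrexia.com/health/ \
       | jq -e '.database == "ok" and .redis == "ok" and .celery == "ok"' > /dev/null; then
    echo "✓ healthy"; exit 0
  fi
  sleep 5
done
echo "✗ health check timed out" >&2; exit 1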

Step 3: Switch Traffic

Nginx upstream configuration is reloaded to route all new requests to the healthy new container. In-flight requests on the old container are allowed to complete with a 30-second drain period.
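
In practice the switch is a graceful reload of the nginx container: nginx lets old worker processes finish in-flight requests before shutting them down. A sketch, assuming the compose service names used above:

Nginx reload (illustrative sketch)
# Validate the new configuration, then reload without dropping connections.
docker compose -f docker-compose.prod.yml exec nginx nginx -t
docker compose -f docker-compose.prod.yml exec nginx nginx -s reload
# The 30-second drain can be enforced in nginx.conf with:
#   worker_shutdown_timeout 30s;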

Step 4: Run Migrations

Database migrations are executed against production RDS: python manage.py migrate --noinput. Migrations are designed to be backward-compatible so the old container can continue processing during this step.

Step 5: Smoke Tests

Five automated smoke tests validate the deployment (a sketch of the script follows the list):

  1. Health endpoint — GET /health/ returns 200 OK
  2. Auth flow — Token acquisition and refresh cycle completes
  3. Patient API — GET /api/v1/patients/ returns a valid JSON response
  4. Static assets — Frontend index.html loads with correct bundle hash
  5. SSL verification — HTTPS certificate is valid and not expiring within 30 days
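
A simplified sketch of scripts/smoke-tests.sh consistent with these five checks; the auth endpoint path, test credentials, token response shape, and frontend root-element check are assumptions.

scripts/smoke-tests.sh (illustrative sketch)
#!/bin/bash
set -euo pipefail
API="https://api.vantrexia.com"

# 1. Health endpoint returns 200
curl -sf "$API/health/" > /dev/null

# 2. Auth flow: acquire a token (endpoint path and credentials are assumptions;
#    the refresh step is elided)
TOKEN=$(curl -sf -X POST "$API/api/v1/auth/token/" \
  -d "username=$SMOKE_USER" -d "password=$SMOKE_PASS" | jq -r '.access')

# 3. Patient API returns parseable JSON
curl -sf -H "Authorization: Bearer $TOKEN" "$API/api/v1/patients/" | jq -e . > /dev/null

# 4. Frontend index.html loads (root element is an assumption)
curl -sf https://app.vantrexia.com/ | grep -q '<div id="root">'

# 5. Certificate is valid and not expiring within 30 days (2,592,000 seconds)
echo | openssl s_client -connect api.vantrexia.com:443 -servername api.vantrexia.com 2>/dev/null \
  | openssl x509 -noout -checkend 2592000

echo "✓ All smoke tests passed"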

Step 6: Cleanup

The old container is stopped and removed. Unused Docker images are pruned to free disk space. Total deployment time: 3–5 minutes.

deploy.sh (simplified)
#!/bin/bash
set -euo pipefail

TAG="${GITHUB_SHA:-latest}"

# Pull latest images
docker compose -f docker-compose.prod.yml pull

# Scale up: start a new container alongside the old one
# (--no-recreate keeps the old replica running on the previous image)
docker compose -f docker-compose.prod.yml up -d --no-deps --no-recreate --scale backend=2 backend

# Wait for the new replica to pass its health check. The backend port is not
# published on the host, so probe from inside the container; the replica
# started by the scale-up ordinarily gets index 2.
healthy=false
for i in {1..12}; do
  if docker compose -f docker-compose.prod.yml exec -T --index 2 backend \
       curl -sf http://localhost:8000/health/ > /dev/null; then
    echo "✓ New container healthy"
    healthy=true
    break
  fi
  sleep 5
done

if [ "$healthy" = false ]; then
  echo "✗ New container failed health checks; aborting deployment" >&2
  exit 1
fi

# Run migrations (backward-compatible, so the old replica keeps serving)
docker compose -f docker-compose.prod.yml exec -T --index 2 backend \
  python manage.py migrate --noinput

# Scale back down (removes old container)
docker compose -f docker-compose.prod.yml up -d --no-deps --scale backend=1 backend

# Run smoke tests
./scripts/smoke-tests.sh

# Cleanup
docker image prune -f
echo "✓ Deployment complete"

SSL Configuration

SSL/TLS is terminated at the Nginx reverse proxy using a Cloudflare Origin Certificate. This encrypts the leg between Cloudflare's edge network and the origin server, with the additional benefits of DDoS protection and CDN caching.

Certificate Strategy: Cloudflare Origin Certificates are valid for up to 15 years and are automatically trusted by Cloudflare's edge. They are not trusted by browsers directly, ensuring that all traffic must pass through Cloudflare's protection layer.

The SSL certificate and private key are stored as GitHub Secrets (CLOUDFLARE_ORIGIN_CERT and CLOUDFLARE_ORIGIN_KEY) and deployed to the server during CI/CD. The Nginx configuration enforces TLS 1.2+ and uses modern cipher suites:

nginx SSL configuration
server {
    listen 443 ssl;
    http2 on;
    server_name api.vantrexia.com;

    ssl_certificate     /etc/nginx/ssl/cloudflare-origin.pem;
    ssl_certificate_key /etc/nginx/ssl/cloudflare-origin.key;

    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers off;

    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 1d;
    ssl_session_tickets off;

    # HSTS (handled by Cloudflare, but also set at origin)
    add_header Strict-Transport-Security "max-age=63072000" always;
}
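
The CI/CD step that materializes the certificate files referenced above might look like the following sketch; the paths match the nginx volume mount in docker-compose.prod.yml, and the host directory is an assumption.

Writing certificate files during deploy (illustrative sketch)
# Secret values are injected by the pipeline as environment variables.
install -d -m 700 /opt/vantrexia/docker/nginx/ssl
printf '%s\n' "$CLOUDFLARE_ORIGIN_CERT" > /opt/vantrexia/docker/nginx/ssl/cloudflare-origin.pem
printf '%s\n' "$CLOUDFLARE_ORIGIN_KEY"  > /opt/vantrexia/docker/nginx/ssl/cloudflare-origin.key
chmod 600 /opt/vantrexia/docker/nginx/ssl/cloudflare-origin.key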

Cost Breakdown

Vantrexia's production infrastructure is optimized for a small-to-medium RPM practice. The total monthly cost is approximately $57/month, providing a HIPAA-compliant, fully managed production environment:

| Service | Tier / Size | Purpose | Monthly Cost |
| --- | --- | --- | --- |
| EC2 | t4g.medium (2 vCPU, 4 GB RAM) | Application host (all Docker containers) | ~$30.00 |
| RDS | db.t4g.micro (2 vCPU, 1 GB RAM) | PostgreSQL 15, automated backups, encryption | ~$14.00 |
| S3 | Standard | Database backups, media files, static assets | ~$3.00 |
| CloudWatch | Basic + custom metrics | Logs, alarms, performance monitoring | ~$5.00 |
| Route 53 | Hosted zone + health checks | DNS management, failover routing | ~$5.00 |
| Total Estimated Monthly Cost | | | ~$57.00 |

Cost Optimization: The ARM64 Graviton2 instance (t4g) provides approximately 20% better price-performance compared to equivalent x86 instances (t3). Additional savings come from using Cloudflare's free tier for CDN and DDoS protection, and GHCR's free storage for public container images.

Secrets Management

Vantrexia uses a two-tier secrets management strategy to keep credentials secure while enabling automated deployments:

  • AWS Secrets Manager — Source of truth for all production secrets. Secrets are rotated on a 90-day cycle and accessed at runtime by the application (see the sketch after this list).
  • GitHub Secrets — CI/CD pipeline secrets used during build and deploy. Includes SSH keys, registry credentials, and deployment configuration.
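
A runtime fetch with the AWS CLI, for illustration; the application would use the SDK equivalent, and the secret path shown is an assumption.

Fetching a secret at runtime (illustrative sketch)
# Retrieve one secret value; access is governed by the instance's IAM role.
aws secretsmanager get-secret-value \
  --secret-id vantrexia/production/DATABASE_URL \
  --query SecretString --output text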

Key Production Secrets

| Secret Name | Store | Description |
| --- | --- | --- |
| DATABASE_URL | AWS Secrets Manager | PostgreSQL connection string for RDS (includes credentials) |
| REDIS_URL | AWS Secrets Manager | Redis connection URI for Celery broker and caching |
| SECRET_KEY | AWS Secrets Manager | 256-bit secret key for Django session signing and CSRF tokens |
| ECW_CLIENT_ID | AWS Secrets Manager | eClinicalWorks FHIR API client identifier |
| ECW_CLIENT_SECRET | AWS Secrets Manager | eClinicalWorks API client secret for OAuth 2.0 authentication |
| CLOUDFLARE_ORIGIN_CERT | GitHub Secrets | Cloudflare Origin Certificate (PEM) for SSL termination |
| CLOUDFLARE_ORIGIN_KEY | GitHub Secrets | Cloudflare Origin Certificate private key |
| EC2_SSH_KEY | GitHub Secrets | SSH private key for CI/CD deployment to EC2 |
| GHCR_TOKEN | GitHub Secrets | Personal access token for pushing images to GHCR |

Security: Never log, print, or expose secrets in CI/CD output. All GitHub Actions workflows use ::add-mask:: to redact secret values. AWS Secrets Manager access is restricted via IAM roles with least-privilege policies.
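
Masking in a workflow step looks like this; the secret path is the same illustrative one used above.

Masking a fetched secret (illustrative sketch)
# Register the value with the runner so it is redacted from all later log output
DB_URL=$(aws secretsmanager get-secret-value \
  --secret-id vantrexia/production/DATABASE_URL \
  --query SecretString --output text)
echo "::add-mask::$DB_URL"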

CI/CD Pipeline

The production CI/CD pipeline runs on GitHub Actions and is triggered by pushes to the main branch. The pipeline consists of five sequential stages, with automatic rollback on failure:

Stage 1: Lint & Test

Runs in parallel: ruff linting for Python, eslint for TypeScript/React, pytest with 85%+ coverage requirement, and vitest for frontend unit tests. Fails fast if any check does not pass.

Stage 2: Build Docker Images

Docker images are cross-built for linux/arm64 (backend, frontend, and nginx services) using Docker Buildx. Images are tagged with the Git commit SHA and latest.

Stage 3: Push to GHCR

Built images are pushed to GitHub Container Registry (ghcr.io/highlandpc/vantrexia/*). Image layers are cached across builds for faster subsequent deploys.

Stage 4: Deploy to EC2 via SSH

The pipeline SSHs into the production EC2 instance, pulls the new images, and executes the zero-downtime deployment script described above. Environment variables are written from GitHub Secrets.

Stage 5: Health Check & Smoke Tests

Automated post-deploy verification confirms all services are healthy. If smoke tests fail, the pipeline executes an automatic rollback to the previous image tag. Deployment status is reported to the GitHub commit and Slack.
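
The rollback path is not shown in the workflow excerpt below; on the server it amounts to something like the following sketch, assuming the pipeline records the previously deployed tag.

Rollback (illustrative sketch)
# PREVIOUS_TAG is recorded by the pipeline before each deploy (assumption).
export TAG="$PREVIOUS_TAG"
docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d
./scripts/smoke-tests.sh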

.github/workflows/deploy-production.yml (key steps)
name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run backend tests
        run: |
          cd backend
          pip install -r requirements.txt
          pytest --cov --cov-fail-under=85
      - name: Run frontend tests
        run: |
          cd frontend
          npm ci && npm run test

  build-and-push:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # QEMU + Buildx are required to cross-build linux/arm64 images on x86 runners
      - uses: docker/setup-qemu-action@v3
      - uses: docker/setup-buildx-action@v3
      - name: Login to GHCR
        run: echo "${{ secrets.GHCR_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
      - name: Build and push images
        run: |
          docker buildx build --platform linux/arm64 \
            -t ghcr.io/highlandpc/vantrexia/backend:${{ github.sha }} \
            -t ghcr.io/highlandpc/vantrexia/backend:latest \
            --push ./backend

  deploy:
    needs: build-and-push
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to EC2
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.EC2_HOST }}
          username: ubuntu
          key: ${{ secrets.EC2_SSH_KEY }}
          script: |
            cd /opt/vantrexia
            export TAG=${{ github.sha }}
            ./scripts/deploy.sh

  smoke-test:
    needs: deploy
    runs-on: ubuntu-latest
    steps:
      - name: Run smoke tests
        run: |
          curl -sf https://api.vantrexia.com/health/ || exit 1
          curl -sf https://app.vantrexia.com/ || exit 1
          echo "✓ All smoke tests passed"

Disaster Recovery

Vantrexia's disaster recovery plan is designed for a healthcare platform where data integrity and availability are critical for patient safety and HIPAA compliance.

Recovery Objectives

| Metric | Target | Description |
| --- | --- | --- |
| RTO (Recovery Time Objective) | 4 hours | Maximum time to restore full service after a catastrophic failure |
| RPO (Recovery Point Objective) | 24 hours | Maximum acceptable data loss window (worst case: 1 day of data) |
| Availability Target | 99.9% | Allows approximately 8.77 hours of downtime per year |

Backup Strategy

| Backup Type | Frequency | Retention | Storage |
| --- | --- | --- | --- |
| RDS Automated Snapshots | Daily | 7 days | AWS RDS (same region) |
| Custom pg_dump Backups | Daily | 30 days | S3 (encrypted, versioned) |
| Weekly Full Backups | Weekly (Sunday 2 AM UTC) | 90 days | S3 (cross-region replica) |
| EBS Snapshots | Daily | 14 days | AWS EBS Snapshots |
| Audit Log Archive | Monthly | 7 years (HIPAA) | S3 Glacier Deep Archive |
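
The custom pg_dump job might look like the following cron-driven sketch; the bucket name and key layout are placeholders.

Daily pg_dump to S3 (illustrative sketch)
# Stream a compressed custom-format dump straight to S3 with server-side encryption.
STAMP=$(date -u +%Y-%m-%dT%H-%M)
pg_dump "$DATABASE_URL" --format=custom \
  | aws s3 cp - "s3://vantrexia-backups/pg/vantrexia-$STAMP.dump" --sse AES256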

Recovery Procedures

  1. Application failure — Docker containers auto-restart via restart: always policy. If a container fails repeatedly, CloudWatch alarms trigger and the on-call engineer is notified via PagerDuty.
  2. EC2 instance failure — Launch a replacement t4g.medium from the latest EBS snapshot. Run docker compose up -d to restore all services. Estimated recovery: 30–60 minutes.
  3. Database corruption — Restore from the most recent RDS automated snapshot or S3 backup using point-in-time recovery (see the sketch after this list). Estimated recovery: 1–2 hours.
  4. Complete region failure — Restore S3 cross-region backup to a new RDS instance in the failover region. Deploy application stack to a new EC2 instance. Update Route 53 DNS. Estimated recovery: 2–4 hours.
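
For the database-corruption scenario, a point-in-time restore can be driven from the AWS CLI; the instance identifiers below are placeholders.

RDS point-in-time restore (illustrative sketch)
# Create a restored instance from the latest restorable point of the source
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier vantrexia-prod \
  --target-db-instance-identifier vantrexia-prod-restored \
  --use-latest-restorable-time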

HIPAA Requirement: All backups are encrypted at rest using AES-256. S3 buckets enforce server-side encryption (SSE-S3) and have versioning enabled. Access to backup buckets is restricted to the vantrexia-backup-role IAM role. Backup restoration is tested quarterly and documented in compliance audit logs.