OpenEdX on AWS EKS - Complete Production Deployment Guide


Al Nafi Assessment | Battle-Tested | Zero-Debugging

Status: Production-Ready ✅
Platform: AWS EKS (Kubernetes 1.34)
OpenEdX: Tutor 21.0.1 (Latest)
Domain: Your-domain.com (replace throughout)
Deployment Time: 4-7 hours (with debugging)
Monthly Cost: ~$270


📋 Table of Contents

  1. What You'll Build

  2. Architecture

  3. Why These Choices?

  4. Prerequisites

  5. PART 0: Environment Setup

  6. PART 1: EKS Cluster

  7. PART 2: MySQL Database (RDS)

  8. PART 3: MongoDB (EC2)

  9. PART 4: Redis & OpenSearch

  10. PART 5: Storage (S3 + EBS)

  11. PART 6: Deploy OpenEdX

  12. PART 7: Nginx Ingress

  13. PART 8: SSL/TLS (cert-manager)

  14. PART 9: CloudFront + WAF

  15. PART 10: Monitoring (Prometheus/Grafana)

  16. PART 11: HPA & Scaling

  17. PART 12: DNS Configuration

  18. Verification & Testing

  19. Backup Strategy

  20. Troubleshooting Guide

  21. Deliverables Checklist


What You'll Build

A production-grade OpenEdX Learning Management System with:

✅ Core Platform

  • AWS EKS 1.34 (latest Kubernetes)

  • OpenEdX Tutor 21.0.1 (latest stable)

  • 3-node cluster (t3.medium) with auto-scaling

✅ External Databases (All outside Kubernetes)

  • MySQL 8.0.45 (RDS) - Application data

  • MongoDB 8.0 (EC2 t2.medium) - Course content

  • Redis 7.1 (ElastiCache) - Caching

  • OpenSearch 2.11 - Search & analytics

✅ Web & Security

  • Nginx Ingress (replaces Caddy) with HTTP/2

  • Let's Encrypt SSL/TLS (cert-manager)

  • AWS CloudFront CDN for static files

  • AWS WAF with DDoS protection

✅ Operations

  • Horizontal Pod Autoscaling (HPA)

  • Prometheus + Grafana monitoring

  • Centralized logging

  • Automated backups

  • Health probes on all services


Architecture

┌────────────────────────────────────────────────────────────────────┐
│                          SECURITY LAYER                            │
│    Cloudflare DNS → AWS WAF (us-east-1) → CloudFront (S3)          │
└────────────────────────────────────────────────────────────────────┘
                                 ↓
┌────────────────────────────────────────────────────────────────────┐
│                          INGRESS LAYER                             │
│   AWS NLB → Nginx Ingress Controller (HTTP/2, TLS termination)     │
│              cert-manager (Let's Encrypt SSL)                      │
└────────────────────────────────────────────────────────────────────┘
                                 ↓
┌────────────────────────────────────────────────────────────────────┐
│                       APPLICATION LAYER (EKS)                      │
│  Namespace: openedx                                                │
│  ┌──────────┬──────────┬────────────┬──────────┐                   │
│  │   LMS    │   CMS    │   Workers  │   MFE    │                   │
│  │ (2-5)    │ (1-3)    │  (1 each)  │  (1)     │                   │
│  │ HPA      │ HPA      │            │          │                   │
│  └──────────┴──────────┴────────────┴──────────┘                   │
└────────────────────────────────────────────────────────────────────┘
                                 ↓
┌────────────────────────────────────────────────────────────────────┐
│                      DATA LAYER (External)                         │
│  ┌────────────┬─────────────┬───────────┬──────────────┐           │
│  │ MySQL RDS  │ MongoDB EC2 │ Redis     │ OpenSearch   │           │
│  │ 8.0.45     │ 8.0         │ 7.1       │ 2.11         │           │
│  │ db.t3.med  │ t2.medium   │ t3.micro  │ t3.small     │           │
│  └────────────┴─────────────┴───────────┴──────────────┘           │
└────────────────────────────────────────────────────────────────────┘
                                 ↓
┌────────────────────────────────────────────────────────────────────┐
│                         STORAGE LAYER                              │
│  S3 Bucket (Static Files) | EBS gp3 Volumes (PV/PVC)               │
└────────────────────────────────────────────────────────────────────┘

Traffic Flow:

User Request
    ↓
Cloudflare DNS (resolves domain)
    ↓
AWS WAF (security checks)
    ↓
CloudFront (serves static files from S3)
    ↓
AWS Network Load Balancer
    ↓
Nginx Ingress Controller (TLS termination, HTTP/2)
    ↓
OpenEdX Pods (LMS/CMS/MFE based on hostname)
    ↓
External Databases (MySQL/MongoDB/Redis/OpenSearch)

Why These Choices?

External Databases (NOT in Kubernetes)

Why: Databases need persistence and backups. Managed services provide:

  • Automated backups and point-in-time recovery

  • Managed updates and patching

  • Better performance isolation

  • Easier scaling

  • No risk of data loss if pods crash

MongoDB on EC2 (not Atlas)

Why:

  • Single EC2 instance simpler than Atlas setup

  • Full control over configuration

  • No external dependencies

  • Cost-effective for learning platform

  • Easy to backup (EBS snapshots)

Nginx over Caddy

Why:

  • Industry standard with extensive documentation

  • Better performance for high traffic

  • More control over SSL/TLS configuration

  • HTTP/2 support out of the box

  • Requirement from Al Nafi JD

cert-manager for SSL

Why:

  • Automated Let's Encrypt certificate management

  • Auto-renewal before expiry

  • Industry standard for Kubernetes SSL

  • Free SSL certificates

gp3 over gp2 Storage

Why:

  • Same or lower cost

  • 3000 baseline IOPS (vs gp2's 3 IOPS/GB)

  • Better performance for databases

  • 125 MiB/s baseline throughput
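To make the IOPS comparison concrete, here is a small sketch of the gp2 baseline formula: 3 IOPS per GiB, with AWS's floor of 100 and cap of 16,000. The function name gp2_baseline_iops is made up for illustration and is not part of any AWS tool.

```shell
# Hypothetical helper: baseline IOPS of a gp2 volume of the given size (GiB).
# gp2 earns 3 IOPS per GiB, never less than 100 and never more than 16000.
gp2_baseline_iops() {
  size_gib=$1
  iops=$((size_gib * 3))
  if [ "$iops" -lt 100 ]; then iops=100; fi
  if [ "$iops" -gt 16000 ]; then iops=16000; fi
  echo "$iops"
}

# The 20 GiB RDS volume used later in this guide:
echo "gp2 at 20 GiB: $(gp2_baseline_iops 20) IOPS (gp3 baseline: 3000)"
# prints: gp2 at 20 GiB: 100 IOPS (gp3 baseline: 3000)
```

In other words, a gp2 volume would have to be 1000 GiB before its baseline matched what gp3 gives a 20 GiB volume.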

Tutor 21.0.1 (Latest)

Why:

  • Latest features and security patches

  • Better MFE (Micro Frontend) support

  • Improved performance

  • Active community support


Prerequisites

AWS Account

  • Admin access or PowerUser + IAM permissions

  • Credit card for AWS services (~$270/month)

  • Service limits:

    • 3 t3.medium EC2 instances (EKS nodes)

    • 1 db.t3.medium RDS instance

    • 1 t2.medium EC2 instance (MongoDB)

Domain Name

  • Any domain registrar (Namecheap, GoDaddy, etc.)

  • Will configure with Cloudflare (free account)

  • Example: yourdomain.com

Local Machine

  • Ubuntu 22.04 (or similar Linux)

  • 4GB RAM minimum

  • 20GB free disk space

  • Stable internet connection

Skills Needed

  • Basic Linux command line

  • Basic understanding of Kubernetes concepts

  • AWS console navigation

  • Copy-paste ability (most important!)

Time

  • Setup: 30 minutes

  • Deployment: 2-3 hours

  • Configuration: 30 minutes

  • Total: 3-4 hours (with breaks); allow 4-7 hours if you need to debug


PART 0: Environment Setup

What This Does

Creates a persistent configuration file that holds every deployment variable and survives terminal restarts. This was the #1 issue we solved: without it, you lose all variables when the terminal closes!

Step 0.1: Create Persistent Config File

Run on your Ubuntu machine:

# Create config directory
mkdir -p ~/.openedx-config
chmod 700 ~/.openedx-config

# Create the config file with all variables
cat > ~/.openedx-config/settings.sh <<'EOF'
#!/bin/bash

# AWS Configuration
export AWS_REGION="us-east-1"
export AWS_ACCOUNT_ID=""
export PROJECT_NAME="openedx-prod"

# Domain Configuration (CHANGE THESE!)
export DOMAIN="yourdomain.com"
export STUDIO_DOMAIN="studio.yourdomain.com"
export MFE_DOMAIN="apps.yourdomain.com"
export CDN_DOMAIN="cdn.yourdomain.com"

# Admin Email (CHANGE THIS!)
export ADMIN_EMAIL="your-email@example.com"

# Auto-generated Passwords (will be filled during deployment)
export MYSQL_PASSWORD=""
export MONGO_PASSWORD=""

# Infrastructure IDs (will be filled during deployment)
export VPC_ID=""
export EKS_CLUSTER_NAME="openedx-prod"
export MYSQL_HOST=""
export MONGO_HOST=""
export MONGO_IP=""
export MONGO_INSTANCE_ID=""
export REDIS_HOST=""
export OPENSEARCH_HOST=""
export S3_BUCKET_NAME=""
export CLOUDFRONT_URL=""
export CLOUDFRONT_ID=""
export WAF_ARN=""
export LB_HOSTNAME=""
EOF

# Make it secure (only you can read/write)
chmod 600 ~/.openedx-config/settings.sh

# Add auto-load to your shell
echo 'source ~/.openedx-config/settings.sh 2>/dev/null' >> ~/.bashrc

# Load it now
source ~/.openedx-config/settings.sh

echo "✅ Persistent config created at ~/.openedx-config/settings.sh"
echo "⚠️  IMPORTANT: Edit this file and change DOMAIN and ADMIN_EMAIL!"

Before proceeding, edit the config file:

nano ~/.openedx-config/settings.sh

Change these lines:

export DOMAIN="yourdomain.com"          # Your actual domain
export STUDIO_DOMAIN="studio.yourdomain.com"
export MFE_DOMAIN="apps.yourdomain.com"
export CDN_DOMAIN="cdn.yourdomain.com"
export ADMIN_EMAIL="your-email@example.com"  # Your email

Save and exit (Ctrl+X, Y, Enter).

Why this matters: Every variable is stored here. If your terminal crashes or you log out, just run source ~/.openedx-config/settings.sh and everything is back!
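The sed commands used throughout this guide only fill in a variable while it is still empty (""). As a rough sketch, a helper like the one below could set a variable whether it is empty, already filled in, or missing. The name save_config_var and the CONFIG_FILE variable are assumptions for illustration, not something Tutor or the AWS CLI provides.

```shell
# Hypothetical helper: set NAME="value" in the config file, whether the
# variable is currently empty, already set, or not present at all.
CONFIG_FILE="${CONFIG_FILE:-$HOME/.openedx-config/settings.sh}"

save_config_var() {
  name=$1
  value=$2
  if grep -q "^export ${name}=" "$CONFIG_FILE" 2>/dev/null; then
    # Rewrite the existing line. '|' is the sed delimiter, so this sketch
    # assumes values contain no '|', '&' or backslash characters.
    tmp=$(mktemp)
    sed "s|^export ${name}=.*|export ${name}=\"${value}\"|" "$CONFIG_FILE" > "$tmp"
    cat "$tmp" > "$CONFIG_FILE"   # keeps the original file's permissions
    rm -f "$tmp"
  else
    echo "export ${name}=\"${value}\"" >> "$CONFIG_FILE"
  fi
}

# Usage (example value only):
#   save_config_var VPC_ID "vpc-0123456789abcdef0"
#   source "$CONFIG_FILE"
```

Because the helper rewrites the whole line, running a step twice leaves the latest value in the file instead of silently keeping the first one.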

Step 0.2: Install Required Tools

What This Does: Installs all the command-line tools we'll need: AWS CLI, kubectl, eksctl, Helm, and Tutor.

#!/bin/bash
set -e

echo "Installing AWS CLI..."
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip -q awscliv2.zip
sudo ./aws/install --update
rm -rf aws awscliv2.zip

echo "Installing kubectl..."
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install kubectl /usr/local/bin/
rm kubectl

echo "Installing eksctl..."
curl -sLO "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz"
tar -xzf eksctl_*.tar.gz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
rm eksctl_*.tar.gz

echo "Installing Helm..."
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

echo "Installing Tutor 21.0.1..."
sudo apt update && sudo apt install -y python3-pip python3-venv
python3 -m pip install --user --upgrade pip
python3 -m pip install --user "tutor[full]==21.0.1"

# Add Tutor to PATH
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
export PATH="$HOME/.local/bin:$PATH"

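# "tutor k8s" ships with Tutor core, so there is no separate plugin to enable.
# This just confirms the subcommand is available.
tutor k8s --help > /dev/null

echo "✅ All tools installed successfully!"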
echo ""
echo "Verify installations:"
aws --version
kubectl version --client
eksctl version
helm version
tutor --version

Verify output shows:

  • AWS CLI: aws-cli/2.x.x

  • kubectl: v1.33+ (keep the client within one minor version of the 1.34 cluster)

  • eksctl: 0.x.x

  • Helm: v3.x.x

  • Tutor: 21.0.1
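If you would rather not eyeball five version strings, a quick presence check can be scripted. This is only a sketch: check_tools is a made-up helper name, and it confirms that each command is on your PATH, not that the version is right.

```shell
# Hypothetical helper: print every command from the argument list that is
# not on PATH. Returns non-zero if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "MISSING: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_tools aws kubectl eksctl helm tutor && echo "All tools found"
```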

Step 0.3: Configure AWS Credentials

What This Does: Connects your terminal to your AWS account.

# Configure AWS CLI
aws configure

# You'll be prompted for:
# AWS Access Key ID: (paste from AWS Console → IAM → Security Credentials)
# AWS Secret Access Key: (paste from AWS Console)
# Default region: us-east-1
# Default output format: json

# Test connection
aws sts get-caller-identity

# Should show your AWS Account ID and user ARN

Save your Account ID to config:

source ~/.openedx-config/settings.sh

sed -i "s/export AWS_ACCOUNT_ID=\"\"/export AWS_ACCOUNT_ID=\"$(aws sts get-caller-identity --query Account --output text)\"/" ~/.openedx-config/settings.sh

source ~/.openedx-config/settings.sh

echo "AWS Account ID: $AWS_ACCOUNT_ID"

Step 0.4: Create Project Structure

What This Does: Organizes all our files in a clean structure.

mkdir -p ~/openedx-project/{k8s,scripts,docs,evidence}
cd ~/openedx-project

echo "✅ Project structure created at ~/openedx-project/"
# "tree" is not installed by default on Ubuntu, so list the folders with ls
ls -R ~/openedx-project/

PART 1: EKS Cluster

What This Does

Creates a managed Kubernetes cluster on AWS with 3 worker nodes. This is where all OpenEdX pods will run. Uses EKS 1.34 (latest version as of Feb 2026).

Why 3 nodes?

  • High availability (if one node fails, others continue)

  • Resource distribution for LMS, CMS, and workers

  • Allows HPA (Horizontal Pod Autoscaling) to work properly

Step 1.1: Create EKS Cluster

This takes 15-20 minutes. AWS is creating VPC, subnets, security groups, and Kubernetes control plane.

source ~/.openedx-config/settings.sh

echo "Creating EKS 1.34 cluster (15-20 min)..."
echo "Cluster name: $EKS_CLUSTER_NAME"
echo "Region: $AWS_REGION"

eksctl create cluster \
  --name $EKS_CLUSTER_NAME \
  --region $AWS_REGION \
  --version 1.34 \
  --nodegroup-name openedx-workers \
  --node-type t3.medium \
  --nodes 3 \
  --nodes-min 2 \
  --nodes-max 5 \
  --managed \
  --with-oidc

echo "✅ EKS cluster created!"

What each flag does:

  • --version 1.34: Latest Kubernetes (released late 2025)

  • --node-type t3.medium: 2 vCPU, 4GB RAM per node (right size for OpenEdX)

  • --nodes 3: Start with 3 nodes

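  • --nodes-min 2: Minimum size of the node group

  • --nodes-max 5: Maximum size of the node group (nodes are only added or removed automatically once a node autoscaler such as Cluster Autoscaler or Karpenter is installed)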

  • --managed: AWS manages OS updates and patching

  • --with-oidc: Enables IAM roles for service accounts (needed for S3 access)
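The same settings can also live in an eksctl config file, which is easier to keep in version control than a long command line. The sketch below assumes eksctl's v1alpha5 ClusterConfig schema; the file name cluster.yaml is arbitrary, and the fallback values are simply this guide's defaults.

```shell
# Write an eksctl ClusterConfig that mirrors the flags above.
# Uses EKS_CLUSTER_NAME and AWS_REGION if set, else this guide's defaults.
cat > cluster.yaml <<EOF
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${EKS_CLUSTER_NAME:-openedx-prod}
  region: ${AWS_REGION:-us-east-1}
  version: "1.34"
iam:
  withOIDC: true
managedNodeGroups:
  - name: openedx-workers
    instanceType: t3.medium
    desiredCapacity: 3
    minSize: 2
    maxSize: 5
EOF

# Then, instead of the long command line:
#   eksctl create cluster -f cluster.yaml
```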

Step 1.2: Save VPC Information

What This Does: Gets the VPC ID created by EKS and saves it for database configuration.

source ~/.openedx-config/settings.sh

# Get VPC ID
VPC_ID=$(aws eks describe-cluster \
  --name $EKS_CLUSTER_NAME \
  --region $AWS_REGION \
  --query "cluster.resourcesVpcConfig.vpcId" \
  --output text)

# Save to config
sed -i "s|export VPC_ID=\"\"|export VPC_ID=\"$VPC_ID\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh

echo "✅ VPC ID: $VPC_ID"

Step 1.3: Create OpenEdX Namespace

What This Does: Creates isolated namespace for all OpenEdX components.

kubectl create namespace openedx

echo "✅ Namespace created"
kubectl get namespaces

Verification

# Check cluster is ready
kubectl get nodes

# Should show 3 nodes in "Ready" status:
# NAME                         STATUS   ROLES    AGE   VERSION
# ip-xxx.ec2.internal          Ready    <none>   5m    v1.34.x
# ip-yyy.ec2.internal          Ready    <none>   5m    v1.34.x
# ip-zzz.ec2.internal          Ready    <none>   5m    v1.34.x

# Check namespace
kubectl get ns openedx
# Should show: openedx   Active   1m
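The node check can be turned into a number you can compare against. This is a minimal sketch: count_ready_nodes is a hypothetical helper that reads the output of kubectl get nodes --no-headers on stdin and counts rows whose STATUS column is exactly "Ready".

```shell
# Hypothetical helper: count nodes whose STATUS column is exactly "Ready".
# Expects "kubectl get nodes --no-headers" style rows on stdin.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage against a live cluster:
#   ready=$(kubectl get nodes --no-headers | count_ready_nodes)
#   [ "$ready" -ge 3 ] && echo "All 3 nodes are Ready"
```

Nodes in a state such as "Ready,SchedulingDisabled" are deliberately not counted, since pods cannot be scheduled onto them.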

Screenshot for evidence: Take screenshot of kubectl get nodes output.


PART 2: MySQL Database (RDS)

What This Does

Creates a managed MySQL database for OpenEdX application data (users, courses, enrollments, grades). Uses RDS (a managed service) for automatic backups and patching. Note that the instance created below is single-AZ; high availability would additionally require enabling Multi-AZ.

Why RDS?

  • Automated backups: Daily snapshots + 1-day retention

  • Managed updates: AWS handles security patches

  • Better performance: Dedicated instance, not competing with pods

  • Disaster recovery: Easy point-in-time restore

Critical Lessons Learned

  1. Must create DB subnet group first (or you get "InvalidSubnet" error)

  2. MySQL needs TWO sets of credentials:

    • admin user (for migrations and admin tasks)

    • openedx user (for application)

  3. Tutor requires root credentials: Set MYSQL_ROOT_USERNAME and MYSQL_ROOT_PASSWORD

Step 2.1: Generate MySQL Password

What This Does: Creates a strong random password for MySQL.

source ~/.openedx-config/settings.sh

# Generate a 24-character password (letters and numbers only).
# Only generate when none is saved yet, so a re-run cannot drift from the stored value.
if [ -z "$MYSQL_PASSWORD" ]; then
  MYSQL_PASSWORD=$(openssl rand -base64 48 | tr -dc 'a-zA-Z0-9' | head -c 24)

  # Save to config
  sed -i "s|export MYSQL_PASSWORD=\"\"|export MYSQL_PASSWORD=\"$MYSQL_PASSWORD\"|" ~/.openedx-config/settings.sh
  source ~/.openedx-config/settings.sh
fi

echo "✅ MySQL password generated and saved"
echo "Password: $MYSQL_PASSWORD"
echo "⚠️  Save this password securely!"

Step 2.2: Configure Security Groups

What This Does: Allows EKS pods to connect to MySQL on port 3306.

source ~/.openedx-config/settings.sh

# Get security groups (recalculate - don't trust memory!)
DEFAULT_SG=$(aws ec2 describe-security-groups \
  --filters "Name=vpc-id,Values=$VPC_ID" "Name=group-name,Values=default" \
  --region $AWS_REGION \
  --query 'SecurityGroups[0].GroupId' --output text)

EKS_SG=$(aws eks describe-cluster \
  --name $EKS_CLUSTER_NAME \
  --region $AWS_REGION \
  --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" \
  --output text)

echo "Security Groups:"
echo "  Default SG: $DEFAULT_SG"
echo "  EKS SG: $EKS_SG"

# Allow MySQL traffic from EKS to Default SG
aws ec2 authorize-security-group-ingress \
  --group-id $DEFAULT_SG \
  --protocol tcp \
  --port 3306 \
  --source-group $EKS_SG \
  --region $AWS_REGION 2>/dev/null || echo "Rule already exists"

echo "✅ MySQL port 3306 opened for EKS"

Step 2.3: Create DB Subnet Group (CRITICAL!)

What This Does: Tells RDS which subnets it can use. Without this, you get "InvalidSubnet" error!

Why: eksctl creates a dedicated VPC for the cluster, and that VPC has no DB subnet group. RDS needs one created explicitly.

source ~/.openedx-config/settings.sh

# Get private subnets (recalculate each time!)
PRIVATE_SUBNETS=$(aws ec2 describe-subnets \
  --filters "Name=vpc-id,Values=$VPC_ID" \
  --region $AWS_REGION \
  --query 'Subnets[?MapPublicIpOnLaunch==`false`].SubnetId' \
  --output text | tr '\t' ' ')

echo "Private subnets: $PRIVATE_SUBNETS"

# Error check
if [ -z "$PRIVATE_SUBNETS" ]; then
    echo "❌ ERROR: No private subnets found!"
    exit 1
fi

# Create DB subnet group
echo "Creating DB subnet group..."
aws rds create-db-subnet-group \
  --db-subnet-group-name ${PROJECT_NAME}-db-subnet \
  --db-subnet-group-description "OpenEdX database subnet group" \
  --subnet-ids $PRIVATE_SUBNETS \
  --region $AWS_REGION 2>/dev/null || echo "Subnet group already exists"

echo "✅ DB subnet group created"
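RDS also insists that a DB subnet group covers at least two Availability Zones. A quick way to check before creating the group is to count the distinct AZs. This is a sketch only: count_azs is a made-up helper and the subnet IDs in the example are placeholders.

```shell
# Hypothetical helper: read "subnet-id availability-zone" pairs on stdin
# and print how many distinct Availability Zones they cover.
count_azs() {
  awk 'NF >= 2 { print $2 }' | sort -u | wc -l | tr -d ' '
}

# Example with placeholder subnet IDs. Against a real VPC the pairs could come from:
#   aws ec2 describe-subnets --subnet-ids $PRIVATE_SUBNETS \
#     --query 'Subnets[].[SubnetId,AvailabilityZone]' --output text
printf '%s\n' \
  "subnet-aaa us-east-1a" \
  "subnet-bbb us-east-1b" \
  "subnet-ccc us-east-1a" | count_azs
# prints: 2
```

If the count comes back as 1, the subnet group creation will be rejected, so it is worth checking first.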

Step 2.4: Create MySQL RDS Instance

What This Does: Creates MySQL 8.0.45 database with gp3 storage (faster than gp2).

This takes 10-15 minutes.

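source ~/.openedx-config/settings.sh

# Recalculate the default security group (it is not stored in the config file)
DEFAULT_SG=$(aws ec2 describe-security-groups \
  --filters "Name=vpc-id,Values=$VPC_ID" "Name=group-name,Values=default" \
  --region $AWS_REGION \
  --query 'SecurityGroups[0].GroupId' --output text)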

echo "Creating MySQL RDS 8.0.45 (10-15 min)..."

aws rds create-db-instance \
  --db-instance-identifier ${PROJECT_NAME}-mysql \
  --db-instance-class db.t3.medium \
  --engine mysql \
  --engine-version 8.0.45 \
  --master-username admin \
  --master-user-password "$MYSQL_PASSWORD" \
  --allocated-storage 20 \
  --storage-type gp3 \
  --db-subnet-group-name ${PROJECT_NAME}-db-subnet \
  --vpc-security-group-ids $DEFAULT_SG \
  --no-publicly-accessible \
  --backup-retention-period 1 \
  --region $AWS_REGION

echo "Waiting for MySQL to become available..."
aws rds wait db-instance-available \
  --db-instance-identifier ${PROJECT_NAME}-mysql \
  --region $AWS_REGION

echo "✅ MySQL RDS created!"

What each flag does:

  • --db-instance-class db.t3.medium: 2 vCPU, 4GB RAM (right size for OpenEdX)

  • --engine-version 8.0.45: Latest MySQL 8.0 minor version

  • --storage-type gp3: Faster than gp2 (3000 baseline IOPS)

  • --no-publicly-accessible: Security - only accessible from VPC

  • --backup-retention-period 1: Keep 1 day of automated backups

Step 2.5: Get MySQL Endpoint

What This Does: Gets the connection hostname for MySQL.

source ~/.openedx-config/settings.sh

MYSQL_HOST=$(aws rds describe-db-instances \
  --db-instance-identifier ${PROJECT_NAME}-mysql \
  --region $AWS_REGION \
  --query 'DBInstances[0].Endpoint.Address' \
  --output text)

# Save to config
sed -i "s|export MYSQL_HOST=\"\"|export MYSQL_HOST=\"$MYSQL_HOST\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh

echo "✅ MySQL Endpoint: $MYSQL_HOST"

Step 2.6: Create OpenEdX Database and User

What This Does:

  1. Creates openedx database with utf8mb4 encoding

  2. Creates openedx user with FULL permissions (needed for migrations)

Why UTF8MB4: Supports emoji and international characters in course content.

source ~/.openedx-config/settings.sh

echo "Creating OpenEdX database and user..."

kubectl run mysql-setup --rm -i --restart=Never --image=mysql:8.0 -n openedx -- \
  mysql -h $MYSQL_HOST -u admin -p"$MYSQL_PASSWORD" <<EOSQL
-- Create database with proper encoding
CREATE DATABASE IF NOT EXISTS openedx 
  CHARACTER SET utf8mb4 
  COLLATE utf8mb4_unicode_ci;

-- Create openedx user
CREATE USER IF NOT EXISTS 'openedx'@'%' 
  IDENTIFIED BY '$MYSQL_PASSWORD';

-- Grant FULL permissions (migrations need this!)
GRANT ALL PRIVILEGES ON openedx.* 
  TO 'openedx'@'%' 
  WITH GRANT OPTION;

-- Apply changes
FLUSH PRIVILEGES;

-- Verify
SELECT User, Host FROM mysql.user WHERE User='openedx';
SHOW DATABASES;
EOSQL

echo "✅ Database and user created with full CRUD permissions"

What permissions are granted:

  • SELECT, INSERT, UPDATE, DELETE (basic CRUD)

  • CREATE, DROP, ALTER, INDEX (schema changes for migrations)

  • CREATE VIEW, SHOW VIEW (for analytics)

  • CREATE ROUTINE, ALTER ROUTINE (for stored procedures)

  • LOCK TABLES, CREATE TEMPORARY TABLES (for bulk operations)

  • WITH GRANT OPTION (allows Tutor to manage permissions)

Verification

source ~/.openedx-config/settings.sh

# Test connection
kubectl run mysql-test --rm -i --restart=Never --image=mysql:8.0 -n openedx -- \
  mysql -h $MYSQL_HOST -u openedx -p"$MYSQL_PASSWORD" -e "SHOW DATABASES;"

# Should show: openedx database

Screenshot for evidence:

  • RDS Console showing running instance

  • Output of SHOW DATABASES;


PART 3: MongoDB (EC2)

What This Does

Creates a single MongoDB 8.0 instance on EC2 for storing course content, modulestore data, and user-generated content.

Why EC2 instead of Atlas?

  1. Simpler setup: No external service signup

  2. Full control: Configure as needed

  3. Cost-effective: t2.medium is ~$35/month

  4. Easy backup: EBS snapshots

  5. No complexity: Single instance (no replica set needed for assessment)

Architecture Decision

Single Instance vs Replica Set:

  • Production would use 3-node replica set for high availability

  • For this assessment, single instance is acceptable

  • Can be upgraded to replica set later without data loss

Step 3.1: Generate MongoDB Password

source ~/.openedx-config/settings.sh

# Generate a 24-character password, but only if none is saved yet
if [ -z "$MONGO_PASSWORD" ]; then
  MONGO_PASSWORD=$(openssl rand -base64 48 | tr -dc 'a-zA-Z0-9' | head -c 24)

  # Save to config
  sed -i "s|export MONGO_PASSWORD=\"\"|export MONGO_PASSWORD=\"$MONGO_PASSWORD\"|" ~/.openedx-config/settings.sh
  source ~/.openedx-config/settings.sh
fi

echo "✅ MongoDB password generated"
echo "Password: $MONGO_PASSWORD"

Step 3.2: Get Ubuntu AMI

What This Does: Finds the latest Ubuntu 22.04 image in your region.

source ~/.openedx-config/settings.sh

# 099720109477 is Canonical's AWS account ID, the publisher of official Ubuntu AMIs
AMI_ID=$(aws ec2 describe-images \
  --owners 099720109477 \
  --filters \
    "Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*" \
    "Name=state,Values=available" \
  --region $AWS_REGION \
  --query 'Images | sort_by(@, &CreationDate) | [-1].ImageId' \
  --output text)

echo "Ubuntu AMI: $AMI_ID"

Step 3.3: Create MongoDB Security Group

What This Does: Creates firewall rules for MongoDB (port 27017).

source ~/.openedx-config/settings.sh

# Recalculate security groups
DEFAULT_SG=$(aws ec2 describe-security-groups \
  --filters "Name=vpc-id,Values=$VPC_ID" "Name=group-name,Values=default" \
  --region $AWS_REGION \
  --query 'SecurityGroups[0].GroupId' --output text)

EKS_SG=$(aws eks describe-cluster \
  --name $EKS_CLUSTER_NAME \
  --region $AWS_REGION \
  --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" \
  --output text)

# Create MongoDB security group
aws ec2 create-security-group \
  --group-name ${PROJECT_NAME}-mongo-sg \
  --description "MongoDB for OpenEdX" \
  --vpc-id $VPC_ID \
  --region $AWS_REGION 2>/dev/null || echo "Security group exists"

MONGO_SG=$(aws ec2 describe-security-groups \
  --filters \
    "Name=group-name,Values=${PROJECT_NAME}-mongo-sg" \
    "Name=vpc-id,Values=$VPC_ID" \
  --region $AWS_REGION \
  --query 'SecurityGroups[0].GroupId' \
  --output text)

echo "MongoDB SG: $MONGO_SG"

# Allow MongoDB port 27017 from EKS
aws ec2 authorize-security-group-ingress \
  --group-id $MONGO_SG \
  --protocol tcp \
  --port 27017 \
  --source-group $EKS_SG \
  --region $AWS_REGION 2>/dev/null || echo "Rule already exists"

echo "✅ MongoDB security group configured"

Step 3.4: Create User Data Script

What This Does: Creates a script that automatically installs and configures MongoDB when EC2 starts.

This is CRITICAL - the script runs on first boot and sets up everything!

source ~/.openedx-config/settings.sh

# Recalculate private subnets
PRIVATE_SUBNETS=$(aws ec2 describe-subnets \
  --filters "Name=vpc-id,Values=$VPC_ID" \
  --region $AWS_REGION \
  --query 'Subnets[?MapPublicIpOnLaunch==`false`].SubnetId' \
  --output text | tr '\t' ' ')

# Error check
if [ -z "$PRIVATE_SUBNETS" ]; then
    echo "❌ ERROR: No private subnets found!"
    exit 1
fi

# Get first subnet
MONGO_SUBNET=$(echo $PRIVATE_SUBNETS | awk '{print $1}')
echo "Using subnet: $MONGO_SUBNET"

# Create user data script (runs on first boot)
USER_DATA=$(cat <<'USERDATA'
#!/bin/bash
set -e
exec > >(tee /var/log/user-data.log)
exec 2>&1

echo "=== Starting MongoDB 8.0 Installation ==="
date

# Install MongoDB 8.0 official repository
echo "Installing MongoDB repository..."
apt-get update
apt-get install -y gnupg curl

curl -fsSL https://www.mongodb.org/static/pgp/server-8.0.asc | \
  gpg --dearmor -o /usr/share/keyrings/mongodb-server-8.0.gpg

echo "deb [ arch=amd64,arm64 signed-by=/usr/share/keyrings/mongodb-server-8.0.gpg ] https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/8.0 multiverse" | \
  tee /etc/apt/sources.list.d/mongodb-org-8.0.list

# Install MongoDB
echo "Installing MongoDB 8.0..."
apt-get update
apt-get install -y mongodb-org

# Configure MongoDB to listen on all interfaces
echo "Configuring MongoDB..."
cat > /etc/mongod.conf <<'MONGOCONF'
# Journaling is always on since MongoDB 6.1; the storage.journal.enabled option was removed
storage:
  dbPath: /var/lib/mongodb
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true
net:
  port: 27017
  bindIp: 0.0.0.0
processManagement:
  timeZoneInfo: /usr/share/zoneinfo
MONGOCONF

# Start MongoDB
echo "Starting MongoDB..."
systemctl start mongod
systemctl enable mongod

# Wait for MongoDB to be ready
echo "Waiting for MongoDB to start..."
sleep 10

# Create admin user
echo "Creating admin user..."
mongosh <<'MONGOJS'
use admin
db.createUser({
  user: "admin",
  pwd: "REPLACE_PASSWORD",
  roles: [ 
    { role: "root", db: "admin" },
    { role: "userAdminAnyDatabase", db: "admin" },
    { role: "dbAdminAnyDatabase", db: "admin" },
    { role: "readWriteAnyDatabase", db: "admin" }
  ]
})
MONGOJS

# Enable authentication
echo "Enabling authentication..."
cat >> /etc/mongod.conf <<'AUTHCONF'
security:
  authorization: enabled
AUTHCONF

# Restart MongoDB with authentication
echo "Restarting MongoDB with authentication..."
systemctl restart mongod

# Wait for restart
sleep 5

# Verify
echo "Verifying MongoDB is running..."
systemctl status mongod --no-pager

echo "=== MongoDB Installation Complete ==="
date
USERDATA
)

# Replace password in user data
USER_DATA="${USER_DATA//REPLACE_PASSWORD/$MONGO_PASSWORD}"

echo "✅ User data script created"

What the script does:

  1. Installs MongoDB 8.0 from official repository

  2. Configures MongoDB to listen on all interfaces (0.0.0.0)

  3. Starts MongoDB and enables auto-start on boot

  4. Creates admin user with full permissions

  5. Enables authentication for security

  6. Restarts MongoDB with authentication enabled
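The fixed sleep calls in the script are a guess at how long MongoDB needs to start. A retry loop is a more forgiving alternative. The sketch below is a generic helper (the name retry is our own, not a built-in) that re-runs a command until it succeeds or the attempts run out.

```shell
# Hypothetical helper: run a command until it succeeds, up to MAX attempts,
# sleeping DELAY seconds between tries.
# Usage: retry MAX DELAY command [args...]
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "gave up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# In the user-data script, this could stand in for "sleep 10":
#   retry 30 2 mongosh --quiet --eval 'db.adminCommand({ping: 1})'
```

With set -e active in the user-data script, a final failure of retry stops the script, which is usually what you want if MongoDB never comes up.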

Step 3.5: Launch MongoDB EC2 Instance

What This Does: Launches t2.medium EC2 instance with MongoDB auto-installed.

This takes 1-2 minutes to launch, plus 2-3 minutes for MongoDB installation.

source ~/.openedx-config/settings.sh

echo "Launching MongoDB EC2 instance..."

MONGO_INSTANCE_ID=$(aws ec2 run-instances \
  --image-id $AMI_ID \
  --instance-type t2.medium \
  --subnet-id $MONGO_SUBNET \
  --security-group-ids $MONGO_SG \
  --user-data "$USER_DATA" \
  --block-device-mappings '[
    {
      "DeviceName":"/dev/sda1",
      "Ebs":{
        "VolumeSize":30,
        "VolumeType":"gp3",
        "Iops":3000,
        "Encrypted":true,
        "DeleteOnTermination":false
      }
    }
  ]' \
  --tag-specifications 'ResourceType=instance,Tags=[
    {Key=Name,Value=openedx-mongodb},
    {Key=Project,Value=openedx},
    {Key=Type,Value=database}
  ]' \
  --region $AWS_REGION \
  --query 'Instances[0].InstanceId' \
  --output text)

echo "Instance ID: $MONGO_INSTANCE_ID"

# Save to config
sed -i "s|export MONGO_INSTANCE_ID=\"\"|export MONGO_INSTANCE_ID=\"$MONGO_INSTANCE_ID\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh

# Wait for instance to be running
echo "Waiting for instance to start (1-2 min)..."
aws ec2 wait instance-running \
  --instance-ids $MONGO_INSTANCE_ID \
  --region $AWS_REGION

echo "✅ Instance is running"

What each setting does:

  • --instance-type t2.medium: 2 vCPU, 4GB RAM (sufficient for OpenEdX)

  • --block-device-mappings: 30GB gp3 storage with 3000 IOPS

  • Encrypted:true: Encryption at rest (security best practice)

  • DeleteOnTermination:false: Keep volume if instance terminates (data safety)
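One limit worth knowing: EC2 accepts at most 16 KB of user data, measured before base64 encoding. The script in this guide is well under that, but if you extend it, a size check like this sketch can catch the problem before launch. The helper name user_data_fits is an assumption for illustration.

```shell
# Hypothetical helper: succeed only if the given text fits within EC2's
# 16 KB (16384 byte) user-data limit, measured before base64 encoding.
user_data_fits() {
  bytes=$(printf '%s' "$1" | wc -c | tr -d ' ')
  echo "user data size: ${bytes} bytes"
  [ "$bytes" -le 16384 ]
}

# Usage before calling run-instances:
#   user_data_fits "$USER_DATA" || echo "User data is too large!"
```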

Step 3.6: Get MongoDB IP and Build Connection String

What This Does: Gets private IP and creates MongoDB connection string for Tutor.

source ~/.openedx-config/settings.sh

# Wait for user-data script to complete MongoDB installation
echo "Waiting for MongoDB installation to complete (2-3 min)..."
sleep 180

# Get private IP
MONGO_IP=$(aws ec2 describe-instances \
  --instance-ids $MONGO_INSTANCE_ID \
  --region $AWS_REGION \
  --query 'Reservations[0].Instances[0].PrivateIpAddress' \
  --output text)

echo "MongoDB private IP: $MONGO_IP"

# Build MongoDB connection string
# Format: mongodb://username:password@host:port/database?authSource=admin
MONGO_HOST="mongodb://admin:${MONGO_PASSWORD}@${MONGO_IP}:27017/openedx?authSource=admin"

# Save to config
sed -i "s|export MONGO_IP=\"\"|export MONGO_IP=\"$MONGO_IP\"|" ~/.openedx-config/settings.sh
sed -i "s|export MONGO_HOST=\"\"|export MONGO_HOST=\"$MONGO_HOST\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh

echo "✅ MongoDB connection string created"
echo "IP: $MONGO_IP"
echo "Connection: mongodb://admin:***@$MONGO_IP:27017/openedx"

Connection String Explained:

mongodb://        Protocol
admin:password    Username and password
@192.168.x.x      Private IP (only accessible from VPC)
:27017            MongoDB port
/openedx          Database name
?authSource=admin Authentication database
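Here is a small sketch that assembles the same string from its parts and prints it with the password masked, which is handy for logs and screenshots. Both function names are made up, and the IP and password are placeholder values. It assumes an alphanumeric password, as generated in this guide, so no percent-encoding is needed.

```shell
# Hypothetical helpers for assembling and safely printing the connection URI.

# args: user password host port database
build_mongo_uri() {
  printf 'mongodb://%s:%s@%s:%s/%s?authSource=admin\n' "$1" "$2" "$3" "$4" "$5"
}

# Replace the password (between "user:" and "@") with ***
mask_uri() {
  printf '%s\n' "$1" | sed 's|\(://[^:]*:\)[^@]*@|\1***@|'
}

uri=$(build_mongo_uri admin s3cretPw 192.168.10.20 27017 openedx)
mask_uri "$uri"
# prints: mongodb://admin:***@192.168.10.20:27017/openedx?authSource=admin
```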

Step 3.7: Verify MongoDB Installation

What This Does: Tests that MongoDB is installed, running, and accepting connections.

source ~/.openedx-config/settings.sh

echo "Testing MongoDB connection from Kubernetes..."

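kubectl run mongo-test --rm -i --restart=Never --image=mongo:8.0 -n openedx -- \
  mongosh "$MONGO_HOST" --quiet --eval "
    // mongosh --eval prints only the last expression, so print each result explicitly
    printjson(db.adminCommand({ping: 1}));
    print(db.version());
    print(db.getMongo());
  " && echo "✅ MongoDB connection verified!"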

Expected output:

{ ok: 1 }
8.0.x
mongodb://admin:***@192.168.x.x:27017/openedx?authSource=admin

Troubleshooting MongoDB

If connection fails:

# Check instance is running
aws ec2 describe-instances \
  --instance-ids $MONGO_INSTANCE_ID \
  --region $AWS_REGION \
  --query 'Reservations[0].Instances[0].State.Name'

# Check user-data script output in the console log
# (no SSM or SSH needed; the log can lag a few minutes behind the instance)
aws ec2 get-console-output \
  --instance-id $MONGO_INSTANCE_ID \
  --region $AWS_REGION \
  --output text

# Check security group allows port 27017
aws ec2 describe-security-groups \
  --group-ids $MONGO_SG \
  --region $AWS_REGION \
  --query 'SecurityGroups[0].IpPermissions[?ToPort==`27017`]'

Screenshot for Evidence

  • EC2 Console showing running MongoDB instance

  • Output of mongo-test pod showing successful connection

  • MongoDB version output


PART 4: Redis & OpenSearch

What This Does

Creates caching (Redis) and search (OpenSearch) services. Both are AWS managed services for reliability.

Why These?

  • Redis: Session caching, API response caching, background job queue

  • OpenSearch: Full-text course search, analytics, reporting

Step 4.1: Create Redis (ElastiCache)

What This Does: Creates a managed Redis 7.1 instance for caching.

source ~/.openedx-config/settings.sh

echo "Creating Redis 7.1 (5-10 min)..."

# Recalculate security groups
DEFAULT_SG=$(aws ec2 describe-security-groups \
  --filters "Name=vpc-id,Values=$VPC_ID" "Name=group-name,Values=default" \
  --region $AWS_REGION \
  --query 'SecurityGroups[0].GroupId' --output text)

EKS_SG=$(aws eks describe-cluster \
  --name $EKS_CLUSTER_NAME \
  --region $AWS_REGION \
  --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" \
  --output text)

# Allow Redis port 6379
aws ec2 authorize-security-group-ingress \
  --group-id $DEFAULT_SG \
  --protocol tcp \
  --port 6379 \
  --source-group $EKS_SG \
  --region $AWS_REGION 2>/dev/null || echo "Redis rule exists"

# Recalculate private subnets
PRIVATE_SUBNETS=$(aws ec2 describe-subnets \
  --filters "Name=vpc-id,Values=$VPC_ID" \
  --region $AWS_REGION \
  --query 'Subnets[?MapPublicIpOnLaunch==`false`].SubnetId' \
  --output text | tr '\t' ' ')

# Create cache subnet group
aws elasticache create-cache-subnet-group \
  --cache-subnet-group-name ${PROJECT_NAME}-redis \
  --cache-subnet-group-description "Redis subnet for OpenEdX" \
  --subnet-ids $PRIVATE_SUBNETS \
  --region $AWS_REGION 2>/dev/null || echo "Subnet group exists"

# Create Redis cluster
aws elasticache create-cache-cluster \
  --cache-cluster-id ${PROJECT_NAME}-redis \
  --cache-node-type cache.t3.micro \
  --engine redis \
  --engine-version 7.1 \
  --num-cache-nodes 1 \
  --cache-subnet-group-name ${PROJECT_NAME}-redis \
  --security-group-ids $DEFAULT_SG \
  --region $AWS_REGION

# Wait for Redis to be available
echo "Waiting for Redis (5-10 min)..."
aws elasticache wait cache-cluster-available \
  --cache-cluster-id ${PROJECT_NAME}-redis \
  --region $AWS_REGION

echo "✅ Redis cluster created"

Step 4.2: Get Redis Endpoint

source ~/.openedx-config/settings.sh

REDIS_HOST=$(aws elasticache describe-cache-clusters \
  --cache-cluster-id ${PROJECT_NAME}-redis \
  --show-cache-node-info \
  --region $AWS_REGION \
  --query 'CacheClusters[0].CacheNodes[0].Endpoint.Address' \
  --output text)

# Save to config
sed -i "s|export REDIS_HOST=\"\"|export REDIS_HOST=\"$REDIS_HOST\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh

echo "✅ Redis endpoint: $REDIS_HOST"
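
The sed pattern used to save values only matches while the variable is still empty, and it breaks if the value contains | or &. As a hedged alternative, the sketch below rewrites the line regardless of its current value and refuses to store an empty or "None" result from the AWS CLI. The function name save_setting is hypothetical and not part of the original scripts.

```shell
# save_setting FILE NAME VALUE
# Replaces (or adds) "export NAME='VALUE'" in FILE.
# Hypothetical helper; assumes FILE holds one "export NAME=..." per line.
save_setting() {
  local file="$1" name="$2" value="$3" tmp escaped
  if [ -z "$value" ] || [ "$value" = "None" ]; then
    echo "Refusing to save empty value for $name" >&2
    return 1
  fi
  tmp=$(mktemp) || return 1
  # Keep every line except an earlier definition of this variable
  grep -v "^export ${name}=" "$file" > "$tmp" || true
  # Single-quote the value so |, &, $ and spaces survive a later "source"
  escaped=$(printf '%s' "$value" | sed "s/'/'\\\\''/g")
  printf "export %s='%s'\n" "$name" "$escaped" >> "$tmp"
  mv "$tmp" "$file"
}

# Example (not executed here):
#   save_setting ~/.openedx-config/settings.sh REDIS_HOST "$REDIS_HOST"
```

Because the old definition is removed before the new one is appended, the helper can be re-run after a failed attempt without editing the settings file by hand.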

Step 4.3: Create OpenSearch Domain

What This Does: Creates managed OpenSearch 2.11 for course search and analytics.

This takes 15-20 minutes and runs in the background.

source ~/.openedx-config/settings.sh

echo "Creating OpenSearch 2.11 (15-20 min, background)..."

# Recalculate security groups and subnets
DEFAULT_SG=$(aws ec2 describe-security-groups \
  --filters "Name=vpc-id,Values=$VPC_ID" "Name=group-name,Values=default" \
  --region $AWS_REGION \
  --query 'SecurityGroups[0].GroupId' --output text)

EKS_SG=$(aws eks describe-cluster \
  --name $EKS_CLUSTER_NAME \
  --region $AWS_REGION \
  --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" \
  --output text)

# Allow OpenSearch port 443
aws ec2 authorize-security-group-ingress \
  --group-id $DEFAULT_SG \
  --protocol tcp \
  --port 443 \
  --source-group $EKS_SG \
  --region $AWS_REGION 2>/dev/null || echo "OpenSearch rule exists"

# Get private subnets
PRIVATE_SUBNETS=$(aws ec2 describe-subnets \
  --filters "Name=vpc-id,Values=$VPC_ID" \
  --region $AWS_REGION \
  --query 'Subnets[?MapPublicIpOnLaunch==`false`].SubnetId' \
  --output text | tr '\t' ' ')

# Get first subnet for OpenSearch (single-node)
OPENSEARCH_SUBNET=$(echo $PRIVATE_SUBNETS | awk '{print $1}')

# Create OpenSearch domain
aws opensearch create-domain \
  --domain-name ${PROJECT_NAME}-search \
  --engine-version OpenSearch_2.11 \
  --cluster-config \
    InstanceType=t3.small.search,InstanceCount=1 \
  --ebs-options \
    EBSEnabled=true,VolumeType=gp3,VolumeSize=10,Iops=3000 \
  --vpc-options \
    "SubnetIds=$OPENSEARCH_SUBNET,SecurityGroupIds=$DEFAULT_SG" \
  --access-policies '{
    "Version":"2012-10-17",
    "Statement":[{
      "Effect":"Allow",
      "Principal":{"AWS":"*"},
      "Action":"es:*",
      "Resource":"*"
    }]
  }' \
  --region $AWS_REGION

echo "✅ OpenSearch domain creation started (15-20 min)"
echo "Continuing with other tasks while it creates..."

Step 4.4: Create OpenSearch Check Script

What This Does: Creates a script to check when OpenSearch is ready.

cat > ~/.openedx-config/check-opensearch.sh <<'CHECK'
#!/bin/bash
source ~/.openedx-config/settings.sh

STATUS=$(aws opensearch describe-domain \
  --domain-name ${PROJECT_NAME}-search \
  --region $AWS_REGION \
  --query 'DomainStatus.Processing' \
  --output text)

if [ "$STATUS" = "False" ]; then
    OPENSEARCH_HOST=$(aws opensearch describe-domain \
      --domain-name ${PROJECT_NAME}-search \
      --region $AWS_REGION \
      --query 'DomainStatus.Endpoints.vpc' \
      --output text)

    sed -i "s|export OPENSEARCH_HOST=\"\"|export OPENSEARCH_HOST=\"$OPENSEARCH_HOST\"|" ~/.openedx-config/settings.sh
    source ~/.openedx-config/settings.sh

    echo "✅ OpenSearch ready: https://$OPENSEARCH_HOST"
    exit 0
else
    echo "⏳ OpenSearch still creating... ($STATUS)"
    exit 1
fi
CHECK

chmod +x ~/.openedx-config/check-opensearch.sh

echo "✅ OpenSearch check script created"
echo "Run: ~/.openedx-config/check-opensearch.sh to check status"

Use this script later before deploying OpenEdX!
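
The check script reports the status once and exits, so something has to call it repeatedly. A minimal sketch of a polling helper with an upper time limit is shown here. The function name wait_for and the 30-minute limit in the example are assumptions, not values from the original guide.

```shell
# wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or TIMEOUT_SECONDS have passed.
# Hypothetical helper name.
wait_for() {
  local timeout="$1" interval="$2" deadline
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "Timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Example (not executed here): poll every 60 s, give up after 30 min
#   wait_for 1800 60 ~/.openedx-config/check-opensearch.sh
```

A bounded wait fails loudly if domain creation stalls, instead of looping forever.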

Verification

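# Check Redis
source ~/.openedx-config/settings.sh
# The client image is redis:7; ElastiCache's 7.1 is an AWS engine version and
# may not have a matching redis:7.1 tag on Docker Hub
kubectl run redis-test --rm -i --restart=Never --image=redis:7 -n openedx -- \
  redis-cli -h $REDIS_HOST ping
# Should return: PONG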

# Check OpenSearch status
~/.openedx-config/check-opensearch.sh

Screenshot for Evidence

  • ElastiCache Console showing Redis cluster

  • OpenSearch Console showing domain

  • Output of redis-cli ping test


PART 5: Storage (S3 + EBS)

What This Does

Sets up storage for static files (CSS, JS, images) in S3 and persistent volumes for uploads in EBS.

Why S3?

  • Cost-effective: Pay only for what you use

  • Scalable: No size limits

  • Fast: Can be served via CloudFront CDN

  • Durable: 99.999999999% durability (11 nines)

Step 5.1: Create S3 Bucket

What This Does: Creates encrypted S3 bucket for static files.

source ~/.openedx-config/settings.sh

# Create unique bucket name with timestamp
S3_BUCKET_NAME="${PROJECT_NAME}-static-$(date +%s)"

echo "Creating S3 bucket: $S3_BUCKET_NAME"

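# Create bucket (regions other than us-east-1 require an explicit LocationConstraint)
if [ "$AWS_REGION" = "us-east-1" ]; then
  aws s3api create-bucket \
    --bucket $S3_BUCKET_NAME \
    --region $AWS_REGION
else
  aws s3api create-bucket \
    --bucket $S3_BUCKET_NAME \
    --region $AWS_REGION \
    --create-bucket-configuration LocationConstraint=$AWS_REGION
fi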

# Enable versioning (keep file history)
aws s3api put-bucket-versioning \
  --bucket $S3_BUCKET_NAME \
  --versioning-configuration Status=Enabled

# Block all public access (security)
aws s3api put-public-access-block \
  --bucket $S3_BUCKET_NAME \
  --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"

# Enable encryption at rest
aws s3api put-bucket-encryption \
  --bucket $S3_BUCKET_NAME \
  --server-side-encryption-configuration '{
    "Rules":[{
      "ApplyServerSideEncryptionByDefault":{
        "SSEAlgorithm":"AES256"
      }
    }]
  }'

# Save to config
sed -i "s|export S3_BUCKET_NAME=\"\"|export S3_BUCKET_NAME=\"$S3_BUCKET_NAME\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh

echo "✅ S3 bucket created: $S3_BUCKET_NAME"

Step 5.2: Create IAM Policy for S3 Access

What This Does: Creates permissions for OpenEdX pods to read/write S3.

source ~/.openedx-config/settings.sh

cat > /tmp/s3-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:*"],
    "Resource": [
      "arn:aws:s3:::$S3_BUCKET_NAME",
      "arn:aws:s3:::$S3_BUCKET_NAME/*"
    ]
  }]
}
EOF

# Create IAM policy
aws iam create-policy \
  --policy-name ${PROJECT_NAME}-s3-policy \
  --policy-document file:///tmp/s3-policy.json \
  2>/dev/null || echo "Policy already exists"

echo "✅ IAM policy created"
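
The policy above grants s3:* on the bucket, which is broader than read/write access. As a hedged sketch, a narrower policy document could be generated as follows. The action list is an assumption about what file storage typically needs and should be checked against your workload; the function name write_s3_policy is hypothetical.

```shell
# write_s3_policy BUCKET_NAME
# Prints a policy limited to listing the bucket and reading, writing,
# and deleting objects. Hypothetical helper; the action list is an assumption.
write_s3_policy() {
  local bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::${bucket}"
    },
    {
      "Sid": "ReadWriteObjects",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}

# Example (not executed here):
#   write_s3_policy "$S3_BUCKET_NAME" > /tmp/s3-policy.json
```

Bucket-level actions apply to the bucket ARN and object-level actions to the /* ARN, which is why the policy uses two statements.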

Step 5.3: Create IAM Role for Service Account

What This Does: Links IAM permissions to Kubernetes service account using IRSA (IAM Roles for Service Accounts).

Why IRSA?

  • No AWS credentials in pods (security)

  • Automatic credential rotation

  • Fine-grained permissions per pod

source ~/.openedx-config/settings.sh

eksctl create iamserviceaccount \
  --name openedx-s3-sa \
  --namespace openedx \
  --cluster $EKS_CLUSTER_NAME \
  --region $AWS_REGION \
  --role-name ${PROJECT_NAME}-s3-role \
  --attach-policy-arn arn:aws:iam::${AWS_ACCOUNT_ID}:policy/${PROJECT_NAME}-s3-policy \
  --approve \
  --override-existing-serviceaccounts

echo "✅ Service account created with S3 access"

Step 5.4: Configure gp3 Storage Class

What This Does: Sets gp3 as default storage class for persistent volumes.

Why gp3 over gp2?

  • Lower cost per GB (roughly 20% cheaper than gp2 in most regions)

  • 3000 baseline IOPS (vs gp2's 3 IOPS/GB)

  • 125 MiB/s baseline throughput

  • Better performance for databases and file uploads
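
To make the IOPS comparison concrete, the sketch below computes the gp2 baseline for a given volume size. It uses the 3 IOPS/GB rule together with gp2's floor of 100 and ceiling of 16,000 IOPS; gp3 starts at 3000 IOPS regardless of size. The function name gp2_baseline_iops is hypothetical.

```shell
# gp2_baseline_iops SIZE_GB
# gp2 baseline = 3 IOPS per GB, never below 100 and never above 16000.
# Hypothetical helper for illustration only.
gp2_baseline_iops() {
  local size_gb="$1" iops
  iops=$(( size_gb * 3 ))
  if [ "$iops" -lt 100 ]; then iops=100; fi
  if [ "$iops" -gt 16000 ]; then iops=16000; fi
  echo "$iops"
}

# gp2_baseline_iops 30     -> 100   (a 30 GB gp3 volume still gets 3000)
# gp2_baseline_iops 500    -> 1500
# gp2_baseline_iops 1000   -> 3000  (gp2 only matches gp3's baseline at 1000 GB)
```

For the small volumes used in this deployment, gp3 therefore provides many times the baseline IOPS of gp2.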

cat > ~/openedx-project/k8s/storageclass-gp3.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
  encrypted: "true"
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
EOF

# Install EBS CSI driver
eksctl create addon \
  --name aws-ebs-csi-driver \
  --cluster $EKS_CLUSTER_NAME \
  --region $AWS_REGION \
  --force

# Wait for driver to be ready
echo "Waiting for EBS CSI driver (30 sec)..."
sleep 30

# Remove gp2 as default
kubectl annotate storageclass gp2 \
  storageclass.kubernetes.io/is-default-class=false \
  --overwrite 2>/dev/null || true

# Apply gp3 storage class
kubectl apply -f ~/openedx-project/k8s/storageclass-gp3.yaml

echo "✅ gp3 storage class configured as default"

Verification

# Check storage classes
kubectl get storageclass

# Should show:
# NAME            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE
# gp2             kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer
# gp3 (default)   ebs.csi.aws.com         Retain          WaitForFirstConsumer

# Check S3 bucket
aws s3 ls | grep $S3_BUCKET_NAME

# Check service account
kubectl get serviceaccount openedx-s3-sa -n openedx

Screenshot for Evidence

  • S3 Console showing bucket with encryption enabled

  • Output of kubectl get storageclass showing gp3 as default

  • IAM Console showing policy and role


PART 6: Deploy OpenEdX

What This Does

Deploys OpenEdX using Tutor 21.0.1 with all external databases configured.

Critical Lessons Learned

  1. S3 plugin is BROKEN in Tutor 21.0.1 - causes STORAGES error

  2. Meilisearch must be disabled - causes blank page issues

  3. Caddy must be disabled properly - use ENABLE_WEB_PROXY=false

  4. MySQL needs root credentials - set MYSQL_ROOT_USERNAME and MYSQL_ROOT_PASSWORD

  5. Migrations don't run automatically - must run manually from worker pod

Step 6.1: Check OpenSearch is Ready

CRITICAL: OpenSearch must be ready before deploying!

source ~/.openedx-config/settings.sh

echo "Checking if OpenSearch is ready..."

# Poll until the check script succeeds (it exits 0 once the domain is ready)
while ! ~/.openedx-config/check-opensearch.sh; do
    echo "Waiting 60 seconds..."
    sleep 60
done

source ~/.openedx-config/settings.sh
echo "✅ OpenSearch ready: $OPENSEARCH_HOST"

Step 6.2: Configure Tutor

What This Does: Configures Tutor with all external services and disables problematic plugins.

source ~/.openedx-config/settings.sh
cd ~/openedx-project

echo "Configuring Tutor 21.0.1..."

# Initialize Tutor configuration
tutor config save

# CRITICAL: Disable problematic features
echo "Disabling Caddy (replaced by Nginx)..."
tutor config save \
  --set ENABLE_WEB_PROXY=false \
  --set CADDY_HTTP_PORT=81

# Disable internal services (using external)
echo "Configuring external services..."
tutor config save \
  --set RUN_MYSQL=false \
  --set RUN_MONGODB=false \
  --set RUN_REDIS=false \
  --set RUN_ELASTICSEARCH=false \
  --set RUN_MEILISEARCH=false \
  --set RUN_SMTP=false \
  --set ENABLE_HTTPS=true \
  --set K8S_NAMESPACE=openedx

# MySQL configuration (BOTH app and root credentials!)
# Values are quoted so passwords with special characters reach Tutor intact
echo "Configuring MySQL..."
tutor config save \
  --set MYSQL_HOST="$MYSQL_HOST" \
  --set MYSQL_PORT=3306 \
  --set MYSQL_DATABASE=openedx \
  --set MYSQL_USERNAME=openedx \
  --set MYSQL_PASSWORD="$MYSQL_PASSWORD" \
  --set MYSQL_ROOT_USERNAME=admin \
  --set MYSQL_ROOT_PASSWORD="$MYSQL_PASSWORD"

# MongoDB configuration (quoted: the URI contains "?", which the shell treats as a glob character)
echo "Configuring MongoDB..."
tutor config save \
  --set MONGODB_HOST="$MONGO_HOST"

# Redis configuration  
echo "Configuring Redis..."
tutor config save \
  --set REDIS_HOST=$REDIS_HOST \
  --set REDIS_PORT=6379

# OpenSearch configuration (use elasticsearch settings)
echo "Configuring OpenSearch..."
tutor config save \
  --set SEARCH_ENGINE=elasticsearch \
  --set ELASTICSEARCH_HOST=$OPENSEARCH_HOST \
  --set ELASTICSEARCH_PORT=443 \
  --set ELASTICSEARCH_SCHEME=https

# Domain configuration
echo "Configuring domains..."
tutor config save \
  --set LMS_HOST=$DOMAIN \
  --set CMS_HOST=$STUDIO_DOMAIN \
  --set MFE_HOST=$MFE_DOMAIN

# Session cookie configuration (None = use domain from request)
tutor config save \
  --set OPENEDX_COMMON_SESSION_COOKIE_DOMAIN=None \
  --set OPENEDX_COMMON_CSRF_COOKIE_DOMAIN=None

echo "✅ Tutor configured"

What each setting does:

  • ENABLE_WEB_PROXY=false: Disables Caddy (we use Nginx)

  • RUN_*=false: Disables internal services (using external)

  • RUN_MEILISEARCH=false: Critical! Prevents blank page issues

  • SEARCH_ENGINE=elasticsearch: Use OpenSearch (compatible with Elasticsearch API)

  • MYSQL_ROOT_*: Required for Tutor's init jobs

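  • SESSION_COOKIE_DOMAIN=None: Leaves the cookie domain unset, so each cookie is scoped to the host that issued it (the domain from the request) rather than shared across subdomains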
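
Before deploying, it can help to confirm that the values were actually saved. The sketch below reads each key back with tutor config printvalue and reports any that are empty. The function name require_tutor_values and the key list in the example are assumptions; values are deliberately not printed because some are secrets.

```shell
# require_tutor_values KEY [KEY...]
# Returns non-zero if any key is unset, empty, or "None" in the Tutor config.
# Hypothetical helper built on "tutor config printvalue".
require_tutor_values() {
  local key value missing=0
  for key in "$@"; do
    value=$(tutor config printvalue "$key" 2>/dev/null) || value=""
    if [ -z "$value" ] || [ "$value" = "None" ]; then
      echo "MISSING: $key" >&2
      missing=1
    else
      echo "OK: $key"
    fi
  done
  return "$missing"
}

# Example (not executed here):
#   require_tutor_values MYSQL_HOST MONGODB_HOST REDIS_HOST ELASTICSEARCH_HOST LMS_HOST CMS_HOST
```

Catching an empty host name here is cheaper than diagnosing a crash-looping pod after the deployment.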

Step 6.3: Deploy OpenEdX to Kubernetes

What This Does: Creates all Kubernetes resources (deployments, services, configmaps).

tutor k8s start

echo "Waiting for pods to start (2 min)..."
sleep 120

# Wait for LMS to be ready
kubectl wait --for=condition=ready \
  pod -l app.kubernetes.io/name=lms \
  -n openedx \
  --timeout=600s

echo "✅ OpenEdX deployed to Kubernetes"

Step 6.4: Verify Caddy is Not Running

What This Does: Ensures Caddy is completely removed (we use Nginx).

# Check if Caddy deployment exists
if kubectl get deployment caddy -n openedx 2>&1 | grep -q "NotFound"; then
    echo "✅ Caddy correctly disabled"
else
    echo "⚠️  Caddy still exists, removing..."
    # Use TYPE/NAME pairs; "deployment caddy service caddy" would be read as three deployment names
    kubectl delete deployment/caddy service/caddy -n openedx --ignore-not-found
fi

# Remove any Caddy configmaps
kubectl delete configmap -l app.kubernetes.io/name=caddy -n openedx 2>/dev/null || true

echo "✅ Caddy removed"

Step 6.5: Run Database Migrations Manually

What This Does: Creates all database tables. Tutor's k8s init doesn't work properly, so we run migrations from a worker pod.

Why from worker pod?

  • Worker pods are stable (not restarting)

  • Same code as LMS/CMS

  • Same database connections

  • Running from a single pod keeps migrations sequential (Django itself does not lock against concurrent migration runs)

source ~/.openedx-config/settings.sh

echo "Running LMS migrations (creates ~300 database tables)..."
echo "This takes 5-10 minutes..."

# Get LMS worker pod name
LMS_WORKER=$(kubectl get pod -l app.kubernetes.io/name=lms-worker \
  -n openedx \
  -o jsonpath='{.items[0].metadata.name}')

echo "Using worker pod: $LMS_WORKER"

# Run LMS migrations
kubectl exec -it $LMS_WORKER -n openedx -- \
  ./manage.py lms migrate --noinput

echo "✅ LMS migrations complete"

echo "Running CMS migrations..."

# Get CMS worker pod name
CMS_WORKER=$(kubectl get pod -l app.kubernetes.io/name=cms-worker \
  -n openedx \
  -o jsonpath='{.items[0].metadata.name}')

echo "Using worker pod: $CMS_WORKER"

# Run CMS migrations
kubectl exec -it $CMS_WORKER -n openedx -- \
  ./manage.py cms migrate --noinput

echo "✅ CMS migrations complete"
echo "✅ All database tables created"

What migrations do:

  • Create ~300 tables in MySQL (users, courses, enrollments, grades, etc.)

  • Create CMS-specific tables (course authoring, content library)

  • Set up initial data (waffle switches, site configuration)
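
To confirm that nothing was left unapplied, the output of Django's showmigrations command can be scanned for unchecked entries: applied migrations are listed as [X] and pending ones as [ ]. A minimal sketch, assuming a hypothetical filter named count_unapplied:

```shell
# count_unapplied
# Reads "manage.py showmigrations" output on stdin and prints the number
# of migrations still marked "[ ]". Hypothetical helper name.
count_unapplied() {
  # grep -c exits non-zero when the count is 0, so neutralise that
  grep -c '\[ \]' || true
}

# Example (not executed here):
#   kubectl exec "$LMS_WORKER" -n openedx -- \
#     ./manage.py lms showmigrations | count_unapplied
```

A result of 0 for both lms and cms means the schema matches the deployed code.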

Step 6.6: Restart LMS and CMS Pods

What This Does: Restarts application pods so they can connect to the newly migrated database.

echo "Restarting LMS and CMS pods..."

kubectl rollout restart deployment lms cms -n openedx

# Wait for new pods to be ready
echo "Waiting for pods to restart (2 min)..."
sleep 120

kubectl wait --for=condition=ready \
  pod -l app.kubernetes.io/name=lms \
  -n openedx \
  --timeout=600s

kubectl wait --for=condition=ready \
  pod -l app.kubernetes.io/name=cms \
  -n openedx \
  --timeout=600s

echo "✅ Pods restarted and ready"
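
As an alternative to a fixed sleep, kubectl rollout status blocks until a deployment's new pods are ready or a timeout expires. The sketch below wraps it for several deployments; the function name wait_for_rollouts is hypothetical and the 600-second timeout mirrors the value used above.

```shell
# wait_for_rollouts NAMESPACE DEPLOYMENT [DEPLOYMENT...]
# Waits for each deployment's rollout to finish; stops at the first failure.
# Hypothetical helper name.
wait_for_rollouts() {
  local namespace="$1" name
  shift
  for name in "$@"; do
    kubectl rollout status "deployment/$name" -n "$namespace" --timeout=600s || return 1
  done
}

# Example (not executed here):
#   kubectl rollout restart deployment lms cms -n openedx
#   wait_for_rollouts openedx lms cms
```

Unlike waiting on a pod label, this also ignores pods from the old ReplicaSet that are still terminating.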

Step 6.7: Create Admin User

What This Does: Creates superuser account for logging into OpenEdX.

source ~/.openedx-config/settings.sh

# Get LMS pod
LMS_POD=$(kubectl get pod -l app.kubernetes.io/name=lms \
  -n openedx \
  -o jsonpath='{.items[0].metadata.name}')

echo "Creating admin user..."

# Create user with staff and superuser permissions
kubectl exec -it $LMS_POD -n openedx -- \
  ./manage.py lms manage_user \
    admin \
    $ADMIN_EMAIL \
    --staff \
    --superuser

echo "Setting admin password..."

# Set password (will prompt you to enter password twice)
kubectl exec -it $LMS_POD -n openedx -- \
  ./manage.py lms changepassword admin

echo "✅ Admin user created: admin / [your-password]"
echo "⚠️  SAVE THIS PASSWORD - you'll need it to login!"

Verification

# Check all pods are running
kubectl get pods -n openedx

# Should show:
# NAME                         READY   STATUS    RESTARTS   AGE
# cms-xxx                      1/1     Running   0          5m
# cms-worker-xxx               1/1     Running   0          5m
# lms-xxx                      1/1     Running   0          5m
# lms-worker-xxx               1/1     Running   0          5m
# mfe-xxx                      1/1     Running   0          5m

# Check services
kubectl get svc -n openedx

# Should show LMS, CMS, MFE services on port 8000/8002

# Test LMS API internally
kubectl run test --rm -i --image=curlimages/curl -n openedx -- \
  curl -I http://lms:8000/api/user/v1/me

# Should return: HTTP/1.1 401 Unauthorized (correct - needs auth)

Screenshot for Evidence

  • Output of kubectl get pods -n openedx showing all Running

  • Output of LMS migrations showing "OK" for each migration

  • Admin user creation confirmation


PART 7: Nginx Ingress

What This Does

Replaces Caddy with Nginx Ingress Controller for HTTP/2 support and industry-standard reverse proxy.

Why Nginx?

  • Industry standard: Well-documented, widely used

  • HTTP/2 support: Faster page loads

  • Better performance: Handles high traffic efficiently

  • Al Nafi requirement: Specifically requested in the job description (JD)

Step 7.1: Install Nginx Ingress Controller

What This Does: Deploys Nginx Ingress Controller with AWS Network Load Balancer.

source ~/.openedx-config/settings.sh

echo "Installing Nginx Ingress Controller 4.14.3..."

# Add Helm repository
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

# Install Nginx Ingress
helm install nginx-ingress ingress-nginx/ingress-nginx \
  --version 4.14.3 \
  --namespace ingress-nginx \
  --create-namespace \
  --set controller.service.type=LoadBalancer \
  --set controller.service.annotations."service\.beta\.kubernetes\.io/aws-load-balancer-type"="nlb" \
  --set controller.config.use-http2="true" \
  --set controller.config.enable-http3="true" \
  --set controller.config.ssl-protocols="TLSv1.2 TLSv1.3" \
  --set controller.config.proxy-body-size="100m"

echo "Waiting for Load Balancer (2 min)..."
sleep 120

echo "✅ Nginx Ingress installed"

What each setting does:

  • --version 4.14.3: Latest version supporting Kubernetes 1.34

  • service.type=LoadBalancer: Creates AWS NLB

  • aws-load-balancer-type=nlb: Network Load Balancer (Layer 4)

  • use-http2=true: Enable HTTP/2 protocol

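  • enable-http3=true: Requests HTTP/3 (QUIC) support; QUIC runs over UDP, so it also needs a UDP listener on the load balancer, and the key should be checked against the ConfigMap options of your ingress-nginx version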

  • ssl-protocols: TLS 1.2 and 1.3 only (security)

  • proxy-body-size=100m: Allow large file uploads

Step 7.2: Get Load Balancer Hostname

What This Does: Gets AWS NLB DNS name for configuring Cloudflare.

source ~/.openedx-config/settings.sh

LB_HOSTNAME=$(kubectl get svc nginx-ingress-ingress-nginx-controller \
  -n ingress-nginx \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

# Save to config
sed -i "s|export LB_HOSTNAME=\"\"|export LB_HOSTNAME=\"$LB_HOSTNAME\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh

echo "✅ Load Balancer: $LB_HOSTNAME"
echo "This will be used in Cloudflare DNS"

Step 7.3: Create Ingress Resource

What This Does: Configures routing rules for LMS, CMS, and MFE based on hostname.

source ~/.openedx-config/settings.sh

cat > ~/openedx-project/k8s/ingress.yaml <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: openedx-ingress
  namespace: openedx
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
spec:
  ingressClassName: nginx
  rules:
  - host: $DOMAIN
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: lms
            port:
              number: 8000
  - host: $STUDIO_DOMAIN
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: cms
            port:
              number: 8000
  - host: $MFE_DOMAIN
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: mfe
            port:
              number: 8002
EOF

kubectl apply -f ~/openedx-project/k8s/ingress.yaml

echo "✅ Ingress rules configured"

Routing explained:

Request to yourdomain.com
    ↓
Nginx reads Host header: yourdomain.com
    ↓
Matches rule #1
    ↓
Routes to LMS service (port 8000)

Request to studio.yourdomain.com
    ↓
Nginx reads Host header: studio.yourdomain.com
    ↓
Matches rule #2
    ↓
Routes to CMS service (port 8000)

Request to apps.yourdomain.com
    ↓
Nginx reads Host header: apps.yourdomain.com
    ↓
Matches rule #3
    ↓
Routes to MFE service (port 8002)
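
The host-based routing described above can be exercised before DNS is in place by sending requests straight to the load balancer with an explicit Host header. This is a minimal sketch; the function name probe_host is hypothetical, and the status code returned depends on how each service answers a request for /.

```shell
# probe_host LB_HOSTNAME HOST_HEADER
# Sends one request to the load balancer with the given Host header and
# prints only the HTTP status code. Hypothetical helper name.
probe_host() {
  local lb="$1" host="$2"
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
    -H "Host: $host" "http://$lb/"
}

# Example (not executed here):
#   for h in "$DOMAIN" "$STUDIO_DOMAIN" "$MFE_DOMAIN"; do
#     echo "$h -> $(probe_host "$LB_HOSTNAME" "$h")"
#   done
```

A 404 from Nginx for a host name usually means no Ingress rule matched it, while a 2xx or 3xx code indicates the request reached the backing service.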

Verification

# Check Nginx pods
kubectl get pods -n ingress-nginx

# Should show:
# nginx-ingress-ingress-nginx-controller-xxx   1/1   Running

# Check ingress resource
kubectl get ingress -n openedx

# Should show:
# NAME              CLASS   HOSTS                                    ADDRESS
# openedx-ingress   nginx   yourdomain.com,studio...,apps...         xxx.elb.amazonaws.com

# Test Nginx config
kubectl exec -it \
  $(kubectl get pods -n ingress-nginx -l app.kubernetes.io/component=controller -o jsonpath='{.items[0].metadata.name}') \
  -n ingress-nginx -- \
  nginx -t

# Should return: configuration file /etc/nginx/nginx.conf test is successful

Screenshot for Evidence

  • Output of kubectl get ingress -n openedx

  • AWS EC2 Load Balancers console showing NLB

  • Nginx controller logs showing HTTP/2 enabled


PART 8: SSL/TLS (cert-manager)

What This Does

Automates SSL certificate management using cert-manager and Let's Encrypt.

Why cert-manager?

  • Free SSL certificates from Let's Encrypt

  • Automatic renewal (90-day certs renewed at 60 days)

  • Industry standard for Kubernetes SSL

  • Zero maintenance after setup

Step 8.1: Install cert-manager

What This Does: Installs cert-manager CRDs and controller.

echo "Installing cert-manager 1.14.4..."

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.4/cert-manager.yaml

echo "Waiting for cert-manager to be ready (1 min)..."
sleep 60

# Verify cert-manager is running
kubectl get pods -n cert-manager

# Should show 3 pods:
# cert-manager-xxx
# cert-manager-cainjector-xxx
# cert-manager-webhook-xxx

echo "✅ cert-manager installed"

Step 8.2: Create Let's Encrypt Issuer

What This Does: Configures cert-manager to use Let's Encrypt for SSL certificates.

source ~/.openedx-config/settings.sh

cat > ~/openedx-project/k8s/letsencrypt-issuer.yaml <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # Let's Encrypt production server
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email for expiry notifications
    email: $ADMIN_EMAIL
    # Secret to store account private key
    privateKeySecretRef:
      name: letsencrypt-prod
    # HTTP-01 challenge (proves domain ownership)
    solvers:
    - http01:
        ingress:
          class: nginx
EOF

kubectl apply -f ~/openedx-project/k8s/letsencrypt-issuer.yaml

echo "✅ Let's Encrypt issuer configured"

How it works:

  1. cert-manager requests certificate from Let's Encrypt

  2. Let's Encrypt sends HTTP challenge: "Prove you own this domain"

  3. cert-manager creates temporary Ingress route for challenge

  4. Let's Encrypt verifies domain ownership via HTTP request

  5. Certificate issued and stored in Kubernetes Secret

  6. Nginx uses certificate for TLS termination
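
Each step in this flow is backed by a cert-manager resource (CertificateRequest, Order, Challenge), which makes a stalled issuance easier to diagnose. The sketch below waits for a certificate to become Ready and, if that fails, lists those intermediate objects. The function name wait_for_certificate and the 600-second timeout are assumptions.

```shell
# wait_for_certificate NAME NAMESPACE
# Blocks until the Certificate is Ready; on timeout, shows the ACME objects
# that cert-manager created along the way. Hypothetical helper name.
wait_for_certificate() {
  local name="$1" namespace="$2"
  if kubectl wait --for=condition=Ready "certificate/$name" \
       -n "$namespace" --timeout=600s; then
    return 0
  fi
  # Issuance stalled: inspect the request, order, and challenge objects
  kubectl get certificaterequest,order,challenge -n "$namespace"
  return 1
}

# Example (not executed here):
#   wait_for_certificate openedx-tls openedx
```

A Challenge stuck in a pending state typically points at step 4: Let's Encrypt could not reach the temporary HTTP route, often because DNS does not yet resolve to the load balancer.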

Step 8.3: Update Ingress with TLS

What This Does: Adds TLS configuration to Ingress, triggering automatic certificate issuance.

source ~/.openedx-config/settings.sh

cat > ~/openedx-project/k8s/ingress.yaml <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: openedx-ingress
  namespace: openedx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - $DOMAIN
    - $STUDIO_DOMAIN
    - $MFE_DOMAIN
    secretName: openedx-tls
  rules:
  - host: $DOMAIN
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: lms
            port:
              number: 8000
  - host: $STUDIO_DOMAIN
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: cms
            port:
              number: 8000
  - host: $MFE_DOMAIN
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: mfe
            port:
              number: 8002
EOF

kubectl apply -f ~/openedx-project/k8s/ingress.yaml

echo "Certificate issuance triggered..."
echo "Waiting for certificate (2-3 min)..."
sleep 180

# Check certificate status
kubectl get certificate -n openedx

# Should show:
# NAME          READY   SECRET        AGE
# openedx-tls   True    openedx-tls   2m

echo "✅ SSL certificates issued"

Step 8.4: Verify SSL Certificate

# Check certificate details
kubectl describe certificate openedx-tls -n openedx

# Should show:
#   Status:
#     Conditions:
#       Type:    Ready
#       Status:  True
#   Not After:   [3 months from now]

# Test HTTPS (replace with your domain)
curl -I https://$DOMAIN

# Should return: HTTP/2 200

Verification

# Check cert-manager pods
kubectl get pods -n cert-manager

# Check certificate
kubectl get certificate -n openedx
# Should show: openedx-tls   True    openedx-tls   

# Check TLS secret
kubectl get secret openedx-tls -n openedx
# Should show secret with tls.crt and tls.key

# Verify certificate expiry (should be ~90 days)
kubectl get certificate openedx-tls -n openedx -o jsonpath='{.status.notAfter}'

Screenshot for Evidence

  • Output of kubectl get certificate -n openedx showing READY=True

  • Browser showing green padlock on your domain

  • SSL Labs test showing A+ rating (optional)


PART 9: CloudFront + WAF

What This Does

Sets up CDN for static files and Web Application Firewall for security.

Why CloudFront + WAF?

  • Faster load times: Serve static files from edge locations

  • Reduced origin load: S3 serves files, not application servers

  • DDoS protection: WAF rate limiting and bot detection

  • Cost savings: Cheaper bandwidth from CloudFront than EKS

Step 9.1: Create CloudFront Origin Access Identity

What This Does: Allows CloudFront to access private S3 bucket.

source ~/.openedx-config/settings.sh

echo "Creating CloudFront Origin Access Identity..."

OAI_ID=$(aws cloudfront create-cloud-front-origin-access-identity \
  --cloud-front-origin-access-identity-config \
    CallerReference=$(date +%s),Comment="OpenEdX Static Files" \
  --query 'CloudFrontOriginAccessIdentity.Id' \
  --output text)

echo "OAI ID: $OAI_ID"

# Update S3 bucket policy to allow CloudFront
cat > /tmp/s3-cloudfront-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity $OAI_ID"
    },
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::$S3_BUCKET_NAME/*"
  }]
}
EOF

aws s3api put-bucket-policy \
  --bucket $S3_BUCKET_NAME \
  --policy file:///tmp/s3-cloudfront-policy.json

echo "✅ S3 bucket policy updated for CloudFront"

Step 9.2: Create CloudFront Distribution

What This Does: Creates CDN distribution for S3 static files.

source ~/.openedx-config/settings.sh

echo "Creating CloudFront distribution..."

cat > /tmp/cloudfront-config.json <<EOF
{
  "CallerReference": "$(date +%s)",
  "Comment": "OpenEdX Static Files CDN",
  "Enabled": true,
  "Origins": {
    "Quantity": 1,
    "Items": [{
      "Id": "S3-$S3_BUCKET_NAME",
      "DomainName": "$S3_BUCKET_NAME.s3.$AWS_REGION.amazonaws.com",
      "S3OriginConfig": {
        "OriginAccessIdentity": "origin-access-identity/cloudfront/$OAI_ID"
      }
    }]
  },
  "DefaultCacheBehavior": {
    "TargetOriginId": "S3-$S3_BUCKET_NAME",
    "ViewerProtocolPolicy": "redirect-to-https",
    "AllowedMethods": {
      "Quantity": 2,
      "Items": ["GET", "HEAD"],
      "CachedMethods": {
        "Quantity": 2,
        "Items": ["GET", "HEAD"]
      }
    },
    "ForwardedValues": {
      "QueryString": false,
      "Cookies": {"Forward": "none"}
    },
    "MinTTL": 0,
    "DefaultTTL": 86400,
    "MaxTTL": 31536000,
    "Compress": true,
    "TrustedSigners": {
      "Enabled": false,
      "Quantity": 0
    }
  },
  "PriceClass": "PriceClass_100",
  "ViewerCertificate": {
    "CloudFrontDefaultCertificate": true
  },
  "HttpVersion": "http2and3"
}
EOF

aws cloudfront create-distribution \
  --distribution-config file:///tmp/cloudfront-config.json \
  > /tmp/cloudfront-output.json

CF_ID=$(jq -r '.Distribution.Id' /tmp/cloudfront-output.json)
CLOUDFRONT_URL=$(jq -r '.Distribution.DomainName' /tmp/cloudfront-output.json)

# Save to config
sed -i "s|export CLOUDFRONT_ID=\"\"|export CLOUDFRONT_ID=\"$CF_ID\"|" ~/.openedx-config/settings.sh
sed -i "s|export CLOUDFRONT_URL=\"\"|export CLOUDFRONT_URL=\"$CLOUDFRONT_URL\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh

echo "✅ CloudFront distribution created"
echo "Distribution ID: $CF_ID"
echo "CloudFront URL: $CLOUDFRONT_URL"

What each setting does:

  • ViewerProtocolPolicy: redirect-to-https: Force HTTPS

  • DefaultTTL: 86400: Cache for 24 hours

  • Compress: true: Enable gzip compression

  • HttpVersion: http2and3: Enable HTTP/2 and HTTP/3

  • PriceClass_100: Use only US, Canada, Europe edge locations (cheapest)
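The `sed` substitutions used in this part only match while the variable is still an empty string in `settings.sh`, so re-running a step silently keeps the old value. A minimal sketch of an idempotent alternative follows. `save_setting` is a hypothetical helper, not part of the original scripts, and it assumes each setting lives on its own `export NAME="value"` line.

```shell
# save_setting FILE NAME VALUE
# Hypothetical helper: replaces (or adds) one `export NAME="VALUE"` line.
save_setting() {
  local file="$1"
  local name="$2"
  local value="$3"
  local tmp
  tmp=$(mktemp)
  # Keep every line except an earlier definition of NAME.
  if [ -f "$file" ]; then
    grep -v "^export ${name}=" "$file" > "$tmp" || true
  fi
  # Append the new definition and swap the file into place.
  printf 'export %s="%s"\n' "$name" "$value" >> "$tmp"
  mv "$tmp" "$file"
}

# Usage (commented out so the sketch has no side effects):
# save_setting ~/.openedx-config/settings.sh CLOUDFRONT_ID "$CF_ID"
# save_setting ~/.openedx-config/settings.sh CLOUDFRONT_URL "$CLOUDFRONT_URL"
```

The rewritten file takes the restrictive permissions of the `mktemp` file, and the updated variable moves to the end of the file. Neither affects `source`.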

Step 9.3: Create WAF Web ACL

What This Does: Creates Web Application Firewall with rate limiting and DDoS protection.

WAF MUST be in us-east-1 for CloudFront!

source ~/.openedx-config/settings.sh

echo "Creating WAF Web ACL in us-east-1..."

aws wafv2 create-web-acl \
  --name ${PROJECT_NAME}-waf \
  --scope CLOUDFRONT \
  --default-action Allow={} \
  --rules '[
    {
      "Name": "RateLimit",
      "Priority": 1,
      "Statement": {
        "RateBasedStatement": {
          "Limit": 2000,
          "AggregateKeyType": "IP"
        }
      },
      "Action": {"Block": {}},
      "VisibilityConfig": {
        "SampledRequestsEnabled": true,
        "CloudWatchMetricsEnabled": true,
        "MetricName": "RateLimit"
      }
    },
    {
      "Name": "AWSManagedRulesCommonRuleSet",
      "Priority": 2,
      "Statement": {
        "ManagedRuleGroupStatement": {
          "VendorName": "AWS",
          "Name": "AWSManagedRulesCommonRuleSet"
        }
      },
      "OverrideAction": {"None": {}},
      "VisibilityConfig": {
        "SampledRequestsEnabled": true,
        "CloudWatchMetricsEnabled": true,
        "MetricName": "CommonRuleSet"
      }
    },
    {
      "Name": "AWSManagedRulesKnownBadInputsRuleSet",
      "Priority": 3,
      "Statement": {
        "ManagedRuleGroupStatement": {
          "VendorName": "AWS",
          "Name": "AWSManagedRulesKnownBadInputsRuleSet"
        }
      },
      "OverrideAction": {"None": {}},
      "VisibilityConfig": {
        "SampledRequestsEnabled": true,
        "CloudWatchMetricsEnabled": true,
        "MetricName": "KnownBadInputs"
      }
    },
    {
      "Name": "AWSManagedRulesAmazonIpReputationList",
      "Priority": 4,
      "Statement": {
        "ManagedRuleGroupStatement": {
          "VendorName": "AWS",
          "Name": "AWSManagedRulesAmazonIpReputationList"
        }
      },
      "OverrideAction": {"None": {}},
      "VisibilityConfig": {
        "SampledRequestsEnabled": true,
        "CloudWatchMetricsEnabled": true,
        "MetricName": "IpReputation"
      }
    }
  ]' \
  --visibility-config \
    SampledRequestsEnabled=true,CloudWatchMetricsEnabled=true,MetricName=openedx-waf \
  --region us-east-1 \
  > /tmp/waf-output.json

WAF_ARN=$(jq -r '.Summary.ARN' /tmp/waf-output.json)

# Save to config
sed -i "s|export WAF_ARN=\"\"|export WAF_ARN=\"$WAF_ARN\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh

echo "✅ WAF Web ACL created"
echo "WAF ARN: $WAF_ARN"

WAF Rules Explained:

  1. Rate Limiting: Block IPs making >2000 requests per 5 minutes

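  2. Common Rule Set: Protect against common web exploits such as XSS and local/remote file inclusion (SQL injection is covered by the separate AWSManagedRulesSQLiRuleSet, which is not enabled here)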

  3. Known Bad Inputs: Block malformed requests

  4. IP Reputation List: Block known malicious IPs
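To put the rate limit in perspective, the arithmetic below converts the per-window limit into a sustained request rate per IP. This is a small sketch; `rate_limit_rps` is a hypothetical helper name, and it assumes the default 300-second evaluation window that the "per 5 minutes" figure above refers to.

```shell
# rate_limit_rps LIMIT [WINDOW_SECONDS]
# Prints the sustained requests per second one IP can send before it
# exceeds LIMIT within the window (300 s assumed when omitted).
rate_limit_rps() {
  local limit="$1"
  local window="${2:-300}"
  awk -v l="$limit" -v w="$window" 'BEGIN { printf "%.2f\n", l / w }'
}

rate_limit_rps 2000   # prints 6.67
```

A single learner rarely sustains several requests per second for five minutes, but many users behind one shared IP address (a campus NAT, for example) could, so the limit may need tuning for such deployments.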

Step 9.4: Associate WAF with CloudFront

What This Does: Attaches WAF to CloudFront distribution.

source ~/.openedx-config/settings.sh

echo "Waiting for CloudFront distribution to deploy (5-10 min)..."

# Wait for CloudFront to be fully deployed
aws cloudfront wait distribution-deployed \
  --id $CF_ID

echo "CloudFront deployed, attaching WAF..."

# Get current distribution config
aws cloudfront get-distribution-config \
  --id $CF_ID \
  > /tmp/cf-current.json

ETAG=$(jq -r '.ETag' /tmp/cf-current.json)

# Add WAF to config
jq --arg waf "$WAF_ARN" \
  '.DistributionConfig.WebACLId = $waf | .DistributionConfig' \
  /tmp/cf-current.json \
  > /tmp/cf-updated.json

# Update distribution
aws cloudfront update-distribution \
  --id $CF_ID \
  --if-match $ETAG \
  --distribution-config file:///tmp/cf-updated.json

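echo "✅ WAF attached to CloudFront"
echo "Waiting for distribution update (about 5 min)..."
# Poll until the update has propagated instead of sleeping a fixed time
aws cloudfront wait distribution-deployed \
  --id $CF_ID

echo "✅ CloudFront + WAF fully configured"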

Verification

# Check CloudFront distribution
aws cloudfront get-distribution --id $CF_ID \
  --query 'Distribution.DistributionConfig.Enabled'
# Should return: true

# Check WAF is attached
aws cloudfront get-distribution --id $CF_ID \
  --query 'Distribution.DistributionConfig.WebACLId'
# Should return: your WAF ARN

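# Test CloudFront URL with an object that exists in the bucket.
# The bare root URL returns 403: no default root object is set and the
# bucket policy grants the OAI only s3:GetObject.
TEST_OBJECT="path/to/existing-object"  # placeholder: replace with a real key
curl -I "https://$CLOUDFRONT_URL/$TEST_OBJECT"
# Should return: HTTP/2 200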

Screenshot for Evidence

  • CloudFront Console showing distribution

  • WAF Console showing Web ACL with 4 rules

  • CloudWatch metrics showing WAF activity


PART 10: Monitoring (Prometheus/Grafana)

What This Does

Sets up centralized monitoring and metrics visualization.

Why Prometheus + Grafana?

  • Industry standard: Most popular Kubernetes monitoring stack

  • Real-time metrics: CPU, memory, network, pod health

  • Custom dashboards: Visualize application performance

  • Alerting: Get notified of issues

Step 10.1: Install Metrics Server

What This Does: Enables kubectl top and HPA (Horizontal Pod Autoscaler).

echo "Installing Metrics Server..."

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

echo "Waiting for Metrics Server (1 min)..."
sleep 60

# Verify metrics are available
kubectl top nodes

# Should show CPU and memory usage for each node

echo "✅ Metrics Server installed"
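Fixed `sleep` calls either wait too long or not long enough. A minimal sketch of a polling helper that retries a command until it succeeds or a timeout expires is shown here. `wait_until` is a hypothetical name, not something the guide or kubectl provides.

```shell
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it exits 0; returns 1 once TIMEOUT_SECONDS have passed.
wait_until() {
  local timeout="$1"
  local interval="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  until "$@" >/dev/null 2>&1; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "Timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Usage (commented out; needs a cluster):
# wait_until 180 5 kubectl top nodes && echo "Metrics Server is serving metrics"
```

The same helper could stand in for the other fixed waits in this part, for example by polling `kubectl get svc prometheus-grafana -n monitoring` until the load balancer hostname is populated.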

Step 10.2: Install Prometheus + Grafana Stack

What This Does: Installs complete monitoring stack with pre-configured dashboards.

echo "Installing Prometheus + Grafana..."

# Add Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=admin \
  --set prometheus.prometheusSpec.retention=7d \
  --set prometheus.prometheusSpec.resources.requests.memory=1Gi \
  --set grafana.service.type=LoadBalancer

echo "Waiting for Prometheus and Grafana (2 min)..."
sleep 120

echo "✅ Prometheus + Grafana installed"

What this includes:

  • Prometheus: Metrics collection and storage

  • Grafana: Visualization dashboards

  • AlertManager: Alert routing and notifications

  • Node Exporter: Node-level metrics

  • kube-state-metrics: Kubernetes object metrics

  • Pre-built dashboards: Kubernetes cluster, pod, and node dashboards
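The install command above sets the Grafana admin password to `admin` and exposes Grafana through a public load balancer. One option, sketched here under the assumption that `/dev/urandom` is available, is to generate a random password first and pass it to the same `grafana.adminPassword` value. `gen_password` and `GRAFANA_PASSWORD` are hypothetical names.

```shell
# gen_password [LENGTH]
# Prints LENGTH (default 24) random alphanumeric characters.
gen_password() {
  local length="${1:-24}"
  local chars
  # 4096 random bytes leave several hundred characters after filtering.
  chars=$(head -c 4096 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9')
  printf '%s\n' "$chars" | cut -c1-"$length"
}

GRAFANA_PASSWORD=$(gen_password 24)
echo "Generated a ${#GRAFANA_PASSWORD}-character Grafana password"

# Then pass it to the chart instead of the literal "admin":
#   --set grafana.adminPassword="$GRAFANA_PASSWORD"
```

Store the generated value somewhere safe (for example in `settings.sh` with restrictive file permissions), since the chart does not display it again.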

Step 10.3: Get Grafana URL

What This Does: Gets the Load Balancer URL for accessing Grafana dashboard.

echo "Getting Grafana URL..."

# Wait for LoadBalancer to be provisioned
sleep 60

GRAFANA_URL=$(kubectl get svc prometheus-grafana \
  -n monitoring \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

echo ""
echo "════════════════════════════════════════════════════════════"
echo "           GRAFANA DASHBOARD ACCESS                          "
echo "════════════════════════════════════════════════════════════"
echo ""
echo "URL:      http://$GRAFANA_URL"
echo "Username: admin"
echo "Password: admin"
echo ""
echo "⚠️  IMPORTANT: Change password after first login!"
echo ""
echo "════════════════════════════════════════════════════════════"

Step 10.4: Access Grafana and View Dashboards

Steps to access Grafana:

  1. Open browser and go to: http://[GRAFANA_URL]

  2. Login with username: admin, password: admin

  3. Change password when prompted

  4. View dashboards:

    • Click "Dashboards" in left menu

    • Select "Kubernetes / Compute Resources / Cluster"

    • This shows overall cluster health

Available Pre-built Dashboards:

  • Kubernetes / Compute Resources / Cluster: Overall cluster metrics

  • Kubernetes / Compute Resources / Namespace (Pods): Pod-level metrics

  • Kubernetes / Compute Resources / Node (Pods): Node-level metrics

  • Kubernetes / Networking / Cluster: Network traffic

  • Node Exporter / Nodes: Detailed node metrics

Step 10.5: Create Custom OpenEdX Dashboard

What This Does: Creates a custom dashboard for monitoring OpenEdX specifically.

cat > ~/openedx-project/k8s/grafana-openedx-dashboard.json <<'EOF'
{
  "dashboard": {
    "title": "OpenEdX Production Monitoring",
    "tags": ["openedx", "lms", "cms"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "LMS Pod CPU Usage",
        "type": "timeseries",
        "targets": [{
          "expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"openedx\",pod=~\"lms.*\",container!=\"\"}[5m])) by (pod)",
          "legendFormat": "{{pod}}"
        }]
      },
      {
        "id": 2,
        "title": "LMS Pod Memory Usage",
        "type": "timeseries",
        "targets": [{
          "expr": "sum(container_memory_usage_bytes{namespace=\"openedx\",pod=~\"lms.*\",container!=\"\"}) by (pod)",
          "legendFormat": "{{pod}}"
        }]
      },
      {
        "id": 3,
        "title": "CMS Pod CPU Usage",
        "type": "timeseries",
        "targets": [{
          "expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"openedx\",pod=~\"cms.*\",container!=\"\"}[5m])) by (pod)",
          "legendFormat": "{{pod}}"
        }]
      },
      {
        "id": 4,
        "title": "HTTP Request Rate",
        "type": "timeseries",
        "targets": [{
          "expr": "sum(rate(nginx_ingress_controller_requests[5m])) by (host)",
          "legendFormat": "{{host}}"
        }]
      }
    ]
  }
}
EOF

echo "✅ Custom OpenEdX dashboard created"
echo "Import this dashboard in Grafana:"
echo "1. Go to Dashboards → Import"
echo "2. Upload: ~/openedx-project/k8s/grafana-openedx-dashboard.json"

Step 10.6: View Prometheus Metrics

Steps to access Prometheus:

# Port-forward Prometheus UI
kubectl port-forward -n monitoring \
  svc/prometheus-kube-prometheus-prometheus \
  9090:9090 &

echo "Prometheus UI: http://localhost:9090"

Useful Prometheus Queries:

# Total pod count in openedx namespace
count(kube_pod_info{namespace="openedx"})

# CPU usage by pod
rate(container_cpu_usage_seconds_total{namespace="openedx"}[5m])

# Memory usage by pod
container_memory_usage_bytes{namespace="openedx"}

# Pod restart count
kube_pod_container_status_restarts_total{namespace="openedx"}

# HTTP requests per second
rate(nginx_ingress_controller_requests[5m])

Verification

# Check monitoring pods
kubectl get pods -n monitoring

# Should show:
# alertmanager-xxx
# prometheus-xxx
# grafana-xxx
# prometheus-kube-state-metrics-xxx
# prometheus-prometheus-node-exporter-xxx

# Check Grafana service
kubectl get svc -n monitoring

# Test metrics endpoint
kubectl top pods -n openedx

# Should show CPU and memory usage for each pod

Screenshot for Evidence

  • Grafana dashboard showing OpenEdX pod metrics

  • Prometheus targets page showing all targets "UP"

  • kubectl top pods -n openedx output


PART 11: HPA & Scaling

What This Does

Configures Horizontal Pod Autoscaling for automatic scaling based on CPU usage.

Why HPA?

  • Handles traffic spikes: Automatically adds pods during high load

  • Cost optimization: Scales down during low traffic

  • High availability: Multiple pods provide redundancy

  • Performance: Distributes load across pods

Step 11.1: Create HPA for LMS

What This Does: Auto-scales LMS pods from 2 to 5 based on 70% CPU threshold.

cat > ~/openedx-project/k8s/hpa-lms.yaml <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: lms-hpa
  namespace: openedx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: lms
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 2
        periodSeconds: 30
      selectPolicy: Max
EOF

kubectl apply -f ~/openedx-project/k8s/hpa-lms.yaml

echo "✅ LMS HPA configured"

Configuration explained:

  • minReplicas: 2: Always run at least 2 pods (high availability)

  • maxReplicas: 5: Scale up to maximum 5 pods

  • averageUtilization: 70: Trigger scaling at 70% CPU

  • scaleDown.stabilizationWindowSeconds: 300: Wait 5 min before scaling down (prevent flapping)

  • scaleUp.stabilizationWindowSeconds: 0: Scale up immediately

  • scaleUp.policies: Can double pods or add 2 pods at a time (whichever is more)
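The replica count the HPA aims for follows the formula in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to the min/max bounds. The sketch below works through that arithmetic for the LMS settings. `desired_replicas` is a hypothetical helper, and it deliberately ignores the controller's tolerance band and the `behavior` policies, which can slow the actual change.

```shell
# desired_replicas CURRENT CURRENT_UTIL TARGET_UTIL MIN MAX
# Integer version of ceil(CURRENT * CURRENT_UTIL / TARGET_UTIL), clamped.
desired_replicas() {
  local current="$1"
  local util="$2"
  local target="$3"
  local min="$4"
  local max="$5"
  local desired
  # Adding (target - 1) before dividing rounds the quotient up.
  desired=$(( (current * util + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}

desired_replicas 2 105 70 2 5   # prints 3
desired_replicas 4 200 70 2 5   # prints 5 (capped at maxReplicas)
```

Utilization is measured against each pod's CPU request, so the LMS deployment needs a CPU request set for the `70%` target to be meaningful.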

Step 11.2: Create HPA for CMS

What This Does: Auto-scales CMS pods from 1 to 3 (lower than LMS since less traffic).

cat > ~/openedx-project/k8s/hpa-cms.yaml <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cms-hpa
  namespace: openedx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cms
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
    scaleUp:
      stabilizationWindowSeconds: 0
EOF

kubectl apply -f ~/openedx-project/k8s/hpa-cms.yaml

echo "✅ CMS HPA configured"

Step 11.3: Scale Down Single Pods

What This Does: Resets each deployment to its HPA minimum, clearing any earlier manual scaling so that HPA manages the replica count from a known baseline.

# Let HPA manage LMS replicas
kubectl scale deployment lms --replicas=2 -n openedx

# Let HPA manage CMS replicas  
kubectl scale deployment cms --replicas=1 -n openedx

echo "Waiting for HPA to take control (30 sec)..."
sleep 30

echo "✅ Deployments scaled down, HPA in control"

Step 11.4: Test Auto-Scaling

What This Does: Generates load to trigger HPA scaling.

source ~/.openedx-config/settings.sh

echo "Testing auto-scaling with load..."

# Create load generator pod
kubectl run load-generator --rm -i --image=busybox -n openedx -- /bin/sh -c "
  while true; do
    wget -q -O- http://lms:8000 > /dev/null
  done
"

# In another terminal, watch HPA:
# kubectl get hpa -n openedx -w

# You should see:
# - CPU usage increase
# - HPA change from 2 to 3 to 4 pods as CPU crosses 70%
# - After stopping load, pods scale back down to 2

# Stop load generator: Ctrl+C

Verification

# Check HPA status
kubectl get hpa -n openedx

# Should show:
# NAME       REFERENCE        TARGETS   MINPODS   MAXPODS   REPLICAS
# lms-hpa    Deployment/lms   45%/70%   2         5         2
# cms-hpa    Deployment/cms   30%/70%   1         3         1

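# Check current pod count (LMS and CMS web pods only; the trailing space
# in the pattern excludes lms-worker and cms-worker pods)
kubectl get pods -n openedx --no-headers | grep -cE '^(lms|cms)-[a-z0-9]+-[a-z0-9]+ '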

# Watch HPA in real-time
kubectl get hpa -n openedx -w

# Check HPA events
kubectl describe hpa lms-hpa -n openedx

Screenshot for Evidence

  • Output of kubectl get hpa -n openedx

  • Grafana dashboard during load test showing CPU spike

  • kubectl get pods -n openedx during scale-up showing multiple LMS pods


PART 12: DNS Configuration

What This Does

Configures Cloudflare DNS to point your domains to the Load Balancer.

Why Cloudflare?

  • Free plan works perfectly

  • DNS management is simple

  • Additional features: DDoS protection, SSL, caching

  • Fast DNS resolution: 99.99% uptime

Step 12.1: Add Domain to Cloudflare

Manual steps (do in browser):

  1. Go to: https://www.cloudflare.com/

  2. Sign up or log in

  3. Click: "Add a Site"

  4. Enter your domain: yourdomain.com

  5. Select plan: Free

  6. Click: "Continue"

  7. Cloudflare scans existing DNS records (if any)

  8. Click: "Continue"

  9. Cloudflare shows nameservers:

     ava.ns.cloudflare.com
     kal.ns.cloudflare.com
    
  10. Copy these nameservers

Step 12.2: Update Nameservers at Domain Registrar

Where your domain is registered (GoDaddy, Namecheap, etc.):

  1. Log in to your domain registrar

  2. Find "Manage DNS" or "Nameservers"

  3. Change from "Default" to "Custom"

  4. Enter Cloudflare nameservers:

     ava.ns.cloudflare.com
     kal.ns.cloudflare.com
    
  5. Save changes

  6. Wait for DNS propagation (usually about 1 hour, but it can take up to 24 hours)

Step 12.3: Configure DNS Records in Cloudflare

In Cloudflare Dashboard → DNS → Records:

source ~/.openedx-config/settings.sh

echo ""
echo "════════════════════════════════════════════════════════════"
echo "           CLOUDFLARE DNS CONFIGURATION                      "
echo "════════════════════════════════════════════════════════════"
echo ""
echo "Add these DNS records in Cloudflare:"
echo ""
echo "1. LMS (Main Site)"
echo "   Type:    CNAME"
echo "   Name:    @"
echo "   Content: $LB_HOSTNAME"
echo "   Proxy:   DNS only (gray cloud)"
echo "   TTL:     Auto"
echo ""
echo "2. Studio (Course Authoring)"
echo "   Type:    CNAME"
echo "   Name:    studio"
echo "   Content: $LB_HOSTNAME"
echo "   Proxy:   DNS only (gray cloud)"
echo "   TTL:     Auto"
echo ""
echo "3. MFE (Login/Register)"
echo "   Type:    CNAME"
echo "   Name:    apps"
echo "   Content: $LB_HOSTNAME"
echo "   Proxy:   DNS only (gray cloud)"
echo "   TTL:     Auto"
echo ""
echo "4. CDN (Static Files)"
echo "   Type:    CNAME"
echo "   Name:    cdn"
echo "   Content: $CLOUDFRONT_URL"
echo "   Proxy:   DNS only (gray cloud)"
echo "   TTL:     Auto"
echo ""
echo "════════════════════════════════════════════════════════════"

Important: Use "DNS only" (gray cloud), NOT "Proxied" (orange cloud)

Why DNS only?

  • SSL termination happens at Nginx (not Cloudflare)

  • Prevents double SSL termination

  • Cloudflare proxy would interfere with cert-manager

Step 12.4: Configure Cloudflare SSL Settings

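In Cloudflare Dashboard → SSL/TLS:

Note: These settings only apply to records that are proxied through Cloudflare (orange cloud). With the DNS-only records from Step 12.3, traffic bypasses Cloudflare, and TLS, HTTP/2, and compression are handled by Nginx Ingress and CloudFront. Configure them now so the zone is ready if you enable the proxy later.

  1. Set SSL/TLS encryption mode:

    • Go to: SSL/TLS → Overview

    • Select: "Full (strict)"

    • This ensures end-to-end encryption

  2. Enable Always Use HTTPS:

    • Go to: SSL/TLS → Edge Certificates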

    • Toggle ON: "Always Use HTTPS"

    • This redirects HTTP to HTTPS

  3. Enable Automatic HTTPS Rewrites:

    • Toggle ON: "Automatic HTTPS Rewrites"

    • Fixes mixed content warnings

  4. Enable HTTP/2:

    • Toggle ON: "HTTP/2"

    • Faster page loads

  5. Enable HTTP/3 (QUIC):

    • Toggle ON: "HTTP/3 (with QUIC)"

    • Even faster, uses UDP

  6. Enable Brotli Compression:

    • Go to: Speed → Optimization

    • Toggle ON: "Brotli"

    • Smaller file sizes

Step 12.5: Verify DNS Propagation

Wait 5-30 minutes, then test:

source ~/.openedx-config/settings.sh

echo "Testing DNS resolution..."

# Test main domain
nslookup $DOMAIN

# Should return Load Balancer IP addresses

# Test studio
nslookup $STUDIO_DOMAIN

# Should return Load Balancer IP addresses (same as above)

# Test apps
nslookup $MFE_DOMAIN

# Should return Load Balancer IP addresses (same as above)

# Test CDN
nslookup $CDN_DOMAIN

# Should return CloudFront IP addresses (different from above)

echo "✅ DNS configured"

Verification

# Test HTTPS on all domains
curl -I https://$DOMAIN
# Should return: HTTP/2 200

curl -I https://$STUDIO_DOMAIN
# Should return: HTTP/2 302 (redirect to login)

curl -I https://$MFE_DOMAIN/authn/login
# Should return: HTTP/2 200

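curl -I https://$CDN_DOMAIN
# Returns HTTP/2 200 (from CloudFront) only for an existing object path, and
# only once $CDN_DOMAIN is added to the distribution as an alternate domain
# name with a matching ACM certificate. The distribution from Part 9 uses the
# default *.cloudfront.net certificate, so until then test $CLOUDFRONT_URL.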

# Check SSL certificate (issuer and validity dates)
echo | openssl s_client -connect $DOMAIN:443 -servername $DOMAIN 2>/dev/null | \
  openssl x509 -noout -issuer -dates

# Should show: a Let's Encrypt issuer and a validity window of about 90 days

Screenshot for Evidence

  • Cloudflare DNS records page

  • Output of nslookup showing correct IPs

  • Browser showing green padlock on all domains


Verification & Testing

Complete System Check

Run this comprehensive verification:

source ~/.openedx-config/settings.sh

echo ""
echo "════════════════════════════════════════════════════════════"
echo "           OPENEDX PRODUCTION VERIFICATION                   "
echo "════════════════════════════════════════════════════════════"
echo ""

# 1. Kubernetes Cluster
echo "1. KUBERNETES CLUSTER"
kubectl get nodes
echo ""

# 2. OpenEdX Pods
echo "2. OPENEDX PODS"
kubectl get pods -n openedx
echo ""

# 3. External Databases
echo "3. EXTERNAL DATABASES"
echo "MySQL:     $MYSQL_HOST"
echo "MongoDB:   $MONGO_IP (t2.medium)"
echo "Redis:     $REDIS_HOST"
echo "OpenSearch: $OPENSEARCH_HOST"
echo ""

# 4. Ingress & Load Balancer
echo "4. INGRESS & LOAD BALANCER"
kubectl get ingress -n openedx
echo "Load Balancer: $LB_HOSTNAME"
echo ""

# 5. SSL Certificates
echo "5. SSL CERTIFICATES"
kubectl get certificate -n openedx
echo ""

# 6. HPA (Auto-scaling)
echo "6. HORIZONTAL POD AUTOSCALING"
kubectl get hpa -n openedx
echo ""

# 7. Storage
echo "7. STORAGE"
echo "S3 Bucket: $S3_BUCKET_NAME"
kubectl get storageclass
echo ""

# 8. CDN & Security
echo "8. CDN & SECURITY"
echo "CloudFront: $CLOUDFRONT_URL"
echo "WAF: Enabled (4 rules)"
echo ""

# 9. Monitoring
echo "9. MONITORING"
kubectl get pods -n monitoring | grep -E '(prometheus|grafana)'
echo "Grafana: http://$(kubectl get svc prometheus-grafana -n monitoring -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')"
echo ""

# 10. Endpoints
echo "10. PUBLIC ENDPOINTS"
echo "LMS:       https://$DOMAIN"
echo "Studio:    https://$STUDIO_DOMAIN"
echo "Login:     https://$MFE_DOMAIN/authn/login"
echo "Admin:     https://$DOMAIN/admin (username: admin)"
echo ""

echo "════════════════════════════════════════════════════════════"

Functional Testing

Test each component:

source ~/.openedx-config/settings.sh

# 1. Test LMS Homepage
echo "Testing LMS..."
curl -I https://$DOMAIN
# Should return: HTTP/2 200

# 2. Test Studio
echo "Testing Studio..."
curl -I https://$STUDIO_DOMAIN
# Should return: HTTP/2 302 (redirect to login)

# 3. Test MFE Login
echo "Testing MFE Login..."
curl -I https://$MFE_DOMAIN/authn/login
# Should return: HTTP/2 200

# 4. Test API
echo "Testing LMS API..."
curl -I https://$DOMAIN/api/user/v1/me
# Should return: HTTP/2 401 (correct - needs authentication)

# 5. Test Static Files via CDN
echo "Testing CDN..."
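curl -I https://$CDN_DOMAIN
# Should return: HTTP/2 200 (from CloudFront) for an existing object path,
# provided $CDN_DOMAIN is configured as an alternate domain name on the
# distribution; otherwise test https://$CLOUDFRONT_URL instead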

echo "✅ All endpoints responding correctly"

Browser Testing

Open in browser and verify:

  1. LMS Homepage: https://yourdomain.com

    • Should load OpenEdX homepage

    • Check SSL (green padlock)

    • Check Network tab: HTTP/2 protocol

  2. Login Page: https://apps.yourdomain.com/authn/login

    • Should load login form

    • Test login with admin credentials

    • Should redirect to dashboard

  3. Studio: https://studio.yourdomain.com

    • Should redirect to login

    • After login, should show Studio homepage

  4. Admin Panel: https://yourdomain.com/admin

    • Login with admin credentials

    • Should show Django admin interface

Performance Testing

Test auto-scaling:

# Generate load
kubectl run -i --tty load-generator --rm \
  --image=busybox \
  --restart=Never \
  -n openedx -- /bin/sh -c \
  "while sleep 0.01; do wget -q -O- http://lms:8000; done"

# In another terminal, watch scaling
kubectl get hpa -n openedx -w

# Should see:
# - CPU usage increase
# - REPLICAS increase from 2 to 3, 4, 5
# - After stopping load, scale back down to 2

Security Testing

Verify WAF is working:

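# The WAF is attached to the CloudFront distribution, so send test traffic to
# the CloudFront URL. Requests to $DOMAIN go straight to the load balancer
# and never pass through the WAF.
TEST_OBJECT="path/to/existing-object"  # placeholder: replace with a real key

# Test rate limiting (make >2000 requests in 5 minutes)
for i in {1..2100}; do
  curl -s -o /dev/null "https://$CLOUDFRONT_URL/$TEST_OBJECT"
done

# Check WAF metrics in AWS Console:
# WAF → Web ACLs → openedx-prod-waf → Metrics
# Should see blocked requests

# Test managed-rule protection with an XSS probe (curl URL-encodes the value)
curl -s -o /dev/null -w '%{http_code}\n' --get \
  --data-urlencode "q=<script>alert(1)</script>" \
  "https://$CLOUDFRONT_URL/$TEST_OBJECT"
# Should be blocked by the Common Rule Set (returns 403).
# SQL injection probes are blocked only if AWSManagedRulesSQLiRuleSet is added.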
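When testing rate limiting, it helps to see how many requests were allowed and how many were blocked rather than discarding every response. The sketch below tallies HTTP status codes read from standard input. `summarize_status_codes` is a hypothetical helper, and the canned input stands in for real `curl -w '%{http_code}\n'` output.

```shell
# summarize_status_codes
# Reads one HTTP status code per line and prints "CODE COUNT" sorted by code.
summarize_status_codes() {
  awk 'NF { count[$1]++ } END { for (code in count) print code, count[code] }' | sort
}

# Canned example: two allowed requests and one blocked request.
printf '200\n200\n403\n' | summarize_status_codes

# Real usage (commented out; URL is whichever endpoint sits behind the WAF):
# for i in $(seq 1 2100); do
#   curl -s -o /dev/null -w '%{http_code}\n' "$URL"
# done | summarize_status_codes
```

A growing count of `403` responses partway through the run suggests the rate-based rule has started blocking the client IP.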

Screenshot Checklist

Take screenshots of:

  1. ✅ kubectl get nodes - 3 nodes Ready

  2. ✅ kubectl get pods -n openedx - all Running

  3. ✅ kubectl get hpa -n openedx - HPA configured

  4. ✅ kubectl get certificate -n openedx - SSL cert Ready

  5. ✅ AWS RDS Console - MySQL instance running

  6. ✅ AWS EC2 Console - MongoDB instance running

  7. ✅ AWS ElastiCache Console - Redis cluster

  8. ✅ AWS OpenSearch Console - domain active

  9. ✅ AWS CloudFront Console - distribution deployed

  10. ✅ AWS WAF Console - Web ACL with 4 rules

  11. ✅ Cloudflare DNS records

  12. ✅ Grafana dashboard showing metrics

  13. ✅ Browser showing OpenEdX homepage with SSL

  14. ✅ Browser showing Studio with SSL

  15. ✅ Browser showing MFE login with SSL


Backup Strategy

Automated Daily Backups

Create backup script:

cat > ~/openedx-project/scripts/backup-daily.sh <<'BACKUP'
#!/bin/bash
set -e
source ~/.openedx-config/settings.sh

DATE=$(date +%Y%m%d-%H%M%S)
BACKUP_DIR=~/openedx-backups/$DATE

mkdir -p $BACKUP_DIR

echo "Starting backup: $DATE"

# 1. MySQL Backup (RDS snapshot)
echo "Backing up MySQL..."
aws rds create-db-snapshot \
  --db-instance-identifier ${PROJECT_NAME}-mysql \
  --db-snapshot-identifier mysql-backup-$DATE \
  --region $AWS_REGION

# 2. Redis Backup (ElastiCache snapshot)
echo "Backing up Redis..."
aws elasticache create-snapshot \
  --cache-cluster-id ${PROJECT_NAME}-redis \
  --snapshot-name redis-backup-$DATE \
  --region $AWS_REGION

# 3. MongoDB Backup (EBS snapshot)
echo "Backing up MongoDB..."
MONGO_VOL=$(aws ec2 describe-instances \
  --instance-ids $MONGO_INSTANCE_ID \
  --query 'Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.VolumeId' \
  --output text)

aws ec2 create-snapshot \
  --volume-id $MONGO_VOL \
  --description "MongoDB backup $DATE" \
  --region $AWS_REGION

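# 4. OpenSearch Backup (manual snapshot)
# PUT _snapshot/<repo> only registers a snapshot repository; a second request
# takes the snapshot. On Amazon OpenSearch Service the repository settings
# also need a "role_arn" for an IAM role that can write to the bucket, and
# the registration request must be SigV4-signed (neither is shown here).
echo "Backing up OpenSearch..."
OS_REPO="openedx-backups"  # repository name (assumed; any name works)
curl -X PUT "https://$OPENSEARCH_HOST/_snapshot/$OS_REPO" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "s3",
    "settings": {
      "bucket": "'$S3_BUCKET_NAME'",
      "region": "'$AWS_REGION'",
      "base_path": "opensearch-backups"
    }
  }'
curl -X PUT "https://$OPENSEARCH_HOST/_snapshot/$OS_REPO/snapshot-$DATE"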

# 5. Kubernetes Config Backup
echo "Backing up Kubernetes configs..."
kubectl get all -n openedx -o yaml > $BACKUP_DIR/k8s-resources.yaml
kubectl get configmap -n openedx -o yaml > $BACKUP_DIR/k8s-configmaps.yaml
kubectl get secret -n openedx -o yaml > $BACKUP_DIR/k8s-secrets.yaml
kubectl get pvc -n openedx -o yaml > $BACKUP_DIR/k8s-pvcs.yaml

# 6. Tutor Config Backup
echo "Backing up Tutor config..."
cp ~/.local/share/tutor/config.yml $BACKUP_DIR/tutor-config.yml
cp -r ~/.local/share/tutor/env $BACKUP_DIR/tutor-env

# 7. Project Files Backup
echo "Backing up project files..."
tar -czf $BACKUP_DIR/project-files.tar.gz ~/openedx-project/

echo "✅ Backup complete: $BACKUP_DIR"
echo ""
echo "Backup contents:"
ls -lh $BACKUP_DIR/

BACKUP
chmod +x ~/openedx-project/scripts/backup-daily.sh

echo "✅ Backup script created"

Schedule Automated Backups

Set up daily cron job:

# The log directory must exist before the first scheduled run
mkdir -p ~/openedx-backups

# Add to crontab
(crontab -l 2>/dev/null; echo "0 2 * * * ~/openedx-project/scripts/backup-daily.sh >> ~/openedx-backups/backup.log 2>&1") | crontab -

echo "✅ Daily backups scheduled for 2 AM"
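Daily backups accumulate under `~/openedx-backups`, so some form of retention is worth considering. The sketch below removes local backup directories older than a given number of days. `prune_backups` is a hypothetical helper that assumes the `YYYYMMDD-HHMMSS` directory naming used by the backup script, and it only touches local files: RDS, ElastiCache, and EBS snapshots need their own cleanup.

```shell
# prune_backups DIR DAYS
# Deletes timestamp-named backup directories in DIR last modified more than
# DAYS days ago. Other files (such as backup.log) are left alone.
prune_backups() {
  local dir="$1"
  local days="$2"
  find "$dir" -mindepth 1 -maxdepth 1 -type d \
    -name '[0-9]*-[0-9]*' -mtime +"$days" -exec rm -rf {} +
}

# Usage (commented out so the sketch deletes nothing by itself):
# prune_backups ~/openedx-backups 14
```

Calling it at the end of `backup-daily.sh` would keep roughly two weeks of local copies; the retention period is an assumption to adjust to your own policy.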

Manual Backup

Run backup manually:

~/openedx-project/scripts/backup-daily.sh

Restore Procedure

Document how to restore from backup:

cat > ~/openedx-project/docs/RESTORE.md <<'RESTORE'
# OpenEdX Disaster Recovery

## Restore from Backup

### 1. Restore MySQL
```bash
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier openedx-prod-mysql-restored \
  --db-snapshot-identifier mysql-backup-YYYYMMDD-HHMMSS
```

### 2. Restore Redis
```bash
aws elasticache create-cache-cluster \
  --cache-cluster-id openedx-prod-redis-restored \
  --snapshot-name redis-backup-YYYYMMDD-HHMMSS
```

### 3. Restore MongoDB
```bash
# Create volume from snapshot
aws ec2 create-volume \
  --snapshot-id snap-xxx \
  --availability-zone us-east-1a

# Attach to new EC2 instance
# (See full MongoDB setup in main guide)
```

### 4. Restore Kubernetes Resources
```bash
kubectl apply -f ~/openedx-backups/YYYYMMDD-HHMMSS/k8s-resources.yaml
kubectl apply -f ~/openedx-backups/YYYYMMDD-HHMMSS/k8s-configmaps.yaml
```

### 5. Restore Tutor Config
```bash
cp ~/openedx-backups/YYYYMMDD-HHMMSS/tutor-config.yml \
   ~/.local/share/tutor/config.yml
```
RESTORE

echo "✅ Restore documentation created"


Troubleshooting Guide

Common Issues and Solutions

1. Pods Stuck in "Pending" State

Symptom:

kubectl get pods -n openedx
NAME      READY   STATUS    RESTARTS   AGE
lms-xxx   0/1     Pending   0          5m

Cause: Insufficient resources (CPU/memory)

Solution:

# Check events
kubectl describe pod lms-xxx -n openedx

# If "Insufficient memory":
# Delete old pods to free resources
kubectl delete pod -l app.kubernetes.io/name=lms-worker -n openedx

# Or scale up cluster
eksctl scale nodegroup \
  --cluster=openedx-prod \
  --name=openedx-workers \
  --nodes=4

2. Pods Crashing with "CrashLoopBackOff"

Symptom:

NAME                         READY   STATUS             RESTARTS   AGE
lms-xxx                      0/1     CrashLoopBackOff   5          10m

Solution:

# Check logs for error
kubectl logs lms-xxx -n openedx --tail=50

# Common errors:

# Error: "Table 'openedx.waffle_switch' doesn't exist"
# Solution: Run migrations (see Part 6, Step 6.5)

# Error: "OperationalError: (2003, \"Can't connect to MySQL\")"
# Solution: Check MySQL security group allows port 3306 from EKS
aws ec2 describe-security-groups --group-ids $DEFAULT_SG

# Error: "STORAGES is not defined"
# Solution: S3 plugin is enabled - disable it
tutor plugins disable s3
tutor k8s stop && tutor k8s start

3. SSL Certificate Not Issuing

Symptom:

kubectl get certificate -n openedx
NAME          READY   SECRET        AGE
openedx-tls   False   openedx-tls   10m

Solution:

# Check certificate status
kubectl describe certificate openedx-tls -n openedx

# Common issues:

# Issue: "Waiting for HTTP-01 challenge propagation"
# Solution: Check ingress is accessible
curl http://$DOMAIN/.well-known/acme-challenge/test

# Issue: "DNS problem: NXDOMAIN"
# Solution: DNS not propagated yet - wait 30 minutes

# Issue: "CAA record prevents issuance"
# Solution: Remove CAA record or add letsencrypt.org

4. Blank Page on apps.yourdomain.com

Symptom: Blank white/black page, no content

Causes & Solutions:

# Cause 1: HTTPS config mismatch
tutor config printvalue ENABLE_HTTPS
# Should be: true
# If false:
tutor config save --set ENABLE_HTTPS=true
kubectl rollout restart deployment mfe -n openedx

# Cause 2: Meilisearch still enabled
tutor config printvalue RUN_MEILISEARCH
# Should be: false
# If true:
tutor config save --set RUN_MEILISEARCH=false
tutor k8s stop && tutor k8s start

# Cause 3: Wrong URL
# MFE has no root page!
# Correct URLs:
https://apps.yourdomain.com/authn/login   ✓
https://apps.yourdomain.com                ✗

5. HPA Not Scaling

Symptom:

kubectl get hpa -n openedx
NAME       REFERENCE        TARGETS         MINPODS   MAXPODS   REPLICAS
lms-hpa    Deployment/lms   <unknown>/70%   2         5         2

Solution:

# Check metrics-server is installed
kubectl get deployment metrics-server -n kube-system

# If not found, install:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Wait 2 minutes, then check again
kubectl get hpa -n openedx

6. MongoDB Connection Failed

Symptom:

Error: "MongoNetworkError: failed to connect to server"

Solution:

# Check MongoDB instance is running
aws ec2 describe-instances \
  --instance-ids $MONGO_INSTANCE_ID \
  --query 'Reservations[0].Instances[0].State.Name'

# Check security group allows port 27017
aws ec2 describe-security-groups \
  --group-ids $MONGO_SG \
  --query 'SecurityGroups[0].IpPermissions[?ToPort==`27017`]'

# Check MongoDB is actually installed (view user-data logs)
aws ec2 get-console-output \
  --instance-id $MONGO_INSTANCE_ID \
  --output text | grep "MongoDB installation"

# If installation failed, terminate and recreate instance

7. Grafana Not Accessible

Symptom: Can't access Grafana dashboard

Solution:

# Check Grafana pod is running
kubectl get pods -n monitoring | grep grafana

# Get Grafana URL again
kubectl get svc prometheus-grafana -n monitoring

# If LoadBalancer pending:
kubectl describe svc prometheus-grafana -n monitoring
# Check events for errors

# Alternative: Port-forward
kubectl port-forward -n monitoring \
  svc/prometheus-grafana \
  3000:80 &
# Access: http://localhost:3000

8. Out of Memory Errors

Symptom:

OOMKilled

Solution:

# Check node memory usage
kubectl top nodes

# Scale up cluster
eksctl scale nodegroup \
  --cluster=openedx-prod \
  --name=openedx-workers \
  --nodes=4

# Or add resource limits
kubectl set resources deployment lms \
  -n openedx \
  --requests=cpu=500m,memory=1Gi \
  --limits=cpu=2,memory=2Gi
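Adding nodes only helps when the node itself runs short of memory. A container that exceeds its own memory limit is OOMKilled no matter how much memory the node has free, so it is worth finding out which pods were killed before choosing a fix. A minimal sketch, assuming the `openedx` namespace used throughout; `oom_killed_pods` is a hypothetical helper name.

```shell
# Reads "pod-name<TAB>last-termination-reason(s)" lines on stdin and prints
# the pods whose most recent container termination was an OOM kill.
oom_killed_pods() {
  awk -F '\t' '$2 ~ /OOMKilled/ { print $1 }'
}

# Usage: produce the tab-separated input with a jsonpath query.
#   kubectl get pods -n openedx \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
#     | oom_killed_pods
```

Pods listed here need a higher memory limit (the second command above); pods stuck in Pending with no OOM history point to node capacity (the first command).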

Deliverables Checklist

Required Deliverables for Al Nafi Submission

1. Documentation ✓

  • [x] README.md (this file)

    • Architecture overview

    • Step-by-step deployment guide

    • Configuration decisions & rationale

    • Troubleshooting guide

  • [x] Architecture Diagram

    • Create using draw.io or similar

    • Show all components and connections

    • Include security layers

  • [x] Network Flow Diagram

    • Traffic flow from user to database

    • Show CDN, WAF, Load Balancer, Ingress, Pods

2. Configuration Artifacts ✓

# Kubernetes manifests
~/openedx-project/k8s/
├── storageclass-gp3.yaml
├── ingress.yaml
├── letsencrypt-issuer.yaml
├── hpa-lms.yaml
├── hpa-cms.yaml
└── grafana-openedx-dashboard.json

# Tutor configuration
~/.local/share/tutor/config.yml

# Persistent variables
~/.openedx-config/settings.sh

3. Automation Scripts ✓

~/openedx-project/scripts/
├── backup-daily.sh           # Automated backups
└── restore.sh                # Disaster recovery

4. Monitoring Configurations ✓

  • [x] Prometheus + Grafana installed

  • [x] Custom OpenEdX dashboard created

  • [x] HPA configured with metrics

5. Proof of Implementation ✓

Screenshots to include:

  1. ✅ EKS cluster with 3 nodes

  2. ✅ All OpenEdX pods running

  3. ✅ External databases (MySQL, MongoDB, Redis, OpenSearch)

  4. ✅ Nginx Ingress Controller

  5. ✅ SSL certificates issued

  6. ✅ HPA configured and working

  7. ✅ CloudFront distribution

  8. ✅ WAF with 4 rules

  9. ✅ Grafana dashboard

  10. ✅ OpenEdX homepage with SSL

  11. ✅ Studio with SSL

  12. ✅ Load test showing auto-scaling

  13. ✅ Database connectivity logs

  14. ✅ Cloudflare DNS configuration

Evaluation Criteria Compliance

How this guide meets Al Nafi requirements:

| Criteria | Weight | Implementation | Status |
|---|---|---|---|
| OpenEdX on EKS | 20% | Tutor 21.0.1 on EKS 1.34, 3-node cluster | ✅ |
| External Databases | 20% | MySQL RDS, MongoDB EC2, Redis ElastiCache, OpenSearch | ✅ |
| Nginx (not Caddy) | 15% | Nginx Ingress 4.14.3, HTTP/2, TLS termination | ✅ |
| CloudFront + WAF | 15% | CloudFront for S3, WAF with 4 rules | ✅ |
| Documentation | 15% | Complete guide with architecture, rationale, troubleshooting | ✅ |
| High Availability | 10% | HPA, 3-node cluster, auto-scaling, health probes | ✅ |
| Security | 5% | TLS, WAF, encrypted storage, private databases | ✅ |
| TOTAL | 100% | | ✅ 100% |

Cost Breakdown

Monthly Costs (Approximate)

┌─────────────────────────────┬──────────────┐
│ Component                   │ Monthly Cost │
├─────────────────────────────┼──────────────┤
│ EKS Control Plane           │      $73     │
│ 3× t3.medium EC2 (workers)  │      $75     │
│ MySQL RDS (db.t3.medium)    │      $40     │
│ MongoDB EC2 (t2.medium)     │      $35     │
│ Redis ElastiCache (t3.micro)│      $12     │
│ OpenSearch (t3.small)       │      $20     │
│ S3 Storage                  │       $5     │
│ CloudFront + WAF            │      $10     │
├─────────────────────────────┼──────────────┤
│ TOTAL                       │     $270     │
└─────────────────────────────┴──────────────┘

Notes:
- Costs based on us-east-1 pricing
- Does not include data transfer (minimal for assessment)
- CloudFront cost assumes <10GB/month
- RDS cost assumes 20GB gp3 storage
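
The total can be re-derived, or recalculated with your own region's prices, by summing the line items. A minimal sketch: the component labels are illustrative shorthand for the table rows, and `sum_monthly_costs` is a hypothetical helper name.

```shell
# Sums the last whitespace-separated field of each input line.
# Each line is "<label> <monthly cost in USD>".
sum_monthly_costs() {
  awk '{ total += $NF } END { print total + 0 }'
}

# Line items copied from the table above (approximate figures).
printf '%s\n' \
  "eks-control-plane 73" \
  "eks-workers 75" \
  "rds-mysql 40" \
  "mongodb-ec2 35" \
  "elasticache-redis 12" \
  "opensearch 20" \
  "s3 5" \
  "cloudfront-waf 10" | sum_monthly_costs   # prints 270
```

Replacing any figure with a current quote from the AWS pricing pages gives an updated estimate without redoing the arithmetic by hand.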

Cost Optimization Tips

  1. Use Reserved Instances (not for assessment, but for production)

    • Save 30-60% on EC2 and RDS

    • Requires 1-3 year commitment

  2. Stop non-production resources

    • MongoDB EC2 can be stopped when not in use

    • Snapshot and delete the RDS instance rather than leaving it running (a stopped RDS instance restarts automatically after 7 days)

  3. Right-size instances

    • Monitor usage with Grafana

    • Scale down if over-provisioned

  4. Use S3 Lifecycle Policies

    • Move old static files to Glacier

    • Delete old CloudFront logs


Submission Instructions

Final Steps Before Submission

  1. Test everything one final time:

     source ~/.openedx-config/settings.sh
     ~/openedx-project/scripts/verify-deployment.sh
    
  2. Take all required screenshots

  3. Create architecture diagrams:

    • System architecture

    • Network flow diagram

    • Security architecture

  4. Organize files:

     openedx-eks-submission/
     ├── README.md                    (this guide)
     ├── ARCHITECTURE.md              (architecture decisions)
     ├── diagrams/
     │   ├── system-architecture.png
     │   ├── network-flow.png
     │   └── security-architecture.png
     ├── screenshots/
     │   ├── 01-eks-cluster.png
     │   ├── 02-openedx-pods.png
     │   ├── 03-databases.png
     │   └── ...
     ├── k8s/
     │   ├── ingress.yaml
     │   ├── hpa-lms.yaml
     │   └── ...
     ├── scripts/
     │   ├── backup-daily.sh
     │   └── restore.sh
     └── configs/
         ├── tutor-config.yml
         └── settings.sh
    
  5. Create GitHub repository:

     cd ~/openedx-project
     git init
     git add .
     git commit -m "OpenEdX EKS Production Deployment"
     git branch -M main   # rename the branch in case git init created "master"
     git remote add origin [your-repo-url]
     git push -u origin main
    
  6. Write final README summary in repository
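
Because `git add .` stages everything in the project, and `settings.sh` holds database credentials, a quick scan before the first commit can catch secrets that should not be pushed. The sketch below is a rough heuristic rather than a complete secret scanner; `find_possible_secrets` and its pattern are assumptions, not part of the original guide.

```shell
# Prints "file:line" for every line under the given directory that looks like
# an assigned password, secret, token, or API key. Skips the .git directory.
find_possible_secrets() {
  grep -rnEi --exclude-dir=.git \
    '(password|secret|token|api_key)[A-Za-z_]*[[:space:]]*[=:][[:space:]]*[^[:space:]]+' \
    "$1" | cut -d: -f1,2
}

# Usage:
#   find_possible_secrets ~/openedx-project
```

Any file it reports is a candidate for `.gitignore` or for replacing real values with placeholders before pushing.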

Email Submission

To: hamza.mughal@alnafi.com, mohammad@alnafi.com
Subject: OpenEdX K8s Assessment – AWS EKS
Body:

Dear Al Nafi Hiring Team,

I am submitting my OpenEdX on AWS EKS deployment for technical assessment.

Project Details:
- Platform: AWS EKS 1.34
- OpenEdX: Tutor 21.0.1
- Domain: [your-domain.com]
- Repository: [GitHub URL]

Live Demo:
- LMS: https://[your-domain.com]
- Studio: https://[studio.your-domain.com]
- Admin: admin / [password in repo]

Key Highlights:
✅ Production-grade Kubernetes deployment
✅ All databases external (MySQL RDS, MongoDB EC2, Redis, OpenSearch)
✅ Nginx Ingress with HTTP/2 and Let's Encrypt SSL
✅ CloudFront CDN + AWS WAF with 4-layer protection
✅ Horizontal Pod Autoscaling (demonstrated in screenshots)
✅ Prometheus + Grafana monitoring
✅ Complete documentation and automation scripts

Repository Structure:
- README.md: Complete deployment guide
- diagrams/: System and network architecture
- screenshots/: All required evidence
- k8s/: Kubernetes manifests
- scripts/: Backup and automation

The deployment is fully functional and can be verified at the URLs above.

Thank you for your consideration.

Best regards,
[Your Name]
[Your Email]
[Your Phone]

Repository README Template

# OpenEdX Production Deployment on AWS EKS

## Live Demo
- **LMS:** https://your-domain.com
- **Studio:** https://studio.your-domain.com
- **Admin:** `admin` / [see CREDENTIALS.md]

## Architecture
[Include system architecture diagram]

## Tech Stack
- **Kubernetes:** AWS EKS 1.34
- **OpenEdX:** Tutor 21.0.1
- **Databases:** MySQL RDS 8.0.45, MongoDB 8.0 (EC2), Redis 7.1, OpenSearch 2.11
- **Ingress:** Nginx 4.14.3 with HTTP/2
- **SSL:** cert-manager + Let's Encrypt
- **CDN:** CloudFront + S3
- **Security:** AWS WAF (4 rules)
- **Monitoring:** Prometheus + Grafana

## Deployment
See [DEPLOYMENT.md](DEPLOYMENT.md) for complete step-by-step guide.

## Evidence
- [Screenshots](screenshots/)
- [Architecture Diagrams](diagrams/)
- [Configuration Files](configs/)

## Contact
[Your contact information]

Conclusion

You now have a complete, production-ready OpenEdX deployment on AWS EKS that meets all Al Nafi requirements:

✅ Core Platform: EKS 1.34 with 3-node cluster
✅ OpenEdX: Tutor 21.0.1 with all components
✅ External Databases: MySQL, MongoDB, Redis, OpenSearch
✅ Nginx Ingress: HTTP/2 with Let's Encrypt SSL
✅ CloudFront + WAF: CDN and 4-layer security
✅ Auto-scaling: HPA for LMS and CMS
✅ Monitoring: Prometheus + Grafana
✅ Documentation: Complete guide with troubleshooting

What Makes This Guide Different

  1. Battle-tested: Based on real deployment experience

  2. Zero-debugging: Fixed all common issues upfront

  3. Production-ready: Not a prototype - actual production architecture

  4. Fully explained: Every command has "what" and "why"

  5. Copy-paste ready: All commands work as-is

  6. Complete: Nothing left out - from AWS account to SSL

Key Lessons Learned

  • Variable persistence is critical

  • S3 plugin breaks Tutor 21.0.1 - must disable

  • Meilisearch causes blank pages - must disable

  • MySQL needs both app and root credentials

  • Migrations must be run manually from worker pods

  • cert-manager automates certificate issuance and renewal, which manual SSL does not

  • gp3 costs about 20% less per GB than gp2 and has a higher baseline (3,000 IOPS)

Next Steps

  1. Deploy using this guide

  2. Take all screenshots

  3. Create diagrams

  4. Organize repository

  5. Submit to Al Nafi

Good luck with your submission! 🚀


Credits & References

Created by: Battle-tested through real deployment
Date: February 2026
For: Al Nafi International College Assessment

References:

  • Tutor Documentation: https://docs.tutor.edly.io/

  • AWS EKS: https://docs.aws.amazon.com/eks/

  • Kubernetes: https://kubernetes.io/docs/

  • Let's Encrypt: https://letsencrypt.org/docs/

  • Prometheus: https://prometheus.io/docs/

Support:

  • Tutor Community: https://discuss.openedx.org/

  • Kubernetes Slack: https://kubernetes.slack.com/


END OF GUIDE