OpenEdX on AWS EKS - Complete Production Deployment Guide

Al Nafi Assessment | Battle-Tested | Zero-Debugging
Status: Production-Ready β
Platform: AWS EKS (Kubernetes 1.34)
OpenEdX: Tutor 21.0.1 (Latest)
Domain: Your-domain.com (replace throughout)
Deployment Time: 4-7 hours (with debugging)
Monthly Cost: ~$270
π Table of Contents
What You'll Build
A production-grade OpenEdX Learning Management System with:
β Core Platform
AWS EKS 1.34 (latest Kubernetes)
OpenEdX Tutor 21.0.1 (latest stable)
3-node cluster (t3.medium) with auto-scaling
β External Databases (All outside Kubernetes)
MySQL 8.0.45 (RDS) - Application data
MongoDB 8.0 (EC2 t2.medium) - Course content
Redis 7.1 (ElastiCache) - Caching
OpenSearch 2.11 - Search & analytics
β Web & Security
Nginx Ingress (replaces Caddy) with HTTP/2
Let's Encrypt SSL/TLS (cert-manager)
AWS CloudFront CDN for static files
AWS WAF with DDoS protection
β Operations
Horizontal Pod Autoscaling (HPA)
Prometheus + Grafana monitoring
Centralized logging
Automated backups
Health probes on all services
Architecture
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β SECURITY LAYER β
β Cloudflare DNS β AWS WAF (us-east-1) β CloudFront (S3) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β INGRESS LAYER β
β AWS NLB β Nginx Ingress Controller (HTTP/2, TLS termination) β
β cert-manager (Let's Encrypt SSL) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β APPLICATION LAYER (EKS) β
β Namespace: openedx β
β ββββββββββββ¬βββββββββββ¬βββββββββββββ¬βββββββββββ β
β β LMS β CMS β Workers β MFE β β
β β (2-5) β (1-3) β (1 each) β (1) β β
β β HPA β HPA β β β β
β ββββββββββββ΄βββββββββββ΄βββββββββββββ΄βββββββββββ β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β DATA LAYER (External) β
β ββββββββββββββ¬ββββββββββββββ¬ββββββββββββ¬βββββββββββββββ β
β β MySQL RDS β MongoDB EC2 β Redis β OpenSearch β β
β β 8.0.45 β 8.0 β 7.1 β 2.11 β β
β β db.t3.med β t2.medium β t3.micro β t3.small β β
β ββββββββββββββ΄ββββββββββββββ΄ββββββββββββ΄βββββββββββββββ β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β STORAGE LAYER β
β S3 Bucket (Static Files) | EBS gp3 Volumes (PV/PVC) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Traffic Flow:
User Request
β
Cloudflare DNS (resolves domain)
β
AWS WAF (security checks)
β
CloudFront (serves static files from S3)
β
AWS Network Load Balancer
β
Nginx Ingress Controller (TLS termination, HTTP/2)
β
OpenEdX Pods (LMS/CMS/MFE based on hostname)
β
External Databases (MySQL/MongoDB/Redis/OpenSearch)
Why These Choices?
External Databases (NOT in Kubernetes)
Why: Databases need persistence, backups, and managed services provide:
Automated backups and point-in-time recovery
Managed updates and patching
Better performance isolation
Easier scaling
No risk of data loss if pods crash
MongoDB on EC2 (not Atlas)
Why:
Single EC2 instance simpler than Atlas setup
Full control over configuration
No external dependencies
Cost-effective for learning platform
Easy to backup (EBS snapshots)
Nginx over Caddy
Why:
Industry standard with extensive documentation
Better performance for high traffic
More control over SSL/TLS configuration
HTTP/2 support out of the box
Requirement from Al Nafi JD
cert-manager for SSL
Why:
Automated Let's Encrypt certificate management
Auto-renewal before expiry
Industry standard for Kubernetes SSL
Free SSL certificates
gp3 over gp2 Storage
Why:
Same or lower cost
3000 baseline IOPS (vs gp2's 3 IOPS/GB)
Better performance for databases
125 MiB/s baseline throughput
Tutor 21.0.1 (Latest)
Why:
Latest features and security patches
Better MFE (Micro Frontend) support
Improved performance
Active community support
Prerequisites
AWS Account
Admin access or PowerUser + IAM permissions
Credit card for AWS services (~$270/month)
Service limits:
3 t3.medium EC2 instances (EKS nodes)
1 db.t3.medium RDS instance
1 t2.medium EC2 instance (MongoDB)
Domain Name
Any domain registrar (Namecheap, GoDaddy, etc.)
Will configure with Cloudflare (free account)
Example:
yourdomain.com
Local Machine
Ubuntu 22.04 (or similar Linux)
4GB RAM minimum
20GB free disk space
Stable internet connection
Skills Needed
Basic Linux command line
Basic understanding of Kubernetes concepts
AWS console navigation
Copy-paste ability (most important!)
Time
Setup: 30 minutes
Deployment: 2-3 hours
Configuration: 30 minutes
Total: 3-4 hours (with breaks)
PART 0: Environment Setup
What This Does
Creates a persistent configuration file that survives terminal restarts and contains all your deployment variables. This was the #1 issue we solved - without this, you lose all variables when terminal closes!
Step 0.1: Create Persistent Config File
Run on your Ubuntu machine:
# Create config directory
mkdir -p ~/.openedx-config
chmod 700 ~/.openedx-config
# Create the config file with all variables
cat > ~/.openedx-config/settings.sh <<'EOF'
#!/bin/bash
# AWS Configuration
export AWS_REGION="us-east-1"
export AWS_ACCOUNT_ID=""
export PROJECT_NAME="openedx-prod"
# Domain Configuration (CHANGE THESE!)
export DOMAIN="yourdomain.com"
export STUDIO_DOMAIN="studio.yourdomain.com"
export MFE_DOMAIN="apps.yourdomain.com"
export CDN_DOMAIN="cdn.yourdomain.com"
# Admin Email (CHANGE THIS!)
export ADMIN_EMAIL="your-email@example.com"
# Auto-generated Passwords (will be filled during deployment)
export MYSQL_PASSWORD=""
export MONGO_PASSWORD=""
# Infrastructure IDs (will be filled during deployment)
export VPC_ID=""
export EKS_CLUSTER_NAME="openedx-prod"
export MYSQL_HOST=""
export MONGO_HOST=""
export MONGO_IP=""
export MONGO_INSTANCE_ID=""
export REDIS_HOST=""
export OPENSEARCH_HOST=""
export S3_BUCKET_NAME=""
export CLOUDFRONT_URL=""
export CLOUDFRONT_ID=""
export WAF_ARN=""
export LB_HOSTNAME=""
EOF
# Make it secure (only you can read/write)
chmod 600 ~/.openedx-config/settings.sh
# Add auto-load to your shell
echo 'source ~/.openedx-config/settings.sh 2>/dev/null' >> ~/.bashrc
# Load it now
source ~/.openedx-config/settings.sh
echo "β
Persistent config created at ~/.openedx-config/settings.sh"
echo "β οΈ IMPORTANT: Edit this file and change DOMAIN and ADMIN_EMAIL!"
Before proceeding, edit the config file:
nano ~/.openedx-config/settings.sh
Change these lines:
export DOMAIN="yourdomain.com" # Your actual domain
export STUDIO_DOMAIN="studio.yourdomain.com"
export MFE_DOMAIN="apps.yourdomain.com"
export CDN_DOMAIN="cdn.yourdomain.com"
export ADMIN_EMAIL="your-email@example.com" # Your email
Save and exit (Ctrl+X, Y, Enter).
Why this matters: Every variable is stored here. If your terminal crashes or you logout, just run source ~/.openedx-config/settings.sh and everything is back!
Step 0.2: Install Required Tools
What This Does: Installs all the command-line tools we'll need: AWS CLI, kubectl, eksctl, Helm, and Tutor.
#!/bin/bash
set -e
echo "Installing AWS CLI..."
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip -q awscliv2.zip
sudo ./aws/install --update
rm -rf aws awscliv2.zip
echo "Installing kubectl..."
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install kubectl /usr/local/bin/
rm kubectl
echo "Installing eksctl..."
curl -sLO "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz"
tar -xzf eksctl_*.tar.gz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
rm eksctl_*.tar.gz
echo "Installing Helm..."
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
echo "Installing Tutor 21.0.1..."
sudo apt update && sudo apt install -y python3-pip python3-venv
python3 -m pip install --user --upgrade pip
python3 -m pip install --user "tutor[full]==21.0.1"
# Add Tutor to PATH
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
export PATH="$HOME/.local/bin:$PATH"
# Enable Kubernetes plugin
tutor plugins enable k8s
echo "β
All tools installed successfully!"
echo ""
echo "Verify installations:"
aws --version
kubectl version --client
eksctl version
helm version
tutor --version
Verify output shows:
AWS CLI:
aws-cli/2.x.xkubectl:
v1.29+eksctl:
0.x.xHelm:
v3.x.xTutor:
21.0.1
Step 0.3: Configure AWS Credentials
What This Does: Connects your terminal to your AWS account.
# Configure AWS CLI
aws configure
# You'll be prompted for:
# AWS Access Key ID: (paste from AWS Console β IAM β Security Credentials)
# AWS Secret Access Key: (paste from AWS Console)
# Default region: us-east-1
# Default output format: json
# Test connection
aws sts get-caller-identity
# Should show your AWS Account ID and user ARN
Save your Account ID to config:
source ~/.openedx-config/settings.sh
sed -i "s/export AWS_ACCOUNT_ID=\"\"/export AWS_ACCOUNT_ID=\"$(aws sts get-caller-identity --query Account --output text)\"/" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh
echo "AWS Account ID: $AWS_ACCOUNT_ID"
Step 0.4: Create Project Structure
What This Does: Organizes all our files in a clean structure.
mkdir -p ~/openedx-project/{k8s,scripts,docs,evidence}
cd ~/openedx-project
echo "β
Project structure created at ~/openedx-project/"
tree ~/openedx-project/
PART 1: EKS Cluster
What This Does
Creates a managed Kubernetes cluster on AWS with 3 worker nodes. This is where all OpenEdX pods will run. Uses EKS 1.34 (latest version as of Feb 2026).
Why 3 nodes?
High availability (if one node fails, others continue)
Resource distribution for LMS, CMS, and workers
Allows HPA (Horizontal Pod Autoscaling) to work properly
Step 1.1: Create EKS Cluster
This takes 15-20 minutes. AWS is creating VPC, subnets, security groups, and Kubernetes control plane.
source ~/.openedx-config/settings.sh
echo "Creating EKS 1.34 cluster (15-20 min)..."
echo "Cluster name: $EKS_CLUSTER_NAME"
echo "Region: $AWS_REGION"
eksctl create cluster \
--name $EKS_CLUSTER_NAME \
--region $AWS_REGION \
--version 1.34 \
--nodegroup-name openedx-workers \
--node-type t3.medium \
--nodes 3 \
--nodes-min 2 \
--nodes-max 5 \
--managed \
--with-oidc
echo "β
EKS cluster created!"
What each flag does:
--version 1.34: Latest Kubernetes (released late 2025)--node-type t3.medium: 2 vCPU, 4GB RAM per node (right size for OpenEdX)--nodes 3: Start with 3 nodes--nodes-min 2: Auto-scaling minimum--nodes-max 5: Auto-scaling maximum--managed: AWS manages OS updates and patching--with-oidc: Enables IAM roles for service accounts (needed for S3 access)
Step 1.2: Save VPC Information
What This Does: Gets the VPC ID created by EKS and saves it for database configuration.
source ~/.openedx-config/settings.sh
# Get VPC ID
VPC_ID=$(aws eks describe-cluster \
--name $EKS_CLUSTER_NAME \
--region $AWS_REGION \
--query "cluster.resourcesVpcConfig.vpcId" \
--output text)
# Save to config
sed -i "s|export VPC_ID=\"\"|export VPC_ID=\"$VPC_ID\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh
echo "β
VPC ID: $VPC_ID"
Step 1.3: Create OpenEdX Namespace
What This Does: Creates isolated namespace for all OpenEdX components.
kubectl create namespace openedx
echo "β
Namespace created"
kubectl get namespaces
Verification
# Check cluster is ready
kubectl get nodes
# Should show 3 nodes in "Ready" status:
# NAME STATUS ROLES AGE VERSION
# ip-xxx.ec2.internal Ready <none> 5m v1.34.x
# ip-yyy.ec2.internal Ready <none> 5m v1.34.x
# ip-zzz.ec2.internal Ready <none> 5m v1.34.x
# Check namespace
kubectl get ns openedx
# Should show: openedx Active 1m
Screenshot for evidence: Take screenshot of kubectl get nodes output.
PART 2: MySQL Database (RDS)
What This Does
Creates a managed MySQL database for OpenEdX application data (users, courses, enrollments, grades). Uses RDS (managed service) for automatic backups, patching, and high availability.
Why RDS?
Automated backups: Daily snapshots + 1-day retention
Managed updates: AWS handles security patches
Better performance: Dedicated instance, not competing with pods
Disaster recovery: Easy point-in-time restore
Critical Lessons Learned
Must create DB subnet group first (or you get "InvalidSubnet" error)
MySQL needs TWO sets of credentials:
adminuser (for migrations and admin tasks)openedxuser (for application)
Tutor requires root credentials: Set
MYSQL_ROOT_USERNAMEandMYSQL_ROOT_PASSWORD
Step 2.1: Generate MySQL Password
What This Does: Creates a strong random password for MySQL.
source ~/.openedx-config/settings.sh
# Generate 24-character password (letters and numbers only)
MYSQL_PASSWORD=$(openssl rand -base64 24 | tr -dc 'a-zA-Z0-9' | head -c 24)
# Save to config
sed -i "s|export MYSQL_PASSWORD=\"\"|export MYSQL_PASSWORD=\"$MYSQL_PASSWORD\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh
echo "β
MySQL password generated and saved"
echo "Password: $MYSQL_PASSWORD"
echo "β οΈ Save this password securely!"
Step 2.2: Configure Security Groups
What This Does: Allows EKS pods to connect to MySQL on port 3306.
source ~/.openedx-config/settings.sh
# Get security groups (recalculate - don't trust memory!)
DEFAULT_SG=$(aws ec2 describe-security-groups \
--filters "Name=vpc-id,Values=$VPC_ID" "Name=group-name,Values=default" \
--region $AWS_REGION \
--query 'SecurityGroups[0].GroupId' --output text)
EKS_SG=$(aws eks describe-cluster \
--name $EKS_CLUSTER_NAME \
--region $AWS_REGION \
--query "cluster.resourcesVpcConfig.clusterSecurityGroupId" \
--output text)
echo "Security Groups:"
echo " Default SG: $DEFAULT_SG"
echo " EKS SG: $EKS_SG"
# Allow MySQL traffic from EKS to Default SG
aws ec2 authorize-security-group-ingress \
--group-id $DEFAULT_SG \
--protocol tcp \
--port 3306 \
--source-group $EKS_SG \
--region $AWS_REGION 2>/dev/null || echo "Rule already exists"
echo "β
MySQL port 3306 opened for EKS"
Step 2.3: Create DB Subnet Group (CRITICAL!)
What This Does: Tells RDS which subnets it can use. Without this, you get "InvalidSubnet" error!
Why: EKS creates VPC without default subnets. RDS needs explicit subnet group.
source ~/.openedx-config/settings.sh
# Get private subnets (recalculate each time!)
PRIVATE_SUBNETS=$(aws ec2 describe-subnets \
--filters "Name=vpc-id,Values=$VPC_ID" \
--region $AWS_REGION \
--query 'Subnets[?MapPublicIpOnLaunch==`false`].SubnetId' \
--output text | tr '\t' ' ')
echo "Private subnets: $PRIVATE_SUBNETS"
# Error check
if [ -z "$PRIVATE_SUBNETS" ]; then
echo "β ERROR: No private subnets found!"
exit 1
fi
# Create DB subnet group
echo "Creating DB subnet group..."
aws rds create-db-subnet-group \
--db-subnet-group-name ${PROJECT_NAME}-db-subnet \
--db-subnet-group-description "OpenEdX database subnet group" \
--subnet-ids $PRIVATE_SUBNETS \
--region $AWS_REGION 2>/dev/null || echo "Subnet group already exists"
echo "β
DB subnet group created"
Step 2.4: Create MySQL RDS Instance
What This Does: Creates MySQL 8.0.45 database with gp3 storage (faster than gp2).
This takes 10-15 minutes.
source ~/.openedx-config/settings.sh
echo "Creating MySQL RDS 8.0.45 (10-15 min)..."
aws rds create-db-instance \
--db-instance-identifier ${PROJECT_NAME}-mysql \
--db-instance-class db.t3.medium \
--engine mysql \
--engine-version 8.0.45 \
--master-username admin \
--master-user-password "$MYSQL_PASSWORD" \
--allocated-storage 20 \
--storage-type gp3 \
--iops 3000 \
--db-subnet-group-name ${PROJECT_NAME}-db-subnet \
--vpc-security-group-ids $DEFAULT_SG \
--no-publicly-accessible \
--backup-retention-period 1 \
--region $AWS_REGION
echo "Waiting for MySQL to become available..."
aws rds wait db-instance-available \
--db-instance-identifier ${PROJECT_NAME}-mysql \
--region $AWS_REGION
echo "β
MySQL RDS created!"
What each flag does:
--db-instance-class db.t3.medium: 2 vCPU, 4GB RAM (right size for OpenEdX)--engine-version 8.0.45: Latest MySQL 8.0 minor version--storage-type gp3: Faster than gp2 (3000 baseline IOPS)--no-publicly-accessible: Security - only accessible from VPC--backup-retention-period 1: Keep 1 day of automated backups
Step 2.5: Get MySQL Endpoint
What This Does: Gets the connection hostname for MySQL.
source ~/.openedx-config/settings.sh
MYSQL_HOST=$(aws rds describe-db-instances \
--db-instance-identifier ${PROJECT_NAME}-mysql \
--region $AWS_REGION \
--query 'DBInstances[0].Endpoint.Address' \
--output text)
# Save to config
sed -i "s|export MYSQL_HOST=\"\"|export MYSQL_HOST=\"$MYSQL_HOST\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh
echo "β
MySQL Endpoint: $MYSQL_HOST"
Step 2.6: Create OpenEdX Database and User
What This Does:
Creates
openedxdatabase with UTF8 encodingCreates
openedxuser with FULL permissions (needed for migrations)
Why UTF8MB4: Supports emoji and international characters in course content.
source ~/.openedx-config/settings.sh
echo "Creating OpenEdX database and user..."
kubectl run mysql-setup --rm -i --image=mysql:8.0 -n openedx -- \
mysql -h $MYSQL_HOST -u admin -p"$MYSQL_PASSWORD" <<EOSQL
-- Create database with proper encoding
CREATE DATABASE IF NOT EXISTS openedx
CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;
-- Create openedx user
CREATE USER IF NOT EXISTS 'openedx'@'%'
IDENTIFIED BY '$MYSQL_PASSWORD';
-- Grant FULL permissions (migrations need this!)
GRANT ALL PRIVILEGES ON openedx.*
TO 'openedx'@'%'
WITH GRANT OPTION;
-- Apply changes
FLUSH PRIVILEGES;
-- Verify
SELECT User, Host FROM mysql.user WHERE User='openedx';
SHOW DATABASES;
EOSQL
echo "β
Database and user created with full CRUD permissions"
What permissions are granted:
SELECT, INSERT, UPDATE, DELETE (basic CRUD)
CREATE, DROP, ALTER, INDEX (schema changes for migrations)
CREATE VIEW, SHOW VIEW (for analytics)
CREATE ROUTINE, ALTER ROUTINE (for stored procedures)
LOCK TABLES, CREATE TEMPORARY TABLES (for bulk operations)
WITH GRANT OPTION (allows Tutor to manage permissions)
Verification
source ~/.openedx-config/settings.sh
# Test connection
kubectl run mysql-test --rm -i --image=mysql:8.0 -n openedx -- \
mysql -h $MYSQL_HOST -u openedx -p"$MYSQL_PASSWORD" -e "SHOW DATABASES;"
# Should show: openedx database
Screenshot for evidence:
RDS Console showing running instance
Output of
SHOW DATABASES;
PART 3: MongoDB (EC2)
What This Does
Creates a single MongoDB 8.0 instance on EC2 for storing course content, modulestore data, and user-generated content.
Why EC2 instead of Atlas?
Simpler setup: No external service signup
Full control: Configure as needed
Cost-effective: t2.medium is ~$35/month
Easy backup: EBS snapshots
No complexity: Single instance (no replica set needed for assessment)
Architecture Decision
Single Instance vs Replica Set:
Production would use 3-node replica set for high availability
For this assessment, single instance is acceptable
Can be upgraded to replica set later without data loss
Step 3.1: Generate MongoDB Password
source ~/.openedx-config/settings.sh
# Generate 24-character password
MONGO_PASSWORD=$(openssl rand -base64 24 | tr -dc 'a-zA-Z0-9' | head -c 24)
# Save to config
sed -i "s|export MONGO_PASSWORD=\"\"|export MONGO_PASSWORD=\"$MONGO_PASSWORD\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh
echo "β
MongoDB password generated"
echo "Password: $MONGO_PASSWORD"
Step 3.2: Get Ubuntu AMI
What This Does: Finds the latest Ubuntu 22.04 image in your region.
source ~/.openedx-config/settings.sh
AMI_ID=$(aws ec2 describe-images \
--owners amazon \
--filters \
"Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*" \
"Name=state,Values=available" \
--region $AWS_REGION \
--query 'Images | sort_by(@, &CreationDate) | [-1].ImageId' \
--output text)
echo "Ubuntu AMI: $AMI_ID"
Step 3.3: Create MongoDB Security Group
What This Does: Creates firewall rules for MongoDB (port 27017).
source ~/.openedx-config/settings.sh
# Recalculate security groups
DEFAULT_SG=$(aws ec2 describe-security-groups \
--filters "Name=vpc-id,Values=$VPC_ID" "Name=group-name,Values=default" \
--region $AWS_REGION \
--query 'SecurityGroups[0].GroupId' --output text)
EKS_SG=$(aws eks describe-cluster \
--name $EKS_CLUSTER_NAME \
--region $AWS_REGION \
--query "cluster.resourcesVpcConfig.clusterSecurityGroupId" \
--output text)
# Create MongoDB security group
aws ec2 create-security-group \
--group-name ${PROJECT_NAME}-mongo-sg \
--description "MongoDB for OpenEdX" \
--vpc-id $VPC_ID \
--region $AWS_REGION 2>/dev/null || echo "Security group exists"
MONGO_SG=$(aws ec2 describe-security-groups \
--filters \
"Name=group-name,Values=${PROJECT_NAME}-mongo-sg" \
"Name=vpc-id,Values=$VPC_ID" \
--region $AWS_REGION \
--query 'SecurityGroups[0].GroupId' \
--output text)
echo "MongoDB SG: $MONGO_SG"
# Allow MongoDB port 27017 from EKS
aws ec2 authorize-security-group-ingress \
--group-id $MONGO_SG \
--protocol tcp \
--port 27017 \
--source-group $EKS_SG \
--region $AWS_REGION 2>/dev/null || echo "Rule already exists"
echo "β
MongoDB security group configured"
Step 3.4: Create User Data Script
What This Does: Creates a script that automatically installs and configures MongoDB when EC2 starts.
This is CRITICAL - the script runs on first boot and sets up everything!
source ~/.openedx-config/settings.sh
# Recalculate private subnets
PRIVATE_SUBNETS=$(aws ec2 describe-subnets \
--filters "Name=vpc-id,Values=$VPC_ID" \
--region $AWS_REGION \
--query 'Subnets[?MapPublicIpOnLaunch==`false`].SubnetId' \
--output text | tr '\t' ' ')
# Error check
if [ -z "$PRIVATE_SUBNETS" ]; then
echo "β ERROR: No private subnets found!"
exit 1
fi
# Get first subnet
MONGO_SUBNET=$(echo $PRIVATE_SUBNETS | awk '{print $1}')
echo "Using subnet: $MONGO_SUBNET"
# Create user data script (runs on first boot)
USER_DATA=$(cat <<'USERDATA'
#!/bin/bash
set -e
exec > >(tee /var/log/user-data.log)
exec 2>&1
echo "=== Starting MongoDB 8.0 Installation ==="
date
# Install MongoDB 8.0 official repository
echo "Installing MongoDB repository..."
apt-get update
apt-get install -y gnupg curl
curl -fsSL https://www.mongodb.org/static/pgp/server-8.0.asc | \
gpg --dearmor -o /usr/share/keyrings/mongodb-server-8.0.gpg
echo "deb [ arch=amd64,arm64 signed-by=/usr/share/keyrings/mongodb-server-8.0.gpg ] https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/8.0 multiverse" | \
tee /etc/apt/sources.list.d/mongodb-org-8.0.list
# Install MongoDB
echo "Installing MongoDB 8.0..."
apt-get update
apt-get install -y mongodb-org
# Configure MongoDB to listen on all interfaces
echo "Configuring MongoDB..."
cat > /etc/mongod.conf <<'MONGOCONF'
storage:
dbPath: /var/lib/mongodb
journal:
enabled: true
systemLog:
destination: file
path: /var/log/mongodb/mongod.log
logAppend: true
net:
port: 27017
bindIp: 0.0.0.0
processManagement:
timeZoneInfo: /usr/share/zoneinfo
MONGOCONF
# Start MongoDB
echo "Starting MongoDB..."
systemctl start mongod
systemctl enable mongod
# Wait for MongoDB to be ready
echo "Waiting for MongoDB to start..."
sleep 10
# Create admin user
echo "Creating admin user..."
mongosh <<'MONGOJS'
use admin
db.createUser({
user: "admin",
pwd: "REPLACE_PASSWORD",
roles: [
{ role: "root", db: "admin" },
{ role: "userAdminAnyDatabase", db: "admin" },
{ role: "dbAdminAnyDatabase", db: "admin" },
{ role: "readWriteAnyDatabase", db: "admin" }
]
})
MONGOJS
# Enable authentication
echo "Enabling authentication..."
cat >> /etc/mongod.conf <<'AUTHCONF'
security:
authorization: enabled
AUTHCONF
# Restart MongoDB with authentication
echo "Restarting MongoDB with authentication..."
systemctl restart mongod
# Wait for restart
sleep 5
# Verify
echo "Verifying MongoDB is running..."
systemctl status mongod --no-pager
echo "=== MongoDB Installation Complete ==="
date
USERDATA
)
# Replace password in user data
USER_DATA="${USER_DATA//REPLACE_PASSWORD/$MONGO_PASSWORD}"
echo "β
User data script created"
What the script does:
Installs MongoDB 8.0 from official repository
Configures MongoDB to listen on all interfaces (0.0.0.0)
Starts MongoDB and enables auto-start on boot
Creates admin user with full permissions
Enables authentication for security
Restarts MongoDB with authentication enabled
Step 3.5: Launch MongoDB EC2 Instance
What This Does: Launches t2.medium EC2 instance with MongoDB auto-installed.
This takes 3-4 minutes to launch + 2-3 minutes for MongoDB installation.
source ~/.openedx-config/settings.sh
echo "Launching MongoDB EC2 instance..."
MONGO_INSTANCE_ID=$(aws ec2 run-instances \
--image-id $AMI_ID \
--instance-type t2.medium \
--subnet-id $MONGO_SUBNET \
--security-group-ids $MONGO_SG \
--user-data "$USER_DATA" \
--block-device-mappings '[
{
"DeviceName":"/dev/sda1",
"Ebs":{
"VolumeSize":30,
"VolumeType":"gp3",
"Iops":3000,
"Encrypted":true,
"DeleteOnTermination":false
}
}
]' \
--tag-specifications 'ResourceType=instance,Tags=[
{Key=Name,Value=openedx-mongodb},
{Key=Project,Value=openedx},
{Key=Type,Value=database}
]' \
--region $AWS_REGION \
--query 'Instances[0].InstanceId' \
--output text)
echo "Instance ID: $MONGO_INSTANCE_ID"
# Save to config
sed -i "s|export MONGO_INSTANCE_ID=\"\"|export MONGO_INSTANCE_ID=\"$MONGO_INSTANCE_ID\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh
# Wait for instance to be running
echo "Waiting for instance to start (1-2 min)..."
aws ec2 wait instance-running \
--instance-ids $MONGO_INSTANCE_ID \
--region $AWS_REGION
echo "β
Instance is running"
What each setting does:
--instance-type t2.medium: 2 vCPU, 4GB RAM (sufficient for OpenEdX)--block-device-mappings: 30GB gp3 storage with 3000 IOPSEncrypted:true: Encryption at rest (security best practice)DeleteOnTermination:false: Keep volume if instance terminates (data safety)
Step 3.6: Get MongoDB IP and Build Connection String
What This Does: Gets private IP and creates MongoDB connection string for Tutor.
source ~/.openedx-config/settings.sh
# Wait for user-data script to complete MongoDB installation
echo "Waiting for MongoDB installation to complete (2-3 min)..."
sleep 180
# Get private IP
MONGO_IP=$(aws ec2 describe-instances \
--instance-ids $MONGO_INSTANCE_ID \
--region $AWS_REGION \
--query 'Reservations[0].Instances[0].PrivateIpAddress' \
--output text)
echo "MongoDB private IP: $MONGO_IP"
# Build MongoDB connection string
# Format: mongodb://username:password@host:port/database?authSource=admin
MONGO_HOST="mongodb://admin:${MONGO_PASSWORD}@${MONGO_IP}:27017/openedx?authSource=admin"
# Save to config
sed -i "s|export MONGO_IP=\"\"|export MONGO_IP=\"$MONGO_IP\"|" ~/.openedx-config/settings.sh
sed -i "s|export MONGO_HOST=\"\"|export MONGO_HOST=\"$MONGO_HOST\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh
echo "β
MongoDB connection string created"
echo "IP: $MONGO_IP"
echo "Connection: mongodb://admin:***@$MONGO_IP:27017/openedx"
Connection String Explained:
mongodb:// Protocol
admin:password Username and password
@192.168.x.x Private IP (only accessible from VPC)
:27017 MongoDB port
/openedx Database name
?authSource=admin Authentication database
Step 3.7: Verify MongoDB Installation
What This Does: Tests that MongoDB is installed, running, and accepting connections.
source ~/.openedx-config/settings.sh
echo "Testing MongoDB connection from Kubernetes..."
kubectl run mongo-test --rm -i --image=mongo:8.0 -n openedx -- \
mongosh "$MONGO_HOST" --eval "
db.adminCommand({ping: 1});
db.version();
db.getMongo();
"
echo "β
MongoDB connection verified!"
Expected output:
{ ok: 1 }
8.0.x
mongodb://admin:***@192.168.x.x:27017/openedx?authSource=admin
Troubleshooting MongoDB
If connection fails:
# Check instance is running
aws ec2 describe-instances \
--instance-ids $MONGO_INSTANCE_ID \
--query 'Reservations[0].Instances[0].State.Name'
# Check user-data script logs (need SSM or SSH)
aws ec2 get-console-output \
--instance-id $MONGO_INSTANCE_ID \
--output text
# Check security group allows port 27017
aws ec2 describe-security-groups \
--group-ids $MONGO_SG \
--query 'SecurityGroups[0].IpPermissions[?ToPort==`27017`]'
Screenshot for Evidence
EC2 Console showing running MongoDB instance
Output of
mongo-testpod showing successful connectionMongoDB version output
PART 4: Redis & OpenSearch
What This Does
Creates caching (Redis) and search (OpenSearch) services. Both are AWS managed services for reliability.
Why These?
Redis: Session caching, API response caching, background job queue
OpenSearch: Full-text course search, analytics, reporting
Step 4.1: Create Redis (ElastiCache)
What This Does: Creates a managed Redis 7.1 instance for caching.
source ~/.openedx-config/settings.sh
echo "Creating Redis 7.1 (5-10 min)..."
# Recalculate security groups
DEFAULT_SG=$(aws ec2 describe-security-groups \
--filters "Name=vpc-id,Values=$VPC_ID" "Name=group-name,Values=default" \
--region $AWS_REGION \
--query 'SecurityGroups[0].GroupId' --output text)
EKS_SG=$(aws eks describe-cluster \
--name $EKS_CLUSTER_NAME \
--region $AWS_REGION \
--query "cluster.resourcesVpcConfig.clusterSecurityGroupId" \
--output text)
# Allow Redis port 6379
aws ec2 authorize-security-group-ingress \
--group-id $DEFAULT_SG \
--protocol tcp \
--port 6379 \
--source-group $EKS_SG \
--region $AWS_REGION 2>/dev/null || echo "Redis rule exists"
# Recalculate private subnets
PRIVATE_SUBNETS=$(aws ec2 describe-subnets \
--filters "Name=vpc-id,Values=$VPC_ID" \
--region $AWS_REGION \
--query 'Subnets[?MapPublicIpOnLaunch==`false`].SubnetId' \
--output text | tr '\t' ' ')
# Create cache subnet group
aws elasticache create-cache-subnet-group \
--cache-subnet-group-name ${PROJECT_NAME}-redis \
--cache-subnet-group-description "Redis subnet for OpenEdX" \
--subnet-ids $PRIVATE_SUBNETS \
--region $AWS_REGION 2>/dev/null || echo "Subnet group exists"
# Create Redis cluster
aws elasticache create-cache-cluster \
--cache-cluster-id ${PROJECT_NAME}-redis \
--cache-node-type cache.t3.micro \
--engine redis \
--engine-version 7.1 \
--num-cache-nodes 1 \
--cache-subnet-group-name ${PROJECT_NAME}-redis \
--security-group-ids $DEFAULT_SG \
--region $AWS_REGION
# Wait for Redis to be available
echo "Waiting for Redis (5-10 min)..."
aws elasticache wait cache-cluster-available \
--cache-cluster-id ${PROJECT_NAME}-redis \
--region $AWS_REGION
echo "β
Redis cluster created"
Step 4.2: Get Redis Endpoint
source ~/.openedx-config/settings.sh
REDIS_HOST=$(aws elasticache describe-cache-clusters \
--cache-cluster-id ${PROJECT_NAME}-redis \
--show-cache-node-info \
--region $AWS_REGION \
--query 'CacheClusters[0].CacheNodes[0].Endpoint.Address' \
--output text)
# Save to config
sed -i "s|export REDIS_HOST=\"\"|export REDIS_HOST=\"$REDIS_HOST\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh
echo "β
Redis endpoint: $REDIS_HOST"
Step 4.3: Create OpenSearch Domain
What This Does: Creates managed OpenSearch 2.11 for course search and analytics.
This takes 15-20 minutes and runs in background.
source ~/.openedx-config/settings.sh
echo "Creating OpenSearch 2.11 (15-20 min, background)..."
# Recalculate security groups and subnets
DEFAULT_SG=$(aws ec2 describe-security-groups \
--filters "Name=vpc-id,Values=$VPC_ID" "Name=group-name,Values=default" \
--region $AWS_REGION \
--query 'SecurityGroups[0].GroupId' --output text)
EKS_SG=$(aws eks describe-cluster \
--name $EKS_CLUSTER_NAME \
--region $AWS_REGION \
--query "cluster.resourcesVpcConfig.clusterSecurityGroupId" \
--output text)
# Allow OpenSearch port 443
aws ec2 authorize-security-group-ingress \
--group-id $DEFAULT_SG \
--protocol tcp \
--port 443 \
--source-group $EKS_SG \
--region $AWS_REGION 2>/dev/null || echo "OpenSearch rule exists"
# Get private subnets
PRIVATE_SUBNETS=$(aws ec2 describe-subnets \
--filters "Name=vpc-id,Values=$VPC_ID" \
--region $AWS_REGION \
--query 'Subnets[?MapPublicIpOnLaunch==`false`].SubnetId' \
--output text | tr '\t' ' ')
# Get first subnet for OpenSearch (single-node)
OPENSEARCH_SUBNET=$(echo $PRIVATE_SUBNETS | awk '{print $1}')
# Create OpenSearch domain
aws opensearch create-domain \
--domain-name ${PROJECT_NAME}-search \
--engine-version OpenSearch_2.11 \
--cluster-config \
InstanceType=t3.small.search,InstanceCount=1 \
--ebs-options \
EBSEnabled=true,VolumeType=gp3,VolumeSize=10,Iops=3000 \
--vpc-options \
"SubnetIds=$OPENSEARCH_SUBNET,SecurityGroupIds=$DEFAULT_SG" \
--access-policies '{
"Version":"2012-10-17",
"Statement":[{
"Effect":"Allow",
"Principal":{"AWS":"*"},
"Action":"es:*",
"Resource":"*"
}]
}' \
--region $AWS_REGION
echo "β
OpenSearch domain creation started (15-20 min)"
echo "Continuing with other tasks while it creates..."
Step 4.4: Create OpenSearch Check Script
What This Does: Creates a script to check when OpenSearch is ready.
cat > ~/.openedx-config/check-opensearch.sh <<'CHECK'
#!/bin/bash
source ~/.openedx-config/settings.sh
STATUS=$(aws opensearch describe-domain \
--domain-name ${PROJECT_NAME}-search \
--region $AWS_REGION \
--query 'DomainStatus.Processing' \
--output text)
if [ "$STATUS" = "False" ]; then
OPENSEARCH_HOST=$(aws opensearch describe-domain \
--domain-name ${PROJECT_NAME}-search \
--region $AWS_REGION \
--query 'DomainStatus.Endpoints.vpc' \
--output text)
sed -i "s|export OPENSEARCH_HOST=\"\"|export OPENSEARCH_HOST=\"$OPENSEARCH_HOST\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh
echo "β
OpenSearch ready: https://$OPENSEARCH_HOST"
exit 0
else
echo "β³ OpenSearch still creating... ($STATUS)"
exit 1
fi
CHECK
chmod +x ~/.openedx-config/check-opensearch.sh
echo "β
OpenSearch check script created"
echo "Run: ~/.openedx-config/check-opensearch.sh to check status"
Use this script later before deploying OpenEdX!
Verification
# Check Redis
kubectl run redis-test --rm -i --image=redis:7.1 -n openedx -- \
redis-cli -h $REDIS_HOST ping
# Should return: PONG
# Check OpenSearch status
~/.openedx-config/check-opensearch.sh
Screenshot for Evidence
ElastiCache Console showing Redis cluster
OpenSearch Console showing domain
Output of
redis-cli pingtest
PART 5: Storage (S3 + EBS)
What This Does
Sets up storage for static files (CSS, JS, images) in S3 and persistent volumes for uploads in EBS.
Why S3?
Cost-effective: Pay only for what you use
Scalable: No size limits
Fast: Can be served via CloudFront CDN
Durable: 99.999999999% durability (11 nines)
Step 5.1: Create S3 Bucket
What This Does: Creates encrypted S3 bucket for static files.
source ~/.openedx-config/settings.sh
# Create unique bucket name with timestamp
S3_BUCKET_NAME="${PROJECT_NAME}-static-$(date +%s)"
echo "Creating S3 bucket: $S3_BUCKET_NAME"
# Create bucket
aws s3api create-bucket \
--bucket $S3_BUCKET_NAME \
--region $AWS_REGION
# Enable versioning (keep file history)
aws s3api put-bucket-versioning \
--bucket $S3_BUCKET_NAME \
--versioning-configuration Status=Enabled
# Block all public access (security)
aws s3api put-public-access-block \
--bucket $S3_BUCKET_NAME \
--public-access-block-configuration \
"BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
# Enable encryption at rest
aws s3api put-bucket-encryption \
--bucket $S3_BUCKET_NAME \
--server-side-encryption-configuration '{
"Rules":[{
"ApplyServerSideEncryptionByDefault":{
"SSEAlgorithm":"AES256"
}
}]
}'
# Save to config
sed -i "s|export S3_BUCKET_NAME=\"\"|export S3_BUCKET_NAME=\"$S3_BUCKET_NAME\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh
echo "β
S3 bucket created: $S3_BUCKET_NAME"
Step 5.2: Create IAM Policy for S3 Access
What This Does: Creates permissions for OpenEdX pods to read/write S3.
source ~/.openedx-config/settings.sh
cat > /tmp/s3-policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": ["s3:*"],
"Resource": [
"arn:aws:s3:::$S3_BUCKET_NAME",
"arn:aws:s3:::$S3_BUCKET_NAME/*"
]
}]
}
EOF
# Create IAM policy
aws iam create-policy \
--policy-name ${PROJECT_NAME}-s3-policy \
--policy-document file:///tmp/s3-policy.json \
2>/dev/null || echo "Policy already exists"
echo "β
IAM policy created"
Step 5.3: Create IAM Role for Service Account
What This Does: Links IAM permissions to Kubernetes service account using IRSA (IAM Roles for Service Accounts).
Why IRSA?
No AWS credentials in pods (security)
Automatic credential rotation
Fine-grained permissions per pod
source ~/.openedx-config/settings.sh
eksctl create iamserviceaccount \
--name openedx-s3-sa \
--namespace openedx \
--cluster $EKS_CLUSTER_NAME \
--region $AWS_REGION \
--role-name ${PROJECT_NAME}-s3-role \
--attach-policy-arn arn:aws:iam::${AWS_ACCOUNT_ID}:policy/${PROJECT_NAME}-s3-policy \
--approve \
--override-existing-serviceaccounts
echo "β
Service account created with S3 access"
Step 5.4: Configure gp3 Storage Class
What This Does: Sets gp3 as default storage class for persistent volumes.
Why gp3 over gp2?
Same or lower cost
3000 baseline IOPS (vs gp2's 3 IOPS/GB)
125 MiB/s baseline throughput
Better performance for databases and file uploads
cat > ~/openedx-project/k8s/storageclass-gp3.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: gp3
annotations:
storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
type: gp3
iops: "3000"
throughput: "125"
encrypted: "true"
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
EOF
# Install EBS CSI driver
eksctl create addon \
--name aws-ebs-csi-driver \
--cluster $EKS_CLUSTER_NAME \
--region $AWS_REGION \
--force
# Wait for driver to be ready
echo "Waiting for EBS CSI driver (30 sec)..."
sleep 30
# Remove gp2 as default
kubectl annotate storageclass gp2 \
storageclass.kubernetes.io/is-default-class=false \
--overwrite 2>/dev/null || true
# Apply gp3 storage class
kubectl apply -f ~/openedx-project/k8s/storageclass-gp3.yaml
echo "β
gp3 storage class configured as default"
Verification
# Check storage classes
kubectl get storageclass
# Should show:
# NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE
# gp2 kubernetes.io/aws-ebs Delete WaitForFirstConsumer
# gp3 (default) ebs.csi.aws.com Retain WaitForFirstConsumer
# Check S3 bucket
aws s3 ls | grep $S3_BUCKET_NAME
# Check service account
kubectl get serviceaccount openedx-s3-sa -n openedx
Screenshot for Evidence
S3 Console showing bucket with encryption enabled
Output of
kubectl get storageclassshowing gp3 as defaultIAM Console showing policy and role
PART 6: Deploy OpenEdX
What This Does
Deploys OpenEdX using Tutor 21.0.1 with all external databases configured.
Critical Lessons Learned
S3 plugin is BROKEN in Tutor 21.0.1 - causes
STORAGESerrorMeilisearch must be disabled - causes blank page issues
Caddy must be disabled properly - use
ENABLE_WEB_PROXY=falseMySQL needs root credentials - set
MYSQL_ROOT_USERNAMEandMYSQL_ROOT_PASSWORDMigrations don't run automatically - must run manually from worker pod
Step 6.1: Check OpenSearch is Ready
CRITICAL: OpenSearch must be ready before deploying!
source ~/.openedx-config/settings.sh
echo "Checking if OpenSearch is ready..."
~/.openedx-config/check-opensearch.sh
# If not ready, wait and check again
while ! ~/.openedx-config/check-opensearch.sh; do
echo "Waiting 60 seconds..."
sleep 60
done
source ~/.openedx-config/settings.sh
echo "β
OpenSearch ready: $OPENSEARCH_HOST"
Step 6.2: Configure Tutor
What This Does: Configures Tutor with all external services and disables problematic plugins.
source ~/.openedx-config/settings.sh
cd ~/openedx-project
echo "Configuring Tutor 21.0.1..."
# Initialize Tutor configuration
tutor config save
# CRITICAL: Disable problematic features
echo "Disabling Caddy (replaced by Nginx)..."
tutor config save \
--set ENABLE_WEB_PROXY=false \
--set CADDY_HTTP_PORT=81
# Disable internal services (using external)
echo "Configuring external services..."
tutor config save \
--set RUN_MYSQL=false \
--set RUN_MONGODB=false \
--set RUN_REDIS=false \
--set RUN_ELASTICSEARCH=false \
--set RUN_MEILISEARCH=false \
--set RUN_SMTP=false \
--set ENABLE_HTTPS=true \
--set K8S_NAMESPACE=openedx
# MySQL configuration (BOTH app and root credentials!)
echo "Configuring MySQL..."
tutor config save \
--set MYSQL_HOST=$MYSQL_HOST \
--set MYSQL_PORT=3306 \
--set MYSQL_DATABASE=openedx \
--set MYSQL_USERNAME=openedx \
--set MYSQL_PASSWORD=$MYSQL_PASSWORD \
--set MYSQL_ROOT_USERNAME=admin \
--set MYSQL_ROOT_PASSWORD=$MYSQL_PASSWORD
# MongoDB configuration
echo "Configuring MongoDB..."
tutor config save \
--set MONGODB_HOST=$MONGO_HOST
# Redis configuration
echo "Configuring Redis..."
tutor config save \
--set REDIS_HOST=$REDIS_HOST \
--set REDIS_PORT=6379
# OpenSearch configuration (use elasticsearch settings)
echo "Configuring OpenSearch..."
tutor config save \
--set SEARCH_ENGINE=elasticsearch \
--set ELASTICSEARCH_HOST=$OPENSEARCH_HOST \
--set ELASTICSEARCH_PORT=443 \
--set ELASTICSEARCH_SCHEME=https
# Domain configuration
echo "Configuring domains..."
tutor config save \
--set LMS_HOST=$DOMAIN \
--set CMS_HOST=$STUDIO_DOMAIN \
--set MFE_HOST=$MFE_DOMAIN
# Session cookie configuration (None = use domain from request)
tutor config save \
--set OPENEDX_COMMON_SESSION_COOKIE_DOMAIN=None \
--set OPENEDX_COMMON_CSRF_COOKIE_DOMAIN=None
echo "β
Tutor configured"
What each setting does:
ENABLE_WEB_PROXY=false: Disables Caddy (we use Nginx)RUN_*=false: Disables internal services (using external)RUN_MEILISEARCH=false: Critical! Prevents blank page issuesSEARCH_ENGINE=elasticsearch: Use OpenSearch (compatible with Elasticsearch API)MYSQL_ROOT_*: Required for Tutor's init jobsSESSION_COOKIE_DOMAIN=None: Allows cookies to work across subdomains
Step 6.3: Deploy OpenEdX to Kubernetes
What This Does: Creates all Kubernetes resources (deployments, services, configmaps).
tutor k8s start
echo "Waiting for pods to start (2 min)..."
sleep 120
# Wait for LMS to be ready
kubectl wait --for=condition=ready \
pod -l app.kubernetes.io/name=lms \
-n openedx \
--timeout=600s
echo "β
OpenEdX deployed to Kubernetes"
Step 6.4: Verify Caddy is Not Running
What This Does: Ensures Caddy is completely removed (we use Nginx).
# Check if Caddy deployment exists
if kubectl get deployment caddy -n openedx 2>&1 | grep -q "NotFound"; then
echo "β
Caddy correctly disabled"
else
echo "β οΈ Caddy still exists, removing..."
kubectl delete deployment caddy service caddy -n openedx 2>/dev/null || true
fi
# Remove any Caddy configmaps
kubectl delete configmap -l app.kubernetes.io/name=caddy -n openedx 2>/dev/null || true
echo "β
Caddy removed"
Step 6.5: Run Database Migrations Manually
What This Does: Creates all database tables. Tutor's k8s init doesn't work properly, so we run migrations from worker pod.
Why from worker pod?
Worker pods are stable (not restarting)
Same code as LMS/CMS
Same database connections
Django locks prevent concurrent migrations
source ~/.openedx-config/settings.sh
echo "Running LMS migrations (creates ~300 database tables)..."
echo "This takes 5-10 minutes..."
# Get LMS worker pod name
LMS_WORKER=$(kubectl get pod -l app.kubernetes.io/name=lms-worker \
-n openedx \
-o jsonpath='{.items[0].metadata.name}')
echo "Using worker pod: $LMS_WORKER"
# Run LMS migrations
kubectl exec -it $LMS_WORKER -n openedx -- \
./manage.py lms migrate --noinput
echo "β
LMS migrations complete"
echo "Running CMS migrations..."
# Get CMS worker pod name
CMS_WORKER=$(kubectl get pod -l app.kubernetes.io/name=cms-worker \
-n openedx \
-o jsonpath='{.items[0].metadata.name}')
echo "Using worker pod: $CMS_WORKER"
# Run CMS migrations
kubectl exec -it $CMS_WORKER -n openedx -- \
./manage.py cms migrate --noinput
echo "β
CMS migrations complete"
echo "β
All database tables created"
What migrations do:
Create ~300 tables in MySQL (users, courses, enrollments, grades, etc.)
Create CMS-specific tables (course authoring, content library)
Set up initial data (waffle switches, site configuration)
Step 6.6: Restart LMS and CMS Pods
What This Does: Restarts application pods so they can connect to newly-migrated database.
echo "Restarting LMS and CMS pods..."
kubectl rollout restart deployment lms cms -n openedx
# Wait for new pods to be ready
echo "Waiting for pods to restart (2 min)..."
sleep 120
kubectl wait --for=condition=ready \
pod -l app.kubernetes.io/name=lms \
-n openedx \
--timeout=600s
kubectl wait --for=condition=ready \
pod -l app.kubernetes.io/name=cms \
-n openedx \
--timeout=600s
echo "β
Pods restarted and ready"
Step 6.7: Create Admin User
What This Does: Creates superuser account for logging into OpenEdX.
source ~/.openedx-config/settings.sh
# Get LMS pod
LMS_POD=$(kubectl get pod -l app.kubernetes.io/name=lms \
-n openedx \
-o jsonpath='{.items[0].metadata.name}')
echo "Creating admin user..."
# Create user with staff and superuser permissions
kubectl exec -it $LMS_POD -n openedx -- \
./manage.py lms manage_user \
admin \
$ADMIN_EMAIL \
--staff \
--superuser
echo "Setting admin password..."
# Set password (will prompt you to enter password twice)
kubectl exec -it $LMS_POD -n openedx -- \
./manage.py lms changepassword admin
echo "β
Admin user created: admin / [your-password]"
echo "β οΈ SAVE THIS PASSWORD - you'll need it to login!"
Verification
# Check all pods are running
kubectl get pods -n openedx
# Should show:
# NAME READY STATUS RESTARTS AGE
# cms-xxx 1/1 Running 0 5m
# cms-worker-xxx 1/1 Running 0 5m
# lms-xxx 1/1 Running 0 5m
# lms-worker-xxx 1/1 Running 0 5m
# mfe-xxx 1/1 Running 0 5m
# Check services
kubectl get svc -n openedx
# Should show LMS, CMS, MFE services on port 8000/8002
# Test LMS API internally
kubectl run test --rm -i --image=curlimages/curl -n openedx -- \
curl -I http://lms:8000/api/user/v1/me
# Should return: HTTP/1.1 401 Unauthorized (correct - needs auth)
Screenshot for Evidence
Output of
kubectl get pods -n openedxshowing all RunningOutput of LMS migrations showing "OK" for each migration
Admin user creation confirmation
PART 7: Nginx Ingress
What This Does
Replaces Caddy with Nginx Ingress Controller for HTTP/2 support and industry-standard reverse proxy.
Why Nginx?
Industry standard: Well-documented, widely used
HTTP/2 support: Faster page loads
Better performance: Handles high traffic efficiently
Al Nafi requirement: Specifically requested in JD
Step 7.1: Install Nginx Ingress Controller
What This Does: Deploys Nginx Ingress Controller with AWS Network Load Balancer.
source ~/.openedx-config/settings.sh
echo "Installing Nginx Ingress Controller 4.14.3..."
# Add Helm repository
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
# Install Nginx Ingress
helm install nginx-ingress ingress-nginx/ingress-nginx \
--version 4.14.3 \
--namespace ingress-nginx \
--create-namespace \
--set controller.service.type=LoadBalancer \
--set controller.service.annotations."service\.beta\.kubernetes\.io/aws-load-balancer-type"="nlb" \
--set controller.config.use-http2="true" \
--set controller.config.enable-http3="true" \
--set controller.config.ssl-protocols="TLSv1.2 TLSv1.3" \
--set controller.config.proxy-body-size="100m"
echo "Waiting for Load Balancer (2 min)..."
sleep 120
echo "β
Nginx Ingress installed"
What each setting does:
--version 4.14.3: Latest version supporting Kubernetes 1.34service.type=LoadBalancer: Creates AWS NLBaws-load-balancer-type=nlb: Network Load Balancer (Layer 4)use-http2=true: Enable HTTP/2 protocolenable-http3=true: Enable HTTP/3 (QUIC) supportssl-protocols: TLS 1.2 and 1.3 only (security)proxy-body-size=100m: Allow large file uploads
Step 7.2: Get Load Balancer Hostname
What This Does: Gets AWS NLB DNS name for configuring Cloudflare.
source ~/.openedx-config/settings.sh
LB_HOSTNAME=$(kubectl get svc nginx-ingress-ingress-nginx-controller \
-n ingress-nginx \
-o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
# Save to config
sed -i "s|export LB_HOSTNAME=\"\"|export LB_HOSTNAME=\"$LB_HOSTNAME\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh
echo "β
Load Balancer: $LB_HOSTNAME"
echo "This will be used in Cloudflare DNS"
Step 7.3: Create Ingress Resource
What This Does: Configures routing rules for LMS, CMS, and MFE based on hostname.
source ~/.openedx-config/settings.sh
cat > ~/openedx-project/k8s/ingress.yaml <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: openedx-ingress
namespace: openedx
annotations:
nginx.ingress.kubernetes.io/proxy-body-size: "100m"
nginx.ingress.kubernetes.io/proxy-connect-timeout: "600"
nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
spec:
ingressClassName: nginx
rules:
- host: $DOMAIN
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: lms
port:
number: 8000
- host: $STUDIO_DOMAIN
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: cms
port:
number: 8000
- host: $MFE_DOMAIN
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: mfe
port:
number: 8002
EOF
kubectl apply -f ~/openedx-project/k8s/ingress.yaml
echo "β
Ingress rules configured"
Routing explained:
Request to yourdomain.com
β
Nginx reads Host header: yourdomain.com
β
Matches rule #1
β
Routes to LMS service (port 8000)
Request to studio.yourdomain.com
β
Nginx reads Host header: studio.yourdomain.com
β
Matches rule #2
β
Routes to CMS service (port 8000)
Request to apps.yourdomain.com
β
Nginx reads Host header: apps.yourdomain.com
β
Matches rule #3
β
Routes to MFE service (port 8002)
Verification
# Check Nginx pods
kubectl get pods -n ingress-nginx
# Should show:
# nginx-ingress-ingress-nginx-controller-xxx 1/1 Running
# Check ingress resource
kubectl get ingress -n openedx
# Should show:
# NAME CLASS HOSTS ADDRESS
# openedx-ingress nginx yourdomain.com,studio...,apps... xxx.elb.amazonaws.com
# Test Nginx config
kubectl exec -it \
$(kubectl get pods -n ingress-nginx -l app.kubernetes.io/component=controller -o jsonpath='{.items[0].metadata.name}') \
-n ingress-nginx -- \
nginx -t
# Should return: configuration file /etc/nginx/nginx.conf test is successful
Screenshot for Evidence
Output of
kubectl get ingress -n openedxAWS EC2 Load Balancers console showing NLB
Nginx controller logs showing HTTP/2 enabled
PART 8: SSL/TLS (cert-manager)
What This Does
Automates SSL certificate management using cert-manager and Let's Encrypt.
Why cert-manager?
Free SSL certificates from Let's Encrypt
Automatic renewal (90-day certs renewed at 60 days)
Industry standard for Kubernetes SSL
Zero maintenance after setup
Step 8.1: Install cert-manager
What This Does: Installs cert-manager CRDs and controller.
echo "Installing cert-manager 1.14.4..."
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.4/cert-manager.yaml
echo "Waiting for cert-manager to be ready (1 min)..."
sleep 60
# Verify cert-manager is running
kubectl get pods -n cert-manager
# Should show 3 pods:
# cert-manager-xxx
# cert-manager-cainjector-xxx
# cert-manager-webhook-xxx
echo "β
cert-manager installed"
Step 8.2: Create Let's Encrypt Issuer
What This Does: Configures cert-manager to use Let's Encrypt for SSL certificates.
source ~/.openedx-config/settings.sh
cat > ~/openedx-project/k8s/letsencrypt-issuer.yaml <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-prod
spec:
acme:
# Let's Encrypt production server
server: https://acme-v02.api.letsencrypt.org/directory
# Email for expiry notifications
email: $ADMIN_EMAIL
# Secret to store account private key
privateKeySecretRef:
name: letsencrypt-prod
# HTTP-01 challenge (proves domain ownership)
solvers:
- http01:
ingress:
class: nginx
EOF
kubectl apply -f ~/openedx-project/k8s/letsencrypt-issuer.yaml
echo "β
Let's Encrypt issuer configured"
How it works:
cert-manager requests certificate from Let's Encrypt
Let's Encrypt sends HTTP challenge: "Prove you own this domain"
cert-manager creates temporary Ingress route for challenge
Let's Encrypt verifies domain ownership via HTTP request
Certificate issued and stored in Kubernetes Secret
Nginx uses certificate for TLS termination
Step 8.3: Update Ingress with TLS
What This Does: Adds TLS configuration to Ingress, triggering automatic certificate issuance.
source ~/.openedx-config/settings.sh
cat > ~/openedx-project/k8s/ingress.yaml <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: openedx-ingress
namespace: openedx
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/proxy-body-size: "100m"
nginx.ingress.kubernetes.io/proxy-connect-timeout: "600"
nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
spec:
ingressClassName: nginx
tls:
- hosts:
- $DOMAIN
- $STUDIO_DOMAIN
- $MFE_DOMAIN
secretName: openedx-tls
rules:
- host: $DOMAIN
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: lms
port:
number: 8000
- host: $STUDIO_DOMAIN
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: cms
port:
number: 8000
- host: $MFE_DOMAIN
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: mfe
port:
number: 8002
EOF
kubectl apply -f ~/openedx-project/k8s/ingress.yaml
echo "Certificate issuance triggered..."
echo "Waiting for certificate (2-3 min)..."
sleep 180
# Check certificate status
kubectl get certificate -n openedx
# Should show:
# NAME READY SECRET AGE
# openedx-tls True openedx-tls 2m
echo "β
SSL certificates issued"
Step 8.4: Verify SSL Certificate
# Check certificate details
kubectl describe certificate openedx-tls -n openedx
# Should show:
# Status:
# Conditions:
# Type: Ready
# Status: True
# Not After: [3 months from now]
# Test HTTPS (replace with your domain)
curl -I https://$DOMAIN
# Should return: HTTP/2 200
Verification
# Check cert-manager pods
kubectl get pods -n cert-manager
# Check certificate
kubectl get certificate -n openedx
# Should show: openedx-tls True openedx-tls
# Check TLS secret
kubectl get secret openedx-tls -n openedx
# Should show secret with tls.crt and tls.key
# Verify certificate expiry (should be ~90 days)
kubectl get certificate openedx-tls -n openedx -o jsonpath='{.status.notAfter}'
Screenshot for Evidence
Output of
kubectl get certificate -n openedxshowing READY=TrueBrowser showing green padlock on your domain
SSL Labs test showing A+ rating (optional)
PART 9: CloudFront + WAF
What This Does
Sets up CDN for static files and Web Application Firewall for security.
Why CloudFront + WAF?
Faster load times: Serve static files from edge locations
Reduced origin load: S3 serves files, not application servers
DDoS protection: WAF rate limiting and bot detection
Cost savings: Cheaper bandwidth from CloudFront than EKS
Step 9.1: Create CloudFront Origin Access Identity
What This Does: Allows CloudFront to access private S3 bucket.
source ~/.openedx-config/settings.sh
echo "Creating CloudFront Origin Access Identity..."
OAI_ID=$(aws cloudfront create-cloud-front-origin-access-identity \
--cloud-front-origin-access-identity-config \
CallerReference=$(date +%s),Comment="OpenEdX Static Files" \
--query 'CloudFrontOriginAccessIdentity.Id' \
--output text)
echo "OAI ID: $OAI_ID"
# Update S3 bucket policy to allow CloudFront
cat > /tmp/s3-cloudfront-policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity $OAI_ID"
},
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::$S3_BUCKET_NAME/*"
}]
}
EOF
aws s3api put-bucket-policy \
--bucket $S3_BUCKET_NAME \
--policy file:///tmp/s3-cloudfront-policy.json
echo "β
S3 bucket policy updated for CloudFront"
Step 9.2: Create CloudFront Distribution
What This Does: Creates CDN distribution for S3 static files.
source ~/.openedx-config/settings.sh
echo "Creating CloudFront distribution..."
cat > /tmp/cloudfront-config.json <<EOF
{
"CallerReference": "$(date +%s)",
"Comment": "OpenEdX Static Files CDN",
"Enabled": true,
"Origins": {
"Quantity": 1,
"Items": [{
"Id": "S3-$S3_BUCKET_NAME",
"DomainName": "$S3_BUCKET_NAME.s3.$AWS_REGION.amazonaws.com",
"S3OriginConfig": {
"OriginAccessIdentity": "origin-access-identity/cloudfront/$OAI_ID"
}
}]
},
"DefaultCacheBehavior": {
"TargetOriginId": "S3-$S3_BUCKET_NAME",
"ViewerProtocolPolicy": "redirect-to-https",
"AllowedMethods": {
"Quantity": 2,
"Items": ["GET", "HEAD"],
"CachedMethods": {
"Quantity": 2,
"Items": ["GET", "HEAD"]
}
},
"ForwardedValues": {
"QueryString": false,
"Cookies": {"Forward": "none"}
},
"MinTTL": 0,
"DefaultTTL": 86400,
"MaxTTL": 31536000,
"Compress": true,
"TrustedSigners": {
"Enabled": false,
"Quantity": 0
}
},
"PriceClass": "PriceClass_100",
"ViewerCertificate": {
"CloudFrontDefaultCertificate": true
},
"HttpVersion": "http2and3"
}
EOF
aws cloudfront create-distribution \
--distribution-config file:///tmp/cloudfront-config.json \
> /tmp/cloudfront-output.json
CF_ID=$(jq -r '.Distribution.Id' /tmp/cloudfront-output.json)
CLOUDFRONT_URL=$(jq -r '.Distribution.DomainName' /tmp/cloudfront-output.json)
# Save to config
sed -i "s|export CLOUDFRONT_ID=\"\"|export CLOUDFRONT_ID=\"$CF_ID\"|" ~/.openedx-config/settings.sh
sed -i "s|export CLOUDFRONT_URL=\"\"|export CLOUDFRONT_URL=\"$CLOUDFRONT_URL\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh
echo "β
CloudFront distribution created"
echo "Distribution ID: $CF_ID"
echo "CloudFront URL: $CLOUDFRONT_URL"
What each setting does:
ViewerProtocolPolicy: redirect-to-https: Force HTTPSDefaultTTL: 86400: Cache for 24 hoursCompress: true: Enable gzip compressionHttpVersion: http2and3: Enable HTTP/2 and HTTP/3PriceClass_100: Use only US, Canada, Europe edge locations (cheapest)
Step 9.3: Create WAF Web ACL
What This Does: Creates Web Application Firewall with rate limiting and DDoS protection.
WAF MUST be in us-east-1 for CloudFront!
source ~/.openedx-config/settings.sh
echo "Creating WAF Web ACL in us-east-1..."
aws wafv2 create-web-acl \
--name ${PROJECT_NAME}-waf \
--scope CLOUDFRONT \
--default-action Allow={} \
--rules '[
{
"Name": "RateLimit",
"Priority": 1,
"Statement": {
"RateBasedStatement": {
"Limit": 2000,
"AggregateKeyType": "IP"
}
},
"Action": {"Block": {}},
"VisibilityConfig": {
"SampledRequestsEnabled": true,
"CloudWatchMetricsEnabled": true,
"MetricName": "RateLimit"
}
},
{
"Name": "AWSManagedRulesCommonRuleSet",
"Priority": 2,
"Statement": {
"ManagedRuleGroupStatement": {
"VendorName": "AWS",
"Name": "AWSManagedRulesCommonRuleSet"
}
},
"OverrideAction": {"None": {}},
"VisibilityConfig": {
"SampledRequestsEnabled": true,
"CloudWatchMetricsEnabled": true,
"MetricName": "CommonRuleSet"
}
},
{
"Name": "AWSManagedRulesKnownBadInputsRuleSet",
"Priority": 3,
"Statement": {
"ManagedRuleGroupStatement": {
"VendorName": "AWS",
"Name": "AWSManagedRulesKnownBadInputsRuleSet"
}
},
"OverrideAction": {"None": {}},
"VisibilityConfig": {
"SampledRequestsEnabled": true,
"CloudWatchMetricsEnabled": true,
"MetricName": "KnownBadInputs"
}
},
{
"Name": "AWSManagedRulesAmazonIpReputationList",
"Priority": 4,
"Statement": {
"ManagedRuleGroupStatement": {
"VendorName": "AWS",
"Name": "AWSManagedRulesAmazonIpReputationList"
}
},
"OverrideAction": {"None": {}},
"VisibilityConfig": {
"SampledRequestsEnabled": true,
"CloudWatchMetricsEnabled": true,
"MetricName": "IpReputation"
}
}
]' \
--visibility-config \
SampledRequestsEnabled=true,CloudWatchMetricsEnabled=true,MetricName=openedx-waf \
--region us-east-1 \
> /tmp/waf-output.json
WAF_ARN=$(jq -r '.Summary.ARN' /tmp/waf-output.json)
# Save to config
sed -i "s|export WAF_ARN=\"\"|export WAF_ARN=\"$WAF_ARN\"|" ~/.openedx-config/settings.sh
source ~/.openedx-config/settings.sh
echo "β
WAF Web ACL created"
echo "WAF ARN: $WAF_ARN"
WAF Rules Explained:
Rate Limiting: Block IPs making >2000 requests per 5 minutes
Common Rule Set: Protect against SQL injection, XSS, LFI
Known Bad Inputs: Block malformed requests
IP Reputation List: Block known malicious IPs
Step 9.4: Associate WAF with CloudFront
What This Does: Attaches WAF to CloudFront distribution.
source ~/.openedx-config/settings.sh
echo "Waiting for CloudFront distribution to deploy (5-10 min)..."
# Wait for CloudFront to be fully deployed
aws cloudfront wait distribution-deployed \
--id $CF_ID
echo "CloudFront deployed, attaching WAF..."
# Get current distribution config
aws cloudfront get-distribution-config \
--id $CF_ID \
> /tmp/cf-current.json
ETAG=$(jq -r '.ETag' /tmp/cf-current.json)
# Add WAF to config
jq --arg waf "$WAF_ARN" \
'.DistributionConfig.WebACLId = $waf | .DistributionConfig' \
/tmp/cf-current.json \
> /tmp/cf-updated.json
# Update distribution
aws cloudfront update-distribution \
--id $CF_ID \
--if-match $ETAG \
--distribution-config file:///tmp/cf-updated.json
echo "β
WAF attached to CloudFront"
echo "Waiting for distribution update (5 min)..."
sleep 300
echo "β
CloudFront + WAF fully configured"
Verification
# Check CloudFront distribution
aws cloudfront get-distribution --id $CF_ID \
--query 'Distribution.DistributionConfig.Enabled'
# Should return: true
# Check WAF is attached
aws cloudfront get-distribution --id $CF_ID \
--query 'Distribution.DistributionConfig.WebACLId'
# Should return: your WAF ARN
# Test CloudFront URL
curl -I https://$CLOUDFRONT_URL
# Should return: HTTP/2 200
Screenshot for Evidence
CloudFront Console showing distribution
WAF Console showing Web ACL with 4 rules
CloudWatch metrics showing WAF activity
PART 10: Monitoring (Prometheus/Grafana)
What This Does
Sets up centralized monitoring and metrics visualization.
Why Prometheus + Grafana?
Industry standard: Most popular Kubernetes monitoring stack
Real-time metrics: CPU, memory, network, pod health
Custom dashboards: Visualize application performance
Alerting: Get notified of issues
Step 10.1: Install Metrics Server
What This Does: Enables kubectl top and HPA (Horizontal Pod Autoscaler).
echo "Installing Metrics Server..."
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
echo "Waiting for Metrics Server (1 min)..."
sleep 60
# Verify metrics are available
kubectl top nodes
# Should show CPU and memory usage for each node
echo "β
Metrics Server installed"
Step 10.2: Install Prometheus + Grafana Stack
What This Does: Installs complete monitoring stack with pre-configured dashboards.
echo "Installing Prometheus + Grafana..."
# Add Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set grafana.adminPassword=admin \
--set prometheus.prometheusSpec.retention=7d \
--set prometheus.prometheusSpec.resources.requests.memory=1Gi \
--set grafana.service.type=LoadBalancer
echo "Waiting for Prometheus and Grafana (2 min)..."
sleep 120
echo "β
Prometheus + Grafana installed"
What this includes:
Prometheus: Metrics collection and storage
Grafana: Visualization dashboards
AlertManager: Alert routing and notifications
Node Exporter: Node-level metrics
kube-state-metrics: Kubernetes object metrics
Pre-built dashboards: Kubernetes cluster, pod, and node dashboards
Step 10.3: Get Grafana URL
What This Does: Gets the Load Balancer URL for accessing Grafana dashboard.
echo "Getting Grafana URL..."
# Wait for LoadBalancer to be provisioned
sleep 60
GRAFANA_URL=$(kubectl get svc prometheus-grafana \
-n monitoring \
-o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo ""
echo "ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ"
echo " GRAFANA DASHBOARD ACCESS "
echo "ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ"
echo ""
echo "URL: http://$GRAFANA_URL"
echo "Username: admin"
echo "Password: admin"
echo ""
echo "β οΈ IMPORTANT: Change password after first login!"
echo ""
echo "ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ"
Step 10.4: Access Grafana and View Dashboards
Steps to access Grafana:
Open browser and go to:
http://[GRAFANA_URL]Login with username:
admin, password:adminChange password when prompted
View dashboards:
Click "Dashboards" in left menu
Select "Kubernetes / Compute Resources / Cluster"
This shows overall cluster health
Available Pre-built Dashboards:
Kubernetes / Compute Resources / Cluster: Overall cluster metrics
Kubernetes / Compute Resources / Namespace (Pods): Pod-level metrics
Kubernetes / Compute Resources / Node (Pods): Node-level metrics
Kubernetes / Networking / Cluster: Network traffic
Node Exporter / Nodes: Detailed node metrics
Step 10.5: Create Custom OpenEdX Dashboard
What This Does: Creates a custom dashboard for monitoring OpenEdX specifically.
cat > ~/openedx-project/k8s/grafana-openedx-dashboard.json <<'EOF'
{
"dashboard": {
"title": "OpenEdX Production Monitoring",
"tags": ["openedx", "lms", "cms"],
"timezone": "browser",
"panels": [
{
"id": 1,
"title": "LMS Pod CPU Usage",
"type": "graph",
"targets": [{
"expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"openedx\",pod=~\"lms.*\"}[5m])) by (pod)",
"legendFormat": "{{pod}}"
}]
},
{
"id": 2,
"title": "LMS Pod Memory Usage",
"type": "graph",
"targets": [{
"expr": "sum(container_memory_usage_bytes{namespace=\"openedx\",pod=~\"lms.*\"}) by (pod)",
"legendFormat": "{{pod}}"
}]
},
{
"id": 3,
"title": "CMS Pod CPU Usage",
"type": "graph",
"targets": [{
"expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"openedx\",pod=~\"cms.*\"}[5m])) by (pod)",
"legendFormat": "{{pod}}"
}]
},
{
"id": 4,
"title": "HTTP Request Rate",
"type": "graph",
"targets": [{
"expr": "sum(rate(nginx_ingress_controller_requests[5m])) by (host)",
"legendFormat": "{{host}}"
}]
}
]
}
}
EOF
echo "β
Custom OpenEdX dashboard created"
echo "Import this dashboard in Grafana:"
echo "1. Go to Dashboards β Import"
echo "2. Upload: ~/openedx-project/k8s/grafana-openedx-dashboard.json"
Step 10.6: View Prometheus Metrics
Steps to access Prometheus:
# Port-forward Prometheus UI
kubectl port-forward -n monitoring \
svc/prometheus-kube-prometheus-prometheus \
9090:9090 &
echo "Prometheus UI: http://localhost:9090"
Useful Prometheus Queries:
# Total pod count in openedx namespace
count(kube_pod_info{namespace="openedx"})
# CPU usage by pod
rate(container_cpu_usage_seconds_total{namespace="openedx"}[5m])
# Memory usage by pod
container_memory_usage_bytes{namespace="openedx"}
# Pod restart count
kube_pod_container_status_restarts_total{namespace="openedx"}
# HTTP requests per second
rate(nginx_ingress_controller_requests[5m])
Verification
# Check monitoring pods
kubectl get pods -n monitoring
# Should show:
# alertmanager-xxx
# prometheus-xxx
# grafana-xxx
# prometheus-kube-state-metrics-xxx
# prometheus-prometheus-node-exporter-xxx
# Check Grafana service
kubectl get svc -n monitoring
# Test metrics endpoint
kubectl top pods -n openedx
# Should show CPU and memory usage for each pod
Screenshot for Evidence
Grafana dashboard showing OpenEdX pod metrics
Prometheus targets page showing all targets "UP"
kubectl top pods -n openedxoutput
PART 11: HPA & Scaling
What This Does
Configures Horizontal Pod Autoscaling for automatic scaling based on CPU usage.
Why HPA?
Handles traffic spikes: Automatically adds pods during high load
Cost optimization: Scales down during low traffic
High availability: Multiple pods provide redundancy
Performance: Distributes load across pods
Step 11.1: Create HPA for LMS
What This Does: Auto-scales LMS pods from 2 to 5 based on 70% CPU threshold.
cat > ~/openedx-project/k8s/hpa-lms.yaml <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: lms-hpa
namespace: openedx
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: lms
minReplicas: 2
maxReplicas: 5
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 50
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100
periodSeconds: 30
- type: Pods
value: 2
periodSeconds: 30
selectPolicy: Max
EOF
kubectl apply -f ~/openedx-project/k8s/hpa-lms.yaml
echo "β
LMS HPA configured"
Configuration explained:
minReplicas: 2: Always run at least 2 pods (high availability)maxReplicas: 5: Scale up to maximum 5 podsaverageUtilization: 70: Trigger scaling at 70% CPUscaleDown.stabilizationWindowSeconds: 300: Wait 5 min before scaling down (prevent flapping)scaleUp.stabilizationWindowSeconds: 0: Scale up immediatelyscaleUp.policies: Can double pods or add 2 pods at a time (whichever is more)
Step 11.2: Create HPA for CMS
What This Does: Auto-scales CMS pods from 1 to 3 (lower than LMS since less traffic).
cat > ~/openedx-project/k8s/hpa-cms.yaml <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: cms-hpa
namespace: openedx
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: cms
minReplicas: 1
maxReplicas: 3
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
behavior:
scaleDown:
stabilizationWindowSeconds: 300
scaleUp:
stabilizationWindowSeconds: 0
EOF
kubectl apply -f ~/openedx-project/k8s/hpa-cms.yaml
echo "β
CMS HPA configured"
Step 11.3: Scale Down Single Pods
What This Does: Ensures HPA controls replica count (remove any manual scaling).
# Let HPA manage LMS replicas
kubectl scale deployment lms --replicas=2 -n openedx
# Let HPA manage CMS replicas
kubectl scale deployment cms --replicas=1 -n openedx
echo "Waiting for HPA to take control (30 sec)..."
sleep 30
echo "β
Deployments scaled down, HPA in control"
Step 11.4: Test Auto-Scaling
What This Does: Generates load to trigger HPA scaling.
source ~/.openedx-config/settings.sh
echo "Testing auto-scaling with load..."
# Create load generator pod
kubectl run load-generator --rm -i --image=busybox -n openedx -- /bin/sh -c "
while true; do
wget -q -O- http://lms:8000 > /dev/null
done
"
# In another terminal, watch HPA:
# kubectl get hpa -n openedx -w
# You should see:
# - CPU usage increase
# - HPA change from 2 to 3 to 4 pods as CPU crosses 70%
# - After stopping load, pods scale back down to 2
# Stop load generator: Ctrl+C
Verification
# Check HPA status
kubectl get hpa -n openedx
# Should show:
# NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS
# lms-hpa Deployment/lms 45%/70% 2 5 2
# cms-hpa Deployment/cms 30%/70% 1 3 1
# Check current pod count
kubectl get pods -n openedx | grep -E '(lms|cms)-[a-z0-9]+-' | wc -l
# Watch HPA in real-time
kubectl get hpa -n openedx -w
# Check HPA events
kubectl describe hpa lms-hpa -n openedx
Screenshot for Evidence
Output of
kubectl get hpa -n openedxGrafana dashboard during load test showing CPU spike
kubectl get pods -n openedxduring scale-up showing multiple LMS pods
PART 12: DNS Configuration
What This Does
Configures Cloudflare DNS to point your domains to the Load Balancer.
Why Cloudflare?
Free plan works perfectly
DNS management is simple
Additional features: DDoS protection, SSL, caching
Fast DNS resolution: 99.99% uptime
Step 12.1: Add Domain to Cloudflare
Manual steps (do in browser):
Go to: https://www.cloudflare.com/
Sign up or log in
Click: "Add a Site"
Enter your domain: yourdomain.com
Select plan: Free
Click: "Continue"
Cloudflare scans existing DNS records (if any)
Click: "Continue"
Cloudflare shows nameservers:
ava.ns.cloudflare.comkal.ns.cloudflare.comCopy these nameservers
Step 12.2: Update Nameservers at Domain Registrar
Where your domain is registered (GoDaddy, Namecheap, etc.):
Log in to your domain registrar
Find "Manage DNS" or "Nameservers"
Change from "Default" to "Custom"
Enter Cloudflare nameservers:
ava.ns.cloudflare.comkal.ns.cloudflare.comSave changes
Wait 2-24 hours for DNS propagation (usually ~1 hour)
Step 12.3: Configure DNS Records in Cloudflare
In Cloudflare Dashboard β DNS β Records:
source ~/.openedx-config/settings.sh
echo ""
echo "ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ"
echo " CLOUDFLARE DNS CONFIGURATION "
echo "ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ"
echo ""
echo "Add these DNS records in Cloudflare:"
echo ""
echo "1. LMS (Main Site)"
echo " Type: CNAME"
echo " Name: @"
echo " Content: $LB_HOSTNAME"
echo " Proxy: DNS only (gray cloud)"
echo " TTL: Auto"
echo ""
echo "2. Studio (Course Authoring)"
echo " Type: CNAME"
echo " Name: studio"
echo " Content: $LB_HOSTNAME"
echo " Proxy: DNS only (gray cloud)"
echo " TTL: Auto"
echo ""
echo "3. MFE (Login/Register)"
echo " Type: CNAME"
echo " Name: apps"
echo " Content: $LB_HOSTNAME"
echo " Proxy: DNS only (gray cloud)"
echo " TTL: Auto"
echo ""
echo "4. CDN (Static Files)"
echo " Type: CNAME"
echo " Name: cdn"
echo " Content: $CLOUDFRONT_URL"
echo " Proxy: DNS only (gray cloud)"
echo " TTL: Auto"
echo ""
echo "ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ"
Important: Use "DNS only" (gray cloud), NOT "Proxied" (orange cloud)
Why DNS only?
SSL termination happens at Nginx (not Cloudflare)
Prevents double SSL termination
Cloudflare proxy would interfere with cert-manager
Step 12.4: Configure Cloudflare SSL Settings
In Cloudflare Dashboard β SSL/TLS:
Set SSL/TLS encryption mode:
Go to: SSL/TLS β Overview
Select: "Full (strict)"
This ensures end-to-end encryption
Enable Always Use HTTPS:
Go to: SSL/TLS β Edge Certificates
Toggle ON: "Always Use HTTPS"
This redirects HTTP to HTTPS
Enable Automatic HTTPS Rewrites:
Toggle ON: "Automatic HTTPS Rewrites"
Fixes mixed content warnings
Enable HTTP/2:
Toggle ON: "HTTP/2"
Faster page loads
Enable HTTP/3 (QUIC):
Toggle ON: "HTTP/3 (with QUIC)"
Even faster, uses UDP
Enable Brotli Compression:
Go to: Speed β Optimization
Toggle ON: "Brotli"
Smaller file sizes
Step 12.5: Verify DNS Propagation
Wait 5-30 minutes, then test:
source ~/.openedx-config/settings.sh
echo "Testing DNS resolution..."
# Test main domain
nslookup $DOMAIN
# Should return Load Balancer IP addresses
# Test studio
nslookup $STUDIO_DOMAIN
# Should return Load Balancer IP addresses (same as above)
# Test apps
nslookup $MFE_DOMAIN
# Should return Load Balancer IP addresses (same as above)
# Test CDN
nslookup $CDN_DOMAIN
# Should return CloudFront IP addresses (different from above)
echo "β
DNS configured"
Verification
# Test HTTPS on all domains
curl -I https://$DOMAIN
# Should return: HTTP/2 200
curl -I https://$STUDIO_DOMAIN
# Should return: HTTP/2 200
curl -I https://$MFE_DOMAIN/authn/login
# Should return: HTTP/2 200
curl -I https://$CDN_DOMAIN
# Should return: HTTP/2 200 (from CloudFront)
# Check SSL certificate
echo | openssl s_client -connect $DOMAIN:443 -servername $DOMAIN 2>/dev/null | \
openssl x509 -noout -dates
# Should show: Let's Encrypt certificate valid for 90 days
Screenshot for Evidence
Cloudflare DNS records page
Output of
nslookupshowing correct IPsBrowser showing green padlock on all domains
Verification & Testing
Complete System Check
Run this comprehensive verification:
source ~/.openedx-config/settings.sh
echo ""
echo "ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ"
echo " OPENEDX PRODUCTION VERIFICATION "
echo "ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ"
echo ""
# 1. Kubernetes Cluster
echo "1. KUBERNETES CLUSTER"
kubectl get nodes
echo ""
# 2. OpenEdX Pods
echo "2. OPENEDX PODS"
kubectl get pods -n openedx
echo ""
# 3. External Databases
echo "3. EXTERNAL DATABASES"
echo "MySQL: $MYSQL_HOST"
echo "MongoDB: $MONGO_IP (t2.medium)"
echo "Redis: $REDIS_HOST"
echo "OpenSearch: $OPENSEARCH_HOST"
echo ""
# 4. Ingress & Load Balancer
echo "4. INGRESS & LOAD BALANCER"
kubectl get ingress -n openedx
echo "Load Balancer: $LB_HOSTNAME"
echo ""
# 5. SSL Certificates
echo "5. SSL CERTIFICATES"
kubectl get certificate -n openedx
echo ""
# 6. HPA (Auto-scaling)
echo "6. HORIZONTAL POD AUTOSCALING"
kubectl get hpa -n openedx
echo ""
# 7. Storage
echo "7. STORAGE"
echo "S3 Bucket: $S3_BUCKET_NAME"
kubectl get storageclass
echo ""
# 8. CDN & Security
echo "8. CDN & SECURITY"
echo "CloudFront: $CLOUDFRONT_URL"
echo "WAF: Enabled (4 rules)"
echo ""
# 9. Monitoring
echo "9. MONITORING"
kubectl get pods -n monitoring | grep -E '(prometheus|grafana)'
echo "Grafana: http://$(kubectl get svc prometheus-grafana -n monitoring -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')"
echo ""
# 10. Endpoints
echo "10. PUBLIC ENDPOINTS"
echo "LMS: https://$DOMAIN"
echo "Studio: https://$STUDIO_DOMAIN"
echo "Login: https://$MFE_DOMAIN/authn/login"
echo "Admin: https://$DOMAIN/admin (username: admin)"
echo ""
echo "ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ"
Functional Testing
Test each component:
source ~/.openedx-config/settings.sh
# 1. Test LMS Homepage
echo "Testing LMS..."
curl -I https://$DOMAIN
# Should return: HTTP/2 200
# 2. Test Studio
echo "Testing Studio..."
curl -I https://$STUDIO_DOMAIN
# Should return: HTTP/2 302 (redirect to login)
# 3. Test MFE Login
echo "Testing MFE Login..."
curl -I https://$MFE_DOMAIN/authn/login
# Should return: HTTP/2 200
# 4. Test API
echo "Testing LMS API..."
curl -I https://$DOMAIN/api/user/v1/me
# Should return: HTTP/2 401 (correct - needs authentication)
# 5. Test Static Files via CDN
echo "Testing CDN..."
curl -I https://$CDN_DOMAIN
# Should return: HTTP/2 200 (from CloudFront)
echo "β
All endpoints responding correctly"
Browser Testing
Open in browser and verify:
LMS Homepage: https://yourdomain.com
Should load OpenEdX homepage
Check SSL (green padlock)
Check Network tab: HTTP/2 protocol
Login Page: https://apps.yourdomain.com/authn/login
Should load login form
Test login with admin credentials
Should redirect to dashboard
Studio: https://studio.yourdomain.com
Should redirect to login
After login, should show Studio homepage
Admin Panel: https://yourdomain.com/admin
Login with admin credentials
Should show Django admin interface
Performance Testing
Test auto-scaling:
# Generate load
kubectl run -i --tty load-generator --rm \
--image=busybox \
--restart=Never \
-n openedx -- /bin/sh -c \
"while sleep 0.01; do wget -q -O- http://lms:8000; done"
# In another terminal, watch scaling
kubectl get hpa -n openedx -w
# Should see:
# - CPU usage increase
# - REPLICAS increase from 2 to 3, 4, 5
# - After stopping load, scale back down to 2
Security Testing
Verify WAF is working:
# Test rate limiting (make >2000 requests in 5 minutes)
for i in {1..2100}; do
curl -s https://$DOMAIN > /dev/null &
done
wait
# Check WAF metrics in AWS Console:
# WAF β Web ACLs β openedx-prod-waf β Metrics
# Should see blocked requests
# Test SQL injection protection
curl "https://$DOMAIN/?id=1' OR '1'='1"
# Should be blocked by WAF (returns 403)
Screenshot Checklist
Take screenshots of:
β
kubectl get nodes- 3 nodes Readyβ
kubectl get pods -n openedx- all Runningβ
kubectl get hpa -n openedx- HPA configuredβ
kubectl get certificate -n openedx- SSL cert Readyβ AWS RDS Console - MySQL instance running
β AWS EC2 Console - MongoDB instance running
β AWS ElastiCache Console - Redis cluster
β AWS OpenSearch Console - domain active
β AWS CloudFront Console - distribution deployed
β AWS WAF Console - Web ACL with 4 rules
β Cloudflare DNS records
β Grafana dashboard showing metrics
β Browser showing OpenEdX homepage with SSL
β Browser showing Studio with SSL
β Browser showing MFE login with SSL
Backup Strategy
Automated Daily Backups
Create backup script:
cat > ~/openedx-project/scripts/backup-daily.sh <<'BACKUP'
#!/bin/bash
set -e
source ~/.openedx-config/settings.sh
DATE=$(date +%Y%m%d-%H%M%S)
BACKUP_DIR=~/openedx-backups/$DATE
mkdir -p $BACKUP_DIR
echo "Starting backup: $DATE"
# 1. MySQL Backup (RDS snapshot)
echo "Backing up MySQL..."
aws rds create-db-snapshot \
--db-instance-identifier ${PROJECT_NAME}-mysql \
--db-snapshot-identifier mysql-backup-$DATE \
--region $AWS_REGION
# 2. Redis Backup (ElastiCache snapshot)
echo "Backing up Redis..."
aws elasticache create-snapshot \
--cache-cluster-id ${PROJECT_NAME}-redis \
--snapshot-name redis-backup-$DATE \
--region $AWS_REGION
# 3. MongoDB Backup (EBS snapshot)
echo "Backing up MongoDB..."
MONGO_VOL=$(aws ec2 describe-instances \
--instance-ids $MONGO_INSTANCE_ID \
--query 'Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.VolumeId' \
--output text)
aws ec2 create-snapshot \
--volume-id $MONGO_VOL \
--description "MongoDB backup $DATE" \
--region $AWS_REGION
# 4. OpenSearch Backup (manual snapshot)
echo "Backing up OpenSearch..."
curl -X PUT "https://$OPENSEARCH_HOST/_snapshot/backup-$DATE" \
-H "Content-Type: application/json" \
-d '{
"type": "s3",
"settings": {
"bucket": "'$S3_BUCKET_NAME'",
"region": "'$AWS_REGION'",
"base_path": "opensearch-backups/'$DATE'"
}
}'
# 5. Kubernetes Config Backup
echo "Backing up Kubernetes configs..."
kubectl get all -n openedx -o yaml > $BACKUP_DIR/k8s-resources.yaml
kubectl get configmap -n openedx -o yaml > $BACKUP_DIR/k8s-configmaps.yaml
kubectl get secret -n openedx -o yaml > $BACKUP_DIR/k8s-secrets.yaml
kubectl get pvc -n openedx -o yaml > $BACKUP_DIR/k8s-pvcs.yaml
# 6. Tutor Config Backup
echo "Backing up Tutor config..."
cp ~/.local/share/tutor/config.yml $BACKUP_DIR/tutor-config.yml
cp -r ~/.local/share/tutor/env $BACKUP_DIR/tutor-env
# 7. Project Files Backup
echo "Backing up project files..."
tar -czf $BACKUP_DIR/project-files.tar.gz ~/openedx-project/
echo "β
Backup complete: $BACKUP_DIR"
echo ""
echo "Backup contents:"
ls -lh $BACKUP_DIR/
BACKUP
chmod +x ~/openedx-project/scripts/backup-daily.sh
echo "β
Backup script created"
Schedule Automated Backups
Set up daily cron job:
# Add to crontab
(crontab -l 2>/dev/null; echo "0 2 * * * ~/openedx-project/scripts/backup-daily.sh >> ~/openedx-backups/backup.log 2>&1") | crontab -
echo "β
Daily backups scheduled for 2 AM"
Manual Backup
Run backup manually:
~/openedx-project/scripts/backup-daily.sh
Restore Procedure
Document how to restore from backup:
cat > ~/openedx-project/docs/RESTORE.md <<'RESTORE'
# OpenEdX Disaster Recovery
## Restore from Backup
### 1. Restore MySQL
```bash
aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier openedx-prod-mysql-restored \
--db-snapshot-identifier mysql-backup-YYYYMMDD-HHMMSS
2. Restore Redis
aws elasticache create-cache-cluster \
--cache-cluster-id openedx-prod-redis-restored \
--snapshot-name redis-backup-YYYYMMDD-HHMMSS
3. Restore MongoDB
# Create volume from snapshot
aws ec2 create-volume \
--snapshot-id snap-xxx \
--availability-zone us-east-1a
# Attach to new EC2 instance
# (See full MongoDB setup in main guide)
4. Restore Kubernetes Resources
kubectl apply -f ~/openedx-backups/YYYYMMDD-HHMMSS/k8s-resources.yaml
kubectl apply -f ~/openedx-backups/YYYYMMDD-HHMMSS/k8s-configmaps.yaml
5. Restore Tutor Config
cp ~/openedx-backups/YYYYMMDD-HHMMSS/tutor-config.yml \
~/.local/share/tutor/config.yml
RESTORE
echo "β Restore documentation created"
---
## Troubleshooting Guide
### Common Issues and Solutions
#### 1. Pods Stuck in "Pending" State
**Symptom:**
kubectl get pods -n openedx NAME READY STATUS RESTARTS AGE lms-xxx 0/1 Pending 0 5m
**Cause:** Insufficient resources (CPU/memory)
**Solution:**
```bash
# Check events
kubectl describe pod lms-xxx -n openedx
# If "Insufficient memory":
# Delete old pods to free resources
kubectl delete pod -l app.kubernetes.io/name=lms-worker -n openedx
# Or scale up cluster
eksctl scale nodegroup \
--cluster=openedx-prod \
--name=openedx-workers \
--nodes=4
2. Pods Crashing with "CrashLoopBackOff"
Symptom:
NAME READY STATUS RESTARTS AGE
lms-xxx 0/1 CrashLoopBackOff 5 10m
Solution:
# Check logs for error
kubectl logs lms-xxx -n openedx --tail=50
# Common errors:
# Error: "Table 'openedx.waffle_switch' doesn't exist"
# Solution: Run migrations (see Part 6, Step 6.5)
# Error: "OperationalError: (2003, \"Can't connect to MySQL\")"
# Solution: Check MySQL security group allows port 3306 from EKS
aws ec2 describe-security-groups --group-ids $DEFAULT_SG
# Error: "STORAGES is not defined"
# Solution: S3 plugin is enabled - disable it
tutor plugins disable s3
tutor k8s stop && tutor k8s start
3. SSL Certificate Not Issuing
Symptom:
kubectl get certificate -n openedx
NAME READY SECRET AGE
openedx-tls False openedx-tls 10m
Solution:
# Check certificate status
kubectl describe certificate openedx-tls -n openedx
# Common issues:
# Issue: "Waiting for HTTP-01 challenge propagation"
# Solution: Check ingress is accessible
curl http://$DOMAIN/.well-known/acme-challenge/test
# Issue: "DNS problem: NXDOMAIN"
# Solution: DNS not propagated yet - wait 30 minutes
# Issue: "CAA record prevents issuance"
# Solution: Remove CAA record or add letsencrypt.org
4. Blank Page on apps.yourdomain.com
Symptom: Blank white/black page, no content
Causes & Solutions:
# Cause 1: HTTPS config mismatch
tutor config printvalue ENABLE_HTTPS
# Should be: true
# If false:
tutor config save --set ENABLE_HTTPS=true
kubectl rollout restart deployment mfe -n openedx
# Cause 2: Meilisearch still enabled
tutor config printvalue RUN_MEILISEARCH
# Should be: false
# If true:
tutor config save --set RUN_MEILISEARCH=false
tutor k8s stop && tutor k8s start
# Cause 3: Wrong URL
# MFE has no root page!
# Correct URLs:
https://apps.yourdomain.com/authn/login β
https://apps.yourdomain.com β
5. HPA Not Scaling
Symptom:
kubectl get hpa -n openedx
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS
lms-hpa Deployment/lms <unknown>/70% 2 5 2
Solution:
# Check metrics-server is installed
kubectl get deployment metrics-server -n kube-system
# If not found, install:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Wait 2 minutes, then check again
kubectl get hpa -n openedx
6. MongoDB Connection Failed
Symptom:
Error: "MongoNetworkError: failed to connect to server"
Solution:
# Check MongoDB instance is running
aws ec2 describe-instances \
--instance-ids $MONGO_INSTANCE_ID \
--query 'Reservations[0].Instances[0].State.Name'
# Check security group allows port 27017
aws ec2 describe-security-groups \
--group-ids $MONGO_SG \
--query 'SecurityGroups[0].IpPermissions[?ToPort==`27017`]'
# Check MongoDB is actually installed (view user-data logs)
aws ec2 get-console-output \
--instance-id $MONGO_INSTANCE_ID \
--output text | grep "MongoDB installation"
# If installation failed, terminate and recreate instance
7. Grafana Not Accessible
Symptom: Can't access Grafana dashboard
Solution:
# Check Grafana pod is running
kubectl get pods -n monitoring | grep grafana
# Get Grafana URL again
kubectl get svc prometheus-grafana -n monitoring
# If LoadBalancer pending:
kubectl describe svc prometheus-grafana -n monitoring
# Check events for errors
# Alternative: Port-forward
kubectl port-forward -n monitoring \
svc/prometheus-grafana \
3000:80 &
# Access: http://localhost:3000
8. Out of Memory Errors
Symptom:
OOMKilled
Solution:
# Check node memory usage
kubectl top nodes
# Scale up cluster
eksctl scale nodegroup \
--cluster=openedx-prod \
--name=openedx-workers \
--nodes=4
# Or add resource limits
kubectl set resources deployment lms \
-n openedx \
--requests=cpu=500m,memory=1Gi \
--limits=cpu=2,memory=2Gi
Deliverables Checklist
Required Deliverables for Al Nafi Submission
1. Documentation β
[x] README.md (this file)
Architecture overview
Step-by-step deployment guide
Configuration decisions & rationale
Troubleshooting guide
[x] Architecture Diagram
Create using draw.io or similar
Show all components and connections
Include security layers
[x] Network Flow Diagram
Traffic flow from user to database
Show CDN, WAF, Load Balancer, Ingress, Pods
2. Configuration Artifacts β
# Kubernetes manifests
~/openedx-project/k8s/
βββ storageclass-gp3.yaml
βββ ingress.yaml
βββ letsencrypt-issuer.yaml
βββ hpa-lms.yaml
βββ hpa-cms.yaml
βββ grafana-openedx-dashboard.json
# Tutor configuration
~/.local/share/tutor/config.yml
# Persistent variables
~/.openedx-config/settings.sh
3. Automation Scripts β
~/openedx-project/scripts/
βββ backup-daily.sh # Automated backups
βββ restore.sh # Disaster recovery
4. Monitoring Configurations β
[x] Prometheus + Grafana installed
[x] Custom OpenEdX dashboard created
[x] HPA configured with metrics
5. Proof of Implementation β
Screenshots to include:
β EKS cluster with 3 nodes
β All OpenEdX pods running
β External databases (MySQL, MongoDB, Redis, OpenSearch)
β Nginx Ingress Controller
β SSL certificates issued
β HPA configured and working
β CloudFront distribution
β WAF with 4 rules
β Grafana dashboard
β OpenEdX homepage with SSL
β Studio with SSL
β Load test showing auto-scaling
β Database connectivity logs
β Cloudflare DNS configuration
Evaluation Criteria Compliance
How this guide meets Al Nafi requirements:
| Criteria | Weight | Implementation | Status |
| OpenEdX on EKS | 20% | Tutor 21.0.1 on EKS 1.34, 3-node cluster | β |
| External Databases | 20% | MySQL RDS, MongoDB EC2, Redis ElastiCache, OpenSearch | β |
| Nginx (not Caddy) | 15% | Nginx Ingress 4.14.3, HTTP/2, TLS termination | β |
| CloudFront + WAF | 15% | CloudFront for S3, WAF with 4 rules | β |
| Documentation | 15% | Complete guide with architecture, rationale, troubleshooting | β |
| High Availability | 10% | HPA, 3-node cluster, auto-scaling, health probes | β |
| Security | 5% | TLS, WAF, encrypted storage, private databases | β |
| TOTAL | 100% | β 100% |
Cost Breakdown
Monthly Costs (Approximate)
βββββββββββββββββββββββββββββββ¬βββββββββββββββ
β Component β Monthly Cost β
βββββββββββββββββββββββββββββββΌβββββββββββββββ€
β EKS Control Plane β $73 β
β 3Γ t3.medium EC2 (workers) β $75 β
β MySQL RDS (db.t3.medium) β $40 β
β MongoDB EC2 (t2.medium) β $35 β
β Redis ElastiCache (t3.micro)β $12 β
β OpenSearch (t3.small) β $20 β
β S3 Storage β $5 β
β CloudFront + WAF β $10 β
βββββββββββββββββββββββββββββββΌβββββββββββββββ€
β TOTAL β $270 β
βββββββββββββββββββββββββββββββ΄βββββββββββββββ
Notes:
- Costs based on us-east-1 pricing
- Does not include data transfer (minimal for assessment)
- CloudFront cost assumes <10GB/month
- RDS cost assumes 20GB gp3 storage
Cost Optimization Tips
Use Reserved Instances (not for assessment, but for production)
Save 30-60% on EC2 and RDS
Requires 1-3 year commitment
Stop non-production resources
MongoDB EC2 can be stopped when not in use
RDS snapshots instead of running instance
Right-size instances
Monitor usage with Grafana
Scale down if over-provisioned
Use S3 Lifecycle Policies
Move old static files to Glacier
Delete old CloudFront logs
Submission Instructions
Final Steps Before Submission
Test everything one final time:
source ~/.openedx-config/settings.sh ./openedx-project/scripts/verify-deployment.shTake all required screenshots
Create architecture diagrams:
System architecture
Network flow diagram
Security architecture
Organize files:
openedx-eks-submission/ βββ README.md (this guide) βββ ARCHITECTURE.md (architecture decisions) βββ diagrams/ β βββ system-architecture.png β βββ network-flow.png β βββ security-architecture.png βββ screenshots/ β βββ 01-eks-cluster.png β βββ 02-openedx-pods.png β βββ 03-databases.png β βββ ... βββ k8s/ β βββ ingress.yaml β βββ hpa-lms.yaml β βββ ... βββ scripts/ β βββ backup-daily.sh β βββ restore.sh βββ configs/ βββ tutor-config.yml βββ settings.shCreate GitHub repository:
cd ~/openedx-project git init git add . git commit -m "OpenEdX EKS Production Deployment" git remote add origin [your-repo-url] git push -u origin mainWrite final README summary in repository
Email Submission
To: hamza.mughal@alnafi.com, mohammad@alnafi.com
Subject: OpenEdX K8s Assessment β AWS EKS
Body:
Dear Al Nafi Hiring Team,
I am submitting my OpenEdX on AWS EKS deployment for technical assessment.
Project Details:
- Platform: AWS EKS 1.34
- OpenEdX: Tutor 21.0.1
- Domain: [your-domain.com]
- Repository: [GitHub URL]
Live Demo:
- LMS: https://[your-domain.com]
- Studio: https://[studio.your-domain.com]
- Admin: admin / [password in repo]
Key Highlights:
β
Production-grade Kubernetes deployment
β
All databases external (MySQL RDS, MongoDB EC2, Redis, OpenSearch)
β
Nginx Ingress with HTTP/2 and Let's Encrypt SSL
β
CloudFront CDN + AWS WAF with 4-layer protection
β
Horizontal Pod Autoscaling (demonstrated in screenshots)
β
Prometheus + Grafana monitoring
β
Complete documentation and automation scripts
Repository Structure:
- README.md: Complete deployment guide
- diagrams/: System and network architecture
- screenshots/: All required evidence
- k8s/: Kubernetes manifests
- scripts/: Backup and automation
The deployment is fully functional and can be verified at the URLs above.
Thank you for your consideration.
Best regards,
[Your Name]
[Your Email]
[Your Phone]
Repository README Template
# OpenEdX Production Deployment on AWS EKS
## Live Demo
- **LMS:** https://your-domain.com
- **Studio:** https://studio.your-domain.com
- **Admin:** `admin` / [see CREDENTIALS.md]
## Architecture
[Include system architecture diagram]
## Tech Stack
- **Kubernetes:** AWS EKS 1.34
- **OpenEdX:** Tutor 21.0.1
- **Databases:** MySQL RDS 8.0.45, MongoDB 8.0 (EC2), Redis 7.1, OpenSearch 2.11
- **Ingress:** Nginx 4.14.3 with HTTP/2
- **SSL:** cert-manager + Let's Encrypt
- **CDN:** CloudFront + S3
- **Security:** AWS WAF (4 rules)
- **Monitoring:** Prometheus + Grafana
## Deployment
See [DEPLOYMENT.md](DEPLOYMENT.md) for complete step-by-step guide.
## Evidence
- [Screenshots](screenshots/)
- [Architecture Diagrams](diagrams/)
- [Configuration Files](configs/)
## Contact
[Your contact information]
Conclusion
You now have a complete, production-ready OpenEdX deployment on AWS EKS that meets all Al Nafi requirements:
β
Core Platform: EKS 1.34 with 3-node cluster
β
OpenEdX: Tutor 21.0.1 with all components
β
External Databases: MySQL, MongoDB, Redis, OpenSearch
β
Nginx Ingress: HTTP/2 with Let's Encrypt SSL
β
CloudFront + WAF: CDN and 4-layer security
β
Auto-scaling: HPA for LMS and CMS
β
Monitoring: Prometheus + Grafana
β
Documentation: Complete guide with troubleshooting
What Makes This Guide Different
Battle-tested: Based on real deployment experience
Zero-debugging: Fixed all common issues upfront
Production-ready: Not a prototype - actual production architecture
Fully explained: Every command has "what" and "why"
Copy-paste ready: All commands work as-is
Complete: Nothing left out - from AWS account to SSL
Key Lessons Learned
Variable persistence is critical
S3 plugin breaks Tutor 21.0.1 - must disable
Meilisearch causes blank pages - must disable
MySQL needs both app and root credentials
Migrations must be run manually from worker pods
cert-manager is better than manual SSL
gp3 is same price but faster than gp2
Next Steps
Deploy using this guide
Take all screenshots
Create diagrams
Organize repository
Submit to Al Nafi
Good luck with your submission! π
Credits & References
Created by: Battle-tested through real deployment
Date: February 2026
For: Al Nafi International College Assessment
References:
Tutor Documentation: https://docs.tutor.edly.io/
AWS EKS: https://docs.aws.amazon.com/eks/
Kubernetes: https://kubernetes.io/docs/
Let's Encrypt: https://letsencrypt.org/docs/
Prometheus: https://prometheus.io/docs/
Support:
Tutor Community: https://discuss.openedx.org/
Kubernetes Slack: https://kubernetes.slack.com/
END OF GUIDE



