RiddleBuddy · DevOps Portfolio

System Architecture

AWS EKS · Kubernetes · OpenTelemetry · GitHub Actions
01 — Application Architecture
Diagram: application architecture (components and connections)

Internet
  End users: Browser · HTTPS :443
  External API: DeepSeek AI · HTTPS :443 / REST

VPC · 10.30.0.0/16
  Public subnet · 10.30.1.0/24
    Internet gateway: IGW · igw-riddlebuddy
    NAT gateway: NAT GW · nat-riddlebuddy
    Load balancer: ALB · :80 → :443 · sg-alb :443
  Private subnet · 10.30.2.0/24
    EKS cluster · K8s 1.29
      Ingress: NGINX :8080 · sg-eks
      API pod: FastAPI (Python) · :8000 · 10.30.2.10
      Service pod: Feedback (Java Spring) · :8080 · feedback-svc
      Cache pod: Redis :6379 · redis-svc (ClusterIP)
      Collector: OTel :4317 OTLP · otel-svc
      Kubernetes services: ClusterIP: feedback-svc · redis-svc · otel-svc · NodePort: api-svc · Namespace: riddlebuddy-prod
    EC2 · t3.small: Grafana :3000 · 10.30.2.50 · sg-graf

Outside VPC (AWS managed)
  Amazon Managed Prometheus (AMP)

Connection labels: HTTPS :443 · route /api/* · REST · :6379 · → DeepSeek · remote_write SigV4 · PromQL

Legend:
Public / HTTPS traffic
Internal API routing
Feedback service (REST)
Cache (Redis)
DeepSeek API (outbound)
Observability pipeline
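The address plan in the diagram can be sanity-checked with a few lines of Python. This is a minimal sketch using only the standard library; the CIDR blocks and the two fixed IPs come from the diagram, while the dictionary keys and function names are illustrative and not part of the project.

```python
import ipaddress

# Address plan copied from the architecture diagram.
VPC = ipaddress.ip_network("10.30.0.0/16")
SUBNETS = {
    "public": ipaddress.ip_network("10.30.1.0/24"),
    "private": ipaddress.ip_network("10.30.2.0/24"),
}
# Fixed addresses shown in the diagram; the keys are illustrative labels.
HOSTS = {
    "api-pod": ipaddress.ip_address("10.30.2.10"),
    "grafana-ec2": ipaddress.ip_address("10.30.2.50"),
}


def subnet_of_host(host):
    """Return the name of the subnet that contains host, or None."""
    for name, net in SUBNETS.items():
        if host in net:
            return name
    return None


def check_plan():
    """Return a list of problems in the address plan (empty if consistent)."""
    problems = []
    # Every subnet must sit inside the VPC range.
    for name, net in SUBNETS.items():
        if not net.subnet_of(VPC):
            problems.append(f"{name} subnet {net} is outside VPC {VPC}")
    # The two subnets must not share addresses.
    if SUBNETS["public"].overlaps(SUBNETS["private"]):
        problems.append("public and private subnets overlap")
    # Workloads with fixed IPs are expected in the private subnet.
    for label, host in HOSTS.items():
        if subnet_of_host(host) != "private":
            problems.append(f"{label} ({host}) is not in the private subnet")
    return problems


if __name__ == "__main__":
    print(check_plan() or "address plan is consistent")
```

A check like this only validates the numbers on the diagram; it says nothing about route tables or security group rules, which would need to be verified against the actual AWS configuration.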
02 — Observability Stack
Diagram: observability stack (components and data flow)

EKS cluster
  Application pods
    FastAPI pod: riddlebuddy-api · OTel SDK (Python) · metrics · traces · logs
    Feedback pod: riddlebuddy-fb · OTel SDK (Java) · metrics · traces · logs
    Cluster: ClusterMetrics · kubelet · kube-state · node / pod metrics
  Collection layer
    OTel Collector: opentelemetry-collector · Receives: OTLP gRPC :4317 · OTLP HTTP :4318

AWS managed backend
  Metrics store: Amazon Managed Prometheus (AMP) · remote_write with SigV4 auth · Long-term retention · PromQL query endpoint

EC2 · t3.small
  Grafana dashboard: Grafana :3000 · 10.30.2.50
    Panels: Request Rate (req/s) · Error Rate (5xx errors) · Latency (p99, ms) · Pod CPU / Memory · K8s Cluster Health · Custom App Metrics
    Alerting: Grafana Alerts → webhook / email
    Query languages: PromQL · Loki LogQL

Connection labels: OTLP :4317 · OTLP :4317 · remote_write SigV4 · PromQL

Legend:
OTLP telemetry (metrics · traces · logs)
remote_write to AMP (SigV4)
PromQL queries (Grafana → AMP)
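The three headline dashboard panels (request rate, 5xx error rate, p99 latency) reduce to simple arithmetic over a window of requests. The sketch below computes them from raw samples in plain Python to show what each number means. It is not how the pipeline itself computes them: in this stack the values would come from PromQL queries against AMP, and Prometheus typically estimates quantiles from histogram buckets rather than raw samples. The `Request` type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code
    latency_ms: float  # request duration in milliseconds


def request_rate(requests, window_s):
    """Requests per second over a window of window_s seconds."""
    return len(requests) / window_s


def error_rate(requests):
    """Fraction of requests that returned a 5xx status."""
    if not requests:
        return 0.0
    errors = sum(1 for r in requests if 500 <= r.status <= 599)
    return errors / len(requests)


def p99_latency(requests):
    """99th-percentile latency using the nearest-rank method."""
    if not requests:
        return None
    ordered = sorted(r.latency_ms for r in requests)
    # Integer form of ceil(0.99 * n), which avoids float rounding issues.
    rank = -(-99 * len(ordered) // 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Synthetic window: 200 requests in 60 s, latencies 1..200 ms, four 5xx.
    sample = [Request(500 if i < 4 else 200, float(i + 1)) for i in range(200)]
    print(f"rate  = {request_rate(sample, 60):.2f} req/s")
    print(f"5xx   = {error_rate(sample):.1%}")
    print(f"p99   = {p99_latency(sample)} ms")
```

The nearest-rank definition is one of several percentile conventions; it is used here because it always returns an observed value and needs no interpolation.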
03 — CI / CD Pipeline
Diagram: CI/CD pipeline (stages and flow)

Stages: Trigger → Build → Publish → ArgoCD → EKS Deploy

CI · Continuous Integration (GitHub Actions)
  Trigger
    Developer: git push · main · feature/*
    Actions runner: GitHub Actions · on: push · pull_request
  Build stage
    Docker build: Multi-stage · API and Feedback images · python:3.11-slim · eclipse-temurin
    Tests: pytest / JUnit · unit + integration
    Scan: Trivy · Ruff · vuln scan + lint
  Publish stage
    Push image: Amazon ECR · docker push :git-sha · aws_account.dkr.ecr.region
    Image tag: :latest + :abc1234 · short git-sha for traceability

CD · Continuous Delivery (ArgoCD GitOps)
  EKS cluster · riddlebuddy-prod
    GitOps controller: ArgoCD · Polls ECR for new images · interval: 3 min · App: riddlebuddy · Helm chart · Namespace: riddlebuddy-prod · Self-heal · Auto-sync enabled
    Deployed pods: riddlebuddy-api :8000 · riddlebuddy-feedback :8080 · redis :6379
    ArgoCD sync status: ● Synced · ● Healthy · ⟳ OutOfSync → auto
  On new image: New image on ECR detected → ArgoCD triggers rolling update → Pods replaced with zero downtime
  On failure: ArgoCD detects unhealthy pods → auto-rollback to last known good revision · argocd app rollback riddlebuddy · History preserved in ArgoCD UI

Connection labels: push · polls ECR · new image? · sync

CI pipeline notes
  Triggered on: push to main / PR
  Runs on: ubuntu-latest GitHub-hosted runner
  AWS creds via OIDC (no long-lived keys)
  DEEPSEEK_API_KEY → GitHub Secrets → K8s Secret
  Image tagged with short git-sha for full traceability across CI and ArgoCD history

Legend:
Git trigger / success path
Build pipeline (GitHub Actions)
ECR publish
ArgoCD GitOps sync
Security scan
Failure / auto-rollback
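The tagging scheme in the publish stage (a short git-sha tag plus :latest) can be sketched as a small helper that builds the full ECR image references. This is an illustration only: the account ID, region, and repository name below are placeholders, the 7-character prefix is inferred from the :abc1234 example, and the function names are hypothetical. The registry host follows the standard ECR form of account ID, `dkr.ecr`, region, and `amazonaws.com`.

```python
import re


def short_sha(git_sha, length=7):
    """Return the first `length` characters of a full 40-character SHA-1 commit hash."""
    if not re.fullmatch(r"[0-9a-f]{40}", git_sha):
        raise ValueError(f"not a full lowercase SHA-1 commit hash: {git_sha!r}")
    return git_sha[:length]


def ecr_image_refs(account_id, region, repository, git_sha):
    """Build the two image references pushed for one commit.

    The immutable short-sha tag comes first, followed by the moving :latest tag.
    """
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    base = f"{registry}/{repository}"
    return [f"{base}:{short_sha(git_sha)}", f"{base}:latest"]


if __name__ == "__main__":
    # Placeholder values, not taken from the project.
    commit = "abc1234" + "0" * 33
    for ref in ecr_image_refs("123456789012", "eu-west-1", "riddlebuddy-api", commit):
        print(ref)
```

Deploying by the short-sha tag rather than :latest is what makes a rollback traceable, since each revision in the ArgoCD history then maps to exactly one commit.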