RiddleBuddy · DevOps Portfolio
System Architecture
AWS EKS · Kubernetes · OpenTelemetry · GitHub Actions
01 —
Application Architecture
INTERNET
END USERS
Browser
HTTPS :443
EXTERNAL API
DeepSeek AI
HTTPS :443 / REST
VPC · 10.30.0.0/16
PUBLIC SUBNET · 10.30.1.0/24
INTERNET GW
IGW
igw-riddlebuddy
NAT GATEWAY
NAT GW
nat-riddlebuddy
LOAD BALANCER
ALB
:80 → :443 · sg-alb
sg-alb :443
PRIVATE SUBNET · 10.30.2.0/24
EKS CLUSTER · K8s 1.29
INGRESS
NGINX
:8080 · sg-eks
API POD
FastAPI
Python · :8000
10.30.2.10
SERVICE POD
Feedback
Java Spring · :8080
feedback-svc
CACHE POD
Redis
:6379
redis-svc (ClusterIP)
COLLECTOR
OTel
:4317 OTLP · otel-svc
KUBERNETES SERVICES
ClusterIP: feedback-svc · redis-svc · otel-svc · NodePort: api-svc · Namespace: riddlebuddy-prod
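As a rough illustration of how pods could address the ClusterIP services listed above, Kubernetes exposes each Service under a DNS name of the form `<service>.<namespace>.svc.<cluster-domain>`. The sketch below assumes the default `cluster.local` cluster domain and takes the ports from the diagram labels; the helper names `service_fqdn` and `service_address` are hypothetical.

```python
NAMESPACE = "riddlebuddy-prod"  # namespace shown on the diagram

# Service name -> port, copied from the diagram labels.
CLUSTER_IP_SERVICES = {
    "feedback-svc": 8080,
    "redis-svc": 6379,
    "otel-svc": 4317,
}


def service_fqdn(name: str, namespace: str = NAMESPACE,
                 cluster_domain: str = "cluster.local") -> str:
    """Return the in-cluster DNS name of a Service.

    Assumes the default cluster domain; a cluster may be configured differently.
    """
    return f"{name}.{namespace}.svc.{cluster_domain}"


def service_address(name: str) -> str:
    """Return host:port for one of the ClusterIP services above."""
    return f"{service_fqdn(name)}:{CLUSTER_IP_SERVICES[name]}"


for svc in CLUSTER_IP_SERVICES:
    print(service_address(svc))
```

Pods in the same namespace can usually use the short name (for example `redis-svc:6379`); the fully qualified form is only needed across namespaces.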
EC2 · t3.small
Grafana
:3000 · 10.30.2.50
sg-graf
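The address plan above can be sanity-checked with a few lines of standard-library Python. This is only a sketch: the CIDR blocks and host addresses come from the diagram, while the host labels and function names are hypothetical.

```python
import ipaddress

VPC = ipaddress.ip_network("10.30.0.0/16")
SUBNETS = {
    "public": ipaddress.ip_network("10.30.1.0/24"),
    "private": ipaddress.ip_network("10.30.2.0/24"),
}
# Addresses printed on the diagram; the keys are illustrative labels.
HOSTS = {"fastapi-pod": "10.30.2.10", "grafana-ec2": "10.30.2.50"}


def subnets_fit_vpc() -> bool:
    """True if every subnet lies inside the VPC and no two subnets overlap."""
    nets = list(SUBNETS.values())
    inside = all(net.subnet_of(VPC) for net in nets)
    disjoint = not any(
        a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:]
    )
    return inside and disjoint


def subnet_for(ip: str) -> str:
    """Return the name of the subnet containing `ip`, or 'outside'."""
    addr = ipaddress.ip_address(ip)
    for name, net in SUBNETS.items():
        if addr in net:
            return name
    return "outside"


print(subnets_fit_vpc())
for label, ip in HOSTS.items():
    print(label, subnet_for(ip))
```

Both listed hosts resolve to the private subnet, which matches the diagram's intent that only the ALB, IGW and NAT gateway sit in the public subnet.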
— OUTSIDE VPC —
AWS MANAGED
Amazon Managed Prometheus (AMP)
HTTPS :443
route /api/*
REST
:6379
→ DeepSeek
remote_write SigV4
PromQL
Public / HTTPS traffic
Internal API routing
Feedback service (REST)
Cache (Redis)
DeepSeek API (outbound)
Observability pipeline
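The diagram shows the FastAPI pod talking to both Redis and the DeepSeek API but does not say how the cache is used. One plausible reading is a cache-aside pattern, sketched below under that assumption. A plain dict stands in for Redis and a caller-supplied function stands in for the DeepSeek call; the key scheme and the name `get_riddle` are hypothetical.

```python
from typing import Callable, Dict


def get_riddle(topic: str,
               cache: Dict[str, str],
               fetch_from_llm: Callable[[str], str]) -> str:
    """Return a riddle for `topic`, consulting the cache before the external API."""
    key = f"riddle:{topic}"         # hypothetical key scheme
    cached = cache.get(key)
    if cached is not None:
        return cached               # cache hit: no outbound request
    riddle = fetch_from_llm(topic)  # cache miss: call the external API
    cache[key] = riddle             # store for subsequent requests
    return riddle


# Usage with stand-ins for Redis and DeepSeek.
calls = []


def fake_llm(topic: str) -> str:
    calls.append(topic)
    return f"A riddle about {topic}"


store: Dict[str, str] = {}
first = get_riddle("clocks", store, fake_llm)
second = get_riddle("clocks", store, fake_llm)
print(first == second, len(calls))
```

With a real Redis client the dict lookups would become `GET`/`SET` calls, typically with an expiry so stale riddles age out.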
02 —
Observability Stack
EKS CLUSTER
APPLICATION PODS
FASTAPI POD
riddlebuddy-api
OTel SDK (Python)
metrics · traces · logs
FEEDBACK POD
riddlebuddy-fb
OTel SDK (Java)
metrics · traces · logs
CLUSTER
ClusterMetrics
kubelet · kube-state
node / pod metrics
COLLECTION LAYER
OTEL COLLECTOR
opentelemetry-collector
Receives: OTLP gRPC :4317 · OTLP HTTP :4318
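The collector endpoints above could be handed to the application pods through the standard OpenTelemetry SDK environment variables. The sketch below builds that environment as a dict. It assumes the collector is reachable as `otel-svc` over plaintext inside the cluster and that port 4317 carries OTLP over gRPC while 4318 carries OTLP over HTTP; the function name `otel_env` is hypothetical.

```python
from typing import Dict

COLLECTOR_HOST = "otel-svc"  # Service name from the diagram

# OTLP transport -> collector port.
OTLP_PORTS = {"grpc": 4317, "http/protobuf": 4318}


def otel_env(service_name: str, protocol: str = "grpc") -> Dict[str, str]:
    """Build OpenTelemetry SDK environment variables for one pod."""
    if protocol not in OTLP_PORTS:
        raise ValueError(f"unsupported OTLP protocol: {protocol}")
    port = OTLP_PORTS[protocol]
    return {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_EXPORTER_OTLP_PROTOCOL": protocol,
        # Plain http:// is an assumption for in-cluster traffic.
        "OTEL_EXPORTER_OTLP_ENDPOINT": f"http://{COLLECTOR_HOST}:{port}",
    }


for name, value in otel_env("riddlebuddy-api").items():
    print(f"{name}={value}")
```

Setting these variables in the pod spec keeps the endpoint out of application code, so the Python and Java services can share the same configuration shape.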
AWS MANAGED BACKEND
METRICS STORE
Amazon Managed Prometheus (AMP)
remote_write SigV4 auth · Long-term retention · PromQL query endpoint
EC2 · t3.small
GRAFANA DASHBOARD
Grafana
:3000 · 10.30.2.50
Request Rate
req/s ↑
Error Rate
5xx errors
Latency
p99 ms
Pod CPU / Memory · K8s Cluster Health · Custom App Metrics
Alerting via Grafana Alerts → webhook / email
Query lang: PromQL · Loki LogQL
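The three headline panels (request rate, 5xx error rate, p99 latency) map naturally onto PromQL expressions. The sketch below generates one query per panel. The metric names `http_requests_total` and `http_request_duration_seconds_bucket`, and the `service` and `status` labels, are assumptions for illustration; the real names depend on how the OTel SDKs and collector export metrics.

```python
from typing import Dict


def panel_queries(service: str, window: str = "5m") -> Dict[str, str]:
    """Return illustrative PromQL for the request-rate, error-rate and p99 panels."""
    sel = f'service="{service}"'  # hypothetical label selector
    return {
        # Requests per second across all pods of the service.
        "request_rate": f"sum(rate(http_requests_total{{{sel}}}[{window}]))",
        # Only responses with a 5xx status code.
        "error_rate": (
            f'sum(rate(http_requests_total{{{sel},status=~"5.."}}[{window}]))'
        ),
        # 99th percentile from histogram buckets, converted from seconds to ms.
        "latency_p99_ms": (
            "histogram_quantile(0.99, sum by (le) "
            f"(rate(http_request_duration_seconds_bucket{{{sel}}}[{window}]))) * 1000"
        ),
    }


for panel, query in panel_queries("riddlebuddy-api").items():
    print(f"{panel}: {query}")
```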
OTLP :4317
remote_write SigV4
PromQL
OTLP telemetry (metrics · traces · logs)
remote_write to AMP (SigV4)
PromQL queries from Grafana to AMP
03 —
CI / CD Pipeline
CI — CONTINUOUS INTEGRATION (GitHub Actions)
CD — CONTINUOUS DELIVERY (ArgoCD GitOps)
TRIGGER
BUILD
PUBLISH
ARGOCD
EKS DEPLOY
DEVELOPER
git push
main · feature/*
ACTIONS RUNNER
GitHub Actions
on: push · pull_request
BUILD STAGE
DOCKER BUILD
Multi-stage
API · Feedback images
python:3.11-slim · eclipse-temurin
TESTS
pytest / JUnit
unit + integration
SCAN
Trivy · Ruff
vuln scan + lint
PUBLISH STAGE
PUSH IMAGE
Amazon ECR
docker push :git-sha
aws_account.dkr.ecr.region
IMAGE TAG
:latest + :abc1234
short git-sha for traceability
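The tagging scheme above can be sketched as two small helpers: one that shortens a commit SHA to seven characters and one that assembles the full ECR image reference. The account ID, region and repository name used in the example are placeholders, not values from this project.

```python
def short_sha(full_sha: str, length: int = 7) -> str:
    """Return the abbreviated commit SHA used as the image tag."""
    if len(full_sha) < length:
        raise ValueError("commit SHA is shorter than the requested tag length")
    return full_sha[:length]


def ecr_image(account_id: str, region: str, repo: str, tag: str) -> str:
    """Build an ECR image reference in the private-registry hostname format."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"


# Placeholder values for illustration only.
commit = "abc1234def5678901234567890abcdef12345678"
tag = short_sha(commit)
print(ecr_image("123456789012", "us-east-1", "riddlebuddy-api", tag))
```

Because `:latest` moves with every push, the SHA tag is the one worth referencing in the Helm values: it pins each deployment to a specific commit.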
EKS CLUSTER · riddlebuddy-prod
GITOPS CONTROLLER
ArgoCD
Polls ECR for new images
interval: 3 min
App: riddlebuddy · Helm chart
Namespace: riddlebuddy-prod
Self-heal · Auto-sync enabled
DEPLOYED PODS
riddlebuddy-api :8000
riddlebuddy-feedback :8080
redis :6379
ARGOCD SYNC STATUS
● Synced
● Healthy
⟳ OutOfSync → auto
New image on ECR detected → ArgoCD triggers rolling update → Pods replaced with zero downtime
ON FAILURE
ArgoCD detects unhealthy pods → auto-rollback to last known good revision
argocd app rollback riddlebuddy · History preserved in ArgoCD UI
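The sync and rollback behaviour described above can be pictured with a toy model. This is not Argo CD's API or its internal logic, only a sketch of the decision it describes: compare the live and desired image tags, and on failure pick the most recent healthy revision from history. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Revision:
    image_tag: str
    healthy: bool


def sync_status(live_tag: str, desired_tag: str) -> str:
    """Report 'Synced' when the live tag matches the desired one, else 'OutOfSync'."""
    return "Synced" if live_tag == desired_tag else "OutOfSync"


def rollback_target(history: List[Revision]) -> Optional[Revision]:
    """Return the most recent healthy revision before the latest one, if any."""
    for rev in reversed(history[:-1]):
        if rev.healthy:
            return rev
    return None


history = [
    Revision("aaa1111", healthy=True),
    Revision("bbb2222", healthy=True),
    Revision("ccc3333", healthy=False),  # latest rollout failed its health check
]
print(sync_status("bbb2222", "ccc3333"))
target = rollback_target(history)
print(target.image_tag if target else "no healthy revision")
```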
push
polls ECR
new image?
sync
CI PIPELINE NOTES
Triggered on: push to main / PR · Runs on: ubuntu-latest GitHub-hosted runner
AWS creds via OIDC (no long-lived keys) · DEEPSEEK_API_KEY → GitHub Secrets → K8s Secret
Image tagged with short git-sha for full traceability across CI and ArgoCD history
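For the last hop of the secret path noted above (GitHub Secrets to K8s Secret), a Kubernetes Secret manifest stores its `data` values base64-encoded. The sketch below builds such a manifest as a Python dict. The Secret name `riddlebuddy-secrets` and the helper name are hypothetical, and the key value is a dummy.

```python
import base64
import json
from typing import Dict


def secret_manifest(name: str, namespace: str,
                    values: Dict[str, str]) -> dict:
    """Build an Opaque Secret manifest with base64-encoded data values."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": encoded,
    }


manifest = secret_manifest(
    "riddlebuddy-secrets",            # hypothetical Secret name
    "riddlebuddy-prod",
    {"DEEPSEEK_API_KEY": "dummy-value"},  # never commit a real key
)
print(json.dumps(manifest, indent=2))
```

Base64 is an encoding, not encryption, so the manifest itself still needs to stay out of the Git repository or be protected by a sealed or external secrets mechanism.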
Git trigger / success path
Build pipeline (GitHub Actions)
ECR publish
ArgoCD GitOps sync
Security scan
Failure / auto-rollback