Replica-Based Build Locking System
🎯 Concept
Instead of using lock files, we use Kubernetes deployment replica scaling as an atomic locking mechanism:
- Replicas = 0: No build running (lock available)
- Replicas = 1: Build in progress (lock acquired)
🔧 How It Works
Build Start (Lock Acquisition)
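# Acquire the lock: scale 0 -> 1 only if the deployment is still at 0.
# --current-replicas makes the scale conditional, so two builds that start
# at the same moment cannot both pass a separate "is it 0?" check.
if kubectl scale deployment buildah-external --current-replicas=0 --replicas=1; then
  # Lock acquired - wait for the build pod to become ready.
  # rollout status also covers the moment before the pod object exists,
  # where "kubectl wait ... pod -l ..." can fail with no matching resources.
  kubectl rollout status deployment/buildah-external --timeout=120s
else
  # Lock unavailable - build already running
  echo "Build already running" >&2
  exit 1
fi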
Build End (Lock Release)
# Always release lock (runs on success OR failure)
kubectl scale deployment buildah-external --replicas=0
kubectl wait --for=delete pod -l app=buildah-external --timeout=60s
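When acquisition, build, and release all run from one script rather than from separate pipeline steps, the release can be tied to the script's exit. The following is a minimal sketch under that assumption; `with_build_lock`, `release_build_lock`, the `DEPLOYMENT` variable, and the exit code 75 are hypothetical names and choices, not part of the existing pipeline.

```shell
# Hypothetical helpers; only the deployment name and label come from this document.
DEPLOYMENT="${DEPLOYMENT:-buildah-external}"

release_build_lock() {
  # Scale back to 0, then wait for the build pod to disappear.
  kubectl scale deployment "$DEPLOYMENT" --replicas=0
  kubectl wait --for=delete pod -l "app=$DEPLOYMENT" --timeout=60s
}

with_build_lock() {  # usage: with_build_lock COMMAND [ARGS...]
  # Conditional scale: fails if replicas is not 0, i.e. the lock is taken.
  # 75 is an arbitrary code chosen here to mean "lock busy".
  kubectl scale deployment "$DEPLOYMENT" --current-replicas=0 --replicas=1 || return 75
  (
    # From here on the lock is ours, so release it however the subshell ends.
    trap release_build_lock EXIT
    kubectl rollout status "deployment/$DEPLOYMENT" --timeout=120s && "$@"
    exit $?
  )
}
```

A call such as `with_build_lock ./run-build.sh` (the script name is a placeholder) would return the build's own exit status, or 75 if another build holds the lock. The trap does not fire if the shell is killed with SIGKILL, so a separate cleanup step remains useful.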
✅ Benefits
🔒 Atomic Operations
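- A conditional scale (kubectl scale --current-replicas=0 --replicas=1) only succeeds if the deployment is still at 0, so two concurrent builds cannot both acquire the lock
- A plain read of the replica count followed by an unconditional scale does not give this guarantee: both builds can read 0 before either one scales
- Conflicting updates are rejected by the Kubernetes API (optimistic concurrency on the resource version)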
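A compare-and-set on the replica count can also be expressed as one API request, using a JSON Patch whose `test` operation must pass before `replace` is applied. This is a sketch of that alternative, not what the pipeline currently does; `replica_cas_patch` and `try_acquire_lock` are hypothetical helper names.

```shell
DEPLOYMENT="${DEPLOYMENT:-buildah-external}"

# Print a JSON Patch that changes spec.replicas from $1 to $2,
# and is rejected as a whole if the current value is not $1.
replica_cas_patch() {
  printf '[{"op":"test","path":"/spec/replicas","value":%d},{"op":"replace","path":"/spec/replicas","value":%d}]' "$1" "$2"
}

# Returns non-zero if the patch is rejected, which includes the
# case where another build already moved replicas away from 0.
try_acquire_lock() {
  kubectl patch deployment "$DEPLOYMENT" --type=json -p "$(replica_cas_patch 0 1)"
}
```

Because the test and the write travel in the same request, there is no window between checking the count and changing it.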
🚀 Resource Efficiency
- Zero resource usage when no builds are running
- Pod only exists during active builds
- Automatic cleanup of compute resources
🛡️ Robust Error Handling
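- Scale-down runs whether the build step succeeds or fails
- No lock files to go stale; if the cleanup step never runs (for example, the runner is lost), replicas stay at 1 until the deployment is scaled back to 0 by hand
- If the pod crashes during a build, the Deployment controller recreates it; the interrupted build itself is not resumed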
📊 Observable State
- Easy monitoring: kubectl get deployment buildah-external
- Clear status: Replica count = build status
- No hidden state in lock files
🔄 Build Pipeline Flow
graph TD
A[Build Triggered] --> B{Check Replicas}
B -->|replicas=0| C[Scale to 1]
B -->|replicas≠0| D[❌ Build Already Running]
C --> E[Wait for Pod Ready]
E --> F[Execute Build]
F --> G[Scale to 0]
G --> H[✅ Build Complete]
D --> I[❌ Exit with Error]
📋 Pipeline Implementation
Build Step
{
name: "build-via-external-buildah",
commands: [
// Acquire lock or fail: the scale only succeeds if replicas is currently 0,
// so the check and the update cannot be split by a concurrent build
"if kubectl scale deployment buildah-external --current-replicas=0 --replicas=1; then",
// rollout status waits for the pod even if it has not been created yet
" kubectl rollout status deployment/buildah-external --timeout=120s",
"else",
" echo \"Build already running!\"; exit 1",
"fi",
// ... build commands ...
]
}
Cleanup Step
{
name: "scale-down-buildah",
commands: [
"kubectl scale deployment buildah-external --replicas=0",
"kubectl wait --for=delete pod -l app=buildah-external --timeout=60s"
],
when: {
status: ["success", "failure"] // Always runs
}
}
🧪 Testing
Use the test script to verify the locking mechanism:
pipeline/test-replica-locking.sh
This tests:
- ✅ Lock acquisition when available
- ✅ Lock blocking when unavailable
- ✅ Proper lock release
- ✅ System reset for next build
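The four checks above can be exercised without a cluster by swapping in a fake `kubectl`. The sketch below is not the contents of pipeline/test-replica-locking.sh; it is an illustrative dry run in which the "cluster" is a temp file holding the replica count, and `acquire`, `release`, `blocked`, and `check` are hypothetical helpers. It covers sequential behaviour only, not real concurrency.

```shell
# Fake kubectl for a dry run: the "cluster" is a file holding the replica count.
STATE_FILE="${STATE_FILE:-$(mktemp)}"
echo 0 > "$STATE_FILE"

kubectl() {
  # Understands only: kubectl scale deployment NAME [--current-replicas=N] --replicas=M
  want=""
  new=""
  for arg in "$@"; do
    case "$arg" in
      --current-replicas=*) want="${arg#--current-replicas=}" ;;
      --replicas=*)         new="${arg#--replicas=}" ;;
    esac
  done
  have=$(cat "$STATE_FILE")
  if [ -n "$want" ] && [ "$want" != "$have" ]; then
    echo "fake kubectl: expected $want replicas, found $have" >&2
    return 1
  fi
  echo "$new" > "$STATE_FILE"
}

acquire() { kubectl scale deployment buildah-external --current-replicas=0 --replicas=1; }
release() { kubectl scale deployment buildah-external --replicas=0; }
blocked() { ! acquire 2>/dev/null; }

check() {  # usage: check DESCRIPTION COMMAND...
  desc=$1; shift
  if "$@"; then echo "ok   $desc"; else echo "FAIL $desc"; FAILED=1; fi
}

FAILED=0
check "lock acquired when available" acquire
check "second acquire is blocked"    blocked
check "lock released"                release
check "lock can be re-acquired"      acquire
release  # leave the fake cluster at 0 for the next run
```

Removing the fake `kubectl` function would point the same checks at a real deployment, which should only be done against a cluster where interrupting builds is acceptable.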
🔍 Monitoring
Check Build Status
# Quick status check
kubectl get deployment buildah-external -n apps--droneio--prd
# Detailed status
kubectl describe deployment buildah-external -n apps--droneio--prd
Build Status Meanings
- READY 0/0: No build running, system idle
- READY 0/1: Build starting, pod creating
- READY 1/1: Build active, pod running
- READY 1/0: Build ending, pod terminating
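The mapping above can be turned into a one-line status report. This is a small sketch, assuming the namespace shown in the commands above; `describe_build_state`, `build_status`, and the `NAMESPACE` variable are hypothetical names.

```shell
# Map READY (ready/desired) to the meanings listed above.
describe_build_state() {  # usage: describe_build_state READY DESIRED
  case "$1/$2" in
    0/0) echo "idle: no build running" ;;
    0/1) echo "starting: pod is being created" ;;
    1/1) echo "active: build pod running" ;;
    1/0) echo "ending: pod terminating" ;;
    *)   echo "unexpected: READY $1/$2" ;;
  esac
}

build_status() {
  ns="${NAMESPACE:-apps--droneio--prd}"
  # One request, so both numbers come from the same snapshot of the deployment.
  counts=$(kubectl get deployment buildah-external -n "$ns" \
    -o jsonpath='{.status.readyReplicas}/{.spec.replicas}') || return 1
  ready="${counts%/*}"
  desired="${counts#*/}"
  # readyReplicas is left out of the status when it is zero, so default to 0.
  describe_build_state "${ready:-0}" "${desired:-0}"
}
```

Running `build_status` prints one of the four states, or an "unexpected" line if the deployment has been scaled beyond a single replica.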
🎯 Migration Notes
This approach replaces:
- ❌ Lock file creation/deletion
- ❌ Lock timeout mechanisms
- ❌ Lock cleanup scripts
- ❌ Manual pod discovery
With Kubernetes-native:
- ✅ Atomic scaling operations
- ✅ Built-in conflict resolution
- ✅ Automatic resource management
- ✅ Observable state
The system is now simpler, more reliable, and more efficient! 🚀