# Replica-Based Build Locking System

## 🎯 **Concept**

Instead of using lock files, we use Kubernetes deployment **replica scaling** as the locking mechanism:

- **Replicas = 0**: No build running (lock available)
- **Replicas = 1**: Build in progress (lock acquired)

## 🔧 **How It Works**

### **Build Start (Lock Acquisition)**

Reading the replica count and then scaling in a second command would leave a window in which two builds both see `0`. Passing `--current-replicas=0` makes the scale conditional on the current size, so the check and the update happen as one operation.

```bash
# Acquire the lock: scale 0 -> 1, but only if the deployment is still at 0.
# --current-replicas is a precondition checked as part of the scale itself.
if kubectl scale deployment buildah-external --current-replicas=0 --replicas=1; then
  kubectl wait --for=condition=ready pod -l app=buildah-external --timeout=120s
else
  # Lock unavailable - build already running
  exit 1
fi
```

### **Build End (Lock Release)**

```bash
# Always release lock (runs on success OR failure)
kubectl scale deployment buildah-external --replicas=0
kubectl wait --for=delete pod -l app=buildah-external --timeout=60s
```

## ✅ **Benefits**

### **🔒 Atomic Operations**
- **Conditional scale**: with `--current-replicas=0`, `kubectl scale` succeeds only if the deployment is still at 0 replicas
- **No check-then-act race**: two builds that start together cannot both acquire the lock
- **Built-in conflict resolution** via the Kubernetes API

### **🚀 Resource Efficiency**
- **No pod resources consumed** when no builds are running
- **Pod only exists** during active builds
- **Automatic cleanup** of compute resources

### **🛡️ Robust Error Handling**
- **Scale-down always runs** (success or failure)
- **No stale lock files** - the lock state lives in the Deployment spec
- **Self-healing** - the Deployment recreates the pod if it crashes during a build

### **📊 Observable State**
- **Easy monitoring**: `kubectl get deployment buildah-external`
- **Clear status**: Replica count = build status
- **No hidden state** in lock files

## 🔄 **Build Pipeline Flow**

```mermaid
graph TD
    A[Build Triggered] --> B{Check Replicas}
    B -->|replicas=0| C[Scale to 1]
    B -->|replicas≠0| D[❌ Build Already Running]
    C --> E[Wait for Pod Ready]
    E --> F[Execute Build]
    F --> G[Scale to 0]
    G --> H[✅ Build Complete]
    D --> I[❌ Exit with Error]
```

## 📋 **Pipeline Implementation**

### **Build Step**

```jsonnet
{
  name: "build-via-external-buildah",
  commands: [
    // Acquire lock or fail: the scale only succeeds if replicas is currently 0
    "if kubectl scale deployment buildah-external --current-replicas=0 --replicas=1; then",
    "  kubectl wait --for=condition=ready pod -l app=buildah-external --timeout=120s",
    "else",
    "  echo \"Build already running!\"; exit 1",
    "fi",
    // ... build commands ...
  ]
}
```

### **Cleanup Step**

```jsonnet
{
  name: "scale-down-buildah",
  commands: [
    "kubectl scale deployment buildah-external --replicas=0",
    "kubectl wait --for=delete pod -l app=buildah-external --timeout=60s"
  ],
  when: {
    status: ["success", "failure"]  // Always runs
  }
}
```

Because this step also runs when the build step fails to acquire the lock, a rejected build would scale down the deployment that the running build still holds. If concurrent triggers are expected, release the lock only from the build that acquired it.

## 🧪 **Testing**

Use the test script to verify the locking mechanism:

```bash
pipeline/test-replica-locking.sh
```

This tests:
- ✅ Lock acquisition when available
- ✅ Lock blocking when unavailable
- ✅ Proper lock release
- ✅ System reset for next build

## 🔍 **Monitoring**

### **Check Build Status**

```bash
# Quick status check
kubectl get deployment buildah-external -n apps--droneio--prd

# Detailed status
kubectl describe deployment buildah-external -n apps--droneio--prd
```

Note that the lock commands above do not pass `-n`, so they act on the default namespace of the current kubectl context.

### **Build Status Meanings**
- **READY 0/0**: No build running, system idle
- **READY 0/1**: Build starting, pod creating
- **READY 1/1**: Build active, pod running
- **READY 1/0**: Build ending, pod terminating

## 🎯 **Migration Notes**

This approach **replaces**:
- ❌ Lock file creation/deletion
- ❌ Lock timeout mechanisms
- ❌ Lock cleanup scripts
- ❌ Manual pod discovery

With **Kubernetes-native**:
- ✅ Conditional (compare-and-set) scaling operations
- ✅ Built-in conflict resolution
- ✅ Automatic resource management
- ✅ Observable state

The system is now **simpler, more reliable, and more efficient**! 🚀
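
## 🧰 **Appendix: Lock Helper Sketch**

For scripts that run outside the pipeline, acquisition and release could be wrapped in two small functions. This is a minimal sketch under stated assumptions, not the pipeline's actual code: the function names `acquire_build_lock` and `release_build_lock` and the `DEPLOYMENT` / `SELECTOR` variables are hypothetical. It relies on `kubectl scale --current-replicas=0`, which makes the scale conditional on the current size, and it swaps in `kubectl rollout status` for the readiness wait, since `kubectl wait` on a label selector can fail if the pod object has not been created yet.

```bash
# Hypothetical helper names and variables; defaults taken from the commands in this document.
DEPLOYMENT="${DEPLOYMENT:-buildah-external}"
SELECTOR="${SELECTOR:-app=buildah-external}"

# Release the lock: scale back to 0 and wait for the build pod to go away.
release_build_lock() {
  kubectl scale deployment "$DEPLOYMENT" --replicas=0
  kubectl wait --for=delete pod -l "$SELECTOR" --timeout=60s
}

# Try to take the lock. Returns non-zero if another build holds it
# or if the build pod never becomes ready.
acquire_build_lock() {
  # Conditional scale: only succeeds if the deployment is currently at 0 replicas.
  if ! kubectl scale deployment "$DEPLOYMENT" --current-replicas=0 --replicas=1; then
    echo "Build already running (or scale failed)" >&2
    return 1
  fi
  # We hold the lock from here on; give it back if the rollout does not complete.
  if ! kubectl rollout status "deployment/$DEPLOYMENT" --timeout=120s; then
    release_build_lock
    return 1
  fi
}

# Usage sketch:
#   acquire_build_lock || exit 1
#   trap release_build_lock EXIT   # installed only after the lock is held
#   ... build commands ...
```

Installing the `trap` only after a successful acquisition means a build that was rejected never scales down a deployment it does not hold.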