Maison/arti-api/auth-service/pipeline/REPLICA-LOCKING.md
2026-02-10 12:12:11 +01:00

Replica-Based Build Locking System

🎯 Concept

Instead of lock files, the pipeline uses the replica count of a Kubernetes Deployment (buildah-external) as the build lock:

  • Replicas = 0: No build running (lock available)
  • Replicas = 1: Build in progress (lock acquired)

🔧 How It Works

Build Start (Lock Acquisition)

# Acquire the lock: scale 0 -> 1 only if the replica count is still 0.
# A separate "get, then scale" would let two builds both see 0 and both scale up;
# --current-replicas makes kubectl fail the update when the count has changed.
if kubectl scale deployment buildah-external --current-replicas=0 --replicas=1; then
  # Wait for the build pod to become ready
  kubectl rollout status deployment/buildah-external --timeout=120s
else
  # Lock unavailable - build already running
  echo "Build already running" >&2
  exit 1
fi

Build End (Lock Release)

# Always release lock (runs on success OR failure)
kubectl scale deployment buildah-external --replicas=0
kubectl wait --for=delete pod -l app=buildah-external --timeout=60s
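
When the lock is taken from a plain shell script rather than from a pipeline with a dedicated cleanup step, an EXIT trap is one way to approximate "runs on success OR failure". The following is a minimal sketch under that assumption: `with_build_lock` and `release_lock` are hypothetical helper names, not part of the existing pipeline, and the deployment name is taken from the commands above.

```shell
#!/usr/bin/env bash
# Sketch only: with_build_lock and release_lock are hypothetical helper names.
DEPLOYMENT="${DEPLOYMENT:-buildah-external}"

release_lock() {
  # Scale back to 0 so the next build can acquire the lock.
  kubectl scale deployment "$DEPLOYMENT" --replicas=0
}

with_build_lock() {
  # Run in a subshell so the EXIT trap fires when the wrapped command ends,
  # whether it succeeded or failed.
  (
    # Only scale 0 -> 1; kubectl fails if the replica count is not 0.
    kubectl scale deployment "$DEPLOYMENT" --current-replicas=0 --replicas=1 || exit 1
    # From here on the lock is ours, so release it on any exit path.
    trap release_lock EXIT
    "$@"
    # Propagate the wrapped command's exit status.
    exit "$?"
  )
}
```

A build would then be started as `with_build_lock ./run-build.sh` (the script name is a placeholder). The trap is installed only after the lock was acquired, so a build that failed to get the lock never scales the deployment down.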

Benefits

🔒 Atomic Operations

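  • The Kubernetes API server applies each scale request as a single atomic update
  • Reading the replica count and then scaling are two separate requests, so that check alone cannot rule out a race; passing --current-replicas=0 to kubectl scale makes the update conditional on the count still being 0
  • With that precondition, the losing request of two concurrent builds fails with an error instead of silently succeeding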

🚀 Resource Efficiency

  • Zero resource usage when no builds are running
  • Pod only exists during active builds
  • Automatic cleanup of compute resources

🛡️ Robust Error Handling

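  • Scale-down runs after both successful and failed builds, as long as the cleanup step itself gets to run
  • No lock files to go stale; if the runner dies before cleanup, however, the deployment stays at 1 replica until it is scaled down manually
  • If the pod crashes during a build, the Deployment recreates it, but the interrupted build is not resumed
  • The cleanup step also runs when the build step failed because the lock was already held, so it can release another build's lock unless it checks who holds it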

📊 Observable State

  • Easy monitoring: kubectl get deployment buildah-external
  • Clear status: Replica count = build status
  • No hidden state in lock files

🔄 Build Pipeline Flow

graph TD
    A[Build Triggered] --> B{Check Replicas}
    B -->|replicas=0| C[Scale to 1]
    B -->|replicas≠0| D[❌ Build Already Running]
    C --> E[Wait for Pod Ready]
    E --> F[Execute Build]
    F --> G[Scale to 0]
    G --> H[✅ Build Complete]
    D --> I[❌ Exit with Error]

📋 Pipeline Implementation

Build Step

{
  name: "build-via-external-buildah",
  commands: [
    // Acquire lock or fail: scale 0 -> 1 only if replicas is still 0,
    // so two concurrent builds cannot both pass the check
    "if kubectl scale deployment buildah-external --current-replicas=0 --replicas=1; then",
    "  kubectl rollout status deployment/buildah-external --timeout=120s",
    "else",
    "  echo \"Build already running!\"; exit 1",
    "fi",
    
    // ... build commands ...
  ]
}

Cleanup Step

{
  name: "scale-down-buildah",
  commands: [
    "kubectl scale deployment buildah-external --replicas=0",
    "kubectl wait --for=delete pod -l app=buildah-external --timeout=60s"
  ],
  when: {
    status: ["success", "failure"]  // Always runs
  }
}
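
Because the cleanup step runs on failure as well, it also runs when the build step failed to acquire the lock, and in that case it would scale down a deployment that another build is using. One possible guard, sketched here under assumptions, is to record the lock holder in an annotation at acquisition time and to release only when the holder matches. The annotation key `build-lock-holder` and both function names are hypothetical; the holder ID could be any value unique to a build, for example a CI build number.

```shell
#!/usr/bin/env bash
# Sketch only: the annotation key and function names are hypothetical.
DEPLOYMENT="${DEPLOYMENT:-buildah-external}"
HOLDER_KEY="build-lock-holder"

acquire_lock() {
  local holder="$1"
  # Conditional scale: fails unless the replica count is currently 0.
  kubectl scale deployment "$DEPLOYMENT" --current-replicas=0 --replicas=1 || return 1
  # Record who holds the lock so cleanup can tell builds apart.
  kubectl annotate deployment "$DEPLOYMENT" "$HOLDER_KEY=$holder" --overwrite
}

release_lock_if_held() {
  local holder="$1" current
  current=$(kubectl get deployment "$DEPLOYMENT" \
    -o jsonpath="{.metadata.annotations.$HOLDER_KEY}")
  if [ "$current" != "$holder" ]; then
    echo "Lock is held by '${current:-nobody}', not '$holder'; leaving it alone" >&2
    return 0
  fi
  # Remove the holder marker (trailing "-"), then scale down to free the lock.
  kubectl annotate deployment "$DEPLOYMENT" "$HOLDER_KEY-"
  kubectl scale deployment "$DEPLOYMENT" --replicas=0
}
```

The cleanup step would call `release_lock_if_held` with the same ID the build step passed to `acquire_lock`. There is still a short window between the scale and the annotate calls in which the holder is not yet recorded, so this narrows the problem rather than eliminating it.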

🧪 Testing

Use the test script to verify the locking mechanism:

pipeline/test-replica-locking.sh

This tests:

  • Lock acquisition when available
  • Lock blocking when unavailable
  • Proper lock release
  • System reset for next build
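
The script itself is not reproduced here. As a rough illustration of how these four checks could be exercised, the sketch below runs them in order against a deployment that starts at 0 replicas. It is not the content of test-replica-locking.sh; the function names are hypothetical, and since it changes the replica count it is meant for a non-production cluster.

```shell
#!/usr/bin/env bash
# Sketch only: not the contents of test-replica-locking.sh.
DEPLOYMENT="${DEPLOYMENT:-buildah-external}"

replicas() {
  # Desired replica count, i.e. the lock state.
  kubectl get deployment "$DEPLOYMENT" -o jsonpath='{.spec.replicas}'
}

try_acquire() {
  # Conditional scale: succeeds only if the replica count is currently 0.
  kubectl scale deployment "$DEPLOYMENT" --current-replicas=0 --replicas=1 >/dev/null 2>&1
}

run_lock_checks() {
  # 1. Lock acquisition when available
  try_acquire || { echo "FAIL: could not acquire a free lock"; return 1; }
  # 2. Lock blocking when unavailable
  if try_acquire; then echo "FAIL: second acquire succeeded"; return 1; fi
  # 3. Proper lock release
  kubectl scale deployment "$DEPLOYMENT" --replicas=0 >/dev/null
  # 4. System reset for next build
  [ "$(replicas)" = "0" ] || { echo "FAIL: replicas not reset to 0"; return 1; }
  echo "PASS"
}
```

Calling `run_lock_checks` prints PASS when all four checks hold and a FAIL line for the first check that does not.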

🔍 Monitoring

Check Build Status

# Quick status check
kubectl get deployment buildah-external -n apps--droneio--prd

# Detailed status
kubectl describe deployment buildah-external -n apps--droneio--prd

Build Status Meanings

  • READY 0/0: No build running, system idle
  • READY 0/1: Build starting, pod creating
  • READY 1/1: Build active, pod running
  • READY 1/0: Build ending, pod terminating
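
For scripted monitoring, the table above can be turned into a small lookup. This is a sketch rather than part of the existing tooling: `describe_build_status` and `build_status` are hypothetical names, and the namespace is the one used in the status commands above.

```shell
#!/usr/bin/env bash
# Sketch only: describe_build_status and build_status are hypothetical names.

describe_build_status() {
  # $1 = ready replicas, $2 = desired replicas (the two halves of READY).
  # Empty arguments are treated as 0.
  local ready="${1:-0}" desired="${2:-0}"
  case "$ready/$desired" in
    0/0) echo "idle" ;;
    0/1) echo "starting" ;;
    1/1) echo "active" ;;
    1/0) echo "ending" ;;
    *)   echo "unexpected ($ready/$desired)" ;;
  esac
}

build_status() {
  local ready desired
  # .status.readyReplicas is omitted from the object when it is 0,
  # so an empty result is mapped to 0 by describe_build_status.
  ready=$(kubectl get deployment buildah-external -n apps--droneio--prd \
    -o jsonpath='{.status.readyReplicas}')
  desired=$(kubectl get deployment buildah-external -n apps--droneio--prd \
    -o jsonpath='{.spec.replicas}')
  describe_build_status "$ready" "$desired"
}
```

Any combination outside the four documented states, such as more than one replica, is reported as unexpected so that a manually mis-scaled deployment stands out.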

🎯 Migration Notes

This approach replaces:

  • Lock file creation/deletion
  • Lock timeout mechanisms
  • Lock cleanup scripts
  • Manual pod discovery

With Kubernetes-native:

  • Atomic scaling operations
  • Built-in conflict resolution
  • Automatic resource management
  • Observable state

The result has fewer moving parts: no lock files, timeouts, or cleanup scripts to maintain, and no build pod consuming resources while idle. 🚀