Maison/arti-api/auth-service/pipeline/REPLICA-LOCKING.md
2026-02-10 12:12:11 +01:00

# Replica-Based Build Locking System
## 🎯 **Concept**
Instead of using lock files, we use the **replica count** of a Kubernetes deployment as the build lock. The count lives in a single API object, so every pipeline sees the same state:
- **Replicas = 0**: No build running (lock available)
- **Replicas = 1**: Build in progress (lock acquired)
## 🔧 **How It Works**
### **Build Start (Lock Acquisition)**
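```bash
# Acquire the lock: scale 0 -> 1 only if the deployment is still at 0 replicas.
# --current-replicas makes the scale conditional, so two builds starting at the
# same moment cannot both succeed (a separate "get, then scale" could race).
if kubectl scale deployment buildah-external --current-replicas=0 --replicas=1; then
    # Lock acquired - wait for the build pod. rollout status also covers the
    # moment right after scaling, before the pod object exists.
    kubectl rollout status deployment/buildah-external --timeout=120s
else
    # Lock unavailable - build already running
    echo "Build already running!" >&2
    exit 1
fi
```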
### **Build End (Lock Release)**
```bash
# Always release lock (runs on success OR failure)
kubectl scale deployment buildah-external --replicas=0
kubectl wait --for=delete pod -l app=buildah-external --timeout=60s
```
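In a standalone script, the "runs on success OR failure" behaviour can be approximated with an `EXIT` trap. The following is a minimal sketch, not the pipeline's actual code: `release_build_lock` and `run_build_then_release` are hypothetical helper names, and the sketch assumes the lock has already been acquired by the caller.

```shell
# Deployment name taken from this document; override via the environment if needed.
DEPLOYMENT="${DEPLOYMENT:-buildah-external}"

# Release the lock: scale to 0 and wait for the build pod to disappear.
release_build_lock() {
    kubectl scale deployment "$DEPLOYMENT" --replicas=0
    kubectl wait --for=delete pod -l "app=$DEPLOYMENT" --timeout=60s
}

# Hypothetical helper: run the given command, then release the lock
# whether the command succeeded or failed.
run_build_then_release() {
    (
        # The EXIT trap fires when this subshell ends, whatever the outcome
        trap release_build_lock EXIT
        "$@"
    )
}
```

A call such as `run_build_then_release ./build.sh` (where `./build.sh` stands in for the real build commands) returns the build's exit status after the scale-down has run. It does not help if the shell itself is killed with `SIGKILL`, since no trap runs in that case.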
## ✅ **Benefits**
### **🔒 Atomic Operations**
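- **Conditional scaling** (`kubectl scale --current-replicas=0 --replicas=1`) is applied only if the deployment is still at 0 replicas
- **No race between concurrent builds**, provided the lock is acquired with that single conditional call rather than a separate read followed by a write
- **Built-in conflict resolution** via the Kubernetes API: the losing request is rejected instead of silently overwriting the winner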
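Atomicity depends on making the scale conditional: reading the replica count and then scaling in a second command leaves a window in which two builds can both see `0`. A minimal sketch of a compare-and-swap style acquisition is shown here, assuming the `--current-replicas` precondition flag of `kubectl scale`; `acquire_build_lock` is a hypothetical helper name.

```shell
# Deployment name taken from this document; override via the environment if needed.
DEPLOYMENT="${DEPLOYMENT:-buildah-external}"

# Hypothetical helper: try to take the lock in one conditional API call.
# Returns 0 if this caller now holds the lock, 1 otherwise.
acquire_build_lock() {
    # --current-replicas is a precondition: the scale is refused unless the
    # deployment currently has exactly that many replicas.
    if kubectl scale deployment "$DEPLOYMENT" --current-replicas=0 --replicas=1; then
        echo "Lock acquired"
    else
        # Note: an unreachable API server also ends up here, not only a held lock
        echo "Build already running (or scale request failed)" >&2
        return 1
    fi
}
```

Typical use would be `acquire_build_lock || exit 1` at the top of the build script.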
### **🚀 Resource Efficiency**
- **Zero resource usage** when no builds are running
- **Pod only exists** during active builds
- **Automatic cleanup** of compute resources
### **🛡️ Robust Error Handling**
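- **Scale-down runs on success or failure** of the pipeline
- **No lock files to go stale**; if a pipeline is killed before the cleanup step runs, the deployment stays at 1 replica and has to be scaled down manually
- **Self-healing**: the deployment recreates the pod if it crashes, although the build that was running in it has to be restarted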
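If the cleanup step never runs, the deployment stays at 1 replica. A watchdog could flag that case by comparing the build pod's start time against a maximum build duration. The sketch below is illustrative only: the helper names and the one-hour default are hypothetical, and parsing the timestamp with `date -d` assumes GNU `date`.

```shell
# Pure check: has a lock held since START (epoch seconds) exceeded MAX_AGE seconds?
lock_is_stale() {
    local start now max_age
    start=$1
    now=$2
    max_age=$3
    [ $((now - start)) -gt "$max_age" ]
}

# Illustrative wrapper around kubectl. Returns 0 and prints a message if the
# build pod is older than the limit, 1 if not (or no pod exists), 2 on errors.
check_stale_build() {
    local max_age started start_epoch
    max_age=${1:-3600}   # hypothetical limit: one hour
    started=$(kubectl get pods -l app=buildah-external \
        -o jsonpath='{.items[*].status.startTime}') || return 2
    started=${started%% *}             # keep the first timestamp only
    [ -n "$started" ] || return 1      # no pod, so no lock to check
    start_epoch=$(date -u -d "$started" +%s) || return 2   # GNU date assumed
    if lock_is_stale "$start_epoch" "$(date -u +%s)" "$max_age"; then
        echo "Stale lock: build pod running since $started"
    else
        return 1
    fi
}
```

Whether a stale lock should be released automatically or only reported is a policy decision; the sketch only reports.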
### **📊 Observable State**
- **Easy monitoring**: `kubectl get deployment buildah-external`
- **Clear status**: Replica count = build status
- **No hidden state** in lock files
## 🔄 **Build Pipeline Flow**
```mermaid
graph TD
A[Build Triggered] --> B{Check Replicas}
B -->|replicas=0| C[Scale to 1]
B -->|replicas≠0| D[❌ Build Already Running]
C --> E[Wait for Pod Ready]
E --> F[Execute Build]
F --> G[Scale to 0]
G --> H[✅ Build Complete]
D --> I[❌ Exit with Error]
```
## 📋 **Pipeline Implementation**
### **Build Step**
```jsonnet
{
  name: "build-via-external-buildah",
  commands: [
    // Acquire lock or fail: the scale only succeeds while replicas is still 0,
    // so there is no gap between checking the lock and taking it
    "if kubectl scale deployment buildah-external --current-replicas=0 --replicas=1; then",
    "  kubectl rollout status deployment/buildah-external --timeout=120s",
    "else",
    "  echo \"Build already running!\"; exit 1",
    "fi",
    // ... build commands ...
  ]
}
```
### **Cleanup Step**
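```jsonnet
{
  name: "scale-down-buildah",
  commands: [
    "kubectl scale deployment buildah-external --replicas=0",
    "kubectl wait --for=delete pod -l app=buildah-external --timeout=60s"
  ],
  when: {
    // Runs whether earlier steps succeeded or failed. Caution: that includes a
    // build step that failed because another build held the lock, in which
    // case this scale-down would stop the other build's pod.
    status: ["success", "failure"]
  }
}
```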
## 🧪 **Testing**
Use the test script to verify the locking mechanism:
```bash
pipeline/test-replica-locking.sh
```
This tests:
- ✅ Lock acquisition when available
- ✅ Lock blocking when unavailable
- ✅ Proper lock release
- ✅ System reset for next build
## 🔍 **Monitoring**
### **Check Build Status**
```bash
# Quick status check
kubectl get deployment buildah-external -n apps--droneio--prd
# Detailed status
kubectl describe deployment buildah-external -n apps--droneio--prd
```
### **Build Status Meanings**
- **READY 0/0**: No build running, system idle
- **READY 0/1**: Build starting, pod creating
- **READY 1/1**: Build active, pod running
- **READY 1/0**: Build ending, pod terminating
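The READY column pairs the number of ready pods with the desired replica count, so the table above can be decoded mechanically. A small sketch follows, assuming the namespace used in the monitoring commands above; `build_status` and `current_build_status` are hypothetical helper names.

```shell
# Map a "ready desired" pair to the status meanings listed above.
build_status() {
    local ready desired
    ready=$1
    desired=$2
    case "$ready/$desired" in
        0/0) echo "idle" ;;
        0/1) echo "starting" ;;
        1/1) echo "active" ;;
        1/0) echo "ending" ;;
        *)   echo "unexpected ($ready/$desired)" ;;
    esac
}

# Read both numbers from the deployment in a single call.
current_build_status() {
    local raw ready desired
    raw=$(kubectl get deployment buildah-external -n apps--droneio--prd \
        -o jsonpath='{.status.readyReplicas}/{.spec.replicas}') || return 1
    ready=${raw%%/*}
    desired=${raw##*/}
    # readyReplicas is omitted from the status when it is 0, hence the defaults
    build_status "${ready:-0}" "${desired:-0}"
}
```

Anything outside the four listed combinations (for example more than one replica) is reported as unexpected, which usually means someone scaled the deployment by hand.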
## 🎯 **Migration Notes**
This approach **replaces**:
- ❌ Lock file creation/deletion
- ❌ Lock timeout mechanisms
- ❌ Lock cleanup scripts
- ❌ Manual pod discovery
With **Kubernetes-native** equivalents:
- ✅ Conditional scaling operations
- ✅ Built-in conflict resolution
- ✅ Automatic resource management
- ✅ Observable state
The lock now has a **single source of truth** (the replica count), and no build pod consumes resources between builds. 🚀