# Replica-Based Build Locking System

## 🎯 **Concept**

Instead of using lock files, we use Kubernetes deployment **replica scaling** as an atomic locking mechanism:

- **Replicas = 0**: No build running (lock available)
- **Replicas = 1**: Build in progress (lock acquired)

## 🔧 **How It Works**

### **Build Start (Lock Acquisition)**
```bash
# Check if lock is available
CURRENT_REPLICAS=$(kubectl get deployment buildah-external -o jsonpath='{.spec.replicas}')

if [ "$CURRENT_REPLICAS" = "0" ]; then
  # Acquire lock by scaling up. --current-replicas=0 is a precondition:
  # the scale fails if another build scaled up after the check above.
  kubectl scale deployment buildah-external --current-replicas=0 --replicas=1 || exit 1
  kubectl wait --for=condition=Ready pod -l app=buildah-external --timeout=120s
else
  # Lock unavailable - build already running
  exit 1
fi
```

### **Build End (Lock Release)**
```bash
# Always release lock (runs on success OR failure)
kubectl scale deployment buildah-external --replicas=0
kubectl wait --for=delete pod -l app=buildah-external --timeout=60s
```

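
Because the release runs unconditionally, a build that was refused the lock would also scale down the pod of the build that holds it. One way to guard against that is to record whether this process took the lock and release only in that case. The sketch below is an illustration under assumptions: the names `acquire_lock`, `release_lock`, `LOCK_HELD`, and `DEPLOYMENT` are hypothetical, and only the `kubectl scale` calls come from the steps above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not part of the pipeline.
DEPLOYMENT="${DEPLOYMENT:-buildah-external}"
LOCK_HELD=0

# Take the lock only if the deployment is currently at 0 replicas.
acquire_lock() {
  if kubectl scale deployment "$DEPLOYMENT" --current-replicas=0 --replicas=1 >/dev/null; then
    LOCK_HELD=1
  else
    echo "Build already running!" >&2
    return 1
  fi
}

# Release the lock, but only when this process acquired it.
release_lock() {
  if [ "$LOCK_HELD" = "1" ]; then
    kubectl scale deployment "$DEPLOYMENT" --replicas=0 >/dev/null
    LOCK_HELD=0
  fi
}
```

A build script could register `trap release_lock EXIT` and then call `acquire_lock || exit 1`, so the scale-down still runs on success or failure but never on behalf of a build that was turned away.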
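## ✅ **Benefits**

### **🔒 Atomic Operations**
- **Atomic scale updates**: the Kubernetes API applies each scale request as a single update
- **No race between concurrent builds**, provided the scale-up carries a precondition (`--current-replicas=0`); a separate check followed by an unconditional scale can still let two builds through
- **Built-in conflict resolution** via Kubernetes API
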
### **🚀 Resource Efficiency**
- **Zero resource usage** when no builds are running
- **Pod only exists** during active builds
- **Automatic cleanup** of compute resources

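### **🛡️ Robust Error Handling**
- **Scale-down always runs** (success or failure)
- **No stale lock files** - the lock lives in the Deployment spec and stays held only if the scale-down step itself never runs
- **Pod recovery** - if the pod crashes during a build, the Deployment recreates it, though the interrupted build has to be restarted
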
### **📊 Observable State**
- **Easy monitoring**: `kubectl get deployment buildah-external`
- **Clear status**: Replica count = build status
- **No hidden state** in lock files

## 🔄 **Build Pipeline Flow**

```mermaid
graph TD
    A[Build Triggered] --> B{Check Replicas}
    B -->|replicas=0| C[Scale to 1]
    B -->|replicas≠0| D[❌ Build Already Running]
    C --> E[Wait for Pod Ready]
    E --> F[Execute Build]
    F --> G[Scale to 0]
    G --> H[✅ Build Complete]
    D --> I[❌ Exit with Error]
```

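
The flow above can be sketched as a single shell function that returns the build's own exit status. This is a minimal illustration, not the pipeline's actual code: `run_locked_build`, `DEPLOYMENT`, and `SELECTOR` are hypothetical names, and the letters in the comments refer to the nodes of the diagram.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run_locked_build <command> [args...]
DEPLOYMENT="${DEPLOYMENT:-buildah-external}"
SELECTOR="${SELECTOR:-app=buildah-external}"

run_locked_build() {
  # B/C: scale 0 -> 1; the precondition fails when replicas is not 0.
  if ! kubectl scale deployment "$DEPLOYMENT" --current-replicas=0 --replicas=1 >/dev/null; then
    echo "Build already running" >&2   # D/I: exit with error
    return 1
  fi

  local status=0
  # E: wait for the pod. F: run the build only if the pod became ready.
  if kubectl wait --for=condition=Ready pod -l "$SELECTOR" --timeout=120s >/dev/null; then
    "$@" || status=$?
  else
    status=1
  fi

  # G: release the lock whether the build passed or failed.
  kubectl scale deployment "$DEPLOYMENT" --replicas=0 >/dev/null
  return "$status"   # H: 0 means the build completed
}
```

For example, `run_locked_build make image` would run `make image` only while holding the lock and pass its exit code back to the caller.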
## 📋 **Pipeline Implementation**

### **Build Step**
```jsonnet
{
  name: "build-via-external-buildah",
  commands: [
    // Check current replicas
    "CURRENT_REPLICAS=$(kubectl get deployment buildah-external -o jsonpath='{.spec.replicas}')",

    // Acquire lock or fail; --current-replicas=0 rejects the scale if another build got in first
    "if [ \"$CURRENT_REPLICAS\" = \"0\" ]; then",
    "  kubectl scale deployment buildah-external --current-replicas=0 --replicas=1 || exit 1",
    "  kubectl wait --for=condition=Ready pod -l app=buildah-external --timeout=120s",
    "else",
    "  echo \"Build already running!\"; exit 1",
    "fi",

    // ... build commands ...
  ]
}
```

### **Cleanup Step**
```jsonnet
{
  name: "scale-down-buildah",
  commands: [
    "kubectl scale deployment buildah-external --replicas=0",
    "kubectl wait --for=delete pod -l app=buildah-external --timeout=60s"
  ],
  when: {
    status: ["success", "failure"]  // Always runs
  }
}
```

## 🧪 **Testing**

Use the test script to verify the locking mechanism:

```bash
pipeline/test-replica-locking.sh
```

This tests:
- ✅ Lock acquisition when available
- ✅ Lock blocking when unavailable
- ✅ Proper lock release
- ✅ System reset for next build

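
The contents of `pipeline/test-replica-locking.sh` are not reproduced here. As a rough illustration of the four checks listed above, a test could look like the following sketch. The function names `replicas`, `try_acquire`, and `test_lock_cycle` and the overall structure are assumptions, not the actual script.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the contents of pipeline/test-replica-locking.sh.
DEPLOYMENT="${DEPLOYMENT:-buildah-external}"

# Print the desired replica count of the deployment.
replicas() {
  kubectl get deployment "$DEPLOYMENT" -o jsonpath='{.spec.replicas}'
}

# Try to take the lock; succeeds only if replicas is currently 0.
try_acquire() {
  kubectl scale deployment "$DEPLOYMENT" --current-replicas=0 --replicas=1 >/dev/null 2>&1
}

test_lock_cycle() {
  # Start from a known idle state.
  kubectl scale deployment "$DEPLOYMENT" --replicas=0 >/dev/null || return 1

  try_acquire || { echo "FAIL: could not acquire a free lock"; return 1; }
  try_acquire && { echo "FAIL: second acquire should be blocked"; return 1; }

  kubectl scale deployment "$DEPLOYMENT" --replicas=0 >/dev/null || return 1
  [ "$(replicas)" = "0" ] || { echo "FAIL: lock not released"; return 1; }

  try_acquire || { echo "FAIL: lock not reusable after release"; return 1; }
  kubectl scale deployment "$DEPLOYMENT" --replicas=0 >/dev/null
  echo "PASS"
}
```

A sketch like this scales the deployment to 0 at the start, so it should only be run when no real build is in progress.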
## 🔍 **Monitoring**

### **Check Build Status**
```bash
# Quick status check
kubectl get deployment buildah-external -n apps--droneio--prd

# Detailed status
kubectl describe deployment buildah-external -n apps--droneio--prd
```

### **Build Status Meanings**
- **READY 0/0**: No build running, system idle
- **READY 0/1**: Build starting, pod creating
- **READY 1/1**: Build active, pod running
- **READY 1/0**: Build ending, pod terminating

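
The same mapping can be applied in a script by reading the two counts behind the READY column. The sketch below is illustrative only: `classify_build_state` and `build_state` are hypothetical helper names, and it assumes that `.status.readyReplicas` comes back empty when no pod is ready.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that map replica counts to the states listed above.
# desired = .spec.replicas, ready = .status.readyReplicas (empty counts as 0).
classify_build_state() {
  local desired="${1:-0}" ready="${2:-0}"
  if   [ "$desired" = "0" ] && [ "$ready" = "0" ]; then echo "idle"
  elif [ "$desired" = "0" ]; then echo "ending"
  elif [ "$ready" = "0" ]; then echo "starting"
  else echo "active"
  fi
}

# Query the deployment and print its current state.
build_state() {
  local desired ready
  desired=$(kubectl get deployment buildah-external -n apps--droneio--prd \
    -o jsonpath='{.spec.replicas}') || return 1
  ready=$(kubectl get deployment buildah-external -n apps--droneio--prd \
    -o jsonpath='{.status.readyReplicas}') || return 1
  classify_build_state "$desired" "$ready"
}
```

Calling `build_state` prints one of `idle`, `starting`, `active`, or `ending`, which is easier to consume in alerts or dashboards than the raw READY column.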
## 🎯 **Migration Notes**

This approach **replaces**:
- ❌ Lock file creation/deletion
- ❌ Lock timeout mechanisms
- ❌ Lock cleanup scripts
- ❌ Manual pod discovery

With **Kubernetes-native**:
- ✅ Atomic scaling operations
- ✅ Built-in conflict resolution
- ✅ Automatic resource management
- ✅ Observable state

The system is now **simpler, more reliable, and more efficient**! 🚀