Initialisation depot
This commit is contained in:
144
arti-api/auth-service/pipeline/GRACEFUL-TERMINATION.md
Normal file
144
arti-api/auth-service/pipeline/GRACEFUL-TERMINATION.md
Normal file
@@ -0,0 +1,144 @@
|
||||
# Graceful Termination Solutions for Buildah Container
|
||||
|
||||
## 🎯 **Problem**
|
||||
|
||||
`sleep infinity` ignores SIGTERM signals, forcing Kubernetes to wait for SIGKILL timeout (default 30 seconds). This causes:
|
||||
- ⏳ Slow pod termination
|
||||
- 💸 Unnecessary resource usage during termination
|
||||
- 🐌 Slower scaling operations
|
||||
|
||||
## ✅ **Solutions Implemented**
|
||||
|
||||
### **🥇 Recommended: Signal-Aware Bash Loop**
|
||||
|
||||
```bash
|
||||
command: ["/bin/bash"]
|
||||
args: ["-c", "trap 'exit 0' TERM; while true; do sleep 30 & wait $!; done"]
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- ✅ **Immediate response** to SIGTERM (tested: 2 seconds)
|
||||
- ✅ **Simple implementation** - no external dependencies
|
||||
- ✅ **Compatible** with existing infrastructure
|
||||
- ✅ **Resource efficient** - responsive sleep loops
|
||||
|
||||
### **⚙️ Configuration Parameters**
|
||||
|
||||
```yaml
|
||||
terminationGracePeriodSeconds: 5 # Reduced from default 30s
|
||||
readinessProbe:
|
||||
exec:
|
||||
command: ["/bin/bash", "-c", "buildah --version"]
|
||||
initialDelaySeconds: 5
|
||||
periodSeconds: 10
|
||||
```
|
||||
|
||||
## 📊 **Performance Comparison**
|
||||
|
||||
| Method | Termination Time | Complexity | Resource Usage |
|
||||
|--------|------------------|------------|----------------|
|
||||
| `sleep infinity` | 30s (SIGKILL) | Low | High during termination |
|
||||
| **Signal-aware loop** | **2s** | Low | **Low** |
|
||||
| Custom entrypoint | 3-5s | Medium | Low |
|
||||
| Chart override | Variable | High | Low |
|
||||
|
||||
## 🔧 **Implementation Options**
|
||||
|
||||
### **Option 1: Direct Deployment Update** ⭐
|
||||
```yaml
|
||||
command: ["/bin/bash"]
|
||||
args: ["-c", "trap 'exit 0' TERM; while true; do sleep 30 & wait $!; done"]
|
||||
terminationGracePeriodSeconds: 5
|
||||
```
|
||||
|
||||
**Use when:** Direct control over deployment YAML
|
||||
|
||||
### **Option 2: Chart Override Values**
|
||||
```yaml
|
||||
# For Helm chart deployments
|
||||
buildah-external:
|
||||
command: ["/bin/bash"]
|
||||
args: ["-c", "trap 'exit 0' TERM; while true; do sleep 30 & wait $!; done"]
|
||||
terminationGracePeriodSeconds: 5
|
||||
```
|
||||
|
||||
**Use when:** Deployment managed by Helm charts
|
||||
|
||||
### **Option 3: ConfigMap Entrypoint**
|
||||
```yaml
|
||||
# More sophisticated signal handling with cleanup
|
||||
volumeMounts:
|
||||
- name: entrypoint-script
|
||||
mountPath: /scripts
|
||||
volumes:
|
||||
- name: entrypoint-script
|
||||
configMap:
|
||||
name: buildah-entrypoint
|
||||
```
|
||||
|
||||
**Use when:** Need complex termination logic or cleanup
|
||||
|
||||
## 🧪 **Validation**
|
||||
|
||||
### **Test Graceful Termination**
|
||||
```bash
|
||||
pipeline/test-graceful-termination.sh
|
||||
```
|
||||
|
||||
**Validates:**
|
||||
- ✅ Pod responsiveness during operation
|
||||
- ✅ Signal handling speed (target: <10s)
|
||||
- ✅ Clean termination without SIGKILL
|
||||
- ✅ Proper deployment scaling
|
||||
|
||||
### **Test Results**
|
||||
```
|
||||
✅ Pod terminated in 2 seconds
|
||||
🎉 Excellent! Graceful termination completed quickly (≤10s)
|
||||
📝 Method: Signal-aware bash loop with trap
|
||||
```
|
||||
|
||||
## 🔄 **Integration with Replica Locking**
|
||||
|
||||
The signal-aware termination works perfectly with the replica-based locking system:
|
||||
|
||||
```bash
|
||||
# Scale up (acquire lock) - fast startup
|
||||
kubectl scale deployment buildah-external --replicas=1
|
||||
kubectl wait --for=condition=ready pod -l app=buildah-external --timeout=60s
|
||||
|
||||
# Scale down (release lock) - fast termination
|
||||
kubectl scale deployment buildah-external --replicas=0
|
||||
kubectl wait --for=delete pod -l app=buildah-external --timeout=10s # Much faster!
|
||||
```
|
||||
|
||||
## 📋 **Migration Steps**
|
||||
|
||||
1. **Update deployment** with signal-aware command
|
||||
2. **Reduce termination grace period** to 5-10 seconds
|
||||
3. **Add readiness probe** for build verification
|
||||
4. **Test termination speed** with validation script
|
||||
5. **Monitor** build pipeline performance
|
||||
|
||||
## 🎯 **Benefits Achieved**
|
||||
|
||||
- **🚀 15x faster termination** (30s → 2s)
|
||||
- **💰 Resource savings** during scaling operations
|
||||
- **🔧 Better UX** for developers (faster builds)
|
||||
- **⚡ Responsive scaling** for replica-based locking
|
||||
- **🛡️ Robust** - handles signals properly
|
||||
|
||||
## 🔍 **Monitoring Commands**
|
||||
|
||||
```bash
|
||||
# Check termination grace period
|
||||
kubectl get pod <pod-name> -o jsonpath='{.spec.terminationGracePeriodSeconds}'
|
||||
|
||||
# Monitor termination events
|
||||
kubectl get events --field-selector involvedObject.name=<pod-name>
|
||||
|
||||
# Test signal responsiveness
|
||||
kubectl exec <pod-name> -- kill -TERM 1
|
||||
```
|
||||
|
||||
This solution provides **optimal performance** while maintaining **simplicity** and **compatibility** with existing infrastructure! 🎉
|
||||
Reference in New Issue
Block a user