# Graceful Termination Solutions for Buildah Container

## 🎯 **Problem**

When `sleep infinity` runs as the container's main process (PID 1), it installs no SIGTERM handler, so the signal has no effect and Kubernetes waits out the full termination grace period (default 30 seconds) before sending SIGKILL. This causes:

- ⏳ Slow pod termination
- 💸 Unnecessary resource usage during termination
- 🐌 Slower scaling operations

## ✅ **Solutions Implemented**

### **🥇 Recommended: Signal-Aware Bash Loop**

```yaml
command: ["/bin/bash"]
args: ["-c", "trap 'exit 0' TERM; while true; do sleep 30 & wait $!; done"]
```

The `trap` installs a SIGTERM handler that exits cleanly. Running `sleep` in the background and blocking on `wait` matters: bash defers a trap until a foreground command finishes, but `wait` returns as soon as a trapped signal arrives.

**Benefits:**

- ✅ **Fast response** to SIGTERM (pod terminated in 2 seconds in testing)
- ✅ **Simple implementation** - no external dependencies
- ✅ **Compatible** with existing infrastructure
- ✅ **Resource efficient** - the loop wakes only once every 30 seconds

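To see why the `sleep 30 & wait $!` pattern responds quickly, the loop can be tried locally without a cluster. The sketch below is an illustration only: it runs the same command string as an ordinary background process (not as PID 1 in a container), sends it SIGTERM, and reports the exit status and elapsed time. The variable names `loop_pid`, `status`, and `elapsed` are made up for this example.

```shell
#!/bin/bash
# Start the same loop as a background process (stand-in for the container's main process).
# Output is discarded so the leftover `sleep 30` does not hold the terminal or a pipe open.
bash -c "trap 'exit 0' TERM; while true; do sleep 30 & wait \$!; done" >/dev/null 2>&1 &
loop_pid=$!

sleep 1                          # give the child time to install its trap
start=$(date +%s)
kill -TERM "$loop_pid"           # what the kubelet does first when a pod is deleted
wait "$loop_pid"                 # returns the loop's exit status
status=$?
elapsed=$(( $(date +%s) - start ))

echo "exit status: $status, elapsed: ${elapsed}s"
```

The loop should exit with status 0 almost immediately instead of waiting for the current `sleep 30` to finish. The orphaned `sleep 30` lingers for up to 30 seconds locally; in a container it is killed when the main process exits.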
### **⚙️ Configuration Parameters**

```yaml
# Pod spec level
terminationGracePeriodSeconds: 5 # Reduced from default 30s

# Container level
readinessProbe:
  exec:
    command: ["/bin/bash", "-c", "buildah --version"]
  initialDelaySeconds: 5
  periodSeconds: 10
```

## 📊 **Performance Comparison**

| Method | Termination Time | Complexity | Resource Usage |
|--------|------------------|------------|----------------|
| `sleep infinity` | 30s (SIGKILL) | Low | High during termination |
| **Signal-aware loop** | **2s** | Low | **Low** |
| Custom entrypoint | 3-5s | Medium | Low |
| Chart override | Variable | High | Low |

## 🔧 **Implementation Options**

### **Option 1: Direct Deployment Update** ⭐

```yaml
# Container spec
command: ["/bin/bash"]
args: ["-c", "trap 'exit 0' TERM; while true; do sleep 30 & wait $!; done"]
# Pod spec
terminationGracePeriodSeconds: 5
```

**Use when:** You have direct control over the deployment YAML

### **Option 2: Chart Override Values**

```yaml
# For Helm chart deployments
buildah-external:
  command: ["/bin/bash"]
  args: ["-c", "trap 'exit 0' TERM; while true; do sleep 30 & wait $!; done"]
  terminationGracePeriodSeconds: 5
```

**Use when:** The deployment is managed by Helm charts

### **Option 3: ConfigMap Entrypoint**

```yaml
# More sophisticated signal handling with cleanup
volumeMounts:
  - name: entrypoint-script
    mountPath: /scripts
volumes:
  - name: entrypoint-script
    configMap:
      name: buildah-entrypoint
```

**Use when:** You need complex termination logic or cleanup

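The contents of the `buildah-entrypoint` ConfigMap are not shown here, so the following is only a sketch of what such a script could look like. It assumes a file name of `entrypoint.sh` and a placeholder `cleanup` function; both are hypothetical. The block writes the script to a scratch directory and syntax-checks it, so it can be run without a cluster.

```shell
#!/bin/bash
# Write a hypothetical entrypoint script to a scratch directory.
workdir=$(mktemp -d)
entrypoint_path="$workdir/entrypoint.sh"

cat > "$entrypoint_path" <<'EOF'
#!/bin/bash
# Hypothetical entrypoint: idle until SIGTERM, then clean up and exit 0.
sleeper_pid=""

cleanup() {
  echo "SIGTERM received, cleaning up"
  # Stop the background sleep so nothing is left running.
  if [ -n "$sleeper_pid" ]; then
    kill "$sleeper_pid" 2>/dev/null
  fi
  # Build-specific cleanup (placeholder) would go here.
  exit 0
}
trap cleanup TERM

while true; do
  sleep 30 &
  sleeper_pid=$!
  wait "$sleeper_pid"
done
EOF

bash -n "$entrypoint_path" && echo "entrypoint.sh: syntax OK"

# With cluster access, the script could then be loaded into the ConfigMap
# referenced by the volume definition:
#   kubectl create configmap buildah-entrypoint --from-file="$entrypoint_path"
```

Mounted at `/scripts`, the file would appear as `/scripts/entrypoint.sh`. ConfigMap files are not executable by default, so the container would need to start it through the interpreter, for example `command: ["/bin/bash", "/scripts/entrypoint.sh"]`.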
## 🧪 **Validation**

### **Test Graceful Termination**

```bash
pipeline/test-graceful-termination.sh
```

**Validates:**

- ✅ Pod responsiveness during operation
- ✅ Signal handling speed (target: <10s)
- ✅ Clean termination without SIGKILL
- ✅ Proper deployment scaling

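The script itself is not reproduced in this document. As a rough idea of how the "signal handling speed" check could be measured, here is a minimal sketch with two hypothetical helpers, `measure_seconds` and `classify_termination`; the real `pipeline/test-graceful-termination.sh` may work differently. It uses `sleep 1` as a stand-in for the pod deletion so it runs without a cluster.

```shell
#!/bin/bash
# Hypothetical helpers; the real pipeline/test-graceful-termination.sh may differ.
TARGET_SECONDS=10

# Run a command and print how many whole seconds it took.
# The command's own stdout is sent to stderr so only the number is captured.
measure_seconds() {
  local start end
  start=$(date +%s)
  "$@" >&2
  end=$(date +%s)
  echo $(( end - start ))
}

# Compare an elapsed time with the target.
classify_termination() {
  local elapsed=$1
  if [ "$elapsed" -lt "$TARGET_SECONDS" ]; then
    echo "PASS: terminated in ${elapsed}s (target: <${TARGET_SECONDS}s)"
  else
    echo "FAIL: terminated in ${elapsed}s (target: <${TARGET_SECONDS}s)"
  fi
}

# Local dry run with a stand-in command instead of kubectl.
elapsed=$(measure_seconds sleep 1)
classify_termination "$elapsed"

# Against a cluster, the stand-in could be replaced with a blocking delete:
#   elapsed=$(measure_seconds kubectl delete pod <pod-name> --wait=true)
```

Timing a blocking `kubectl delete` captures the whole termination path, including the grace period, which is the number that matters for scaling.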
### **Test Results**

```
✅ Pod terminated in 2 seconds
🎉 Excellent! Graceful termination completed quickly (≤10s)
📝 Method: Signal-aware bash loop with trap
```

## 🔄 **Integration with Replica Locking**

The signal-aware termination pairs well with the replica-based locking system, because releasing the lock (scaling to zero) now completes in seconds:

```bash
# Scale up (acquire lock) - fast startup
kubectl scale deployment buildah-external --replicas=1
kubectl wait --for=condition=ready pod -l app=buildah-external --timeout=60s

# Scale down (release lock) - fast termination
kubectl scale deployment buildah-external --replicas=0
kubectl wait --for=delete pod -l app=buildah-external --timeout=10s # Much faster!
```

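The acquire and release commands can be wrapped so that the lock is released even when the build step fails. This is a sketch under assumptions, not part of the existing pipeline: the function names and the `KUBECTL` override variable are hypothetical, and the override exists only so the flow can be dry-run without a cluster.

```shell
#!/bin/bash
# KUBECTL is a hypothetical override so the flow can be dry-run without a cluster.
KUBECTL=${KUBECTL:-kubectl}

# Scale up and wait for the pod to become ready (acquire the lock).
acquire_lock() {
  $KUBECTL scale deployment buildah-external --replicas=1 &&
  $KUBECTL wait --for=condition=ready pod -l app=buildah-external --timeout=60s
}

# Scale down and wait for the pod to be deleted (release the lock).
release_lock() {
  $KUBECTL scale deployment buildah-external --replicas=0 &&
  $KUBECTL wait --for=delete pod -l app=buildah-external --timeout=10s
}

# Run a command while holding the lock; always release afterwards
# and return the command's own exit status.
with_build_lock() {
  acquire_lock || { release_lock; return 1; }
  "$@"
  local status=$?
  release_lock
  return "$status"
}
```

For a dry run, set `KUBECTL="echo kubectl"` and call `with_build_lock echo building`; this prints the four kubectl commands around the build step instead of executing them.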
## 📋 **Migration Steps**

1. **Update deployment** with the signal-aware command
2. **Reduce termination grace period** to 5-10 seconds
3. **Add readiness probe** to verify that `buildah` is available
4. **Test termination speed** with the validation script
5. **Monitor** build pipeline performance

## 🎯 **Benefits Achieved**

- **🚀 15x faster termination** (30s → 2s)
- **💰 Resource savings** during scaling operations
- **🔧 Better UX** for developers (faster builds)
- **⚡ Responsive scaling** for replica-based locking
- **🛡️ Robust** - handles signals properly

## 🔍 **Monitoring Commands**

```bash
# Check termination grace period
kubectl get pod <pod-name> -o jsonpath='{.spec.terminationGracePeriodSeconds}'

# Monitor termination events
kubectl get events --field-selector involvedObject.name=<pod-name>

# Test signal responsiveness (the container exits and is restarted)
kubectl exec <pod-name> -- kill -TERM 1
```

This solution cuts termination time from 30s to 2s while maintaining **simplicity** and **compatibility** with existing infrastructure! 🎉