Files

Serge NOEL c3176e8d79 Initialisation depot

2026-02-10 12:12:11 +01:00

4.1 KiB

Raw Permalink Blame History

Graceful Termination Solutions for Buildah Container

🎯 Problem

sleep infinity ignores SIGTERM signals, forcing Kubernetes to wait for SIGKILL timeout (default 30 seconds). This causes:

⏳ Slow pod termination
💸 Unnecessary resource usage during termination
🐌 Slower scaling operations

✅ Solutions Implemented

🥇 Recommended: Signal-Aware Bash Loop

command: ["/bin/bash"]
args: ["-c", "trap 'exit 0' TERM; while true; do sleep 30 & wait $!; done"]

Benefits:

✅ Immediate response to SIGTERM (tested: 2 seconds)
✅ Simple implementation - no external dependencies
✅ Compatible with existing infrastructure
✅ Resource efficient - responsive sleep loops

⚙️ Configuration Parameters

terminationGracePeriodSeconds: 5  # Reduced from default 30s
readinessProbe:
  exec:
    command: ["/bin/bash", "-c", "buildah --version"]
  initialDelaySeconds: 5
  periodSeconds: 10

📊 Performance Comparison

Method	Termination Time	Complexity	Resource Usage
`sleep infinity`	30s (SIGKILL)	Low	High during termination
Signal-aware loop	2s	Low	Low
Custom entrypoint	3-5s	Medium	Low
Chart override	Variable	High	Low

🔧 Implementation Options

Option 1: Direct Deployment Update ⭐

command: ["/bin/bash"]
args: ["-c", "trap 'exit 0' TERM; while true; do sleep 30 & wait $!; done"]
terminationGracePeriodSeconds: 5

Use when: Direct control over deployment YAML

Option 2: Chart Override Values

# For Helm chart deployments
buildah-external:
  command: ["/bin/bash"]
  args: ["-c", "trap 'exit 0' TERM; while true; do sleep 30 & wait $!; done"]
  terminationGracePeriodSeconds: 5

Use when: Deployment managed by Helm charts

Option 3: ConfigMap Entrypoint

# More sophisticated signal handling with cleanup
volumeMounts:
- name: entrypoint-script
  mountPath: /scripts
volumes:
- name: entrypoint-script
  configMap:
    name: buildah-entrypoint

Use when: Need complex termination logic or cleanup

🧪 Validation

Test Graceful Termination

pipeline/test-graceful-termination.sh

Validates:

✅ Pod responsiveness during operation
✅ Signal handling speed (target: <10s)
✅ Clean termination without SIGKILL
✅ Proper deployment scaling

Test Results

✅ Pod terminated in 2 seconds
🎉 Excellent! Graceful termination completed quickly (≤10s)
📝 Method: Signal-aware bash loop with trap

🔄 Integration with Replica Locking

The signal-aware termination works perfectly with the replica-based locking system:

# Scale up (acquire lock) - fast startup
kubectl scale deployment buildah-external --replicas=1
kubectl wait --for=condition=ready pod -l app=buildah-external --timeout=60s

# Scale down (release lock) - fast termination  
kubectl scale deployment buildah-external --replicas=0
kubectl wait --for=delete pod -l app=buildah-external --timeout=10s  # Much faster!

📋 Migration Steps

Update deployment with signal-aware command
Reduce termination grace period to 5-10 seconds
Add readiness probe for build verification
Test termination speed with validation script
Monitor build pipeline performance

🎯 Benefits Achieved

🚀 15x faster termination (30s → 2s)
💰 Resource savings during scaling operations
🔧 Better UX for developers (faster builds)
⚡ Responsive scaling for replica-based locking
🛡️ Robust - handles signals properly

🔍 Monitoring Commands

# Check termination grace period
kubectl get pod <pod-name> -o jsonpath='{.spec.terminationGracePeriodSeconds}'

# Monitor termination events
kubectl get events --field-selector involvedObject.name=<pod-name>

# Test signal responsiveness
kubectl exec <pod-name> -- kill -TERM 1

This solution provides optimal performance while maintaining simplicity and compatibility with existing infrastructure! 🎉

4.1 KiB Raw Permalink Blame History