Maison/arti-api/auth-service/pipeline/GRACEFUL-TERMINATION.md
2026-02-10 12:12:11 +01:00


Graceful Termination Solutions for Buildah Container

🎯 Problem

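When sleep infinity runs as PID 1 in the container, it never reacts to SIGTERM: the kernel does not apply a signal's default action to PID 1, and sleep installs no SIGTERM handler. Kubernetes therefore waits out the full termination grace period (default 30 seconds) before sending SIGKILL. This causes: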

  • Slow pod termination
  • 💸 Unnecessary resource usage during termination
  • 🐌 Slower scaling operations

Solutions Implemented

command: ["/bin/bash"]
args: ["-c", "trap 'exit 0' TERM; while true; do sleep 30 & wait $!; done"]

Benefits:

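  • Fast response to SIGTERM (measured: pod terminated in 2 seconds)
  • Simple implementation - no external dependencies
  • Compatible with existing infrastructure
  • Resource efficient - the shell blocks in wait, which returns as soon as a signal arrives, so the trap runs without waiting for the 30-second sleep to finish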
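
The behaviour of the loop can be checked locally without a cluster. The following is a minimal sketch, assuming bash and a POSIX date are available; it runs the same trap/wait loop as a child process, sends it SIGTERM, and reports the exit status and elapsed time. The variable names (loop_pid, loop_status, elapsed) are illustrative only.

```shell
#!/usr/bin/env bash
# Start the same trap/wait loop used in the container command, as a child process.
# Output is discarded so the leftover background sleep does not hold the terminal.
bash -c "trap 'exit 0' TERM; while true; do sleep 30 & wait \$!; done" >/dev/null 2>&1 &
loop_pid=$!

sleep 1                       # give the child time to install its trap
start=$(date +%s)
kill -TERM "$loop_pid"        # the same signal the kubelet sends on pod deletion
wait "$loop_pid"              # returns once the child has exited
loop_status=$?
elapsed=$(( $(date +%s) - start ))

echo "exit status: ${loop_status}, elapsed: ${elapsed}s"
```

Note that the background sleep 30 started by the loop is not killed by the trap; in a container it disappears when the container is torn down, while in this local run it simply expires on its own.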

⚙️ Configuration Parameters

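# Pod-level setting (spec.template.spec)
terminationGracePeriodSeconds: 5  # Reduced from default 30s
# Container-level setting (spec.template.spec.containers[])
readinessProbe:
  exec:
    command: ["/bin/bash", "-c", "buildah --version"]
  initialDelaySeconds: 5
  periodSeconds: 10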

📊 Performance Comparison

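| Method            | Termination Time | Complexity | Resource Usage          |
|-------------------|------------------|------------|-------------------------|
| sleep infinity    | 30s (SIGKILL)    | Low        | High during termination |
| Signal-aware loop | 2s               | Low        | Low                     |
| Custom entrypoint | 3-5s             | Medium     | Low                     |
| Chart override    | Variable         | High       | Low                     |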

🔧 Implementation Options

Option 1: Direct Deployment Update

# Container-level fields (spec.template.spec.containers[])
command: ["/bin/bash"]
args: ["-c", "trap 'exit 0' TERM; while true; do sleep 30 & wait $!; done"]
# Pod-level field (spec.template.spec)
terminationGracePeriodSeconds: 5

Use when: Direct control over deployment YAML

Option 2: Chart Override Values

# For Helm chart deployments
buildah-external:
  command: ["/bin/bash"]
  args: ["-c", "trap 'exit 0' TERM; while true; do sleep 30 & wait $!; done"]
  terminationGracePeriodSeconds: 5

Use when: Deployment managed by Helm charts

Option 3: ConfigMap Entrypoint

# More sophisticated signal handling with cleanup
volumeMounts:
- name: entrypoint-script
  mountPath: /scripts
volumes:
- name: entrypoint-script
  configMap:
    name: buildah-entrypoint

Use when: Need complex termination logic or cleanup
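
The contents of the buildah-entrypoint ConfigMap are not shown here. As a rough sketch of what such a script could look like, the example below writes a hypothetical entrypoint.sh to a temporary directory and exercises it locally. The script name, the cleanup function, and the CLEANUP_MARKER variable are all assumptions made for illustration; the real cleanup step would be whatever the build container needs.

```shell
#!/usr/bin/env bash
# Hypothetical entrypoint, as it might be stored in the buildah-entrypoint ConfigMap.
workdir=$(mktemp -d)
cat > "$workdir/entrypoint.sh" <<'EOF'
#!/usr/bin/env bash
cleanup() {
  # Placeholder cleanup: stop the background sleep, then record that cleanup ran.
  [ -n "${sleep_pid:-}" ] && kill "$sleep_pid" 2>/dev/null
  echo "cleanup done" > "${CLEANUP_MARKER:-/tmp/cleanup.marker}"
  exit 0
}
trap cleanup TERM
while true; do
  sleep 30 &
  sleep_pid=$!
  wait "$sleep_pid"   # interruptible, so the trap fires promptly
done
EOF
chmod +x "$workdir/entrypoint.sh"

# Exercise it locally: start it, send SIGTERM, and confirm the cleanup ran.
CLEANUP_MARKER="$workdir/marker" "$workdir/entrypoint.sh" >/dev/null 2>&1 &
entry_pid=$!
sleep 1
kill -TERM "$entry_pid"
wait "$entry_pid"
entry_status=$?
marker_text=$(cat "$workdir/marker")
echo "status=${entry_status} marker=${marker_text}"
```

Unlike the one-line loop, this variant also kills its background sleep, so no stray process outlives the handler.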

🧪 Validation

Test Graceful Termination

pipeline/test-graceful-termination.sh

Validates:

  • Pod responsiveness during operation
  • Signal handling speed (target: <10s)
  • Clean termination without SIGKILL
  • Proper deployment scaling

Test Results

✅ Pod terminated in 2 seconds
🎉 Excellent! Graceful termination completed quickly (≤10s)
📝 Method: Signal-aware bash loop with trap
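
The validation script itself is not reproduced in this document. The timing check it reports could be approximated with two small helpers like the ones below; this is a sketch under that assumption, not the script's actual code, and the function names measure_seconds and check_termination are hypothetical.

```shell
#!/usr/bin/env bash
# measure_seconds: run a command and print how many whole seconds it took.
measure_seconds() {
  local start end
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  echo $(( end - start ))
}

# check_termination: compare a measured duration with a limit (default 10s).
check_termination() {
  local seconds=$1 limit=${2:-10}
  if [ "$seconds" -le "$limit" ]; then
    echo "PASS: terminated in ${seconds}s (limit ${limit}s)"
  else
    echo "FAIL: terminated in ${seconds}s (limit ${limit}s)"
    return 1
  fi
}
```

Against a cluster, the measurement could be taken with something like `elapsed=$(measure_seconds kubectl delete pod -l app=buildah-external --wait=true)` followed by `check_termination "$elapsed"`.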

🔄 Integration with Replica Locking

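The signal-aware termination pairs well with the replica-based locking system, because the lock is released only once the pod is gone and termination time therefore bounds how quickly the next build can start: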

# Scale up (acquire lock) - fast startup
kubectl scale deployment buildah-external --replicas=1
kubectl wait --for=condition=ready pod -l app=buildah-external --timeout=60s

# Scale down (release lock) - fast termination  
kubectl scale deployment buildah-external --replicas=0
kubectl wait --for=delete pod -l app=buildah-external --timeout=10s  # Much faster!
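
The two halves above can be wrapped so that a build step always releases the lock afterwards. The following is a sketch only: the function names (acquire_lock, release_lock, with_build_lock) and the DEPLOYMENT and SELECTOR variables are assumptions, and it does not check whether another pipeline already holds the lock.

```shell
#!/usr/bin/env bash
DEPLOYMENT="${DEPLOYMENT:-buildah-external}"
SELECTOR="${SELECTOR:-app=buildah-external}"

# Scale up and wait for the pod to become ready (acquire the lock).
acquire_lock() {
  kubectl scale deployment "$DEPLOYMENT" --replicas=1 &&
    kubectl wait --for=condition=ready pod -l "$SELECTOR" --timeout=60s
}

# Scale down and wait for the pod to be deleted (release the lock).
release_lock() {
  kubectl scale deployment "$DEPLOYMENT" --replicas=0 &&
    kubectl wait --for=delete pod -l "$SELECTOR" --timeout=10s
}

# Run a command while holding the lock; always attempt to release afterwards.
with_build_lock() {
  acquire_lock || return 1
  "$@"
  local rc=$?
  release_lock || echo "warning: lock release did not complete cleanly" >&2
  return "$rc"
}
```

A pipeline step would then call, for example, `with_build_lock ./run-build.sh`, where run-build.sh stands in for whatever command performs the build.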

📋 Migration Steps

  1. Update deployment with signal-aware command
  2. Reduce termination grace period to 5-10 seconds
  3. Add readiness probe for build verification
  4. Test termination speed with validation script
  5. Monitor build pipeline performance

🎯 Benefits Achieved

  • 🚀 15x faster termination (30s → 2s)
  • 💰 Resource savings during scaling operations
  • 🔧 Better UX for developers (less waiting between builds)
  • Responsive scaling for replica-based locking
  • 🛡️ Robust - handles signals properly

🔍 Monitoring Commands

# Check termination grace period
kubectl get pod <pod-name> -o jsonpath='{.spec.terminationGracePeriodSeconds}'

# Monitor termination events
kubectl get events --field-selector involvedObject.name=<pod-name>

# Test signal responsiveness
# Note: this stops the container's main process; the kubelet then restarts the container
kubectl exec <pod-name> -- kill -TERM 1

The signal-aware loop cuts pod termination from 30s to 2s while staying simple and compatible with the existing infrastructure. 🎉