APiGen Chaos - Chaos Engineering Module
Comprehensive chaos engineering and resilience testing module for APiGen. Verify system behavior under failure conditions, network issues, resource constraints, and service degradation.
Features
🐒 Chaos Monkey Integration
- Latency Injection: Add random delays to method executions
- Exception Throwing: Randomly throw exceptions to simulate failures
- Application Kill: Terminate the application to test recovery
- Resource Stress: Simulate memory and CPU pressure
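The interplay between attack probability and the latency range can be sketched in plain Java. This is a minimal illustration, not the module's implementation: the class `LatencyAssault` and its methods are hypothetical names, and the uniform distribution is an assumption.

```java
import java.util.Random;

/** Hypothetical helper; not part of APiGen Chaos or Chaos Monkey for Spring Boot. */
final class LatencyAssault {
    private final double attackProbability; // 0.0 to 1.0, e.g. 0.1 = 10% of calls
    private final long minMillis;
    private final long maxMillis;
    private final Random random;

    LatencyAssault(double attackProbability, long minMillis, long maxMillis, Random random) {
        if (attackProbability < 0.0 || attackProbability > 1.0) {
            throw new IllegalArgumentException("attackProbability must be within [0.0, 1.0]");
        }
        if (minMillis < 0 || maxMillis < minMillis) {
            throw new IllegalArgumentException("require 0 <= minMillis <= maxMillis");
        }
        this.attackProbability = attackProbability;
        this.minMillis = minMillis;
        this.maxMillis = maxMillis;
        this.random = random;
    }

    /** Returns the delay to inject for one call; 0 means the call runs undelayed. */
    long nextDelayMillis() {
        // nextDouble() is in [0.0, 1.0), so 0.0 never attacks and 1.0 always attacks
        if (random.nextDouble() >= attackProbability) {
            return 0L;
        }
        // Uniform pick in [minMillis, maxMillis]
        return minMillis + (long) (random.nextDouble() * (maxMillis - minMillis + 1));
    }

    /** Sleeps for the chosen delay, then runs the wrapped action. */
    void run(Runnable action) throws InterruptedException {
        long delay = nextDelayMillis();
        if (delay > 0) {
            Thread.sleep(delay);
        }
        action.run();
    }
}
```

Passing a seeded `Random` keeps a chaos run reproducible, which helps when a failure needs to be replayed.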
🌐 Network Chaos (Toxiproxy)
- Latency Injection: Add network latency with configurable jitter
- Bandwidth Limiting: Throttle network throughput
- Connection Cutting: Simulate network partitions
- Timeout Simulation: Introduce delays exceeding timeout thresholds
- Packet Loss: Drop packets to simulate unreliable networks
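When choosing test timeouts, it can help to estimate how long a request should take under injected latency and a bandwidth cap. The sketch below is a back-of-the-envelope model only: `NetworkDelayModel` is a hypothetical name, the jitter is assumed to be uniform, and the bandwidth rate is assumed to be in KB/s with 1 KB = 1000 bytes.

```java
import java.util.Random;

/** Hypothetical estimator; it does not talk to Toxiproxy. */
final class NetworkDelayModel {
    private NetworkDelayModel() {
    }

    /** Latency plus a uniform offset in [-jitter, +jitter], never below zero. */
    static long delayWithJitter(long latencyMillis, long jitterMillis, Random random) {
        if (jitterMillis < 0) {
            throw new IllegalArgumentException("jitterMillis must be >= 0");
        }
        long offset = (long) (random.nextDouble() * (2 * jitterMillis + 1)) - jitterMillis;
        return Math.max(0L, latencyMillis + offset);
    }

    /** Time needed to push payloadBytes through a link capped at rateKbPerSec. */
    static long transferMillis(long payloadBytes, long rateKbPerSec) {
        if (rateKbPerSec <= 0 || payloadBytes < 0) {
            throw new IllegalArgumentException("require rateKbPerSec > 0 and payloadBytes >= 0");
        }
        // With 1 KB = 1000 bytes, a rate in KB/s equals bytes per millisecond.
        // Round up so a partial millisecond still counts.
        return (payloadBytes + rateKbPerSec - 1) / rateKbPerSec;
    }
}
```

For instance, a 1,000,000-byte response over a 100 KB/s link needs about 10 seconds under this model, so a 5-second client timeout would be expected to fire.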
🔧 Service Failure Simulation (WireMock)
- HTTP Errors: Return specific status codes (500, 404, 503, etc.)
- Timeouts: Delay responses beyond timeout thresholds
- Malformed Responses: Return invalid JSON or corrupted data
- Random Failures: Probabilistic failure injection
- Circuit Breaker Testing: Simulate circuit breaker patterns
- Variable Latency: Random response times within a range
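For readers unfamiliar with the pattern that circuit breaker testing targets, here is a deliberately small breaker that opens after a number of consecutive failures. It is an illustration under stated assumptions, not the breaker used by APiGen or by any resilience library: `SimpleCircuitBreaker` and `CircuitOpenException` are hypothetical names, and there is no half-open state or reset timer.

```java
import java.util.function.Supplier;

/** Hypothetical exception thrown when calls are rejected without being attempted. */
final class CircuitOpenException extends RuntimeException {
    CircuitOpenException() {
        super("circuit is open");
    }
}

/** Hypothetical, single-threaded breaker: CLOSED until N consecutive failures, then OPEN. */
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN }

    private final int failureThreshold;
    private int consecutiveFailures;
    private State state = State.CLOSED;

    SimpleCircuitBreaker(int failureThreshold) {
        if (failureThreshold < 1) {
            throw new IllegalArgumentException("failureThreshold must be >= 1");
        }
        this.failureThreshold = failureThreshold;
    }

    <T> T call(Supplier<T> action) {
        if (state == State.OPEN) {
            // Fail fast: the downstream call is not attempted at all
            throw new CircuitOpenException();
        }
        try {
            T result = action.get();
            consecutiveFailures = 0; // any success resets the streak
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
            }
            throw e;
        }
    }

    State state() {
        return state;
    }
}
```

A chaos test for such a breaker would typically inject exactly the threshold number of failures and then assert that the next call is rejected without reaching the downstream service.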
🗄️ Database Chaos
- Connection Failures: Simulate database connection drops
- Transient Failures: Limited consecutive failures followed by recovery
- Slow Connections: Add delays to connection establishment
- Partial Failures: Probabilistic connection failures
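The transient-failure behavior (a limited number of consecutive failures followed by recovery) can be modeled with a simple counter, paired here with a retry loop of the kind such a fault is meant to exercise. Both classes are hypothetical sketches; `TransientFailureGate` only borrows the method name `enableTransientFailures` for readability and is not the module's `DatabaseChaosSimulator`.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical gate: fails the next N checks, then lets every call through. */
final class TransientFailureGate {
    private final AtomicInteger remainingFailures = new AtomicInteger();

    void enableTransientFailures(int count) {
        remainingFailures.set(Math.max(0, count));
    }

    void check() throws SQLException {
        // Decrement only while positive so the counter never goes negative
        int before = remainingFailures.getAndUpdate(n -> n > 0 ? n - 1 : 0);
        if (before > 0) {
            throw new SQLException("Simulated transient failure");
        }
    }
}

/** Hypothetical retry helper without backoff, kept minimal on purpose. */
final class Retry {
    private Retry() {
    }

    static <T> T withRetry(int maxAttempts, Callable<T> action) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed
    }
}
```

With two simulated failures and three attempts the call succeeds on the third try; with three simulated failures the last `SQLException` is rethrown.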
💾 Resource Stress Testing
- CPU Stress: Saturate CPU cores with configurable thread count
- Memory Stress: Allocate large memory blocks
- Memory Leak Simulation: Continuous memory allocation
- Resource Monitoring: Track memory and CPU usage
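Block-wise memory allocation and basic heap monitoring need nothing beyond the JDK. The following sketch shows one way this could work; `MemoryStress` is a hypothetical class rather than the module's `ResourceStressSimulator`, and it makes no attempt to guard against `OutOfMemoryError`.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stress helper; sizes are given in megabytes. */
final class MemoryStress {
    private static final int BYTES_PER_MB = 1024 * 1024;
    private static final int MAX_BLOCK_MB = 2047; // keeps the byte[] length below Integer.MAX_VALUE

    private final List<byte[]> blocks = new ArrayList<>();

    /** Allocates totalMb in steps of blockMb and holds references so the GC cannot reclaim them. */
    synchronized void start(int totalMb, int blockMb) {
        if (totalMb < 0 || blockMb <= 0 || blockMb > MAX_BLOCK_MB) {
            throw new IllegalArgumentException("require totalMb >= 0 and 1 <= blockMb <= " + MAX_BLOCK_MB);
        }
        for (int allocated = 0; allocated < totalMb; allocated += blockMb) {
            int sizeMb = Math.min(blockMb, totalMb - allocated); // last block may be smaller
            blocks.add(new byte[sizeMb * BYTES_PER_MB]);
        }
    }

    /** Drops all references; the memory becomes eligible for garbage collection. */
    synchronized void stop() {
        blocks.clear();
    }

    synchronized int heldBlocks() {
        return blocks.size();
    }

    /** Current heap usage as reported by the JVM's memory MXBean. */
    static long usedHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }
}
```

Releasing the blocks only makes them collectable; the reported heap usage may stay high until the next garbage collection runs.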
🎯 Test Orchestration
- Scenario Builder: Fluent API for complex chaos scenarios
- Parallel Execution: Run multiple chaos scenarios concurrently
- Custom Actions: Define custom chaos behaviors
- Result Tracking: Monitor scenario execution and outcomes
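A fluent scenario builder with parallel execution can be approximated with `CompletableFuture`. This is a sketch of the general idea only: `ChaosScenario`, `named`, and `withAction` are hypothetical names and do not describe the API of `ChaosTestOrchestrator`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical builder: collects custom actions and runs them concurrently. */
final class ChaosScenario {
    private final String name;
    private final List<Runnable> actions = new ArrayList<>();

    private ChaosScenario(String name) {
        this.name = name;
    }

    static ChaosScenario named(String name) {
        return new ChaosScenario(name);
    }

    ChaosScenario withAction(Runnable action) {
        actions.add(action);
        return this; // returning this enables chaining
    }

    String name() {
        return name;
    }

    /** Starts every action on the common pool; completes with true only if none threw. */
    CompletableFuture<Boolean> run() {
        List<CompletableFuture<Void>> running = new ArrayList<>();
        for (Runnable action : actions) {
            running.add(CompletableFuture.runAsync(action));
        }
        return CompletableFuture
            .allOf(running.toArray(new CompletableFuture<?>[0]))
            .handle((ignored, error) -> error == null);
    }
}
```

Because `run()` is asynchronous, a test has to wait for the returned future (for example with `join()`) before asserting on the outcome.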
Installation
Add to your build.gradle:
groovy
dependencies {
    testImplementation 'com.jnzader:apigen-chaos'
}
Quick Start
1. Chaos Monkey Configuration
yaml
# application-chaos.yml
chaos:
  monkey:
    enabled: true
    latency-enabled: true
    latency-min: 100
    latency-max: 5000
    exceptions-enabled: true
    level: method
    attack-probability: 0.1
2. Network Chaos Testing
java
@Autowired
private NetworkChaosSimulator networkChaos;

@Test
void testNetworkLatency() throws Exception {
    // Add 500ms latency with 50ms jitter
    networkChaos.addLatency("database-proxy", 500, 50);
    try {
        // Execute your code
        performDatabaseOperation();
    } finally {
        // Restore normal behavior even if the operation throws
        networkChaos.restore("database-proxy");
    }
}
3. Service Failure Simulation
java
@Autowired
private ServiceFailureSimulator serviceFailure;

@Test
void testServiceTimeout() {
    serviceFailure.start(8080);
    try {
        // Delay responses by 10 seconds so the client times out first
        serviceFailure.simulateTimeout("/api/users", 10000);
        // Your test code
        assertThatThrownBy(() -> callExternalService())
            .hasMessageContaining("timeout");
    } finally {
        // Stop the mock server even if the assertion fails
        serviceFailure.stop();
    }
}
4. Database Chaos
java
@Autowired
private DatabaseChaosSimulator dbChaos;

@Test
void testDatabaseFailover() {
    // Enable transient failures (3 consecutive failures)
    dbChaos.enableTransientFailures(3);
    // First 3 attempts should fail
    for (int i = 0; i < 3; i++) {
        assertThatThrownBy(() -> repository.findAll())
            .isInstanceOf(SQLException.class);
    }
    // 4th attempt should succeed (automatic recovery)
    assertThat(repository.findAll()).isNotEmpty();
}
5. Resource Stress Testing
java
@Autowired
private ResourceStressSimulator resourceStress;

@Test
void testUnderMemoryPressure() {
    // Allocate 500MB in 50MB blocks
    resourceStress.startMemoryStress(500, 50);
    try {
        // Verify application still functions
        assertThat(service.processLargeDataset()).isTrue();
    } finally {
        // Release the memory even if the assertion fails
        resourceStress.stopMemoryStress();
    }
}

@Test
void testUnderCpuLoad() {
    // Stress 4 CPU cores for 10 seconds
    resourceStress.startCpuStress(4, 10);
    // Verify the response time stays under 5 seconds despite the load
    long responseTime = measureResponseTime();
    assertThat(responseTime).isLessThan(5000);
}
6. Orchestrated Chaos Scenarios
java
@Autowired
private ChaosTestOrchestrator orchestrator;

@Test
void testComplexFailureScenario() {
    orchestrator.scenario("Multi-component failure")
        .withNetworkLatency("api-proxy", 1000, 100, 5000)
        .withServiceFailure("/api/orders", 503, 3000)
        .withDatabaseFailure(dbChaos, 0.3, 5000)
        .withCpuStress(2, 10)
        .run()
        .thenAccept(result -> {
            assertThat(result.isSuccess()).isTrue();
        })
        // Wait for completion; otherwise the test can return before the
        // assertion runs and a failure would go unnoticed
        .toCompletableFuture()
        .join();
}
Usage Scenarios
Testing Resilience Patterns
java
@Test
void testCircuitBreakerOpens() {
    // Simulate service failures
    serviceFailure.simulateCircuitBreakerOpen("/api/payment", 3);
    boolean circuitOpened = false;
    // Make requests until circuit opens
    for (int i = 0; i < 5; i++) {
        try {
            paymentService.processPayment(order);
        } catch (CircuitBreakerOpenException e) {
            // Circuit should open after 3 failures
            assertThat(i).isGreaterThanOrEqualTo(3);
            circuitOpened = true;
        }
    }
    // Without this check the test would pass even if the circuit never opened
    assertThat(circuitOpened).isTrue();
}
Testing Retry Logic
java
@Test
void testRetryOnTransientFailure() {
    // Enable 2 consecutive failures
    dbChaos.enableTransientFailures(2);
    // Service should retry and succeed on 3rd attempt
    List<User> users = userService.findAllWithRetry();
    assertThat(users).isNotEmpty();
}
Testing Graceful Degradation
java
@Test
void testDegradationUnderLoad() {
    resourceStress.startMemoryStress(1000, 100);
    resourceStress.startCpuStress(8, 30);
    try {
        // Verify service returns cached data instead of failing
        Response response = apiClient.getData();
        assertThat(response.getStatus()).isEqualTo(200);
        assertThat(response.isFromCache()).isTrue();
    } finally {
        // Stop the stress even if an assertion fails
        resourceStress.stopMemoryStress();
        resourceStress.stopCpuStress();
    }
}
Testing Timeout Handling
java
@Test
void testTimeoutHandling() {
    // Simulate slow downstream service
    serviceFailure.simulateVariableLatency("/api/external", 5000, 10000);
    // Should timeout and use fallback
    CompletableFuture<Data> future = service.fetchDataAsync();
    assertThatThrownBy(() -> future.get(3, TimeUnit.SECONDS))
        .isInstanceOf(TimeoutException.class);
    // Verify fallback was used
    Data fallbackData = service.getFallbackData();
    assertThat(fallbackData).isNotNull();
}
Configuration
Chaos Monkey Properties
yaml
chaos:
  monkey:
    enabled: true/false
    latency-enabled: true/false
    latency-min: 100 # ms
    latency-max: 5000 # ms
    exceptions-enabled: true/false
    exception-message: "Custom error message"
    kill-enabled: false # WARNING: Terminates application
    memory-stress-enabled: false
    cpu-stress-enabled: false
    level: method|service|repository|component|restController
    attack-probability: 0.0-1.0 # 0.1 = 10%
    watcher-enabled: true/false
Actuator Endpoints
Monitor chaos experiments via Spring Boot Actuator:
bash
# Enable/disable chaos monkey
POST /actuator/chaosmonkey/enable
POST /actuator/chaosmonkey/disable
# Get current configuration
GET /actuator/chaosmonkey
# Get watcher status
GET /actuator/chaosmonkey/watchers
Best Practices
- Start Small: Begin with single failure types before combining scenarios
- Use Profiles: Enable chaos only in test/staging environments
- Monitor Metrics: Track application metrics during chaos tests
- Set Timeouts: Always configure test timeouts to prevent hanging
- Clean Up: Always restore normal behavior after tests
- Gradual Increase: Start with low failure probabilities and increase gradually
- Document Scenarios: Maintain a catalog of tested failure scenarios
- Automate: Include chaos tests in CI/CD pipelines
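The "Clean Up" practice is easiest to enforce when restoration is tied to scope rather than to the last line of a test. One possible approach, sketched with a hypothetical `ChaosScope` class, registers an undo action alongside each fault and relies on try-with-resources to run them even when an assertion fails.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical scope object; not part of APiGen Chaos. */
final class ChaosScope implements AutoCloseable {
    private final Deque<Runnable> cleanups = new ArrayDeque<>();

    /** Applies a fault now and registers the action that undoes it. */
    ChaosScope apply(Runnable fault, Runnable restore) {
        fault.run();
        cleanups.push(restore);
        return this;
    }

    /** Undoes faults in reverse order and keeps going if one restore throws. */
    @Override
    public void close() {
        RuntimeException first = null;
        while (!cleanups.isEmpty()) {
            try {
                cleanups.pop().run();
            } catch (RuntimeException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

In a test this would read `try (ChaosScope scope = new ChaosScope()) { scope.apply(...); /* assertions */ }`, where the two arguments to `apply` are whatever calls inject and remove the fault.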
Safety
- Never run in production without proper controls
- Use @Profile("!prod") to prevent accidental production usage
- Set chaos.monkey.enabled=false by default
- Implement kill switches for chaos experiments
- Monitor resource usage during stress tests
- Set conservative timeouts
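A kill switch can be as small as an atomic flag that every fault injection consults. The sketch below assumes a hypothetical `ChaosKillSwitch` class that starts disabled and refuses to be enabled when constructed for a production environment; how "production" is detected is left to the caller.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical kill switch; disabled by default, mirroring chaos.monkey.enabled=false. */
final class ChaosKillSwitch {
    private final boolean production;
    private final AtomicBoolean enabled = new AtomicBoolean(false);

    ChaosKillSwitch(boolean production) {
        this.production = production;
    }

    void enable() {
        if (production) {
            throw new IllegalStateException("chaos must not be enabled in production");
        }
        enabled.set(true);
    }

    /** Takes effect for all subsequent injections, from any thread. */
    void disable() {
        enabled.set(false);
    }

    boolean isEnabled() {
        return enabled.get();
    }

    /** Runs the fault only while the switch is on; returns whether it ran. */
    boolean runIfEnabled(Runnable fault) {
        if (!enabled.get()) {
            return false;
        }
        fault.run();
        return true;
    }
}
```

Checking the flag at injection time, rather than at startup, is what lets an operator stop a running experiment immediately.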
Dependencies
- Chaos Monkey for Spring Boot 3.2.0
- Toxiproxy Java 2.1.7
- WireMock 3.10.0
- Testcontainers 1.21.4
- Awaitility 4.2.2
Examples
See src/test/java for comprehensive examples:
- NetworkChaosIntegrationTest.java - Network chaos scenarios
- ServiceFailureIntegrationTest.java - Service failure patterns
- DatabaseChaosIntegrationTest.java - Database resilience testing
- ResourceStressIntegrationTest.java - Resource stress scenarios
Contributing
Contributions welcome! Please ensure:
- All chaos scenarios have corresponding tests
- Safety mechanisms are in place
- Documentation is updated
- Examples are provided
License
MIT License - see LICENSE file for details