# Regex Denial of Service via Catastrophic Backtracking

Language: Python
Severity: High
CWE: CWE-1333

## Source
9

## Flow
9-11

## Sink
11

## Vulnerable Code
```python
import re
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/api/iot/validate_sensor_id', methods=['POST'])
def validate_iot_sensor_identifier():
    sensor_id = request.json.get('sensor_id', '')
    pattern = r'^(([a-zA-Z0-9]+)*-)*([a-zA-Z0-9]+)+$'
    if re.match(pattern, sensor_id):
        return jsonify({'valid': True, 'message': 'Sensor ID format accepted'})
    return jsonify({'valid': False, 'message': 'Invalid sensor identifier format'}), 400
```

## Explanation

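The regex pattern nests quantifiers in two places: `([a-zA-Z0-9]+)*` inside the outer `(...-)*` group, and the trailing `([a-zA-Z0-9]+)+`. In both cases the inner `+` and the outer `*` or `+` can consume the same run of alphanumeric characters, so the engine can split a run of n characters into segments in exponentially many ways. When `re.match()` receives input that almost matches but ultimately fails, such as a long run of alphanumeric characters followed by a trailing dash or any disallowed character, Python's backtracking engine tries every split before reporting failure. Match time roughly doubles with each additional character, so a single request carrying a few dozen attacker-controlled characters in `sensor_id` can keep a worker busy for a very long time, leading to CPU exhaustion and denial of service.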
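The growth can be observed directly with a small timing script. This is a minimal sketch, not part of the reported application: the helper name `time_failed_match`, the payload shape (a run of `a` characters plus one `!`), and the chosen lengths are illustrative assumptions, and absolute timings depend on the machine and Python version.

```python
import re
import time

# The vulnerable pattern from the report, compiled once.
VULNERABLE = re.compile(r'^(([a-zA-Z0-9]+)*-)*([a-zA-Z0-9]+)+$')


def time_failed_match(length):
    """Time a match against `length` alphanumerics plus one invalid character.

    Hypothetical helper for illustration only. The trailing '!' guarantees
    the match fails, which forces the engine to exhaust every backtracking path.
    """
    payload = 'a' * length + '!'
    start = time.perf_counter()
    matched = VULNERABLE.match(payload) is not None
    return matched, time.perf_counter() - start


if __name__ == '__main__':
    # Lengths are kept small on purpose; each extra character costs noticeably more.
    for length in (10, 12, 14, 16, 18):
        matched, elapsed = time_failed_match(length)
        print(f'{length:>2} chars: matched={matched} in {elapsed:.4f}s')
```

Increasing the length by a handful of characters beyond these values is usually enough to make a single call take seconds or longer, so larger values should only be tried in an isolated environment.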

## Remediation

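The fix replaces the vulnerable regex pattern `^(([a-zA-Z0-9]+)*-)*([a-zA-Z0-9]+)+$` with `^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$`, which describes the same logical format (alphanumeric segments separated by dashes) without nested quantifiers. Because every segment after the first must begin with a literal dash, there is only one way to split any input into segments, so a failed match no longer triggers exponential backtracking. The two patterns are not strictly equivalent: the original also accepted leading and consecutive dashes (for example `-a` or `a--b`), which the replacement rejects. Additionally, an input length limit of 128 characters is added as a defense-in-depth measure, bounding the work done per request even if future regex changes inadvertently reintroduce complexity.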
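A quick side-by-side check shows how the two patterns treat a handful of inputs. This is a sketch for illustration: the sample strings and the `compare` helper are assumptions chosen for this example, not taken from the application or its test suite.

```python
import re

# Both patterns exactly as they appear in the report.
ORIGINAL = re.compile(r'^(([a-zA-Z0-9]+)*-)*([a-zA-Z0-9]+)+$')
REPLACEMENT = re.compile(r'^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$')

# Hypothetical sample identifiers, kept short so the original pattern stays fast.
SAMPLES = ['sensor-01', 'A1', 'a-b-c', '-a', 'a--b', 'a-', '', 'a_b']


def compare(samples):
    """Map each sample to (accepted by ORIGINAL, accepted by REPLACEMENT)."""
    return {
        s: (ORIGINAL.match(s) is not None, REPLACEMENT.match(s) is not None)
        for s in samples
    }


if __name__ == '__main__':
    for sample, (old, new) in compare(SAMPLES).items():
        print(f'{sample!r:>12}  original={old!s:<5}  replacement={new}')
```

Well-formed identifiers such as `sensor-01` are accepted by both patterns, while `-a` and `a--b` are accepted only by the original. If existing devices rely on such identifiers, that difference is worth confirming before deploying the stricter pattern.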

## Secure Code
```python
import re
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/api/iot/validate_sensor_id', methods=['POST'])
def validate_iot_sensor_identifier():
    sensor_id = request.json.get('sensor_id', '')
    
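    # Reject non-string, empty, or overlong input before it reaches the regex
    # (a non-string JSON value would otherwise raise TypeError in len())
    if not isinstance(sensor_id, str) or not sensor_id or len(sensor_id) > 128: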
        return jsonify({'valid': False, 'message': 'Invalid sensor identifier format'}), 400
    
    # Safe regex without nested quantifiers: alphanumeric segments separated by single dashes
    pattern = r'^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$'
    if re.match(pattern, sensor_id):
        return jsonify({'valid': True, 'message': 'Sensor ID format accepted'})
    return jsonify({'valid': False, 'message': 'Invalid sensor identifier format'}), 400
```
