# Arbitrary Code Execution via Pickle Deserialization (`__reduce__`)

Language: Python
Severity: Critical
CWE: CWE-502

## Source
9

## Flow
9-10-11

## Sink
11

## Vulnerable Code
```python
import pickle
import base64
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/iot/device/restore', methods=['POST'])
def restore_device_config():
    encoded_state = request.json.get('device_state')
    device_data = base64.b64decode(encoded_state)
    restored_config = pickle.loads(device_data)
    return jsonify({'status': 'restored', 'device_id': restored_config.get('device_id'), 'firmware': restored_config.get('firmware_version')})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```

## Explanation

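The application reads the user-controlled `device_state` field from a POST body, base64-decodes it, and passes the result directly to `pickle.loads()` without any validation. A pickle stream is effectively a set of instructions for the unpickler: an object's `__reduce__` method can name any importable callable together with its arguments, and `pickle.loads()` invokes that callable while reconstructing the object. An attacker who controls the payload can therefore achieve remote code execution with the privileges of the Flask process, and the call happens before the handler inspects the result.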
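
The mechanism can be illustrated with a deliberately harmless payload. This is a minimal sketch, not an exploit: `BenignPayload` is a hypothetical class, and `os.getcwd` stands in for whatever callable a real attacker would choose.

```python
import base64
import os
import pickle


class BenignPayload:
    """Hypothetical stand-in for an attacker-crafted object."""

    def __reduce__(self):
        # Tells pickle to record "call os.getcwd()" instead of the object's state.
        return (os.getcwd, ())


# What a client could send in the `device_state` field.
encoded_state = base64.b64encode(pickle.dumps(BenignPayload())).decode("ascii")

# What the vulnerable handler does with that field.
restored = pickle.loads(base64.b64decode(encoded_state))

# The unpickler called os.getcwd(); no BenignPayload instance was ever rebuilt.
called_on_load = restored == os.getcwd()
is_payload_instance = isinstance(restored, BenignPayload)
print(called_on_load, is_payload_instance)
```

The value returned by `pickle.loads()` is the result of the chosen callable, so the call has already run by the time the handler reaches `restored_config.get(...)`.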

## Remediation

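The fix replaces pickle deserialization with JSON parsing. `json.loads()` only produces dictionaries, lists, strings, numbers, booleans, and `None`, so it cannot invoke attacker-chosen callables. Input validation then checks that the decoded data is a dictionary containing only the expected fields and types. An optional HMAC-SHA256 signature adds defense in depth: when a `signature` is supplied, it must match the decoded payload under the shared secret. Because the check runs only when a signature is present, it does not authenticate requests by itself; make the signature mandatory if only trusted devices should be able to restore state.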
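
As a rough check of the first point, the sketch below feeds pickle bytes to a JSON parser and shows that they are rejected rather than interpreted. `parse_device_state` is a hypothetical helper written for this illustration; it is not part of the application.

```python
import json
import pickle


def parse_device_state(raw: bytes):
    """Hypothetical helper: return parsed JSON, or None if the bytes are not valid JSON."""
    try:
        return json.loads(raw)
    except ValueError:
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        return None


config = {"device_id": "sensor-01", "firmware_version": "1.4.2"}

pickle_result = parse_device_state(pickle.dumps(config))
json_result = parse_device_state(json.dumps(config).encode("utf-8"))

print(pickle_result)
print(json_result)
```

Under these assumptions the pickled bytes yield `None`, while the JSON bytes round-trip to the original dictionary. The parser has no code path that imports or calls anything named in the input.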

## Secure Code
```python
import json
import base64
import hmac
import hashlib
from flask import Flask, request, jsonify

app = Flask(__name__)

# Secret key for HMAC validation. Placeholder only: load the real key from an
# environment variable or a secret store, and never commit it to source control.
DEVICE_STATE_SECRET = b'your-secure-secret-key-here'

ALLOWED_FIELDS = {'device_id', 'firmware_version', 'config', 'network_settings', 'sensors', 'last_sync'}

def validate_device_config(data):
    """Validate that the deserialized config only contains expected fields and safe types."""
    if not isinstance(data, dict):
        return False
    if not data.keys() <= ALLOWED_FIELDS:
        return False
    # Ensure device_id is a string
    if 'device_id' in data and not isinstance(data['device_id'], str):
        return False
    # Ensure firmware_version is a string
    if 'firmware_version' in data and not isinstance(data['firmware_version'], str):
        return False
    return True

@app.route('/iot/device/restore', methods=['POST'])
def restore_device_config():
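    # silent=True returns None instead of raising on a missing or malformed JSON body
    request_data = request.get_json(silent=True)
    if not isinstance(request_data, dict) or 'device_state' not in request_data:
        return jsonify({'status': 'error', 'message': 'Missing device_state'}), 400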

    encoded_state = request_data.get('device_state')
    provided_signature = request_data.get('signature')

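    try:
        # validate=True rejects characters outside the base64 alphabet instead of skipping them
        device_data = base64.b64decode(encoded_state, validate=True)
    except (ValueError, TypeError):
        # binascii.Error is a ValueError subclass; TypeError covers non-string input
        return jsonify({'status': 'error', 'message': 'Invalid base64 encoding'}), 400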

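    # Verify HMAC signature if provided (defense in depth)
    if provided_signature:
        # compare_digest() raises TypeError for non-str or non-ASCII str input, so reject those first
        if not isinstance(provided_signature, str) or not provided_signature.isascii():
            return jsonify({'status': 'error', 'message': 'Invalid signature'}), 403
        expected_signature = hmac.new(DEVICE_STATE_SECRET, device_data, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(provided_signature, expected_signature):
            return jsonify({'status': 'error', 'message': 'Invalid signature'}), 403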

    # Use JSON instead of pickle for safe deserialization
    try:
        restored_config = json.loads(device_data)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return jsonify({'status': 'error', 'message': 'Invalid JSON configuration data'}), 400

    # Validate the structure of the restored config
    if not validate_device_config(restored_config):
        return jsonify({'status': 'error', 'message': 'Invalid configuration structure'}), 400

    return jsonify({
        'status': 'restored',
        'device_id': restored_config.get('device_id'),
        'firmware': restored_config.get('firmware_version')
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```
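
The endpoint above expects `device_state` to be base64-encoded JSON and, optionally, a hex-encoded HMAC-SHA256 `signature` computed over the decoded bytes. A device-side counterpart might look like the sketch below. The function name `build_restore_payload`, the sample config, and the demo secret are all assumptions made for illustration; only the field names come from the endpoint.

```python
import base64
import hashlib
import hmac
import json


def build_restore_payload(config: dict, secret: bytes) -> dict:
    """Hypothetical client helper: encode and sign a config for the restore endpoint."""
    # Serialize to JSON bytes; these exact bytes are what the server signs and parses.
    raw = json.dumps(config, separators=(",", ":"), sort_keys=True).encode("utf-8")
    # Hex HMAC-SHA256 over the raw bytes, matching the server's hexdigest() comparison.
    signature = hmac.new(secret, raw, hashlib.sha256).hexdigest()
    return {
        "device_state": base64.b64encode(raw).decode("ascii"),
        "signature": signature,
    }


demo_secret = b"demo-secret-not-for-production"  # assumed value, for illustration only
payload = build_restore_payload(
    {"device_id": "sensor-01", "firmware_version": "1.4.2"},
    demo_secret,
)
print(json.dumps(payload, indent=2))
```

The signature covers the bytes before base64 encoding, because the server computes its HMAC over the decoded `device_data`. Signing the base64 text instead would fail verification.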
