# XPath Injection via lxml.etree.XPath on Untrusted Expressions

Language: Python
Severity: High
CWE: CWE-643

## Source
5

## Flow
5-7-8

## Sink
8

## Vulnerable Code
```python
from lxml import etree
from flask import request, jsonify

def query_iot_device_telemetry():
    device_filter = request.args.get('filter', 'default')
    xml_data = etree.parse('iot_telemetry.xml')
    xpath_query = f"//device[@status='{device_filter}']/metrics"
    xpath_eval = etree.XPath(xpath_query)
    results = xpath_eval(xml_data)
    telemetry = [etree.tostring(node, encoding='unicode') for node in results]
    return jsonify({'devices': telemetry, 'count': len(telemetry)})
```

## Explanation

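The code builds an XPath query by interpolating the user-controlled `filter` query parameter (`request.args.get('filter')`) directly into the expression string with an f-string, without validation or escaping. Because the value lands inside a single-quoted string literal, an attacker can close the quote and append XPath syntax of their own. For example, `filter=' or '1'='1` produces `//device[@status='' or '1'='1']/metrics`, whose predicate is always true, so the query returns the metrics of every device regardless of status. The same technique can alter the query logic in other ways, bypass filters, or extract unauthorized data from the XML document.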
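To make the effect of such a payload concrete, the sketch below runs the same query-building pattern against a small in-memory document. The sample XML, the device IDs, and the `vulnerable_query` helper are assumptions made for illustration; they are not part of the original application.

```python
from lxml import etree

# Hypothetical telemetry document, invented for this illustration.
SAMPLE_XML = b"""
<telemetry>
  <device id="d1" status="active"><metrics>21.5</metrics></device>
  <device id="d2" status="offline"><metrics>0.0</metrics></device>
  <device id="d3" status="error"><metrics>99.9</metrics></device>
</telemetry>
"""

def vulnerable_query(tree, device_filter):
    # Same string-building pattern as the vulnerable code:
    # the input becomes part of the XPath expression itself.
    xpath_query = f"//device[@status='{device_filter}']/metrics"
    return etree.XPath(xpath_query)(tree)

tree = etree.fromstring(SAMPLE_XML)

benign = vulnerable_query(tree, "active")
injected = vulnerable_query(tree, "' or '1'='1")

print(len(benign))    # 1: only the active device
print(len(injected))  # 3: the predicate is always true, so every device matches
```

With the injected value the expression becomes `//device[@status='' or '1'='1']/metrics`, so the status check no longer restricts the result set.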

## Remediation

The fix uses lxml's built-in XPath variable substitution (the XPath equivalent of a parameterized query): the expression references a `$status` variable, and the value is passed as a keyword argument when the compiled expression is evaluated. The value is bound as a string and is never parsed as part of the expression, so injected XPath syntax is not interpreted. In addition, the input is checked against an allowlist of known device status values, which provides defense in depth.

## Secure Code
```python
from lxml import etree
from flask import request, jsonify

ALLOWED_STATUS_VALUES = {'active', 'inactive', 'maintenance', 'error', 'default', 'online', 'offline'}

def query_iot_device_telemetry():
    device_filter = request.args.get('filter', 'default')
    
    # Validate input against allowlist of known device statuses
    if device_filter not in ALLOWED_STATUS_VALUES:
        return jsonify({'error': 'Invalid filter value', 'allowed': sorted(ALLOWED_STATUS_VALUES)}), 400
    
    xml_data = etree.parse('iot_telemetry.xml')
    
    # Use XPath variable substitution to prevent injection
    xpath_query = "//device[@status=$status]/metrics"
    xpath_eval = etree.XPath(xpath_query)
    results = xpath_eval(xml_data, status=device_filter)
    
    telemetry = [etree.tostring(node, encoding='unicode') for node in results]
    return jsonify({'devices': telemetry, 'count': len(telemetry)})
```
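As a quick check of the parameterized form, the following sketch evaluates the `$status` expression with both a benign value and an injection payload. The sample XML and the `safe_query` helper are hypothetical names introduced here, and the allowlist is deliberately left out so that the effect of variable binding can be seen on its own.

```python
from lxml import etree

# Hypothetical telemetry document, invented for this illustration.
SAMPLE_XML = b"""
<telemetry>
  <device id="d1" status="active"><metrics>21.5</metrics></device>
  <device id="d2" status="offline"><metrics>0.0</metrics></device>
  <device id="d3" status="error"><metrics>99.9</metrics></device>
</telemetry>
"""

# The expression is fixed and compiled once; only the variable value changes.
FIND_METRICS = etree.XPath("//device[@status=$status]/metrics")

def safe_query(tree, status):
    # The value is bound to $status as a plain string, not parsed as XPath.
    return FIND_METRICS(tree, status=status)

tree = etree.fromstring(SAMPLE_XML)

benign = safe_query(tree, "active")
payload = safe_query(tree, "' or '1'='1")

print(len(benign))   # 1: the active device
print(len(payload))  # 0: no device has that literal status value
```

Because the expression text never changes, it can be compiled once at module level and reused across requests, which is one reason to prefer this form over rebuilding the string per call.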
