{"title":"XPath Injection via lxml XPath Predicates","language":"Python","severity":"High","cwe":"CWE-643","source_lines":[8,9],"flow_lines":[8,9,11],"sink_lines":[11],"vulnerable_code":"from lxml import etree\nimport flask\n\napp = flask.Flask(__name__)\n\n@app.route('/iot/device_telemetry')\ndef fetch_device_metrics():\n    device_id = flask.request.args.get('device_id', '')\n    sensor_type = flask.request.args.get('sensor', 'temperature')\n    xml_data = etree.parse('telemetry_data.xml')\n    query = f\"//device[@id='{device_id}']/sensors/sensor[@type='{sensor_type}']/reading\"\n    results = xml_data.xpath(query)\n    metrics = [etree.tostring(r, encoding='unicode') for r in results]\n    return flask.jsonify({'device': device_id, 'metrics': metrics})","explanation":"The application builds an XPath query by interpolating unsanitized user input (the device_id and sensor query parameters) directly into an f-string. An attacker can close the quoted attribute value, break out of the predicate, and inject arbitrary XPath. For example, supplying device_id as ' or '1'='1 makes the first predicate match every device, exposing telemetry the caller should not be able to read.","remediation":"The fix passes user input as XPath variables (keyword arguments to lxml's xpath() method, referenced as $name in the expression), so the values are bound as data and are never parsed as XPath syntax. 
In addition, a validation function with an allowlist regex accepts only identifiers made of letters, digits, hyphens, and underscores, which provides defense in depth against injection.","secure_code":"from lxml import etree\nimport flask\nimport re\n\napp = flask.Flask(__name__)\n\ndef validate_identifier(value):\n    \"\"\"Return value if it is a safe identifier (letters, digits, '_' or '-'), else None.\"\"\"\n    # fullmatch must consume the whole string; re.match with '$' would also accept a trailing newline\n    if not re.fullmatch(r'[a-zA-Z0-9_\\-]+', value):\n        return None\n    return value\n\n@app.route('/iot/device_telemetry')\ndef fetch_device_metrics():\n    device_id = flask.request.args.get('device_id', '')\n    sensor_type = flask.request.args.get('sensor', 'temperature')\n    \n    # Validate inputs as defense in depth against XPath injection\n    device_id = validate_identifier(device_id)\n    sensor_type = validate_identifier(sensor_type)\n    \n    if device_id is None or sensor_type is None:\n        return flask.jsonify({'error': 'Invalid input parameters'}), 400\n    \n    xml_data = etree.parse('telemetry_data.xml')\n    \n    # Use parameterized XPath with XPath variables to prevent injection\n    query = \"//device[@id=$device_id]/sensors/sensor[@type=$sensor_type]/reading\"\n    results = xml_data.xpath(query, device_id=device_id, sensor_type=sensor_type)\n    \n    metrics = [etree.tostring(r, encoding='unicode') for r in results]\n    return flask.jsonify({'device': device_id, 'metrics': metrics})"}