{"title":"XPath Injection via lxml.etree.XPath on Untrusted Expressions","language":"Python","severity":"High","cwe":"CWE-643","source_lines":[5],"flow_lines":[5,7,8],"sink_lines":[8],"vulnerable_code":"from lxml import etree\nfrom flask import request, jsonify\n\ndef query_iot_device_telemetry():\n    device_filter = request.args.get('filter', 'default')\n    xml_data = etree.parse('iot_telemetry.xml')\n    xpath_query = f\"//device[@status='{device_filter}']/metrics\"\n    xpath_eval = etree.XPath(xpath_query)\n    results = xpath_eval(xml_data)\n    telemetry = [etree.tostring(node, encoding='unicode') for node in results]\n    return jsonify({'devices': telemetry, 'count': len(telemetry)})","explanation":"The code builds an XPath expression by interpolating the user-controlled value from request.args.get('filter') into an f-string with no validation or escaping, then compiles it with etree.XPath. An attacker can inject XPath syntax that alters the query logic, bypasses the status filter, or extracts other data from the XML document. For example, a filter value of ' or '1'='1 turns the predicate into [@status='' or '1'='1'], which matches every device.","remediation":"The fix uses lxml's XPath variable support: the expression references the `$status` variable, and the value is passed as a keyword argument when the compiled expression is called, so it is bound as a string value and never parsed as XPath syntax. In addition, an allowlist restricts the input to known device status values, providing defense in depth.","secure_code":"from lxml import etree\nfrom flask import request, jsonify\n\nALLOWED_STATUS_VALUES = {'active', 'inactive', 'maintenance', 'error', 'default', 'online', 'offline'}\n\ndef query_iot_device_telemetry():\n    device_filter = request.args.get('filter', 'default')\n\n    # Validate input against allowlist of known device statuses\n    if device_filter not in ALLOWED_STATUS_VALUES:\n        return jsonify({'error': 'Invalid filter value', 'allowed': sorted(ALLOWED_STATUS_VALUES)}), 400\n\n    xml_data = etree.parse('iot_telemetry.xml')\n\n    # Bind the value as an XPath variable so it is never parsed as expression syntax\n    xpath_query = \"//device[@status=$status]/metrics\"\n    xpath_eval = etree.XPath(xpath_query)\n    results = xpath_eval(xml_data, status=device_filter)\n\n    telemetry = [etree.tostring(node, encoding='unicode') for node in results]\n    return jsonify({'devices': telemetry, 'count': len(telemetry)})"}