{"title":"Unsafe `marshal` Deserialization via `marshal.loads()`","language":"Python","severity":"Critical","cwe":"CWE-502","source_lines":[9],"flow_lines":[9,10,11],"sink_lines":[11],"vulnerable_code":"import marshal\nimport base64\nfrom flask import Flask, request, jsonify\n\napp = Flask(__name__)\n\n@app.route('/api/ml/model/predict', methods=['POST'])\ndef execute_ml_inference():\n    serialized_weights = request.json.get('model_weights')\n    decoded_weights = base64.b64decode(serialized_weights)\n    weight_obj = marshal.loads(decoded_weights)\n    prediction_result = weight_obj['predict_fn'](request.json.get('input_data'))\n    return jsonify({'prediction': prediction_result, 'status': 'success'})","explanation":"The handler base64-decodes the client-supplied 'model_weights' value and passes it straight to marshal.loads(). The marshal format is not designed to be safe against malicious input: the Python documentation warns never to unmarshal data from an untrusted or unauthenticated source, and a crafted payload can crash the interpreter or carry code objects. The handler then treats the deserialized 'predict_fn' entry as trusted and tries to call it, so the request, not the server, determines what the endpoint attempts to execute.","remediation":"The fix eliminates unsafe marshal deserialization entirely by replacing it with a server-side model registry pattern. 
Instead of accepting serialized executable code from untrusted clients, clients now reference pre-registered models by ID, and only safe JSON input data is accepted and validated before being passed to trusted prediction functions.","secure_code":"from flask import Flask, request, jsonify\n\napp = Flask(__name__)\n\n# Pre-registered models stored server-side, never deserialized from client input\nMODEL_REGISTRY = {}\n\n\ndef register_model(model_id, predict_fn):\n    \"\"\"Register a trusted model prediction function server-side.\"\"\"\n    MODEL_REGISTRY[model_id] = predict_fn\n\n\ndef validate_input_data(input_data):\n    \"\"\"Recursively validate that input data contains only numbers, lists, and dicts.\"\"\"\n    if input_data is None:\n        raise ValueError(\"input_data is required\")\n    if isinstance(input_data, (int, float)):\n        return input_data\n    if isinstance(input_data, list):\n        return [validate_input_data(item) for item in input_data]\n    if isinstance(input_data, dict):\n        return {k: validate_input_data(v) for k, v in input_data.items()}\n    raise ValueError(f\"Unsupported input data type: {type(input_data).__name__}\")\n\n\n@app.route('/api/ml/model/predict', methods=['POST'])\ndef execute_ml_inference():\n    # silent=True returns None instead of raising on a missing or malformed JSON body\n    payload = request.get_json(silent=True)\n    if not isinstance(payload, dict):\n        return jsonify({'error': 'Request body must be a JSON object'}), 400\n\n    model_id = payload.get('model_id')\n    if not model_id or not isinstance(model_id, str):\n        return jsonify({'error': 'A valid model_id is required'}), 400\n\n    # Look up model from server-side registry instead of deserializing from client\n    if model_id not in MODEL_REGISTRY:\n        return jsonify({'error': f'Model \"{model_id}\" not found in registry'}), 404\n\n    predict_fn = MODEL_REGISTRY[model_id]\n\n    # Validate input data\n    try:\n        input_data = validate_input_data(payload.get('input_data'))\n    except ValueError as e:\n        return jsonify({'error': str(e)}), 400\n\n    try:\n        prediction_result = predict_fn(input_data)\n    except Exception:\n        # Log the details server-side; do not echo exception text to the client\n        app.logger.exception('Prediction failed for model %s', model_id)\n        return jsonify({'error': 'Prediction failed'}), 500\n\n    return jsonify({'prediction': prediction_result, 'status': 'success'})\n\n\n# Example: register models at startup\n# register_model('linear_model_v1', my_linear_predict_function)"}