{"title":"Dependency Confusion via Untrusted pip Package Resolution","language":"Python","severity":"Critical","cwe":"CWE-78","source_lines":[10],"flow_lines":[10,11,12],"sink_lines":[12],"vulnerable_code":"import subprocess\nimport os\n\ndef bootstrap_ml_dependencies(model_type):\n    required_libs = {\n        'vision': 'cv2-utils tensorvision imgprocess-core',\n        'nlp': 'text-analyzer sentiment-core nlp-utils',\n        'audio': 'audio-processor wave-analyzer sound-utils'\n    }\n    packages = required_libs.get(model_type, 'base-ml-toolkit')\n    extra_index = os.getenv('ML_PACKAGE_MIRROR', 'https://ml-packages.internal.corp')\n    install_cmd = f'pip install --extra-index-url {extra_index} {packages}'\n    subprocess.run(install_cmd, shell=True, check=False)\n    return f'ML environment ready for {model_type}'","explanation":"The code constructs a shell command using an environment variable (ML_PACKAGE_MIRROR) that is directly interpolated into the pip install command without validation. 
This enables command injection through the environment variable, because shell=True hands any shell metacharacters in its value to the shell. It also enables dependency confusion: --extra-index-url adds the mirror alongside public PyPI rather than replacing it, and pip picks the best-matching candidate (typically the highest version) across all configured indexes, so a same-named package on public PyPI or on an attacker-chosen mirror can displace the legitimate internal package.","remediation":"The fix addresses both command injection and dependency confusion by: (1) passing subprocess a list of arguments with shell=False so that nothing is interpreted by a shell, (2) validating the index URL against an allowlist of trusted internal repositories, (3) using --index-url instead of --extra-index-url so pip resolves packages only from the internal index rather than also consulting public PyPI, and (4) selecting package names from a fixed allowlist and checking each against a regex pattern.","secure_code":"import subprocess\nimport os\nimport re\nfrom urllib.parse import urlparse\n\n# Allowlist of known safe internal package indexes\nALLOWED_INDEXES = [\n    'https://ml-packages.internal.corp',\n    'https://pypi.internal.corp/simple',\n]\n\n# Allowlist of known safe package names\nALLOWED_PACKAGES = {\n    'vision': ['cv2-utils', 'tensorvision', 'imgprocess-core'],\n    'nlp': ['text-analyzer', 'sentiment-core', 'nlp-utils'],\n    'audio': ['audio-processor', 'wave-analyzer', 'sound-utils'],\n    'default': ['base-ml-toolkit']\n}\n\nPACKAGE_NAME_PATTERN = re.compile(r'^[a-zA-Z0-9]([a-zA-Z0-9._-]*[a-zA-Z0-9])?$')\n\n\ndef validate_index_url(url):\n    \"\"\"Validate that the index URL is in the allowlist.\"\"\"\n    if url not in ALLOWED_INDEXES:\n        raise ValueError(f\"Untrusted package index URL: {url}. 
Must be one of: {ALLOWED_INDEXES}\")\n    # Defense in depth: reject non-HTTPS entries even if one is added to the allowlist\n    parsed = urlparse(url)\n    if parsed.scheme != 'https':\n        raise ValueError(\"Package index must use HTTPS.\")\n    return url\n\n\ndef validate_packages(packages):\n    \"\"\"Validate package names against the allowlist and the naming pattern.\"\"\"\n    # Flatten the allowlist so membership is checked, not just the name format\n    known = {name for names in ALLOWED_PACKAGES.values() for name in names}\n    for pkg in packages:\n        if pkg not in known:\n            raise ValueError(f\"Package not in allowlist: {pkg}\")\n        if not PACKAGE_NAME_PATTERN.match(pkg):\n            raise ValueError(f\"Invalid package name format: {pkg}\")\n    return packages\n\n\ndef bootstrap_ml_dependencies(model_type):\n    \"\"\"Bootstrap ML dependencies using only trusted, validated sources.\"\"\"\n    # Unknown model types fall back to the default package set\n    if model_type not in ALLOWED_PACKAGES:\n        packages = ALLOWED_PACKAGES['default']\n    else:\n        packages = ALLOWED_PACKAGES[model_type]\n\n    packages = validate_packages(packages)\n\n    index_url = os.getenv('ML_PACKAGE_MIRROR', 'https://ml-packages.internal.corp')\n    index_url = validate_index_url(index_url)\n\n    # Use --index-url (not --extra-index-url) so public PyPI is not consulted\n    # Pass arguments as a list to avoid shell injection\n    install_cmd = [\n        'pip', 'install',\n        '--index-url', index_url,\n    ] + packages\n\n    # check=True raises CalledProcessError if the install fails\n    subprocess.run(install_cmd, shell=False, check=True)\n    return f'ML environment ready for {model_type}'"}