# Dependency Confusion via Untrusted pip Package Resolution

Language: Python
Severity: Critical
CWE: CWE-78

## Source
10

## Flow
10-11-12

## Sink
12

## Vulnerable Code
```python
import subprocess
import os

def bootstrap_ml_dependencies(model_type):
    required_libs = {
        'vision': 'cv2-utils tensorvision imgprocess-core',
        'nlp': 'text-analyzer sentiment-core nlp-utils',
        'audio': 'audio-processor wave-analyzer sound-utils'
    }
    packages = required_libs.get(model_type, 'base-ml-toolkit')
    extra_index = os.getenv('ML_PACKAGE_MIRROR', 'https://ml-packages.internal.corp')
    install_cmd = f'pip install --extra-index-url {extra_index} {packages}'
    subprocess.run(install_cmd, shell=True, check=False)
    return f'ML environment ready for {model_type}'
```

## Explanation

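The function builds a shell command by interpolating the `ML_PACKAGE_MIRROR` environment variable into a `pip install` string and runs it with `shell=True`, so anyone who controls that variable can inject arbitrary shell commands. Separately, `--extra-index-url` adds the mirror alongside the default public PyPI index rather than replacing it. pip considers candidates from every configured index and installs the highest matching version, so an attacker who publishes a higher-versioned package under an internal name (for example `nlp-utils`) on PyPI can have it installed in place of the internal one. The unvalidated mirror URL can also point pip at an attacker-controlled repository. Finally, because `check=False` discards pip's exit status, the function reports success even when the installation fails.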
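
To make the injection path concrete, the sketch below rebuilds the same f-string with a hostile `ML_PACKAGE_MIRROR` value and prints what the shell would receive. Nothing is executed. The helper name `build_install_command` and the `evil.example` URL are hypothetical and used only for illustration.

```python
def build_install_command(extra_index, packages):
    # Mirrors the vulnerable f-string; illustrative helper, not part of the original code.
    return f'pip install --extra-index-url {extra_index} {packages}'


# Hypothetical hostile value an attacker places in ML_PACKAGE_MIRROR.
hostile_mirror = 'https://evil.example; touch /tmp/pwned #'

command = build_install_command(hostile_mirror, 'text-analyzer')
print(command)
# pip install --extra-index-url https://evil.example; touch /tmp/pwned # text-analyzer

# With shell=True the shell treats ';' as a command separator and '#' as the
# start of a comment, so 'touch /tmp/pwned' would run as its own command.

# In list form the same value is passed to pip as one opaque argument,
# so the shell never gets a chance to interpret it.
argv = ['pip', 'install', '--extra-index-url', hostile_mirror, 'text-analyzer']
print(argv[3])
```

The list form only removes the shell from the picture; the URL itself still has to be validated, since pip would otherwise contact whatever host it names.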

## Remediation

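The fix addresses both command injection and dependency confusion by: (1) passing the command to `subprocess.run` as a list of arguments with `shell=False`, so no shell ever parses the input, (2) validating the index URL against an allowlist of trusted internal repositories, (3) using `--index-url` instead of `--extra-index-url`, which replaces the default index so pip does not consult public PyPI for these packages, and (4) selecting package names from an allowlist and checking each one against a name pattern. It also sets `check=True` so that a failed installation raises `subprocess.CalledProcessError` instead of being reported as success.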
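
The difference between `--extra-index-url` and `--index-url` can be illustrated with a deliberately simplified model of candidate selection. This is a sketch, not pip's actual resolver: it assumes plain dotted numeric versions, and the catalogs, the `99.0.0` upload, and the helper names are hypothetical.

```python
def parse_version(text):
    # Simplified: handles plain dotted numeric versions only (e.g. "1.3.0").
    return tuple(int(part) for part in text.split('.'))


def pick_candidate(name, indexes):
    """Return (index_url, version) for the highest version found on any index."""
    candidates = [
        (parse_version(version), url, version)
        for url, catalog in indexes.items()
        for version in catalog.get(name, [])
    ]
    if not candidates:
        raise LookupError(f'No candidate found for {name}')
    _, url, version = max(candidates)
    return url, version


INTERNAL = 'https://ml-packages.internal.corp'
PUBLIC = 'https://pypi.org/simple'

catalogs = {
    INTERNAL: {'nlp-utils': ['1.2.0', '1.3.0']},
    PUBLIC: {'nlp-utils': ['99.0.0']},  # hypothetical attacker upload
}

# --extra-index-url: both indexes are searched, the highest version wins.
both = pick_candidate('nlp-utils', catalogs)
# --index-url: only the internal index is searched.
internal_only = pick_candidate('nlp-utils', {INTERNAL: catalogs[INTERNAL]})

print(both)           # ('https://pypi.org/simple', '99.0.0')
print(internal_only)  # ('https://ml-packages.internal.corp', '1.3.0')
```

In this model the attacker wins whenever the public index is searched at all, which is why restricting resolution to a single trusted index matters more than the order in which indexes are listed.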

## Secure Code
```python
import subprocess
import os
import re
from urllib.parse import urlparse

# Allowlist of known safe internal package indexes
ALLOWED_INDEXES = [
    'https://ml-packages.internal.corp',
    'https://pypi.internal.corp/simple',
]

# Allowlist of known safe package names
ALLOWED_PACKAGES = {
    'vision': ['cv2-utils', 'tensorvision', 'imgprocess-core'],
    'nlp': ['text-analyzer', 'sentiment-core', 'nlp-utils'],
    'audio': ['audio-processor', 'wave-analyzer', 'sound-utils'],
    'default': ['base-ml-toolkit']
}

PACKAGE_NAME_PATTERN = re.compile(r'^[a-zA-Z0-9]([a-zA-Z0-9._-]*[a-zA-Z0-9])?$')


def validate_index_url(url):
    """Validate that the index URL is in the allowlist."""
    if url not in ALLOWED_INDEXES:
        raise ValueError(f"Untrusted package index URL: {url}. Must be one of: {ALLOWED_INDEXES}")
    parsed = urlparse(url)
    if parsed.scheme != 'https':
        raise ValueError("Package index must use HTTPS.")
    return url


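def validate_packages(packages):
    """Validate package names against the allowlist and the naming pattern."""
    # Flatten the per-model lists into one set of permitted names
    allowed_names = {name for names in ALLOWED_PACKAGES.values() for name in names}
    for pkg in packages:
        if not PACKAGE_NAME_PATTERN.match(pkg):
            raise ValueError(f"Invalid package name format: {pkg}")
        if pkg not in allowed_names:
            raise ValueError(f"Package is not in the allowlist: {pkg}")
    return packages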


def bootstrap_ml_dependencies(model_type):
    """Bootstrap ML dependencies using only trusted, validated sources."""
    # Unknown model types fall back to the default package set
    packages = ALLOWED_PACKAGES.get(model_type, ALLOWED_PACKAGES['default'])

    packages = validate_packages(packages)

    index_url = os.getenv('ML_PACKAGE_MIRROR', 'https://ml-packages.internal.corp')
    index_url = validate_index_url(index_url)

    # Use --index-url (not --extra-index-url) to prevent fallback to public PyPI
    # Pass arguments as a list to avoid shell injection
    install_cmd = [
        'pip', 'install',
        '--index-url', index_url,
    ] + packages

    subprocess.run(install_cmd, shell=False, check=True)
    return f'ML environment ready for {model_type}'
```
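
As a possible further hardening step, the argument list could be built so that it runs pip through the current interpreter and ignores ambient pip configuration. The sketch below assumes a hypothetical helper `build_pip_argv`; it relies on pip's `--isolated` option, which makes pip ignore environment variables and user configuration, so a stray `PIP_EXTRA_INDEX_URL` cannot quietly re-add another index.

```python
import sys


def build_pip_argv(index_url, packages):
    """Build an argv list for pip (hypothetical helper, for illustration)."""
    return [
        sys.executable, '-m', 'pip',  # use the pip that belongs to this interpreter
        '--isolated',                 # ignore PIP_* environment variables and user config
        'install',
        '--index-url', index_url,     # replaces the default index
        *packages,
    ]


argv = build_pip_argv('https://ml-packages.internal.corp', ['nlp-utils'])
print(argv[1:])
# ['-m', 'pip', '--isolated', 'install', '--index-url',
#  'https://ml-packages.internal.corp', 'nlp-utils']
```

The resulting list would be passed to `subprocess.run(argv, shell=False, check=True)` after the index URL and package names have been validated as shown above.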
