# Production Python: Project Structure and Best Practices for ML

*Author: Jared Chung*
## Introduction

The gap between a Jupyter notebook and production code is vast. Most ML tutorials stop at `model.fit()`, but the real value comes from maintainable, testable, deployable code. This guide covers the practices that separate hobby projects from production systems.
## Project Structure

A well-organized ML project looks something like this:

```text
my_ml_project/
├── src/
│   └── my_ml_project/
│       ├── __init__.py
│       ├── config.py              # Configuration management
│       ├── data/
│       │   ├── __init__.py
│       │   ├── loaders.py         # Data loading
│       │   └── processors.py      # Data transformations
│       ├── models/
│       │   ├── __init__.py
│       │   ├── architectures.py
│       │   └── training.py
│       ├── inference/
│       │   ├── __init__.py
│       │   └── predictor.py
│       └── utils/
│           ├── __init__.py
│           └── logging.py
├── tests/
│   ├── __init__.py
│   ├── conftest.py                # Pytest fixtures
│   ├── unit/
│   │   └── test_processors.py
│   └── integration/
│       └── test_pipeline.py
├── scripts/
│   ├── train.py
│   └── evaluate.py
├── notebooks/                     # Exploration only
│   └── exploration.ipynb
├── configs/
│   ├── base.yaml
│   └── production.yaml
├── pyproject.toml                 # Project metadata & deps
├── Makefile                       # Common commands
├── Dockerfile
├── .env.example
├── .gitignore
└── README.md
```
Key principles:

- Source code lives under `src/` with a package-name subdirectory (see the note after this list)
- Tests mirror the source structure
- Configuration stays separate from code
- Scripts serve as entrypoints
- Notebooks are for exploration only, never production
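The `src/` layout is more than convention: the package is not importable from the repository root, so tests exercise the installed package and packaging mistakes surface early. A quick sanity check, assuming the editable install shown in the next section:

```bash
uv pip install -e .
python -c "import my_ml_project; print(my_ml_project.__file__)"
```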
## Modern Dependency Management

### pyproject.toml

The modern standard for Python projects:
```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "my-ml-project"
version = "0.1.0"
description = "Production ML pipeline"
requires-python = ">=3.10"
dependencies = [
    "torch>=2.0",
    "transformers>=4.30",
    "pandas>=2.0",
    "pydantic>=2.0",
    "pydantic-settings>=2.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.0",
    "pytest-cov>=4.0",
    "ruff>=0.1",
    "mypy>=1.0",
    "pre-commit>=3.0",
]

# Console entrypoints; these resolve to modules inside the installed package
[project.scripts]
train = "my_ml_project.scripts.train:main"
serve = "my_ml_project.scripts.serve:main"

[tool.ruff]
line-length = 100

[tool.ruff.lint]
select = ["E", "F", "I", "N", "W", "UP"]

[tool.mypy]
python_version = "3.10"
strict = true
ignore_missing_imports = true

[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "-v --cov=src"
```
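Once the project is installed, the `[project.scripts]` entries become commands on your PATH. A quick sketch, assuming each `main()` wires up its own argument parsing:

```bash
uv pip install -e ".[dev]"
train --help    # dispatches to my_ml_project.scripts.train:main
serve           # dispatches to my_ml_project.scripts.serve:main
```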
### uv for Fast Dependency Management

A modern, much faster alternative to pip:
```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create a virtual environment
uv venv

# Install dependencies (10-100x faster than pip)
uv pip install -e ".[dev]"

# Lock dependencies
uv pip compile pyproject.toml -o requirements.lock

# Sync the environment from the lock file
uv pip sync requirements.lock
```
## Configuration Management

Never hardcode configuration. Use environment variables and config files.

### Pydantic Settings
```python
# src/my_ml_project/config.py
from pathlib import Path
from typing import Literal

from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        extra="ignore",
        # Allow field names starting with "model_" (pydantic reserves that prefix)
        protected_namespaces=(),
    )

    # Environment
    environment: Literal["development", "staging", "production"] = "development"
    debug: bool = False

    # Model settings
    model_name: str = "bert-base-uncased"
    model_path: Path = Field(default=Path("models/"))
    max_sequence_length: int = 512

    # Training
    batch_size: int = 32
    learning_rate: float = 2e-5
    epochs: int = 3

    # API settings
    api_host: str = "0.0.0.0"
    api_port: int = 8000

    # External services
    database_url: str = Field(default="sqlite:///./data.db")
    redis_url: str = Field(default="redis://localhost:6379")

    @property
    def is_production(self) -> bool:
        return self.environment == "production"


# Singleton pattern: construct once, reuse everywhere
_settings: Settings | None = None


def get_settings() -> Settings:
    global _settings
    if _settings is None:
        _settings = Settings()
    return _settings
```
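Because `BaseSettings` reads real environment variables last, they override both the defaults and `.env` values; variable names match the field names, case-insensitively. A minimal sketch:

```python
import os

from my_ml_project.config import Settings

os.environ["ENVIRONMENT"] = "production"
os.environ["BATCH_SIZE"] = "64"

settings = Settings()
assert settings.is_production
assert settings.batch_size == 64  # pydantic coerces the string "64" to int
```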
### YAML Configuration for Experiments

```yaml
# configs/base.yaml
model:
  name: bert-base-uncased
  max_length: 512

training:
  batch_size: 32
  learning_rate: 2e-5
  epochs: 3
  warmup_steps: 100

data:
  train_path: data/train.csv
  val_path: data/val.csv
```
Load it with Hydra or OmegaConf:

```python
from omegaconf import OmegaConf

config = OmegaConf.load("configs/base.yaml")
print(config.model.name)  # bert-base-uncased
```
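Environment-specific files like `configs/production.yaml` can then carry only the overrides. A sketch of layering them with `OmegaConf.merge` (the contents of the production file here are hypothetical):

```python
from omegaconf import OmegaConf

base = OmegaConf.load("configs/base.yaml")
prod = OmegaConf.load("configs/production.yaml")

# Later arguments win key-by-key, so production.yaml only needs the deltas
config = OmegaConf.merge(base, prod)
```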
## Logging

Structured logging for observability:
```python
# src/my_ml_project/utils/logging.py
import logging
import sys

from pythonjsonlogger import jsonlogger


def setup_logging(level: str = "INFO", json_format: bool = False) -> None:
    root_logger = logging.getLogger()
    root_logger.setLevel(level)

    handler = logging.StreamHandler(sys.stdout)

    if json_format:
        # The format string references the original record attributes;
        # rename_fields controls how they appear in the JSON output.
        formatter = jsonlogger.JsonFormatter(
            "%(asctime)s %(levelname)s %(name)s %(message)s",
            rename_fields={"levelname": "level", "asctime": "timestamp"},
        )
    else:
        formatter = logging.Formatter(
            "%(asctime)s | %(levelname)-8s | %(name)s | %(message)s",
            datefmt="%Y-%m-%d %H:%M:%S",
        )

    handler.setFormatter(formatter)
    root_logger.addHandler(handler)

    # Reduce noise from libraries
    logging.getLogger("urllib3").setLevel(logging.WARNING)
    logging.getLogger("transformers").setLevel(logging.WARNING)


def get_logger(name: str) -> logging.Logger:
    return logging.getLogger(name)
```
Usage:

```python
from my_ml_project.utils.logging import get_logger, setup_logging

setup_logging(json_format=True)  # call once at process startup
logger = get_logger(__name__)


def train_model(config):
    logger.info("Starting training", extra={"config": config.model_dump()})
    for epoch in range(config.epochs):
        loss = ...  # one training epoch (placeholder)
        logger.info("Epoch complete", extra={"epoch": epoch, "loss": loss})
    metrics = ...  # final evaluation (placeholder)
    logger.info("Training complete", extra={"final_metrics": metrics})
```
## Error Handling

Define custom exceptions:

```python
# src/my_ml_project/exceptions.py
class MLProjectError(Exception):
    """Base exception for the project."""


class DataValidationError(MLProjectError):
    """Raised when data validation fails."""


class ModelNotFoundError(MLProjectError):
    """Raised when a model file is not found."""


class InferenceError(MLProjectError):
    """Raised when inference fails."""
```
Use context managers for resources:

```python
from contextlib import contextmanager

import torch


@contextmanager
def inference_mode():
    """Temporarily disable gradient tracking, restoring the prior state."""
    was_enabled = torch.is_grad_enabled()
    try:
        torch.set_grad_enabled(False)
        yield
    finally:
        torch.set_grad_enabled(was_enabled)


# Usage
with inference_mode():
    predictions = model(inputs)
```

(Recent PyTorch also ships a built-in `torch.inference_mode()`; the hand-rolled version above mainly illustrates the pattern.)
## Testing

### Test Structure

```python
# tests/conftest.py
import pytest


@pytest.fixture
def sample_data():
    return {
        "texts": ["Hello world", "Test input"],
        "labels": [0, 1],
    }


@pytest.fixture
def model_path(tmp_path):
    return tmp_path / "test_model"


@pytest.fixture
def settings():
    from my_ml_project.config import Settings

    return Settings(environment="development", debug=True)
```
```python
# tests/unit/test_processors.py
import pytest

from my_ml_project.data.processors import TextProcessor


class TestTextProcessor:
    def test_tokenize_basic(self):
        processor = TextProcessor(max_length=128)
        result = processor.tokenize("Hello world")
        assert "input_ids" in result
        assert len(result["input_ids"]) <= 128

    def test_tokenize_empty_raises(self):
        processor = TextProcessor()
        with pytest.raises(ValueError, match="empty"):
            processor.tokenize("")

    @pytest.mark.parametrize("text,expected_length", [
        ("Short", 3),
        ("A bit longer text here", 6),
    ])
    def test_tokenize_lengths(self, text, expected_length):
        processor = TextProcessor()
        result = processor.tokenize(text)
        # Approximate token count
        assert len(result["input_ids"]) >= expected_length
```
### Integration Tests

```python
# tests/integration/test_pipeline.py
import pytest

from my_ml_project.inference.predictor import Predictor


@pytest.mark.integration
class TestPredictionPipeline:
    @pytest.fixture
    def predictor(self, model_path):
        return Predictor(model_path=model_path)

    def test_end_to_end_prediction(self, predictor, sample_data):
        predictions = predictor.predict(sample_data["texts"])
        assert len(predictions) == len(sample_data["texts"])
        assert all(0 <= p <= 1 for p in predictions)
```
Run tests:

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run only unit tests
pytest tests/unit

# Run only integration tests
pytest -m integration
```
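One detail worth adding: custom marks like `integration` should be registered, otherwise pytest warns about unknown markers. Extend the `[tool.pytest.ini_options]` table shown earlier:

```toml
[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "-v --cov=src"
markers = [
    "integration: tests that exercise the full pipeline",
]
```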
## Pre-commit Hooks

Automate code quality checks:

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.1.6
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]
      - id: ruff-format
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.7.0
    hooks:
      - id: mypy
        additional_dependencies:
          - pydantic>=2.0
          - types-requests
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
        args: ['--maxkb=1000']
```
Setup:

```bash
pip install pre-commit
pre-commit install
```
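Hooks only run against staged files on each commit; it is worth checking the whole repository once right after setup:

```bash
pre-commit run --all-files
```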
Makefile for Common Commands
.PHONY: install test lint format clean
install:
uv pip install -e ".[dev]"
pre-commit install
test:
pytest tests/ -v --cov=src
test-unit:
pytest tests/unit -v
test-integration:
pytest tests/integration -v -m integration
lint:
ruff check src tests
mypy src
format:
ruff format src tests
ruff check --fix src tests
clean:
rm -rf .pytest_cache .mypy_cache .ruff_cache
rm -rf dist build *.egg-info
find . -type d -name __pycache__ -exec rm -rf {} +
docker-build:
docker build -t my-ml-project .
docker-run:
docker run -p 8000:8000 my-ml-project
## CI/CD with GitHub Actions

```yaml
# .github/workflows/ci.yaml
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.10', '3.11']
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install uv
        run: curl -LsSf https://astral.sh/uv/install.sh | sh

      - name: Install dependencies
        run: |
          uv venv
          source .venv/bin/activate
          uv pip install -e ".[dev]"

      - name: Lint
        run: |
          source .venv/bin/activate
          ruff check src tests
          mypy src

      - name: Test
        run: |
          source .venv/bin/activate
          pytest tests/ --cov=src --cov-report=xml

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: coverage.xml
```
Docker
# Dockerfile
FROM python:3.11-slim as base
WORKDIR /app
# Install uv
RUN pip install uv
# Copy dependency files
COPY pyproject.toml .
COPY requirements.lock .
# Install dependencies
RUN uv venv && \
. .venv/bin/activate && \
uv pip sync requirements.lock
# Copy source
COPY src/ src/
# Production stage
FROM base as production
# Non-root user
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser
ENV PATH="/app/.venv/bin:$PATH"
CMD ["python", "-m", "my_ml_project.scripts.serve"]
## Conclusion

Production Python is about consistency, maintainability, and reliability. These practices might seem like overhead for small projects, but they pay dividends as projects grow. Start with the essentials: project structure, testing, and linting; add more as needed.

The goal isn't perfection. It's a codebase that you and your team can maintain and extend with confidence.