A Python project checklist

When building a new project, it's a smart move to be very strict right from the start. It is much harder to add more linting/typing checks once you have 1000+ lines of code.

That's why I'm providing an opinionated list of libraries for your new Python project. I might write a more in-depth article on the best practices when building a web app with Python. For now, this is mostly a checklist with some obvious recommendations.

Why not a template repo instead of a checklist? Template repositories (e.g. built with cookiecutter) go quickly out of date and discourage learning about the ins and outs of all those best practices. They might make sense for your organization, but they're not the point of this article.

Summary

Development tasks: Makefile
Typechecking: mypy
Dependency and virtualenv management: poetry
Linting: flake8
Code autoformatting: black and isort
Tests: pytest with plugins such as pytest-cov, pytest-factoryboy
Docstring: pydocstyle
Logging: structlog
Configuration: Pydantic BaseSettings with dotenv support
Error reporting: Sentry
Documentation: Docusaurus or Sphinx
Profiling and performance: profile, pyinstrument, sqltap

Running development tasks: Makefile

Makefiles are well understood, work almost everywhere, and shorten the ramp-up time for fellow developers who might not have much experience with Python.

Here's an example:

SHELL := bash
.ONESHELL:
.SHELLFLAGS := -eu -o pipefail -c
.DELETE_ON_ERROR:
MAKEFLAGS += --warn-undefined-variables
MAKEFLAGS += --no-builtin-rules

install:  # Install the app locally
	poetry install
.PHONY: install

ci: typecheck lint test ## Run all checks (test, lint, typecheck)
.PHONY: ci

test:  ## Run tests
	poetry run pytest .
.PHONY: test

lint:  ## Run linting
	poetry run black --check .
	poetry run isort -c .
	poetry run flake8 .
	poetry run pydocstyle .
.PHONY: lint

lint-fix:  ## Run autoformatters
	poetry run black .
	poetry run isort .
.PHONY: lint-fix

typecheck:  ## Run typechecking
	poetry run mypy --show-error-codes --pretty .
.PHONY: typecheck

.DEFAULT_GOAL := help
help: Makefile
	@grep -E '(^[a-zA-Z_-]+:.*?##.*$$)|(^##)' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[32m%-30s\033[0m %s\n", $$1, $$2}' | sed -e 's/\[32m##/[33m/'

This is a great article about Makefile: Your Makefiles are wrong

Typechecking: mypy

Type annotations in Python libraries are not yet pervasive, but getting better every day.

Supported by Python's founder, Guido van Rossum, mypy is the de facto standard. The cheat sheet is a very helpful resource.

You can configure mypy inside setup.cfg:

[mypy]
strict = true

# I prefer to be explicit about ignoring packages which do not yet have types:
[mypy-psycopg2.*]
ignore_missing_imports = True

Dependency and virtualenv management: poetry

Unfortunately because of the way pip installs dependencies, you have to deal with virtualenv in most cases (although things might change rapidly with pdm).

Nowadays I'd recommend using poetry. It is not yet absolutely perfect, but it provides a very elegant CLI API.

It's super easy to start a project:

poetry new service-name
cd service-name
$EDITOR pyproject.toml
rm -Rf tests
mv README.rst README.md

poetry add sqlalchemy  # for instance

Then either go into a virtualenv-enabled shell with poetry shell or prefix your commands with poetry run ....

Linting: flake8

There are two main linters:

I usually rely mostly on flake8 because it has fewer false positives. While pylint is super configurable, it includes too many checks to my taste.

I use the following configuration (in setup.cfg):

[flake8]
max-line-length = 99
extend-ignore =
    # See https://github.com/PyCQA/pycodestyle/issues/373
    E203,

Code autoformatting: black and isort

black autoformats your code so that you don't have to think about it.

isort is a nice addition to black, it will sort your imports to comply with PEP 8:

Sort alphabetically
Group into standard imports, third-party imports, app imports

Here's my isort config to comply with flake8 and black (in pyproject.toml):

[tool.isort]
multi_line_output = 3
include_trailing_comma = true
force_grid_wrap = 0
use_parentheses = true
ensure_newline_before_comments = true
line_length = 88

Test: pytest with coverage

pytest has a lot of magical features but it makes writing tests so efficient. The fixture system is brilliant and super powerful. Using plain assert instead of having to learn an assertEqual metalanguage makes your life more meaningful.

Here's the config I use (in pyproject.toml):

[tool.pytest.ini_options]
# Personal preference: I am too used to native traceback
addopts = "--tb=short"

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "def __repr__",
    "if __name__ == .__main__.:",
    "nocov",
    "if TYPE_CHECKING:",
]

[tool.coverage.run]
# Activating branch coverage is super important
branch = true
omit = [
  # add your files to omit here
	]

I usually use the following plugins and lib:

pytest-factoryboy use factories to create your fixture. Super powerful, and avoids having a single file where all your reusable fixtures are defined a thousand times with different variations.
pytest-mock makes it easier to work with unittest.mock.
pytest-cov provides coverage reports for your tests.
doubles: sadly not maintained anymore (but it still works!), it provides a much simpler and stricter mocking experience than unittest.mock.
requests-mock to check integration with HTTP services called with requests. It automatically integrates with pytest and provides a fixture named requests_mock.

Checking docstring: pydocstyle

pydocstyle enforces PEP 257 for docstring styling.

Here's my config (in setup.cfg):

[pydocstyle]
# Do not require any docstring
ignore = D100,D101,D102,D103,D104,D105,D106,D107,D213,D203

Logging: structlog

structlog is a must-have for all your logging needs.

from structlog import get_logger


logger = get_logger(__name__)

def hello(name: str):
		logger.info("saying hello", name=name)
		# instead of :
		# logger.info("saying hello to %s", name)

Structuring your logs has numerous advantages:

Immediately parsable by automated tools (kibana, m/r jobs, etc.)
Easier to write: you don't have to think about the order of your logging message
Flexible: can be further manipulated since all log messages are dicts until they're displayed

You can create yourapp.lib.log:

import json
import logging
from typing import Any, Dict
from uuid import UUID

import structlog

from yourapp.config import config


def default(obj: Any) -> Any:
    if isinstance(obj, UUID):
        return str(obj)

    raise TypeError(f"Can't serialize {type(obj)}")


def dumps(*args: Any, **kwargs: Any) -> str:
    kwargs.pop("default", None)
    return json.dumps(*args, **kwargs, default=default)


def add_version(
    logger: logging.Logger, method_name: str, event_dict: Dict[str, Any]
) -> Dict[str, Any]:
    """Add version to log message."""
    event_dict["version"] = config.git_commit_short
    return event_dict


class ConsoleRenderer(structlog.dev.ConsoleRenderer):
    def _repr(self, val: Any) -> str:
        # Display shorter uuid
        # https://www.structlog.org/en/stable/_modules/structlog/dev.html#ConsoleRenderer
        if isinstance(val, UUID):
            return str(val)
        return super()._repr(val)


def configure_logger(level: str = "INFO", *, console: bool = False) -> None:
    """Configure logging.

    console should be True for console (dev) environment.
    """
    # see https://stackoverflow.com/questions/37703609/using-python-logging-with-aws-lambda
    root = logging.getLogger()
    if root.handlers:
        for handler in root.handlers:
            root.removeHandler(handler)
    logging.basicConfig(format="%(message)s", level=level)

    if not console:
        processors = [
            add_version,
            structlog.stdlib.filter_by_level,
            structlog.stdlib.add_logger_name,
            structlog.stdlib.add_log_level,
            structlog.processors.TimeStamper(fmt="%Y-%m-%d %H:%M.%S"),
            structlog.processors.StackInfoRenderer(),
            structlog.processors.format_exc_info,
            structlog.processors.JSONRenderer(serializer=dumps),
        ]
    else:  # nocov
        processors = [
            structlog.stdlib.add_logger_name,
            structlog.stdlib.add_log_level,
            structlog.stdlib.PositionalArgumentsFormatter(),
            structlog.processors.TimeStamper(fmt="%Y-%m-%d %H:%M.%S"),
            structlog.processors.StackInfoRenderer(),
            structlog.processors.format_exc_info,
            ConsoleRenderer(),
        ]

    structlog.configure(
        processors=processors,  # type: ignore
        wrapper_class=structlog.stdlib.BoundLogger,
        logger_factory=structlog.stdlib.LoggerFactory(),
        cache_logger_on_first_use=True,
    )

Configuration: Pydantic's BaseSettings

Like most people, I usually ended up having my own mechanism for handling configuration. Thanks to the web framework fastapi, I've discovered that pydantic provides a very handy BaseSettings class that relies on environment variable for its configuration. BaseSettings provide many things that would be annoying to implement from scratch:

Type hints
Read from environment variables
Validate configuration values
.env support with python-dotenv
Secrets support

import os
from pathlib import Path
from typing import List, Optional

from dotenv import load_dotenv
from pydantic import BaseSettings

ENV_FILENAME = os.environ.get("DOTENV", ".env")


class MisconfiguredException(Exception):
    pass


class Config(BaseSettings):
    # Please use env_name ONLY for informational purpose (see docs)
    env_name: str
    git_commit_short: str = "unknown"

    # Activate this to get profiling - see documentation.
    is_db_enabled: bool = False

    db_user: str = "unconfigured"
    db_password: str = "unconfigured"
    db_name: str = "unconfigured"
    db_port: str = "5432"
    db_host: str = "localhost"

    sentry_dsn: Optional[str]


def get_config() -> Config:
    """Get the config."""
    # We follow serverless's dotenv plugin's behavior here:
    # https://www.npmjs.com/package/serverless-dotenv-plugin

    # First load .env
    load_dotenv(dotenv_path=".env")

    if not Path(ENV_FILENAME).exists():
        raise ValueError(f"Config file {ENV_FILENAME} does not exist.")

    if ENV_FILENAME.endswith(".local"):
        raise ValueError(
            "Expected env filename like '.env.dev', "
            f"got override ending with .local instead: {ENV_FILENAME!r}. "
            f" Try with {ENV_FILENAME.replace('.local', '')!r}"
        )

    # Then load .env.{env}
    load_dotenv(dotenv_path=ENV_FILENAME)

    # Then load .env.{env}.local if it exists
    override = ENV_FILENAME + ".local"
    if Path(override).exists():
        load_dotenv(dotenv_path=override)

    return Config()


config = get_config()

Now you just have to run your commands like this:

DOTENV=.env.test poetry run pytest .

Error reporting: Sentry

Sentry is a service that provides exception monitoring. Its SDK is very simple to integrate.

Usually, I use the following pattern:

# app.py
from app.lib.log import configure_logger
from app.lib.sentry import configure_sentry

configure_logger(config.log_level)
configure_sentry()

You need a route to test the integration (example below is for Flask):

@app.server.route("/health/sentry", methods=["GET"])
def get_sentry_error() -> str:
    """Raise and report an exception to test Sentry integration."""

    class TestSentryException(Exception):
        pass

    try:
        raise TestSentryException("Testing Sentry integration")
    except TestSentryException as exc:
        sentry_sdk.capture_exception(exc)

    return "Reported Sentry errror"

# app.lib.sentry
from typing import Any

import sentry_sdk
from structlog import get_logger

from app.config import config

logger = get_logger(__name__)


# This example is for Flask, adapt to your HTTP framework
@app.route("/health/sentry", methods=["GET"])
def get_sentry_error() -> str:
    """Raise and report an exception to test Sentry integration."""
    logger.info("info: testing Sentry integration")


    class TestSentryException(Exception):
        pass

    try:
        raise TestSentryException("Testing Sentry integration")
    except TestSentryException as exc:
        sentry_sdk.capture_exception(exc)

    return "Reported Sentry error"

Documentation: Docusaurus

Docusaurus is a powerful yet simple documentation management tool.

Sphinx is another great choice (especially if you want to get Python code auto-documentation).

Domain models, data validation: Pydantic

Pydantic Models provides a flexible way to define your domain model objects:

Type hinting
Validators
Export to json-schema

You can also use Python's standard lib dataclass together with something like marshmallow.

ORM: sqlalchemy

If you need to interact with the DB, sqlalchemy is a very safe choice. It comes with loads of features and is the most used non-Django Python ORM, which means that you'll find Stack Overflow solution for all your problems. Using alembic for DB migrations is the next logical move.

Both libraries are written by the insanely productive Mike Bayer.

Web framework: fastapi

You have a lot of excellent choices when it comes to Python web framework. I usually prefer microframework and am currently developing with fastapi, which is a lot of fun to work with.

I usually refrain from using any plugins that come with the framework, because they are usually too coupled to the framework and the context of an HTTP request. I might write an article about my preferred setup.

Utility functions for functional programming: toolz

While Python is not a strict functional programming language, it is possible to write FP-styled code with it.

toolz is a great companion and provides many utility functions that make writing code easier. It has a curried-by-default namespace (from toolz.curried import take). Checkout its cheatsheet.

CLI framework: typer

typer (same author as fastapi and pydantic) leverages type annotations to make it super easy to write powerful scripts with a command line interface:

#!/usr/bin/env python3

"""Say hello.
"""

import typer


def main(name: str) -> int:
    typer.echo(f"Hello {name}")

    password = typer.prompt("password")
    assert len(password) > 8

    return 0


if __name__ == "__main__":
    typer.run(main)

Performance profiling: pyinstrument and sqltap

pyinstrument is a recent Python profiler which can export to HTML.
sqltap integrates with sqlalchemy to allow you to introspect SQL queries and also exports to HTML.

Some services (like Sentry) can also profile code thanks to their SDK. Otherwise, you can also rely on the standard library module profile.

Wishlist

Lots of inconsistencies for where to put configuration: setup.cfg, pyproject.toml, specific files, etc.
The more complex the library, the more useful it would be to have type annotations and... the less probable it is to have those annotations. Libraries such as sqlalchemy (coming in 2.0) and toolz don't have official types for now.
mypy encourages nominal subtyping (see this FAQ) which is a bit sad because it discourages using simple dicts. Fortunately something like PEP 589 TypeDict will improve things.
It would be so nice to avoid having to use virtualenv (even through something like poetry or pipenv). First-time Python developers get so confused about them (compared to Node's simpler node_modules setup). I'm really looking forward to seeing PEP 582 deployed.

Missing something?

Drop me an email at charles at dein.fr if you think I'm missing something!

For more resources related to Python, check out my repo charlax/python-education.