Great projects start as a single-file script and evolve into a community-maintained framework. But few projects make it to this level. Most, regardless of how useful they could be to others, end up not being used by anyone.

The critical factor that makes your project convenient (or miserable) for others is its structure.

What is a perfect Python project structure that works well?

  • Great projects are always version-controlled. Use Git (or Mercurial).
  • They should have a dependency management system. Virtualenv (or conda) is not one.
  • They have automated clean code practices. You can make ugly coding impossible for all your team members.
  • Great projects will always have a README and give more context.
  • There should be a config file (a YAML or a TOML file). Separate the software parameters from the code instead of hardcoding them.
  • Secrets should be in an environment (.env) file. You don't want the world to know about your secrets.
  • They should have neat documentation (optional).

This article will go through all of them and set up your Python project for maximum maintainability. If this seems overwhelming, you can go straight to the blueprint I have created for you.

Make every project a git repository.

You can make a project a git repository with a single command. It takes less than a minute.

git init

But we don't do this often enough. In most cases, we tend to think that what we are writing is a simple script for a simple problem. That's okay.

What's simple for you is complex for others. Someone on the other side of the planet is working on a problem for which your simple script is the solution.

Also, your simple problem may absorb other simple issues along the way and become monstrous.

As it grows, it becomes increasingly difficult to maintain your code. You have no visibility into what changes you made and why you made them. On a team, you also need to know who made each change and when.

Suddenly, you start to worry about the script you initially wrote instead of feeling proud of it.

Start every single line of code as if it's the beginning of the next Facebook. And you need a version control system to do it.

As a first step, make your project a git repository and include a .gitignore file as well. You can generate an ignore file using the online tool gitignore.io.

.
├── .git
│   ├── <Git managed files>
│  
├── .gitignore

As you progress, make sure you commit your changes with an illustrative message. Commits are checkpoints at different times. They are indeed versions of your software you can check out at any time.

How to write a good commit message?

A good commit message should complete the sentence, "If applied, this commit will ..." It should be in sentence case but without a trailing period. An optimal length for the title of a commit message is about 50 characters.

The following is an example commit message.

git commit -am 'Print a hello world message'

You can also add more details. If you run git commit without the -m option, it opens an editor where you can write a multi-line commit message. Still, use the above convention for the title of your commit message, and use a blank line to separate the title from the body.

Print a hello <user> message

Print a hello world message and a hello <user> message

The main function was hardcoded with a 'hello world' message.
But we need a dynamic message that takes an argument and greets
the user.

Amend the main function to take an argument and use string formatting
to print a hello <user> message

These commit message conventions make it easy to skim through the git log of all the changes you made.
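The conventions above are simple enough to check automatically. Here is a minimal sketch, assuming the 50-character guideline and the blank-line rule described in this section; the function names are hypothetical and are not part of git or any library.

```python
MAX_TITLE_LENGTH = 50  # the guideline suggested above, not a hard git limit


def check_commit_title(title: str) -> list[str]:
    """Return a list of convention problems found in a commit title."""
    if not title.strip():
        return ["title is empty"]
    problems = []
    if len(title) > MAX_TITLE_LENGTH:
        problems.append(f"title is longer than {MAX_TITLE_LENGTH} characters")
    if not title[0].isupper():
        problems.append("title should start with a capital letter")
    if title.endswith("."):
        problems.append("title should not end with a period")
    return problems


def check_commit_message(message: str) -> list[str]:
    """Check the title and the blank line that separates it from the body."""
    lines = message.splitlines()
    if not lines:
        return ["title is empty"]
    problems = check_commit_title(lines[0])
    # The second line, if present, must be blank
    if len(lines) > 1 and lines[1].strip():
        problems.append("leave a blank line between the title and the body")
    return problems


print(check_commit_message("Print a hello world message"))  # []
print(check_commit_message("fixed stuff."))
```

A check like this could run from a commit-msg hook, but treat it as an illustration of the rules rather than a complete linter.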

Git history with clean commit messages

Use a dependency management tool.

Most developers, especially the new ones, don't pay enough attention to the project dependencies. What's dependency management in the first place?

The software you develop may depend on packages other developers created. They may, in turn, have dependencies on several different packages. This modular approach helps create software products quickly without reinventing the wheel all the time.

Dependencies, even within the same project, may vary between different environments. Your development team may have a set of dependencies that don't go into the production system.

A sound dependency management system should be able to distinguish between these sets.

Python developers usually install project dependencies in a virtual environment (created with Virtualenv or conda). But Virtualenv is not a dependency management tool. It doesn't have the benefits discussed above. It only helps to isolate the project environment from your system.

Poetry is an excellent dependency management tool for Python projects. It allows you to

  • separate development and production dependencies;
  • set the Python version for each project separately;
  • create entry points to your software; and
  • package and publish it to repositories such as PyPI.

Poetry is not a replacement for virtual environments. It creates and manages them for you with convenient utility commands.
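To make the "entry points" item above concrete: Poetry lets you map a command name to a `module:function` path in the `[tool.poetry.scripts]` table of pyproject.toml. The sketch below shows the kind of function such an entry could point at. The function names `main` and `build_greeting`, and the idea that they live in blueprint/main.py, are assumptions for illustration.

```python
import argparse


def build_greeting(user: str = "world") -> str:
    """Return the greeting for the given user name."""
    return f"hello {user}"


def main(argv=None) -> int:
    """Parse command-line arguments and print the greeting.

    `argv` defaults to None so that argparse reads sys.argv when the
    function is called through an installed entry point.
    """
    parser = argparse.ArgumentParser(description="Print a hello <user> message")
    parser.add_argument("user", nargs="?", default="world", help="name to greet")
    args = parser.parse_args(argv)
    print(build_greeting(args.user))
    return 0


# Passing an explicit list makes the function easy to call from tests
main(["Ada"])  # prints: hello Ada
```

Keeping the argument parsing in `main` and the logic in `build_greeting` means the greeting can be tested without touching the command line.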

If you love the idea, I published a full-length tutorial about how you can use Poetry to manage project dependencies efficiently.

Automate clean code practices in your Python projects.

Python is one of the most straightforward programming languages. It's close to natural languages yet powerful in its applications.

But that doesn't mean your code is always readable. You may end up writing code that is too lengthy and in a style that is too difficult for others to digest. To address this problem, the Python community adopted a common style guide called PEP 8.

PEP 8 is a set of guidelines for Python programmers to code concisely and consistently. It talks about

  • naming conventions of Python classes, functions, and variables;
  • proper use of whitespace;
  • code layout such as optimal line length; and
  • conventions about comments.

Though this guideline solves a terrible problem for Python programmers, it's challenging to maintain this manually in a large project.

Luckily, packages such as black and autopep8 make it easy to do this with a single command. Here's a line that formats every file inside the blueprint folder.

black blueprint

Autoflake is another tool that helps you get rid of unused variables in your script. Variables we declare but never use make the code harder to read.

autoflake --in-place --remove-unused-variables blueprint/main.py
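To illustrate what "unused" means here, the following is a minimal sketch that finds unused imports with Python's built-in ast module. It is not how autoflake is implemented (autoflake relies on pyflakes for its analysis), and `find_unused_imports` is a hypothetical helper written only for this example.

```python
import ast


def find_unused_imports(source: str) -> list[str]:
    """Return imported names that are never referenced in `source`.

    A simplified illustration: it skips star imports and does not
    consider `__all__` or names used only inside strings.
    """
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue
                # `import os.path` binds `os`; `import x as y` binds `y`
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(imported - used)


sample = """
import os
import sys
from pathlib import Path

print(Path.cwd(), sys.argv)
"""

print(find_unused_imports(sample))  # ['os']
```

In practice you would let autoflake do this work; the sketch only shows the kind of dead code it looks for.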

Lastly, I'd like to mention isort, a Python package that optimizes your imports.

isort blueprint/main.py

All these packages clean up your code in a single line. But, even then, running this every time you make changes to your script is more challenging than we think.

This is why I prefer Git pre-commit hooks.

Using pre-commit hooks, you can run black, autoflake, and isort to format your codebase every time you commit a change.

How to configure pre-commit hooks to automatically format Python code?

You can install the pre-commit package using poetry add or pip. You need a .pre-commit-config.yaml file in your project root, where you configure which hooks to run just before every commit. Then you have to install pre-commit into the git repository. You can do that by running the pre-commit install command from your project root.

poetry add pre-commit
# Create the .pre-commit-config.yaml file
poetry run pre-commit install

Here's what your .pre-commit-config.yaml file should look like.

repos:
  - repo: local
    hooks:
      - id: autoflake
        name: Remove unused variables and imports
        entry: bash -c 'autoflake "$@"; git add -u' --
        language: python
        args:
          [
            "--in-place",
            "--remove-all-unused-imports",
            "--remove-unused-variables",
            "--expand-star-imports",
            "--ignore-init-module-imports",
          ]
        files: \.py$
      - id: isort
        name: Sorting import statements
        entry: bash -c 'isort "$@"; git add -u' --
        language: python
        args: ["--filter-files"]
        files: \.py$
      - id: black
        name: Black Python code formatting
        entry: bash -c 'black "$@"; git add -u' --
        language: python
        types: [python]
        args: ["--line-length=120"]

That's it. Now try making some changes to your code and commit it. You'll be amazed to see how it automatically corrects your coding style issues.

Use a configuration file to separate project parameters.

A configuration file is like the central control panel for your application. A new user to your code will only have to change the configuration file to get it running.

What goes in a configuration file?

We know hard-coding static variables is a bad practice. For example, if you need to set a server URL, you shouldn't put it in the code directly. Instead, put it in a separate file and read it from there. If you or someone else wants to change it, they only have to do it once, and they know where to do it.

In the early days, we used to read configurations from text files. I've even used JSON and CSV files. But we now have better alternatives for managing configurations.

A good configuration file should be easy to understand and allow comments. I've found TOML files work very well for this. Poetry already creates a TOML file to manage its configuration. Thus, I don't have to create a new one.

You can read a TOML file with the toml Python package. It only takes a single line of code to convert your configuration file into a dictionary.

Here's how to read a TOML configuration file.

  1. Install the toml package. If you're using Poetry to manage dependencies, you can install it using the add command. If not, plain old pip works.
poetry add toml
# If you're still using pip and virtualenv,
# pip install toml
  2. Create a TOML file (or edit it if it's already there). You can use any name. If you're using Poetry, it already creates one called pyproject.toml.
[app]
name='blueprint'
  3. Load it in your Python project.
import toml
app_config = toml.load('pyproject.toml')

# now you can access the configuration parameters
print(app_config)

# {'app': {'name': 'blueprint'}}
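The loaded configuration is a plain dictionary, so a misspelled key only shows up when some code tries to read it. One option, sketched below, is to wrap the [app] table in a small dataclass right after loading. The `AppConfig` class, the `server_url` key, and its default value are hypothetical and only follow the server URL example from earlier in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    """Typed, read-only view of the [app] table in the configuration file."""

    name: str
    server_url: str = "http://localhost:8000"  # hypothetical default

    @classmethod
    def from_dict(cls, config: dict) -> "AppConfig":
        """Build the config from the dictionary a TOML loader returns."""
        if "app" not in config:
            raise KeyError("configuration file has no [app] table")
        # Unknown keys in the table raise a TypeError here, which
        # surfaces typos as soon as the application starts
        return cls(**config["app"])


# The same dictionary that toml.load('pyproject.toml') printed above
app_config = AppConfig.from_dict({"app": {"name": "blueprint"}})
print(app_config.name)        # blueprint
print(app_config.server_url)  # http://localhost:8000
```

On Python 3.11 and later, the standard library's tomllib module can also parse TOML, so the extra dependency may not be needed for read-only use.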

Store secrets in an environment file

Confidential items should never go into your code repository. A common practice is to put them in a .env file and read them at run time. You don't commit the .env file to your code repository.

You can use this file to store information such as API keys and database credentials.

It's perfectly fine to use your configuration file to store secrets if you don't commit it to your repository. Also, you could use the .env file as your project configuration file if you don't have many complex configurations.

A critical difference between environment and config files is how you read their values. You can access environment variables anywhere in your project using Python's built-in os module. But configuration file values aren't visible to every module of your project. You'll have to either read the file in every module or read it once and pass the values along as function arguments.

But I strongly recommend using two separate files as they both serve different purposes. The config file conveniently configures your project without hard coding anything, and .env files store secrets.

You can use .gitignore to stop your env files from accidentally sneaking into your repository. This is one of the things you should do on day 0.

Here's how to create and read an environment file in Python.

  1. Create a .env file in the project root.
SECRET_KEY='R9p9BRDshkwzpsooPEmZS86OWjWxQvn7aPunVexFoDw'
  2. Install python-dotenv.
poetry add python-dotenv
# pip install python-dotenv
  3. Load the env file into your project.
from dotenv import load_dotenv

load_dotenv()
# If your env file is different from .env you can specify that too,
# load_dotenv('.secrets/.environ')
  4. Access environment variables from anywhere in your project.
import os

print(os.getenv('SECRET_KEY'))
# R9p9BRDshkwzpsooPEmZS86OWjWxQvn7aPunVexFoDw
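Note that os.getenv returns None when a variable is missing, and every value it returns is a string. A minimal sketch of two small helpers that make both facts explicit follows; `get_required_env` and `get_env_flag` are hypothetical names, not part of python-dotenv or the os module.

```python
import os


def get_required_env(name: str) -> str:
    """Return the value of an environment variable or fail loudly."""
    value = os.getenv(name)
    if not value:
        # Failing at startup beats discovering a missing secret mid-request
        raise RuntimeError(f"environment variable {name} is not set")
    return value


def get_env_flag(name: str, default: bool = False) -> bool:
    """Interpret values such as '1', 'true', 'yes', or 'on' as True."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}
```

With these in place, `get_required_env('SECRET_KEY')` would return the secret from the .env file above after `load_dotenv()` has run, and raise a RuntimeError if the file was never loaded.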

Environment files are an age-old convention. Hence most technologies support them out of the box. Also, you can set environment variables directly on your OS.

Use a README and give additional context.

You should always give some context to someone who reads your code. What is it about, and why did you write it?

A README file is short documentation for your project. It should include instructions for another person to set up your project on their system without your help.

README files are usually markdown files. GitHub, Bitbucket, and GitLab use them to render styled documentation on the project repository.

Markdown adds a few conventions to make ordinary text appear special. For instance, you may add a # mark in front of a line to make it a title, or ## to make it a subtitle. Here's a cheat sheet to learn more about markdown.

Short Markdown Cheatsheet to create README files

Help readers with accompanying docs (optional.)

You don't have to have a multipage web app to document every project. Thus, I made this optional. But it's a good idea to have one.

README files hold basic information about your application. Separate docs are helpful for giving specific details about your project.

For instance, you may talk about package installation and configuration on the README. But talking about 101 API endpoints you have in your application is not recommended. You may have to organize it better in an HTML doc and host it separately.

Python has several excellent tools to create HTML docs. Mkdocs (and its material theme), Sphinx, and Swagger are the popular ones to pick. How would you choose the right documentation tool?

If your application is mostly about API endpoints, Swagger is the best way to generate docs. If not, Mkdocs works well. With Mkdocs, you can create custom pages as well as convert your docstrings using the mkdocstrings extension. But in my opinion, you rarely need to create documentation for your code. YOUR CODE IS DOCUMENTATION ITSELF!
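Docstring-based tools can only render what you write in the docstrings. Below is a minimal sketch of a function documented in the Google docstring style, which, as far as I know, is one of the styles the mkdocstrings Python handler can parse. The `greet` function is a hypothetical example.

```python
def greet(user: str, punctuation: str = "!") -> str:
    """Build a greeting for a user.

    Args:
        user: Name of the person to greet.
        punctuation: Character appended to the greeting.

    Returns:
        The greeting, for example ``hello Ada!``.

    Raises:
        ValueError: If ``user`` is empty.
    """
    if not user:
        raise ValueError("user must not be empty")
    return f"hello {user}{punctuation}"


# The docstring is stored on the function and is what help(greet) displays
print(greet.__doc__.splitlines()[0])  # Build a greeting for a user.
```

A docstring like this documents the arguments, the return value, and the failure mode in one place, which supports the point that well-written code documents itself.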

Here's how to use Mkdocs on Python projects

  1. Install mkdocs as a dev dependency. You don't need it on a production system.
poetry add -D mkdocs-material
  2. Create new docs pages.
mkdocs new docs
cd docs
  3. Edit the index.md file (this too is a markdown file).
  4. Start the mkdocs server to see it in a browser.
mkdocs serve

Do you know you can host your docs with GitHub Pages for free?

If you have a GitHub account and have pushed your master branch to a GitHub repository, you can host your documentation there too. It takes one (and only one) command and a few seconds.

mkdocs gh-deploy

The above command will build a static version of your documentation and host it on GitHub Pages. The hosted URL usually looks like

https://<Your username>.github.io/<Your repository name>/

You can also serve it from your custom domain or a subdomain of your main site. Here's a guide from GitHub for custom domains.

You can see the project blueprint's hosted documentation here

How to use the Python project blueprints?

I've created a couple of GitHub repositories that you can use as a starting point for your Python project. The first is a blueprint for Python projects in general, and the other is specifically for Django applications.

Python project blueprint

Django project blueprint

You can clone the repository and start working on it. Whenever you decide to keep it on a remote host such as GitHub (or Bitbucket, GitLab), you can create a remote repository and connect it with your local one. Here's how to do it.

git clone git@github.com:thuwarakeshm/blueprint.git
git remote set-url origin <Your New Repository>

But there is a better way.

Go to the GitHub links, and you'll see a button at the top right corner saying "fork." Forking allows you to create a copy of the repository that you can own. You can make changes without affecting the original sources at all.

So do fork the repository and clone the new one to your local computer.

These repositories are built with all the best practices I discussed above. But feel free to make it yours. You can change anything you want the way you want.

Final thoughts

Python is an elegant language. But we need more discipline around it to do excellent projects with it. What if we have most of them automated?

That's what we've just finished discussing.

We talked about why having a git repository is so important for all of your projects. Then we discussed managing dependencies with Poetry. We also learned how to automate clean code practices, manage configuration and environment files, use README files, and create documentation.

Each of the practices mentioned above is worth a 30-day course on its own. But I believe this article has given you an idea of these techniques and why we use them.

Once you realize the benefits of these practices, you'll want to create projects that support them. I've already made a blueprint so that you don't have to start everything from scratch.

This guide is based on my understanding and experience. If you have anything to add or correct, I'd be more than willing to discuss it with you.


Say Hi to me on LinkedIn, Twitter, and Medium. Please share it in your network if you find this helpful.

A blog about data science, machine learning, artificial intelligence, and analytics by Thuwarakesh Murallie.