I was daunted by the complexity of projects when I started my data science career. We were using Virtualenv in all our Python projects.
I was impressed by the Node Package Manager (npm) and always wondered why Python didn’t have something like it.
I was yearning for a single tool to maintain isolated environments, manage dev and production dependencies, and handle packaging and publishing. Thankfully, we have Poetry now.
In a nutshell, Poetry is a tool for dependency management and packaging in Python.
But this official definition is incomplete because I found Poetry does more than manage dependencies and packaging.
Poetry is not a substitute for virtual environments. It complements them with intelligent ways to manage environments and more.
Here’s why I fell in love with Poetry at first sight.
Unlike Virtualenv, I can rename and relocate my project with Poetry.
Virtual environments are tied to a specific path. If I move or rename the project folder, the environment still points to the original path and stops working.
That made moving or renaming a project a real hassle.
Whenever I changed the path, I created a new virtual environment and reinstalled every package.
It was painstaking.
Virtualenv has a --relocatable flag to help with this. But I’m not satisfied with this option either. Whenever I installed a new package, I had to run it on the environment again.
It’s annoyingly repetitive! I believe data scientists and developers have more significant problems than remembering to run this every time.
Poetry projects are relocatable.
Poetry keeps the virtualenv separate from the project. It automatically creates an env in the .cache folder of the $HOME directory. When I relocate the project, I can tell Poetry to use the same env with a single command.
If you prefer to have the env in a custom location, you can specify the path the same way. That way, you can tie it to an external environment.
I find it incredibly useful for testing purposes. I can change the interpreter anytime if my code needs to be compatible with different Python versions.
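On the command line, that single command is poetry env use, which accepts either a Python version such as 3.10 or the full path to an interpreter. If you prefer to script the step from Python, here is a minimal sketch. The helper names env_use_command and rebind_env are hypothetical, and it assumes Poetry is installed and on your PATH.

```python
import shutil
import subprocess


def env_use_command(python: str) -> list[str]:
    """Build the `poetry env use` call.

    `python` may be a version such as "3.10" or a full path to an interpreter.
    """
    return ["poetry", "env", "use", python]


def rebind_env(project_dir: str, python: str) -> None:
    """Run `poetry env use` from inside the (possibly moved) project folder."""
    if shutil.which("poetry") is None:
        raise RuntimeError("Poetry is not on PATH")
    subprocess.run(env_use_command(python), cwd=project_dir, check=True)


if __name__ == "__main__":
    # Only prints the command; call rebind_env() to actually run it.
    print(env_use_command("3.10"))
```

Afterwards, poetry env info --path shows where the environment Poetry is using lives.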
In Poetry, I can manage development dependencies separately.
This one is an obvious drawback of Virtualenv.
Virtualenv manages dependencies in an isolated environment, but it doesn’t maintain a separate set of them for development only.
Yet, Python packages such as black, flake8, and isort are only needed for development. They have no purpose in a production server.
I also have to be extra careful about the security risks of leaving development packages on a production server.
I usually maintain two requirements.txt files to tell them apart. But this practice is highly inefficient. I can use pip freeze to update the development file, but I have to edit the production one manually.
That’s enough reason to ruin the whole day with frustration.
On the other hand, Poetry has intelligent ways of managing project dependencies.
When adding a new package to the project, I can mark it as development-only using the --dev flag.
When I install dependencies on a production server, I can use the --no-dev flag to filter out dev dependencies. (Poetry 1.2 and later spell these as --group dev and --without dev.)
I can also remove redundant packages I was using in the past with the
Managing dependencies for Python projects has never been easier.
Poetry ensures consistent versioning among team members.
If you work on a team, you’ve probably already run into problems caused by inconsistencies.
People use different versions of dependencies, so the code either breaks or doesn’t give the expected results.
Of course, the pip freeze command does capture package versions. But it doesn’t record the Python interpreter version. Also, if you add a package to the requirements file manually without specifying its version, it’ll create inconsistencies.
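Here is a quick way to catch the second problem before it bites. The hypothetical helper unpinned scans the text of a requirements file and returns every requirement that isn’t pinned to an exact version with ==. It’s a rough sketch: it strips comments naively, skips pip options such as -r, and can’t say anything about the interpreter version, because pip freeze output doesn’t record it.

```python
def unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that are not pinned with '=='."""
    loose = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # naive comment stripping
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e, ...)
            continue
        if "==" not in line:
            loose.append(line)
    return loose


if __name__ == "__main__":
    sample = """\
pandas==1.4.2
black
requests>=2.0  # only a lower bound
-r base.txt
"""
    print(unpinned(sample))  # ['black', 'requests>=2.0']
```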
Poetry has a clever way of maintaining consistency.
The pyproject.toml file is the equivalent of requirements.txt in virtualenv. But when Poetry installs packages, it first checks whether a poetry.lock file is available. If so, it fetches the exact dependency versions from the lock file.
You don’t edit the lock file manually. Poetry creates and updates it every time you change project dependencies. You can either use the poetry add command, or edit the dependencies in the TOML file and run poetry lock (or poetry install when there is no lock file yet).
Poetry docs encourage you to commit the lock file to your code repository and share it with other members. That way, when they install dependencies, they’re always in sync with everyone else.
Packaging and publishing Python packages.
If you publish packages to PyPI or other repositories, you have to build them in a way that helps to index them. It hasn’t been an easy task for me.
There are lots of configurations involved, and they certainly discourage new authors.
Yet with Poetry, I could publish packages to any repository with much less effort.
Here’s a package I published to PyPI using Poetry. It did not take more than a couple of minutes to do it.
This package helps you generate HTML analysis reports for any dataset in a single terminal command.
Guess what? It’s pip installable: just run pip install python-eda.
You can find the source code in this GitHub repository. Also, if you like this package, you may want to check out my article about it.
Before wrapping up, I want to take you through the exact steps I followed to publish this package.
Let’s create and publish a Python project with Poetry.
Unlike with Virtualenv, where you create the project folder and the env separately, I can create the Poetry project straightaway with the poetry new command. Poetry automatically sets up a project structure and initial files.
Next, I installed the project’s core and dev dependencies with the poetry add command.
To keep this post concise, I won’t explain how I used the dev dependencies. But you can find countless resources on using these packages to keep your code clean.
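For reference, the steps so far boil down to a few Poetry commands: poetry new, then poetry add for the core packages and poetry add with a dev group for tools like black, flake8, and isort. Here is a minimal Python sketch that builds and runs those calls. The helper names are hypothetical, the core package list is a placeholder you supply, and the --group dev spelling assumes Poetry 1.2 or later (Poetry 1.1 used --dev).

```python
import subprocess


def setup_commands(name: str, core: list[str], dev: list[str]) -> list[list[str]]:
    """Build the Poetry calls that scaffold a project and add its dependencies."""
    commands = [["poetry", "new", name]]
    if core:
        commands.append(["poetry", "add", *core])
    if dev:
        # Poetry 1.2+ syntax; Poetry 1.1 used "poetry add --dev ...".
        commands.append(["poetry", "add", "--group", "dev", *dev])
    return commands


def run_setup(name: str, core: list[str], dev: list[str]) -> None:
    """Run the scaffold step, then the add steps inside the new project folder."""
    new_project, *add_steps = setup_commands(name, core, dev)
    subprocess.run(new_project, check=True)
    for command in add_steps:
        subprocess.run(command, cwd=name, check=True)


if __name__ == "__main__":
    # "pandas" is a placeholder for the real core dependencies.
    for command in setup_commands("python-eda", ["pandas"], ["black", "flake8", "isort"]):
        print(" ".join(command))
```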
I then added a file named main.py inside the python_eda folder and replaced its content with the code from my previous post. This is the entry point to everything in my application.
Up to this point, everything is an ordinary Python application.
Now, let’s add a small snippet to the pyproject.toml file to tell Poetry what the entry point is.
What do we do here? We tell Poetry to call the app in main.py, inside the python_eda folder.
Now, with one command, poetry build, you can build the app.
This creates a dist folder inside your project with the wheel and tar files of your package.
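If you want to double-check the result from a script, a small sketch like this runs poetry build and then lists what landed in dist. The function names are hypothetical, it assumes Poetry is on your PATH, and it only checks file extensions (.whl for the wheel, .tar.gz for the source archive) rather than validating the archives.

```python
import subprocess
from pathlib import Path


def built_artifacts(dist_dir="dist") -> dict[str, list[str]]:
    """List wheel and source-archive files in a dist folder (str or Path)."""
    dist = Path(dist_dir)
    if not dist.is_dir():
        return {"wheels": [], "sdists": []}
    return {
        "wheels": sorted(p.name for p in dist.glob("*.whl")),
        "sdists": sorted(p.name for p in dist.glob("*.tar.gz")),
    }


def build_and_check(project_dir: str = ".") -> dict[str, list[str]]:
    """Run `poetry build` and fail loudly if either artifact type is missing."""
    subprocess.run(["poetry", "build"], cwd=project_dir, check=True)
    found = built_artifacts(Path(project_dir) / "dist")
    if not found["wheels"] or not found["sdists"]:
        raise RuntimeError(f"incomplete build output: {found}")
    return found
```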
To test the project locally, run poetry install, and you’ll be able to use the CLI to generate EDA reports.
To publish your package to PyPI, you need a PyPI account and an API token. You can find more information in the official docs.
Once you have the API token, you only need two more commands.
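In the Poetry documentation, these two steps are typically poetry config pypi-token.pypi followed by your token, which stores it, and poetry publish, which uploads the built files (poetry publish --build builds first). Here is a minimal sketch of scripting them from Python. The helper names and the PYPI_TOKEN environment variable are assumptions for this sketch; reading the token from the environment keeps it out of your source code.

```python
import os
import subprocess


def publish_commands(token: str) -> list[list[str]]:
    """Build the two Poetry calls: store the PyPI token, then upload dist/."""
    return [
        ["poetry", "config", "pypi-token.pypi", token],
        ["poetry", "publish"],
    ]


def publish(project_dir: str = ".") -> None:
    """Publish the already-built package, reading the token from the environment."""
    token = os.environ.get("PYPI_TOKEN")  # hypothetical variable name
    if not token:
        raise RuntimeError("set PYPI_TOKEN before publishing")
    for command in publish_commands(token):
        subprocess.run(command, cwd=project_dir, check=True)
```

Alternatively, Poetry can read the token from the POETRY_PYPI_TOKEN_PYPI environment variable, which skips the config step.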
Awesome! The app is published.
python-eda is available for installation through pip.
For many years, I used Virtualenv on every project. Its only advantages were an isolated environment and a list of project dependencies.
But even then, I ran into several issues with it:
- difficulty differentiating between development and production dependencies;
- no way to relocate or rename the project folder;
- difficulty maintaining consistent environments across a team; and
- lots of boilerplate when packaging and publishing.
Poetry comes in as a one-stop solution for all of these problems. It fulfills my long craving for an npm-like package manager for Python.