I was daunted by the complexities of projects when I started my data science career. We were using Virtualenv in all our Python projects.
I was impressed by the Node Package Manager (npm) and always wondered why Python didn't have something like it.
I was yearning for a single tool to maintain isolated environments, manage dev and production dependencies, packaging, and publishing.
Thankfully, we have Poetry now.
In a nutshell, Poetry is a tool for dependency management and packaging in Python.
But this official definition is incomplete because I found Poetry does more than manage dependencies and packaging.
Poetry is not a substitute for virtual environments. It complements them with intelligent ways to manage environments and more.
Here's why I fell in love with Poetry at first sight.
Unlike Virtualenv, I can rename and relocate my project with Poetry.
Virtual environments are tied to a specific path. If I move or rename the project folder, the paths stored inside the environment don't change with it.
That meant trouble whenever I wanted to do either.
Every time I changed the path, I had to create a new virtual environment and install the packages again.
It was painstaking.
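The path coupling is easy to see for yourself. The sketch below is a minimal illustration, assuming the standard-library venv module as a stand-in for Virtualenv (it is not Virtualenv's own code, though both generate similar activation scripts). It creates a throwaway environment and checks whether the activate script contains the environment's absolute path. The demo-env name is only a placeholder.

```python
import os
import sys
import tempfile
import venv
from pathlib import Path


def activate_script_mentions_path(env_dir: Path) -> bool:
    """Return True if the env's activate script contains env_dir verbatim."""
    scripts = "Scripts" if sys.platform == "win32" else "bin"
    activate = env_dir / scripts / "activate"
    return str(env_dir) in activate.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    # Resolve symlinks first so the path we search for matches what venv writes.
    env_dir = Path(os.path.realpath(tmp)) / "demo-env"
    venv.create(env_dir, with_pip=False)  # skip pip to keep the demo fast
    hard_coded = activate_script_mentions_path(env_dir)
    print("absolute path hard-coded in activate script:", hard_coded)
```

Because that path is written into the environment's own files, moving the folder leaves the scripts pointing at a location that no longer exists.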
Virtualenv has a --relocatable flag to help with it. But I wasn't satisfied with this option either. Every time I installed a new package, I had to flag the environment as relocatable again.
It's annoyingly repetitive! I believe data scientists and developers have bigger problems than remembering to run this every time.
Poetry projects are relocatable.
Poetry isolates the virtualenv from the project. It automatically creates an env in the .cache folder in the $HOME directory. When I relocate the project, I can tell Poetry to use the same env in a single command.
poetry env use <your env location>
If you prefer to have the env in a custom location, you can specify the path the same way. That way you can tie it to an external environment.
I find it incredibly useful for testing purposes. If my code needs to be compatible with different Python versions, I can change the interpreter anytime.
poetry env use python3.8
poetry env use python3.6
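After switching interpreters, it can help to confirm which one the environment actually runs. Here is a small sketch, assuming you save it as check_env.py (a hypothetical file name) and launch it through Poetry, for example with poetry run python check_env.py.

```python
import sys


def describe_interpreter() -> dict:
    """Summarise the interpreter that is executing this script."""
    return {
        "version": f"{sys.version_info.major}.{sys.version_info.minor}",
        "executable": sys.executable,
        # Inside a virtual environment, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If the reported version is not the one you just selected, the command probably ran outside the Poetry environment.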
In Poetry, I can manage development dependencies separately.
This one is an obvious drawback of Virtualenv.
Virtualenv manages dependencies in an isolated environment, but it doesn't maintain a separate set of them for development only.
Yet, Python packages such as black, flake8, and isort are only needed for development. They have no purpose in a production server.
I also have to be extra careful about the security risks of having development packages on a production server.
I usually maintained two requirements.txt files to differentiate them. But this practice is highly inefficient. I can use pip freeze to update the development version, but I have to edit the production one manually.
That's enough reason to ruin the whole day with frustration.
Poetry, on the other hand, has intelligent ways to manage project dependencies.
When adding a new package to the project, I can specify that it's only for development using the -D flag.
poetry add -D black
When I install dependencies on a production server, I can use the --no-dev flag to filter out dev dependencies.
poetry install --no-dev
I can also remove redundant packages I was using in the past with the --remove-untracked flag.
poetry install --remove-untracked
Managing dependencies for Python projects has never been easier.
Poetry ensures consistent versioning among team members.
If you work in a team, you've probably already experienced problems because of inconsistencies.
People use different versions of dependencies, so the code either breaks or doesn't give the expected results.
Of course, the pip freeze command does capture the versions of packages. But it doesn't record the Python interpreter version. Also, if you add a package manually to the requirements file and don't specify the version, it'll create inconsistencies.
Poetry has a clever way of maintaining consistency.
The pyproject.toml file is Poetry's equivalent of the requirements.txt file used with Virtualenv. But when Poetry installs packages, it first checks if there is a poetry.lock file available. If so, it'll fetch the dependencies from the lock file.
You don't edit the lock file manually. Poetry creates and updates it every time you alter project dependencies. You can either use the poetry add command or specify dependencies in the pyproject.toml file and run the install command.
poetry add pandas
The Poetry docs encourage you to commit the lock file to your code repository and share it with other members. That way, when they set up dependencies, their environment is always in sync with everyone else's.
Packaging and publishing Python packages.
If you publish packages to PyPI or other repositories, you have to build them in a format the index accepts. It hasn't been an easy task for me.
There are lots of configurations involved and they certainly discourage new authors.
Yet, with Poetry, I was able to publish packages to any repository for much less effort.
Here's a package I published to PyPI using Poetry. It did not take more than a couple of minutes to do it.
This package helps you generate HTML analysis reports for any dataset in a single terminal command.
Guess what, it's pip installable:
pip install python-eda
You can find the source code in this GitHub repository. Also, if you like this package, you may want to check out my article about it.
Before wrapping up I want to take you through the exact steps I followed to publish this package.
Let's create and publish a Python project with Poetry.
Unlike with Virtualenv, where you create the project folder and then the env, I can create the Poetry project straightaway. Poetry automatically sets up a project structure and initial files.
poetry new python-eda
cd python-eda/
Next, I installed the project's core dependencies and dev dependencies with the add command.
poetry add pandas sweetviz typer
poetry add -D black flake8 isort pre-commit
To keep this post concise, I'm not going to explain how I used the dev dependencies. But you can find countless resources on how to use these packages to maintain clean code.
I then added a file named main.py inside the python_eda folder. I replaced its content with the code from my previous post. This is the entry point to everything in my application.
Up to this point, everything is an ordinary Python application.
Now, let's add a small snippet to the pyproject.toml file to tell Poetry which one is the entry point.
[tool.poetry.scripts]
pyeda = "python_eda.main:app"
What do we do here? We point the pyeda command to the app object in main.py, which lives in the python_eda folder.
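The "python_eda.main:app" string follows the common module:attribute convention for entry points. The sketch below is not Poetry's or pip's actual implementation, just a minimal illustration of how such a string can be turned into a callable. The resolve_entry_point name is my own, and the demo uses a standard-library target so it runs without the project installed.

```python
import importlib
from typing import Any


def resolve_entry_point(spec: str) -> Any:
    """Resolve a "package.module:attribute" string to the object it names."""
    module_path, _, attr_path = spec.partition(":")
    if not module_path or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    obj = importlib.import_module(module_path)
    # Walk dotted attribute paths such as "SomeClass.method".
    for name in attr_path.split("."):
        obj = getattr(obj, name)
    return obj


# Demonstrated with a standard-library target; in this project the
# spec would be "python_eda.main:app".
dumps = resolve_entry_point("json:dumps")
print(dumps({"resolved": True}))  # prints {"resolved": true}
```

The part before the colon is imported as a module, and the part after it is looked up on that module and called when you type the command.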
Now with one command, you can build the app.
poetry build
This will create a dist folder inside your project with wheel and tar files of your project.
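A quick way to check what landed in the dist folder is to group the files by type. This is a small sketch with a helper name of my own (classify_artifacts). The file names in the demo are hypothetical stand-ins created in a temporary folder, so no real build is needed to try it.

```python
import tempfile
from pathlib import Path


def classify_artifacts(dist_dir: Path) -> dict[str, list[str]]:
    """Group the files in a dist folder into wheels and source archives."""
    found: dict[str, list[str]] = {"wheel": [], "sdist": []}
    for path in sorted(dist_dir.iterdir()):
        if path.name.endswith(".whl"):
            found["wheel"].append(path.name)
        elif path.name.endswith(".tar.gz"):
            found["sdist"].append(path.name)
    return found


# Simulate a dist folder with hypothetical file names instead of running a build.
with tempfile.TemporaryDirectory() as tmp:
    dist = Path(tmp)
    (dist / "python_eda-0.1.0-py3-none-any.whl").touch()
    (dist / "python-eda-0.1.0.tar.gz").touch()
    artifacts = classify_artifacts(dist)
    print(artifacts)
```

Pointing the function at your project's real dist folder should show one entry of each kind after a successful build.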
To test the project locally, you can run poetry install, and you'll be able to use the CLI to generate EDA reports.
poetry install
pyeda report https://raw.githubusercontent.com/ThuwarakeshM/analysis-in-the-blink-of-an-eye/main/titanic.csv
To publish your package to PyPI, you need an account and an API token. You can find more information in the official docs.
Once you have the API token, you only need two more commands.
poetry config pypi-token.pypi <TOKEN>
poetry publish
Awesome! The app is published.
python-eda is available for installation through pip.
For many years, I've used Virtualenv on every project. The only advantages of using it were an isolated environment and a list of project dependencies.
But even then, I had several issues with it, such as:
- difficulty differentiating between development and production dependencies;
- inability to relocate or rename the project folder;
- difficulty maintaining consistent environments across team members; and
- lots of boilerplate when packaging and publishing.
Poetry comes in as a one-stop solution for all of these problems. It fulfills my long craving for an npm-like package manager for Python.