Python Web Apps Are a Terrible Idea for Analytics Projects.

Aug. 1, 2021

It's instinct. We data scientists love Python. Thus we gravitate towards Python frameworks in every application. And the omnipotent language seems to work well in most cases too. One such scenario is building web apps for your analytics projects.

The watchword there is "most cases." Python is a beautiful language for almost any problem. Yet a closer look reveals nuances that can make Python a poor fit in some instances.

For many years, I've been a fan of Django. It's the most popular Python framework for building web applications. Django comes with everything a typical web framework would need: authentication, database migrations, the admin interface, and a lot more. Integrating a machine learning algorithm is effortless because both are in the same language.

Yet I had to change my mind after discovering a terrifying truth about the framework, and more specifically about its use in data science projects. But it isn't a Django thing; it's Python. You'll face the same issue even if you use other Python web frameworks such as Flask.

But before you rush to dump Django or Flask, I have to say it isn't a dead end. We can make Python great again as a web framework.

In this article, we will

  • compare Flask with ExpressJS on a long-running task;
  • investigate why Python web apps fall apart in analytics projects; and
  • look at workarounds to make Python web apps serve requests better.

Python and JavaScript web apps.

We need a demonstration to grasp the issue with Python compared to JavaScript.

Hence, let's use one popular framework in each language to perform the same task: calculating the 30th Fibonacci number. Here is how we do it using Flask (Python).

from flask import Flask

app = Flask(__name__)


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


@app.route("/")
def hello_world():
    return {"data": fib(30)}

To run the above task, save the code as app.py and use the commands below in a terminal:

pip install flask  # if Flask is not installed yet
flask run          # picks up app.py and listens on port 5000 by default

Now let's do the same with Express JS (JavaScript):

const express = require("express");
const app = express();
const port = 5000;

const fib = (n) => {
  if (n < 2) return n;
  return fib(n - 1) + fib(n - 2);
};

app.get("/", (req, res) => {
  res.send({ data: fib(30) });
});

app.listen(port, () => {
  console.log(`Example app listening at http://localhost:${port}`);
});

Here is how you can start the server using node:

npm install express --save  # if Express is not installed already
node app.js

As you can see, the two apps are equivalent. Each takes no parameters in the request, calculates the 30th Fibonacci number, and returns the value in the response.

We start both apps in development mode in a single thread. It is precisely what we need because we want to measure their performance in a single thread.

Finally, let's simulate the real world by firing multiple requests at the server and recording the elapsed time. The code below sends 1000 requests to our servers from 10 parallel worker processes. The program also prints the average elapsed time in milliseconds.

import requests
from multiprocessing import Pool


def fetch(i):
    # elapsed is a timedelta; total_seconds() covers the full duration,
    # whereas .microseconds is only the sub-second component.
    return requests.get("http://localhost:5000/").elapsed.total_seconds() * 1000


if __name__ == "__main__":
    # 10 worker processes share the 1000 requests between them.
    with Pool(10) as p:
        res_times = p.map(fetch, range(1000))

    avg_time = sum(res_times) / len(res_times) if res_times else 0

    print(f"On average each request took {round(avg_time)} milliseconds.\n\n")

Let's compare the results. Here are the results when the Node server is serving:

NodeJS response time

And here is the same when Flask is serving the requests:

Flask response time

Repeating this experiment several times gives almost the same numbers. Express (JavaScript) serves requests nearly four times faster than Flask (Python).

Calculating the 30th Fibonacci number isn't a long-running task. But it is enough to help us understand the magnitude of the issue.

What makes Python webservers slow?

You go to a restaurant named 'Sync,' where the waiter's name is Python. A stranger orders his food first. Python goes to the kitchen, returns twenty minutes later, and serves the stranger. Then he comes to you and asks, "What can I get you, Sir?"

If you order now, you'll have to wait another twenty minutes.

Out of frustration, you leave that restaurant and move to a different one called 'Async.' There, the name of the waiter is JavaScript. Again, a stranger orders his food first. JavaScript goes to the kitchen and returns in a minute. He says a quick word to the stranger, then comes over to you and asks, "What can I get you, Sir?"

You order your food; the stranger gets his in the next eighteen minutes, and you get yours in twenty.

This is how Python and JavaScript work. Python works on tasks one at a time, synchronously. JavaScript, by contrast, takes in another request while one is already in progress. This asynchronous behavior makes JavaScript-based web applications faster.
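The two waiters can be sketched in a few lines of Python. This is only an illustration of the analogy, not the benchmark above: the function names are hypothetical, and the sleeps stand in for the kitchen, so they model time spent waiting rather than time spent computing.

```python
import asyncio
import time


def serve_sync(orders, cook_time):
    """Waiter 'Sync': waits in the kitchen for each order before taking the next."""
    served = []
    for order in orders:
        time.sleep(cook_time)  # blocked; nobody else is served meanwhile
        served.append(order)
    return served


async def cook(order, cook_time):
    # await hands control back to the event loop while the "kitchen" works
    await asyncio.sleep(cook_time)
    return order


async def serve_async(orders, cook_time):
    """Waiter 'Async': places every order, then collects them as they finish."""
    return await asyncio.gather(*(cook(o, cook_time) for o in orders))


def timed(fn):
    """Run fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    orders = ["stranger", "you"]
    _, sync_secs = timed(lambda: serve_sync(orders, 0.2))
    _, async_secs = timed(lambda: asyncio.run(serve_async(orders, 0.2)))
    print(f"sync: {sync_secs:.2f}s, async: {async_secs:.2f}s")
```

With two orders, the synchronous waiter needs roughly twice the cooking time, while the asynchronous one needs roughly the cooking time once.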

How to make Python web apps perform better for analytics?

As long as you have only short-lived requests, or long-running tasks with only a few anticipated requests from users, you're good. Python frameworks like Flask and Django are ideal because you can keep everything in one language: Python.

The complexity arises when you have long-running tasks with significant demand. To keep the server up, you may have to deploy multiple instances and load-balance them well. This workaround is not sustainable in most cases because it drives costs up sharply.

Data science projects often come with such long-running tasks. Heavy computations without asynchronous behavior may demand more computational power. Yet, there are some workarounds you can try before moving into a different framework.

Decouple computation from the request-response cycle.

Coupling high-performance computing with web servers is a terrible idea anyway. Even asynchronous servers aren't meant to run heavy computations within the request-response cycle. If so, how do gigantic platforms such as Instagram and Spotify work with massive computations? Here's an article to answer that question.

Related: How to Serve Massive Computations Using Python Web Apps.

The idea is to send a message to a computation engine that runs on a separate thread. The web server does not need to wait until the computation finishes to send the response. Instead, the computation engine writes its result to the database, and the web server can read that value at any time through a separate endpoint.
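A minimal sketch of this pattern, using only the standard library, might look like the following. The names submit, status, and computation_engine are hypothetical, and a plain dictionary stands in for the database; a real deployment would more likely use a dedicated task queue and a real data store.

```python
import queue
import threading
import uuid

jobs = queue.Queue()  # messages from the web server to the computation engine
results = {}          # stand-in for the database the engine writes to
results_lock = threading.Lock()


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def computation_engine():
    """Runs on its own thread, pulling jobs and storing their results."""
    while True:
        job_id, n = jobs.get()
        value = fib(n)
        with results_lock:
            results[job_id] = {"status": "done", "data": value}
        jobs.task_done()


def submit(n):
    """What a request handler would do: enqueue the job and return at once."""
    job_id = uuid.uuid4().hex
    with results_lock:
        results[job_id] = {"status": "pending"}
    jobs.put((job_id, n))
    return job_id


def status(job_id):
    """What a separate endpoint would do: read the stored value."""
    with results_lock:
        return dict(results.get(job_id, {"status": "unknown"}))


# Daemon thread so the engine does not keep the process alive on exit
threading.Thread(target=computation_engine, daemon=True).start()

if __name__ == "__main__":
    job_id = submit(20)
    jobs.join()  # only for the demo; a web server would not wait here
    print(status(job_id))
```

The handler returns a job id immediately, and the client polls the status endpoint until the result is marked as done.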

Try optimizing your code.

Regardless of which framework you use, you should always try to optimize your code. A recent library seems to work remarkably well at optimizing Python programs.

Tuplex compiles your Python code to native code through LLVM and runs it in parallel. In a research article, the team behind the library showed it to be about 91X faster. Here is a detailed article I wrote on the topic before:

Related: How to Speed up Python Data Pipelines up to 91X?
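Optimization does not always need a new library. As a small illustration that is separate from Tuplex, the recursive fib function from the benchmark recomputes the same values many times; caching them with functools.lru_cache from the standard library removes that repeated work. This is a sketch of one possible optimization, not a measurement of the servers above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    # Each value is computed once; later calls are served from the cache
    return n if n < 2 else fib(n - 1) + fib(n - 2)


if __name__ == "__main__":
    print(fib(30))  # 832040
```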

Bring in asynchronous behavior.

Both Flask and Django have ways to bring in asynchronous behavior. For potentially time-consuming tasks, this workaround is a wonderful alternative.

Yet the documentation for both frameworks notes that these features come with several drawbacks. Hence, using them across all your web requests is not advisable.

asyncio, part of Python's standard library, helps you make some of your functions asynchronous. Try using it whenever possible.

Because Python is not natively asynchronous, all these workarounds depend on running an infinite event loop. Although they aren't perfect in many respects, they are worth considering before migrating.
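One way to keep such an event loop responsive is to hand a blocking call to an executor with asyncio's run_in_executor. The sketch below is an assumption about how this could be wired up, not code from Flask or Django; handle_request and heartbeat are hypothetical names. A thread pool is used for simplicity, and for heavily CPU-bound work a ProcessPoolExecutor could be passed in its place.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


async def handle_request(n, pool):
    loop = asyncio.get_running_loop()
    # Hand the blocking call to the pool; the event loop stays free meanwhile
    value = await loop.run_in_executor(pool, fib, n)
    return {"data": value}


async def heartbeat(ticks):
    # Shows that the loop keeps scheduling other work alongside the request
    for _ in range(3):
        ticks.append("tick")
        await asyncio.sleep(0)


async def main():
    ticks = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        response, _ = await asyncio.gather(
            handle_request(20, pool), heartbeat(ticks)
        )
    return response, ticks


if __name__ == "__main__":
    print(asyncio.run(main()))
```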

Final Thoughts

Python is a fantastic programming language for data science. Since it's a general-purpose language, Python's applications aren't limited to data science. But that doesn't mean it's the perfect language for every use case.

Because most data scientists already love Python, we choose frameworks like Flask and Django to build web apps. Yet their synchronous behavior may lead to serious cost disadvantages in production.

JavaScript, which is asynchronous by nature, tends to perform better with long-running tasks. But Python frameworks also have workarounds to improve their performance. Hence, it's worth considering them before deciding to migrate.

If you still prefer a Python web app for your data science projects, try Streamlit.

Related: How to Create Stunning Web Apps for your Data Science Projects
