Secure Your Streamlit App With Django

May 16, 2022

Streamlit is an excellent tool for data scientists to convert their work into a web app.

In an earlier post, I discussed the basics of creating an app around K-Means clustering. Users can change parameters such as the number of clusters and visualize the groups changing.

You can read the post below and access the source code from this GitHub repo.

Related: How to Create Stunning Web Apps for your Data Science Projects

In this and a few future posts, I want to share some of the ways I’ve solved challenging issues around Streamlit. This post explains how you could build an authentication system around your Streamlit app.

What’s wrong with Streamlit’s authentication system?

Before using Django, let me first talk about Streamlit’s inbuilt authentication option.

Streamlit is still young and improving. It has a suggested way to authenticate the app using the secrets.toml file.

You specify the users to the app and their passwords in a TOML file and upload it to the deployment environment. Then inside the app, you put a code snippet to lookup and match the password to authenticate.
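Here is a minimal sketch of that kind of lookup, assuming the TOML file exposes a table that maps usernames to plain-text passwords. In a real app the mapping would come from st.secrets; a plain dictionary (demo_passwords, a hypothetical name, as is check_plaintext_login) stands in here so the snippet runs without Streamlit.

```python
import hmac
from typing import Mapping


def check_plaintext_login(
    passwords: Mapping[str, str], username: str, password: str
) -> bool:
    """Return True if `username` exists and `password` equals the stored plain-text value."""
    stored = passwords.get(username)
    if stored is None:
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(stored.encode(), password.encode())


# Hypothetical stand-in for a username -> password table in secrets.toml.
demo_passwords = {"alice": "wonderland", "bob": "builder"}

print(check_plaintext_login(demo_passwords, "alice", "wonderland"))  # True
print(check_plaintext_login(demo_passwords, "alice", "wrong"))  # False
```

Note that the passwords sit in the mapping exactly as typed, which is the root of the concerns with this method.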

I see three main shortcomings in this method.

First, storing passwords in a file is insecure. Anyone who has (or gets) access to the deployment environment can also access the passwords.

Second, there is no password hashing. It’s good practice to store only a hash of each password. A hash is one-way: even if someone manages to read the stored value, they see a scrambled digest that can’t simply be decoded back into the original. To recover a password, the intruder would have to guess candidates and hash each one, and a random per-password salt combined with a deliberately slow hashing algorithm makes that brute-force search expensive.
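To make the idea concrete, here is a small sketch of salted password hashing using only Python's standard library. It is not Django's implementation (Django's default hasher is also based on PBKDF2 with SHA-256, but it manages salts, iteration counts, and storage format for you). The function names and the iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for `password`, generating a random salt if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected_digest: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# Only the salt and digest are stored, never the password itself.
salt, digest = hash_password("wonderland")

print(verify_password("wonderland", salt, digest))  # True
print(verify_password("guess", salt, digest))  # False
```

Verification never decodes anything: it repeats the hash on the candidate password and compares the results.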

Finally, we need an interface to manage the platform’s users. In Streamlit’s suggested approach, if a new user needs access to the system, a developer has to add their credentials to the TOML file.

Besides these three, the inability to manage access control is another reason we need a sophisticated authentication system.

Django has a default admin interface to manage users. It has an authentication mechanism to store users in a database with password hashing. So, let’s use it.

Let’s build a Django authenticator for Streamlit.

In this tutorial, I’ll build on the same project I used in my previous post. You can use the following code to clone it from the GitHub repository.

git clone git@github.com:thuwarakeshm/Streamlit-Intro.git streamdj
cd streamdj
# This will create the following project files
.
├── quickstart.py
├── README.md
└── requirements.txt

Also, let’s create a virtual env and install the requirements.

virtualenv env
source env/bin/activate # On Linux
env\Scripts\activate # On Windows
pip install -r requirements.txt
pip install django # if Django is not already listed in requirements.txt

Now let’s create a Django project inside the folder.

django-admin startproject config .
# The project structure will look like this now
.
├── config
│   ├── asgi.py
│   ├── __init__.py
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
├── manage.py
├── quickstart.py
├── README.md
└── requirements.txt

Let’s continue to create a superuser to manage all other users.

python manage.py migrate
python manage.py createsuperuser
# Follow the on-screen instructions to create the user

Let’s start the server with python manage.py runserver and log in to the admin portal at http://localhost:8000/admin/.

Let’s also create a couple of users to test our app. Click the little + button next to Users and add a few.

Adding new user through Django admin site to authenticate Streamlit apps.

Now, we have our users ready. The next step is to tap into the Django authentication framework to let users access private pages of the Streamlit app.

Authenticate the Streamlit app with Django.

Django’s WSGI application instance should run inside our Streamlit app for this to work. This is required for Django to establish a database connection and authenticate.

We can do this by copying the contents of wsgi.py into our quickstart file, where Streamlit runs.

# Other Streamlit imports
# -----------------------------------------
import os
from django.core.wsgi import get_wsgi_application

# Point Django at the project settings before initializing it
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'config.settings')
application = get_wsgi_application()
# ------------------------------------------
# Rest of the Streamlit content

Now we need another function to display the username and password input fields. This function will also call Django’s authenticate function.

from django.contrib.auth import authenticate  # Django's credential check


def check_password():
    """Returns `True` if the user had a correct password."""

    def password_entered():
        """Checks whether a password entered by the user is correct."""
        user = authenticate(
            username=st.session_state['username'], 
            password=st.session_state['password']
            )
        
        if (user is not None):
            st.session_state["password_correct"] = True
            del st.session_state["password"]  # don't store username + password
            del st.session_state["username"]
        else:
            st.session_state["password_correct"] = False

    if "password_correct" not in st.session_state:
        # First run, show inputs for username + password.
        st.text_input("Username", on_change=password_entered, key="username")
        st.text_input(
            "Password", type="password", on_change=password_entered, key="password"
        )
        return False
    elif not st.session_state["password_correct"]:
        # Password not correct, show input + error.
        st.text_input("Username", on_change=password_entered, key="username")
        st.text_input(
            "Password", type="password", on_change=password_entered, key="password"
        )
        st.error("😕 User not known or password incorrect")
        return False
    else:
        # Password correct.
        return True

We now need to call this function and proceed with the rest of the Streamlit app only if it returns True.

if check_password():
    # Our regular Streamlit app.
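The gating logic above boils down to three states: first run, rejected, and authenticated. The sketch below mirrors it without Streamlit or Django so the transitions are easy to follow. A plain dictionary stands in for st.session_state, and fake_authenticate is a hypothetical stand-in for Django's authenticate; submit_credentials and login_state are illustrative names, not part of either library.

```python
from typing import Any, Callable, MutableMapping, Optional


def submit_credentials(
    session: MutableMapping[str, Any],
    authenticate: Callable[[str, str], Optional[object]],
) -> None:
    """Mimic the on_change callback: verify, record the result, drop the secrets."""
    user = authenticate(session["username"], session["password"])
    if user is not None:
        session["password_correct"] = True
        del session["password"]  # don't keep credentials around
        del session["username"]
    else:
        session["password_correct"] = False


def login_state(session: MutableMapping[str, Any]) -> str:
    """Map the session contents to the branch check_password would take."""
    if "password_correct" not in session:
        return "first_run"
    if not session["password_correct"]:
        return "rejected"
    return "authenticated"


def fake_authenticate(username: str, password: str) -> Optional[str]:
    """Hypothetical stand-in that accepts exactly one demo user."""
    return username if (username, password) == ("alice", "wonderland") else None


session: dict = {}
print(login_state(session))  # first_run

session.update(username="alice", password="oops")
submit_credentials(session, fake_authenticate)
print(login_state(session))  # rejected

session.update(username="alice", password="wonderland")
submit_credentials(session, fake_authenticate)
print(login_state(session))  # authenticated
print("password" in session)  # False
```

Only the authenticated state lets the rest of the app render; the other two show the login inputs again.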

Here’s the complete version of the quickstart.py file.

# Imports
# -----------------------------------------------------------
import os

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import streamlit as st
from django.core.wsgi import get_wsgi_application
from sklearn.cluster import KMeans

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "config.settings")

application = get_wsgi_application()

from django.contrib.auth import authenticate


def check_password():
    """Returns `True` if the user had a correct password."""

    def password_entered():
        """Checks whether a password entered by the user is correct."""
        user = authenticate(
            username=st.session_state["username"], password=st.session_state["password"]
        )

        if user is not None:
            st.session_state["password_correct"] = True
            del st.session_state["password"]  # don't store username + password
            del st.session_state["username"]
        else:
            st.session_state["password_correct"] = False

    if "password_correct" not in st.session_state:
        # First run, show inputs for username + password.
        st.text_input("Username", on_change=password_entered, key="username")
        st.text_input(
            "Password", type="password", on_change=password_entered, key="password"
        )
        return False
    elif not st.session_state["password_correct"]:
        # Password not correct, show input + error.
        st.text_input("Username", on_change=password_entered, key="username")
        st.text_input(
            "Password", type="password", on_change=password_entered, key="password"
        )
        st.error("😕 User not known or password incorrect")
        return False
    else:
        # Password correct.
        return True


if check_password():

    sns.set_theme()
    # -----------------------------------------------------------

    # Helper functions
    # -----------------------------------------------------------
    # Load data from external source
    # Note: newer Streamlit releases replace st.cache with st.cache_data.
    @st.cache
    def load_data():
        df = pd.read_csv(
            "https://raw.githubusercontent.com/ThuwarakeshM/PracticalML-KMeans-Election/master/voters_demo_sample.csv"
        )
        return df

    df = load_data()

    def run_kmeans(df, n_clusters=2):
        kmeans = KMeans(n_clusters, random_state=0).fit(df[["Age", "Income"]])

        fig, ax = plt.subplots(figsize=(16, 9))

        ax.grid(False)
        ax.set_facecolor("#FFF")
        ax.spines[["left", "bottom"]].set_visible(True)
        ax.spines[["left", "bottom"]].set_color("#4a4a4a")
        ax.tick_params(labelcolor="#4a4a4a")
        ax.yaxis.label.set(color="#4a4a4a", fontsize=20)
        ax.xaxis.label.set(color="#4a4a4a", fontsize=20)
        # --------------------------------------------------

        # Create scatterplot
        ax = sns.scatterplot(
            ax=ax,
            x=df.Age,
            y=df.Income,
            hue=kmeans.labels_,
            palette=sns.color_palette("colorblind", n_colors=n_clusters),
            legend=None,
        )

        # Annotate cluster centroids
        for ix, [age, income] in enumerate(kmeans.cluster_centers_):
            ax.scatter(age, income, s=200, c="#a8323e")
            ax.annotate(
                f"Cluster #{ix+1}",
                (age, income),
                fontsize=25,
                color="#a8323e",
                xytext=(age + 5, income + 3),
                bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="#a8323e", lw=2),
                ha="center",
                va="center",
            )

        return fig

    # -----------------------------------------------------------

    # SIDEBAR
    # -----------------------------------------------------------
    sidebar = st.sidebar
    df_display = sidebar.checkbox("Display Raw Data", value=True)

    n_clusters = sidebar.slider(
        "Select Number of Clusters",
        min_value=2,
        max_value=10,
    )

    sidebar.write(
        """
        Hey friend! It seems we have lots of common interests.
        I'd love to connect with you on 
        - [LinkedIn](https://linkedin.com/in/thuwarakesh/)
        - [Twitter](https://www.twitter.com/thuwarakesh/)
        And please follow me on [Medium](https://thuwarakesh.medium.com/), because I write about data science.
        """
    )
    # -----------------------------------------------------------

    # Main
    # -----------------------------------------------------------
    # Create a title for your app
    st.title("Interactive K-Means Clustering")
    """
    An illustration by [Thuwarakesh Murallie](https://thuwarakesh.medium.com) for the Streamlit introduction article on Medium.
    """

    # Show cluster scatter plot
    st.write(run_kmeans(df, n_clusters=n_clusters))

    if df_display:
        st.write(df)
    # -----------------------------------------------------------

Test the authentication flow

Our app now sits behind a login. Let’s start the Streamlit app and try it out.

streamlit run quickstart.py

The above command should start the server and open the app in your default browser. Instead of your Streamlit app, now you’ll see the following login page.

Streamlit login page powered by Django authentication system.

You’ll also see an error message for incorrect or empty passwords. You must type in the correct username and password combination to log in and view your Streamlit app.

Final Thoughts

Streamlit is a fantastic solution for Data Scientists. It allows creating web apps around their machine learning models without worrying about HTML, CSS, JavaScript, and other web development complexities.

But the tool is still very young. Thus its authentication feature needs improvement. I trust the amazing development team behind the tool will build it soon.

But for the moment, to securely deploy our Streamlit apps, we can borrow the authentication capabilities from one of the mature web development frameworks.

In this post, we used Django authentication to secure Streamlit apps. It addresses some of the key missing features of Streamlit’s suggested method. Django stores passwords hashed and inside a database. Also, we get a web interface to manage users.

This isn’t complete yet without access control. But that's for a future post.


Thanks for reading, friend! Say Hi to me on LinkedIn, Twitter, and Medium. It seems you and I have lots of common interests.
