Creating Wordclouds in Python: The Quick and Easy Guide

May 7, 2022

Word clouds are a great way to visualize text data, and Python makes it easy to create one. In this post, we'll go over an example of using the Wordcloud library to generate word clouds.

If you don't know what it is, a word cloud visualizes how often words appear in a given piece of text. The more often a word appears, the larger it will be in the word cloud.
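To make the idea concrete, here is a minimal sketch of the counting that sits behind a word cloud, using only the standard library. The sample sentence and the simple regex tokenizer are assumptions for illustration; the Wordcloud library does its own, more careful text processing.

```python
import re
from collections import Counter

# Hypothetical sample text, used only for illustration.
sample = "the cat sat on the mat and the cat slept"

# Lowercase the text and pull out runs of letters as words.
words = re.findall(r"[a-z']+", sample.lower())

# Count how often each word appears.
counts = Counter(words)

# The most frequent words would be drawn largest in a word cloud.
top_words = counts.most_common(2)
print(top_words)
```

A real word cloud would usually drop very common words such as "the" before sizing the rest, which the Wordcloud library does by default.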

Word clouds are great for finding out customer sentiment from reviews. Also, some websites use word clouds to show the most popular topics on the site.

For a one-off word cloud, you can use an online tool such as the MonkeyLearn word cloud generator. But you'll often need to generate word clouds dynamically or in large batches, and that's where Python helps.

So, let's get started!

1. Install the Wordcloud library

Wordcloud is a free and open-source Python library. As of writing this post, Wordcloud's GitHub repo has 8.7k stars and 2.2k forks, and 62 people contribute to it.

You can install the Wordcloud library from PyPI.

pip install wordcloud

If you're using Poetry to manage Python packages, you can use the following command.

poetry add wordcloud

2. Load Data

For this tutorial, we'll be using the following text.

"Python is a widely used high-level interpreted language for general-purpose programming. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace. It provides constructs that enable clear programming on both small and large scales."

You can set this as a variable in Python as shown below.

text = """
Python is a widely used high-level interpreted language for general-purpose programming. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace. It provides constructs that enable clear programming on both small and large scales.
"""

However, you might have to load a lot more data than this. For instance, if you have a dataset of reviews, you have to load them in a different way.

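You can create word clouds from a pandas DataFrame. The following code prepares a combined review text. It

  • reads a pandas DataFrame from a CSV file,
  • joins all the user reviews, separated by spaces, and
  • assigns the result to a new variable, 'text'.

import pandas as pd

# sep=" " keeps the last word of one review from merging with the first word of the next
text = pd.read_csv("data.csv").Reviews.str.cat(sep=" ")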

If you don't have a pandas dataframe, you can also read the text from a file.

with open("data.txt", "r") as file:
    text = file.read()

Either way, you should end up with a string of text.
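If pandas isn't available, the standard library's csv module can produce the same kind of string. This is only a sketch: the in-memory CSV below is made-up data standing in for data.csv, and the "Reviews" column name is carried over from the pandas example.

```python
import csv
import io

# Hypothetical CSV content standing in for data.csv.
csv_data = io.StringIO(
    "Reviews\n"
    "Great product\n"
    "Fast delivery\n"
)

# Read each row as a dict and collect the review text.
reviews = [row["Reviews"] for row in csv.DictReader(csv_data)]

# Join with a space so words from neighboring reviews don't run together.
text = " ".join(reviews)
print(text)
```

For a real file, you would pass an open file handle to csv.DictReader instead of the io.StringIO object.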

3. Generate the word cloud

Now that we have the text data loaded, let's move on to creating the word cloud.

Creating a word cloud is easy with the Wordcloud library. The following code will create a word cloud from the text we loaded earlier.

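from wordcloud import WordCloud

import matplotlib.pyplot as plt
# Jupyter-only magic command; remove this line when running a plain script
%matplotlib inline

# Build the word cloud from the text loaded earlier
wordcloud = WordCloud().generate(text)

# Draw the word cloud and hide the axes
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()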

First, we import the WordCloud class from the wordcloud library. Then, we import matplotlib. The %matplotlib inline magic command makes the word cloud appear inline in a Jupyter notebook.

Then, we create a WordCloud instance and generate the word cloud using the text variable.

Finally, we use the plt.imshow() function to display the word cloud. The word cloud is displayed using the default settings.

Wordcloud generated in Python

If you want to change the appearance of the word cloud, you can use different settings. For example, you can change the background color, the max_words, the max_font_size, and so on.

The following code shows how to change the background color to #e2e1eb and the max_words to 10.

wordcloud = WordCloud(background_color="#e2e1eb", max_words=10).generate(text)

plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")

As you can see, the word cloud looks different with these settings. Play around with the settings to get the word cloud that you want.

Create Word clouds in a shape

You can get creative with word clouds. Set the mask option to an image array, and the words are drawn in the shape of that image. The masking image should have a black object on a white background: the library skips pure white areas and draws words everywhere else. Here's how we made our word cloud into a heart shape.

from PIL import Image
import numpy as np

mask_img = np.array(Image.open("./Untitled design.png"))

wordcloud = WordCloud(background_color="#e2e1eb", max_words=100, mask=mask_img).generate(text)

The above code will result in the following image.

Word cloud created in a heart shape using Python
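If you don't have a mask image at hand, you can build one directly with NumPy. The sketch below draws a black circle on a white background, following the black-on-white convention described above. The 300-pixel size and 130-pixel radius are arbitrary values chosen for illustration.

```python
import numpy as np

size = 300    # width and height of the mask in pixels (arbitrary)
radius = 130  # radius of the circle in pixels (arbitrary)

# Coordinate grids covering every pixel in the image.
y, x = np.ogrid[:size, :size]
center = size // 2

# True for pixels that fall inside the circle.
inside = (x - center) ** 2 + (y - center) ** 2 <= radius ** 2

# White (255) everywhere, black (0) inside the circle.
mask_img = np.where(inside, 0, 255).astype(np.uint8)
```

The resulting array can be passed as mask=mask_img to WordCloud, in place of the array loaded from a PNG file.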

Conclusion

Word clouds are an excellent way to communicate which topics are hot.

We've briefly discussed how to create a word cloud and customize its appearance using Python. We've also created a word cloud in a custom shape. The complete code below additionally exports the result to a PNG file.

Here's what the complete code will look like.

import numpy as np
from PIL import Image
from wordcloud import WordCloud

import matplotlib.pyplot as plt
# Jupyter-only magic command; remove this line when running a plain script
%matplotlib inline

text = """
Python is a widely used high-level interpreted language for general-purpose programming. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace. It provides constructs that enable clear programming on both small and large scales.
"""

# Read the mask image and convert it to an array. It should have a white (not transparent) background with a black object.
mask_img = np.array(Image.open("./heart.png"))

# Generate the word cloud inside the mask shape
wordcloud = WordCloud(background_color="#e2e1eb", max_words=100, mask=mask_img).generate(text)

# Store to file
wordcloud.to_file("wc.png")

# Show the image
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
