How Do I Read Multiple Excel Sheets in Python? | 1
|

How Do I Read Multiple Excel Sheets in Python?

Pandas have a few limitations when working with multiple sheets in an Excel file. However, there is a small hack that can help us work around these limitations and make working with numerous sheets more efficient.

One of the challenges when working with multiple sheets in Pandas is keeping track of which sheet data is coming from. We can alleviate this issue by creating a dictionary of sheet names and DataFrames. We can reference the sheets by name instead of remembering which sheet number corresponds to which DataFrame.

Related: A Better Way to Summarize Pandas Dataframes.

Another limitation when working with multiple sheets in Pandas is that the read_excel() function can only read one sheet at a time. If we want to read data from various sheets, we must use the read_excel() function multiple times.

The trick to efficiently reading Excel files with multiple sheets in Pandas

We can work around this limitation by setting the sheet name argument to None in the read_excel function.

df_dict = pd.read_excel('/path/to/exel.xlsx', sheet_name=None)
Python

The above function creates a dictionary with sheet names in the Excel files as keys and dataframe as values. You can now access the dataframe with its sheet name.

{
  'sheet1': pd.DataFrame,
  'sheet2': pd.DataFrame,
  'sheet3': pd.DataFrame,
}
Python

df1 = df_dict.get('sheet1')
Python

Combining multiple sheets in an Excel file using Pandas

Often data scientists and analysts receive Excel files with multiple sheets. However, they are just a logical partitioning of a larger dataset in most cases.

For instance, a dataset is partitioned by month and stored in individual sheets rather than one big file.

It is also possible to conveniently combine sheets in an Excel file using the values method of the df_dict.

df = pd.concat(df_dict.values())
Python

Please don’t loop through sheet names to read them

Before I learned this trick, I was, in fact, looping through the sheet names and reading them individually. The code snippet to do this would look like the one below.

# Create an ExcelFile object by reading the file
f = pd.ExcelFile('path/to/excel.xlsx')

# Create a placeholder array to store the dataframes
dfs = []

# Iterate through each sheet's name
for sheet in f.sheet_names:
    
    # Parse the sheet and convert it to a Pandas dataframe
    df = f.parse(sheet)

    # append it to our dataframes collection
    dfs.append(df)
Python

This code does the job with not much difficulty. However, we can eliminate these extra lines of code by simply setting sheetname=None when reading the file.

Similar Posts