How Do I Read Multiple Excel Sheets in Python?
Pandas have a few limitations when working with multiple sheets in an Excel file. However, there is a small hack that can help us work around these limitations and make working with numerous sheets more efficient.
One of the challenges when working with multiple sheets in Pandas is keeping track of which sheet data is coming from. We can alleviate this issue by creating a dictionary of sheet names and DataFrames. We can reference the sheets by name instead of remembering which sheet number corresponds to which DataFrame.
Related: A Better Way to Summarize Pandas Dataframes.
Another limitation when working with multiple sheets in Pandas is that the read_excel() function can only read one sheet at a time. If we want to read data from various sheets, we must use the read_excel() function multiple times.
The trick to efficiently reading Excel files with multiple sheets in Pandas
We can work around this limitation by setting the sheet name argument to None in the read_excel function.
df_dict = pd.read_excel('/path/to/exel.xlsx', sheet_name=None)
PythonThe above function creates a dictionary with sheet names in the Excel files as keys and dataframe as values. You can now access the dataframe with its sheet name.
{
'sheet1': pd.DataFrame,
'sheet2': pd.DataFrame,
'sheet3': pd.DataFrame,
}
Pythondf1 = df_dict.get('sheet1')
PythonCombining multiple sheets in an Excel file using Pandas
Often data scientists and analysts receive Excel files with multiple sheets. However, they are just a logical partitioning of a larger dataset in most cases.
For instance, a dataset is partitioned by month and stored in individual sheets rather than one big file.
It is also possible to conveniently combine sheets in an Excel file using the values method of the df_dict.
df = pd.concat(df_dict.values())
PythonPlease don’t loop through sheet names to read them
Before I learned this trick, I was, in fact, looping through the sheet names and reading them individually. The code snippet to do this would look like the one below.
# Create an ExcelFile object by reading the file
f = pd.ExcelFile('path/to/excel.xlsx')
# Create a placeholder array to store the dataframes
dfs = []
# Iterate through each sheet's name
for sheet in f.sheet_names:
# Parse the sheet and convert it to a Pandas dataframe
df = f.parse(sheet)
# append it to our dataframes collection
dfs.append(df)
PythonThis code does the job with not much difficulty. However, we can eliminate these extra lines of code by simply setting sheetname=None when reading the file.