I have an excel file that has a small table with the title on a single sheet. (Like this)
Requirement For each table, I need a CSV file with its title.
Try this code. First, I try to get data in dataframe but getting null value dataframe
import pandas as pd
# Install the openpyxl library
from openpyxl import load_workbook
import re
excelList=[]
# Loading our Excel file
wb = load_workbook("AD1.xlsx")
# creating the sheet 1 object
ws = wb.worksheets[0]
# Iterating rows for getting the values of each row
for row in ws.iter_rows(min_row=1, max_row=500, min_col=1, max_col=100):
excelList.append([cell.value for cell in row if cell.value is not None])
# using list comprehension
res = [ele for ele in excelList if ele != []]
df=pd.DataFrame(res)
Any one have suggestion How i achieved my goal
Example excel link :https://github.com/aleem75321/excle_mutiple_table-/blob/main/AD.XLSX
Given an excel sheet which has the following sub-table properties:
You can employ the following approach to create a list of tables in the form of tuples where the first tuple entry is the table title, the second tuple is the resulting dataframe.
The following function will implement the above approach
import pandas as pd
from dataclasses import dataclass
def extractTables(filepath: str, tableWidth: int=3,
startRow: int=2) -> list[tuple[str, pd.DataFrame]]:
"""
Given a path to an excel file with multiple tables on a single sheet
Return a list of tuples containing the Table Name and Table contents
as a dataframe, the width of each table is defined by tablewidth,
the start od table data is defined by startRow
"""
@dataclass
class Table:
start: int # Starting column of Table
name: str # Table Title
def getTableTitles(df: pd.DataFrame) -> list[Table]:
# return list of table titles and column number where table starts
tlts = []
for c, n in enumerate(df.columns):
if 'Unnamed' not in n:
tlts.append(Table(c, n))
return tlts
rslt = [] # response repository
titles = getTableTitles(pd.read_excel(filepath))
for tbl in titles:
df = pd.read_excel(filepath, skiprows=startRow,
usecols=[tbl.start + x for x in range(tableWidth)])
for i, ct in enumerate(list(df.columns)):
if '.' in ct:
df.columns.values[i] = ct[:ct.index('.')]
df.dropna(axis=0, how='all', inplace=True)
rslt.append((tbl.name, df))
return rslt
If the sub-table names are 'Site 1', 'Site 2', and 'Site 3' using the above function on the excel file will produce something similar to:
[('Site 1',
Pages From To
0 12 0 16
1 14 17 34
2 16 35 48
3 18 49 64
4 20 65 80
5 22 81 96
6 24 97 112
7 26 113 128
8 28 129 144
9 30 145 160
10 32 161 176
11 34 177 192
12 36 193 224
13 38 225+ -),
('Site 2',
Pages From To
0 12 0 16
1 14 17 34
2 16 35 48
3 18 49 64
4 20 65 80
5 22 81 96
6 24 97 112
7 26 113 128
8 28 129 144
9 30 145 160
10 32 161 176
11 34 177 192
12 36 193 224
13 38 205+ -),
('Site 3',
Pages From To
0 12.0 0 16
1 14.0 17 34
2 16.0 35 48
3 18.0 49 64
4 20.0 65 80
5 22.0 81 96
6 24.0 97 112
7 26.0 113 128
8 28.0 129 144
9 30.0 145 160
10 32.0 161+ -)]