menu

Questions & Answers

excel file which has a small table with the title on a single sheet. I need csv file of each table with the name of it Title

I have an excel file that has a small table with the title on a single sheet. (Like this) enter image description here

Requirement For each table, I need a CSV file with its title.

Try this code. First, I try to get data in dataframe but getting null value dataframe

enter image description here

import pandas as pd
# Install the openpyxl library
from openpyxl import load_workbook
import re
excelList=[]
# Loading our Excel file
wb = load_workbook("AD1.xlsx")
  
# creating the sheet 1 object
ws = wb.worksheets[0]
  
# Iterating rows for getting the values of each row
for row in ws.iter_rows(min_row=1, max_row=500, min_col=1, max_col=100):
  excelList.append([cell.value for cell in row if cell.value is not None])

# using list comprehension
res = [ele for ele in excelList if ele != []]

df=pd.DataFrame(res)

Any one have suggestion How i achieved my goal

Example excel link :https://github.com/aleem75321/excle_mutiple_table-/blob/main/AD.XLSX

Comments:
2023-01-22 00:30:13
In your excel file is each table a defined range?
2023-01-22 00:30:13
the range is not defined. If you see my example excel there was a column whose name is Pages. Its max page level is 40
Answers(1) :

Given an excel sheet which has the following sub-table properties:

  1. All Table Titles are in the First row.
  2. All Tables are of equal width.
  3. The sub-tables within the sheet are not defined by excel ranges or tables.

You can employ the following approach to create a list of tables in the form of tuples where the first tuple entry is the table title, the second tuple is the resulting dataframe.

  • First load the file into a single dataframe, and extract the sub-table titles and there starting columns.
  • Then reload the file for each sub-table limiting the columns read to only the columns needed for each individual table.

The following function will implement the above approach

import pandas as pd
from dataclasses import dataclass
def extractTables(filepath: str, tableWidth: int=3, 
                  startRow: int=2) -> list[tuple[str, pd.DataFrame]]:
    """ 
       Given a path to an excel file with multiple tables on a single sheet
       Return a list of tuples containing the Table Name and Table contents
       as a dataframe, the width of each table is defined by tablewidth,
       the start od table data is defined by startRow
    """
    @dataclass
    class Table:
        start: int  # Starting column of Table
        name: str   # Table Title
            
    def getTableTitles(df: pd.DataFrame) -> list[Table]:
        # return list of table titles and column number where table starts
        tlts = []
        for c, n in enumerate(df.columns):
            if 'Unnamed' not in n:
                tlts.append(Table(c, n))
        return tlts
    rslt = []                    # response repository
    titles = getTableTitles(pd.read_excel(filepath))
    for tbl in titles:
        df = pd.read_excel(filepath, skiprows=startRow,
                          usecols=[tbl.start + x for x in range(tableWidth)])
        for i, ct in enumerate(list(df.columns)):
            if '.' in ct:
                df.columns.values[i] = ct[:ct.index('.')]
        df.dropna(axis=0, how='all', inplace=True)
        rslt.append((tbl.name, df))
    return rslt

If the sub-table names are 'Site 1', 'Site 2', and 'Site 3' using the above function on the excel file will produce something similar to:

[('Site 1',
      Pages  From   To
  0      12     0   16
  1      14    17   34
  2      16    35   48
  3      18    49   64
  4      20    65   80
  5      22    81   96
  6      24    97  112
  7      26   113  128
  8      28   129  144
  9      30   145  160
  10     32   161  176
  11     34   177  192
  12     36   193  224
  13     38  225+    -),
 ('Site 2',
      Pages  From   To
  0      12     0   16
  1      14    17   34
  2      16    35   48
  3      18    49   64
  4      20    65   80
  5      22    81   96
  6      24    97  112
  7      26   113  128
  8      28   129  144
  9      30   145  160
  10     32   161  176
  11     34   177  192
  12     36   193  224
  13     38  205+    -),
 ('Site 3',
      Pages  From   To
  0    12.0     0   16
  1    14.0    17   34
  2    16.0    35   48
  3    18.0    49   64
  4    20.0    65   80
  5    22.0    81   96
  6    24.0    97  112
  7    26.0   113  128
  8    28.0   129  144
  9    30.0   145  160
  10   32.0  161+    -)]
Comments:
2023-01-22 00:30:13
thanks for the replay see my original xlsx which has an unequal size of the table github.com/aleem75321/excle_mutiple_table-/blob/main/AD.XLSX
2023-01-22 00:30:13
@itorih66 see example file from above link