Module containing ExcelSource class.

ExcelSource class handles loading of Excel data.



class ExcelSource(
    *args: Any,
    data_splitter: Optional[DatasetSplitter] = None,
    seed: Optional[int] = None,
    modifiers: Optional[Dict[str, DataPathModifiers]] = None,
    ignore_cols: Optional[Union[str, Sequence[str]]] = None,
    **kwargs: Any,
):

Data source for loading excel files.


Arguments:

  • path: The path or URL to the excel file.
  • sheet_name: The name(s) of the sheet(s) to load. If not provided, all sheets will be loaded.
  • column_names: The column names to use instead of the first row of the sheet. Can only be used with single-sheet excel files.
  • dtype: The dtypes of the columns.
  • **read_excel_kwargs: Additional arguments to be passed to pandas.read_excel.


Raises:

  • TypeError: If the path does not have the correct extension denoting an excel file.
  • ValueError: If multiple sheet names are provided and column names are also provided.
  • ValueError: If sheets are referenced which do not exist in the excel file.
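The three error conditions above can be sketched as a standalone check. This is only an illustration and not the library's implementation: `validate_excel_args`, `EXCEL_EXTENSIONS` and the `available_sheets` parameter are hypothetical names, and the set of accepted extensions is an assumption.

```python
from pathlib import PurePosixPath
from typing import List, Optional, Sequence, Union
from urllib.parse import urlparse

# Assumption: an illustrative set of extensions, not the library's actual list.
EXCEL_EXTENSIONS = {".xls", ".xlsx", ".xlsm", ".xlsb", ".ods"}


def validate_excel_args(
    path: str,
    available_sheets: Sequence[str],
    sheet_name: Optional[Union[str, Sequence[str]]] = None,
    column_names: Optional[Sequence[str]] = None,
) -> List[str]:
    """Hypothetical helper: return the sheets to load, or raise as documented."""
    # Read the suffix from the path component so URLs with a query string work.
    suffix = PurePosixPath(urlparse(path).path).suffix.lower()
    if suffix not in EXCEL_EXTENSIONS:
        raise TypeError(f"{path!r} does not look like an excel file.")

    # No sheet_name means every sheet in the file is loaded.
    if sheet_name is None:
        sheets = list(available_sheets)
    elif isinstance(sheet_name, str):
        sheets = [sheet_name]
    else:
        sheets = list(sheet_name)

    # Column names only make sense when exactly one sheet is loaded.
    if column_names is not None and len(sheets) > 1:
        raise ValueError("column_names can only be used with a single sheet.")

    missing = [name for name in sheets if name not in available_sheets]
    if missing:
        raise ValueError(f"Sheets not found in the excel file: {missing}")
    return sheets
```

Running every check before any data is read means a bad argument fails fast, without parsing a potentially large workbook first.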

To use this data source, you must install a backend library for reading excel files. Currently supported engines are “xlrd”, “openpyxl”, “odf” and “pyxlsb”.

By default, the first row is used as the column names unless column_names or the header keyword argument is provided.
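Since a backend engine has to be installed separately, it can help to check which ones are importable before creating the data source. The snippet below is a minimal sketch using only the standard library; `installed_excel_engines` and `ENGINE_MODULES` are hypothetical names and are not part of this module.

```python
from importlib.util import find_spec
from typing import Dict

# Engine name -> importable module name.
# Note: the "odf" module is provided by the odfpy package.
ENGINE_MODULES = {
    "xlrd": "xlrd",
    "openpyxl": "openpyxl",
    "odf": "odf",
    "pyxlsb": "pyxlsb",
}


def installed_excel_engines() -> Dict[str, bool]:
    """Report which of the supported engines can be imported."""
    return {
        engine: find_spec(module) is not None
        for engine, module in ENGINE_MODULES.items()
    }


if __name__ == "__main__":
    for engine, available in installed_excel_engines().items():
        print(f"{engine}: {'installed' if available else 'missing'}")
```

`find_spec` only looks the module up without importing it, so the check stays cheap even when several engines are present.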


def get_data(self, table_name: Optional[str] = None, **kwargs: Any) -> Optional[pandas.core.frame.DataFrame]:

Loads and returns data from Excel dataset.

Returns: A DataFrame-type object which contains the data.


Raises:

  • ValueError: If the table name provided does not exist.

def get_values(self, col_names: List[str], table_name: Optional[str] = None, **kwargs: Any) -> Dict[str, Iterable[Any]]:

Get distinct values from columns in Excel dataset.


Arguments:

  • col_names: The list of columns whose distinct values should be returned.

Returns: A mapping from each requested column name to a series of that column's distinct values.


Raises:

  • ValueError: If the table name provided does not exist.
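The shape of the value returned by get_values can be illustrated with plain Python. This is a sketch under stated assumptions, not the actual implementation: `distinct_values` is a hypothetical stand-in, tables are modelled as lists of row dictionaries keyed by sheet name, and `table_name` is required here for simplicity.

```python
from typing import Any, Dict, Iterable, List, Mapping


def distinct_values(
    tables: Mapping[str, List[Dict[str, Any]]],
    col_names: List[str],
    table_name: str,
) -> Dict[str, Iterable[Any]]:
    """Hypothetical stand-in for get_values over in-memory rows."""
    if table_name not in tables:
        raise ValueError(f"Table {table_name!r} does not exist.")
    rows = tables[table_name]
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return {
        col: list(dict.fromkeys(row[col] for row in rows))
        for col in col_names
    }


# Example data: one sheet with a repeated city and a repeated year.
tables = {
    "Sheet1": [
        {"city": "Paris", "year": 2020},
        {"city": "Lyon", "year": 2020},
        {"city": "Paris", "year": 2021},
    ]
}
print(distinct_values(tables, ["city", "year"], "Sheet1"))
# {'city': ['Paris', 'Lyon'], 'year': [2020, 2021]}
```

Each requested column gets its own entry in the mapping, so columns with different numbers of distinct values can be returned together.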


Attributes:

  • multi_table: bool - Whether the datasource is multi-table.
  • table_names: List[str] - The excel sheet names in the datasource.