excel_source

Module containing ExcelSource class.

ExcelSource class handles loading of Excel data.

Classes

ExcelSource

class ExcelSource(    *args: Any,    data_splitter: Optional[DatasetSplitter] = None,    seed: Optional[int] = None,    modifiers: Optional[Dict[str, DataPathModifiers]] = None,    ignore_cols: Optional[Union[str, Sequence[str]]] = None,    **kwargs: Any,):

Data source for loading Excel files.

Arguments

  • path: The path or URL to the Excel file.
  • sheet_name: The name(s) of the sheet(s) to load. If not provided, all sheets will be loaded.
  • column_names: The names of the columns if not using the first row of the sheet. Can only be used for single-sheet Excel files.
  • dtype: The dtypes of the columns.
  • **read_excel_kwargs: Additional arguments to be passed to pandas.read_excel.

Raises

  • TypeError: If the path does not have the correct extension denoting an Excel file.
  • ValueError: If multiple sheet names are provided and column names are also provided.
  • ValueError: If sheets are referenced which do not exist in the Excel file.
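
The three error conditions above can be sketched in plain Python. This is only an illustration of the documented rules, not the class's actual implementation: the helper name `validate_excel_args`, the `available_sheets` parameter and the exact extension set are assumptions made for the example.

```python
from pathlib import PurePosixPath
from typing import List, Optional, Sequence, Union
from urllib.parse import urlparse

# Assumption: extensions commonly associated with Excel-style files.
# The real class may accept a different set.
EXCEL_EXTENSIONS = {".xls", ".xlsx", ".xlsm", ".xlsb", ".ods", ".odt", ".odf"}


def validate_excel_args(
    path: str,
    available_sheets: Sequence[str],
    sheet_name: Optional[Union[str, Sequence[str]]] = None,
    column_names: Optional[Sequence[str]] = None,
) -> List[str]:
    """Hypothetical helper: return the sheets to load, or raise as documented."""
    # Works for local paths and URLs: only the path component is inspected.
    suffix = PurePosixPath(urlparse(path).path).suffix.lower()
    if suffix not in EXCEL_EXTENSIONS:
        raise TypeError(f"Not an Excel file extension: {suffix!r}")

    # No sheet_name means "load every sheet".
    if sheet_name is None:
        requested = list(available_sheets)
    elif isinstance(sheet_name, str):
        requested = [sheet_name]
    else:
        requested = list(sheet_name)

    # column_names only makes sense when exactly one sheet is loaded.
    if column_names is not None and len(requested) > 1:
        raise ValueError("column_names can only be used with a single sheet.")

    missing = [name for name in requested if name not in available_sheets]
    if missing:
        raise ValueError(f"Sheets not found in the file: {missing}")
    return requested
```

For instance, `validate_excel_args("report.xlsx", ["Q1", "Q2"], sheet_name="Q1", column_names=["a", "b"])` returns `["Q1"]`, while passing `sheet_name=["Q1", "Q2"]` together with `column_names` raises `ValueError`.
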
info

To use this data source, you must install a backend library that can read Excel files. Currently supported engines are “xlrd”, “openpyxl”, “odf” and “pyxlsb”.
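
A quick way to see which of these engines is available in the current environment is to probe for their import names. The sketch below uses only the standard library; the helper name `installed_excel_engines` is hypothetical, and it assumes each engine is imported under the same name listed above (the "odf" engine is provided by the odfpy package).

```python
import importlib.util
from typing import List

# Import names of the four engines listed above.
ENGINE_MODULES = ("xlrd", "openpyxl", "odf", "pyxlsb")


def installed_excel_engines() -> List[str]:
    """Hypothetical helper: return the engines importable in this environment."""
    return [
        name
        for name in ENGINE_MODULES
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is not None
    ]


if __name__ == "__main__":
    engines = installed_excel_engines()
    print("Installed Excel engines:", engines or "none")
```

An empty result suggests that at least one backend needs to be installed before Excel files can be loaded.
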

info

By default, the first row is used as the column names unless column_names or the header keyword argument is provided.

Methods


def get_data(    self, table_name: Optional[str] = None, **kwargs: Any,) -> Optional[pandas.core.frame.DataFrame]:

Loads and returns data from Excel dataset.

Returns A DataFrame-type object which contains the data.

Raises

  • ValueError: If the table name provided does not exist.
def get_values(    self, col_names: List[str], table_name: Optional[str] = None, **kwargs: Any,) -> Dict[str, Iterable[Any]]:

Get distinct values from columns in Excel dataset.

Arguments

  • col_names: The list of the columns whose distinct values should be returned.

Returns A mapping from each requested column name to a series of that column's distinct values.

Raises

  • ValueError: If the table name provided does not exist.
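
The shape of the return value can be illustrated with a small stand-in that works on in-memory rows rather than an Excel file. This is a sketch of the documented contract only: the function `distinct_values` and the `tables` structure (sheet name mapped to a list of row dictionaries) are hypothetical, and the real method returns pandas-backed values rather than plain lists.

```python
from typing import Any, Dict, List


def distinct_values(
    tables: Dict[str, List[Dict[str, Any]]],
    col_names: List[str],
    table_name: str,
) -> Dict[str, List[Any]]:
    """Hypothetical stand-in: map each column name to its distinct values."""
    if table_name not in tables:
        # Mirrors the documented ValueError for an unknown table name.
        raise ValueError(
            f"Table name {table_name!r} not found; available: {list(tables)}"
        )
    rows = tables[table_name]
    # dict.fromkeys drops duplicates while keeping first-seen order
    # (values must be hashable).
    return {
        col: list(dict.fromkeys(row[col] for row in rows)) for col in col_names
    }


if __name__ == "__main__":
    tables = {
        "Sheet1": [
            {"city": "Oslo", "year": 2020},
            {"city": "Oslo", "year": 2021},
            {"city": "Rome", "year": 2021},
        ]
    }
    print(distinct_values(tables, ["city", "year"], "Sheet1"))
```
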

Variables

  • multi_table : bool - Attribute to specify whether the datasource is multi-table.
  • table_names : List[str] - Excel sheet names in datasource.