excel_source
Module containing ExcelSource class.
ExcelSource class handles loading of Excel data.
Classes
ExcelSource
class ExcelSource( *args: Any, data_splitter: Optional[DatasetSplitter] = None, seed: Optional[int] = None, modifiers: Optional[Dict[str, DataPathModifiers]] = None, ignore_cols: Optional[Union[str, Sequence[str]]] = None, **kwargs: Any,):
Data source for loading Excel files.
Arguments
path
: The path or URL to the Excel file.
sheet_name
: The name(s) of the sheet(s) to load. If not provided, all sheets will be loaded.
column_names
: The names of the columns, if not using the first row of the sheet. Can only be used for single-sheet Excel files.
dtype
: The dtypes of the columns.
read_excel_kwargs
: Additional arguments to be passed to `pandas.read_excel`.
Raises
TypeError
: If the path does not have the correct extension denoting an Excel file.
ValueError
: If multiple sheet names are provided and column names are also provided.
ValueError
: If sheets are referenced which do not exist in the Excel file.
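The three error conditions above can be expressed as a small up-front check. This is a hedged sketch only. `validate_excel_args` is hypothetical, and the accepted extension set is assumed to be the one `pandas.read_excel` handles. The real class may accept a different set.

```python
from pathlib import PurePath
from typing import Optional, Sequence, Union

# Assumption: extensions understood by pandas.read_excel; ExcelSource may differ.
EXCEL_EXTENSIONS = {".xls", ".xlsx", ".xlsm", ".xlsb", ".odf", ".ods", ".odt"}


def validate_excel_args(
    path: str,
    sheet_name: Optional[Union[str, Sequence[str]]] = None,
    column_names: Optional[Sequence[str]] = None,
    available_sheets: Optional[Sequence[str]] = None,
) -> None:
    """Hypothetical check mirroring the documented Raises conditions."""
    # URLs may carry a query string; only the path part decides the extension.
    suffix = PurePath(path.split("?", 1)[0]).suffix.lower()
    if suffix not in EXCEL_EXTENSIONS:
        raise TypeError(f"{path!r} does not have an Excel file extension.")
    sheets = [sheet_name] if isinstance(sheet_name, str) else list(sheet_name or [])
    # Column names only make sense when a single sheet is requested.
    if len(sheets) > 1 and column_names is not None:
        raise ValueError("column_names can only be used with a single sheet.")
    if available_sheets is not None:
        missing = [s for s in sheets if s not in available_sheets]
        if missing:
            raise ValueError(f"Sheets not found in the Excel file: {missing}")
```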
:::info
To use this data source, you must install a backend library for reading Excel files. Currently supported engines are "xlrd", "openpyxl", "odf" and "pyxlsb".
:::

:::info
By default, the first row is used as the column names unless `column_names` or the `header` keyword argument is provided.
:::
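You can check which of the supported backends are importable before constructing the data source. The following is a minimal sketch using only the standard library. `installed_excel_engines` is a hypothetical helper, and the mapping from engine name to import name is an assumption (for example, the "odf" engine comes from the odfpy package, which is imported as `odf`).

```python
import importlib.util
from typing import List

# Engine name -> top-level module that provides it (assumed import names).
EXCEL_ENGINES = {"xlrd": "xlrd", "openpyxl": "openpyxl", "odf": "odf", "pyxlsb": "pyxlsb"}


def installed_excel_engines() -> List[str]:
    """Return the supported Excel engines that can be imported in this environment."""
    return [
        engine
        for engine, module in EXCEL_ENGINES.items()
        # find_spec returns None (without importing) when a top-level module is absent.
        if importlib.util.find_spec(module) is not None
    ]
```

An empty result means no backend is available. Installing one, for example `pip install openpyxl` for `.xlsx` files, resolves this.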
Ancestors
Methods
def get_data( self, table_name: Optional[str] = None, **kwargs: Any,) -> Optional[pandas.core.frame.DataFrame]:
Loads and returns data from the Excel dataset.
Returns
A DataFrame-type object which contains the data.
Raises
ValueError
: If the table name provided does not exist.
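Because each Excel sheet is exposed as a table, `get_data` amounts to selecting a sheet by name. The following is a hedged, dependency-free sketch of that lookup. `get_sheet` is hypothetical, and a list of row dicts stands in for a DataFrame. Only the `ValueError` for an unknown name is documented. The handling of `table_name=None` is an assumption.

```python
from typing import Any, Dict, List, Optional

Rows = List[Dict[str, Any]]  # stand-in for a DataFrame


def get_sheet(sheets: Dict[str, Rows], table_name: Optional[str] = None) -> Rows:
    """Hypothetical stand-in for get_data: select one sheet's rows by name."""
    if table_name is None:
        # Assumption: a single-sheet workbook returns its only sheet when no name is given.
        if len(sheets) == 1:
            return next(iter(sheets.values()))
        raise ValueError("table_name is required when the file has multiple sheets.")
    if table_name not in sheets:
        # Documented behaviour: an unknown table name raises ValueError.
        raise ValueError(f"Table {table_name!r} does not exist; available: {list(sheets)}")
    return sheets[table_name]
```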
def get_values( self, col_names: List[str], table_name: Optional[str] = None, **kwargs: Any,) -> Dict[str, Iterable[Any]]:
Gets distinct values from columns in the Excel dataset.
Arguments
col_names
: The list of columns whose distinct values should be returned.
Returns
The distinct values of the requested columns, as a mapping from column name to a series of distinct values.
Raises
ValueError
: If the table name provided does not exist.
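The return shape (column name mapped to its distinct values) can be illustrated without pandas. With a real DataFrame, the per-column equivalent would be `df[col].unique()`, which keeps values in order of first appearance. The following is a hypothetical stand-in, `distinct_values`, that operates on row dicts and reproduces that ordering. It is not the library's implementation.

```python
from typing import Any, Dict, List


def distinct_values(
    sheets: Dict[str, List[Dict[str, Any]]],
    col_names: List[str],
    table_name: str,
) -> Dict[str, List[Any]]:
    """Hypothetical stand-in for get_values: distinct values per requested column."""
    if table_name not in sheets:
        raise ValueError(f"Table {table_name!r} does not exist.")
    rows = sheets[table_name]
    result: Dict[str, List[Any]] = {}
    for col in col_names:
        # dict.fromkeys keeps the first occurrence of each value, in order of appearance.
        result[col] = list(dict.fromkeys(row[col] for row in rows))
    return result
```

For example, a sheet with rows `{"a": 1, "b": "x"}`, `{"a": 2, "b": "x"}` and `{"a": 1, "b": "y"}` gives `{"a": [1, 2], "b": ["x", "y"]}`.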
Variables
multi_table : bool
- Attribute specifying whether the data source is multi-table.
table_names : List[str]
- The Excel sheet names in the data source.