
base_source

Module containing the BaseSource class.

BaseSource is the abstract data source class from which all concrete data sources must inherit.

Classes

BaseSource

class BaseSource(*args: Any, data_splitter: Optional[DatasetSplitter] = None, seed: Optional[int] = None, modifiers: Optional[Dict[str, DataPathModifiers]] = None, ignore_cols: Optional[Union[str, Sequence[str]]] = None, **kwargs: Any):

Abstract Base Source from which all other data sources must inherit.

Arguments

  • data_splitter: Approach used for splitting the data into training, validation and test sets. Defaults to None.
  • seed: Random number seed. Used for setting the random seed for all libraries. Defaults to None.
  • modifiers: Dictionary used for modifying paths/extensions in the dataframe. Defaults to None.
  • ignore_cols: Column or list of columns to be ignored from the data. Defaults to None.

Attributes

  • data: A DataFrame-type object which contains the data.
  • data_splitter: Approach used for splitting the data into training, validation and test sets.
  • seed: Random number seed. Used for setting the random seed for all libraries.
  • train_idxs: A numpy array containing the indices of the data which will be used for training.
  • validation_idxs: A numpy array containing the indices of the data which will be used for validation.
  • test_idxs: A numpy array containing the indices of the data which will be used for testing.

Static methods


def __init_subclass__(**kwargs: Any) -> Callable[[~T, Any, Optional[DatasetSplitter], Optional[int], Optional[Dict[str, DataPathModifiers]], Union[str, Sequence[str], None], Any], None]:

Decorate the subclass __init__ to call the super class __init__.

Force all data sources to call the super class __init__ so that the required attributes are always set.
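The pattern is roughly: the base class hooks subclass creation and wraps each subclass __init__ so that the base initialiser is guaranteed to run. A minimal, self-contained sketch of that idea (not the library's actual implementation):

```python
import functools
from typing import Any, Optional


class Base:
    """Stand-in for BaseSource: sets the attributes every subclass needs."""

    def __init__(self, *args: Any, seed: Optional[int] = None, **kwargs: Any) -> None:
        self.seed = seed
        self._base_initialised = True

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        subclass_init = cls.__init__

        @functools.wraps(subclass_init)
        def wrapped_init(self: "Base", *args: Any, **kw: Any) -> None:
            subclass_init(self, *args, **kw)
            # If the subclass forgot to call super().__init__, call it now so
            # the required attributes are always set.
            if not getattr(self, "_base_initialised", False):
                Base.__init__(self, *args, **kw)

        cls.__init__ = wrapped_init


class MySource(Base):
    def __init__(self, path: str) -> None:
        self.path = path  # note: no explicit super().__init__() call


src = MySource("data.csv")
assert src.seed is None and src._base_initialised  # base init still ran
```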

Methods


def get_column(self, col_name: str, **kwargs: Any) -> Union[numpy.ndarray, pandas.core.series.Series]:

Implement this method to get a single column from the dataset.

def get_data(self, **kwargs: Any) -> Optional[pandas.core.frame.DataFrame]:

Implement this method to load and return the dataset.

def get_dtypes(self, **kwargs: Any) -> Dict[str, Union[ExtensionDtype, str, numpy.dtype, Type[Union[str, float, int, complex, bool, object]]]]:

Implement this method to get the columns and column types from the dataset.

def get_values(self, col_names: List[str], **kwargs: Any) -> Dict[str, Iterable[Any]]:

Implement this method to get the distinct values from a list of columns.
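To make the abstract interface concrete, here is a minimal sketch of a single-table source backed by an in-memory DataFrame. The class name and sample data are illustrative, the import path is a placeholder for wherever this module actually lives, and only the methods documented above (plus the multi_table property listed under Variables) are shown:

```python
from typing import Any, Dict, Iterable, List, Optional, Union

import numpy as np
import pandas as pd

from your_package.base_source import BaseSource  # adjust to the real import path


class InMemorySource(BaseSource):
    """Illustrative concrete source that serves a DataFrame supplied at construction."""

    def __init__(self, df: pd.DataFrame, **kwargs: Any) -> None:
        super().__init__(**kwargs)  # required: sets data_splitter, seed, etc.
        self._df = df

    def get_data(self, **kwargs: Any) -> Optional[pd.DataFrame]:
        return self._df

    def get_dtypes(self, **kwargs: Any) -> Dict[str, Any]:
        return self._df.dtypes.to_dict()

    def get_column(self, col_name: str, **kwargs: Any) -> Union[np.ndarray, pd.Series]:
        return self._df[col_name]

    def get_values(self, col_names: List[str], **kwargs: Any) -> Dict[str, Iterable[Any]]:
        return {col: self._df[col].unique() for col in col_names}

    @property
    def multi_table(self) -> bool:
        return False
```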

def load_data(self, **kwargs: Any) -> None:

Load the data for the datasource.

We wrap get_data with lru_cache so that this method is idempotent: it can be called multiple times with the same arguments without reloading the data. (A usage sketch follows the Raises section below.)

Raises

  • TypeError: If data format is not supported.
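Continuing the illustrative InMemorySource sketch above (and assuming the data attribute is populated by load_data, as described under Attributes), the method can be called repeatedly without the underlying data being re-read:

```python
import pandas as pd

source = InMemorySource(
    pd.DataFrame({"age": [34, 51, 29], "label": ["a", "b", "a"]}),
    seed=42,                # forwarded to BaseSource.__init__
    ignore_cols=["label"],  # per the Arguments above, ignored from the data
)
source.load_data()
source.load_data()  # idempotent: the lru_cache-wrapped get_data is not re-run
print(source.data.head())
```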

Variables

  • data : [pandas.core.frame.DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame) - Data.
  • hash : str - The hash associated with this BaseSource.

    This is the hash of the static information regarding the underlying DataFrame, primarily column names and content types but NOT anything content-related itself. It should be consistent across invocations, even if additional data is added, as long as the DataFrame is still compatible in its format.

    Returns: The hexdigest of the DataFrame hash. (A sketch of how such a schema-level hash can be derived follows this list.)

  • multi_table : bool - Implement this method to define whether the data source is multi-table.
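As noted for the hash variable above, the hash covers only the static format of the underlying DataFrame. A minimal sketch of that idea (not the library's actual hashing routine), derived from column names and dtypes only so that it stays stable when rows are added:

```python
import hashlib

import pandas as pd


def schema_hash(df: pd.DataFrame) -> str:
    """Hash column names and dtypes, ignoring the row contents themselves."""
    schema = ";".join(f"{name}:{dtype}" for name, dtype in df.dtypes.items())
    return hashlib.sha256(schema.encode("utf-8")).hexdigest()


df_small = pd.DataFrame({"a": [1], "b": ["x"]})
df_large = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
assert schema_hash(df_small) == schema_hash(df_large)  # same format, same hash
```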

MultiTableSource

class MultiTableSource(*args: Any, data_splitter: Optional[DatasetSplitter] = None, seed: Optional[int] = None, modifiers: Optional[Dict[str, DataPathModifiers]] = None, ignore_cols: Optional[Union[str, Sequence[str]]] = None, **kwargs: Any):

Abstract base source that supports multiple tables.

Methods


def get_data(self, table_name: Optional[str] = None, **kwargs: Any) -> Optional[pandas.core.frame.DataFrame]:

Implement this method to load and return the dataset.

Variables

  • table_names : List[str] - Implement this property to return the names of the tables in the data source.
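A minimal sketch of a concrete multi-table source holding several DataFrames keyed by table name. The class name and import path are placeholders, and the remaining abstract methods inherited from BaseSource are omitted for brevity:

```python
from typing import Any, Dict, List, Optional

import pandas as pd

from your_package.base_source import MultiTableSource  # adjust to the real import path


class DictOfTablesSource(MultiTableSource):
    """Illustrative multi-table source backed by a dict of DataFrames."""

    def __init__(self, tables: Dict[str, pd.DataFrame], **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self._tables = tables

    def get_data(
        self, table_name: Optional[str] = None, **kwargs: Any
    ) -> Optional[pd.DataFrame]:
        if table_name is None:
            return None  # a multi-table source needs a table name to return data
        return self._tables[table_name]

    @property
    def table_names(self) -> List[str]:
        return list(self._tables.keys())

    @property
    def multi_table(self) -> bool:
        return True

    # get_dtypes, get_column and get_values from BaseSource would also need
    # concrete implementations; they are omitted here for brevity.
```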