dataset_operations

Dataset-related transformations.

This module contains the base class and concrete classes for dataset transformations, those that potentially act over the entire dataset.

Classes

CleanDataTransformation

class CleanDataTransformation(*, name: Optional[str] = None, output: bool = True, cols: Union[str, List[str]] = 'all'):

Dataset transformation that will "clean" the specified columns.

For continuous columns, this replaces all infinities and NaNs with 0. For categorical columns, this replaces all NaNs with the explicit string "nan".
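The cleaning rule described above can be illustrated with a minimal, standard-library-only sketch. It is not the library's implementation: the helper names `clean_continuous` and `clean_categorical` are hypothetical, columns are modelled as plain lists, and treating `None` as a missing categorical value is an assumption made for illustration.

```python
import math


def clean_continuous(values):
    """Replace NaN and +/- infinity with 0; finite numbers are left untouched."""
    return [v if math.isfinite(v) else 0 for v in values]


def clean_categorical(values):
    """Replace missing entries with the string "nan".

    Assumption: a missing entry is either None or a float NaN.
    """
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    return ["nan" if is_missing(v) else v for v in values]


cleaned_numbers = clean_continuous([1.5, float("nan"), float("inf"), -float("inf"), 2])
cleaned_labels = clean_categorical(["a", float("nan"), None, "b"])
```

After running the sketch, `cleaned_numbers` holds only finite values and `cleaned_labels` holds only strings.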

Method generated by attrs for class CleanDataTransformation.

Variables

  • static cols : Union[str, List[str]]
  • static output : bool

DatasetTransformation

class DatasetTransformation(*, name: Optional[str] = None, output: bool = True, cols: Union[str, List[str]] = 'all'):

Base transformation for all dataset transformation classes.

Users can set cols to "all" to have the transformation act on every relevant column as defined in the schema.

Arguments

  • output: Whether or not this transformation should be included in the final output. This must be True for all dataset transformations. Defaults to True.
  • cols: The columns to act on as a list of strings. Defaults to "all" which acts on all columns in the dataset.

Raises

  • ValueError: If output is False.
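The constructor contract above (keyword-only arguments, `cols` defaulting to "all", and a `ValueError` when `output` is False) can be sketched with a small stand-in class. `SketchDatasetTransformation` is a hypothetical name used only for illustration, and the column names are made up. The real class is generated with attrs and may validate differently.

```python
from typing import List, Optional, Union


class SketchDatasetTransformation:
    """Hypothetical stand-in that mirrors the documented keyword-only signature."""

    def __init__(
        self,
        *,
        name: Optional[str] = None,
        output: bool = True,
        cols: Union[str, List[str]] = "all",
    ) -> None:
        # Dataset transformations must always be part of the final output.
        if not output:
            raise ValueError("Dataset transformations must have output=True.")
        self.name = name
        self.output = output
        self.cols = cols


default = SketchDatasetTransformation()
subset = SketchDatasetTransformation(cols=["age", "income"])

try:
    SketchDatasetTransformation(output=False)
    rejected = False
except ValueError:
    rejected = True
```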

Method generated by attrs for class DatasetTransformation.

Variables

  • static cols : Union[str, List[str]]
  • static output : bool

NormalizeDataTransformation

class NormalizeDataTransformation(*, name: Optional[str] = None, output: bool = True, cols: Union[str, List[str]] = 'float'):

Dataset transformation that will normalise the specified continuous columns.

Arguments

  • cols: The columns to act on as a list of strings. By default, this transformation will only apply to columns of type float.

If this transformation should be applied to all continuous columns, the cols attribute should be set to 'all'.
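One way to picture the difference between the 'float' default and 'all' is the column-selection sketch below. It is an assumption-laden illustration, not the library's logic: `resolve_cols` is a hypothetical helper, the schema is modelled as a plain dict of column name to a simple type label, "continuous" is taken to mean int or float, and any other string is treated as a single column name.

```python
from typing import Dict, List, Union


def resolve_cols(cols: Union[str, List[str]], schema: Dict[str, str]) -> List[str]:
    """Return the column names a normalisation step would act on."""
    if cols == "float":
        # Default: only columns whose type is float.
        return [c for c, t in schema.items() if t == "float"]
    if cols == "all":
        # Assumption: "all continuous columns" means int and float columns.
        return [c for c, t in schema.items() if t in ("float", "int")]
    if isinstance(cols, str):
        # Assumption: any other string names a single column.
        return [cols]
    return list(cols)


schema = {"age": "int", "height": "float", "city": "category"}
float_only = resolve_cols("float", schema)
all_continuous = resolve_cols("all", schema)
explicit = resolve_cols(["age"], schema)
```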

Method generated by attrs for class NormalizeDataTransformation.

Variables

  • static cols : Union[str, List[str]]

ScalarAdditionDataTransformation

class ScalarAdditionDataTransformation(*, name: Optional[str] = None, output: bool = True, cols: Union[str, List[str]] = 'all', scalar: Union[int, float, Mapping[str, Union[int, float]]] = 0):

Dataset transformation that adds a scalar to the specified columns.

The transformation is applied to the dataset in place and only affects continuous columns.

Arguments

  • scalar: The scalar to add. It can be provided as a single number, in which case that number is added to all numerical columns, or as a dictionary mapping column names to the scalar to add to each of those columns. Defaults to 0.

Raises

  • TransformationApplicationError: if the scalar variable is not correctly instantiated.
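The two accepted forms of `scalar` (a single number, or a mapping from column name to number) can be sketched as follows. This is a hedged illustration rather than the library's code: `add_scalar` is a hypothetical helper, the dataset is modelled as a dict of lists, `TransformationApplicationError` is redefined locally as a stand-in, and rejecting any value that is neither a number nor a mapping is an assumption about what "not correctly instantiated" means.

```python
from numbers import Real
from typing import Dict, List, Mapping, Union

Scalar = Union[int, float, Mapping[str, Union[int, float]]]


class TransformationApplicationError(Exception):
    """Local stand-in for the exception named in the documentation."""


def add_scalar(data: Dict[str, List[float]], scalar: Scalar = 0) -> None:
    """Add `scalar` to the columns of `data` in place."""
    # Assumption: anything other than a real number or a mapping is invalid.
    if isinstance(scalar, bool) or not isinstance(scalar, (Real, Mapping)):
        raise TransformationApplicationError(f"Unsupported scalar: {scalar!r}")
    # A bare number applies to every column; a mapping only to the columns it names.
    per_column = scalar if isinstance(scalar, Mapping) else {c: scalar for c in data}
    for col, amount in per_column.items():
        data[col] = [v + amount for v in data[col]]


data = {"age": [30, 41], "income": [1000.0, 2500.0]}
add_scalar(data, 1)                    # every column is shifted by 1
add_scalar(data, {"income": -1.0})     # only "income" is shifted back
```

With the default of 0 the call leaves every value unchanged, which is why 0 is a natural default for addition.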

Method generated by attrs for class ScalarAdditionDataTransformation.

Variables

  • static scalar : Union[int, float, Mapping[str, Union[int, float]]]

ScalarMultiplicationDataTransformation

class ScalarMultiplicationDataTransformation(*, name: Optional[str] = None, output: bool = True, cols: Union[str, List[str]] = 'all', scalar: Union[int, float, Mapping[str, Union[int, float]]] = 1):

Dataset transformation that multiplies the specified columns by a scalar.

The transformation is applied to the dataset in place and only affects continuous columns.

Arguments

  • scalar: The scalar to multiply by. It can be provided as a single number, in which case all numerical columns are multiplied by that number, or as a dictionary mapping column names to the scalar by which each of those columns is multiplied. Defaults to 1.

Raises

  • TransformationApplicationError: if the scalar variable is not correctly instantiated.
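The in-place behaviour and the identity default of 1 can be seen in a short sketch. As before, this is illustrative only: `multiply_columns` is a hypothetical helper, the dataset is modelled as a dict of lists, and leaving columns that the mapping does not name untouched is an assumption.

```python
from typing import Dict, List, Mapping, Union

Number = Union[int, float]


def multiply_columns(
    data: Dict[str, List[Number]],
    scalar: Union[Number, Mapping[str, Number]] = 1,
) -> None:
    """Multiply the columns of `data` in place; the default of 1 changes nothing."""
    for col in data:
        # Assumption: columns missing from a mapping keep their values (factor 1).
        factor = scalar.get(col, 1) if isinstance(scalar, Mapping) else scalar
        data[col] = [v * factor for v in data[col]]


prices = {"net": [10.0, 20.0], "qty": [1, 2]}
multiply_columns(prices)                      # default scalar of 1: values unchanged
unchanged = {k: list(v) for k, v in prices.items()}
multiply_columns(prices, {"net": 0.5})        # only "net" is scaled
```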

Method generated by attrs for class ScalarMultiplicationDataTransformation.

Variables

  • static scalar : Union[int, float, Mapping[str, Union[int, float]]]