schema

Classes concerning data schemas.

Classes

BitfountSchema

class BitfountSchema(datasource: Optional[BaseSource] = None, **kwargs: Any):

A schema that defines the tables of a BaseSource.

It lists all the tables found in the BaseSource and the features in those tables.

Ancestors

bitfount.data.schema._BitfountSchemaMarshmallowMixIn

Methods

def add_datasource_tables(self, datasource: BaseSource, table_name: Optional[str] = None, table_descriptions: Optional[Mapping[str, str]] = None, column_descriptions: Optional[Mapping[str, Mapping[str, str]]] = None, ignore_cols: Optional[Mapping[str, Sequence[str]]] = None, force_stypes: Optional[Mapping[str, MutableMapping[_SemanticTypeValue, List[str]]]] = None) -> None:

Adds the tables from a BaseSource to the schema.

Arguments

datasource: The BaseSource to add the tables from.
table_name: The name of the table if there is only one table in the BaseSource.
table_descriptions: A mapping of table names to descriptions.
column_descriptions: A mapping of table names to a mapping of column names to descriptions.
ignore_cols: A mapping of table names to a list of column names to ignore.
force_stypes: A mapping of table names to a mapping of semantic types to a list of column names.

Raises

BitfountSchemaError: If the schema is already frozen.
ValueError: If no table name has been provided for a single-table BaseSource.
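
The mapping arguments above are all keyed by table name. As a rough illustration of their shapes, the sketch below builds them as plain dictionaries. The table and column names are hypothetical, the semantic type keys follow the `force_stype` example given for `TableSchema`, and `conflicting_columns` is an illustrative pre-check that is not part of the library.

```python
from typing import Dict, List, Mapping, Sequence

# Hypothetical table and column names, for illustration only.
table_descriptions: Dict[str, str] = {"patients": "One row per patient."}
column_descriptions: Dict[str, Dict[str, str]] = {
    "patients": {"age": "Age in years.", "sex": "Recorded sex."},
}
ignore_cols: Dict[str, List[str]] = {"patients": ["internal_id"]}
force_stypes: Dict[str, Dict[str, List[str]]] = {
    "patients": {"categorical": ["sex"], "continuous": ["age"]},
}


def conflicting_columns(
    ignore_cols: Mapping[str, Sequence[str]],
    force_stypes: Mapping[str, Mapping[str, Sequence[str]]],
) -> Dict[str, List[str]]:
    """Return, per table, the columns that are both ignored and type-forced.

    Illustrative helper only; it is not a bitfount function.
    """
    conflicts: Dict[str, List[str]] = {}
    for table, stypes in force_stypes.items():
        ignored = set(ignore_cols.get(table, []))
        forced = {col for cols in stypes.values() for col in cols}
        overlap = sorted(ignored & forced)
        if overlap:
            conflicts[table] = overlap
    return conflicts
```

With dictionaries of this shape, each one would be passed to the keyword argument of the same name in `add_datasource_tables`.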

def apply(self, dataframe: pd.DataFrame, keep_cols: Optional[List[str]] = None) -> pandas.core.frame.DataFrame:

Applies the schema to a dataframe and returns the transformed dataframe.

The steps run in order: missing columns are added, superfluous columns are removed, column types are changed, and finally the categorical columns are encoded. The transformed dataframe is then returned.

Arguments

dataframe: The dataframe to transform.
keep_cols: A list of columns to keep even if they are not part of the schema. Defaults to None.

Returns

The dataframe with the transformations applied.

Raises

BitfountSchemaError: If the schema cannot be applied to the dataframe.
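
The order of the four steps can be sketched without pandas, using a list of dictionaries in place of a dataframe. This is a simplified stand-in and not the library's implementation: the `schema` layout (a `dtype` callable plus an optional `encoding` mapping per column) and the function name are assumptions made for the example.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def apply_schema_sketch(
    rows: List[Row],
    schema: Dict[str, Dict[str, Any]],
    keep_cols: Optional[List[str]] = None,
) -> List[Row]:
    """Apply a toy schema to rows, mirroring the four documented steps."""
    keep = set(keep_cols or [])
    result: List[Row] = []
    for original in rows:
        row = dict(original)  # work on a copy, leave the input untouched
        # 1. Add missing columns (filled with None in this sketch).
        for col in schema:
            row.setdefault(col, None)
        # 2. Remove columns that are neither in the schema nor in keep_cols.
        row = {c: v for c, v in row.items() if c in schema or c in keep}
        # 3. Change column types.
        for col, spec in schema.items():
            if row[col] is not None:
                row[col] = spec["dtype"](row[col])
        # 4. Encode categorical columns as integers.
        for col, spec in schema.items():
            encoding = spec.get("encoding")
            if encoding is not None and row[col] is not None:
                row[col] = encoding[row[col]]
        result.append(row)
    return result
```

Columns listed in `keep_cols` pass through unchanged, since the schema holds no type or encoding information for them.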

def freeze(self) -> None:

Freezes the schema, ensuring no more datasources can be added.

If this schema was loaded from an already generated schema, this will also check that the schema is compatible with the datasources set.

def get_categorical_feature_size(self, table_name: str, var: Union[str, List[str]]) -> int:

Gets the column dimensions.

Arguments

table_name: The name of the table to get the column dimensions from.
var: A column name or a list of column names for which to get the dimensions.

Returns

The number of unique values in the categorical column.

def get_categorical_feature_sizes(self, table_name: str, ignore_cols: Optional[Union[str, List[str]]] = None) -> List[int]:

Returns a list of categorical feature sizes.

Arguments

table_name: The name of the table to get the categorical feature sizes from.
ignore_cols: The column(s) to be ignored from the schema.
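
A categorical feature size, as described for the two methods above, is the number of unique values in a column. The sketch below computes this over plain Python lists. The function names and the column data are hypothetical, and only the single-column case of `var` is modelled.

```python
from typing import Any, Dict, List, Optional, Sequence, Union


def unique_count_sketch(values: Sequence[Any]) -> int:
    """Number of distinct values in one categorical column."""
    return len(set(values))


def unique_counts_sketch(
    columns: Dict[str, Sequence[Any]],
    ignore_cols: Optional[Union[str, List[str]]] = None,
) -> List[int]:
    """Sizes of all categorical columns, skipping any ignored ones."""
    # Accept either a single column name or a list of names.
    if isinstance(ignore_cols, str):
        ignore_cols = [ignore_cols]
    ignored = set(ignore_cols or [])
    return [
        unique_count_sketch(values)
        for name, values in columns.items()
        if name not in ignored
    ]


# Hypothetical categorical columns of a single table.
categorical_columns = {
    "sex": ["F", "M", "F"],
    "region": ["north", "south", "east"],
}
```

Sizes of this kind are typically what a model needs in order to dimension one embedding or one-hot encoding per categorical feature.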

def get_feature_names(self, table_name: str, semantic_type: Optional[SemanticType] = None) -> List[str]:

Returns the names of all the features in the schema.

Arguments

table_name: The name of the table to get the features from.
semantic_type: If a semantic type is provided, only the feature names corresponding to that semantic type are returned. Defaults to None.

Returns

features: A list of feature names.
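
The optional `semantic_type` filter can be pictured as follows. This is a sketch under assumptions: `SemanticTypeSketch` is a stand-in enum whose four members mirror the feature kinds named in the `TableSchema` description (categorical, continuous, image, and text), and the feature layout is invented for the example.

```python
from enum import Enum
from typing import Dict, List, Optional


class SemanticTypeSketch(Enum):
    """Stand-in for the library's SemanticType; member names are assumed."""

    CATEGORICAL = "categorical"
    CONTINUOUS = "continuous"
    IMAGE = "image"
    TEXT = "text"


def feature_names_sketch(
    features: Dict[SemanticTypeSketch, List[str]],
    semantic_type: Optional[SemanticTypeSketch] = None,
) -> List[str]:
    """Return all feature names, or only those of one semantic type."""
    if semantic_type is not None:
        return list(features.get(semantic_type, []))
    return [name for names in features.values() for name in names]


# Hypothetical features of a single table, grouped by semantic type.
table_features = {
    SemanticTypeSketch.CATEGORICAL: ["sex", "region"],
    SemanticTypeSketch.CONTINUOUS: ["age"],
}
```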

def get_table_schema(self, table_name: str) -> TableSchema:

Gets a table schema from the schema.

Arguments

table_name: The name of the table schema to get.

Returns

The table with the given name.

Raises

BitfountSchemaError: If the table is not found.

def unfreeze(self) -> None:

Unfreezes the schema, allowing more datasources to be added.
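
The interplay of `freeze`, `unfreeze`, and adding tables can be sketched with a toy class. Every name below is hypothetical: `SchemaFrozenError` stands in for `BitfountSchemaError`, and the compatibility check that `freeze` performs on loaded schemas is not modelled.

```python
from typing import List


class SchemaFrozenError(Exception):
    """Hypothetical stand-in for BitfountSchemaError."""


class FreezableSchemaSketch:
    """Toy schema that rejects new tables while frozen."""

    def __init__(self) -> None:
        self.table_names: List[str] = []
        self._frozen = False

    def add_table(self, name: str) -> None:
        # Mirrors the documented behaviour: adding to a frozen schema fails.
        if self._frozen:
            raise SchemaFrozenError("Cannot add tables to a frozen schema.")
        self.table_names.append(name)

    def freeze(self) -> None:
        self._frozen = True

    def unfreeze(self) -> None:
        self._frozen = False
```

Freezing acts as a guard rather than a one-way switch: a frozen schema can be unfrozen, extended, and frozen again.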

Variables

hash : str - The hash of this schema.
It is derived from the BaseSource(s) used to generate this schema, which ensures that the schema is only used against compatible datasources.
Returns: A sha256 hash of the _datasource_hashes.
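
One way such a hash could be computed is sketched below. The page only states that the result is a sha256 hash of the `_datasource_hashes`, so how the individual hashes are combined here (sorted, then joined with a separator) is an assumption, as is the function name.

```python
import hashlib
from typing import Iterable


def schema_hash_sketch(datasource_hashes: Iterable[str]) -> str:
    """Combine per-datasource hashes into a single sha256 hex digest."""
    # Assumption: sorting makes the result independent of insertion order.
    combined = "|".join(sorted(datasource_hashes))
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()
```

Under this scheme, two schemas built from the same set of datasources produce the same digest, while adding or changing a datasource changes it.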

table_names : List[str] - Returns a list of table names.

TableSchema

class TableSchema(name: str, description: Optional[str] = None):

A schema that defines the features of a dataframe.

It lists all the (categorical, continuous, image, and text) features found in the dataframe.

Arguments

name: The name of the table.
description: A description of the table.

Attributes

name: The name of the table.
description: A description of the table. Optional.
features: An ordered dictionary of features (column names).

Ancestors

bitfount.data.schema._TableSchemaMarshmallowMixIn

Methods

def add_dataframe_features(self, datasource: BaseSource, ignore_cols: Optional[Sequence[str]] = None, force_stype: Optional[MutableMapping[_SemanticTypeValue, List[str]]] = None, descriptions: Optional[Mapping[str, str]] = None) -> None:

Adds datasource features to schema.

Arguments

datasource: The BaseSource whose features this method adds.
ignore_cols: Columns to ignore from the BaseSource. Defaults to None.
force_stype: Columns for which to change the semantic type. Format: semantic type: [column names]. Defaults to None. Example: {'categorical': ['target_column'], 'continuous': ['age', 'salary']}
descriptions: Descriptions of the features. Defaults to None.

Raises

BitfountSchemaError: If the schema is already frozen.
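
The effect of `force_stype` can be pictured as overriding inferred semantic types column by column. The sketch below reuses the example mapping from the argument description. The inferred types and the function name are hypothetical, and how the library infers types in the first place is not modelled.

```python
from typing import Dict, List, Mapping, Optional


def resolve_stypes_sketch(
    inferred: Mapping[str, str],
    force_stype: Optional[Mapping[str, List[str]]] = None,
) -> Dict[str, str]:
    """Override inferred semantic types with the forced ones."""
    resolved = dict(inferred)
    for stype, columns in (force_stype or {}).items():
        for column in columns:
            # Only override columns that actually exist in the table.
            if column in resolved:
                resolved[column] = stype
    return resolved


# Hypothetical inferred types for three columns.
inferred_types = {
    "target_column": "continuous",
    "age": "categorical",
    "salary": "continuous",
}
forced = {"categorical": ["target_column"], "continuous": ["age", "salary"]}
```

Forcing is useful when inference picks the wrong type, for example an integer-coded target that would otherwise be treated as continuous.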

def apply(self, dataframe: pd.DataFrame, keep_cols: Optional[List[str]] = None) -> pandas.core.frame.DataFrame:

Applies the schema to a dataframe and returns the transformed dataframe.

Arguments

dataframe: The dataframe to transform.
keep_cols: A list of columns to keep even if they are not part of the schema. Defaults to None.

Returns

The dataframe with the transformations applied.

def get_categorical_feature_size(self, var: Union[str, List[str]]) -> int:

Gets the column dimensions.

Arguments

var: A column name or a list of column names for which to get the dimensions.

Returns

The number of unique values in the categorical column.

def get_categorical_feature_sizes(self, ignore_cols: Optional[Union[str, List[str]]] = None) -> List[int]:

Returns a list of categorical feature sizes.

Arguments

ignore_cols: The column(s) to be ignored from the schema.

def get_feature_names(self, semantic_type: Optional[SemanticType] = None) -> List[str]:

Returns the names of all the features in the schema.

Returns

features: A list of feature names.

def get_num_categorical(self, ignore_cols: Optional[Union[str, List[str]]] = None) -> int:

Get the number of (non-ignored) categorical features.

Arguments

ignore_cols: Columns to ignore when counting categorical features.

def get_num_continuous(self, ignore_cols: Optional[Union[str, List[str]]] = None) -> int:

Get the number of (non-ignored) continuous features.

Arguments

ignore_cols: Columns to ignore when counting continuous features.
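
Both counting methods accept `ignore_cols` as either a single column name or a list of names. A minimal sketch of that counting logic, with a hypothetical function name and feature list, might look like this:

```python
from typing import List, Optional, Sequence, Union


def count_features_sketch(
    feature_names: Sequence[str],
    ignore_cols: Optional[Union[str, List[str]]] = None,
) -> int:
    """Count features of one semantic type, excluding ignored columns."""
    if ignore_cols is None:
        ignored = set()
    elif isinstance(ignore_cols, str):
        # A bare string is one column name, not a sequence of characters.
        ignored = {ignore_cols}
    else:
        ignored = set(ignore_cols)
    return sum(1 for name in feature_names if name not in ignored)


# Hypothetical categorical feature names of a table.
categorical_names = ["sex", "smoker", "region"]
```

Treating the bare string as a single name matters: iterating over `"sex"` directly would yield the characters `s`, `e`, and `x` instead of one column.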
