database_source

Module containing DatabaseSource class.

DatabaseSource class handles loading data stored in a SQL database.

Classes

DatabaseSource

class DatabaseSource(*args: Any, data_splitter: Optional[DatasetSplitter] = None, seed: Optional[int] = None, modifiers: Optional[Dict[str, DataPathModifiers]] = None, ignore_cols: Optional[Union[str, Sequence[str]]] = None, **kwargs: Any):

Data source for loading data from databases.

Methods


def get_data(self, sql_query: Optional[str] = None, table_name: Optional[str] = None, **kwargs: Any) -> Optional[pandas.core.frame.DataFrame]:

Loads and returns data from the database dataset.

Arguments

  • sql_query: A SQL query string, required for multi-table data sources. This comes from the DataStructure and takes precedence over table_name.
  • table_name: Table name for multi-table data sources. This comes from the DataStructure and is ignored if sql_query has been provided.

Returns: A DataFrame-type object which contains the data.
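The precedence between sql_query and table_name described above can be illustrated with a minimal sketch. This is not the DatabaseSource implementation: load_rows is a hypothetical helper, it uses the standard-library sqlite3 module and returns plain rows rather than a DataFrame, and returning None when neither argument is given is an assumption made for the illustration.

```python
import sqlite3
from typing import Any, List, Optional, Tuple


def load_rows(
    con: sqlite3.Connection,
    sql_query: Optional[str] = None,
    table_name: Optional[str] = None,
) -> Optional[List[Tuple[Any, ...]]]:
    """Hypothetical helper: sql_query wins, table_name is only a fallback."""
    if sql_query is not None:
        # A query was supplied, so table_name is ignored entirely.
        statement = sql_query
    elif table_name is not None:
        # Identifiers cannot be bound as parameters, so quote the name.
        quoted = table_name.replace('"', '""')
        statement = f'SELECT * FROM "{quoted}"'
    else:
        # Assumption: nothing to load when neither argument is given.
        return None
    return con.execute(statement).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE patients (id INTEGER, age INTEGER)")
con.executemany("INSERT INTO patients VALUES (?, ?)", [(1, 34), (2, 58)])

# Only a table name: the whole table is read.
all_rows = load_rows(con, table_name="patients")

# Both given: the query takes precedence and the table name is ignored.
older = load_rows(
    con,
    sql_query="SELECT id FROM patients WHERE age > 50",
    table_name="patients",
)

# Neither given: the sketch returns None.
nothing = load_rows(con)
```

Here all_rows holds both rows of the table, while older holds only the row selected by the query, even though a table name was also passed.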

def get_values(self, col_names: List[str], table_name: Optional[str] = None, **kwargs: Any) -> Dict[str, Iterable[Any]]:

Get distinct values from columns in Database dataset.

Arguments

  • col_names: The list of the columns whose distinct values should be returned.
  • table_name: The name of the table in which the columns exist. Required for multi-table databases.

Returns: A mapping from each requested column name to the distinct values in that column.
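As a rough illustration of the shape of this return value, the sketch below builds a column-name-to-distinct-values mapping with one SELECT DISTINCT per column. It is an assumption-laden example and not the library's code: distinct_values is a hypothetical function, it uses the standard-library sqlite3 module, and it returns lists where the real method may return another iterable type.

```python
import sqlite3
from typing import Any, Dict, List


def quote_identifier(identifier: str) -> str:
    """Quote a table or column name for use in a SQL statement."""
    return '"' + identifier.replace('"', '""') + '"'


def distinct_values(
    con: sqlite3.Connection, col_names: List[str], table_name: str
) -> Dict[str, List[Any]]:
    """Hypothetical helper: map each column name to its distinct values."""
    result: Dict[str, List[Any]] = {}
    for col in col_names:
        statement = (
            f"SELECT DISTINCT {quote_identifier(col)} "
            f"FROM {quote_identifier(table_name)}"
        )
        # Each row is a 1-tuple, so unpack the single value.
        result[col] = [row[0] for row in con.execute(statement)]
    return result


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE visits (ward TEXT, outcome TEXT)")
con.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("A", "ok"), ("A", "ok"), ("B", "ok"), ("B", "readmit")],
)

values = distinct_values(con, ["ward", "outcome"], table_name="visits")
```

The order of distinct values is not guaranteed by SQL, so callers that need a stable order should sort the result.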

Variables

  • con : sqlalchemy.engine.base.Engine - SQLAlchemy engine.

    Connection options are set to stream results using a server-side cursor where the database backend supports it, with a maximum client-side row buffer of self.max_row_buffer rows.

  • multi_table : bool - Whether the data source is multi-table.
  • query : Optional[str] - A database query as a string.

    The query is resolved in the following order:

    1. The query specified in the database connection.
    2. The table name specified in the database connection, if there is only one table.
    3. The query specified by the datastructure (if multi-table).
    4. The table name specified by the datastructure (if multi-table).
    5. None.
  • table_names : List[str] - Database table names.
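The resolution order listed for query above can be sketched as a plain function. This is a hedged illustration only: resolve_query and its parameter names are hypothetical, "multi-table" is assumed to mean more than one table name on the connection, and turning a table name into a SELECT * statement is an assumption about how a bare table name would be used.

```python
from typing import List, Optional


def resolve_query(
    connection_query: Optional[str],
    connection_table_names: List[str],
    datastructure_query: Optional[str] = None,
    datastructure_table: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical sketch of the five-step resolution order."""
    # 1. The query specified in the database connection.
    if connection_query is not None:
        return connection_query
    # 2. The connection's table name, if there is exactly one table.
    if len(connection_table_names) == 1:
        return f"SELECT * FROM {connection_table_names[0]}"
    # Assumption: more than one table name means multi-table.
    multi_table = len(connection_table_names) > 1
    # 3. The query specified by the datastructure (multi-table only).
    if multi_table and datastructure_query is not None:
        return datastructure_query
    # 4. The table name specified by the datastructure (multi-table only).
    if multi_table and datastructure_table is not None:
        return f"SELECT * FROM {datastructure_table}"
    # 5. Nothing could be resolved.
    return None


from_connection = resolve_query("SELECT 1", ["a", "b"], "SELECT 2", "a")
single_table = resolve_query(None, ["orders"])
from_structure = resolve_query(None, ["a", "b"], datastructure_query="SELECT x FROM a")
structure_table = resolve_query(None, ["a", "b"], datastructure_table="b")
unresolved = resolve_query(None, ["a", "b"])
```

Each step returns as soon as it finds a value, so an earlier source always shadows a later one, which matches the stated precedence.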