database_source
Module containing DatabaseSource class.
DatabaseSource class handles loading data stored in a SQL database.
Classes
DatabaseSource
class DatabaseSource(*args: Any, data_splitter: Optional[DatasetSplitter] = None, seed: Optional[int] = None, modifiers: Optional[Dict[str, DataPathModifiers]] = None, ignore_cols: Optional[Union[str, Sequence[str]]] = None, **kwargs: Any):
Data source for loading data from databases.
Ancestors
Methods
def get_data(self, sql_query: Optional[str] = None, table_name: Optional[str] = None, **kwargs: Any) -> Optional[pandas.core.frame.DataFrame]:
Loads and returns data from the database dataset.
Arguments
sql_query
: A SQL query string, required for multi-table data sources. This comes from the DataStructure and takes precedence over table_name.
table_name
: Table name for multi-table data sources. This comes from the DataStructure and is ignored if sql_query has been provided.
Returns
A DataFrame-type object containing the data.
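The precedence rule between sql_query and table_name can be sketched with the standard-library sqlite3 module and pandas. The helper load_data below is hypothetical: it is not DatabaseSource.get_data itself, and the patients table and its columns are invented for illustration.

```python
import sqlite3
from typing import Optional

import pandas as pd


def load_data(
    con: sqlite3.Connection,
    sql_query: Optional[str] = None,
    table_name: Optional[str] = None,
) -> Optional[pd.DataFrame]:
    """Hypothetical helper mirroring the documented sql_query/table_name precedence."""
    if sql_query is not None:
        # sql_query takes precedence; table_name is ignored when both are given.
        return pd.read_sql_query(sql_query, con)
    if table_name is not None:
        # Quote the identifier; real code should also validate untrusted table names.
        return pd.read_sql_query(f'SELECT * FROM "{table_name}"', con)
    # Neither a query nor a table name was supplied.
    return None


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE patients (id INTEGER, age INTEGER)")
con.executemany("INSERT INTO patients VALUES (?, ?)", [(1, 34), (2, 51), (3, 34)])

# The query wins even though a table name is also passed.
df_query = load_data(con, sql_query="SELECT id FROM patients WHERE age > 40", table_name="patients")
df_table = load_data(con, table_name="patients")
```

Passing both arguments shows the ordering: the filtered query result is returned and the table name is never consulted.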
def get_values(self, col_names: List[str], table_name: Optional[str] = None, **kwargs: Any) -> Dict[str, Iterable[Any]]:
Gets distinct values from columns in the database dataset.
Arguments
col_names
: The list of columns whose distinct values should be returned.
table_name
: The name of the table in which the columns exist. Required for multi-table databases.
Returns
The distinct values of the requested columns, as a mapping from column name to a series of distinct values.
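The shape of the returned mapping can be illustrated with one SELECT DISTINCT per column. This is a minimal sketch using sqlite3; distinct_values is a hypothetical helper, it returns plain lists rather than series, and the ORDER BY is added only to make the output deterministic (the real method may not sort).

```python
import sqlite3
from typing import Any, Dict, Iterable, List


def distinct_values(
    con: sqlite3.Connection, col_names: List[str], table_name: str
) -> Dict[str, Iterable[Any]]:
    """Hypothetical helper: map each column name to its distinct values."""
    values: Dict[str, Iterable[Any]] = {}
    for col in col_names:
        # One DISTINCT query per column, so each column is deduplicated independently.
        rows = con.execute(
            f'SELECT DISTINCT "{col}" FROM "{table_name}" ORDER BY "{col}"'
        ).fetchall()
        values[col] = [row[0] for row in rows]
    return values


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE patients (id INTEGER, age INTEGER)")
con.executemany("INSERT INTO patients VALUES (?, ?)", [(1, 34), (2, 51), (3, 34)])

print(distinct_values(con, ["age"], "patients"))  # {'age': [34, 51]}
```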
Variables
con : sqlalchemy.engine.base.Engine
- SQLAlchemy engine. Connection options are set to stream results using a server-side cursor where possible (depending on the database backend's support for this feature), with a maximum client-side row buffer of self.max_row_buffer rows.
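The streaming behaviour described for con corresponds to SQLAlchemy's stream_results and max_row_buffer execution options. The sketch below sets them at engine level; MAX_ROW_BUFFER is a hypothetical stand-in for self.max_row_buffer, and whether DatabaseSource configures its engine exactly this way is an assumption. In-memory SQLite is used only because it needs no server; it has no server-side cursors, so SQLAlchemy falls back to a regular cursor there.

```python
from sqlalchemy import create_engine, text

MAX_ROW_BUFFER = 500  # hypothetical value standing in for self.max_row_buffer

# Engine-level execution options are inherited by every connection from this engine.
engine = create_engine(
    "sqlite://",
    execution_options={"stream_results": True, "max_row_buffer": MAX_ROW_BUFFER},
)

with engine.connect() as conn:
    conn.execute(text("CREATE TABLE readings (value INTEGER)"))
    conn.execute(
        text("INSERT INTO readings (value) VALUES (:v)"),
        [{"v": i} for i in range(1200)],
    )
    # On a backend with server-side cursors (e.g. PostgreSQL), rows would be
    # fetched in batches of at most MAX_ROW_BUFFER while iterating.
    total = 0
    for row in conn.execute(text("SELECT value FROM readings")):
        total += row.value
```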
multi_table : bool
- Whether the data source is multi-table.
query : Optional[str]
- A database query as a string. The query is resolved in the following order:
  - The query specified in the database connection.
  - The table name specified in the database connection, if it contains just one table.
  - The query specified by the datastructure (if multi-table).
  - The table name specified by the datastructure (if multi-table).
  - None.
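The five-step resolution order for query can be written as a plain function. resolve_query and all of its parameter names are hypothetical stand-ins for the connection-level and datastructure-level settings, and the sketch assumes the single-table fallback yields the table name itself as the query string.

```python
from typing import List, Optional


def resolve_query(
    connection_query: Optional[str],
    connection_table_names: List[str],
    multi_table: bool,
    datastructure_query: Optional[str] = None,
    datastructure_table: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical sketch of the documented query resolution order."""
    # 1. An explicit query on the database connection always wins.
    if connection_query is not None:
        return connection_query
    # 2. A connection with exactly one table resolves to that table's name.
    if len(connection_table_names) == 1:
        return connection_table_names[0]
    if multi_table:
        # 3. Otherwise use the datastructure's query ...
        if datastructure_query is not None:
            return datastructure_query
        # 4. ... falling back to the datastructure's table name.
        if datastructure_table is not None:
            return datastructure_table
    # 5. Nothing could be resolved.
    return None
```

Keeping each step as an early return makes the precedence readable top to bottom, in the same order as the list.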
table_names : List[str]
- Database table names.