intermine_source

Module containing IntermineSource class.

IntermineSource class handles loading data stored in Intermine templates.

Intermine is an open-source biological data warehouse developed by the University of Cambridge (http://intermine.org/). The IntermineSource launches a pod that can access all templates defined under a specified service. Please see Intermine's tutorials for a detailed overview of their Python API: https://github.com/intermine/intermine-ws-python-docs.

Classes

IntermineSource

class IntermineSource(*args: Any, data_splitter: Optional[DatasetSplitter] = None, seed: Optional[int] = None, modifiers: Optional[Dict[str, DataPathModifiers]] = None, ignore_cols: Optional[Union[str, Sequence[str]]] = None, **kwargs: Any):

Data Source for loading data from Intermine templates.

Intermine is an open-source biological data warehouse developed by the University of Cambridge (http://intermine.org/). The IntermineSource launches a pod that can access all templates defined under a specified service. Please see Intermine's tutorials for a detailed overview of their Python API: https://github.com/intermine/intermine-ws-python-docs.

Note: You must install the intermine package (pip install intermine) to use this data source.

Methods


def get_data(self, table_name: Optional[str] = None, **kwargs: Any) -> Optional[pandas.core.frame.DataFrame]:

Loads and returns data from an Intermine template.

Arguments

  • table_name: Table name for multi-table data sources. This comes from the DataStructure.

Returns: A DataFrame-type object containing the data.
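As a rough illustration of the kind of object get_data returns, the sketch below builds a DataFrame from a few hand-written rows. The helper rows_to_frame, the column names, and the row values are all hypothetical and are not part of IntermineSource; the way it drops columns only mirrors the idea behind the ignore_cols constructor argument and is an assumption about its effect.

```python
from typing import Any, Dict, List, Optional, Sequence

import pandas as pd


def rows_to_frame(
    rows: List[Dict[str, Any]],
    ignore_cols: Optional[Sequence[str]] = None,
) -> Optional[pd.DataFrame]:
    """Hypothetical helper: turn a list of row dicts into a DataFrame."""
    if not rows:
        # Mirror the Optional return type: no rows, no DataFrame.
        return None
    frame = pd.DataFrame(rows)
    if ignore_cols:
        # Drop unwanted columns; names that are absent are skipped silently.
        frame = frame.drop(columns=list(ignore_cols), errors="ignore")
    return frame


# Made-up rows standing in for the results of a template query.
example_rows = [
    {"symbol": "geneA", "length": 1200, "internal_id": 1},
    {"symbol": "geneB", "length": 850, "internal_id": 2},
]

frame = rows_to_frame(example_rows, ignore_cols=["internal_id"])
print(frame)
```

Returning None for an empty result is a design choice of this sketch, chosen to match the Optional return annotation above.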

def get_values(self, col_names: List[str], table_name: Optional[str] = None, **kwargs: Any) -> Dict[str, Iterable[Any]]:

Gets the distinct values from a list of columns.

Arguments

  • col_names: The list of columns whose distinct values should be returned.
  • table_name: The name of the table in which the columns exist. Required for multi-table databases.

Returns: The distinct values of the requested columns, as a mapping from each column name to a series of its distinct values.
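The return shape described above can be sketched with plain pandas. This is a minimal illustration of a mapping from column name to distinct values, not the library's implementation; the function distinct_values and the sample data are hypothetical.

```python
from typing import Any, Dict, Iterable, List

import pandas as pd


def distinct_values(
    frame: pd.DataFrame, col_names: List[str]
) -> Dict[str, Iterable[Any]]:
    """Hypothetical helper: map each requested column to its distinct values."""
    # Series.unique() keeps values in order of first appearance.
    return {col: frame[col].unique() for col in col_names}


# Made-up data standing in for a loaded template.
frame = pd.DataFrame(
    {
        "organism": ["a", "b", "a"],
        "length": [10, 20, 10],
        "symbol": ["x", "y", "z"],
    }
)

values = distinct_values(frame, ["organism", "length"])
for name, distinct in values.items():
    print(name, list(distinct))
```

Only the requested columns appear as keys, so a column that is not listed in col_names (here, symbol) is left out of the result.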

Variables

  • multi_table : bool - Whether the data source is multi-table.
  • table_names : List[str] - The names of the tables accessible from this data source.