

Module containing IntermineSource class.

IntermineSource class handles loading data stored in Intermine templates.

Intermine is an open source biological data warehouse developed by the University of Cambridge. The IntermineSource launches a pod that can access all templates defined under a specified service. Please see Intermine's tutorials for a detailed overview of their Python API.



class IntermineSource(
    *args: Any,
    data_splitter: Optional[DatasetSplitter] = None,
    seed: Optional[int] = None,
    modifiers: Optional[Dict[str, DataPathModifiers]] = None,
    ignore_cols: Optional[Union[str, Sequence[str]]] = None,
    **kwargs: Any,
):

Data Source for loading data from Intermine templates.

Intermine is an open source biological data warehouse developed by the University of Cambridge. The IntermineSource launches a pod that can access all templates defined under a specified service. Please see Intermine's tutorials for a detailed overview of their Python API.


You must pip install intermine to use this data source.
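To illustrate what "loading data stored in Intermine templates" involves, here is a minimal sketch that runs one template with the `intermine` client and turns its rows into a pandas DataFrame. This is not the IntermineSource implementation. It assumes the client's `Service`, `get_template` and `results(row="dict", **constraints)` calls. The helper names `rows_to_dataframe` and `fetch_template_as_dataframe` are hypothetical, and any service URL or template name you pass in is a placeholder.

```python
from typing import Any, Dict, Iterable

import pandas as pd


def rows_to_dataframe(rows: Iterable[Dict[str, Any]]) -> pd.DataFrame:
    """Build a DataFrame from dict-shaped result rows (one dict per row)."""
    return pd.DataFrame.from_records(list(rows))


def fetch_template_as_dataframe(
    service_url: str, template_name: str, **constraints: Any
) -> pd.DataFrame:
    """Run an Intermine template and return its rows as a DataFrame.

    Assumption: uses the intermine client's Service / get_template /
    results(row="dict") API; requires `pip install intermine` and network access.
    """
    # Imported lazily so this module still imports without intermine installed.
    from intermine.webservice import Service

    service = Service(service_url)
    template = service.get_template(template_name)
    # Constraint values are keyed by the template's constraint codes, e.g. A={...}.
    return rows_to_dataframe(template.results(row="dict", **constraints))
```

Splitting out `rows_to_dataframe` keeps the network call separate from the conversion, so the conversion can be checked offline with hand-written rows.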


def get_data(
    self, table_name: Optional[str] = None, **kwargs: Any,
) -> Optional[pandas.core.frame.DataFrame]:

Loads and returns data from an Intermine template.


  • table_name: Table name for multi-table data sources. This comes from the DataStructure.

Returns: A DataFrame-type object which contains the data.
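The role of `table_name` in a multi-table source can be sketched with a plain name-to-DataFrame mapping. `select_table` is a hypothetical helper, not part of this API. Its rules are assumptions made for illustration: a single-table source may omit the name, a multi-table source must supply it, and an empty source yields None to match the Optional return type.

```python
from typing import Dict, Optional

import pandas as pd


def select_table(
    tables: Dict[str, pd.DataFrame], table_name: Optional[str] = None
) -> Optional[pd.DataFrame]:
    """Return the DataFrame for `table_name` from a name -> DataFrame mapping.

    Hypothetical helper: a single-table source may omit `table_name`,
    while a multi-table source must name the table it wants.
    """
    if not tables:
        return None  # nothing loaded; mirrors the Optional return type
    if table_name is None:
        if len(tables) == 1:
            return next(iter(tables.values()))
        raise ValueError("table_name is required for multi-table sources")
    if table_name not in tables:
        raise KeyError(f"Unknown table: {table_name!r}")
    return tables[table_name]
```

For example, `select_table({"Gene_GO": df})` returns `df`. With two or more tables, the name becomes mandatory. The table names here are placeholders.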

def get_values(
    self, col_names: List[str], table_name: Optional[str] = None, **kwargs: Any,
) -> Dict[str, Iterable[Any]]:

Gets distinct values from a list of columns.


  • col_names: The list of columns whose distinct values should be returned.
  • table_name: The name of the table in which the columns exist. Required for multi-table databases.

Returns: The distinct values of the requested columns, as a mapping from column name to a series of distinct values.
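The return shape described above, a mapping from column name to a series of distinct values, can be sketched over an already-loaded DataFrame. `distinct_values` is a hypothetical helper that illustrates the mapping only. It does not reproduce how IntermineSource computes it.

```python
from typing import Dict, List

import pandas as pd


def distinct_values(df: pd.DataFrame, col_names: List[str]) -> Dict[str, pd.Series]:
    """Map each requested column name to a Series of its distinct values."""
    missing = [c for c in col_names if c not in df.columns]
    if missing:
        raise KeyError(f"Columns not found: {missing}")
    # drop_duplicates keeps the first occurrence of each value, in row order.
    return {col: df[col].drop_duplicates().reset_index(drop=True) for col in col_names}
```

With `df = pd.DataFrame({"organism": ["fly", "fly", "worm"]})`, calling `distinct_values(df, ["organism"])["organism"].tolist()` gives `["fly", "worm"]`.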


  • multi_table: bool - Whether the data source is multi-table.
  • table_names: List[str] - The names of the tables accessible from this data source.
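The following sketch shows one way `table_names` might be derived if each template defined under the service is treated as one table. That mapping is an assumption, and this page does not confirm it. The sketch relies on the intermine client's `Service.templates` name-to-template mapping. Both function names are hypothetical.

```python
from typing import Any, List, Mapping


def template_table_names(templates: Mapping[str, Any]) -> List[str]:
    """Return template names in a stable (sorted) order, one per table.

    Assumption: each template defined under the service is exposed as one table.
    """
    return sorted(templates)


def fetch_table_names(service_url: str) -> List[str]:
    """Look up the templates defined under an Intermine service.

    Assumption: uses intermine's Service.templates name -> template mapping;
    requires `pip install intermine` and network access.
    """
    # Imported lazily so the pure helper above works without intermine installed.
    from intermine.webservice import Service

    return template_table_names(Service(service_url).templates)
```

Sorting the names gives a deterministic order, which keeps `table_names` stable between runs regardless of how the service orders its templates.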