ToppGene database functions

topppy.topp_gene

get_entrez

get_entrez(genes: list) -> list

Convert genes into Entrez format

Parameters:
  • genes (list) –

    A list of genes

Returns: a list of genes in Entrez format

Examples:

from topppy import get_entrez
genes = ["IFNG", "FOXP3"]  # example input: a list of gene symbols
get_entrez(genes)

get_topp_cats

get_topp_cats() -> list

Get a list of ToppFun categories

Returns: a list of ToppFun category names

Examples:

from topppy import get_topp_cats
get_topp_cats()

topp_fun

topp_fun(markers: DataFrame, topp_categories: list = None, cluster_col: str = 'cluster', gene_col: str = 'gene', p_val_col: str = 'adj_p_val_col', logFC_col: str = 'avg_logFC', num_genes: int = 1000, pval_cutoff: float = 0.5, fc_cutoff: float = 0, fc_filter: str = 'ALL', clusters: list = None, correction: str = 'FDR', key_type: str = 'SYMBOL', min_genes: int = 2, max_genes: int = 1500, max_results: int = 50) -> DataFrame

The topp_fun() function takes a DataFrame and selects genes to use in querying ToppGene.

Parameters:
  • markers (DataFrame) –

    A list of markers, or a DataFrame with columns as cluster labels

  • topp_categories (list, default: None ) –

    A string or list of specific ToppFun categories for the query

  • cluster_col (str, default: 'cluster' ) –

    Column name for the groups of cells (e.g. cluster or celltype)

  • gene_col (str, default: 'gene' ) –

    Column name for genes (e.g. gene or feature)

  • p_val_col (str, default: 'adj_p_val_col' ) –

    Column name for the p-value or adjusted p-value (preferred)

  • logFC_col (str, default: 'avg_logFC' ) –

    Column name for the avg log FC column

  • num_genes (int, default: 1000 ) –

    Number of genes per group to use for the ToppGene query

  • pval_cutoff (float, default: 0.5 ) –

    (adjusted) P-value cutoff for filtering differentially expressed genes

  • fc_cutoff (float, default: 0 ) –

    Avg log fold change cutoff for filtering differentially expressed genes

  • fc_filter (str, default: 'ALL' ) –

    Include "ALL" genes, or only "UPREG" or "DOWNREG" for each cluster

  • clusters (list, default: None ) –

    Which clusters to include in the ToppGene query

  • correction (str, default: 'FDR' ) –

    P-value correction method ("FDR" is Benjamini-Hochberg, "BH")

  • key_type (str, default: 'SYMBOL' ) –

    Gene name format

  • min_genes (int, default: 2 ) –

    Minimum number of genes to match in a query

  • max_genes (int, default: 1500 ) –

    Maximum number of genes to match in a query

  • max_results (int, default: 50 ) –

    Maximum number of results per cluster

Returns: DataFrame

Examples:

from topppy import *
# ifnb_de: a DataFrame of differential expression markers
toppdata = topp_fun(ifnb_de, topp_categories=None, cluster_col='celltype',
                    gene_col='gene', p_val_col='p_val_adj', logFC_col='avg_log2FC')
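
The filtering parameters above (pval_cutoff, fc_cutoff, fc_filter, num_genes) describe a per-cluster gene selection step. The following is a minimal sketch of how such a selection could be written with pandas. It is not the topppy implementation: the helper name select_genes, the exact comparison rules, and the toy marker table are assumptions made for illustration.

```python
import pandas as pd


def select_genes(markers, cluster_col="cluster", gene_col="gene",
                 p_val_col="p_val_adj", logFC_col="avg_logFC",
                 num_genes=1000, pval_cutoff=0.5, fc_cutoff=0, fc_filter="ALL"):
    """Hypothetical helper: choose the genes to query for each cluster."""
    # Keep rows that pass the (adjusted) p-value cutoff.
    df = markers[markers[p_val_col] < pval_cutoff]

    # Apply the fold-change direction filter (assumed semantics).
    if fc_filter == "UPREG":
        df = df[df[logFC_col] > abs(fc_cutoff)]
    elif fc_filter == "DOWNREG":
        df = df[df[logFC_col] < -abs(fc_cutoff)]
    elif fc_filter == "ALL":
        df = df[df[logFC_col].abs() >= abs(fc_cutoff)]
    else:
        raise ValueError("fc_filter must be 'ALL', 'UPREG' or 'DOWNREG'")

    # Rank by p-value within each cluster and keep the top num_genes rows.
    df = df.sort_values([cluster_col, p_val_col])
    top = df.groupby(cluster_col, sort=False).head(num_genes)

    # Return a {cluster: [genes]} mapping.
    return top.groupby(cluster_col, sort=False)[gene_col].apply(list).to_dict()


# Toy marker table with made-up values, for illustration only.
markers = pd.DataFrame({
    "cluster": ["A", "A", "A", "B", "B"],
    "gene": ["IFNG", "CD3E", "ACTB", "MS4A1", "CD79A"],
    "p_val_adj": [0.001, 0.02, 0.9, 0.01, 0.03],
    "avg_logFC": [1.5, -0.8, 0.1, 2.0, 1.1],
})

selected = select_genes(markers, fc_filter="UPREG", pval_cutoff=0.05)
print(selected)
```

In this sketch each cluster's gene list is ordered by ascending p-value, so lowering num_genes keeps the most significant genes first.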

topp_save

topp_save(topp_data: DataFrame, filename: str = None, save_dir: str = None, split: bool = False, format: str = 'xlsx') -> None

Save topp_data results (optionally) split by celltype/cluster

Parameters:
  • topp_data (DataFrame) –

    Results from topp_fun as a DataFrame

  • filename (str, default: None ) –

    File name prefix for each split file. Default: Current working directory

  • save_dir (str, default: None ) –

    The directory to save files. Default: $HOME

  • split (bool, default: False ) –

    Whether to split the DataFrame by celltype/cluster. Default: False

  • format (str, default: 'xlsx' ) –

    Saved file format, one of ["xlsx", "csv", "tsv"]. Default: "xlsx"

Returns: None

Examples:

from topppy import topp_save
# topp_data: the results DataFrame returned by topp_fun
topp_save(topp_data, filename="toppFun_results", split=True, format="xlsx")
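
To illustrate what splitting results by cluster means in practice, here is a minimal sketch that writes one delimited file per cluster using pandas. It is not the topp_save implementation: the helper name save_split, the "Cluster" column name, the file naming pattern, and the toy results table are all assumptions. Only the "csv" and "tsv" formats are covered.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_split(results, filename, save_dir, cluster_col="Cluster", fmt="csv"):
    """Hypothetical helper: write one delimited file per cluster."""
    separators = {"csv": ",", "tsv": "\t"}
    if fmt not in separators:
        raise ValueError("fmt must be 'csv' or 'tsv' in this sketch")

    out_dir = Path(save_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    paths = []
    # One output file per cluster, named <filename>_<cluster>.<fmt>.
    for cluster, group in results.groupby(cluster_col):
        path = out_dir / f"{filename}_{cluster}.{fmt}"
        group.to_csv(path, sep=separators[fmt], index=False)
        paths.append(path)
    return paths


# Toy results table with made-up values, for illustration only.
results = pd.DataFrame({
    "Cluster": ["A", "A", "B"],
    "Description": ["term 1", "term 2", "term 3"],
    "PValue": [0.001, 0.01, 0.02],
})

with tempfile.TemporaryDirectory() as tmp:
    written = save_split(results, "toppFun_results", tmp, fmt="tsv")
    names = sorted(p.name for p in written)
    # Read one file back to confirm that it holds only that cluster's rows.
    rows_a = len(pd.read_csv(Path(tmp) / "toppFun_results_A.tsv", sep="\t"))

print(names, rows_a)
```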