ToppGene database functions

topppy.topp_gene

get_entrez

get_entrez(genes: list) -> list

Convert genes into Entrez format

Parameters:	`genes` (`list`) – A list of genes

Returns: a list of genes in Entrez format

Examples:

from topppy import get_entrez
genes = ["IFNG", "FOXP3"]  # example gene symbols; substitute your own list
get_entrez(genes)

get_topp_cats

get_topp_cats() -> list

Get a list of ToppFun categories

Returns: a list of ToppFun category names

Examples:

from topppy import get_topp_cats
get_topp_cats()

topp_fun

topp_fun(markers: DataFrame, topp_categories: list = None, cluster_col: str = 'cluster', gene_col: str = 'gene', p_val_col: str = 'adj_p_val_col', logFC_col: str = 'avg_logFC', num_genes: int = 1000, pval_cutoff: float = 0.5, fc_cutoff: float = 0, fc_filter: str = 'ALL', clusters: list = None, correction: str = 'FDR', key_type: str = 'SYMBOL', min_genes: int = 2, max_genes: int = 1500, max_results: int = 50) -> DataFrame

The topp_fun() function takes a DataFrame and selects genes to use in querying ToppGene.

Parameters:

markers (DataFrame) – A list of markers, or a DataFrame with columns as cluster labels

topp_categories (list, default: None) – A string or list of specific ToppFun categories for the query (see get_topp_cats())

cluster_col (str, default: 'cluster') – Column name for the groups of cells (e.g. cluster or celltype)

gene_col (str, default: 'gene') – Column name for genes (e.g. gene or feature)

p_val_col (str, default: 'adj_p_val_col') – Column name for the p-value or adjusted p-value (preferred)

logFC_col (str, default: 'avg_logFC') – Column name for the average log fold change

num_genes (int, default: 1000) – Number of genes per group to use for the ToppGene query

pval_cutoff (float, default: 0.5) – P-value (or adjusted p-value) cutoff for filtering differentially expressed genes

fc_cutoff (float, default: 0) – Average log fold change cutoff for filtering differentially expressed genes

fc_filter (str, default: 'ALL') – Include "ALL" genes, or only "UPREG" or "DOWNREG" genes, for each cluster

clusters (list, default: None) – Which clusters to include in the ToppGene query

correction (str, default: 'FDR') – P-value correction method ("FDR" is Benjamini-Hochberg, "BH")

key_type (str, default: 'SYMBOL') – Gene identifier format

min_genes (int, default: 2) – Minimum number of genes to match in a query

max_genes (int, default: 1500) – Maximum number of genes to match in a query

max_results (int, default: 50) – Maximum number of results per cluster

Returns: DataFrame

Examples:

from topppy import *
toppdata = topp_fun(ifnb_de, topp_categories=None, cluster_col='celltype', gene_col='gene', p_val_col='p_val_adj', logFC_col='avg_log2FC')
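
To make the filtering parameters concrete, here is a minimal sketch of the kind of per-cluster gene selection that pval_cutoff, fc_cutoff, fc_filter and num_genes describe. It is not topppy's implementation: the select_genes helper, the toy marker rows (made-up values), and the exact comparison rules (absolute fold change against fc_cutoff, ranking by smallest p-value) are all assumptions made for illustration. Plain dictionaries stand in for DataFrame rows so the snippet runs without extra dependencies.

```python
from collections import defaultdict


def select_genes(rows, pval_cutoff=0.5, fc_cutoff=0.0, fc_filter="ALL", num_genes=1000):
    """Hypothetical helper: pick genes per cluster from marker rows.

    Assumed rules (not confirmed by the topppy docs):
    - keep rows with p-value <= pval_cutoff and |logFC| >= fc_cutoff
    - "UPREG" keeps positive logFC only, "DOWNREG" negative only
    - keep at most num_genes per cluster, smallest p-values first
    """
    kept = defaultdict(list)
    for row in rows:
        fc = row["avg_logFC"]
        if row["p_val_adj"] > pval_cutoff or abs(fc) < fc_cutoff:
            continue
        if fc_filter == "UPREG" and fc <= 0:
            continue
        if fc_filter == "DOWNREG" and fc >= 0:
            continue
        kept[row["cluster"]].append(row)
    return {
        cluster: [r["gene"] for r in sorted(members, key=lambda r: r["p_val_adj"])[:num_genes]]
        for cluster, members in kept.items()
    }


# Toy marker table in long format: one row per gene per cluster (values are made up).
markers = [
    {"cluster": "B", "gene": "CD79A", "p_val_adj": 0.001, "avg_logFC": 2.1},
    {"cluster": "B", "gene": "MS4A1", "p_val_adj": 0.010, "avg_logFC": 1.4},
    {"cluster": "B", "gene": "LYZ", "p_val_adj": 0.020, "avg_logFC": -1.8},
    {"cluster": "T", "gene": "CD3E", "p_val_adj": 0.002, "avg_logFC": 1.9},
    {"cluster": "T", "gene": "GNLY", "p_val_adj": 0.900, "avg_logFC": 0.3},
]

upregulated = select_genes(markers, fc_filter="UPREG")
top_one = select_genes(markers, num_genes=1)
```

Under these assumed rules, fc_filter="UPREG" drops LYZ (negative fold change) and GNLY (p-value above the cutoff), while num_genes=1 keeps only the most significant gene in each cluster.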

topp_save

topp_save(topp_data: DataFrame, filename: str = None, save_dir: str = None, split: bool = False, format: str = 'xlsx') -> None

Save topp_fun results, optionally split by cell type/cluster

Parameters:

topp_data (DataFrame) – Results from topp_fun as a DataFrame

filename (str, default: None) – File name prefix for each split file. Default: Current working directory

save_dir (str, default: None) – The directory to save files. Default: $HOME

split (bool, default: False) – Whether to split the DataFrame by cell type/cluster

format (str, default: 'xlsx') – Saved file format, one of "xlsx", "csv", or "tsv"

Returns: None

Examples:

from topppy import topp_save, topp_data
topp_save(topp_data, filename="toppFun_results", split=True, format="xlsx")
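
As a rough illustration of what split=True means, the sketch below groups result rows by cluster and writes one delimited file per group using only the standard library. It is not topp_save itself: the save_split helper, the "Cluster", "Category" and "Name" column names, the example rows, and the prefix_cluster.ext file naming pattern are all assumptions, and only the "csv" and "tsv" formats are covered because the standard library does not write xlsx files.

```python
import csv
import tempfile
from pathlib import Path


def save_split(rows, save_dir, prefix="toppFun_results", cluster_col="Cluster", fmt="csv"):
    """Hypothetical helper: write one file per cluster and return the paths."""
    delimiter = {"csv": ",", "tsv": "\t"}[fmt]

    # Group rows by their cluster label, preserving input order.
    groups = {}
    for row in rows:
        groups.setdefault(row[cluster_col], []).append(row)

    paths = []
    for cluster, members in groups.items():
        # Assumed naming pattern: <prefix>_<cluster>.<format>
        path = Path(save_dir) / f"{prefix}_{cluster}.{fmt}"
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(members[0]), delimiter=delimiter)
            writer.writeheader()
            writer.writerows(members)
        paths.append(path)
    return paths


# Made-up result rows standing in for a topp_fun result table.
results = [
    {"Cluster": "B", "Category": "Pathway", "Name": "example term 1"},
    {"Cluster": "B", "Category": "Pathway", "Name": "example term 2"},
    {"Cluster": "T", "Category": "Pathway", "Name": "example term 3"},
]

with tempfile.TemporaryDirectory() as tmp:
    written = save_split(results, tmp, fmt="tsv")
    file_names = sorted(path.name for path in written)
    first_file_lines = written[0].read_text().splitlines()
```

Here the two clusters produce two files, and the first file holds a header line followed by the two rows for cluster B.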