fnGO

Functions for Gene Ontology (GO) overrepresentation analysis using the communicated set of proteins.

CoRe.fnGO.MinMaxGOsets(GO_embedding, GO_container, GO_BP_names)[source]

Identifies the minimum gene sets that do not contain any other gene sets and the maximum gene sets that are not contained in any other gene sets.

Parameters:
  • GO_embedding (dict) – Dict with the index of each gene ontology set name as keys and the list of indices of the gene sets that contain the key gene set as entries.

  • GO_container (dict) – Dict with the index of gene ontology set name as keys and the list of indices of the gene sets that are contained in the key gene set.

  • GO_BP_names (list) – Names of gene ontology gene sets.
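
As an illustration of the definitions above, the following is a minimal sketch, not the library's implementation. It assumes that a minimum set is a key whose GO_container entry is empty and that a maximum set is a key whose GO_embedding entry is empty. The helper name min_max_go_sets and the returned pair of name lists are hypothetical, since the return value of MinMaxGOsets is not documented here.

```python
def min_max_go_sets(go_embedding, go_container, go_names):
    """Hypothetical helper: split gene sets into minimum and maximum sets."""
    # Minimum sets contain no other gene set, so their container list is empty.
    minimum = [go_names[i] for i, contained in go_container.items() if not contained]
    # Maximum sets are contained in no other gene set, so their embedding list is empty.
    maximum = [go_names[i] for i, containers in go_embedding.items() if not containers]
    return minimum, maximum


# Toy hierarchy (made-up names): set 0 contains set 1, which contains set 2.
names = ["GOBP_A", "GOBP_B", "GOBP_C"]
embedding = {0: [], 1: [0], 2: [0, 1]}
container = {0: [1, 2], 1: [2], 2: []}
minimum, maximum = min_max_go_sets(embedding, container, names)
```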

CoRe.fnGO.compute_p_values(sources, GO_BPs_set, interaction_set, total_genes, minimum_GOBP=False, size_threshold=inf, full=False)[source]

Computes Fisher’s exact test p-values for Gene Ontology over-representation analysis of the genes receiving information from the sources.

Parameters:
  • sources (list) – Names of factors that are causing the information transfer in the network.

  • minimum_GOBP (list) – Names of gene sets for biological processes at the lowest level, i.e. these sets do not contain other gene sets.

  • GO_BPs_set (array_like) – Gene sets for Gene Ontology Biological Processes.

  • interaction_set (dict) – Set of genes receiving information from the sources.

  • total_genes (int) – Total number of unique genes across all gene sets.

Returns:

  • go_names (dict) – Gene Ontology Biological Processes that are over-represented by the sources.

  • p_values (dict) – Fisher’s exact test p-values for Gene Ontology over-representation analysis.
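
For readers unfamiliar with the test, a one-sided Fisher’s exact test for over-representation equals the upper tail of a hypergeometric distribution. The sketch below shows that calculation for a single gene set using only the standard library. It is an illustration under simplifying assumptions, not the library's code: the helper name overrepresentation_p_value is hypothetical, both inputs are plain collections of gene names, and every gene is assumed to belong to the universe counted by total_genes.

```python
from math import comb


def overrepresentation_p_value(gene_set, interaction_set, total_genes):
    """Hypothetical helper: P(overlap >= observed) under the hypergeometric null."""
    gene_set = set(gene_set)
    interaction_set = set(interaction_set)
    K = len(gene_set)                    # genes annotated to the GO set
    n = len(interaction_set)             # genes receiving information
    k = len(gene_set & interaction_set)  # observed overlap
    # Sum the hypergeometric probabilities for overlaps of size k or larger.
    tail = sum(
        comb(K, i) * comb(total_genes - K, n - i)
        for i in range(k, min(K, n) + 1)
    )
    return tail / comb(total_genes, n)


# Toy numbers: a universe of 10 genes, a 3-gene set, 4 receiving genes, overlap of 2.
p = overrepresentation_p_value({"a", "b", "c"}, {"a", "b", "d", "e"}, total_genes=10)
```

In this toy case the tail is (3·21 + 1·7) / 210 = 1/3.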

CoRe.fnGO.compute_p_values_old(sources, GO_BPs, interaction_set, total_genes, minimum_GOBP=False, size_threshold=inf, full=False)[source]

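Earlier version of compute_p_values. Computes Fisher’s exact test p-values for Gene Ontology over-representation analysis of the genes receiving information from the sources.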

Parameters:
  • sources (list) – Names of factors that are causing the information transfer in the network.

  • minimum_GOBP (list) – Names of gene sets for biological processes at the lowest level, i.e. these sets do not contain other gene sets.

  • GO_BPs (array_like) – Gene sets for Gene Ontology Biological Processes.

  • interaction_set (dict) – Set of genes receiving information from the sources.

  • total_genes (int) – Total number of unique genes across all gene sets.

Returns:

  • go_names (dict) – Gene Ontology Biological Processes that are over-represented by the sources.

  • p_values (dict) – Fisher’s exact test p-values for Gene Ontology over-representation analysis.

CoRe.fnGO.compute_q_values(p_values, go_names, go_tags, alpha=0.01, return_all=False)[source]

Benjamini-Hochberg p-value correction for multiple hypothesis testing.

Parameters:
  • p_values (array_like) – A list or array of p-values, from Fisher’s exact test, for multiple hypotheses.

  • go_names (list) – A list of gene ontology (GO) gene set names associated with each p-value.

  • go_tags (list) – A list of GO gene set tagnames associated with each p-value.

  • alpha (float) – Threshold for identifying significant gene sets.

Returns:

  • q_values (array_like) – Sorted positive False Discovery Rate corrected for multiple hypothesis testing.

  • p_values_go (list) – Names of the associated gene ontology biological processes.

  • p_values_go_tags (list) – Tag names of the associated gene ontology biological processes.
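
The returned lists are parallel, so names and tags have to be reordered together with the q-values. A minimal sketch of that bookkeeping follows. It assumes the q-values have already been computed and that significance means q < alpha; whether the library uses a strict inequality is not stated here. The helper name select_significant and the toy names and tags are hypothetical.

```python
def select_significant(q_values, go_names, go_tags, alpha=0.01):
    """Hypothetical helper: sort by q-value and keep entries below alpha."""
    # Zip the parallel lists so names and tags travel with their q-value.
    ranked = sorted(zip(q_values, go_names, go_tags), key=lambda item: item[0])
    kept = [item for item in ranked if item[0] < alpha]
    if not kept:
        return [], [], []
    q, names, tags = (list(column) for column in zip(*kept))
    return q, names, tags


sig_q, sig_names, sig_tags = select_significant(
    [0.2, 0.001, 0.005],
    ["GOBP_X", "GOBP_Y", "GOBP_Z"],
    ["TAG_X", "TAG_Y", "TAG_Z"],
)
```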

CoRe.fnGO.findGOcontainer(GO_embedding, outputfile)[source]

Identifies the gene sets that contain other gene sets. Gene sets that do not contain any other gene set are returned as dictionary keys with an empty list as the entry.

Parameters:
  • GO_embedding (dict) – Dictionary with gene set index as keys and the list of indices of the gene sets that contain it as entries.

  • outputfile (string) – Name of the file to store the output.

Returns:

GO_container – Dict with the index of gene ontology set name as keys and the list of indices of the gene sets that are contained in the key gene set.

Return type:

dict
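
Conceptually, the container mapping is the inverse of the embedding mapping. The sketch below shows that inversion on a toy input. It is an assumption-laden illustration rather than the library's code: the helper name invert_embedding is hypothetical, and writing the result to outputfile is omitted.

```python
def invert_embedding(go_embedding):
    """Hypothetical helper: map each gene set to the sets it contains."""
    # Start every known set with an empty list so leaf sets still appear as keys.
    container = {index: [] for index in go_embedding}
    for inner, outers in go_embedding.items():
        for outer in outers:
            container.setdefault(outer, []).append(inner)
    return container


# Set 1 lies inside set 0, and set 2 lies inside sets 0 and 1.
toy_container = invert_embedding({0: [], 1: [0], 2: [0, 1]})
```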

CoRe.fnGO.findGOembedding(GO_sets, outputfile)[source]

Identifies the gene sets that are embedded within other gene sets. Gene sets that are not embedded in any other gene set are returned as dictionary keys with an empty list as the entry.

Parameters:
  • GO_sets (dict) – Dictionary with gene set name as keys and the list of associated genes as entries.

  • outputfile (string) – Name of the file to store the output.

Returns:

GO_embedding – Dict with the index of each gene ontology set name as keys and the list of indices of the gene sets that contain the key gene set as entries.

Return type:

dict
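
One way to build such a mapping is a pairwise subset test over the gene sets. The sketch below assumes that "embedded" means a proper subset, so two identical sets do not embed each other, and that indices follow the insertion order of GO_sets. Both are assumptions, as is the helper name find_embedding; writing to outputfile is omitted.

```python
def find_embedding(go_sets):
    """Hypothetical helper: list, for each set index, the indices of its supersets."""
    names = list(go_sets)
    members = [set(go_sets[name]) for name in names]
    embedding = {}
    for i, inner in enumerate(members):
        # `inner < outer` is Python's proper-subset test on sets.
        embedding[i] = [j for j, outer in enumerate(members) if inner < outer]
    return names, embedding


toy_sets = {
    "GOBP_A": ["g1", "g2", "g3", "g4"],
    "GOBP_B": ["g1", "g2", "g3"],
    "GOBP_C": ["g1", "g2"],
}
toy_names, toy_embedding = find_embedding(toy_sets)
```

The pairwise comparison is quadratic in the number of gene sets, which may explain why the result is stored in a file for reuse.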

CoRe.fnGO.p_adjust_bh(p)[source]

Benjamini-Hochberg p-value correction for multiple hypothesis testing.

Parameters:

p (array_like) – A list or array of p-values, from Fisher’s exact test, for multiple hypotheses.

Returns:

q – Adjusted positive False Discovery Rate (pFDR).

Return type:

array_like
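
The Benjamini-Hochberg adjustment multiplies the p-value of rank r (in ascending order) by m / r, where m is the number of tests, and then enforces monotonicity from the largest p-value downward. A minimal standard-library sketch follows. The helper name bh_adjust is hypothetical, and returning the values in input order is an assumption, since the ordering used by p_adjust_bh is not documented here.

```python
def bh_adjust(p_values):
    """Hypothetical helper: Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each value is capped by the ones above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


q_demo = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

For these four p-values the adjusted values are approximately 0.02, 0.04, 0.04 and 0.02.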

CoRe.fnGO.readGOBPs(GO_directory)[source]

Reads the gene sets associated with Gene Ontology Biological Processes.

Parameters:

GO_directory – Directory containing the GO data files from the Molecular Signatures Database. This directory contains a set of .csv files, each named after a GOBP and containing the list of associated genes.

Returns:

GO_BPs – Dict with GOBP names as keys and the list of associated genes as the dictionary entry.

Return type:

dict
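
A sketch of reading such a directory is shown below. The internal layout of the .csv files is not specified here, so the example assumes one gene name per row in the first column; that layout, the helper name read_gene_set_directory, and the toy file names are all assumptions.

```python
import csv
import tempfile
from pathlib import Path


def read_gene_set_directory(go_directory):
    """Hypothetical helper: map each .csv file stem to its list of genes."""
    gene_sets = {}
    for path in sorted(Path(go_directory).glob("*.csv")):
        with path.open(newline="") as handle:
            # Assumed layout: one gene per row, gene name in the first column.
            genes = [row[0].strip() for row in csv.reader(handle) if row and row[0].strip()]
        gene_sets[path.stem] = genes
    return gene_sets


# Build a throwaway directory with two toy gene set files and read it back.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "GOBP_EXAMPLE_A.csv").write_text("GENE1\nGENE2\n")
    Path(tmp, "GOBP_EXAMPLE_B.csv").write_text("GENE2\nGENE3\n")
    demo_sets = read_gene_set_directory(tmp)
```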

CoRe.fnGO.readGOsets(GO_file, GO_category)[source]

Reads the gene ontology data set into a Python dictionary.

Parameters:
  • GO_file (string) – Name of the file containing Gene Ontology gene sets.

  • GO_category (string) – Name of the gene ontology category: one of ‘GOBP’, ‘GOCC’, or ‘GOMF’.

Returns:

  • GO_BPs (dict) – Dict with gene ontology gene set as keys and the list of associated genes as dictionary entries.

  • total_unique_genes (list) – Names of unique genes in the total gene ontology data set.

CoRe.fnGO.read_embedding(filename)[source]

CoRe.fnGO.total_genes(GOBPs)[source]

Determines the list of unique genes across all the GO gene sets.

Parameters:

GOBPs (dict) – Dict with GOBP names as keys and the list of associated genes as the dictionary entry.

Returns:

unique_genes – Names of the unique genes present in the gene set database.

Return type:

list
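
The operation amounts to a set union over all gene lists. A minimal sketch follows; the helper name unique_genes and the sorted output order are assumptions, since the ordering of the returned list is not documented here.

```python
def unique_genes(go_sets):
    """Hypothetical helper: unique gene names across all gene sets, sorted."""
    return sorted({gene for genes in go_sets.values() for gene in genes})


toy_gene_sets = {"GOBP_A": ["g1", "g2"], "GOBP_B": ["g2", "g3"]}
genes = unique_genes(toy_gene_sets)
universe_size = len(genes)
```

The length of this list is the kind of integer that compute_p_values documents as its total_genes parameter.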