
Pathway-Network-Analysis

This is the pathway and network analysis workflow we use in the lab.

Zhanglab@Columbia by Hanrui Zhang [2019-07-01], updated on [2021-07-02].

The material is modified from the CBW workshop on pathway and network analysis 2021.

Summary of the Workflow

The main purpose of pathway and network analysis is to understand what a list of genes is telling us, i.e. to interpret lists of interesting genes from experiments (usually omics and functional genomic experiments) and gain mechanistic insights from them.

1. Laptop set-up instructions

Follow the link to download and install the latest version of GSEA (Gene Set Enrichment Analysis) and Cytoscape.

2. Reading materials and references

Read the following to further understand the sources of pathway and network data, the statistical approaches, and the interpretation of results.

The protocol uses publicly available software packages (GSEA v.3.0 or higher, g:Profiler, Enrichment Map v.3.0 or higher, Cytoscape v.3.6.0 or higher) and custom R scripts that apply publicly available R packages (edgeR, Roast, Limma, Camera). Custom scripts are available in the Supplementary Protocols and at GitHub web sites https://github.com/BaderLab/Cytoscape_workflows/tree/master/EnrichmentMapPipeline and https://baderlab.github.io/Cytoscape_workflows/EnrichmentMapPipeline/index.html.

This video is also informative https://www.youtube.com/watch?v=KY6SS4vRchY.

3. Over-representation analysis and enrichment analysis

3.1 g:Profiler for over-representation analysis: using two lists of genes as the inputs

Examples of the two lists: differentially expressed (DE) genes and all the expressed genes; top screen hits and background genes; GWAS candidate genes and all the annotated genes. This workflow uses DE genes as the example.
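As a minimal sketch of how the two input lists could be derived from a DESeq2 result table, the R snippet below builds a DE list and a background list from a toy data frame. The gene names, the values, the `padj < 0.05` cutoff, and the choice to treat genes with a non-missing `padj` as "expressed" are all assumptions for illustration, not part of the original workflow.

```r
# Toy stand-in for a DESeq2 result table (hypothetical genes and values)
DEtoy <- data.frame(
  SYMBOL = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  log2FoldChange = c(2.0, -1.5, 0.1, 0.8),
  padj = c(0.001, 0.01, 0.9, NA),
  stringsAsFactors = FALSE
)

# Background list: genes that were tested (assumption: non-missing padj)
tested <- !is.na(DEtoy$padj)
background_genes <- unique(DEtoy$SYMBOL[tested])

# Query list: DE genes at an assumed adjusted p value cutoff of 0.05
de_genes <- unique(DEtoy$SYMBOL[tested & DEtoy$padj < 0.05])

print(de_genes)
print(background_genes)
```

Each vector can then be written out with `writeLines()` and pasted or uploaded as the query and the custom background, respectively.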

Answers the question: Are any pathways (gene sets) surprisingly enriched in my gene list?
Statistical test: Fisher’s Exact Test (aka hypergeometric test).

[Figure: Fisher's Exact Test]
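To make the test concrete, here is a small R sketch of an over-representation test for a single pathway. All counts are hypothetical. It uses the base R functions `fisher.test()` and `phyper()` to show that the one-sided Fisher's exact test and the hypergeometric tail probability give the same p value; it is not meant to reproduce how g:Profiler computes or corrects its p values.

```r
# Hypothetical counts, for illustration only
background_size <- 10000  # all expressed genes
pathway_size    <- 200    # background genes annotated to the pathway
de_size         <- 500    # DE genes
overlap         <- 25     # DE genes that are also in the pathway

# 2 x 2 contingency table (matrix() fills column by column)
contingency <- matrix(
  c(overlap, de_size - overlap,
    pathway_size - overlap,
    background_size - pathway_size - de_size + overlap),
  nrow = 2,
  dimnames = list(c("in_pathway", "not_in_pathway"), c("DE", "not_DE"))
)

# One-sided Fisher's exact test: is the pathway over-represented among DE genes?
fisher_p <- fisher.test(contingency, alternative = "greater")$p.value

# Same probability from the hypergeometric distribution: P(X >= overlap)
hyper_p <- phyper(overlap - 1, pathway_size,
                  background_size - pathway_size, de_size,
                  lower.tail = FALSE)

print(c(fisher = fisher_p, hypergeometric = hyper_p))
```

With these counts, about 10 pathway genes would be expected among the DE genes by chance (500 x 200 / 10000), so an overlap of 25 represents an enrichment.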

3.2 GSEA for enrichment analysis: using a ranked list of genes as the input

Answers the question: Are any pathways (gene sets) ranked surprisingly high or low in a ranked list of genes? For example, individual genes in a pathway may be up- or down-regulated by only a small amount, but the sum of these subtle changes may have a great impact on the pathway.
Statistical test: GSEA (modified KS test), Wilcoxon rank sum test, etc.
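The rank-based idea can be illustrated with the Wilcoxon rank sum test, one of the tests named above. The sketch below uses simulated scores, and the gene names, the set size, and the size of the shift are assumptions. It asks whether the members of a hypothetical gene set tend to score higher or lower than the remaining genes. This is not the GSEA algorithm itself, only a simpler test that answers a similar question.

```r
set.seed(1)  # simulated data, for illustration only

# Simulated per-gene scores (e.g. a signed ranking statistic)
n_genes <- 1000
scores <- rnorm(n_genes)
names(scores) <- paste0("GENE_", seq_len(n_genes))

# Hypothetical pathway of 50 genes, each shifted upward by a modest amount
gene_set <- names(scores)[1:50]
scores[gene_set] <- scores[gene_set] + 1

# Compare scores inside the set with scores outside the set
in_set <- names(scores) %in% gene_set
result <- wilcox.test(scores[in_set], scores[!in_set],
                      alternative = "two.sided")
print(result$p.value)
```

No single gene in the simulated set needs to pass a DE threshold for the set as a whole to stand out, which is the motivation for ranked-list methods.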

# To prepare the .rnk file
library(dplyr)  # provides %>% and filter()

## Read the DESeq2 output
DE <- read.csv("../output/DESeq2.csv", header = TRUE, sep = ",")

## Filter to remove all the NAs
DEnoNA <- DE %>% filter(!is.na(SYMBOL) & !is.na(pvalue) & !is.na(padj))

## Filter to remove duplicated SYMBOLs (every copy of a duplicated SYMBOL is dropped)
## Exact matching with %in% avoids partial or regex matches and is safe when there are no duplicates
duplicate_SYMBOL <- DEnoNA$SYMBOL[duplicated(DEnoNA$SYMBOL)]
DEfinal <- DEnoNA[!(DEnoNA$SYMBOL %in% duplicate_SYMBOL), ]

## Prepare the rnk file and save to the output folder
### Add a rank column using the following calculation; make sure to use the p value, not padj.
DEfinal$rank <- -log10(DEfinal$pvalue) * sign(DEfinal$log2FoldChange)
### Order by rank and keep only the SYMBOL and rank columns (selected by name, not by position)
rnk <- DEfinal[order(DEfinal$rank, decreasing = TRUE), c("SYMBOL", "rank")]

### Write to a .rnk file
write.table(rnk, file = "../output/DESeq2.rnk", quote = FALSE, sep = "\t", row.names = FALSE, col.names = FALSE)
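One edge case worth checking before writing the file: if a p value is reported as exactly 0, `-log10(pvalue)` becomes `Inf` and the rank is no longer a finite number. The sketch below shows one possible convention, which is an assumption rather than part of the original workflow: replace zeros with the smallest non-zero p value in the table. The toy data frame and the `pvalue_capped` column are hypothetical.

```r
# Toy stand-in for DEfinal (hypothetical genes and values)
DEtoy <- data.frame(
  SYMBOL = c("GENE_A", "GENE_B", "GENE_C"),
  pvalue = c(0, 1e-5, 0.2),
  log2FoldChange = c(2.1, -1.3, 0.4),
  stringsAsFactors = FALSE
)

# Replace p = 0 with the smallest non-zero p value (one possible convention)
min_nonzero <- min(DEtoy$pvalue[DEtoy$pvalue > 0])
DEtoy$pvalue_capped <- pmax(DEtoy$pvalue, min_nonzero)

# Same rank formula as above, applied to the capped p values
DEtoy$rank <- -log10(DEtoy$pvalue_capped) * sign(DEtoy$log2FoldChange)

# Confirm that every rank is finite before writing the .rnk file
stopifnot(all(is.finite(DEtoy$rank)))
print(DEtoy[order(DEtoy$rank, decreasing = TRUE), c("SYMBOL", "rank")])
```

Capping introduces ties among the most significant genes, so it is worth checking how many genes are affected.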


4. Network Visualization and Analysis with Cytoscape - Enrichment Map

The objective is to transform enrichment results from g:Profiler, GSEA or other enrichment algorithms into a network, and to summarize the enrichment results with annotations using the AutoAnnotate app.
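To give a feel for what "transforming enrichment results into a network" means, the R sketch below builds an edge list in which gene sets are nodes and two gene sets are connected when their member genes overlap. The gene sets, the use of the Jaccard coefficient, and the 0.25 cutoff are assumptions for illustration. The Enrichment Map app does this inside Cytoscape with its own configurable similarity settings.

```r
# Hypothetical enriched gene sets (pathway name -> member genes)
gene_sets <- list(
  PATHWAY_1 = c("A", "B", "C", "D"),
  PATHWAY_2 = c("C", "D", "E"),
  PATHWAY_3 = c("X", "Y")
)

# Jaccard coefficient: shared genes divided by all genes in either set
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Every pair of gene sets is a candidate edge
set_pairs <- t(combn(names(gene_sets), 2))
edges <- data.frame(source = set_pairs[, 1], target = set_pairs[, 2],
                    stringsAsFactors = FALSE)
edges$similarity <- unname(mapply(
  function(s, t) jaccard(gene_sets[[s]], gene_sets[[t]]),
  edges$source, edges$target
))

# Keep only pairs above an assumed similarity cutoff
edges <- edges[edges$similarity >= 0.25, ]
print(edges)
```

Here only PATHWAY_1 and PATHWAY_2 are connected (2 shared genes out of 5 in total), while PATHWAY_3 would remain an isolated node.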

Network Visualization and Analysis with Cytoscape: Enrichment Map from g:Profiler results.

https://baderlab.github.io/CBW_Pathways_2020/gprofiler-mod3.html

Network Visualization and Analysis with Cytoscape: create an enrichment map from GSEA results.

https://baderlab.github.io/CBW_Pathways_2020/gsea-mod3.html

Notes:

5. Network Analysis by ReactomeFI

Investigate and visualize functional interactions among genes in hit pathways.

6. Predict gene function: GeneMANIA - predict the function of a gene or gene set.

One may refer to the GeneMANIA help page to find information about, e.g. network categories, search tips, etc.

7. Discover the Regulons: iRegulon - sequence-based discovery of the transcription factors (TFs), their targets and the motifs/tracks from a set of genes.

8. Additional information

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. This means that you are able to copy, share and modify the work, as long as the result is distributed under the same license.