Chapter 7 Differential OCR analysis
Differential peak analysis can help us identify differences in gene regulation under different conditions and reveal important regulatory mechanisms in cellular states and biological processes. This section of the analysis is an important step in the search for differential TFBSs and is the basis for performing a functional difference analysis.
7.1 Using DEseq2 to identify differentail OCR (Two groups)
DESeq2 is an R package for differential expression analysis, but can also be used for differential peak analysis with ATAC-seq. Here we use DEseq2 as an example to identify the difference peaks between the two groups.
data(Diff_Analysis_Example)
<- Diff_Analysis_Example
quant_data <- getDiffPeak(norm_data = quant_data, condition = c("Control","Exper"), control = "Control", experment = "Exper",rep_N= 2) diff_res
## converting counts to integer mode
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates: 22 workers
## mean-dispersion relationship
## final dispersion estimates, fitting model and testing: 22 workers
head(diff_res)
## baseMean log2FoldChange lfcSE stat pvalue
## chr1:10016-10607 54.030792 0.5635555 0.5826459 0.9672349 0.3334266
## chr1:237589-237928 11.903993 0.3358489 0.7164559 0.4687643 0.6392381
## chr1:521379-521712 6.946804 0.1798363 0.8110918 0.2217212 0.8245309
## chr1:540496-541113 8.700218 -0.1107456 0.7780795 -0.1423320 0.8868178
## chr1:559303-559603 8.693440 -0.2740060 0.7751323 -0.3534957 0.7237168
## chr1:564421-565042 24.555285 0.2952452 0.6325501 0.4667538 0.6406760
## padj sig
## chr1:10016-10607 0.7779954 None
## chr1:237589-237928 0.9452912 None
## chr1:521379-521712 0.9800009 None
## chr1:540496-541113 0.9845317 None
## chr1:559303-559603 0.9657660 None
## chr1:564421-565042 0.9456528 None
We can use the commonly used volcano diagram to show these differential peaks.
plotVolcano(diff_data = diff_res)
7.2 Get the differential peaks target genes
Having obtained these differential peaks, we were more interested in knowing which genes’ expression might be affected by these differential peaks. In general, we annotate these differential peaks to the nearest genes. We can run the getDiffTargetGenes function to complete this analysis.
<- getDiffTargetGenes(diff_data=diff_res)
target head(target)
## chrom start end baseMean log2FoldChange lfcSE stat
## 1 chr1 100009953 100010254 20.142897 -0.4939808 0.6575732 -0.7512181
## 2 chr1 100012048 100012826 23.103692 -0.3060881 0.6464743 -0.4734730
## 3 chr1 100017890 100018149 8.223809 -0.8358235 0.8052071 -1.0380231
## 4 chr1 100023076 100023997 22.811175 0.2890909 0.6505642 0.4443696
## 5 chr1 10002486 10002743 8.659492 1.0989500 0.7945523 1.3831059
## 6 chr1 10002897 10003700 55.147809 -0.1827790 0.5806837 -0.3147652
## pvalue padj sig peak_center TSS gene strand distance
## 1 0.4525214 0.8669849 None 100010103 99929934 1 + -80169
## 2 0.6358758 0.9436658 None 100012437 99929934 1 + -82503
## 3 0.2992593 0.7424509 None 100018019 99929934 1 + -88085
## 4 0.6567754 0.9507410 None 100023536 100111499 1 + 87963
## 5 0.1666324 0.5544505 None 10002614 10003465 1 - -851
## 6 0.7529400 0.9699603 None 10003298 10003465 1 - -167
7.3 PlotMA
In addition, we can also use the plotMA function to display information about both the difference peaks and their summit to the nearest TSS distance.
plotMA(target)
## Warning: Removed 21 rows containing missing values (`geom_point()`).
7.4 Compare the three samples chromatin accessibility (Three samples)
For the analysis of differences among the three groups, a common method is a two-by-two comparison. Here, we are inspired by R. H. Ramírez-González et al. For three groups of ATAC-seq data, we can use this triads approach to get the difference peaks at the same time and visualized with a ternary plot. We take the previous quantized matrix and extract the Bulk_B, Mem_B, Naive_B to compare the differences between the three groups.
data("All_quant_data")
<- All_quant_data[,1:3]
quant_data getTriads(group_name=c("Bulk_B","Mem_B","Naive_B"), quant_df = quant_data, return_matrix=F)
## Registered S3 methods overwritten by 'ggtern':
## method from
## grid.draw.ggplot ggplot2
## plot.ggplot ggplot2
## print.ggplot ggplot2
We can categorize them into seven categories and use a boxplot to show if their categorization is accurate.
<- getTriads(group_name=c("Bulk_B","Mem_B","Naive_B"), quant_df = quant_data, return_matrix=T)
triads_res plotTriads(triads_data = triads_res)
## Using group as id variables
Finally, we can similarly use the getTraidsTargetGenes function to get the target genes for each category of peaks.
<- getTriadsTargetGenes(triads_data = triads_res)
triads_targets head(triads_targets)
## chrom Peak_center Bulk_B Mem_B Naive_B group Peak_start
## 1 chr1 10002672 10.9749479 8.274032 10.8202002 Balanced 10002429
## 2 chr1 10003391 49.4307820 41.862457 46.2388672 Balanced 10002983
## 3 chr1 10010875 25.6538466 23.160242 25.3581749 Balanced 10010299
## 4 chr1 100151436 2.8350983 3.277752 1.8329218 Balanced 100151228
## 5 chr1 100165806 0.6880433 0.615438 0.4909362 Balanced 100165637
## 6 chr1 100194636 0.8866477 1.970254 1.2235983 Balanced 100194551
## Peak_end TSS gene strand distance
## 1 10002916 10003465 ENSG00000162441 - -793
## 2 10003799 10003465 ENSG00000162441 - -74
## 3 10011452 10003465 ENSG00000162441 - 7410
## 4 100151645 100163798 ENSG00000223656 + 12362
## 5 100165976 100163798 ENSG00000223656 + -2008
## 6 100194721 100163798 ENSG00000223656 + -30838