maftools : efficient and comprehensive analysis of somatic variants in cancer
With advances in Cancer Genomics, Mutation Annotation Format (MAF) is being widely accepted and used to store somatic variants detected. The Cancer Genome Atlas Project has sequenced over 30 different cancers with sample size of each cancer type being over 200. Resulting data consisting of somatic variants are stored in the form of Mutation Annotation Format. This package attempts to summarize, analyze, annotate and visualize MAF files in an efficient manner from either TCGA sources or any in-house studies as long as the data is in MAF format.
簡單來說就是癌症基因體計畫所產生的突變資料都存在一種叫MAF格式的檔案裡。maftools就是透過視覺化輔助分析的方法來讓人看得懂MAF的內容。
Mayakonda A, Lin DC, Assenov Y, Plass C, Koeffler HP. 2018. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Resarch PMID: 30341162
原始連結: https://www.bioconductor.org/packages/release/bioc/vignettes/maftools/inst/doc/maftools.html
GDC MAF Format: Mutation Annotation Format (MAF) is a tab-delimited text file with aggregated mutation information from VCF Files and are generated on a project-level.
https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/
read.maf function reads MAF files, summarizes it in various ways and stores it as an MAF object.
Acute Myeloid Leukemia (LAML)
# 1.載入maftools套件
library(maftools)
# path to TCGA LAML MAF file
laml.maf = system.file('extdata', 'tcga_laml.maf.gz', package = 'maftools')
# clinical information containing survival information and histology. This is optional
laml.clin = system.file('extdata', 'tcga_laml_annot.tsv', package = 'maftools')
# 利用read.maf() 建立 maf object: laml
laml = read.maf(maf = laml.maf, clinicalData = laml.clin)
# 檢視maf object, 直接輸入laml
laml
# 2.Shows sample summary
getSampleSummary(laml)
# 3.Shows gene summary
getGeneSummary(laml)
# 4.Shows clinical data associated with samples
getClinicalData(laml)
# 5.Shows all fields in MAF
getFields(laml)
# 6.Writes maf summary to an output file with basename laml.
write.mafSummary(maf = laml, basename = 'laml')
# 7.plotmafSummary
We can use plotmafSummary to plot the summary of the maf file, which displays number of variants in each sample as a stacked barplot and variant types as a boxplot summarized by Variant_Classification.
plotmafSummary(maf = laml, rmOutlier = TRUE, addStat = 'median', dashboard = TRUE, titvRaw = FALSE)
# 8,9. Oncoplots
oncoplot(maf = laml, top = 10)
oncoplot(maf = laml, top = 10, fontSize = 10,showTumorSampleBarcodes = TRUE, SampleNamefontSize = 2)
# 10.Transition and Transversions
laml.titv = titv(maf = laml, plot = FALSE, useSyn = TRUE)
plotTiTv(res = laml.titv)
# 11, 12. Lollipop plots for amino acid changes
#lollipop plot for DNMT3A, which is one of the most frequent mutated gene in Leukemia.
lollipopPlot(maf = laml, gene = 'DNMT3A', AACol = 'Protein_Change', showMutationRate = TRUE)
lollipopPlot(maf = laml, gene = 'DNMT3A', AACol = 'Protein_Change', showMutationRate = TRUE, showDomainLabel=FALSE,labelPos=882)
# 13. Cross project comparison
tcgaCompare(maf = laml, cohortName = 'Example-LAML')
# 14. Somatic Interactions
Mutually exclusive or co-occurring set of genes can be detected using somaticInteractions function, which performs pair-wise Fisher’s Exact test to detect such significant pair of genes.
somaticInteractions(maf = laml, top = 25, pvalue = c(0.05, 0.1))
# 15. Oncogenic Signaling Pathways
OncogenicPathways function checks for enrichment of known Oncogenic Signaling Pathways in TCGA cohorts
OncogenicPathways(maf = laml)
-Reading
-Validating
-Silent variants: 475
-Summarizing
-Processing clinical data
-Finished in 0.293s elapsed (2.224s cpu)
An object of class MAF
ID summary Mean Median
1: NCBI_Build 37 NA NA
2: Center genome.wustl.edu NA NA
3: Samples 193 NA NA
4: nGenes 1241 NA NA
5: Frame_Shift_Del 52 0.271 0
6: Frame_Shift_Ins 91 0.474 0
7: In_Frame_Del 10 0.052 0
8: In_Frame_Ins 42 0.219 0
9: Missense_Mutation 1342 6.990 7
10: Nonsense_Mutation 103 0.536 0
11: Splice_Site 92 0.479 0
12: total 1732 9.021 9
HGNC refseq.ID protein.ID aa.length
1: DNMT3A NM_175629 NP_783328 912
2: DNMT3A NM_022552 NP_072046 912
3: DNMT3A NM_153759 NP_715640 723
HGNC refseq.ID protein.ID aa.length
1: DNMT3A NM_175629 NP_783328 912
2: DNMT3A NM_022552 NP_072046 912
3: DNMT3A NM_153759 NP_715640 723
gene1 gene2 pValue oddsRatio 00 11 01 10 Event
1: ASXL1 RUNX1 0.0001541586 55.215541 176 4 12 1 Co_Occurence
2: IDH2 RUNX1 0.0002809928 9.590877 164 7 9 13 Co_Occurence
3: IDH2 ASXL1 0.0004030636 41.077327 172 4 1 16 Co_Occurence
4: FLT3 NPM1 0.0009929836 3.763161 125 17 16 35 Co_Occurence
5: SMC3 DNMT3A 0.0010451985 20.177713 144 6 42 1 Co_Occurence
---
296: PLCE1 ASXL1 1.0000000000 0.000000 184 0 5 4 Mutually_Exclusive
297: RAD21 FAM5C 1.0000000000 0.000000 183 0 5 5 Mutually_Exclusive
298: PLCE1 FAM5C 1.0000000000 0.000000 184 0 5 4 Mutually_Exclusive
299: PLCE1 RAD21 1.0000000000 0.000000 184 0 5 4 Mutually_Exclusive
300: EZH2 PLCE1 1.0000000000 0.000000 186 0 4 3 Mutually_Exclusive
pair event_ratio
1: ASXL1, RUNX1 4/13
2: IDH2, RUNX1 7/22
3: ASXL1, IDH2 4/17
4: FLT3, NPM1 17/51
5: DNMT3A, SMC3 6/43
---
296: ASXL1, PLCE1 0/9
297: FAM5C, RAD21 0/10
298: FAM5C, PLCE1 0/9
299: PLCE1, RAD21 0/9
300: EZH2, PLCE1 0/7
Pathway N n_affected_genes fraction_affected
1: RTK-RAS 85 18 0.21176471
2: Hippo 38 7 0.18421053
3: NOTCH 71 6 0.08450704
4: MYC 13 3 0.23076923
5: WNT 68 3 0.04411765
6: TP53 6 2 0.33333333
7: NRF2 3 1 0.33333333
8: PI3K 29 1 0.03448276
9: Cell_Cycle 15 0 0.00000000
10: TGF-Beta 7 0 0.00000000
# 載入TCGAmutations套件
library(TCGAmutations)
# 1.查看TCGA cancer projects
tcga_available()
# 2.利用Study_Abbreviation載入maf object
coad = tcga_load(study = 'COAD')
# 3.plotmafSummary()
plotmafSummary(coad)
# 4.oncoplots
oncoplot(maf = coad, top = 10, fontSize = 1,showTumorSampleBarcodes = TRUE, SampleNamefontSize = 1)
# 5.getFields
getFields(coad)
# 6.Lollipop plots for amino acid changes
lollipopPlot(maf = coad, gene = 'APC', AACol = 'HGVSp_Short', showMutationRate = TRUE)
[1] "Hugo_Symbol" "Chromosome"
[3] "Start_Position" "End_Position"
[5] "Reference_Allele" "Tumor_Seq_Allele2"
[7] "Variant_Classification" "Variant_Type"
[9] "Tumor_Sample_Barcode" "HGVSp_Short"
[11] "n_depth" "n_ref_count"
[13] "n_alt_count" "t_depth"
[15] "t_ref_count" "t_alt_count"
[17] "Tumor_Sample_Barcode_full" "sample_type_description"
[19] "ExAC_AF" "GMAF"
[21] "Cohort" "IMPACT"