A Bayesian hierarchical model for analyzing methylated RNA immunoprecipitation sequencing data

Minzhe Zhang, Qiwei Li, Yang Xie

Research output: Contribution to journal › Article

Abstract

Background: The recently emerged technology of methylated RNA immunoprecipitation sequencing (MeRIP-seq) sheds light on the study of RNA epigenetics. This new bioinformatics question calls for effective and robust peak calling algorithms to detect mRNA methylation sites from MeRIP-seq data.

Methods: We propose a Bayesian hierarchical model to detect methylation sites from MeRIP-seq data. Our modeling approach includes several important characteristics. First, it models the zero-inflated and over-dispersed counts by deploying a zero-inflated negative binomial model. Second, it incorporates a hidden Markov model (HMM) to account for the spatial dependency of neighboring read enrichment. Third, our Bayesian inference allows the proposed model to borrow strength in parameter estimation, which greatly improves the model stability when dealing with MeRIP-seq data with a small number of replicates. We use Markov chain Monte Carlo (MCMC) algorithms to simultaneously infer the model parameters in a de novo fashion. The R Shiny demo is available at https://qiwei.shinyapps.io/BaySeqPeak and the R/C++ code is available at https://github.com/liqiwei2000/BaySeqPeak.

Results: In simulation studies, the proposed method outperformed the competing methods exomePeak and MeTPeak, especially when an excess of zeros was present in the data. In real MeRIP-seq data analysis, the proposed method identified methylation sites that were more consistent with biological knowledge, and had better spatial resolution compared to the other methods.

Conclusions: In this study, we develop a Bayesian hierarchical model to identify methylation peaks in MeRIP-seq data. The proposed method has a competitive edge over existing methods in terms of accuracy, robustness and spatial resolution.
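To make the two modeling ingredients named in the abstract more concrete, the following is a minimal Python sketch of a zero-inflated negative binomial (ZINB) probability mass function and a two-state Markov chain over neighboring bins. It is an illustration under stated assumptions, not the authors' BaySeqPeak implementation: the mean/dispersion parametrization, the function names, the `stay_prob` transition parameter, and the example values are all hypothetical and may differ from the paper's model.

```python
import math
import random


def nb_log_pmf(y, mean, dispersion):
    """Log-pmf of a negative binomial with the given mean and dispersion (size).

    Assumed parametrization: variance = mean + mean**2 / dispersion, so a
    smaller dispersion means stronger over-dispersion relative to a Poisson.
    """
    r = dispersion
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mean))
        + y * math.log(mean / (r + mean))
    )


def zinb_pmf(y, pi_zero, mean, dispersion):
    """Pmf of a zero-inflated NB: an extra point mass pi_zero at zero."""
    nb = math.exp(nb_log_pmf(y, mean, dispersion))
    if y == 0:
        return pi_zero + (1.0 - pi_zero) * nb
    return (1.0 - pi_zero) * nb


def simulate_states(n_bins, stay_prob, rng):
    """Two-state Markov chain over adjacent bins (0 = background, 1 = enriched).

    Each bin keeps its neighbor's state with probability stay_prob, which is
    one simple way to encode spatial dependency between neighboring bins.
    """
    states = [rng.randrange(2)]
    for _ in range(n_bins - 1):
        prev = states[-1]
        states.append(prev if rng.random() < stay_prob else 1 - prev)
    return states


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    print(simulate_states(20, stay_prob=0.9, rng=rng))
    # Probability of observing a zero count under hypothetical parameters.
    print(zinb_pmf(0, pi_zero=0.3, mean=5.0, dispersion=2.0))
```

In a full hierarchical model the hidden state of each bin would select the emission parameters of the ZINB, and the parameters would be inferred jointly (the abstract names MCMC for this); the sketch deliberately stops short of inference.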

Language: English (US)
Pages: 275-286
Number of pages: 12
Journal: Quantitative Biology
Volume: 6
Issue number: 3
DOIs: https://doi.org/10.1007/s40484-018-0149-2
State: Published - Sep 1 2018
Externally published: Yes

Fingerprint

RNA Sequence Analysis
Bayesian Hierarchical Model
RNA
Immunoprecipitation
Sequencing
Methylation
Spatial Resolution
Zero
Negative Binomial Model
Markov Chains
Spatial Analysis
Markov Chain Monte Carlo Algorithms
Statistical Models
Bayesian inference
Bioinformatics
Hidden Markov models
Computational Biology
C++
Epigenomics
Model

Keywords

  • Bayesian inference
  • hidden Markov model
  • MeRIP-seq data
  • RNA epigenomics
  • zero-inflated negative binomial

ASJC Scopus subject areas

  • Modeling and Simulation
  • Biochemistry, Genetics and Molecular Biology (miscellaneous)
  • Computer Science Applications
  • Applied Mathematics

Cite this

A Bayesian hierarchical model for analyzing methylated RNA immunoprecipitation sequencing data. / Zhang, Minzhe; Li, Qiwei; Xie, Yang.

In: Quantitative Biology, Vol. 6, No. 3, 01.09.2018, p. 275-286.


@article{f9c2af336bad4d5294a53ae095837d25,
title = "A Bayesian hierarchical model for analyzing methylated RNA immunoprecipitation sequencing data",
abstract = "Background: The recently emerged technology of methylated RNA immunoprecipitation sequencing (MeRIP-seq) sheds light on the study of RNA epigenetics. This new bioinformatics question calls for effective and robust peak calling algorithms to detect mRNA methylation sites from MeRIP-seq data. Methods: We propose a Bayesian hierarchical model to detect methylation sites from MeRIP-seq data. Our modeling approach includes several important characteristics. First, it models the zero-inflated and over-dispersed counts by deploying a zero-inflated negative binomial model. Second, it incorporates a hidden Markov model (HMM) to account for the spatial dependency of neighboring read enrichment. Third, our Bayesian inference allows the proposed model to borrow strength in parameter estimation, which greatly improves the model stability when dealing with MeRIP-seq data with a small number of replicates. We use Markov chain Monte Carlo (MCMC) algorithms to simultaneously infer the model parameters in a de novo fashion. The R Shiny demo is available at https://qiwei.shinyapps.io/BaySeqPeak and the R/C++ code is available at https://github.com/liqiwei2000/BaySeqPeak. Results: In simulation studies, the proposed method outperformed the competing methods exomePeak and MeTPeak, especially when an excess of zeros was present in the data. In real MeRIP-seq data analysis, the proposed method identified methylation sites that were more consistent with biological knowledge, and had better spatial resolution compared to the other methods. Conclusions: In this study, we develop a Bayesian hierarchical model to identify methylation peaks in MeRIP-seq data. The proposed method has a competitive edge over existing methods in terms of accuracy, robustness and spatial resolution.",
keywords = "Bayesian inference, hidden Markov model, MeRIP-seq data, RNA epigenomics, zero-inflated negative binomial",
author = "Minzhe Zhang and Qiwei Li and Yang Xie",
year = "2018",
month = "9",
day = "1",
doi = "10.1007/s40484-018-0149-2",
language = "English (US)",
volume = "6",
pages = "275--286",
journal = "Quantitative Biology",
issn = "2095-4689",
publisher = "Higher Education Press",
number = "3",

}

TY - JOUR

T1 - A Bayesian hierarchical model for analyzing methylated RNA immunoprecipitation sequencing data

AU - Zhang, Minzhe

AU - Li, Qiwei

AU - Xie, Yang

PY - 2018/9/1

Y1 - 2018/9/1

AB - Background: The recently emerged technology of methylated RNA immunoprecipitation sequencing (MeRIP-seq) sheds light on the study of RNA epigenetics. This new bioinformatics question calls for effective and robust peak calling algorithms to detect mRNA methylation sites from MeRIP-seq data. Methods: We propose a Bayesian hierarchical model to detect methylation sites from MeRIP-seq data. Our modeling approach includes several important characteristics. First, it models the zero-inflated and over-dispersed counts by deploying a zero-inflated negative binomial model. Second, it incorporates a hidden Markov model (HMM) to account for the spatial dependency of neighboring read enrichment. Third, our Bayesian inference allows the proposed model to borrow strength in parameter estimation, which greatly improves the model stability when dealing with MeRIP-seq data with a small number of replicates. We use Markov chain Monte Carlo (MCMC) algorithms to simultaneously infer the model parameters in a de novo fashion. The R Shiny demo is available at https://qiwei.shinyapps.io/BaySeqPeak and the R/C++ code is available at https://github.com/liqiwei2000/BaySeqPeak. Results: In simulation studies, the proposed method outperformed the competing methods exomePeak and MeTPeak, especially when an excess of zeros was present in the data. In real MeRIP-seq data analysis, the proposed method identified methylation sites that were more consistent with biological knowledge, and had better spatial resolution compared to the other methods. Conclusions: In this study, we develop a Bayesian hierarchical model to identify methylation peaks in MeRIP-seq data. The proposed method has a competitive edge over existing methods in terms of accuracy, robustness and spatial resolution.

KW - Bayesian inference

KW - hidden Markov model

KW - MeRIP-seq data

KW - RNA epigenomics

KW - zero-inflated negative binomial

UR - http://www.scopus.com/inward/record.url?scp=85053228281&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85053228281&partnerID=8YFLogxK

U2 - 10.1007/s40484-018-0149-2

DO - 10.1007/s40484-018-0149-2

M3 - Article

VL - 6

SP - 275

EP - 286

JO - Quantitative Biology

T2 - Quantitative Biology

JF - Quantitative Biology

SN - 2095-4689

IS - 3

ER -