
A Practical Guide to Benchmarking Accuracy in Epitranscriptomics Studies
How can you be sure the epitranscriptomics data you’re producing is accurate? With a flood of RNA modification papers being published, often using new methods that disagree with prior publications, this question is being raised more frequently than ever. Unlike transcriptomics, epitranscriptomics lacks a comprehensive reference that maps all RNA modification sites. However, this should not discourage the question but only set a higher bar for scientific rigor. And while there isn’t one gold standard that rules them all, a surprising number of approaches for data validation exist that can give you confidence in your data interpretation. Let’s dive into the most common controls and validation approaches, their pros and cons, and explore when, where, and how they ought to be used.
Spike-in Controls: A Staple for Assay Validation
Many researchers include spike-in controls, synthetic RNA sequences with known RNA modification content, in their epitranscriptomics workflows. Spike-in controls come in two primary forms: negative controls and positive controls. Negative controls are unmodified RNA sequences, devoid of any RNA modifications, and serve as baselines for false positive detection. Positive controls, on the other hand, contain known RNA modifications and are used to assess the sensitivity and specificity of modification detection.
- Negative controls: Unmodified RNA sequences that help detect false positives.
- Positive controls: RNAs with known modifications to test assay sensitivity and specificity.
Positive controls can be designed in various ways. Modifications may be incorporated randomly by in vitro transcription (IVT) of DNA fragments using modified nucleotide triphosphates. The main appeal of this method is ease of implementation: IVT is inexpensive, high-yielding, and produces RNA that can be thousands of nucleotides long. Alternatively, RNA modifications can be introduced site-specifically, which allows precise placement of modifications at defined positions. Controls with site-specific modifications are paramount for validating epitranscriptomic methods that locate modification sites with single-base resolution. Site-specific controls can be produced by solid-phase RNA synthesis, which is limited to short RNA oligonucleotides under 100 bases in length. While such short controls are useful for researchers who study small RNA species (e.g., tRNA), most researchers prefer to match the length of their controls to the length of their RNA targets. Longer site-specifically modified RNAs can be produced by ligating short, site-specifically modified RNA oligos onto long, unmodified IVT-derived RNA backbones. Unfortunately, this process is cumbersome and low-yielding, which is why randomly modified IVT controls are more common.
Spike-in controls can be added to virtually any epitranscriptomics assay, from sequencing-based methods to mass spectrometry. Their readout is typically analyzed alongside sample data using bioinformatic pipelines, so assay performance can be assessed in the same run as the samples. One major advantage of spike-in controls is that they immediately reveal the success or failure of the assay, which helps with troubleshooting and protocol optimization. However, there are limitations. Many RNA modification detection methods suffer from sequence bias, and a few synthetic control sequences may not accurately reflect the assay’s performance across the full transcriptome.
Thus, while spike-in controls are essential, their design and interpretation must be carefully considered within the broader context of the assay.
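As a rough illustration of how spike-in readouts translate into performance numbers, the sketch below compares the sites an assay calls on the controls with the positions known to be modified. The function name, the input layout (control name plus position), and the example values are assumptions made for illustration; they are not taken from any specific pipeline.

```python
def spike_in_metrics(called_sites, modified_sites, all_positions):
    """Score calls made on spike-in controls against the known truth.

    called_sites   -- positions the assay reported as modified
    modified_sites -- positions known to carry a modification (positive controls)
    all_positions  -- every interrogated position on all spike-in sequences
    Positions are hashable keys, e.g. (control_name, position).
    """
    called = set(called_sites)
    truth = set(modified_sites)
    unmodified = set(all_positions) - truth

    tp = len(called & truth)        # modified and called
    fn = len(truth - called)        # modified but missed
    fp = len(called & unmodified)   # unmodified but called
    tn = len(unmodified - called)   # unmodified and not called

    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else float("nan"),
    }


# Hypothetical example: one positive and one negative control, 10 positions each.
positions = [("pos_ctrl", i) for i in range(10)] + [("neg_ctrl", i) for i in range(10)]
truth = [("pos_ctrl", 3), ("pos_ctrl", 7)]
calls = [("pos_ctrl", 3), ("neg_ctrl", 5)]
metrics = spike_in_metrics(calls, truth, positions)
print(metrics)
```

Because of the sequence bias noted above, numbers derived from a handful of control sequences are best read as a sanity check on the run rather than as transcriptome-wide error rates.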
Unmodified IVT Transcriptomes: Powerful Tools for Distinguishing Signal from Noise
IVT transcriptomes are powerful tools for distinguishing true biological signals from false positives transcriptome-wide. They can be generated by reverse transcribing RNA from any sample using primers containing a T7 promoter, followed by transcription with T7 RNA polymerase (1, 2). The resulting IVT transcriptomes replicate the sequence composition of the original RNA but lack native modifications. This makes them ideal negative controls for identifying non-specific signals. Because IVT transcriptomes preserve the full sequence context of cellular RNA, they allow researchers to assess assay performance across diverse sequences and filter out sequence-dependent artifacts. A side-by-side comparison of experimental samples and their IVT counterparts reveals which signals are modification-specific and which are assay noise. Although preparing IVT transcriptomes is time-consuming and requires additional input RNA, they remain one of the most comprehensive strategies for accurate signal interpretation.
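The side-by-side comparison can be sketched as a simple background subtraction. In this minimal example, each site has a signal value (for instance, an estimated modification fraction) in the native sample and in the matched IVT control, and a site is kept only if the native signal exceeds the IVT signal by a chosen margin. The function name, the `min_delta` threshold, and the example values are hypothetical; real pipelines typically also apply coverage filters and statistical tests.

```python
def filter_against_ivt(sample_signal, ivt_signal, min_delta=0.1):
    """Keep sites whose signal in the native sample exceeds the IVT background.

    sample_signal, ivt_signal -- dicts mapping a site key to a signal in [0, 1]
    min_delta                 -- hypothetical minimum excess over the IVT control
    Sites absent from the IVT control are treated as having zero background,
    which is an assumption; in practice they may simply lack coverage.
    """
    kept = {}
    for site, signal in sample_signal.items():
        background = ivt_signal.get(site, 0.0)
        excess = signal - background
        if excess >= min_delta:
            kept[site] = excess
    return kept


# Hypothetical values: the chr1:200 signal is reproduced by the unmodified
# control, so it is treated as a sequence-dependent artifact.
sample = {("chr1", 100): 0.60, ("chr1", 200): 0.30, ("chr2", 50): 0.25}
ivt = {("chr1", 200): 0.28, ("chr2", 50): 0.05}
print(filter_against_ivt(sample, ivt))
```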
Knockdown of RNA Modification Writer Enzymes
Using RNA from knockout (KO) or knockdown cell lines is a powerful strategy for generating negative controls for individual types of RNA modifications. To investigate a specific RNA modification, researchers can knock down the writer enzyme responsible for depositing that modification (4). For example, in mammalian cells, the m⁶A writer complex includes METTL3, METTL14, and WTAP, which are all essential for adenosine methylation; ADAR enzymes catalyze the deamination of adenosine to inosine; and a variety of PUS enzymes, including PUS1, PUS3, PUS7, PUS10, and TRUB1, install pseudouridine.
Key RNA Modification Enzymes
- m⁶A writer complex: comprising METTL3, METTL14, and WTAP, is essential for adenosine methylation.
- ADAR enzymes: catalyze the conversion of adenosine to inosine via deamination, leading to A-to-G substitutions in sequencing data.
- PUS enzymes: such as PUS1, PUS3, PUS7, PUS10, and TRUB1, install pseudouridine.
Knockdown can be achieved using RNA interference (RNAi) or CRISPR interference (CRISPRi), but the efficiency of knockdown must be validated, typically via Western blotting to confirm reduced protein levels. Samie Jaffrey’s lab studied the origin of residual METTL3 activity in nominal METTL3 knockout cell lines and discovered that alternative splicing events can produce functional METTL3 isoforms even after CRISPR/Cas9-mediated gene editing, leading to incomplete knockouts. This finding highlights the necessity of validating METTL3 KO models to ensure accurate interpretation of epitranscriptomic data (5). Martinez et al. knocked out individual PUS enzymes (e.g., PUS1, PUS7, and RPUSD4) in human cells and demonstrated how loss of each enzyme altered the pseudouridylation landscape. The study revealed that pseudouridylation can occur prior to splicing and modulate pre-mRNA processing, suggesting a role in alternative splicing (6).
Upregulation of METTL3 (8) and ADAR (7) is a hallmark of many cancers, and several therapeutics companies are developing small-molecule inhibitors and other strategies to reduce writer activity. For example, Gowen et al. describe a large number of Cas9 guide RNAs for therapeutic targeting of ADAR (9), which can also be used to construct ADAR KO control cell lines. There are countless other examples of writer knockdowns in the literature that can serve as inspiration.
In summary, the key advantage of writer knockdowns is that they selectively remove the signal associated with one specific modification, allowing precise interpretation of assay results. However, creating and validating these modified cell lines can be labor-intensive, and some RNA modification-deficient lines may suffer from impaired growth or viability, complicating long-term experiments. Additionally, this method is limited to cultured cells, restricting its applicability to systems amenable to genetic manipulation.
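One way to use a knockdown dataset is to ask which wild-type sites lose most of their signal when the writer is depleted. The sketch below flags sites whose signal drops by at least a chosen relative amount. The function name, the 50% cutoff, and the example values are illustrative assumptions; because knockdowns are often incomplete, as discussed above, a suitable cutoff depends on the validated knockdown efficiency.

```python
def writer_dependent_sites(wt_signal, kd_signal, min_relative_drop=0.5):
    """Return sites whose signal falls by at least min_relative_drop in the knockdown.

    wt_signal, kd_signal -- dicts mapping a site key to a signal in [0, 1]
    min_relative_drop    -- hypothetical cutoff, as a fraction of the wild-type signal
    Sites missing from the knockdown data are treated as zero signal (an
    assumption; missing sites may also reflect low coverage).
    """
    dependent = []
    for site, wt in wt_signal.items():
        if wt <= 0:
            continue  # nothing to lose at this site
        kd = kd_signal.get(site, 0.0)
        relative_drop = (wt - kd) / wt
        if relative_drop >= min_relative_drop:
            dependent.append(site)
    return dependent


# Hypothetical values: siteB barely changes, so it is not attributed to the writer.
wild_type = {"siteA": 0.80, "siteB": 0.40, "siteC": 0.50}
knockdown = {"siteA": 0.10, "siteB": 0.38, "siteC": 0.20}
print(writer_dependent_sites(wild_type, knockdown))
```

Sites that persist in the knockdown are candidates for false positives, for residual writer activity, or for deposition by a different enzyme, so they warrant follow-up rather than automatic rejection.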
Sequence Motifs: Mining the Code
Certain RNA modification writer enzymes recognize specific sequence motifs, and these motifs can be leveraged to evaluate the quality of epitranscriptomics assays. While most sequence motifs aren’t a hard measure of accuracy, the coincidence of RNA modification sites with their writer motifs is a strong indicator of accuracy. Note, though, that not every writer motif is expected to be modified, depending on cell state, and some motifs are loose and overly abundant. For example, the m⁶A writer complex preferentially modifies adenosines within the DRACH motif, where D = A/G/U, R = A/G, A = the modified base, C = C, and H = A/C/U (10, 11). The presence of DRACH motifs at m⁶A sites can thus serve as a strong indicator of true modification events. Similarly, pseudouridine synthases (PUS enzymes) such as PUS1, PUS7, PUS10, and TRUB1 recognize distinct sequence and structural features (12); for instance, PUS7 often targets the UGΨAR motif (where Ψ indicates pseudouridine), and PUS1 can act more broadly but often modifies sites within stem-loop structures. A particularly clear case is inosine, produced by ADAR enzymes through A-to-I editing. Because inosine preferentially base pairs with cytosine (C), editing results in A-to-G substitutions in sequencing libraries and the associated data, providing a direct, unambiguous signal for accurate detection. A-to-I editing occurs preferentially in double-stranded regions of RNA to mark them as “self” and distinguish them from double-stranded RNA virus genomes (13).
RNA Modification Motifs
- m⁶A writer complex: favors the methylation of adenosines within a DRACH motif (D = A/G/U; R = A/G; A = target; C; H = A/C/U).
- 10+ PUS enzymes: install pseudouridine within motifs of the general form NNΨNN or in structural features like stem loops.
- ADAR enzymes: convert adenosine to inosine in double-stranded RNA regions and near Alu repeats.
Informatically quantifying the presence of known sequence motifs in epitranscriptomic data offers an additional layer of data validation.
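The DRACH definition given above maps directly onto a regular expression. The following minimal sketch checks whether a called m⁶A position sits at the central A of a DRACH 5-mer and reports the fraction of called sites that do. The function names and example sequences are made up for illustration, and the code assumes the sequence is given in transcript (sense) orientation with a 0-based site index.

```python
import re

# D = A/G/U, R = A/G, A = modified base, C, H = A/C/U
DRACH = re.compile(r"[AGU][AG]AC[ACU]")


def is_drach(sequence, site_index):
    """True if the base at site_index is the central A of a DRACH 5-mer."""
    start, end = site_index - 2, site_index + 3
    if start < 0 or end > len(sequence):
        return False  # not enough flanking sequence to evaluate the motif
    kmer = sequence[start:end].upper().replace("T", "U")  # accept DNA or RNA letters
    return DRACH.fullmatch(kmer) is not None


def drach_fraction(sites):
    """Fraction of (sequence, site_index) pairs that fall within a DRACH motif."""
    sites = list(sites)
    if not sites:
        return None
    return sum(is_drach(seq, idx) for seq, idx in sites) / len(sites)


# Hypothetical called sites: (flanking sequence, index of the called A)
called = [("CCGGACUCC", 4), ("CCCCACUCC", 4), ("TTAGACATT", 4)]
print(drach_fraction(called))
```

Because DRACH is a loose and abundant motif, this fraction is most informative when compared with the fraction obtained for randomly chosen adenosines from the same transcripts.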
Integrating Motif Analysis in Bioinformatics Pipelines: Adding Confidence Through Known Patterns
Incorporating motif analysis into bioinformatics pipelines can enhance confidence in detected modification sites. For example, AlidaBio’s EpiScout software analysis pipeline includes quality metrics based on DRACH motif enrichment under m⁶A peaks and A-to-G mutation frequency under inosine peaks. This type of analysis provides evidence that detected signals align with established biological mechanisms, and it is highly applicable to studies investigating modifications installed by well-characterized enzymes with defined sequence or structural preferences. That said, this strategy cannot identify novel or atypical modification sites, or modifications installed by enzymes without known motif preferences. As such, while motif-based filtering improves accuracy, it may limit discovery potential in exploratory studies.
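An A-to-G mutation frequency of the kind mentioned above can be computed from per-position base counts. The sketch below is a generic illustration, not a description of how EpiScout or any other tool implements the metric; the function name, the input format, and the `min_coverage` cutoff are assumptions.

```python
def a_to_g_frequency(base_counts, min_coverage=10):
    """Fraction of reads showing G at a position whose reference base is A.

    base_counts  -- dict of read counts per observed base, e.g. {"A": 70, "G": 30}
    min_coverage -- hypothetical minimum read depth required to report a value
    Returns None when coverage is too low to give a meaningful estimate.
    """
    coverage = sum(base_counts.values())
    if coverage < min_coverage:
        return None
    return base_counts.get("G", 0) / coverage


# Hypothetical counts at two reference-A positions under an inosine peak
print(a_to_g_frequency({"A": 70, "G": 30}))
print(a_to_g_frequency({"A": 3, "G": 1}))
```

In practice, positions overlapping known genomic A/G polymorphisms should be excluded before interpreting such frequencies as editing.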
Published Consensus Sites: Leveraging Public Data
When interpreting RNA modification data from experiments lacking appropriate spike-in or genetic controls, it’s still possible to validate at least a subset of detected sites by leveraging public resources and published data. Numerous peer-reviewed studies have produced transcriptome-wide maps of RNA modifications using a variety of biochemical and sequencing-based methods. However, it’s important to approach these resources critically; technical variability is often high, and concordance between different methods and research groups can be as low as 40%, even when analyzing the same cell types. To improve confidence, focus on consensus sites: RNA modification positions that have been identified repeatedly across multiple studies and methods. These sites are often conserved and robustly detected, even across different biological contexts. Still, keep in mind that RNA modification landscapes are dynamic and influenced by cell type and cellular state, so comparisons should be limited to datasets derived from comparable input material (e.g., same tissue or cultured cell line).
A practical strategy is to extract high-confidence sites from curated databases such as:
- RMBase: Integrates multiple modification types across organisms with genomic coordinates and motif annotations.
- m6A-Atlas: Offers experimentally validated and predicted m⁶A sites along with tissue- and disease-specific context.
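Once site lists from several studies or databases are in hand, consensus sites can be derived by counting how many independent sources report each position. The sketch below assumes that each source has already been parsed into a collection of (chromosome, position, strand) keys on the same genome build; the function names, the `min_support` cutoff, and the example coordinates are hypothetical, and no particular database file format is implied.

```python
from collections import Counter


def consensus_sites(studies, min_support=2):
    """Return sites reported by at least min_support independent studies.

    studies -- dict mapping a study name to an iterable of site keys,
               e.g. (chromosome, position, strand)
    """
    counts = Counter()
    for sites in studies.values():
        counts.update(set(sites))  # count each site once per study
    return {site for site, n in counts.items() if n >= min_support}


def overlap_fraction(my_sites, consensus):
    """Fraction of consensus sites that were also detected in my_sites."""
    if not consensus:
        return None
    return len(set(my_sites) & consensus) / len(consensus)


# Hypothetical site lists from three published maps of the same cell line
published = {
    "study_A": [("chr1", 100, "+"), ("chr1", 250, "+"), ("chr2", 40, "-")],
    "study_B": [("chr1", 100, "+"), ("chr2", 40, "-"), ("chr3", 7, "+")],
    "study_C": [("chr1", 100, "+"), ("chr1", 250, "+")],
}
consensus = consensus_sites(published, min_support=2)
mine = [("chr1", 100, "+"), ("chr2", 40, "-"), ("chr5", 9, "+")]
print(sorted(consensus))
print(overlap_fraction(mine, consensus))
```

Recovering most consensus sites supports the sensitivity of an assay, but sites outside the consensus are not necessarily false, since modification landscapes vary with cell type and state.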
Closing Thoughts
Ensuring accuracy in epitranscriptomic data isn’t just a quality check; it’s essential for drawing meaningful biological conclusions. As of today, universal RNA modification controls are not commercially available, and it will take an effort akin to the “Genome in a Bottle” initiative for human genomes to develop RNA modification standards. Over time, consistently integrating such universal standards into experiments, standardizing analysis workflows and pipelines, and cross-validating datasets will significantly advance the field of epitranscriptomics, further our understanding of biology, and drive impactful applications in diagnostics and RNA therapies.
About the Author
Gudrun Stengel
Gudrun Stengel is the CEO and scientific co-founder of AlidaBio. Gudrun specializes in developing genomics platforms, drawing on her experience in NGS development and background in biophysics and biochemistry. She has made significant contributions to the field, including sequencing chemistries for the HiSeqX and NovaSeq platforms at Illumina, and the AVITI platform at Element. Gudrun was a postdoctoral researcher at the Scripps Research Institute and the University of Colorado Boulder, studying the molecular mechanisms of DNA replication and transcription. Altogether, Gudrun has published more than 20 peer-reviewed articles. She received a Master’s in Biochemistry and a Ph.D. in Biophysical Chemistry from the Max Planck Institute in Germany.
References
1. McCormick, C. A.; Akeson, S.; Tavakoli, S.; Bloch, D.; Klink, I. N.; Jain, M.; Rouhanifard, S. H. Multicellular, IVT‑derived, unmodified human transcriptome for nanopore‑direct RNA analysis. GigaByte 2024, 2024, gigabyte129. https://doi.org/10.46471/gigabyte.129.
2. Tan, L.; Guo, Z.; Shao, Y.; Ye, L.; Wang, M.; Deng, X.; Chen, S.; Li, R. Analysis of bacterial transcriptome and epitranscriptome using nanopore direct RNA sequencing. Nucleic Acids Res. 2024, 52(15), 8746–8762. https://doi.org/10.1093/nar/gkae601.
3. Zhang, Z.; Chen, T.; Chen, H. X.; Xie, Y. Y.; Chen, L.‑Q.; Zhao, Y.‑L.; Liu, B.‑D.; Jin, L.; Zhang, W.; Liu, C.; Ma, D.‑Z.; Chai, G.‑S.; Zhang, Y.; Zhao, W.‑S.; Ng, W.‑H.; Chen, J.; Jia, G.; Yang, J.; Luo, G.‑Z. Systematic calibration of epitranscriptomic maps using a synthetic modification‑free RNA library. Nat. Methods 2021, 18(10), 1213–1222. https://doi.org/10.1038/s41592-021-01280-7.
4. Esteve‑Puig, R.; Bueno‑Costa, A.; Esteller, M. Writers, readers and erasers of RNA modifications in cancer. Cancer Lett. 2020, 474, 127–137. https://doi.org/10.1016/j.canlet.2020.01.021.
5. Poh, H. X.; Mirza, A. H.; Pickering, B. F.; Jaffrey, S. R. Alternative splicing of METTL3 explains apparently METTL3‑independent m6A modifications in mRNA. PLoS Biol. 2022, 20(7), e3001683. https://doi.org/10.1371/journal.pbio.3001683.
6. Martinez, N. M.; Su, A.; Burns, M. C.; Nussbacher, J. K.; Schaening, C.; Sathe, S.; Yeo, G. W.; Gilbert, W. V. Pseudouridine synthases modify human pre‑mRNA co‑transcriptionally and affect pre‑mRNA processing. Mol. Cell 2022, 82(3), 645–659.e9. https://doi.org/10.1016/j.molcel.2021.12.023.
7. Paz‑Yaacov, N.; Bazak, L.; Buchumenski, I.; Porath, H. T.; Danan‑Gotthold, M.; Knisbacher, B. A.; Eisenberg, E.; Levanon, E. Y. Elevated RNA editing activity is a major contributor to transcriptomic diversity in tumors. Cell Rep. 2015, 13(2), 267–276. https://doi.org/10.1016/j.celrep.2015.08.080.
8. Yankova, E.; Blackaby, W.; Albertella, M.; Rak, J.; De Braekeleer, E.; Tsagkogeorga, G.; Pilka, E. S.; Aspris, D.; Leggate, D.; Hendrick, A. G.; Webster, N. A.; Andrews, B.; Fosbeary, R.; Guest, P.; Irigoyen, N.; Eleftheriou, M.; Gozdecka, M.; Dias, J. M. L.; Bannister, A. J.; Vick, B.; Jeremias, I.; Vassiliou, G. S.; Rausch, O.; Tzelepis, K.; Kouzarides, T. Small‑molecule inhibition of METTL3 as a strategy against myeloid leukaemia. Nature 2021, 593(7860), 597–601. https://doi.org/10.1038/s41586-021-03536-w.
9. Gowen, B. G.; Melton, K.; Leong, W. I.; Khekare, P.; McCawley, S.; Chan, J.; Boivin, P.; Jani, V.; Cantor, A. J.; Tambe, A.; Haak‑Frendscho, M.; Janatpour, M. J.; Wei, S. C. Systematic identification and characterization of high‑efficiency Cas9 guide RNAs for therapeutic targeting of ADAR. PLoS One 2025, 20(2), e0317745. https://doi.org/10.1371/journal.pone.0317745.
10. Martinez, N. M.; Su, A.; Burns, M. C.; Nussbacher, J. K.; Schaening, C.; Sathe, S.; Yeo, G. W.; Gilbert, W. V. Pseudouridine synthases modify human pre‑mRNA co‑transcriptionally and affect pre‑mRNA processing. Mol. Cell 2022, 82(3), 645–659.e9. https://doi.org/10.1016/j.molcel.2021.12.023.
11. Paz‑Yaacov, N.; Bazak, L.; Buchumenski, I.; Porath, H. T.; Danan‑Gotthold, M.; Knisbacher, B. A.; Eisenberg, E.; Levanon, E. Y. Elevated RNA editing activity is a major contributor to transcriptomic diversity in tumors. Cell Rep. 2015, 13(2), 267–276. https://doi.org/10.1016/j.celrep.2015.08.080.
12. Rintala‑Dempsey, A. C.; Kothe, U. Eukaryotic stand‑alone pseudouridine synthases—RNA modifying enzymes and emerging regulators of gene expression? RNA Biol. 2017, 14(9), 1185–1196. https://doi.org/10.1080/15476286.2016.1276150.
13. Eisenberg, E.; Levanon, E. Y. A‑to‑I RNA editing—immune protector and transcriptome diversifier. Nat. Rev. Genet. 2018, 19(8), 473–490. https://doi.org/10.1038/s41576-018-0006-1.