Last week, the Mangul Lab at USC shared a new preprint presenting our group’s review of how algorithms affect the speed and efficiency of read aligners.
Modern sequencing platforms generate enormous amounts of genomic data in the form of nucleotide sequences, or reads. Aligning reads to a reference genome enables the identification of individual-specific genetic variants and is an essential step in most genomic analysis pipelines, which ultimately help answer important biological questions, such as which mutations drive various human diseases. Read alignment is extremely challenging due to the large size of the datasets analyzed and the numerous technological limitations of modern sequencing platforms.
Today’s diverse array of techniques for aligning short and long reads address these challenges with a variety of algorithmic foundations and methodologies. To meet the growing need for evaluation of read alignment speed and efficiency, Serghei and Jeremy Rotman (software engineer, Mangul Lab) collaborated with Mohammed Alser (senior researcher, ETH Zürich), Kodi Taraszka (PhD student, UCLA), Huwenbo Shi (postdoctoral scholar, Harvard T.H. Chan School of Public Health), Pelin Icer Baykal (PhD student, Georgia State), Harry Taegyun Yang (PhD student, UCLA), Victor Xue (undergraduate student, UCLA), Sergey Knyazev (PhD student, Georgia State), Benjamin D. Singer (faculty, Northwestern University), Brunilda Balliu (research fellow, UCLA), David Koslicki (faculty, Penn State), Pavel Skums (faculty, Georgia State), Alex Zelikovsky (faculty, Georgia State), Can Alkan (faculty, Bilkent University), and Onur Mutlu (faculty, ETH Zürich) to systematically evaluate 104 read aligners.
We demonstrate the effect each underlying algorithm can have on the speed and efficiency of read aligners. Longer reads bring both unique advantages and limitations for read alignment techniques. Our review quantifies the impact that read length may have on an alignment study and offers recommendations for matching read alignment tools to read lengths and analytical aims. We also discuss specific ways that general alignment algorithms have been tailored to the needs of various domains in biology, including whole transcriptome, adaptive immune repertoire, and human microbiome studies.
We are witnessing an exciting time in bioinformatics, in which rapid advances in sequencing technologies are shaping the landscape of modern read alignment algorithms. With modern read aligners, a researcher can often strike a good balance between speed and memory usage while still preserving both small and large genetic variations. Although there is still room for improvement in today’s read alignment technology, we believe the future is bright for read alignment algorithms.