RAMBO speeds searches on huge DNA databases
HOUSTON – (June 28, 2021) – Rice University computer scientists are sending RAMBO to rescue genomic researchers who sometimes wait days or weeks for search results from enormous DNA databases.

DNA sequencing is so popular that genomic datasets are doubling in size every two years, and the tools to search the data have not kept pace. Researchers who compare DNA across genomes or study the evolution of organisms such as the virus that causes COVID-19 often wait weeks for software to index large “metagenomic” databases, which get bigger every month and are now measured in petabytes.

RAMBO, short for “repeated and merged Bloom filter,” is a new method that can cut indexing times for such databases from weeks to hours and search times from hours to seconds. Rice University computer scientists presented RAMBO last week at the Association for Computing Machinery data science conference SIGMOD 2021.

“Querying billions of DNA sequences against a large database with traditional approaches can take several hours on a large compute cluster and can take several weeks on a single server,” said RAMBO co-creator Todd Treangen, a Rice computer scientist whose lab specializes in metagenomics. “Reducing database indexing times, in addition to query times, is crucially important as the size of genomic databases continues to grow at an incredible pace.”

To solve the problem, Treangen teamed with Rice computer scientist Anshumali Shrivastava, who specializes in creating algorithms that make big data and machine learning faster and more scalable, and graduate students Gaurav Gupta and Minghao Yan, co-authors of the peer-reviewed conference paper on RAMBO.

RAMBO uses a data structure that has a significantly faster query time than state-of-the-art genome indexing methods, as well as other advantages such as ease of parallelization, a zero false-negative rate and a low false-positive rate.
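To illustrate what a zero false-negative rate and a low false-positive rate mean in practice, here is a minimal sketch of a plain Bloom filter, the building block RAMBO is named after. It is not RAMBO and not the authors' code: the class name `BloomFilter`, the helper `kmers`, the SHA-256-based hashing and all parameter values are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array plus several hash functions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item: str):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


def kmers(sequence: str, k: int):
    """Yield every overlapping substring of length k."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


genome = "ACGTACGTTAGCCGATAGGCTA"  # made-up toy sequence
bf = BloomFilter(num_bits=1024, num_hashes=3)
for kmer in kmers(genome, k=5):
    bf.add(kmer)

# Every inserted k-mer is reported present: bits are only ever set, never cleared.
assert all(kmer in bf for kmer in kmers(genome, k=5))
```

Because bits are never cleared, an inserted item can never be reported absent. A false positive occurs only when unrelated items happen to set all of a query's bit positions, which becomes less likely as the bit array grows relative to the number of items stored.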

“The search time of RAMBO is 35 times faster than existing methods,” said Gupta, a doctoral student in electrical and computer engineering. In experiments using a 170-terabyte dataset of microbial genomes, Gupta said RAMBO reduced indexing times from “six weeks on a sophisticated, dedicated cluster to nine hours on a shared commodity cluster.”

Yan, a master’s student in computer science, said, “On this huge archive, RAMBO can search for a gene sequence in a couple of milliseconds, even sub-milliseconds, using a standard cluster of 100 machines.”

RAMBO improves on the performance of Bloom filters, a half-century-old search technique that has been applied to genomic sequence search in a number of previous studies. RAMBO improves on earlier Bloom filter methods for genomic search by employing a probabilistic data structure known as a count-min sketch that “leads to a better query time and memory trade-off” than earlier methods, and “beats the current baselines by achieving a very robust, low-memory and ultrafast indexing data structure,” the authors wrote in the study.
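One way to picture an index that repeats and merges Bloom filters is a small grid, loosely resembling the rows and buckets of a count-min sketch: each row hashes every dataset into one of a few buckets, datasets sharing a bucket share one Bloom filter, and a query intersects the candidate sets from all rows. The sketch below is a toy under those assumptions, not the authors' implementation; `MergedBloomIndex`, `_hash`, the dataset names and every parameter are hypothetical.

```python
import hashlib


def _hash(value: str, salt: str, modulus: int) -> int:
    """Deterministic salted hash into the range [0, modulus)."""
    digest = hashlib.sha256(f"{salt}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus


class BloomFilter:
    def __init__(self, num_bits: int = 2048, num_hashes: int = 3) -> None:
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits)

    def add(self, item: str) -> None:
        for seed in range(self.num_hashes):
            self.bits[_hash(item, f"bf{seed}", self.num_bits)] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[_hash(item, f"bf{seed}", self.num_bits)]
                   for seed in range(self.num_hashes))


class MergedBloomIndex:
    """Toy grid: `repetitions` rows, each holding `buckets` merged Bloom filters."""

    def __init__(self, buckets: int, repetitions: int) -> None:
        self.buckets, self.repetitions = buckets, repetitions
        self.filters = [[BloomFilter() for _ in range(buckets)]
                        for _ in range(repetitions)]
        self.members = [[set() for _ in range(buckets)]
                        for _ in range(repetitions)]

    def add_dataset(self, name: str, kmers) -> None:
        kmers = list(kmers)
        for r in range(self.repetitions):
            # Each row uses a different salt, so datasets that collide in
            # one row are unlikely to collide in every row.
            b = _hash(name, f"rep{r}", self.buckets)
            self.members[r][b].add(name)
            for kmer in kmers:
                self.filters[r][b].add(kmer)

    def query(self, kmer: str) -> set:
        candidates = None
        for r in range(self.repetitions):
            hits = set()
            for b in range(self.buckets):
                if kmer in self.filters[r][b]:
                    hits |= self.members[r][b]
            # A dataset containing the k-mer appears in every row's hits.
            candidates = hits if candidates is None else candidates & hits
        return candidates or set()


index = MergedBloomIndex(buckets=4, repetitions=3)
index.add_dataset("genome_a", ["ACGTA", "CGTAC"])
index.add_dataset("genome_b", ["TTTTT", "GGGCC"])
assert "genome_a" in index.query("ACGTA")
```

In this toy, a query touches only buckets × repetitions filters rather than one filter per dataset, which is where a query-time saving could come from. The price is that datasets sharing a bucket in every row cannot be told apart, so the result is a candidate set that may contain extra datasets but never omits a true match.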

Gupta and Yan said RAMBO has the potential to democratize genomic search by making it possible for almost any lab to quickly and inexpensively search huge genomic archives with off-the-shelf computers.

“RAMBO could decrease the wait time for tons of investigations in bioinformatics, such as searching for the presence of SARS-CoV-2 in wastewater metagenomes across the globe,” Yan said. “RAMBO could become instrumental in the study of cancer genomics and bacterial genome evolution, for example.”

Shrivastava is a computer science professor and Treangen is an assistant professor of computer science.

Co-authors of the study include Benjamin Coleman, Bryce Kille, Leo Elworth and Tharun Medini.

The research was funded by the National Science Foundation, the Air Force Office of Scientific Research and the Office of Naval Research.

