Written by Stephen Beach, via SWNS
Artificial intelligence is being used to identify potentially deadly coronavirus variants much faster than traditional methods.
Mathematicians at the Universities of Manchester and Oxford have developed an AI framework that can identify and track new variants of the virus that caused the global pandemic.
And they say the method could be useful for other infectious diseases in the future.
The framework combines dimensionality reduction techniques with a new explainable clustering algorithm called CLASSIX, developed by mathematicians at the University of Manchester.
This allows groups of viral genomes that may pose future risks to be identified quickly from vast amounts of data.
The scientists say their findings, published in the journal PNAS, could support traditional methods of tracking virus evolution.
Dr. Roberto Cahuantzi, lead author of the study and a researcher at the University of Manchester, said: “Since the emergence of COVID-19, we have seen multiple waves of new variants, increased transmissibility, evasion of the immune response, and increased severity of disease.
“Scientists are currently ramping up efforts to identify worrying new variants, such as alpha, delta and omicron, in their early stages of emergence.
“If we can find ways to do this quickly and efficiently, we can be more proactive, for example by developing tailored vaccines, and may even be able to eliminate variants before they take hold.”
He explained that, like many other RNA viruses, COVID-19 evolves very rapidly due to its high mutation rate and short generation time.
This means that identifying new strains that may become a problem in the future takes considerable effort.
Currently, approximately 16 million sequences are available in the GISAID database (the Global Initiative on Sharing All Influenza Data), which provides access to genomic data for influenza viruses.
Mapping the evolution and history of all COVID-19 genomes from this data currently takes vast amounts of computer and human time.
Dr. Cahuantzi says the new method allows such tasks to be automated.
The researchers processed 5.7 million high-coverage sequences in just one to two days using standard modern laptops.
Dr. Cahuantzi said this would not be possible with existing methods, and that because fewer resources are needed, the approach puts identification of pathogen strains of concern in the hands of more researchers.
Professor Thomas House, from the University of Manchester, said: “The unprecedented amount of genetic data generated during the pandemic demands improved ways to analyze it thoroughly.
“The data continues to grow rapidly, but if we can't demonstrate the benefits of curating it, it is at risk of being removed or deleted.
“We know that human experts have limited time, so our approach is not to completely replace human work, but rather to work alongside humans so the job can be done much faster, freeing up our experts for other important developments.”
The proposed method works by breaking down the coronavirus's genetic sequences into small “words” (called 3-mers), which are counted so that each sequence is represented as numbers. It then uses machine learning techniques to group similar sequences based on their word patterns.
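To make the “word counting” step concrete, here is a minimal Python sketch of how one sequence could be turned into numbers. It is an illustration only, not the researchers' code: the function name `kmer_frequencies`, the choice to convert counts into relative frequencies, and the decision to skip words containing ambiguous bases such as N are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # the four nucleotide letters

def kmer_frequencies(sequence, k=3):
    """Return the relative frequency of every possible k-mer in `sequence`.

    Hypothetical helper for illustration; the output is a list of
    4**k floats in a fixed (alphabetical) k-mer order.
    """
    # Every possible word of length k, in a fixed order, so that all
    # sequences map onto vectors with the same layout (64 entries for k=3).
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    seq = sequence.upper()
    # Slide a window of length k along the sequence and count each word.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Assumption: words containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]

# Toy input, not real viral data.
vec = kmer_frequencies("ATGATGN")
print(len(vec))  # number of entries in the vector
```

Vectors like this, one per genome, are the kind of numerical input that dimensionality reduction and a clustering algorithm can then group by similarity.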
Stefan Güttel from the University of Manchester said: “The clustering algorithm we developed, CLASSIX, is much less computationally intensive than traditional methods and is fully explainable, meaning that it provides textual and visual explanations of the computed clusters.”
Dr. Cahuantzi added: “Our analysis serves as a proof of concept and demonstrates the potential of machine learning methods to be used as a warning tool for early detection of emerging major variants without relying on the need to build phylogenies.
“While phylogenetics remains the 'gold standard' for understanding viral ancestry, these machine learning methods can process orders of magnitude more sequences than current phylogenetic methods, and at lower computational cost.”