The open-source software DeepVariant is now the most accurate way to scan a genome for mutations.
DeepVariant, created by Google researchers Mark DePristo and Ryan Poplin, is built on a neural network originally designed to recognise everyday images such as cats and dogs. A year after the project started, the software is better at recognising gene mutations than any other program out there.
For years, researchers have written programs to identify mutations in genes. The most accomplished of these is GATK (the Genome Analysis Toolkit), which took a team of ten scientists five years to build. “It wasn’t even clear it was possible to do better,” DePristo told The Atlantic. “We built tons of different models. Nothing really moved the needle at all.” But they hadn’t tried artificial intelligence. Only a year after its inception, DeepVariant now outperforms GATK. Google has released DeepVariant as open-source software, so scientists around the world can use and modify it.
As The Atlantic explains, both DeepVariant and GATK tackle a technical problem in gene sequencing that stems from how sequencers read DNA: in short fragments, or reads, each about 100 letters long. These reads are compared to a reference genome, and any differences suggest a possible mutation. But the reads also overlap with one another, so in theory their overlapping sections should match each other as well as the reference genome. When overlapping reads disagree with each other and also differ from the reference, the likely explanation is an error in reading the DNA. If the reads agree with each other but not with the reference, you have probably spotted a mutation.
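The decision rule described above can be illustrated with a toy function for a single position in the genome. This is a simplified sketch, not how GATK or DeepVariant actually decide: the function name `classify_site` and the `min_agreement` threshold are hypothetical, and the sketch ignores base quality scores and the fact that a person carries two copies of each chromosome (so a real mutation may show up in only about half the reads).

```python
from collections import Counter


def classify_site(read_bases, ref_base, min_agreement=0.9):
    """Toy call for one position covered by several overlapping reads.

    read_bases: the base ('A', 'C', 'G' or 'T') each overlapping read shows here.
    ref_base: the reference genome's base at the same position.
    min_agreement: hypothetical fraction of reads that must show the same
        base for the reads to count as "agreeing with each other".
    """
    if not read_bases:
        return "no coverage"
    # Find the base most reads show, and how many reads show it.
    top_base, top_count = Counter(read_bases).most_common(1)[0]
    reads_agree = top_count / len(read_bases) >= min_agreement
    if reads_agree and top_base == ref_base:
        return "no mutation"
    if reads_agree:
        # Reads agree with each other but not with the reference.
        return "likely mutation"
    # Reads disagree with each other: most likely a sequencing error.
    return "likely sequencing error"
```

For example, ten reads all showing `G` where the reference has `A` would be flagged as a likely mutation, while four reads showing four different bases would be put down to sequencing error.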
Unlike GATK, which relies on models its developers designed by hand to work out where the sequencing may have gone wrong, DeepVariant takes a completely different approach to these glitches: it turns the data into an image. Because the underlying neural network was originally designed for image recognition, this technique works very well.
Per The Atlantic:
“The letters—A, T, C, or G—got assigned a red value; the quality of the sequencing at that location a green value; and which strand of DNA’s two strands it is on a blue value. Together, they formed an RGB (red, green, blue) image.”
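The encoding in the quote can be sketched as code that turns a stack of aligned reads into a grid of RGB pixels, one row per read and one pixel per position. This is only an illustration of the idea as The Atlantic describes it: the red value assigned to each base, the `max_quality` cap used to scale quality into a green value, and the function name `encode_pileup` are all hypothetical, and DeepVariant's actual image format is not described here and likely differs in detail.

```python
# Hypothetical red value for each base; the quoted description does not say
# which values DeepVariant actually uses.
BASE_TO_RED = {"A": 250, "C": 30, "G": 180, "T": 100}


def encode_pileup(reads, max_quality=60):
    """Turn aligned reads into a grid of (red, green, blue) pixels.

    reads: a list of reads; each read is a list of (base, quality, on_reverse_strand)
        tuples, one per position.
    max_quality: hypothetical cap used to scale quality scores into 0-255.
    Returns one row of pixels per read, one pixel per position.
    """
    image = []
    for read in reads:
        row = []
        for base, quality, on_reverse_strand in read:
            red = BASE_TO_RED[base]  # which letter was read
            # Sequencing quality, capped and scaled to 0-255.
            green = round(255 * min(quality, max_quality) / max_quality)
            blue = 255 if on_reverse_strand else 0  # which of the two strands
            row.append((red, green, blue))
        image.append(row)
    return image
```

The resulting grid is the kind of picture an image-recognition network could then be trained to classify, for instance as "mutation" or "no mutation" at the centre position.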
Though DeepVariant is more accurate than GATK at telling real mutations apart from sequencing errors, it has its limitations: it runs at about half the speed of GATK. Still, it could herald new applications for neural networks.
“The test will really be how it can translate to other technologies,” Manuel Rivas, a geneticist at Stanford, told The Atlantic. Programs like DeepVariant could eventually apply their data-analysis abilities to predicting the effects of a mutation, such as which genes it might activate. The potential of the technology is vast, though such tools still have a long way to go before they can match the complexity of genes themselves.