“This DNA is not real”: Why scientists are deepfaking the human genome

Researchers taught an AI to make artificial genomes, possibly opening new doors for genetic research.

February 15, 2021

Researchers have taught an AI to make artificial genomes — possibly overcoming the problem of how to protect people’s genetic information while also amassing enough DNA for research.

Generative adversarial networks (GANs) pit two neural networks against each other to produce new, synthetic data that is so good it can pass for real data. Examples have been popping up all over the web, from generated pictures to videos (a la “this city does not exist”). AIs can even generate convincing news articles, food blogs, or human faces.
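To make the "two networks against each other" idea concrete, here is a minimal toy sketch in Python. It is not the model from the study: the "generator" and "discriminator" are each a single linear unit rather than deep networks, the "real" data is just numbers drawn around 4.0, and every name and value (train_toy_gan, the learning rate, the step count) is a hypothetical chosen for illustration.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, lr=0.01, seed=0):
    """Train a one-dimensional toy GAN with hand-written gradients."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator: fake = a * z + b
    w, c = 0.0, 0.0  # discriminator: D(x) = sigmoid(w * x + c)

    for _ in range(steps):
        real = rng.gauss(4.0, 0.5)  # hypothetical "real" data (assumption)
        z = rng.gauss(0.0, 1.0)     # random noise fed to the generator
        fake = a * z + b

        # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
        s_real = sigmoid(w * real + c)
        s_fake = sigmoid(w * fake + c)
        w -= lr * ((s_real - 1.0) * real + s_fake * fake)
        c -= lr * ((s_real - 1.0) + s_fake)

        # Generator step: adjust a and b so the updated discriminator
        # scores the fake sample closer to 1 ("real").
        s_fake = sigmoid(w * fake + c)
        grad_fake = (s_fake - 1.0) * w
        a -= lr * grad_fake * z
        b -= lr * grad_fake

    return (a, b), (w, c)


if __name__ == "__main__":
    generator, discriminator = train_toy_gan()
    print("generator (a, b):", generator)
    print("discriminator (w, c):", discriminator)
```

In this sketch the generator's offset b tends to drift toward the center of the "real" data as the discriminator learns to tell the two apart, although toy GANs like this one can oscillate rather than settle. A genome-scale GAN works on the same tug-of-war principle with far larger networks and real genetic data.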

Now, researchers from Estonia are taking deepfakes a step further, to human DNA. They created an algorithm that repeatedly generates the genetic code of people who don’t exist.

Deepfaking Human DNA

It may seem simple: randomly mix A, T, C, and G, the letters that make up the genetic code, and voila, a human genetic sequence. But not just any random pattern of the letters will work. The AI needs to understand humans at the molecular level. This AI has figured it out.
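The naive "randomly mix the letters" approach can be sketched in a few lines of Python. This is only an illustration of the idea the paragraph above rejects, not anything from the study; the function names random_sequence and base_frequencies are hypothetical.

```python
import random
from collections import Counter

BASES = "ATCG"


def random_sequence(length, seed=None):
    """Return a string of letters picked uniformly and independently."""
    rng = random.Random(seed)
    return "".join(rng.choice(BASES) for _ in range(length))


def base_frequencies(seq):
    """Return the fraction of the sequence made up by each letter."""
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in BASES}


if __name__ == "__main__":
    seq = random_sequence(60, seed=42)
    print(seq)
    print(base_frequencies(seq))
```

Every letter here is chosen without regard to its neighbors, so the output has the right alphabet but no biological structure. Closing that gap is the job the researchers gave their AI.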

Like other GAN-generated deepfakes, each artificial genome is a convincing imitation of a viable person: a human, the researchers believe, who really could exist but doesn’t.

Most importantly, these artificial genomes could play a significant role in genetic research.

“A known limitation in the field (of genetic studies) is the reduced access to many genetic databases due to concerns about violations of individual privacy,” the team writes in their study, published in PLOS Genetics.

The team reports that these “artificial genomes” mimic real genomes so closely that the two are indistinguishable. But since they aren’t real, researchers can mine the data without privacy concerns. They can experiment with genomes without actual people giving up their private information.

Protecting the privacy of the people behind genetic information is challenging, and it often limits both how researchers can use that DNA and how willing they are to share datasets. With artificial genomes, many of these privacy concerns fall away.

Faking Something You Don’t Fully Understand

The process of using GANs to generate synthetic genomes isn’t akin to making a deepfake of a person’s face. A face is something we are all familiar with, and there are countless examples with which to train the AI.

But there is so much about DNA and the genome that remains a mystery.

“My initial take is that it is interesting, but I’m not sure I see real practical implications for research right now,” Deanna Church, vice president of the Mammalian Business Area and Software Strategy at the biotech company Inscripta, told Futurism.

“Just because you can’t computationally distinguish these generated genomes from real genomes doesn’t mean they’ve really preserved functional motifs and domains that are important — there is much of this we still don’t understand.”

Even if the artificial genomes resolve the privacy hurdle in genetic research, they raise some possible new concerns.

“In the near term, it’s going to get easier for bad actors to create fake personas that can stand up to even the most rigorous inspection. Not that we envision a scenario where a scam artist needs to provide a fake transcript of their genome, but the unknown unknowns are where security holes tend to grow the fastest,” writes Tristan Greene in The Next Web.
