Science

New public database of AI-predicted protein structures could transform biology

Computers can now rapidly and reliably predict the 3D shape of most proteins, such as this structure from a fruit fly. (Image credit: DeepMind)

Last week, two groups unveiled the culmination of years of work by computer scientists, biologists, and physicists: advanced modeling programs that can predict the precise 3D atomic structures of proteins and some molecular complexes. And now, the biggest payoff of that work has arrived. One of those teams reports today it has used its newly minted artificial intelligence (AI) programs to solve the structures of 350,000 proteins from humans and 20 model organisms, such as Escherichia coli bacteria, yeast, and fruit flies, all mainstays of biological research. In the coming months, the group says it plans to expand its list of modeled proteins to cover all cataloged proteins, some 100 million molecules.

“It’s pretty overwhelming,” says John Moult, a protein folding expert at the University of Maryland, Shady Grove, who runs a biennial competition called the Critical Assessment of protein Structure Prediction (CASP). Moult says structural biologists have dreamed for decades that accurate computer models would one day augment extremely precise protein shapes derived from experimental methods such as x-ray crystallography. “I never thought the dream would come true,” Moult says.

The computer model, called AlphaFold, is the work of researchers at DeepMind, a U.K. AI company owned by Alphabet, the parent company of Google. In the fall of 2020, AlphaFold swept the CASP competition, tallying a median accuracy score of 92.4 out of 100, well ahead of the next closest competitor. But because DeepMind researchers didn’t reveal the details of how they predicted protein shapes, in particular AlphaFold’s underlying computer code, other teams were left frustrated, unable to build on the progress. That began to change last week. On 15 July, researchers led by Minkyung Baek and David Baker at the University of Washington, Seattle, reported online in Science that they had created a highly accurate protein structure prediction program called RoseTTAFold, which they released publicly. The same day, Nature rushed out details of AlphaFold in a paper by DeepMind researchers led by Demis Hassabis and John Jumper.

Both programs use AI to spot folding patterns in vast databases of solved protein structures. The programs compute the most likely structure of unknown proteins by also considering basic physical and biological rules governing how neighboring amino acids in a protein interact. In their paper, Baek and Baker used RoseTTAFold to create a structure database of hundreds of G-protein coupled receptors, a class of common drug targets.

Now, DeepMind researchers report in Nature the creation of 350,000 predicted structures, more than twice as many as previously solved by experimental methods. The researchers say AlphaFold produced structures for nearly 44% of all human proteins, covering nearly 60% of all the amino acids encoded by the human genome. AlphaFold determined that many of the other human proteins were “disordered,” meaning they don’t adopt a single, stable structure. Such disordered proteins may ultimately adopt a structure when they bind to a protein partner, Baker says. They may also naturally adopt multiple conformations, says David Agard, a structural biologist at the University of California, San Francisco.

A database of DeepMind’s new protein predictions, assembled with collaborators at the European Molecular Biology Laboratory (EMBL), is freely accessible online. “It’s fantastic they have made this available,” Baker says. “It will really increase the pace of research.”

Because the 3D structure of a protein largely dictates its function, the DeepMind library is apt to help biologists sort out how thousands of unknown proteins do their jobs. “We at EMBL believe this will be transformative to understanding how life works,” says the lab’s director general, Edith Heard.

DeepMind collaborators say AlphaFold2 has already spurred the development of novel enzymes that break down plastics in the environment more quickly than those found previously and led to novel possibilities for drugs to treat neglected diseases. “This will be one of the most important data sets since the mapping of the human genome,” says Ewan Birney, director of EMBL’s European Bioinformatics Institute.

The impacts aren’t likely to stop there. The predictions will help experimentalists who solve structures, Baek says. Data from x-ray crystallography and cryo–electron microscopy experiments can be difficult to interpret, Baek and others say, and having a model can help. “In the short term, it will boost structure determination efforts,” she predicts. “And over time it will also slowly replace [experimental] structural determination efforts.”

If that happens, structural biologists won’t find themselves out of work. Baker notes that both experimental and computational scientists are already beginning to turn their efforts to the more complex challenge of understanding exactly which proteins interact with one another and what molecular changes happen during these interactions. “It’s going to reset the field,” Baker says. “It’s a very exciting time.” 
