Google’s DeepMind division has developed some interesting and impressive AI models, including one that can play StarCraft II better than you (and almost anyone else). But DeepMind isn’t just interested in AI for playing games. Last year, the company unveiled AlphaFold, a machine learning model that can predict the shape of proteins. Now, DeepMind has announced that it has predicted structures for all 200+ million proteins in the centralized UniProt database. This is a big deal for basic biological research as well as for efforts to tackle some of the most pressing scientific problems of our time.
Proteins are the basis of all biological life on Earth, but even if you know the amino acid sequence of a protein, that doesn’t mean you know what it does or how it works. A protein’s sequence gives it a pattern of positive and negative charges, hydrophilic and hydrophobic regions, and cross-linked segments. This pattern determines the protein’s active shape, or “conformation” as it’s known in the lab, and the structure of a protein is what gives it its function. Even a few errors in that structure can be the difference between an enzyme that correctly catalyzes a reaction and one that does nothing at all.
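To make the idea of a sequence-level "pattern" concrete, here is a minimal Python sketch that labels each residue in a sequence as positively charged, negatively charged, hydrophobic, or capable of cross-linking. The residue groupings are a deliberately simplified assumption (roughly neutral pH, a short list of nonpolar side chains), the function names are made up for illustration, and the example sequence is hypothetical. This is not how AlphaFold works; it only shows the kind of raw information a sequence carries.

```python
# Simplified residue classes at roughly neutral pH. This grouping is an
# illustrative assumption, not a complete biochemical model.
POSITIVE = set("KR")          # lysine, arginine
NEGATIVE = set("DE")          # aspartate, glutamate
HYDROPHOBIC = set("AVLIMFW")  # common nonpolar side chains
CROSS_LINKING = set("C")      # cysteine can form disulfide bridges


def residue_pattern(sequence: str) -> str:
    """Map each one-letter residue code to a symbol.

    '+' positive, '-' negative, 'h' hydrophobic,
    'c' cysteine (possible cross-link), '.' anything else.
    """
    symbols = []
    for residue in sequence.upper():
        if residue in POSITIVE:
            symbols.append("+")
        elif residue in NEGATIVE:
            symbols.append("-")
        elif residue in HYDROPHOBIC:
            symbols.append("h")
        elif residue in CROSS_LINKING:
            symbols.append("c")
        else:
            symbols.append(".")
    return "".join(symbols)


def net_charge(sequence: str) -> int:
    """Count positive residues minus negative residues."""
    pattern = residue_pattern(sequence)
    return pattern.count("+") - pattern.count("-")


if __name__ == "__main__":
    example = "MKDLCEAVRG"  # hypothetical sequence, for illustration only
    print(residue_pattern(example))
    print(net_charge(example))
```

A pattern like this is easy to compute, but going from it to a folded 3D shape is the hard part, and that gap is what structure prediction tries to close.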
Determining the structure can be a laborious process, often relying on advanced techniques such as X-ray crystallography. AlphaFold helps put that data into context with highly accurate conformation predictions. In the video below, you can watch a team from the University of Colorado, Boulder talk about the challenges of studying proteins involved in bacterial resistance to antibiotics. The team spent ten years puzzling over the shape of a protein that AlphaFold was able to predict in just a few minutes. This is possible because AlphaFold has been trained on more than 170,000 known protein structures, giving it the ability to predict what new sequences will look like in 3D.
When DeepMind announced AlphaFold last year, it decided to make the AlphaFold database freely accessible. At the time, there were just over a million structures available, which makes the 200-fold increase over the past 12 months quite impressive. DeepMind says AlphaFold has been cited in more than 4,000 scientific papers since its debut and could help scientists understand important topics such as antibiotic resistance, food safety, and the effects of plastic pollution.
With predictions now covering the entire UniProt database, DeepMind will provide a predicted structure directly on the web page. The complete set of all 200 million structures will be available as a bulk download from Google Cloud Public Datasets.