Beijing, April 23 (Science and Technology Daily) — By Zhang Mengran
American researchers have used artificial intelligence (AI) to design new proteins that go beyond those found in nature. Their machine learning algorithm can generate proteins with specified structural features. These proteins could be used to make materials with tailored mechanical properties, such as stiffness or elasticity, and might eventually replace raw materials derived from petroleum or ceramics. The findings were published in the latest issue of the journal Chem.
Researchers from the Massachusetts Institute of Technology (MIT), the IBM Watson AI Lab, and Tufts University built a generative model on the same kind of machine learning architecture that underlies AI systems such as DALL·E 2. They adapted it to predict amino acid sequences that fold into proteins with the desired structural characteristics.
The model learns the biochemical relationships that govern how proteins fold and assemble. This lets it generate entirely new proteins not seen in nature, which opens the door to new applications. For instance, the tool could be used to design coatings that extend the shelf life of fresh produce while remaining safe to eat. The model can also generate millions of candidate proteins within a few days, giving researchers a large pool of designs to explore.
In this study, the team developed two machine learning models that predict new amino acid sequences expected to fold into proteins meeting specific structural design goals. One model works with the overall structural properties of a protein, while the other works at the level of individual amino acids. Both models build proteins by assembling amino acids, the basic building blocks of proteins.
The researchers coupled these models with protein-folding prediction algorithms to predict the 3D structures of the generated proteins. They then calculated the resulting physical properties and compared them with the original design specifications. The generated sequences typically overlapped with existing natural amino acid sequences by about 50% to 60%, though some were entirely novel. This degree of similarity suggests that many of the AI-generated proteins could be synthesized in practice.
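The article does not say how the 50% to 60% overlap was measured. One common way to quantify such overlap is percent identity from a pairwise sequence alignment. The sketch below is a minimal illustration under stated assumptions, not the researchers' method. It uses Needleman-Wunsch global alignment with toy scores (+1 for a match, -1 for a mismatch, -1 per gap), whereas real tools typically use substitution matrices such as BLOSUM62 and affine gap penalties. The function names `global_align` and `percent_identity` are hypothetical. Percent identity is also defined differently across tools; here it is identical aligned positions divided by alignment length.

```python
# Hypothetical helpers for illustration only; not the researchers' code.
# Assumes toy scoring (match +1, mismatch -1, gap -1) instead of BLOSUM62.


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # pair two residues
                score[i - 1][j] + gap,  # gap inserted in b
                score[i][j - 1] + gap,  # gap inserted in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions divided by alignment length, as a percentage."""
    aligned_a, aligned_b = global_align(a, b)
    if not aligned_a:  # both sequences empty
        return 0.0
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)


if __name__ == "__main__":
    print(global_align("ACDEFG", "ACDFG"))                 # ('ACDEFG', 'ACD-FG')
    print(round(percent_identity("ACDEFG", "ACDFG"), 2))  # 83.33
```

In this toy example, the second sequence lacks one residue, so the alignment has five identical positions out of six columns. A generated protein scored this way against its closest natural match would fall in the article's "partial overlap" range if the result were between 50 and 60.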