By Dean A. Haycock

Special To BioWorld Today

Designer proteins are here.

With the aid of a supercomputer, it is now possible to create custom-designed proteins. The technology, called automated protein design, is new, but its potential is obvious. In a few years, researchers may be able to order protein structures of their choice as routinely as they now order other custom-made reagents. Enzymes, hormones and other proteins could be redesigned to acquire properties that would make them easier to study or more effective in the clinic.

Biochemists have been able to synthesize proteins for years, of course, by connecting their amino acid building blocks together one after another. But the results of these efforts were unreliable because no one could predict what 3-D shape such a protein would assume. And the 3-D shape of a protein is crucial for its biological activity. If you can't accurately predict it, you can't modify or design a protein to do your will.

The difficulty in predicting the final 3-D structure of a protein is so overwhelming it requires the use of a supercomputer. That is the machine Bassil Dahiyat and Stephen Mayo used in the first fully automated design of a novel sequence for a protein. Their wished-for protein was synthesized, and its structure confirmed, using standard technology. A report of their accomplishment, "De Novo Protein Design: Fully Automated Sequence Selection," appears in the current issue of Science.

Dahiyat contributed to the research while a graduate student at the California Institute of Technology (Caltech), in Pasadena, Calif. He is now president of Xencor Inc. in Pasadena, a start-up company established to develop and commercialize the new technology he helped create. Mayo is an assistant investigator at the Howard Hughes Medical Institute and an assistant professor at Caltech.

The two scientists demonstrated the validity of their approach by successfully making from scratch a protein with 28 amino acids that assumed the 3-D shape they wanted it to assume. Twenty-eight amino acids can be connected in different sequences to create 10^36 different chains. Sorting through these potential combinations to find the ones that will produce an acceptable geometry requires a tremendous amount of computer power and an algorithm that is up to the job. The computer was supplied by Silicon Graphics Inc., of Mountain View, Calif., and the algorithm by Dahiyat and Mayo, who worked on the project for five years.
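The size of that search space follows from simple arithmetic, assuming each position in the chain can hold any of the 20 standard amino acids. The short sketch below is only an illustration of that count; the function name `sequence_count` is made up for this example and does not come from the Science paper.

```python
# Illustrative arithmetic only: assumes 20 possible amino acids at each position.
AMINO_ACIDS = 20


def sequence_count(length: int) -> int:
    """Number of distinct chains of the given length (20 choices per position)."""
    return AMINO_ACIDS ** length


# A 28-residue chain, the size of the designed protein.
print(f"{sequence_count(28):.2e}")   # 2.68e+36

# A 100-residue chain, closer to the size of a typical small protein.
print(f"{sequence_count(100):.2e}")  # 1.27e+130
```

Even at a billion sequences per second, checking every 28-residue chain one by one would be out of the question, which is why the search has to discard most candidates without examining them individually.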

Their instructions to the computer allowed it to identify, using lessons it has taken protein chemists decades to learn, a sequence of amino acids that would fold into the final geometrical shape the authors desired. In this case, the researchers created a custom-designed version of the polypeptide backbone structure of a zinc finger domain. The parent protein is involved in gene expression.

They have shown they have "an excellent algorithm for finding the optimal sequence for a given fold. If that can be translated into finding the optimal sequence for proteins of biotechnological interest, we may be able to obtain proteins that are much more stable," William DeGrado, professor of biochemistry and biophysics at the University of Pennsylvania School of Medicine, in Philadelphia, told BioWorld Today.

The success of Dahiyat and Mayo suggests it should be possible to improve on naturally occurring proteins. Proteins might be redesigned to work better at extreme temperatures or in different environments, for instance. They also might be made more or less resistant to enzymes that break them down.

Twenty-eight amino acids is small for a protein. Future users of the technology will certainly want designer proteins made from at least 100 amino acids. Such a molecule has 10^130 possible amino acid sequences. To choose the best sequence, the algorithm begins by searching for the sequences least likely to produce the desired final geometry. It thus rapidly eliminates the worst possibilities and narrows the list to the best possible answers, a process called "dead-end elimination." The selection criteria take into account the potential interactions of side chains that will be exposed to solvent, hidden in the interior of the molecule or located somewhere between the two.
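The idea behind dead-end elimination can be sketched in a few lines. The example below uses the simplest published form of the criterion (the original one from Desmet and colleagues), which is not necessarily the variant Dahiyat and Mayo ran: a choice at one position is discarded if even its best-case energy is worse than the worst-case energy of some alternative at the same position. Everything here is an assumption made for illustration: the function names, the two-position problem and the energy values are invented, and a real design run would score side-chain interactions with a physical energy function.

```python
from itertools import product

# Toy problem (made-up numbers): 2 positions, 3 candidate choices at each.
# SELF_E[i][a]         = energy of choice a at position i on its own
# PAIR_E[(i, j)][a][b] = interaction energy of choice a at i with choice b at j (i < j)
SELF_E = [[0.0, 5.0, 1.0],
          [0.0, 1.0, 10.0]]
PAIR_E = {(0, 1): [[0.0, 1.0, 0.0],
                   [1.0, 0.0, 1.0],
                   [0.0, 0.0, 0.0]]}


def pair(pair_e, i, a, j, b):
    """Look up a pairwise energy regardless of the order of the two positions."""
    return pair_e[(i, j)][a][b] if i < j else pair_e[(j, i)][b][a]


def dead_end_eliminate(self_e, pair_e):
    """Repeatedly drop choices that cannot be part of the lowest-energy solution."""
    n = len(self_e)
    alive = [set(range(len(options))) for options in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for a in list(alive[i]):
                # Best case for a: the most favorable surviving partner everywhere.
                best_a = self_e[i][a] + sum(
                    min(pair(pair_e, i, a, j, b) for b in alive[j]) for j in others)
                for t in alive[i] - {a}:
                    # Worst case for rival t: the least favorable surviving partner.
                    worst_t = self_e[i][t] + sum(
                        max(pair(pair_e, i, t, j, b) for b in alive[j]) for j in others)
                    if best_a > worst_t:
                        alive[i].discard(a)  # a is a "dead end"
                        changed = True
                        break
    return alive


def brute_force_minimum(self_e, pair_e):
    """Check every combination; feasible only for tiny toy problems."""
    def total(combo):
        energy = sum(self_e[i][a] for i, a in enumerate(combo))
        energy += sum(pair_e[(i, j)][combo[i]][combo[j]] for (i, j) in pair_e)
        return energy
    combos = product(*(range(len(options)) for options in self_e))
    return min(combos, key=total)


print(dead_end_eliminate(SELF_E, PAIR_E))   # [{0}, {0}]
print(brute_force_minimum(SELF_E, PAIR_E))  # (0, 0)
```

In this toy case elimination alone narrows nine combinations to one, and it matches the exhaustive search. On real problems some choices usually survive at each position, and the remaining combinations still have to be searched.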

"If you want to use dead-end elimination, you need quite a bit of high-end computer power," Mayo told BioWorld Today.

Calculation Time Increases With Molecule Size

To predict the sequence of the modified zinc finger protein, the authors ran the problem on a machine that employed 10 processors running in parallel. The calculation took 10 hours of actual elapsed time and 90 hours of total computing time across the processors.

"The unfortunate thing with the dead-end elimination approach is it doesn't scale nicely with the size of the protein. For example, we are now looking at a molecule which is twice as large as the one reported in the Science paper and we are running those calculations on a 32-processor machine," Mayo said. That calculation has taken an elapsed time of 48 hours.

Less calculating time, however, will be required for users who know precisely which amino acids in a large protein they wish to modify.

"If you are looking at large protein hormones, which might be a couple of hundred amino acids, it is possible today to take our calculations, and run them on a portion of the molecule known, for example, to cause aggregation behavior in a solution. That portion of the molecule might be only 10, 20 or 30 amino acids," Mayo said.

People in the industry have already expressed interest in the technology, but Mayo said it is too early to name them. Patents are in preparation.

The authors expect to double, each year, the size of the proteins that can be analyzed. That goal assumes access to larger computers and continual improvement in the performance of the software. The two scientists hope to have completed work on molecules with 51 and 56 amino acids in less than one year.

"I wouldn't be surprised if we had them done before the end of this calendar year, which is a growth rate much better than I originally anticipated," Mayo said.

At the same time, Dahiyat and Mayo are developing other computational tricks that should allow them to run the calculation on larger molecules without necessarily requiring larger computers.