Sequences are nice, but they aren't shapes.

To understand a protein, determining its amino acid sequence is a mere first step. Next comes determining its structure and, more complicated still, working out how the protein folds into that structure after coming off the production machinery as a linear chain of amino acids.

Getting that information at the atomic level is the proverbial needle-in-a-haystack problem: Even a relatively short peptide has several hundred atoms, and each of those atoms has several possible states. Nor is there one true path to protein folding. Most proteins can reach their final conformation via a number of different pathways, so the total number of possible states of a peptide is astronomically large.
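To give a rough sense of scale, the count can be sketched in a few lines of Python. The figures here are illustrative assumptions rather than numbers from the researchers: 300 atoms stands in for "several hundred," three states per atom stands in for "several," and the atoms are treated as independent, which real atoms are not.

```python
# Back-of-the-envelope estimate of a peptide's state space.
# ASSUMPTIONS (illustrative, not from the paper): 300 atoms,
# 3 possible states per atom, atoms treated as independent.
n_atoms = 300
states_per_atom = 3

# With independent atoms, the state counts multiply.
total_states = states_per_atom ** n_atoms

# Python integers are arbitrary precision, so the digits can be counted directly.
n_digits = len(str(total_states))
print(f"Roughly 10^{n_digits - 1} possible states")
```

Even under these simplified assumptions the result is far too large to search state by state, which is why exhaustive enumeration is not an option.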

Part of the problem is that the folding trajectory cannot necessarily be broken down into parallel problems. There are approaches that model protein folding 10 nanoseconds at a time and try to string together a number of short modeling results into one longer, complete folding trajectory. But because there are multiple ways for proteins to fold, not every intermediate state leads to the fully folded protein.

"Say you have something to do that will take you three hours," Eugene Shakhnovich told BioWorld Today. "Even if you set a thousand people to work on it for a minute each, if you need to go through one step to get to the next, at the end you will not have accomplished the same thing."

In the Nov. 21, 2006, edition of the Proceedings of the National Academy of Sciences, Shakhnovich, a professor of chemistry at Harvard University, and his colleagues reported on a way to overcome the limitations and predict a full folding pathway.

The researchers used a computational model to generate thousands of possible intermediate states in the folding of a peptide known as the engrailed homeodomain. They then clustered those states into structurally similar groups, and developed a concept of flux to determine which states represent part of the road most often traveled by the peptide as it folds. "Most trajectories pass through high-flux clusters," Shakhnovich said, while low-flux clusters see much less traffic.

"High-flux clusters are milestones," Shakhnovich said, "and low-flux clusters are detours. That's the essence" of the clustering approach.
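The milestone-versus-detour idea can be illustrated with a toy calculation. The sketch below is not the method from the PNAS paper: it assumes that flux can be approximated as the fraction of folding trajectories that visit a cluster at least once, and the cluster labels, the sample trajectories and the 0.5 cutoff are all hypothetical.

```python
from collections import Counter

def cluster_flux(trajectories):
    """Fraction of trajectories that visit each cluster at least once.

    ASSUMPTION: this simple visit fraction stands in for the paper's
    flux measure, whose exact definition is not given in the article.
    """
    counts = Counter()
    for traj in trajectories:
        counts.update(set(traj))  # count each cluster once per trajectory
    n = len(trajectories)
    return {cluster: counts[cluster] / n for cluster in counts}

def split_by_flux(flux, threshold=0.5):
    """Label clusters as milestones (high flux) or detours (low flux)."""
    milestones = sorted(c for c, f in flux.items() if f >= threshold)
    detours = sorted(c for c, f in flux.items() if f < threshold)
    return milestones, detours

# Hypothetical trajectories: each is the sequence of cluster labels a
# simulated peptide passes through, from unfolded ("U") to native ("N").
trajectories = [
    ["U", "A", "B", "N"],
    ["U", "A", "C", "B", "N"],
    ["U", "A", "B", "N"],
    ["U", "D", "A", "B", "N"],
]

flux = cluster_flux(trajectories)
milestones, detours = split_by_flux(flux)
print("milestones:", milestones)
print("detours:", detours)
```

In this toy data set, every trajectory passes through clusters A and B, so they come out as milestones, while C and D each appear in only one trajectory in four and are flagged as detours.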

Shakhnovich said that the approach could turn out to be useful for drug development in two ways. The first is the obvious potential to model drug targets. In addition, he said, "this method can account for protein motion upon interaction with a drug," meaning it can be used to predict protein/small-molecule interactions. Shakhnovich said that Vitae Pharmaceuticals, where he sits on the scientific advisory board, uses "similar concepts," though not the exact methods or algorithms that the PNAS paper reported on.

Another recent paper, in the Nov. 10, 2006, issue of Science, reported on solving a similar folding problem for nucleic acids. There the authors, who hail from Stanford University, used direct measurements of the energy states of RNA hairpin molecules to determine the "Full, Sequence-Dependent Folding Landscape of a Nucleic Acid," as they put it in their title. Shakhnovich called his colleagues' approach "ingenious," but noted that such an approach would not work for proteins.

"Nucleic acids fold by very different mechanisms," he said.