The hype surrounding artificial intelligence (AI) can make it sound like the technology has all the answers. But from a scientific perspective, one of the technology's biggest strengths is that it can ask better questions.

"Whatever machine learning gives you, they are only predictions, and you still have to go back into the lab and validate it," Jessica Vamathevan told BioWorld MedTech. "You have to work on that particular protein or work on that particular mutation."

Much of the power of AI lies in its ability to synthesize vast amounts of data to generate those predictions. The spread of computing technology into every aspect of experimental research has provided both the impetus to develop computational tools and the data volume needed to test them.

"In biology, data is exploding [and] has been exploding for several years," Vamathevan said. "There has been a step change in technology" – while sequencing the first human genome took years, "we can now do those kinds of projects in a matter of hours."

Vamathevan, who is head of the strategic partnership office at the European Bioinformatics Institute (EMBL-EBI), compared the situation in much of biomedicine to the average smartphone.

Prior to the advent of digital photography, cameras came with rolls of film that allowed for 12, 24 or, if one was being profligate, 36 exposures. Each print, good or bad, cost the same amount of money to develop.

Today, there is essentially no limit to the number of pictures that can be taken with even an average smartphone camera, and those images can be evaluated at no cost on the camera's screen.

"You can take lots and lots of rubbish pictures to get a good picture – but it's the good picture that counts, and that you might want to do something with," she said.

Machine learning, a subset of AI, can be a powerful tool because it can, to an extent, teach itself to sift through the rubbish in search of the sort of data that can fuel true insight.

"It has to be able to learn," Vin Singh told BioWorld MedTech. Singh is CEO of Bullfrog AI, which uses machine learning to identify patient groups likely to benefit from experimental drugs that reached late-stage clinical trials but then failed in those trials. "If it can't learn, if you can't train it, then it's just traditional analytics."

That same independence, though, is the source of the technology's biggest headaches as far as drug discovery is concerned.

Black boxes

Machine learning algorithms learn to classify data without explicit instruction on what features they should use to classify it. While an algorithm is being trained, it receives feedback on whether its classification of a given data point is correct, but not on why it is correct or incorrect.

Ideally, with enough training, a machine learning algorithm can go on to classify new data into the categories it has learned. And in principle, an algorithm can evaluate a much larger number of variables than its programmer, potentially enabling it to classify data better than even a human expert. Potentially.
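That feedback loop can be sketched with a toy perceptron on made-up data: the model's weights are nudged only by whether each answer was right, never by any explanation of which feature mattered or why.

```python
# Toy sketch of supervised training (invented data, no real biology):
# the model only ever learns from right/wrong feedback on its answers.
import random

random.seed(0)

# Toy dataset: two features per sample, label 1 if their sum is positive.
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
labeled = [(x, 1 if x[0] + x[1] > 0 else 0) for x in points]

w = [0.0, 0.0]  # weights, one per feature
b = 0.0         # bias
lr = 0.1        # learning rate

for _ in range(20):  # training epochs
    for (x1, x2), y in labeled:
        pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
        # Feedback is only the error (right or wrong) -- the update rule
        # never tells the model *why* an example belongs to a class.
        err = y - pred
        w[0] += lr * err * x1
        w[1] += lr * err * x2
        b += lr * err

correct = sum(
    (1 if w[0] * x1 + w[1] * x2 + b > 0 else 0) == y
    for (x1, x2), y in labeled
)
accuracy = correct / len(labeled)
print(f"training accuracy: {accuracy:.2f}")
```

With enough passes, the weights converge to separate the two classes, yet nothing in the procedure records which feature drove any individual decision.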

In practice, machine learning algorithms can pick up on artifacts as well as meaningful differentiators.

In fact, Daphne Koller, CEO of machine learning-driven drug discovery company Insitro, told BioWorld MedTech, "the more powerful the machine learning model is that you use, the more capable it is of latching onto subtle artifacts."

The black box nature of what an algorithm learns has been widely discussed as a problem in AI, with some researchers arguing that not just the criteria individual algorithms use, but the field as a whole, is essentially a black box.

But in Koller's opinion, the problem is with biases in datasets more than with black boxes.

The black box concept "is one of those things one needs to be nuanced in thinking about," she said. "Whether the black box is an issue once you remove [biases] depends on your goal."

For one thing, she pointed out, "the human brain is also a black box."

In a striking example, experienced pathologists looking at tumor biopsies are excellent at classifying the corresponding tumors according to whether they are benign – that is, unable to metastasize – or malignant.

But they cannot precisely describe how they make that decision.

Nevertheless, because an experienced pathologist is right the vast majority of the time, the treating physician will trust their judgment even though how they arrived at that judgment remains opaque.

The leopard-skin pillbox-hat problem

There are numerous examples – some well documented and some quite possibly apocryphal, but instructive nevertheless – of artifacts that machine learning algorithms can latch on to.

One example is the neural network that could distinguish leopards from jaguars with high accuracy, but could not distinguish leopards from leopard-print sofas, because it had learned to look at spot shape and nothing else when classifying the images.

Another algorithm learned to tell snow from grass, but not the wolves (in snow) from the dogs (on grass) that its programmers were hoping it would learn to distinguish.

A well-documented example in the medical field is the imaging analysis software trained to diagnose pneumonia that latched onto differences between images taken with a traditional X-ray machine and those taken with a portable one brought to the bedside of very sick patients.

Yet another algorithm learned to give glowing prognoses to pneumonia patients with asthma who were admitted to the ICU. Doctors are much more likely to send asthmatic patients to the ICU at the first sign of possible pneumonia, where many of them recover very quickly because they did not have severe pneumonia in the first place.

Specifically in drug discovery, a study published in the Aug. 20, 2019, issue of PLoS ONE reported that when its authors used a popular dataset on ligand-target interactions, the DUD-E dataset, to train a convolutional neural network, the algorithm completely ignored any data on targets.

Senior author Tom Kurtzman, associate professor of chemistry at Lehman College, told BioWorld MedTech that he stumbled across the issue while trying to use DUD-E as a training set for a drug discovery problem his team was working on.

The method seemed to work with 99% accuracy to distinguish ligands that would bind to target proteins from those that would not. "If it worked for real, that's exactly what you'd want in the pharmaceutical industry. It's a trillion-dollar method," he said.

Alas, when Kurtzman and his team developed a series of tests to see what the program was learning, they found that withholding information on the target protein did not change the program's performance.

"Categorically, it's not learning protein-ligand interactions at all," Kurtzman said.
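The kind of ablation test the team ran can be sketched with made-up data and a deliberately simple stand-in model: hide the feature in question and see whether accuracy moves. If it does not, the model never learned from that feature.

```python
# Hypothetical sketch of a feature-ablation test: the "target" feature
# below is pure noise (a stand-in for ignored target-protein data), so
# hiding it should leave accuracy unchanged. All data is invented.
import random

random.seed(1)

def make_sample():
    label = random.randint(0, 1)
    ligand_feature = label + random.gauss(0, 0.2)  # informative
    target_feature = random.gauss(0, 1.0)          # pure noise
    return (ligand_feature, target_feature), label

train = [make_sample() for _ in range(500)]
test = [make_sample() for _ in range(500)]

# Nearest-centroid "model": it fits whatever separates the classes,
# with no instruction about which feature it *should* use.
def centroid(samples, cls):
    pts = [x for x, y in samples if y == cls]
    return tuple(sum(c) / len(pts) for c in zip(*pts))

c0, c1 = centroid(train, 0), centroid(train, 1)

def predict(x):
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return 0 if d0 < d1 else 1

def accuracy(samples, hide_target=False):
    hits = 0
    for (lig, tgt), y in samples:
        x = (lig, 0.0) if hide_target else (lig, tgt)  # ablate feature
        hits += predict(x) == y
    return hits / len(samples)

full = accuracy(test)
ablated = accuracy(test, hide_target=True)
print(f"with target feature: {full:.2f}, without: {ablated:.2f}")
```

The two accuracies come out essentially identical, which is the signature Kurtzman's team saw when withholding the target protein changed nothing.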

Instead, the program appeared to be picking up on other factors. Perhaps the most important is that molecules that bind the same target are more likely than two randomly chosen molecules to have similar structures. That likelihood increases further when two binders are part of the same lead series, whose members look similar by design.

"A molecule that's going to bind to HIV proteases is going to look like other molecules that bind to HIV proteases," Kurtzman said. Such bias is not in itself a problem – it is used in the biopharma industry to discover new lead compounds.

The problem, he said, is that purportedly, the convolutional neural network algorithm is "learning the basic physics of molecular interaction, and it's not."

In the studies his team has now published in PLoS, "ligand biases didn't do any better than existing methods" to identify promising leads.

Kurtzman's findings illustrate that even in the era of Big Data, datasets appropriate for a specific machine learning question are not necessarily in abundant supply – and that using inappropriate datasets can amount to a new iteration of searching for your wallet under the streetlamp because that's where it's bright.

Brian Shoichet, a professor of pharmaceutical chemistry at University of California, San Francisco, whose lab developed and maintains the DUD-E dataset in collaboration with the lab of adjunct associate professor John Irwin, said that DUD-E was developed for use with "methods that use physical properties ... if you now take a method that uses topological information [for analysis], of course the whole thing breaks down."

In general, he said, the idea of "cheating" by latching on to artifacts "is well known in AI ... though people might not be as aware of it with respect to drug discovery."

Shoichet's own group, he said, uses the database in conjunction with physics-based methods of drug discovery, but "it's not a learning-based method." For machine learning methods, he said, the database "has to be used with this idea of cheating in mind."

Koller's Insitro is approaching the problem of where to find appropriate data by being an experimental as well as a computational company.

"We have been grateful beneficiaries of advances that have happened over the past three to five years," she said. The company generates induced pluripotent stem (iPS) cells and converts them into a variety of lineages, which enables it to generate data on the effects of a mutation in various cell types.

The company also has partnerships with companies that have complementary clinical data – in April, the company announced a deal for nonalcoholic steatohepatitis (NASH) target discovery with Gilead Sciences Inc. – since, Koller said, "we're not trying to cure cells, we're trying to cure people."

Combining disparate datasets is, in general, an area where AI can break new ground. Several companies, such as Exscientia Ltd., of Oxford, U.K., and Cyclica, of Toronto, are using AI to look at a ligand's interaction with multiple potential targets, both those that could provide additional therapeutic benefits and those that could lead to side effects.

Cyclica CEO Naheed Kurji told BioWorld MedTech that applying AI to understanding a ligand's interaction with one target remains "a target-centric approach ... inherently, [it does] not adjust for the downstream risk when that molecule is put into a more complex system," aka the patient.

"We essentially took the inverse approach," he said, testing each ligand against a "panoramic view of the proteome ... To my knowledge, ours is the only platform on the market that can take a small molecule and screen it across the entire characterized proteome."

Kurji said it was both the goal and the "moral obligation" of AI and machine learning companies "to put the best platforms forward into the hands of domain experts so they can take more steps in the right direction, fewer steps in the wrong direction, and get better medicines to the people that need them" – while bearing in mind that, in order to gain acceptance, those platforms will need to be transparent to their users.

"Our goal is not to simply tell an experimental scientist to ... take a certain step in a direction," he said. "The only way to get them to do that is to tell them why."
