SAN DIEGO – At the State of Innovation session during BIO 2017, a brief exchange neatly summed up the current status of big data. Noting that clinical trial expenses and success rates have not changed appreciably even as the amount of data collected from patients, most prominently, but not only, through genome sequencing, has increased, venture capital firm Flagship Pioneering's Jeremy Springhorn lamented that "there's all this information out there, yet we're not using it."

Richard Harrison, chief scientific officer of Clarivate Analytics (parent company of BioWorld), disagreed, noting that he and colleagues have published data that show a strong increase in the use of biomarkers in clinical trials.

"We're using it," Harrison said of the information that the life sciences industry is awash in. "I just don't know if we're using it correctly."

That, in a nutshell, is big data.

The 2013 quip by Dan Ariely, professor of psychology and behavioral economics at Duke University, that big data is like teenage sex – "Everyone talks about it, nobody really knows how to do it" – still accurately describes a lot of big data efforts.

Unlike teenage sex, though, big data efforts are typically conducted by people who know they've got a lot left to learn. And so Wednesday's daylong series of sessions on big data spent plenty of time on challenges as well as opportunities.

Eric Green, director of the National Human Genome Research Institute, illustrated the challenges of using big data for genomics. "In many ways, genomics has been a bit of a poster child for biological big data," he told the audience. "It represents the front end of the data revolution."

The sequencing of the human genome has impacted areas well beyond human health by enabling technological advances in sequencing that are now used across all fields of biology.

There are some fields where genomic analysis has already had a giant impact on medical care. Tumor sequencing – sometimes of targeted genes, sometimes of larger panels – is now a standard part of cancer care.

And Green said that the number of rare diseases whose genetic basis is known has gone from 61 on the day the Human Genome Project began to roughly 4,700 today.

But rare monogenic diseases are "not what fills hospitals and clinics around the world," Green pointed out. "What fills hospitals and clinics . . . is common, complex diseases."

For those diseases, getting from big data to precision medicine will require integrating multiple sets of big data that measure the interactive effects of genes, environment and lifestyle.

As a result, precision medicine remains a largely aspirational goal.

Reaching that goal will require addressing different challenges on different timescales, said Gunaretnam Rajagopal, global head of computational science at Janssen R&D.

The current challenge of big data, he told the audience, is to decide which questions can be answered with presently available data. In the short term, the goal is to decide what sort of data should be collected to best supplement what is already available, and how to scale up collection and analysis methods. The amount of data already surpasses what the humans making clinical decisions can process, and it is still growing.

In the long term, Rajagopal said, one of the key issues will be developing trustworthy analytical systems.

Green concurred, saying that technical innovation "is not what I worry about. . . . It's the analysis of the data that presents the far greater problem."

Atul Butte, director of the University of California at San Francisco's Institute of Computational Health Sciences, agreed that analytics are critical, and that as big data turn into bigger data, and then huge data, new analytical methods based on machine learning and artificial intelligence (AI) will be important.

But he argued that "the most important thing about machine learning and AI is to demystify it."

"It seems like it is the realm of wizards – it's software," he told the audience. "The hardest part is to figure out what questions you want to ask."

Butte is deeply optimistic about the potential for big data to ultimately transform medical care, and gave several examples of companies, including Carmenta Inc. and Numedii Inc., that have found commercial success by analyzing publicly available big data to develop products.

Butte noted that the beauty of big data, especially open-access big data, is that it is a resource that is not depleted as it is used.

If one person takes a resource such as water or oil from public land, that resource is gone, he said. "But if I take data, you can also have it!"

And there are plenty of things to do with that data, he added. "Every one of you [in the audience] could do a different diagnostic and we could not step on each other's toes, because that's how many diagnostics we need."