SEATTLE – Tracing the family tree of COVID-19 through its evolving genome sequence makes it possible to disprove many false claims circulating on social media about the novel coronavirus, in particular the claim that it was generated in a covert biological weapons program.

“From everything I’ve looked at, there is zero evidence for genetic engineering; it looks like normal evolution,” said Trevor Bedford, a computational biologist at Fred Hutchinson Cancer Research Center, who has been using genome sequences taken from patient samples to track the spread of the virus since Jan. 11.

“Thousands of mutations are distributed across the genome. If you’re engineering something, you wouldn’t do that. There are no signals for biological engineering. It looks like natural evolution,” Bedford told attendees of the AAAS meeting on Feb. 14.

Bedford also decried a paper published on the Biorxiv preprint server by scientists at the Indian Institutes of Technology, pointing to an “uncanny similarity” between COVID-19 and HIV. They claimed to have identified four insertions in the spike glycoprotein of COVID-19, through which the virus binds to the host cell, that are not present in other coronaviruses, but which looked the same as key structural proteins of HIV-1, a finding that they said “is unlikely to be fortuitous in nature.”

The research “was very shoddily done,” Bedford said. “The sequence differences are not unique to COVID-19. Closely related [bat] coronaviruses have these chunks as well. They are small motifs used by nature over and over again.”

The paper was swiftly withdrawn from Biorxiv, but the allegations continue to have a life of their own on social media, with stories headlined “Scientists confirm” COVID-19 is “man-made.”

Biorxiv has proved an important conduit for rapid publication of legitimate research about the virus, but the controversy around the Indian paper led the website to add a yellow band across all its postings about COVID-19 to stress that these are “preliminary reports that have not been peer-reviewed … and should not be reported in news media as established information.”

The volume of misinformation about COVID-19 led World Health Organization (WHO) Director General Tedros Adhanom Ghebreyesus to label it an “infodemic.” WHO has set up a team to monitor and respond to “myths and rumors” around the clock.

Along with debunking bioweapon conspiracy theories, the 100 COVID-19 genomes sequenced to date from patient samples are also providing insights into the epidemiology of the virus. In combination with live case records and mathematical modeling, that gives the lie to claims that there has been a cover-up and that far more people have contracted the virus than officially reported.

Comparing virus genomes from different patients, and knowing how fast the virus mutates, makes it possible to estimate how many cases have occurred. “We get upwards of 200,000 total infections,” Bedford said. That fits with estimates based on mathematical models published by researchers at the WHO Collaborating Centre for Infectious Disease Modelling, Imperial College London, he noted.

The family history exposed by the genome sequences debunks another rumor, that COVID-19 crossed to humans from snakes or fish. Based on the genetic analysis, the likelihood is that the virus was transmitted by a bat to another mammal between 20 and 70 years ago. That as-yet-unidentified intermediary passed the virus on to its first human host in the city of Wuhan in late November or early December 2019.

Global cooperation

Virus genomes are being released three to six days after sample collection and shared around the world via GISAID (Global Initiative on Sharing All Influenza Data). The number of genome sequences and the speed with which they have been published underline the unprecedented level of global cooperation in tackling the epidemic, Bedford said. In the 2013-2016 Ebola epidemic in West Africa, it was a year before the first sequence was available; in the case of Zika virus, it took several months. Even with seasonal flu, and all the resources thrown at it, updates are monthly, although the norm there is to sequence and publish multiple genomes at once.

Each of the different COVID-19 sequences varies by a handful of single amino acid point mutations. That variation forms the basis of the family tree, which shows how virus samples collected at different times and in different locations are related.
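The idea of relating samples by counting their point differences can be illustrated with a minimal Python sketch. It is not the method used by Bedford's group, which the article does not describe in detail; it simply counts mismatches between aligned sequences and reports each sample's closest relative. The sample names and the short sequences are made up for illustration and are not real viral data.

```python
from itertools import combinations

# Hypothetical toy alignment: short made-up sequences, not real viral data.
samples = {
    "sample_A": "ACGTACGTAC",
    "sample_B": "ACGTACGTAC",
    "sample_C": "ACGTACGTAT",
    "sample_D": "ACGAACGTAT",
}


def hamming(a, b):
    """Count positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(seqs):
    """Return {(name1, name2): difference count} for every unordered pair."""
    return {
        (m, n): hamming(seqs[m], seqs[n])
        for m, n in combinations(sorted(seqs), 2)
    }


def closest_relative(name, seqs):
    """Return the other sample with the fewest differences from `name`."""
    others = [n for n in sorted(seqs) if n != name]
    return min(others, key=lambda n: hamming(seqs[name], seqs[n]))


distances = pairwise_distances(samples)
for pair, count in distances.items():
    print(pair, count)

print(closest_relative("sample_D", samples))  # sample_C
```

In this toy data, samples A and B are identical, C differs from them at one position, and D carries C's change plus one more, so D sits closest to C. Real phylogenetic tools build the tree with statistical models rather than raw difference counts.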

The technique was used in the West Africa Ebola epidemic, and in tracking the geographical spread of the Zika virus. “It is a super-useful tool,” Bedford said.

The first five sequences of COVID-19 that were made available on Jan. 11 had little genetic variety, with three being identical and two having slight differences between them.

“We know that these sort of coronaviruses mutate at about one mutation per genome, per month. And so just seeing this, we know that all of the five viruses shared a very recent origin,” Bedford said.
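The reasoning in that quote can be written out as simple arithmetic. The sketch below takes the quoted rate of about one mutation per genome per month and assumes, for illustration only, that mutations accumulate at a steady rate and independently along each lineage (a Poisson model). Under those assumptions, two genomes that split from a common ancestor are expected to differ by twice the rate multiplied by the elapsed time. The function names are hypothetical.

```python
import math

# Rate quoted in the article: about one mutation per genome per month.
RATE_PER_MONTH = 1.0


def months_since_common_ancestor(differences, rate=RATE_PER_MONTH):
    """Estimate months back to the shared ancestor of two genomes.

    Both lineages accumulate mutations independently, so the expected
    number of differences between them is 2 * rate * time.
    """
    return differences / (2.0 * rate)


def prob_identical(months, rate=RATE_PER_MONTH):
    """Poisson probability that two genomes show zero differences after
    diverging `months` ago (expected count = 2 * rate * months)."""
    return math.exp(-2.0 * rate * months)


print(months_since_common_ancestor(2))  # 1.0
print(round(prob_identical(0.5), 3))    # 0.368
```

On these assumptions, two genomes that differ at two positions point to a common ancestor roughly one month earlier, and identical genomes are only plausible if the split was very recent, which is the inference Bedford describes.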

That was consistent with the supposition that the original source was repeated animal-to-human transmission at the seafood market in Wuhan, where live animals were on sale.

However, by Jan. 19, COVID-19 genomes from Wuhan and Thailand indicated there was human-to-human spread. “The genome sequences actually provided an early view of this, before other data streams. I think that was hugely valuable,” said Bedford.

Bedford’s real-time tracking of the evolution of COVID-19 is posted on an open source website. All the genomes analyzed to date are highly related, with at most seven mutations relative to the common ancestor. There is no sign of the virus becoming more virulent or infectious.

As of Feb. 14, there were 47,505 laboratory-confirmed cases of COVID-19 in China, and 16,427 clinically confirmed cases in Hubei province. There have been 1,381 deaths in China, including 121 reported on Friday, while outside China there have been 505 cases in 24 countries, and two deaths.

