Privacy concerns related to DNA sequencing got yet another airing today when a team from the Whitehead Institute reported in Science that, using only publicly available information, they were able to identify about 50 men who had anonymously donated DNA to projects such as the 1000 Genomes Project.
In other words, there is plenty of DNA out there that has an identity attached to it. And if that DNA belongs to a relative of the supposedly anonymous donor, parts of it can match exactly.
Most DNA is mixed up during conception, of course – that’s the beauty of sexual reproduction. But mitochondrial DNA is passed on unadulterated from mothers to all their children, and Y chromosomes are inherited wholesale from fathers by their sons.
The team now publishing in Science used the fact that something else is also almost always inherited from fathers to sons: surnames.
They started with the so-called metadata that is made public along with each anonymous donor’s DNA, such as the donor’s age and donation site, which together allow a pretty good inference about the state the donor lives in.
Using such characteristics, they were able to narrow down the potential donors by quite a bit. By then looking at the surnames of men with matching DNA in genealogy databases, they were able to identify a number of specific donors.
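The two steps described above, matching Y-chromosome markers to surnames and then filtering by metadata, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the team's actual pipeline: the record types, the function names, the exact-match rule, and all marker values, surnames and people are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenealogyEntry:
    """Hypothetical genealogy-database entry: a surname plus Y-chromosome markers."""
    surname: str
    y_markers: dict  # marker name -> repeat count (values here are invented)


@dataclass(frozen=True)
class PublicRecord:
    """Hypothetical public record used to narrow down candidates."""
    name: str
    surname: str
    age: int
    state: str


def count_mismatches(a, b):
    """Count markers present in both profiles whose values differ."""
    shared = a.keys() & b.keys()
    return sum(1 for marker in shared if a[marker] != b[marker])


def infer_surnames(donor_markers, genealogy, max_mismatches=0):
    """Return surnames of entries whose shared Y markers match the donor's."""
    return {
        entry.surname
        for entry in genealogy
        if donor_markers.keys() & entry.y_markers.keys()  # need some overlap
        and count_mismatches(donor_markers, entry.y_markers) <= max_mismatches
    }


def narrow_candidates(records, surnames, age, state):
    """Keep only records that fit the inferred surname and the donor metadata."""
    return [
        r for r in records
        if r.surname in surnames and r.age == age and r.state == state
    ]


# Invented example data: one anonymous donor, two genealogy entries, three people.
donor_markers = {"DYS19": 14, "DYS390": 24, "DYS391": 11}
genealogy = [
    GenealogyEntry("Example", {"DYS19": 14, "DYS390": 24, "DYS391": 11}),
    GenealogyEntry("Sample", {"DYS19": 15, "DYS390": 23, "DYS391": 10}),
]
records = [
    PublicRecord("A. Example", "Example", 62, "UT"),
    PublicRecord("B. Example", "Example", 30, "UT"),
    PublicRecord("C. Sample", "Sample", 62, "UT"),
]

surnames = infer_surnames(donor_markers, genealogy)
matches = narrow_candidates(records, surnames, age=62, state="UT")
print(surnames, [r.name for r in matches])
```

In this toy data, the surname alone leaves two candidates, and the age and state together reduce them to one, which is the sense in which each piece of metadata narrows the field.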
The method can’t identify every donor. But the team estimates that they could identify somewhere between 10 percent and 20 percent of all anonymous male donors using their approach. In response to the findings, the National Institute of General Medical Sciences and the National Human Genome Research Institute have removed some information about their donors from public view.
Researchers Defend Open Access
Similar collisions between one person’s desire for anonymity and his relatives’ lack thereof have come up before. Children conceived with a theoretically anonymous sperm donor, or their mothers, have been able to track down the donor after first connecting with other relatives via genealogy sites. Researchers are already rallying to defend the importance of open-access databases, and their case for that importance is fair enough.
It is, of course, also an easy thing for them to say about someone else’s DNA, especially since it is not yet clear what those who uncover genetic information can and can’t use it for. For example, health insurance cannot be denied on the basis of a person’s DNA, but no explicit law exists for other types of insurance such as long-term disability.
Individual DNA donors may decide that the risk of being “outed” is worth it for the advantages of contributing to scientific progress and to open-access data. But any potential donor needs to understand that, in the words of the Science paper’s authors, “data release, even of a few markers, from one person can spread through deep genealogical ties and lead to the identification of another person who might have no acquaintance with the person who released his genetic data.”
Or perhaps, men should take a page from women’s playbook and start lying about their age.