New Study: Protecting Privacy in DNA Research

In the last few years, genome association studies have led to breakthrough medical discoveries. However, because of concerns that individuals could be identified from their DNA data, health institutes in the US and abroad have removed public access to the genetic data from these studies. Association studies have been shown to shed light on diseases such as cancer and Alzheimer's disease, and sharing their raw data with other scientists can greatly aid further discoveries. To make that sharing possible again, Dr. Eran Halperin of the International Computer Science Institute and Tel Aviv University, and colleagues at the University of California, Berkeley have developed "a mathematical formula and a software solution that ensures that malicious eyes will have very low chances to identify individuals in any study," says Dr. Halperin.

The team found a mathematical formula to determine which SNPs (single-nucleotide polymorphisms, single positions in the DNA sequence that differ from individual to individual in the human population) can be made publicly accessible without revealing whether specific individuals participated in the study. Using software designed with this formula, the NIH and other institutes can distribute important research data and make it available to scientists without compromising anyone's privacy.

"We've been able to determine how much of the DNA information one can reveal, without compromising a person's privacy," says Dr. Halperin. "This means the substantial effort invested in collecting this data will not have been in vain. Making this data publicly available again could speed up research and allow people to make new discoveries, more quickly."

Genome association studies compare data from many individuals to identify specific positions in the genome that may be associated with an increased risk of disease, such as cancer. For instance, Dr. Halperin was recently involved in a study that found a link between a specific genetic mutation and the risk of a type of non-Hodgkin's lymphoma. Giving the scientific community access to genetic information from such studies lets other scientists build on them to find more connections between genetics and disease.

The authors of the study plan to provide their software to the NIH, and hope that scientists will use it to restore public access to the data they have collected, now with privacy protections in place.

A complete list of authors: Sriram Sankararaman and Michael Jordan of UC Berkeley; Guillaume Obozinski of Willow, a joint research team between INRIA Rocquencourt, École Normale Supérieure de Paris and Centre National de la Recherche Scientifique; and Eran Halperin of ICSI and Tel Aviv University.