Monthly Archives: February 2014

Anonymous Genetic Profiles Aren’t Completely Anonymous

Today it is easy for long-forgotten photos or personal information to live online indefinitely. But what if the most personal data about you – your genetic makeup – lived online? An individual’s genome contains a vast amount of information about inherited diseases and physical traits, all stored in strands of DNA. The consequences of being able to search, cross-reference, and analyze this information are profound, experts say.

Hundreds of thousands of people have already had their genomes mapped in the U.S., either for research studies or through one of several private companies offering this service. In many cases, people want to know their risk of medical maladies like heart attack or breast cancer, or to identify the specific gene causing a disorder in their family. What these pioneers of personal genome mapping might not know, though, is how easily re-identifiable their anonymous data can be. And if that is the case, the question might not be whether to share, but rather how to regulate and protect what is being shared.

“We are entering an era of ubiquitous genetic information,” said computational biologist Yaniv Erlich, speaking at the American Association for the Advancement of Science meeting in Chicago in February.

Erlich, who is a fellow at the Whitehead Institute for Biomedical Research in Cambridge, Mass., brings a unique but apt background to genetic privacy research: He is a former hacker, someone who was hired to expose weaknesses in the security systems of banks and credit card companies. He and his team took a similar approach to illustrate vulnerabilities in genetic databases. Their study, published in Science last January, recovered the identities of nearly 50 anonymous participants in the 1000 Genomes Project, using only free, publicly accessible Internet resources.

“We have shown that it is possible, in some cases, to take genetic sequencing data of males and infer the surname by inspecting the Y-chromosome of this person,” Erlich said, “with a success rate of about 12 percent.”

Their method relied on the code-like nature of genomes. On the Y-chromosome of every male, there is a type of distinct pattern made up of what are called short tandem repeats, or Y-STRs. Erlich’s team developed an algorithm to help identify these patterns, called Y-STR haplotypes, in a human genome.
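The idea can be sketched in a few lines of code. The function below is a simplified illustration, not Erlich's actual algorithm, which works over known Y-STR marker loci and must handle sequencing noise; it simply scans a DNA string for a short motif repeated back-to-back:

```python
def find_strs(seq, motif_len=4, min_repeats=3):
    """Scan a DNA string for short tandem repeats: a motif of
    `motif_len` bases repeated at least `min_repeats` times in a row.
    Returns (start index, motif, repeat count) for each hit.
    A toy sketch -- real Y-STR genotyping targets known marker loci
    and models sequencing error."""
    hits = []
    i = 0
    while i + motif_len * min_repeats <= len(seq):
        motif = seq[i:i + motif_len]
        count = 1
        # Extend the run as long as the same motif keeps repeating.
        while seq[i + count * motif_len: i + (count + 1) * motif_len] == motif:
            count += 1
        if count >= min_repeats:
            hits.append((i, motif, count))
            i += count * motif_len  # skip past the whole repeat run
        else:
            i += 1
    return hits

# GATA repeated four times starting at position 4:
print(find_strs("ACGTGATAGATAGATAGATACCT"))  # → [(4, 'GATA', 4)]
```

The repeat counts at a panel of such loci form the Y-STR haplotype that genealogy databases index by surname.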

A number of recreational genetic genealogy websites connect surnames to Y-STR haplotypes, with the intent of building family trees and reuniting distant relatives. Unintentionally, these databases make it possible to re-identify seemingly anonymous genomes.

By comparing anonymous data to genome data on two major public databases, Ysearch and SMGF, the researchers were able to find close matches, and further narrow them with other data such as surnames, ages, and states of residence.

While about 40,000 U.S. males share an average surname, the combination of a surname, birth year and state shrinks that number considerably.
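Under a crude uniform-distribution assumption (hypothetical round numbers, not the study's actual demographics), the shrinkage can be sketched as a back-of-envelope calculation:

```python
# How quasi-identifiers shrink the candidate pool. The divisors
# below are illustrative assumptions, not figures from the study.
pool = 40_000          # U.S. males sharing an average surname
birth_years = 80       # plausible range of adult birth years
states = 50

candidates = pool / birth_years / states
print(round(candidates))  # → 10
```

Even these rough numbers land in the same ballpark as the dozen or so candidates the researchers were left with.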

From the honed-down list of about 12 males, the team was able to use Google and other free online services to track down the owner of the unknown genome. A similar technique has been used by individuals who were adopted or conceived through sperm donation to trace their biological families. As more genetic data reaches online databases, Erlich said, new threats to privacy are keeping pace.

So, he would like to explore the best ways to collect genetic data for scientific studies, while protecting the privacy of participants. And he thinks it is possible to have both.

Drawing accurate conclusions regarding inherited disorders requires analysis of millions of samples, Erlich said. One big concern is how to keep all of those samples private — from insurance companies, marketers, anyone who might discriminate or draw conclusions about participants based on this wide array of information.

Privacy becomes especially important in those cases, he said, since prospective participants of scientific studies have ranked privacy of sensitive information as one of their top concerns and a major determinant of whether they will participate in a study.

In order to protect privacy, Erlich and Princeton researcher Arvind Narayanan suggest a combination of access control, data anonymity and cryptography. As national policy continues to evolve on the subject of genetic privacy, private industry is gearing up to fill the gaps in a number of ways.

For example, in the future, it could be the norm for users to send their genetic data through a cloud service as an added precaution. Kristin Lauter, head of the cryptography research group at Microsoft Research, likens this method, called homomorphic encryption, to “not having to trust your jeweler,” since users would hand over their precious information, and allow a private service like hers to do calculations on it in an encrypted form.

“The cloud service never sees your private data,” she said. “Only you, who has the key, can un-encrypt it and analyze the result.”

But, like using a credit card, one runs the risk of being hacked. This is why another element of protecting genetic privacy might lie in improved informed consent processes, as well as follow-up analyses of each individual’s results.

John Wilbanks, chief commons officer for the Seattle-based Sage Bionetworks, which advocates open and collaborative science, said he agrees with Erlich’s findings that re-identification risks are higher than people think.

“When these services guarantee anonymity, that’s a quite difficult promise to keep…I think right now they can tend to understate the re-identification risks, and overstate the risk of harm,” Wilbanks said.

Sarah Whitman, Inside Science

DNA collection aids arrests—but what about privacy?

In this week’s episode of “TechKnow,” we highlight the latest advances in forensic technology that are helping law enforcement agencies identify suspects and solve crimes with increasing accuracy. Cases that were left cold for years are being revisited with fresh eyes—and, more importantly, fresh technology. Where we have seen some of the largest leaps in the past decade is in the analysis of contact trace DNA or “Touch DNA.” A person sheds about 400,000 skin cells per day, and with smaller and smaller samples required to make an accurate match, it is becoming more difficult to commit a truly untraceable crime.

But matching crimes to criminals based on trace amounts of DNA also requires expanding and centralizing the local, state, and national DNA databases used by law enforcement.

Last June, the Supreme Court ruled 5-4 that it was constitutional to take DNA swabs from people who have been arrested for “serious crimes” without getting a warrant or waiting for a conviction. These DNA samples can be added to a database and can be used to solve past crimes—and also future crimes.

Laws in 28 states already allow for such samples to be collected upon arrest for felonies (and sometimes misdemeanors), and it’s likely that many if not all of the other 22 states will pass similar laws in light of the Supreme Court decision.

Privacy advocates warn that warrantless searches of a person’s DNA, especially for misdemeanor arrests, are a slippery slope.

Even conservative Justice Antonin Scalia agrees, writing in his Maryland v. King dissent:

“Solving unsolved crimes is a noble objective, but it occupies a lower place in the American pantheon of noble objectives than the protection of our people from suspicionless law-enforcement searches.”


Those in favor of including the DNA swab as part of the routine booking process argue that it is no different than fingerprints. While biometric techniques such as fingerprinting have been used in law enforcement since the early 1900s, the systematic collection of biometrics has always been controversial.

Where DNA databases differ from fingerprint databases is in the amount of potential information that could be extracted from the samples held—in the future. At the moment, the DNA swab sample doesn’t capture anywhere near all of the 3 billion base pairs that make up an entire genome. The sample is limited to 13 isolated markers that proponents say provide no useful information on an individual until a match is made.
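In data terms, such a profile is tiny. The sketch below is hypothetical: the locus names are real CODIS core loci (three of the 13 shown for brevity), but the allele values and the matching logic are invented for illustration:

```python
# Hypothetical CODIS-style profiles: each locus maps to a pair of
# STR repeat counts (one allele inherited from each parent).
# Locus names are real CODIS core loci; the values are made up.
PROFILE_A = {"D8S1179": (12, 14), "D21S11": (28, 30), "TH01": (6, 9)}
PROFILE_B = {"D8S1179": (12, 14), "D21S11": (28, 31), "TH01": (6, 9)}

def matching_loci(a, b):
    """Count loci where both profiles record the same allele pair."""
    return sum(1 for locus in a if locus in b and a[locus] == b[locus])

print(matching_loci(PROFILE_A, PROFILE_B))  # → 2
```

A full 13-locus match is treated as identifying; partial matches like the one above are what familial searches, discussed below in this article, exploit.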

However, once these markers are entered into the CODIS database, the national Combined DNA Index System run by the FBI, many police departments do not destroy the sample—they hold onto it indefinitely so that a retest can be done, should there be a positive match. This has privacy advocates concerned that as DNA technology continues to advance, these samples could be revisited and a more robust genetic profile extracted from the full genome.

Even though it is extremely speculative at this point, the concern is that in the future, as technology advances and our ability to access data from the full genome improves, genetic information such as what you look like and where you come from could be used not only in investigations, but also for profiling and predicting crimes. That opens up a Pandora’s box of questions—including whether our legal system is equipped to deal with complex issues of biomedical ethics and genetic privacy.

An investigative technique that is already being practiced by a handful of states—California, Colorado, Virginia and Texas—is the familial search. Partial DNA matches reveal possible relatives of a suspect, which can draw those relatives into investigations. As unconvicted arrestees are added to these widening databases, will familial searches also be allowed in the states where the practice is legal? If so, more and more people will be entangled in investigations without a warrant.

The ACLU says:

“DNA testing of arrestees has little to do with identification and everything to do with solving unresolved crimes. While no one disputes the importance of that interest, the Fourth Amendment has long been understood to mean that the police cannot search for evidence of a crime.”

If a DNA swab is really intended to solve crime, not just to establish a new form of identification, it’s much harder to justify as the morally sound choice, especially when taking into account the presumption of innocence.

False arrests are damaging enough, but to be added indefinitely to a DNA database, without charge, seems a clearer erosion of an innocent person’s right to privacy. Moreover, racial minorities are convicted at a much higher rate than the overall population: according to the ACLU, 65 percent of prisoners sentenced to life without parole for nonviolent offenses are black. Similarly, black inmates make up 45 percent of the state and federal prison population, while only 13 percent of the US population is black. These disparities inevitably lead to people of color being subjected to the seizure of their DNA more often than others.

And a recent FBI audit of the national DNA database yielded an unprecedented result: potential errors were detected in nearly 170 profiles. Compared with the 13 million profiles in the database, 170 may not seem like a lot, but it underscores the fallibility of the process. It is also important to note that local and state databases are not subject to the same scrutiny and audits as federal records.

Mandatory genetic collection—like in the sci-fi movie “Gattaca”—may be more equitable, but it isn’t a particularly attractive alternative to privacy advocates, either.

The common argument is that if you are doing nothing wrong, then you have nothing to worry about. Though we’re always excited about innovative technology that can advance—and improve—important work being done, it’s important to continue raising these questions about the scope and future use of DNA evidence.

Noureen Moustafa, Al Jazeera


NHS postpones plan to share patient records


A plan to share NHS patient records with academics, medical charities and drug companies has been postponed for six months in an attempt to convince the public that privacy fears are unfounded.

The giant database, seen as a likely fillip to Britain’s bioscience industry, was to have been launched in the spring.

David Cameron, the prime minister, had said it would “make the NHS the best place in the world to carry out medical research”.

However, privacy campaigners had raised concerns that patients could be identified from the information. Although records will be anonymised before being passed to researchers, individual patient numbers and postcodes will be fed into the database.

Announcing the postponement, NHS England said it would now begin collecting patient information from GPs’ surgeries in the autumn rather than the spring as originally planned.

The aim was to “allow more time to build understanding of the benefits of using the information, what safeguards are in place and how people can opt out if they choose to,” it said.

Tim Kelsey, national director for patients and information at NHS England, said it had been told “very clearly” by the public that more time was needed to understand the benefits of sharing information and their right to object.

Sarah Neville, Financial Times

New Encryption Technique Promises to Keep Your Genetic Data Secure, For Now

Our genetic codes — the string of nucleotide “letters” that comprise our genomes — contain information about which diseases we may be susceptible to, or what health conditions we have a predisposition for. However, this information is not absolute; having a gene or series of gene mutations that are biomarkers for a given disease, or spectrum of diseases, does not mean that our fate is sealed. Genetic information only provides an indication of probabilities, and what those probabilities are depends upon many other factors, many of which — like diet and exercise — are not in our genes at all.

And yet, medical research programs as well as medical insurance providers value this information and often use it to make, respectively, treatment or research-participation decisions and insurance risk (and thus pricing) decisions. These are just two obvious uses of genetic data; there are other possible uses, no doubt, that we have yet to foresee. For these reasons, the issue of genetic privacy has emerged in recent years. Concern over protecting one’s genetic data will only increase as the cost of sequencing a genome continues to drop, to the point where large numbers of people will have their genomes sequenced.

Further, recent reports of supposedly anonymous genetic data being correctly tied to its owner through more or less simple social-media sleuthing have only intensified the concern. Yesterday at the annual AAAS meeting in Chicago, computational biologist Yaniv Erlich of the Whitehead Institute for Biomedical Research (Cambridge, Massachusetts) announced to symposium attendees that he had successfully matched anonymous genetic data to the exact person it came from for 12 percent of male genome donors.

So how does one protect one’s genomic data such that we control who sees it, and how much of it? This question has been asked for several years now, but until recently, there were no promising answers.

But now there is a promising solution to this issue in the form of a fairly new encryption technique called homomorphic encryption.

The technique was presented at a symposium at the annual AAAS meeting by cryptologist Kristin Lauter, research manager for the cryptography group at Microsoft Research in Redmond, Washington. It is a type of lattice-based cryptography scheme that allows users of the data to perform mathematical manipulations on it (like addition and multiplication) while keeping the data itself encrypted. Such a scheme was first developed at IBM in 2009.

During genome sequencing (the “DNA test”), genetic information is processed via a complex algorithm, which can be faithfully approximated using these mathematical operations. Lattice cryptography enables homomorphic encryption, which in turn allows computers to analyze the encrypted data (i.e., perform these mathematical operations) and produce encrypted results without ever actually decoding the genomic information. The technique thus allows researchers to analyze genetic data for genomic studies while simultaneously preserving patient privacy by protecting each individual’s genetic information.
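The core homomorphic property — computing on ciphertexts so the result decrypts to a computation on the plaintexts — can be illustrated with a toy version of the Paillier cryptosystem. Note the hedge: Paillier is a different, additively homomorphic construction, not the lattice-based scheme Lauter described, and the tiny parameters below are deliberately insecure:

```python
import math
import random

# Toy Paillier cryptosystem (additively homomorphic), for illustration
# only. Real deployments use 2048-bit moduli; Lauter's work uses a
# lattice-based scheme instead.
p, q = 17, 19            # tiny demo primes (insecure)
n = p * q
n2 = n * n
g = n + 1                # standard choice of generator
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)     # with g = n + 1, L(g^lam mod n^2) = lam

def encrypt(m):
    """Encrypt integer m (0 <= m < n) with fresh randomness r."""
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """Decrypt ciphertext c back to the plaintext integer."""
    l = (pow(c, lam, n2) - 1) // n   # the L(x) = (x - 1) / n function
    return (l * mu) % n

# The server adds plaintexts by multiplying ciphertexts,
# without ever seeing the underlying values:
c = (encrypt(5) * encrypt(7)) % n2
print(decrypt(c))  # → 12
```

Only the key holder can run `decrypt`; a cloud service holding just the ciphertexts can still combine them, which is exactly the “never sees your private data” property described above.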

Lauter compared the technique to “locking a gold brick in a safe with a pair of gloves attached to openings in the side. A jeweler could still use the gold to make jewelry without ever having full access to the gold brick.”

One drawback of this method is that more computational power and time are needed than with conventional encryption methods. But the research team is busy refining the technique to achieve “practical homomorphic encryption,” which trades off computational flexibility for faster, more efficient performance. The team was able to calculate a patient’s risk of a heart attack — based upon personal health data — in a fraction of a second.

The refinement is faster than “pure” homomorphic encryption, but according to the researchers, it is still about a billion times slower than the same computation performed unencrypted, the price of protecting patient privacy (i.e., the patient’s identity and complete genomic sequence).

Wide-scale adoption of the technique will have to await standardization by the National Institute of Standards and Technology — a process that could take up to ten years.

While the technique will surely help keep a person’s genetic data more secure, genetic privacy critics point out that complete security is impossible. Combining the technique with other encryption and security methods could help improve DNA data security.

Regarding this, Lauter stated:

“Homomorphic encryption is a huge tool in our toolbox that we need to consider in policy discussions. We can’t solve all the problems using this method, but in combination with other, faster techniques it could provide a solution.”

But even this would not be foolproof for long — especially as the cost of sequencing DNA continues to drop and consumer “bench top” gene sequencers become commonly available. One could acquire a copy of someone’s genetic code simply by shaking hands with them, then swabbing one’s own hand and sequencing the sample secretly and cheaply (a form of genetic espionage).

Planet Save



Employer alert: Under GINA, family medical history is genetic information

When most people think of “genetic information,” what may come to mind is a DNA double helix or recent strides in genetic testing. But the definition of “genetic information” under the federal Genetic Information Nondiscrimination Act (GINA) is very broad and includes the family medical history of an employee or job applicant.

GINA expressly prohibits employers from asking about genetic information, including family medical history. And an alleged violation of that provision was the basis of the EEOC’s first lawsuit claiming systemic discrimination in violation of GINA.

The lawsuit, which was settled last month, alleged that a nursing and rehabilitation center in upstate New York violated GINA by asking job applicants to provide family medical histories as part of its post-offer, pre-employment medical exams (EEOC v. Founders Pavilion, Inc., No.13-CV-01438).

Although it appears that the employer complied with the timing requirements of the Americans with Disabilities Act (ADA) by delaying medical exams until the post-offer stage, GINA prohibits employer requests for family medical history at any time.

There is a safe harbor notice that employers can use when requesting medical information about an applicant or employee. If an employer receives genetic information even though it uses the safe harbor notice, it’s considered an “inadvertent” disclosure—one that doesn’t violate the law.

No matter how an employer obtains genetic information, the information must be treated as a confidential medical record and kept separate from personnel files. Access to medical files should be strictly limited. Information may be kept in the same files that an employer uses for confidential medical information under the ADA as long as ADA’s confidentiality requirements are met.

The EEOC’s Strategic Enforcement Plan (SEP) outlines areas targeted for enforcement by the agency. Targeted areas include barriers to hiring as well as evolving areas of law. In light of the SEP focus areas, it’s likely that there will be additional lawsuits brought under GINA, particularly in cases alleging systemic discrimination. GINA applies to all public employers, private employers with 15 or more employees, employment agencies, and labor organizations.

To avoid claims of discrimination, particularly systemic discrimination, employers should focus on two areas: the hiring process and supervisor training.

Hiring process. Employers should walk through the hiring process, from the time a vacancy occurs through the new employee’s first day, and check all the steps and procedures involved to make sure the process complies with federal and state fair employment laws.

And some key items are important after the hiring process, too. For example, before recruitment begins, employers should check to see if the job description is accurate and current. The description should also specify essential job functions. An accurate, up-to-date job description is not only crucial to the hiring process; it’s also invaluable when an employee requests a reasonable accommodation.

Supervisor training. Some fair employment laws, particularly those prohibiting discrimination based on disability or genetic information, can trip up supervisors who haven’t received training on prohibited inquiries or “regarded as” disabilities. Supervisors don’t need to know all the nuances of the law, but they should know what kinds of employee requests or statements may trigger an employer’s responsibility to, for instance, send an FMLA notice or start the interactive process.

Supervisors should also know about prohibited inquiries that can create employer liability – like asking an employee if a disease or disorder runs in the family. And finally, training supervisors to avoid any retaliatory actions will go a long way in helping employers avoid retaliation claims.

John Farrell, HR. BHR

Utah poll shows Americans still wary of genetic testing for cancer

Most Americans would consider undergoing genetic testing to predict their risk for certain cancers, but confusion persists over the benefits and risks, according to a University of Utah poll.

The U.’s Huntsman Cancer Institute has invested heavily in genetics in recent months and sponsored a poll last fall to understand the public’s perception of genetic testing. The online survey of 1,202 insured adults found nearly two-thirds would be at least somewhat likely to seek genetic testing to predict their likelihood of developing hereditary cancer. Over four-fifths would use genetic information to guide treatment.

But 34 percent would not seek testing — even if cost wasn’t an issue — primarily due to fears that the results could make it harder to get a job or obtain health insurance.

Federal law prohibits such discrimination, suggesting a need to better educate the public about the strengths and weaknesses of genetic screens.

“I see patients every week who could have taken steps to reduce their risk if they’d known they’d had a predisposition for a certain type of cancer. The best treatment for cancer is prevention, of which genetic testing plays an integral role,” said Saundra Buys, co-director of the Family Cancer Assessment Clinic and medical director of High Risk Cancer Research at Huntsman, in a news release.

But not all patients are good candidates for testing, Buys added in an interview, noting family and personal health history are the most important factors in determining whether a person should pursue testing.

Only 5 percent of all cancers are thought to be inherited. But targeted therapies exist for about 50 mutations, and the list of targetable genes is growing.

Tests for these known mutations are only as useful as the counseling that accompanies them, Buys said.

“There are many genetic tests being ordered in physician offices around the country without the benefit of genetic counseling. The results of these tests are complex, and without appropriate counseling, can cause confusion and unneeded anxiety for patients,” she said.

Among other findings from the poll:

• Of those who said they would seek testing to guide treatment, 72 percent said they would be willing to share their genetic information for research purposes.

• Only 8 percent reported ever having had a genetic test, but some claimed it was for prostate cancer when no such screen exists. “These people might have had a PSA or rectal exam, but they didn’t have a genetic test,” Buys said.

The U. posted partial poll results online.

Harris Interactive conducted the survey of insured men and women between the ages of 25 and 70. Results were weighted to reflect the U.S. population. Answers carry a 3 percent margin of error.

Kirsten Stewart, Salt Lake City Tribune

Genetics for the People?

“Your genetic information should be controlled by you,” declares an advertisement for the American direct-to-consumer (DTC) genetic-testing firm 23andMe. Amid the current furor over electronic eavesdropping, the notion that individuals should decide who can access their personal data is particularly appealing. But whether 23andMe is practicing what it preaches remains dubious, at best.

In fact, even some “techno-libertarians,” who believe that government should not regulate new developments in biotechnology, have supported the US Food and Drug Administration’s decision – outlined in a scathing letter to 23andMe CEO Anne Wojcicki last November – to prevent the firm from marketing its tests pending further scientific analysis. “I’d like to be able to argue that the [FDA] is wantonly standing in the way of entrepreneurism and innovation by cracking down on 23andMe,” wrote Matthew Herper in Forbes. “I wish that was the story I’m about to write, but it’s not.”

According to the FDA, 23andMe’s marketing of an unapproved Personal Genome Service (PGS) violated federal law, because, after six years, the firm still had not proved that the tests actually work.

23andMe is also coming under fire from its users. Just five days after the FDA sent its letter, a California woman, Lisa Casey, filed a $5 million class-action lawsuit against the firm, alleging false and misleading advertising.

Doubts about 23andMe’s PGS go back a long way. In 2008, the American Society of Clinical Oncology commissioned a report that declared that the partial type of genetic analysis offered by 23andMe had not been clinically proved to be effective in cancer care. Two years later, the US Government Accountability Office concluded after a lengthy investigation that DTC genetic tests provide “misleading” results – a situation that is complicated further by “deceptive marketing.”

What 23andMe offered was a $99 test for some 250 genetically linked conditions, based on a partial reading of single nucleotide polymorphisms (SNPs) – points where individuals’ genomes vary by a single DNA base pair. Given that DTC genetic tests target only a fraction of the human genome’s three billion base pairs, and various companies sample different SNPs, different tests may return disparate results for the same customer, who might then make serious medical decisions based on inaccurate information. In this context, it is unsurprising that even those who oppose regulation in general see the need for it in the case of DTC testing.

But the issue goes beyond inaccuracy. As the journalist Charles Seife has pointed out, the retail genetics test is “meant to be a front end for a massive information-gathering operation against an unwitting public.” Indeed, when 23andMe customers log in to their accounts, they are invited to fill out surveys about their lifestyle, family background, and health, adding epidemiological value to the genetic data, which are then used by the firm’s research arm, 23andWe.

In providing this information, 23andMe’s customers are building a valuable “biobank” for the company. Given that the United States decided not to create a national biobank like the United Kingdom’s, owing to the high set-up costs – estimated at about a billion dollars – 23andMe may well view its growing biobank as saving it a massive outlay.

Some 60% of 23andMe customers have agreed to provide the requested information. Of course, one could argue that they are not “an unwitting public,” but altruistic volunteers – so generous, in fact, that, beyond paying for the service, they hand over the value of their subsequent labor in providing epidemiological data.

Tom Sawyer pulled off that business strategy when he persuaded his friends to pay him for the privilege of taking over his hated chore, painting the picket fence. But most scientific research subjects either receive compensation or contribute their time free of charge; they do not pay to participate.

Casey’s lawsuit has latched onto this practice, which her attorney calls, “a very thinly disguised way of getting people to pay [23andMe] to build a DNA database.” Even 23andMe admits that the biobank is the core of the company’s strategy. “The long game here is not to make money selling kits, although the kits are essential to get the base level data,” says board member Patrick Chung. “Once you have the data, [23andMe] does actually become the Google of personalized health care.”

That is exactly the problem, according to Seife. The massive corporate data store that Google has accumulated through all of our individual searches has become its most valuable asset. “By parceling out that information to help advertisers target you, with or without your consent, Google makes more than $10 billion every quarter,” Seife writes. In other words, users have become the product.

To be sure, the 23andMe Privacy Statement stipulates that it “uses Genetic and Self-Reported Information [only] from users who have given consent.” But it also includes a less prominently displayed qualifier: “If you do not give consent…we may still use your Genetic and/or Self-Reported Information for R&D purposes.” This “may include disclosure of Aggregated Genetic and Self-Reported Information to third-party non-profit and/or commercial research partners who will not publish that information in a peer-reviewed scientific journal.”

In other words, commercial use of non-identifiable information is still allowed without donors’ explicit consent. That is what worries libertarians – and anyone else who is concerned about the expansion of online efforts to gather information about individuals.

As a 23andMe blog post by Wojcicki puts it, “Wikipedia, YouTube, and MySpace have all changed the world by empowering individuals to share information. We believe this same phenomenon can revolutionize health care.” But if individuals share information involuntarily, it is the company – not its customers – that will be empowered.

Donna Dickenson, Project Syndicate