
STAT is co-publishing this investigation by Undark.

They numbered 20 in all — 10 men and 10 women who came to a sprawling medical campus in downtown Buffalo, N.Y., to volunteer for what a news report had billed as “the world’s biggest science project.”

It was the spring of 1997, and the Human Genome Project, an ambitious attempt to read and map a human genetic code in its entirety, was building momentum. The project’s scientists had refined techniques to read out the chemical sequences — the series of As, Cs, Ts, and Gs — that encode the building blocks of life. Now, the researchers just needed suitable human DNA to work with. More exactly, they needed DNA from ordinary people willing to have their genetic information published for the world to see. The volunteers who showed up at Buffalo’s Roswell Park Cancer Institute had come to answer the call.

To take part in the study was to assume risks that were hard to calculate or predict. If the volunteers were publicly outed, project scientists told them, they might be contacted by the media or by critics of genetic research — of whom there were many. If the published sequences revealed a worrisome genetic condition that could be tied back to the volunteers, they might face discrimination from potential employers or insurers. And it was impossible to know how future scientists might use or abuse genetic information. No one’s genome had ever been sequenced before.

But the volunteers were also informed that measures had been put in place to protect them: They would remain anonymous, and to minimize the chances that any one of them could be identified based on their unique genetic sequence, the published genome would be a patchwork, derived not from one person but stitched together from the DNA of a large number of volunteers. “If we use the blood you donate” to prepare DNA samples, the consent form read, “we expect that no more than 10% of the eventual DNA sequence will have been obtained from your DNA.”

Soon, however, those assurances began to wither. When a much-celebrated working draft of the human genome was published in 2001, the vast majority of it — nearly 75 percent — came from just one Roswell Park volunteer, an anonymous male donor known as RP11.

A page from the Roswell Park Cancer Institute consent form that was signed by RP11 and other DNA donors. It conveyed an expectation that no more than 10 percent of the published genome sequence would come from any donor. Undark

To this day, the story of how and why RP11 came to be the centerpiece of one of biology’s crowning achievements has largely escaped public scrutiny. Even the scientists who helped orchestrate it disagree about the particulars.

To piece the story together, Undark reviewed more than 100 emails, letters, and other digital documents housed within the History of Genomics Archive at the National Human Genome Research Institute. The documents, provided to Undark through an institutional research collaboration agreement, reveal that the project’s sourcing of human genetic material was more ethically fraught than official publications portrayed it to be, and included DNA harvested from a cadaver, and from one of the project’s own scientists. The records, along with interviews with many of the project’s central figures and with experts in law and bioethics, paint a picture in which high-ranking project officials — constrained by their own experimental protocols and accelerated timelines — veered from their guiding principles and pushed the boundaries of informed consent.

“We were panicking,” recalled Aristides Patrinos, who led the Department of Energy’s efforts in the Human Genome Project and, along with National Human Genome Research Institute director Francis Collins, helped steer the project to completion. “So a lot of these issues were not front and center. That’s no excuse, but it was a reason. We were under a lot of pressure to make sure we finished by the time we finished.”

The revelations potentially cast a stain on a project that had been extolled for its high ethical standards. “It’s a big deal when researchers act deceptively, which is to say they do things that they said they weren’t going to do, or don’t do things that they said they were,” said Paul Appelbaum, a Columbia University professor who specializes in legal and ethical issues in medicine, psychiatry, and genetics. “It has the potential to negatively impact the research enterprise in general, and the benefits that can potentially come from it.”

To the extent that an injustice was done, it has propagated far and wide. The genetic sequence that emerged from the Human Genome Project continues to serve as a cornerstone resource of modern biology — as a so-called reference genome, used ubiquitously by clinicians and researchers to identify genetic variants, sequence new genomes, and aid tests that determine patients’ genetic risks. Although the reference genome has undergone several refinements and incorporated new genetic material over the years, RP11 remains at the center of it all, with his DNA still constituting more than 70 percent of the most recent versions.

RP11 is likely unaware that his DNA played, and continues to play, such a pivotal role in the march of genetic science. Project leaders, hamstrung, they say, by a decades-old ethics panel decision, have never attempted to inform him.

“Well, I think at this point, it probably would be a good idea to come out in the open and tell everybody what happened,” said Patrinos. “And give as many specifics as possible.”

The display of a DNA sequencer shows the progress of a sequencing run. NIH/NHGRI
A researcher checks a DNA sequencing readout during the Human Genome Project. NIH/NHGRI

The Human Genome Project is often compared to the achievement of putting humans on the moon. Launched in 1990 by the Department of Energy and the National Institutes of Health, the project took 13 years and, at the time, around $3 billion to complete. By 2000, scientists had sequenced around 85 percent of the genome, and the milestone was marked with a White House ceremony. President Bill Clinton described it as “more than just an epoch-making triumph of science and reason.” U.K. Prime Minister Tony Blair, who joined by satellite, called it the kind of breakthrough that “takes humankind across a frontier and into a new era.”

But in 1996, the project was at a crossroads. Francis Collins, then the director of NIH’s National Center for Human Genome Research (later renamed the National Human Genome Research Institute, or NHGRI), was leading the international consortium of laboratories tasked with completing the sequence. Still in his mid-40s, the physician was a rising star. He had succeeded Nobel laureate James Watson years earlier as the center’s director, and Barack Obama would later appoint him to the helm of NIH, the world’s largest public funder of biomedical and behavioral research. People who worked with him described him as a brilliant mind and a great communicator, a passionate leader with legendary powers of persuasion.

Collins needed all of those qualities to manage the first sequencing of a human genome. It was a staggeringly complex operation. First, the entirety of a person’s DNA — a molecular sequence of more than 3 billion pairs of nucleotide bases, typically represented as As, Cs, Ts, and Gs — had to be broken into fragments roughly 100,000 to 200,000 base pairs long. The fragments were then isolated and cloned, typically by specially preparing each one and inserting it into a bacterium, which copied the fragment as it reproduced. In this way, the team’s scientists could make a physical copy of a person’s full, albeit fragmented, genome — known as a clone library.

Identical clone libraries could then be shipped to different laboratories around the world, allowing many research groups to read the fragments, and piece the sequences back together, in parallel. In a way, it was like distributing sets of the same, extraordinarily difficult jigsaw puzzle to a lineup of the world’s best puzzle solvers: They could work on different sections of the puzzle simultaneously and, if need be, check each other’s work.

By 1996, clone libraries were already being distributed to a variety of labs. But that spring, project members learned that several of the libraries had been constructed without any informed consent process and with no oversight from institutional review boards, or IRBs — bodies that, according to federal policy, should have ethical purview over research with human subjects. Rumors swirled that some of the DNA had come from scientists involved with the project, a scenario that project members speculated could raise ethical questions about consent and invite charges of elitism. Internal project correspondence and tissue bank donation records reviewed by Undark suggest that another DNA source was the cadaver of a 19-year-old who had died by suicide; the family had donated the body to science but had not specifically consented to its use in the Human Genome Project.

It bothered Collins that at least one donor’s identity was known to project scientists, and that the donor was aware his DNA was being used to create a library. “It sounds as if the donor knows who he is,” he wrote in an email that March, after being briefed on a clone library that had been constructed at the California Institute of Technology. “That’s not the way it should have been done.”

In the wake of the revelation, Collins and Patrinos consulted an array of advisers and came up with a new plan, outlined in a joint guidance document. They would find new donors and make new clone libraries, under new protocols. Unlike the old libraries, the new ones would be obtained through a double-blind procedure: Scientists involved with the project would not know the identities of the donors, and donors wouldn’t know for certain whether their DNA was being used in the project. According to internal correspondence and interviews, project leadership was concerned not only about the genetic privacy of the donors, but also about the possibility that a donor might trumpet their role to the media and create a spectacle.

“It seemed like it would create a major distraction from what we wanted to generate,” recalled Robert Waterston, who headed one of the five centers that did the majority of the sequencing for the project.

“We wanted the human genome,” he added — meaning a reference that everyone could relate to. “It’s not Joe Blow’s genome. It’s your genome. It’s my genome. It’s representative of everybody’s genome.”

To further protect the two-way confidentiality, the completed representation of the human genome would be a mosaic, assembled from the DNA of not one but multiple donors. The thinking, among the project’s inner circle, was that a mosaic would not only complicate attempts to identify donors based on the genetic sequence but also reduce the incentive for wanting to know the donors’ identities to begin with. If a donor’s identity did come to light, limiting their contributions might minimize their exposure to potential harms — and deter them from attempting to claim property or ownership rights over the published sequence.

In a June 1996 email that appears to have been written by Melvin Simon, who led a cloning operation at Caltech, the scientist told Human Genome Project leadership, including Patrinos, that, as he understood it, no matter what waiver a volunteer is willing to sign, he or she would not lose ownership or property rights. “Thus only by a true patchwork or anonymizing approach can it be made extremely difficult to claim such rights,” the email read. (Simon confirmed the sentiment behind the email in an interview with Undark.)

Simon’s Caltech team and a laboratory at the Roswell Park Cancer Institute were each commissioned to create new clone libraries under the new protocols. Soon, however, the plans for a mosaic genome would veer off course, and the Human Genome Project would find itself in a consent conundrum — with one person, RP11, caught in the middle.

Pieter de Jong, who led the Roswell Park work on the Human Genome Project, at his home in Redmond, Wash., in July 2024. Jovelle Tamayo for STAT

Pieter de Jong, who led the cloning project at the Roswell Park Cancer Institute, had been behind some of the problematic libraries that had sparked Collins’ consternation in the spring of 1996. But he had a long history with the project, and he was a foremost expert at DNA cloning. So when the Human Genome Project enacted its new plan, project leaders commissioned him to build at least five new libraries, de Jong recalled to Undark.

This time, de Jong used a lottery-like process to select donors. On March 23, 1997, he ran an advertisement in the Buffalo News seeking 20 volunteers. The edition also featured a front-page story about the project, which de Jong says he helped arrange. In the weeks that followed, the volunteers each came in, met with a genetic counselor, signed a consent form, and donated a few tablespoons of blood. The genetic counselor labeled each blood sample with a number, but created no records linking the samples to their donors.

Clipping of a story on the Human Genome Project in The Buffalo News in March 1997.

The 20 samples were then transferred to de Jong, who chose two at random, one male and one female, to use for clone libraries. The only personal information the facility retained was the names and signatures on the consent forms, which were sealed in envelopes and stored in a locked file cabinet. As a result, it would be virtually impossible for anyone at Roswell Park to determine who the two donors were.

A postdoctoral researcher, Kazutoyo Osoegawa, did most of the work building the first library. Osoegawa was skillful, de Jong recalled, with a knack for coaxing large fragments of DNA from a sample for cloning: The larger the fragments, the more easily scientists could map them for sequencing, and the fewer fragments overall they would have to sequence to finish the job.

By August of 1997, de Jong, Osoegawa and their colleagues had begun distributing the first of the new Roswell Park clone libraries, RP11, and it was a good one — with enough fragments for scientists to be fairly certain that they spanned essentially the entire genome, with few missing gaps. A second library was in the works, with more to follow. But, before those libraries could materialize, the Human Genome Project’s plans took a turn.

On the evening of Sept. 20, 1998, Francis Collins emailed NHGRI brass, including Jane Peterson, a program director involved with the sequencing effort, and Mark Guyer, the institute’s assistant director for scientific coordination, about an unhappy circumstance. “I have been feeling uneasy about the RPC11 library ever since Jane uncovered the language that Pieter de Jong used for the consent form,” he wrote. (The RP11 library was often referred to as RPC11 or RPCI-11 in correspondence.)

The specific language that unsettled Collins was the passage conveying that no more than 10 percent of the genetic sequence was expected to come from any one donor’s DNA. And it was resurfacing at an inopportune moment.

In this September 1998 email, Francis Collins wrote about his uneasiness with the 10 percent language used in donor consent forms at Roswell Park and, indicating a desire to go beyond that limit, asked: “how far can we push this?” Undark

The Human Genome Project was in the midst of what Maynard Olson, who led one of the project’s sequencing labs, described in an email that September as a “de facto drift away from the concept of a genome sequence that is a mosaic of contributions from many individuals.” When de Jong crafted the consent language, he was under the impression that 10 new clone libraries would be built and integrated into the completed genome. But now project leaders were lurching toward a strategy that would draw most of the final sequence — between 60 and 90 percent — from a single clone library. And RP11 was their library of choice.

In his email to his NHGRI colleagues, Collins wrote that the document of general principles he and Patrinos had shared suggested an intent to include several donors but wasn’t specific about it, “nor does it put a ceiling on the amount of sequence that could come from a single person.”

The 10 percent language in the consent form worried him, however. Attempting to reconsent RP11 under new terms would be complicated: RP11 could have been any of the 10 male donors, and all the researchers had to go on were the names on the consent forms. The only way he could think to do it, he wrote, would require asking every volunteer if they objected to the raising of the 10 percent restriction — “and then holding our breath that none of them do.”

Technically, the word “expect” didn’t forbid using RP11 for more than 10 percent of the sequence, Collins wrote, “but how far can we push this?”

The next month, Collins joined a conference call with de Jong, Roswell Park IRB chair Harold Douglass, and other Roswell Park and NHGRI staff. According to handwritten notes, Collins told them that limiting use of the clone library to 10 percent would devastate the momentum of the project and that there were concerns about recontacting all 10 male donors. The notes indicate that Douglass mentioned the IRB would ask about the benefit of fast-tracking the project, and Collins said there was a medical reason: to “find as many genes ASAP to understand disease.” (Speaking to Undark, Collins confirmed his participation in the call. He said the notes, taken by a different participant, used phrasing he wouldn’t have used, but seemed correct.)

Days later, the Roswell Park IRB met and — according to a written summary that was shared with Guyer — “voted unanimously against any attempts to try to find and reconsent the ten donors.” Among the IRB’s stated justifications were that the expectation expressed to the donors was not a guarantee, and that attempting to reconsent the 10 male volunteers would be difficult and could jeopardize RP11’s anonymity. To delay the project by not expanding the use of RP11’s library, the panel added, would itself be unethical, given the number of people who stood to derive health benefits from the timely completion of the human genome. (Douglass declined to comment for this story.)

An aerial view of the Roswell Park Cancer Institute in the late 1990s. Edwin A Mirand Library/Roswell Park Comprehensive Cancer Center

Recently, Collins spoke to Undark about RP11 and the Human Genome Project’s donor sourcing strategies. He was joined by Eric Green, who was also involved with the project and currently leads the National Human Genome Research Institute.

According to Collins and Green, project leaders did initially aim to construct 10 new clone libraries for use in the completed genome. But they soon realized it would be inefficient and chaotic to work with 10 libraries at once. “There would be lots of complexities that would come out by having too much blending going on,” Green said.

Collins explained that structural differences between individual genomes — such as large-scale insertions or deletions of genes — can make it difficult to stitch together an accurate sequence from two different human sources. If you go from one person to 10, he said, “and then you try to fit the whole thing together, it’s going to be potentially much more error-prone.”

It was primarily those technical challenges, Collins and Green said recently, that prompted the decision to derive most of the genome from a single donor. And RP11 — with its well-sized fragments and comprehensive coverage of the genome — stood out from the other libraries as the ideal one to work with, they said. Also, Green added, RP11 at the time was further along than any of the other new libraries in the process of being characterized and prepared for sequencing.

But Collins’ and Green’s recollections diverge in key ways from those of other scientists involved in the Human Genome Project. Robert Waterston, for instance, who was among the small circle of researchers who guided project strategy, recalls that the complexities of blending clone libraries were only a minor consideration. Yes, structural differences in DNA could complicate the task of meshing one person’s genetic sequence with another’s, he said, but only in certain regions of the genome, such as those marked by repeat sequences that differ in number and complexity from one person to the next.

The bigger factor, said Waterston, was time. And the Human Genome Project was pressed for time, he said, thanks to a man named J. Craig Venter.

In May 1998, Venter, whose nonprofit Institute for Genomic Research had done pilot work for the Human Genome Project, launched a venture built to rival the publicly funded initiative. That June, Venter and his colleagues pledged in a Science article that they would sequence a human genome by 2001, years ahead of the Human Genome Project’s 2005 target deadline, and at a fraction of the cost. The enterprise, known as Celera Genomics Group, set up shop in Rockville, Maryland, just miles from NHGRI’s Bethesda headquarters.

Correspondence from that time suggests the news lit a fire under the Human Genome Project. “Obviously there would be significant political advantages to getting something out a year earlier than Venter is proposing, provided we can defend its utility,” wrote Phil Green, an investigator at the University of Washington’s sequencing center, in an email that was shared with Collins shortly after word of Venter’s plans began to spread.

Francis Collins and Craig Venter at a press conference announcing the publications describing the initial analyses of the human genome sequence. NIH/NHGRI

Project members worried about the implications of a commercial enterprise owning, and possibly monetizing, the first human genome. For some of them, competition itself — and the specter of a stinging defeat — seemed to be motivation enough. In an email that September, NHGRI’s Peterson described Eric Lander — who led the Whitehead/MIT Center for Genome Research, one of the five large centers that sequenced the majority of the genome — as having called her “in a very depressed mood.” Lander believed Venter would have a draft of the human genome “done before next summer and will take continual pot shots at us,” Peterson wrote. (Lee McGuire, chief communication officer at the Broad Institute, where Eric Lander is a member and founding director, told Undark that Lander was unavailable to be interviewed for this story.)

In a move that was widely reported in the media as being prompted by the Celera announcement, Collins announced that September that the Human Genome Project would aim to finish its genome two years earlier than planned, by 2003, and release a working draft by 2001.

“We came into this crush with Celera, and everything just had to get done as quickly as possible,” recalled Waterston. The complement of libraries they’d envisioned wasn’t ready yet, and it would’ve taken time to make and distribute them, he said. They had to work with what they had, and what they had was RP11.

“There just wasn’t an alternative,” Waterston recalled. “We didn’t have a second library to go to.”

Marco Marra and John McPherson — who along with Waterston did much of the preliminary characterization of clone libraries at Washington University — similarly remember that it was the dearth of available libraries, more than the challenge of blending them together, that led the project to focus on a single donor.

That aligns with de Jong’s recollection. RP11 was a good library, he told Undark, but so were subsequent libraries he built. The problem was that there was no time to wait. (De Jong shared records with Undark indicating that his lab had not yet completed the second of its planned new libraries by September 1998, when the issues around RP11’s consent language arose; it is unclear whether the Caltech laboratory had completed and distributed the first of its planned new libraries to sequencing centers by that time, but Waterston recalls they hadn’t.)

Although de Jong said he was not heavily involved in discussions of sequencing strategy, he thinks it began to dawn on the scientists how much additional work, and money, would be required to prepare and sequence 10 libraries, rather than one or two. “They couldn’t potentially keep up the same speed as Venter with his commercial effort if they would have stayed with the original plan,” said de Jong. “So I think it was mostly because they didn’t want to lose the race.”

Other members of the Human Genome Project who spoke with Undark expressed similar sentiments, including one of its highest-ranking figures. “We got pretty panicky that we were going to lose this,” Patrinos said of the competition with Celera. “So at that time, we had to follow paths that would get us to the conclusion as fast as possible.”

Asked if he felt Celera contributed to a sense of urgency at that time, Collins told Undark he didn’t recall that being a factor — that the rush, instead, was to get the job done to provide benefits for understanding health and disease. In a follow-up call, Collins clarified: “I think Celera’s intentions to produce a for-profit human genome sequence was an issue that everybody was fully aware of, so that was in the air, if you will.” But he said “it was not the driving factor at all” in the decision to move as quickly as possible to obtain a complete public sequence.

In any case, on Oct. 27, 1998 — five months after Venter launched his rival to the Human Genome Project, a month and a half after the project gave itself a new, ambitious deadline, weeks after Collins’ concerned email about RP11’s consent language, and days after Collins’ conference call with the chair of the Roswell Park IRB — the ethics panel gave Collins and his team carte blanche to dramatically expand the use of RP11’s DNA, without telling any of the Roswell Park donors about the change.

That same month Simon and collaborator Hiroaki Shizuya — having finished their first Caltech library under the new donor protection protocols — told the DOE’s Marvin Frazier that although the group had genetic material in hand to begin a second library, they had been “informed that there was no longer a great deal of interest” in new libraries, and they were instead moving on to new research pursuits.

Archival correspondence suggests the turn of events didn’t sit well with all of the lead scientists involved in the project. “I was deeply distressed to have the director of a major genome center already start building the case that the informed-consent form for DNA used to build RPC-11 did not really mean what it said,” wrote Olson in a November 1998 email to Collins and his University of Washington colleague Phil Green. The ethical, legal, and social issues related to the library sourcing would not go away, he predicted.

Speaking to Undark, Olson said he does not recall which consent language, or which director, he was referring to in his email. But he remembers there being tension between the ethicists and technical experts involved with the project. Some of the ethicists resented the idea that technical considerations should factor into discussions, he said, and “a lot of the more technically well-informed participants in the project just actually weren’t terribly interested” in the ethics issues.

Aristides Patrinos, who led the Department of Energy’s efforts in the Human Genome Project, at his home office in Gaithersburg, Md. Valerie Plesch for Undark

Undark invited several biomedical ethicists and legal experts to review the Roswell Park consent form and the IRB’s ruling on RP11. Their responses called into question many of the justifications the ethics panel gave for its decision.

“The big deal is that the 10% is not just a minor aspect of the consent form,” wrote Hank Greely, a Stanford University professor who works on ethical, legal, and social issues in the biosciences, in an email to Undark. Rather, he noted, it “is a substantial part of the argument about confidentiality.” Greely said that he didn’t find any of the panel’s justifications convincing. He doesn’t think the IRB acted nefariously, but he said that he would not have so hastily dismissed the possibility of attempting to reconsent the volunteers, and that doing so wouldn’t necessarily have heightened the risks to the donor. “We’ve got these 10 names. Let’s see if they’re in the phone book,” he said, later adding, “let’s see how locatable they are.”

Jonathan Moreno, a professor of medical ethics and health policy at the University of Pennsylvania who declined the offer to review documents but was briefed by Undark on the IRB decision, agreed that the volunteers should have been reconsented.

Appelbaum, the Columbia University legal and ethics specialist, was one of several experts who took issue with the panel’s interpretation of the 10 percent expectation. “I think a reasonable person would take away from that that the intent of the research team was to use no more than 10 percent of his or her genome in the project,” he said. “And so playing with words in that way, I think, is really not appropriate in this context.”

A 1998 meeting summary details the Roswell Park institutional review board’s unanimous vote “against any attempts to try to find and reconsent” DNA donors. Undark

Appelbaum also thought it was odd for Collins, representing a sponsoring agency, to meet directly with an IRB chair on an ethical issue related to work the agency was sponsoring. There is a risk, he said, of exerting undue influence on the oversight process. Bruce Gordon, the assistant vice chancellor for regulatory affairs at the University of Nebraska Medical Center, told Undark that, generally speaking, “the best practice would be that funders shouldn’t be interacting with the IRB under any circumstance,” though he described it as an unspoken rule, and not a strict standard.

Collins said he agreed the conference call was an unusual step, but that the significance of the situation justified it. “I counted on the IRB to do what they always do,” he said, which is “to step back and take up a purely objective view of an ethical question and render their best opinion. I do not believe I put pressure on them at all.”

Although ethicists and legal experts who spoke to Undark raised questions about the rationale of the IRB’s ruling, many said it was unlikely that RP11 had suffered concrete harms as a result, a point also expressed by Collins and other key figures from the Human Genome Project. Protections enacted in the U.S. since the completion of the Human Genome Project make it illegal for employers or health insurers to discriminate based on a person’s genetic information. And experts say that without a matching DNA sample, it remains difficult to identify a person based solely on a genetic sequence. With a matching sample, however, it would be straightforward to identify the donor, whether their contribution was 70 percent or 7 percent.

“I think it’s fair to say RP11 was probably misled about what was going to happen,” said R. Alta Charo, a professor emerita of law and bioethics at the University of Wisconsin-Madison. (Like Moreno, Charo declined the offer to review documents, but was briefed by Undark on the IRB decision.) The real question, however, said Charo, is whether the decision made him more identifiable, whether it exposed him to more risk. “I don’t know how to answer that question.”

Appelbaum said it may be true that RP11’s risks weren’t substantially heightened by the decision to expand the use of his genetic sequence. “But it seems to me that that’s different from saying that the action wasn’t consequential,” he said, “in the sense that it can be highly consequential, I think, for the research enterprise in this country to make promises to people in signed consent forms, and then violate those promises.”

Appelbaum described the episode as illustrative of a long history of deceptions that have contributed to a lack of trust in the research enterprise, especially in minoritized communities. “One of the big issues in human subjects research, which has assumed even greater salience in genomic research, has been the issue of trust,” he said. “If I agree to be in your project, are you leveling with me about what’s going to happen to me? And if I agree to donate blood, or some other tissue sample, are you telling me the truth about how it’s going to be used?”

President Bill Clinton, Craig Venter (left) and Francis Collins (right) during a ceremony in the East Room of the White House on June 26, 2000, to mark the completion of the first rough map of the human genome. Mark Wilson/Newsmakers via Getty Images

The June 2000 White House ceremony marking the Human Genome Project’s sequencing milestone was a joint one: At the presidential lectern that day, President Clinton was flanked on one side by Francis Collins and on the other by Craig Venter, whose Celera team was also nearing the finish line.

The following winter, the two teams each published landmark genome papers, with the Human Genome Project’s report on its draft genome sequence officially appearing in the Feb. 15 issue of the prestigious journal Nature, and Celera’s sequencing results appearing in the rival journal Science one day later.

Celera reported that its genome had been assembled from five unnamed donors, one of whom — the majority donor — Venter later revealed was himself.

Meanwhile, the Human Genome Project was circumspect about the donors behind its published sequence. A table in the Nature paper listed eight clone libraries that were described as having contributed the bulk of the sequence. Among them was RP11, which the table noted accounted for just over 74 percent of the draft genome. The other seven each contributed between 1.6 and 4.3 percent of the total. Additional libraries, neither named nor tallied in the paper, collectively accounted for the remaining 8.4 percent of the sequence.

The paper described the libraries as originating from anonymous DNA donors recruited through a lottery-like process such as the one used at Roswell Park. What was left unsaid — but what consent documents, internal memos, and other records reviewed by Undark reveal — is that six of the eight named libraries were the same ones that had raised ethics concerns early in the project: the library sourced from the 19-year-old cadaver; the libraries suspected to have been built with the DNA of project scientists; the libraries whose donors were known to project researchers. Collins and Patrinos had agreed in 1996 to let scientists use those libraries, provided the donors were properly consented, protocols were cleared by IRBs, and the libraries contributed minimally to the final sequence. (Caltech’s Simon told Undark that it was a lab technician’s husband — and not a postdoc, as had been rumored — who produced the sperm from which one of his early libraries was built.)

Also left unsaid was that four of the eight libraries had all been derived from the same donor.

Collins and NHGRI director Green could not confirm to Undark how many, if any, of the libraries outside of the top eight had been approved by IRBs. Collins also said he did not know if the family of the 19-year-old tissue donor had been reconsented in accordance with the 1996 guidelines.

Asked if he feels the project should have been more forthright in the 2001 paper about the sourcing of DNA donors, Collins said “it’s always good in hindsight to be transparent and forthright in every way. To be honest though, I don’t think in my view, that this was such a major substantial issue that it would have required a deep debate about exactly how to put that forward.” He added, “I don’t believe that individuals were significantly put at risk by the way in which this was laid out. And I hope that doesn’t get lost.”

To Appelbaum, however, the idea that the Human Genome Project’s landmark paper may have misrepresented donor procedures is gravely concerning — the kind of transgression that can erode public trust in science more broadly. Perhaps an argument could be made to defend the project’s DNA sourcing, Appelbaum said, “but I’m not sure there’s any argument on the other side about covering up what you did when you publish your results. I think you’ve got to be open about that.”

“If you made certain decisions along the way,” he said, “you describe the decisions you made and the justification for them.”

The culmination of the Human Genome Project was, in a way, the beginning of a long scientific afterlife for RP11’s genetic sequence. A 2010 study, published in the journal Science, analyzed the reference genome and concluded that RP11 was of mixed African and European genetic ancestry, and likely identified as Black or African American.

Perhaps most consequential, however, is that the sequence that emerged from the Human Genome Project has evolved into a foundational resource of modern genetics. It has been revised and improved through the years, each new edition, or reference assembly, augmented with new annotations and fixes.

Deanna Church, who led an international collaboration that managed the reference assemblies in the years following the Human Genome Project’s completion, likens them to maps that give scientists a shared coordinate system for describing, comparing, and understanding genetic sequences. Researchers use them to interpret and identify fragments of DNA; clinicians and genetic testing companies use them as benchmarks to determine which genetic variants a person carries. The reference assembly that emerged from the Human Genome Project has become “the foundation for all genomic data and databases,” wrote the authors of a 2019 opinion piece in the journal Genome Biology.

And to this day, the most widely used reference assemblies continue to derive more than 70 percent of their sequence from a person who did not clearly consent to that level of use.

In recent years, Church and other experts have argued that it is time for a new reference model: The assemblies from the Human Genome Project do not adequately reflect the breadth of human genetic variation, they say. And although those reference assemblies are of exceptional quality by genome standards, a newer sequence, sourced from new DNA and known as the telomere-to-telomere assembly, is both more accurate and more comprehensive.

But a reference assembly’s usefulness stems in large part from the information, annotations, and standards that are built on top of it, and it will take time for scientists to duplicate that infrastructure for a new reference genome.

Leslie Biesecker, chief of the Center for Precision Health Research at the NHGRI, estimates it will be three to five years before the community transitions to a new reference. “There are so many pieces of machinery that need to be moved forward at the same time in order for that whole system to work.”

Stanford’s Greely, a lawyer by training, said it’s conceivable that were RP11 to learn of the outsized role his DNA played in genetic science, he might seek financial compensation. “Without wanting to get into the merits of the claims, it could play out kind of the way the Henrietta Lacks story has,” said Greely, referring to a Black woman who died of cervical cancer in 1951, and whose cells were harvested for science without her consent. (Lacks’ family members were recently awarded an undisclosed settlement from Thermo Fisher Scientific, over allegations the company unjustly profited from her cells.) “If I were NIH, I would worry — hey, if this guy knows, he might sue us or make trouble for us,” Greely said.

Documents suggest the architects of the Human Genome Project worried about just such a scenario: a clause in the original consent form used at Roswell Park asserted that, by signing, a donor waived their “rights to claim any part of conceivable profits resulting from research performed on the blood and products derived from the blood you donated.” But emails sent to NHGRI leadership in July 1997 indicate that when Department of Health and Human Services officials learned of the clause, they argued it ran afoul of a federal regulation that bars consent language that could be construed as a waiver of legal rights. Although RP11 had likely already signed the original version, the waiver was removed from the consent form by that August.

Pieter de Jong poses for a portrait in front of freezers containing DNA samples at his home in Redmond, Wash., on July 3, 2024. Jovelle Tamayo for STAT

The trim beard Pieter de Jong wore during the Human Genome Project has since turned gray. He now lives near Seattle, where he still runs a small clone library supply operation. This year, to free up space, he finally destroyed three of the five clone libraries he built for the Human Genome Project: two that he says the project never used, and a third that was incorporated into the reference sequence only in later revisions of the genome.

De Jong no longer knows the whereabouts of the 20 consent forms that were collected from the Roswell Park volunteers, the only known records that identify the participants by name. Although study protocols stipulated that Roswell Park staff would maintain a chain of custody for the forms, Annie Deck-Miller, director of public relations at the center, now known as the Roswell Park Comprehensive Cancer Center, told Undark in an email that the facility no longer possesses any forms related to de Jong’s study. In a subsequent emailed statement, representatives of Roswell Park indicated that documents related to the Human Genome Project were stored onsite “for a number of years, as required by federal regulations.” They declined to comment further, however, citing a lack of capacity “to engage in a review of decisions purported to have taken place in a confidential meeting conducted 26 years ago.” Collins and Green say they have never attempted to notify Roswell Park donors about the change to the sequencing plan, and that the IRB decision does not permit them to.

There is, however, one Human Genome Project donor whose whereabouts de Jong knows precisely: the person behind the four clone libraries that accounted for more than 9 percent of the draft sequence.

De Jong recalls that he and a visiting collaborator created those libraries in the summer of 1993. They did it quickly — he was in a hurry to apply for grants “and get something going” — and he said there were few ethical guardrails to guide them. De Jong felt it would be inappropriate to solicit DNA from one of his lab workers, “so my collaborator — my visitor — and me, we exchanged, we both tossed up and we gave blood samples for the project.”

One of those samples yielded clone libraries that helped spark the 1996 panic over donors: libraries whose origins project leaders worried might leak to the press, de Jong said, but that nonetheless found their way into the world’s first human genome sequence.

“It ended up being me,” de Jong said, matter-of-factly. “The reference genome is maybe 80 percent or 75 percent RP11, and maybe 10 percent me.”

If you or someone you know donated to or otherwise participated in the Human Genome Project and you would like to share your story, Undark and STAT would like to hear from you. Contact us at [email protected].

Undark is a nonprofit, editorially independent digital magazine exploring the intersection of science and society.
