Another upload of genetic sequence data from the H5N1 bird flu outbreak in dairy cattle has exacerbated the scientific community’s frustration with the U.S. Department of Agriculture after the agency again failed to include basic information needed to track how the virus is changing as it spreads.

Like a large tranche of sequences that the USDA uploaded to a public database on April 21, this week’s data dump did not include information about where and when the samples were obtained from cows or other animals. All are simply labeled with “USA” and “2024.”

A key goal of monitoring genetic sequences in an outbreak is to track the evolution of a spreading virus, in this case to see if transmission among a new mammalian species is leading to changes that could make H5N1 more transmissible to and among people. Without the equivalent of a time stamp on the individual sequences, that’s much more difficult to do, scientists told STAT.

“We know what was happening a month ago, but we don’t know what’s happening now. Or it’s less clear what’s happening now,” said Thomas Peacock, an influenza virologist at the Pirbright Institute, a British organization that focuses on controlling viral illnesses in animals.

Cows in 36 herds in nine states are known to have tested positive for the virus. But it is widely believed the outbreak, which may have begun late last year, is more widespread than the number of confirmed outbreaks would suggest.

In fact, the USDA said as much in a preprint the agency posted on BioRxiv on Wednesday. The paper, which has not yet been peer reviewed, is based on an analysis of sequence data from the outbreak. The authors suggest the spillover event that started the spread in cattle may have happened in early December. The first detection that something was amiss with some cattle herds in the Texas panhandle dates to late January, but it took until March 25 before USDA confirmed the presence of H5N1 in a Texas herd.

In the paper, the authors say they have posted the sequence data they used online. A link in the article did not initially lead to the cache of data or to the supplementary materials (additional charts and figures that flesh out a paper), but that was later fixed.

In scientific publishing, researchers often try to hold on to their sequence data until they can get a paper published, for fear of being scooped by other scientists. But during public health emergencies, there is heavy pressure to share data as it becomes available, because to withhold it until publication can hamstring good decision making.

“Really grateful to this research team for sharing this, though I hope they weren’t holding on to the data solely to ensure they published first,” Angela Rasmussen, a virologist who studies emerging zoonotic pathogens — disease threats that jump from animals to humans — posted on Twitter on Thursday. Rasmussen, who is among those who have been frustrated at the USDA’s data sharing approach, works at the Vaccine and Infectious Disease Organization at the University of Saskatchewan, in Saskatoon, Canada.

Many of the 87 new sequences that were uploaded to the database of the National Center for Biotechnology Information — run by the National Institutes of Health’s National Library of Medicine — are from samples retrieved from poultry and wild birds, and may not pertain to the dairy cow outbreak. But 10 of the new viral sequences are from cattle, two more are from cats, and another is from a pigeon. These sequences are all believed to be part of the outbreak.

The fact that basic information — called metadata — isn’t being shared about the samples “hinders our efforts a lot,” said Gytis Dudas, a senior researcher in genomic epidemiology and metagenomics at the Vilnius University Life Sciences Center in Lithuania. Dudas is working with a group of U.S. and international researchers to try to make sense of what the genetic sequences say about the H5N1 outbreak in cows.

A number of scientists have openly questioned whether the USDA is deliberately withholding these data, or even removing more specific information.

“I can’t imagine that they’d be getting these samples, running the sequences, and not somehow recording that data for themselves, for what state it came from and what date it was sampled. That’s really extremely basic data,” said Rasmussen.

A USDA spokesman denied that the department is taking metadata off the sequence files before uploading them. In an email exchange with STAT, he said samples it receives contain only laboratory information numbers when they are sequenced. “Metadata is added by [Animal and Plant Health Inspection Service] staff after the sequencing occurs,” he said. “APHIS adds ‘USA’ and ‘2024’ as metadata tags and posts the sequences as they become available, in order to expedite public access to sequence data.”

The department has committed to sharing raw sequence data as quickly as it is available and has said it will upload what are called “consensus sequences” in an internationally used database, GISAID — the Global Initiative on Sharing All Influenza Data — when they are ready. Consensus sequences are more thoroughly edited and contain the metadata scientists are seeking.

It’s not just academic scientists who are seeking it, Peacock said, noting international public health agencies that are trying to assess the risk the U.S. outbreak poses are keen to get more data too. “They’re just being much more quiet about it. But you know they’re all requesting this and not getting it as well, as far as I’m aware.”

The USDA has only posted consensus sequences to GISAID from this outbreak once, in late March. It’s clear, though, that they have many more than they have shared to date. At an online symposium last week, Rosemary Sifford, the USDA’s chief veterinary officer, showed a phylogenetic tree featuring dozens of sequences, using the figure to explain that the department believes the outbreaks across the country are all linked and began from one spillover of the virus from wild birds to cows, likely in Texas.

USDA Chief Veterinary Officer Rosemary Sifford presented a phylogenetic tree of H5N1 viruses from the dairy cow outbreak during a recent online symposium. Screen capture via ASTHO

A phylogenetic tree is like a family tree of a virus, showing how it is changing over time, but also providing a sense of when the virus spilled over from wild birds into cattle. The genetic sequence data available so far suggest that it occurred in late 2023 or early 2024.

The sequences featured in the phylogenetic tree in Sifford’s presentation would have been consensus sequences, Peacock said. “It does suggest they have them and they’re just not uploading them.”

The group of scientists that Peacock, Dudas, and Rasmussen belong to quickly went through the sequences on the slide Sifford showed, harvesting from it the metadata the USDA has so far failed to provide. “That was less than ideal,” Dudas said.

This story has been updated with information about the USDA preprint, and to reflect the fact that additional data promised in the paper is now online.
