Elisabeth Bik, a microbiologist by training, has become one of the world’s most influential science detectives. An authority on scientific image analysis who’s been profiled in The New Yorker for her unique ability to spot duplicated or doctored photographs, she appeared frequently in the news over the past year as one of the experts who raised research misconduct concerns that led to an investigation into, and the eventual departure of, former Stanford president Marc Tessier-Lavigne.
Bik first became interested in plagiarism as a hobby while working as a researcher at Stanford University in 2013. She later began specializing in image duplication, which she believes is a more serious problem for science as a whole.
To plagiarize or be plagiarized is bad for scientists, “but it doesn’t necessarily bring a new or false narrative into science,” Bik told STAT. But “if a scientist photoshopped something, or has two images that overlap, but presents them as two different experiments, that is actually cheating.”
Bik made the decision to become a full-time science sleuth in 2019. STAT caught up with Bik, who was selected as a member of the 2024 STATUS List, while she was traveling in Taiwan to talk about her talent for spotting patterns, the impact of artificial intelligence on her work, and why she thinks journals and institutions are still too slow to address research misconduct.
This interview has been edited for clarity and brevity.
After the investigation into former Stanford president Marc Tessier-Lavigne, your work has definitely come into the mainstream. Are there any other really high-profile things that you’ve been looking at since then?
I’ve been working with [investigative journalist] Charles Piller and some other sleuths in discovering some cases of fraud in the Alzheimer’s space. And so Marc Tessier-Lavigne sort of fell under that. But we worked on a case: [University of Southern California neuroscientist] Berislav Zlokovic. Charles Piller wrote about it in Science [in November].
That is sort of a big case, because this is a big lab with lots of money. This researcher works in Alzheimer’s, but also on stroke. And there was a clinical trial that he was involved in, of a drug that came out of his research. [The National Institutes of Health] halted the clinical trial because of the concerns about his articles.
That is a pretty big and very immediate action. I don’t think it has happened very frequently that a clinical trial gets paused because of a misconduct investigation. So that was one of the consequences of this work.
In this case, dozens of papers co-authored by Zlokovic had doctored evidence — images and data — supporting the idea that a compound he studied, 3K3A-APC, could benefit stroke patients. This is a clear example of how this kind of erroneous data can have an impact on people. Is there any way these drugs might get through?
In general, it’s hard to say, because a drug might still work, even though the people might have cheated in the lab. I’m not ruling that out. I think the chances that the drug will work are low if there was obvious cheating, judging from the images that have been published. But it’s just hard to know, hard to predict.
In the beginning, you were looking at plagiarism in general. What made you want to focus directly on images?
Because once I found the first case of image duplication myself, I just thought that was more serious for science as a whole. I felt plagiarizing, or being plagiarized, is bad for scientists, but it doesn’t necessarily bring a new or false narrative into science. But if a scientist photoshopped something, or has two images that overlap, but presents them as two different experiments, that is actually cheating. And so those scientists would present results as if they happened, but they didn’t happen; they were falsified or even fabricated.
Of the three forms of misconduct, which are plagiarism, falsification, and fabrication, I feel plagiarism is the least bad. And so as soon as I found images that appear to have been duplicated and reused to represent different experiments, I felt this is much worse for science. And I have, apparently, a talent for recognizing them.
Speaking of your talent, has anyone wondered how your brain works?
So I was profiled in The New Yorker by Ingfei Chen. I was tested by [Jeremy Wilmer, a psychology researcher at Wellesley College], who has a lot of online tests designed to measure people’s ability to spot patterns or to recognize faces, and I turned out to be pretty bad at recognizing faces. It takes me much longer than the average person to remember faces at a conference. I feel very miserable, because I have no idea who they are.
I just don’t have that brain module, but I am good at pattern recognition and 3-D spatial orientation. Most people see what I see once I point it out. I think it’s a combination: I have perhaps a little better than average talent for spotting patterns, but I’m also crazy enough to do this as a hobby.
When I still worked at Stanford, I scanned 20,000 papers to get an idea of how often we see these duplications. That was when I was still employed full time; now I do this full time, but back then it was just in the evenings or on the weekends. I don’t think a lot of people would have scanned 20,000 papers just to get an idea of how often a particular phenomenon happens.
At this point, how many papers have you analyzed?
Oh, at some point, it was over 100,000. It’s hard to know, because I don’t keep track of that anymore. I do know how many [papers with problems] I’ve found: around 8,000. Some of those have plagiarism or other problems, such as animal ethics issues or a lack of ethical approval, but most are image problems. If we assume roughly one in 25 papers has an image problem, and I found around 8,000, then, as a very rough calculation, I would have screened roughly 200,000 papers.
Is this how you imagined your life would look?
I would rather just do image duplication searches, because I really enjoy the deep focus that I can get in a day, if I just do it for hours and hours in a row. I don’t mind doing that. But I also think it’s important to give talks, to share my frustrations about the lack of response, sometimes, from scientific journals and institutions. And having the chance to go to Taiwan, meet lots of people, and hear a lot of different viewpoints is just an amazing opportunity that I don’t want to miss.
What are you giving talks on in Taiwan?
It’s just my general talk about how I came to switch my career and do this, why I think it’s important, and why I think misconduct is bad, but also what we can do better, whether as scientific publishers, institutions, or researchers. This was mainly talking to people who teach graduate students research integrity classes on how not to commit science fraud. I also usually talk about ChatGPT and artificial intelligence, and how it can, on one hand, find these problems but, on the other hand, create them as well, because generative AI can generate text, and also images, that are completely fake and look fairly realistic.
I’m not originally from the U.S.; I’m from the Netherlands, and English is not my first language. And I share that people who speak English from birth have some advantage in writing a scientific paper in English, because English is the universal language of science. It is hard if English is not your first language, and is it then allowed to use ChatGPT, or some other AI language model, to help you rewrite your text? Of course, you have this thin line: When is it just rewriting your own text? When is it completely generating it from scratch?
From the perspective of the researchers, I can see that AI would definitely make their jobs easier. Would it make your job harder when you’re trying to determine what exactly is a real image?
I don’t think I will be able to recognize a good AI-generated image anymore. We have found some images from two, three, four years ago that we believe were AI-generated. But those were from a paper mill, and I think they made the error of putting all these AI-generated Western blot bands on the same background. Because they all have the exact same background, we could recognize that pattern of noise. And we found 600 papers that we believe were AI-generated, but with a more primitive form of AI.
But I think there are probably a lot of papers being produced right now that we can no longer recognize as fake. We might have an idea, thinking that’s probably a paper mill, but you also don’t want to falsely accuse anybody. So if there’s no real duplication or something that is obviously wrong, it’s just hard to really comment on that. You also don’t want to insult anybody by saying, “Oh, your paper is fake.” You have to have some real proof that a paper is fake.
You are probably one of the most visible people doing this work, especially since you use your full name while a lot of people use pseudonyms on PubPeer. Have you faced any danger?
I’ve been threatened with lawsuits several times. None of that ever actually happened. But at some point, my home address was published online in one of those complaints that was filed against me. I do worry a little bit that there will be a disgruntled author whose work I’ve criticized, and of course, there have been many of those. It only takes one mad person to do something harmful.
And I’ve had a lot of insults online. But so far, I’ve stayed relatively in the safe zone. One of the professors at Harvard, though, has now sued three whistleblowers, the Data Colada team, and that definitely gave me pause.
I know that you’ve experienced a lot of frustrations. Is there a percentage of the papers you flag that journal editors are just not taking a look at?
For most of the papers, the journals are not taking action. I’ve actually almost given up on sending emails to editors, because it’s so much work. If I investigate a bunch of papers from a set of researchers, let’s say I find 30 papers that have problems, those might have been published in 20 different journals. And then I have to track down [the email addresses of] the editors of 20 different journals.
Of my initial set, after five years, two-thirds were still not corrected or retracted. I think that number is slowly moving towards 50%. But it’s been almost 10 years since I reported them, and half of them are still not addressed. So that is just frustrating.
Is there anything you’ve seen that’s positive in what journal editors are doing to increase their scrutiny, or signs people are taking this more seriously?
Journals seem to be slowly becoming convinced that they need to take action, but it’s still a very slow process. Institutions still seem to be lagging in how they address these cases; they seem to operate mostly in secrecy. I think with the Stanford president, that was a unique case: It was because of the writing of student journalists [at The Stanford Daily] that the whole case blew up and was then actually investigated by an outside committee. From my perspective, it seems to be working with journalists that moves these cases forward.
Is there anything you think would be important to touch on that I haven’t asked you yet?
I don’t do this to break people’s careers. I do this because I care about science. I feel this is also an important part of science, and there should be a little bit more of a career path in it. I’m crowdfunded. Why isn’t this part of science being funded, too?
I just think it’s wonderful that I got recognized by the STATUS List. It’s very helpful to see that this type of work is appreciated, perhaps not directly by the scientific community, but by other people who think this work is important.