Elicit's Limitations


To help you calibrate how much to rely on Elicit, here are some of the limitations you should be aware of as you use it:

Limitations specific to Elicit

  1. Elicit uses large language models, which have only been around since 2019. While already useful, these early-stage technologies are far from “artificial general intelligence that takes away all of our jobs.”

    For example, the models aren’t trained by default to be faithful to a body of text. We’ve had to customize them so that their summaries and extractions reflect what the abstract actually says, rather than what the model thinks is likely to be true in general (sometimes called "hallucination"). While we’ve made a lot of progress and try hard to err on the side of Elicit saying nothing rather than saying something wrong, Elicit can still miss the nuance of a paper or misunderstand what a number refers to.

  2. Elicit is a very early-stage tool, and we launch features while they’re still uncomfortably beta so that we can iterate quickly on user feedback. It’s more helpful to think of Elicit-generated content as around 80-90% accurate, not 100% accurate.

  3. Other people have also helpfully shared thoughts on limitations [1, 2].

Limitations that apply to research or search tools in general

  1. Elicit is only as good as the papers underlying it. While we think researchers are a very careful and rigorous group on average, some published research has questionable methodology, and some is outright fraudulent. Elicit does not yet know how to evaluate whether one paper is more trustworthy than another, except by giving you some imperfect heuristics: citation count, journal, critiques from other researchers who cited the paper, and certain methodological details (sample size, study type, etc.). We’re actively researching how best to help with quality evaluation but, today, Elicit summarizes the findings of a bad study just like it summarizes the findings of a good one.

  2. In the same way that good research involves looking for evidence for and against various arguments, we recommend searching for papers presenting multiple sides of a position to avoid confirmation bias.

  3. Elicit works better for some questions and domains than others. We eventually want to help with all domains and types of research but, to date, we’ve focused on empirical research (e.g. randomized controlled trials in social sciences or biomedicine) so that we can apply lessons from the systematic review discipline.

Other thoughts on limitations

This section is far too short to cover everything. We’ve tried to share enough to keep you from over-relying on Elicit, but this is not a comprehensive list of possible limitations.