Running Extract from PDFs in batches (Advanced)


Extract from PDFs can handle about 3000 answers at once (e.g. 100 papers × 30 columns, or 300 papers × 10 columns). If you're running a large data extraction and need more answers than that, you'll need to run it in batches. To run in batches:

  1. Iterate on your columns using 3-10 papers

  2. Save the columns once you're happy with them

  3. Create a new notebook

  4. Upload one batch of papers and run "Extract data from PDFs" on them. Pick the batch size so that papers × columns comes to about 3000 answers (e.g. 100 papers if you have 30 columns)

  5. Add your saved columns and wait for Elicit to provide the answers

  6. Download the CSV of results

  7. Repeat steps 3-6 for each remaining batch of papers

  8. Combine the CSVs. The easiest way is probably to use this notebook
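
If you would rather work out the batch size and combine the files on your own machine, the following is a minimal Python sketch, not an official Elicit tool. It assumes the 3000-answer figure above is a rough limit, that each downloaded CSV has a header row, and that batches may differ slightly in their columns. The function names and the example file names are hypothetical.

```python
import csv

ANSWER_LIMIT = 3000  # approximate per-run limit mentioned above (an assumption, not a hard API value)


def papers_per_batch(num_columns, answer_limit=ANSWER_LIMIT):
    """Return the largest number of papers such that papers x columns <= answer_limit."""
    if num_columns < 1:
        raise ValueError("num_columns must be at least 1")
    return answer_limit // num_columns


def combine_csvs(csv_paths, output_path):
    """Stack the rows of several batch CSVs into one file.

    The output header is the union of all input headers, in first-seen order.
    Cells for columns a batch did not have are left empty.
    Returns the number of data rows written.
    """
    fieldnames = []
    rows = []
    for path in csv_paths:
        # utf-8-sig reads plain UTF-8 and also strips a byte-order mark if present
        with open(path, newline="", encoding="utf-8-sig") as f:
            reader = csv.DictReader(f)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)

    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Example usage (hypothetical file names):
# papers_per_batch(30)   # 30 columns -> 100 papers per batch
# combine_csvs(["batch1.csv", "batch2.csv", "batch3.csv"], "combined.csv")
```

Taking the union of headers rather than requiring identical ones means a batch with a missing or extra column still combines, at the cost of some empty cells, so it is worth checking the header row of the combined file afterwards.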