How to extract text from a PDF
- Drop your PDF into the zone above, or click to browse and select it.
- Click Extract text. The tool processes each page in turn — you will see the progress as it works.
- The text appears in the box above, organised by page. Click Copy all to copy it to your clipboard, or Download .txt to save it as a plain text file.
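As a rough illustration of the "organised by page" output, per-page text could be combined into one string before it is copied or saved. This is a minimal sketch: the function name formatPages and the "--- Page N ---" header format are assumptions made for the example, not the tool's actual output format.

```javascript
// Hypothetical helper: joins per-page text into one page-labelled string.
// The header format is an assumption chosen for illustration.
function formatPages(pageTexts) {
  return pageTexts
    .map((text, index) => `--- Page ${index + 1} ---\n${text.trim()}`)
    .join("\n\n");
}

// In a browser, the result could then be copied or offered as a download:
//   await navigator.clipboard.writeText(output);
//   const url = URL.createObjectURL(new Blob([output], { type: "text/plain" }));
const output = formatPages(["First page text.", "Second page text."]);
console.log(output);
```

Building the whole string first means Copy all and Download .txt can share the same text.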
Everything runs in your browser using pdf.js.
No file is sent to a server — open DevTools (F12) → Network while extracting to confirm zero upload requests.
When to extract text from a PDF
- Copying content into another document — extract the text from a report or article so you can paste it into a Word document, Google Doc, or email.
- Searching inside a large PDF — extract and paste into a text editor with better search than your PDF viewer.
- Feeding text to an AI tool — extract the content of a PDF so you can paste it into ChatGPT, Claude, or another AI for summarisation or analysis.
- Data extraction from reports — pull tables and figures from PDFs for further processing in a spreadsheet.
- Accessibility — convert a PDF to plain text for easier reading in a text-to-speech tool or screen reader.
How it works under the hood
pdf.js loads the PDF and parses its page content streams. Each page in a PDF contains drawing instructions: text is stored as positioned character sequences, not as flowing paragraphs. The tool calls pdf.js's page.getTextContent() to retrieve each text item with its position data, then joins adjacent items into lines and pages.
Because PDF content streams do not guarantee reading order, the tool reconstructs it from character positions. Simple single-column documents extract cleanly. Complex multi-column layouts, tables, and rotated text may produce text in an unexpected order. This is a structural limitation of the PDF format.
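The position-based reconstruction described above can be sketched roughly as follows. This is one plausible approach, not necessarily the tool's own. The sample items mimic the shape pdf.js returns from page.getTextContent(), where str is the text and transform[4] and transform[5] are the x and y position. The function name itemsToLines and the yTolerance value are assumptions made for the example.

```javascript
// Sketch: rebuild lines from positioned text items.
// Item shape follows pdf.js text items: { str, transform: [a, b, c, d, x, y] }.
// yTolerance (in PDF units) is an assumed value, not taken from the tool.
function itemsToLines(items, yTolerance = 2) {
  // PDF y coordinates grow upward, so sort top-to-bottom (descending y).
  const positioned = items
    .filter((item) => item.str.trim() !== "")
    .map((item) => ({ str: item.str, x: item.transform[4], y: item.transform[5] }))
    .sort((a, b) => b.y - a.y);

  // Items whose baselines are within yTolerance of the line's first item share a line.
  const lines = [];
  for (const item of positioned) {
    const current = lines[lines.length - 1];
    if (current && Math.abs(current.y - item.y) <= yTolerance) {
      current.items.push(item);
    } else {
      lines.push({ y: item.y, items: [item] });
    }
  }

  // Within each line, order items left-to-right and join them.
  return lines.map((line) =>
    line.items
      .sort((a, b) => a.x - b.x)
      .map((item) => item.str)
      .join(" ")
      .replace(/\s+/g, " ")
      .trim()
  );
}

const sample = [
  { str: "world", transform: [1, 0, 0, 1, 120, 700] },
  { str: "Second line", transform: [1, 0, 0, 1, 72, 686] },
  { str: "Hello", transform: [1, 0, 0, 1, 72, 700.5] },
];
console.log(itemsToLines(sample)); // ["Hello world", "Second line"]
```

The sketch also shows why multi-column pages are hard: two columns that share a baseline would be merged into a single line, because only the vertical position decides which items belong together.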
Limits and what to expect
- Scanned PDFs: a PDF that is just a scan (image of a page) contains no text data — only pixels. The tool will return empty output. To extract text from a scanned PDF, you need OCR (optical character recognition) — a feature we plan to add via optional cloud processing.
- Complex layouts: multi-column layouts, text in tables, and rotated text may extract in a different order than they visually appear. For clean extraction, straightforward single-column documents work best.
- Ligatures and special characters: some fonts use ligatures (like "fi" rendered as a single glyph) that may not extract correctly. This depends on how the font is embedded in the PDF.
- Password-protected PDFs: the PDF must be openable without a password. PDFs that require a password to view their content cannot be processed.
- Large documents: each page is processed in sequence. Very large PDFs (100+ pages) take a few seconds — you will see progress as each page completes.
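The page-by-page processing and the empty result for scanned pages can be sketched as below. This is a minimal illustration under stated assumptions: pdf is expected to behave like a pdf.js document (numPages, getPage, and page.getTextContent() are real pdf.js members), while the function name extractPages, the onProgress callback, and the returned object shape are hypothetical.

```javascript
// Sketch: process pages one at a time and report progress after each.
// `pdf` should behave like a pdf.js document, for example one obtained via
//   const pdf = await pdfjsLib.getDocument({ data: arrayBuffer }).promise;
// `onProgress` and the result shape are assumptions made for this example.
async function extractPages(pdf, onProgress = () => {}) {
  const pages = [];
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber); // pdf.js pages are 1-indexed
    const content = await page.getTextContent();
    const text = content.items
      .map((item) => item.str ?? "")
      .join(" ")
      .trim();
    // A page with no text is probably a scanned image and would need OCR.
    pages.push({ pageNumber, text, isEmpty: text === "" });
    onProgress(pageNumber, pdf.numPages);
  }
  return pages;
}
```

Awaiting each page before starting the next keeps memory use predictable and makes progress easy to report, at the cost of some speed on long documents. If every page comes back with isEmpty set, the file is most likely a scan.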
Privacy compared to other PDF-to-text tools
Many online PDF-to-text converters upload your document to a server for processing. For PDFs containing sensitive information (client reports, legal documents, financial statements, medical records), that upload is a real risk.
keptlocal extracts text entirely inside your browser. Your PDF never leaves your device. The extracted text is displayed in the browser and can be downloaded as a .txt file, with no intermediate server involved at any point.