keptlocal
Files never leave your browser

PDF to Text

Drop in a PDF and the tool extracts all readable text, page by page. Copy it to the clipboard or download it as a .txt file. Nothing is uploaded.

How to extract text from a PDF

  1. Drop your PDF into the zone above, or click to browse and select it.
  2. Click Extract text. The tool processes each page in turn — you will see the progress as it works.
  3. The text appears in the box above, organised by page. Click Copy all to copy it to your clipboard, or Download .txt to save it as a plain text file.

Everything runs in your browser using pdf.js. No file is sent to a server — open DevTools (F12) → Network while extracting to confirm zero upload requests.
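
For readers curious how step 3 can work without a server, here is a minimal sketch. It assumes the per-page text is already available as an array of strings; the function names and the "--- Page N ---" header format are illustrative choices, not necessarily what this tool uses. The Blob, URL.createObjectURL and navigator.clipboard calls are standard browser APIs.

```javascript
// Join per-page text into one string, with a header line per page.
// The header format is an assumption made for this sketch.
function joinPages(pageTexts) {
  return pageTexts
    .map((text, i) => `--- Page ${i + 1} ---\n${text}`)
    .join("\n\n");
}

// Browser-only: offer the text as a .txt download.
// A Blob built from a string is always encoded as UTF-8.
function downloadTxt(text, filename = "extracted.txt") {
  const blob = new Blob([text], { type: "text/plain;charset=utf-8" });
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = filename;
  link.click();
  // Release the object URL once the click has been dispatched.
  setTimeout(() => URL.revokeObjectURL(url), 0);
}

// Browser-only: copy the text to the clipboard.
// Needs a secure context (https) and is normally called from a click handler.
async function copyAll(text) {
  await navigator.clipboard.writeText(text);
}
```

Both actions work on a string that already lives in the page's memory, so neither one needs a network request.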

When to extract text from a PDF

  • Copying content into another document: extract the text from a report or article so you can paste it into a Word document, Google Doc, or email.
  • Searching inside a large PDF: extract the text and paste it into a text editor whose search is better than your PDF viewer's.
  • Feeding text to an AI tool: extract the content of a PDF so you can paste it into ChatGPT, Claude, or another AI for summarisation or analysis.
  • Data extraction from reports: pull figures and table values from PDFs for further processing in a spreadsheet. Table cells may come out in a different order than they appear on the page, so check the result.
  • Accessibility: convert a PDF to plain text for easier reading in a text-to-speech tool or screen reader.

How it works under the hood

pdf.js loads the PDF and parses its page content streams. Each page in a PDF contains drawing instructions: text is stored as positioned character sequences, not as flowing paragraphs. The tool calls page.getTextContent() on each page to retrieve every text item with its position data, then joins adjacent items into lines and pages.
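
The loop described above can be sketched roughly as follows. This is a simplified illustration, not the tool's actual source: the function name extractPages and the onProgress callback are hypothetical, and pdfjsLib is passed in as a parameter so the sketch makes no assumption about how the library is loaded. getDocument(), numPages, getPage() and getTextContent() are pdf.js APIs; the hasEOL flag on text items depends on the pdf.js version in use.

```javascript
// Extract text page by page. Returns an array with one string per page.
// `data` is the PDF as a Uint8Array (for example from file.arrayBuffer()).
async function extractPages(pdfjsLib, data, onProgress = () => {}) {
  const pdf = await pdfjsLib.getDocument({ data }).promise;
  const pages = [];

  // pdf.js page numbers start at 1.
  for (let n = 1; n <= pdf.numPages; n++) {
    const page = await pdf.getPage(n);
    const content = await page.getTextContent();

    // Naive join: keep items in the order pdf.js returns them and
    // break the line wherever an item is flagged as end-of-line.
    const text = content.items
      .map((item) => (item.str ?? "") + (item.hasEOL ? "\n" : ""))
      .join("")
      .trim();

    pages.push(text);
    onProgress(n, pdf.numPages); // hypothetical progress hook
  }
  return pages;
}
```

Awaiting each page before starting the next keeps memory use flat and gives a natural point to update a progress indicator.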

Because PDF does not natively store reading order, the tool reconstructs it from character positions. Simple single-column documents extract cleanly. Complex multi-column layouts, tables, and rotated text may produce text in an unexpected order — this is a structural limitation of the PDF format.
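
To make the ordering problem concrete, here is one simple way to rebuild lines from positions. It is a sketch under stated assumptions, not the tool's exact algorithm: the function name itemsToLines and the 2-unit tolerance are made up for illustration. It relies on the pdf.js text item shape, where transform[4] is the x position and transform[5] is the y position, with y growing upwards from the bottom of the page.

```javascript
// Group text items into lines by their vertical position,
// then order each line left to right.
function itemsToLines(items, yTolerance = 2) {
  const positioned = items
    .filter((it) => typeof it.str === "string" && it.str.trim() !== "")
    .map((it) => ({ str: it.str, x: it.transform[4], y: it.transform[5] }));

  // Top of the page first (larger y), then left to right.
  positioned.sort((a, b) => b.y - a.y || a.x - b.x);

  const lines = [];
  for (const it of positioned) {
    const last = lines[lines.length - 1];
    if (last && Math.abs(last.y - it.y) <= yTolerance) {
      last.items.push(it); // close enough vertically: same line
    } else {
      lines.push({ y: it.y, items: [it] });
    }
  }

  return lines.map((line) =>
    line.items
      .sort((a, b) => a.x - b.x)
      .map((it) => it.str)
      .join(" ")
  );
}
```

On a single-column page this gives clean lines. On a two-column page, items from the left and right columns share the same y position, so they end up on the same output line. That interleaving is the kind of unexpected order described above.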

Limits and what to expect

  • Scanned PDFs: a PDF that is just a scan (image of a page) contains no text data — only pixels. The tool will return empty output. To extract text from a scanned PDF, you need OCR (optical character recognition) — a feature we plan to add via optional cloud processing.
  • Complex layouts: multi-column layouts, text in tables, and rotated text may extract in a different order than they visually appear. For clean extraction, straightforward single-column documents work best.
  • Ligatures and special characters: some fonts use ligatures (like "fi" rendered as a single glyph) that may not extract correctly. This depends on how the font is embedded in the PDF.
  • Password-protected PDFs: the PDF must be openable without a password. Text cannot be extracted from PDFs that require a password to view their content.
  • Large documents: each page is processed in sequence. Very large PDFs (100+ pages) take a few seconds — you will see progress as each page completes.
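
Two of the limits above can be softened with a little post-processing. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the extracted text is held as an array with one string per page. The ligature helper only helps when the PDF maps the glyph to a Unicode ligature character such as U+FB01; if the font has no usable character mapping, no post-processing can recover the letters.

```javascript
// Return the 1-based numbers of pages that produced no text.
// Empty pages in an otherwise readable PDF are often scanned images.
function findEmptyPages(pageTexts) {
  return pageTexts
    .map((text, i) => (text.trim() === "" ? i + 1 : null))
    .filter((n) => n !== null);
}

// Expand Latin presentation-form ligatures (U+FB00 to U+FB06)
// into their plain letters using Unicode compatibility decomposition.
function expandLigatures(text) {
  return text.replace(/[\uFB00-\uFB06]/g, (ch) => ch.normalize("NFKD"));
}
```

For example, findEmptyPages can drive a "this page looks scanned" notice, and expandLigatures can run on the text before it is shown or downloaded so that searching for "office" matches a word that was typeset with an "fi" ligature.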

Privacy compared to other PDF-to-text tools

Most online PDF-to-text converters upload your document to a server for processing. For PDFs containing sensitive information (client reports, legal documents, financial statements, medical records), that upload is a real risk.

keptlocal extracts text entirely inside your browser. Your PDF never leaves your device. The extracted text is displayed in the browser and can be copied or downloaded as a .txt file; no server is involved at any point.

Frequently asked questions

Are my files uploaded to a server?
No. Text extraction runs entirely in your browser using pdf.js. Your PDF never leaves your device — open DevTools → Network while processing to confirm zero upload requests.
Why does the extracted text look scrambled or out of order?
PDF does not store text in reading order; it stores it as positioned drawing instructions. The tool reconstructs reading order from the position data that pdf.js returns, but complex multi-column layouts, tables, and rotated text can produce unexpected ordering. This is a limitation of the PDF format, not the tool.
Can I extract text from a scanned PDF?
Only if the PDF was OCR-processed and contains an embedded text layer. A scanned PDF that is just an image of a page contains no extractable text — the tool will return empty output for those pages.
What encoding is the downloaded .txt file?
UTF-8, which can represent every character extracted from the source PDF, whatever the language.
Is there a page limit?
No hard limit. Large PDFs take a few seconds as each page is processed in turn — you will see the progress update as pages are extracted.
Can I extract text from a password-protected PDF?
Only if the PDF can be opened without a password. If the PDF requires a password to view its content, extraction will fail; unlock it in your PDF reader first.
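
For the technically curious, a password failure can be told apart from other load errors roughly like this. It is a sketch, not the tool's code: the function names and messages are hypothetical, and pdfjsLib is passed in as a parameter. It relies on pdf.js rejecting the getDocument() promise with an error whose name is "PasswordException" (or "InvalidPDFException" for files that are not valid PDFs) when no password handler is supplied.

```javascript
// Turn a pdf.js load error into a short user-facing message.
// The message wording is made up for this sketch.
function describeLoadError(err) {
  const name = err && err.name;
  if (name === "PasswordException") {
    return "This PDF needs a password. Unlock it in your PDF reader first.";
  }
  if (name === "InvalidPDFException") {
    return "This file does not look like a valid PDF.";
  }
  return "The PDF could not be opened.";
}

// Try to open a PDF; never throws, returns either the document or a message.
async function openPdf(pdfjsLib, data) {
  try {
    const pdf = await pdfjsLib.getDocument({ data }).promise;
    return { pdf, error: null };
  } catch (err) {
    return { pdf: null, error: describeLoadError(err) };
  }
}
```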