May 23, 2026 · 7 min read · PDF

Compressing PDFs: File Size vs. Quality Explained

Hitendra Patel

Founder, keptlocal · Senior Technical Lead, Healthcare IT

A one-page PDF from Microsoft Word can be 500 KB. A scanned document of the same page might be 3 MB. A slide deck exported from PowerPoint can reach 50 MB. Compressing a PDF is rarely as simple as "make it smaller" — the right approach depends on what is making it large in the first place.

Why are PDFs so large?

A PDF is a container format. It can hold text (as actual characters), vector graphics (as drawing instructions), raster images (as pixel data), fonts, metadata, and more. The dominant contributor to file size varies by document type.

Embedded images are almost always the largest component. A slide deck with high-resolution stock photos is large because those photos are large. Scanned PDFs are a specific case worth understanding: a scanned page is a photograph of paper. A colour scan of one A4 page at 300 DPI produces an image of 2480 × 3508 pixels, roughly 8.7 megapixels, or about 26 MB of raw pixel data. Stored as JPEG at reasonable quality, that is 500 KB–2 MB per page. A 100-page scanned document is easily 50–200 MB before any further compression.
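The scan arithmetic is easy to check. The sketch below assumes an ISO A4 page (210 × 297 mm) and 8-bit RGB at 3 bytes per pixel; the function names are illustrative only.

```python
A4_MM = (210, 297)   # ISO 216 A4 page size in millimetres
MM_PER_INCH = 25.4


def scan_pixels(dpi: int, page_mm: tuple[int, int] = A4_MM) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at the given DPI."""
    width_mm, height_mm = page_mm
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


def raw_rgb_bytes(width: int, height: int) -> int:
    """Uncompressed size assuming 3 bytes per pixel (8-bit RGB)."""
    return width * height * 3


width, height = scan_pixels(300)
print(width, height)                                              # 2480 3508
print(f"{width * height / 1e6:.1f} megapixels")                   # 8.7 megapixels
print(f"{raw_rgb_bytes(width, height) / 1e6:.1f} MB uncompressed")  # 26.1 MB uncompressed
```

The gap between the raw figure and the 500 KB–2 MB stored size is what JPEG encoding has already saved before any PDF tool gets involved.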

Fonts are the less obvious culprit, and the one that surprised me most when I first started pulling PDFs apart. PDFs embed font data so the document renders identically on every device. A document using three font families, each with multiple weights, may carry 1–3 MB of font data before a single word of content is counted. Subsetting (embedding only the characters actually used) is a common optimisation that reduces this significantly.

Uncompressed images are another source of bloat. Some tools export PDFs with images stored as raw uncompressed bitmaps rather than as JPEG or losslessly compressed image data; the same photograph at the same resolution is 10–20× larger uncompressed.

Finally, PDFs assembled by merging other PDFs sometimes contain duplicated fonts, colour profiles, or images: the same resource embedded multiple times because each source document carried its own copy.
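Duplicated resources are the easiest of these to reason about in code. The sketch below is not how any particular PDF tool works; it only illustrates the idea of keeping one copy of each byte-identical payload by hashing. The resource names and byte strings are made up for the example.

```python
import hashlib


def deduplicate(resources: dict[str, bytes]) -> tuple[dict[str, bytes], dict[str, str]]:
    """Keep one copy of each distinct payload.

    Returns (unique, aliases). `unique` maps the first name seen for each
    payload to its bytes. `aliases` maps every duplicate name to the name
    of the copy that was kept.
    """
    kept_by_digest: dict[str, str] = {}
    unique: dict[str, bytes] = {}
    aliases: dict[str, str] = {}
    for name, payload in resources.items():
        digest = hashlib.sha256(payload).hexdigest()
        if digest in kept_by_digest:
            aliases[name] = kept_by_digest[digest]
        else:
            kept_by_digest[digest] = name
            unique[name] = payload
    return unique, aliases


# Hypothetical resources from two merged source documents.
merged = {
    "docA/logo": b"logo image bytes",
    "docA/font": b"font program bytes",
    "docB/logo": b"logo image bytes",
    "docB/font": b"font program bytes",
}
unique, aliases = deduplicate(merged)
print(sorted(unique))   # ['docA/font', 'docA/logo']
print(aliases)          # {'docB/logo': 'docA/logo', 'docB/font': 'docA/font'}
```

A real optimiser also has to rewrite every reference to the dropped copies so that they point at the one that was kept.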

What compression actually does

"Compressing a PDF" can mean several different things depending on the tool:

Re-encoding embedded images

The most impactful type of compression for most documents. Images in the PDF are decoded, re-encoded as JPEG (or JPEG 2000) at a lower quality setting, and re-embedded. A photograph stored as a high-quality JPEG at 95% quality might drop from 2 MB to 300 KB at 75% quality with little visible difference on screen.

The caveat: if the original image was already stored in a lossy format such as JPEG, re-encoding it introduces generation loss. Each lossy re-encode discards additional data, and the degradation is cumulative.
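Generation loss can be shown with a toy model. This is not JPEG: it uses plain scalar quantisation (rounding to the nearest multiple of a step) as a stand-in for the lossy step of an encoder, and the step sizes 3 and 5 are arbitrary. It compares one lossy pass against two passes at different settings.

```python
import math


def quantise(values, step):
    """Round each value to the nearest multiple of `step` (the lossy step)."""
    return [step * math.floor(v / step + 0.5) for v in values]


def total_error(original, degraded):
    """Sum of absolute differences between original and degraded samples."""
    return sum(abs(a - b) for a, b in zip(original, degraded))


samples = list(range(20))

one_pass = quantise(samples, 5)                # encoded once
two_pass = quantise(quantise(samples, 3), 5)   # encoded, then re-encoded

print(total_error(samples, one_pass))   # 24
print(total_error(samples, two_pass))   # 27
```

Both results end on the same grid of step 5, yet the twice-encoded data sits further from the original, because the second pass rounds values that the first pass had already moved.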

Reducing image resolution (downsampling)

A photograph exported at 600 DPI is overkill for a document that will only be read on screen or printed on a standard office printer. Downsampling reduces the image dimensions before re-encoding — a 600 DPI image scaled to 150 DPI has one-sixteenth the pixel count and therefore roughly one-sixteenth the file size (before compression). The trade-off is visible when zooming or printing at high quality.
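The one-sixteenth figure follows from scaling both dimensions by the same ratio. A minimal sketch, with illustrative function names and a hypothetical 4960 × 7016 source image:

```python
def downsample_factor(source_dpi: float, target_dpi: float) -> float:
    """Fraction of the original pixel count left after downsampling."""
    if target_dpi > source_dpi:
        raise ValueError("target DPI exceeds source DPI; that would be upsampling")
    scale = target_dpi / source_dpi
    return scale * scale   # width and height both shrink by `scale`


def downsampled_size(width: int, height: int, source_dpi: float, target_dpi: float) -> tuple[int, int]:
    """Pixel dimensions after scaling from source_dpi to target_dpi."""
    scale = target_dpi / source_dpi
    return round(width * scale), round(height * scale)


print(downsample_factor(600, 150))              # 0.0625, i.e. one-sixteenth
print(downsampled_size(4960, 7016, 600, 150))   # (1240, 1754)
```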

Font subsetting

If a PDF embeds complete font files (every glyph the font defines) but the text uses only a fraction of those glyphs, the tool can strip the unused glyph data. A document that uses Helvetica for English text does not need the Cyrillic or Arabic glyphs. Subsetting can reduce font data by 50–80%.
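To get a feel for how small the required glyph set usually is, the sketch below counts the distinct characters in a piece of text. It is a simplification: real subsetters work on glyphs rather than characters (ligatures and fallback glyphs complicate the mapping) and rely on a font library, which this example does not attempt. The sample string is arbitrary.

```python
def characters_used(text: str) -> set[str]:
    """Distinct characters in the text, a rough proxy for the glyphs to keep."""
    return set(text)


sample = "Quarterly report 2026"
used = characters_used(sample)

print(len(sample))        # 21 characters in the string
print(len(used))          # 14 distinct characters
print("".join(sorted(used)))
```

Even a long English document rarely needs more than a hundred or so distinct characters, while a full font can define thousands of glyphs.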

Removing metadata and hidden content

PDFs can contain author comments, revision history, embedded thumbnails, document properties, and other metadata that add to file size without contributing to the visible content. Stripping these is lossless — the visible document is unchanged.

Lossless stream compression

PDF streams (the internal data containers) can be compressed using algorithms like Flate (zlib/deflate). Applying or improving this compression is lossless: every stream decodes to exactly the same bytes as before, so the content is unchanged and only the file is smaller.
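Python's standard zlib module implements the same deflate algorithm, so the lossless round trip can be demonstrated directly. The byte string below is a made-up stand-in for an uncompressed content stream, not data taken from a real PDF.

```python
import zlib

# Hypothetical uncompressed stream: repetitive path-drawing operators.
stream = b"0 0 m 100 0 l 100 100 l 0 100 l h S\n" * 200

compressed = zlib.compress(stream, level=9)
restored = zlib.decompress(compressed)

print(len(stream), "bytes before,", len(compressed), "bytes after")
print(restored == stream)   # True: decompression returns the exact input
```

Repetitive data like this shrinks dramatically. Streams that are already compressed, such as JPEG image data, gain little or nothing from a second pass.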

Why browser-based PDF compression has limits

True PDF compression — particularly image downsampling and re-encoding at the byte level — requires native code execution for practical performance. Tools like Ghostscript and MuPDF are compiled binaries that can process a 100-page scanned document in seconds. Reimplementing that in pure JavaScript, running inside a browser tab, is computationally prohibitive for large files.

What browser-based tools like keptlocal's Compress Image can do: compress individual images using the Canvas API and JPEG encoder built into the browser. This works well for image files and for PDFs where the dominant content is photographs.

For aggressive PDF-level compression (downsampling embedded images, rebuilding the whole file the way Ghostscript does, stripping metadata across a complex document), a desktop tool or a server-side compressor is the right choice. When using a server-side compressor, be aware of the privacy implications of uploading, which are discussed elsewhere on this site.

Choosing a target file size

The right target depends on how the document will be used:

For email attachments, most providers enforce limits of 10–25 MB. A practical target is under 5 MB to ensure delivery across varied mail servers. For documents with photos, 150 DPI at 75% JPEG quality gets you there without perceptible quality loss at normal reading size. Web uploads depend on the platform: a WordPress site running on default PHP settings may cap uploads at 2 MB, and government portals often set 5–10 MB limits. The same 150 DPI / 75% setting clears most of them.

Do not downsample or heavily re-encode a document you intend to send to a commercial printer. Print requires images at 300 DPI at minimum, and sometimes up to 600 DPI. Downsampling below 300 DPI produces visible pixelation when printed. For screen-only documents, by contrast, 96–150 DPI is indistinguishable from 300 DPI at normal zoom, so aggressive downsampling is appropriate and produces the smallest files. And for archival storage: do not compress originals. Store full quality, distribute compressed copies.
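The guidance in this section can be written down as a small lookup table. The values are transcribed from the text above; the key names, the Target class, and target_for are illustrative only, not part of any tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    dpi: Optional[int]            # None: leave images at their source resolution
    jpeg_quality: Optional[int]   # None: do not re-encode images
    note: str


TARGETS = {
    "email": Target(150, 75, "aim for under 5 MB"),
    "web": Target(150, 75, "check the platform's upload limit"),
    "screen": Target(150, 75, "as low as 96 DPI is acceptable on screen"),
    "print": Target(None, None, "keep images at 300 DPI or higher"),
    "archive": Target(None, None, "store the original, distribute compressed copies"),
}


def target_for(use_case: str) -> Target:
    """Look up the suggested settings for a use case."""
    try:
        return TARGETS[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None


print(target_for("email"))
print(target_for("print").note)
```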

What you lose when you compress

Lossless compression (stream compression, metadata removal, font subsetting) loses nothing visible. The document looks and prints identically.

Lossy compression (image re-encoding, downsampling) loses data permanently. The discarded data cannot be recovered from the compressed file. Specific things that degrade:

Fine text on image-based pages is the most common problem: scanned documents with small text at the margins can become unreadable after aggressive compression. Even 75% JPEG quality, which is fine for photographs, can introduce visible artefacts around the letterforms of a dense text scan. Photographs with subtle gradients (skin tones, skies) show JPEG banding at lower quality settings. On screen you can live with it; in a printed portfolio or report you cannot. Line art and diagrams are the worst case: JPEG compresses sharp-edged content poorly, blurring the edges that carry the meaning. For documents with technical diagrams, PNG or lossless PDF compression is the right choice.

The workflow that works

1. Identify what is making the file large. Open the PDF properties (File → Properties in Adobe Reader, or inspect via our PDF Info Viewer). If the file is under 5 MB, it may not need compression at all, so check the target limit first.
2. Try lossless options first. Remove unnecessary metadata and apply stream compression before touching image quality. Many tools can reduce file size 20–40% losslessly.
3. Set the minimum quality that meets your use case. If the document will be read on screen only, 150 DPI and 75% JPEG quality is the sweet spot. Aggressive settings below 100 DPI or below 60% quality introduce visible degradation.
4. Test the result before distributing. Open the compressed file, zoom to 100%, and look at a text-heavy region and an image-heavy region. If both look acceptable, the compression is fine. If either shows artefacts, reduce the compression level.
5. Keep the original. Never overwrite the source file with the compressed version. Compression is irreversible: you cannot recover quality once discarded. This is the step people skip and regret.
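For readers using a desktop tool, here is a minimal sketch of the workflow above driven through Ghostscript from Python. It assumes Ghostscript is installed and available as `gs` on the PATH (the executable is named differently on Windows), and it relies on the pdfwrite device's -dPDFSETTINGS presets, of which /ebook targets roughly 150 DPI images. The function names and file names are illustrative. The sketch refuses to write over the source file, in line with the last step.

```python
import subprocess
from pathlib import Path

PRESETS = {"screen", "ebook", "printer", "prepress"}


def build_gs_command(source: Path, output: Path, preset: str = "ebook") -> list[str]:
    """Build a Ghostscript command line that writes a compressed copy."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    if source.resolve() == output.resolve():
        raise ValueError("refusing to overwrite the source file")
    return [
        "gs",                         # assumption: Ghostscript is on the PATH
        "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS=/{preset}",
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={output}",
        str(source),
    ]


def compress(source: Path, output: Path, preset: str = "ebook") -> int:
    """Run Ghostscript and return the size of the compressed copy in bytes."""
    subprocess.run(build_gs_command(source, output, preset), check=True)
    return output.stat().st_size


# Printing the command only; nothing is executed here.
print(build_gs_command(Path("report.pdf"), Path("report-small.pdf")))
```

Calling compress(Path("report.pdf"), Path("report-small.pdf")) would produce the smaller copy next to the untouched original, which can then be opened and inspected at 100% zoom before it is sent anywhere.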

Working with individual images rather than a whole PDF? Try the keptlocal Compress Image tool — runs in your browser with no upload required.
