Security

This fork is a file-format library. It writes and reads .docx bytes. It does not render, sandbox, or validate the semantic content that Word will subsequently process. Callers who accept untrusted input are responsible for their own sanitisation.

altChunk payloads (HTML, RTF, MHTML, plain text)

An altChunk is a Word primitive that embeds a foreign payload in the package and asks Word to substitute the payload's rendered content for the <w:altChunk> marker on open. The substitution runs inside Word's native import filters, not inside python-docx.

That means an attacker who controls the altChunk payload controls whatever Word's filter chooses to interpret:

HTML / XHTML (add_html_chunk): Word's HTML import pipeline has historically processed embedded scripts, external image URLs, and conditional-comment VML. Word and Office parsing flaws such as CVE-2017-11826 and CVE-2018-0802, along with more recent template-injection issues, illustrate what a crafted document can reach once Word starts parsing it.
RTF (add_rtf_chunk): RTF's control-word grammar lets payloads carry embedded OLE objects, external data links, and remote templates. CVE-2017-0199 and CVE-2023-21716 are well-known RCE examples that were triggered simply by opening a document.
MHTML (add_mhtml_chunk): multi-part archives combine HTML with related resources; the HTML portion has the same exposure as an HTML altChunk.
Plain text (add_text_chunk): lowest-risk, but Word may still turn hyperlink-looking strings into links through autoformat.

python-docx's stance

python-docx does not auto-sanitise altChunk payloads. None of the helpers below inspect, rewrite, or strip the input:

Document.add_alt_chunk(content, content_type=..., match_src=...)
Document.add_html_chunk(html, match_src=...)
Document.add_text_chunk(text, encoding=..., match_src=...)
Document.add_rtf_chunk(rtf, match_src=...)
Document.add_mhtml_chunk(mhtml, match_src=...)

This is deliberate — sanitisation is a content-policy concern that belongs to the embedding application, not to the serializer.

What callers should do

If the payload originated outside your trust boundary:

Sanitise HTML with a library such as bleach or nh3, using an explicit allow-list of tags and attributes. Strip <script>, event handlers (on*=), javascript: / data: URIs, and external references.
Reject RTF unless you can cryptographically verify its origin. RTF has no practical sanitiser, and there is no safe subset for opaque third-party input.
Reject MHTML by default: it multiplexes HTML plus arbitrary MIME parts. If you must accept it, unpack it, sanitise the HTML portion, drop the attachments, and re-emit a plain HTML altChunk.
Plain text is the lowest-risk payload and is rendered verbatim. HTML-escape it only if you embed it through an HTML altChunk instead, where literal characters such as < or & would otherwise not round-trip.
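
The checklist above could be enforced with a small gate in the embedding application. The following is a minimal sketch, not part of python-docx: vet_payload, html_part_of_mhtml, RejectedPayload, and the allow_mhtml flag are hypothetical names, and sanitise_html stands for whatever allow-list sanitiser the application configures (for example a wrapper around nh3.clean or bleach.clean). It assumes UTF-8 input for HTML and text, and it unpacks MHTML with the standard-library email package on the assumption that the archive is a well-formed MIME message.

```python
from email import message_from_bytes, policy
from typing import Callable, Tuple


class RejectedPayload(ValueError):
    """Raised when a payload is refused by policy (hypothetical name)."""


def html_part_of_mhtml(mhtml: bytes) -> str:
    """Return the first text/html part of an MHTML archive; other parts are dropped."""
    message = message_from_bytes(mhtml, policy=policy.default)
    for part in message.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes the transfer encoding and decodes the charset.
            return part.get_content()
    raise RejectedPayload("MHTML archive has no text/html part")


def vet_payload(
    kind: str,
    payload: bytes,
    sanitise_html: Callable[[str], str],
    allow_mhtml: bool = False,
) -> Tuple[str, str]:
    """Return (kind_to_embed, content) for an untrusted payload, or raise."""
    if kind == "rtf":
        # No practical sanitiser exists, so untrusted RTF is refused outright.
        raise RejectedPayload("untrusted RTF is refused")
    if kind == "mhtml":
        if not allow_mhtml:
            raise RejectedPayload("MHTML is refused by default")
        # Keep only the HTML portion and re-emit it as a plain HTML payload.
        return "html", sanitise_html(html_part_of_mhtml(payload))
    if kind == "html":
        # Assumption: the caller supplies UTF-8; undecodable bytes are replaced.
        return "html", sanitise_html(payload.decode("utf-8", errors="replace"))
    if kind == "text":
        return "text", payload.decode("utf-8", errors="replace")
    raise RejectedPayload(f"unknown payload kind: {kind!r}")
```

The returned content would then be passed to Document.add_html_chunk or Document.add_text_chunk. Keeping the sanitiser as an injected callable leaves the allow-list decision with the application, which matches the stance described above.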

Document protection password hash is SHA-1 — not a confidentiality control

settings.xml carries a <w:documentProtection> element whose @w:hash attribute holds a hash of the editor-restriction password. The hashing algorithm (@w:cryptAlgorithmSid="4") is SHA-1 with a small per-document salt and an iteration count; the algorithm is spec-mandated by ECMA-376 §17.15.1.29 and Microsoft's MS-OFFCRYPTO §2.3.7.1. python-docx writes whatever the spec requires — not what a modern password-hashing policy would prefer — because Word refuses to verify hashes produced with any other algorithm.

What this protects

DocumentProtection prevents casual, in-UI editing of a document opened in Microsoft Word. That is all. It is a user-experience guardrail, not a cryptographic access-control mechanism:

The hash and salt live in plaintext inside settings.xml — any tool that can read the package bytes (including python-docx itself) can remove the element and drop the protection.
Practical SHA-1 collision attacks are well documented. No practical preimage attack is known, but SHA-1 is fast to compute, so brute-forcing the original password against the stored hash is feasible for weak passwords on commodity hardware.
The document body is not encrypted. Only password-based AES encryption (python-docx[encryption], handled by python-ooxml-crypto under MS-OFFCRYPTO §2.3.4) provides actual confidentiality.
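
To illustrate the first point, the sketch below reads the protection attributes using only the standard library. It is not python-docx API: read_document_protection is a hypothetical helper, and it assumes the settings part sits at the conventional path word/settings.xml and uses the transitional WordprocessingML namespace.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Assumption: transitional (non-Strict) WordprocessingML namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_document_protection(docx_bytes: bytes) -> Optional[Dict[str, str]]:
    """Return the attributes of <w:documentProtection>, or None if absent."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        try:
            # Assumption: conventional part name; a robust tool would
            # resolve it through the package relationships instead.
            settings_xml = package.read("word/settings.xml")
        except KeyError:
            return None
    # ElementTree is used for brevity; a hardened XML parser is advisable
    # when the package itself comes from an untrusted source.
    root = ET.fromstring(settings_xml)
    element = root.find(f"{{{W_NS}}}documentProtection")
    if element is None:
        return None
    # Drop the "{namespace}" prefix from each attribute name.
    return {name.rsplit("}", 1)[-1]: value for name, value in element.attrib.items()}
```

Anything that can read these attributes can equally rewrite settings.xml without the element, which is why the protection should be treated as a UI hint.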

What callers should do

Treat DocumentProtection as a UI hint, not a secret.
For confidentiality, use whole-package encryption (Document.save(..., password="…")), which applies AES under the MS-OFFCRYPTO Agile profile.
Never store a high-value password in the protection hash — assume anyone who opens the file can see / crack it.

Reporting a vulnerability in python-docx itself

If you find a security issue in the loadfix fork (parser memory-exhaustion, ZIP-path traversal, XML external-entity handling, etc.), file an issue on the loadfix monorepo and mark it security. Do not open a public PR that includes a working exploit; drop a note that describes the class of issue and wait for a maintainer to reach out.
