
Improve numbering numId lookup #1553

Open
Sean-Kenneth-Doherty wants to merge 1 commit into python-openxml:master from Sean-Kenneth-Doherty:codex/numbering-next-numid-set

@Sean-Kenneth-Doherty

Fixes #940.

CT_Numbering._next_numId gathered the existing IDs in a list, then checked membership for each candidate ID. Each list membership test is a linear scan, so when the numbering definitions are large and contiguous (IDs 1..n with no gaps), the lookup performs n + 1 such scans and becomes quadratic in the number of <w:num> elements.
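
To make the cost concrete, here is a minimal sketch of the list-based lookup pattern described above. It is a standalone function over numId strings, and the name next_num_id_list is hypothetical; it illustrates the pattern rather than reproducing python-docx's CT_Numbering code verbatim.

```python
def next_num_id_list(num_id_strs):
    """Return the first positive integer not used as a numId (list-based)."""
    num_ids = [int(s) for s in num_id_strs]
    # With IDs 1..n all present, the loop tests n + 1 candidates and each
    # `in` scans the whole list, so total work grows roughly as n**2.
    for candidate in range(1, len(num_ids) + 2):
        if candidate not in num_ids:
            return candidate
    # n IDs cannot occupy all n + 1 slots in 1..n+1, so this is never reached.
    raise AssertionError("unreachable")
```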

This changes the collection to a set, so each membership check is constant-time on average, while preserving the existing behavior of returning the first gap, starting at 1. I also added focused oxml tests for empty, contiguous, and gapped numbering definitions.
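
A minimal sketch of the set-based approach, assuming the numbering part is available as an XML string. It uses the standard library's xml.etree.ElementTree rather than python-docx's lxml-based oxml layer, and the names collect_num_ids and next_num_id are hypothetical. The final lines exercise the same three cases the tests cover: empty, contiguous, and gapped.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

# WordprocessingML main namespace (the `w:` prefix in numbering.xml).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def collect_num_ids(numbering_xml: str) -> list:
    """Collect the w:numId attribute of each direct <w:num> child."""
    root = ET.fromstring(numbering_xml)
    return [num.get(W + "numId") for num in root.findall(W + "num")]


def next_num_id(num_id_strs: Iterable[str]) -> int:
    """Return the first positive integer not already used as a numId."""
    # A set makes each `in` check O(1) on average, so the scan is O(n).
    used = {int(s) for s in num_id_strs}
    # n distinct IDs cannot fill all n + 1 slots in 1..n+1, so a gap exists.
    for candidate in range(1, len(used) + 2):
        if candidate not in used:
            return candidate
    raise AssertionError("unreachable")


sample_xml = (
    '<w:numbering xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:num w:numId="1"/><w:num w:numId="3"/>'
    "</w:numbering>"
)

print(next_num_id([]))                                  # empty: 1
print(next_num_id(str(i) for i in range(1, 5001)))      # contiguous: 5001
print(next_num_id(collect_num_ids(sample_xml)))         # gapped: 2
```

Only the container type differs from the list version; the candidate range and the first-gap semantics are unchanged, which is what keeps the returned IDs identical.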

Local validation:

  • uv run --with pytest --with ruff --with-editable . python -m pytest tests/oxml/test_numbering.py -q
  • uv run --with pytest --with ruff --with-editable . python -m pytest tests/oxml/test_numbering.py tests/parts/test_numbering.py -q
  • uv run --with pytest --with ruff --with-editable . python -m ruff check src/docx/oxml/numbering.py tests/oxml/test_numbering.py
  • git diff --check

Local timing sanity check on 5,000 contiguous IDs, isolating the membership loop:

  • list membership path: 0.387197s
  • set membership path: 0.008164s
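
The PR does not show the benchmark script. A comparable harness might look like the following sketch, assuming timeit and containers built ahead of time so that only the membership loop is timed. The helper first_gap is hypothetical, and absolute timings will vary with the machine and Python version.

```python
import timeit

N = 5_000
id_list = list(range(1, N + 1))  # contiguous IDs 1..N: worst case for the scan
id_set = set(id_list)


def first_gap(container):
    """Return the first positive integer missing from the container."""
    for candidate in range(1, len(container) + 2):
        if candidate not in container:
            return candidate
    raise AssertionError("unreachable")


# Time one full scan of each container; construction is excluded.
list_seconds = timeit.timeit(lambda: first_gap(id_list), number=1)
set_seconds = timeit.timeit(lambda: first_gap(id_set), number=1)

print(f"list membership path: {list_seconds:.6f}s")
print(f"set membership path: {set_seconds:.6f}s")
print(f"speedup: {list_seconds / set_seconds:.1f}x")
```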

Sean-Kenneth-Doherty marked this pull request as ready for review May 16, 2026 22:21
@Sean-Kenneth-Doherty
Author

Moved out of draft after local validation:

  • uv run --with pytest --with ruff --with-editable . python -m pytest tests/oxml/test_numbering.py tests/parts/test_numbering.py -q -> 8 passed
  • uv run --with pytest --with ruff --with-editable . python -m pytest tests/oxml/test_numbering.py -q -> 3 passed
  • uv run --with pytest --with ruff --with-editable . python -m ruff check src/docx/oxml/numbering.py tests/oxml/test_numbering.py
  • git diff --check

Timing sanity check on 5,000 contiguous IDs, isolating the membership loop:

  • list membership path: 0.797563s
  • set membership path: 0.003804s
  • speedup: 209.7x

Development

Successfully merging this pull request may close these issues.

_next_numId takes too long time when generate a big document
