gh-149079: Optimize sorting in unicodedata.normalize() by serhiy-storchaka · Pull Request #150782 · python/cpython

serhiy-storchaka · 2026-06-02T11:54:51Z

Sort the Py_UCS4 buffer instead of the PyUnicodeObject. This avoids the use of PyUnicode_READ() and PyUnicode_WRITE().

Issue: O(n²) insertion sort in unicodedata.normalize("NFC") canonical ordering #149079
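The change boils down to reordering plain code point values in a buffer, instead of going through PyUnicode_READ()/PyUnicode_WRITE() (which switch on the string's kind) for every access. Below is a rough sketch of canonical ordering over a Py_UCS4-style buffer, written as a stable insertion sort by combining class. It is not the PR's actual code. `ucs4_t`, `combining_class()` and `canonical_sort()` are hypothetical names, and the combining-class table is a toy that only knows the two marks used in the benchmarks: U+0300 (class 230) and U+0327 (class 202).

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for CPython's Py_UCS4 (a 32-bit unsigned code point type). */
typedef uint32_t ucs4_t;

/* Hypothetical toy lookup: only the benchmark's marks are known.
 * CPython reads the real Canonical_Combining_Class from its Unicode database. */
static int
combining_class(ucs4_t ch)
{
    switch (ch) {
    case 0x0300: return 230;   /* COMBINING GRAVE ACCENT */
    case 0x0327: return 202;   /* COMBINING CEDILLA */
    default:     return 0;     /* everything else treated as a starter */
    }
}

/* Canonical ordering on a plain buffer: a stable insertion sort by combining
 * class that never moves a character across a starter (class 0).
 * Elements are read and written as ordinary array slots. */
static void
canonical_sort(ucs4_t *buf, size_t len)
{
    for (size_t i = 1; i < len; i++) {
        ucs4_t ch = buf[i];
        int cc = combining_class(ch);
        if (cc == 0) {
            continue;                      /* starters stay in place */
        }
        size_t j = i;
        /* Shift earlier non-starters with a strictly higher class one slot
         * to the right; stopping on '<=' keeps equal classes stable. */
        while (j > 0) {
            int prev = combining_class(buf[j - 1]);
            if (prev == 0 || prev <= cc) {
                break;
            }
            buf[j] = buf[j - 1];
            j--;
        }
        buf[j] = ch;
    }
}
```

For "a" followed by U+0300 U+0327, the sketch leaves "a" in place and swaps the marks to U+0327 U+0300, since 202 < 230. The worst case is still quadratic in the length of a run of combining marks; the sketch only shows the cheaper per-element access.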


serhiy-storchaka · 2026-06-02T12:00:59Z

./python -m timeit 's=("a"+"\u0300\u0327"*1000)*100; from unicodedata import normalize' -- 'normalize("NFC", s)'

Baseline: 100 loops, best of 5: 3.76 msec per loop
This PR: 100 loops, best of 5: 3.57 msec per loop

./python -m timeit 's=("a"+"\u0300\u0327"*9)*10000; from unicodedata import normalize' -- 'normalize("NFC", s)'

Baseline: 100 loops, best of 5: 3.99 msec per loop
This PR: 100 loops, best of 5: 3.84 msec per loop

eendebakpt · 2026-06-02T13:59:12Z

@serhiy-storchaka On your benchmark, I can improve the time from 3.75 ms to 2.0 ms by using a more efficient search in find_nfc_index. See main...eendebakpt:gh-149079-find-nfc-index. The changes in this PR look good at first sight.
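The comment does not say how the faster search works. Assuming find_nfc_index maps a code point to an index through a table of sorted, non-overlapping code point ranges, one common way to speed up such a lookup is a binary search instead of a linear scan. The sketch below illustrates only that general technique. `struct range_entry` and `find_range_index()` are hypothetical, the table layout is an assumption rather than CPython's actual data, and the linked branch may take a different approach.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range entry: code points [start, start + length) map to
 * consecutive indices beginning at `index`. */
struct range_entry {
    uint32_t start;
    uint32_t length;
    int index;
};

/* Binary search over `n` entries sorted by `start` (ranges must not overlap).
 * Returns the mapped index, or -1 if `code` falls in no range. */
static int
find_range_index(const struct range_entry *table, size_t n, uint32_t code)
{
    size_t lo = 0, hi = n;                 /* candidates are in [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct range_entry *e = &table[mid];
        if (code < e->start) {
            hi = mid;                      /* look left */
        }
        else if (code >= e->start + e->length) {
            lo = mid + 1;                  /* look right */
        }
        else {
            return e->index + (int)(code - e->start);
        }
    }
    return -1;
}
```

Each lookup then costs O(log n) comparisons in the number of ranges rather than O(n), which matters when the lookup runs once per character of a long input.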

serhiy-storchaka · 2026-06-02T14:39:04Z

> by using a more efficient search in find_nfc_index.

Looks interesting. But this is a different issue, not directly related to #149079. Can you open a new issue?

pythongh-149079: Optimize sorting in unicodedata.normalize() (commit 4a6e912)

serhiy-storchaka requested a review from sethmlarson June 2, 2026 11:54

serhiy-storchaka added the skip news label Jun 2, 2026

bedevere-app Bot mentioned this pull request Jun 2, 2026 in O(n²) insertion sort in unicodedata.normalize("NFC") canonical ordering #149079 (Open)

bedevere-app Bot added the awaiting core review label Jun 2, 2026

serhiy-storchaka wants to merge 1 commit into python:main from serhiy-storchaka:unicodedata-normalize-optimize

2 participants
