gh-150815: Speed up copy.deepcopy() of containers with atomic elements by gaborbernat · Pull Request #150822 · python/cpython

gaborbernat · 2026-06-02T23:29:10Z

copy.deepcopy() copies a structure by sending every element back through deepcopy(). For elements that need no copying at all — strings, ints, None, booleans, floats and the other immutable atomic types — that round trip costs a full function call each, even though the value handed back is the same object. Real data is dominated by these atomic leaves: a parsed JSON document, a settings dict cloned before mutation, a record copied inside a framework. The keys are strings and most values are strings and numbers, so copying spends most of its time calling deepcopy() only to get the same object straight back.

This folds the atomic-type check that already gates the top of deepcopy() into the dict, list and tuple copiers, so an atomic element is returned as-is without the per-item call. The check is the same one deepcopy() runs, and atomic objects are not memoized either way, so the result is identical for shared references, recursive structures and int/tuple subclasses.
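A minimal sketch of the idea, not the actual patch: it contrasts a list copier that calls back for every element with one that checks the element type inline. `ATOMIC_TYPES`, the two copier functions, the `element_copier` parameter and `make_counter` are hypothetical names for illustration; the real set lives privately in Lib/copy.py and may contain more types.

```python
import copy

# Hypothetical stand-in for the private set of atomic types in Lib/copy.py.
ATOMIC_TYPES = frozenset({type(None), int, float, bool, complex, bytes, str})


def copy_list_per_item_call(x, memo, element_copier=copy.deepcopy):
    """Every element goes back through the copier, atomic or not."""
    y = []
    memo[id(x)] = y
    for a in x:
        y.append(element_copier(a, memo))
    return y


def copy_list_inline_check(x, memo, element_copier=copy.deepcopy):
    """Atomic elements are appended as-is; only the rest pay for a call."""
    y = []
    memo[id(x)] = y
    for a in x:
        y.append(a if type(a) in ATOMIC_TYPES else element_copier(a, memo))
    return y


def make_counter():
    """Return a deepcopy wrapper plus a one-item list holding its call count."""
    count = [0]

    def counting(x, memo):
        count[0] += 1
        return copy.deepcopy(x, memo)

    return counting, count


data = ["name", 1, 2.5, None, True, ["nested"]]
for copier in (copy_list_per_item_call, copy_list_inline_check):
    counting, count = make_counter()
    result = copier(data, {}, counting)
    print(f"{copier.__name__}: {count[0]} calls, equal={result == data}")
```

Because the check is an exact `type(a) in ...` membership test, an instance of an `int` or `tuple` subclass is not treated as atomic and still goes through the normal copy path.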

Deep-copying 105 JSON documents drawn from the top-1000 PyPI projects improves from 1.21 ms to 990 µs, 22% faster. This follows the atomic fast path added in gh-114264, extending it from the entry point to the per-element loop.

| Benchmark | base | patched |
|---|---|---|
| deepcopy 105 real corpus JSON objects | 1.21 ms | 990 µs: 22% faster |

Benchmark (pyperf)

Run base vs patched by swapping Lib/copy.py on the same interpreter. The figure above is from 105 JSON documents in the top-1000 PyPI corpus; the self-contained script below builds an equivalent atomic-heavy structure and shows a comparable percentage gain.

import copy, pyperf

# Representative of parsed-JSON / config data: string keys, scalar leaves.
doc = {
    "name": "example-package", "version": "1.2.3", "private": False,
    "scripts": {"build": "tsc", "test": "pytest", "lint": "ruff check ."},
    "keywords": ["cli", "async", "http", "json"],
    "dependencies": {f"dep{i}": f"^{i}.0.0" for i in range(20)},
    "authors": [{"name": f"Person {i}", "email": f"p{i}@example.com", "active": True} for i in range(10)],
    "config": {"timeout": 30, "retries": 3, "verbose": False, "level": None},
}
objs = [doc] * 50

runner = pyperf.Runner()
runner.bench_func("deepcopy atomic-heavy structures", lambda: [copy.deepcopy(o) for o in objs])

Resolves #150815.

Issue: Speed up copy.deepcopy() of containers holding atomic elements #150815

JelleZijlstra · 2026-06-03T05:03:24Z



-def _deepcopy_list(x, memo, deepcopy=deepcopy):
+def _deepcopy_list(x, memo, deepcopy=deepcopy, _atomic=_atomic_types):


Is this trick of capturing the global in a local still a net positive in newer Python?

gaborbernat · 2026-06-03

The benchmark numbers I provided were measured against a build of the main branch. I haven't tried enabling the JIT or any other advanced features, but out of the box there is a significant benefit here.

JelleZijlstra · 2026-06-03

But did those gains come from using a local or only from doing the type(...) in check?

ZeroIntensity · 2026-06-03

I think we should let the performance people do their thing and not try to artificially speed things up like this.

eendebakpt · 2026-06-03T06:48:45Z

There are more options available to make deepcopy faster (see #91610 (comment)).

If we want to make deepcopy faster, I believe we should gather enough support so a core dev can review a C (or Rust?) implementation. A C implementation gives much larger performance gains (see https://github.com/percolab/copium for example).

gaborbernat · 2026-06-03T14:00:02Z

I don't think a pure-Python tweak and a C/Rust deepcopy are opposing directions — they optimize different ends and can coexist. This change helps every build today with no extension to compile and no new maintenance surface, and it doesn't block or complicate a future C implementation; if one lands, this just becomes a small fast path that the C version supersedes.

I've also reduced this PR to the minimal change: a 3-way benchmark (base / inline-check-with-local-capture / inline-check-using-the-global) showed the entire speedup comes from the inlined type(...) in _atomic_types check, and the local-variable capture added nothing measurable, so I dropped it.
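A rough sketch of the two inline-check variants from that comparison, assuming a simplified list copier; this is not the benchmark that was actually run. `ATOMIC_TYPES` and both function names are hypothetical, and `timeit` stands in for pyperf to keep the example dependency-free.

```python
import timeit
from copy import deepcopy

# Hypothetical stand-in for the private atomic-type set in Lib/copy.py.
ATOMIC_TYPES = frozenset({type(None), int, float, bool, complex, bytes, str})


def copy_list_global(x, memo):
    """Inline check; ATOMIC_TYPES and deepcopy are looked up as globals."""
    y = []
    memo[id(x)] = y
    for a in x:
        y.append(a if type(a) in ATOMIC_TYPES else deepcopy(a, memo))
    return y


def copy_list_local(x, memo, deepcopy=deepcopy, _atomic=ATOMIC_TYPES):
    """Same check; both names are bound once as default arguments."""
    y = []
    memo[id(x)] = y
    for a in x:
        y.append(a if type(a) in _atomic else deepcopy(a, memo))
    return y


# Mostly atomic elements plus a few nested lists (sizes are arbitrary).
data = list(range(1000)) + [[i] for i in range(10)]
for fn in (copy_list_global, copy_list_local):
    seconds = timeit.timeit(lambda: fn(data, {}), number=200)
    print(f"{fn.__name__}: {seconds / 200 * 1e6:.1f} µs per copy")
```

The two functions return equal results; only name resolution differs, so any timing gap between them isolates the effect of the local capture on a given build.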

…lements

The dict, list and tuple deep-copiers send every element back through deepcopy(), paying a function call even for atomic immutable elements that deepcopy() returns unchanged. Inline the atomic-type check into the three copiers so those elements are returned as-is. Behavior is identical, including shared references, recursion and int/tuple subclasses.

The benchmark shows the speedup comes entirely from the inlined type(...) in _atomic_types check, not from binding the global to a local default argument, so drop the local capture for a minimal change.

Address review: the inline atomic check recomputed type() for non-atomic elements (once inline, once inside deepcopy()), a measurable regression on non-atomic-heavy containers. Split deepcopy() into a thin entry plus _deepcopy_fallback() that takes the already-computed class, so each element is typed exactly once whether or not it is atomic. Apply the same pattern to the list, tuple, dict and frozendict copiers.

gaborbernat · 2026-06-03T22:02:11Z

@Bobronium is right: the first version ran type() twice for every non-atomic element, once in the inline guard and once again inside deepcopy(). I measured the cost, tested three designs to address it, and picked the one that is fastest with the smallest regression.

Method

I compared each version of Lib/copy.py on the same built interpreter (3.16.0a0), loaded as the copy module, driven by pyperf (1 warmup + 3 values × 20 processes, 5 independent runs per version). Coefficient of variation across the 5 run-means stayed at or below 1.1% on every workload, so the gaps below are real rather than jitter. Workloads span all-atomic to all-non-atomic and include the worst case for the regression, empties: a list of 2000 [] elements, which are non-atomic and hold no atomic data to amortize the check against.
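A small sketch of the noise check described above, assuming the coefficient of variation is the sample standard deviation of the run-means divided by their mean. It uses `timeit` rather than pyperf, and the `list[int]` size is an assumption since the original size is not stated; only the 2000-element `empties` workload comes from the description.

```python
import copy
import statistics
import timeit


def coefficient_of_variation(run_means):
    """Sample standard deviation divided by the mean, as a fraction."""
    return statistics.stdev(run_means) / statistics.mean(run_means)


workloads = {
    # Worst case from the description: 2000 distinct empty lists.
    "empties": [[] for _ in range(2000)],
    # Size chosen arbitrarily for illustration.
    "list[int]": list(range(1000)),
}


def run_means(obj, runs=5, number=20):
    """Mean seconds per deepcopy() call for each of `runs` timing runs."""
    totals = timeit.repeat(lambda: copy.deepcopy(obj), repeat=runs, number=number)
    return [total / number for total in totals]


for name, obj in workloads.items():
    means = run_means(obj)
    print(
        f"{name}: mean {statistics.mean(means) * 1e3:.3f} ms, "
        f"CV {coefficient_of_variation(means):.1%}"
    )
```

In-process `timeit` runs are noisier than pyperf's separate worker processes, so the CV printed here will not necessarily match the 1.1% bound quoted above.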

Options tested

1. Inline guard (the first push): `a if type(a) in _atomic_types else deepcopy(a, memo)` in each copier. Skips the call for atomic elements but types non-atomic elements twice.
2. Inline guard + local capture: same, with `type` and `_atomic_types` bound to locals/default args in each copier.
3. Single type(): split `deepcopy()` into a thin entry plus an internal `_deepcopy_fallback(x, memo, cls)` that takes the already-computed class. Each element is typed once whether or not it is atomic.
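The single-type() shape can be sketched as follows. This is a simplified illustration under stated assumptions, not the code on the branch: `ATOMIC_TYPES`, `deepcopy_entry` and `deepcopy_fallback` are hypothetical names, only lists are handled inline, and every other type is deferred to the stdlib `copy.deepcopy()`.

```python
import copy

# Hypothetical stand-in for the private atomic-type set in Lib/copy.py.
ATOMIC_TYPES = frozenset({type(None), int, float, bool, complex, bytes, str})


def deepcopy_entry(x, memo=None):
    """Thin entry point: compute the type once and return atomics as-is."""
    cls = type(x)
    if cls in ATOMIC_TYPES:
        return x
    return deepcopy_fallback(x, {} if memo is None else memo, cls)


def deepcopy_fallback(x, memo, cls):
    """Copy a non-atomic object whose class the caller already computed."""
    key = id(x)
    if key in memo:
        # Shared or recursive reference: reuse the copy made earlier.
        return memo[key]
    if cls is list:
        y = []
        memo[key] = y
        for a in x:
            acls = type(a)  # typed exactly once per element
            y.append(a if acls in ATOMIC_TYPES else deepcopy_fallback(a, memo, acls))
        return y
    # Anything else is outside this sketch; let the stdlib handle it.
    return copy.deepcopy(x, memo)


shared = [1, "a"]
data = [shared, shared, 3, {"k": [None]}]
clone = deepcopy_entry(data)
print(clone == data, clone[0] is clone[1], clone[0] is shared)
```

Passing `acls` into the fallback is what removes the second `type()` call that the inline guard alone paid for every non-atomic element.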

Results (speedup vs `main`, higher is better)

| Benchmark | base | 1. inline | 2. inline+local | 3. single-type |
|---|---|---|---|---|
| json_corpus (atomic-heavy) | 1.777 ms | 1.34x | 1.39x | 1.40x |
| list[int] (pure atomic) | 0.131 ms | 2.11x | 2.27x | 2.19x |
| empties (worst non-atomic) | 1.804 ms | 0.96x | 0.96x | 0.98x |
| nested dict (realistic) | 8.817 ms | 1.13x | 1.15x | 1.18x |
| list[Obj] (non-atomic obj) | 6.470 ms | 1.05x | 1.07x | 1.06x |

How I chose

Option 2 (local capture) buys about 3% on the atomic cases but leaves the worst case at 0.96x, since it removes a global lookup, not the second type() call. It does not address the review. (I also tried option 3 plus local capture; it came out slightly worse than plain option 3 on the worst case and the realistic case, so the extra binding was not worth it.)

Option 3 wins on every axis that matters: it has the smallest regression (0.98x vs 0.96x, halving the worst-case loss), it is fastest on the realistic mixed workload (+18%), and it keeps the atomic gains (1.40x and 2.19x). The regression that remains is narrow: it needs non-atomic elements holding no atomic data anywhere in their subtree. Any atomic leaf tips the structure positive, which is why nested dict lands at +18%.
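To make the ratios concrete, here is a short sketch that converts speedups from the results table into implied run times. It assumes a speedup is defined as base time divided by patched time; the function names are hypothetical and the derived times are arithmetic, not separate measurements.

```python
def patched_ms(base_ms, speedup):
    """Run time implied by a base time and a speedup ratio (base / patched)."""
    return base_ms / speedup


def extra_time_percent(speedup):
    """Extra run time relative to base, in percent, for a speedup below 1.0."""
    return (1.0 / speedup - 1.0) * 100.0


# From the results table: (base in ms, option 1 speedup, option 3 speedup).
rows = {
    "empties": (1.804, 0.96, 0.98),
    "nested dict": (8.817, 1.13, 1.18),
}
for name, (base, inline, single) in rows.items():
    print(
        f"{name}: option 1 {patched_ms(base, inline):.3f} ms, "
        f"option 3 {patched_ms(base, single):.3f} ms"
    )
print(
    f"worst-case extra time: {extra_time_percent(0.96):.1f}% "
    f"vs {extra_time_percent(0.98):.1f}%"
)
```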

The branch now uses option 3, applied to the list, tuple, dict and frozendict copiers. test_copy (83 tests) and test_pickle pass. Raw pyperf JSON for all runs is available if anyone wants to verify.

bedevere-app Bot added the awaiting review label Jun 2, 2026

bedevere-app Bot mentioned this pull request Jun 2, 2026

Speed up copy.deepcopy() of containers holding atomic elements #150815

Open

gaborbernat force-pushed the opt/deepcopy-inline-atomic branch from aeab6f7 to 8b8b7e8 Compare June 2, 2026 23:45

JelleZijlstra reviewed Jun 3, 2026


gaborbernat requested review from JelleZijlstra and ZeroIntensity June 3, 2026 14:38

Bobronium reviewed Jun 3, 2026


Comment thread Lib/copy.py Outdated

gaborbernat added 3 commits June 3, 2026 11:26

Use the _atomic_types global directly without capturing it

8968c33


gaborbernat force-pushed the opt/deepcopy-inline-atomic branch from 6ceb409 to 0c3f9e1 Compare June 3, 2026 21:58

gaborbernat wants to merge 3 commits into python:main from gaborbernat:opt/deepcopy-inline-atomic


5 participants


