gh-150821: Skip URL parsing in mimetypes.guess_type() for file paths by gaborbernat · Pull Request #150828 · python/cpython

gaborbernat · 2026-06-02T23:29:25Z

mimetypes.guess_type() accepts either a URL or a filesystem path, so it parses its argument as a URL with urllib.parse.urlparse() before looking at the extension. The common argument is a plain file path, which has no URL scheme to find, so the parse — and the urllib.parse import it triggers — is spent on nothing. Guessing content types from file names is everywhere: static-file servers, upload handlers, archive and build tools deciding how to treat each file as they walk a tree of thousands.

A URL scheme requires a :, so a path without one cannot be a URL. This detects that case and goes straight to extension lookup, skipping urlparse() and its lazy import. Real URLs, and the rare path that contains a :, still take the full parsing path, and results are unchanged for both.
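
The invariant behind the fast path can be checked with a small sketch. The helper name `may_have_url_scheme` is hypothetical and only illustrates the idea; it is not the code in the patch. The sample names are illustrative.

```python
from urllib.parse import urlparse


def may_have_url_scheme(arg: str) -> bool:
    """Hypothetical helper: only a string containing ':' can carry a URL scheme."""
    return ":" in arg


# Colon-free names, including one with '#', which urlparse would split off as a fragment.
plain_names = ["README.md", "setup.cfg", ".flake8", "dir/sub/tox.ini", "report#1.txt"]

for name in plain_names:
    # No ':' means urlparse() can never report a scheme for this argument,
    # so parsing it would not change which branch guess_type() takes.
    assert not may_have_url_scheme(name)
    assert urlparse(name).scheme == ""

# A real URL contains ':' and therefore still needs the full parse.
url = "https://example.com/index.html"
assert may_have_url_scheme(url)
assert urlparse(url).scheme == "https"
```

A plain substring test is enough here because the check only has to be conservative: anything containing a `:` falls back to the existing parsing path.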

Guessing types for 15 real file names sampled from the top-1000 corpus improves from 23.4 µs to 11.0 µs, 112% faster.

| Benchmark | base | patched |
|---|---|---|
| guess_type x15 file paths | 23.4 µs | 11.0 µs: 112% faster |

Benchmark (pyperf)

Run base vs patched by swapping Lib/mimetypes.py on the same interpreter. The names are real file names sampled from the top-1000 corpus.

import mimetypes

import pyperf

# Load the type maps up front so initialization is not part of the measurement.
mimetypes.init()

names = ["webhook_list.py", "tox.ini", "api_management_delete_policy.py",
    ".env.sample.entra-id", "alerts_get_by_id.py", "ai_prompt_workflow.md",
    "functions.py", "sample_connections.py", "certificate_delete.py",
    "_ai_agents_instrumentor.py", ".flake8", "agent_trace_configurator.py",
    "test_ws_invoke.py", "README.md", "setup.cfg"]

runner = pyperf.Runner()
runner.bench_func("guess_type x15 file paths",
                  lambda: [mimetypes.guess_type(n) for n in names])

Resolves #150821.

Issue: Speed up mimetypes.guess_type() for plain file paths #150821

…paths guess_type() parsed every argument as a URL before checking the extension, even for plain file paths that have no scheme. Detect the no-scheme case and go straight to extension lookup, avoiding urlparse() and its lazy import. Real URLs keep the full parsing path; results are unchanged.

sobolevn

Thanks!

sobolevn · 2026-06-03T06:20:42Z

            scheme = p.scheme
            url = p.path
        else:
            return self.guess_file_type(url, strict=strict)


question: can this branch still happen now? do we have tests for this case?

Yes, it can still happen: a ':' that isn't a real URL scheme — a single-letter Windows drive like c:fake.html, or a colon elsewhere in the name like note 12:30.txt — gives urlparse an empty or single-character scheme, so it falls through to this branch and is treated as a file path. I've added test_path_with_colon_but_no_url_scheme covering those cases.
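
As a rough illustration of the two cases described above (not the test added in the PR), the following sketch shows what `urlparse()` reports for each argument and that both are still resolved by their extension:

```python
import mimetypes
from urllib.parse import urlparse

# Arguments that contain ':' but do not carry a real URL scheme.
drive_like = urlparse("c:fake.html")        # single-letter "scheme", like a Windows drive
colon_in_name = urlparse("note 12:30.txt")  # "note 12" contains a space, so it is not a scheme

print(repr(drive_like.scheme), repr(drive_like.path))
print(repr(colon_in_name.scheme), repr(colon_in_name.path))

# guess_type() only treats schemes longer than one character as URLs,
# so both arguments end up in the file path branch.
html_type, _ = mimetypes.guess_type("c:fake.html")
txt_type, _ = mimetypes.guess_type("note 12:30.txt")
print(html_type, txt_type)
```

The first argument yields a one-character scheme and the second an empty one, which is why neither passes the `len(p.scheme) > 1` check in `guess_type()`.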

Address review: frame the fast path as 'no colon means it cannot be a URL' (a file path may legitimately contain ':' on POSIX), add the blank line before the lazy import, and cover a ':'-containing argument that is not a URL (single-letter drive, colon in the name) reaching the file path branch.

Replace the in-function import with a top-level lazy import, keeping urllib.parse off the module import path while declaring it with the other imports.

bitdancer · 2026-06-03T18:15:28Z

 except ImportError:
    _winreg = None

+lazy import urllib.parse


I believe our standard style is to put regular import lines above the try/except clauses.

Done — moved os, posixpath, and urllib.parse to module-level lazy imports at the top, above the platform try/except blocks.

bitdancer · 2026-06-03T18:15:37Z

@@ -123,10 +125,14 @@ def guess_type(self, url, strict=True):
        """
        # Lazy import to improve module import time


This comment now only applies to os...which could also be made a lazy import at the top now that we have language supported lazy imports ;) Then the comment can go away.

Good call. Made os (and posixpath) module-level lazy imports as well and removed the now-redundant comments. test_mimetypes still passes, including test_lazy_import.

Per review: place the lazy imports at the top of the module with the other imports, above the platform try/except blocks, instead of inside the functions, and drop the now-redundant 'Lazy import to improve module import time' comments.

bitdancer

LGTM, but I'm going to leave to someone with more recent involvement in the mimetypes module to merge.

gaborbernat requested a review from a team as a code owner June 2, 2026 23:29

bedevere-app Bot mentioned this pull request Jun 2, 2026

Speed up mimetypes.guess_type() for plain file paths #150821

Open

bedevere-app Bot added the awaiting review label Jun 2, 2026

gaborbernat force-pushed the opt/mimetypes-skip-urlparse branch from d35cf04 to 175764e Compare June 2, 2026 23:45

sobolevn reviewed Jun 3, 2026

StanFromIreland reviewed Jun 3, 2026

StanFromIreland reviewed Jun 3, 2026

Use a module-level lazy import for urllib.parse

eb9d878

gaborbernat requested review from StanFromIreland, bitdancer and sobolevn June 3, 2026 15:08

Re-run CI (flaky test_ssl on macOS)

a05f1d1

bitdancer reviewed Jun 3, 2026

bitdancer approved these changes Jun 3, 2026

bedevere-app Bot added awaiting merge and removed awaiting review labels Jun 3, 2026

gaborbernat wants to merge 5 commits into python:main from gaborbernat:opt/mimetypes-skip-urlparse

4 participants
