“Valid YouTube links rejected when embedded in text (URL parsing bug)” #328

@Aphotic

Description:
ytdlp fails to extract YouTube URL from mixed text (non-URL prefix / multilingual text)
When passing input that contains a valid YouTube link embedded within additional text (including timestamps, labels, or non-English characters), the downloader fails to parse the URL and throws a generic URL validation error.

Example input:

[2026-03-26 18:06] Stash: https://youtu.be/Xlw1bbivnio

Error:

ERROR: [generic] '[2026-03-26 18:06] Stash: https://youtu.be/Xlw1bbivnio' is not a valid URL

Expected behavior:
The application should reliably extract and process valid YouTube URLs of either form (short or long), even when they are embedded within surrounding text, regardless of:

  • Prefix/suffix text (timestamps, labels, logs, etc.)
  • Language or character set
  • Mixed or unstructured input

Actual behavior:
The entire string is treated as a URL, causing validation failure instead of isolating the valid link.

Environment:

  • OS: Windows 10

Suggested improvement:
Implement URL extraction/parsing logic that:

  • Detects and isolates valid URLs within arbitrary text
  • Supports multilingual and mixed-character input
  • Gracefully ignores surrounding non-URL content
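
As a rough illustration of the extraction step suggested above, here is a minimal Python sketch. It is not taken from the project's code: the pattern `YOUTUBE_URL_RE` and the helper `extract_youtube_urls` are hypothetical names, and the regex assumes 11-character video IDs and covers only `youtu.be/<id>` and `youtube.com/watch?v=<id>` links.

```python
import re

# Hypothetical pattern (an assumption, not the project's actual logic):
# matches short youtu.be links and long youtube.com/watch?v= links,
# assuming video IDs are 11 characters from [A-Za-z0-9_-].
YOUTUBE_URL_RE = re.compile(
    r"https?://(?:www\.|m\.)?"
    r"(?:youtu\.be/|youtube\.com/watch\?(?:[^\s#]*&)?v=)"
    r"[A-Za-z0-9_-]{11}"
)


def extract_youtube_urls(text: str) -> list[str]:
    """Return every YouTube URL found in text, in order of appearance.

    Surrounding content (timestamps, labels, non-English text) is ignored.
    Query parameters that follow the video ID are not included in the result.
    """
    return [match.group(0) for match in YOUTUBE_URL_RE.finditer(text)]


if __name__ == "__main__":
    line = "[2026-03-26 18:06] Stash: https://youtu.be/Xlw1bbivnio"
    print(extract_youtube_urls(line))
```

With something along these lines, each extracted URL could be handed to the downloader on its own, and input with no recognizable link would yield an empty list rather than a validation error on the whole string.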

Impact:
Prevents batch processing or automation workflows where URLs are embedded in logs, chat exports, or multilingual text sources.
