Skip to content

refactor: unify markdown parsing logic to avoid memory leaking#21981

Open
ShirasawaSama wants to merge 1 commit into
open-webui:devfrom
ShirasawaSama:refactor/unify-markdown-parsing-logic
Open

refactor: unify markdown parsing logic to avoid memory leaking#21981
ShirasawaSama wants to merge 1 commit into
open-webui:devfrom
ShirasawaSama:refactor/unify-markdown-parsing-logic

Conversation

@ShirasawaSama

@ShirasawaSama ShirasawaSama commented Feb 28, 2026

Copy link
Copy Markdown
Contributor

Pull Request Checklist

Note to first-time contributors: Please open a discussion post in Discussions to discuss your idea/fix with the community before creating a pull request, and describe your changes before submitting a pull request.

This is to ensure large feature PRs are discussed with the community first, before starting work on it. If the community does not want this feature or it is not relevant for Open WebUI as a project, it can be identified in the discussion before working on the feature and submitting the PR.

Before submitting, make sure you've checked the following:

  • Target branch: Verify that the pull request targets the dev branch. PRs targeting main will be immediately closed.
  • Description: Provide a concise description of the changes made in this pull request down below.
  • Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • Documentation: Add docs in Open WebUI Docs Repository. Document user-facing behavior, environment variables, public APIs/interfaces, or deployment steps.
  • Dependencies: Are there any new or upgraded dependencies? If so, explain why, update the changelog/docs, and include any compatibility notes. Actually run the code/function that uses updated library to ensure it doesn't crash.
  • Testing: Perform manual tests to verify the implemented fix/feature works as intended AND does not break any other functionality. Include reproducible steps to demonstrate the issue before the fix. Test edge cases (URL encoding, HTML entities, types). Take this as an opportunity to make screenshots of the feature/fix and include them in the PR description.
  • Agentic AI Code: Confirm this Pull Request is not written by any AI Agent or has at least gone through additional human review AND manual testing. If any AI Agent is the co-author of this PR, it may lead to immediate closure of the PR.
  • Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
  • Design & Architecture: Prefer smart defaults over adding new settings; use local state for ephemeral UI logic. Open a Discussion for major architectural or UX changes.
  • Git Hygiene: Keep PRs atomic (one logical change). Clean up commits and rebase on dev to ensure no unrelated commits (e.g. from main) are included. Push updates to the existing PR branch instead of closing and reopening.
  • Title Prefix: To clearly categorize this pull request, prefix the pull request title using one of the following:
    • BREAKING CHANGE: Significant changes that may affect compatibility
    • build: Changes that affect the build system or external dependencies
    • ci: Changes to our continuous integration processes or workflows
    • chore: Refactor, cleanup, or other non-functional code changes
    • docs: Documentation update or addition
    • feat: Introduces a new feature or enhancement to the codebase
    • fix: Bug fix or error correction
    • i18n: Internationalization or localization changes
    • perf: Performance improvement
    • refactor: Code restructuring for better maintainability, readability, or scalability
    • style: Changes that do not affect the meaning of the code (white space, formatting, missing semi-colons, etc.)
    • test: Adding missing tests or correcting existing tests
    • WIP: Work in progress, a temporary label for incomplete or ongoing work

Changelog Entry

Description

The current logic for parsing Markdown and registering Marked extensions is scattered across various files, leading to the following issues:

  1. Svelte component contexts are mounted onto Marked's global object, preventing Svelte components from being garbage collected and causing memory leaks.
  2. The code is difficult to maintain, and adding new extensions later cannot be managed centrally.
  3. Multiple component loads continuously add the same extensions to the marked global object, causing parsing performance degradation.
  4. The subsequent asynchronous worker parsing of Markdown functionality cannot be implemented.

Added

  • [List any new features, functionalities, or additions]

Changed

  • refactor: unify markdown parsing logic

Deprecated

  • [List any deprecated functionality or features that have been removed]

Removed

  • [List any removed features, files, or functionalities]

Fixed

  • [List any fixes, corrections, or bug fixes]

Security

  • [List any new or updated security-related changes, including vulnerability fixes]

Breaking Changes

  • BREAKING CHANGE: [List any breaking changes affecting compatibility or functionality]

Additional Information

  • [Insert any additional context, notes, or explanations for the changes]
    • [Reference any related issues, commits, or other relevant information]

Screenshots or Videos

  • [Attach any relevant screenshots or videos demonstrating the changes]

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.

@ShirasawaSama ShirasawaSama marked this pull request as draft February 28, 2026 21:25
@pr-validator-bot

Copy link
Copy Markdown

👋 Welcome and Thank You for Contributing!

We appreciate you taking the time to submit a pull request to Open WebUI!

⚠️ Important: Testing Requirements

We've recently seen an increase in PRs that have significant issues:

  • PRs that don't actually fix the bug they claim to fix
  • PRs that don't implement the feature they describe
  • PRs that break existing functionality
  • PRs that are clearly AI-generated without proper testing being done by the author
  • PRs that simply don't work as intended

These untested PRs consume significant time from maintainers and volunteer contributors who review and test PRs in their free time.
Time that could be spent testing other PRs or improving Open WebUI in other ways.

Before marking your PR as "Ready for Review":

Please explicitly confirm:

  1. ✅ You have personally tested ALL changes in this PR
  2. How you tested it (specific steps you took to verify it works)
  3. Visual evidence where applicable (screenshots or videos showing the feature/fix working) - if applicable to your specific PR

If you're not certain your PR works exactly as intended, please leave it in DRAFT mode until you've thoroughly tested it.

Thank you for helping us maintain quality and respecting the time of our community! 🙏

@ShirasawaSama ShirasawaSama marked this pull request as ready for review March 1, 2026 12:11
@ShirasawaSama ShirasawaSama force-pushed the refactor/unify-markdown-parsing-logic branch from 4644d15 to fc5742b Compare March 1, 2026 12:14
@Classic298

Copy link
Copy Markdown
Collaborator

👍

@tjbck

tjbck commented Mar 2, 2026

Copy link
Copy Markdown
Contributor

@Classic298 this needs to be tested

@ShirasawaSama

Copy link
Copy Markdown
Contributor Author

@Classic298 Would you be available to help me test this? My local testing might not cover all scenarios. I've noticed this component has a pretty severe memory leak.

@Classic298

Copy link
Copy Markdown
Collaborator

@ShirasawaSama will test, can you rebase this to latest dev? Thanks

@ShirasawaSama ShirasawaSama force-pushed the refactor/unify-markdown-parsing-logic branch from 2830e28 to 753a919 Compare March 9, 2026 07:57
@ShirasawaSama

Copy link
Copy Markdown
Contributor Author

@Classic298 I have rebased to the latest dev branch and revalidated the core scenarios locally.

@ShirasawaSama

Copy link
Copy Markdown
Contributor Author

Has there been any progress on this?

@ShirasawaSama ShirasawaSama force-pushed the refactor/unify-markdown-parsing-logic branch from 753a919 to 7c7eed7 Compare March 28, 2026 16:22
@ShirasawaSama

Copy link
Copy Markdown
Contributor Author

@Classic298 In fact, we’ve found that a major cause of memory leaks right now is that the scope of the entire chat component is leaking into the marked global instance, which causes memory to keep growing and prevents it from being garbage collected.

@Classic298

Copy link
Copy Markdown
Collaborator

@pr-validator-bot pls review

@Classic298

Copy link
Copy Markdown
Collaborator

too many files to review properly

@Classic298

Copy link
Copy Markdown
Collaborator

@pr-validator-bot review

@pr-validator-bot

Copy link
Copy Markdown

🔍 Automated Code Review

Code Review Findings (click to expand)

src/lib/components/chat/Messages/Markdown/MarkdownTokens.svelte (line ~443): The template uses marked.lexer(decode(token.text)) but the import was changed from import { marked, type Token } from 'marked' to import { lexer, type Token } from '$lib/utils/marked'. Since marked is no longer imported, marked.lexer will throw a runtime error. Should be lexer(decode(token.text)) to match the new import.

This is an automated review. If you believe any finding is incorrect, please explain in a comment below.

@Classic298

Copy link
Copy Markdown
Collaborator

@ShirasawaSama

@ShirasawaSama ShirasawaSama force-pushed the refactor/unify-markdown-parsing-logic branch from 7c7eed7 to 6406f94 Compare March 29, 2026 18:16
@ShirasawaSama

Copy link
Copy Markdown
Contributor Author

@pr-validator-bot review

@pr-validator-bot

Copy link
Copy Markdown

✅ Automated Code Review

No significant issues were detected in this PR.

@ShirasawaSama ShirasawaSama force-pushed the refactor/unify-markdown-parsing-logic branch from 6406f94 to f3e07af Compare April 14, 2026 18:16
@ShirasawaSama

Copy link
Copy Markdown
Contributor Author

@pr-validator-bot review

@pr-validator-bot

Copy link
Copy Markdown

🤖 Automated Code Review

🔍 Standard Review — 🔍 Findings
  • src/lib/components/NotificationToast.svelte + src/lib/utils/marked/index.ts: the refactor changed renderMarkdownToHTMLSanitized to accept Marked parse options, but NotificationToast still passes { ALLOWED_TAGS: [] } (a DOMPurify option). That option is now ignored by sanitization and is instead passed to marked.parse, so the previous behavior (strip all HTML tags from toast content before markdown rendering) is lost. This is a real behavior regression in user-visible sanitization logic.
🔎 Thorough Review — 🔍 Findings

Warning – Sanitization options are being passed to the markdown parser, not to DOMPurify

File: src/lib/components/NotificationToast.svelte (usage) and src/lib/utils/marked/index.ts (API behavior)

NotificationToast now calls:

renderMarkdownToHTMLSanitized(content, { ALLOWED_TAGS: [] })

But in renderMarkdownToHTMLSanitized, the second argument is forwarded to renderMarkdownToHTML (Marked options), and then DOMPurify.sanitize(html) is called without any caller-provided sanitize config.

So ALLOWED_TAGS: [] is currently ignored for sanitization. This is a behavior regression from the previous implementation, where the toast explicitly stripped tags before/after markdown parse.

Why this matters:

  • The call site strongly implies stricter sanitization intent (ALLOWED_TAGS: []), but the new API silently does something else.
  • This can reintroduce formatting/HTML in places that intended plain-text-only rendering.

Recommendation:

  • Split API options into two explicit parameters, e.g.:
    • markedOptions?: MarkedParseOptions
    • sanitizeOptions?: DOMPurify.Config
  • Or add a dedicated helper for strict/plain-text rendering and update NotificationToast to use that.

SuggestionrenderMarkdownToHTMLSanitized can silently return empty content for async parse mode

File: src/lib/utils/marked/index.ts

Current behavior:

  • renderMarkdownToHTML may return Promise<string> when { async: true } is passed.
  • renderMarkdownToHTMLSanitized treats non-string result as '' and sanitizes that.

This creates a silent failure mode (content disappears) instead of surfacing misuse.

Recommendation:

  • Guard and throw when async mode is used in this sync helper, or make the helper async-aware.
  • Also consider narrowing the type so async mode is not allowed in renderMarkdownToHTMLSanitized at compile time.

Note on the thorough review: these findings are deliberately opinionated. Use your own judgment on scope — not every finding needs to be fixed in this PR; some may belong in a follow-up, and some may not be worth fixing at all given the context of this repository and this change. The thorough review intentionally flags things a senior reviewer might raise, not things that must be fixed. Treat naming and style suggestions especially as input, not as blockers.

Two reviews were run. The standard review only flags concrete, demonstrable bugs. The thorough review also includes opinionated suggestions about architecture, style, and naming.

@ShirasawaSama ShirasawaSama force-pushed the refactor/unify-markdown-parsing-logic branch from f3e07af to 3f7b440 Compare April 14, 2026 18:26
@ShirasawaSama

Copy link
Copy Markdown
Contributor Author

@pr-validator-bot review

@pr-validator-bot

Copy link
Copy Markdown

🤖 Automated Code Review

🔍 Standard Review — ✅ No issues

No issues detected.

🔎 Thorough Review — 🔍 Findings

Markdown utility refactor review

Overall this is a solid consolidation and removes a lot of repeated marked.use(...) side effects. I do see a few correctness/behavior risks in the new central API.


WarningrenderMarkdownToHTMLSanitized can silently return empty output when async parsing is enabled

File: src/lib/utils/marked/index.ts

renderMarkdownToHTMLSanitized() calls renderMarkdownToHTML() and then does:

const html = typeof raw === 'string' ? raw : '';

Because MarkedParseOptions includes async?: boolean, callers can legally pass markedOptions: { async: true }. In that case renderMarkdownToHTML() returns Promise<string>, and this helper will silently sanitize '' and return an empty string.

That failure mode is easy to miss in UI and hard to debug.

Recommendation:

  • Either make sanitized helper explicitly sync-only at the type level (exclude async), or
  • Detect promise return and throw a clear error (renderMarkdownToHTMLSanitized does not support async parsing).

Warning — Global breaks: true default changes markdown semantics in callsites that previously used default marked.parse

Files:

  • src/lib/utils/marked/marked.ts
  • src/lib/components/notes/NoteEditor.svelte
  • src/lib/components/notes/Notes.svelte
  • (and other new renderMarkdownToHTML(...) callsites without explicit breaks)

The centralized parser now sets:

const baseOptions = { ..., breaks: true, gfm: true }

Several migrated callsites now use renderMarkdownToHTML(markdown) without overriding breaks. Previously they called plain marked.parse(markdown) (default breaks: false).

This is a behavior change: single newlines now become hard breaks (<br>), which can alter persisted note HTML and rendering output.

Recommendation:

  • If this was intentional, document it explicitly in the PR/changelog because it is user-visible.
  • If not intentional, default breaks to previous behavior and pass { breaks: true } only where needed (chat-oriented renderers).

SuggestionrenderMarkdownToHTML accepts null but does not normalize it before passing to marked.parse

File: src/lib/utils/marked/marked.ts

Signature allows options?: MarkedParseOptions | null, but implementation forwards options directly casted:

return markedInstance.parse(markdown, options as MarkedOptions)

You already normalize null safely in lexer() via options ?? undefined. Doing the same here would make behavior consistent and avoid depending on marked.parse handling of null.

Recommendation:

return markedInstance.parse(markdown, (options ?? undefined) as MarkedOptions)

Note on the thorough review: these findings are deliberately opinionated. Use your own judgment on scope — not every finding needs to be fixed in this PR; some may belong in a follow-up, and some may not be worth fixing at all given the context of this repository and this change. The thorough review intentionally flags things a senior reviewer might raise, not things that must be fixed. Treat naming and style suggestions especially as input, not as blockers.

Two reviews were run. The standard review only flags concrete, demonstrable bugs. The thorough review also includes opinionated suggestions about architecture, style, and naming.

@ShirasawaSama ShirasawaSama force-pushed the refactor/unify-markdown-parsing-logic branch from 3f7b440 to 3bf95aa Compare April 14, 2026 18:39
@ShirasawaSama

Copy link
Copy Markdown
Contributor Author

@pr-validator-bot review

@pr-validator-bot

Copy link
Copy Markdown

🤖 Automated Code Review

🔍 Standard Review — ✅ No issues

No issues detected.

🔎 Thorough Review — 🔍 Findings

⚠️ Correctness / Behavior Regression

1) NotificationToast changed sanitization semantics and now strips all markdown formatting

File: src/lib/components/NotificationToast.svelte

The old code did a two-step sanitize:

  1. sanitize raw input with ALLOWED_TAGS: [] (strip raw HTML),
  2. parse markdown,
  3. sanitize rendered HTML with default DOMPurify config.

The new call:

renderMarkdownToHTMLSanitized(content, {
  sanitizeOptions: { ALLOWED_TAGS: [] }
})

applies ALLOWED_TAGS: [] to the final rendered HTML, which removes all markdown-rendered tags too (<strong>, <em>, links, etc.).

So this is not equivalent behavior; it effectively turns output into plain text. If that wasn’t intentional, this is a regression.


💡 Type Safety / Maintainability

2) Widespread as string casts hide a real string | Promise<string> contract

Files (pattern):

  • src/lib/components/admin/Settings/Models.svelte
  • src/lib/components/chat/ModelSelector/ModelItem.svelte
  • src/lib/components/workspace/Models.svelte
  • src/lib/components/workspace/Models/BuiltinTools.svelte
  • src/lib/components/workspace/Models/Capabilities.svelte
  • src/lib/components/workspace/Models/DefaultFeatures.svelte
  • src/lib/components/notes/NoteEditor.svelte
  • src/lib/components/notes/NoteEditor/Chat.svelte
  • src/lib/components/notes/Notes.svelte
  • src/lib/components/common/RichTextInput.svelte

A lot of new usages do:

renderMarkdownToHTML(...) as string

renderMarkdownToHTML is explicitly typed as string | Promise<string>. The cast suppresses that risk and can silently produce incorrect runtime behavior if async parsing is ever enabled (or a future option/callback makes parse async).

Not a blocker today, but this is fragile. Prefer either:

  • a dedicated sync-only wrapper (renderMarkdownToHTMLSync) that enforces async: false, or
  • explicit handling/narrowing instead of type assertions.

Note on the thorough review: these findings are deliberately opinionated. Use your own judgment on scope — not every finding needs to be fixed in this PR; some may belong in a follow-up, and some may not be worth fixing at all given the context of this repository and this change. The thorough review intentionally flags things a senior reviewer might raise, not things that must be fixed. Treat naming and style suggestions especially as input, not as blockers.

Two reviews were run. The standard review only flags concrete, demonstrable bugs. The thorough review also includes opinionated suggestions about architecture, style, and naming.

@ShirasawaSama ShirasawaSama force-pushed the refactor/unify-markdown-parsing-logic branch from 3bf95aa to 77edbd5 Compare April 14, 2026 18:55
@ShirasawaSama

Copy link
Copy Markdown
Contributor Author

@pr-validator-bot review

@Classic298

Copy link
Copy Markdown
Collaborator

i made some changes. thanks for testing it. lets see if it works

@ShirasawaSama

Copy link
Copy Markdown
Contributor Author

i made some changes. thanks for testing it. lets see if it works

Which branch/PR?

@Classic298

Copy link
Copy Markdown
Collaborator

to the bot

@ShirasawaSama

ShirasawaSama commented Apr 14, 2026

Copy link
Copy Markdown
Contributor Author

to the bot

OK, I think it's pretty good.

@pr-validator-bot

Copy link
Copy Markdown

🔎 Automated Code Review

Review Findings (click to expand)

Markdown refactor review

Overall this is a solid refactor direction (centralizing Marked config and eliminating repeated marked.use calls addresses the leak/perf issues called out in the PR description). I found a couple of regressions/risks worth addressing before merge.


Warning: behavior regression for newline handling (breaks) in places that previously relied on breaks: true

Files:

  • src/lib/components/common/RichTextInput.svelte
  • src/lib/components/notes/NoteEditor/Chat.svelte

You removed local marked.use({ breaks: true, gfm: true, ... }) setup (good), but some replacements now call renderMarkdownToHTML(...) without options.

Because centralized defaults in src/lib/utils/marked/marked.ts set breaks: false, these call sites now parse single newlines differently than before.

Examples in diff:

  • RichTextInput.svelte: const htmlContent = renderMarkdownToHTML(content);
  • NoteEditor/Chat.svelte: enhancedContent.html = renderMarkdownToHTML(enhancedContent.md);

This can silently change rendered output in editor insertion/streaming flows (line breaks collapsing unless separated by paragraph rules).

Suggested fix: explicitly pass { breaks: true } at call sites that previously depended on that behavior.


Warning: utils/index.ts now imports the DOMPurify wrapper path, expanding dependency surface unnecessarily

File:

  • src/lib/utils/index.ts

utils/index.ts switched to:

  • import { renderMarkdownToHTML } from '$lib/utils/marked';

But $lib/utils/marked (index) imports dompurify at module scope. utils/index.ts only needs non-sanitizing parse functionality, and this file is a foundational utility module used broadly.

This creates avoidable coupling and can increase bundle size / environment fragility for non-DOM contexts. Your own comment in src/lib/utils/marked/index.ts says worker-friendly imports should use ./marked directly.

Suggested fix: in src/lib/utils/index.ts, import from the worker-safe module:

  • import { renderMarkdownToHTML } from '$lib/utils/marked/marked';

If those two are addressed, the refactor looks in very good shape.


Note on this review: findings are deliberately opinionated. Use your own judgment on scope — not every finding needs to be fixed in this PR; some may belong in a follow-up, and some may not be worth fixing at all given the context of this repository and this change. This review intentionally flags things a senior reviewer might raise, not things that must be fixed. Treat naming and style suggestions especially as input, not as blockers.

This review cost: $0.5313  |  Consider Sponsoring Open WebUI or Purchasing an Enterprise License to help fuel continued progress in Open WebUI.

@Classic298

Copy link
Copy Markdown
Collaborator

i agree. but it is expensive as hell. I had to cut the normal review. now it only does the thorough review from now on

@ShirasawaSama

Copy link
Copy Markdown
Contributor Author

i agree. but it is expensive as hell. I had to cut the normal review. now it only does the thorough review from now on

What model are you using? Is it a Claude model?

@Classic298

Copy link
Copy Markdown
Collaborator

No. I have tested many models. GPT-5.3-Codex works the best for these reviews. It is very thorough and flags things that Claude does not flag.

@ShirasawaSama

ShirasawaSama commented Apr 14, 2026

Copy link
Copy Markdown
Contributor Author

No. I have tested many models. GPT-5.3-Codex works the best for these reviews. It is very thorough and flags things that Claude does not flag.

Actually, I find Copilot's code review really useful. You can check out this PR of mine for reference.

microsoft/onnxruntime#27393 (review)

@ShirasawaSama

Copy link
Copy Markdown
Contributor Author

All I can say is, if we were still deploying this project, I’d be using the company’s Opus 4.6 to review it thoroughly—that’s exactly how I used to submit code. Unfortunately, my commits are now just my own personal efforts.

@Classic298

Copy link
Copy Markdown
Collaborator

Opus 4.6 is not bad. Not at all. But i feel like it just misses things that GPT-5.3-Codex catches.

@ShirasawaSama

ShirasawaSama commented Apr 14, 2026

Copy link
Copy Markdown
Contributor Author

Opus 4.6 is not bad. Not at all. But i feel like it just misses things that GPT-5.3-Codex catches.

Yes, just last week I spent $680 in a single day using Opus 4.6 to fix several states issues with the message.done field in OpenWebUI, but the problems weren’t resolved.

I think this conversation is off-topic; we should continue on Discord.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants