feat(orama): local only database #8836

Open
avivkeller wants to merge 6 commits into main from orama-local

@avivkeller
Member

cc @nodejs/web-infra

This PR changes our Orama implementation to use a locally merged version of our API docs and Learn Orama databases.

Copilot AI review requested due to automatic review settings April 22, 2026 18:31
@avivkeller avivkeller requested review from a team as code owners April 22, 2026 18:31
@vercel

vercel Bot commented Apr 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
nodejs-org Ready Ready Preview Apr 22, 2026 10:11pm

@cursor

cursor Bot commented Apr 22, 2026

PR Summary

Medium Risk
Medium risk because it replaces the search backend and removes Orama Cloud sync/chat UI, which could impact search relevance, routing, and runtime performance due to client-side index fetching/merging.

Overview
The site search integration is reworked to use a local, client-side Orama database built by fetching multiple prebuilt index snapshots (currently learn and api) from ORAMA_DB_URLS, prefixing document hrefs per section, and merging them into one DB via the new useOrama hook.

This removes the previous Orama Cloud setup and chat-oriented search UI: the sync-orama GitHub workflow, Orama Cloud env/constants, sync scripts/tests, and the custom apps/site Searchbox/chat components are deleted. The UI components package is bumped to 1.7.0, adds a new consolidated Common/Search component with footer shortcuts, and simplifies hit rendering (drops mode/chat-specific focus handling) while inlining default query parameters in Search/Results.
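
For readers less familiar with the flow, the prefix-and-merge step described in the overview can be sketched roughly as follows. This is a minimal sketch and not the code in this PR: `RawDoc`, `SearchDoc`, `prefixHref`, and `mergeSections` are hypothetical names, the documents are passed in directly rather than fetched from `ORAMA_DB_URLS`, and the `/learn/...` style prefix is only an assumption about how hrefs might be namespaced per section.

```typescript
// Hypothetical document shape before a section has been assigned.
interface RawDoc {
  title: string;
  description: string;
  href: string;
}

// Hypothetical merged shape: the same document tagged with its section.
interface SearchDoc extends RawDoc {
  siteSection: string;
}

// Joins a section name and a document href with exactly one slash between them.
function prefixHref(section: string, href: string): string {
  const cleanSection = section.replace(/^\/+|\/+$/g, "");
  const cleanHref = href.replace(/^\/+/, "");
  return `/${cleanSection}/${cleanHref}`;
}

// Flattens per-section document lists into one list, rewriting each href
// and recording which section the document came from.
function mergeSections(sections: Record<string, RawDoc[]>): SearchDoc[] {
  const mergedDocs: SearchDoc[] = [];
  for (const [section, sectionDocs] of Object.entries(sections)) {
    for (const doc of sectionDocs) {
      mergedDocs.push({
        ...doc,
        href: prefixHref(section, doc.href),
        siteSection: section,
      });
    }
  }
  return mergedDocs;
}

const mergedExample = mergeSections({
  learn: [{ title: "Intro", description: "Getting started", href: "getting-started" }],
  api: [{ title: "fs", description: "File system", href: "/fs.html" }],
});
console.log(mergedExample.map(doc => doc.href));
```

In the real hook the merged documents would then be inserted into a single Orama database; that step is left out here because it depends on the Orama API version in use.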

Reviewed by Cursor Bugbot for commit aa96ac2. Bugbot is set up for automated code reviews on this repo. Configure here.

@github-actions
Contributor

👋 Codeowner Review Request

The following codeowners have been identified for the changed files:

Team reviewers: @nodejs/web-infra @nodejs/nodejs-website

Please review the changes when you have a chance. Thank you! 🙏

Comment thread apps/site/scripts/orama/index.mjs Outdated
Comment thread apps/site/scripts/orama/index.mjs Outdated
Comment thread packages/ui-components/src/hooks/useOrama.ts
Contributor

Copilot AI left a comment

Pull request overview

This PR migrates the website search integration away from Orama Cloud toward a locally generated (and locally loaded) Orama database that merges the Learn + API docs indexes.

Changes:

  • Generate a merged orama-db.json during site prebuild and load it client-side for search.
  • Replace the site’s Orama Cloud searchbox + chat integration with the UI-components SearchBox wired to the local DB.
  • Remove Orama Cloud configuration, sync workflow, and related scripts/tests.

Reviewed changes

Copilot reviewed 44 out of 46 changed files in this pull request and generated 13 comments.

File Description
pnpm-lock.yaml Updates lockfile for Orama package changes across workspaces.
packages/ui-components/src/hooks/useOrama.ts Adds a hook to create a local Orama DB and lazy-load it from a JSON snapshot.
packages/ui-components/src/Common/Search/index.tsx Adds a new SearchBox component that uses the local Orama hook.
packages/ui-components/src/Common/Search/index.module.css Styles for the new SearchBox layout/footer.
packages/ui-components/src/Common/Search/Suggestions/index.tsx Removes suggestions component (part of prior chat/suggestions UX).
packages/ui-components/src/Common/Search/Suggestions/index.module.css Removes styles for suggestions component.
packages/ui-components/src/Common/Search/Results/index.tsx Hardcodes search params (limit/threshold/boost) rather than receiving them from site.
packages/ui-components/src/Common/Search/Results/Hit/index.tsx Simplifies hit rendering by removing now-unused chat/search mode handling.
packages/ui-components/src/Common/Search/README.md Removes Orama search component README.
packages/ui-components/src/Common/Search/Chat/Trigger/index.tsx Removes chat trigger UI.
packages/ui-components/src/Common/Search/Chat/Trigger/index.module.css Removes chat trigger styles.
packages/ui-components/src/Common/Search/Chat/Input/index.tsx Removes chat input UI.
packages/ui-components/src/Common/Search/Chat/Input/index.module.css Removes chat input styles.
packages/ui-components/src/Common/Search/Chat/Actions/index.tsx Removes chat actions UI.
packages/ui-components/src/Common/Search/Chat/Actions/index.module.css Removes chat actions styles.
packages/ui-components/package.json Swaps dependencies to use @orama/orama and moves @orama/core.
apps/site/turbo.json Removes Orama Cloud env vars from turbo pipeline config.
apps/site/scripts/orama/index.mjs New script that fetches two remote Orama DBs, merges them, and writes public/orama-db.json.
apps/site/scripts/orama/constants.mjs Adds constants for remote Orama DB URLs.
apps/site/scripts/orama-search/sync-orama-cloud.mjs Removes Orama Cloud sync script.
apps/site/scripts/orama-search/process-documents.mjs Removes old document processing for cloud sync.
apps/site/scripts/orama-search/get-documents.mjs Removes old document fetching logic for cloud sync.
apps/site/scripts/orama-search/tests/process-documents.test.mjs Removes tests for deleted processing logic.
apps/site/scripts/orama-search/tests/get-documents.test.mjs Removes tests for deleted fetching logic.
apps/site/package.json Runs Orama DB generation in prebuild/dev/deploy and removes cloud sync script entry.
apps/site/next.constants.mjs Removes Orama Cloud constants and default query/suggestions config.
apps/site/components/withSearch.tsx New wrapper component that mounts the new SearchBox.
apps/site/components/withNavBar.tsx Re-enables search in the navbar via the new wrapper.
apps/site/components/Common/Searchbox/orama-client.ts Removes Orama Cloud client initialization.
apps/site/components/Common/Searchbox/index.tsx Removes old site-specific searchbox (chat + suggestions + i18n wiring).
apps/site/components/Common/Searchbox/index.module.css Removes old searchbox styles.
apps/site/components/Common/Searchbox/SlidingChatPanel/index.tsx Removes sliding chat panel.
apps/site/components/Common/Searchbox/SlidingChatPanel/index.module.css Removes sliding chat panel styles.
apps/site/components/Common/Searchbox/SearchItem/utils.ts Removes old hit formatting + locale/base-path link building utilities.
apps/site/components/Common/Searchbox/SearchItem/index.tsx Removes old adapter from Orama hit docs to UI-components hit rendering.
apps/site/components/Common/Searchbox/Footer/index.tsx Removes old footer (powered-by + shortcuts).
apps/site/components/Common/Searchbox/Footer/index.module.css Removes old footer styles.
apps/site/components/Common/Searchbox/DocumentLink/index.tsx Removes locale-aware next/link wrapper for search/chat sources.
apps/site/components/Common/Searchbox/ChatSources/index.tsx Removes chat sources UI.
apps/site/components/Common/Searchbox/ChatSources/index.module.css Removes chat sources styles.
apps/site/components/Common/Searchbox/ChatMessage/index.tsx Removes chat message renderer.
apps/site/components/Common/Searchbox/ChatMessage/index.module.css Removes chat message styles.
apps/site/components/Common/Searchbox/ChatInteractions/index.tsx Removes chat interactions container.
apps/site/components/Common/Searchbox/ChatInteractions/index.module.css Removes chat interactions styles.
.gitignore Ignores generated apps/site/public/orama-db.json.
.github/workflows/sync-orama.yml Removes workflow that synced Orama Cloud.
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

Comment thread apps/site/package.json
Comment thread apps/site/scripts/orama/index.mjs Outdated
Comment thread packages/ui-components/src/hooks/useOrama.ts Outdated
Comment thread packages/ui-components/src/Common/Search/Results/index.tsx
Comment thread apps/site/scripts/orama/index.mjs Outdated
Comment thread packages/ui-components/src/hooks/useOrama.ts Outdated
Comment thread packages/ui-components/src/hooks/useOrama.ts
Comment thread packages/ui-components/src/hooks/useOrama.ts Outdated
Comment thread packages/ui-components/package.json
Comment thread apps/site/package.json Outdated
@codecov

codecov Bot commented Apr 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.19%. Comparing base (524e64b) to head (aa96ac2).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8836      +/-   ##
==========================================
- Coverage   73.88%   73.19%   -0.70%     
==========================================
  Files         105      102       -3     
  Lines        8889     8621     -268     
  Branches      326      314      -12     
==========================================
- Hits         6568     6310     -258     
+ Misses       2320     2310      -10     
  Partials        1        1              

@avivkeller avivkeller marked this pull request as ready for review April 22, 2026 21:34
@avivkeller
Member Author

@dario-piotrowicz

[WebServer] ✘ [ERROR] Asset too large.
[WebServer] 
[WebServer]   Cloudflare Workers supports assets with sizes of up to 25 MiB. We found a file /home/runner/work/nodejs.org/nodejs.org/apps/site/.open-next/assets/orama-db.json with a size of 28.9 MiB.
[WebServer]   Ensure all assets in your assets directory "/home/runner/work/nodejs.org/nodejs.org/apps/site/.open-next/assets" conform with the Workers maximum size requirement.
[WebServer] 
[WebServer] 
[WebServer] If you think this is a bug then please create an issue at https://github.com/cloudflare/workers-sdk/issues/new/choose
[WebServer] Note that there is a newer version of Wrangler available (4.84.1). Consider checking whether upgrading resolves this error.
[WebServer] ? Would you like to report this error to Cloudflare? Wrangler's output and the error details will be shared with the Wrangler team to help us diagnose and fix the issue.
[WebServer] 🤖 Using fallback value in non-interactive context: no
[WebServer] 🪵  Logs were written to "/home/runner/.config/.wrangler/logs/wrangler-2026-04-22_21-36-59_766.log"

@avivkeller
Member Author

Perhaps I should GZip the file?

@MattIPv4
Member

MattIPv4 commented Apr 22, 2026

I imagine you're going to want to chunk that file. Gzipping would only be a temporary fix and would bite us again as the site grows.
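
One way to read the chunking suggestion is a greedy split that keeps every output file under a byte budget. The sketch below assumes the search data can be represented as a flat array of documents; a saved Orama snapshot has more internal structure, so this would not apply to it directly. `chunkBySize`, `utf8Length`, and the budget value are hypothetical.

```typescript
// Size in bytes of a string once encoded as UTF-8.
function utf8Length(text: string): number {
  return new TextEncoder().encode(text).length;
}

// Greedily packs documents into chunks so that JSON.stringify(chunk)
// never exceeds maxBytes. Throws if one document alone is too large.
function chunkBySize<T>(docs: T[], maxBytes: number): T[][] {
  const chunks: T[][] = [];
  let current: T[] = [];
  let currentBytes = 2; // the enclosing "[" and "]"

  for (const doc of docs) {
    const docBytes = utf8Length(JSON.stringify(doc));
    if (docBytes + 2 > maxBytes) {
      throw new Error("A single document exceeds the chunk budget");
    }
    // A comma separator is needed for every element after the first.
    const addedBytes = docBytes + (current.length > 0 ? 1 : 0);
    if (currentBytes + addedBytes > maxBytes) {
      chunks.push(current);
      current = [doc];
      currentBytes = 2 + docBytes;
    } else {
      current.push(doc);
      currentBytes += addedBytes;
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Hypothetical usage with a tiny budget so the split is visible.
const exampleChunks = chunkBySize(
  [{ title: "fs" }, { title: "path" }, { title: "http" }],
  32
);
console.log(exampleChunks.map(chunk => chunk.length));
```

Each chunk could then be written as its own asset and fetched in parallel on the client, which keeps every file below the per-asset limit as the content grows.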

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Reviewed by Cursor Bugbot for commit aa96ac2. Configure here.

}

return save(db);
});

Unstable callback reference resets database on every render

High Severity

The inline async () => { ... } passed to useOrama creates a new function reference on every render. Inside useOrama, this function is used as a useMemo dependency ([loadData]), causing the memo to re-execute on each render. This resets loadPromiseRef.current to null and creates a brand-new empty Orama database, discarding any previously loaded data. The next search will trigger a fresh network fetch of all index files. Wrapping the callback in useCallback (or hoisting it outside the component) would stabilize the reference.
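
The effect described above can be reproduced without React. The sketch below is a framework-free stand-in for a memoizing hook, not the `useOrama` implementation: `createMemo` is a hypothetical helper that recomputes whenever a dependency changes identity (it uses `!==`, whereas React compares with `Object.is`, which differs only for `NaN` and signed zeros).

```typescript
// Minimal stand-in for a memoizing hook: the factory reruns only when
// the dependency list changes length or any entry changes identity.
function createMemo<T>(): (factory: () => T, deps: readonly unknown[]) => T {
  let lastDeps: readonly unknown[] | undefined;
  let lastValue: T | undefined;
  return (factory, deps) => {
    const prev = lastDeps;
    const changed =
      prev === undefined ||
      prev.length !== deps.length ||
      deps.some((dep, i) => dep !== prev[i]);
    if (changed) {
      lastValue = factory();
      lastDeps = deps;
    }
    return lastValue as T;
  };
}

// Case 1: the loader is recreated on every simulated render.
const unstableMemo = createMemo<number>();
let unstableRuns = 0;
for (let render = 0; render < 3; render++) {
  const loadData = () => "data"; // new function reference each time
  unstableMemo(() => ++unstableRuns, [loadData]);
}

// Case 2: the loader is hoisted, so its reference never changes.
const stableLoadData = () => "data";
const stableMemo = createMemo<number>();
let stableRuns = 0;
for (let render = 0; render < 3; render++) {
  stableMemo(() => ++stableRuns, [stableLoadData]);
}

console.log(unstableRuns, stableRuns); // prints 3 1
```

Hoisting the callback out of the component, or wrapping it in `useCallback` with stable dependencies, corresponds to the second case.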

pageSectionContent: 2.5,
pageTitle: 1.5,
},
}}

Boost references fields absent from new schema

Medium Severity

The hardcoded boost references pageSectionTitle, pageSectionContent, and pageTitle, but the new Orama schema only defines title, description, href, and siteSection. These boost parameters have no effect because they target fields that don't exist in the local database, so search result ranking is effectively disabled.
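
A mismatch like this could be caught with a small check that compares boost keys against the schema. This is a sketch under stated assumptions: `findUnknownBoostFields` is a hypothetical helper, the `"string"` field types are assumed, and only the two boost values visible in the excerpt are used (the value for `pageSectionTitle` is not shown, so it is left out).

```typescript
// Schema fields as listed in the review comment; the "string" types are an assumption.
const localSchema: Record<string, string> = {
  title: "string",
  description: "string",
  href: "string",
  siteSection: "string",
};

// Boost values visible in the excerpt above.
const configuredBoost: Record<string, number> = {
  pageSectionContent: 2.5,
  pageTitle: 1.5,
};

// Returns every boost key that is not an own property of the schema.
function findUnknownBoostFields(
  schemaDef: Record<string, string>,
  boostDef: Record<string, number>
): string[] {
  return Object.keys(boostDef).filter(
    field => !Object.prototype.hasOwnProperty.call(schemaDef, field)
  );
}

const unknownBoostFields = findUnknownBoostFields(localSchema, configuredBoost);
if (unknownBoostFields.length > 0) {
  console.warn(
    `Boost targets fields missing from the schema: ${unknownBoostFields.join(", ")}`
  );
}
```

Running a check like this in a unit test would flag the stale field names whenever the schema and the boost configuration drift apart.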

Comment thread packages/ui-components/src/hooks/useOrama.ts
@ovflowd
Member

ovflowd commented Apr 23, 2026

> @dario-piotrowicz
>
> [WebServer] ✘ [ERROR] Asset too large.
> [WebServer] 
> [WebServer]   Cloudflare Workers supports assets with sizes of up to 25 MiB. We found a file /home/runner/work/nodejs.org/nodejs.org/apps/site/.open-next/assets/orama-db.json with a size of 28.9 MiB.
> [WebServer]   Ensure all assets in your assets directory "/home/runner/work/nodejs.org/nodejs.org/apps/site/.open-next/assets" conform with the Workers maximum size requirement.
> [WebServer] 
> [WebServer] 
> [WebServer] If you think this is a bug then please create an issue at https://github.com/cloudflare/workers-sdk/issues/new/choose
> [WebServer] Note that there is a newer version of Wrangler available (4.84.1). Consider checking whether upgrading resolves this error.
> [WebServer] ? Would you like to report this error to Cloudflare? Wrangler's output and the error details will be shared with the Wrangler team to help us diagnose and fix the issue.
> [WebServer] 🤖 Using fallback value in non-interactive context: no
> [WebServer] 🪵  Logs were written to "/home/runner/.config/.wrangler/logs/wrangler-2026-04-22_21-36-59_766.log"

This file shouldn't be committed IMO. (The Orama DB)

@MattIPv4
Member

It isn't committed, but does need to be included as an asset in the worker build.

@ovflowd
Member

ovflowd commented Apr 23, 2026

I don't see in this PR where we are generating the orama DB for the website-specific pages? 🤔

@ovflowd
Member

ovflowd commented Apr 23, 2026

> It isn't committed, but does need to be included as an asset in the worker build.

It could be uploaded to R2. It doesn't need to live with the worker; it is a static asset that gets loaded on the client side, not part of the bundle.

@MattIPv4
Member

Yeah, that's possible, but I imagine it will then introduce a lot of complexities similar to what we had with the cloud sync flow before. I think it'd be easier to keep it self-contained to each deployment, assuming we can split the file.

@avivkeller
Member Author

Well, in the latest change I changed it to not build a file at all, but to load both Orama DB files from the Learn and API docs and use them at runtime. It adds a second fetch request, but that shouldn't make a difference.

@avivkeller
Member Author

> I don't see in this PR where we are generating the orama DB for the website-specific pages? 🤔

What do you mean by this? At the moment, none is created.

There really aren't any website specific pages that need searchable content, right?

@ovflowd
Member

ovflowd commented Apr 23, 2026

> > I don't see in this PR where we are generating the orama DB for the website-specific pages? 🤔
>
> What do you mean by this? At the moment, none is created.
>
> There really aren't any website specific pages that need searchable content, right?

about pages? download pages? ...?

@avivkeller
Member Author

> about pages? download pages? ...?

IMO there's not much that's search-worthy on those pages. Even so, a 95% search complete site is better than a 0% searchable one.

@ovflowd
Member

ovflowd commented Apr 23, 2026

> > about pages? download pages? ...?
>
> IMO there's not much that's search-worthy on those pages. Even so, a 95% search complete site is better than a 0% searchable one.

I don't know chief. This sort of decision needs consensus. Can you explicitly ask this on Slack?

@MattIPv4
Member

👍 I'd be fine with landing using just the learn + API DBs. I agree with the notion that some searching for most of our content is better than the no searching we have currently.

@ovflowd
Member

ovflowd commented Apr 23, 2026

Also @avivkeller would you mind addressing or dismissing cursor comments? 🙏

Member

@ovflowd ovflowd left a comment

Btw, SGTM with the changes, just want to wait for consensus on the approach, but code looks good ;)

// bottleneck here, so serializing these would waste ~N× round-trip time.
const indexes = await Promise.all(
Object.entries(ORAMA_DB_URLS).map(async ([key, url]) => {
const fetchedDb = (await fetch(url).then(res =>
Member

@ovflowd ovflowd Apr 23, 2026

nit: can we not mix await and then here? 🙏
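
For illustration, an `await`-only version of a parallel fetch could look something like this. It is a sketch rather than the code under review: `JsonResponse`, `Fetcher`, and `fetchIndexes` are hypothetical names, the fetcher is injected so the example runs without a network, and the `example.test` URLs are placeholders rather than the real `ORAMA_DB_URLS` values.

```typescript
// Minimal response shape used by the sketch; the global fetch should be
// compatible with it for JSON endpoints.
interface JsonResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type Fetcher = (url: string) => Promise<JsonResponse>;

// Fetches every index in parallel, using await for each step instead of
// chaining .then() onto an awaited expression.
async function fetchIndexes(
  urls: Record<string, string>,
  fetcher: Fetcher
): Promise<Record<string, unknown>> {
  const entries = await Promise.all(
    Object.entries(urls).map(async ([key, url]) => {
      const response = await fetcher(url);
      if (!response.ok) {
        throw new Error(`Failed to fetch ${key}: HTTP ${response.status}`);
      }
      const snapshot = await response.json();
      return [key, snapshot] as const;
    })
  );

  const indexes: Record<string, unknown> = {};
  for (const [key, snapshot] of entries) {
    indexes[key] = snapshot;
  }
  return indexes;
}

// Hypothetical usage with a fake fetcher that echoes the requested URL.
const fakeFetcher: Fetcher = async url => ({
  ok: true,
  status: 200,
  json: async () => ({ source: url }),
});

async function demo(): Promise<void> {
  const indexes = await fetchIndexes(
    { learn: "https://example.test/learn.json", api: "https://example.test/api.json" },
    fakeFetcher
  );
  console.log(Object.keys(indexes));
}
void demo();
```

Keeping each step on its own `await` line also gives a natural place for the `response.ok` check, which is easy to lose in a chained `.then()`.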

Contributor

@bmuenzenmeyer bmuenzenmeyer left a comment

Did a functional pass.
Question: does anyone else wish we had this documented within https://github.com/nodejs/nodejs.org/blob/main/docs/technologies.md ? I was surprised to see no mention of Orama in there.

@ovflowd
Member

ovflowd commented Apr 23, 2026

> Did a functional pass.
> Question: does anyone else wish we had this documented within https://github.com/nodejs/nodejs.org/blob/main/docs/technologies.md ? I was surprised to see no mention of Orama in there.

Wait, really? Interesting. We should probably document it now, can also be done in a follow-up PR if needed, @avivkeller!

5 participants