7 changes: 7 additions & 0 deletions .changeset/searchable-canonical-name-prefix.md
@@ -0,0 +1,7 @@
---
"@ensnode/ensdb-sdk": patch
"ensindexer": patch
"ensapi": patch
---

Add a materialized `domains.__canonical_name_prefix` column (the first 64 code points of `canonical_name`) to back left-anchored / substring search and NAME ordering. Direct-SQL consumers can now write `WHERE __canonical_name_prefix LIKE 'vit%' ORDER BY __canonical_name_prefix` instead of replicating the `left(canonical_name, 256)` expression from the previous expression index. `canonical_name` is unchanged and remains the column for exact (`=` / `IN`) matches and display; the Omnigraph `name.starts_with` filter now targets the prefix column while continuing to return `canonical_name`. Reindex required.
@@ -1,46 +1,23 @@
import { asc, desc, type SQL, sql } from "drizzle-orm";

import { truncateCanonicalNamePrefix } from "@ensnode/ensdb-sdk/ensindexer-abstract";

import di from "@/di";
import type { DomainCursor } from "@/omnigraph-api/lib/find-domains/domain-cursor";
import type { DomainsOrderBy } from "@/omnigraph-api/schema/domain-inputs";
import type { OrderDirection } from "@/omnigraph-api/schema/order-direction";

/**
* Length cap (in characters) of the `canonical_name` prefix used by:
* 1. the `(registry_id, left(canonical_name, N), id)` composite btree on `domains`,
* 2. all NAME-ordered queries' ORDER BY expressions, and
* 3. the value stored in `DomainCursor.value` when ordering by NAME — pre-truncated at
* encode time via {@link truncateNameForCursor} so filter-time comparisons are simple
* tuple compares against the index expression with no per-row `left(...)` re-application.
*
* The btree per-tuple max is ~2712 bytes; with `registry_id` and `id` consuming ~240 bytes of
* that, ~2400 bytes remain for the prefix expression. 256 chars × max 4-byte UTF-8 codepoint =
* 1024 bytes, well under the limit and within the realm of reasonable name lengths (mainnet avg
* is ~126). Queries MUST sort by this same expression for the planner to use the index for
* ordered scan; raw `canonical_name` ORDER BY falls back to a full scan + sort.
*
* An alternative solution is to redefine InterpretedLabel to enforce a maximum byte length of 255 before
* being truncated into an Encoded LabelHash — this mirrors a name's resolvability (must be dns-encodable)
* and allows us to avoid storing spam names. Then we'd also have to produce a b-tree-indexed
* materializedCanonicalName field that's length-capped as well to fit the btree index. Then we could
* query against that column instead of the full InterpretedName. All of that would avoid this
* LEFT(...) expression index and the necessity for the query pattern to match the defined index
* (to avoid the full scan).
*/
export const CANONICAL_NAME_SORT_PREFIX = 256;

/**
* Truncate a `canonicalName` to the cursor / index prefix length. Used when writing the cursor
* value for NAME orderings — callers slice once at encode time so the encoded cursor stays small
* (long names can hit thousands of characters) and `cursorFilter` can compare directly against
* the index expression without re-applying `left(...)` per row.
* Truncate a `canonicalName` to the materialized `__canonical_name_prefix` length when writing the
* `DomainCursor.value` of NAME orderings. Pre-truncating once at encode time keeps the encoded
* cursor small (long names hit thousands of characters) and lets `cursorFilter` compare directly
* against the `__canonical_name_prefix` column with no per-row `left(...)`.
*
* Uses code-point iteration (`[...name]`) rather than `String.slice`, which counts UTF-16 code
* units and would split surrogate pairs. Postgres `left(text, N)` counts characters (code
* points), so this keeps the JS-side and DB-side prefixes byte-identical.
* Delegates to {@link truncateCanonicalNamePrefix} so the cursor prefix is byte-identical to the
* column the NAME index sorts on.
*/
export function truncateNameForCursor(name: string | null): string | null {
return name === null ? null : [...name].slice(0, CANONICAL_NAME_SORT_PREFIX).join("");
return truncateCanonicalNamePrefix(name);
}
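
The removed comment above distinguishes UTF-16 code units from code points. The following is a minimal sketch of that distinction, not the actual `truncateCanonicalNamePrefix` from `@ensnode/ensdb-sdk/ensindexer-abstract`, whose implementation is not shown in this diff. `truncateToCodePoints` and `PREFIX_LENGTH` are hypothetical names; the value 64 is taken from the changeset description.

```typescript
// Hypothetical stand-in for illustration only; the real helper is
// `truncateCanonicalNamePrefix` and its body is not part of this diff.
const PREFIX_LENGTH = 64; // assumed to match CANONICAL_NAME_PREFIX_LENGTH

function truncateToCodePoints(
  name: string | null,
  max: number = PREFIX_LENGTH,
): string | null {
  if (name === null) return null;
  // Array.from iterates a string by code point, so a surrogate pair stays whole.
  return Array.from(name).slice(0, max).join("");
}

const emojiName = "😀😀😀.eth";

// Code-point truncation keeps two whole emoji.
const byCodePoint = truncateToCodePoints(emojiName, 2);

// String.slice counts UTF-16 code units: three units is one whole emoji
// plus the first half of the next one (a lone surrogate).
const byCodeUnit = emojiName.slice(0, 3);
```

Because Postgres `left(text, N)` counts characters, only the code-point version can produce the same prefix as the SQL-side `left(...)` calls elsewhere in this PR.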

/**
@@ -54,7 +31,7 @@ function getOrderColumn(orderBy: typeof DomainsOrderBy.$inferType): SQL {
const { ensIndexerSchema } = di.context;
switch (orderBy) {
case "NAME":
return sql`left(${ensIndexerSchema.domain.canonicalName}, ${sql.raw(String(CANONICAL_NAME_SORT_PREFIX))})`;
return sql`${ensIndexerSchema.domain.__canonicalNamePrefix}`;
case "DEPTH":
return sql`${ensIndexerSchema.domain.canonicalDepth}`;
case "REGISTRATION_TIMESTAMP":
@@ -117,8 +94,8 @@ export function cursorFilter(
const value = (() => {
switch (cursor.by) {
case "NAME":
// Already pre-truncated at encode time (see `truncateNameForCursor`), so this matches
// the index expression `left(canonical_name, CANONICAL_NAME_SORT_PREFIX)` directly.
// Already pre-truncated at encode time (see `truncateNameForCursor`), so this compares
// directly against the `__canonical_name_prefix` column that NAME ordering sorts on.
return sql`${cursor.value}::text`;
case "DEPTH":
return sql`${cursor.value}::int`;
@@ -65,7 +65,9 @@ const VERSION_TO_DOMAIN_TYPE: Record<
function nameCondition(filter: typeof DomainsNameFilter.$inferInput): SQL {
const { ensIndexerSchema } = di.context;
if (filter.starts_with) {
return ilike(ensIndexerSchema.domain.canonicalName, `${filter.starts_with}%`);
// prefix / substring search runs against the materialized, length-capped prefix column (backed
// by its GIN trigram index); exact `eq`/`in` below stay on the full `canonicalName`.
return ilike(ensIndexerSchema.domain.__canonicalNamePrefix, `${filter.starts_with}%`);
}

if (filter.eq) {
34 changes: 25 additions & 9 deletions apps/ensindexer/src/lib/ensv2/canonicality-db-helpers.ts
@@ -11,6 +11,10 @@ import type {
RegistryId,
} from "enssdk";

import {
CANONICAL_NAME_PREFIX_LENGTH,
truncateCanonicalNamePrefix,
} from "@ensnode/ensdb-sdk/ensindexer-abstract";
import { isRootRegistryId } from "@ensnode/ensnode-sdk";
import { isBridgedResolver, isBridgedTargetRegistry } from "@ensnode/ensnode-sdk/internal";

@@ -151,6 +155,7 @@ export async function ensureDomainInRegistry(
await context.ensDb.update(ensIndexerSchema.domain, { id: domainId }).set({
canonical: true,
canonicalName,
__canonicalNamePrefix: truncateCanonicalNamePrefix(canonicalName),
canonicalLabelHashPath,
canonicalPath,
canonicalDepth: canonicalLabelHashPath.length,
@@ -359,8 +364,9 @@ async function reconcileRegistryCanonicality(

/**
* Propagate a Label heal to every canonical Domain whose `canonicalLabelHashPath` contains
* `labelHash`. Re-renders `canonical_name` by joining each path element to its current
* `label.interpreted` value. `canonicalLabelHashPath` is head-first (root → leaf), but
* `labelHash`. Re-renders `canonical_name` (and its materialized `__canonical_name_prefix`) by
* joining each path element to its current `label.interpreted` value, computing the name once in a
* CTE so the `string_agg` isn't run twice. `canonicalLabelHashPath` is head-first (root → leaf), but
* `canonicalName` is the standard leaf-first ENS string (e.g. "vitalik.eth"), so the
* WITH ORDINALITY rows are joined in DESC ordinal order.
*
@@ -376,14 +382,23 @@ export async function cascadeLabelHeal(
labelHash: LabelHash,
): Promise<void> {
await context.ensDb.sql.execute(sql`
UPDATE ${ensIndexerSchema.domain} AS d
SET canonical_name = (
SELECT string_agg(l.interpreted, '.' ORDER BY p.ord DESC)
FROM unnest(d.canonical_label_hash_path) WITH ORDINALITY AS p(lh, ord)
JOIN ${ensIndexerSchema.label} l ON l.label_hash = p.lh
)
WITH healed AS (
SELECT
d.id,
(
SELECT string_agg(l.interpreted, '.' ORDER BY p.ord DESC)
FROM unnest(d.canonical_label_hash_path) WITH ORDINALITY AS p(lh, ord)
JOIN ${ensIndexerSchema.label} l ON l.label_hash = p.lh
) AS name
FROM ${ensIndexerSchema.domain} d
WHERE d.canonical = true
AND d.canonical_label_hash_path @> ARRAY[${labelHash}]::text[];
AND d.canonical_label_hash_path @> ARRAY[${labelHash}]::text[]
)
UPDATE ${ensIndexerSchema.domain} AS d
SET canonical_name = h.name,
__canonical_name_prefix = left(h.name, ${CANONICAL_NAME_PREFIX_LENGTH})
FROM healed h
WHERE d.id = h.id;
`);
}
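
To make the head-first versus leaf-first ordering in `cascadeLabelHeal` concrete, here is an in-memory sketch of what the `healed` CTE computes for a single Domain. It is an illustration under assumptions, not code from this PR: `renderCanonicalName`, the `LabelHash` alias, and the `Map` standing in for the `labels` table are hypothetical, and the prefix length of 64 is taken from the changeset text.

```typescript
type LabelHash = string; // hypothetical alias for illustration

const CANONICAL_NAME_PREFIX_LENGTH = 64; // assumed value, per the changeset text

function renderCanonicalName(
  path: LabelHash[], // head-first: root → leaf, like canonical_label_hash_path
  interpreted: Map<LabelHash, string>, // stand-in for the labels table
): { name: string; prefix: string } | null {
  const labels: string[] = [];
  for (const labelHash of path) {
    const label = interpreted.get(labelHash);
    // The SQL uses an inner JOIN, so path elements with no label row drop out.
    if (label !== undefined) labels.push(label);
  }
  // string_agg over zero rows yields NULL, and left(NULL, n) is NULL as well.
  if (labels.length === 0) return null;

  // ORDER BY p.ord DESC: reverse the head-first path into a leaf-first name.
  const name = labels.reverse().join(".");
  // Postgres left(text, n) counts characters, so truncate by code point.
  const prefix = Array.from(name).slice(0, CANONICAL_NAME_PREFIX_LENGTH).join("");
  return { name, prefix };
}

const labelTable = new Map<LabelHash, string>([
  ["0xaaa", "eth"],
  ["0xbbb", "vitalik"],
]);
const healedRow = renderCanonicalName(["0xaaa", "0xbbb"], labelTable);
```

Computing `name` once and deriving `prefix` from it mirrors the CTE's purpose: the aggregation runs a single time and both columns are set from the same value.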

@@ -494,6 +509,7 @@ async function cascadeCanonicality(
UPDATE ${ensIndexerSchema.domain} AS d
SET canonical = ${nextCanonical},
canonical_name = CASE WHEN ${nextCanonical} THEN dt.new_name ELSE NULL END,
__canonical_name_prefix = CASE WHEN ${nextCanonical} THEN left(dt.new_name, ${CANONICAL_NAME_PREFIX_LENGTH}) ELSE NULL END,
canonical_label_hash_path = CASE WHEN ${nextCanonical} THEN dt.new_path ELSE NULL END,
canonical_path = CASE WHEN ${nextCanonical} THEN dt.new_path_ids ELSE NULL END,
canonical_depth = CASE WHEN ${nextCanonical} THEN array_length(dt.new_path, 1) ELSE NULL END,
@@ -59,6 +59,20 @@ Performing SQL queries on the ENS Unigraph requires that you have the `unigraph`

Fetch a Domain by its canonical name. Because `canonical_name` is materialized across both ENSv1 and ENSv2, the same lookup works regardless of protocol version. See [Connect](/docs/integrate/unigraph/examples) for setup.

:::tip[Prefix search & typeahead]
For left-anchored / typeahead search, query the materialized `__canonical_name_prefix` column — the first 64 code points of `canonical_name`, backed by a GIN trigram index — instead of `canonical_name`:

```sql
SELECT id, type, canonical_name, canonical_node, owner_id
FROM ensindexer_0.domains
WHERE __canonical_name_prefix LIKE 'vit%'
ORDER BY __canonical_name_prefix
LIMIT 10;
```

Use `canonical_name` only for exact matches (`canonical_name = 'vitalik.eth'`).
:::
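
When the search term in a typeahead query like the one above comes from user input, `%` and `_` inside the term act as `LIKE` wildcards unless they are escaped. The sketch below shows one way to build a left-anchored pattern, assuming Postgres's default `LIKE` escape character (backslash). `toPrefixLikePattern` is a hypothetical helper, not part of this PR or of `ensdb-sdk`.

```typescript
// Hypothetical helper for illustration; not part of the PR.
// Escapes the LIKE metacharacters (\, %, _) with a backslash, which is the
// default LIKE escape character in Postgres, then appends the trailing
// wildcard that makes the pattern left-anchored.
function toPrefixLikePattern(term: string): string {
  const escaped = term.replace(/[\\%_]/g, (ch) => `\\${ch}`);
  return `${escaped}%`;
}

const plain = toPrefixLikePattern("vit"); // usable as: LIKE 'vit%'
const tricky = toPrefixLikePattern("50%_a"); // wildcards are now literals
```

The resulting string should be passed as a bound query parameter rather than concatenated into SQL. Note also that the prefix column holds at most 64 code points, so a `LIKE` term longer than that cannot match any row.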

:::note[Canonical fields]
Canonical fields are populated on every Domain reachable from the canonical root, across both ENSv1 and ENSv2 — query them uniformly without branching by `type`. In SQL, these columns are `canonical_name`, `canonical_path`, `canonical_node`, and `canonical_depth`; in `ensdb-sdk`, the corresponding fields are `canonicalName`, `canonicalPath`, `canonicalNode`, and `canonicalDepth`.
:::
@@ -223,13 +223,14 @@ Domain-Resolver relations are tracked via the Protocol Acceleration plugin, not
| `owner_id` | `text` | yes | If `ENSv1Domain`, the materialized effective owner address. If `ENSv2Domain`, the on-chain owner address (the HCA account address if used). |
| `root_registry_owner_id` | `text` | yes | ENSv1 only: the owner recorded in the root ENSv1 registry. `null` for ENSv2 domains. |
| `canonical` | `boolean` | no | Whether this Domain is part of the canonical nametree. This encodes bi-directional agreement between `domains.subregistry_id` and `registries.canonical_domain_id`, so traversal of the canonical nametree filtered to domains/registries where `canonical=true` is safe and doesn't require edge-authenticating oneself (i.e. don't need to compare `domains.subregistry_id` and `registries.canonical_domain_id` in the query, can just `WHERE canonical = true`). Mirrors the parent Registry's flag. Default `false`. |
| `canonical_name` | `text` | yes | Materialized Canonical Name, `NULL` iff `canonical = false`. Maintained by `canonicality-db-helpers.ts`. Example: `"vitalik.eth"`. |
| `canonical_name` | `text` | yes | Materialized Canonical Name, `NULL` iff `canonical = false`. Maintained by `canonicality-db-helpers.ts`. Use for exact matches (`canonical_name = 'vitalik.eth'`) and display. Example: `"vitalik.eth"`. |
| `__canonical_name_prefix` | `text` | yes | Materialized prefix of `canonical_name` (first 64 code points), `NULL` iff `canonical = false`. Maintained by `canonicality-db-helpers.ts`. Use for left-anchored / substring search (`__canonical_name_prefix LIKE 'vit%'`) and NAME ordering without `canonical_name`'s full-length btree size hazard. The `__` prefix marks it as an internal implementation detail; query `canonical_name` for exact matches and display. |
| `canonical_label_hash_path` | `text[]` | yes | Materialized Canonical LabelHashPath, `NULL` iff `canonical = false`. Head-first (root → leaf), i.e. `[labelhash("eth"), labelhash("vitalik")]` for `"vitalik.eth"`. Maintained by `canonicality-db-helpers.ts`. |
| `canonical_path` | `text[]` | yes | Materialized Canonical Domain Path, `NULL` iff `canonical = false`. Head-first (root → leaf), i.e. `[DomainId("eth"), DomainId("vitalik")]` for `"vitalik.eth"`. Maintained by `canonicality-db-helpers.ts`. |
| `canonical_depth` | `integer` | yes | Materialized Canonical Depth, `NULL` iff `canonical = false`. The depth of this Domain in the Canonical Nametree, i.e. the number of Labels in its Canonical Name (e.g. `"eth"` depth 1, `"vitalik.eth"` depth 2). Maintained by `canonicality-db-helpers.ts`. |
| `canonical_node` | `text` | yes | Materialized Canonical Node, `NULL` iff `canonical = false`. The computed Node (via `namehash`) of this Domain's Canonical Name. Maintained by `canonicality-db-helpers.ts`. |

**Indexes:** `type`, `subregistry_id` (partial: non-null only), `owner_id`, `label_hash`, `(registry_id, label_hash)` (composite; leading-column prefix also serves `WHERE registry_id = X` lookups, so no separate `registry_id` index is needed), `(registry_id, left(canonical_name, 256), id)` (composite expression index for registry-scoped `WHERE registry_id = X ORDER BY canonical_name LIMIT N` — the `Domain.subdomains` shape; the 256-char prefix bounds the index tuple under btree's per-tuple max, and NAME-ordered queries must sort by the same `left(...)` expression for the planner to use this index for ordered scan), `canonical_name` (hash, exact match — avoids the btree 8191-byte row-size hazard for spam names), `canonical_name` (GIN trigram for substring / similarity queries), `canonical_label_hash_path` (GIN containment for `cascadeLabelHeal`'s `canonical_label_hash_path @> ARRAY[lh]` lookup), `canonical_node` (hash, for resolver-record → canonical-domain joins), `canonical_depth` (btree, for `ORDER BY canonical_depth` — typeahead and depth-ordered browse).
**Indexes:** `type`, `subregistry_id` (partial: non-null only), `owner_id`, `label_hash`, `(registry_id, label_hash)` (composite; leading-column prefix also serves `WHERE registry_id = X` lookups, so no separate `registry_id` index is needed), `(registry_id, __canonical_name_prefix, id)` (composite for registry-scoped `WHERE registry_id = X ORDER BY __canonical_name_prefix LIMIT N` — the `Domain.subdomains` shape; ordering by the materialized, length-capped prefix column avoids replicating a `left(...)` expression and keeps the index tuple under btree's per-tuple max), `canonical_name` (hash, exact match — avoids the btree 8191-byte row-size hazard for spam names), `__canonical_name_prefix` (GIN trigram for left-anchored `LIKE 'vit%'` and substring search), `canonical_label_hash_path` (GIN containment for `cascadeLabelHeal`'s `canonical_label_hash_path @> ARRAY[lh]` lookup), `canonical_node` (hash, for resolver-record → canonical-domain joins), `canonical_depth` (btree, for `ORDER BY canonical_depth` — typeahead and depth-ordered browse).

**Relations:** belongs to one `registries` record, belongs to one `registries` record (as subregistry), has one `accounts` record (owner), has one `accounts` record (rootRegistryOwner), has one `labels` record, has many `registrations` records.
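
The composite `(registry_id, __canonical_name_prefix, id)` index listed above ends in `id`. A truncated prefix is not unique (two names sharing their first 64 code points have the same prefix), so NAME-ordered keyset pagination needs a tiebreaker to avoid skipping or repeating rows. The in-memory sketch below illustrates the idea, assuming the cursor carries both the prefix and the `id`; `Row`, `Cursor`, `isAfter`, and `nextPage` are hypothetical names. JavaScript compares strings by UTF-16 code unit, whereas Postgres ordering depends on the column's collation, so this is an illustration of the tuple comparison only.

```typescript
// Hypothetical shapes for illustration; not the PR's DomainCursor type.
interface Row {
  id: string;
  prefix: string;
}
interface Cursor {
  prefix: string;
  id: string;
}

// Lexicographic comparison of (prefix, id), mirroring a SQL row-value
// comparison such as `(prefix, id) > (cursor.prefix, cursor.id)`.
function isAfter(row: Row, cursor: Cursor): boolean {
  if (row.prefix !== cursor.prefix) return row.prefix > cursor.prefix;
  return row.id > cursor.id;
}

function compareRows(a: Row, b: Row): number {
  if (a.prefix !== b.prefix) return a.prefix < b.prefix ? -1 : 1;
  if (a.id !== b.id) return a.id < b.id ? -1 : 1;
  return 0;
}

// Return the next `limit` rows strictly after the cursor, in (prefix, id) order.
function nextPage(rows: Row[], cursor: Cursor | null, limit: number): Row[] {
  return rows
    .filter((row) => cursor === null || isAfter(row, cursor))
    .sort(compareRows)
    .slice(0, limit);
}

const rows: Row[] = [
  { id: "3", prefix: "aaa" },
  { id: "1", prefix: "aaa" },
  { id: "2", prefix: "aaa" },
  { id: "4", prefix: "bbb" },
];
const page1 = nextPage(rows, null, 2);
const last = page1[page1.length - 1];
const page2 = nextPage(rows, { prefix: last.prefix, id: last.id }, 2);
```

Without the `id` component, a cursor of `"aaa"` alone could not tell which of the three rows sharing that prefix had already been returned.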
