Resolve semantic identifiers in wiki reverse-link parser #23201

Merged
akabiru merged 7 commits into dev from feature/wiki-reverse-link-semantic-id on May 15, 2026

Conversation

Member

@akabiru akabiru commented May 13, 2026

Ticket

https://community.openproject.org/wp/74947 Split out of #22976.

What are you trying to accomplish?

The wiki module maintains a parser that scans saved wiki pages for work-package references and creates formal database links from each wiki page back to the referenced WPs (so the WP's "referenced from" tab populates). The parser was numeric-only — semantic-shape references like #PROJ-1 were silently dropped, leaving reverse-link coverage uneven once a project is in semantic mode.

The matcher now accepts the same shape as the inline-text macro — #NNN, #PROJ-1, plus the ## / ### widget variants — and resolves each capture through WorkPackage.where_display_id_in, which mixes primary keys, current identifiers and historical aliases in one query. A rename history continues to produce a reverse link via the alias table.

Screenshots

[Screenshots: pr22976-wiki-autocomplete-annotated, pr22976-wiki-rendered-annotated, pr22976-wiki-reverse-link-tab-annotated]

Merge checklist

  • Added/updated tests covering both modes plus the alias-rename case
  • Added/updated documentation in Lookbook (patterns, previews, etc)
  • Tested major browsers (Chrome, Firefox, Edge, ...)

Base automatically changed from refactor/extract-semantic-project-identifier-format to dev May 13, 2026 14:49
@akabiru akabiru force-pushed the feature/wiki-reverse-link-semantic-id branch from c0c279a to 9cf5e72 Compare May 13, 2026 14:52
@akabiru akabiru requested a review from NobodysNightmare May 13, 2026 15:06
@akabiru akabiru marked this pull request as ready for review May 13, 2026 15:06
Member Author

akabiru commented May 13, 2026

Previous 👍🏾 #22976 (comment)

@akabiru akabiru self-assigned this May 13, 2026
@akabiru akabiru added this to the 17.5.x milestone May 13, 2026
@akabiru akabiru requested a review from thykel May 13, 2026 18:07

github-actions Bot commented May 13, 2026

Deploying openproject with PullPreview

| Field | Value |
| --- | --- |
| Latest commit | 22c70f9 |
| Job | deploy |
| Status | ✅ Deploy successful |
| Preview URL | https://pr-23201-wiki-reverse-lin-ip-178-105-27-142.my.opf.run:443 |

View logs

akabiru added 4 commits May 13, 2026 21:18
The wiki module maintains a separate work-package macro parser that
produces formal database links from wiki pages back to referenced work
packages. It still spoke numeric-only and silently dropped semantic
identifiers, leaving reverse-link coverage uneven once a project is in
semantic mode.

The matcher now accepts the same shape as the inline-text macro
(`#NNN`, `#PROJ-1`, plus the `##`/`###` widget variants) and resolves
each capture through `WorkPackage.where_display_id_in`, which already
mixes primary keys, current identifiers and historical aliases in one
query. A rename history continues to produce a reverse link via the
alias table.

The wiki reverse-link parser reads identifiers straight out of saved
wiki page text. Without a bound, a multi-megabyte pasted body could
push thousands of values into the alias-aware WP lookup in one query.

`MAX_PRELOAD_IDENTIFIERS = 500` caps the per-save lookup; references
past the cap simply don't get a reverse link recorded.

WP_REF_RE applies `(?!\w)` only to the semantic alternation branch; the
numeric branch is intentionally unbounded so historic shapes like
`#13-blubb` keep matching. There was no spec pinning the historic
behaviour, so a future tightening of the boundary could silently strip
reverse-links from existing wiki content.

The semantic-shape branch of WP_REF_RE inlined the literal pattern
`[A-Z][A-Z0-9_]*-\d+` instead of composing from
`WorkPackage::SemanticIdentifier::SEMANTIC_ID_PATTERN.source`.
A future tightening of the upstream pattern would silently drift away
from the wiki parser's shape unless someone hand-edited this regex too.

The multi-line /x form also restores the readable structure: prefix
class on its own line, branches separated on the alternation, and the
`(?!\w)` boundary visible next to the branch it constrains.
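
A minimal sketch of the shape these commit messages describe, assuming nothing beyond what is quoted above. `SEMANTIC_ID_PATTERN_SKETCH`, `WP_REF_SKETCH` and `sketch_refs` are hypothetical names for illustration, not the PR's actual constants, and the prefix character class that the real matcher mirrors is left out because it is not shown in this conversation.

```ruby
# Hypothetical stand-in for WorkPackage::SemanticIdentifier::SEMANTIC_ID_PATTERN.
# The literal shape is the one quoted in the commit message above.
SEMANTIC_ID_PATTERN_SKETCH = /[A-Z][A-Z0-9_]*-\d+/

# Illustrative matcher only, not the real WP_REF_RE.
# Composing from `.source` means a change to the shared pattern is picked up
# here without hand-editing this regex.
WP_REF_SKETCH = /
  (?<hashes>\#\#?\#?)                            # one to three hash marks
  (?<ref>
    #{SEMANTIC_ID_PATTERN_SKETCH.source}(?!\w)   # semantic branch with trailing boundary
    |
    \d+                                          # numeric branch without trailing boundary
  )
/x

# Returns the captured references from a wiki body, in order of appearance.
def sketch_refs(text)
  text.scan(WP_REF_SKETCH).map { |_hashes, ref| ref }
end

p sketch_refs("See #42, ##PROJ-1 and ###PROJ-2.")
p sketch_refs("#PROJ-1abc")   # semantic branch rejected by the boundary
p sketch_refs("#13-blubb")    # numeric branch still matches the leading digits
```

Under these assumptions, the boundary sits next to the only branch it constrains, which is the readability point the `/x` form is meant to restore.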
@akabiru akabiru force-pushed the feature/wiki-reverse-link-semantic-id branch from 0d6c7e2 to 2650b65 Compare May 13, 2026 18:18
The 500-identifier cap was added defensively but silently truncated
references past the boundary with no signal to the author. The prior
numeric-only baseline had no cap at all and never showed a problem in
practice; PostgreSQL handles large IN-lists comfortably at the scales
realistically seen in a wiki body. If extreme volumes ever do surface,
the per-row INSERT loop is the true bottleneck, not the SELECT --
a problem worth solving on its own terms rather than masking here.

Constants now sit at the top of the module, above the method body,
keeping the Ruby style convention.
Contributor

@thykel thykel left a comment

🚀 Very nice! Leaving you with some fairly low-priority comments.

# Mirrors the prefix character class of the inline-text macro matcher.
# The trailing `(?!\w)` on the semantic branch keeps `#PROJ-1abc` from
# matching `#PROJ-1`; the numeric branch deliberately has no trailing
# boundary to preserve historic behaviour for inputs like `#13-blubb`.
Contributor

The lack of boundary for numeric IDs is interesting and worth a separate non-blocking discussion. Do we actually want to retain that behaviour?

Member Author

I'm not sure whether we should retain it, but I'm hesitant to disrupt it in this swing. This test shows how it can be used in classic mode, e.g. "Trailing: #1234abc".

# matching `#PROJ-1`; the numeric branch deliberately has no trailing
# boundary to preserve historic behaviour for inputs like `#13-blubb`.
# rubocop:disable Style/RedundantRegexpEscape
WP_REF_RE = /
Contributor

Nice!

Comment thread modules/wikis/spec/services/wiki_pages/create_service_spec.rb
find_wp_links(wiki_page.text).uniq.each do |wp_id|
wp = WorkPackage.find_by(id: wp_id)
next if wp.nil?
identifiers = find_wp_links(wiki_page.text).uniq
Contributor

Nit: Should we also somehow uniq distinct references pointing to the same WP? e.g. scenarios where 123, OLDPROJ-10 and NEWPROJ-5 all resolve to the same record.

Member Author

Nice catch. I suppose we can take care of that in the where_display_id_in scope, as 123, OLDPROJ-10 and NEWPROJ-5 should resolve to the same work package record.

Member Author

@akabiru akabiru May 14, 2026

🤖 TDD'd this — the failing test went green on first run, so no production change.

Added 22c70f9 as regression coverage: a body with #<id>, #<current-identifier>, and #OLD-<seq> (alias) all pointing at the same WP produces exactly one reverse-link row.

The reason it already works: WorkPackage.where_display_id_in composes the three lookups as WHERE id IN (…) OR identifier IN (…) OR EXISTS(alias …). Single SELECT over work_packages, set semantics, so find_each yields the WP once and create! fires once. Adding .distinct on the scope would force a redundant SELECT DISTINCT pass on every call without changing the result.
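
The set-semantics argument above can be illustrated with a small in-memory model. This is only a sketch: `WorkPackageRow`, `ROWS` and `sketch_where_display_id_in` are hypothetical names, and the real lookup is a SQL scope whose implementation is not shown in this conversation.

```ruby
# Hypothetical in-memory stand-in for a work_packages row plus its alias history.
WorkPackageRow = Struct.new(:id, :identifier, :aliases)

ROWS = [
  WorkPackageRow.new(123, "NEWPROJ-5", ["OLDPROJ-10"]),
  WorkPackageRow.new(124, "NEWPROJ-6", [])
].freeze

# Mimics WHERE id IN (...) OR identifier IN (...) OR EXISTS(alias ...).
# The rows are filtered in a single pass, so a row that satisfies several
# conditions at once is still returned only once.
def sketch_where_display_id_in(rows, display_ids)
  numeric, semantic = display_ids.partition { |value| value.match?(/\A\d+\z/) }
  ids = numeric.map(&:to_i)

  rows.select do |row|
    ids.include?(row.id) ||
      semantic.include?(row.identifier) ||
      row.aliases.any? { |old| semantic.include?(old) }
  end
end

# Three different references that all point at the same record.
matches = sketch_where_display_id_in(ROWS, %w[123 OLDPROJ-10 NEWPROJ-5])
p matches.map(&:id)
```

Because the filter is a predicate over rows rather than a join against the reference list, the number of references pointing at a record does not affect how many times that record is yielded.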

akabiru added 2 commits May 14, 2026 23:27
Adds a single example exercising both reference shapes in one wiki
body, guarding against any future regex divergence that drops one
branch when the other matches first on the same line.
A wiki body referencing the same work package via its primary key,
current semantic identifier, and a historical alias must still produce
one reverse-link row. The OR-composed relation built by
`where_display_id_in` already guarantees this; the spec keeps it that way.
@akabiru akabiru force-pushed the feature/wiki-reverse-link-semantic-id branch from 1e8cec1 to 22c70f9 Compare May 15, 2026 04:27
@akabiru akabiru merged commit 27ff34f into dev May 15, 2026
15 of 17 checks passed
@akabiru akabiru deleted the feature/wiki-reverse-link-semantic-id branch May 15, 2026 05:43
@github-actions github-actions Bot locked and limited conversation to collaborators May 15, 2026
Labels

None yet

2 participants