Update URI join semantics #630

Open
jchmura-sc wants to merge 6 commits into main from jchmura/uri_join_hardening

@jchmura-sc
Collaborator

@jchmura-sc jchmura-sc commented May 7, 2026

Updates Uri.join() semantics so that only the first token may be absolute. Additional join tokens now raise TypeError when they are absolute local paths or URI-like values such as gs://... or http://.... See the inline comments for more details on the updated semantics.
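To make the described semantics concrete, here is a minimal sketch that works on plain strings. It is not the code in this PR: `join_tokens` is a hypothetical helper, and using `startswith("/")` and `"://"` as the absoluteness checks is an assumption based on the description above.

```python
import posixpath


def join_tokens(first: str, *rest: str) -> str:
    """Join path tokens; only `first` may be absolute or carry a scheme.

    Hypothetical helper for illustration, not the gigl implementation.
    """
    for token in rest:
        # Reject absolute local paths and URI-looking values after the first token.
        if token.startswith("/") or "://" in token:
            raise TypeError(f"Join token must be relative, got: {token!r}")
    return posixpath.join(first, *rest)
```

Under these assumptions, `join_tokens("gs://bucket", "a", "b.txt")` gives `"gs://bucket/a/b.txt"`, while `join_tokens("base", "gs://other")` raises `TypeError`.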

Scope of work done

Where is the documentation for this feature?: N/A

Did you add automated tests or write a test plan?

Updated Changelog.md? NO

Ready for code review?: NO

Reject absolute join suffixes and cover suffix edge cases.
@jchmura-sc jchmura-sc self-assigned this May 7, 2026
Comment thread gigl/common/types/uri/uri.py
def _is_absolute_path(cls, token: _URI_LIKE) -> bool:
    token_str = cls._token_to_string(token)

    # Note: "://" is used to detect GcsUri and HttpUri prefixes, but will incorrectly
Collaborator Author

I'm not thrilled about this, but it seems like the simplest way to detect a GcsUri or HttpUri without either (a) storing subclass prefixes in the base class or (b) checking against the subclass types directly.

Collaborator

Any reason not to check the subclasses here? e.g. if isinstance(token, (GcsUri, HttpUri)): raise

Collaborator Author

@jchmura-sc jchmura-sc May 8, 2026

My thought was that it's not great to have to hardcode this, in case new subclasses appear.

Also, we probably want to raise when the rhs token is a string like "gs://foo"?

Collaborator

is_abs() could be an abstract method or property that subclasses could implement.
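A rough sketch of what that suggestion could look like, for illustration only. The class names below are hypothetical stand-ins rather than the real gigl classes, and the `startswith("/")` rule for local paths is an assumption.

```python
from abc import ABC, abstractmethod


class UriSketch(ABC):
    """Hypothetical base class; each subclass decides what 'absolute' means."""

    def __init__(self, uri: str) -> None:
        self.uri = uri

    @abstractmethod
    def is_abs(self) -> bool:
        """Return True if this URI is absolute."""


class LocalUriSketch(UriSketch):
    def is_abs(self) -> bool:
        # Assumption: a local path is absolute when it starts at the root.
        return self.uri.startswith("/")


class GcsUriSketch(UriSketch):
    def is_abs(self) -> bool:
        # Assumption: a gs:// URI always names a bucket, so it is always absolute.
        return True
```

With this shape the base class never needs to know subclass prefixes, but it only covers tokens that are already URI objects, not raw strings.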

Collaborator Author

As discussed, the check against :// is still there to prevent joins of the sort: Uri.join(LocalUri("foo"), "gs://bar").
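The gap that the `://` check closes can be shown with a small sketch. `GcsUriSketch`, `is_uri_by_type`, and `is_uri_by_text` are hypothetical names used only for this illustration; they are not part of the PR.

```python
class GcsUriSketch:
    """Hypothetical stand-in for a typed URI; not the gigl class."""

    def __init__(self, uri: str) -> None:
        self.uri = uri

    def __str__(self) -> str:
        return self.uri


def is_uri_by_type(token: object) -> bool:
    # Type-based check: only catches tokens already wrapped in a URI class.
    return isinstance(token, GcsUriSketch)


def is_uri_by_text(token: object) -> bool:
    # Text-based check: also catches raw strings that merely look like URIs.
    return "://" in str(token)
```

A plain string such as `"gs://bar"` passes the type-based check unnoticed (`is_uri_by_type("gs://bar")` is `False`), whereas the text-based check flags it.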

@@ -88,10 +106,6 @@ def __eq__(self, other: Any) -> bool:
return False

def __truediv__(self, other: _URI_LIKE) -> Self:
Collaborator Author

Previously, __truediv__ would reject joining a relative LocalUri onto a GcsUri or HttpUri. In this implementation, we allow it.

Collaborator

As per my comment above, it's probably safe to disallow this.
There is one narrow case, where a GcsUri (or any other URI type) is joined with a string, which should be the only case where this is allowed.

@jchmura-sc jchmura-sc changed the title from "Validate URI join suffixes" to "Update URI join semantics" May 7, 2026
Collaborator Author

@jchmura-sc jchmura-sc left a comment

/all_test

@jchmura-sc
Collaborator Author

/all_test

@github-actions
Contributor

github-actions Bot commented May 7, 2026

GiGL Automation

@ 23:15:24UTC : 🔄 C++ Unit Test started.

@ 23:17:15UTC : ✅ Workflow completed successfully.

@github-actions
Contributor

github-actions Bot commented May 7, 2026

GiGL Automation

@ 23:15:26UTC : 🔄 Lint Test started.

@ 23:23:42UTC : ✅ Workflow completed successfully.

@github-actions
Contributor

github-actions Bot commented May 7, 2026

GiGL Automation

@ 23:15:28UTC : 🔄 E2E Test started.

@ 24:43:02UTC : ✅ Workflow completed successfully.

@github-actions
Contributor

github-actions Bot commented May 7, 2026

GiGL Automation

@ 23:15:28UTC : 🔄 Integration Test started.

@ 24:33:46UTC : ✅ Workflow completed successfully.

@github-actions
Contributor

github-actions Bot commented May 7, 2026

GiGL Automation

@ 23:15:29UTC : 🔄 Python Unit Test started.

@ 24:12:52UTC : ✅ Workflow completed successfully.

@github-actions
Contributor

github-actions Bot commented May 7, 2026

GiGL Automation

@ 23:15:31UTC : 🔄 Scala Unit Test started.

@ 23:24:24UTC : ✅ Workflow completed successfully.

Collaborator

@svij-sc svij-sc left a comment

IIUC, what we want is to prevent something like the following:

>>> from gigl.common import GcsUri, HttpUri, LocalUri, Uri
>>> http_uri = HttpUri("https://www.something.com")
>>> gcs_uri = GcsUri("gs://sdsds")
>>> Uri.join(gcs_uri, http_uri)
gs://sdsds/https://www.something.com

I think more generally, whether the path is absolute or not, we should:

  1. disallow joining URIs of two different types.
  2. allow an abs path of any type to be joined with a string and have it resolve correctly.

With regard to abs paths:

  1. disallow joining two abs paths
  2. disallow an abs path on the right
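One possible reading of the rules above, sketched as a standalone validator. Everything here is hypothetical: the dataclasses are simplified stand-ins for the gigl URI types, `validate_join` is not a function in this PR, and treating `startswith("/")` or `"://"` as "absolute" is an assumption.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class LocalUriSketch:
    uri: str


@dataclass(frozen=True)
class GcsUriSketch:
    uri: str


UriObject = Union[LocalUriSketch, GcsUriSketch]


def _is_abs(token: Union[str, UriObject]) -> bool:
    text = token if isinstance(token, str) else token.uri
    return text.startswith("/") or "://" in text


def validate_join(left: UriObject, right: Union[str, UriObject]) -> None:
    """Raise TypeError if joining `right` onto `left` breaks the rules above."""
    # URI objects of two different types may not be joined; plain strings are fine.
    if not isinstance(right, str) and type(left) is not type(right):
        raise TypeError(
            f"Cannot join {type(right).__name__} onto {type(left).__name__}"
        )
    # Nothing absolute on the right, which also rules out abs-on-abs joins.
    if _is_abs(right):
        raise TypeError(f"Right-hand token must be relative, got: {right!r}")
```

In this sketch `validate_join(GcsUriSketch("gs://b"), "x")` passes, while a `LocalUriSketch` on the right of a `GcsUriSketch`, or any absolute right-hand token, raises `TypeError`.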

Comment thread gigl/common/types/uri/uri.py Outdated
# *or* join HTTP on HTTP or GCS on GCS.
# This is not backwards compatible, so come around to this later.
@classmethod
def _is_absolute_path(cls, token: _URI_LIKE) -> bool:
Collaborator

is_abs() could be an abstract method or property that subclasses could implement.

def __init__(self, uri: Union[str, Path, HttpUri]) -> None:
    self._has_valid_prefix(uri=uri)
    self._has_no_backslash(uri=uri)
    self.is_valid(uri=self._token_to_string(uri), raise_exception=True)
Collaborator Author

This matches the behaviour of the GcsUri constructor, which I suspect is the right thing to do.

This simplifies the joining logic, as validly constructed GcsUri and HttpUri instances are now always absolute.
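A minimal sketch of why constructor-time validation gives that guarantee. `HttpUriSketch` is a hypothetical stand-in, and both the accepted prefixes and the use of `ValueError` are assumptions; the real constructor may check and raise differently.

```python
class HttpUriSketch:
    """Hypothetical stand-in: construction fails unless the URI has a scheme."""

    _PREFIXES = ("http://", "https://")

    def __init__(self, uri: str) -> None:
        # Assumption: an HTTP URI must start with one of the known prefixes.
        if not uri.startswith(self._PREFIXES):
            raise ValueError(f"Expected an http(s):// prefix, got: {uri!r}")
        # Assumption: backslashes are rejected outright.
        if "\\" in uri:
            raise ValueError(f"Backslashes are not allowed, got: {uri!r}")
        self.uri = uri
```

Because every instance that survives construction carries a scheme, join logic can treat any such object as absolute without inspecting its text again.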

@jchmura-sc
Collaborator Author

/all_test

@jchmura-sc jchmura-sc marked this pull request as ready for review May 9, 2026 00:46
@github-actions
Contributor

github-actions Bot commented May 9, 2026

GiGL Automation

@ 24:46:09UTC : 🔄 Python Unit Test started.

@ 01:41:11UTC : ✅ Workflow completed successfully.

@github-actions
Contributor

github-actions Bot commented May 9, 2026

GiGL Automation

@ 24:46:09UTC : 🔄 E2E Test started.

@ 02:14:06UTC : ✅ Workflow completed successfully.

@github-actions
Contributor

github-actions Bot commented May 9, 2026

GiGL Automation

@ 24:46:10UTC : 🔄 Integration Test started.

@ 02:13:39UTC : ❌ Workflow failed.
Please check the logs for more details.

@jchmura-sc jchmura-sc requested a review from svij-sc May 9, 2026 00:46
@github-actions
Contributor

github-actions Bot commented May 9, 2026

GiGL Automation

@ 24:46:12UTC : 🔄 Scala Unit Test started.

@ 24:55:07UTC : ✅ Workflow completed successfully.

@jchmura-sc jchmura-sc requested a review from kmontemayor2-sc May 9, 2026 00:46
@github-actions
Contributor

github-actions Bot commented May 9, 2026

GiGL Automation

@ 24:46:17UTC : 🔄 Lint Test started.

@ 24:50:56UTC : ❌ Workflow failed.
Please check the logs for more details.

@github-actions
Contributor

github-actions Bot commented May 9, 2026

GiGL Automation

@ 24:46:18UTC : 🔄 C++ Unit Test started.

@ 24:48:15UTC : ✅ Workflow completed successfully.

@jchmura-sc
Collaborator Author

/all_test

@github-actions
Contributor

github-actions Bot commented May 9, 2026

GiGL Automation

@ 04:05:25UTC : 🔄 Scala Unit Test started.

@ 04:16:42UTC : ✅ Workflow completed successfully.

@github-actions
Contributor

github-actions Bot commented May 9, 2026

GiGL Automation

@ 04:05:27UTC : 🔄 Python Unit Test started.

@ 05:16:54UTC : ✅ Workflow completed successfully.

@github-actions
Contributor

github-actions Bot commented May 9, 2026

GiGL Automation

@ 04:05:27UTC : 🔄 Lint Test started.

@ 04:13:41UTC : ✅ Workflow completed successfully.

@github-actions
Contributor

github-actions Bot commented May 9, 2026

GiGL Automation

@ 04:05:29UTC : 🔄 E2E Test started.

@ 05:29:06UTC : ✅ Workflow completed successfully.

@github-actions
Contributor

github-actions Bot commented May 9, 2026

GiGL Automation

@ 04:05:32UTC : 🔄 C++ Unit Test started.

@ 04:07:24UTC : ✅ Workflow completed successfully.

@github-actions
Contributor

github-actions Bot commented May 9, 2026

GiGL Automation

@ 04:05:32UTC : 🔄 Integration Test started.

@ 05:29:18UTC : ❌ Workflow failed.
Please check the logs for more details.

3 participants