Update URI join semantics #630
Reject absolute join suffixes and cover suffix edge cases.
def _is_absolute_path(cls, token: _URI_LIKE) -> bool:
    token_str = cls._token_to_string(token)

    # Note: "://" is used to detect GcsUri and HttpUri prefixes, but will incorrectly
I'm not thrilled about this but it seems like the simplest way to detect GcsUri or HttpUri without either (a) storing subclass prefixes in the base class or (b) checking against subclass type directly
Any reason not to check the subclasses here? e.g. if isinstance(token, (GcsUri, HttpUri)): raise
My thought was that it's not great to have to hardcode this, in case new subclasses appear.
Also, we probably want to raise when the rhs token is a string like "gs://foo"?
is_abs() could be an abstract method or property that subclasses could implement.
As discussed, the check against :// is still there to prevent joins of the sort: Uri.join(LocalUri("foo"), "gs://bar").
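The string-level check discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `_is_absolute_path`: the function name `looks_absolute` is hypothetical, and treating local paths with `posixpath.isabs` is an assumption.

```python
import posixpath


def looks_absolute(token: str) -> bool:
    """Hypothetical stand-in for a string-based absolute-token check."""
    # A scheme separator marks URI-looking values such as "gs://bar",
    # even when the token arrives as a plain string and not a Uri subclass.
    if "://" in token:
        return True
    # Assumption: everything else is judged as a POSIX-style local path.
    return posixpath.isabs(token)


print(looks_absolute("gs://bar"))   # URI-looking string
print(looks_absolute("/tmp/data"))  # absolute local path
print(looks_absolute("foo/bar"))    # relative suffix
```

Because the check operates on strings, it also covers the `Uri.join(LocalUri("foo"), "gs://bar")` case, where the right-hand token has no subclass type to inspect. The trade-off is that any token containing `://` is flagged, whatever its origin.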
@@ -88,10 +106,6 @@ def __eq__(self, other: Any) -> bool:
        return False

    def __truediv__(self, other: _URI_LIKE) -> Self:
Previously, __truediv__ would reject joining a relative LocalUri onto a GcsUri or HttpUri. In this implementation, we allow it.
As per my comment above, it's probably safe to disallow this.
There is one small case where a GcsUri (or any other URI) can be joined with a string, and that should be the only case where this is allowed.
/all_test
GiGL Automation @ 23:15:24UTC : 🔄 @ 23:17:15UTC : ✅ Workflow completed successfully.
GiGL Automation @ 23:15:26UTC : 🔄 @ 23:23:42UTC : ✅ Workflow completed successfully.
GiGL Automation @ 23:15:28UTC : 🔄 @ 24:43:02UTC : ✅ Workflow completed successfully.
GiGL Automation @ 23:15:28UTC : 🔄 @ 24:33:46UTC : ✅ Workflow completed successfully.
GiGL Automation @ 23:15:29UTC : 🔄 @ 24:12:52UTC : ✅ Workflow completed successfully.
GiGL Automation @ 23:15:31UTC : 🔄 @ 23:24:24UTC : ✅ Workflow completed successfully.
svij-sc left a comment
IIUC, we want to prevent something like the following:
>>> from gigl.common import GcsUri, HttpUri, LocalUri, Uri
>>> http_uri = HttpUri("https://www.something.com")
>>> gcs_uri = GcsUri("gs://sdsds")
>>> Uri.join(gcs_uri, http_uri)
gs://sdsds/https://www.something.com
I think more generally, whether abs path or not, we should:
- disallow joining URIs of two different types.
- allow an abs path of any type to be joined against a string and for it to resolve correctly.
In regards to abs paths:
- disallow two abs paths from joining
- disallow an abs path on the right
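The rules proposed in this comment could be sketched as below. Everything here is illustrative: `SketchUri`, its subclasses, and `sketch_join` are hypothetical names and are not the `gigl.common` classes, and raising `TypeError` follows the PR description rather than any confirmed implementation detail.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class SketchUri:
    """Hypothetical stand-in for a URI type; not the gigl.common class."""

    value: str


class SketchGcsUri(SketchUri):
    pass


class SketchHttpUri(SketchUri):
    pass


def sketch_join(base: SketchUri, suffix: Union[str, SketchUri]) -> SketchUri:
    """Join a suffix onto base, applying the rules listed above."""
    if isinstance(suffix, SketchUri):
        # Rule: disallow joining URIs of two different types.
        if type(suffix) is not type(base):
            raise TypeError("cannot join URIs of two different types")
        suffix_str = suffix.value
    else:
        suffix_str = suffix
    # Rule: disallow an absolute path (or URI-looking value) on the right.
    if "://" in suffix_str or suffix_str.startswith("/"):
        raise TypeError(f"absolute token on the right: {suffix_str!r}")
    # Rule: an absolute base joined against a plain string resolves normally.
    return type(base)(base.value.rstrip("/") + "/" + suffix_str)


print(sketch_join(SketchGcsUri("gs://sdsds"), "a/b").value)
```

With these checks, the `Uri.join(gcs_uri, http_uri)` call from the example above would raise instead of producing `gs://sdsds/https://www.something.com`.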
# *or* join HTTP on HTTP or GCS on GCS.
# This is not backwards compatible, so come around to this later.
@classmethod
def _is_absolute_path(cls, token: _URI_LIKE) -> bool:
is_abs() could be an abstract method or property that subclasses could implement.
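One way the suggested `is_abs` could look is sketched here with an abstract property. The class names (`AbsAwareUri` and its subclasses) are hypothetical and the bodies are assumptions; the real `Uri` hierarchy may be structured differently.

```python
import posixpath
from abc import ABC, abstractmethod


class AbsAwareUri(ABC):
    """Hypothetical base class where each subclass reports absoluteness."""

    def __init__(self, uri: str) -> None:
        self.uri = uri

    @property
    @abstractmethod
    def is_abs(self) -> bool:
        """True when this URI cannot be used as a join suffix."""


class AbsAwareLocalUri(AbsAwareUri):
    @property
    def is_abs(self) -> bool:
        # Assumption: local paths follow POSIX rules.
        return posixpath.isabs(self.uri)


class AbsAwareGcsUri(AbsAwareUri):
    @property
    def is_abs(self) -> bool:
        # Assumption: a validly constructed GCS URI always carries its prefix.
        return True


print(AbsAwareLocalUri("foo").is_abs, AbsAwareGcsUri("gs://bucket").is_abs)
```

This keeps subclass prefixes out of the base class, but it only helps when the token is already a URI object. A plain string such as `"gs://foo"` has no `is_abs` to call, so a string-level check would still be needed for that case.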
def __init__(self, uri: Union[str, Path, HttpUri]) -> None:
    self._has_valid_prefix(uri=uri)
    self._has_no_backslash(uri=uri)
    self.is_valid(uri=self._token_to_string(uri), raise_exception=True)
Matching the behaviour of the GcsUri constructor which I suspect is the right thing to do.
This simplifies joining logic as now validly constructed GcsUri and HttpUri are always absolute.
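The idea of validating in the constructor, so that every instance is known to be absolute, might look roughly like this. The class name `ValidatedHttpUri`, the prefix tuple, and the choice of `ValueError` are all assumptions for illustration; the actual `is_valid(..., raise_exception=True)` behaviour is not shown in this thread.

```python
class ValidatedHttpUri:
    """Hypothetical HTTP URI that rejects invalid input at construction."""

    _PREFIXES = ("http://", "https://")

    def __init__(self, uri: str) -> None:
        # Assumption: validity means starting with a known scheme prefix.
        if not uri.startswith(self._PREFIXES):
            raise ValueError(f"not a valid HTTP URI: {uri!r}")
        self.uri = uri

    @property
    def is_abs(self) -> bool:
        # Construction already guarantees the scheme prefix is present,
        # so join logic does not need to re-inspect the string.
        return True


print(ValidatedHttpUri("https://www.something.com").is_abs)
```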
/all_test
GiGL Automation @ 24:46:09UTC : 🔄 @ 01:41:11UTC : ✅ Workflow completed successfully.
GiGL Automation @ 24:46:09UTC : 🔄 @ 02:14:06UTC : ✅ Workflow completed successfully.
GiGL Automation @ 24:46:10UTC : 🔄 @ 02:13:39UTC : ❌ Workflow failed.
GiGL Automation @ 24:46:12UTC : 🔄 @ 24:55:07UTC : ✅ Workflow completed successfully.
GiGL Automation @ 24:46:17UTC : 🔄 @ 24:50:56UTC : ❌ Workflow failed.
GiGL Automation @ 24:46:18UTC : 🔄 @ 24:48:15UTC : ✅ Workflow completed successfully.
/all_test
GiGL Automation @ 04:05:25UTC : 🔄 @ 04:16:42UTC : ✅ Workflow completed successfully.
GiGL Automation @ 04:05:27UTC : 🔄 @ 05:16:54UTC : ✅ Workflow completed successfully.
GiGL Automation @ 04:05:27UTC : 🔄 @ 04:13:41UTC : ✅ Workflow completed successfully.
GiGL Automation @ 04:05:29UTC : 🔄 @ 05:29:06UTC : ✅ Workflow completed successfully.
GiGL Automation @ 04:05:32UTC : 🔄 @ 04:07:24UTC : ✅ Workflow completed successfully.
GiGL Automation @ 04:05:32UTC : 🔄 @ 05:29:18UTC : ❌ Workflow failed.
Updates Uri.join() semantics so only the first token may be absolute. Additional join tokens now raise TypeError when they are absolute local paths or URI-looking values such as gs://... / http://.... See more details on updated semantics in the inline comments.
Scope of work done
Where is the documentation for this feature?: N/A
Did you add automated tests or write a test plan?
Updated Changelog.md? NO
Ready for code review?: NO