[fix](timestamptz) Preserve correct DST fold branch to go cross the transition point#63034
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
|
run buildall |
There was a problem hiding this comment.
{
"event": "REQUEST_CHANGES",
"commit_id": "81d6e31144bddb0e013415df1e72ec92dfe873fb",
"body": "I found one blocking FE compatibility regression in the new TIMESTAMPTZ literal path. The BE fold-branch handling is focused and the new regression test covers the main DST fold case, but the FE change bypasses existing timezone normalization/range validation for explicit casts.\n\nCritical checkpoint conclusions:\n- Goal and proof: The PR addresses DST fold branch preservation and adds a regression test for explicit -04/-05 values plus date_trunc/floor/ceil behavior. Coverage misses legacy timezone abbreviation and offset-range behavior on the changed FE cast path.\n- Scope/focus: The BE changes are small and localized. The FE shortcut is small, but it bypasses existing parsing semantics instead of reusing the established normalization path.\n- Concurrency/lifecycle: No new concurrency, locking, or special lifecycle concerns found.\n- Config/compatibility: No new config or storage/protocol format changes. There is an incompatible FE behavior change for explicit string-to-TIMESTAMPTZ casts.\n- Parallel paths: Explicit casts and implicit character-literal coercion were checked; the blocking regression is on the explicit StringLikeLiteral cast path that previously normalized CST and validated offset range.\n- Tests/results: New regression output is deterministic and appears generated, but tests do not cover the compatibility cases affected by the parser bypass.\n- Observability/transactions/memory/performance: No new observability, transaction, memory-tracking, or meaningful performance issues found for this change.\n\nUser focus: No additional user-provided review focus was supplied.",
"comments": [
{
"path": "fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/literal/StringLikeLiteral.java",
"line": 152,
"side": "RIGHT",
"body": "This new direct path bypasses the existing castToDateTime normalization and validation for timezone-bearing strings. Two concrete regressions on explicit casts: CAST('2024-01-01 00:00:00 CST' AS TIMESTAMPTZ) used to normalize CST to +08:00 in castToDateTime, but TimestampTzLiteral's Java formatter rejects CST; and CAST('2024-01-01 00:00:00 +15:00' AS TIMESTAMPTZ) now bypasses the existing hourOffset > 14 guard. Please keep the fold-preserving direct construction for valid explicit offsets, but run the same timezone normalization/range validation before constructing the TimestampTzLiteral."
}
]
}
|
/review |
|
run buildall |
There was a problem hiding this comment.
Review Summary
I reviewed the full PR diff and the existing review context. No distinct actionable correctness issues found.
Critical checkpoint conclusions:
- Goal and tests: The PR fixes TIMESTAMPTZ DST fold branch preservation for explicit-offset literals and BE date_trunc/floor/ceil paths. The added regression test covers explicit pre/post-fold values, equality, rendering, truncation, and persisted UTC results.
- Scope and clarity: The change is focused on TIMESTAMPTZ conversion/literal coercion and keeps the implementation small.
- Concurrency/lifecycle: No new shared mutable state, threads, locks, or lifecycle-sensitive static initialization introduced.
- Config/compatibility: No new configs, storage format changes, or FE-BE protocol changes identified.
- Parallel paths: I checked FE string literal coercion/signature search and BE date_trunc/floor/ceil TIMESTAMPTZ paths; the relevant changed paths are covered.
- Error handling/memory: No ignored Status, new large allocations, or memory ownership concerns identified.
- Testing: Regression and FE unit coverage are present. I did not rerun the PR tests in this review runner.
- Observability: No new critical operational path requiring additional logging/metrics.
- User focus: No additional user-provided review focus was specified.
### What problem does this PR solve?
Issue Number: DORIS-25353, DORIS-25467
Related PR: None
Problem Summary: TIMESTAMPTZ values in a DST fold hour could be converted through an ambiguous local time, making explicit post-fold instants use the earlier branch during FE literal handling and BE floor/ceil/date_trunc execution.
### Release note
Fix TIMESTAMPTZ handling for explicit offsets and sub-hour truncation in DST fold hours.
### Check List (For Author)
- Test: Regression test / Unit Test / Build / Static Analysis
- `./build.sh --fe --be -j 100`
- `./run-regression-test.sh --run -d datatype_p0/timestamptz -s test_timestamptz_dst_fold -forceGenOut`
- `./run-regression-test.sh --run -d datatype_p0/timestamptz -s test_timestamptz_dst_fold`
- `mvn -f fe/pom.xml -pl fe-core -Dtest=org.apache.doris.nereids.util.TypeCoercionUtilsTest#testCharacterLiteralTypeCoercion -DfailIfNoTests=false -Dmaven.build.cache.enabled=false test`
- `build-support/run-clang-tidy.sh --files be/src/core/value/timestamptz_value.h be/src/core/value/timestamptz_value.cpp be/src/exprs/function/function_datetime_floor_ceil.cpp be/src/exprs/function/function_other_types_to_date.cpp`
- Behavior changed: Yes. TIMESTAMPTZ DST fold-hour values keep the correct post-fold branch.
- Does this need documentation: No
|
run buildall |
|
/review |
|
PR approved by at least one committer and no changes requested. |
There was a problem hiding this comment.
Review completed. I did not find blocking issues in this PR.
Critical checkpoint conclusions:
- Goal/test evidence: The change targets TIMESTAMPTZ DST fold branch preservation for explicit offsets and sub-hour truncation. The new regression test covers pre/post fold rendering, equality against an explicit post-fold literal, date_trunc(second/minute/hour), minute_floor/ceil, hour_floor, and persisted UTC results; the FE coercion unit test covers wildcard TIMESTAMPTZ literal typing.
- Scope/focus: The implementation is focused on FE literal/coercion handling and BE local-to-UTC conversion for ambiguous repeated civil times.
- Concurrency/lifecycle: No new shared mutable state, threading, locks, or non-trivial lifecycle/static initialization hazards were introduced.
- Configuration/compatibility: No new config, storage format, thrift/proto, or mixed-version protocol changes are introduced.
- Parallel paths: Both FE paths visible in this PR (StringLikeLiteral casting and TypeCoercionUtils character literal coercion) now preserve explicit offset literals; BE date_trunc and floor/ceil sub-hour paths use the preferred offset when the result civil time is repeated.
- Data correctness: The BE conversion keeps the original offset only when the target local civil time is actually REPEATED, otherwise it falls back to existing cctz conversion behavior. This preserves the intended UTC instant for the covered DST-fold cases without changing non-ambiguous conversions.
- Error handling/memory/observability: No ignored Status, new allocations requiring Doris memory tracking, or new observability requirements were found.
- Test result review: Added/updated regression outputs are consistent with the tested fold behavior. I did not run tests locally in this review runner.
- User focus: No additional user-provided review focus was present.
|
PR approved by anyone and no changes requested. |
…ransition point (#63034) Problem Summary: TIMESTAMPTZ values in a DST fold hour could be converted through an ambiguous local time, making explicit post-fold instants use the earlier branch during FE literal handling and BE floor/ceil/date_trunc execution. The old code would treat DST repeated hours like `2024-11-03 01:xx` as local civil time without offset when parsing TIMESTAMPTZ with explicit offset in FE, and when BE converts local time to UTC using date_trunc/floor/ceil, it would lose the `-04/-05` offset branch information and default to the earlier branch. After the fix, FE directly constructs TimestampTzLiteral from timezone-aware strings, and BE uses the original UTC instant's offset to select pre/post branches when converting repeated hours back to UTC, thus preserving the correct fold branch. when `time_zone = 'America/New_York'`: ```sql SELECT CAST(CAST('2024-11-03 01:05:00 -05:00' AS TIMESTAMPTZ(6)) AS VARCHAR(64)) AS ts, CAST(date_trunc(CAST('2024-11-03 01:05:30 -05:00' AS TIMESTAMPTZ(6)), 'hour') AS VARCHAR(64)) AS hour_trunc; ``` before: ``` ts hour_trunc 2024-11-03 01:05:00.000000-04:00 2024-11-03 01:00:00.000000-04:00 ``` now: ``` ts hour_trunc 2024-11-03 01:05:00.000000-05:00 2024-11-03 01:00:00.000000-05:00 ```
…ransition point (#63034) Problem Summary: TIMESTAMPTZ values in a DST fold hour could be converted through an ambiguous local time, making explicit post-fold instants use the earlier branch during FE literal handling and BE floor/ceil/date_trunc execution. The old code would treat DST repeated hours like `2024-11-03 01:xx` as local civil time without offset when parsing TIMESTAMPTZ with explicit offset in FE, and when BE converts local time to UTC using date_trunc/floor/ceil, it would lose the `-04/-05` offset branch information and default to the earlier branch. After the fix, FE directly constructs TimestampTzLiteral from timezone-aware strings, and BE uses the original UTC instant's offset to select pre/post branches when converting repeated hours back to UTC, thus preserving the correct fold branch. when `time_zone = 'America/New_York'`: ```sql SELECT CAST(CAST('2024-11-03 01:05:00 -05:00' AS TIMESTAMPTZ(6)) AS VARCHAR(64)) AS ts, CAST(date_trunc(CAST('2024-11-03 01:05:30 -05:00' AS TIMESTAMPTZ(6)), 'hour') AS VARCHAR(64)) AS hour_trunc; ``` before: ``` ts hour_trunc 2024-11-03 01:05:00.000000-04:00 2024-11-03 01:00:00.000000-04:00 ``` now: ``` ts hour_trunc 2024-11-03 01:05:00.000000-05:00 2024-11-03 01:00:00.000000-05:00 ```
Problem Summary: TIMESTAMPTZ values in a DST fold hour could be converted through an ambiguous local time, making explicit post-fold instants use the earlier branch during FE literal handling and BE floor/ceil/date_trunc execution.
The old code would treat DST repeated hours like
2024-11-03 01:xxas local civil time without offset when parsing TIMESTAMPTZ with explicit offset in FE, and when BE converts local time to UTC using date_trunc/floor/ceil, it would lose the-04/-05offset branch information and default to the earlier branch. After the fix, FE directly constructs TimestampTzLiteral from timezone-aware strings, and BE uses the original UTC instant's offset to select pre/post branches when converting repeated hours back to UTC, thus preserving the correct fold branch.when
time_zone = 'America/New_York':before:
now: