Phase 2: Add extraction logic for Unicode test data #7213
Open
samyoon20 wants to merge 5 commits into
- Add extraction targets to codepoint/build.xml for UCD and Unihan files
- Add extraction targets to unicode/build.xml for UCD files
- Add ICU4J copy targets to CLDR_11/build.xml
- Extract all Unicode versions (10.0.0-17.0.0) for compatibility
- Follow the dacapo pattern for dependency management
- Document design decisions comprehensively inline
Related to issue adoptium#5161
- Download individual Unihan_IRGSources.txt files (~1-2 MB each) instead of Unihan ZIP archives (6-8 MB each)
- Replace unzip operations with simple copy operations in codepoint/build.xml
- Reduce bandwidth by 75% for Unihan data (40% overall)
- Simplify code: a one-line copy instead of a six-line unzip block per version
- Speed up execution: no decompression needed
- Requires a corresponding TKG PR to update getDependencies.pl
Related to issue adoptium#5161
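The copy-instead-of-unzip change can be sketched as an Ant target like the one below. This is an illustration only, not the PR's code: the property names `LIB_DIR`, `unicode.version`, and `unihan.dest.dir`, and the versioned download name `Unihan_IRGSources-<version>.txt`, are hypothetical and depend on how the getDependencies.pl update names the files.

```xml
<!-- Hypothetical sketch: LIB_DIR, unicode.version, unihan.dest.dir and the
     versioned file name are assumptions, not the PR's actual names. -->
<target name="copyUnihanData">
    <!-- Before: unzip the whole Unihan archive, keeping only the file the tests read
    <unzip src="${LIB_DIR}/Unihan-${unicode.version}.zip" dest="${unihan.dest.dir}">
        <patternset>
            <include name="Unihan_IRGSources.txt"/>
        </patternset>
    </unzip>
    -->
    <!-- After: the dependency step already downloaded the single text file, so a copy is enough -->
    <copy file="${LIB_DIR}/Unihan_IRGSources-${unicode.version}.txt"
          tofile="${unihan.dest.dir}/Unihan_IRGSources.txt"/>
</target>
```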
Implementation:
- Load the JDK-to-Unicode mapping from UnicodeVers.properties
- Calculate the version code (JDK_VERSION + '000000')
- Extract only the mapped Unicode version
- Copy GB18030 data conditionally, based on the mapping
Error handling:
- Validate that a mapping exists before extraction
- Fail fast with a clear error if the mapping is missing
- Prevent silent failures or use of the wrong version
Related to issue adoptium#5161
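A minimal Ant sketch of the lookup and fail-fast check, assuming UnicodeVers.properties sits next to build.xml and holds entries keyed as `unicode.<version code>` (for example a hypothetical `unicode.21000000=15.0.0`). The `<isset>` condition needs only one level of expansion, so core Ant can check that the mapped key exists even though it cannot read the value through a nested `${...}`.

```xml
<!-- Hypothetical sketch: the mapping file location and key format are assumptions. -->
<target name="resolveUnicodeVersion">
    <!-- Load the JDK-to-Unicode mapping; entries assumed to look like unicode.21000000=15.0.0 -->
    <property file="${basedir}/UnicodeVers.properties"/>
    <!-- Version code: JDK_VERSION with "000000" appended, e.g. 21 becomes 21000000 -->
    <property name="jdk.version.code" value="${JDK_VERSION}000000"/>
    <!-- Fail fast if there is no entry for this JDK instead of extracting a wrong version -->
    <fail message="No Unicode version mapped for JDK ${JDK_VERSION} (expected key unicode.${jdk.version.code})">
        <condition>
            <not>
                <isset property="unicode.${jdk.version.code}"/>
            </not>
        </condition>
    </fail>
</target>
```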
Add the xmlns:if namespace declaration to support the if:set attribute for conditional GB18030 file copying.
Changes:
- Add xmlns:if='ant:if' to the project tag in both build.xml files
- Enable conditional copy based on the Unicode version mapping
Related to issue adoptium#5161
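For reference, a sketch of how the `ant:if` namespace enables a conditional copy. The project name, target name, destination directory, and the `gb18030.source` property are hypothetical; `if:set` on arbitrary tasks requires Ant 1.9.1 or later.

```xml
<!-- Hypothetical sketch: project name, target name, paths and gb18030.source are assumptions. -->
<project name="MBCS_Tests_codepoint" default="copyGB18030" basedir="."
         xmlns:if="ant:if">
    <target name="copyGB18030">
        <!-- Runs only when gb18030.source is defined (for example by the version mapping);
             otherwise Ant skips this copy without failing. -->
        <copy if:set="gb18030.source" file="${gb18030.source}" todir="${basedir}/data"/>
    </target>
</project>
```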
Core Ant does not support nested property expansion such as ${unicode.${jdk.version.code}}.
Switched to the <propertycopy> task (from ant-contrib), which resolves dynamically named properties.
This fixes the build error:
src '/home/jenkins/externalDependency/lib/UCD-${unicode.${jdk.version.code}}.zip' doesn't exist.
Related to issue adoptium#5161
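A sketch of the `<propertycopy>` workaround, assuming ant-contrib is on Ant's classpath and `jdk.version.code` has already been computed from JDK_VERSION. `LIB_DIR` and `ucd.dest.dir` are hypothetical property names.

```xml
<!-- Sketch assuming ant-contrib is available; LIB_DIR and ucd.dest.dir are hypothetical. -->
<taskdef resource="net/sf/antcontrib/antcontrib.properties"/>

<target name="extractUCD">
    <!-- Core Ant will not expand ${unicode.${jdk.version.code}} twice, so copy the value
         of the dynamically named property into a fixed property name first -->
    <propertycopy name="unicode.version" from="unicode.${jdk.version.code}"/>
    <unzip src="${LIB_DIR}/UCD-${unicode.version}.zip" dest="${ucd.dest.dir}"/>
</target>
```

Copying into the fixed name `unicode.version` also lets later tasks use a plain `${unicode.version}` reference.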
Phase 2: Extract and Use Downloaded Unicode Test Data
Summary
Adds extraction logic to the MBCS test build files so they use the Unicode test data downloaded by the Phase 1 dependency management system.
Changes Made
functional/MBCS_Tests/codepoint/build.xml
- extractUnicodeData target
functional/MBCS_Tests/unicode/build.xml
- extractUnicodeData target
functional/MBCS_Tests/CLDR_11/build.xml
- copyIcu4jDependencies target
Design Decisions
Related Work
Testing Plan
Will test on Jenkins Grinder with these parameters:
For MBCSTest_codepoint_0: