Phase 2: Add extraction logic for Unicode test data by samyoon20 · Pull Request #7213 · adoptium/aqa-tests

samyoon20 · 2026-06-22T17:51:42Z

Phase 2: Extract and Use Downloaded Unicode Test Data

Summary

Adds extraction logic to the MBCS test build files so that they use the Unicode test data downloaded by the Phase 1 dependency management system.

Changes Made

functional/MBCS_Tests/codepoint/build.xml
- Added extractUnicodeData target
- Extracts 19 files: 9 UnicodeData.txt + 9 Unihan_IRGSources.txt + 1 GB18030
- Handles special test file (u32FF) that stays in git

functional/MBCS_Tests/unicode/build.xml
- Added extractUnicodeData target
- Extracts 45 files (5 files × 9 Unicode versions)
- Handles 4 special test files that stay in git

functional/MBCS_Tests/CLDR_11/build.xml
- Added copyIcu4jDependencies target
- Copies 2 ICU4J JARs from ${LIB_DIR}

Design Decisions

- Extract ALL Unicode versions (10.0.0-17.0.0) for backward compatibility
- Follow dacapo pattern for dependency management consistency
- Comprehensive inline documentation explaining design choices
- Minimal changes to existing code (only 17 lines modified)

Related Work

Depends on: Add Unicode test data and ICU4J JARS to getDependencies.pl TKG#847 (Phase 1 - downloads)
Addresses: Reduce duplication of test data in MBCS test suite #5161
Phase 1 tested: https://ci.adoptium.net/view/Test_grinder/job/test.getDependency/2885/

Testing Plan

Will test on Jenkins Grinder with these parameters:

For MBCSTest_codepoint_0:

- Add extraction targets to codepoint/build.xml for UCD and Unihan files
- Add extraction targets to unicode/build.xml for UCD files
- Add ICU4J copy targets to CLDR_11/build.xml
- Extract all Unicode versions (10.0.0-17.0.0) for compatibility
- Follow dacapo pattern for dependency management
- Comprehensive inline documentation of design decisions

Related to issue adoptium#5161

- Change from downloading Unihan ZIP archives (6-8 MB each) to individual Unihan_IRGSources.txt files (~1-2 MB each)
- Replace unzip operations with simple copy operations in codepoint/build.xml
- Reduces bandwidth by 75% for Unihan data (40% overall)
- Simpler code: 1 line copy vs 6 line unzip block per version
- Faster execution: no decompression needed
- Requires corresponding TKG PR update to getDependencies.pl

Related to issue adoptium#5161
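The switch from unzip to copy might look roughly like the Ant fragment below. This is a sketch only: the property names `unicode.version` and `data.dir`, the archive name `Unihan-${unicode.version}.zip`, and the downloaded file name `Unihan_IRGSources-${unicode.version}.txt` are assumptions for illustration. The real names are determined by the Phase 1 download step in getDependencies.pl.

```xml
<!-- Before (sketch): pull one file out of a downloaded Unihan ZIP archive. -->
<unzip src="${LIB_DIR}/Unihan-${unicode.version}.zip"
       dest="${data.dir}/${unicode.version}">
    <patternset>
        <include name="Unihan_IRGSources.txt"/>
    </patternset>
</unzip>

<!-- After (sketch): copy the individually downloaded file, no decompression. -->
<copy file="${LIB_DIR}/Unihan_IRGSources-${unicode.version}.txt"
      tofile="${data.dir}/${unicode.version}/Unihan_IRGSources.txt"/>
```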

Implementation:
- Load JDK-to-Unicode mapping from UnicodeVers.properties
- Calculate version code (JDK_VERSION + '000000')
- Extract only the mapped Unicode version
- Conditional GB18030 copy based on mapping

Error handling:
- Validate mapping exists before extraction
- Fail fast with clear error if mapping missing
- Prevents silent failures or wrong version usage

Related to issue adoptium#5161
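The load, calculate, and validate steps can be sketched with core Ant tasks as shown below. This is not the PR's actual target: the project and target names, the `unicode` prefix, and the assumed `<version code>=<Unicode version>` layout of UnicodeVers.properties are illustrative guesses.

```xml
<project name="unicode-mapping-check-sketch" default="checkUnicodeMapping">
    <target name="checkUnicodeMapping">
        <!-- Hypothetical input: run with -DJDK_VERSION=21, giving the code 21000000. -->
        <property name="jdk.version.code" value="${JDK_VERSION}000000"/>

        <!-- Assumed file layout: one "<version code>=<Unicode version>" entry per JDK.
             The prefix makes each key available as unicode.<version code>. -->
        <property file="UnicodeVers.properties" prefix="unicode"/>

        <!-- Fail fast with a clear message when no mapping exists for this JDK. -->
        <fail message="No Unicode version mapped for JDK ${JDK_VERSION} (key unicode.${jdk.version.code})">
            <condition>
                <not>
                    <isset property="unicode.${jdk.version.code}"/>
                </not>
            </condition>
        </fail>
    </target>
</project>
```

Checking the mapping before any extraction runs means a missing entry stops the build immediately instead of producing a path with an unexpanded property in it.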

Add xmlns:if namespace declaration to support if:set attribute for conditional GB18030 file copying.

Changes:
- Add xmlns:if='ant:if' to project tag in both build.xml files
- Enables conditional copy based on Unicode version mapping

Related to issue adoptium#5161
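A minimal sketch of a conditional copy using the `if:set` attribute follows. The `ant:if` namespace requires Ant 1.9.1 or later. The property names `gb18030.file` and `data.dir` are hypothetical and are not taken from the PR.

```xml
<project name="conditional-copy-sketch" default="copyGb18030" xmlns:if="ant:if">
    <target name="copyGb18030">
        <!-- The copy runs only when the (hypothetical) property gb18030.file is set;
             otherwise the task is skipped entirely. -->
        <copy file="${LIB_DIR}/${gb18030.file}"
              todir="${data.dir}"
              if:set="gb18030.file"/>
    </target>
</project>
```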

Ant doesn't support nested property expansion such as ${unicode.${jdk.version.code}}. The lookup now uses the ant-contrib <propertycopy> task, which handles dynamic property names.

This fixes the build error: src '/home/jenkins/externalDependency/lib/UCD-${unicode.${jdk.version.code}}.zip' doesn't exist.

Related to issue adoptium#5161
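The dynamic lookup can be sketched as below, assuming the ant-contrib JAR is on Ant's classpath. The project and target names, the property `unicode.version`, and the hard-coded mapping value are illustrative only. In the real build the mapping would come from UnicodeVers.properties.

```xml
<project name="propertycopy-sketch" default="resolveUnicodeVersion">
    <!-- Loads the ant-contrib tasks, including propertycopy. -->
    <taskdef resource="net/sf/antcontrib/antlib.xml"/>

    <target name="resolveUnicodeVersion">
        <!-- Illustrative values standing in for the computed code and loaded mapping. -->
        <property name="jdk.version.code" value="21000000"/>
        <property name="unicode.21000000" value="15.0.0"/>

        <!-- The inner reference is expanded first, so "from" names unicode.21000000;
             propertycopy then copies that property's value into unicode.version. -->
        <propertycopy name="unicode.version" from="unicode.${jdk.version.code}"/>

        <!-- The archive path can now be built with a single-level expansion. -->
        <echo message="${LIB_DIR}/UCD-${unicode.version}.zip"/>
    </target>
</project>
```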

samyoon20 added 5 commits June 22, 2026 13:37

samyoon20 wants to merge 5 commits into adoptium:master from samyoon20:phase2-extract-unicode-data
