Wrap CRT acquire timeout and other transient HTTP errors in IOException#7037
Open
zoewangg wants to merge 1 commit into
Connection-pool acquire timeout and several transient HTTP error codes were surfacing as raw HttpException, so the SDK retry layer treated them as non-retryable. Wrap them in IOException to restore retry behavior, and strengthen the shared LongRunningRequestTestSuite to enforce the contract.
zoewangg
commented
Jun 15, 2026
// HTTP error codes that the native CRT classifier (CRT.awsIsTransientError) does NOT mark as transient
// but that the SDK considers recoverable. See enum aws_http_errors in aws-c-http/include/aws/http/http.h
// for symbolic names.
private static final Set<Integer> ADDITIONAL_RETRYABLE_ERROR_CODES;
Contributor
Author
We will remove the list once those error codes get added to CRT
Contributor
Where did we derive these seven errors?
Motivation and Context
Customers using the AWS CRT HTTP client are seeing an increase in the following error:
software.amazon.awssdk.crt.http.HttpException: Connection Manager failed to acquire a connection within the defined timeout.
The cause is that this error is not retried by the SDK retry layer.
Root cause:
CrtUtils.wrapWithIoExceptionIfRetryable only wrapped errors that the native CRT layer classified as transient via CRT.awsIsTransientError, plus a single special case for HEALTH_CHECK_FAILURE. The acquire-timeout error code (AWS_ERROR_HTTP_CONNECTION_MANAGER_ACQUISITION_TIMEOUT, 2093) falls outside that allowlist, so it surfaced as a raw HttpException and the retry layer treated it as non-retryable. Several other recoverable HTTP errors (GOAWAY_RECEIVED, RESPONSE_FIRST_BYTE_TIMEOUT, MAX_CONCURRENT_STREAMS_EXCEEDED, etc.) had the same problem.
Internal ref:
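The classification gap described in the root cause can be sketched in isolation. This is a minimal illustration, not the actual CrtUtils code: FakeHttpException, wrapIfRetryable, and the nativeTransientCheck predicate are hypothetical stand-ins (the predicate plays the role of CRT.awsIsTransientError), and the set holds only the 2093 code quoted in this description rather than the full list in the PR.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntPredicate;

class RetryableWrapSketch {

    // Assumption: only 2093 (acquire timeout) is listed here because it is the
    // one numeric code given in the PR description; the real set has more entries.
    static final Set<Integer> ADDITIONAL_RETRYABLE_ERROR_CODES;

    static {
        Set<Integer> codes = new HashSet<>();
        codes.add(2093);
        ADDITIONAL_RETRYABLE_ERROR_CODES = Collections.unmodifiableSet(codes);
    }

    /** Hypothetical stand-in for the CRT HttpException, carrying a numeric error code. */
    static class FakeHttpException extends RuntimeException {
        final int errorCode;

        FakeHttpException(int errorCode, String message) {
            super(message);
            this.errorCode = errorCode;
        }
    }

    /**
     * Returns an IOException wrapping the error when either the native classifier
     * or the additional allowlist considers it retryable; otherwise returns the
     * original exception unchanged.
     */
    static Throwable wrapIfRetryable(FakeHttpException e, IntPredicate nativeTransientCheck) {
        boolean retryable = nativeTransientCheck.test(e.errorCode)
                || ADDITIONAL_RETRYABLE_ERROR_CODES.contains(e.errorCode);
        return retryable ? new IOException(e) : e;
    }
}
```

With a native check that rejects every code, an exception carrying 2093 still comes back wrapped in IOException, while an unlisted code is returned as-is. That mirrors the behavior the description asks the retry layer to rely on.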
Modifications
CrtUtils
Testing
Strengthened suite assertion exercised in apache-client, apache5-client, url-connection-client, aws-crt-client (sync + async): all pass.
netty-nio-client NettyAsyncHttpClientLongRunningRequestTest: 3 tests including the inherited (and overridden) acquire-timeout test.
SdkHttpClientLongRunningRequestTestSuite and SdkAsyncHttpClientLongRunningRequestTestSuite - executeWhenConnectionAcquireTimeoutAndPoolExhaustedFailsWithinTimeoutBound previously asserted only the timing bound. Strengthened to also assert that the failure cause chain contains IOException, via a new LongRunningRequestTestSupport.assertFailsWithIoExceptionWithinTimeBound helper. Apache, Apache5, URLConnection, CRT sync, and CRT async all pass the strengthened assertion.
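The strengthened contract (fail within a time bound, with IOException somewhere in the cause chain) could look roughly like the sketch below. The signature taking a Callable and a Duration, and the class name TimeBoundAssertSketch, are assumptions for illustration; the actual LongRunningRequestTestSupport helper in the PR may be shaped differently.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.Callable;

class TimeBoundAssertSketch {

    /** Walks the cause chain (with a depth cap to guard against cycles) looking for an IOException. */
    static boolean hasIoExceptionInCauseChain(Throwable failure) {
        int depth = 0;
        for (Throwable cur = failure; cur != null && depth < 64; cur = cur.getCause(), depth++) {
            if (cur instanceof IOException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Hypothetical helper: runs the request and asserts that it fails, that the
     * failure arrives within the given bound, and that the failure's cause chain
     * contains an IOException.
     */
    static void assertFailsWithIoExceptionWithinTimeBound(Callable<?> request, Duration bound) {
        long start = System.nanoTime();
        Throwable failure = null;
        try {
            request.call();
        } catch (Throwable t) {
            failure = t;
        }
        Duration elapsed = Duration.ofNanos(System.nanoTime() - start);

        if (failure == null) {
            throw new AssertionError("expected the request to fail");
        }
        if (elapsed.compareTo(bound) > 0) {
            throw new AssertionError("failure took " + elapsed + ", bound was " + bound);
        }
        if (!hasIoExceptionInCauseChain(failure)) {
            throw new AssertionError("no IOException in cause chain", failure);
        }
    }
}
```

Checking the whole cause chain rather than the top-level type keeps the assertion tolerant of clients that wrap the IOException in their own exception types.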
NettyAsyncHttpClientLongRunningRequestTest - Netty does NOT pass the strengthened assertion; NettyUtils.decorateException wraps acquire timeout in a plain Throwable, the same class of bug we're fixing for CRT here. The Netty fix is intentionally out of scope for this change. Override the test in the Netty subclass to keep the timing-bound contract while skipping the IOException assertion, with a // TODO explaining why.
Types of changes
Checklist
mvn install succeeds
License