feat: RBS size reduction #3382

Open
sichanyoo wants to merge 4 commits into version-3 from feat/rbs-size-reduction

Contributor

@sichanyoo commented May 13, 2026

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Changes & Impact

  • Reduces the size of the aws-sdk-<service>/sig/ folder in generated service gems by factoring repeatedly used nested input struct definitions out into sig/params.rbs as type aliases that are reused in sig/client.rbs.
    • The type definitions already present in sig/types.rbs cannot be reused, because doing so would make methods no longer accept hash literals.
  • The biggest beneficiary of this change is aws-sdk-quicksight. Its sig/ folder at the time of this PR (aws-sdk-quicksight v1.179.0) was 35.5MB; after this refactor it is 0.925MB, a reduction of about 97.3%. For other large services such as EC2 and S3, the reduction is much smaller because those services primarily use flat input shape structures. EC2's sig/ shrinks by about 5% (3.34MB => 3.17MB) and S3's by about 1.5% (445KB => 439KB).
  • With this change, method auto-complete suggestions for QuickSight methods actually render instead of being shortened as ... in RubyMine.
  • Gem build time doesn't appear to be affected, based on brief local manual testing with QuickSight (running the build commands showed no noticeable difference, AFAICT).
  • InputTypeAliasCollector walks the shapes and caches the names of shapes nested within operation input shapes that appear more than once and would take more than 5 lines of rendered RBS if generated in full without a type alias. It then returns the shapes topologically sorted so that leaf shapes (shapes without dependencies) are rendered first in RBS, allowing later shapes to simply refer to them in their own definitions.
    • Because the current changes decide whether a shape needs aliasing based on how long it would be if fully expanded, an unnecessary type alias can be generated in the special case where a large shape shrinks to 1 or 2 lines after child type alias substitution and so doesn't need aliasing. But customers would need to refer to params.rbs to look up the child type alias definitions anyway, and a couple of extra type aliases don't hurt, so the additional complexity isn't worth it.
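The collector behaviour described in the last two bullets can be sketched roughly as follows. This is not the code from this PR: the class name, the simplified shape hashes, the line-count estimate, and the sample shapes are all assumptions made for illustration, and only structure members are followed (lists and maps are ignored).

```ruby
require 'set'

# Hypothetical, simplified stand-in for the collector described above.
class AliasCollectorSketch
  MIN_LINES = 5 # assumed reading of "more than 5 lines"

  def initialize(shapes, input_shape_names)
    @shapes = shapes
    @inputs = input_shape_names
    @counts = Hash.new(0)
  end

  # Returns alias candidates ordered so that dependencies come first.
  def collect
    @inputs.each { |name| count_nested(name, []) }
    candidates = @counts.select { |name, n| n > 1 && lines(name, []) > MIN_LINES }.keys
    ordered = []
    seen = Set.new
    candidates.each { |name| visit(name, candidates.to_set, seen, ordered) }
    ordered
  end

  private

  # Names of structure shapes referenced directly by a structure's members.
  def struct_children(name)
    shape = @shapes.fetch(name)
    return [] unless shape['type'] == 'structure'

    children = shape['members'].values.map { |ref| ref['shape'] }
    children.select { |child| @shapes.fetch(child)['type'] == 'structure' }
  end

  # Counts every occurrence of a nested structure under an operation input.
  def count_nested(name, visited)
    return if visited.include?(name) # recursive shape, stop here

    struct_children(name).each do |child|
      @counts[child] += 1
      count_nested(child, visited + [name])
    end
  end

  # Rough estimate: two lines for the braces plus each member's own size.
  def lines(name, visited)
    return 1 if visited.include?(name)

    shape = @shapes.fetch(name)
    return 1 unless shape['type'] == 'structure'

    2 + shape['members'].values.sum { |ref| lines(ref['shape'], visited + [name]) }
  end

  # Depth-first post-order walk: a shape is emitted after its children.
  def visit(name, wanted, seen, ordered)
    return if seen.include?(name)

    seen << name
    struct_children(name).each { |child| visit(child, wanted, seen, ordered) }
    ordered << name if wanted.include?(name)
  end
end

# Builds a structure shape from a member-name => shape-name hash.
def struct_shape(members)
  { 'type' => 'structure', 'members' => members.transform_values { |s| { 'shape' => s } } }
end

SHAPES = {
  'String' => { 'type' => 'string' },
  'Tag' => struct_shape('Key' => 'String'),
  'Font' => struct_shape('Family' => 'String', 'Size' => 'String',
                         'Weight' => 'String', 'Slant' => 'String'),
  'Style' => struct_shape('Title' => 'Font', 'Body' => 'Font', 'Color' => 'String'),
  'CreateInput' => struct_shape('Name' => 'String', 'Style' => 'Style', 'Tag' => 'Tag'),
  'UpdateInput' => struct_shape('Style' => 'Style', 'Tag' => 'Tag')
}.freeze

ORDER = AliasCollectorSketch.new(SHAPES, %w[CreateInput UpdateInput]).collect
puts ORDER.inspect
```

In this sample, Tag is reused but too short to qualify, while Font and Style both qualify and Font is returned ahead of Style because Style refers to it.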

@sichanyoo requested a review from a team as a code owner May 13, 2026 20:51
Contributor

@jterapin left a comment

Looking great so far. I'm curious to see the overall size reduction. Could you calculate the before vs after for all of our service gems? I think this will be great to highlight once this gets to prod.

As for the CodeQL failure, will it persist after this merge? If so, we should either suppress it (if it isn't useful) or find a fix for it.

return 0 if visited.include?(shape_name)

# Cache results to deduplicate calculation for nested structure types that get used multiple times in the model.
@size_cache ||= {}
Contributor

Appears to be redundant since it was done at class init?

Contributor Author

Good catch, I forgot to remove it after moving it to the initializer. Removed.
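For readers following the thread, the pattern under discussion looks roughly like this. It is a sketch under assumptions, not the PR's code: only `@size_cache`, `visited`, and the `return 0` guard come from the snippet above, while the class name, method name, and line estimate are hypothetical.

```ruby
# Hypothetical estimator showing the cache set up once in the initializer.
class SizeEstimatorSketch
  def initialize(shapes)
    @shapes = shapes
    @size_cache = {} # created once here, so no `||=` is needed in the method
  end

  # Estimated number of rendered lines for a shape (assumed formula).
  def rendered_lines(shape_name, visited = [])
    return 0 if visited.include?(shape_name) # break cycles in recursive shapes
    return @size_cache[shape_name] if @size_cache.key?(shape_name)

    shape = @shapes.fetch(shape_name)
    size =
      if shape['type'] == 'structure'
        nested = shape['members'].values.sum do |ref|
          rendered_lines(ref['shape'], visited + [shape_name])
        end
        2 + nested # opening and closing braces plus the members
      else
        1
      end
    @size_cache[shape_name] = size
  end
end

SIZE_SHAPES = {
  'String' => { 'type' => 'string' },
  'Font' => {
    'type' => 'structure',
    'members' => { 'Family' => { 'shape' => 'String' }, 'Size' => { 'shape' => 'String' } }
  }
}.freeze

ESTIMATOR = SizeEstimatorSketch.new(SIZE_SHAPES)
puts ESTIMATOR.rendered_lines('Font')
```

One caveat of caching like this: for recursive shapes, the cached size can depend on which shape the walk entered first, which is acceptable for a rough threshold check but not for exact counts.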

Comment on lines +11 to +17
- def initialize(api:, shape:, newline:, options: {})
+ def initialize(api:, shape:, newline:, options: {}, aliased_shapes: Set.new, alias_namespace: nil)
    @api = api
    @shape = shape
    @newline = newline
    @options = options
+   @aliased_shapes = aliased_shapes
+   @alias_namespace = alias_namespace
Contributor

The signature is getting long here. We could take an options = {} hash and destructure the hash inside. Something like:

def initialize(options = {})
  @api = options[:api]
  ...

We have some examples of this pattern in our codebase.

Contributor Author

Good point, I just merged the two new fields into the existing options field.
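A minimal sketch of what folding the new fields into the existing options hash might look like. The class name and the option keys `:aliased_shapes` and `:alias_namespace` are assumptions based on the keyword names in the diff above, not the merged code.

```ruby
require 'set'

# Hypothetical builder; only the keyword names come from the diff above.
class KeywordArgumentBuilderSketch
  attr_reader :aliased_shapes, :alias_namespace

  def initialize(api:, shape:, newline:, options: {})
    @api = api
    @shape = shape
    @newline = newline
    @options = options
    # Assumed option keys; the defaults keep existing callers working unchanged.
    @aliased_shapes = options.fetch(:aliased_shapes, Set.new)
    @alias_namespace = options.fetch(:alias_namespace, nil)
  end
end

plain = KeywordArgumentBuilderSketch.new(api: {}, shape: {}, newline: true)
aliased = KeywordArgumentBuilderSketch.new(
  api: {}, shape: {}, newline: true,
  options: { aliased_shapes: Set['Font'], alias_namespace: 'Params' }
)
puts plain.aliased_shapes.empty?
puts aliased.aliased_shapes.include?('Font')
```

The signature stays at its original four keywords, and callers that never pass the new keys see no change in behaviour.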

Comment on lines 72 to 74
if visited.include?(ref['shape'])
  return "untyped"
else
  visited = visited + [ref['shape']]
end
Contributor

style:

return "untyped" if visited.include?(ref['shape'])

Contributor Author

Changed as recommended.
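With the guard clause, the `visited` update from the old `else` branch presumably just follows the early return. A small runnable sketch of that shape, where the helper name, the `shapes` argument, and the rendering format are hypothetical and only `visited`, `ref['shape']`, and "untyped" come from the snippet above:

```ruby
# Hypothetical helper showing the guard clause followed by the visited update.
def ref_type(ref, visited, shapes)
  return 'untyped' if visited.include?(ref['shape'])

  # Non-mutating: builds a new array, so sibling members see their own path.
  visited += [ref['shape']]
  shape = shapes.fetch(ref['shape'])
  return 'String' unless shape['type'] == 'structure'

  members = shape['members'].map do |name, child|
    "#{name}: #{ref_type(child, visited, shapes)}"
  end
  "{ #{members.join(', ')} }"
end

# A recursive shape: Node refers back to itself through its `next` member.
NODE_SHAPES = {
  'Str' => { 'type' => 'string' },
  'Node' => {
    'type' => 'structure',
    'members' => { 'value' => { 'shape' => 'Str' }, 'next' => { 'shape' => 'Node' } }
  }
}.freeze

RENDERED = ref_type({ 'shape' => 'Node' }, [], NODE_SHAPES)
puts RENDERED
```

The recursive member falls back to `untyped` instead of recursing forever.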

)
{
'name' => Underscore.underscore(shape_name),
'definition' => builder.struct(shape, ' ', []),
Contributor

I'm not crazy about calling struct directly here. Looking at the prior version of KeywordArgumentBuilder, every external caller uses .format, so I think .struct was meant to be an internal detail. We could do the following:

def format_as_alias(indent: '')
  struct(@shape, indent, [])
end

Benefits of this:

  • struct, struct_members, ref_value, etc. can all move under private
  • Alias callers don't need to know about the visited array or indent format
  • Clear two-method public API: format for method signatures, format_as_alias for type alias bodies

Contributor Author

Good point, changed as suggested (put private keyword below format_as_alias and format so all other methods are now private).
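The resulting class layout would look something like the sketch below. The `format_as_alias` body is taken from the review suggestion; the class name, the `format` body, and the `struct` body are placeholders invented here, since the real rendering logic is not shown in this thread.

```ruby
# Hypothetical layout: two public entry points, everything else private.
class BuilderApiSketch
  def initialize(shape:)
    @shape = shape
  end

  # Public: used for method signatures (placeholder body).
  def format
    struct(@shape, '', [])
  end

  # Public: used for type alias bodies, as suggested in the review.
  def format_as_alias(indent: '')
    struct(@shape, indent, [])
  end

  private

  # Placeholder renderer; callers no longer pass indent or visited themselves.
  def struct(shape, indent, _visited)
    members = shape.fetch('members').keys.map { |name| "#{indent}  #{name}: untyped" }
    (["#{indent}{"] + members + ["#{indent}}"]).join("\n")
  end
end

BUILDER = BuilderApiSketch.new(shape: { 'members' => { 'foo' => {} } })
puts BUILDER.format_as_alias
```

Calling `BUILDER.struct(...)` from outside now raises NoMethodError, which is what makes the two-method public API enforceable rather than a convention.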

shape: input_shape,
newline: true,
aliased_shapes: @aliased_shapes,
alias_namespace: 'Params',
Contributor

I might be off base here, but it looks like alias_namespace is only ever 'Params' or nil, so it is effectively functioning as a boolean flag. If so, could we use a simple boolean that defaults to false?

Contributor Author

Instead of substituting a flag, I removed it entirely and used a string literal in the keyword argument builder where it swaps a shape name for its type alias.

@@ -114,8 +114,17 @@ def rbs_files(options = {})
prefix = options.fetch(:prefix, '')
codegenerated_plugins = codegen_plugins(prefix)
unless @service.h2_required_setting?
Contributor

These changes are made for the regular Client, but what about AsyncClient?

Contributor Author

Oh oops, good catch. I didn't know Ruby published a separate client type for bidirectional streaming lol. Added to async clients as well, as pointed out.
