
fix: relax JSON normalizer type validation #3906

Open
rooperuu wants to merge 5 commits into dlt-hub:devel from rooperuu:fix/json-normalizer-validation

@rooperuu (Contributor)

Description

Supports various operations that previously worked only with the `dlt.common.normalizers.json.relational` normalizer, so that they also work with `dlt.common.normalizers.json.relational_no_coercion` and any derived custom normalizers, such as:

```python
from dlt.common.normalizers.json import relational


# custom normalizer that inherits all behavior from the relational one
class DataItemNormalizer(relational.DataItemNormalizer):
    pass
```

This also fixes the `max_nesting` configuration for these normalizer subtypes:

```python
import os
import dlt

# select the JSON normalizer module and its max_nesting through the schema settings
os.environ["SCHEMA__JSON_NORMALIZER"] = (
    '{"module": "dlt.common.normalizers.json.relational_no_coercion", "config": {"max_nesting": 3}}'
)


@dlt.source
def nothing():
    yield from []


source = nothing()
print(source.max_table_nesting)  # 3
```

Related Issues

@rooperuu changed the title from "Fix/json normalizer validation" to "fix: relax JSON normalizer type validation" on Apr 29, 2026
@Travior self-assigned this on Apr 29, 2026
@Travior self-requested a review on May 5, 2026 at 08:59
@Travior (Contributor)

Travior commented May 5, 2026

@rooperuu, thanks a lot for posting your fix. There are a couple of things I would do differently when incorporating this into dlt. I forgot to ask: are you willing (and do you have time 😁) to work on this? If so, I can give you a proper review; otherwise, I can take it over.

@rooperuu (Contributor Author)

rooperuu commented May 5, 2026

@Travior Thanks for taking a look!

I think we need this for our use case, so I can spend some time on it, or whichever is easiest for you. Let me know what you would like to change and I will make the changes accordingly.

@Travior (Contributor) left a comment


In my mind, we can have a somewhat cleaner separation of concerns.
I think `ensure_this_normalizer` should be the place where we check normalizer compatibility (for now, we should probably treat being a subclass as being compatible). Then we need minimal to no changes in extraction or schema handling.

Comment on lines 432 to 437
```python
@classmethod
def ensure_this_normalizer(cls, norm_config: TJSONNormalizer) -> None:
    # make sure schema has right normalizer
    present_normalizer = norm_config["module"]
    if present_normalizer != cls.__module__:
        raise InvalidJsonNormalizer(cls.__module__, present_normalizer)
```
Contributor

In my mind, we should check the "shape" of the normalizer here instead of rigidly checking the module name for equality.
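A minimal sketch of why the strict check is a problem. It copies the equality test from the quoted `ensure_this_normalizer` into hypothetical stand-in classes (`RelationalNormalizer`, `CustomNormalizer`, and a simplified `InvalidJsonNormalizer`, none of which are dlt's real code). Because everything lives in one script, the subclass's `__module__` is overwritten to simulate a custom normalizer defined in its own module; `my_project.normalizers` is a made-up name.

```python
class InvalidJsonNormalizer(Exception):
    """Simplified stand-in for dlt's exception of the same name."""


class RelationalNormalizer:
    """Hypothetical stand-in for the base relational normalizer."""

    @classmethod
    def ensure_this_normalizer(cls, norm_config: dict) -> None:
        # strict check, as in the quoted code: module names must match exactly
        present_normalizer = norm_config["module"]
        if present_normalizer != cls.__module__:
            raise InvalidJsonNormalizer(cls.__module__, present_normalizer)


class CustomNormalizer(RelationalNormalizer):
    """Derived normalizer that changes nothing."""


# simulate the subclass living in its own (hypothetical) module
CustomNormalizer.__module__ = "my_project.normalizers"

try:
    RelationalNormalizer.ensure_this_normalizer({"module": CustomNormalizer.__module__})
except InvalidJsonNormalizer as ex:
    # rejected, even though CustomNormalizer is a RelationalNormalizer
    print("rejected:", ex.args)
```

The subclass inherits everything from the base, yet the equality test rejects it purely because its module name differs.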

Contributor Author

I switched to importing the configured module and doing a subclass check instead of a direct string comparison. Now `Superclass.ensure_this_normalizer(subclass_config)` should always pass.
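A minimal sketch of an import-and-subclass check along these lines, assuming the config's `module` entry names an importable module that exposes its normalizer class as `DataItemNormalizer` (as the custom normalizer in the description does). The base class, the exception, and the `register_fake_module` helper are simplified, hypothetical stand-ins rather than dlt's actual code; the helper exists only so the example runs without creating module files.

```python
import importlib
import sys
import types
from typing import Any, Dict


class InvalidJsonNormalizer(Exception):
    """Simplified stand-in for dlt's exception of the same name."""


class DataItemNormalizer:
    """Hypothetical base normalizer, standing in for the relational one."""

    @classmethod
    def ensure_this_normalizer(cls, norm_config: Dict[str, Any]) -> None:
        present_normalizer = norm_config["module"]
        try:
            # resolve the module named in the schema config
            module = importlib.import_module(present_normalizer)
        except ImportError as ex:
            raise InvalidJsonNormalizer(cls.__module__, present_normalizer) from ex
        # assumption: the module exposes its normalizer as `DataItemNormalizer`
        candidate = getattr(module, "DataItemNormalizer", None)
        # compatible means: the configured class is `cls` or derives from it
        if not (isinstance(candidate, type) and issubclass(candidate, cls)):
            raise InvalidJsonNormalizer(cls.__module__, present_normalizer)


def register_fake_module(name: str, normalizer_cls: type) -> None:
    """Hypothetical helper: make `name` importable without creating files."""
    module = types.ModuleType(name)
    module.DataItemNormalizer = normalizer_cls  # type: ignore[attr-defined]
    sys.modules[name] = module


class CustomNormalizer(DataItemNormalizer):
    pass


register_fake_module("demo_base", DataItemNormalizer)
register_fake_module("demo_custom", CustomNormalizer)

# a base normalizer accepts configs pointing at itself or at a subclass
DataItemNormalizer.ensure_this_normalizer({"module": "demo_base"})
DataItemNormalizer.ensure_this_normalizer({"module": "demo_custom"})
```

Note the direction of the check: a base normalizer accepts a config that points at one of its subclasses, but a subclass rejects a config that points at its base class, since the base may lack behavior the subclass adds.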

Comment thread: dlt/common/schema/normalizers.py (outdated)

```python
try:
    DataItemNormalizer.ensure_this_normalizer(item_normalizer)
    if issubclass(json_module.DataItemNormalizer, DataItemNormalizer):
```
Contributor

This should keep the ensure_this_normalizer check.

Comment thread: dlt/extract/source.py (outdated)
```python
@root_key.setter
def root_key(self, value: bool) -> None:
    data_normalizer = self._schema.data_item_normalizer
    assert isinstance(data_normalizer, RelationalNormalizer)
```
Contributor

We don't need this assertion if we validate correctly in `dlt.common.normalizers.json.relational`.

@rooperuu (Contributor Author)

rooperuu commented May 12, 2026

@Travior, thanks for the review. I think what you're suggesting makes sense. I changed the implementation so that `ensure_this_normalizer` is relaxed from an equality check to a subclass check, and code paths elsewhere are kept essentially unchanged.

Edit: It seems a similar fix has already been applied in #3808. Maybe this PR can be scrapped.


Development

Successfully merging this pull request may close these issues.

Data item normalizer type validation is overly strict
