
Add language-model-only export option #9670

Open
prefersung wants to merge 1 commit into modelscope:main from prefersung:export-language-model-only


@prefersung


What does this PR do?

This PR adds an explicit --export_language_model_only option for exporting the language model part of multimodal checkpoints.

When enabled, export:

  • keeps the language model subtree and lm_head
  • maps keys such as model.language_model.* to CausalLM-style model.*
  • writes the text model config instead of the multimodal config
  • saves tokenizer files without copying multimodal preprocessor_config.json
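The key handling in the first two bullets can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the helper name to_causal_lm_state_dict and the hard-coded model.language_model. prefix are assumptions (the review thread below shows the PR reading the real prefixes from the model's model_arch metadata), and plain strings stand in for tensors.

```python
from typing import Dict, TypeVar

T = TypeVar("T")  # tensor type; plain strings are enough for this sketch

# Assumed prefixes for a Qwen3.5-style composite checkpoint (hypothetical constants).
LANGUAGE_MODEL_PREFIX = "model.language_model."
LM_HEAD_PREFIX = "lm_head."


def to_causal_lm_state_dict(state_dict: Dict[str, T],
                            lm_prefix: str = LANGUAGE_MODEL_PREFIX) -> Dict[str, T]:
    """Keep language-model and lm_head weights, renamed to CausalLM-style keys."""
    out: Dict[str, T] = {}
    for key, tensor in state_dict.items():
        if key.startswith(lm_prefix):
            # model.language_model.layers.0.* -> model.layers.0.*
            out["model." + key[len(lm_prefix):]] = tensor
        elif key.startswith(LM_HEAD_PREFIX):
            # lm_head.* is already CausalLM-style, so it is kept as-is.
            out[key] = tensor
        # Anything else (vision tower, projector, ...) is dropped.
    return out


if __name__ == "__main__":
    sd = {
        "model.language_model.embed_tokens.weight": "E",
        "model.language_model.layers.0.mlp.up_proj.weight": "W",
        "model.visual.blocks.0.attn.qkv.weight": "V",
        "lm_head.weight": "H",
    }
    print(sorted(to_causal_lm_state_dict(sd)))
    # ['lm_head.weight', 'model.embed_tokens.weight', 'model.layers.0.mlp.up_proj.weight']
```

Because only keys under the language-model prefix are rewritten, the renaming is one-to-one and cannot collide with lm_head.* keys. If a checkpoint ties lm_head to the input embeddings and omits lm_head.weight, this sketch simply emits no lm_head entry.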

Default multimodal export remains unchanged.
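The config and file-selection bullets, together with the unchanged default, might look like this in a minimal sketch. It assumes the multimodal config.json nests the text model's settings under a text_config key, as Qwen-VL-style configs commonly do; select_export_config and files_to_copy are hypothetical helper names, not functions from the PR, and the stub config values are made up.

```python
import copy
from typing import Any, Dict, List


def select_export_config(config: Dict[str, Any], language_model_only: bool) -> Dict[str, Any]:
    """Choose which config dict to write as config.json (hypothetical helper)."""
    if not language_model_only:
        # Default path: the multimodal config is written unchanged.
        return config
    # Assumption: the text model's settings live under "text_config".
    text_config = config.get("text_config")
    if text_config is None:
        raise ValueError("config has no text_config; cannot export the language model alone")
    # Deep copy so later edits to the exported config never touch the multimodal one.
    return copy.deepcopy(text_config)


def files_to_copy(filenames: List[str], language_model_only: bool) -> List[str]:
    """Drop the multimodal preprocessor config in language-model-only mode."""
    skipped = {"preprocessor_config.json"} if language_model_only else set()
    return [name for name in filenames if name not in skipped]


if __name__ == "__main__":
    mm_config = {
        "model_type": "multimodal_stub",
        "text_config": {"model_type": "text_stub", "hidden_size": 8},
        "vision_config": {"depth": 2},
    }
    print(select_export_config(mm_config, language_model_only=True))
    print(files_to_copy(["tokenizer.json", "preprocessor_config.json"], language_model_only=True))
```

Raising when text_config is missing keeps the option explicit: a config this sketch does not understand fails loudly instead of silently falling back to the multimodal config.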

Why is this needed?

Some Qwen3.5-style multimodal checkpoints use a composite namespace such as model.language_model.*, while text-only inference backends expect CausalLM-style names such as model.* plus lm_head.*. Users currently need ad-hoc safetensors renaming scripts to bridge this gap.

This PR makes that bridge an explicit export mode instead of changing the default multimodal save behavior.

This is separate from #9057: that PR addresses default save_pretrained key corruption in newer Transformers versions, while this PR adds a text-only export path for multimodal checkpoints.

Tests

  • python -m py_compile swift/model/utils.py swift/arguments/export_args.py swift/pipelines/export/merge_lora.py swift/pipelines/export/export.py tests/export/test_language_model_export.py
  • git diff --check
  • Direct execution of the new language-model export test functions with local stubs, covering default multimodal export and explicit language-model-only export

The full pytest suite was not run locally because none of the available local Python environments has pytest installed.

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces the export_language_model_only feature, which allows users to export only the language model component of a multimodal model along with its text configuration. This is particularly useful for text-only inference backends that expect CausalLM-style weight names. The changes include adding command-line parameters, updating the checkpoint saving logic, and adding comprehensive unit tests. A potential AttributeError was identified in _get_language_model_prefixes where nested getattr calls on a potentially None object could lead to a crash.


Comment thread swift/model/utils.py
Comment on lines +292 to +293
    model_arch = getattr(getattr(model, 'model_meta', None), 'model_arch', None)
    prefixes = getattr(model_arch, 'language_model', None) or []


Priority: high

If model does not have a model_meta attribute (or if it is None), calling getattr(getattr(model, 'model_meta', None), 'model_arch', None) will raise an AttributeError because the first argument to the outer getattr will be None. We should safely retrieve model_meta first before attempting to access model_arch.

    model_meta = getattr(model, 'model_meta', None)
    model_arch = getattr(model_meta, 'model_arch', None) if model_meta is not None else None
    prefixes = getattr(model_arch, 'language_model', None) or []
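The concern can be checked directly with stub objects. Python's built-in getattr(obj, name, default) returns default whenever the attribute lookup raises AttributeError, and None has no model_arch attribute, so getattr(None, 'model_arch', None) evaluates to None. The sketch below runs the original expression and the suggested variant against a model without model_meta, one with model_meta=None, and one with full metadata; the SimpleNamespace stubs and the 'model.language_model' prefix value are hypothetical.

```python
from types import SimpleNamespace


def prefixes_original(model):
    # Expression from the PR (lines +292 to +293).
    model_arch = getattr(getattr(model, 'model_meta', None), 'model_arch', None)
    return getattr(model_arch, 'language_model', None) or []


def prefixes_suggested(model):
    # Variant proposed in the review comment.
    model_meta = getattr(model, 'model_meta', None)
    model_arch = getattr(model_meta, 'model_arch', None) if model_meta is not None else None
    return getattr(model_arch, 'language_model', None) or []


# Hypothetical stub models.
no_meta = SimpleNamespace()
none_meta = SimpleNamespace(model_meta=None)
with_meta = SimpleNamespace(model_meta=SimpleNamespace(
    model_arch=SimpleNamespace(language_model=['model.language_model'])))

for stub in (no_meta, none_meta, with_meta):
    # Neither variant raises; both return the same prefixes.
    assert prefixes_original(stub) == prefixes_suggested(stub)

print(prefixes_original(no_meta), prefixes_original(with_meta))
# [] ['model.language_model']
```

Both variants agree in all three cases; the suggested form makes the None handling explicit rather than relying on the getattr default.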
