Add language-model-only export option #9670
Code Review
This pull request introduces the export_language_model_only feature, which allows users to export only the language model component of a multimodal model along with its text configuration. This is particularly useful for text-only inference backends that expect CausalLM-style weight names. The changes include adding command-line parameters, updating the checkpoint saving logic, and adding comprehensive unit tests. A potential AttributeError was identified in _get_language_model_prefixes where nested getattr calls on a potentially None object could lead to a crash.
model_arch = getattr(getattr(model, 'model_meta', None), 'model_arch', None)
prefixes = getattr(model_arch, 'language_model', None) or []
If model does not have a model_meta attribute (or if it is None), the inner call returns None and the outer call becomes getattr(None, 'model_arch', None). Because a default is supplied, that call returns None rather than raising an AttributeError, so the existing code does not crash. Retrieving model_meta first is still an option if the None case should be explicit:
model_meta = getattr(model, 'model_meta', None)
model_arch = getattr(model_meta, 'model_arch', None) if model_meta is not None else None
prefixes = getattr(model_arch, 'language_model', None) or []
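For reference, Python's three-argument getattr returns the default whenever the attribute lookup fails, including when the first argument is None, so the original nested form and the suggested form should behave the same. A minimal check, using a hypothetical stand-in for _get_language_model_prefixes and SimpleNamespace objects in place of real models (the prefix value 'model.language_model' is an assumption for illustration):

```python
from types import SimpleNamespace


def get_language_model_prefixes(model):
    # Hypothetical stand-in mirroring the two lines under review.
    model_arch = getattr(getattr(model, 'model_meta', None), 'model_arch', None)
    return getattr(model_arch, 'language_model', None) or []


# Three cases: attribute missing, attribute set to None, attribute fully populated.
no_meta = SimpleNamespace()
none_meta = SimpleNamespace(model_meta=None)
full = SimpleNamespace(
    model_meta=SimpleNamespace(
        model_arch=SimpleNamespace(language_model=['model.language_model'])
    )
)

for candidate in (no_meta, none_meta, full):
    # No AttributeError is raised in any of the three cases.
    print(get_language_model_prefixes(candidate))
```

The first two cases fall through to the empty list; only the populated object yields prefixes.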
What does this PR do?
This PR adds an explicit --export_language_model_only option for exporting the language model part of multimodal checkpoints.
When enabled, export:
- lm_head
- model.language_model.* to CausalLM-style model.*
- preprocessor_config.json
Default multimodal export remains unchanged.
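The key renaming described above can be sketched as a plain dictionary transform. This is only an illustration under stated assumptions, not the PR's implementation: the function name remap_language_model_keys is hypothetical, the default prefix is taken from the model.language_model.* pattern in the description, and dropping every key outside that prefix and lm_head.* (for example vision-tower weights) is an assumption about what a text-only export would want.

```python
def remap_language_model_keys(state_dict,
                              prefixes=('model.language_model.',),
                              target_prefix='model.'):
    """Return a new mapping with language-model keys renamed to CausalLM style.

    Hypothetical helper: keys under one of `prefixes` are moved under
    `target_prefix`, `lm_head.*` keys are kept unchanged, and all other
    keys are dropped (an assumption for this sketch).
    """
    remapped = {}
    for key, value in state_dict.items():
        if key.startswith('lm_head.'):
            remapped[key] = value
            continue
        for prefix in prefixes:
            if key.startswith(prefix):
                remapped[target_prefix + key[len(prefix):]] = value
                break
    return remapped


# Placeholder values stand in for tensors.
example = {
    'model.language_model.layers.0.self_attn.q_proj.weight': 'w0',
    'model.visual.patch_embed.proj.weight': 'w1',
    'lm_head.weight': 'w2',
}
converted = remap_language_model_keys(example)
```

Working on key names only keeps the sketch independent of any tensor library; a real export would also need to write the text configuration, which is not shown here.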
Why is this needed?
Some Qwen3.5-style multimodal checkpoints use a composite namespace such as model.language_model.*, while text-only inference backends expect CausalLM-style names such as model.* plus lm_head.*. Users currently need ad-hoc safetensors renaming scripts to bridge this gap.
This PR makes that bridge an explicit export mode instead of changing the default multimodal save behavior.
This is separate from #9057: that PR addresses default save_pretrained key corruption in newer Transformers versions, while this PR adds a text-only export path for multimodal checkpoints.
Tests
python -m py_compile swift/model/utils.py swift/arguments/export_args.py swift/pipelines/export/merge_lora.py swift/pipelines/export/export.py tests/export/test_language_model_export.py
git diff --check
Full pytest was not run locally because the available local Python environments do not have pytest installed.