VLM: support passing --mm-process-config for all models by edwingao28 · Pull Request #18467 · sgl-project/sglang
Summary of Changes
Hello @edwingao28, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly improves the flexibility and reliability of multimodal model processing by ensuring that the --mm-process-config CLI flag is correctly applied across all supported multimodal language models. It addresses a critical bug where configuration parameters were not being passed to most models and resolves a parameter collision issue from a previous fix attempt. The changes involve refactoring how modality-specific configurations are extracted and injected into HuggingFace processors, adding robust type validation, and introducing comprehensive unit tests to guarantee correct behavior.
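As a concrete illustration of the configuration shape implied by the summary, the flag value can be modeled as a JSON dict with optional per-modality sub-dicts. This is a hypothetical sketch: the inner keys (`max_pixels`, `fps`) are illustrative assumptions, not documented SGLang parameters.

```python
import json

# Hypothetical value for --mm-process-config, based on the structure this
# PR validates: a dict whose optional "image", "video", and "audio" keys
# each hold a dict of processor settings.
flag_value = '{"image": {"max_pixels": 1000000}, "video": {"fps": 2}}'
config = json.loads(flag_value)

# Both the top level and each modality sub-config are plain dicts.
assert isinstance(config, dict)
assert all(isinstance(sub, dict) for sub in config.values())
```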
Highlights
- **Universal `mm_process_config` Support**: The `--mm-process-config` CLI flag now correctly applies processing configurations to all multimodal language models, not just Qwen VL, resolving a long-standing limitation.
- **Collision-Free Modality Configuration**: Implemented a robust solution using HuggingFace's `images_kwargs`, `videos_kwargs`, and `audio_kwargs` to pass modality-specific settings, preventing the parameter collisions that occurred in previous attempts.
- **Enhanced Configuration Validation**: Added strict type validation for `mm_process_config` in `server_args.py`, ensuring that the top-level configuration and its modality-specific sub-configurations are always dictionaries.
- **Refactored Processor Logic**: Centralized the extraction and injection of modality-specific configurations within `base_processor.py`, simplifying code and ensuring consistent behavior across different multimodal processors.
- **Comprehensive Testing**: Introduced a new dedicated test file (`test_mm_process_config.py`) to thoroughly validate the new configuration handling, including server-args validation, config extraction, collision prevention, and regression tests for specific processors.
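The collision-free injection pattern in the highlights can be sketched as follows. This is an illustrative sketch, not SGLang's actual code: the function name `inject_mm_config` is an assumption, and only the `images_kwargs`/`videos_kwargs`/`audio_kwargs` namespaces come from the PR description.

```python
# Illustrative sketch of the collision-free injection pattern: each
# modality's config is merged into the HuggingFace processor kwargs under
# its own namespace, so image settings can never clobber video settings.
def inject_mm_config(kwargs, image_config=None, video_config=None, audio_config=None):
    """Merge per-modality configs into processor kwargs without collisions."""
    for key, config in (
        ("images_kwargs", image_config),
        ("videos_kwargs", video_config),
        ("audio_kwargs", audio_config),
    ):
        if config:
            # setdefault().update() keeps keys the caller already set and
            # layers the server-level config on top (server wins on overlap).
            kwargs.setdefault(key, {}).update(config)
    return kwargs


kwargs = {"images_kwargs": {"do_resize": True}}
inject_mm_config(kwargs, image_config={"max_pixels": 1_000_000})
# kwargs["images_kwargs"] now holds both do_resize and max_pixels;
# no videos_kwargs or audio_kwargs entries are created.
```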
Changelog
- `docs/supported_models/text_generation/multimodal_language_models.md`
  - Updated documentation to remove the previous caveat that only `qwen_vl` supported `mm_process_config`.
  - Clarified that config entries are now passed as `images_kwargs`, `videos_kwargs`, and `audio_kwargs` to the HuggingFace processor, ensuring separation and preventing collisions.
- `python/sglang/srt/multimodal/processors/base_processor.py`
  - Added logic in `__init__` to extract `image_config`, `video_config`, and `audio_config` from `server_args.mm_process_config`.
  - Modified `process_mm_data` to inject these extracted configurations into the `kwargs` dictionary as `images_kwargs`, `videos_kwargs`, and `audio_kwargs`, using `setdefault().update()` for merging.
- `python/sglang/srt/multimodal/processors/ernie45_vl.py`
  - Updated the `process_mm_data` method to correctly inject `images_kwargs` and `videos_kwargs` based on the new base processor configuration handling.
- `python/sglang/srt/multimodal/processors/midashenglm.py`
  - Modified the `process_mm_data` method to properly inject `audio_kwargs` using `setdefault().update()`, allowing merging with existing audio processing settings.
- `python/sglang/srt/multimodal/processors/qwen_vl.py`
  - Removed redundant `image_config` and `video_config` extraction from its `__init__` method, as this is now handled by `BaseMultimodalProcessor`.
- `python/sglang/srt/server_args.py`
  - Implemented type validation for the `mm_process_config` argument, ensuring it is a dictionary and that its `image`, `video`, and `audio` keys (if present) are also dictionaries. This validation now occurs early in the `__post_init__` method.
- `test/registered/core/test_mm_process_config.py`
  - Added a new test file containing unit tests for `mm_process_config` validation in `ServerArgs`.
  - Included tests for correct configuration extraction in `BaseMultimodalProcessor`.
  - Verified the proper injection of `images_kwargs`, `videos_kwargs`, and `audio_kwargs` in `process_mm_data` and confirmed no parameter collisions.
  - Added regression tests for `Ernie4_5_VLImageProcessor` and `MiDashengLMMultimodalProcessor` to ensure their overridden methods correctly handle config injection.
Activity
- This pull request addresses and fixes issue [Bug] VLM: --mm-process-config is currently defined but not working #14672.
- It resolves a parameter collision problem identified by @JustinTong0323 in a previous pull request (Fix mm_process_config not being passed to processors (#14672) #14968), demonstrating iterative improvement.
- The author has ensured code formatting, added unit tests, and updated documentation, as indicated by the checklist.
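The early type validation described for `server_args.py` can be sketched roughly as below. This is an assumption-laden sketch: the function name and error messages are invented for illustration, and only the rule itself (top-level dict, per-modality sub-dicts) comes from the changelog.

```python
# Illustrative sketch of the validation rule described in the changelog:
# mm_process_config must be a dict, and its optional "image", "video",
# and "audio" entries must themselves be dicts. Names and messages here
# are assumptions, not SGLang's actual code.
def validate_mm_process_config(mm_process_config):
    if mm_process_config is None:
        return
    if not isinstance(mm_process_config, dict):
        raise ValueError("--mm-process-config must be a dict")
    for modality in ("image", "video", "audio"):
        sub = mm_process_config.get(modality)
        if sub is not None and not isinstance(sub, dict):
            raise ValueError(
                f"--mm-process-config['{modality}'] must be a dict"
            )


validate_mm_process_config({"image": {"max_pixels": 1_000_000}})  # passes
```

Running the check early in `__post_init__` means a malformed config fails at server startup rather than surfacing later as an opaque processor error.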