VLM: support passing --mm-process-config for all models by edwingao28 · Pull Request #18467 · sgl-project/sglang
Summary of Changes
Hello @edwingao28, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly improves the flexibility and reliability of multimodal model processing by ensuring that the --mm-process-config CLI flag is correctly applied across all supported multimodal language models. It addresses a critical bug where configuration parameters were not being passed to most models and resolves a parameter collision issue from a previous fix attempt. The changes involve refactoring how modality-specific configurations are extracted and injected into HuggingFace processors, adding robust type validation, and introducing comprehensive unit tests to guarantee correct behavior.
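As a concrete illustration of the configuration shape implied by the summary, the flag value can be modeled as a JSON dict with optional per-modality sub-dicts. This is a hypothetical sketch: the inner keys (`max_pixels`, `fps`) are illustrative assumptions, not documented SGLang parameters.

```python
import json

# Hypothetical value for --mm-process-config, based on the structure this
# PR validates: a dict whose optional "image", "video", and "audio" keys
# each hold a dict of processor settings.
flag_value = '{"image": {"max_pixels": 1000000}, "video": {"fps": 2}}'
config = json.loads(flag_value)

# Both the top level and each modality sub-config are plain dicts.
assert isinstance(config, dict)
assert all(isinstance(sub, dict) for sub in config.values())
```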
Highlights
- **Universal `mm_process_config` Support**: The `--mm-process-config` CLI flag now correctly applies processing configurations to all multimodal language models, not just Qwen VL, resolving a long-standing limitation.
- **Collision-Free Modality Configuration**: Implemented a robust solution using HuggingFace's `images_kwargs`, `videos_kwargs`, and `audio_kwargs` to pass modality-specific settings, preventing the parameter collisions that occurred in previous attempts.
- **Enhanced Configuration Validation**: Added strict type validation for `mm_process_config` in `server_args.py`, ensuring that the top-level configuration and its modality-specific sub-configurations are always dictionaries.
- **Refactored Processor Logic**: Centralized the extraction and injection of modality-specific configurations within `base_processor.py`, simplifying code and ensuring consistent behavior across different multimodal processors.
- **Comprehensive Testing**: Introduced a new dedicated test file (`test_mm_process_config.py`) to thoroughly validate the new configuration handling, including server-args validation, config extraction, collision prevention, and regression tests for specific processors.
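The collision-free injection pattern in the highlights can be sketched as follows. This is an illustrative sketch, not SGLang's actual code: the function name `inject_mm_config` is an assumption, and only the `images_kwargs`/`videos_kwargs`/`audio_kwargs` namespaces come from the PR description.

```python
# Illustrative sketch of the collision-free injection pattern: each
# modality's config is merged into the HuggingFace processor kwargs under
# its own namespace, so image settings can never clobber video settings.
def inject_mm_config(kwargs, image_config=None, video_config=None, audio_config=None):
    """Merge per-modality configs into processor kwargs without collisions."""
    for key, config in (
        ("images_kwargs", image_config),
        ("videos_kwargs", video_config),
        ("audio_kwargs", audio_config),
    ):
        if config:
            # setdefault().update() keeps keys the caller already set and
            # layers the server-level config on top (server wins on overlap).
            kwargs.setdefault(key, {}).update(config)
    return kwargs


kwargs = {"images_kwargs": {"do_resize": True}}
inject_mm_config(kwargs, image_config={"max_pixels": 1_000_000})
# kwargs["images_kwargs"] now holds both do_resize and max_pixels;
# no videos_kwargs or audio_kwargs entries are created.
```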
Changelog
- `docs/supported_models/text_generation/multimodal_language_models.md`
  - Updated documentation to remove the previous caveat that only `qwen_vl` supported `mm_process_config`.
  - Clarified that config entries are now passed as `images_kwargs`, `videos_kwargs`, and `audio_kwargs` to the HuggingFace processor, ensuring separation and preventing collisions.
- `python/sglang/srt/multimodal/processors/base_processor.py`
  - Added logic in `__init__` to extract `image_config`, `video_config`, and `audio_config` from `server_args.mm_process_config`.
  - Modified `process_mm_data` to inject these extracted configurations into the `kwargs` dictionary as `images_kwargs`, `videos_kwargs`, and `audio_kwargs`, using `setdefault().update()` for merging.
- `python/sglang/srt/multimodal/processors/ernie45_vl.py`
  - Updated the `process_mm_data` method to correctly inject `images_kwargs` and `videos_kwargs` based on the new base processor configuration handling.
- `python/sglang/srt/multimodal/processors/midashenglm.py`
  - Modified the `process_mm_data` method to properly inject `audio_kwargs` using `setdefault().update()`, allowing merging with existing audio processing settings.
- `python/sglang/srt/multimodal/processors/qwen_vl.py`
  - Removed redundant `image_config` and `video_config` extraction from its `__init__` method, as this is now handled by `BaseMultimodalProcessor`.
- `python/sglang/srt/server_args.py`
  - Implemented type validation for the `mm_process_config` argument, ensuring it is a dictionary and that its `image`, `video`, and `audio` keys (if present) are also dictionaries. This validation now occurs early in the `__post_init__` method.
- `test/registered/core/test_mm_process_config.py`
  - Added a new test file containing unit tests for `mm_process_config` validation in `ServerArgs`.
  - Included tests for correct configuration extraction in `BaseMultimodalProcessor`.
  - Verified the proper injection of `images_kwargs`, `videos_kwargs`, and `audio_kwargs` in `process_mm_data` and confirmed no parameter collisions.
  - Added regression tests for `Ernie4_5_VLImageProcessor` and `MiDashengLMMultimodalProcessor` to ensure their overridden methods correctly handle config injection.
Activity
- This pull request addresses and fixes issue [Bug] VLM: --mm-process-config is currently defined but not working #14672.
- It resolves a parameter collision problem identified by @JustinTong0323 in a previous pull request (Fix mm_process_config not being passed to processors (#14672) #14968), demonstrating iterative improvement.
- The author has ensured code formatting, added unit tests, and updated documentation, as indicated by the checklist.
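The early type validation described for `server_args.py` can be sketched roughly as below. This is an assumption-laden sketch: the function name and error messages are invented for illustration, and only the rule itself (top-level dict, per-modality sub-dicts) comes from the changelog.

```python
# Illustrative sketch of the validation rule described in the changelog:
# mm_process_config must be a dict, and its optional "image", "video",
# and "audio" entries must themselves be dicts. Names and messages here
# are assumptions, not SGLang's actual code.
def validate_mm_process_config(mm_process_config):
    if mm_process_config is None:
        return
    if not isinstance(mm_process_config, dict):
        raise ValueError("--mm-process-config must be a dict")
    for modality in ("image", "video", "audio"):
        sub = mm_process_config.get(modality)
        if sub is not None and not isinstance(sub, dict):
            raise ValueError(
                f"--mm-process-config['{modality}'] must be a dict"
            )


validate_mm_process_config({"image": {"max_pixels": 1_000_000}})  # passes
```

Running the check early in `__post_init__` means a malformed config fails at server startup rather than surfacing later as an opaque processor error.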