【Hackathon 8th No.28】Reproduce Phi3 in PaddleNLP by robinbg · Pull Request #10688 · PaddlePaddle/PaddleNLP
robinbg pushed a commit to robinbg/PaddleNLP that referenced this pull request
Fix(phi3): Address comments from PR PaddlePaddle#10688
This commit incorporates the suggestions and requirements from the review comments on PR PaddlePaddle#10688 for the Phi3 model implementation.
The following changes were made:
Tokenizer Configuration Cleanup:
- Removed `pretrained_resource_files_map`, `pretrained_init_configuration`, and `max_model_input_sizes` from `paddlenlp/transformers/phi3/tokenizer.py`, as requested, to decouple the tokenizer from specific pre-trained model download paths (a sketch of the resulting class follows below).
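For reference, a minimal sketch of what the cleaned-up tokenizer class might look like after this change; the `LlamaTokenizer` base class and the `tokenizer.model` file name are assumptions for illustration, not the exact Phi3 code:

```python
# Hypothetical sketch: the tokenizer keeps only local resource file names and lets
# from_pretrained() resolve download paths, instead of hard-coding
# pretrained_resource_files_map / pretrained_init_configuration / max_model_input_sizes.
from paddlenlp.transformers.llama.tokenizer import LlamaTokenizer  # assumed base class


class Phi3Tokenizer(LlamaTokenizer):
    # Only the mapping from logical resource names to local file names remains.
    resource_files_names = {"vocab_file": "tokenizer.model"}  # assumed file name
```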
Test Init File Completion:
- Added a docstring to `tests/transformers/phi3/__init__.py` to ensure it is a valid, non-empty Python module initialization file (see the sketch below).
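A sketch of the kind of docstring-only `__init__.py` described above; the exact wording is assumed:

```python
"""Unit tests for the PaddleNLP Phi3 model."""
```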
PretrainedModel Mapping Methods:
- Implemented `_get_name_mappings`, `_get_tensor_parallel_mappings`, and `_get_fuse_or_split_param_mappings` in the `Phi3PreTrainedModel` class in `paddlenlp/transformers/phi3/modeling.py`. These methods are crucial for model conversion and tensor parallelism, and are based on the Qwen2 model's implementation (a simplified sketch follows below).
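A simplified sketch of the Qwen2-style tensor-parallel mapping pattern referred to above. It assumes the same `split_or_merge_func` helper from `paddlenlp.transformers.conversion_utils` that Qwen2 uses; the weight names and fused projections shown here are illustrative assumptions rather than the exact Phi3 mapping:

```python
# Sketch only: maps parameter names to split/merge actions for tensor parallelism,
# following the Qwen2-style pattern. Weight names below are illustrative assumptions.
from functools import partial

from paddlenlp.transformers.conversion_utils import split_or_merge_func


def get_tensor_parallel_mappings(config, is_split=True):
    # Bind the split/merge helper to the current tensor-parallel topology.
    fn = split_or_merge_func(
        is_split=is_split,
        tensor_parallel_degree=config.tensor_parallel_degree,
        tensor_parallel_rank=config.tensor_parallel_rank,
        num_attention_heads=config.num_attention_heads,
    )

    actions = {}
    for layer_idx in range(config.num_hidden_layers):
        prefix = f"layers.{layer_idx}."
        # Column-parallel weights are split along the output dimension (is_column=True),
        # row-parallel weights along the input dimension (is_column=False).
        actions[prefix + "self_attn.qkv_proj.weight"] = partial(fn, is_column=True)
        actions[prefix + "self_attn.o_proj.weight"] = partial(fn, is_column=False)
        actions[prefix + "mlp.gate_up_proj.weight"] = partial(fn, is_column=True)
        actions[prefix + "mlp.down_proj.weight"] = partial(fn, is_column=False)
    actions["embed_tokens.weight"] = partial(fn, is_column=False)
    actions["lm_head.weight"] = partial(fn, is_column=True)
    return actions
```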
Parallel Strategy Support:
- Integrated support for sequence parallelism and recomputation into `paddlenlp/transformers/phi3/modeling.py` (see the sketch after this list). This includes:
  - Configuration flags for enabling/disabling these features.
  - Modifications to `Phi3Model`, `Phi3DecoderLayer`, `Phi3Attention`, and `Phi3MLP` to handle sequence-parallel linear layers and recomputation logic (full-layer, full-attention, and core-attention granularities).
  - Necessary imports and utilities for sequence parallelism (`ScatterOp`, `GatherOp`, sequence-parallel linear layers) and recomputation.
  - Tensor parallelism considerations for weight initialization and layer configurations.
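As an illustration of the recomputation wiring, here is a minimal sketch of a "full"-granularity recompute loop over decoder layers using Paddle's `recompute` utility; the flag names (`use_recompute`, `recompute_granularity`) mirror the description above but are assumptions rather than the exact Phi3 code:

```python
# Sketch only: re-run each decoder layer in backward instead of caching its activations
# when full-layer recompute is enabled. Flag names are assumptions for illustration.
from paddle.distributed.fleet.utils import recompute


def run_decoder_layers(layers, hidden_states, attention_mask, config, training=True):
    """Run decoder layers, optionally recomputing each layer's activations in backward."""
    for layer in layers:
        has_gradient = not hidden_states.stop_gradient
        use_full_recompute = (
            training
            and getattr(config, "use_recompute", False)
            and getattr(config, "recompute_granularity", "full") == "full"
            and has_gradient
        )
        if use_full_recompute:
            # Trade compute for memory: the layer's activations are recomputed
            # during the backward pass instead of being cached.
            hidden_states = recompute(layer, hidden_states, attention_mask)
        else:
            hidden_states = layer(hidden_states, attention_mask)
    return hidden_states
```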
Code Formatting:
- Applied `pre-commit` to all modified files to ensure code style consistency and address linting issues. This included removing some unused imports and a duplicated code segment.