【Hackathon 8th No.28】Reproduce Phi3 in PaddleNLP by robinbg · Pull Request #10688 · PaddlePaddle/PaddleNLP

robinbg pushed a commit to robinbg/PaddleNLP that referenced this pull request

Jun 8, 2025

@google-labs-jules

Fix(phi3): Address comments from PR PaddlePaddle#10688

This commit addresses the suggestions and requirements from the review comments on PR PaddlePaddle#10688 for the Phi3 model implementation.

The following changes were made:

  1. Tokenizer Configuration Cleanup:

    • Removed pretrained_resource_files_map, pretrained_init_configuration, and max_model_input_sizes from paddlenlp/transformers/phi3/tokenizer.py, as requested in the review, to decouple the tokenizer from specific pre-trained model download paths.
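A minimal sketch of what the cleanup leaves behind (the base class here is a standalone stand-in for paddlenlp.transformers.PretrainedTokenizer, and the kept attribute name is illustrative, not the exact PaddleNLP definition):

```python
class PretrainedTokenizer:  # stand-in so the sketch runs without paddlenlp
    pass


class Phi3Tokenizer(PretrainedTokenizer):
    # Kept: only the local resource-file naming convention.
    resource_files_names = {"vocab_file": "tokenizer.model"}
    # Removed per review: pretrained_resource_files_map,
    # pretrained_init_configuration, and max_model_input_sizes, so the
    # class no longer hard-codes download paths for specific checkpoints.


# After the cleanup, none of the download-path attributes remain on the class.
removed = (
    "pretrained_resource_files_map",
    "pretrained_init_configuration",
    "max_model_input_sizes",
)
assert not any(hasattr(Phi3Tokenizer, name) for name in removed)
```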
  2. Test Init File Completion:

    • Added a docstring to tests/transformers/phi3/__init__.py to ensure it's a valid and non-empty Python module initialization file.
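The added file could look like the following (the docstring wording is a hypothetical example, not the exact text committed):

```python
# Hypothetical contents of tests/transformers/phi3/__init__.py
"""Unit tests for the Phi3 model and tokenizer in PaddleNLP."""
```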
  3. PretrainedModel Mapping Methods:

    • Implemented _get_name_mappings, _get_tensor_parallel_mappings, and _get_fuse_or_split_param_mappings in the Phi3PreTrainedModel class in paddlenlp/transformers/phi3/modeling.py. These methods, modeled on the Qwen2 implementation, are required for model conversion and tensor parallelism.
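The core idea behind a tensor-parallel mapping method can be sketched in plain Python: each parameter name is mapped to how it is partitioned across model-parallel ranks. The parameter names and split directions below are illustrative assumptions in the Qwen2/Phi3 style, not the exact PaddleNLP API:

```python
def get_tensor_parallel_mappings(num_hidden_layers):
    """Map parameter names to how they are split across model-parallel ranks."""
    mappings = {}
    for i in range(num_hidden_layers):
        prefix = f"layers.{i}."
        # Column-parallel weights: the output dimension is split across ranks.
        mappings[prefix + "self_attn.qkv_proj.weight"] = "column_parallel"
        mappings[prefix + "mlp.gate_up_proj.weight"] = "column_parallel"
        # Row-parallel weights: the input dimension is split; outputs are
        # combined with an all-reduce.
        mappings[prefix + "self_attn.o_proj.weight"] = "row_parallel"
        mappings[prefix + "mlp.down_proj.weight"] = "row_parallel"
    # The embedding table is split along the vocabulary dimension.
    mappings["embed_tokens.weight"] = "column_parallel"
    return mappings


m = get_tensor_parallel_mappings(2)
assert m["layers.1.mlp.down_proj.weight"] == "row_parallel"
assert m["layers.0.self_attn.qkv_proj.weight"] == "column_parallel"
```

In the real implementation these mappings drive how a full checkpoint is sliced when loading under tensor parallelism, and how sharded weights are merged back for export.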
  4. Parallel Strategy Support:

    • Integrated support for sequence parallelism and recomputation into paddlenlp/transformers/phi3/modeling.py.
    • This includes:
      • Configuration flags for enabling/disabling these features.
      • Modifications to Phi3Model, Phi3DecoderLayer, Phi3Attention, and Phi3MLP to handle sequence-parallel linear layers and recomputation logic (full layer, full attention, and core attention granularities).
      • Necessary imports and utilities for sequence parallelism (ScatterOp, GatherOp, sequence-parallel linear layers) and recomputation.
      • Tensor parallelism considerations for weight initialization and layer configurations.
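The scatter/gather idea behind sequence parallelism can be illustrated with a pure-Python stand-in (this is a sketch of the concept, not the paddle.distributed ScatterOp/GatherOp implementation): each rank keeps only a contiguous slice of the sequence dimension, so per-rank activation memory shrinks by the world size.

```python
def scatter_sequence(hidden_states, rank, world_size):
    """Keep this rank's contiguous chunk of the sequence (list of token vectors)."""
    assert len(hidden_states) % world_size == 0, "sequence must divide evenly"
    chunk = len(hidden_states) // world_size
    return hidden_states[rank * chunk:(rank + 1) * chunk]


def gather_sequence(chunks):
    """Reassemble the full sequence from every rank's chunk (an all-gather)."""
    full = []
    for chunk in chunks:
        full.extend(chunk)
    return full


seq = [[float(t)] for t in range(8)]          # 8 tokens, 1-dim "hidden states"
parts = [scatter_sequence(seq, r, 4) for r in range(4)]
assert all(len(p) == 2 for p in parts)        # each of 4 ranks holds 2 tokens
assert gather_sequence(parts) == seq          # gather restores the original
```

Recomputation is complementary: instead of storing activations for the backward pass, the wrapped region (full layer, full attention, or core attention, per the granularities above) is re-executed during backprop, trading compute for memory.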
  5. Code Formatting:

    • Applied pre-commit to all modified files to ensure code style consistency and address linting issues. This included removing some unused imports and a duplicated code segment.