[Hackathon 7th] Fix taking the max of an int and a Value by megemini · Pull Request #3903 · PaddlePaddle/PaddleSpeech
PR types
Bug fixes
PR changes
Others
Describe
Fix the issue of taking the max of a plain int and a pir `Value`.
The inputs here can be mixed lists of `Value`s and plain ints, for example:

```text
[Value(define_op_name=pd_op.slice, index=0, dtype=builtin.tensor, stop_gradient=True), 2, Value(define_op_name=pd_op.slice, index=0, dtype=builtin.tensor, stop_gradient=True), Value(define_op_name=pd_op.slice, index=0, dtype=builtin.tensor, stop_gradient=True)]
[Value(define_op_name=pd_op.slice, index=0, dtype=builtin.tensor, stop_gradient=True), 1, 1, Value(define_op_name=pd_op.slice, index=0, dtype=builtin.tensor, stop_gradient=True)]
```
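For illustration, the failure mode and one possible shape of the fix can be sketched in plain Python. This is a hedged sketch, not the actual PR diff: `fold_max` and the `tensor_max` parameter are invented names, and in Paddle the non-int case would delegate to something like `paddle.maximum`.

```python
# Sketch only: in to_static mode a shape entry may be a symbolic pir
# ``Value`` instead of a Python int, and ``max(int, Value)`` then fails.
# One fix is to fold the list with a binary max that only uses builtins
# ``max`` when both sides are plain ints, and otherwise delegates to the
# framework's elementwise maximum (passed in here as ``tensor_max``).

def fold_max(items, tensor_max):
    """Reduce ``items`` (ints and/or symbolic values) to a single maximum.

    ``tensor_max`` is the framework's binary max (e.g. ``paddle.maximum``);
    it is called whenever either operand is not a plain int.
    """
    result = items[0]
    for item in items[1:]:
        if isinstance(result, int) and isinstance(item, int):
            result = max(result, item)  # pure-int fast path
        else:
            result = tensor_max(result, item)  # symbolic case
    return result
```

Folding pairwise keeps the pure-int fast path while letting the framework build the comparison op whenever a symbolic `Value` is involved.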
In addition, `self.pe = pe` in paddlespeech/t2s/modules/transformer/embedding.py is changed to `self.pe = paddle.assign(pe)`; otherwise the following error is raised:
```text
...
File "/home/aistudio/.local/lib/python3.8/site-packages/paddle/tensor/creation.py", line 2678, in assign
    C_ops.assign_out(input, output)
Sorry about what's happened. In to_static mode, pd_op.assign_out_'s output variable is a viewed Tensor in dygraph. This will result in inconsistent calculation behavior between dynamic and static graphs. You must find the location of the strided ops be called, and call paddle.assign() before inplace input.If you certainly make sure it's safe, you can set env stride_in_no_check_dy2st_diff to 1.
```
The script also runs normally after exporting `stride_in_no_check_dy2st_diff=0`, which raises a few questions:

- Does this line actually need to be changed to `self.pe = paddle.assign(pe)`?
- There are several other assignments of the `self.pe = pe` form in this file; should they be changed as well?
- Should the form `paddle.assign(pe, self.pe)` be used instead?
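On the assign question: the difference between `self.pe = pe` (aliasing a possibly-viewed tensor) and `self.pe = paddle.assign(pe)` (materializing a copy) mirrors NumPy's view-vs-copy distinction. A minimal NumPy analogy (not Paddle code; the shape is arbitrary):

```python
import numpy as np

# ``pe`` stands in for the freshly built positional-encoding table.
pe = np.zeros((1, 8, 4))

aliased = pe         # like ``self.pe = pe``: shares storage with ``pe``
copied = pe.copy()   # like ``self.pe = paddle.assign(pe)``: independent storage

# A later in-place write to the source buffer...
pe[0, 0, 0] = 1.0

# ...is visible through the alias but not through the copy, which is the
# kind of dygraph/static inconsistency the Paddle error message warns about.
```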
After the fix, the following command runs successfully:
```shell
$ FLAGS_allocator_strategy=naive_best_fit \
  FLAGS_fraction_of_gpu_memory_to_use=0.01 \
  python3 ${BIN_DIR}/../synthesize_e2e.py \
    --am=fastspeech2_aishell3 \
    --am_config=fastspeech2_canton_ckpt_1.4.0/default.yaml \
    --am_ckpt=fastspeech2_canton_ckpt_1.4.0/snapshot_iter_140000.pdz \
    --am_stat=fastspeech2_canton_ckpt_1.4.0/speech_stats.npy \
    --voc=pwgan_aishell3 \
    --voc_config=pwg_aishell3_ckpt_0.5/default.yaml \
    --voc_ckpt=pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz \
    --voc_stat=pwg_aishell3_ckpt_0.5/feats_stats.npy \
    --lang=canton \
    --text=${BIN_DIR}/../../assets/sentences_canton.txt \
    --output_dir=exp/default/test_e2e \
    --phones_dict=fastspeech2_canton_ckpt_1.4.0/phone_id_map.txt \
    --speaker_dict=fastspeech2_canton_ckpt_1.4.0/speaker_id_map.txt \
    --spk_id=10 \
    --inference_dir=exp/default/inference
```

```text
========Args========
am: fastspeech2_aishell3
am_ckpt: fastspeech2_canton_ckpt_1.4.0/snapshot_iter_140000.pdz
am_config: fastspeech2_canton_ckpt_1.4.0/default.yaml
am_stat: fastspeech2_canton_ckpt_1.4.0/speech_stats.npy
inference_dir: exp/default/inference
lang: canton
ngpu: 1
nmlu: 0
nnpu: 0
nxpu: 0
output_dir: exp/default/test_e2e
phones_dict: fastspeech2_canton_ckpt_1.4.0/phone_id_map.txt
pinyin_phone: null
speaker_dict: fastspeech2_canton_ckpt_1.4.0/speaker_id_map.txt
speech_stretchs: null
spk_id: 10
text: /home/aistudio/PaddleSpeech/paddlespeech/t2s/exps/fastspeech2/../../assets/sentences_canton.txt
tones_dict: null
use_rhy: false
voc: pwgan_aishell3
voc_ckpt: pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz
voc_config: pwg_aishell3_ckpt_0.5/default.yaml
voc_stat: pwg_aishell3_ckpt_0.5/feats_stats.npy
```
```text
========Config========
batch_size: 32 f0max: 400 f0min: 110 fmax: 7600 fmin: 80 fs: 24000 max_epoch: 1000
model: adim: 384 aheads: 2 decoder_normalize_before: True dlayers: 4 dunits: 1536 duration_predictor_chans: 256 duration_predictor_kernel_size: 3 duration_predictor_layers: 2 elayers: 4 encoder_normalize_before: True energy_embed_dropout: 0.0 energy_embed_kernel_size: 1 energy_predictor_chans: 256 energy_predictor_dropout: 0.5 energy_predictor_kernel_size: 3 energy_predictor_layers: 2 eunits: 1536 init_dec_alpha: 1.0 init_enc_alpha: 1.0 init_type: xavier_uniform pitch_embed_dropout: 0.0 pitch_embed_kernel_size: 1 pitch_predictor_chans: 256 pitch_predictor_dropout: 0.5 pitch_predictor_kernel_size: 5 pitch_predictor_layers: 5 positionwise_conv_kernel_size: 3 positionwise_layer_type: conv1d postnet_chans: 256 postnet_filts: 5 postnet_layers: 5 reduction_factor: 1 spk_embed_dim: 256 spk_embed_integration_type: concat stop_gradient_from_energy_predictor: False stop_gradient_from_pitch_predictor: True transformer_dec_attn_dropout_rate: 0.2 transformer_dec_dropout_rate: 0.2 transformer_dec_positional_dropout_rate: 0.2 transformer_enc_attn_dropout_rate: 0.2 transformer_enc_dropout_rate: 0.2 transformer_enc_positional_dropout_rate: 0.2 use_scaled_pos_enc: True
n_fft: 2048 n_mels: 80 n_shift: 300 num_snapshots: 5 num_workers: 2
optimizer: learning_rate: 0.001 optim: adam
seed: 10086
updater: use_masking: True
win_length: 1200 window: hann
allow_cache: True batch_max_steps: 24000 batch_size: 8 discriminator_grad_norm: 1 discriminator_optimizer_params: epsilon: 1e-06 weight_decay: 0.0 discriminator_params: bias: True conv_channels: 64 in_channels: 1 kernel_size: 3 layers: 10 nonlinear_activation: LeakyReLU nonlinear_activation_params: negative_slope: 0.2 out_channels: 1 use_weight_norm: True discriminator_scheduler_params: gamma: 0.5 learning_rate: 5e-05 step_size: 200000 discriminator_train_start_steps: 100000 eval_interval_steps: 1000 fmax: 7600 fmin: 80 fs: 24000
generator_grad_norm: 10 generator_optimizer_params: epsilon: 1e-06 weight_decay: 0.0 generator_params: aux_channels: 80 aux_context_window: 2 dropout: 0.0 gate_channels: 128 in_channels: 1 kernel_size: 3 layers: 30 out_channels: 1 residual_channels: 64 skip_channels: 64 stacks: 3 upsample_scales: [4, 5, 3, 5] use_weight_norm: True generator_scheduler_params: gamma: 0.5 learning_rate: 0.0001 step_size: 200000 lambda_adv: 4.0 n_fft: 2048 n_mels: 80 n_shift: 300 num_save_intermediate_results: 4 num_snapshots: 10 num_workers: 4 pin_memory: True remove_short_samples: True save_interval_steps: 5000 seed: 42 stft_loss_params: fft_sizes: [1024, 2048, 512] hop_sizes: [120, 240, 50] win_lengths: [600, 1200, 240] window: hann train_max_steps: 1000000 win_length: 1200 window: hann
frontend done!
W1122 10:37:30.571856 22376 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8
W1122 10:37:30.573297 22376 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
/home/aistudio/.local/lib/python3.8/site-packages/paddle/nn/layer/layers.py:2194: UserWarning: Skip loading for encoder.embed.1.alpha. encoder.embed.1.alpha receives a shape [1], but the expected shape is [].
/home/aistudio/.local/lib/python3.8/site-packages/paddle/nn/layer/layers.py:2194: UserWarning: Skip loading for decoder.embed.0.alpha. decoder.embed.0.alpha receives a shape [1], but the expected shape is [].
acoustic model done!
voc done!
convert am and voc to static model.
/home/aistudio/.local/lib/python3.8/site-packages/paddle/jit/dy2static/program_translator.py:747: UserWarning: full_graph=False don't support input_spec arguments. It will not produce any effect. You can set full_graph=True, then you can assign input spec.
```
```text
W1122 10:37:36.563637 22376 pd_api.cc:31283] got different data type, run type promotion automatically, this may cause data type been changed.
/home/aistudio/.local/lib/python3.8/site-packages/paddle/jit/dy2static/program_translator.py:747: UserWarning: full_graph=False don't support input_spec arguments. It will not produce any effect. You can set full_graph=True, then you can assign input spec.
001 白云山爬过一次嘅,好远啊,爬上去都成两个钟
I1122 10:37:41.684955 22376 pir_interpreter.cc:1564] New Executor is Running ...
I1122 10:37:42.039050 22376 pir_interpreter.cc:1591] pir interpreter is running by multi-thread mode ...
001, mel: [163, 80], wave: (119700, 1), time: 5376s, Hz: 22.265625, RTF: 1077.8947368421052.
001 done!
002 睇书咯,番屋企,而家好多人好少睇书噶喎
002, mel: [237, 80], wave: (113100, 1), time: 4007s, Hz: 28.225605190915896, RTF: 850.291777188329.
002 done!
003 因为如果唔考试嘅话,工资好低噶
003, mel: [117, 80], wave: (93600, 1), time: 2628s, Hz: 35.61643835616438, RTF: 673.8461538461539.
003 done!
004 冇固定噶,你中意休边日就边日噶
004, mel: [184, 80], wave: (86400, 1), time: 2738s, Hz: 31.555880204528854, RTF: 760.5555555555555.
004 done!
```