【Hackathon 8th No.16】 data_efficient_nopt paper reproduction by xiaoyewww · Pull Request #1111 · PaddlePaddle/PaddleScience
If you have reproducible accuracy results, please post a screenshot of the log on GitHub and upload the log file, and we can start testing on our side.
So far I have reproduced the Poisson FNO pre-training. The random seeds are not fixed in either the Paddle or the PyTorch run, so the losses differ in the first few steps, but the trends become consistent after a few hundred steps.
The reproduced results still differ slightly from those in the paper; I suspect some hyperparameters are different, but the paper does not describe them.
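For reference, a minimal sketch of how the random seeds could be pinned on both sides so that the early-step losses become directly comparable; the helper name `set_global_seed` is hypothetical and not part of the current scripts:

```python
# A minimal sketch (not part of this PR) of pinning the RNG state so the
# early-step losses of the two frameworks are directly comparable.
import random

import numpy as np
import paddle
# import torch  # uncomment on the PyTorch side


def set_global_seed(seed: int = 0) -> None:  # hypothetical helper name
    random.seed(seed)
    np.random.seed(seed)
    paddle.seed(seed)                         # Paddle global seed
    # torch.manual_seed(seed)                 # PyTorch CPU seed
    # torch.cuda.manual_seed_all(seed)        # PyTorch GPU seeds
```

With the seeds pinned (and the same data order), the first few steps should already match, instead of only the longer-term trend.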
paddle:
Epoch 1 Batch 0 Train Loss 0.3359823226928711 train_l2 loss 1.0018577575683594 train_rmse loss 0.7787399888038635
Total Times. Global step: 0, Batch: 0, Rank: 0, Data Shape: [128, 4, 64, 64], Data time: 1.274991750717163, Forward: 0.5737528800964355, Backward: 0.18942928314208984, Optimizer: 0.023642539978027344
Epoch 1 Batch 1 Train Loss 0.3453991115093231 train_l2 loss 0.9957258105278015 train_rmse loss 0.7938657999038696
Total Times. Global step: 1, Batch: 1, Rank: 0, Data Shape: [128, 4, 64, 64], Data time: 0.08668327331542969, Forward: 0.019646406173706055, Backward: 0.012578487396240234, Optimizer: 0.02925419807434082
Epoch 1 Batch 2 Train Loss 0.33508121967315674 train_l2 loss 0.9866492748260498 train_rmse loss 0.7812024354934692
Total Times. Global step: 2, Batch: 2, Rank: 0, Data Shape: [128, 4, 64, 64], Data time: 0.08733677864074707, Forward: 0.016697168350219727, Backward: 0.011176347732543945, Optimizer: 0.032007694244384766
Epoch 1 Batch 3 Train Loss 0.3373328149318695 train_l2 loss 0.9720052480697632 train_rmse loss 0.7818474769592285
Total Times. Global step: 3, Batch: 3, Rank: 0, Data Shape: [128, 4, 64, 64], Data time: 0.08364057540893555, Forward: 0.01732492446899414, Backward: 0.011551856994628906, Optimizer: 0.0324702262878418
Epoch 1 Batch 4 Train Loss 0.3260154128074646 train_l2 loss 0.9649569988250732 train_rmse loss 0.7634750604629517
Total Times. Global step: 4, Batch: 4, Rank: 0, Data Shape: [128, 4, 64, 64], Data time: 0.08973145484924316, Forward: 0.017143964767456055, Backward: 0.011417150497436523, Optimizer: 0.0321955680847168
Epoch 1 Batch 5 Train Loss 0.33446627855300903 train_l2 loss 0.9470340609550476 train_rmse loss 0.7787452936172485
Total Times. Global step: 5, Batch: 5, Rank: 0, Data Shape: [128, 4, 64, 64], Data time: 0.08418393135070801, Forward: 0.015747785568237305, Backward: 0.010698318481445312, Optimizer: 0.03490447998046875
Epoch 1 Batch 6 Train Loss 0.31356751918792725 train_l2 loss 0.9271667003631592 train_rmse loss 0.7467784285545349
Total Times. Global step: 6, Batch: 6, Rank: 0, Data Shape: [128, 4, 64, 64], Data time: 0.08341240882873535, Forward: 0.015986919403076172, Backward: 0.010836601257324219, Optimizer: 0.03387713432312012
Epoch 1 Batch 7 Train Loss 0.32571274042129517 train_l2 loss 0.918164074420929 train_rmse loss 0.7629383206367493
Total Times. Global step: 7, Batch: 7, Rank: 0, Data Shape: [128, 4, 64, 64], Data time: 0.08514046669006348, Forward: 0.01694965362548828, Backward: 0.011275529861450195, Optimizer: 0.033127784729003906
Epoch 1 Batch 8 Train Loss 0.3198857605457306 train_l2 loss 0.8946603536605835 train_rmse loss 0.7452360391616821
Total Times. Global step: 8, Batch: 8, Rank: 0, Data Shape: [128, 4, 64, 64], Data time: 0.08858108520507812, Forward: 0.01703476905822754, Backward: 0.011278867721557617, Optimizer: 0.032628536224365234
Epoch 1 Batch 9 Train Loss 0.28028005361557007 train_l2 loss 0.8539849519729614 train_rmse loss 0.6707751154899597
Total Times. Global step: 9, Batch: 9, Rank: 0, Data Shape: [128, 4, 64, 64], Data time: 0.08603501319885254, Forward: 0.017177581787109375, Backward: 0.011240959167480469, Optimizer: 0.032387495040893555
Epoch 1 Batch 10 Train Loss 0.303079217672348 train_l2 loss 0.8385870456695557 train_rmse loss 0.7334427833557129
Total Times. Global step: 10, Batch: 10, Rank: 0, Data Shape: [128, 4, 64, 64], Data time: 0.07994580268859863, Forward: 0.01645064353942871, Backward: 0.011035442352294922, Optimizer: 0.03404521942138672
torch:
Epoch 1 Batch 0 Train Loss 0.3359190821647644
Total Times. Batch: 0, Rank: 0, Data Shape: torch.Size([128, 4, 64, 64]), Data time: 1.5983459949493408, Forward: 2.262549877166748, Backward: 0.5132086277008057, Optimizer: 0.012012720108032227
Epoch 1 Batch 1 Train Loss 0.34538960456848145
Total Times. Batch: 1, Rank: 0, Data Shape: torch.Size([128, 4, 64, 64]), Data time: 0.03945159912109375, Forward: 0.03422045707702637, Backward: 0.045007944107055664, Optimizer: 0.010618925094604492
Epoch 1 Batch 2 Train Loss 0.33507877588272095
Total Times. Batch: 2, Rank: 0, Data Shape: torch.Size([128, 4, 64, 64]), Data time: 0.03759288787841797, Forward: 0.011688470840454102, Backward: 0.06759905815124512, Optimizer: 0.010719060897827148
Epoch 1 Batch 3 Train Loss 0.3374229967594147
Total Times. Batch: 3, Rank: 0, Data Shape: torch.Size([128, 4, 64, 64]), Data time: 0.03585243225097656, Forward: 0.011916399002075195, Backward: 0.06729936599731445, Optimizer: 0.010690450668334961
Epoch 1 Batch 4 Train Loss 0.32614579796791077
Total Times. Batch: 4, Rank: 0, Data Shape: torch.Size([128, 4, 64, 64]), Data time: 0.03865694999694824, Forward: 0.011200428009033203, Backward: 0.06806373596191406, Optimizer: 0.010601520538330078
Epoch 1 Batch 5 Train Loss 0.33475780487060547
Total Times. Batch: 5, Rank: 0, Data Shape: torch.Size([128, 4, 64, 64]), Data time: 0.03668999671936035, Forward: 0.011771440505981445, Backward: 0.06752943992614746, Optimizer: 0.010657072067260742
Epoch 1 Batch 6 Train Loss 0.3140023946762085
Total Times. Batch: 6, Rank: 0, Data Shape: torch.Size([128, 4, 64, 64]), Data time: 0.0358736515045166, Forward: 0.011792898178100586, Backward: 0.0673990249633789, Optimizer: 0.010664939880371094
Epoch 1 Batch 7 Train Loss 0.3263723850250244
Total Times. Batch: 7, Rank: 0, Data Shape: torch.Size([128, 4, 64, 64]), Data time: 0.03751945495605469, Forward: 0.011594295501708984, Backward: 0.06769108772277832, Optimizer: 0.010838031768798828
Epoch 1 Batch 8 Train Loss 0.3208008110523224
Total Times. Batch: 8, Rank: 0, Data Shape: torch.Size([128, 4, 64, 64]), Data time: 0.03836321830749512, Forward: 0.011111259460449219, Backward: 0.06809735298156738, Optimizer: 0.010608196258544922
Epoch 1 Batch 9 Train Loss 0.28148674964904785
Total Times. Batch: 9, Rank: 0, Data Shape: torch.Size([128, 4, 64, 64]), Data time: 0.03702974319458008, Forward: 0.01204824447631836, Backward: 0.06731843948364258, Optimizer: 0.010616540908813477
Epoch 1 Batch 10 Train Loss 0.30438655614852905
Total Times. Batch: 10, Rank: 0, Data Shape: torch.Size([128, 4, 64, 64]), Data time: 0.034720659255981445, Forward: 0.011367082595825195, Backward: 0.06775617599487305, Optimizer: 0.010625123977661133
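To check that the two loss curves really do agree after a few hundred steps, here is a minimal comparison sketch, assuming the `Epoch ... Batch ... Train Loss ...` line format shown above; the log file names are hypothetical placeholders:

```python
# Sketch: extract "Train Loss" per step from each log and print them side by side.
import re


def read_losses(path: str) -> list:
    pattern = re.compile(r"Train Loss ([0-9.eE+-]+)")
    losses = []
    with open(path) as f:
        for line in f:
            m = pattern.search(line)
            if m:
                losses.append(float(m.group(1)))
    return losses


pd_losses = read_losses("paddle_train.log")  # hypothetical file name
pt_losses = read_losses("torch_train.log")   # hypothetical file name
n = min(len(pd_losses), len(pt_losses))
for step in range(0, n, 100):
    diff = abs(pd_losses[step] - pt_losses[step])
    print(f"step {step}: paddle={pd_losses[step]:.6f} torch={pt_losses[step]:.6f} diff={diff:.2e}")
```

This only compares the scalar Train Loss; the `train_l2` and `train_rmse` values in the Paddle log could be extracted the same way if needed.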


