[models] add ViTSTR TF and PT and update ViT to work as backbone by felixdittrich92 · Pull Request #1055 · mindee/doctr

NOTE:
Unlike with the SAR and MASTER architectures, I was not able to fully train this model: ViT requires a lot of data and I don't have the computing power. So this time there is only a short test run based on our word generator, to show that the model trains (the validation loss decreases in both the PyTorch and TensorFlow runs).

(doctr-dev) felix@felix-GS66-Stealth-11UH:~/Desktop/doctr$ python3 /home/felix/Desktop/doctr/references/recognition/train_pytorch.py vitstr
2022-09-19 09:35:14.038958: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
Namespace(amp=False, arch='vitstr', batch_size=64, device=None, epochs=10, find_lr=False, font='FreeMono.ttf,FreeSans.ttf,FreeSerif.ttf', input_size=32, lr=0.001, max_chars=12, min_chars=1, name=None, pretrained=False, push_to_hub=False, resume=None, sched='cosine', show_samples=False, test_only=False, train_path=None, train_samples=1000, val_path=None, val_samples=20, vocab='french', wb=False, weight_decay=0, workers=None)
Validation set loaded in 1.727s (2520 samples in 40 batches)
WARNING:root:Invalid model URL, using default initialization.
Train set loaded in 0.002008s (126000 samples in 1968 batches)
Validation loss decreased inf --> 3.90789: saving state...
Epoch 1/10 - Validation loss: 3.90789 (Exact: 1.35% | Partial: 2.26%)
Validation loss decreased 3.90789 --> 3.54396: saving state...
Epoch 2/10 - Validation loss: 3.54396 (Exact: 3.93% | Partial: 5.00%)

(doctr-dev-tf) felix@felix-GS66-Stealth-11UH:~/Desktop/doctr$ python3 /home/felix/Desktop/doctr/references/recognition/train_tensorflow.py vitstr
Namespace(amp=False, arch='vitstr', batch_size=64, epochs=10, find_lr=False, font='FreeMono.ttf,FreeSans.ttf,FreeSerif.ttf', input_size=32, lr=0.001, max_chars=12, min_chars=1, name=None, pretrained=False, push_to_hub=False, resume=None, show_samples=False, test_only=False, train_path=None, train_samples=1000, val_path=None, val_samples=20, vocab='french', wb=False, workers=None)
Validation set loaded in 0.002643s (2520 samples in 40 batches)
WARNING:root:Invalid model URL, using default initialization.
Train set loaded in 0.004127s (126000 samples in 1968 batches)
Validation loss decreased inf --> 3.9933: saving state...
Epoch 1/10 - Validation loss: 3.9933 (Exact: 3.45% | Partial: 3.73%)
Validation loss decreased 3.9933 --> 3.74347: saving state...
Epoch 2/10 - Validation loss: 3.74347 (Exact: 7.50% | Partial: 7.78%)
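
As a sanity check on the dataset sizes reported in both logs, the numbers can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the sample arguments are multiplied by the vocabulary size, that the training loader drops the last incomplete batch, and that the validation loader keeps it. None of this is stated in the logs themselves; the variable names are illustrative.

```python
import math

# Values copied from the Namespace(...) and "loaded in" lines of the logs.
batch_size = 64
train_samples_arg = 1000   # train_samples
val_samples_arg = 20       # val_samples
train_set_size = 126000    # "126000 samples in 1968 batches"
val_set_size = 2520        # "2520 samples in 40 batches"

# Assumption: dataset size = samples argument * number of characters in the vocab.
implied_vocab_train = train_set_size // train_samples_arg
implied_vocab_val = val_set_size // val_samples_arg

# Assumption: the training loader drops the last partial batch (floor),
# while the validation loader keeps it (ceil).
train_batches = train_set_size // batch_size
val_batches = math.ceil(val_set_size / batch_size)

print(implied_vocab_train, implied_vocab_val)  # 126 126
print(train_batches, val_batches)              # 1968 40
```

Under these assumptions both runs are consistent with a vocabulary of 126 characters, and the batch counts match the logged 1968 and 40.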

Additional:
Prediction also works (only tested with a model that reaches ~15% exact match after 9 epochs of training on WordGenerator samples):

Word(value='฿฿_', confidence=0.048),
Word(value='4฿^฿', confidence=0.038),
Word(value='|', confidence=0.05),
Word(value='ërwW', confidence=0.031),
Word(value='x¢฿MMo', confidence=0.018),
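
For readers unfamiliar with the "Exact" and "Partial" figures in the logs, here is a minimal sketch of one way such rates could be computed. It assumes that "Exact" means the predicted string equals the target, and that "Partial" means they are equal after lowercasing and accent stripping. The logs do not define these metrics, so this is an illustration and not the scoring code used by the training scripts; `normalize` and `match_rates` are hypothetical helpers.

```python
import unicodedata


def normalize(text: str) -> str:
    """Lowercase and strip accents (a simple stand-in for ASCII transliteration)."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def match_rates(targets, predictions):
    """Return (exact, partial) match rates as fractions in [0, 1]."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    if not targets:
        raise ValueError("at least one sample is required")
    n = len(targets)
    # Exact: strings must be identical.
    exact = sum(t == p for t, p in zip(targets, predictions))
    # Partial (assumed definition): identical after case and accent normalization.
    partial = sum(normalize(t) == normalize(p) for t, p in zip(targets, predictions))
    return exact / n, partial / n


# Made-up words, not taken from the run above.
exact_rate, partial_rate = match_rates(["Été", "word", "abc"], ["ete", "word", "abd"])
print(f"Exact: {exact_rate:.2%} | Partial: {partial_rate:.2%}")
```

With these made-up words, one of three predictions matches exactly and two match after normalization, which is why a "Partial" rate is always at least as high as the "Exact" rate.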