Feature Refinement to Improve High Resolution Image Inpainting by ankuPRK · Pull Request #112 · advimman/lama
We are a team of researchers at Geomagical Labs (geomagical.com), a subsidiary of IKEA. We work on pioneering Mixed Reality apps which allow customers to scan photorealistic models of their indoor spaces and re-imagine them with virtual furniture.
In this PR we propose an additional refinement step for LaMa to improve high-resolution inpainting results. We observed that when inpainting large regions at high resolution, LaMa struggles with structure completion. However, at low resolutions, LaMa can infill the same missing region much better. To address this, we added a refinement step that uses the structure from low-resolution predictions to guide higher-resolution predictions.
Our approach can work on any inpainting network, and does not require any additional training or network modification.
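The guidance idea can be illustrated with a toy sketch. This is an assumption-laden simplification, not the actual implementation: the real refinement optimizes intermediate feature maps of the inpainting network with Adam, whereas this sketch optimizes the masked pixels directly with plain gradient descent so that the downscaled high-resolution fill agrees with the low-resolution prediction:

```python
import numpy as np

def downscale(img, f):
    # Box downscale: average each f x f block (assumes divisibility).
    h, w = img.shape
    return img.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def upscale(img, f):
    # Nearest-neighbour upscale by repeating each pixel f x f times.
    return np.kron(img, np.ones((f, f)))

def refine_fill(high_fill, low_pred, mask, f=2, steps=100, lr=1.0):
    """Toy analogue of low-res-guided refinement: adjust the masked
    high-res pixels so their downscale matches the low-res prediction.

    Loss is the squared error between downscale(x) and low_pred; the
    gradient w.r.t. x is the upscaled residual divided by f*f (each
    low-res pixel is the mean of f*f high-res pixels; constant factors
    are folded into lr). Pixels outside the mask are left untouched.
    """
    x = high_fill.astype(float).copy()
    for _ in range(steps):
        diff = downscale(x, f) - low_pred      # residual at low resolution
        grad = upscale(diff, f) / (f * f)      # gradient of the squared-error loss
        x = x - lr * grad * mask               # update only the hole region
    return x
```

With `lr=1` and `f=2` the low-resolution residual shrinks by a factor of 0.75 per step, so a few dozen iterations suffice in this toy setting. In the actual PR, the same principle is applied to network features across a coarse-to-fine pyramid rather than to pixels.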
How to run refinement
To run refinement, simply pass `refine=True` in the evaluation step:

```
python3 bin/predict.py refine=True model.path=$(pwd)/big-lama indir=$(pwd)/LaMa_test_images outdir=$(pwd)/output
```
Evaluation
Here are a few example comparisons, with each triplet showing the masked image, inpainting with LaMa, and inpainting with LaMa using refinement:
Comparison of unrefined and refined images on all test images (kindly shared by you) is available here: https://drive.google.com/drive/folders/15LEa9k_7-dUKb2CPUDuw7e6Zk28KCtzz?usp=sharing
We also performed numerical evaluation on 1024×1024 images sampled from [1], using the thin, medium, and thick masks. The results indicate that LaMa with refinement outperforms recent inpainting baselines on high-resolution inpainting across nearly all metrics:
| Method | FID (thin) | LPIPS (thin) | FID (medium) | LPIPS (medium) | FID (thick) | LPIPS (thick) |
|---|---|---|---|---|---|---|
| AOTGAN [3] | 17.387 | 0.133 | 34.667 | 0.144 | 54.015 | 0.184 |
| LatentDiffusion [4] | 18.505 | 0.141 | 31.445 | 0.149 | 38.743 | 0.172 |
| MAT [6] | 16.284 | 0.137 | 27.829 | 0.135 | 38.120 | 0.157 |
| ZITS [5] | 15.696 | 0.125 | 23.500 | 0.121 | 31.777 | 0.140 |
| LaMa-Fourier [2] | 14.780 | 0.124 | 22.584 | 0.120 | 29.351 | 0.140 |
| Big-LaMa [2] | 13.143 | 0.114 | 21.169 | 0.116 | 29.022 | 0.140 |
| Big-LaMa+refinement (ours) | 13.193 | 0.112 | 19.864 | 0.115 | 26.401 | 0.135 |

Table 1. Performance comparison of recent inpainting approaches on 1k images of size 1024×1024
Video
We have also created a video explaining the technical details of our approach:
https://www.youtube.com/watch?v=gEukhOheWgE
References
[1] Unsplash Dataset. https://unsplash.com/data, 2020.
[2] Suvorov, R., Logacheva, E., Mashikhin, A., Remizova, A., Ashukha, A., Silvestrov, A., Kong, N., Goka, H., Park, K. and Lempitsky, V., 2022. Resolution-robust Large Mask Inpainting with Fourier Convolutions. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 2149-2159).
[3] Zeng, Y., Fu, J., Chao, H. and Guo, B., 2022. Aggregated Contextual Transformations for High-Resolution Image Inpainting. IEEE Transactions on Visualization and Computer Graphics.
[4] Rombach, R., Blattmann, A., Lorenz, D., Esser, P. and Ommer, B., 2021. High-Resolution Image Synthesis with Latent Diffusion Models. arXiv preprint arXiv:2112.10752.
[5] Dong, Q., Cao, C. and Fu, Y., 2022. Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding. arXiv preprint arXiv:2203.00867.
[6] Li, W., Lin, Z., Zhou, K., Qi, L., Wang, Y. and Jia, J., 2022. MAT: Mask-Aware Transformer for Large Hole Image Inpainting. arXiv preprint arXiv:2203.15270.