Feature Refinement to Improve High Resolution Image Inpainting by ankuPRK · Pull Request #112 · advimman/lama

We are a team of researchers at Geomagical Labs (geomagical.com), a subsidiary of IKEA. We work on pioneering Mixed Reality apps which allow customers to scan photorealistic models of their indoor spaces and re-imagine them with virtual furniture.

In this PR we propose an additional refinement step for LaMa to improve high-resolution inpainting results. We observed that when inpainting large regions at high resolution, LaMa struggles with structure completion; at low resolutions, however, it can infill the same missing region much better. To address this, we add a refinement step that uses the structure from low-resolution predictions to guide higher-resolution predictions.

Our approach can work on any inpainting network, and does not require any additional training or network modification.
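The core idea is a coarse-to-fine optimization: the image is inpainted at a low resolution first, and at each finer scale the intermediate features of the (frozen) network are optimized so that the downscaled prediction matches the coarser result inside the hole. Below is a minimal sketch of one such refinement scale in PyTorch; front and rear are hypothetical stand-ins for the two halves of the inpainting network, and the hyperparameter values are illustrative rather than the exact settings used in this PR.

    import torch
    import torch.nn.functional as F

    def refine_one_scale(front, rear, image, mask, coarse_pred, n_iters=15, lr=0.002):
        # image: (B, 3, H, W) in [0, 1]; mask: (B, 1, H, W), 1 inside the hole
        # coarse_pred: inpainting result from the previous (coarser) scale
        # front / rear: frozen halves of the inpainting network (hypothetical split)
        masked_input = torch.cat([image * (1 - mask), mask], dim=1)
        with torch.no_grad():
            feats = front(masked_input)                # initial feature map at this scale
        feats = feats.detach().requires_grad_(True)    # only the features are optimized
        opt = torch.optim.Adam([feats], lr=lr)

        for _ in range(n_iters):
            opt.zero_grad()
            pred = rear(feats)                         # current-scale inpainting
            # compare against the coarse prediction at the coarse resolution
            pred_small = F.interpolate(pred, size=coarse_pred.shape[-2:],
                                       mode="bilinear", align_corners=False)
            mask_small = F.interpolate(mask, size=coarse_pred.shape[-2:], mode="nearest")
            loss = F.l1_loss(pred_small * mask_small, coarse_pred * mask_small)
            loss.backward()
            opt.step()

        with torch.no_grad():
            final = rear(feats)
        # keep known pixels from the input, refined content inside the hole
        return image * (1 - mask) + final * mask

In the full procedure this step is repeated from the coarsest scale up to the original resolution, with each scale's output serving as coarse_pred for the next.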

How to run refinement

To run refinement, simply pass refine=True in the evaluation step as:

    python3 bin/predict.py refine=True model.path=$(pwd)/big-lama indir=$(pwd)/LaMa_test_images outdir=$(pwd)/output
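Refinement hyperparameters (number of optimization iterations, learning rate, number of scales, etc.) can be overridden with the same Hydra dotted syntax. The key names below are illustrative assumptions; please check configs/prediction/default.yaml in your checkout for the actual names:

    # hypothetical overrides -- verify the key names in configs/prediction/default.yaml
    python3 bin/predict.py refine=True refiner.n_iters=15 refiner.lr=0.002 \
        model.path=$(pwd)/big-lama indir=$(pwd)/LaMa_test_images outdir=$(pwd)/output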

Evaluation

Here are a few example comparisons; each triplet shows the masked image, inpainting with LaMa, and inpainting with LaMa using refinement:
[Image: comparison triplets of masked input, LaMa output, and LaMa + refinement output]

Comparison of unrefined and refined images on all test images (kindly shared by you) is available here: https://drive.google.com/drive/folders/15LEa9k_7-dUKb2CPUDuw7e6Zk28KCtzz?usp=sharing

We also performed a numerical evaluation on 1024x1024 images sampled from [1], using the thin, medium, and thick masks. The results indicate that LaMa with refinement outperforms all the recent inpainting baselines we compared against on high-resolution inpainting:

| Method | FID (thin) | LPIPS (thin) | FID (medium) | LPIPS (medium) | FID (thick) | LPIPS (thick) |
|---|---|---|---|---|---|---|
| AOTGAN [3] | 17.387 | 0.133 | 34.667 | 0.144 | 54.015 | 0.184 |
| LatentDiffusion [4] | 18.505 | 0.141 | 31.445 | 0.149 | 38.743 | 0.172 |
| MAT [6] | 16.284 | 0.137 | 27.829 | 0.135 | 38.120 | 0.157 |
| ZITS [5] | 15.696 | 0.125 | 23.500 | 0.121 | 31.777 | 0.140 |
| LaMa-Fourier [2] | 14.780 | 0.124 | 22.584 | 0.120 | 29.351 | 0.140 |
| Big-LaMa [2] | 13.143 | 0.114 | 21.169 | 0.116 | 29.022 | 0.140 |
| Big-LaMa+refinement (ours) | 13.193 | 0.112 | 19.864 | 0.115 | 26.401 | 0.135 |

Table 1. Performance comparison of recent inpainting approaches on 1,000 images of size 1024x1024 (lower is better for both metrics).
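For reference, the sketch below shows one way to compute FID and LPIPS with off-the-shelf packages (torchmetrics and lpips). It is illustrative only and is not necessarily the exact evaluation pipeline used to produce the numbers above:

    import torch
    import lpips
    from torchmetrics.image.fid import FrechetInceptionDistance

    def compute_fid(real_images, fake_images):
        # real_images / fake_images: uint8 tensors of shape (N, 3, H, W) in [0, 255]
        fid = FrechetInceptionDistance(feature=2048)
        fid.update(real_images, real=True)
        fid.update(fake_images, real=False)
        return fid.compute().item()

    def compute_lpips(real_images, fake_images):
        # LPIPS expects float tensors scaled to [-1, 1]
        loss_fn = lpips.LPIPS(net='alex')
        real = real_images.float() / 127.5 - 1.0
        fake = fake_images.float() / 127.5 - 1.0
        with torch.no_grad():
            dists = loss_fn(real, fake)    # one distance per image pair
        return dists.mean().item()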

Video

We have also created a video to explain the technical details of our approach:
https://www.youtube.com/watch?v=gEukhOheWgE

References

[1] Unsplash Dataset. https://unsplash.com/data, 2020.

[2] Suvorov, R., Logacheva, E., Mashikhin, A., Remizova, A., Ashukha, A., Silvestrov, A., Kong, N., Goka, H., Park, K. and Lempitsky, V., 2022. Resolution-robust Large Mask Inpainting with Fourier Convolutions. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 2149-2159).

[3] Zeng, Y., Fu, J., Chao, H. and Guo, B., 2022. Aggregated Contextual Transformations for High-Resolution Image Inpainting. IEEE Transactions on Visualization and Computer Graphics.

[4] Rombach, R., Blattmann, A., Lorenz, D., Esser, P. and Ommer, B., 2021. High-Resolution Image Synthesis with Latent Diffusion Models. arXiv preprint arXiv:2112.10752.

[5] Dong, Q., Cao, C. and Fu, Y., 2022. Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding. arXiv preprint arXiv:2203.00867.

[6] Li, W., Lin, Z., Zhou, K., Qi, L., Wang, Y. and Jia, J., 2022. MAT: Mask-Aware Transformer for Large Hole Image Inpainting. arXiv preprint arXiv:2203.15270.