How to Make Lanczos-Montgomery Fast on Modern Supercomputers? (original) (raw)

Abstract

This paper deals with the performance analysis of the INM RAS implementation of the block Lanczos-Montgomery method that was used to accomplish the factorization of the RSA-232 number. Other steps of RSA-232 factorization carried out by CADO-NFS open source library are also mentioned. Further adaptation of parallel block Lanczos-Montgomery method for the modern supercomputers is discussed, including implementation of basic operations of the method on GPUs.

RSA-232 factorization was supported by the Moscow Center of Fundamental and Applied Mathematics at INM RAS (Agreement with the Ministry of Education and Science of the Russian Federation No. 075-15-2022-286), block Lanczos performance analysis and its improvements research was supported by the Russian Science Foundation project № 19-11-00338, https://rscf.ru/en/project/19-11-00338/.

References

Rivest, R.L., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM 21(2), 120–126 (1978). Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/359340.359342
Montgomery, P.L.: A block Lanczos algorithm for finding dependencies over GF(2). In: Guillou, L.C., Quisquater, J.-J. (eds.) EUROCRYPT 1995. LNCS, vol. 921, pp. 106–120. Springer, Heidelberg (1995). https://doi.org/10.1007/3-540-49264-X_9
Chapter Google Scholar
Zamarashkin, N., Zheltkov, D.: Block Lanczos–montgomery method with reduced data exchanges. In: Voevodin, V., Sobolev, S. (eds.) RuSCDays 2016. CCIS, vol. 687, pp. 15–26. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-55669-7_2
Chapter Google Scholar
Lenstra, A.K., Lenstra, H.W., Manasse, M.S., Pollard, J.M.: The number field sieve. In: Lenstra, A.K., Lenstra, H.W. (eds.) The development of the number field sieve. LNM, vol. 1554, pp. 11–42. Springer, Heidelberg (1993). https://doi.org/10.1007/BFb0091537
Chapter Google Scholar
CADO-NFS Homepage. https://gitlab.inria.fr/cado-nfs/cado-nfs. Accessed 30 Apr 2023
Lanczos, C.: Solution of systems of linear equations by minimized iterations. J. Res. Natl. Bur. Stand. 49, 33–53 (1952). https://doi.org/10.6028/jres.049.006
Article MathSciNet Google Scholar
Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 49(6), 409–436 (1952). https://doi.org/10.6028/jres.049.044
Article MathSciNet Google Scholar
Saad, Y.: Iterative methods for sparse linear systems. Soc. Ind. Appl. Math. (2003). https://doi.org/10.1137/1.9780898718003
Article Google Scholar
Kleinjung, T., et al.: Factorization of a 768-Bit RSA modulus. In: Rabin, T. (ed.) CRYPTO 2010. LNCS, vol. 6223, pp. 333–350. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14623-7_18
Chapter Google Scholar
Boudot, F., Gaudry, P., Guillevic, A., Heninger, N., Thomé, E., Zimmermann, P.: Comparing the difficulty of factorization and discrete logarithm: a 240-digit experiment. In: Micciancio, D., Ristenpart, T. (eds.) CRYPTO 2020. LNCS, vol. 12171, pp. 62–91. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-56880-1_3
Chapter Google Scholar
Arlazarov, V.L., Dinitz, Y.A., Kronrod, M.A., Faradzhev, I.A.: On economical construction of the transitive closure of an oriented graph. Dokl. Akad. Nauk SSSR 194(3), 487–488 (1970)
MathSciNet Google Scholar
Coppersmith, D.: Solving homogeneous linear equations over via block Wiedemann algorithm. Math. Comp. 62, 333–350 (1994). https://doi.org/10.1090/S0025-5718-1994-1192970-7
Article MathSciNet Google Scholar
Albrecht M., Bard G., Hart W.: Algorithm 898: Efficient multiplication of dense matrices over GF(2). ACM Trans. Math. Softw. 37(1), Article 9, pp. 1–14 (2010). https://doi.org/10.1145/1644001.1644010
Zamarashkin, N.L., Zheltkov, D.A.: GPU acceleration of dense matrix and block operations for Lanczos method for systems over GF(2). Lobachevskii J. Math. 40, 1881–1991 (2019). https://doi.org/10.1134/S1995080219110337
Article MathSciNet Google Scholar
Thomé, E.: A modified block Lanczos algorithm with fewer vectors. arXiv preprint 1604.02277 (2016). https://doi.org/10.48550/arXiv.1604.02277
Ghysels, P., Vanroose, W.: Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm. Parallel Comput. 40(7), 224–238 (2014). https://doi.org/10.1016/j.parco.2013.06.001
Article MathSciNet Google Scholar

Download references

Author information

Authors and Affiliations

Marchuk Institute of Numerical Mathematics of the Russian Academy of Sciences, Moscow, Russia
Dmitry Zheltkov, Nikolai Zamarashkin & Sergey Matveev
Lomonosov Moscow State University, Moscow, Russia
Nikolai Zamarashkin & Sergey Matveev

Authors

Dmitry Zheltkov
Nikolai Zamarashkin
Sergey Matveev

Corresponding author

Correspondence toDmitry Zheltkov .

Editor information

Editors and Affiliations

Research Computing Center (RCC), Moscow State University, Moscow, Russia
Vladimir Voevodin
Research Computing Center (RCC), Moscow State University, Moscow, Russia
Sergey Sobolev
Russian Academy of Sciences (RAS), Keldysh Institute of Applied Mathematics, Moscow, Russia
Mikhail Yakobovskiy
Russian Federal Nuclear Center, Sarov, Russia
Rashit Shagaliev

Rights and permissions

Copyright information

About this paper

Cite this paper

Zheltkov, D., Zamarashkin, N., Matveev, S. (2023). How to Make Lanczos-Montgomery Fast on Modern Supercomputers?. In: Voevodin, V., Sobolev, S., Yakobovskiy, M., Shagaliev, R. (eds) Supercomputing. RuSCDays 2023. Lecture Notes in Computer Science, vol 14388. Springer, Cham. https://doi.org/10.1007/978-3-031-49432-1\_9

Download citation

.RIS
.ENW
.BIB
DOI: https://doi.org/10.1007/978-3-031-49432-1\_9
Published: 05 January 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49431-4
Online ISBN: 978-3-031-49432-1
eBook Packages: Computer Science Computer Science (R0)Springer Nature Proceedings Computer Science