How to Make Lanczos-Montgomery Fast on Modern Supercomputers? (original) (raw)

Abstract

This paper deals with the performance analysis of the INM RAS implementation of the block Lanczos-Montgomery method that was used to accomplish the factorization of the RSA-232 number. Other steps of RSA-232 factorization carried out by CADO-NFS open source library are also mentioned. Further adaptation of parallel block Lanczos-Montgomery method for the modern supercomputers is discussed, including implementation of basic operations of the method on GPUs.

RSA-232 factorization was supported by the Moscow Center of Fundamental and Applied Mathematics at INM RAS (Agreement with the Ministry of Education and Science of the Russian Federation No. 075-15-2022-286), block Lanczos performance analysis and its improvements research was supported by the Russian Science Foundation project № 19-11-00338, https://rscf.ru/en/project/19-11-00338/.

Similar content being viewed by others

References

  1. Rivest, R.L., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM 21(2), 120–126 (1978). Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/359340.359342
  2. Montgomery, P.L.: A block Lanczos algorithm for finding dependencies over GF(2). In: Guillou, L.C., Quisquater, J.-J. (eds.) EUROCRYPT 1995. LNCS, vol. 921, pp. 106–120. Springer, Heidelberg (1995). https://doi.org/10.1007/3-540-49264-X_9
    Chapter Google Scholar
  3. Zamarashkin, N., Zheltkov, D.: Block Lanczos–montgomery method with reduced data exchanges. In: Voevodin, V., Sobolev, S. (eds.) RuSCDays 2016. CCIS, vol. 687, pp. 15–26. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-55669-7_2
    Chapter Google Scholar
  4. Lenstra, A.K., Lenstra, H.W., Manasse, M.S., Pollard, J.M.: The number field sieve. In: Lenstra, A.K., Lenstra, H.W. (eds.) The development of the number field sieve. LNM, vol. 1554, pp. 11–42. Springer, Heidelberg (1993). https://doi.org/10.1007/BFb0091537
    Chapter Google Scholar
  5. CADO-NFS Homepage. https://gitlab.inria.fr/cado-nfs/cado-nfs. Accessed 30 Apr 2023
  6. Lanczos, C.: Solution of systems of linear equations by minimized iterations. J. Res. Natl. Bur. Stand. 49, 33–53 (1952). https://doi.org/10.6028/jres.049.006
    Article MathSciNet Google Scholar
  7. Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 49(6), 409–436 (1952). https://doi.org/10.6028/jres.049.044
    Article MathSciNet Google Scholar
  8. Saad, Y.: Iterative methods for sparse linear systems. Soc. Ind. Appl. Math. (2003). https://doi.org/10.1137/1.9780898718003
    Article Google Scholar
  9. Kleinjung, T., et al.: Factorization of a 768-Bit RSA modulus. In: Rabin, T. (ed.) CRYPTO 2010. LNCS, vol. 6223, pp. 333–350. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14623-7_18
    Chapter Google Scholar
  10. Boudot, F., Gaudry, P., Guillevic, A., Heninger, N., Thomé, E., Zimmermann, P.: Comparing the difficulty of factorization and discrete logarithm: a 240-digit experiment. In: Micciancio, D., Ristenpart, T. (eds.) CRYPTO 2020. LNCS, vol. 12171, pp. 62–91. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-56880-1_3
    Chapter Google Scholar
  11. Arlazarov, V.L., Dinitz, Y.A., Kronrod, M.A., Faradzhev, I.A.: On economical construction of the transitive closure of an oriented graph. Dokl. Akad. Nauk SSSR 194(3), 487–488 (1970)
    MathSciNet Google Scholar
  12. Coppersmith, D.: Solving homogeneous linear equations over via block Wiedemann algorithm. Math. Comp. 62, 333–350 (1994). https://doi.org/10.1090/S0025-5718-1994-1192970-7
    Article MathSciNet Google Scholar
  13. Albrecht M., Bard G., Hart W.: Algorithm 898: Efficient multiplication of dense matrices over GF(2). ACM Trans. Math. Softw. 37(1), Article 9, pp. 1–14 (2010). https://doi.org/10.1145/1644001.1644010
  14. Zamarashkin, N.L., Zheltkov, D.A.: GPU acceleration of dense matrix and block operations for Lanczos method for systems over GF(2). Lobachevskii J. Math. 40, 1881–1991 (2019). https://doi.org/10.1134/S1995080219110337
    Article MathSciNet Google Scholar
  15. Thomé, E.: A modified block Lanczos algorithm with fewer vectors. arXiv preprint 1604.02277 (2016). https://doi.org/10.48550/arXiv.1604.02277
  16. Ghysels, P., Vanroose, W.: Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm. Parallel Comput. 40(7), 224–238 (2014). https://doi.org/10.1016/j.parco.2013.06.001
    Article MathSciNet Google Scholar

Download references

Author information

Authors and Affiliations

  1. Marchuk Institute of Numerical Mathematics of the Russian Academy of Sciences, Moscow, Russia
    Dmitry Zheltkov, Nikolai Zamarashkin & Sergey Matveev
  2. Lomonosov Moscow State University, Moscow, Russia
    Nikolai Zamarashkin & Sergey Matveev

Authors

  1. Dmitry Zheltkov
  2. Nikolai Zamarashkin
  3. Sergey Matveev

Corresponding author

Correspondence toDmitry Zheltkov .

Editor information

Editors and Affiliations

  1. Research Computing Center (RCC), Moscow State University, Moscow, Russia
    Vladimir Voevodin
  2. Research Computing Center (RCC), Moscow State University, Moscow, Russia
    Sergey Sobolev
  3. Russian Academy of Sciences (RAS), Keldysh Institute of Applied Mathematics, Moscow, Russia
    Mikhail Yakobovskiy
  4. Russian Federal Nuclear Center, Sarov, Russia
    Rashit Shagaliev

Rights and permissions

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zheltkov, D., Zamarashkin, N., Matveev, S. (2023). How to Make Lanczos-Montgomery Fast on Modern Supercomputers?. In: Voevodin, V., Sobolev, S., Yakobovskiy, M., Shagaliev, R. (eds) Supercomputing. RuSCDays 2023. Lecture Notes in Computer Science, vol 14388. Springer, Cham. https://doi.org/10.1007/978-3-031-49432-1\_9

Download citation

Keywords

Publish with us