Add LSRN solver (original) (raw)

April 27, 2025, 10:29am 1

Hi everyone!

Over the past year, I’ve been heavily using LSRN, a randomized iterative linear least squares solver. I found it extremely valuable, especially for certain numerical problems that aren’t very well addressed by the existing iterative solvers.

In particular, the current options (e.g., lsqr, lsmr) tend to converge very slowly when the matrix is ill-conditioned. LSRN uses randomized preconditioning to automatically improve the system’s conditioning, and is well-suited for large, strongly over- or under-determined systems.

I was wondering: Would SciPy be interested in adding such a solver? I feel like SciPy could really benefit from incorporating more modern randomized methods (e.g., randomized SVD too), which have proven quite effective for certain linear algebra problems.

If there’s interest, I’d be happy to contribute.

ilayn April 27, 2025, 11:20am 2

Hi @agNikitaras

The rise of randomized linalg is now at a point now that cannot be overlooked. Historically speaking, we have introduced Tygert’s id_dist library in scipy.linalg.interpolative (a bit prematurely if I am honest) back in the day which caused us way too many Fortran77 issues and only very recently we re-mapped all of it back to Cython. While written by a top domain expert, the code had many problems that we don’t need to get into here and was de facto in an unmaintained state. I think that’s the code the acknowledgement section mentions because the ortho projection is something I remember translating during the Cython mapping.

Nevertheless, we currently have CUR/interpolative decomposition and randomized SVD though I think scikit-learn randomized SVD is superior. The ideas are similar but implementation is less convoluted and uses contemporary NumPy random API (we mapped most of it to NumPy PRNGs with Cython code but still). We also have a singleton sketch function clarkson_woodruff_transform sticking out by itself.

The API we currently have is very haphazardly put together and not well-maintained. I am not sure if this is the reason why it is not widely used. Not sure if you already knew about any of these, and if you didn’t that’s also another data point for us to ponder. Hence, I think we need to evaluate the current state a bit more critically before we add more into randomized linalg suite.

A separate library is also an option but often lacks the adoption that SciPy enjoys unless it is offering a significant and comprehensive extra functionality. That discourages many interested folks to contribute. But in turn new code puts a lot of maintenance burden on current SciPy maintainers.

Otherwise you are absolutely right that, a more comprehensive randomized linalg functionality is often sought after within SciPy.

Hi @ilayn,

Thanks a lot for your detailed answer and for providing all the historical context. I have to admit I wasn’t aware of many of these issues. My thinking was a bit more naive: I couldn’t find a suitable solver for my use case, so I ended up writing or adapting older code myself, and thought it might be nice to offer something back to the community.

I’ll keep an eye on SciPy’s progress in this area and would be happy to contribute when the opportunity arises.

ilayn April 27, 2025, 8:53pm 4

Thank you, such efforts are highly appreciated.

When we start deprecating the existing ones, I’ll make sure that we ping you once again.

Hi, @ilayn and @agNikitaras. I work professionally in the field of numerical linear algebra, and primarily on randomized linear algebra. I’ve written a monograph on the subject, developed C++ code in the form of RandBLAS and RandLAPACK, and developed Python code in PARLA.

Please keep me updated if the core SciPy developers want to make progress on bringing RandNLA into scipy. I don’t have a lot of bandwidth for this kind of work right now, but I’m extremely interested in its success.

smarie June 30, 2025, 1:49pm 6

ogrisel July 1, 2025, 9:16am 7

+1 for having randomized SVD / eigh solvers implemented upstream in SciPy. The scikit-learn implementation can serve as a building block to build upon, and it supports array API (when passing power_iteration_normalizer="QR") which is very nice to get benefit from GPU support (for instance via PyTorch).

I am not familiar with LSRN, but from a quick glance at the paper, it looks like it could also be implemented via the array API without the need for low level Cython.