Issue 34227: Weighted random.sample() (weighted sampling without replacement) (original) (raw)

Function random.choices(), which appeared in Python 3.6, allows to perform weighted random sampling with replacement. Function random.sample() performs random sampling without replacement, but cannot do it weighted.

I propose to enhance random.sample() to perform weighted sampling. That way all four possibilities will be supported:

Rationale:

Weighted sampling without replacement is a popular problem. There are lot of questions on StackOverflow and similar sites how to implement it. Unfortunately, many proposed solutions are wrong, for example:

https://stackoverflow.com/a/353510/2178047 https://softwareengineering.stackexchange.com/a/233552/161807

or have excessive computational complexity (e.g. quadratic). There are lot of suggestions to use numpy.random.choice() to do that, which supports all four possibilities with a single function:

numpy.random.choice(a, size=None, replace=True, p=None)

But of course this is an overkill to install numpy just to do that.

I think that this should be possible with stdlib, without the need to implement it by yourself or to install numpy. Especially, that it can be implemented in 2 lines (plus 4 lines of error checking), as you can see in the PR.

Thank for the suggestion and patch, but I will decline. To me, the reseviour style algorithm doesn't fit in well (with the slow sorting step and the frequent calls to the random in the key function). For large population sizes (which sample() was designed for), the API and algorithm perform badly. For typical student "urn problems", the better approach is to pre-weight the population by the exact counts specified in the problem. I recommend that you post a recipe for what you want to do, but I don't think the standard library is the place for it.