An average function builtin?
Ahmed,
I sympathize with many wishes to include all kinds of commonly used functionality in as convenient a place as possible.
But Python is growing and changing, and some think way too much or too fast, and has become bloated in an amazing number of ways.
If we add your version of the “mean” under any name, then someone will want the harmonic mean, and someone will want a function to do standard deviation or kurtosis or a trimmed mean where you specify what kind of outliers to remove and it can be endless.
It can make more sense to gather lots of statistical functions into one place under some module named stats or throw them into the math module and so on. You often can end up finding multiple versions such as in numpy too. It then does become necessary for the user to either use dot notation (module.function) to specify which one they want or do an import and perhaps rename it to whatever they want.
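As a small illustration of picking one version by qualifying the name with its module, or importing it under a name of your choosing, here is a sketch using the statistics module from the standard library. The sample data and the short names stats and hmean are just choices made for this example.

```python
import statistics                                # full module name
import statistics as stats                       # same module, shorter local name
from statistics import harmonic_mean as hmean    # one function, renamed on import

data = [2, 4, 4, 8]   # made-up sample values

# The module prefix makes it unambiguous which "mean" is being called.
arithmetic = statistics.mean(data)
spread = stats.stdev(data)        # sample standard deviation
harmonic = hmean(data)

print(arithmetic, spread, harmonic)
```

If numpy is also in use, numpy.mean would sit alongside these without a clash, since each call names the module it comes from.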
As for speed, others have commented on this and I find it unsettling. It is true that Python is one of many interpreted languages where a purist using mainly functions built on calling other functions, all written in the higher language, be it Python or R, will find it to be significantly less efficient. I have seen people write shell scripts that read in data line by line and call a pipeline with a dozen separate programs to process each line. That can take a long time, while replacing it with mainly a single program, like an AWK script that runs once as a single process, shortened the run from hours to seconds. But that too was largely interpreted, and obviously a decently written program using SQL on a database, or a highly efficient compiled language like C++ or a more modern one, could result in it completing almost as soon as it starts.
If certain kinds of optimized speed considerations dominate, maybe interpreted languages are not ideal. But what we now have is a bit of a Frankenstein monster where more and more pieces are grafted in, not just from C or C++ but from libraries in Fortran or Rust. There is nothing necessarily wrong with that, albeit it may remove some flexibility.
I have often wanted to use some code, as in a function, as a model I can study and adapt. Often I find a few dozen lines written that I can look at, see how they did something and borrow it into something I create, or make my own modified copy where I added some features not in the original.
But, more and more often, I end up seeing it is “.internal” as in it just calls a compiled library function. It is now a black box. Yes, with some work I can find the code in some other language, and perhaps borrow parts to make my own version that I then have to figure out how to link in and so on. Most people will just give up!
This is not to say that you cannot make a compiled bit in C++ that supports six dozen scenarios just in case so perhaps I just need to read the manual page and perhaps find out how to ask nicely and get what I want. But with a goal for efficiency, you probably would find it easier to make the function compile into a small amount of memory and run fast.
As just an example of what I mean, here is an outline. I often have a function I call that I would like to do a bit more processing before returning a result, perhaps in the mean example, one that removes what I consider not available (NA) or things like Inf before doing things like calculating the mean. Or I may want it to first trim away outliers or do some kind of rounding. One way to do it is to add additional optional arguments that the function would ignore but have it passed along as a … to other functions it calls which do know what to do with it, perhaps ones I wrote.
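In Python the nearest thing to passing along a … is **kwargs. Here is a minimal sketch of that outline, assuming helper names (drop_non_finite, trim, flexible_mean) and a trim_fraction option that are invented for this example and not part of any proposal or library.

```python
import math
import statistics

def drop_non_finite(values, **_ignored):
    """Hypothetical helper: remove None, NaN and +/-Inf entries."""
    return [v for v in values if v is not None and math.isfinite(v)]

def trim(values, trim_fraction=0.0, **_ignored):
    """Hypothetical helper: drop that fraction of values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    return ordered[k:len(ordered) - k] if k else ordered

def flexible_mean(values, preprocess=(), **options):
    """Run each preprocessing step, forwarding options it does not inspect."""
    for step in preprocess:
        # flexible_mean ignores the options itself; each step takes what it knows.
        values = step(values, **options)
    return statistics.fmean(values)

cleaned = flexible_mean([1.0, 2.0, float("inf"), None, 3.0],
                        preprocess=[drop_non_finite])
trimmed = flexible_mean([1, 2, 3, 4, 100],
                        preprocess=[trim], trim_fraction=0.2)
print(cleaned, trimmed)
```

The point of the design is that the outer function never needs to know which options exist, so a user can add a new step with its own arguments without touching it.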
Your proposed function would not likely do the things I mention. That is not necessarily bad but if you also look at python as a sort of interactive teaching tool, …
But my personal view is that when python was created, it made decisions that later users did not appreciate. Lists are nice abstract structures that can do anything. But doing serious arithmetical operations on them is a pain. A pretty concept like a list of lists to represent a matrix, let alone deeper nested list structures to represent 3-D and 4-D matrices where nothing seriously checks data integrity, are not really ideal. Vectors of a sort, as in other languages, have advantages especially in speed. So do dataframe objects and more. Hence, to do serious computing of some kinds, some have left python entirely for languages designed differently, such as R, or have had to add modules like numpy and pandas that extend the language. But note that as useful as these are, and often heavily used also by other packages that do statistics or “AI” and so on, they are not in the python core.
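To make the data integrity point concrete, here is a small sketch. A list of lists will accept rows of different lengths and mixed types without complaint, so any check has to be written by hand. The function name shape_or_error is made up for this example.

```python
def shape_or_error(rows):
    """Return (n_rows, n_cols) for a list-of-lists matrix; raise if it is ragged."""
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"ragged matrix: row lengths {sorted(widths)}")
    return len(rows), (widths.pop() if widths else 0)

good = [[1, 2, 3], [4, 5, 6]]
bad = [[1, 2, 3], [4, "five"]]   # Python builds this nested list without any error

print(shape_or_error(good))
try:
    shape_or_error(bad)
except ValueError as exc:
    print("rejected:", exc)
```

Note that this only checks row lengths; the stray string in the second row would still need its own test.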
Would you argue your proposal will bring more bang for the buck than if something like that became standard?
I actually recently needed to locate functions to do the mean and sd and so on and a brief search told me what to include and use. Many languages are now designed in modular fashion and often the core is mainly a bootstrap for loading what is actually needed in your program.
Perhaps there can be an intermediate idea here. As an example, in R, too many programs used a growing set of packages that came to be called the tidyverse. Any one program might start early on by including one after another such package even if hardly anyone ever used them all in the same program. So, someone set up a package you could load, called tidyverse, which did little more than load a whole bunch of the commonly used packages as a bundle. Your program got a tad simpler and you no longer knew or cared which function was from which package but if you used other packages, you still needed to add them one at a time.
So, can python have a similar concept? Can you start with a relatively small and sparse core and then pick one or a few add-in clusters of modules? Right now, most versions of python simply ship such a batch whether you need it or not, even though a module is only loaded when you import it, and there is contention on who gets to be in whatever “core” python means. But could you easily just load “statistics_group” or “text_processing_group” so that a few lines made your own “core”? Then the discussion could change to lobbying for your favorite functionality to be included in that group.
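A rough sketch of how such a bundle might look in today's python, with the caveat that load_group and the "statistics_group" name are hypothetical and nothing like them exists in the standard library. It simply imports a list of modules and gathers their public names into one namespace, much as the tidyverse package does for its members.

```python
import importlib
import types

def load_group(name, module_names):
    """Hypothetical helper: bundle the public names of several modules into one."""
    group = types.ModuleType(name)
    for mod_name in module_names:
        mod = importlib.import_module(mod_name)
        # Prefer the module's declared public API; otherwise skip underscore names.
        public = getattr(mod, "__all__", None) or [
            n for n in dir(mod) if not n.startswith("_")
        ]
        for attr in public:
            # A later module silently wins if two modules share a name.
            setattr(group, attr, getattr(mod, attr))
    return group

stats_group = load_group("statistics_group", ["statistics", "math"])
print(stats_group.mean([1, 2, 3]), stats_group.sqrt(9.0))
```

As with the tidyverse, the price is that you no longer see which module a function came from, and name clashes are resolved by load order.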
Please note the above comments are not against adding or changing for better versions. Some things can and should be done to keep python competitive and usable by many people. We just cannot put everything imaginable in, let alone near the core.