[Tutor] Python vs. MATLAB
Steven D'Aprano steve at pearwood.info
Tue Dec 7 00:16:43 CET 2010
Joel Schwartz wrote:
> Chris,
> Can you say more about number (7) in your list? What does "pass by value" mean and what are the alternatives?
Oh boy, is that a can of worms... and this is going to be a long post. You might want to go make yourself a coffee first :)
Pass by whatever (also written as "call by ...") is one of those frustrating topics where people have added vast amounts of confusion where no confusion need exist.
Take a variable, "x", and give it a value, say, 42. When you pass that variable to a function, func(x), what happens?
Such a simple question, but you wouldn't believe how many angry words have been written about it.
The problem is that there are a whole lot of answers to that question, used by many different programming languages, but most people are only familiar with two: pass by value, and pass by reference. This is particularly strange since most popular modern languages, like Ruby, Python and Java, don't use either of those! Nevertheless, people have got it in their head that there are only two calling conventions, and so they hammer the square peg of the language's actual behaviour until it will fit one or the other of the round holes in their mind.
So there are huge flame wars about whether Python is pass by value or pass by reference, with some people wrongly claiming that Python is p-b-v for "simple" objects like numbers and strings and p-b-r for "complicated" objects like lists. This is nonsense.
But that pales before the craziness of the Java community, which claims that Java is pass by value so long as you understand that the values being passed are references and not the value of the variable. But don't make the mistake of thinking that makes Java pass by reference! Pass by value-which-is-actually-a-reference is completely different from pass by reference. Only the Java people don't call it that; they just call it pass by value, even though Java's behaviour is different from pass by value in common languages like C, Pascal and Visual Basic.
How is this helpful? Even if technically true, for some definition of "reference" and "value", it is gobbledygook. It's as helpful as claiming that every language ever written, without exception, is actually pass by flipping bits. No values are actually passed anywhere, it's all just flipping bits in memory.
100% true, and 100% useless.
Translated into Python terms, the Java argument is this:
Take a variable, call it "x". When you say x = 42, the value of x is not actually the number 42, like naive non-Java programmers might think, but some invisible reference to 42. Then when you call function(x), what gets passed to the function is not 42 itself (what Pascal or C programmers call "pass by value"), nor is it a reference to the variable "x" (which would be "pass by reference"), but the invisible reference to 42. Since this is the "true" value of x, Java is pass by value.
(The situation is made more complicated because Java actually does pass ints like 5 by value, in the C or Pascal sense. The above description should be understood as referring to "boxed" integers, rather than unboxed. If this means nothing to you, be glad. All you need know is that in Java terms, all Python integers are boxed.)
And note that in the Ruby community, they call the exact same behaviour "pass by reference". And then folks wonder why people get confused.
All this because people insist on the false dichotomy that there are only two argument passing conventions, pass by value and pass by reference. But as this Wikipedia page shows, there are actually many more than that:
http://en.wikipedia.org/wiki/Evaluation_strategy
Let's go back to my earlier question. You have a variable x = 42, and you pass it to a function. What happens?
In Pascal, or C, the compiler keeps a table mapping variable names to fixed memory addresses, like this:
    Variable   Address
    ========   =======
    x          10234
    y          10238
    z          10242
The command "x = 42" stuffs the value 42 into memory address 10234. Then, when you call func(x), the compiler looks up memory address 10234 and copies whatever it finds (in this case, 42) into another memory address (say, 27548), where func can see it. This is pass by value.
What this means is that the variable x inside the function is not the same as the variable x outside the function. Inside the function, x has the address 27548, and the command "x = x + 1" will store 43 there, leaving the outside x at 10234 unchanged. This is normally a good thing.
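To make the address table concrete, here is a toy Python model of what a C or Pascal compiler does on a call by value. It is only a sketch: the `memory` and `symbols` dicts and the helpers `call_by_value` and `increment` are hypothetical names, and a real compiler works with machine addresses, not dictionaries. The addresses are the ones from the table above.

```python
# Toy model of a C/Pascal-style compiler, for illustration only.
# All names here are hypothetical; the addresses come from the table above.
memory = {}                                      # address -> value
symbols = {"x": 10234, "y": 10238, "z": 10242}   # variable name -> fixed address


def call_by_value(caller_var, callee_addr, body):
    """Copy the caller's value into the callee's own slot, then run the body."""
    memory[callee_addr] = memory[symbols[caller_var]]  # this is the copy
    body(callee_addr)


def increment(addr):
    """The function body "x = x + 1", acting on whatever address it was given."""
    memory[addr] = memory[addr] + 1


memory[symbols["x"]] = 42             # x = 42
call_by_value("x", 27548, increment)  # func(x), with func's own x at 27548
print(memory[10234], memory[27548])   # 42 43
```

The outer x at 10234 still holds 42; only the copy at 27548 was incremented.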
The classic test of pass by value is, does the value get copied when you pass it to a function? We can test Python to see if it copies values:
    >>> def func(arg):
    ...     print(id(arg))
    ...
    >>> x = 42
    >>> print(id(x))
    135996112
    >>> func(x)
    135996112
The local variable arg and the global variable x have the same ID, which means they are the same object. This is conclusive proof that Python does not make a copy of x to pass to the function. So Python is not pass by value.
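The same test can be run as a script instead of at the interactive prompt. The actual id numbers vary from run to run and between Python implementations, so this sketch compares them rather than printing them. The million-element list is just an arbitrary example of a large object.

```python
def func(arg):
    # Report the identity of whatever object the parameter is bound to.
    return id(arg)


x = 42
print(func(x) == id(x))  # True: the same object inside and outside

# The same holds for a large object: nothing is copied on the way in.
big = list(range(1_000_000))
print(func(big) == id(big))  # True
```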
Pass by value is nice and fast for values like 42, which are ints and therefore small. But what if x is (say) an array of a million numbers? Then the compiler has to copy all one million numbers, which is expensive, and your function will be slow.
One alternative is pass by reference: instead of copying 42 to memory address 27548 (which is where func looks), the compiler can pass a reference to the variable x. That's as simple as passing 10234 instead. The compiler then treats that as the equivalent of "See here..." and follows that reference to get to the actual value wanted. Because addresses are small numbers, this is fast, but it means that the local variable and the global variable are, in fact, the same variable. This means that func can now operate on the variable x directly: if func uses call by reference, and func executes "x = x + 1", then the value 43 will be written into memory address 10234.
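Pass by reference can be sketched in Python with a toy model as well: a dict standing in for memory and a symbol table of fixed addresses, as in the C/Pascal table earlier. The names `memory`, `symbols`, `call_by_reference` and `increment` are hypothetical. The key difference from pass by value is that the callee receives the address 10234 itself, not a copy of what is stored there.

```python
# Toy model of pass by reference, for illustration only (hypothetical names).
memory = {}              # address -> value
symbols = {"x": 10234}   # variable name -> fixed address


def call_by_reference(caller_var, body):
    """Hand the callee the caller's address itself; the value is not copied."""
    body(symbols[caller_var])


def increment(addr):
    """The function body "x = x + 1", following the address it was given."""
    memory[addr] = memory[addr] + 1


memory[symbols["x"]] = 42          # x = 42
call_by_reference("x", increment)  # func(x) with pass by reference
print(memory[10234])               # 43: the caller's x was changed
```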
Pascal and Visual Basic (and Perl, I think) have compiler support for pass by reference. In C, you have to fake it by hand by passing a pointer to the value, and then doing your own re-direction. Except for arrays, which are handled differently, to the confusion of all.
The classic test of pass by reference is to write a "swap" function -- can you swap the value of two variables without returning them? In other words, something like this:
    a = 1
    b = 2
    swap(a, b)
    assert a == 2 and b == 1
In Python, you would swap two values like this:
    a, b = b, a
but we want to do it inside a function. Doing this would be cheating:
    a, b = swap(a, b)
because that explicitly re-assigns the variables a and b outside of the function. To be pass by reference, the swap must be done inside the function.
There's no way of writing a general purpose swap function like this in Python. You can write a limited version:
    def swap():
        global a, b
        a, b = b, a
but that doesn't meet the conditions of the test: swap must take the variables to swap as arguments, and not hard-coded into the function.
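Here is the obvious attempt at a general swap function, next to the "cheating" version that returns the values instead. The names `swap` and `swap_and_return` are just illustrative.

```python
def swap(x, y):
    # Rebinds the *local* names x and y only; the caller's names are untouched.
    x, y = y, x


def swap_and_return(x, y):
    # The "cheating" version: the caller has to do the re-assignment itself.
    return y, x


a = 1
b = 2
swap(a, b)
print(a, b)  # 1 2: the pass-by-reference test fails

a, b = swap_and_return(a, b)
print(a, b)  # 2 1, but only because the caller assigned the results
```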
Python is not pass by value, because it doesn't make a copy of the value before passing it to the function. And it's not pass by reference, because it doesn't pass a reference to the variable itself: assignment inside the function doesn't affect the outer variable, only the inner variable (except in the limited case that you use the global statement). So Python is neither pass by value nor pass by reference.
So what does Python actually do?
Well, to start with, Python doesn't have variables in the C or Pascal sense. There is no table of variable:address available to the compiler. Python's model is of name binding, not fixed memory addresses. So Python keeps a global dictionary of names and objects:
    {'x': <integer object 42>,
     'y': <string object 'hello world'>,
     'z': <list object [1,2,3]>,
    }
The general name for this is a "namespace".
(Aside: you can access the global namespace with the globals() function. Don't mess with it unless you know what you're doing.)
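A quick, read-only look at the global namespace for the three names in the example above, in keeping with the advice not to mess with it. This is a sketch meant to run at module level, where globals() is the module's namespace dict.

```python
x = 42
y = "hello world"
z = [1, 2, 3]

ns = globals()                    # the module's global namespace, a real dict
print(ns["x"], ns["y"], ns["z"])  # 42 hello world [1, 2, 3]
print(ns["z"] is z)               # True: the dict holds the object itself, not a copy
```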
Functions have access to the global namespace, but they also get their own local namespace. You can access it with the locals() function.
(Aside: as an optimization, CPython doesn't use a real dictionary for locals. Consequently, the dict returned by locals() is a copy, not the real thing, and you can't modify local variables by messing with locals(). Other Pythons may do differently.)
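A small check of that claim, assuming CPython (other implementations may behave differently, as noted above). Since CPython 3.13 (PEP 667), locals() inside a function is defined to return an independent snapshot, which gives the same result here.

```python
def func():
    a = 1
    namespace = locals()  # a dict describing the local variables
    namespace["a"] = 99   # writing to it does not rebind the real local
    return a, namespace["a"]


print(func())  # (1, 99)
```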
So when you have this function:
    def func(arg):
        # do stuff in here...
        pass
and then call func(x), Python sets up a new call and creates a local namespace containing:
    {'arg': <integer object 42>}
No copy is made -- it is very fast to add the object to the namespace, regardless of how big or small the object is. (Implementation note: CPython does it by using pointers. Other Pythons may use different strategies, although it's hard to think of one which would be better.) So the local arg and the global x share the same value: 42. But if you do an assignment inside the function, say:
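Peeking at the local namespace with locals() shows this directly. A sketch; the large list is just an arbitrary example of a big object.

```python
def func(arg):
    # Return the local namespace as it stands on entry to the function.
    return locals()


x = 42
big = list(range(1_000_000))  # binding a large object is just as cheap

print(func(x))                # {'arg': 42}
print(func(big)["arg"] is big)  # True: no copy was made, however big the list is
```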
    arg += 1
43 will be stored in the local namespace, leaving the global x untouched.
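The sketch below checks both halves of that statement: after the assignment, the local name arg is bound to a different object, and the global x is unchanged.

```python
def func(arg):
    before = id(arg)
    arg += 1  # rebinds the local name arg to a new int object, 43
    return arg, before == id(arg)


x = 42
print(func(x))  # (43, False): arg now names a different object
print(x)        # 42: the global x is untouched
```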
So far so good -- Python behaves like pass by value, when you assign to a local variable inside the function. But we've already seen it doesn't copy the value, so it isn't actually pass by value.
This is where it gets interesting, and leads to people mistakenly thinking that Python is sometimes pass by value and sometimes pass by reference. Suppose you call func(z) instead, where z is a list. This time the local namespace will be:
    {'arg': <list object [1,2,3]>}
Now, instead of assigning to the name arg, suppose we modify the list it refers to, like so:
    arg.append(1)
Naturally the list object [1,2,3] becomes [1,2,3,1]. And since the local arg and the global z refer to the same object, not copies, the change is visible through both names. Of course it is: there is only one list!
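In code, using the names from the example, where z is the global list passed to func:

```python
def func(arg):
    arg.append(1)  # mutates the one list object that both names refer to


z = [1, 2, 3]
func(z)
print(z)  # [1, 2, 3, 1]
```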
So, for immutable objects that can't be modified in place, Python behaves superficially like pass by value (except it doesn't copy values) and for mutable objects that can be modified in place, Python behaves superficially like pass by reference (except you can't assign to the global variable, only the local). So Python's behaviour combines some behaviour of both pass by value and pass by reference, while being implemented differently from both.
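Both halves of that summary can be seen side by side with a mutable object. The function names `rebind` and `mutate` are illustrative: assignment to a parameter only ever affects the local name, while in-place modification affects the single shared object.

```python
def rebind(arg):
    arg = [9, 9, 9]     # binds the local name to a new list; the caller never sees it


def mutate(arg):
    arg[:] = [9, 9, 9]  # replaces the contents of the shared list in place


z = [1, 2, 3]
rebind(z)
print(z)  # [1, 2, 3]
mutate(z)
print(z)  # [9, 9, 9]
```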
This strategy has been known as "pass by sharing" or "pass by object sharing" since 1974, when it was invented by Barbara Liskov. It is the same as what Ruby calls "pass by reference" and Java calls "pass by value", to the confusion of all. There is no need for this confusion except for the stubborn insistence that there are only two argument passing strategies.
-- Steven