The Cafes » Why Hate the for Loop?

There’s one example that comes up sooner or later every time someone starts talking about closures. This time it’s from Bruce Tate on developerWorks:

Listing 1. The simplest possible closure

3.times {puts "Inside the times method."}

Results:

Inside the times method.
Inside the times method.
Inside the times method.

times is a method on the 3 object. It executes the code in the closure three times. {puts "Inside the times method."} is the closure. It’s an anonymous function that’s passed into the times method and prints a static sentence. This code is tighter and simpler than the alternative with a for loop, shown in Listing 2:

Listing 2: Looping without closures

for i in 1..3
  puts "Inside the times method."
end

Personally I find the latter example simpler, clearer, and easier to understand. For one thing, when I see the word times suffixed to a number like 3 I expect multiplication, not action. But even if times were changed to a more reasonable method name such as do or act, I’d still prefer the for loop. (Perhaps what you really want here is “do 3 times”. That might really be clearer.)

I don’t know what it is some people have against for loops that they’re so eager to get rid of them. This isn’t the first or even the second time CS theorists have revolted against for loops (or equivalent). One advantage cited for RATFOR over traditional Fortran-77 was the ability to use index-less while loops instead of DO loops (Fortran’s equivalent of for). The Java Collections API brought us a confusing mass of iterators with weird modification behavior to avoid having to do something as simple and obvious as indexing our walks through a list. Then in Java 5 this wasn’t good enough. Some people were still ignoring iterators and stubbornly persisting in indexing their list traversals, so we got a whole new indexless for ( String arg : args ) syntax.
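For concreteness, here is a rough sketch of the three traversal styles just mentioned, plus one instance of the iterator modification behavior. The class and method names (Traversals, joinIndexed, and so on) are made up for illustration; only the java.util calls are real.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration class; not from any library.
class Traversals {

    // Style 1: the plain indexed walk.
    static String joinIndexed(List<String> list) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < list.size(); i++) {
            sb.append(list.get(i));
        }
        return sb.toString();
    }

    // Style 2: an explicit Collections API iterator.
    static String joinIterator(List<String> list) {
        StringBuilder sb = new StringBuilder();
        for (Iterator<String> it = list.iterator(); it.hasNext();) {
            sb.append(it.next());
        }
        return sb.toString();
    }

    // Style 3: the Java 5 indexless for loop (an iterator under the hood).
    static String joinForEach(List<String> list) {
        StringBuilder sb = new StringBuilder();
        for (String s : list) {
            sb.append(s);
        }
        return sb.toString();
    }

    // Removes the first element of an ArrayList while iterating over it.
    // Returns true if the iterator's fail-fast check throws.
    static boolean removalBreaksIteration() {
        List<String> list = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        try {
            for (String s : list) {
                if (s.equals("a")) {
                    list.remove(s); // structural change behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```

All three join methods produce the same string. The last method shows what goes wrong when a list is modified during an iterator-based walk: the next call to next() throws ConcurrentModificationException.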

What about a simple indexed loop offends people so much that they invent massive, complex syntax just to avoid it? Personally I find the indexed for loop syntax to be familiar and comforting. That’s why I deliberately designed XOM to support indexed access to the various components of the document tree. No iterators. No fancy loops. Just plain, old

for (int i = 0; i < element.getChildCount() ; i++) {
  Node child = element.getChild(i);
  //...
}

The times syntax does avoid the explicit declaration and creation of a loop variable, though I’m sure that’s still happening behind the scenes and there’s no performance difference. Still, it is nice that you don’t have the variable getting in your way if you don’t want it. Certainly in C and even in Java programmers are always enbugging their code with wrongly scoped loop indices or fencepost errors. This makes indexless loops a nice feature for a language that has it from the get-go like Ruby. However I don’t think this is a big enough advantage to justify changing Java.

A much more serious concern is that indexless loops don’t have to be serial. That is, there’s nothing in a statement like 3.times {doSomething()} that promises any particular order of execution. In fact, just maybe we can do all three actions at the same time. This enables parallel processing, and is going to be very important as multicore processors and multi-CPU systems become even more common. For example, consider the code to sum an array:

double sum = 0;
for (int i = 0; i < array.length; i++) {
  sum += array[i];
}

The programmer probably doesn't care (and certainly shouldn't care) that we add array[0] to the sum before array[5]. If the array is large, we could actually execute this on eight separate CPUs, each summing an eighth of the array, and then add the subtotals at the end. A smart compiler could figure this out, but it's easier to do that if there's nothing in the code that refers to the loop index.
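To make the eight-CPU idea concrete, here is a minimal sketch of what such a split might look like if written by hand rather than by a smart compiler. The class name ChunkedSum and the chunks parameter are assumptions for illustration; the sketch uses the standard java.util.concurrent executor classes, and nothing here is claimed to be what a compiler would actually generate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration class; not from any library.
class ChunkedSum {

    // Splits the array into `chunks` contiguous slices (chunks must be >= 1),
    // sums each slice on its own pool thread, then adds the subtotals.
    static double sum(final double[] array, int chunks) {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            List<Future<Double>> subtotals = new ArrayList<Future<Double>>();
            int chunkSize = (array.length + chunks - 1) / chunks; // round up
            for (int c = 0; c < chunks; c++) {
                final int start = Math.min(c * chunkSize, array.length);
                final int end = Math.min(start + chunkSize, array.length);
                subtotals.add(pool.submit(new Callable<Double>() {
                    public Double call() {
                        double partial = 0;
                        for (int i = start; i < end; i++) {
                            partial += array[i];
                        }
                        return partial;
                    }
                }));
            }
            double total = 0;
            for (Future<Double> subtotal : subtotals) {
                total += subtotal.get(); // blocks until that slice is finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] array = new double[1000];
        for (int i = 0; i < array.length; i++) {
            array[i] = i + 1; // 1, 2, ..., 1000
        }
        System.out.println(sum(array, 8)); // prints 500500.0
    }
}
```

One caveat: floating-point addition is not exactly associative, so for arbitrary doubles the chunked total can differ from the serial total in the last few bits. The example uses small integer values, where every ordering gives the same answer.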

The for syntax implies serialization where you may not need it. The closure syntax doesn't necessarily guarantee the order of execution of the various statements. However, sometimes you actually do need a particular order of execution, or you need to refer to the loop index from within or outside of the code. For example, consider this simple loop that concatenates an array of strings named args:

String s = "";
for (int i = 0; i < args.length; i++) {
  s += args[i];
}

String concatenation is not commutative, and it's really important that we add the strings in the proper order. A really smart compiler might still break this up into multiple threads, but it would have to be a lot more careful that the intended order was preserved.
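Here is a hedged sketch of the extra care that would be needed. Each thread concatenates one contiguous slice, and the pieces are then stitched together strictly in slice order. The class name OrderedConcat and the chunks parameter are invented for illustration.

```java
// Hypothetical illustration class; not from any library.
class OrderedConcat {

    // Concatenates args in index order using `chunks` worker threads
    // (chunks must be >= 1). Order is preserved because each worker writes
    // only to its own slot and the slots are appended in ascending order.
    static String concat(final String[] args, int chunks) {
        final String[] partials = new String[chunks];
        Thread[] workers = new Thread[chunks];
        int chunkSize = (args.length + chunks - 1) / chunks; // round up
        for (int c = 0; c < chunks; c++) {
            final int slot = c;
            final int start = Math.min(c * chunkSize, args.length);
            final int end = Math.min(start + chunkSize, args.length);
            workers[c] = new Thread(new Runnable() {
                public void run() {
                    StringBuilder sb = new StringBuilder();
                    for (int i = start; i < end; i++) {
                        sb.append(args[i]);
                    }
                    partials[slot] = sb.toString();
                }
            });
            workers[c].start();
        }
        StringBuilder result = new StringBuilder();
        for (int c = 0; c < chunks; c++) {
            try {
                workers[c].join(); // wait for slice c before appending it
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            result.append(partials[c]);
        }
        return result.toString();
    }
}
```

The workers may finish in any order. What keeps the result correct is that the final loop joins and appends them by slot number, never by completion time.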

Thus the closure syntax and the for syntax really aren't equivalent and closures can't replace for loops. They might supplement them, but this is only relevant if they really can be run on multiple processors simultaneously. In functional systems, this works because there are no side-effects. Thread safety is almost free. However, this isn't true in Java. In Java you have to think very carefully about thread safety, and typical closure code doesn't do that. Unless we have true functional programming, I'm not sure I see the point.
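As a rough illustration of the thread-safety point, suppose a times-style helper really did run its body on several threads at once. The helper below is hypothetical (nothing like it ships with Java), and an anonymous Runnable stands in for the closure. As soon as the body touches shared state, the programmer has to reach for something like AtomicInteger.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration class; not from any library.
class ParallelTimes {

    // Runs `body` n times, each run on its own thread, in no promised order,
    // and returns once every run has finished.
    static void times(int n, Runnable body) {
        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            workers[i] = new Thread(body);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    // Counts the runs with a thread-safe counter.
    static int countSafely(int n) {
        final AtomicInteger counter = new AtomicInteger();
        times(n, new Runnable() {
            public void run() {
                counter.incrementAndGet(); // atomic read-modify-write
            }
        });
        return counter.get();
    }
}
```

If the body instead did counter[0]++ on a plain int array, that would be a data race: two threads could read the same old value and one increment would be lost. Nothing in the closure-style call site warns you about that.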

The current proposals for closures in Java all seem to still have sequential execution of code. For instance, the BGGA proposal makes a big point out of allowing break and continue inside closures, but what does that mean if the different iterations of the loop are in fact running on different processors at the same time? If the code is going to be sequential anyway, I prefer the style that makes that more obvious. The traditional indexed loop does that. A closure doesn't.

This entry was posted on Wednesday, February 7th, 2007 at 10:26 am and is filed under Blogroll.