How SubString method works in Java - Memory Leak Fixed in JDK 1.7

The substring method of the String class is one of the most used methods in Java, and it is also part of an interesting String interview question, e.g. "How does substring work in Java?", sometimes asked as "How does substring create a memory leak in Java?". To answer these questions, you need to know the implementation details. Recently one of my friends was drilled on the substring method during a Java interview. He had been using the substring() method for a long time, as all of us have, but what surprised him was the interviewer's obsession with Java substring and the deep dive down to the implementation level.

String is a special class in Java and the subject of many interview questions, e.g. why a char array is better than String for storing passwords. In this case, it was the substring method that took center stage. Most of us just use substring(..) and then forget about it. Not every Java programmer goes into the code to see exactly how it works. To get a feel for how his interview went, let's start.

Update: This issue was actually a bug, http://bugs.sun.com/view_bug.do?bug_id=6294060, which is fixed in the substring implementation of Java 7. Now, instead of sharing the original character array, the substring method creates a copy of it. In short, the substring retains only as much data as it needs. Thanks to Yves Gillet for pointing this out.

As some of my readers pointed out, the java.lang.String class also changed in Java 1.7: the offset and count fields, which were used to track positions, were removed from String.

This may save some bytes with each String instance, but not sharing the original array makes substring run in linear time, compared to constant time previously. Anyway, it's worth it to remove any String-related memory leak in Java. Having said that, if you have not yet upgraded your server to Java 7 and are still working on Java 1.6 updates, this is one thing that is worth knowing.
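To make the cost of the copying strategy concrete, here is a minimal sketch of the idea, not the actual JDK source. The class name CopyingSubstring and the helper copyRange are hypothetical names chosen for illustration; only Arrays.copyOfRange is a real library call.

```java
import java.util.Arrays;

// Hypothetical illustration of a copy-on-substring strategy.
class CopyingSubstring {

    // Linear time: copies endIndex - beginIndex characters into a fresh array,
    // so the result never keeps the (possibly huge) source array alive.
    static char[] copyRange(char[] source, int beginIndex, int endIndex) {
        return Arrays.copyOfRange(source, beginIndex, endIndex);
    }

    public static void main(String[] args) {
        char[] original = "AAPL,GOOG,MSFT".toCharArray();
        char[] part = copyRange(original, 0, 4);
        System.out.println(new String(part)); // AAPL
        System.out.println(part.length);      // 4
    }
}
```

The copy costs time proportional to the length of the substring, but the new array is exactly as large as the substring needs.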

The interview starts with normal chit-chat, and the interviewer asks, "Have you used the substring method in Java?" My friend proudly says, "Yes, many times", which brings a smile to the interviewer's face. He says, "Well, that’s good."

The next question was, "Can you explain what substring does?" My friend got an opportunity to show off his talent and how much he knows about the Java API. He said the substring method is used to get part of a String in Java. It’s defined in the java.lang.String class, and it's an overloaded method.

One version of the substring method takes just beginIndex and returns the part of the String from beginIndex to the end, while the other takes two parameters, beginIndex and endIndex, and returns the part of the String from beginIndex to endIndex-1. He also stressed that every time you call the substring() method in Java, it returns a new String because String is immutable in Java.
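The two overloads described above can be tried with a small sketch. The class SubstringOverloads, its helper methods, and the sample text are made up for illustration; only String.substring itself comes from the JDK.

```java
// Minimal demo of the two substring overloads; the sample text is illustrative only.
class SubstringOverloads {

    // Returns everything from beginIndex (inclusive) to the end of the string.
    static String tail(String s, int beginIndex) {
        return s.substring(beginIndex);
    }

    // Returns characters from beginIndex (inclusive) to endIndex (exclusive).
    static String slice(String s, int beginIndex, int endIndex) {
        return s.substring(beginIndex, endIndex);
    }

    public static void main(String[] args) {
        String text = "JavaRevisited";
        System.out.println(tail(text, 4));              // Revisited
        System.out.println(slice(text, 0, 4));          // Java
        // The result always has endIndex - beginIndex characters.
        System.out.println(slice(text, 4, 8).length()); // 4
    }
}
```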

The next question was what happens if beginIndex is equal to the length in substring(int beginIndex). It won't throw an IndexOutOfBoundsException; instead, it returns an empty String.

The same is the case when beginIndex and endIndex are equal in the second method. It only throws StringIndexOutOfBoundsException when beginIndex is negative or larger than endIndex, or when an index is larger than the length of the String.
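These boundary rules are easy to check with a short program. The class SubstringBounds and the helper throwsOutOfBounds are hypothetical names used only for this sketch.

```java
// Checks the edge cases of substring; helper names are illustrative only.
class SubstringBounds {

    // Returns true if substring(beginIndex, endIndex) throws
    // StringIndexOutOfBoundsException for the given arguments.
    static boolean throwsOutOfBounds(String s, int beginIndex, int endIndex) {
        try {
            s.substring(beginIndex, endIndex);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "Java"; // length 4
        System.out.println(s.substring(4).isEmpty());    // true: beginIndex == length
        System.out.println(s.substring(2, 2).isEmpty()); // true: beginIndex == endIndex
        System.out.println(throwsOutOfBounds(s, -1, 2)); // true: negative beginIndex
        System.out.println(throwsOutOfBounds(s, 3, 2));  // true: beginIndex > endIndex
        System.out.println(throwsOutOfBounds(s, 0, 5));  // true: endIndex > length
    }
}
```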

So far so good. My friend was happy and the interview seemed to be going well, until the interviewer asked him, **Do you know how substring works in Java?** Most Java developers fail here because they don't know exactly how the substring method works until they have seen the code of java.lang.String.

If you look at the substring method inside the String class (Java 1.6 and earlier), you will see that it calls the String(int offset, int count, char value[]) constructor to create a new String object. What is interesting here is value[], which is the same character array used to represent the original string. So what's wrong with this?

In case you have still not figured it out: if the original string is very long and has an array of size 1GB, then no matter how small the substring is, it holds on to the 1GB array. This also stops the original string's character array from being garbage collected, even when the original string itself no longer has any live reference. This is a clear case of a memory leak in Java, where memory is retained even though it's not required. That's how the substring method creates a memory leak.
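The sharing behaviour can be modelled with a small stand-in class. This is a simplified sketch, not the real java.lang.String: the class SharedString is hypothetical, and only its three fields (value, offset, count) mirror the names mentioned in this article.

```java
// Simplified, hypothetical model of a Java 1.6 style string; not the real java.lang.String.
class SharedString {
    final char[] value; // backing array, possibly shared with other instances
    final int offset;   // index of the first character within value
    final int count;    // number of characters in this string

    SharedString(char[] value) {
        this(0, value.length, value);
    }

    // Reuses the given array instead of copying it.
    SharedString(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Constant time: only offset and count change, the array is shared.
    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(offset + beginIndex, endIndex - beginIndex, value);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedString big = new SharedString(new char[1_000_000]);
        SharedString small = big.substring(10, 14);
        System.out.println(small.count);              // 4 logical characters
        System.out.println(small.value.length);       // 1000000 characters retained
        System.out.println(small.value == big.value); // true: same array
    }
}
```

As long as small is reachable, the million-character array stays reachable too, even if big itself is no longer referenced anywhere.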

How SubString in Java works

Obviously, the next question from the interviewer would be, how do you deal with this problem? Though you cannot go and change the Java substring method, you can still use a workaround in case you are creating a substring of a significantly longer String.

A simple solution is to trim the string and keep the size of the character array equal to the length of the substring. Luckily, java.lang.String has a constructor to do this, as shown in the example below.

// comma-separated stock symbols from NYSE
String listOfStockSymbolsOnNYSE = getStockSymbolsForNYSE();

// calling the String(String) constructor
String apple = new String(
        listOfStockSymbolsOnNYSE.substring(appleStartIndex, appleEndIndex));

If you look at the code of the java.lang.String class, you will see that this constructor trims the array if it’s bigger than the String itself.

public String(String original) {
    ...
    if (originalValue.length > size) {
        // The array representing the String is bigger than the new
        // String itself.  Perhaps this constructor is being called
        // in order to trim the baggage, so make a copy of the array.
        int off = original.offset;
        v = Arrays.copyOfRange(originalValue, off, off+size);
    } else {
        // The array representing the String is the same
        // size as the String, so no point in making a copy.
        v = originalValue;
    }
    ...
}

Another way to solve this problem is to call the intern() method on the substring, which fetches an existing string from the pool or adds it if necessary. Since the String in the pool is a real string, it takes only as much space as it requires. It’s also worth noting that substrings are not interned when you call the intern() method on the original String.
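The canonicalizing effect of intern() can be seen in a short sketch. The class InternDemo and the helper sameAfterIntern are hypothetical names; the demo only shows that equal strings map to one pooled instance, it does not measure memory.

```java
// Shows that intern() maps equal strings to a single pooled instance.
class InternDemo {

    // Returns true if both strings resolve to the same pooled object.
    static boolean sameAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String symbols = "AAPL,GOOG,MSFT";
        String first = symbols.substring(0, 4);
        String second = new String("AAPL"); // explicitly a distinct object

        System.out.println(first.equals(second));           // true: same content
        System.out.println(sameAfterIntern(first, second)); // true: one pooled instance
        System.out.println(first.intern() == "AAPL");       // true: literals are pooled
    }
}
```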

Most developers successfully answer the first three questions, which are related to the usage of substring, but they get stuck on the last two: how substring creates a memory leak, and how substring works. It's not completely their fault, because what you know is that every time substring() returns a new String, which is not exactly true since it’s backed by the same character array.

This was the only interview question that bothered my friend a little; otherwise, it was a standard Java interview at a service-level company in India. By the way, he got the call a day later, even though he struggled a little on how the substring method works in Java, and that was the reason he shared this interview experience with me.
