startswith function gives misleading results when passed a series · Issue #3485 · pandas-dev/pandas

Using pandas 0.10.1. I think this is probably "working correctly as designed", but the design seems non-obvious and there's not much documentation on this.
The Series object has a vectorized .str.startswith() method, which works when you give it a constant string. But if you give it a Series instead of a string, it "works" (in the sense of returning data), but that data is not what you'd expect. It appears the function was simply not designed to handle being passed a Series, which is OK, but passing a Series seems like a logical thing to do. Can we at least make the function raise an error rather than return misleading data, so people don't get confused? I had to spend half an hour debugging to figure out why this wasn't working as I expected.

I want to select the rows where column 'two' starts with the string in column 'one':

import pandas
d = pandas.DataFrame([['hi', 'hi 2'], ['bye', 'bye 2']], columns=['one', 'two'])
d.two.str.startswith(d.one)  # Returns False for every row, but you'd expect True
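
A row-wise comparison does give the selection I'm after. This is only a minimal sketch, assuming both columns hold plain strings with no missing values. The third row is made up for illustration so that one row fails the test, and `mask` and `selected` are just names I picked:

```python
import pandas as pd

# Same shape of data as above, plus one hypothetical non-matching row.
d = pd.DataFrame(
    [["hi", "hi 2"], ["bye", "bye 2"], ["no", "yes 2"]],
    columns=["one", "two"],
)

# Pair up the two columns row by row and use Python's built-in
# str.startswith on each pair (assumes no NaN/None values).
mask = pd.Series(
    [two.startswith(one) for one, two in zip(d["one"], d["two"])],
    index=d.index,
)

# Keep only the rows where 'two' starts with the string in 'one'.
selected = d[mask]
```

This loops in Python rather than using a vectorized string method, so it may be slow on large frames, but it compares each row against its own prefix.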

The startswith function doesn't handle Series. Ideally, we should change it so that it does. But if that's too much work, can we at least have it raise an error such as "series is not supported, pass a constant string instead"?
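
To illustrate the second option, here is a rough sketch of the kind of check I mean. `startswith_checked` is a hypothetical wrapper written for this example, not an existing pandas function, and it says nothing about how pandas would implement the check internally:

```python
import pandas as pd


def startswith_checked(series, prefix):
    """Hypothetical wrapper: refuse a Series prefix instead of returning misleading data."""
    if isinstance(prefix, pd.Series):
        # The error text is the one suggested in this issue.
        raise TypeError("series is not supported, pass a constant string instead")
    # A constant string is passed straight through to the vectorized method.
    return series.str.startswith(prefix)


s = pd.Series(["hi 2", "bye 2"])
result = startswith_checked(s, "hi")  # constant string: works as usual
```

Passing a Series as `prefix` would then fail loudly with a `TypeError` rather than quietly returning all-False data.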