Implement vectorized, NA-friendly string utils, a la R's stringr

cc @hammer, @arthurgerigk. Not sure when I'll be able to make this happen, but this would be a very nice addition. I've often found myself doing stuff like:

df[col].map(lambda x: x[:10])

or various other forms of string munging / regex-processing.

Obviously that would fail if any value in df[col] is NA. And having to write this kinda sucks:

df[col].map(lambda x: x[:10] if notnull(x) else x)

If multiple columns were involved in some kind of string processing exercise, you'd just want the whole operation to short-circuit and return NA as soon as an NA is encountered.
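
To make the short-circuit idea concrete, here is a rough pure-Python sketch of what I have in mind. The names `is_na`, `na_safe`, and `na_map` are made up for illustration and are not existing pandas functions, and `is_na` is a simplified stand-in for `isnull` that only treats `None` and float NaN as missing:

```python
import math
from functools import wraps


def is_na(value):
    """Simplified NA check (assumption): only None and float NaN count as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def na_safe(func):
    """Wrap func so it returns the first NA argument instead of raising on it."""
    @wraps(func)
    def wrapper(*args):
        for arg in args:
            if is_na(arg):
                return arg  # short-circuit: propagate the NA unchanged
        return func(*args)
    return wrapper


def na_map(func, *columns):
    """Apply func element-wise across one or more equal-length columns.

    Any row that contains an NA in any column yields that NA.
    """
    safe = na_safe(func)
    return [safe(*row) for row in zip(*columns)]


# Single column: slicing no longer blows up on the missing value.
names = ["Wolfgang Amadeus", None, "Bach"]
first_ten = na_map(lambda s: s[:10], names)

# Multiple columns: the row is NA if either input is NA.
given = ["Johann", None, "Clara"]
family = ["Bach", "Mozart", None]
full = na_map(lambda a, b: a + " " + b, given, family)
```

With this, `first_ten` is `['Wolfgang A', None, 'Bach']` and `full` is `['Johann Bach', None, None]`. The same wrapper should also work on a Series, e.g. `df[col].map(na_safe(lambda x: x[:10]))`, but a real implementation would want to be properly vectorized rather than looping in Python.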