PythonParser::_check_thousands appears broken · Issue #4596 · pandas-dev/pandas

@cancan101

Description

This code appears broken:

    def _check_thousands(self, lines):
        if self.thousands is None:
            return lines
        nonnum = re.compile('[^-^0-9^%s^.]+' % self.thousands)
        ret = []
        for l in lines:
            rl = []
            for x in l:
                if (not isinstance(x, compat.string_types) or
                    self.thousands not in x or
                        nonnum.search(x.strip())):
                    rl.append(x)
                else:
                    rl.append(x.replace(',', ''))
            ret.append(rl)
        return ret

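It looks like the thousands argument to the class is used to check whether a value is "non-numeric", but a hard-coded comma is then used when actually stripping the separators. With thousands='.', for example, a value such as '1.234' passes the check, yet x.replace(',', '') returns it unchanged.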

In addition to fixing this, I would recommend factoring out this method so that it can be used elsewhere.
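One possible shape for the factored-out version, sketched as a hypothetical module-level function (strip_thousands is not an existing pandas name) that takes the separator as an argument instead of reading self.thousands. Besides replacing the hard-coded comma, it assumes two further changes: the stray '^' characters are dropped from the character class (inside a class they match a literal caret, so the original check never flags '^' as non-numeric), and the separators are passed through re.escape. The decimal parameter is also an assumption, standing in for the hard-coded '.'.

```python
import re
from typing import Any, List, Optional


def strip_thousands(lines: List[List[Any]], thousands: Optional[str],
                    decimal: str = '.') -> List[List[Any]]:
    """Remove `thousands` separators from string cells that look numeric.

    Hypothetical helper: the name and the `decimal` parameter are
    illustrative assumptions, not pandas API.
    """
    if thousands is None:
        return lines
    # Any character other than digits, '-', the separator or the decimal
    # mark marks a cell as non-numeric. re.escape keeps characters such as
    # '.', '-' or '^' from being read as regex syntax inside the class.
    nonnum = re.compile('[^0-9%s]' % re.escape('-' + thousands + decimal))
    ret = []
    for row in lines:
        new_row = []
        for x in row:
            if (isinstance(x, str) and thousands in x
                    and not nonnum.search(x.strip())):
                # The fix: strip the configured separator, not a literal ','.
                new_row.append(x.replace(thousands, ''))
            else:
                new_row.append(x)
        ret.append(new_row)
    return ret


print(strip_thousands([['1.234', '5.678.901', 'a.b', 7]], thousands='.', decimal=','))
# → [['1234', '5678901', 'a.b', 7]]
```

Under this sketch, PythonParser._check_thousands could reduce to a call like strip_thousands(lines, self.thousands), and other code paths could reuse the same helper.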