bpo-12499: support custom len function in textwrap.wrap by xi · Pull Request #28136 · python/cpython

The issue is with the following original code (the # <-- HERE markers are my own):

def _handle_long_word(self, reversed_chunks, cur_line, cur_len, width):
    ...
    if width < 1:
        space_left = 1
    else:
        space_left = width - cur_len
    ...
    if self.break_long_words:
        end = space_left  # <-- HERE
        chunk = reversed_chunks[-1]
        if self.break_on_hyphens and self.text_len(chunk) > space_left:
            ...
            hyphen = chunk.rfind('-', 0, space_left)  # <-- HERE
            if hyphen > 0 and any(c != '-' for c in chunk[:hyphen]):
                end = hyphen + 1
        cur_line.append(chunk[:end])  # <-- HERE
        reversed_chunks[-1] = chunk[end:]  # <-- HERE
    ...
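To make the mismatch concrete, here is a minimal sketch of the original slicing step in isolation. The display_width function is a hypothetical stand-in for a custom text_len that counts East Asian Wide and Fullwidth characters (as reported by unicodedata.east_asian_width) as two columns; it is not part of textwrap.

```python
import unicodedata


def display_width(text: str) -> int:
    """Hypothetical text_len: 'W' and 'F' characters take two columns."""
    return sum(2 if unicodedata.east_asian_width(c) in ("W", "F") else 1
               for c in text)


chunk = "日本語テキスト"  # 7 characters, 14 columns wide
space_left = 6           # columns still free on the current line

# Original logic: slice at the *character* index equal to space_left
end = space_left
head = chunk[:end]
print(head, display_width(head))  # → 日本語テキス 12
```

The piece placed on the line is 12 columns wide even though only 6 columns were free.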

As you can see, the code tacitly assumes that the text width equals the number of characters: it slices the chunk at the character index equal to the space left. What's needed is a function that finds the index at which the text reaches the available width. I suggest the following:

def _find_width_index(self, text, width):
    """_find_width_index(text : string, width : int)

    Find the index at which the text reaches the required width.
    """
    # In most cases text_len just counts characters, so this shortcut
    # avoids computing the width of every prefix
    if self.text_len(text[:width]) == width:
        # With characters wider than one column, the text can reach the width
        # in fewer than width characters, so never return an index past its end
        return min(width, len(text))
    cur_text = ''
    for i, c in enumerate(text):
        cur_text += c
        cur_width = self.text_len(cur_text)
        # Stop at the first prefix that reaches width (a wide final
        # character may push it past width)
        if cur_width >= width:
            return i + 1
    # The whole text is narrower than width, so all of it fits
    return len(text)
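If text_len can be assumed never to decrease as the prefix grows (true for any sum of per-character widths, but not guaranteed for an arbitrary user-supplied function), the per-character scan above could be replaced by a binary search over prefix lengths, which needs only O(log n) calls to text_len. This is a swapped-in technique, not what the proposal does. The sketch below is a standalone function rather than a TextWrapper method, and display_width is a hypothetical example width function. It also differs at the boundary: it returns the longest prefix that does not exceed the width, so it never overshoots, but it can return 0, and a caller would still have to force at least one character onto the line to guarantee progress.

```python
import unicodedata
from typing import Callable


def display_width(text: str) -> int:
    """Hypothetical text_len: East Asian Wide/Fullwidth chars take two columns."""
    return sum(2 if unicodedata.east_asian_width(c) in ("W", "F") else 1
               for c in text)


def find_width_index(text: str, width: int,
                     text_len: Callable[[str], int] = display_width) -> int:
    """Return the largest i such that text_len(text[:i]) <= width.

    Assumes text_len never decreases as the prefix grows, which makes a
    binary search over the prefix length valid.
    """
    lo, hi = 0, len(text)          # candidate prefix lengths 0..len(text)
    while lo < hi:
        mid = (lo + hi + 1) // 2   # round up so the loop always makes progress
        if text_len(text[:mid]) <= width:
            lo = mid               # text[:mid] fits: answer is mid or larger
        else:
            hi = mid - 1           # text[:mid] is too wide: answer is smaller
    return lo
```

For example, find_width_index("a日本", 2) returns 1, whereas the character scan above returns 2 for the same input and puts three columns on a two-column line.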

Then amend _handle_long_word as follows:

def _handle_long_word(self, reversed_chunks, cur_line, cur_len, width):
    """_handle_long_word(chunks : [string],
                         cur_line : [string],
                         cur_len : int, width : int)

    Handle a chunk of text (most likely a word, not whitespace) that
    is too long to fit in any line.
    """
    # Figure out when indent is larger than the specified width, and make
    # sure at least one character is stripped off on every pass
    if width < 1:
        space_left = 1
    else:
        space_left = width - cur_len

    # If we're allowed to break long words, then do so: put as much
    # of the next chunk onto the current line as will fit.
    if self.break_long_words:
        chunk = reversed_chunks[-1]
        end = self._find_width_index(chunk, space_left)
        if self.break_on_hyphens and self.text_len(chunk) > space_left:
            # break after last hyphen, but only if there are
            # non-hyphens before it
            hyphen = chunk.rfind('-', 0, end)
            if hyphen > 0 and any(c != '-' for c in chunk[:hyphen]):
                end = hyphen + 1
        cur_line.append(chunk[:end])
        reversed_chunks[-1] = chunk[end:]
...
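The amended breaking step can be exercised outside TextWrapper with a small standalone sketch. split_long_chunk and display_width are hypothetical helpers written only for this illustration. find_width_index follows the logic of the proposed _find_width_index, with a fallback return len(text) for chunks narrower than the target width, and split_long_chunk mirrors the break_long_words branch above, returning the piece for the current line and the remainder.

```python
import unicodedata
from typing import Callable, Tuple


def display_width(text: str) -> int:
    """Hypothetical text_len: East Asian Wide/Fullwidth chars take two columns."""
    return sum(2 if unicodedata.east_asian_width(c) in ("W", "F") else 1
               for c in text)


def find_width_index(text: str, width: int,
                     text_len: Callable[[str], int]) -> int:
    # Same logic as the proposed _find_width_index
    if text_len(text[:width]) == width:
        return min(width, len(text))
    for i in range(1, len(text) + 1):
        if text_len(text[:i]) >= width:
            return i
    return len(text)


def split_long_chunk(chunk: str, space_left: int,
                     text_len: Callable[[str], int] = display_width,
                     break_on_hyphens: bool = True) -> Tuple[str, str]:
    """Hypothetical free-function version of the break_long_words branch."""
    end = find_width_index(chunk, space_left, text_len)
    if break_on_hyphens and text_len(chunk) > space_left:
        # break after the last hyphen, but only if non-hyphens precede it
        hyphen = chunk.rfind("-", 0, end)
        if hyphen > 0 and any(c != "-" for c in chunk[:hyphen]):
            end = hyphen + 1
    return chunk[:end], chunk[end:]


print(split_long_chunk("日本語テキスト", 6))  # → ('日本語', 'テキスト')
print(split_long_chunk("abc-defghij", 6))    # → ('abc-', 'defghij')
```

With the original character-index slicing, the first call would have put 日本語テキス (12 columns) into a 6-column gap; here the piece is exactly 6 columns wide.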

I've tested this in tiptenbrink/textwrap; see tiptenbrink/textwrap@f93d7a2 for a proper diff.

A question for @merwok: how does contributing to an existing PR work in this case? Do I also need to sign a CLA?