Re: [Patch] expand,unexpand multibyte support (original) (raw)
On 02/18/2013 04:30 PM, Ondrej Oprala wrote:
Hi, I've been working on multibyte support for the {un,}expand utilities lately, my approach being similar to Padraig's from 2010 ( http://lists.gnu.org/archive/html/coreutils/2010-09/msg00029.html ) . Both tools now read by lines, not bytes, and then iterate over the characters properly.
Thanks - the package maintainer of all distributions will be very happy once multi-byte support is integrated to all upstream coreutils programs.
I can't tell too much about your patch yet - I even tried to avoid the umlaut-รถ in my own name whenever possible my whole life ;-) So here are only a few general notes.
I had a bit problems to compile because expand-core.h needs the types uintmax_t, uint32_t and uint8_t which are not yet defined. Moving the #include down (in src/{expand,expand-core,unexpand}.c) worked around this problem, e.g. expand.c:
--- a/src/expand.c +++ b/src/expand.c @@ -46,13 +46,13 @@
include <unistring/localcharset.h>
#endif
-#include "expand-core.h"
#include "system.h" #include "error.h" #include "fadvise.h" #include "xstrndup.h"
+#include "expand-core.h" + /* The official name of this program (e.g., no 'g' prefix). */ #define PROGRAM_NAME "expand"
In the expand() and unexpand() functions, the variable rawline can be changed from char** to char*. This makes the code better readable. I'm not sure whether we should free the rawline buffer on any input line anyway, because it's filled by getline() and resized as needed; i.e. reusing malloc'ed space is not a bad thing.
And instead of strdup() without checking, we should use xstrdup(): {
line = (uint8_t *) strdup (*rawline);
line = (uint8_t *) xstrdup (rawline); clen = 1; }
Hopefully, we can avoid that strdup in the final version, and just use the buffer we already have.
In the tests, please check the exit status of expand/unexpand, and use our own compare function; e.g. tests/expand/mb.sh:
-expand < in > out -cmp out exp > /dev/null 2>&1 || fail=1 +expand < in > out || fail=1 +compare exp out || fail=1
It would be good to have much more tests. I have the feeling that the coverage rate of expand/unexpand wasn't too great also in the existing tests - though the unexpand test looks slightly better than the expand test.
Have a nice day, Berny