GitHub - gpakosz/PackedArray: Random access array of tightly packed unsigned integers (original) (raw)
PackedArray: random access array of tightly packed unsigned integers
TLDR
PackedArray comes to the rescue when you're in a desperate need for an uint9_t or uint17_t array.
What?
When you want to hold an unordered sequence of unsigned integers into memory, the C programming language lets you choose among 4 data types:
uint8_tuint16_tuint32_tuint64_t
If your numbers are within the [0, 100000] range, only 17 bits per integer are needed since 217 = 131072. However, you can't use an array ofuint16_t because 16 bits are not enough to store numbers between 65536 and 100000. When you use the next available type, uint32_t, you're wasting 15 bits per integer which represents a 47% overhead in terms of storage requirements.
PackedArray saves memory by packing integers/items together at the bit-level:
| b0 | b1 | b2 | ... | |||||||
|---|---|---|---|---|---|---|---|---|---|---|
| i0 | i1 | i2 | i3 | i4 | i5 | i6 | i7 | i8 | i9 | ... |
A PackedArray is backed by an uint32_t buffer. Several items end up being stored inside the same buffer cell, e.g. i0, i1, and i2. Some items span two buffer cells, e.g. i3, and i7. PackedArray is responsible for encoding/decoding items into/from the storage buffer.
PackedArraySIMD is a PackedArray variant that makes use of SSE2 or NEON instructions.
Going SIMD processes integers 4 by 4 but imposes an interleaved layout in the storage buffer.
PackedArraySIMD interleaved layout, 13 bits per item:
| b0 | b1 | b2 | b3 | ... | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| i0 | i4 | i8a | i1 | i5 | i9a | i2 | i6 | i10a | i3 | i7 | i11a | i8b | ... |
As a consequence, the data layout of PackedArraySIMD isn't compatible with its non SIMD counterpart. In other words, you cannot use PackedArray to unpack data packed with PackedArraySIMD or the other way around.
It is also worth noting the implementations of PackedArraySIMD_pack andPackedArraySIMD_unpack require more plumbing than their non-SIMD counterparts. Additional computations are needed to find out and adjust a data window that can be processed 4 by 4 with SIMD instructions.
PackedArray and PackedArraySIMD are released under the WTFPL v2 license.
For more information, see the PackedArray announcement on my personal website.
Why?
PackedArray is designed as a drop-in replacement for an unsigned integer array. I couldn't find such a data structure in the wild, so I implemented one.
Instead of writing:
uint32_t* a = (uint32_t*)malloc(sizeof(uint32_t) * count);
...
value = a[i];
...
a[j] = value;
You write:
PackedArray* a = PackedArray_create(bitsPerItem, count);
...
value = PackedArray_get(a, i);
...
PackedArray_set(a, j, value);
The PackedArray_computeBitsPerItem helper scans a uint32_t array and returns the number of bits needed to create a PackedArray capable of holding its content.
There are also PackedArray_pack and PackedArray_unpack that operate on several items in a row. Those two could really have been namedPackedArray_write and PackedArray_read but I decided "pack" / "unpack" conveys better something is happening under the hood.
// bulk packing / unpacking
PackedArray_pack(a, j, in, count);
PackedArray_unpack(a, j, out, count);
// the following are semantically equivalent
PackedArray_set(a, j, value);
PackedArray_pack(a, j, &value, 1);
value = PackedArray_get(a, i);
PackedArray_unpack(a, i, &value, 1);
Compiling
In order to use PackedArray or PackedArraySIMD in your own project, you just have to bring in the two PackedArray.h and PackedArray.c (orPackedArraySIMD.c) files. It's that simple.
You can customize PackedArray.c's behavior by defining the following macros:
PACKEDARRAY_ASSERTPACKEDARRAY_MALLOCPACKEDARARY_FREE
You can customize PackedArraySIMD.c's behavior by defining the following macros:
PACKEDARRAY_ASSERTPACKEDARRAY_ALIGNED_MALLOCPACKEDARARY_FREE
PackedArray.c and PackedArraySIMD.c can compile themselves into either a test program or a micro-benchmark. For that, you have to use one of the following preprocessor directives:
PACKEDARRAY_SELF_TESTPACKEDARRAY_SELF_BENCH
For example, from command line:
$ cc -o PackedArraySelfTest -DPACKEDARRAY_SELF_TEST -O2 -g PackedArray.c
$ cc -o PackedArraySelfBench -DPACKEDARRAY_SELF_BENCH -DNDEBUG -O2 -g PackedArray.c
$ cc -o PackedArraySIMDSelfTest -DPACKEDARRAY_SELF_TEST -O2 -g PackedArraySIMD.c
$ cc -o PackedArraySIMDSelfBench -DPACKEDARRAY_SELF_BENCH -DNDEBUG -O2 -g PackedArraySIMD.c
Compiling for Windows
There is a Visual Studio 2012 solution in the _win-vs11/ folder.
Compiling for Linux or Mac
There is a GNU Make 3.81 MakeFile in the _gnu-make/ folder:
Compiling for Mac
See above if you want to compile from command line. Otherwise there is an Xcode project located in the _mac-xcode/ folder.
Compiling for iOS
There is an Xcode project located in the _ios-xcode/ folder.
If you prefer compiling from command line and deploying to a jailbroken device through SSH, use:
$ make -C _gnu-make/ binsubdir=ios CC="$(xcrun --sdk iphoneos --find clang) -isysroot $(xcrun --sdk iphoneos --show-sdk-path) -arch armv7 -arch armv7s -arch arm64" postbuild="codesign -s 'iPhone Developer'"
Compiling for Android
You will have to install the Android NDK, and point the $NDK_ROOT environment variable to the NDK path: e.g. export NDK_ROOT=/opt/android-ndk (without a trailing / character).
Next, the easy way is to make a standalone Android toolchain with the following command:
$ <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>N</mi><mi>D</mi><msub><mi>K</mi><mi>R</mi></msub><mi>O</mi><mi>O</mi><mi>T</mi><mi mathvariant="normal">/</mi><mi>b</mi><mi>u</mi><mi>i</mi><mi>l</mi><mi>d</mi><mi mathvariant="normal">/</mi><mi>t</mi><mi>o</mi><mi>o</mi><mi>l</mi><mi>s</mi><mi mathvariant="normal">/</mi><mi>m</mi><mi>a</mi><mi>k</mi><mi>e</mi><mo>−</mo><mi>s</mi><mi>t</mi><mi>a</mi><mi>n</mi><mi>d</mi><mi>a</mi><mi>l</mi><mi>o</mi><mi>n</mi><mi>e</mi><mo>−</mo><mi>t</mi><mi>o</mi><mi>o</mi><mi>l</mi><mi>c</mi><mi>h</mi><mi>a</mi><mi>i</mi><mi>n</mi><mi mathvariant="normal">.</mi><mi>s</mi><mi>h</mi><mo>−</mo><mo>−</mo><mi>s</mi><mi>y</mi><mi>s</mi><mi>t</mi><mi>e</mi><mi>m</mi><mo>=</mo></mrow><annotation encoding="application/x-tex">NDK_ROOT/build/tools/make-standalone-toolchain.sh --system=</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.10903em;">N</span><span class="mord mathnormal" style="margin-right:0.02778em;">D</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07153em;">K</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3283em;"><span style="top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.00773em;">R</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord mathnormal" style="margin-right:0.13889em;">OOT</span><span class="mord">/</span><span class="mord mathnormal">b</span><span class="mord mathnormal">u</span><span class="mord mathnormal">i</span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord mathnormal">d</span><span class="mord">/</span><span class="mord mathnormal">t</span><span class="mord mathnormal">oo</span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord mathnormal">s</span><span class="mord">/</span><span class="mord mathnormal" style="margin-right:0.03148em;">mak</span><span class="mord mathnormal">e</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.7778em;vertical-align:-0.0833em;"></span><span class="mord mathnormal">s</span><span class="mord mathnormal">t</span><span class="mord mathnormal">an</span><span class="mord mathnormal">d</span><span class="mord mathnormal">a</span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord mathnormal">o</span><span class="mord mathnormal">n</span><span class="mord mathnormal">e</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.7778em;vertical-align:-0.0833em;"></span><span class="mord mathnormal">t</span><span class="mord mathnormal">oo</span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord mathnormal">c</span><span class="mord mathnormal">hain</span><span class="mord">.</span><span class="mord mathnormal">s</span><span class="mord mathnormal">h</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.8095em;vertical-align:-0.1944em;"></span><span class="mord">−</span><span class="mord mathnormal">sys</span><span class="mord mathnormal">t</span><span class="mord mathnormal">e</span><span class="mord mathnormal">m</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span></span></span></span>(uname -s | tr [A-Z] [a-z])-$(uname -m) --platform=android-3 --toolchain=arm-linux-androideabi-clang3.3 --install-dir=/tmp/android-clang
Now you can compile the self test and self benchmark programs by running:
$ make -C _gnu-make/ binsubdir=android CC=/tmp/android-clang/bin/clang CFLAGS='-march=armv7-a -mfloat-abi=softfp -mfpu=neon -O2'
Implementation details, what the hell is going on?
First, in PackedArray.c or PackedArraySIMD.c, everything that comes below the - 8< ---- marker is the code for the self test and self micro-benchmark programs and can be discarded if you really want to:
If you want to cut down your anxiety, you can use the provided GNU Makefile and invoke:
This produces the PackedArray.cut.c and PackedArraySIMD.cut.c files.
You may also be troubled by PackedArray.c and PackedArraySIMD.c including themselves with #include PACKEDARRAY_SELF. By combining preprocessing tricks and including themselves, PackedArray.c and PackedArraySIMD.c"generate the code" for the unrolled pack and unpack implementations.
By default PACKEDARRAY_SELF is defined to "PackedArray.c" which assumes the compiler is going to look for the file in the same directory as the file from which the #include statement is being evaluated. This helps compiling when the build system refers to the source files with relative paths. Depending on your compiler/build system combination you may want to override PACKEDARRAY_SELF to__FILE__.
If you want to see the generated code, you can use the provided GNU Makefile and invoke:
$ make -C _gnu-make/ preprocess
This produces the PackedArray.pp.c and PackedArraySIMD.pp.c files.
If you find PackedArray or PackedArraySIMD useful and decide to use it in your own projects please drop me a line @gpakosz.
If you use it in a commercial project, consider using Gittip.
