Simple-16 integer compression.

#include <compress_integer_simple_16.h>

Public Member Functions
	compress_integer_simple_16 ()
	Constructor.

virtual	~compress_integer_simple_16 ()
	Destructor.

virtual size_t	encode (void *encoded, size_t encoded_buffer_length, const integer *source, size_t source_integers)
	Encode a sequence of integers, returning the number of bytes used for the encoding, or 0 if the encoded sequence doesn't fit in the buffer.

virtual void	decode (integer *decoded, size_t integers_to_decode, const void *source, size_t source_length)
	Decode a sequence of integers encoded with this codec.

Public Member Functions inherited from JASS::compress_integer
	compress_integer ()
	Constructor.

virtual	~compress_integer ()
	Destructor.

Static Public Member Functions
static void	unittest (void)
	Unit test this class.

Static Public Member Functions inherited from JASS::compress_integer
static size_t	d1_encode (integer *encoded, const integer *source, size_t source_integers)
	Convert an array of integers into an array of D1 (delta, d-gap) encoded integers.

static size_t	d1_decode (integer *decoded, const integer *source, size_t source_integers)
	Convert a D1 encoded array of integers into an array of integers.

static size_t	dn_encode (integer *encoded, const integer *source, size_t source_integers, size_t n=1)
	Convert an array of integers into an array of Dn (delta, d-gap) encoded integers with a gap of n.

static size_t	dn_decode (integer *decoded, const integer *source, size_t source_integers, size_t n=1)
	Convert a Dn encoded array of integers into an array of integers.

static void	unittest_one (compress_integer &encoder, const std::vector< uint32_t > &sequence)
	Test one sequence to make sure it encodes and decodes to the same thing. Assert if not.

static void	unittest (compress_integer &compressor, uint32_t staring_from=0)
	Unit test this class, assert on failure.

Static Protected Attributes
static const size_t	ints_packed_table []
	Number of integers packed into a word, given its mask type.

static const size_t	can_pack_table []
	Bitmask map of the valid masks at an offset (column) for a given num_bits_needed (row).

static const size_t	row_for_bits_needed []
	Translates 'bits_needed' into the appropriate 'row' offset for use with can_pack_table.

static const size_t	invalid_masks_for_offset []
	Masks used to AND out the mask types for which it is not yet known whether they can be fully packed at a given offset.

static const size_t	simple16_shift_table []
	Number of bits to shift across when packing; this is the sum of the widths of the integers already packed (see above).

Additional Inherited Members
Public Types inherited from JASS::compress_integer
typedef uint32_t	integer
	This class and descendants will work on integers of this size. Do not change without also changing JASS_COMPRESS_INTEGER_BITS_PER_INTEGER.

Detailed Description

Simple-16 integer compression.

Simple-16 is an extension of Simple-9 that uses all 16 selectors (rather than just 9) to encode the payloads. This results in a more effective encoding that is also faster than Simple-9, because fewer reads are needed. Note that, because there are only 28 bits in a payload, the maximum integer that can be encoded with Simple-16 (as with Simple-9) is (2^28) - 1 = 268,435,455. This is less than the number of documents in a large collection (such as ClueWeb).

The encodings are:
28 * 1-bit
7 * 2-bits and 14 * 1-bit
7 * 1-bit and 7 * 2-bits and 7 * 1-bit
14 * 1-bit and 7 * 2-bits
14 * 2-bits
1 * 4-bits and 8 * 3-bits
1 * 3-bits and 4 * 4-bits and 3 * 3-bits
7 * 4-bits
4 * 5-bits and 2 * 4-bits
2 * 4-bits and 4 * 5-bits
3 * 6-bits and 2 * 5-bits
2 * 5-bits and 3 * 6-bits
4 * 7-bits
1 * 10-bits and 2 * 9-bits
2 * 14-bits
1 * 28-bits
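As a rough illustration of how these layouts could be used, the sketch below packs and unpacks a single 32-bit word. It is not the JASS implementation: the names run, layouts, field_widths, pack_word and unpack_word are hypothetical, and placing the selector in the low 4 bits with fields filled from the low end of the payload is an assumption made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One run of equally sized fields: `count` fields of `bits` bits each.
struct run { unsigned count; unsigned bits; };

// The 16 selector layouts, transcribed from the list of encodings (each sums to 28 bits).
static const std::vector<std::vector<run>> layouts =
	{
	{{28, 1}}, {{7, 2}, {14, 1}}, {{7, 1}, {7, 2}, {7, 1}}, {{14, 1}, {7, 2}},
	{{14, 2}}, {{1, 4}, {8, 3}}, {{1, 3}, {4, 4}, {3, 3}}, {{7, 4}},
	{{4, 5}, {2, 4}}, {{2, 4}, {4, 5}}, {{3, 6}, {2, 5}}, {{2, 5}, {3, 6}},
	{{4, 7}}, {{1, 10}, {2, 9}}, {{2, 14}}, {{1, 28}}
	};

// Expand a selector's runs into one width per field, in packing order.
std::vector<unsigned> field_widths(std::size_t selector)
	{
	assert(selector < layouts.size());
	std::vector<unsigned> widths;
	for (const run &r : layouts[selector])
		for (unsigned i = 0; i < r.count; i++)
			widths.push_back(r.bits);
	return widths;
	}

// Pack `values` into one word using `selector`. Unused trailing fields stay 0.
// ASSUMPTION: selector in the low 4 bits, payload in the high 28 bits.
std::uint32_t pack_word(std::size_t selector, const std::vector<std::uint32_t> &values)
	{
	std::vector<unsigned> widths = field_widths(selector);
	std::uint32_t payload = 0;
	unsigned shift = 0;
	for (std::size_t i = 0; i < widths.size() && i < values.size(); i++)
		{
		assert(values[i] < (1u << widths[i]));		// the value must fit its field
		payload |= values[i] << shift;
		shift += widths[i];
		}
	return (payload << 4) | static_cast<std::uint32_t>(selector);
	}

// Unpack every field of one word (the inverse of pack_word under the same assumption).
std::vector<std::uint32_t> unpack_word(std::uint32_t word)
	{
	std::vector<std::uint32_t> values;
	std::uint32_t payload = word >> 4;
	for (unsigned bits : field_widths(word & 0x0F))
		{
		values.push_back(payload & ((1u << bits) - 1));
		payload >>= bits;
		}
	return values;
	}
```

For example, pack_word(13, {1000, 300, 511}) stores one 10-bit and two 9-bit values, and unpack_word() applied to the result returns the same three integers.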

See:
Zhang J, Long X, Suel T (2008) Performance of compressed inverted list caching in search engines. Proceedings of the 17th Conference on the World Wide Web, pp 387-396
Yan H, Ding S, Suel T (2009) Inverted index compression and query processing with optimized document ordering. Proceedings of the 18th Conference on the World Wide Web, pp 401-410

Member Function Documentation

◆ decode()

void JASS::compress_integer_simple_16::decode	(	integer *	decoded,
		size_t	integers_to_decode,
		const void *	source,
		size_t	source_length
	)

virtual

Decode a sequence of integers encoded with this codec.

Parameters

decoded	[out] The sequence of decoded integers.
integers_to_decode	[in] The minimum number of integers to decode (it may decode more).
source	[in] The encoded integers.
source_length	[in] The length (in bytes) of the source buffer.

Implements JASS::compress_integer.

◆ encode()

size_t JASS::compress_integer_simple_16::encode	(	void *	encoded,
		size_t	encoded_buffer_length,
		const integer *	source,
		size_t	source_integers
	)

virtual

Encode a sequence of integers, returning the number of bytes used for the encoding, or 0 if the encoded sequence doesn't fit in the buffer.

Parameters

encoded	[out] The sequence of bytes that is the encoded sequence.
encoded_buffer_length	[in] The length (in bytes) of the output buffer, encoded.
source	[in] The sequence of integers to encode.
source_integers	[in] The length (in integers) of the source buffer.

Returns: The number of bytes used to encode the integer sequence, or 0 on error (i.e. overflow).

Implements JASS::compress_integer.

Member Data Documentation

◆ can_pack_table

const size_t JASS::compress_integer_simple_16::can_pack_table

static protected

Initial value:

=
        {
        0xffff, 0x7fff, 0x3fff, 0x1fff, 0x0fff, 0x03ff, 0x00ff, 0x007f, 0x003f, 0x001f, 0x001f, 0x001f, 0x001f, 0x001f, 0x000f, 0x000f, 0x000f, 0x000f, 0x000f, 0x000f, 0x000f, 0x0001, 0x0001, 0x0001, 0x0001, 0x0001, 0x0001, 0x0001,
        0xfff2, 0x7ff2, 0x3ff2, 0x1ff2, 0x0ff2, 0x03f2, 0x00f2, 0x0074, 0x0034, 0x0014, 0x0014, 0x0014, 0x0014, 0x0014, 0x0008, 0x0008, 0x0008, 0x0008, 0x0008, 0x0008, 0x0008, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0xffe0, 0x7fe0, 0x3fe0, 0x1fe0, 0x0fe0, 0x03e0, 0x00e0, 0x0060, 0x0020, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0xffa0, 0x7fc0, 0x3fc0, 0x1fc0, 0x0fc0, 0x0380, 0x0080, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0xfd00, 0x7d00, 0x3f00, 0x1f00, 0x0e00, 0x0200, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0xf400, 0x7400, 0x3c00, 0x1800, 0x0800, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0xf000, 0x7000, 0x3000, 0x1000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0xe000, 0x6000, 0x2000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0xe000, 0x4000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0xc000, 0x4000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x8000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000
        }

Bitmask map of the valid masks at an offset (column) for a given num_bits_needed (row).
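Comparing the initial value with the list of encodings in the Detailed Description suggests one way to read an entry: bit k is set when selector k has a field at that offset that is wide enough for the value. The sketch below rebuilds an entry from the selector layouts under that reading. The names run, layouts, width_at and can_pack are hypothetical, and the rule itself is inferred from the numbers rather than taken from the JASS source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One run of equally sized fields: `count` fields of `bits` bits each.
struct run { unsigned count; unsigned bits; };

// The 16 selector layouts, transcribed from the list of encodings.
static const std::vector<std::vector<run>> layouts =
	{
	{{28, 1}}, {{7, 2}, {14, 1}}, {{7, 1}, {7, 2}, {7, 1}}, {{14, 1}, {7, 2}},
	{{14, 2}}, {{1, 4}, {8, 3}}, {{1, 3}, {4, 4}, {3, 3}}, {{7, 4}},
	{{4, 5}, {2, 4}}, {{2, 4}, {4, 5}}, {{3, 6}, {2, 5}}, {{2, 5}, {3, 6}},
	{{4, 7}}, {{1, 10}, {2, 9}}, {{2, 14}}, {{1, 28}}
	};

// Width in bits of the field at position `offset` under `selector`,
// or 0 if the selector has no field at that position.
unsigned width_at(std::size_t selector, unsigned offset)
	{
	for (const run &r : layouts[selector])
		{
		if (offset < r.count)
			return r.bits;
		offset -= r.count;
		}
	return 0;
	}

// Bit k of the result is set when selector k can hold a value that needs
// `bits_needed` bits at position `offset` (inferred rule, see the text above).
std::uint32_t can_pack(unsigned bits_needed, unsigned offset)
	{
	assert(bits_needed >= 1);
	std::uint32_t mask = 0;
	for (std::size_t selector = 0; selector < layouts.size(); selector++)
		if (width_at(selector, offset) >= bits_needed)
			mask |= 1u << selector;
	return mask;
	}
```

Under this reading, can_pack(2, 0) gives 0xfff2 and can_pack(4, 0) gives 0xffa0, matching the first entries of the second and fourth rows of the initial value.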

◆ ints_packed_table

const size_t JASS::compress_integer_simple_16::ints_packed_table

static protected

Initial value:

=
        {
        28, 21, 21, 21, 14, 9, 8, 7, 6, 6, 5, 5, 4, 3, 2, 1
        }

Number of integers packed into a word, given its mask type.

◆ invalid_masks_for_offset

const size_t JASS::compress_integer_simple_16::invalid_masks_for_offset

static protected

Initial value:

=
        {
        0x0000, 0x8000, 0xc000, 0xe000, 0xf000, 0xfc00, 0xff00, 0xff80, 0xffc0, 0xffe0, 0xffe0, 0xffe0, 0xffe0, 0xffe0, 0xfff0, 0xfff0, 0xfff0, 0xfff0, 0xfff0, 0xfff0, 0xfff0, 0xfffe, 0xfffe, 0xfffe, 0xfffe, 0xfffe, 0xfffe, 0xfffe, 0xffff
        }

Masks used to AND out the mask types for which it is not yet known whether they can be fully packed at a given offset.
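The initial value is consistent with the following reading: entry n keeps only those mask types that hold no more than n integers, that is, the ones that would be completely filled once n integers have been examined. The sketch below derives an entry from the values of ints_packed_table; the function name fully_packable is hypothetical and the interpretation is inferred from the numbers, not stated by the source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copied from ints_packed_table: integers held by each of the 16 mask types.
static const std::size_t ints_packed[] = {28, 21, 21, 21, 14, 9, 8, 7, 6, 6, 5, 5, 4, 3, 2, 1};

// Bit k of the result is set when mask type k holds at most `integers_seen`
// integers, so it is known to be full after that many integers.
std::uint32_t fully_packable(std::size_t integers_seen)
	{
	assert(integers_seen <= 28);
	std::uint32_t mask = 0;
	for (std::size_t selector = 0; selector < 16; selector++)
		if (ints_packed[selector] <= integers_seen)
			mask |= 1u << selector;
	return mask;
	}
```

For instance, fully_packable(5) is 0xfc00 because only the last six mask types hold five or fewer integers.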

◆ row_for_bits_needed

const size_t JASS::compress_integer_simple_16::row_for_bits_needed

static protected

Initial value:

=
        {
        0, 0, 28, 56, 84, 112, 140, 168, 196, 196, 224, 252, 252, 252, 252, 280, 280, 280, 280, 280, 280, 280, 280, 280, 280, 280, 280, 280, 280,               
        308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 308 
        }

Translates 'bits_needed' into the appropriate 'row' offset for use with can_pack_table.
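The entries are multiples of 28, the number of columns in can_pack_table, and they appear to round bits_needed up to the next field width that some encoding actually offers (1, 2, 3, 4, 5, 6, 7, 9, 10, 14 or 28), with anything wider than 28 bits mapped to the final all-zero row. A minimal sketch of that rule follows; row_widths and row_offset are hypothetical names, and the rule is inferred from the initial value above.

```cpp
#include <cassert>
#include <cstddef>

// Distinct field widths used by the 16 encodings; one can_pack_table row each.
static const unsigned row_widths[] = {1, 2, 3, 4, 5, 6, 7, 9, 10, 14, 28};
static const std::size_t row_count = sizeof(row_widths) / sizeof(row_widths[0]);
static const std::size_t columns_per_row = 28;

// Offset of the first can_pack_table entry of the row to use for a value that
// needs `bits_needed` bits. Values wider than 28 bits land on the row after
// the last width, which contains only zeros (nothing can pack them).
std::size_t row_offset(unsigned bits_needed)
	{
	assert(bits_needed <= 64);
	std::size_t row = 0;
	while (row < row_count && row_widths[row] < bits_needed)
		row++;
	return row * columns_per_row;
	}
```

An entry of can_pack_table would then be found at row_offset(bits_needed) + offset. For example, 8 bits and 9 bits share offset 196, since no encoding has an 8-bit field.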

◆ simple16_shift_table

const size_t JASS::compress_integer_simple_16::simple16_shift_table

static protected

Initial value:

=
        {
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
        0, 2, 4, 6, 8, 10, 12, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 28, 28, 28, 28, 28, 28,
        0, 1, 2, 3, 4, 5, 6, 7, 9, 11, 13, 15, 17, 19, 21, 22, 23, 24, 25, 26, 27, 28, 28, 28, 28, 28, 28, 28,
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 18, 20, 22, 24, 26, 28, 28, 28, 28, 28, 28, 28,
        0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
        0, 4, 7, 10, 13, 16, 19, 22, 25, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
        0, 3, 7, 11, 15, 19, 22, 25, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
        0, 4, 8, 12, 16, 20, 24, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
        0, 5, 10, 15, 20, 24, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
        0, 4, 8, 13, 18, 23, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
        0, 6, 12, 18, 23, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
        0, 5, 10, 16, 22, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
        0, 7, 14, 21, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
        0, 10, 19, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
        0, 14, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
        0, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28
        }

Number of bits to shift across when packing; this is the sum of the widths of the integers already packed (see above).
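Each row of 28 entries can be reproduced as a running sum of the field widths of one encoding, padded with 28 once the word is full. The sketch below shows this for a single row; shift_row is a hypothetical helper, not part of the class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build one 28-entry row of shifts from the field widths of one encoding.
// Entry i is the sum of the widths of fields 0..i-1; once all fields are
// consumed the running sum stays at its final value (28 for a full layout).
std::vector<std::size_t> shift_row(const std::vector<unsigned> &widths)
	{
	assert(widths.size() <= 28);
	std::vector<std::size_t> row;
	std::size_t shift = 0;
	for (std::size_t offset = 0; offset < 28; offset++)
		{
		row.push_back(shift);
		if (offset < widths.size())
			shift += widths[offset];
		}
	return row;
	}
```

With the widths {10, 9, 9} of the "1 * 10-bits and 2 * 9 bits" encoding, the row starts 0, 10, 19, 28 and stays at 28, which matches the corresponding row of the initial value.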

The documentation for this class was generated from the following files:

source/compress_integer_simple_16.h
source/compress_integer_simple_16.cpp
