JASSv2
JASS::compress_integer_simple_9 Class Reference

Simple-9 integer compression.

#include <compress_integer_simple_9.h>


Classes

class  lookup
 Lookup table storing how many integers are encoded and how they are encoded.
 

Public Member Functions

 compress_integer_simple_9 ()
 Constructor.
 
virtual ~compress_integer_simple_9 ()
 Destructor.
 
virtual size_t encode (void *encoded, size_t encoded_buffer_length, const integer *source, size_t source_integers)
 Encode a sequence of integers returning the number of bytes used for the encoding, or 0 if the encoded sequence doesn't fit in the buffer.
 
virtual void decode (integer *decoded, size_t integers_to_decode, const void *source, size_t source_length)
 Decode a sequence of integers encoded with this codec.
 
- Public Member Functions inherited from JASS::compress_integer
 compress_integer ()
 Constructor.
 
virtual ~compress_integer ()
 Destructor.
 

Static Public Member Functions

static void unittest (void)
 Unit test this class.
 
- Static Public Member Functions inherited from JASS::compress_integer
static size_t d1_encode (integer *encoded, const integer *source, size_t source_integers)
 Convert an array of integers into an array of D1 (delta, d-gap) encoded integers.
 
static size_t d1_decode (integer *decoded, const integer *source, size_t source_integers)
 Convert a D1 encoded array of integers into an array of integers.
 
static size_t dn_encode (integer *encoded, const integer *source, size_t source_integers, size_t n=1)
 Convert an array of integers into an array of Dn (delta, d-gap) encoded integers with a gap of n.
 
static size_t dn_decode (integer *decoded, const integer *source, size_t source_integers, size_t n=1)
 Convert a Dn encoded array of integers into an array of integers.
 
static void unittest_one (compress_integer &encoder, const std::vector< uint32_t > &sequence)
 Test one sequence to make sure it encodes and decodes to the same thing. Assert if not.
 
static void unittest (compress_integer &compressor, uint32_t staring_from=0)
 Unit test this class, assert on failure.
 

Static Protected Attributes

static const lookup simple9_table []
 The table mapping bits to selectors and masks.
 
static const uint32_t bits_to_use []
 The number of bits used to store an integer, indexed by the number of bits the integer needs.
 
static const uint32_t table_row []
 Given the number of bits, which row of simple9_table should be used?
 
static const uint32_t ints_packed_table []
 Number of integers packed into a 32-bit word, given its mask type.
 
static const uint32_t can_pack_table []
 Bitmask map for valid masks at an offset (column) for some num_bits_needed (row).
 
static const uint32_t row_for_bits_needed []
 Translates the 'bits_needed' to the appropriate 'row' offset for use with can_pack table.
 
static const uint32_t invalid_masks_for_offset []
 AND out masks for offsets where we don't know if we can fully pack for that offset.
 
static const uint32_t simple9_shift_table []
 Number of bits to shift when packing - 9 rows for simple-9.
 

Additional Inherited Members

- Public Types inherited from JASS::compress_integer
typedef uint32_t integer
 This class and descendants will work on integers of this size. Do not change without also changing JASS_COMPRESS_INTEGER_BITS_PER_INTEGER.
 

Detailed Description

Simple-9 integer compression.

Simple-9 compression bit-packs as many integers as possible into a 32-bit word. All integers in a word are packed into the same number of bits. Each word consists of a 4-bit selector, stored in the top 4 bits, that identifies the encoding, followed by a 28-bit payload. Note that, because there are only 28 bits in a payload, the maximum integer that can be encoded with Simple-9 is (2^28) - 1 = 268,435,455. This is less than the number of documents in a large collection (such as ClueWeb).

In essence, it encodes into a 32-bit word: 28 * 1-bit integers, 14 * 2-bit integers, 9 * 3-bit integers, 7 * 4-bit integers, 5 * 5-bit integers, 4 * 7-bit integers, 3 * 9-bit integers, 2 * 14-bit integers, or 1 * 28-bit integer.

See: V. Anh, A. Moffat (2005), Inverted Index Compression Using Word-Aligned Binary Codes, Information Retrieval, 8(1):151-166
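
As an illustration of the layout described above, the following is a minimal sketch of packing and unpacking a single word. It assumes the selector occupies the top 4 bits and that the first integer sits in the lowest payload bits. The names s9_layout, s9_pack_word, and s9_unpack_word, and the selector numbering (0 for 28 * 1-bit through 8 for 1 * 28-bit), are hypothetical and are not taken from the JASS source.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one Simple-9 layout (not the JASS lookup class).
struct s9_layout
{
    uint32_t bits;   // bits per integer
    uint32_t count;  // integers per 32-bit word
};

// The nine layouts from the description, densest first (assumed selector order).
static const s9_layout s9_layouts[9] =
{
    {1, 28}, {2, 14}, {3, 9}, {4, 7}, {5, 5}, {7, 4}, {9, 3}, {14, 2}, {28, 1}
};

// Pack s9_layouts[selector].count integers from source into one word.
uint32_t s9_pack_word(uint32_t selector, const uint32_t *source)
{
    assert(selector < 9);
    const s9_layout &layout = s9_layouts[selector];
    uint32_t word = selector << 28;                   // selector in the top 4 bits
    for (uint32_t which = 0; which < layout.count; which++)
    {
        assert((source[which] >> layout.bits) == 0);  // each value must fit
        word |= source[which] << (which * layout.bits);
    }
    return word;
}

// Unpack one word, appending every integer it holds to out.
void s9_unpack_word(uint32_t word, std::vector<uint32_t> &out)
{
    const uint32_t selector = word >> 28;
    assert(selector < 9);
    const s9_layout &layout = s9_layouts[selector];
    const uint32_t mask = (1u << layout.bits) - 1;
    for (uint32_t which = 0; which < layout.count; which++)
        out.push_back((word >> (which * layout.bits)) & mask);
}
```

Under these assumptions, packing the values 1 through 7 with selector 3 (7 * 4-bit integers) gives the single word 0x37654321.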

Member Function Documentation

◆ decode()

void JASS::compress_integer_simple_9::decode ( integer *  decoded,
size_t  integers_to_decode,
const void *  source,
size_t  source_length 
)
virtual

Decode a sequence of integers encoded with this codec.

Parameters
decoded[out] The sequence of decoded integers.
integers_to_decode[in] The minimum number of integers to decode (it may decode more).
source[in] The encoded integers.
source_length[in] The length (in bytes) of the source buffer.

Implements JASS::compress_integer.

Reimplemented in JASS::compress_integer_carry_8b, JASS::compress_integer_relative_10, and JASS::compress_integer_carryover_12.

◆ encode()

size_t JASS::compress_integer_simple_9::encode ( void *  encoded,
size_t  encoded_buffer_length,
const integer *  source,
size_t  source_integers 
)
virtual

Encode a sequence of integers returning the number of bytes used for the encoding, or 0 if the encoded sequence doesn't fit in the buffer.

Parameters
encoded[out] The sequence of bytes that is the encoded sequence.
encoded_buffer_length[in] The length (in bytes) of the output buffer, encoded.
source[in] The sequence of integers to encode.
source_integers[in] The length (in integers) of the source buffer.
Returns
The number of bytes used to encode the integer sequence, or 0 on error (i.e. overflow).

Implements JASS::compress_integer.

Reimplemented in JASS::compress_integer_carry_8b, JASS::compress_integer_relative_10, and JASS::compress_integer_carryover_12.
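
The return-value contract documented for encode() (bytes used, or 0 when the output does not fit) can be illustrated with a small greedy encoder. This is only a sketch under stated assumptions, not the JASS implementation: sketch_encode is a hypothetical name, the selector is assumed to sit in the top 4 bits, a short final word is zero padded, and an integer needing more than 28 bits is treated as an error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed layouts, densest first: bits per integer and integers per word.
static const uint32_t sketch_bits[9]  = {1, 2, 3, 4, 5, 7, 9, 14, 28};
static const uint32_t sketch_count[9] = {28, 14, 9, 7, 5, 4, 3, 2, 1};

// Hypothetical greedy encoder mirroring the documented contract:
// returns the number of bytes written, or 0 if the output does not fit.
size_t sketch_encode(void *encoded, size_t encoded_buffer_length,
                     const uint32_t *source, size_t source_integers)
{
    assert(encoded != nullptr && source != nullptr);
    uint8_t *into = static_cast<uint8_t *>(encoded);
    size_t used = 0;
    size_t position = 0;

    while (position < source_integers)
    {
        const size_t remaining = source_integers - position;
        uint32_t selector = 0;
        size_t take = 0;

        // Pick the densest layout whose slots can hold the next integers.
        for (; selector < 9; selector++)
        {
            take = remaining < sketch_count[selector] ? remaining : sketch_count[selector];
            bool fits = true;
            for (size_t which = 0; which < take; which++)
                if ((source[position + which] >> sketch_bits[selector]) != 0)
                    fits = false;
            if (fits)
                break;
        }
        if (selector == 9)
            return 0;                                  // value needs more than 28 bits
        if (used + sizeof(uint32_t) > encoded_buffer_length)
            return 0;                                  // output buffer too small

        uint32_t word = selector << 28;
        for (size_t which = 0; which < take; which++)
            word |= source[position + which] << (which * sketch_bits[selector]);

        std::memcpy(into + used, &word, sizeof(word));
        used += sizeof(word);
        position += take;
    }
    return used;
}
```

Because a short final word is padded with zeros, a matching decoder would produce more integers than were encoded, which is consistent with the note on decode() that it may decode more than integers_to_decode.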

Member Data Documentation

◆ bits_to_use

const uint32_t JASS::compress_integer_simple_9::bits_to_use
static protected
Initial value:
=
{
1, 1, 2, 3, 4, 5, 7, 7,
9, 9, 14, 14, 14, 14, 14, 28,
28, 28, 28, 28, 28, 28, 28, 28,
28, 28, 28, 28, 28, 64, 64, 64,
64, 64, 64, 64, 64, 64, 64, 64,
64, 64, 64, 64, 64, 64, 64, 64,
64, 64, 64, 64, 64, 64, 64, 64,
64, 64, 64, 64, 64, 64, 64, 64
}

The number of bits used to store an integer, indexed by the number of bits the integer needs.
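
The initial values above are consistent with rounding the number of bits needed up to the nearest width that Simple-9 supports, with 64 marking anything too wide for a 28-bit payload. A minimal sketch of that rule follows; sketch_bits_to_use is a hypothetical helper, not part of the class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the smallest Simple-9 width that can hold an integer
// needing bits_needed bits, or 64 when no width is large enough.
uint32_t sketch_bits_to_use(uint32_t bits_needed)
{
    assert(bits_needed < 64);  // the table above has 64 entries
    static const uint32_t widths[9] = {1, 2, 3, 4, 5, 7, 9, 14, 28};
    for (uint32_t width : widths)
        if (bits_needed <= width)
            return width;
    return 64;
}
```

For example, an integer that needs 6 bits is stored in 7, and one that needs 10 bits is stored in 14.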

◆ can_pack_table

const uint32_t JASS::compress_integer_simple_9::can_pack_table
static protected
Initial value:
=
{
0x01ff, 0x00ff, 0x007f, 0x003f, 0x001f, 0x000f, 0x000f, 0x0007, 0x0007, 0x0003, 0x0003, 0x0003, 0x0003, 0x0003, 0x0001, 0x0001, 0x0001, 0x0001, 0x0001, 0x0001, 0x0001, 0x0001, 0x0001, 0x0001, 0x0001, 0x0001, 0x0001, 0x0001,
0x01fe, 0x00fe, 0x007e, 0x003e, 0x001e, 0x000e, 0x000e, 0x0006, 0x0006, 0x0002, 0x0002, 0x0002, 0x0002, 0x0002, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
0x01fc, 0x00fc, 0x007c, 0x003c, 0x001c, 0x000c, 0x000c, 0x0004, 0x0004, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
0x01f8, 0x00f8, 0x0078, 0x0038, 0x0018, 0x0008, 0x0008, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
0x01f0, 0x00f0, 0x0070, 0x0030, 0x0010, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
0x01e0, 0x00e0, 0x0060, 0x0020, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
0x01c0, 0x00c0, 0x0040, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
0x0180, 0x0080, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
0x0100, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000
}

Bitmask map for valid masks at an offset (column) for some num_bits_needed (row).
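
The entries above appear to follow a simple rule: bit k of an entry is set when mask type k is at least as wide as the row requires and still has a free slot at the given column. The sketch below reproduces one entry under that reading; sketch_can_pack is a hypothetical name, and the interpretation of rows and columns is inferred from the values rather than from the JASS source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper reproducing one can_pack_table entry.
// row:    0..8 selects the width class {1,2,3,4,5,7,9,14,28};
//         9 stands for "needs more than 28 bits" (nothing can pack it).
// column: offset of the integer within the word being built (0..27).
uint32_t sketch_can_pack(uint32_t row, uint32_t column)
{
    assert(row <= 9 && column < 28);
    static const uint32_t ints_packed[9] = {28, 14, 9, 7, 5, 4, 3, 2, 1};
    uint32_t valid = 0;
    // Only mask types at least as wide as this row can hold the integer,
    // and only those with more than `column` slots reach this offset.
    for (uint32_t mask_type = row; mask_type < 9; mask_type++)
        if (column < ints_packed[mask_type])
            valid |= 1u << mask_type;
    return valid;
}
```

Since each row holds 28 columns and row_for_bits_needed stores multiples of 28, the flat index of an entry is presumably row_for_bits_needed[bits_needed] + column.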

◆ ints_packed_table

const uint32_t JASS::compress_integer_simple_9::ints_packed_table
static protected
Initial value:
=
{
28, 14, 9, 7, 5, 4, 3, 2, 1
}

Number of integers packed into a 32-bit word, given its mask type.

◆ invalid_masks_for_offset

const uint32_t JASS::compress_integer_simple_9::invalid_masks_for_offset
static protected
Initial value:
=
{
0x0000, 0x0100, 0x0180, 0x01c0, 0x01e0, 0x01f0, 0x01f0, 0x01f8, 0x01f8, 0x01fc, 0x01fc, 0x01fc, 0x01fc, 0x01fc, 0x01fe, 0x01fe, 0x01fe, 0x01fe, 0x01fe, 0x01fe, 0x01fe, 0x01fe, 0x01fe, 0x01fe, 0x01fe, 0x01fe, 0x01fe, 0x01fe, 0x01ff
}

AND out masks for offsets where we don't know if we can fully pack for that offset.
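
The 29 values above are consistent with setting bit k whenever mask type k packs no more integers than the offset, that is, whenever ints_packed_table[k] <= offset. The sketch below reproduces the table under that reading; sketch_masks_for_offset is a hypothetical name, and how the encoder applies the result is not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper reproducing one invalid_masks_for_offset entry.
// offset: 0..28, taken here as the number of integers seen so far (an assumption).
uint32_t sketch_masks_for_offset(uint32_t offset)
{
    assert(offset <= 28);
    static const uint32_t ints_packed[9] = {28, 14, 9, 7, 5, 4, 3, 2, 1};
    uint32_t masks = 0;
    for (uint32_t mask_type = 0; mask_type < 9; mask_type++)
        if (ints_packed[mask_type] <= offset)
            masks |= 1u << mask_type;
    return masks;
}
```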

◆ row_for_bits_needed

const uint32_t JASS::compress_integer_simple_9::row_for_bits_needed
static protected
Initial value:
=
{
0, 0, 28, 56, 84, 112, 140, 140, 168, 168, 196, 196, 196, 196, 196, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224,
252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252
}

Translates the 'bits_needed' to the appropriate 'row' offset for use with can_pack table.

◆ simple9_shift_table

const uint32_t JASS::compress_integer_simple_9::simple9_shift_table
static protected
Initial value:
=
{
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
0, 4, 8, 12, 16, 20, 24, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
0, 5, 10, 15, 20, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25,
0, 7, 14, 21, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
0, 9, 18, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
0, 14, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
0, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28
}

Number of bits to shift when packing - 9 rows for simple-9.
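
Each of the nine rows above matches the expression min(column, integers per word) * bits per integer, so the shift grows by one integer width per column and then stays constant once the word is full. The sketch below reproduces a single entry on that basis; sketch_shift is a hypothetical helper, and the formula is inferred from the values rather than taken from the JASS source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper reproducing one simple9_shift_table entry.
// row:    0..8 selects the layout (28 * 1-bit down to 1 * 28-bit).
// column: 0..27, the position of the integer within the word.
uint32_t sketch_shift(uint32_t row, uint32_t column)
{
    assert(row < 9 && column < 28);
    static const uint32_t bits[9] = {1, 2, 3, 4, 5, 7, 9, 14, 28};
    static const uint32_t ints_packed[9] = {28, 14, 9, 7, 5, 4, 3, 2, 1};
    return std::min(column, ints_packed[row]) * bits[row];
}
```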

◆ simple9_table

const compress_integer_simple_9::lookup JASS::compress_integer_simple_9::simple9_table
static protected
Initial value:
=
{
{1, 28, 0xFFFFFFF},
{2, 14, 0x3FFF},
{3, 9, 0x1FF},
{4, 7, 0x7F},
{5, 5, 0x1F},
{7, 4, 0xF},
{9, 3, 0x7},
{14, 2, 0x3},
{28, 1, 0x1}
}

The table mapping bits to selectors and masks.

◆ table_row

const uint32_t JASS::compress_integer_simple_9::table_row
static protected
Initial value:
=
{
0, 1, 2, 3, 4, 4, 5, 5,
6, 6, 6, 6, 6, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 8, 8
}

Given the number of bits, which row of simple9_table should be used?


The documentation for this class was generated from the following files: