Package org.sindice.siren.index.codecs.block

Abstraction over the encoding and decoding of the block-based posting format.

Introduction

This package contains the abstract API for encoding (BlockIndexOutput) and decoding (BlockIndexInput) the block-based posting format. It also includes algorithms for compressing blocks of integers into bytes and decompressing them back.

Block-Based Posting Format

The block-based posting format encodes a posting list as a sequence of blocks. A block is composed of a header, i.e., metadata, and some content, i.e., a byte array. While the content of a block can be anything, it usually contains a sequence of integers. In certain cases it can be composed of multiple blocks of integers, for example to create interleaved blocks. The size of a block can be either variable or fixed.
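To make the header/content split concrete, here is a minimal sketch of a posting list cut into fixed-size blocks, each serialized as a header followed by its content. The classes Block and BlockLayoutSketch and the exact layout ([value count][byte length][content]) are hypothetical illustrations, not the on-disk format of this package, and the integers are stored uncompressed for clarity.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical block: a header (here, just the value count) plus its content bytes. */
final class Block {
  final int valueCount; // header metadata: number of integers stored in the content
  final byte[] content; // content: the encoded integers

  Block(int valueCount, byte[] content) {
    this.valueCount = valueCount;
    this.content = content;
  }
}

/** Hypothetical helper showing a posting list laid out as a sequence of fixed-size blocks. */
final class BlockLayoutSketch {
  /** Splits a posting list into blocks of blockSize values; the last block may be shorter. */
  static List<Block> toBlocks(int[] postings, int blockSize) {
    List<Block> blocks = new ArrayList<>();
    for (int start = 0; start < postings.length; start += blockSize) {
      int n = Math.min(blockSize, postings.length - start);
      ByteBuffer buf = ByteBuffer.allocate(n * Integer.BYTES);
      for (int i = 0; i < n; i++) {
        buf.putInt(postings[start + i]); // plain 4-byte ints; a real codec would compress here
      }
      blocks.add(new Block(n, buf.array()));
    }
    return blocks;
  }

  /** Writes each block as [valueCount][byteLength][content bytes]. */
  static byte[] serialize(List<Block> blocks) {
    int total = 0;
    for (Block b : blocks) {
      total += 2 * Integer.BYTES + b.content.length;
    }
    ByteBuffer out = ByteBuffer.allocate(total);
    for (Block b : blocks) {
      out.putInt(b.valueCount);
      out.putInt(b.content.length);
      out.put(b.content);
    }
    return out.array();
  }

  /** Reads the block sequence back into a flat posting list. */
  static int[] deserialize(byte[] data) {
    ByteBuffer in = ByteBuffer.wrap(data);
    List<Integer> values = new ArrayList<>();
    while (in.hasRemaining()) {
      int count = in.getInt();
      int byteLength = in.getInt(); // lets a reader skip a whole block without decoding it
      ByteBuffer content = in.slice();
      content.limit(byteLength);
      for (int i = 0; i < count; i++) {
        values.add(content.getInt());
      }
      in.position(in.position() + byteLength);
    }
    return values.stream().mapToInt(Integer::intValue).toArray();
  }
}
```

Storing the content length in the header is one possible design choice: it lets a reader jump over blocks it does not need, which matters once the content is compressed and its size varies from block to block.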

Block Compression

A BlockCompressor compresses a list of integers into a byte array in one batch. A BlockIndexOutput must ensure that the given byte array is large enough to hold the compressed data. The method BlockCompressor.maxCompressedSize(int) can be used to estimate the maximum size of a compressed block of values.
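The contract between a compressor and its caller can be illustrated with a minimal sketch. IntBlockCompressor, VIntSketchCompressor and BlockWriterSketch are hypothetical names, not the signatures of BlockCompressor or BlockIndexOutput; variable-byte (VInt) encoding is used here only as a simple, well-known scheme, and the caller sizes its buffer from the worst-case bound before compressing.

```java
import java.util.Arrays;

/** Hypothetical interface playing the role of a block compressor; not this package's API. */
interface IntBlockCompressor {
  /** Upper bound, in bytes, on the compressed size of n integers. */
  int maxCompressedSize(int n);

  /** Compresses values[0..n) into out and returns the number of bytes written. */
  int compress(int[] values, int n, byte[] out);
}

/** Variable-byte encoding: 7 payload bits per byte, high bit set on every byte but the last. */
final class VIntSketchCompressor implements IntBlockCompressor {
  @Override
  public int maxCompressedSize(int n) {
    return 5 * n; // a 32-bit int needs at most ceil(32 / 7) = 5 bytes
  }

  @Override
  public int compress(int[] values, int n, byte[] out) {
    int pos = 0;
    for (int i = 0; i < n; i++) {
      int v = values[i];
      while ((v & ~0x7F) != 0) {
        out[pos++] = (byte) ((v & 0x7F) | 0x80); // low 7 bits, continuation bit set
        v >>>= 7; // unsigned shift so negative values also terminate
      }
      out[pos++] = (byte) v; // final byte, continuation bit clear
    }
    return pos;
  }
}

/** Hypothetical writer playing the BlockIndexOutput role: it owns the buffer sizing. */
final class BlockWriterSketch {
  static byte[] writeBlock(IntBlockCompressor compressor, int[] values) {
    byte[] buffer = new byte[compressor.maxCompressedSize(values.length)];
    int length = compressor.compress(values, values.length, buffer);
    return Arrays.copyOf(buffer, length); // keep only the bytes actually written
  }
}
```

Sizing the buffer once from the worst-case bound keeps bounds checks out of the inner encoding loop, at the cost of temporarily over-allocating; in practice a writer would reuse one such buffer across blocks.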

A BlockDecompressor decompresses a compressed byte array into a list of integers in one batch. A BlockIndexInput must ensure that the given integer array is large enough to hold the uncompressed data.
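The decoding side can be sketched the same way. IntBlockDecompressor, VIntSketchDecompressor and BlockReaderSketch are hypothetical names, not this package's API, and variable-byte decoding is chosen only for illustration. The sketch assumes the number of values n is known in advance, for example from the block header, so that the reader can allocate an integer array of the right size before decompressing.

```java
/** Hypothetical interface playing the role of a block decompressor; not this package's API. */
interface IntBlockDecompressor {
  /** Decodes n integers from in into out[0..n) and returns the number of bytes consumed. */
  int decompress(byte[] in, int n, int[] out);
}

/** Variable-byte decoding: 7 payload bits per byte, high bit means "more bytes follow". */
final class VIntSketchDecompressor implements IntBlockDecompressor {
  @Override
  public int decompress(byte[] in, int n, int[] out) {
    int pos = 0;
    for (int i = 0; i < n; i++) {
      int value = 0;
      int shift = 0;
      byte b;
      do {
        b = in[pos++];
        value |= (b & 0x7F) << shift; // append the next 7 bits, least significant first
        shift += 7;
      } while ((b & 0x80) != 0);
      out[i] = value;
    }
    return pos;
  }
}

/** Hypothetical reader playing the BlockIndexInput role: it owns the output array sizing. */
final class BlockReaderSketch {
  static int[] readBlock(IntBlockDecompressor decompressor, byte[] compressed, int n) {
    int[] values = new int[n]; // too small an array would overflow inside decompress
    decompressor.decompress(compressed, n, values);
    return values;
  }
}
```

Returning the number of consumed bytes lets a caller continue with whatever follows the block in the same byte array.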

Two block compression algorithms are implemented:

Concurrent Access

During the creation of a new index segment, terms are processed sequentially. This ensures that:

During query processing, multiple terms are processed in parallel, and the same BlockIndexInput is used to decode multiple postings lists. Safe concurrent access to the index files is ensured only if a different BlockIndexInput.BlockReader is used for each postings list. The method BlockIndexInput.getBlockReader() provides a BlockIndexInput.BlockReader which contains a clone of the underlying IndexInput.
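Why one reader per postings list is enough can be shown with a minimal sketch. SketchInput and SketchBlockReader are hypothetical stand-ins for IndexInput and BlockIndexInput.BlockReader, and the sketch assumes, as with Lucene's IndexInput, that a clone reads the same underlying bytes but keeps its own file pointer. Each reader clones the shared input once, so threads decoding different lists never move each other's cursor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for an index input: shared read-only bytes plus a private file pointer. */
final class SketchInput implements Cloneable {
  private final byte[] data; // shared between clones, never mutated
  private int pointer;       // per-instance cursor: the only mutable state

  SketchInput(byte[] data) {
    this.data = data;
  }

  void seek(int position) {
    pointer = position;
  }

  byte readByte() {
    return data[pointer++];
  }

  /** Returns a copy that reads the same bytes but moves its own cursor. */
  @Override
  public SketchInput clone() {
    SketchInput copy = new SketchInput(data);
    copy.pointer = pointer;
    return copy;
  }
}

/** Hypothetical per-postings-list reader that owns a private clone of the shared input. */
final class SketchBlockReader {
  private final SketchInput in;

  SketchBlockReader(SketchInput shared) {
    this.in = shared.clone();
  }

  /** Sums length bytes from start; a stand-in for decoding one postings list. */
  int sum(int start, int length) {
    in.seek(start);
    int total = 0;
    for (int i = 0; i < length; i++) {
      total += in.readByte();
    }
    return total;
  }
}

final class ConcurrentReadSketch {
  /** Reads several postings lists in parallel, one reader (and so one cursor) per list. */
  static int[] readInParallel(SketchInput shared, int[][] ranges) {
    ExecutorService pool = Executors.newFixedThreadPool(ranges.length);
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (int[] range : ranges) {
        SketchBlockReader reader = new SketchBlockReader(shared); // clone taken on this thread
        futures.add(pool.submit(() -> reader.sum(range[0], range[1])));
      }
      int[] sums = new int[futures.size()];
      for (int i = 0; i < sums.length; i++) {
        sums[i] = futures.get(i).get();
      }
      return sums;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

If the workers instead shared a single SketchInput, one thread's seek could move the cursor while another was in the middle of reading, which is exactly the hazard a per-list reader avoids.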

Copyright © 2014. All rights reserved.