Our goal is to run a query that is best described in pseudo-SQL:
SELECT fact1, count(*) FROM FactsTable
WHERE (fact1 > 'value' AND fact2 BETWEEN 10 AND 100) OR fact3 = 'valueX'
GROUP BY fact1
HAVING COUNT(*) > 100
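The proposed “query plan” for this type of query is a two-step process: first, the WHERE clause is reduced to a single query bit-vector; second, GROUP BY and HAVING are computed against that vector.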
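A minimal sketch of the first step, assuming it turns the WHERE clause into one query vector. Each field is modeled as a hypothetical inverted index (a `std::map` from each distinct value to an uncompressed bit-vector). Range predicates such as `fact1 > 'value'` and `BETWEEN` are answered by OR-ing the vectors of all matching values, and the partial results are then combined with AND/OR exactly as the WHERE clause prescribes. The names `BitVec`, `FieldIndex`, `index_record` and `where_vector` are illustrative only; a real system would use compressed bit-vectors rather than a plain `std::vector<std::uint64_t>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical uncompressed bit-vector: bit i is set when record i has the fact.
using BitVec = std::vector<std::uint64_t>;

// Hypothetical inverted index for one field: distinct value -> bit-vector.
template <class Key>
using FieldIndex = std::map<Key, BitVec>;

BitVec make_bv(std::size_t nbits) { return BitVec((nbits + 63) / 64, std::uint64_t{0}); }

void set_bit(BitVec& bv, std::size_t i) { bv[i / 64] |= std::uint64_t{1} << (i % 64); }

bool test_bit(const BitVec& bv, std::size_t i) { return (bv[i / 64] >> (i % 64)) & 1u; }

void or_into(BitVec& dst, const BitVec& src) {
    assert(dst.size() == src.size());
    for (std::size_t w = 0; w < dst.size(); ++w) dst[w] |= src[w];
}

void and_into(BitVec& dst, const BitVec& src) {
    assert(dst.size() == src.size());
    for (std::size_t w = 0; w < dst.size(); ++w) dst[w] &= src[w];
}

// Marks record `rec` as having `value` in this field.
template <class Key>
void index_record(FieldIndex<Key>& idx, const Key& value, std::size_t rec, std::size_t nbits) {
    auto it = idx.find(value);
    if (it == idx.end()) it = idx.emplace(value, make_bv(nbits)).first;
    set_bit(it->second, rec);
}

// Step one: WHERE (fact1 > v1 AND fact2 BETWEEN lo AND hi) OR fact3 = v3.
BitVec where_vector(const FieldIndex<std::string>& f1, const std::string& v1,
                    const FieldIndex<int>& f2, int lo, int hi,
                    const FieldIndex<std::string>& f3, const std::string& v3,
                    std::size_t nbits) {
    BitVec q = make_bv(nbits);
    for (auto it = f1.upper_bound(v1); it != f1.end(); ++it)
        or_into(q, it->second);  // fact1 > v1: OR of all larger values

    BitVec range = make_bv(nbits);
    for (auto it = f2.lower_bound(lo); it != f2.end() && it->first <= hi; ++it)
        or_into(range, it->second);  // fact2 BETWEEN lo AND hi (inclusive)
    and_into(q, range);

    auto eq = f3.find(v3);
    if (eq != f3.end()) or_into(q, eq->second);  // OR fact3 = v3
    return q;
}
```

The sorted `std::map` keeps range predicates cheap: `upper_bound` and `lower_bound` jump straight to the first matching value, so only the vectors inside the range are touched.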
All we need to do now is to iteratively read every bit-vector corresponding to fact1. The number of vectors depends on the number of distinct values (facts) in the fact1 field, in other words on the cardinality of fact1. It can be quite large, but many intermediate results are likely to be filtered out by the HAVING criterion.
To perform GROUP BY with HAVING, we propose a logical AND between the query vector and each fact1 vector. HAVING is then trivial: count the bits in each AND product vector and keep only the groups whose count exceeds the threshold (100 in the query above).
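The GROUP BY and HAVING step can be sketched in the same spirit. This assumes a hypothetical in-memory `fact1_index` (each distinct fact1 value mapped to a bit-vector the same length as the query vector) in place of deserialized BLOBs; `group_by_having` and `count_bits` are illustrative names, not part of any library. The count is taken word by word on `bv[w] & query[w]`, which yields the bit count of the AND product without materializing it.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical uncompressed bit-vector, 64 records per word.
using BitVec = std::vector<std::uint64_t>;

// Population count of a whole bit-vector.
std::size_t count_bits(const BitVec& bv) {
    std::size_t n = 0;
    for (std::uint64_t w : bv) n += std::bitset<64>(w).count();
    return n;
}

// GROUP BY fact1 HAVING COUNT(*) > min_count, given the query vector from step one.
// One pass per distinct fact1 value, so the work grows with fact1 cardinality.
std::vector<std::pair<std::string, std::size_t>>
group_by_having(const std::map<std::string, BitVec>& fact1_index,
                const BitVec& query, std::size_t min_count) {
    std::vector<std::pair<std::string, std::size_t>> result;
    for (const auto& kv : fact1_index) {
        const BitVec& bv = kv.second;
        assert(bv.size() == query.size());
        std::size_t cnt = 0;
        for (std::size_t w = 0; w < bv.size(); ++w)
            cnt += std::bitset<64>(bv[w] & query[w]).count();  // GROUP BY: AND, then count
        if (cnt > min_count) result.emplace_back(kv.first, cnt);  // HAVING filter
    }
    return result;
}
```

With `min_count` set to 100 this mirrors `HAVING COUNT(*) > 100`; groups whose AND product is too sparse are dropped immediately and never reach the result.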
// deserialize the BLOB of the current fact1 vector into *pbv
*pbv &= bv_query; // GROUP BY: AND with the query vector