A new data structure for the epsilon-approximate range emptiness problem

Constructing space-efficient data structures for answering approximate membership queries is a well-studied problem, and one that is increasingly important in the era of big data. That is, given a set $S$ of $n$ elements from a large universe $[U] = \{0, 1, \ldots, U-1\}$, preprocess $S$ so as to answer queries of the form: is $x \in S$? Since $S$ is often too large to fit directly into RAM, the approach is to build a sketch, or succinct summary, of the set that, while discarding the elements of $S$ themselves, still gives good answers to the specified queries. Hence, we allow such data structures to return a small fraction of false positives in exchange for significant savings in space.

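To make the point-query setting concrete, here is a minimal Bloom filter sketch in Python; the class name, sizing constants, and hashing scheme are illustrative assumptions, not taken from the paper. It stores roughly $n \lg(1/\epsilon) / \ln 2$ bits, never reports a false negative, and reports a false positive with probability about $\epsilon$.

```python
import hashlib
import math


class BloomFilter:
    """A lossy sketch of a set S: answers "is x in S?" with no false
    negatives and a false positive probability of roughly epsilon."""

    def __init__(self, n, epsilon):
        # Standard sizing: about n * lg(1/epsilon) / ln(2) bits and
        # lg(1/epsilon) hash functions.
        self.m = max(8, math.ceil(n * math.log2(1 / epsilon) / math.log(2)))
        self.k = max(1, round(math.log2(1 / epsilon)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, x):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(repr(x).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # make h2 odd
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, x):
        for p in self._positions(x):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, x):
        # True for every x in S; true with probability ~epsilon otherwise.
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(x))


# Tiny usage example.
bf = BloomFilter(n=1000, epsilon=0.01)
for x in (17, 42, 99):
    bf.add(x)
print(bf.might_contain(42))  # True: no false negatives
print(bf.might_contain(7))   # usually False; True with probability ~0.01
```
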
This talk will review a new data structure, along with its techniques and lower bounds, that generalizes this functionality from single point queries to one-dimensional queries over intervals of length $L$, a setting known as the $\epsilon$-approximate range emptiness problem. The paper, by Mayank Goswami, Allan Grønlund, Kasper Green Larsen, and Rasmus Pagh, shows that the space/error trade-off of the naive approach of querying a traditional Bloom filter $L$ times cannot be improved asymptotically: any data structure answering approximate range emptiness queries on intervals of length up to $L$ with false positive probability $\epsilon$ must use $\Omega(n \lg(L/\epsilon)) - O(n)$ bits of space. While this may seem unfortunate, their data structure matches this bound and answers such queries in $O(1)$ time.
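
As a rough illustration of where the $n \lg(L/\epsilon)$ term comes from, the naive reduction simply queries a point-membership sketch once for every element of the interval. The helper below is a hypothetical sketch that assumes the `BloomFilter` class from above: by a union bound, the range false positive probability is at most $L$ times the per-point rate, so hitting a target $\epsilon$ per range query forces a per-point rate of about $\epsilon / L$, i.e. roughly $n \lg(L/\epsilon)$ bits. It also costs $O(L)$ time per query, which is exactly what the paper's $O(1)$-time structure avoids.

```python
def naive_range_nonempty(bf, a, b):
    # Report "possibly non-empty" if any point in [a, b] might lie in S.
    # With per-point false positive rate eps_point, a union bound gives a
    # range false positive rate of at most (b - a + 1) * eps_point, so a
    # target rate of eps per range query needs eps_point ~ eps / L.
    return any(bf.might_contain(x) for x in range(a, b + 1))
```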