The CUNY Data Science and Applied Topology Reading Group is joint between the Mathematics and Computer Science programmes. We meet Fridays 11.45 -- 12.45 in GC 4419. You can contact us at

Our plan is to primarily read and discuss seminal papers in data science, in applied topology and in topological data analysis. Each seminar one participant takes the responsibility to present a paper and prepare items for discussion. We expect occasionally to be able to invite external speakers.


Current schedule can be found here.

We will be sending out announcements through a mailing list; you can subscribe here.


  • Mikael Vejdemo-Johansson, Computer Science Programme, CUNY Graduate Center; Department of Mathematics, CUNY College of Staten Island
  • Azita Mayeli, Mathematics Programme, CUNY Graduate Center; Department of Mathematics, CUNY Queensborough Community College

Suggested papers

We have compiled a list of papers that might be interesting to present.


A new data structure for the epsilon-approximate range emptiness problem

Constructing space-efficient data structures for answering approximate membership queries is a well-studied problem and one that is increasingly important in the era of big data. That is, given a set $S$ of $n$ elements from a large universe $[U] = { 0,2, ..., U -1 }$, preprocess $S$ so as to answer queries of the form, is $x \in S$? Since $|S|$ is prohibitively large to fit directly into RAM, the approach is to give a sketch, or succinct summary, of the set that, while removing elements from S, gives good answers to specified queries. Hence, we allow such data structures to return a small fraction false positives in return for significant savings in space.

This talk will review a new data structure, along with it techniques and lower bounds that generalizes the functionality from single point queries to 1-D queries of intervals of length $L$. Known as the $\epsilon$-approximate range emptiness problem, the paper by Mayank Goswami, Allan Grølund, Kasper Green Larsen, and Rasmus Pagh show that the space/error trade-off of a naive approach of querying a traditional Bloom filter $L$ times cannot be improved asymptotically: Any data structure for answering approximate range emptiness queries on intervals of length up to $L$ with false positive probability $\epsilon$, must use space $\Omega(n \lg(L/\epsilon)) − O(n)$ bits. While this does seem unfortunate, their data structure does answer such queries in $O(1)$ time.


  • Data Science and Applied Topology Seminar
  • Reading list