The CUNY Data Science and Applied Topology Reading Group is joint between the Mathematics and Computer Science programmes. We meet Fridays 11.45 -- 12.45 in GC 4419. You can contact us at cunygc@appliedtopology.nyc.

Our plan is to primarily read and discuss seminal papers in data science, in applied topology and in topological data analysis. Each seminar one participant takes the responsibility to present a paper and prepare items for discussion. We expect occasionally to be able to invite external speakers.

Schedule

Current schedule can be found here.

We will be sending out announcements through a mailing list; you can subscribe here.

Organizers

  • Mikael Vejdemo-Johansson, Computer Science Programme, CUNY Graduate Center; Department of Mathematics, CUNY College of Staten Island
  • Azita Mayeli, Mathematics Programme, CUNY Graduate Center; Department of Mathematics, CUNY Queensborough Community College

Suggested papers

We have compiled a list of papers that might be interesting to present.

Schedule

A new data structure for the epsilon-approximate range emptiness problem

Constructing space-efficient data structures for answering approximate membership queries is a well-studied problem and one that is increasingly important in the era of big data. That is, given a set $S$ of $n$ elements from a large universe $[U] = { 0,2, ..., U -1 }$, preprocess $S$ so as to answer queries of the form, is $x \in S$? Since $|S|$ is prohibitively large to fit directly into RAM, the approach is to give a sketch, or succinct summary, of the set that, while removing elements from S, gives good answers to specified queries. Hence, we allow such data structures to return a small fraction false positives in return for significant savings in space.

This talk will review a new data structure, along with it techniques and lower bounds that generalizes the functionality from single point queries to 1-D queries of intervals of length $L$. Known as the $\epsilon$-approximate range emptiness problem, the paper by Mayank Goswami, Allan Grølund, Kasper Green Larsen, and Rasmus Pagh show that the space/error trade-off of a naive approach of querying a traditional Bloom filter $L$ times cannot be improved asymptotically: Any data structure for answering approximate range emptiness queries on intervals of length up to $L$ with false positive probability $\epsilon$, must use space $\Omega(n \lg(L/\epsilon)) − O(n)$ bits. While this does seem unfortunate, their data structure does answer such queries in $O(1)$ time.


Pages

  • Data Science and Applied Topology Seminar
  • Reading list