The CUNY Data Science and Applied Topology Reading Group is joint between the Mathematics and Computer Science programmes. We meet Fridays 11.45 -- 12.45 in GC 3209. You can contact us at email@example.com.
Our plan is primarily to read and discuss seminal papers in data science, applied topology, and topological data analysis. At each seminar, one participant takes responsibility for presenting a paper and preparing items for discussion. We expect to be able to invite external speakers occasionally.
The current schedule can be found here.
We will be sending out announcements through a mailing list; you can subscribe here.
- Mikael Vejdemo-Johansson, Computer Science Programme, CUNY Graduate Center; Department of Mathematics, CUNY College of Staten Island
- Azita Mayeli, Mathematics Programme, CUNY Graduate Center; Department of Mathematics, CUNY Queensborough Community College
We have compiled a list of papers that might be interesting to present.
We will talk about the upcoming semester, look for volunteers to give talks on specific papers and give a refresher overview of persistent homology.
Topological Structure of Linear Manifold Clustering
In topological data analysis, the first step is the construction of a simplicial complex from a discrete point set D sampled from some manifold. In this paper, we present an algorithm for the efficient computation of such a simplicial complex. The algorithm uses the clustering structure of the point set, comprised of subspace clusters, to speed up the complex construction while keeping the relevant topological invariants of the underlying sampled manifold. Experiments show that the proposed construction algorithm provides smaller complexes with less noise, which gives a better homological picture than other construction methods, as well as improved construction performance and interpretability of the topological invariants on a geometrical level.
On the Metric Distortion of Embedding Persistence Diagrams into Reproducing Kernel Hilbert Spaces
Persistence Diagrams (PDs) are important feature descriptors in Topological Data Analysis. Due to the nonlinearity of the space of PDs equipped with their diagram distances, most of the recent attempts at using PDs in Machine Learning have been done through kernel methods, i.e., embeddings of PDs into Reproducing Kernel Hilbert Spaces (RKHS), in which all computations can be performed easily. Since PDs enjoy theoretical stability guarantees for the diagram distances, the metric properties of a kernel k, i.e., the relationship between the RKHS distance d_k and the diagram distances, are of central interest for understanding if the PD guarantees carry over to the embedding. We study the possibility of embedding PDs into RKHS with bi-Lipschitz maps. In particular, we show that when the RKHS is infinite dimensional, any lower bound must depend on the cardinalities of the PDs, and that when the RKHS is finite dimensional, finding a bi-Lipschitz embedding is impossible, even when restricting the PDs to have bounded cardinalities.
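For readers preparing for this talk, the bi-Lipschitz condition mentioned in the abstract can be written out as follows. This is the standard definition, not a formula quoted from the paper; the symbols \Phi (the feature map into the RKHS \mathcal{H}), d (a diagram distance) and the constants A, B are notation assumed here for illustration.

```latex
% Assumed notation: \Phi maps a persistence diagram into the RKHS \mathcal{H},
% d is a diagram distance, and d_k is the induced RKHS distance.
\[
  d_k(D, D') \;=\; \lVert \Phi(D) - \Phi(D') \rVert_{\mathcal{H}},
  \qquad
  A \, d(D, D') \;\le\; d_k(D, D') \;\le\; B \, d(D, D')
  \quad \text{for all diagrams } D, D',
\]
% with constants 0 < A \le B that do not depend on D and D'.
```

In this notation, the upper bound corresponds to stability of the embedding, and the lower bound A is the one the abstract says must depend on the cardinalities of the diagrams in the infinite-dimensional case.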
Multiple hypothesis testing in persistent homology
We propose a general null model for persistent homology barcodes from a point cloud, in order to test, for example, acyclicity in simplicial complexes generated from point clouds. One advantage of the null model we propose is that it can be generated efficiently and applies to a broad set of hypothesis testing procedures. The second key idea in this talk is using the null model to address multiple hypothesis testing via control of family-wise error rates and false discovery rates.
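As background for the two error-control notions named in the abstract, here is a minimal Python sketch. It is not the speaker's method: it assumes a Monte Carlo p-value computed from simulated null statistics, then applies the textbook Bonferroni correction (family-wise error rate) and the Benjamini-Hochberg procedure (false discovery rate). All function names and the example p-values are hypothetical.

```python
def empirical_p_value(observed, null_samples):
    """Monte Carlo p-value: share of null statistics at least as large as observed.

    The +1 terms count the observed value itself, so the result is never 0.
    """
    exceed = sum(1 for s in null_samples if s >= observed)
    return (1 + exceed) / (1 + len(null_samples))


def bonferroni(p_values, alpha=0.05):
    """Family-wise error rate control: reject p_i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """False discovery rate control via the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis whose p-value is among the k smallest.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, e.g. one per bar in a barcode.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print("Bonferroni rejections:", sum(bonferroni(p)))
    print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(p)))
```

On the same p-values, the Benjamini-Hochberg procedure rejects at least as many hypotheses as Bonferroni, which reflects the weaker guarantee it provides.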
Misinformation Data Science
Obayashi, Hiraoka, Kimura -- Persistence diagrams with linear machine learning models
We will read and discuss the paper by Obayashi, Hiraoka, and Kimura, "Persistence diagrams with linear machine learning models".
NOTA BENE: This talk will take place at 10am.