The CUNY Data Science and Applied Topology Reading Group is joint between the Mathematics and Computer Science programmes. We meet Fridays 11.45 -- 12.45 in GC 4419. You can contact us at cunygc@appliedtopology.nyc.

Our plan is to primarily read and discuss seminal papers in data science, in applied topology and in topological data analysis. Each seminar one participant takes the responsibility to present a paper and prepare items for discussion. We expect occasionally to be able to invite external speakers.

## Schedule

Current schedule can be found here.

We will be sending out announcements through a mailing list; you can subscribe here.

## Organizers

- Mikael Vejdemo-Johansson, Computer Science Programme, CUNY Graduate Center; Department of Mathematics, CUNY College of Staten Island
- Azita Mayeli, Mathematics Programme, CUNY Graduate Center; Department of Mathematics, CUNY Queensborough Community College

## Suggested papers

We have compiled a list of papers that might be interesting to present.

# Schedule

### Semester Introduction

We will start off this semester with a light introduction to the basic techniques and ideas in Topological Data Analysis, and a discussion of possible content for this semester's seminars.

### Piecewise Linear Manifold Clustering

Clustering is commonly viewed as an ill-defined problem because there is not clear performance criteria which can tell about relevant element arrangements withing the clustering. Devising such evaluation criteria usually based on geometrical and/or structural composition of the original data. We present a novel algorithm that incorporates a topological structure and information-theoretical description of the clustered data to recognize and evaluate stable nonlinear structures in form of a piecewise linear manifolds as a meta-extension to common statistical clustering algorithms.

### Multiple hypothesis testing with persistent homology

Based on joint work with Sayan Mukherjee, we will talk about the need for multiple hypothesis corrections - both for Family-Wise Error Rates (decrease the risk of even a single false positive) and for False Discovery Rates (fix the rate of false positives at a chosen level) and how to correct both types in hypothesis testing using persistent homology.

### Topology and the Machine Mind

Using standard tools from topological data analysis and basic notions from topology, we explore new ways of understanding how a machine-learning algorithm utilizes the data it is given. This understanding can aid in the computability of a model and offers insight into additional, targeted input for improving its efficacy.

### Computing Minimal Presentations and Bigraded Betti Numbers of 2-Parameter Persistent Homology

Motivated by applications to topological data analysis, we give an efficient algorithm for computing a (minimal) presentation of a bigraded $K[x,y]$-module $M$, where $K$ is a field. The algorithm takes as input a short chain complex of free modules $\displaystyle F_2\xrightarrow{\partial_2} F_1 \xrightarrow{\partial_1} F_0$ such that $M\cong \ker{\partial_1}/\im{\partial_2}$. It runs in time $O(\sum_i |F_i|^3)$ and requires $O(\sum_i |F_i|^2)$ memory, where $|F_i|$ denotes the size of a basis of $F_i$. We observe that, given the presentation computed by our algorithm, the bigraded Betti numbers of $M$ are readily computed. These algorithms have been implemented in RIVET, a software tool for the visualization and analysis of two-parameter persistent homology. In experiments on topological data analysis problems, our approach outperforms the standard computational commutative algebra packages Singular and Macaulay2 by a wide margin.

### Cellular Sheaves, Discrete Hodge Theory, and Applications

Cellular sheaves are a discrete and computable instantiation of the theory of sheaves over cell complexes, and naturally model many situations where data of varying types is parameterized by a space. Assigning inner products to the stalks of a cellular sheaf allows us to apply the constructions of discrete Hodge theory to its cochain complex. We obtain generalizations of the discrete Hodge Laplacians, including the ubiquitous graph Laplacian. These operators have interesting topological and geometric properties, and allow us to use the local-to-global structure of cellular sheaves in real-world systems where robustness is essential. This talk will outline the construction of cellular sheaves and their Laplacians, and sketch various avenues for their application to realistic engineering and scientific problems.

### Singularity in Data Analysis

Statistical data by their very nature are indeterminate in the sense that if one repeated the process of collecting the data the new data set would be somewhat different from the original. Therefore, a statistical method, f, taking a data set x to a point in some space F, should be stable at x: Small perturbations in x should result in a small change in f(x). Otherwise, f is useless at x or -- and this is important -- near x. So one doesn't want f to have "singularities", a data set x such that the the limit of f(y) as y approaches x doesn't exist. (Yes, the same issue arises elsewhere in applied math.)

However, broad classes of statistical methods have topological obstructions of continuity: They must have singularities. In this talk I will show why and give lower bounds on the Hausdorff dimension, even Hausdorff measure, of the set of singularities of such methods. I will give examples.

### TBA

TBA

### TBA

TBA

### TBA

TBA

### TBA

TBA

### TBA

TBA

### TBA

TBA