The CUNY Data Science and Applied Topology Reading Group is run jointly by the Mathematics and Computer Science programmes. We meet on Fridays, 11.45 -- 12.45, in GC 3209. You can contact us at cunygc@appliedtopology.nyc.

Our plan is primarily to read and discuss seminal papers in data science, applied topology and topological data analysis. At each seminar, one participant takes responsibility for presenting a paper and preparing items for discussion. We also expect to invite external speakers occasionally.

## Schedule

The current schedule can be found here.

We will be sending out announcements through a mailing list; you can subscribe here.

## Organizers

- Mikael Vejdemo-Johansson, Computer Science Programme, CUNY Graduate Center; Department of Mathematics, CUNY College of Staten Island
- Azita Mayeli, Mathematics Programme, CUNY Graduate Center; Department of Mathematics, CUNY Queensborough Community College
- Chao Chen, Computer Science Programme, CUNY Graduate Center; Department of Computer Science, CUNY Queens College

## Suggested papers

We have compiled a list of papers that might be interesting to present.

# Schedule

### TBD

TBD

### TBD

TBD

### Statistical Topological Data Analysis using Persistence Landscapes

TBD

### Donald Thevalingam

### Computing Persistent Homology

TBD

### Tom Fallon

Paper discussion

### Functional Data Analysis Using a Topological Summary Statistic: The Smooth Euler Characteristic Transform

We introduce a novel statistic, the smooth Euler characteristic transform (SECT), which is designed to integrate shape information into regression models by representing shapes and surfaces as a collection of curves. Its construction is based on theory from topological data analysis (TDA). Due to its well-defined inner product structure, the SECT can be used in a wider range of functional and nonparametric modeling approaches than previously proposed topological summary statistics. We provide mathematical properties of this statistic, notably its injectivity, which has implications for statistical sufficiency.

We illustrate the utility of the SECT in a radiomics context by showing that topological quantifications of tumors, assayed by magnetic resonance imaging (MRI), are better predictors of clinical outcomes in patients with glioblastoma multiforme (GBM). We show that topological features of tumors captured by the SECT alone explain more of the variance in patient survival than gene expression, volumetric features and morphometric features.
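To make the construction concrete, here is a minimal sketch of one direction of the transform for a shape given as a graph (vertices and edges) embedded in $\mathbb{R}^d$, assuming that the smooth Euler characteristic curve is obtained by centering the Euler characteristic curve and integrating it. The helper names, the uniform threshold grid and the Riemann-sum integration are our own illustrative choices, not the authors' code. The full SECT repeats this over many directions on the sphere, and a triangulated surface would also count its triangles with sign $+1$.

```python
import numpy as np

def euler_characteristic_curve(vertices, edges, direction, thresholds):
    """Hypothetical helper: Euler characteristic of sublevel sets of a graph.

    chi(t) = #{vertices with height <= t} - #{edges with height <= t},
    where an edge enters the sublevel set once both endpoints have.
    """
    heights = vertices @ direction  # height of each vertex along the direction
    edge_heights = np.maximum(heights[edges[:, 0]], heights[edges[:, 1]])
    return np.array([np.sum(heights <= t) - np.sum(edge_heights <= t)
                     for t in thresholds])

def smooth_euler_characteristic_curve(ecc, thresholds):
    """Hypothetical helper: center the curve, then integrate it.

    Assumes a uniform threshold grid and uses a left Riemann sum.
    """
    dt = thresholds[1] - thresholds[0]
    return np.cumsum(ecc - ecc.mean()) * dt
```

For the boundary of the unit square seen from direction $(0, 1)$ with thresholds $0$ and $1$, the Euler characteristic curve is $(1, 0)$ (one contractible piece, then a circle), and the smoothed curve is $(0.5, 0)$.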

### Fréchet means for distributions of persistence diagrams

Paper discussion

### On Characterizing the Capacity of Neural Networks using Algebraic Topology

Paper discussion.

### Fibers of Failure: using Mapper to classify prediction failures

The Mapper algorithm can produce intrinsic topological models of arbitrary data in high dimensions. Through a statistical adaptation of the Nerve lemma, the algorithm can be seen to reproduce the topology and parts of the geometry of the data source, under assumptions of dense sampling and good parameter choices. In this talk, we will describe how, by a careful choice of the Mapper parameters, the resulting topological model can be guaranteed to separate the inputs to a predictive process by prediction error, grouping high-error and low-error regions separately. This approach produces a diagnostic process in which local failure modes can be classified, feeding into either further model development or a local correction term that improves predictive performance. We have successfully applied this approach to temperature prediction in steel furnaces.
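The basic Mapper construction the talk builds on can be sketched in a few lines: cover the range of a lens (filter) function with overlapping intervals, cluster each preimage, and connect clusters that share data points. This is a minimal sketch, assuming single-linkage clustering at a fixed scale `eps` and an `overlap` given as a fraction of the interval length; the function names and parameters are hypothetical and are not the parameter choices analysed in the talk. For a Fibers of Failure style analysis one would, as an assumption here, include the prediction error in the lens so that high-error and low-error points tend to fall into different nodes.

```python
import numpy as np
from itertools import combinations

def _single_linkage(points, eps):
    """Connected components of the graph joining points at distance <= eps."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if np.linalg.norm(points[i] - points[j]) <= eps:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def mapper(X, lens, n_intervals=5, overlap=0.3, eps=0.5):
    """Hypothetical minimal Mapper: returns (nodes, edges).

    Each node is a set of row indices of X; an edge joins two nodes
    that share at least one data point (the 1-skeleton of the nerve).
    """
    lo, hi = lens.min(), lens.max()
    length = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # interval k, widened on both sides by overlap * length
        a = lo + k * length - overlap * length
        b = lo + (k + 1) * length + overlap * length
        idx = np.where((lens >= a) & (lens <= b))[0]
        if len(idx) == 0:
            continue
        for comp in _single_linkage(X[idx], eps):
            nodes.append(set(idx[comp].tolist()))
    edges = [(i, j) for i, j in combinations(range(len(nodes)), 2)
             if nodes[i] & nodes[j]]
    return nodes, edges
```

On 40 points spaced evenly on the unit circle, with the x-coordinate as lens, four intervals, 20% overlap and `eps=0.4`, the sketch returns six nodes and six edges, that is, a single loop recovering the circle.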

### No meeting

No meeting

### No meeting

No meeting

### Fibres of Failure

The Mapper algorithm can produce intrinsic topological models of arbitrary data in high dimensions. Through a statistical adaptation of the Nerve lemma, the algorithm can be seen to reproduce the topology and parts of the geometry of the data source, under assumptions of dense sampling and good parameter choices. In this talk, we will describe how, by a careful choice of the Mapper parameters, the resulting topological model can be guaranteed to separate the inputs to a predictive process by prediction error, grouping high-error and low-error regions separately. This approach produces a diagnostic process in which local failure modes can be classified, feeding into either further model development or a local correction term that improves predictive performance. We have successfully applied this approach to temperature prediction in steel furnaces.

### Inaugural meeting

For our spring inaugural meeting, MVJ will refresh our memory of the basics, and we will discuss paper assignments and focus areas for our active seminar participants.

### The Fuglede conjecture holds in the finite vector space \(\mathbb{Z}_p^2\)

We will see that the Fuglede conjecture holds in $\mathbb{Z}_p^2$, as proved by Iosevich, Mayeli and Pakianathan. That is, a subset $E$ of the finite vector space $\mathbb{Z}_p^2$ tiles the space by translation if and only if every function from $E$ to the complex numbers is a linear combination of mutually orthogonal exponential functions (i.e., $E$ is spectral). Key ideas of the proof involve Fourier transforms of functions on this space, direction sets, and some Galois theory.
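Both sides of the equivalence are easy to test by brute force for small $p$. The sketch below, with hypothetical helper names, checks whether $E$ tiles $\mathbb{Z}_p^2$ with a given translation set $T$, and whether a given set $B$ of distinct frequencies is a spectrum for $E$, meaning $|B| = |E|$ and the exponentials $x \mapsto e^{2\pi i\, x\cdot b/p}$, $b \in B$, are pairwise orthogonal on $E$. It only illustrates the two definitions; it does not search for $T$ or $B$ and proves nothing about the conjecture.

```python
import cmath
from itertools import combinations

def is_tiling(E, T, p):
    """True if every point of Z_p^2 is written uniquely as e + t, e in E, t in T."""
    sums = [((e[0] + t[0]) % p, (e[1] + t[1]) % p) for e in E for t in T]
    return len(sums) == p * p and len(set(sums)) == p * p

def is_spectrum(E, B, p, tol=1e-9):
    """True if the exponentials indexed by B form an orthogonal basis of functions on E.

    Assumes B lists distinct points of Z_p^2.
    """
    if len(B) != len(E):
        return False
    for b, c in combinations(B, 2):
        d = ((b[0] - c[0]) % p, (b[1] - c[1]) % p)
        # inner product <e_b, e_c> on E = sum over x in E of exp(2 pi i x.(b - c) / p)
        s = sum(cmath.exp(2j * cmath.pi * (x[0] * d[0] + x[1] * d[1]) / p) for x in E)
        if abs(s) > tol:
            return False
    return True
```

For $p = 3$, the line $E = \{(x, 0)\}$ tiles with $T = \{(0, y)\}$ and has spectrum $B = \{(k, 0)\}$, while $\{(0, k)\}$ is not a spectrum for it.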

### The fiber of the persistence map

The persistence map is the map that sends a function on a topological space to its collection of persistence diagrams, which are canonical invariants obtained by filtering the space by sublevel sets and taking homology in each degree. Geometrically, a persistence diagram is simply a configuration of points in the plane. In this talk I will study which configurations of points are possible and what the ramification of this map is in the simplest possible case: functions on the interval. Ongoing work and open problems will also be discussed.
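For functions on the interval, the persistence map can be computed with a union-find sweep over the sublevel sets. Below is a minimal sketch, assuming the function is given by its values at evenly spaced samples and interpolated linearly between them, so that every sublevel set is a union of intervals and only degree-0 homology is nontrivial. At each merge the elder rule decides which component dies. The helper name is hypothetical.

```python
import math

def sublevel_persistence_0d(values):
    """Hypothetical helper: degree-0 sublevel-set persistence on a path.

    values[i] is the function value at the i-th sample; neighbouring samples
    are joined by an edge. Returns a list of (birth, death) pairs.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    rank = {i: r for r, i in enumerate(order)}  # tie-breaking by sweep order
    parent = [None] * n  # None means the sample is not yet in the sublevel set

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component born later (higher rank) dies at values[i].
                old, young = (ri, rj) if rank[ri] < rank[rj] else (rj, ri)
                if values[young] < values[i]:  # skip zero-length bars
                    pairs.append((values[young], values[i]))
                parent[young] = old
    # The component of the global minimum never dies.
    pairs.append((values[order[0]], math.inf))
    return pairs
```

Distinct functions can share a diagram, which is exactly the fibre the talk studies: both `[0, 2, 1, 3]` and `[1, 2, 0, 3]` return `[(1, 2), (0, inf)]`.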

### Barcodes: The Persistent Topology of Data

TBD

### Single cell mapper analysis

Next-generation high-throughput sequencing has generated an explosion of available genomic data. This holds great potential for exploring biological systems with unprecedented resolution at the single-cell level. However, single-cell sequencing yields complex data and poses technical challenges for traditional computational methods, which are mostly based on clustering and combinatorics. In this talk we will give a general perspective on the challenges and advances in the field of single-cell analysis and the associated data analysis problems, with special emphasis on topological techniques.

### Cancelled

Cancelled

### Persistent Cohomology and human motion

TBD: Discuss persistent cohomology, circular coordinates and applications to motion capture.
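One step of the circular-coordinates method can be sketched on its own: given an integer 1-cocycle $z$ on a graph (in practice obtained from persistent cohomology of a complex built on the data, which we do not compute here), solve the least-squares problem $\min_f \|z + \delta f\|$ and read off $\theta = f \bmod 1$, so that $\theta(j) - \theta(i) \equiv (z + \delta f)(ij) \pmod 1$ on every edge. This minimal sketch uses a hand-chosen cocycle on a 6-cycle; the helper name and the sign convention are our own assumptions.

```python
import numpy as np

def circular_coordinates(n_vertices, edges, cocycle):
    """Hypothetical helper: smooth an integer 1-cocycle into a map to R/Z.

    edges are oriented pairs (i, j); cocycle[k] is the value of z on edges[k].
    """
    # Coboundary matrix: (D f)[k] = f[j] - f[i] for edges[k] = (i, j).
    D = np.zeros((len(edges), n_vertices))
    for k, (i, j) in enumerate(edges):
        D[k, i] = -1.0
        D[k, j] = 1.0
    # Least squares: minimise ||z + D f||, so z + D f is the harmonic representative.
    f, *_ = np.linalg.lstsq(D, -np.asarray(cocycle, dtype=float), rcond=None)
    # f is defined up to a constant; fix theta[0] = 0 and reduce modulo 1.
    return np.mod(f - f[0], 1.0)
```

On the 6-cycle with the cocycle supported on the closing edge, the result is $\theta = (0, 1/6, \dots, 5/6)$: the coordinate winds once around the circle.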

### Extracting Insights from Complex Shapes using Topology

TBD

### The Forman gradient: a discrete tool for topology-based data analysis

Morse theory studies the relationships between the topology of a shape and the critical points of a real-valued smooth function defined on it. It has been recognized as an important tool for shape analysis and understanding in several applications, including physics, chemistry, medicine, and geography. Morse theory is defined for smooth functions, but a discrete counterpart, called Discrete Morse Theory (DMT), has more recently been proposed in an entirely combinatorial setting. DMT introduces discrete Morse functions, which assign values to all the cells of a complex. A discrete Morse function induces a partial pairing of these cells, from which we build a combinatorial gradient, also called the Forman gradient. The Forman gradient is a very powerful tool, as it provides a compact way to represent data without altering its homology.

This talk will cover our contribution in developing computational tools, based on the Forman gradient, for the analysis of 2D and 3D scalar fields. I will describe in detail our compact representation of the Forman gradient defined on simplicial complexes and how it can be adapted to the n-dimensional case. Moreover, I will describe our recent work on multivariate data (i.e., collections of scalar fields), and how we can adapt the Forman gradient to analyze such data in a computationally efficient way.
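Forman's combinatorial gradient can be read directly off a discrete Morse function: a face $\tau$ of codimension one is paired with its coface $\sigma$ exactly when $f(\sigma) \le f(\tau)$, and the unpaired cells are critical. The sketch below, with a hypothetical helper name, stores a simplicial complex as a dictionary from simplices (frozensets of vertex labels) to values and assumes, without checking, that the values form a discrete Morse function. It illustrates the definition only; it is not the compact encoding described in the talk.

```python
from itertools import combinations

def forman_gradient(f):
    """Hypothetical helper: gradient pairs and critical cells of a discrete Morse function.

    f maps every simplex (a frozenset of vertex labels) to a value; all faces
    of every simplex must be present. Returns (pairs, critical), where each pair
    is (face, coface).
    """
    pairs, paired = [], set()
    for sigma, value in f.items():
        if len(sigma) < 2:
            continue  # vertices have no codimension-one faces
        for face in map(frozenset, combinations(sigma, len(sigma) - 1)):
            # Forman's rule: pair face < sigma when f(sigma) <= f(face).
            if f[face] >= value:
                pairs.append((face, sigma))
                paired.update((face, sigma))
    critical = [s for s in f if s not in paired]
    return pairs, critical
```

On the boundary of a triangle with $f(a)=0$, $f(b)=1$, $f(c)=2$, $f(ab)=1$, $f(bc)=2$, $f(ca)=3$, the gradient pairs $b$ with $ab$ and $c$ with $bc$, leaving one critical vertex and one critical edge, which matches the homology of a circle.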

### Topology and Data

TBD

### Inaugural meeting

For our inaugural meeting, we will discuss paper assignments and focus areas for our active seminar participants. There will be pizza and brownies.