The CUNY Data Science and Applied Topology Reading Group is joint between the Mathematics and Computer Science programmes. We meet Fridays 11.45 -- 12.45 in GC 3209. You can contact us at cunygc@appliedtopology.nyc.

Our plan is primarily to read and discuss seminal papers in data science, applied topology, and topological data analysis. At each seminar, one participant takes responsibility for presenting a paper and preparing items for discussion. We expect to be able to invite external speakers occasionally.

## Schedule

Current schedule can be found here.

We will be sending out announcements through a mailing list; you can subscribe here.

## Organizers

- Mikael Vejdemo-Johansson, Computer Science Programme, CUNY Graduate Center; Department of Mathematics, CUNY College of Staten Island
- Azita Mayeli, Mathematics Programme, CUNY Graduate Center; Department of Mathematics, CUNY Queensborough Community College
- Chao Chen, Computer Science Programme, CUNY Graduate Center; Department of Computer Science, CUNY Queens College

## Suggested papers

We have compiled a list of papers that might be interesting to present.

# Schedule

### The relative topological complexity of a pair

Topological complexity is a homotopy invariant introduced by Michael Farber in the early 2000s. Denoted $TC(X)$, it counts the smallest size of a continuous motion planning algorithm on $X$. In this sense, it solves optimally the problem of continuous motion planning in a given topological space. In topological robotics, a part of applied algebraic topology, several variants of $TC$ are studied. In a recent paper, I introduced the relative topological complexity of a pair of spaces $(X,Y)$ where $Y\subset X$. Denoted $TC(X,Y)$, this counts the smallest size of motion planning algorithms that plan from $X$ to $Y$.

In this talk, we will provide an overview of techniques used to study relative topological complexity and compute this invariant for several simple spaces relating to real-world robotics problems.

### Topological Data Analysis of Financial Time Series

We apply persistent homology to detect and quantify topological patterns that appear in multidimensional time series. Using a sliding window, we extract time-dependent point cloud data sets, for which we compute persistent homology. We use persistence landscapes to quantify the temporal changes in the time series. We test this approach on multidimensional time series generated by various non-linear and non-equilibrium models. As an alternative approach, we construct correlation networks, and track changes in the topology of these networks.

We apply this method to detect early signs of financial bubbles in market indices and asset prices. As case studies, we consider the US stock market indices during the technology crash of 2000 and the financial crisis of 2007-2009, as well as the prices of cryptocurrencies.

This is based on joint work with Yuri Katz (Standard and Poor's Global Market Intelligence), and Pablo Roldan, Daniel Goldsmith, and Yonah Shmalo (Yeshiva University).
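The sliding-window step in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sliding_window_clouds`, the `step` parameter, and the convention that each window of `window` consecutive observations is read as a point cloud in $\mathbb{R}^d$ are all assumptions made for the example. Each resulting cloud would then be passed to a persistent homology routine, which is not shown here.

```python
from typing import List, Sequence

Point = Sequence[float]


def sliding_window_clouds(
    series: Sequence[Point], window: int, step: int = 1
) -> List[List[Point]]:
    """Cut a d-dimensional time series into overlapping point clouds.

    Each cloud holds `window` consecutive observations, viewed as points
    in R^d. (Hypothetical helper; the windowing convention is an assumption.)
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    last_start = len(series) - window
    return [
        list(series[start:start + window])
        for start in range(0, last_start + 1, step)
    ]


# Toy 2-dimensional series with five observations (made-up numbers).
series = [(0.0, 1.0), (0.1, 0.9), (0.3, 0.7), (0.2, 0.8), (0.5, 0.4)]
clouds = sliding_window_clouds(series, window=3)
for t, cloud in enumerate(clouds):
    print(t, cloud)
```

A larger `window` gives denser clouds but coarser time resolution, so in practice the window length is a tuning choice.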

### Statistical Topological Data Analysis using Persistence Landscapes

TBD

### Donald Thevalingam

### Computing Persistent Homology

TBD

### Tom Fallon

Paper discussion

### Functional Data Analysis Using a Topological Summary Statistic: The Smooth Euler Characteristic Transform

We introduce a novel statistic, the smooth Euler characteristic transform (SECT), which is designed to integrate shape information into regression models by representing shapes and surfaces as a collection of curves. Its construction is based on theory from topological data analysis (TDA). Due to its well-defined inner product structure, the SECT can be used in a wider range of functional and nonparametric modeling approaches than other previously proposed topological summary statistics. We provide mathematical properties of this statistic, notably its injectivity, which has implications for statistical sufficiency.

We illustrate the utility of the SECT in a radiomics context by showing that topological quantifications of tumors, assayed by magnetic resonance imaging (MRI), are better predictors of clinical outcomes in patients with glioblastoma multiforme (GBM). We show that topological features of tumors captured by the SECT alone explain more of the variance in patient survival than gene expression, volumetric features, and morphometric features.
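To make "representing shapes as a collection of curves" concrete, here is a minimal sketch of one such curve for a single direction. It assumes that the SECT in a fixed direction is the Euler characteristic curve of the sublevel sets of the height function, centered by its mean over a window $[a, b]$ and then integrated. The helper names, the choice of window, and the toy shape are all assumptions for illustration, not taken from the paper.

```python
from typing import List, Sequence, Tuple

Simplex = Tuple[int, ...]
Height = Tuple[float, int]  # (entry height of a simplex, its sign (-1)^dim)


def simplex_heights(
    vertices: Sequence[Sequence[float]],
    simplices: Sequence[Simplex],
    direction: Sequence[float],
) -> List[Height]:
    """Height at which each simplex enters the sublevel-set filtration."""
    h = [sum(x * d for x, d in zip(v, direction)) for v in vertices]
    return [(max(h[i] for i in s), (-1) ** (len(s) - 1)) for s in simplices]


def euler_curve(heights: Sequence[Height], t: float) -> int:
    """Euler characteristic of the part of the shape at height <= t."""
    return sum(sign for h, sign in heights if h <= t)


def integrated_euler(heights: Sequence[Height], a: float, t: float) -> float:
    """Exact integral of the (piecewise constant) Euler curve over [a, t]."""
    return sum(sign * max(0.0, t - max(h, a)) for h, sign in heights)


def sect(heights: Sequence[Height], a: float, b: float, t: float) -> float:
    """Centered, integrated Euler curve on the window [a, b] (assumed form)."""
    total = integrated_euler(heights, a, b)
    return integrated_euler(heights, a, t) - (t - a) / (b - a) * total


# Toy shape: the boundary of a triangle, i.e. a circle up to homotopy.
vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
simplices = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
heights = simplex_heights(vertices, simplices, direction=(1.0, 0.0))
print([euler_curve(heights, t) for t in (0.0, 0.5, 1.0)])
print([sect(heights, 0.0, 2.0, t) for t in (0.0, 1.0, 2.0)])
```

Repeating this over many directions gives the collection of curves; because the centered curve integrates to zero over the window, each resulting function starts and ends at zero.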

### Fréchet means for distributions of persistence diagrams

Paper discussion

### On Characterizing the Capacity of Neural Networks using Algebraic Topology

Paper discussion.

### Fibers of Failure - using Mapper to classify prediction failures

The Mapper algorithm is able to produce intrinsic topological models of arbitrary data in high dimensions. Through a statistical adaptation of the Nerve lemma, the algorithm can be seen to reproduce the topology and parts of the geometry of the data source under assumptions of dense sampling and good parameter choices. In this talk, we will describe how by careful choice of the Mapper model parameters, the resulting topological model can be guaranteed to separate input values to the predictive process for prediction error, grouping high-error and low-error regions separately. This approach produces a diagnostic process where local failure modes can be classified, feeding into either a model development process or a local correction term to improve predictive performance. We have successfully applied this approach to temperature prediction in steel furnaces.

### No Meeting

None

### No meeting

No meeting

### Fibers of Failure

The Mapper algorithm is able to produce intrinsic topological models of arbitrary data in high dimensions. Through a statistical adaptation of the Nerve lemma, the algorithm can be seen to reproduce the topology and parts of the geometry of the data source under assumptions of dense sampling and good parameter choices. In this talk, we will describe how by careful choice of the Mapper model parameters, the resulting topological model can be guaranteed to separate input values to the predictive process for prediction error, grouping high-error and low-error regions separately. This approach produces a diagnostic process where local failure modes can be classified, feeding into either a model development process or a local correction term to improve predictive performance. We have successfully applied this approach to temperature prediction in steel furnaces.

### Inaugural meeting

For our spring inaugural meeting MVJ will refresh our memory on the basics, and we will discuss paper assignments and focus interests for our active seminar participants.

### The Fuglede conjecture holds in the finite vector space $\mathbb{Z}_p^2$

We will see that the Fuglede conjecture holds in $\mathbb{Z}_p^2$, as proved by Iosevich, Mayeli, and Pakianathan. That is, a subset $E$ of the finite vector space $\mathbb{Z}_p^2$ tiles the space by translation if and only if every function from $E$ to the complex numbers is a linear combination of orthogonal exponential functions. Key ideas of the proof use Fourier transforms of functions on this space, direction sets, and some Galois theory.
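For very small $p$, both sides of the equivalence can be checked by exhaustive search. The sketch below is only an illustration of the two definitions and has nothing to do with the proof technique. The function names `is_tile` and `is_spectral` are hypothetical, and the exponentials are assumed to be the characters $x \mapsto e^{2\pi i\, b\cdot x/p}$ for $b \in \mathbb{Z}_p^2$.

```python
import cmath
from itertools import combinations, product
from typing import List, Sequence, Tuple

Vec = Tuple[int, int]


def group(p: int) -> List[Vec]:
    """All elements of Z_p^2."""
    return list(product(range(p), repeat=2))


def is_tile(E: Sequence[Vec], p: int) -> bool:
    """True if some translation set A satisfies E + A = Z_p^2 without overlap."""
    G = group(p)
    if len(G) % len(E) != 0:
        return False
    k = len(G) // len(E)
    for A in combinations(G, k):
        sums = {((e[0] + a[0]) % p, (e[1] + a[1]) % p) for e in E for a in A}
        # |E| * |A| = p^2, so covering everything forces the sums to be unique.
        if len(sums) == len(G):
            return True
    return False


def is_spectral(E: Sequence[Vec], p: int, tol: float = 1e-9) -> bool:
    """True if |E| characters of Z_p^2 are pairwise orthogonal on E."""

    def inner(b: Vec, c: Vec) -> complex:
        # Inner product over E of the characters indexed by b and c.
        return sum(
            cmath.exp(2j * cmath.pi * ((b[0] - c[0]) * x[0] + (b[1] - c[1]) * x[1]) / p)
            for x in E
        )

    for B in combinations(group(p), len(E)):
        if all(abs(inner(b, c)) < tol for b, c in combinations(B, 2)):
            return True
    return False


# A line in Z_3^2 and a two-point set, as toy inputs.
line = [(0, 0), (1, 0), (2, 0)]
pair = [(0, 0), (1, 0)]
print(is_tile(line, 3), is_spectral(line, 3))
print(is_tile(pair, 3), is_spectral(pair, 3))
```

The search is exponential in $p^2$, so it is only usable for $p = 2$ or $p = 3$.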

### The fiber of the persistence map

The persistence map is the map that sends a function on a topological space to its collection of persistence diagrams, which are canonical invariants of filtering a space by sublevel sets and taking homology in each degree. Geometrically, a persistence diagram is simply a configuration of points in the plane. In this talk I will study which configurations of points are possible and what the ramification of this map is for the simplest possible case---functions on the interval. Ongoing work and open problems will also be discussed.
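For functions on the interval, the persistence map can be evaluated directly. The following is a minimal sketch, assuming the function is piecewise linear and given by its values at consecutive sample points. It computes the degree-0 sublevel-set diagram with a union-find and the elder rule; higher degrees are trivial on an interval. The function name `interval_persistence` and the choice to drop zero-persistence pairs are assumptions of this example.

```python
import math
from typing import List, Sequence, Tuple


def interval_persistence(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Degree-0 sublevel-set persistence pairs of a sampled function."""
    n = len(values)
    parent = list(range(n))
    birth = [0.0] * n          # birth value, meaningful at component roots
    added = [False] * n
    pairs: List[Tuple[float, float]] = []

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Sweep the threshold upward, adding sample points in order of value.
    for i in sorted(range(n), key=lambda k: values[k]):
        added[i] = True
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and added[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component born later dies at this merge.
                old, young = (ri, rj) if birth[ri] <= birth[rj] else (rj, ri)
                if birth[young] < values[i]:
                    pairs.append((birth[young], values[i]))
                parent[young] = old

    if n:
        # The component of the global minimum never dies.
        pairs.append((min(values), math.inf))
    return sorted(pairs)


print(interval_persistence([0.0, 2.0, 1.0, 3.0]))
# → [(0.0, inf), (1.0, 2.0)]
```

Each finite pair matches a local minimum with the local maximum at which its component merges into an older one, which is why different functions can share the same diagram.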

### Barcodes: The Persistent Topology of Data

TBD

### Single cell mapper analysis

Next-generation high-throughput sequencing has generated an explosion of available genomic data. This holds great potential for exploring biological systems with unprecedented resolution at the single-cell level. However, single-cell sequencing yields complex output and poses technical challenges for traditional computational methods, which are mostly based on clustering and combinatorics. In this talk we will give a general perspective on the challenges and advances in single-cell sequencing and the associated data analysis problems, with special emphasis on topological techniques.

### Cancelled

Cancelled

### Persistent Cohomology and human motion

TBD: Discuss persistent cohomology, circular coordinates and applications to motion capture.

### Extracting Insights from Complex Shapes using Topology

TBD

### The Forman gradient: a discrete tool for topology-based data analysis

Morse theory studies the relationships between the topology of a shape and the critical points of a real-valued smooth function defined on it. It has been recognized as an important tool for shape analysis and understanding in several applications, including physics, chemistry, medicine, and geography. Morse theory is defined for smooth functions, but recently a discrete counterpart, called Discrete Morse Theory (DMT), has been proposed in an entirely combinatorial setting. DMT introduces the idea of discrete Morse functions as functions that assign values to all the cells of a complex. Based on a discrete Morse function we can impose a partial pairing on such cells and eventually build a combinatorial gradient, also called the Forman gradient. The Forman gradient is a very powerful tool as it provides a compact way to represent data without altering its homology. This talk will cover our contribution in developing computational tools, based on the Forman gradient, for the analysis of 2D and 3D scalar fields. I will describe in detail our compact representation for the Forman gradient defined on simplicial complexes and how it can be adapted to the n-dimensional case. Moreover, I will describe our recent work on multivariate data (i.e., collections of scalar fields), and how we can adapt the Forman gradient for analyzing such data in a computationally efficient way.

### Topology and Data

TBD

### Inaugural meeting

For an inaugural meeting we will discuss paper assignments and focus interests for our active seminar participants. There will be pizza and brownies.