The CUNY Data Science and Applied Topology Reading Group is a joint initiative of the Mathematics and Computer Science programmes. We meet on Fridays, 11.45 -- 12.45, in GC 3209. You can contact us at cunygc@appliedtopology.nyc.

We primarily read and discuss seminal papers in data science, applied topology and topological data analysis. At each session, one participant is responsible for presenting a paper and preparing items for discussion. We also expect to invite external speakers occasionally.

## Schedule

The current schedule can be found here.

We will be sending out announcements through a mailing list; you can subscribe here.

## Organizers

- Mikael Vejdemo-Johansson, Computer Science Programme, CUNY Graduate Center; Department of Mathematics, CUNY College of Staten Island
- Azita Mayeli, Mathematics Programme, CUNY Graduate Center; Department of Mathematics, CUNY Queensborough Community College

## Suggested papers

We have compiled a list of papers that might be interesting to present.

# Schedule

### Semester introduction

We will talk about the upcoming semester, recruit volunteers to present specific papers, and give a refresher on persistent homology.

### Topological Structure of Linear Manifold Clustering

In topological data analysis, the first step is the construction of a simplicial complex from a discrete point set D sampled from some manifold. In this paper, we present an algorithm for the efficient computation of such a simplicial complex that uses the clustering structure of the point set, consisting of subspace clusters, to speed up complex construction while preserving the relevant topological invariants of the underlying sampled manifold. Experiments show that the proposed algorithm yields smaller, less noisy complexes that give a better homological picture than other construction methods, together with improved construction performance and interpretability of topological invariants at a geometric level.
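For orientation, the baseline step the abstract refers to can be sketched with the standard Vietoris-Rips construction. This is not the paper's clustering-based algorithm: it is a minimal brute-force illustration, assuming Euclidean points and a single scale parameter `epsilon` (both our choices), that shows how every subset of mutually close points becomes a simplex and hence why complex size can grow combinatorially.

```python
from itertools import combinations
from math import dist


# Standard Vietoris-Rips baseline for illustration, not the paper's algorithm.
def vietoris_rips(points, epsilon, max_dim=2):
    """Return the simplices (as sorted index tuples) of the Vietoris-Rips
    complex of `points` at scale `epsilon`, up to dimension `max_dim`.

    A simplex is included when every pair of its vertices lies within
    distance `epsilon`. All vertex subsets are checked, so this is only
    suitable for small point sets.
    """
    n = len(points)
    # 1-skeleton: pairs (i, j) with i < j at distance <= epsilon.
    close = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if dist(points[i], points[j]) <= epsilon
    }
    simplices = [(i,) for i in range(n)]
    # k is the number of vertices, so a k-vertex simplex has dimension k - 1.
    for k in range(2, max_dim + 2):
        for sigma in combinations(range(n), k):
            if all(pair in close for pair in combinations(sigma, 2)):
                simplices.append(sigma)
    return simplices
```

For the four corners of a unit square, `epsilon=1.0` gives four vertices and four edges (a single loop), while `epsilon=1.5` also admits the diagonals and fills in all four triangles.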

### On the Metric Distortion of Embedding Persistence Diagrams into Reproducing Kernel Hilbert Spaces

Persistence Diagrams (PDs) are important feature descriptors in Topological Data Analysis. Due to the nonlinearity of the space of PDs equipped with their diagram distances, most recent attempts at using PDs in Machine Learning have relied on kernel methods, i.e., embeddings of PDs into Reproducing Kernel Hilbert Spaces (RKHS), in which all computations can be performed easily. Since PDs enjoy theoretical stability guarantees for the diagram distances, the metric properties of a kernel k, i.e., the relationship between the RKHS distance d_k and the diagram distances, are of central interest for understanding whether the PD guarantees carry over to the embedding. We study the possibility of embedding PDs into RKHS with bi-Lipschitz maps. In particular, we show that when the RKHS is infinite dimensional, any lower bound must depend on the cardinalities of the PDs, and that when the RKHS is finite dimensional, finding a bi-Lipschitz embedding is impossible, even when restricting the PDs to have bounded cardinalities.
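To make the RKHS distance concrete, here is a minimal sketch using one well-known kernel on persistence diagrams, the persistence scale-space kernel (PSSK) of Reininghaus et al. The kernel choice and the bandwidth `sigma` are ours for illustration; the talk concerns kernels in general. The distance follows from the kernel trick: `d_k(F, G)**2 = k(F, F) + k(G, G) - 2*k(F, G)`.

```python
import math


# Kernel choice (PSSK) is ours for illustration, not taken from the talk.
def pssk(F, G, sigma):
    """Persistence scale-space kernel between diagrams F and G.

    Each diagram is a list of (birth, death) pairs. Every point q of G is
    also compared with its mirror image (death, birth) across the diagonal,
    so points lying on the diagonal contribute nothing.
    """
    total = 0.0
    for px, py in F:
        for qx, qy in G:
            d_same = (px - qx) ** 2 + (py - qy) ** 2
            d_mirror = (px - qy) ** 2 + (py - qx) ** 2
            total += math.exp(-d_same / (8 * sigma)) - math.exp(-d_mirror / (8 * sigma))
    return total / (8 * math.pi * sigma)


def rkhs_distance(F, G, sigma):
    """Distance d_k induced by the kernel, i.e. ||Phi(F) - Phi(G)|| in the RKHS."""
    sq = pssk(F, F, sigma) + pssk(G, G, sigma) - 2 * pssk(F, G, sigma)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors
```

Evaluating a diagram distance such as the bottleneck distance alongside `rkhs_distance` on the same pairs of diagrams is one way to probe, empirically, the metric distortion the talk studies.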

### Multiple hypothesis testing in persistent homology

We propose a general null model for persistent homology barcodes from a point cloud, for example to test acyclicity in simplicial complexes generated from point clouds. One advantage of the proposed null model is that it can be generated efficiently and applies to a broad set of hypothesis testing procedures. The second key idea in this talk is to use the null model to address multiple hypothesis testing by controlling family-wise error rates and false discovery rates.
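The two error-rate notions above can be illustrated independently of the null model itself. The sketch below assumes that each barcode feature already has a test statistic (for instance its bar length) and that statistics have been sampled from some null model; the choice of statistic and all function names are hypothetical. It converts them into empirical p-values and then applies two standard corrections: Bonferroni (family-wise error rate) and the Benjamini-Hochberg step-up procedure (false discovery rate). These are textbook procedures, not necessarily the ones used in the talk.

```python
# Hypothetical helpers; the null model that produces `null_samples` is not shown.
def empirical_p_value(observed, null_samples):
    """One-sided p-value: how often the null produces a statistic at least
    as large as `observed`, with the usual +1 correction."""
    exceed = sum(1 for s in null_samples if s >= observed)
    return (1 + exceed) / (1 + len(null_samples))


def bonferroni(p_values, alpha=0.05):
    """Indices rejected while controlling the family-wise error rate at alpha."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def benjamini_hochberg(p_values, q=0.05):
    """Indices rejected while controlling the false discovery rate at level q
    (valid under independence or positive dependence of the tests)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    # Reject the k_max smallest p-values.
    return sorted(order[:k_max])
```

With `alpha` equal to `q`, Benjamini-Hochberg rejects every hypothesis that Bonferroni rejects and possibly more, which is why FDR control is the less conservative choice when many bars are tested at once.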

### Misinformation Data Science

TBA

### Morse-Witten Theory for Real Operators

Accelerated by applications in mathematical physics from the late 20th century onward, the interaction of Morse and Hodge theory in smooth geometry has propelled remarkable advances in geometric topology over the past four decades. Progress in this domain has considerably outpaced parallel investigations in combinatorial topology, where several of the most basic questions regarding the spectral analysis of discrete Morse structures remain open. The present talk introduces a discrete Morse-Witten theory for real-linear operators, a direct extension of the Morse-Witten theory for CW complexes pioneered by Forman in the late 1990s. Time permitting, we will discuss some consequences for the spectral analysis of cellular spaces, the surprisingly categorical underpinnings of the Morse-Witten complex, and several future directions. No prior knowledge of Morse-Witten theory, smooth or otherwise, will be assumed.

### Obayashi, Hiraoka, Kimura -- Persistence diagrams with linear machine learning models

We will read and discuss the paper "Persistence diagrams with linear machine learning models" by Obayashi, Hiraoka and Kimura.

### (CANCELLED) How non-invasive functional imaging techniques benefit the research in human’s language system?

### The tasty low resolution problem space of knitting and mathematics

It's fun to think of good generative patterns and mathematical visualizations for low-resolution spaces like knitting. Come hear about the interesting mathematics of what is possible on industrial knitting machines. Learn about skew, pixel resolution constraints, stitch direction, and types of knitting. Find out about some recent research and what it means for mathematical knitting.

### Deep Learning with Topological Signatures

Inferring topological and geometrical information from data can offer an alternative perspective on machine learning problems. Methods from topological data analysis, e.g., persistent homology, enable us to obtain such information, typically in the form of summary representations of topological features. However, such topological signatures often come with an unusual structure (e.g., multisets of intervals) that is highly impractical for most machine learning techniques. While many strategies have been proposed to map these topological signatures into machine-learning-compatible representations, they are agnostic to the target learning task. In contrast, we propose a technique that enables us to input topological signatures to deep neural networks and learn a task-optimal representation during training. Our approach is realized as a novel input layer with favorable theoretical properties. Classification experiments on 2D object shapes and social network graphs demonstrate the versatility of the approach and, in the case of the latter, even outperform the state of the art by a large margin.
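The key requirement for such an input layer is that it accept a multiset of intervals of any size and produce a fixed-length vector that does not depend on the order of the points. The sketch below is a simplified stand-in written with NumPy, not the structure elements defined in the paper: each output sums Gaussian bumps around a center over all diagram points, weighted by persistence so that near-diagonal points contribute little. In a real network `centers` and `scales` (our hypothetical names) would be trainable parameters updated by backpropagation, which is omitted here.

```python
import numpy as np


def rotate(diagram):
    """Map (birth, death) pairs to (birth, persistence = death - birth)."""
    d = np.asarray(diagram, dtype=float).reshape(-1, 2)
    return np.column_stack([d[:, 0], d[:, 1] - d[:, 0]])


# Simplified stand-in for a topological input layer, not the paper's construction.
def topological_input_layer(diagram, centers, scales):
    """Forward pass producing one output per center.

    centers, scales: arrays of shape (m, 2), hypothetical learnable parameters.
    Summing over diagram points makes the output permutation invariant, and
    weighting by persistence makes points on the diagonal contribute zero.
    """
    x = rotate(diagram)                                             # (n, 2)
    diff = x[:, None, :] - centers[None, :, :]                      # (n, m, 2)
    bumps = np.exp(-np.sum((scales[None] * diff) ** 2, axis=-1))    # (n, m)
    weights = x[:, 1:2]                                             # (n, 1) persistence
    return np.sum(weights * bumps, axis=0)                          # (m,)
```

Because the output has a fixed length `m` regardless of how many bars a diagram contains, it can be fed directly into ordinary dense layers.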

### Fast estimation of recombination rates using topological data analysis

NOTA BENE: This talk will take place at 10am.

In this talk, I will describe recent work (joint with McGuirl, Miyagi, and Humphreys) that uses topological features to infer recombination rates from genomic data. Building on the work of Camara, Levine, and Rabadan, we show that low-dimensional persistent homology contains a great deal of information about recombination. Perhaps most interestingly, we are able to explain the qualitative behavior of various topological features in terms of standard coalescent theory.

### How non-invasive functional imaging techniques benefit the research in human’s language system?

Human language is extremely complex and crucial to one’s quality of life. Classical models identified several major language centers, such as Broca’s area and Wernicke’s area. However, the development of these models was limited by the small number of aphasic cases studied and by the invasive tools used. Only with the emergence of non-invasive imaging technology, especially functional imaging techniques such as functional Magnetic Resonance Imaging (fMRI) and Positron Emission Tomography (PET), did the relationship between the brain and its language function begin to reveal itself.

In this talk, I will give an extensive literature review of what we know about language from functional imaging studies, mainly fMRI studies. It will cover the definition of language networks, the well-known and widely used experimental paradigms developed for language studies, and analysis tools ranging from Independent Component Analysis (ICA) to graphical tools.

To conclude, I will give a brief overview of a few projects from my PhD: characterizing functional language networks in healthy and cancerous brains, and in bilingual and monolingual Spanish-English speakers.

### Sparse Regularization via Convex Analysis

Sparse approximate solutions to linear equations, which have numerous applications, can be obtained via L1 norm regularization. However, the L1 norm tends to underestimate the true values. We introduce a non-convex alternative to the L1 norm. Unlike other non-convex regularizers, the proposed non-convex regularizer maintains the convexity of the objective function to be minimized. This allows one to retain beneficial properties of both convex and non-convex regularization. Although the new regularizer is non-convex, it is defined using tools of convex analysis. For this purpose, we define a generalization of the Moreau envelope and a generalized multivariate Huber function. The resulting optimization problem can be solved by proximal algorithms.
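The scalar case gives a feel for the construction. The sketch below is a minimal illustration under the assumption of the one-dimensional denoising problem `0.5*(y - x)**2 + lam*phi(x)`, not the multivariate setting of the talk: it builds the non-convex minimax-concave penalty as `|x|` minus a scaled Huber function (the Moreau envelope of `|x|`) and compares its minimizer, firm thresholding, with the soft thresholding produced by the L1 norm. As far as we know this is the one-dimensional special case that the talk's generalized penalty extends. The parameter names `lam` and `b` are our own; the condition `lam * b**2 < 1` keeps the scalar objective convex even though the penalty is not.

```python
import numpy as np


def huber(x, b):
    """Scaled Huber function s_b(x), the Moreau envelope of |.|:
    s_b(x) = min_v |v| + (b**2 / 2) * (x - v)**2."""
    x = np.asarray(x, dtype=float)
    t = 1.0 / b**2
    return np.where(np.abs(x) <= t, 0.5 * b**2 * x**2, np.abs(x) - 0.5 * t)


def mc_penalty(x, b):
    """Minimax-concave penalty phi_b(x) = |x| - s_b(x), which is non-convex."""
    return np.abs(np.asarray(x, dtype=float)) - huber(x, b)


def soft_threshold(y, lam):
    """Minimizer of 0.5*(y - x)**2 + lam*|x|; shrinks every large value by lam."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)


# Assumption: scalar problem with hypothetical parameters lam and b.
def firm_threshold(y, lam, b):
    """Minimizer of 0.5*(y - x)**2 + lam*phi_b(x), requiring lam * b**2 < 1
    so that this scalar objective stays strictly convex."""
    if not lam * b**2 < 1:
        raise ValueError("need lam * b**2 < 1 for a convex objective")
    y = np.asarray(y, dtype=float)
    t = 1.0 / b**2  # beyond t the penalty is constant, so large values pass through
    a = np.abs(y)
    middle = np.sign(y) * (a - lam) / (1 - lam * b**2)
    return np.where(a <= lam, 0.0, np.where(a <= t, middle, y))
```

With `lam=1.0` and `b=0.5`, an input of `5.0` is shrunk to `4.0` by soft thresholding but left at `5.0` by firm thresholding, which is the underestimation issue of the L1 norm in miniature.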

Bio:

Ivan Selesnick works in signal and image processing, wavelet-based signal processing, sparsity techniques, and biomedical signal processing. He is with the Department of Electrical and Computer Engineering at New York University in the Tandon School of Engineering, where he is Department Chair. He received the BS, MEE and PhD degrees in Electrical Engineering from Rice University in 1990, 1991 and 1996, respectively. He received the Jacobs Excellence in Education Award from Polytechnic University in 2003 and became an IEEE Fellow in 2016. He has been an associate editor of several IEEE Transactions and of IEEE Signal Processing Letters.