# Nicolas Tremblay's homepage

I am a CNRS researcher in statistical signal processing and machine learning, currently with the GAIA (Geometrie Apprentissage Information et Algorithmes) team in the GIPSA laboratory, in Grenoble, French Alps.

Research keywords: graph signal processing, spectral clustering, sampling methods, determinantal point processes, random processes on graphs

# Internship positions

Internship positions are available on graph signal processing, determinantal point processes, community detection, machine learning. Please inquire by sending an e-mail at firstname.lastname [at] gipsa-lab.grenoble-inp.fr !

# Current research interests

## Determinantal Point Processes for Coresets

### efficiently creating summaries of large datasets that both preserve diversity and guarantee a low relative error for a given learning task

When faced with a data set too large to be processed all at once, an obvious solution is to retain only part of it. In practice this takes a wide variety of different forms, and among them “coresets” are especially appealing. A coreset is a (small) weighted sample of the original data that comes with the following guarantee: a cost function can be evaluated on the smaller set instead of the larger one, with low relative error. For some classes of problems, and via a careful choice of sampling distribution (based on the so-called “sensitivity” metric), iid random sampling has turned to be one of the most successful methods for building coresets efficiently. However, independent samples are sometimes overly redundant, and one could hope that enforcing diversity would lead to better performance. The difficulty lies in proving coreset properties in non-iid samples. We show that the coreset property holds for samples formed with determinantal point processes (DPP). DPPs are interesting because they are a rare example of repulsive point processes with tractable theoretical properties, enabling us to prove general coreset theorems. We apply our results to both the k-means and the linear regression problems, and give extensive empirical evidence that the small additional computational cost of DPP sampling comes with superior performance over its iid counterpart. Of independent interest, we also provide analytical formulas for the sensitivity in the linear regression and 1-means cases. Details may be found here and the code here.

## Revisiting the Bethe-Hessian: Improved Community Detection in Sparse Heterogeneous Graphs

### spectral clustering with the Bethe-Hessian: an improved parametrization to take into account heterogeneous degree distributions in sparse graphs

Spectral clustering is one of the most popular, yet still incompletely understood, methods for community detection on graphs. This article studies spectral clustering based on the Bethe-Hessian matrix $H_r = (r^2 − 1)In + D − rA$ for sparse heterogeneous graphs (following the degree-corrected stochastic block model) in a two-class setting. For a specific value $r = \zeta$, clustering is shown to be insensitive to the degree heterogeneity. We then study the behavior of the informative eigenvector of $H_\zeta$ and, as a result, predict the clustering accuracy. The article concludes with an overview of the generalization to more than two classes along with extensive simulations on synthetic and real networks corroborating our findings. Details may be found here and the code here.

## Asymptotic Equivalence of Fixed-size and Varying-size Determinantal Point Processes

### efficient and numerically stable approximate sampling of fixed-size DPP (so-called k-DPPs)

Determinantal Point Processes (DPPs) are popular models for point processes with repulsion. They appear in numerous contexts, from physics to graph theory, and display appealing theoretical properties. On the more practical side of things, since DPPs tend to select sets of points that are some distance apart (repulsion), they have been advocated as a way of producing random subsets with high diversity. DPPs come in two variants: fixed-size and varying-size. A sample from a varying-size DPP is a subset of random cardinality, while in fixed-size “k-DPPs” the cardinality is fixed. The latter makes more sense in many applications, but unfortunately their computational properties are less attractive, since, among other things, inclusion probabilities are harder to compute. In this work we show that as the size of the ground set grows, k-DPPs and DPPs become equivalent, in the sense that fixed-order inclusion probabilities converge. As a by-product, we obtain saddlepoint formulas for inclusion probabilities in k-DPPs. These turn out to be extremely accurate, and suffer less from numerical difficulties than exact methods do. Our results also suggest that k-DPPs and DPPs also have equivalent maximum likelihood estimators. Finally, we obtain results on asymptotic approximations of elementary symmetric polynomials which may be of independent interest. Details may be found here and the code here.

## Determinantal Point Processes for Graph Sampling

### efficiently sampling bandlimited signals using repulsive point processes such as Determinantal Point Processes.

In an effort to improve the existing sampling theorems for graph signals, in terms of number of necessary nodes to sample, robustness to noise of the recovery, and/or complexity of the sampling algorithms; we propose to take a close look at determinantal processes. Building upon recent results linking determinantal processes and random rooted spanning forests on graphs, we investigate how particular random walks on graphs may end up revealing optimal sampling sets. Preliminary results using loop-erased random walks may be found here.

# Past research interests

## A Fast Graph Fourier Transform

### does the Graph Fourier Transform admit a O(NlogN) algorithm like the classical FFT?

The Fast Fourier Transform (FFT) is an algorithm of paramount importance in signal processing as it allows to apply the Fourier transform in O(NlogN) instead of O(N^2) arithmetic operations. Today, in graph signal processing, there are two computational bottlenecks regarding the graph Fourier transform: first, one needs to diagonalize the Laplacian matrix (or adjacency matrix, depending on what model you prefer) which costs O(N^3) operations; then, given a signal, one needs to multiply the obtained eigenvector matrix with the signal costing another O(N^2) operations. To circumvent such issues, we propose a method to obtain approximate graph Fourier transforms that can be applied rapidly and stored efficiently. It is based on a greedy approximate diagonalization of the graph Laplacian matrix, carried out using a modified version of the famous Jacobi eigenvalues algorithm. [PDF] [The Faust Toolbox]

## Compressive Spectral Clustering

### combining graph signal processing and compressed sensing to accelerate spectral clustering

We propose a compressive spectral clustering method that accelerates the classical spectral clustering algorithm by bypassing the exact partial diagonalisation of the Laplacian of the similarity graph and the k-means step at high dimension. It can be summarised as follows:
1) generate a feature vector for each node by filtering O(log(k)) random Gaussian signals on G;
2) sample O(k log(k)) nodes from the full set of nodes;
3) cluster the reduced set of nodes;
4) interpolate the cluster indicator vectors back to the complete graph.
We prove that our method, with a gain in computation time that can reach several orders of magnitude, is in fact an approximation of spectral clustering, for which we are able to control the error. Read more

## Random sampling of bandlimited signals on graphs

How can one sample a k-bandlimited signal on a graph --i.e. a "smooth" signal whose k first graph Fourier coefficients are non-null-- while still being able to recover it perfectly?
The goals here are both to reduce as much as possible the number of samples, while garanteeing robust reconstruction with respect to noise. We propose a random sampling strategy that meets these goals while sampling only O(klog(k)) nodes. Read more

## Subgraph-based Filterbanks for Graph Signals

In order to design a critically-sampled compact-support biorthogonal transform for graph signals, we propose a design based on a partition of the graph in connected subgraphs. Coarsening is achieved by defining one “supernode” for each subgraph and the edges for this coarsened graph derives from the connectivity between the subgraphs. Unlike the “one every two nodes” downsampling on bipartite graphs, this coarsening operation does not have an exact formulation in the graph Fourier domain. Instead, we rely on the local Fourier bases of each subgraph to define filtering operations. Using scalable algorithms to partiton nodes in well connected subgraphs (or communities), our proposition enables filterbank operations (compression, denoising...) in linear time. [PDF] [Code]

## Multiscale community mining in networks

Graph wavelets are recently developed tools that generalize the classical concept of wavelet analysis for signals defined on graphs. We take advantage of this precise notion of scale to develop multiscale community mining tools for networks. Read more

## Graph Empirical Mode Decomposition

Adaptation of the Empirical Mode Decomposition (EMD) algorithm for signals defined on graphs.Read more

## Bootstrapping in networks

In order to obtain statistics on measured observables in complex networks that are usually measured only once and hardly reproducible, one needs to adapt classical bootstrapping techniques to networks. Read more