# The GRANOLA Project :

Determinantal Point Processes on Graphs for Randomized Numerical Linear Algebra

## The project in a few words

Graphs are ubiquitous tools to represent networks, may they be networks modeling data from neurosciences, sociology, molecular biology, chemistry, etc. A cornerstone of the analysis of graphs is the Laplacian matrix L that encodes their structure. From a linear algebra point-of-view, the analysis of L offers fundamental insights on key properties of the graph: the diffusion speed of an information or a disease on a network, the vulnerability of a network to targeted or random attacks, the redundancy or certain parts of the network, the network's structure in more or less independent modules, are all examples of characteristics of a network one may extract from the Laplacian matrix.

Unfortunately, from a computational stand-point, the cost of this analysis is often too expensive in practice. For instance, computing the pseudo-inverse or the SVD of L requires a worst-case cubic time in the size of the graph n and becomes rapidly burdensome as n increases. Among the strategies developed in the state-of-the-art to overcome this challenge, the class of stochastic methods called RandNLA (for Randomized Numerical Linear Algebra) has aroused a great interest in the last decade. The gist of these approaches is to compute an approximation of the solution by following three steps: randomly sample and/or project the matrix L, solve the algebra problem in this reduced dimension, and extrapolate the solution back to the original dimension. On top of its simplicity, a reason for this approach's great success is that, provided a few conditions are met, the approximation error is theoretically very well controlled. In practice, this means that by allowing a small error on the result, RandNLA methods enable to reduce computation costs by several orders of magnitude in some cases.

A paradigm on which the first step of RandNLA methods relies is iid (independent and identically distributed) sampling, that enables to take advantage of powerful concentration theorems to have a precise theoretical control over the approximation error. However, iid sampling is not optimal in the sense that it may provide very redundant samples: an ideal sampling scheme should indeed incorporate some form of repulsion in order to obtain samples as representative as possible of the matrix at hand. The class of determinantal point processes (DPP) are such a class of repulsive process, that have the added benefit of being tractable (for instance, the marginal probabilities at any order are known), which makes them natural candidates to improve the current performance of RandNLA algorithms.

Sampling from these DPPs, however, is necessarily more expensive than iid sampling. In fact, it requires in general a cubic cost in the size of the set to sample. Thankfully, some specific DPPs are much more efficient to sample. It is the case for instance of DPPs defined over graphs such as random spanning forests that sample edges and nodes of a graph in a time quasi-linear in the number of edges. This type of acceleration is regularly observed when it comes to linear algebra problems involving graphs: a classical example is the inversion problem Mx=b, whose worst-case complexity is cubic in the size of M, and for which a good approximation may be obtained in a time quasi-linear in the number of non-zero elements of M provided that M is the Laplacian matrix of a graph.

In a nutshell, this project aims at developing non-iid RandNLA algorithms specifically designed for Laplacian matrices, by taking advantage of the tractable repulsion enabled by DPPs and the efficiency of graph-based algorithms. The originality of our proposal is to leverage the representative power of DPPs for RandNLA in the specific yet promising context of Laplacian matrices, in order to implement these DPPs with very efficient graph-based sampling algorithms.

**Team members. **
This project is a collaboration between several researchers: Pierre-Olivier Amblard (DR CNRS at Gipsa-lab, Grenoble, France), Luca Avena (assistant professor at Leiden University, Netherlands), Simon Barthelmé (CR CNRS, Gipsa-lab, Grenoble, France), Alexandre Gaudillière (CR CNRS at I2M, Marseille, France), Hugo Jaquard (PhD student 2021-2024, Gipsa-lab, France), Andreas Loukas (research scientist at EPFL, Lausanne, Switzerland), Yigit Pilavci (PhD student 2019-2022, Gipsa-lab, Grenoble, France), and myself.

**Support. **
This project is partially supported by the French Agence Nationale de la Recherche (ANR JCJC ANR-21-CE48-0009).

**Timeline. **
This project officially starts on the 1st of January 2022, for a period of 3 years; even though collaborations and ideas underlying the project stem from several years of prior research activities.

## Associated activity

**October 2021. ** Start of Hugo Jaquard's PhD at Gipsa-lab.

## Associated publications

none yet ;-)**Last update of this page: ** 17th of September 2021.