Workshop Program


The schedule is subject to change.

List of talks

Revisiting Scalarization in Multi-Task Learning
Han Zhao (University of Illinois Urbana-Champaign)
Linear scalarization, i.e., combining all loss functions by a weighted sum, has been the default choice in the literature of multi-task learning (MTL) since its inception. In recent years, there has been a surge of interest in developing Specialized Multi-Task Optimizers (SMTOs) that treat MTL as a multi-objective optimization problem. However, it remains open whether there is a fundamental advantage of SMTOs over scalarization. In this talk, I will revisit scalarization from a theoretical perspective, focusing on linear MTL models and studying whether scalarization is capable of fully exploring the Pareto front. Our findings reveal that, in contrast to recent works that claimed empirical advantages of scalarization, when the model is under-parametrized, scalarization is inherently incapable of full exploration, especially for those Pareto optimal solutions that strike balanced trade-offs among multiple tasks. I will conclude the talk by briefly discussing the extension of our results to general nonlinear neural networks and our recent work on using online Chebyshev scalarization to controllably steer the search for Pareto optimal solutions.
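For readers new to the setup, linear scalarization and the Pareto front it tries to trace can be written compactly as follows (a generic formulation, not necessarily the talk's exact notation):
\[
\min_{\theta} \; \sum_{i=1}^{m} \lambda_i \, L_i(\theta), \qquad \lambda \in \Delta_{m-1} = \Big\{ \lambda \ge 0 : \textstyle\sum_{i=1}^{m} \lambda_i = 1 \Big\},
\]
where \(L_1, \dots, L_m\) are the per-task losses. A parameter \(\theta^{\ast}\) is Pareto optimal if no \(\theta\) satisfies \(L_i(\theta) \le L_i(\theta^{\ast})\) for all \(i\) with strict inequality for some \(i\); the question studied in the talk is whether sweeping \(\lambda\) over the simplex recovers every such \(\theta^{\ast}\) when the model is under-parametrized.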
On the sample complexity of semi-supervised multi-objective learning
Fanny Yang (ETHZ)
In multi-objective learning (MOL), several possibly competing prediction tasks must be solved jointly by a single model. Achieving good trade-offs may require a model class G with larger capacity than what is necessary for solving the individual tasks. This, in turn, increases the statistical cost, as reflected in known MOL bounds that depend on the complexity of G. We show that this cost is unavoidable for some losses, even in an idealized semi-supervised setting, where the learner has access to the Bayes-optimal solutions for the individual tasks as well as the marginal distributions over the covariates. On the other hand, for objectives defined with Bregman losses, we prove that the complexity of G may come into play only in terms of unlabeled data. Concretely, we establish sample complexity upper bounds, showing precisely when and how unlabeled data can significantly alleviate the need for labeled data. These rates are achieved by a simple, semi-supervised algorithm via pseudo-labeling.
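To make the Bregman case concrete, here is one hedged reading of why pseudo-labeling can suffice (our own illustration; the talk's exact algorithm and conditions may differ). For a Bregman loss \(D_\phi\) and trade-off weights \(w\) on the simplex, the pointwise minimizer of the weighted objective is the weighted average of the single-task Bayes predictors,
\[
\arg\min_{z} \; \sum_{j} w_j \, D_\phi\big(f_j^{\ast}(x), z\big) \;=\; \sum_{j} w_j\, f_j^{\ast}(x),
\]
so one can pseudo-label unlabeled covariates with \(\bar f_w(x) = \sum_j w_j f_j^{\ast}(x)\) and fit \(g \in G\) to these pseudo-labels; the complexity of \(G\) then enters only through the amount of unlabeled data needed for this regression step.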
Regularized Fine-Tuning for Representation Multi-Task Learning: Adaptivity, Minimax Optimality, and Robustness
Yang Feng (NYU)
We study multi-task linear regression through the lens of regularized fine-tuning, where tasks share a latent low-dimensional structure but may deviate from it or include outliers. Unlike classical models that assume a common subspace, we allow each task’s subspace to drift within a similarity radius and permit an unknown fraction of tasks to violate the shared structure. We propose a penalized empirical-risk algorithm and a spectral method that adapt automatically to both the degree of subspace similarity and the proportion of outliers. We establish information-theoretic lower bounds and show that our methods achieve these rates up to constants, with the spectral method attaining exact minimax optimality in the absence of outliers. Moreover, our estimators are robust: they never perform worse than independent single-task regression and yield strict improvements when tasks are moderately similar and outliers are sparse. A thresholding scheme further adapts to unknown intrinsic dimension, and experiments validate the theory.
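Schematically (our notation, not the authors' exact estimator), a penalized fine-tuning objective in this spirit reads
\[
\min_{\{\beta_k\},\, A \in \mathbb{R}^{p \times r}} \;\; \sum_{k=1}^{K} \frac{1}{n_k}\,\big\| y_k - X_k \beta_k \big\|_2^2 \;+\; \lambda \sum_{k=1}^{K} \min_{\theta_k \in \mathbb{R}^r} \big\| \beta_k - A\,\theta_k \big\|_2,
\]
where the columns of \(A\) span a common \(r\)-dimensional subspace, the penalty allows each task coefficient \(\beta_k\) to deviate from it (capturing the similarity radius and leaving room for outlier tasks), and \(\lambda\) interpolates between fully pooled representation learning and independent single-task regression.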
Semi‑Supervised Learning on Graphs with GNNs
Olga Klopp (ESSEC)
We study semi-supervised node prediction on graphs where responses arise from a graph-aware feature operator followed by a smooth regression map. Within a class combining skip-connected GCN propagation with a fully connected ReLU network, we (i) derive an oracle inequality for population risk under random label masks that separates approximation and estimation error and exposes the dependence on the labeled fraction, covering numbers, and a receptive-field constant; (ii) show that skip connections exactly represent multi-hop polynomial filters, mitigating over-smoothing; (iii) give covering-number bounds; and (iv) quantify the robustness of our algorithm. These results link classical graph regularization with modern GNN design.
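As a brief illustration of point (ii) (generic notation, not necessarily the exact architecture of the paper): with a graph shift operator \(S\) (e.g., a normalized adjacency matrix), a skip-connected propagation rule such as
\[
H^{(l+1)} = \sigma\big( S\, H^{(l)} W_l + X\, U_l \big), \qquad H^{(0)} = X,
\]
keeps re-injecting the raw features \(X\) at every layer, so in the linear regime the depth-\(L\) output contains all powers \(X, SX, \dots, S^L X\) and can realize any multi-hop polynomial filter \(\sum_{j=0}^{L} S^j X \Theta_j\), rather than collapsing toward the dominant eigenvector of \(S\) (over-smoothing).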
Controlling the False Discovery Rate in Transformational Sparsity via Split Knockoffs
Yuan Yao (HKUST)
Controlling the False Discovery Rate (FDR) with finite-sample guarantees in variable selection procedures is crucial for ensuring trustworthy and reproducible discoveries. Although extensive research has focused on FDR control in sparse linear models, challenges persist when the sparsity constraint is imposed not directly on the parameters but on a linear transformation of them. Examples of such settings include total variation, wavelet transforms, fused LASSO, and trend filtering.
In this talk, we introduce the Split Knockoff method — a data-adaptive approach to FDR control tailored for transformational sparsity. Our method leverages both variable splitting and data splitting: the linear transformation constraint is relaxed to its Euclidean proximity in a lifted parameter space, yielding an orthogonal design that enhances statistical power and facilitates the creation of orthogonal Split Knockoff copies. To overcome the challenge posed by the failure of exchangeability — stemming from heterogeneous noise introduced by the transformation — we develop novel inverse supermartingale structures that ensure provable FDR control even when directional effects are present. We also discuss a generalization to the Model-X framework, which achieves robust FDR control provided that the marginal distribution of the random design is accurately estimated. Finally, we demonstrate the effectiveness of our approach with applications to an Alzheimer's Disease study and the assessment of large language models.
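To fix ideas on the variable-splitting step (a schematic version; the talk's precise formulation may differ): with design \(X\), response \(y\), and transformation \(D\), the structural sparsity problem is relaxed in a lifted parameter space as
\[
\min_{\beta,\, \gamma} \; \frac{1}{2}\|y - X\beta\|_2^2 \;+\; \frac{1}{2\nu}\|D\beta - \gamma\|_2^2 \;+\; \lambda \|\gamma\|_1,
\]
so that sparsity is imposed on the auxiliary variable \(\gamma \approx D\beta\) rather than on \(\beta\) itself; knockoff copies are then constructed for the (nearly orthogonal) design acting on \(\gamma\) in the lifted space.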
On Unbiased Stochastic Approximation
Ajay Jasra (The Chinese University of Hong Kong, Shenzhen, China)
We consider the problem of estimating parameters of statistical models associated with differential equations. In particular, we assume that the differential equation can only be solved up to a numerical error; for instance, in the case of stochastic differential equations (SDEs), the Euler-Maruyama method is often used, which introduces time-discretization bias. We adopt an optimization-based paradigm where the objective function (the likelihood function) to be maximized is not available analytically. In this talk, we show how, for certain classes of models, a new randomized stochastic approximation scheme can be used to obtain parameter estimators that eliminate the aforementioned numerical error in mathematical expectation, under suitable assumptions. We detail several applications, including partially observed SDEs and Bayesian inverse problems. Mathematical results are presented alongside numerical simulations demonstrating the efficacy of our methodology.
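The bias-removal mechanism can be illustrated by a generic single-term randomization (a standard debiasing device; the talk's scheme embeds a related idea within stochastic approximation). Suppose \(f_l\) denotes the quantity computed at discretization level \(l\), with \(f_l \to f\) as \(l \to \infty\) and \(f_{-1} := 0\). Draw a random level \(L\) with \(\mathbb{P}(L = l) = p_l > 0\) and set
\[
Z \;=\; \frac{f_L - f_{L-1}}{p_L}, \qquad \text{so that} \qquad \mathbb{E}[Z] \;=\; \sum_{l \ge 0} \big(f_l - f_{l-1}\big) \;=\; f,
\]
provided the series converges suitably; the discretization bias is removed in expectation at the price of extra variance, which the choice of \(\{p_l\}\) must keep under control.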
Gaussian approximation of empirical processes
Alexander Giessing (NUS)
In this talk we develop non-asymptotic Gaussian approximation results for the sampling distribution of suprema of empirical processes when the indexing function class \(\mathcal{F}_n\) varies with the sample size \(n\) and may not be Donsker. Prior approximations of this type required upper bounds on the metric entropy of \(\mathcal{F}_n\) and uniform lower bounds on the variance of \(f \in \mathcal{F}_n\), both of which limited their applicability to high-dimensional inference problems. In contrast, our results hold under simpler conditions on boundedness, continuity, and the strong variance of the approximating Gaussian process. The results are broadly applicable and yield a novel procedure for bootstrapping the distribution of empirical process suprema based on the truncated Karhunen–Loève decomposition of the approximating Gaussian process. We demonstrate the flexibility of this new bootstrap procedure by applying it to three fundamental problems: simultaneous inference on parameter vectors, construction of simultaneous confidence bands for functions in reproducing kernel Hilbert spaces, and inference on shallow neural networks.
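Schematically, the Karhunen–Loève-based bootstrap proceeds as follows (our shorthand; see the paper for the precise construction): letting \(\widehat{\lambda}_k, \widehat{\varphi}_k\) denote estimated eigenvalues and eigenfunctions of the covariance of the approximating Gaussian process on \(\mathcal{F}_n\), one simulates
\[
\widehat{Z}(f) \;=\; \sum_{k=1}^{K} \sqrt{\widehat{\lambda}_k}\, \xi_k\, \widehat{\varphi}_k(f), \qquad \xi_1,\dots,\xi_K \overset{iid}{\sim} N(0,1),
\]
and uses the conditional distribution of \(\sup_{f \in \mathcal{F}_n} \widehat{Z}(f)\) across Monte Carlo draws of \(\xi\) to approximate the distribution of the empirical process supremum.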
Assessing the Quality of Denoising Diffusion Models in Wasserstein Distance: Noisy Score and Optimal Bounds
Arnak Dalalyan (ENSAE)
Generative modeling aims to produce new random examples from an unknown target distribution, given access to a finite collection of examples. Among the leading approaches, denoising diffusion probabilistic models (DDPMs) construct such examples by mapping a Brownian motion via a diffusion process driven by an estimated score function. In this work, we first provide empirical evidence that DDPMs are robust to constant-variance noise in the score evaluations. We then establish finite-sample guarantees in Wasserstein-2 distance that exhibit two key features: (i) they characterize and quantify the robustness of DDPMs to noisy score estimates, and (ii) they achieve faster convergence rates than previously known results. Furthermore, we observe that the obtained rates match those known in the Gaussian case, implying their optimality.
The talk is based on a joint work with E. Vardanyan and V. Arsenyan.
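For orientation, the sampling mechanism behind DDPMs can be summarized in continuous time as follows (a standard formulation, not specific to this work): data are perturbed by an Ornstein–Uhlenbeck forward process, and new examples are generated by running the time-reversed diffusion with the true score replaced by an estimate \(\widehat{s}\),
\[
dY_t \;=\; \Big( \tfrac{1}{2}\, Y_t + \widehat{s}\big(Y_t,\, T - t\big) \Big)\, dt \;+\; dW_t, \qquad Y_0 \sim N(0, I_d),
\]
where \(\widehat{s}(\cdot, u)\) approximates \(\nabla \log p_u\), the score of the forward marginal at time \(u\). The guarantees discussed in the talk quantify how constant-variance noise in \(\widehat{s}\) propagates to the Wasserstein-2 error of the generated distribution.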
Poster session
Faithful Group Shapley Value
Yuan Zhang (Ohio State University)
Data Shapley is an important tool for data valuation, which quantifies the contribution of individual data points to machine learning models. In practice, group-level data valuation is desirable when data providers contribute data in batches. However, we identify that existing group-level extensions of Data Shapley are vulnerable to shell company attacks, where strategic group splitting can unfairly inflate valuations. We propose Faithful Group Shapley Value (FGSV) that uniquely defends against such attacks. Building on original mathematical insights, we develop a provably fast and accurate approximation algorithm for computing FGSV. Empirical experiments demonstrate that our algorithm significantly outperforms state-of-the-art methods in computational efficiency and approximation accuracy, while ensuring faithful group-level valuation.
LLM-Powered CPI Prediction Inference with Online Text Time Series
Jinchi Lv (USC)
Forecasting the Consumer Price Index (CPI) is an important yet challenging task in economics, where most existing approaches rely on low-frequency, survey-based data. With the recent advances of large language models (LLMs), there is growing potential to leverage high-frequency online text data for improved CPI prediction, an area still largely unexplored. This paper proposes LLM-CPI, an LLM-based approach for CPI prediction inference incorporating online text time series. We collect a large set of high-frequency online texts from a popular Chinese social networking site and employ LLMs such as ChatGPT and trained BERT models to construct continuous inflation labels for posts that are related to inflation. Online text embeddings are extracted via LDA and BERT. We develop a joint time series framework that combines monthly CPI data with LLM-generated daily CPI surrogates. The monthly model employs an ARX structure combining observed CPI data with text embeddings and macroeconomic variables, while the daily model uses a VARX structure built on LLM-generated CPI surrogates and text embeddings. We establish the asymptotic properties of the method and provide two forms of constructed prediction intervals. The finite-sample performance and practical advantages of LLM-CPI are demonstrated through both simulation and real data examples. This is a joint work with Yingying Fan, Ao Sun and Yurou Wang.
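As a rough schematic of the two-level structure (our simplified notation; the paper's exact specification, lag orders, and exogenous terms differ in detail): with monthly CPI \(y_t\), monthly text embeddings \(x_t\) and macroeconomic covariates \(z_t\), and daily LLM-generated CPI surrogates \(\tilde y_d\) with daily embeddings \(\tilde x_d\),
\[
y_t \;=\; \sum_{i=1}^{p} \phi_i\, y_{t-i} \;+\; \beta^\top x_t \;+\; \gamma^\top z_t \;+\; \varepsilon_t \qquad \text{(monthly ARX)},
\]
\[
\begin{pmatrix} \tilde y_d \\ \tilde x_d \end{pmatrix} \;=\; \sum_{j=1}^{q} A_j \begin{pmatrix} \tilde y_{d-j} \\ \tilde x_{d-j} \end{pmatrix} \;+\; B\, w_d \;+\; u_d \qquad \text{(daily VARX with exogenous covariates } w_d\text{)},
\]
where the daily model supplies high-frequency surrogate information that is aggregated into the monthly prediction and its prediction intervals.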
Missing Data Imputation by Reducing Mutual Information with Rectified Flows
Song Liu (University of Bristol)
This paper introduces a novel iterative method for missing data imputation that sequentially reduces the mutual information between data and their corresponding missing mask. Inspired by GAN-based approaches, which train generators to decrease the predictability of missingness patterns, our method explicitly targets the reduction of mutual information. Specifically, our algorithm iteratively minimizes the KL divergence between the joint distribution of the imputed data and missing mask, and the product of their marginals from the previous iteration. We show that the optimal imputation under this framework corresponds to solving an ODE, whose velocity field minimizes a rectified flow training objective. We further illustrate that some existing imputation techniques can be interpreted as approximate special cases of our mutual-information-reducing framework. Comprehensive experiments on synthetic and real-world datasets validate the efficacy of our proposed approach, demonstrating superior imputation performance.
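The quantity being driven down can be written explicitly (standard definitions; the iteration below simply restates the abstract in symbols). With imputed data \(X\) and missing mask \(M\), the mutual information is the KL divergence between the joint law and the product of its marginals,
\[
I(X; M) \;=\; \mathrm{KL}\big( p(x, m) \,\big\|\, p(x)\, p(m) \big),
\]
and the algorithm iterates \(p^{(t+1)} \in \arg\min_{p} \mathrm{KL}\big( p(x, m) \,\|\, p^{(t)}(x)\, p^{(t)}(m) \big)\) over admissible imputations (only missing entries may change), with the optimal update realized by transporting the imputed values along an ODE whose velocity field minimizes a rectified-flow objective.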
Learning the Climate System: Workflows that Connect Physics, Data, and Machine Learning
Tian Zheng (Columbia University)
Machine learning is increasingly used in climate modeling to support system emulation, parameter inference, forecasting, and scientific discovery, addressing challenges such as physical consistency, multi-scale coupling, data sparsity, and integration with existing workflows. In this talk, I will present a series of applied case studies focused on workflow design in climate ML, including surrogate modeling, ML-based parameterization, equation discovery from high-fidelity simulations, probabilistic programming for parameter inference, simulation-based inference in remote sensing, subseasonal forecasting, and physics-informed transfer learning. These examples highlight how ML workflows can be grounded in physical knowledge, shaped by simulation data, and designed to incorporate real-world observations. By unpacking these workflows and their design choices, I will discuss open challenges in building transparent, adaptable, and reproducible ML systems for climate science.
State-of-the-art Confidence Intervals and Confidence Sequences through Testing-by-Betting Algorithms
Francesco Orabona (KAUST)
We consider the problem of constructing optimal confidence intervals and confidence sequences for the mean of a bounded random variable, a key problem in statistics and machine learning when sampling is costly. All the traditional methods, like Hoeffding's, Bernstein's, and the Law-of-Iterated-Logarithm (LIL) bounds, while asymptotically optimal, have disappointing performance in the small-sample regime. Here, we present the current state-of-the-art approaches for both problems, based on the testing-by-betting framework of Shafer and Vovk. The first one is based on the betting strategy used by the Universal Portfolio Algorithm (Cover and Ordentlich, 1996) and obtains the first never-vacuous confidence sequences satisfying the LIL bound. The second one is a new dynamic betting algorithm that explicitly takes into account the time horizon to achieve the theoretically and numerically tightest confidence intervals.
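As a small illustration of the testing-by-betting construction (a minimal sketch with a deliberately simple betting strategy; the Universal-Portfolio and horizon-dependent strategies from the talk are more sophisticated):

```python
import numpy as np

def betting_confidence_sequence(x, alpha=0.05, grid_size=1000):
    """Confidence sequence for the mean of a [0,1]-bounded variable via
    testing-by-betting (a minimal sketch, not the talk's exact algorithms).

    For each candidate mean m we run a wealth process
        K_t(m) = prod_{i<=t} (1 + lambda_i(m) * (x_i - m)),
    with a simple predictable betting fraction lambda_i(m). At the true mean
    the wealth is a nonnegative martingale, so by Ville's inequality
    P(sup_t K_t(m*) >= 1/alpha) <= alpha, and the set of m whose wealth never
    reached 1/alpha is a valid confidence sequence at every time t.
    """
    x = np.asarray(x, dtype=float)
    m_grid = np.linspace(1e-3, 1 - 1e-3, grid_size)
    log_wealth = np.zeros_like(m_grid)
    max_log_wealth = np.zeros_like(m_grid)
    mu_hat, t = 0.5, 0  # running mean used to choose the bet (predictable)
    lower, upper = [], []
    for xi in x:
        t += 1
        # predictable bet: proportional to the signed distance mu_hat - m,
        # truncated so that 1 + lambda * (xi - m) stays strictly positive
        lam = (mu_hat - m_grid) / (m_grid * (1 - m_grid) + 1e-12)
        lam = np.clip(lam, -0.5 / (1 - m_grid), 0.5 / m_grid)
        log_wealth += np.log1p(lam * (xi - m_grid))
        max_log_wealth = np.maximum(max_log_wealth, log_wealth)
        mu_hat += (xi - mu_hat) / (t + 1)
        alive = max_log_wealth < np.log(1.0 / alpha)  # candidates not yet rejected
        lower.append(m_grid[alive].min() if alive.any() else np.nan)
        upper.append(m_grid[alive].max() if alive.any() else np.nan)
    return np.array(lower), np.array(upper)

# usage: a shrinking interval for the mean of Beta(2, 5) samples
rng = np.random.default_rng(0)
lo, hi = betting_confidence_sequence(rng.beta(2, 5, size=2000))
print(lo[-1], hi[-1])  # should bracket the true mean 2/7 ~ 0.286
```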
Anomaly detection using surprisals
Rob Hyndman (Monash University)
I will discuss a probabilistic approach to anomaly detection based on extreme "surprisal values", also known as log scores, equal to minus the log density at each observation. The surprisal approach can be used for any collection of data objects, provided a probability density can be defined on the sample space. It can distinguish anomalies from legitimate observations in a heavy tail, and will identify anomalies that go undetected by methods based on distance measures. I will demonstrate the idea in various real data examples including univariate, multivariate and regression contexts, and when exploring more complicated data objects. I will also briefly outline the underlying theory when the density is known, and when it is estimated using a kernel density estimate. In the latter case, an innovative bandwidth selection method is used based on persistent homology.
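A minimal numerical illustration of the surprisal idea using a kernel density estimate (scipy's default bandwidth here, rather than the persistent-homology-based selection mentioned in the talk):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
# bulk of the data plus two genuine anomalies far from the support
bulk = rng.normal(loc=[0.0, 0.0], scale=[1.0, 0.5], size=(500, 2))
anomalies = np.array([[6.0, 4.0], [-5.0, 5.0]])
data = np.vstack([bulk, anomalies])

kde = gaussian_kde(data.T)                # gaussian_kde expects shape (n_dims, n_points)
surprisal = -kde.logpdf(data.T)           # surprisal = minus the log estimated density
threshold = np.quantile(surprisal, 0.99)  # flag the most surprising 1% as anomalies
print(np.where(surprisal > threshold)[0]) # indices 500 and 501 should appear
```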
Conversion theorem and minimax optimality for continuum contextual bandits
Alexandre Tsybakov (CREST, ENSAE, IP Paris)
We study the continuum contextual bandit problem, where the learner sequentially receives a side information vector (a context) and has to choose an action in a convex set, minimizing a function depending on the context. The goal is to minimize the dynamic contextual regret, which provides a stronger guarantee than the standard static regret. We propose a meta-algorithm that associates to any input non-contextual bandit algorithm an output contextual bandit algorithm, and we prove a conversion theorem, which allows one to derive a bound on the contextual regret from the static regret of the input algorithm. We apply this strategy to obtain upper bounds on the contextual regret in several major settings (losses that are Lipschitz, convex and Lipschitz, or strongly convex and smooth with respect to the action variable). Inspired by the interior point method and employing self-concordant barriers, we propose an algorithm achieving sub-linear contextual regret for strongly convex and smooth functions in the noisy setting. We show that it achieves, up to a logarithmic factor, the minimax optimal rate of the contextual regret as a function of the number of queries. Joint work with Arya Akhavan, Karim Lounici and Massimiliano Pontil.
Adaptive sample splitting for randomization tests
Yao Zhang (NUS)
Randomization tests are widely used to generate finite-sample valid p-values for causal inference on experimental data. However, when applied to subgroup analysis, these tests may lack power due to small subgroup sizes. Incorporating a shared estimator of the conditional average treatment effect (CATE) can substantially improve power across subgroups but requires sample splitting to preserve validity. To this end, we quantify each unit's contribution to estimation and testing using a certainty score, which measures how certain the unit's treatment assignment is given its covariates and outcome. We show that units with higher certainty scores are more valuable for testing but less important for CATE estimation, since their treatment assignments can be accurately imputed. Building on this insight, we propose AdaSplit, a sample-splitting procedure that adaptively allocates units between estimation and testing to maximize their overall contribution across tasks. We evaluate AdaSplit through simulation studies, demonstrating that it yields more powerful randomization tests than baselines that omit CATE estimation or rely on random sample-splitting. Finally, we apply AdaSplit to a blood pressure intervention trial, identifying patient subgroups with significant treatment effects. This is a joint work with Zijun Gao.
Asymptotic FDR control with Model-X knockoffs: is moments matching sufficient?
Yingying Fan (USC)
We propose a unified theoretical framework for studying the robustness of the Model-X knockoffs framework by investigating the asymptotic false discovery rate (FDR) control of the practically implemented approximate knockoffs procedure. This procedure deviates from the Model-X knockoffs framework by substituting the true covariate distribution with a user-specified distribution that can be learned using in-sample observations. By replacing the distributional exchangeability condition of the Model-X knockoff variables with three conditions on the approximate knockoff statistics, we establish that the approximate knockoffs procedure achieves asymptotic FDR control. Using our unified framework, we further prove that arguably the most widely used knockoff variable generation method, the Gaussian knockoffs generator based on matching the first two moments, achieves asymptotic FDR control when the two-moment-based knockoff statistics are employed in the knockoffs inference procedure. For the first time in the literature, our theoretical results formally justify the effectiveness and robustness of the Gaussian knockoffs generator. Simulation and real data examples are conducted to validate the theoretical findings.
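For reference, the second-order (two-moment) Gaussian knockoffs generator mentioned above samples, for a covariate vector with mean \(\mu\) and covariance \(\Sigma\), the knockoff copy \(\tilde X\) from the conditional Gaussian implied by the joint law
\[
(X, \tilde X) \sim N\!\left( \begin{pmatrix} \mu \\ \mu \end{pmatrix},\;
\begin{pmatrix} \Sigma & \Sigma - \mathrm{diag}(s) \\ \Sigma - \mathrm{diag}(s) & \Sigma \end{pmatrix} \right),
\]
where \(s \ge 0\) is chosen so that the joint covariance matrix is positive semidefinite; the construction matches the first two moments of the swap-exchangeable ideal without requiring knowledge of the true covariate distribution.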
Statistical Inference for High-Dimensional and Functional Data via Bootstrapping
Zhenhua Lin (NUS)
Statistical inference in high-dimensional and functional settings is both central and challenging. We develop a set of bootstrap-based procedures for common inferential tasks, including high-dimensional analysis of variance, two-sample homogeneity testing, and hypothesis tests for the mean function and for the slope function in functional linear models. We establish asymptotic validity and consistency of the proposed methods and derive their convergence rates. As a by-product, we uncover a theoretical distinction between FPCA-based estimation and inference for the slope function. Numerical studies show accurate type-I error control and competitive power, especially with limited samples and weak signals.
Leveraging synthetic data in statistical inference
Edgar Dobriban (Wharton)
Synthetic data, for instance generated by foundation models, may offer great opportunities to boost sample sizes in statistical analysis. However, the distribution of synthetic data may not be exactly the same as that of the real data, thus incurring the risk of faulty inferences. Motivated by these observations, we study how to use synthetic or auxiliary data in statistical inference problems ranging from predictive inference (conformal prediction) to hypothesis testing. We develop methods that are able to leverage synthetic or auxiliary data in addition to real data. If the synthetic data distribution is similar to that of the real data, our methods improve precision. At the same time, our methods maintain a guardrail level of coverage even if the synthetic data distribution is arbitrarily bad. We illustrate our methods with a variety of examples ranging from AI to the medical domain.
Random Fields on Dynamic Metric Graphs
Emilio Porcu (Khalifa University)
We consider the problem of time-evolving generalised networks, where (i) the edges connecting the nodes are nonlinear, (ii) stochastic processes are continuously indexed over both vertices and edges and (iii) the topology is allowed to change over time, that is, vertices and edges can disappear at subsequent time instants and edges may change in shape and length. Topological structures satisfying (i) and (ii) are usually represented through special classes of metric graphs, termed graphs with Euclidean edges. We build a rigorous mathematical and statistical framework for time-evolving networks. We consider both cases of linear and circular time, where, for the latter, the generalised network exhibits a periodic structure. Our findings allow us to illustrate the pros and cons of each setting. Our approach allows us to build proper semi-distances for the temporally evolving topological structures of the networks; generalised networks become semi-distance spaces whenever equipped with such semi-distances. Our final effort is devoted to guiding the reader through the appropriate choice of classes of functions that allow one to build random fields on the time-evolving networks through kernels composed with these temporally evolving semi-distances.
From Hypothesis Testing to Distribution Estimation
Nikita Zhivotovskiy (Berkeley)
Distinguishing between two distributions based on observed data is a classical problem in statistics and machine learning. But what if we aim to go further—not just test, but actually estimate a distribution close to the true one in, say, Kullback-Leibler divergence? Can we do this knowing only that the true distribution lies in a known class, without structural assumptions on the individual densities? In this talk, I will review classical results and present recent developments on this question. The focus will be on high-probability error bounds that are optimal up to constants in this general setting.
PCS uncertainty quantification for regression and classification
Anthony Ozerov (Berkeley)
Trustworthy uncertainty quantification (UQ) is required for good, safe decision-making when using machine learning models. We discuss new UQ methods for regression and classification under the Predictability-Computability-Stability framework. This involves (1) training different models on the same dataset, (2) screening out those which make poor predictions, (3) creating a bootstrap ensemble, and (4) using calibration data to create prediction intervals or sets achieving a specified coverage level. By combining model selection (screening), uncertainty due to model misspecification and noise (ensembling), and calibration, we obtain predictions on real datasets which are sharper (narrower intervals or smaller sets) and more adaptive to data subgroups than standard one-model conformal prediction. Finally, we discuss one challenge that arose in this work: how to evaluate prediction sets in classification, balancing sharpness and adaptivity. Drawing from well-known scoring rules for probabilistic predictions, we propose new evaluation metrics for prediction sets and show how they can be used to choose between or tune prediction algorithms.
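A minimal regression sketch of the four steps above (a simplified illustration with generic scikit-learn models and a multiplicative interval-calibration rule of our own choosing, not the PCS software itself):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.base import clone
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(1200, 5))
y = X[:, 0] - 2 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=1200)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_te, y_cal, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# (1) train several different models on the same data
models = [Ridge(), RandomForestRegressor(random_state=0), GradientBoostingRegressor(random_state=0)]
for m in models:
    m.fit(X_tr, y_tr)

# (2) screen out models with poor predictive performance on held-out data
errors = [mean_squared_error(y_cal, m.predict(X_cal)) for m in models]
keep = [m for m, e in zip(models, errors) if e <= 1.5 * min(errors)]

# (3) bootstrap ensemble: refit the screened models on bootstrap resamples
preds_cal, preds_te = [], []
for m in keep:
    for _ in range(20):
        idx = rng.integers(0, len(X_tr), len(X_tr))
        b = clone(m).fit(X_tr[idx], y_tr[idx])
        preds_cal.append(b.predict(X_cal))
        preds_te.append(b.predict(X_te))
preds_cal, preds_te = np.array(preds_cal), np.array(preds_te)
lo_cal, hi_cal = np.percentile(preds_cal, [5, 95], axis=0)
lo_te, hi_te = np.percentile(preds_te, [5, 95], axis=0)

# (4) calibrate: inflate/deflate the ensemble intervals about their center
#     until they cover 90% of the calibration responses
center_cal, half_cal = (lo_cal + hi_cal) / 2, (hi_cal - lo_cal) / 2
gammas = np.linspace(0.5, 10, 200)
cover = [np.mean(np.abs(y_cal - center_cal) <= g * half_cal) for g in gammas]
gamma = gammas[min(np.searchsorted(cover, 0.9), len(gammas) - 1)]
center_te, half_te = (lo_te + hi_te) / 2, (hi_te - lo_te) / 2
print("test coverage:", np.mean(np.abs(y_te - center_te) <= gamma * half_te))
```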
LLM predisposition
Xin Tong (HKU)
We study the faithfulness of LLM-mediated communication by modeling it as a generation-summarization process. Using a novel experimental framework and an adapted benchmark dataset, we introduce a quantitative metric to evaluate faithfulness. Our results reveal significant information distortion in current LLM-mediated communication.
New Statistical Questions in the Age of Large Language Models
Amit Sharma (MSR India)
Statistical analysis has long relied on a division of labor: domain knowledge is provided by subject-matter experts, while inference is guided by formal statistical methods. Large language models (LLMs) blur this boundary by generating domain knowledge–like priors for a problem, offering new opportunities and statistical challenges. I will first demonstrate how LLMs can propose causal mechanisms across fields such as medicine and environmental science, suggest candidate variables, functional forms, and even robustness checks. Unlike expert knowledge, however, LLM-derived priors cannot be assumed valid—they introduce new, structured but unpredictable forms of error.
This motivates a broader statistical question: what would end-to-end inference look like in an LLM-assisted regime? For example, in causal effect estimation, LLMs may provide distributions over causal graphs that can then be used for effect estimation; conversely, given an effect, they may help check or refine the assumptions. Extending this idea, we arrive at the possibility of inference pipelines that move directly from a scientific question to study design to parameter estimation with LLM input at each stage. Such workflows raise new statistical challenges, including how to construct confidence intervals, quantify uncertainty, and calibrate inference when part of the prior comes from an unreliable but informative model. The talk will motivate these open problems with real-world case studies.
Causal Modeling with Stationary Processes
Mathias Drton (TUM)
The ultimate aim of many data analyses is to infer cause-and-effect relationships between random variables of interest. While much of the available methodology for addressing causal questions relies on structural causal models, these models are best suited for systems without feedback loops. Extensions to accommodate feedback have been proposed, but often result in models that are challenging to interpret. In this lecture, we present an alternative approach to graphical causal modeling that considers stationary distributions of multivariate diffusion processes.
Causality-Inspired Distributional Robustness for Nonlinear Models
Peter Buhlmann (ETHZ)
Distributional robustness is a central challenge in predictive modeling, as real-world data often exhibit substantial distribution shifts across environments. Causality offers a principled framework for modeling such distributional perturbations, enabling rigorous guarantees of robustness in nonlinear models through representation learning. We will discuss the framework and its theoretical foundations, and illustrate its applications in perturbation genomics and medical domain adaptation.
From Intrinsic Dimension to Information Imbalance: Nearest-Neighbor Methods for Dimensionality Reduction, Nonparametric Variable Selection, and Causal Discovery
Antonietta Mira (Università della Svizzera italiana)
This talk presents recent advances in nearest-neighbor methods for understanding complex, high-dimensional data. In the first part, we focus on intrinsic dimension (ID), a simple yet powerful descriptor that reveals the effective number of degrees of freedom in a dataset. We show how ID can be estimated adaptively, self-consistently determining both the optimal scale of analysis and the number of variables required to describe the data without significant information loss. Moreover, different IDs may coexist within the same dataset, pointing to subsets of points lying on distinct manifolds and naturally yielding a clustering of the data. Applications range from gene expression and protein folding to pandemic evolution, fMRI, finance, and network data. In the second part, we introduce the concept of information imbalance (II) and its differentiable extension (DII), which provide nonparametric measures of variable informativeness and causal directionality. Applications to synthetic and real-world datasets, including the EU Emission Trading System, highlight the potential of this framework for dimensionality reduction, variable selection, and causal discovery.
Bayesian predictive-based uncertainty quantification
Sonia Petrone (Bocconi)
In the rapid evolution of Statistics and AI, we still feel a tension between the “two cultures” - classic statistical inference versus algorithmic prediction. The Bayesian approach has prediction in its foundations, and may naturally combine both cultures. In a Bayesian predictive approach, one directly reasons on prediction of future observations, bypassing models and parameters, or possibly using them implicitly. In a nutshell, while Statistics traditionally goes from inference to prediction, here one goes from prediction to inference. This approach allows us to regard predictive algorithms - computationally convenient approximations of exact Bayesian solutions, or black-box predictive engines - as Bayesian predictive learning rules, and to provide them with full Bayesian uncertainty quantification. In the talk, I will review basic concepts and recent results, and discuss ongoing directions, such as calibrating the predictive rule for predictive ‘efficiency’ and good inferential properties.
Nonparametric multivariate Hawkes processes in high dimensions
Judith Rousseau (Paris Dauphine University)
Multivariate Hawkes processes form a class of point processes describing self- and inter-exciting/inhibiting processes. There is now renewed interest in such processes in applied domains and in machine learning, but existing theory on inference in such models is limited largely to the parametric case. After reviewing results on convergence rates for Bayesian nonparametric approaches to such models when the dimension K of the process is fixed, I will present some new results on estimation when K is large.
To be more precise, the intensity function of a linear Hawkes process has the following form: for each dimension \(k \leq K\)
\[\lambda^k(t) = \sum_{\ell \leq K} \int_{0}^{t^-} h_{\ell k}(t-s) dN_s^{\ell} + \nu_k, \quad t \in [0, T]\] where \(N^{\ell}, \ell \leq K\) is the Hawkes process, \(\nu_k \ge 0\) and \(h_{\ell k} \geq 0\). The parameters are the functions \(h_{\ell k}\) and the constants \(\nu_k\), with \(\ell, k \in [K]\).
Estimation of \(\nu, h\) when \(K\) is allowed to grow with \(T\) has recently attracted some interest under sparsity assumptions, i.e. assuming that for each \(k\) only a small number \(s_0\) of the functions \(h_{\ell k}\) can be non-null. However, only partial results on the estimation of the parameters have been derived.
In this talk I will show that under sparsity (on \(s_0\)) and stationarity assumptions it is possible to estimate nonparametrically the \(K\) background rates \(\nu_k\) and the \(K^2\) interaction functions \(h_{\ell k}\) when \(K\) is allowed to grow exponentially with \(T\).
I will provide results under the empirical \(L_1\) loss on the intensity functions and on the direct \(L_1\) loss on the parameters.
I will propose a two-step procedure which allows for the estimation of the parameters at a rate which is not impacted by \(K\) even for very large \(K\)’s.
I will also explain, for finite \(K\), the specific issues in deriving efficient semiparametric theory in these models, corresponding to the fact that we allow the functions \(h_{\ell k}\) to be equal to 0; I will then provide conditions for verifying the semiparametric Bernstein-von Mises property for functionals of the form \(\Psi(\nu, h) = \Psi(\nu)\) or \(\Psi(\nu, h) = \Psi(h)\), where \(\Psi\) is smooth. I will apply these results to the functionals \(\Psi(\nu, h) = \nu\) and \(\Psi(\nu, h) = \int h_{\ell k}(x)dx\) and deduce from these a sharp result on the estimation of the graph of interaction. The latter is defined as the adjacency matrix \(\Delta\) where \(\delta_{\ell k} = 1\) if and only if \(\rho_{\ell k} = \int h_{\ell k}(x)dx > 0\).
Reliable Uncertainty Quantification with Missing Data and Adaptive Conformal Selection
Aymeric Dieuleveut (École Polytechnique)
Conformal prediction provides a flexible framework for constructing predictive sets with distribution-free coverage guarantees. Yet, practical settings often challenge these guarantees, for instance when covariates contain missing values or when multiple valid conformal sets are available. I will discuss recent work addressing these two issues.
The first part focuses on conformal prediction with missing covariates, examining how imputation and missingness patterns affect coverage and how new approaches can achieve validity conditional on missingness. The second part considers the problem of selecting among several conformal sets without compromising validity, through stability-based procedures that extend to online and structured contexts. These results contribute to a better understanding of how conformal methods behave under realistic data and modeling conditions.
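For context, the split conformal construction underlying both parts is the following (standard formulation): given a conformity score \(s(x, y)\) (e.g., the absolute residual of a fitted model) and calibration data \((X_i, Y_i)_{i=1}^{n}\) held out from training, set
\[
\widehat q \;=\; \text{the } \lceil (n+1)(1-\alpha) \rceil\text{-th smallest value of } \{ s(X_i, Y_i) \}_{i=1}^{n}, \qquad
\widehat C(x) \;=\; \{\, y : s(x, y) \le \widehat q \,\},
\]
which covers a new exchangeable point with probability at least \(1 - \alpha\). The issues addressed in the talk arise when the \(X_i\) have missing entries (imputation changes the scores and coverage can fail conditionally on the missingness pattern) and when several such sets \(\widehat C\) are available and one wants to select among them without breaking the guarantee.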
Randomization for Algorithmic Fairness
Francesco Bonchi (University)
Algorithmic decision-making has become pervasive in high-stakes domains such as health, education, and employment. This widespread adoption raises crucial concerns about the fairness of the algorithms adopted. In this talk, I will delve into a recent research line that explores individual fairness in combinatorial optimization problems, where many valid solutions may exist to a given problem instance. Our proposal, named distributional max-min fairness, leverages the power of randomization to maximize the expected satisfaction of the most disadvantaged individuals. The talk will highlight applications across fundamental algorithmic challenges, including matching, ranking, and shortest-path queries.
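Schematically (our paraphrase of the notion; see the papers below for the precise definitions): letting \(\mathcal{S}\) denote the set of valid solutions to a problem instance and \(u_i(S)\) the satisfaction of individual \(i\) under solution \(S\) (e.g., being matched, or well-ranked), distributional max-min fairness seeks a randomization
\[
p^{\ast} \;\in\; \arg\max_{p \,\in\, \Delta(\mathcal{S})} \;\; \min_{i} \; \mathbb{E}_{S \sim p}\big[ u_i(S) \big],
\]
i.e., a lottery over solutions that maximizes the expected satisfaction of the worst-off individual, something no single deterministic solution can achieve in general.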
Relevant papers:
[1] D. Garcia-Soriano, F. Bonchi. "Fair-by-design matching" (DAMI 2020)
[2] D. Garcia-Soriano, F. Bonchi. "Maxmin-Fair Ranking: Individual Fairness under Group-Fairness Constraints" (KDD 2021)
[3] A. Ferrara, D. Garcia-Soriano, F. Bonchi. "Beyond Shortest Paths: Node Fairness in Route Recommendation" (VLDB 2025)
Rectifying Conformity Scores for Better Conditional Coverage
Maxim Panov (MBZUAI)
We present a new method for generating confidence sets within the split conformal prediction framework. Our method performs a trainable transformation of any given conformity score to improve conditional coverage while ensuring exact marginal coverage. The transformation is based on an estimate of the conditional quantile of conformity scores. The resulting method is particularly beneficial for constructing adaptive confidence sets in multi-output problems where standard conformal quantile regression approaches have limited applicability. We develop a theoretical bound that captures the influence of the accuracy of the quantile estimate on the approximate conditional validity, unlike classical bounds for conformal prediction methods that only offer marginal coverage. We experimentally show that our method is highly adaptive to the local data structure and outperforms existing methods in terms of conditional coverage, improving the reliability of statistical inference in various applications.