Seventh Conference on
Discrete Optimization and Machine Learning

Spring 2025 · Kyoto, Japan

Coordinates

Dates: May 12–16, 2025
Place: Research Institute for Mathematical Sciences (RIMS), Kyoto, Japan
Organizers: Antoine Deza, Kazuhisa Makino, Sebastian Pokutta, Akiyoshi Shioura

Within the 2025 RIMS Project
Advances in Theoretical Research on Mathematical Optimization

DOML 2025

Conference Dinner: May 13 at 6 pm at Camphora (Kyoto University Campus Restaurant)
Yoshidahonmachi, Sakyo Ward, Kyoto 606-8317, Japan; located near the main gate.

Program

May 12

Time | Speaker | Talk Title (for abstracts, see below)
09:30 - 10:00 | Registration/Coffee
10:00 - 12:00 | Session Chair: TBD
10:00 - 10:30 | Marc Pfetsch | Combinatorial Aspects of Physical Networks
10:30 - 11:00 | Yao Xie | Variable selection for kernel two-sample tests
11:00 - 11:30 | David Martínez-Rubio | Convergence and Trade-Offs in Riemannian Gradient Descent and Proximal Point
11:30 - 12:00 | Frédéric Meunier | The Lemke–Howson algorithm for oriented matroids, and applications
Lunch Break
13:30 - 15:00 | Session Chair: TBD
13:30 - 14:00 | Amir Ali Ahmadi | TBD
14:00 - 14:30 | Shmuel Onn | Circuit and Graver Walks and Linear and Integer Programming
14:30 - 15:00 | Michael Joswig | Hypersimplices, tropical geometry and finite metric spaces
Coffee Break
15:30 - 17:00 | Session Chair: TBD
15:30 - 16:00 | Chenglong Bao | Convergence Analysis of Two Stochastic Algorithms
16:00 - 16:30 | Andrea Walther | TBD
16:30 - 17:00 | Leo Liberti | Computation with distance geometry

May 13

Time | Speaker | Talk Title (for abstracts, see below)
09:30 - 10:00 | Coffee
10:00 - 12:00 | Session Chair: TBD
10:00 - 10:30 | Masashi Sugiyama | TBD
10:30 - 11:00 | Dan Garber | A Linearly Convergent Frank-Wolfe-type Method for Smooth Convex Minimization over the Spectrahedron
11:00 - 11:30 | Lars Schewe | Discrete Optimization for better outage planning of electricity networks
11:30 - 12:00 | Mourad Baïou | Game theory models and algorithms for trading demands
Lunch Break
13:30 - 15:00 | Session Chair: TBD
13:30 - 14:00 | Andrés Gómez | Real-time solutions for a class of mixed-integer optimization problems
14:00 - 14:30 | Claudia D'Ambrosio | Surrogate mixed integer nonlinear models by means of regression splines
14:30 - 15:00 | Axel Parmentier | Primal Dual Algorithm for Empirical Risk Minimization in Combinatorial Optimization Augmented Learning
Coffee Break
15:30 - 17:00 | Session Chair: TBD
15:30 - 16:00 | Akshay Gupte | Machine learning methods for finding good solutions to nonconvex quadratic and multiobjective optimization
16:00 - 16:30 | Tamon Stephen | Hypergraph Transversal Pairs Near the Fredman-Khachiyan Bound
16:30 - 17:00 | Nir Halman | Packing Squares independently

Conference Dinner at Camphora

May 14

Time | Speaker | Talk Title (for abstracts, see below)
09:30 - 10:00 | Coffee
10:00 - 12:00 | Session Chair: TBD
10:00 - 10:30 | Zhi-Quan Luo | Finite Horizon Optimization
10:30 - 11:00 | Christof Schütte | TBD
11:00 - 11:30 | Pierre-Louis Poirion | TBD
11:30 - 12:00 | David Bremner | Optimizing over the union of linear subspaces
Lunch Break
13:30 - 15:00 | Session Chair: TBD
13:30 - 14:00 | Martin Skutella | Complexity of Injectivity and Verification of ReLU Neural Networks
14:00 - 14:30 | Lionel Pournin | TBD
14:30 - 15:00 | Yuan Zhou | TBD
Coffee Break
15:30 - 17:00 | Poster Session

May 15

Time | Speaker | Talk Title (for abstracts, see below)
09:30 - 10:00 | Coffee
10:00 - 12:00 | Session Chair: TBD
10:00 - 10:30 | Swati Gupta | TBD
10:30 - 11:00 | Bartolomeo Stellato | Data-driven analysis and design of convex optimization algorithms
11:00 - 11:30 | Haoxiang Yang | Learning to Generate Lagrangian Cuts
11:30 - 12:00 | Andrew Lim | The exploration, exploitation, robustness tradeoff
Lunch Break
13:30 - 15:30 | Session Chair: TBD
13:30 - 14:00 | Dmitrii Pasechnik | TBD
14:00 - 14:30 | Ken Kobayashi | Learning decision trees and forests with algorithmic recourse
14:30 - 15:00 | Christoph Spiegel | TBD

May 16

Time | Speaker | Talk Title (for abstracts, see below)
09:30 - 10:00 | Coffee
10:00 - 12:00 | Session Chair: TBD
10:00 - 10:30 | Jon Lee | TBD
10:30 - 11:00 | Kim-Chuan Toh | A Low-Rank ALM for Doubly Nonnegative Relaxations of Mixed-Binary QP
11:00 - 11:30 | Yuriy Zinchenko | Can an infeasible MIP solve itself?
11:30 - 12:00 | Michel de Lara | Geometry of Sparsity-Inducing Norms
Lunch Break
13:30 - 15:30 | Session Chair: TBD
13:30 - 14:00 | Ting Kei Pong | A single-loop proximal-conditional-gradient penalty method
14:00 - 14:30 | Sai Ganesh Nagarajan | Algorithms and Complexity for Two-Team Polymatrix Zero-Sum Games: Revisiting Constrained Bilinear Minmax Optimization
14:30 - 15:00 | Bruno Lourenço | Faces of homogeneous convex cones and applications to homogeneous chordal PSD matrices

Abstracts

Surrogate mixed integer nonlinear models by means of regression splines
Claudia D'Ambrosio

Complex phenomena can be accurately described by means of data-driven mathematical models. However, being able to integrate these models within a mathematical optimization framework can be, in general, very challenging. In fact, many of these data-driven models are ‘black-box’, in the sense that they do not have an explicit mathematical formula which describes them. In other cases, even if an explicit expression exists, including it into a mathematical optimization model may make solving the problem computationally intractable. We propose to use a special kind of surrogate models, regression splines, to deal with functions of this kind which appear in Mixed Integer Nonlinear Programming (MINLP) problems. The choice of spline functions is not arbitrary. On one hand, they offer a good compromise between accuracy and complexity. On the other hand, their functional form allows us to exploit separability and approximate general non-convex MINLPs by a more tractable subclass of problems.
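
As a concrete illustration of the surrogate idea (an editorial sketch, not the authors' implementation), the snippet below fits a cubic smoothing spline to noisy samples of a hypothetical black-box function; the resulting piecewise polynomial is the kind of object that can then be embedded in an MINLP model.

    # Illustrative sketch only: fit a cubic smoothing spline as a surrogate
    # for a sampled black-box function. The function f is a hypothetical
    # stand-in for an expensive simulator.
    import numpy as np
    from scipy.interpolate import UnivariateSpline

    f = lambda x: np.sin(3 * x) + 0.1 * x**2
    x = np.linspace(0.0, 4.0, 50)
    y = f(x) + np.random.default_rng(0).normal(0.0, 0.05, x.size)

    surrogate = UnivariateSpline(x, y, k=3, s=0.5)  # cubic, smoothing factor s
    print(surrogate.get_knots())   # knots of the piecewise polynomial
    print(surrogate(1.7))          # cheap surrogate evaluation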

Convergence Analysis of Two Stochastic Algorithms
Chenglong Bao

In this talk, we will explore convergence analysis for two stochastic optimization algorithms. First, we establish a tight convergence rate for the stochastic proximal point algorithm in solving stochastic composite optimization problems. Second, we present global convergence results for the stochastic gradient descent algorithm with adaptive noise injection, tailored for minimizing nearly convex functions.

Game theory models and algorithms for trading demands
Mourad Baïou

We introduce a new cooperative game theory model that we call the production-distribution game. It models efficient sharing principles for practical collaboration in transportation. The originality of our model lies in the fact that the value/strength of a player does not only depend on the individual cost or benefit of the goods she owns but also on her market shares (customer demand). We prove that we can compute the nucleolus efficiently, in a nontrivial, interesting special case. We provide two algorithms to compute the nucleolus: a simple separation algorithm and a fast primal-dual one. We also show that our results can be used to tackle more general versions of the problem.

Optimizing over the union of linear subspaces
David Bremner

We compare some different approaches for cut generation in integer optimization when the feasible region is the intersection of a convex polytope and the (non-convex) union of low dimensional linear subspaces. This arises as a sub-problem in optimizing under symmetry, but is also of potentially more general application.

A Low-Rank ALM for Doubly Nonnegative Relaxations of Mixed-Binary QP
Kim-Chuan Toh

Doubly nonnegative (DNN) programming problems are known to be challenging to solve because of their huge number of $\Omega(n^2)$ constraints and $\Omega(n^2)$ variables. In this work, we introduce RiNNAL, a method for solving DNN relaxations of large-scale mixed-binary quadratic programs by leveraging their solutions’ possible low-rank property. RiNNAL is a globally convergent Riemannian-based augmented Lagrangian method (ALM) that penalizes the nonnegative and complementarity constraints while preserving all other constraints as an algebraic variety. After applying the low-rank decomposition to the ALM subproblem, its feasible region becomes an algebraic variety with favorable geometric properties. Our low-rank decomposition model is different from the standard Burer-Monteiro (BM) decomposition model in that we make the crucial step to equivalently reformulate most of the quadratic constraints after the BM decomposition into fewer and more manageable affine constraints. This modification is also important in helping us to alleviate the violation of Slater’s condition for the primal DNN problem. Moreover, we make the crucial step to show that the metric projection onto the algebraic variety, although non-convex, can be transformed into a solvable convex optimization problem under certain regularity conditions. Numerous numerical experiments are conducted to validate the efficiency of the proposed RiNNAL method. Joint work with Di Hou and Tianyun Tang.
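
For orientation, a generic DNN relaxation and the Burer-Monteiro substitution discussed above take the schematic form (the concrete constraint map \(\mathcal{A}\) is problem-dependent):

\[
\min_{X} \ \langle C, X \rangle \quad \text{s.t.} \quad \mathcal{A}(X) = b, \ \ X \succeq 0, \ \ X \ge 0, \qquad X = RR^{\top}, \ R \in \mathbb{R}^{n \times r}, \ r \ll n,
\]

so that \(X \succeq 0\) holds automatically and the remaining constraints on \(R\) cut out the algebraic variety on which the ALM subproblems are solved.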

Geometry of Sparsity-Inducing Norms
Michel de Lara

Sparse optimization seeks an optimal solution with few nonzero entries. To achieve this, it is common to add to the criterion a penalty term proportional to the $\ell_1$-norm, which is recognized as the archetype of sparsity-inducing norms. In this approach, the number of nonzero entries is not controlled a priori. By contrast, in this talk, we focus on finding an optimal solution with at most $k$ nonzero coordinates (in short, $k$-sparse vectors), where $k$ is a given sparsity level/budget. For this purpose, we study the class of generalized $k$-support norms, induced by a given source norm. When added as a penalty term, we provide conditions under which such generalized $k$-support norms promote $k$-sparse solutions. The result follows from an analysis of the exposed faces of closed convex sets generated by $k$-sparse vectors, and of how primal support identification can be deduced from dual information. Finally, we provide geometric properties of the unit balls for the $k$-support norms and their dual norms when the source norms belong to the $\ell_p$-norms family. Joint work with Jean-Philippe Chancelier, Michel De Lara, Antoine Deza, Lionel Pournin.
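
As background, a standard way to construct such norms (stated here for orientation): given a source norm \(\|\cdot\|\) on \(\mathbb{R}^n\), the generalized \(k\)-support norm is the norm whose unit ball is

\[
B_k \;=\; \mathrm{conv}\{\, x \in \mathbb{R}^n : \|x\|_0 \le k, \ \|x\| \le 1 \,\},
\]

and whose dual norm evaluates to \(\max_{|K| \le k} \|y_K\|^{*}\), the largest dual source norm of a restriction of \(y\) to \(k\) coordinates; for the \(\ell_2\) source norm this recovers the classical \(k\)-support norm.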

A Linearly Convergent Frank-Wolfe-type Method for Smooth Convex Minimization over the Spectrahedron
Dan Garber

We consider the problem of minimizing a smooth and convex function over the n-dimensional spectrahedron — the set of real symmetric n×n positive semidefinite matrices with unit trace, which underlies numerous applications in statistics, machine learning and additional domains. Standard first-order methods often require high-rank matrix computations which are prohibitive when the dimension n is large. The well-known Frank-Wolfe method, on the other hand, only requires efficient rank-one matrix computations; however, it suffers from worst-case slow convergence, even under conditions that enable linear convergence rates for standard methods. In this work we present the first Frank-Wolfe-based algorithm that only applies efficient rank-one matrix computations and, assuming quadratic growth and strict complementarity conditions, is guaranteed, after a finite number of iterations, to converge linearly, in expectation, and independently of the ambient dimension.
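
For context, the rank-one oracle in question is the classical Frank-Wolfe linear minimization step over the spectrahedron; a minimal sketch of the classical method (not the new algorithm of the talk) on a toy objective:

    # Classical Frank-Wolfe over the spectrahedron {X : X PSD, trace(X) = 1},
    # shown on the toy objective f(X) = 0.5 * ||X - M||_F^2; the LMO returns a
    # rank-one matrix built from one extreme eigenvector of the gradient.
    import numpy as np
    from scipy.sparse.linalg import eigsh

    n = 50
    M = np.random.default_rng(1).normal(size=(n, n)); M = (M + M.T) / 2
    X = np.eye(n) / n                      # feasible start
    for t in range(200):
        G = X - M                          # gradient of f at X
        _, v = eigsh(G, k=1, which='SA')   # eigenvector of the smallest eigenvalue
        V = np.outer(v[:, 0], v[:, 0])     # rank-one LMO solution
        X += 2.0 / (t + 2) * (V - X)       # standard open-loop step size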

Machine learning methods for finding good solutions to nonconvex quadratic and multiobjective optimization
Akshay Gupte

We consider machine learning methods for selecting optimal parameters for two classes of challenging optimization problems – nonconvex quadratically constrained quadratic programs, and multiobjective mixed integer linear optimization. For both problems we are interested in finding good quality feasible solutions. For QCQPs, we consider adaptive MILP discretizations that were recently developed but for which the parameters governing the discretization are selected and updated in an ad hoc manner. For the latter, we are interested in Pareto solutions that are spread wide apart on the efficient frontier, and can be generated through different (equivalent) scalarizations, each with its own set of parameters.

Packing Squares independently
Nir Halman

Given a set of squares and a strip of bounded width and infinite height, we consider a square strip packing problem, which we call the square independent packing problem (SIPP), to minimize the strip height so that all the squares are packed into independent cells separated by horizontal and vertical partitions. For the SIPP, we first investigate efficient solution representations and propose a compact representation that reduces the search space from \(O(n!)\) to \(O(2^n)\), with \(n\) the number of given squares, while guaranteeing that there exists a solution representation that corresponds to an optimal solution. Based on the solution representation, we show that the problem is NP-hard. To solve the SIPP, we propose a dynamic programming method that can be extended to a fully polynomial-time approximation scheme (FPTAS). We also propose three mathematical programming formulations based on different solution representations and confirm the performance of these algorithms through computational experiments. Finally, we discuss several extensions that are relevant to practical applications.

Hypersimplices, tropical geometry and finite metric spaces
Michael Joswig

The hypersimplex \(\Delta(k,n)\) is the convex polytope which is the set of those points in the unit cube \([0,1]^n\) which satisfy the affine linear equation \(\sum x_i=k\). We report on known and new results concerning the secondary fan \(\Sigma(k,n)\), which stratifies the regular subdivisions of \(\Delta(k,n)\). For tropical geometry this is relevant because \(\Sigma(k,n)\) contains the tropical Grassmannian \({\rm TGr}(k,n)\) as a subset; we discuss that relationship. Furthermore, in the special case \(k = 2\) the fan \(\Sigma(2,n)\) is closely related to the metric fan \({\rm MF}(n)\), which forms a natural parameter space for the metric spaces on \(n\) points. Our analysis of the fans \(\Sigma(2,n)\) improves known results of Bandelt and Dress concerning structural properties of finite metric spaces, with applications to phylogenetics. The new results are joint work with Laura Casabella and Lars Kastner.
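
Since the vertices of \(\Delta(k,n)\) are exactly the 0/1 vectors with \(k\) ones, they are easy to enumerate for small parameters (an illustration only):

    # Enumerate the vertices of the hypersimplex Delta(k,n): the 0/1 vectors
    # with exactly k entries equal to one.
    from itertools import combinations

    def hypersimplex_vertices(k, n):
        for support in combinations(range(n), k):
            yield tuple(1 if i in support else 0 for i in range(n))

    # Delta(2,4) has 6 vertices and is a regular octahedron.
    print(list(hypersimplex_vertices(2, 4)))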

Learning decision trees and forests with algorithmic recourse
Ken Kobayashi

This talk proposes a new algorithm for learning accurate tree-based models while ensuring the existence of recourse actions. Algorithmic Recourse (AR) aims to provide a recourse action for altering the undesired prediction result given by a model. Typical AR methods provide a reasonable action by solving an optimization task of minimizing the required effort among executable actions. However, such actions do not always exist for models optimized only for predictive performance. In this study, we formulate the task of learning an accurate classification tree to ensure reasonable actions exist for as many instances as possible. Then, we propose an efficient top-down greedy algorithm by leveraging adversarial training techniques. We also show that our proposed algorithm can be applied to random forests, a popular framework for learning tree ensembles.

Finite Horizon Optimization
Zhi-Quan Luo

In practical scenarios, there is often a strict upper bound on the number of algorithm iterations that can be performed within a given time limit. This raises the question of optimal step size and hyperparameter design for a fixed iteration budget. We present recent advances in effectively addressing this highly non-convex problem for gradient descent and other algorithms. Additionally, we extend the DeepMind work on AlphaTensor and introduce new reductions in the number of operations required for computation of more general matrix expressions. This provides further acceleration of calculations in linear algebra.
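
A classical instance of the fixed-budget question (background only, not the talk's new results): for \(N\) gradient steps on a convex quadratic with spectrum in \([\mu, L]\), the minimax-optimal step sizes are the reciprocals of the Chebyshev nodes on \([\mu, L]\).

    # Minimax-optimal step sizes for N gradient-descent steps on a quadratic
    # with eigenvalues in [mu, L]: reciprocals of Chebyshev nodes (classical).
    import numpy as np

    def chebyshev_steps(mu, L, N):
        i = np.arange(1, N + 1)
        nodes = (L + mu) / 2 + (L - mu) / 2 * np.cos((2 * i - 1) * np.pi / (2 * N))
        return 1.0 / nodes

    print(chebyshev_steps(mu=0.1, L=1.0, N=8))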

Faces of homogeneous convex cones and applications to homogeneous chordal PSD matrices
Bruno Lourenço

A regular convex cone is homogeneous if its automorphism group acts transitively on its interior. Notably, if we have a chordal graph G that is “4-path free”, then the cone of PSD matrices with sparsity pattern given by G is homogeneous. In this case, we say that G is homogeneous chordal. In this talk, we discuss some recent results on the faces of homogeneous convex cones and discuss its applications to PSD matrices where the sparsity pattern is given by a homogeneous chordal graph.

Convergence and Trade-Offs in Riemannian Gradient Descent and Proximal Point
David Martínez-Rubio

Convergence analyses of the two most fundamental algorithms for geodesically convex optimization were incomplete until our work. We design techniques and develop new full analyses that reveal trade-offs between geometry, iterate boundedness, and computational efficiency. As a consequence, we also obtain, for the first time, minmax methods for geodesically convex-concave problems that do not require knowing the initial distance to a saddle point.

The Lemke–Howson algorithm for oriented matroids, and applications
Frédéric Meunier

Lemke and Howson proposed in 1964 a pivot-based algorithm for solving bimatrix games. The year after, Lemke proposed a closely related algorithm for solving a large class of linear complementarity problems. Although Todd extended the Lemke algorithm to oriented matroids in 1984, the extension of the Lemke–Howson algorithm to oriented matroids has not yet been addressed in the literature. Actually, achieving such an extension requires solving a few issues. One of them is to propose a relevant definition of Nash equilibria for oriented matroids. Another one is more challenging, namely finding the counterpart of the following preliminary operation of the classical Lemke–Howson algorithm: the translation of the payoff matrices so as to make them positive. In this work, we show how to overcome these challenges and extend the Lemke–Howson algorithm to oriented matroids. A few applications are discussed, such as a strengthening of a theorem by McLennan and Tourky, a combinatorial interpretation of the orientation of the pivot step in the classical version, and the localization of tropical bimatrix games in the class PPAD. Joint work with Xavier Allamigeon and Stéphane Gaubert.

Algorithms and Complexity for Two-Team Polymatrix Zero-Sum Games: Revisiting Constrained Bilinear Minmax Optimization
Sai Ganesh Nagarajan

Adversarial multiplayer games are an important object of study in multiagent learning. In particular, polymatrix zero-sum games are a multiplayer setting where Nash equilibria are known to be efficiently computable. Towards understanding the limits of tractability in polymatrix games, we study the computation of Nash equilibria in such games where each pair of players plays either a zero-sum or a coordination game. A setting that is particularly interesting is where players can be grouped into a small number of teams of identical interest [Von Stengel and Koller 1997]. While the three-team version of the problem is known to be PPAD-complete, the complexity for two teams has remained open [Cai and Daskalakis 2011]. In this talk, we show that the two-team version is CLS-hard and we do so by proving the hardness of finding any stationary point in the simplest type of non-convex-concave min-max constrained optimization problem, namely for a class of bilinear polynomial objective functions. Finally, we discuss some algorithms for a restricted version of this problem, namely a team with non-interacting adversaries.
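
Schematically, the constrained bilinear minmax problems in question have the form

\[
\min_{x \in X} \ \max_{y \in Y} \ x^{\top} A y, \qquad X, Y \ \text{polytopes};
\]

in the two-team reduction, \(X\) and \(Y\) can be read as products of the team members' strategy simplices (an editorial gloss on the abstract above).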

Real-time solutions for a class of mixed-integer optimization problems
Andrés Gómez

We consider mixed-integer quadratic optimization problems with banded matrices and indicator variables. These problems arise pervasively in statistical inference problems with time-series data, where the banded matrix captures the temporal relationship of the underlying process. In particular, the problem studied arises in monitoring problems, where the decision-maker wants to detect changes or anomalies. We propose to solve these problems using decision diagrams. In particular, we show how to exploit the temporal dependencies to construct diagrams with size polynomial in the number of decision variables. We also describe how to construct the convex hull of the set under study from the decision diagrams, and how to deploy the method online to solve the problems in milliseconds via a shortest path algorithm.

The exploration, exploitation, robustness tradeoff
Andrew Lim

Parameter uncertainty and model misspecification are fundamental concerns in multi-period data-driven decision making, but little is understood about the tradeoff between exploration, exploitation and robustness. To study this tradeoff, we consider a robust optimal stopping problem with Bayesian learning. We show that an ambiguity-averse decision maker spends less time exploring, even though model uncertainty is reduced by having more data, because the “learning shock” for each posterior update increases the sensitivity of the expected profit to model misspecification. We also show that a willingness to explore can be restored by introducing a hedging instrument that offsets the learning shock.

Circuit and Graver Walks and Linear and Integer Programming
Shmuel Onn

We show that a circuit walk from a given feasible point of a given linear program to an optimal point can be computed in polynomial time using only linear algebra operations and the solution of the single given linear program. We also show that a Graver walk from a given feasible point of a given integer program to an optimal point is polynomial time computable using an integer programming oracle, but without such an oracle, it is hard to compute such a walk even if an optimal solution to the given program is given as well. Combining our oracle algorithm with recent results on sparse integer programming, we also show that Graver walks from any point are polynomial time computable over matrices of bounded tree-depth and subdeterminants.
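
For reference, the objects underlying such walks have standard definitions: with respect to the conformal order

\[
x \sqsubseteq y \;\Longleftrightarrow\; x_i y_i \ge 0 \ \text{and} \ |x_i| \le |y_i| \ \text{for all } i,
\]

the Graver basis \(\mathcal{G}(A)\) is the set of \(\sqsubseteq\)-minimal vectors of \(\ker(A) \cap \mathbb{Z}^n \setminus \{0\}\), and the circuits of \(A\) are the support-minimal nonzero integer vectors of \(\ker(A)\) with coprime entries.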

Primal Dual Algorithm for Empirical Risk Minimization in Combinatorial Optimization Augmented Learning
Axel Parmentier

As industry seeks to make its processes more resilient, data-driven optimization is gaining momentum in Operations Research. Resilience involves managing uncertainty when optimizing supply chains, while efficiency requires scalable combinatorial optimization algorithms, as most gains from operations research algorithms come from decreasing marginal costs. This underscores the need for scalable algorithms for combinatorial and stochastic optimization problems, whether they are dynamic, tactical, or strategic. Policies encoded as neural networks with a combinatorial optimization layer have demonstrated their efficiency in these settings when trained in a decision-focused manner. Until now, such policies have been trained using supervised learning, often based on Fenchel-Young losses, and considered as heuristics. We instead focus on risk minimization. By leveraging the connection between Fenchel-Young losses and Bregman divergences, we introduce a deep learning-compatible primal-dual algorithm and show that it converges to the optimum of the empirical risk minimization problem in some settings. This new approach has two advantages. First, numerical experiments show that it learns better policies. Second, we prove some generalization properties that provide upper bounds on the optimality gap, which hold in expectation.
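
For reference, the Fenchel-Young loss generated by a regularizer \(\Omega\) (the standard definition, stated here for orientation) is

\[
L_{\Omega}(\theta; y) \;=\; \Omega^{*}(\theta) + \Omega(y) - \langle \theta, y \rangle \;\ge\; 0,
\]

with equality exactly when \(y \in \partial \Omega^{*}(\theta)\); this inequality is what links Fenchel-Young losses to the Bregman divergences mentioned above.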

A single-loop proximal-conditional-gradient penalty method
Ting Kei Pong

We consider the problem of minimizing the separable sum of two proper closed convex functions over a linear coupling constraint. We assume that one of these functions is prox-friendly, while the other function admits efficient (generalized) linear oracles. We propose a single-loop variant of the standard penalty method for this problem, where in each iteration, we successively perform one proximal-gradient step and one conditional-gradient step on the quadratic penalty function, followed by an update of the penalty parameter via an explicit formula. We derive iteration complexity bounds and local convergence rates under standard constraint qualifications and Kurdyka-Łojasiewicz assumptions. We illustrate numerically the convergence behavior of our algorithm on minimizing the $L_1$ norm subject to a residual error measured by $L_p$ norm, $1 < p < 2$. This is a joint work with Liaoyuan Zeng and Hao Zhang.
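
The iteration described above can be sketched as follows for data min f(x) + g(y) s.t. Ax + By = b; here prox_f and glo_g are hypothetical placeholders for the proximal oracle of f and the generalized linear oracle of g, and the step-size and penalty-update formulas are illustrative stand-ins for the paper's explicit ones.

    # Schematic single-loop penalty method: one proximal-gradient step on x,
    # one conditional-gradient step on y, then a penalty update. Step sizes
    # and the penalty schedule are placeholders, not the paper's formulas.
    def pcg_penalty(x, y, A, B, b, prox_f, glo_g, steps):
        beta = 1.0
        for t in range(steps):
            r = A @ x + B @ y - b                        # coupling residual
            tau = 1.0 / beta                             # placeholder step size
            x = prox_f(x - tau * beta * (A.T @ r), tau)  # prox-gradient on x
            s = glo_g(beta * (B.T @ r))                  # linear oracle for g
            y = y + 2.0 / (t + 2) * (s - y)              # conditional gradient on y
            beta *= 1.05                                 # placeholder penalty update
        return x, y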

Implicit Online Riemannian Optimization
Christophe Roux

Complexity of Injectivity and Verification of ReLU Neural Networks
Martin Skutella

Neural networks with ReLU activation play a key role in modern machine learning. Understanding the functions represented by ReLU networks is a major topic in current research as this enables a better interpretability of learning processes. Injectivity plays a crucial role whenever invertibility of a neural network is necessary, e.g., for inverse problems or generative models. The exact computational complexity of deciding injectivity was recently posed as an open problem (Puthawala et al. [JMLR 2022]). We answer this question by proving coNP-completeness. On the positive side, we show that the problem for a single ReLU layer is still tractable for small input dimension; more precisely, we present a parameterized algorithm which yields fixed-parameter tractability with respect to the input dimension. In addition, we study the network verification problem which is of great importance since neural networks are increasingly used in safety-critical systems. We prove that network verification is coNP-hard for a general class of input domains. Our result thus highlights that the hardness of network verification is intrinsic to the ReLU networks themselves, rather than specific input domains. In this context, we also characterize surjectivity for ReLU networks with one-dimensional output which turns out to be the complement of a basic network verification task. We reveal interesting connections to computational convexity by formulating the surjectivity problem as a zonotope containment problem. This is joint work with Vincent Froese and Moritz Grillo.

Data-driven analysis and design of convex optimization algorithms
Bartolomeo Stellato

We present computational tools to analyze and design first-order methods in parametric convex optimization. First-order methods are popular for their low per-iteration cost and warm-starting capabilities. However, precisely quantifying the number of iterations needed to compute a high-quality solution remains a key challenge, especially in safety-critical applications where real-time execution is crucial. In the first part of the talk, we introduce a numerical framework for verifying the worst-case performance of first-order methods in parametric quadratic optimization. We formulate the verification problem as a mathematical optimization problem that maximizes the fixed-point residual at the last iteration, subject to constraints representing the algorithm used. Our framework can encode many proximal algorithms as variations of two primitive steps: affine steps and element-wise maximum steps. We show that the verification problem is NP-hard, and we construct both mixed-integer linear programming formulations and convex semidefinite programming relaxations. Numerical examples show a significant reduction in pessimism compared to standard worst-case convergence analyses and performance estimation techniques. In the second part of the talk, we present a data-driven approach to analyze the performance of first-order methods using statistical learning theory. We build generalization guarantees for classical optimizers, using sample convergence bounds, and for learned optimizers, using the Probably Approximately Correct (PAC)-Bayes framework. We then use this framework to learn accelerated first-order methods by minimizing the PAC-Bayes bound directly over the key parameters of the algorithms (e.g., gradient steps, and warm-starts). Several numerical experiments show that our approach provides strong generalization guarantees for both classical and learned optimizers with statistical bounds that are very close to the true out-of-sample performance.

Hypergraph Transversal Pairs Near the Fredman-Khachiyan Bound
Tamon Stephen

The Fredman-Khachiyan algorithm generates the transversal of a hypergraph in incremental quasi-polynomial time. It is a recursive algorithm which focuses on the most frequent vertex in terms of the number of edges in either the hypergraph or its transversal. Hypergraphs where this maximum frequency is low are therefore of interest. This gives a group of related optimization questions for hypergraph-transversal pairs. Here we present some preliminary extremal results. Besides using classic combinatorial constructions, we adapt Wagner’s deep reinforcement learning scheme from graphs to hypergraphs to help find some unexpected examples. This is joint work with Parsa Salimi.

The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses
Berkant Turan

We explore and formalize backdoor-based watermarks and adversarial defenses, framing them as interactive protocols between two parties. Our findings reveal that for almost any learning task, at least one of the two schemes is feasible. We also uncover a third, essential scheme — transferable attacks — that allows attackers to craft queries that are indistinguishable from the true data distribution, but successfully attack any efficient defender. Using a cryptographic construction with homomorphic encryption, we demonstrate the necessity of transferable attacks and establish their connection to fundamental cryptographic principles.

Variable selection for kernel two-sample tests
Yao Xie

We consider the variable selection problem for two-sample tests, aiming to select the most informative variables to distinguish samples from two groups. To solve this problem, we propose a framework based on the kernel maximum mean discrepancy (MMD). Our approach seeks a group of variables with a pre-specified size that maximizes the variance-regularized MMD statistics. This formulation also corresponds to the minimization of asymptotic type-II error while controlling type-I error, as studied in the literature. We present mixed-integer programming formulations and develop exact and approximation algorithms with performance guarantees for different choices of kernel functions. Furthermore, we provide a statistical testing power analysis of our proposed framework. Experiment results on synthetic and real datasets demonstrate the superior performance of our approach. This is joint work with Jie Wang and Santanu Dey at Georgia Tech.
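
For concreteness, the quantity at the core of the framework is the kernel MMD restricted to a candidate variable subset S; below is a minimal sketch of the standard unbiased estimator with a Gaussian kernel (an illustration, not the authors' code).

    # Unbiased MMD^2 between samples X and Y, using only the variables in S.
    import numpy as np

    def mmd2_unbiased(X, Y, S, bandwidth=1.0):
        Xs, Ys = X[:, S], Y[:, S]
        k = lambda A, B: np.exp(-np.sum((A[:, None, :] - B[None, :, :]) ** 2, -1)
                                / (2 * bandwidth ** 2))
        Kxx, Kyy, Kxy = k(Xs, Xs), k(Ys, Ys), k(Xs, Ys)
        n, m = len(Xs), len(Ys)
        return ((Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
                + (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
                - 2 * Kxy.mean())

    rng = np.random.default_rng(0)
    X, Y = rng.normal(size=(100, 5)), rng.normal(size=(100, 5))
    Y[:, 0] += 1.0                      # the two groups differ in variable 0
    print(mmd2_unbiased(X, Y, S=[0]))   # larger than for uninformative subsets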

Learning to Generate Lagrangian Cuts
Haoxiang Yang

Multistage stochastic mixed-integer programs (MS-MIP) with general mixed-integer state variables are considered computationally challenging to solve. Lagrangian cuts on lifted state spaces can adaptively approximate the nonconvex value functions. However, generating Lagrangian cuts is time-consuming, as it requires solving Lagrangian relaxation problems iteratively. We develop a learning-to-cut scheme within the cutting-plane algorithm to quickly obtain dual multipliers close to the optimal ones. Comparative analyses reveal that our learning-to-cut algorithm outperforms benchmark algorithms, showcasing the potential of machine learning to tackle the complexities of MS-MIP effectively.
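
Schematically (a generic weak-duality construction, not necessarily the paper's exact lifted form): for a value function \(Q(x) = \min_y \{ f(y) : h(y) = x \}\), any multiplier \(\lambda\) yields the valid affine underestimator

\[
Q(x) \;\ge\; \ell(\lambda) - \lambda^{\top} x, \qquad \ell(\lambda) \;=\; \min_{y} \ f(y) + \lambda^{\top} h(y),
\]

a Lagrangian cut; the learning-to-cut scheme predicts near-optimal \(\lambda\) directly instead of approaching them through repeated Lagrangian-relaxation solves.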

Can an infeasible MIP solve itself?
Yuriy Zinchenko

Formally, the analysis of why a specific MIP instance is infeasible can be reduced to computing an Irreducible Infeasible Subset (IIS) of its constraints. Unlike the case of LP, for MIP there is no useful duality that can be employed to facilitate such computations. The process of determining an IIS for a MIP is typically handled with brute force, e.g., by use of deletion filters and the like, making IIS determination for a MIP a much harder computational task. We will discuss one approach to optimizing this process and which components of this approach could make it into the newest version of Gurobi.
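
For reference, the classical deletion filter alluded to above can be sketched as follows; is_feasible is a hypothetical oracle (in practice a MIP solve per call, which is exactly what makes the brute-force approach expensive).

    # Classical deletion filter: drop constraints one at a time; keep a drop
    # whenever the remaining system is still infeasible. What survives is an
    # irreducible infeasible subset (IIS).
    def deletion_filter(constraints, is_feasible):
        active = list(constraints)
        for c in list(active):
            trial = [d for d in active if d is not c]
            if not is_feasible(trial):   # still infeasible without c,
                active = trial           # so c is not needed
        return active                    # every survivor is necessary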