Fifth Conference on
Discrete Optimization and Machine Learning
Summer 2023 x Tokyo, Japan
Coordinates
Dates: 8th–9th August, 2023
Place: National Graduate Institute for Policy Studies (GRIPS), Tokyo, Japan
Organizers: Antoine Deza, Sebastian Pokutta, Takashi Tsuchiya
Venue: Both days will take place in Lecture Room M
The campus can be conveniently accessed by public transportation.
Conference Dinner: ようざん (Yōzan)
DISSY Building B1F, 7-6-5 Roppongi, Minato City, Tokyo
Confirmed Participants
 Pierre Ablin
 Salvador Abreu
 Jean-François Baffier
 Jean-Philippe Chancelier
 Marco Cuturi
 Antoine Deza
 Daniel Deza
 Arnaud Doucet
 Katsuki Fujisawa
 Dan Garber
 Christoph Graczyk
 Hiroshi Hirai
 Gabriele Iommazzo
 Satoru Iwata
 Ken Kobayashi
 Michel de Lara
 David Martínez-Rubio
 Kazuhide Nakata
 Frank Nielsen
 Keiyu Nosaka
 Pierre-Louis Poirion
 Sebastian Pokutta
 Michael Sander
 Akiyoshi Shioura
 Christoph Spiegel
 Masashi Sugiyama
 Akiko Takeda
 Takashi Tsuchiya
 Elias Wirth
 Akiko Yoshise
Conference Photo
Program
August 8
Time | Speaker | Talk Title (for abstracts, see below)
09:30 | Registration/Coffee
10:00 | Session Chair: Takashi Tsuchiya
10:00 | Akiko Takeda | Bi/trilevel Optimization Approach for Hyperparameter Tuning
10:30 | Pierre Ablin | Bilevel optimization for Machine Learning
11:00 | Gabriele Iommazzo | Cycle-based formulations in Distance Geometry
11:30 | Dan Garber | Efficiency of Gradient Methods for Low-Rank Recovery under Strict Complementarity
Lunch Break
13:30 | Session Chair: Akiko Takeda
13:30 | Marco Cuturi | Sparse Monge Maps
14:00 | Arnaud Doucet | Diffusion Schrödinger Bridge Matching
14:30 | David Martínez-Rubio | Accelerated and Sparse Algorithms for Approximate Personalized PageRank
Coffee Break
15:30 | Session Chair: Sebastian Pokutta
15:30 | Akiko Yoshise | Creating Collaborative Data Representations Using Matrix Manifold Optimization
16:00 | Elias Wirth | Pivoting Frank-Wolfe algorithm
16:30 | Hiroshi Hirai | Finding Hall blockers by matrix scaling
17:00 | Jean-Philippe Chancelier | Topological Conditional Separation
Dinner
August 9
Time | Speaker | Talk Title (for abstracts, see below)
10:00 | Session Chair: Masashi Sugiyama
10:00 | Satoru Iwata | Discrete Optimization and Machine Learning for Design and Discovery of Chemical Reactions
10:30 | Pierre-Louis Poirion | Random subspace methods for unconstrained nonconvex optimization
11:00 | Akiyoshi Shioura | Characterization and Algorithm for Bivariate Multi-Unit Assignment Valuations
11:30 | Christoph Spiegel | Flag Algebras in Additive Combinatorics
Lunch Break
13:30 | Session Chair: Satoru Iwata
13:30 | Masashi Sugiyama | Importance-Weighting Approach to Distribution Shift Adaptation
14:00 | Salvador Abreu | Hybrid Constraint-Based Metaheuristics
14:30 | Katsuki Fujisawa | The challenge to the Graph500 benchmark: history and results
Coffee Break
15:30 | Session Chair: Dan Garber
15:30 | Michel de Lara | Hidden Convexity in the $l_0$ Pseudonorm
16:00 | Takashi Tsuchiya | Closing Nonzero Duality Gaps of Singular Semidefinite Programs by Way of Perturbation
16:30 | Sebastian Pokutta | Improved local models and new Bell inequalities via Frank-Wolfe algorithms
17:00 | Antoine Deza | Kissing Polytopes
Abstracts
Pierre Ablin
Bilevel optimization, the problem of minimizing a value function that involves the arg-minimum of another function, appears in many areas of machine learning of practical importance, like hyperparameter search or implicit deep learning. This talk aims to describe machine learning problems in which bilevel optimization naturally appears, explain the differences with classical single-level optimization, and give an overview of recent algorithmic advances to solve the problem at scale.
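For orientation, the generic bilevel problem has the form below (a standard formulation in my own notation, not taken from the talk):

```latex
\min_{x \in \mathbb{R}^d} \; f\bigl(x, y^{\star}(x)\bigr)
\quad \text{where} \quad
y^{\star}(x) \in \operatorname*{arg\,min}_{y \in \mathbb{R}^p} \; g(x, y).
```

In hyperparameter search, $x$ plays the role of the hyperparameters, $g$ is the training loss, and $f$ the validation loss; the difficulty relative to single-level optimization is that evaluating $f$ requires (approximately) solving the inner problem.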
Salvador Abreu
We discuss an architecture for constructing problem-specific solvers using multiple hybrid metaheuristics. (to be completed)
Jean-Philippe Chancelier
Pearl’s d-separation is a foundational notion for studying conditional independence between random variables. We define topological conditional separation and show that it is equivalent to d-separation, extended beyond acyclic graphs, be they finite or infinite. Joint work with Michel de Lara and Benjamin Heymann.
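As a reminder of what d-separation captures, the standard collider example (background material, not part of the abstract):

```latex
% Collider example: in the DAG  X -> Z <- Y,
% X and Y are d-separated by the empty set but not by {Z}:
X \perp\!\!\!\perp Y ,
\qquad
X \not\perp\!\!\!\perp Y \mid Z .
```

Conditioning on the collider $Z$ opens the path between $X$ and $Y$, which is why d-separation, unlike graph-theoretic separation, must track edge directions.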
Marco Cuturi
We propose a new formulation of the Monge problem that results, by design, in sparse transport maps in $\mathbb{R}^d$, i.e. such that $T(x) - x$ is a sparse vector in $\mathbb{R}^d$.
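A minimal sketch of how such sparsity can be induced, assuming (my guess at the setup, not necessarily the talk's exact cost) an elastic-net-type displacement cost in the Monge objective:

```latex
\min_{T \,:\, T_{\#}\mu = \nu} \;
\mathbb{E}_{x \sim \mu}\Bigl[ \tfrac{1}{2}\,\lVert T(x) - x \rVert_2^{2}
\;+\; \gamma\,\lVert T(x) - x \rVert_1 \Bigr],
```

where the $\ell_1$ term drives many coordinates of the displacement $T(x) - x$ to zero exactly, with $\gamma \ge 0$ controlling the sparsity level.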
Antoine Deza
We investigate the following question: how close can two disjoint lattice polytopes contained in a fixed hypercube be? This question stems from various contexts where the minimal distance between such polytopes appears in complexity bounds of optimization algorithms. We provide nearly matching lower and upper bounds on this distance and discuss its exact computation. We also give similar bounds in the case of disjoint rational polytopes whose binary encoding length is prescribed. Joint work with Shmuel Onn, Sebastian Pokutta, and Lionel Pournin.
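Concretely, for disjoint polytopes $P$ and $Q$ with vertices in $\{0, 1, \dots, k\}^d$ (my notation for the fixed hypercube), the quantity of interest is

```latex
d(P, Q) \;=\; \min_{p \in P,\; q \in Q} \; \lVert p - q \rVert_2 \;>\; 0,
```

which is positive by disjointness but can be surprisingly small; the talk's bounds quantify how small it can get as a function of $d$ and $k$.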
Arnaud Doucet
Solving transport problems, i.e. finding a map transporting one given distribution to another, has numerous applications in machine learning. Novel mass transport methods motivated by generative modeling have recently been proposed; e.g., Denoising Diffusion Models (DDMs) and Flow Matching Models (FMMs) implement such a transport through a Stochastic Differential Equation or an Ordinary Differential Equation, respectively. However, while it is desirable in many applications to approximate the deterministic dynamic Optimal Transport (OT) map, which admits attractive properties, DDMs and FMMs are not guaranteed to provide transports close to the OT map. In contrast, Schrödinger bridges (SBs) compute stochastic dynamic mappings which recover entropy-regularized versions of OT. Unfortunately, existing numerical methods approximating SBs either scale poorly with dimension or accumulate errors across iterations. In this work, we introduce Iterative Markovian Fitting (IMF), a new methodology for solving SB problems, and Diffusion Schrödinger Bridge Matching (DSBM), a novel numerical algorithm for computing IMF iterates. DSBM significantly improves over previous SB numerics and recovers as special/limiting cases various recent transport methods. We demonstrate the performance of DSBM on a variety of problems.
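For context, the Schrödinger bridge problem referred to above is the entropic projection (a standard definition, notation mine):

```latex
P^{\mathrm{SB}}
\;=\;
\operatorname*{arg\,min}_{P \,:\, P_0 = \mu,\; P_T = \nu}
\mathrm{KL}\bigl(P \,\Vert\, Q\bigr),
```

where the minimization is over path measures with prescribed start and end marginals and $Q$ is a reference process such as Brownian motion; the resulting coupling of $(\mu, \nu)$ recovers entropy-regularized OT.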
Katsuki Fujisawa
Graph500 is a benchmark for shortest-path search performance on very large graphs, requiring the integration of many technologies, from algorithms to HPC. Our research team started working on this benchmark in 2011 and has been ranked number one worldwide for 17 terms from 2014 to 2023 using supercomputers such as Fugaku. The latest results show that we achieved a performance of 137 Tera TEPS (Traversed Edges Per Second) using 93% of all nodes of the supercomputer Fugaku, on a super-giant graph of 4.4 trillion nodes and 70.4 trillion edges. In this talk, I will explain our efforts and results on the Graph500 benchmark.
Dan Garber
Low-rank matrix or tensor recovery is a fundamental task in machine learning and statistics with many important applications. While convex relaxations for these tasks are often well understood and effective from a statistical point of view, they are considered impractical for high-dimensional instances, since large high-rank matrices might need to be stored and factorized, which is computationally prohibitive. In this talk I will challenge this common belief by surveying a few results indicating that under a classical condition known as strict complementarity, gradient methods, at least in the proximity of optimal solutions, need to store and manipulate only low-rank matrices/tensors, which is computationally efficient. Moreover, strict complementarity often implies linear convergence rates for such methods, even when the objective is not strongly convex. I will motivate the strict complementarity assumption from both theoretical and empirical perspectives.
Hiroshi Hirai
For a given nonnegative matrix $A=(A_{ij})$, the matrix scaling problem asks whether $A$ can be scaled to a doubly stochastic matrix $D_1AD_2$ for some positive diagonal matrices $D_1,D_2$. The Sinkhorn algorithm is a simple iterative algorithm that alternately repeats row normalization $A_{ij} \leftarrow A_{ij}/\sum_{j}A_{ij}$ and column normalization $A_{ij} \leftarrow A_{ij}/\sum_{i}A_{ij}$. Under this algorithm, $A$ converges to a doubly stochastic matrix in the limit if and only if the bipartite graph associated with $A$ has a perfect matching. This property can be used to decide the existence of a perfect matching in a given bipartite graph $G$, which is identified with its $0,1$-matrix $A_G$. Linial, Samorodnitsky, and Wigderson showed that $O(n^2 \log n)$ iterations for $A_G$ decide whether $G$ has a perfect matching, where $n$ is the number of vertices in one of the color classes of $G$. In this paper, we show an extension of this result: if $G$ has no perfect matching, then a polynomial number of Sinkhorn iterations identifies a Hall blocker, i.e., a vertex subset $X$ whose neighborhood $\Gamma(X)$ satisfies $|X| > |\Gamma(X)|$, which is a certificate of the nonexistence of a perfect matching. Specifically, we show that $O(n^2 \log n)$ iterations can identify one Hall blocker, and that further polynomial iterations can also identify all parametric Hall blockers $X$ maximizing $(1-\lambda)|X| - \lambda|\Gamma(X)|$ for $\lambda \in [0,1]$. The former result is based on an interpretation of the Sinkhorn algorithm as alternating minimization for geometric programming; the latter on an interpretation as alternating minimization for KL-divergence (Csiszár and Tusnády 1984, Gietl and Reffel 2013) and its limiting behavior for a nonscalable matrix (Aas 2014). We also relate the Sinkhorn limit to parametric network flow, the principal partition of polymatroids, and the Dulmage–Mendelsohn decomposition of a bipartite graph. Joint work with Koyo Hayashi and Keiya Sakabe.
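A minimal sketch of the Sinkhorn iteration on a 0/1 matrix (illustration only; extracting Hall blockers from the iterates, which is the talk's contribution, is not reproduced here):

```python
import numpy as np

def sinkhorn(A, num_iters=1000, tol=1e-9):
    """Alternate row and column normalization of a nonnegative matrix.

    Converges to a doubly stochastic matrix iff the bipartite graph of
    A's support has a perfect matching; assumes no zero row or column.
    """
    A = np.asarray(A, dtype=float).copy()
    for _ in range(num_iters):
        A /= A.sum(axis=1, keepdims=True)   # row normalization
        A /= A.sum(axis=0, keepdims=True)   # column normalization
        if np.abs(A.sum(axis=1) - 1.0).max() < tol:
            break                           # doubly stochastic: matching exists
    return A

# 3x3 example with no perfect matching: rows 1 and 2 see only column 0,
# so X = {row 1, row 2} is a Hall blocker (|X| = 2 > 1 = |Gamma(X)|).
A_G = np.array([[1.0, 1.0, 1.0],
                [1.0, 0.0, 0.0],
                [1.0, 0.0, 0.0]])
scaled = sinkhorn(A_G)  # row sums stay away from 1, certifying non-scalability
```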
Gabriele Iommazzo
The distance geometry problem asks to find a realization of a given simple edge-weighted graph in a Euclidean space of given dimension $K$, where the edges are realized as straight segments of lengths equal (or as close as possible) to the edge weights. The problem is often modelled as a mathematical programming formulation involving decision variables that determine the position of the vertices in the given Euclidean space. Solution algorithms are generally constructed using local or global nonlinear optimization techniques. We present a new modelling technique for this problem where, instead of deciding vertex positions, the formulations decide the length of the segments representing the edges in each cycle of the graph, projected in every dimension. We compare computational results from protein conformation instances obtained using the new cycle-based formulation and the existing edge-based formulation. While edge-based formulations take less time to reach termination, cycle-based formulations are generally better on solution quality measures.
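For reference, the existing edge-based formulation mentioned above is typically written as the nonconvex program (a standard formulation, notation mine):

```latex
\min_{x_1, \dots, x_n \in \mathbb{R}^K} \;
\sum_{\{u,v\} \in E} \Bigl( \lVert x_u - x_v \rVert_2^{2} - d_{uv}^{2} \Bigr)^{2},
```

where $d_{uv}$ is the weight of edge $\{u,v\}$; the graph is realizable in $\mathbb{R}^K$ exactly when the optimal value is zero. The cycle-based formulations replace the vertex-position variables $x_v$ with per-dimension segment lengths along the cycles of the graph.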
Satoru Iwata
Recent developments in quantum chemical calculations enable us to explore the potential energy surface to extract chemical reaction networks. Our current research project AICReDD aims at utilizing this advancement to design and/or discover new chemical reactions. This naturally requires methods from discrete optimization and machine learning. One important example is how to select diverse molecules from unexplored areas of chemical space. We have so far developed a new approach for selecting a subset of diverse molecules from a given molecular list by combining two existing techniques in machine learning and discrete optimization: graph neural networks (GNNs) for learning vector representations of molecules and a diverse-selection framework called submodular function maximization. This method, called SubMo-GNN, first trains a GNN with property prediction tasks, and then the trained GNN transforms molecular graphs into molecular vectors, which capture both properties and structures of molecules. Finally, to obtain a subset of diverse molecules, we define a submodular function, which quantifies the diversity of molecular vectors, and find a subset of molecular vectors with a large submodular function value. This can be done efficiently by using the greedy algorithm, which finds a subset with a guaranteed approximation ratio with respect to the optimal function value. Maximization of this class of submodular functions has been investigated for the maximum a posteriori (MAP) inference of determinantal point processes (DPPs), which is crucial for selecting diverse items in many machine learning applications. Although DPP MAP inference is NP-hard, the greedy algorithm often finds high-quality solutions, and many researchers have studied its efficient implementation. Combining a recently developed "fast greedy" algorithm for this problem with a classical technique called "lazy greedy" for general submodular function maximization, we have devised the currently fastest method in practice.
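As a toy illustration of the greedy selection step (not the authors' SubMo-GNN implementation; the log-det objective below is one common DPP-style diversity score and an assumption on my part, and the "fast greedy"/"lazy greedy" accelerations are omitted):

```python
import numpy as np

def greedy_diverse_subset(vectors, k):
    """Greedy maximization of f(S) = log det(I + V_S V_S^T), a
    monotone submodular diversity score over feature vectors.
    Plain O(n*k) greedy with the classic (1 - 1/e) guarantee."""
    V = np.asarray(vectors, dtype=float)
    n = len(V)
    selected = []
    for _ in range(k):
        best_val, best_i = -np.inf, None
        for i in range(n):
            if i in selected:
                continue
            S = V[selected + [i]]  # candidate subset of vectors
            val = np.linalg.slogdet(np.eye(len(S)) + S @ S.T)[1]
            if val > best_val:
                best_val, best_i = val, i
        selected.append(best_i)
    return selected

# Usage: pick 10 diverse molecules from 1000 hypothetical GNN embeddings.
embeddings = np.random.randn(1000, 32)
subset = greedy_diverse_subset(embeddings, k=10)
```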
Michel de Lara
The so-called $l_0$ pseudonorm counts the number of nonzero components of a vector. In this talk, we review a series of recent results on a class of Capra (Constant Along Primal Rays) conjugacies (and polarities) that reveal hidden convexity in the $l_0$ pseudonorm.
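For reference, the $l_0$ pseudonorm and the invariance behind the Capra construction (background facts, not part of the abstract):

```latex
\ell_0(x) \;=\; \#\bigl\{\, i \;:\; x_i \neq 0 \,\bigr\},
\qquad
\ell_0(\rho x) \;=\; \ell_0(x) \quad \text{for all } \rho \neq 0,
```

so $\ell_0$ is not a norm (it is not positively homogeneous) but is constant along primal rays, which is exactly the invariance that Capra conjugacies exploit.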
David Martínez-Rubio
It has recently been shown that ISTA, an unaccelerated first-order optimization method, presents sparse updates for the $\ell_1$-regularized personalized PageRank problem, leading to cheap iteration complexity. In this talk I’ll explain our work on accelerated optimization algorithms for this problem that also perform sparse updates, leading to faster convergence in certain parameter regimes. Further, we design a conjugate directions algorithm that achieves an exact solution while exploiting sparsity. Our findings apply beyond PageRank and hold for any quadratic objective whose Hessian is a positive-definite $M$-matrix.
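A generic sketch of ISTA for a quadratic-plus-$\ell_1$ objective $\frac{1}{2}x^\top Hx - b^\top x + \alpha\lVert x\rVert_1$ (illustration only; the sparse per-iteration bookkeeping that makes this cheap for PageRank, and the accelerated variants of the talk, are omitted):

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (entrywise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(H, b, alpha, step, num_iters=500):
    """ISTA on 0.5 x^T H x - b^T x + alpha * ||x||_1, with H positive
    definite (e.g. the M-matrix arising in personalized PageRank).
    The step size should satisfy step <= 1 / lambda_max(H)."""
    x = np.zeros_like(b)
    for _ in range(num_iters):
        grad = H @ x - b  # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * alpha)
    return x
```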
Pierre-Louis Poirion
In this talk, we present a randomized subspace regularized Newton method for a nonconvex function. We show that our method has global convergence under appropriate assumptions and that its convergence rate is the same as that of the full regularized Newton method. Furthermore, we can obtain a local linear convergence rate under some additional assumptions.
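A minimal sketch of a single randomized subspace regularized Newton step (my simplification under generic assumptions; the talk's exact regularization and convergence conditions differ):

```python
import numpy as np

def subspace_newton_step(x, grad_f, hess_f, s, reg=1e-3, rng=None):
    """One randomized subspace regularized Newton step.

    Restrict the Newton step to a random s-dimensional subspace:
    solve (P^T H P + reg*I) d = -P^T g and move along P d, so only
    an s x s system is factorized instead of an n x n one.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    P = rng.standard_normal((n, s)) / np.sqrt(s)  # random subspace basis
    g, H = grad_f(x), hess_f(x)
    Hs = P.T @ H @ P + reg * np.eye(s)            # reduced regularized Hessian
    d = np.linalg.solve(Hs, -P.T @ g)             # cheap s x s solve
    return x + P @ d
```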
Sebastian Pokutta
In Bell scenarios with two outcomes per party, we algorithmically consider the two sides of the membership problem for the local polytope: constructing local models and deriving separating hyperplanes, that is, Bell inequalities. We take advantage of recent developments in so-called Frank–Wolfe algorithms to significantly increase the convergence rate of existing methods. As an application, we study the threshold value for the nonlocality of two-qubit Werner states under projective measurements. Here, we improve on both the upper and lower bounds present in the literature. Importantly, our bounds are entirely analytical; moreover, they yield refined bounds on the value of the Grothendieck constant of order three: $1.4367 \leq K_G(3) \leq 1.4546$. We also demonstrate the efficiency of our approach in multipartite Bell scenarios, and present the first local models for all projective measurements with visibilities noticeably higher than the entanglement threshold. We make our entire code accessible as a Julia library called BellPolytopes.jl.
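To convey the algorithmic ingredient, here is a vanilla Frank-Wolfe sketch over a polytope accessed through a linear minimization oracle (a generic illustration; BellPolytopes.jl implements much more refined variants in Julia):

```python
import numpy as np

def frank_wolfe(grad_f, lmo, x0, num_iters=1000):
    """Vanilla Frank-Wolfe: stays inside the polytope using only a
    linear minimization oracle (LMO), with no projections.

    lmo(g) must return a vertex v minimizing <g, v> over the polytope;
    the iterate is always an explicit convex combination of vertices.
    """
    x = x0.copy()
    for t in range(num_iters):
        g = grad_f(x)
        v = lmo(g)                        # best vertex in direction -g
        gamma = 2.0 / (t + 2.0)           # classic open-loop step size
        x = (1 - gamma) * x + gamma * v   # move toward the vertex
    return x

# Usage: minimize f(x) = 0.5 ||x - y||^2 over the cube [0,1]^3,
# whose LMO is simply lmo(g) = indicator of the negative entries of g.
y = np.array([1.3, -0.2, 0.6])
x_star = frank_wolfe(lambda x: x - y, lambda g: (g < 0).astype(float),
                     x0=np.zeros(3), num_iters=2000)
```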
Akiyoshi Shioura
A multi-unit assignment valuation is a function represented by a weighted bipartite graph. In this paper, we provide a characterization of such functions in terms of maximizer sets of perturbed functions. We then present an algorithm that checks whether a given bivariate function is a multi-unit assignment valuation and, if the answer is yes, computes a weighted bipartite graph representing the function.
Christoph Spiegel
We explore the Ramsey multiplicity problem for additive structures by extending Razborov’s flag algebra calculus to additive structures in finite field vector spaces.
Masashi Sugiyama
Standard machine learning methods suffer from distribution shifts between training and test data. In this talk, I will first give an overview of the classical importance-weighting approach to distribution shift adaptation, which consists of an importance estimation step and an importance-weighted training step. Then, I will present a more recent approach that simultaneously estimates the importance weight and trains a predictor. Finally, I will discuss a more challenging scenario of continuous distribution shifts, where the data distributions change continuously over time.
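A schematic of the classical two-step recipe (a toy sketch: real density-ratio estimators such as KLIEP or uLSIF replace the oracle densities assumed below):

```python
import numpy as np

def importance_weights(p_test, p_train, X_train):
    """Step 1: importance w(x) = p_test(x) / p_train(x), evaluated on
    training inputs. Oracle densities are assumed here for illustration."""
    return p_test(X_train) / p_train(X_train)

def weighted_risk(loss, predict, X_train, y_train, w):
    """Step 2: importance-weighted training risk, an unbiased estimate
    of the test risk under covariate shift."""
    return np.mean(w * loss(predict(X_train), y_train))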
Akiko Takeda
Various machine learning (ML) models are equipped with parameters that need to be fixed in advance; such parameters are often called hyperparameters. Needless to say, the prediction performance of ML models significantly depends on the choice of hyperparameters. In this talk, we will show several application examples where a bi/trilevel optimization approach is used for hyperparameter selection. One of them is a trilevel hyperparameter learning model in the adversarial setting, for which we will present a solution method and experimental results.
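In its simplest bilevel instance (a standard formulation, notation mine), hyperparameter selection reads:

```latex
\min_{\lambda} \; L_{\mathrm{val}}\bigl(\theta^{\star}(\lambda)\bigr)
\qquad \text{s.t.} \qquad
\theta^{\star}(\lambda) \in \operatorname*{arg\,min}_{\theta} \;
L_{\mathrm{train}}(\theta; \lambda),
```

and the trilevel models discussed in the talk nest one more optimization level, e.g. an adversarial perturbation of the data, inside this structure.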
Takashi Tsuchiya
One of the major difficulties in semidefinite programming is the existence of nonzero duality gaps, where strong duality fails. In this talk, we conduct a perturbation analysis of such singular semidefinite programs and discuss the hidden continuous structure behind nonzero duality gaps in semidefinite programming. We also discuss its implications for the design of SDP algorithms. This is joint work with Bruno Lourenço, Masakazu Muramatsu, and Takayuki Okuno.
Elias Wirth
The Frank-Wolfe algorithm (FW) is an iterative process for solving constrained convex optimization problems. FW variants possess a range of attractive features, including affine invariance, ease of implementation, projection-free operations, and the representation of the current iterate as a sparse convex combination of vertices of the feasible region. This study focuses on the latter characteristic and presents the cardinality-guaranteed meta-algorithm (CGM), which enhances existing FW variants by ensuring that the iterates consist of convex combinations of at most $n + 1$ vertices, where $n$ denotes the dimension of the ambient space. CGM provides bounds on the size of the active set while preserving the beneficial properties of FW variants, including convergence rate guarantees.
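The $n + 1$ bound is the one promised by Carathéodory's theorem (a background fact, not from the abstract):

```latex
x \in \operatorname{conv}(V) \subseteq \mathbb{R}^{n}
\;\Longrightarrow\;
x \;=\; \sum_{i=1}^{n+1} \lambda_i v_i
\quad \text{with } v_i \in V,\;
\lambda_i \geq 0,\; \sum_{i=1}^{n+1} \lambda_i = 1;
```

CGM maintains its iterates in such a representation throughout the run, rather than only guaranteeing that one exists.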
Akiko Yoshise
The trade-off between performance and privacy is a persistent difficulty for centralized machine learning methods. FedDRFilter and Data Collaboration Analysis (DCA) can overcome this difficulty through Collaborative Data Representation (CDR). We propose an alternative algorithm for CDR creation, utilizing matrix manifold optimization. We devise machine learning models in the DCA setting to evaluate algorithms. The results show that our algorithm outperforms the state-of-the-art approach in mean recognition performance within acceptable computation time.
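As a flavor of the matrix manifold machinery involved (a generic sketch, assuming optimization over the Stiefel manifold of orthonormal frames; the talk's actual manifold and objective are not specified here):

```python
import numpy as np

def stiefel_gradient_step(X, euclid_grad, step):
    """One Riemannian gradient step on the Stiefel manifold
    {X : X^T X = I}, using a QR-based retraction."""
    # Project the Euclidean gradient onto the tangent space at X.
    sym = (X.T @ euclid_grad + euclid_grad.T @ X) / 2.0
    riem_grad = euclid_grad - X @ sym
    # Take the step in the tangent space, then retract back via QR.
    Q, R = np.linalg.qr(X - step * riem_grad)
    return Q * np.where(np.diag(R) < 0, -1.0, 1.0)  # canonical column signs
```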
Previous Editions
[ DO x ML 2018 | DO x ML 2019 | DO x ML 2020 | DO x ML 2021 ]