Fifth Conference on
Discrete Optimization and Machine Learning
Summer 2023 x Tokyo, Japan
Coordinates
Dates: 8th - 9th August, 2023
Place: National Graduate Institute for Policy Studies (GRIPS), Tokyo, Japan
Organizers: Antoine Deza, Sebastian Pokutta, Takashi Tsuchiya
Venue: Both days will be in Lecture Room M
The campus can be conveniently accessed by public transportation.
Conference Dinner: ようざん (Yōzan)
DISSY Building B1F, 7-6-5 Roppongi, Minato City, Tokyo
Confirmed Participants
- Pierre Ablin
- Salvador Abreu
- Jean-François Baffier
- Jean-Philippe Chancelier
- Marco Cuturi
- Antoine Deza
- Daniel Deza
- Arnaud Doucet
- Katsuki Fujisawa
- Dan Garber
- Christoph Graczyk
- Hiroshi Hirai
- Gabriele Iommazzo
- Satoru Iwata
- Ken Kobayashi
- Michel de Lara
- David Martinez-Rubio
- Kazuhide Nakata
- Frank Nielsen
- Keiyu Nosaka
- Pierre-Louis Poirion
- Sebastian Pokutta
- Michael Sander
- Akiyoshi Shioura
- Christoph Spiegel
- Masashi Sugiyama
- Akiko Takeda
- Takashi Tsuchiya
- Elias Wirth
- Akiko Yoshise
Conference Photo
Program
August 8
Time | Speaker | Talk Title (for abstracts see below)
---|---|---
09:30- | Registration/Coffee |
10:00- | Session Chair: Takashi Tsuchiya |
10:00- | Akiko Takeda | Bi/trilevel Optimization Approach for Hyperparameter Tuning
10:30- | Pierre Ablin | Bilevel optimization for Machine Learning
11:00- | Gabriele Iommazzo | Cycle-based formulations in Distance Geometry
11:30- | Dan Garber | Efficiency of Gradient Methods for Low-Rank Recovery under Strict Complementarity
Lunch Break | |
13:30- | Session Chair: Akiko Takeda |
13:30- | Marco Cuturi | Sparse Monge Maps
14:00- | Arnaud Doucet | Diffusion Schrödinger Bridge Matching
14:30- | David Martinez-Rubio | Accelerated and Sparse Algorithms for Approximate Personalized PageRank
Coffee Break | |
15:30- | Session Chair: Sebastian Pokutta |
15:30- | Akiko Yoshise | Creating Collaborative Data Representations Using Matrix Manifold Optimization
16:00- | Elias Wirth | Pivoting Frank-Wolfe algorithm
16:30- | Hiroshi Hirai | Finding Hall blockers by matrix scaling
17:00- | Jean-Philippe Chancelier | Topological Conditional Separation
Dinner | |
August 9
Time | Speaker | Talk Title (for abstracts see below)
---|---|---
10:00- | Session Chair: Masashi Sugiyama |
10:00- | Satoru Iwata | Discrete Optimization and Machine Learning for Design and Discovery of Chemical Reactions
10:30- | Pierre-Louis Poirion | Random subspace methods for unconstrained non-convex optimization
11:00- | Akiyoshi Shioura | Characterization and Algorithm for Bivariate Multi-Unit Assignment Valuations
11:30- | Christoph Spiegel | Flag Algebras in Additive Combinatorics
Lunch Break | |
13:30- | Session Chair: Satoru Iwata |
13:30- | Masashi Sugiyama | Importance-Weighting Approach to Distribution Shift Adaptation
14:00- | Salvador Abreu | Hybrid Constraint-Based Metaheuristics
14:30- | Katsuki Fujisawa | The challenge to Graph500 benchmark: history and results
Coffee Break | |
15:30- | Session Chair: Dan Garber |
15:30- | Michel de Lara | Hidden Convexity in the $l_0$ Pseudonorm
16:00- | Takashi Tsuchiya | Closing Nonzero Duality Gaps of Singular Semidefinite Programs by Way of Perturbation
16:30- | Sebastian Pokutta | Improved local models and new Bell inequalities via Frank-Wolfe algorithms
17:00- | Antoine Deza | Kissing Polytopes
Abstracts
Pierre Ablin
Bilevel optimization, the problem of minimizing a value function that involves the arg-minimum of another function, appears in many areas of machine learning of practical importance, like hyperparameter search or implicit deep learning. This talk aims to describe machine learning problems in which bilevel optimization naturally appears, explain the differences with classical single-level optimization, and give an overview of recent algorithmic advances to solve the problem at scale.
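For concreteness, a generic bilevel program of the kind described above can be written as (notation ours):

$$\min_{\lambda} \; F\big(\lambda, \theta^\star(\lambda)\big) \quad \text{s.t.} \quad \theta^\star(\lambda) \in \operatorname*{arg\,min}_{\theta} \; G(\lambda, \theta),$$

where, in hyperparameter search, $G$ is the training loss for hyperparameters $\lambda$, $F$ is the validation loss, and $\lambda$ influences the outer objective through the inner arg-minimum $\theta^\star(\lambda)$.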
Salvador Abreu
We discuss an architecture for constructing problem-specific solvers using multiple hybrid metaheuristics. (to be completed)
Jean-Philippe Chancelier
Pearl’s d-separation is a foundational notion for studying conditional independence between random variables. We define topological conditional separation and show that it is equivalent to d-separation, extended beyond acyclic graphs, be they finite or infinite. Joint work with Michel de Lara and Benjamin Heymann.
Marco Cuturi
We propose a new formulation of the Monge problem that results, by design, in sparse transport maps in $R^d$, i.e. such that $T(x) - x$ is a sparse vector in $R^d$.
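As a schematic of the idea (notation ours, not the paper's exact formulation), one can append a sparsity-inducing penalty on displacements to the Monge objective:

$$\min_{T \,:\, T_\sharp \mu = \nu} \ \int \Big[ c\big(x, T(x)\big) + \tau \, \big\lVert T(x) - x \big\rVert_1 \Big] \, d\mu(x), \qquad \tau > 0,$$

so that optimal maps favor displacements $T(x) - x$ with few nonzero coordinates.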
Antoine Deza
We investigate the following question: how close can two disjoint lattice polytopes contained in a fixed hypercube be? This question stems from various contexts where the minimal distance between such polytopes appears in complexity bounds of optimization algorithms. We provide nearly matching lower and upper bounds on this distance and discuss its exact computation. We also give similar bounds in the case of disjoint rational polytopes whose binary encoding length is prescribed. Joint work with Shmuel Onn, Sebastian Pokutta, and Lionel Pournin.
Arnaud Doucet
Solving transport problems, i.e. finding a map transporting one given distribution to another, has numerous applications in machine learning. Novel mass transport methods motivated by generative modeling have recently been proposed, e.g. Denoising Diffusion Models (DDMs) and Flow Matching Models (FMMs) implement such a transport through a Stochastic Differential Equation or an Ordinary Differential Equation. However, while it is desirable in many applications to approximate the deterministic dynamic Optimal Transport (OT) map which admits attractive properties, DDMs and FMMs are not guaranteed to provide transports close to the OT map. In contrast, Schrödinger bridges (SBs) compute stochastic dynamic mappings which recover entropy-regularized versions of OT. Unfortunately, existing numerical methods approximating SBs either scale poorly with dimension or accumulate errors across iterations. In this work, we introduce Iterative Markovian Fitting (IMF), a new methodology for solving SB problems, and Diffusion Schrödinger Bridge Matching (DSBM), a novel numerical algorithm for computing IMF iterates. DSBM significantly improves over previous SB numerics and recovers as special/limiting cases various recent transport methods. We demonstrate the performance of DSBM on a variety of problems.
Katsuki Fujisawa
Graph500 is a benchmark for shortest-path search performance on very large graphs, requiring the integration of many technologies, from algorithms to HPC. Our research team started working on this benchmark in 2011 and has been ranked number one worldwide for 17 terms between 2014 and 2023 using supercomputers such as Fugaku. In the latest results, we achieved a performance of 137 TeraTEPS (Traversed Edges Per Second) using 93% of all nodes of the supercomputer Fugaku on a supergiant graph with 4.4 trillion vertices and 70.4 trillion edges. In this talk, I will explain our efforts on, and results for, the Graph500 benchmark.
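As a rough illustration of the metric only (a toy sketch, not the Graph500 reference implementation; all names ours), TEPS divides the number of edges examined during a breadth-first search by its wall-clock time:

```python
import time
from collections import deque

def bfs_teps(adj, root):
    """Run a BFS from `root` over adjacency lists `adj` and report TEPS
    (traversed edges per second). Here each undirected edge is counted
    once per endpoint examined, a simplification of the official rules."""
    dist = {root: 0}
    queue = deque([root])
    traversed = 0                  # edges examined during the traversal
    start = time.perf_counter()
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            traversed += 1
            if v not in dist:      # first visit: record BFS level
                dist[v] = dist[u] + 1
                queue.append(v)
    elapsed = time.perf_counter() - start
    return traversed / elapsed

# Toy example: a 4-cycle with a chord.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
print(bfs_teps(adj, root=0))
```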
Dan Garber
Low-rank matrix or tensor recovery is a fundamental task in machine learning and statistics with many important applications. While convex relaxations for these tasks are often well understood and effective from a statistical point of view, they are considered impractical for high-dimensional instances, since large high-rank matrices might need to be stored and factorized, which is computationally prohibitive. In this talk I will challenge this common belief by surveying a few results indicating that, under a classical condition known as strict complementarity, gradient methods, at least in the proximity of optimal solutions, need to store and manipulate only low-rank matrices/tensors, which is computationally efficient. Moreover, strict complementarity often implies linear convergence rates for such methods, even when the objective is not strongly convex. I will motivate the strict complementarity assumption from both theoretical and empirical perspectives.
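As a toy sketch of the "manipulate only low-rank iterates" idea (our illustration, not the methods of the talk; for simplicity the rank projection below uses a full SVD, which the actual low-rank methods avoid):

```python
import numpy as np

def lowrank_pgd(M_obs, mask, r, step=1.0, iters=500):
    """Projected gradient descent for low-rank matrix completion:
    minimize 0.5 * ||mask * (X - M_obs)||_F^2 over matrices of rank <= r.
    Each iterate can be represented by its rank-r SVD factors."""
    X = np.zeros_like(M_obs)
    for _ in range(iters):
        G = mask * (X - M_obs)             # gradient of the smooth loss
        U, s, Vt = np.linalg.svd(X - step * G, full_matrices=False)
        X = (U[:, :r] * s[:r]) @ Vt[:r]    # project back onto rank <= r
    return X

rng = np.random.default_rng(0)
M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))  # rank-2 truth
mask = rng.random(M.shape) < 0.5           # observe about half the entries
X = lowrank_pgd(mask * M, mask, r=2)
print(np.linalg.norm(mask * (X - M)))      # residual on observed entries
```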
Hiroshi Hirai
For a given nonnegative matrix $A=(A_{ij})$, the matrix scaling problem asks whether $A$ can be scaled to a doubly stochastic matrix $D_1AD_2$ for some positive diagonal matrices $D_1,D_2$. The Sinkhorn algorithm is a simple iterative algorithm that alternately repeats row-normalization $A_{ij} \leftarrow A_{ij}/\sum_{j}A_{ij}$ and column-normalization $A_{ij} \leftarrow A_{ij}/\sum_{i}A_{ij}$. Under this algorithm, $A$ converges to a doubly stochastic matrix in the limit if and only if the bipartite graph associated with $A$ has a perfect matching. This property can be used to decide the existence of a perfect matching in a given bipartite graph $G$, which is identified with the $0,1$-matrix $A_G$. Linial, Samorodnitsky, and Wigderson showed that $O(n^2 \log n)$ iterations for $A_G$ decide whether $G$ has a perfect matching, where $n$ is the number of vertices in one of the color classes of $G$. In this paper, we show an extension of this result: if $G$ has no perfect matching, then a polynomial number of Sinkhorn iterations identifies a Hall blocker, i.e., a vertex subset $X$ whose neighborhood $\Gamma(X)$ satisfies $|X| > |\Gamma(X)|$, which is a certificate of the nonexistence of a perfect matching. Specifically, we show that $O(n^2 \log n)$ iterations can identify one Hall blocker, and that further polynomial iterations can also identify all parametric Hall blockers $X$ maximizing $(1-\lambda) |X| - \lambda |\Gamma(X)|$ for $\lambda \in [0,1]$. The former result is based on an interpretation of the Sinkhorn algorithm as alternating minimization for geometric programming; the latter, on an interpretation as alternating minimization for KL-divergence (Csiszár and Tusnády 1984, Gietl and Reffel 2013) and its limiting behavior for a nonscalable matrix (Aas 2014). We also relate the Sinkhorn limit to parametric network flow, the principal partition of polymatroids, and the Dulmage-Mendelsohn decomposition of a bipartite graph. Joint work with Koyo Hayashi and Keiya Sakabe.
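A minimal numpy sketch of the Sinkhorn iteration on a non-scalable $0,1$-matrix with a Hall blocker (toy example, not the authors' code):

```python
import numpy as np

def sinkhorn(A, iters=1000):
    """Alternate row- and column-normalization of a nonnegative matrix,
    as in the abstract above. Converges to a doubly stochastic matrix
    iff the bipartite graph of A's support has a perfect matching."""
    A = A.astype(float)
    for _ in range(iters):
        A /= A.sum(axis=1, keepdims=True)  # row normalization
        A /= A.sum(axis=0, keepdims=True)  # column normalization
    return A

# 0/1 matrix of a bipartite graph WITHOUT a perfect matching:
# rows {0, 1} both connect only to column 0, so X = {0, 1} is a Hall
# blocker (|X| = 2 > 1 = |Gamma(X)|). Sinkhorn cannot converge to a
# doubly stochastic limit; the row sums of the blocked rows stay deficient.
A = np.array([[1, 0, 0],
              [1, 0, 0],
              [1, 1, 1]])
S = sinkhorn(A)
print(np.round(S, 3), S.sum(axis=1))
```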
Gabriele Iommazzo
The distance geometry problem asks to find a realization of a given simple edge-weighted graph in a Euclidean space of given dimension $K$, where the edges are realized as straight segments of lengths equal (or as close as possible) to the edge weights. The problem is often modelled as a mathematical programming formulation involving decision variables that determine the position of the vertices in the given Euclidean space. Solution algorithms are generally constructed using local or global nonlinear optimization techniques. We present a new modelling technique for this problem where, instead of deciding vertex positions, the formulations decide the length of the segments representing the edges in each cycle in the graph, projected in every dimension. We compare computational results from protein conformation instances obtained using the new cycle-based formulation and the existing edge-based formulation. While edge-based formulations take less time to reach termination, cycle-based formulations are generally better on solution quality measures.
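For reference, the edge-based formulation mentioned above is typically the least-squares model over vertex positions $x_v \in \mathbb{R}^K$ that is standard in the distance geometry literature:

$$\min_{x_1, \dots, x_{|V|} \in \mathbb{R}^K} \ \sum_{\{u,v\} \in E} \Big( \lVert x_u - x_v \rVert^2 - d_{uv}^2 \Big)^2,$$

whereas, roughly, the cycle-based formulations replace the position variables with per-dimension projections of the edge segments, constrained to sum to zero along every cycle.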
Satoru Iwata
Recent developments in quantum chemical calculations enable us to explore the potential energy surface to extract chemical reaction networks. Our current research project AICReDD aims at utilizing this advancement to design and/or discover new chemical reactions. This naturally requires methods from discrete optimization and machine learning. One important example is how to select diverse molecules from unexplored areas of chemical space. We have so far developed a new approach for selecting a subset of diverse molecules from a given molecular list by combining two existing techniques from machine learning and discrete optimization: graph neural networks (GNNs) for learning vector representations of molecules, and a diverse-selection framework called submodular function maximization. This method, called SubMo-GNN, first trains a GNN with property prediction tasks; the trained GNN then transforms molecular graphs into molecular vectors, which capture both properties and structures of molecules. Finally, to obtain a subset of diverse molecules, we define a submodular function that quantifies the diversity of molecular vectors and find a subset of molecular vectors with a large submodular function value. This can be done efficiently by the greedy algorithm, which finds a subset with a guaranteed approximation ratio to the optimal function value. Maximization of this class of submodular functions has been investigated for the maximum a posteriori (MAP) inference of determinantal point processes (DPPs), which is crucial for selecting diverse items in many machine learning applications. Although DPP MAP inference is NP-hard, the greedy algorithm often finds high-quality solutions, and many researchers have studied its efficient implementation. Combining a recently developed “fast greedy” algorithm for this problem with a classical technique called “lazy greedy” for general submodular function maximization, we have devised the method that is currently fastest in practice.
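A minimal sketch of the lazy greedy idea mentioned above (toy coverage objective, names ours; this is not SubMo-GNN itself):

```python
import heapq

def lazy_greedy(ground_set, f, k):
    """Lazy greedy maximization of a monotone submodular set function f.
    Submodularity makes cached marginal gains valid upper bounds, so a
    max-heap of stale gains avoids most re-evaluations. Returns S with
    |S| = k and f(S) >= (1 - 1/e) * OPT for monotone submodular f."""
    S, fS = set(), f(set())
    # Heap of (negated cached marginal gain, element).
    heap = [(-(f({e}) - fS), e) for e in ground_set]
    heapq.heapify(heap)
    while len(S) < k and heap:
        _, e = heapq.heappop(heap)
        gain = f(S | {e}) - fS          # re-evaluate the stale bound
        if not heap or gain >= -heap[0][0]:
            S.add(e)                    # cached bound was still the best
            fS += gain
        else:
            heapq.heappush(heap, (-gain, e))
    return S

# Toy diversity objective: coverage of a family of sets (submodular).
sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}, 4: {"d"}}
f = lambda S: len(set().union(*(sets[e] for e in S))) if S else 0
print(lazy_greedy(sets.keys(), f, k=2))   # {1, 2}, covering {"a","b","c"}
```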
Michel de Lara
The so-called $l_0$ pseudonorm counts the number of nonzero components of a vector. In this talk, we review a series of recent results on a class of Capra (Constant Along Primal Rays) conjugacies (and polarities) that reveal hidden convexity in the $l_0$ pseudonorm.
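For orientation, $l_0(x) = \#\{i : x_i \neq 0\}$ is constant along primal rays ($l_0(\lambda x) = l_0(x)$ for $\lambda > 0$), which motivates the Capra coupling; for the Euclidean norm it reads (definition recalled here, stated as we understand it from the authors' papers):

$$\varphi(x, y) = \begin{cases} \dfrac{\langle x, y \rangle}{\lVert x \rVert_2} & \text{if } x \neq 0,\\ 0 & \text{if } x = 0, \end{cases}$$

so that $\varphi(\lambda x, y) = \varphi(x, y)$ for all $\lambda > 0$, and the conjugacies built from $\varphi$ interact well with 0-homogeneous functions like $l_0$.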
David Martinez-Rubio
It has recently been shown that ISTA, an unaccelerated first-order optimization method, presents sparse updates for the $\ell_1$-regularized personalized PageRank problem, leading to cheap iteration complexity. In this talk I’ll explain our work on accelerated optimization algorithms for this problem that also perform sparse updates, leading to faster convergence in certain parameter regimes. Further, we design a conjugate-directions algorithm that achieves an exact solution while exploiting sparsity. Our findings apply beyond PageRank and hold for any quadratic objective whose Hessian is a positive-definite $M$-matrix.
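A minimal dense sketch of ISTA on an $\ell_1$-regularized quadratic of this kind (toy example, names ours; the point of the talk is that these updates can be kept sparse):

```python
import numpy as np

def ista(Q, b, rho, step, iters=200):
    """ISTA for min_x 0.5 * x^T Q x - b^T x + rho * ||x||_1:
    a gradient step on the smooth part followed by soft-thresholding."""
    x = np.zeros(len(b))
    for _ in range(iters):
        g = Q @ x - b                   # gradient of the smooth part
        z = x - step * g                # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - step * rho, 0.0)
    return x

# Toy positive-definite M-matrix Q = I - 0.25 * adjacency of a path.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
Q = np.eye(3) - 0.25 * A
b = np.array([1.0, 0.0, 0.0])           # "personalization" vector
print(ista(Q, b, rho=0.05, step=0.5))
```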
Pierre-Louis Poirion
In this talk, we present a randomized subspace regularized Newton method for non-convex functions. We show that our method has global convergence under appropriate assumptions and that its convergence rate is the same as that of the full regularized Newton method. Furthermore, under some additional assumptions, we obtain a local linear convergence rate.
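A toy sketch of our reading of the scheme (names and details ours, not the authors' code): draw a random low-dimensional subspace, solve a regularized Newton system restricted to it, and step within that subspace.

```python
import numpy as np

def random_subspace_newton(grad, hess, x0, s=5, sigma=1.0, iters=100):
    """Randomized subspace regularized Newton sketch: project gradient
    and Hessian onto a random s-dimensional subspace and solve a small
    positive-definite regularized system there."""
    x = x0.copy()
    n = len(x0)
    rng = np.random.default_rng(0)
    for _ in range(iters):
        P = rng.standard_normal((n, s))        # random subspace basis
        H = P.T @ hess(x) @ P                  # projected Hessian (s x s)
        g = P.T @ grad(x)                      # projected gradient
        # Eigenvalue shift keeps the regularized system positive definite.
        reg = max(0.0, -np.linalg.eigvalsh(H).min()) + sigma
        d = np.linalg.solve(H + reg * np.eye(s), -g)
        x = x + P @ d
    return x

# Toy nonconvex objective: f(x) = sum(x_i^4 - x_i^2).
grad = lambda x: 4 * x**3 - 2 * x
hess = lambda x: np.diag(12 * x**2 - 2)
x = random_subspace_newton(grad, hess, x0=np.full(20, 0.3), s=5)
print(np.sum(x**4 - x**2))                     # global minimum value is -5
```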
Sebastian Pokutta
In Bell scenarios with two outcomes per party, we algorithmically consider the two sides of the membership problem for the local polytope: constructing local models and deriving separating hyperplanes, that is, Bell inequalities. We take advantage of recent developments in so-called Frank–Wolfe algorithms to significantly increase the convergence rate of existing methods. As an application, we study the threshold value for the nonlocality of two-qubit Werner states under projective measurements. Here, we improve on both the upper and lower bounds present in the literature. Importantly, our bounds are entirely analytical; moreover, they yield refined bounds on the value of the Grothendieck constant of order three: $1.4367 \leq K_G(3) \leq 1.4546$. We also demonstrate the efficiency of our approach in multipartite Bell scenarios, and present the first local models for all projective measurements with visibilities noticeably higher than the entanglement threshold. We make our entire code accessible as a Julia library called BellPolytopes.jl.
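For readers unfamiliar with the method, here is a generic Frank–Wolfe iteration over a polytope given by its vertex list (toy Python sketch, names ours; the actual library is the Julia package BellPolytopes.jl mentioned above):

```python
import numpy as np

def frank_wolfe(grad, vertices, iters=200):
    """Vanilla Frank-Wolfe over conv(vertices): a linear minimization
    oracle picks the vertex best aligned with -grad, and the iterate
    moves toward it. The iterate is always an explicit convex
    combination of vertices, which is what makes the method attractive
    for building local models."""
    x = vertices[0].astype(float)
    for t in range(iters):
        g = grad(x)
        v = vertices[np.argmin(vertices @ g)]  # LMO: argmin_v <g, v>
        gamma = 2.0 / (t + 2)                  # standard step size
        x = (1 - gamma) * x + gamma * v
    return x

# Toy: project p onto the probability simplex conv(e_1, ..., e_n)
# by minimizing f(x) = 0.5 * ||x - p||^2 (so grad(x) = x - p).
n, p = 5, np.array([0.9, 0.4, -0.2, 0.1, 0.3])
x = frank_wolfe(lambda x: x - p, np.eye(n))
print(np.round(x, 3))   # approximately the simplex projection of p
```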
Akiyoshi Shioura
A multi-unit assignment valuation is a function represented by a weighted bipartite graph. In this paper, we provide a characterization of such a function in terms of maximizer sets of perturbed functions. We then present an algorithm that checks whether a given bivariate function is a multi-unit assignment valuation, and if the answer is yes, computes a weighted bipartite graph representing the function.
Christoph Spiegel
We explore the Ramsey multiplicity problem for additive structures by extending Razborov’s flag algebra calculus to additive structures in finite field vector spaces.
Masashi Sugiyama
Standard machine learning methods suffer from distribution shifts between training and test data. In this talk, I will first give an overview of the classical importance-weighting approach to distribution shift adaptation, which consists of an importance estimation step and an importance-weighted training step. Then, I will present a more recent approach that simultaneously estimates the importance weight and trains a predictor. Finally, I will discuss a more challenging scenario of continuous distribution shifts, where the data distributions change continuously over time.
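A toy sketch of the classical two-step recipe under covariate shift (names ours; here the density ratio is available in closed form, whereas in practice it must itself be estimated, e.g. by methods such as KLIEP or uLSIF):

```python
import numpy as np

def importance_weighted_fit(X_tr, y_tr, w, lam=1e-3):
    """Importance-weighted ridge regression: each training loss term is
    weighted by an estimate w(x) of the ratio p_test(x) / p_train(x).
    Solves the weighted normal equations (X^T W X + lam I) beta = X^T W y."""
    W = np.diag(w)
    d = X_tr.shape[1]
    return np.linalg.solve(X_tr.T @ W @ X_tr + lam * np.eye(d),
                           X_tr.T @ W @ y_tr)

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (200, 1))          # training inputs ~ N(0, 1)
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
# Covariate shift: test inputs ~ N(1, 0.5^2); known-density ratio weights.
p_tr = np.exp(-X[:, 0]**2 / 2) / np.sqrt(2 * np.pi)
p_te = np.exp(-(X[:, 0] - 1)**2 / (2 * 0.25)) / np.sqrt(2 * np.pi * 0.25)
print(importance_weighted_fit(X, y, w=p_te / p_tr))
```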
Akiko Takeda
Various machine learning (ML) models are equipped with parameters that need to be fixed in advance, and such parameters are often called hyperparameters. Needless to say, the prediction performance of ML models significantly depends on the choice of hyperparameters. In this talk, we will show several application examples where a bi/trilevel optimization approach is used for hyperparameter selection. One of them is a trilevel hyperparameter learning model in the adversarial setting, and we will present its solution method and experimental results.
Takashi Tsuchiya
One of the major difficulties in semidefinite programming is the existence of nonzero duality gaps, where strong duality fails. In this talk, we conduct a perturbation analysis of such singular semidefinite programs and discuss a hidden continuous structure behind nonzero duality gaps in semidefinite programming. We also discuss its implications for the design of SDP algorithms. This is joint work with Bruno Lourenço, Masakazu Muramatsu, and Takayuki Okuno.
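Recall the standard primal-dual SDP pair (standard notation, not specific to the talk):

$$p^\star = \inf_{X \succeq 0} \big\{ \langle C, X \rangle : \langle A_i, X \rangle = b_i, \ i = 1, \dots, m \big\}, \qquad d^\star = \sup_{y \in \mathbb{R}^m} \Big\{ b^\top y : C - \sum_{i=1}^m y_i A_i \succeq 0 \Big\}.$$

Weak duality always gives $d^\star \leq p^\star$, but without a regularity condition such as Slater's, singular instances can have $d^\star < p^\star$; these nonzero gaps are the object of the perturbation analysis above.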
Elias Wirth
The Frank-Wolfe algorithm (FW) is an iterative method for solving constrained convex optimization problems. FW variants possess a range of attractive features, including affine invariance, ease of implementation, projection-free operations, and the representation of the current iterate as a sparse convex combination of vertices of the feasible region. This study focuses on the latter characteristic and presents the cardinality-guaranteed meta-algorithm (CGM), which enhances existing FW variants by ensuring that the iterates consist of convex combinations of at most $n + 1$ vertices, where $n$ denotes the dimension of the ambient space (the bound guaranteed by Carathéodory's theorem). CGM provides bounds on the size of the active set while preserving the beneficial properties of FW variants, including convergence rate guarantees.
Akiko Yoshise
The trade-off between performance and privacy is a pain in the neck for centralized machine learning methods. Fed-DR-Filter and Data Collaboration Analysis (DCA) can overcome this difficulty through Collaborative Data Representation (CDR). We propose an alternative algorithm for CDR creation, utilizing matrix manifold optimization. We devise machine learning models in the DCA setting to evaluate algorithms. The results show that our algorithm outperforms the state-of-the-art approach in mean recognition performance within acceptable computation time.
Previous Editions
[ DO x ML 2018 | DO x ML 2019 | DO x ML 2020 | DO x ML 2021 ]