
Fifth Conference on
Discrete Optimization and Machine Learning

Summer 2023 x Tokyo, Japan

Coordinates

Dates: August 8-9, 2023
Place: National Graduate Institute for Policy Studies (GRIPS), Tokyo, Japan
Organizers: Antoine Deza, Sebastian Pokutta, Takashi Tsuchiya

Venue: Both days will be in Lecture Room M
The campus can be conveniently accessed by public transportation.


Conference Dinner: ようざん (Yōzan)
Tokyo, Minato City, Roppongi 7-6-5, DISSY Building B1F
[map]

Conference Photo


Program

August 8

Time | Speaker | Talk Title (for the abstract, click the title or see below)
09:30-10:00 | Registration / Coffee
10:00-12:00 | Session Chair: Takashi Tsuchiya
10:00-10:30 | Akiko Takeda
Bi/trilevel Optimization Approach for Hyperparameter Tuning


Abstract. Various machine learning (ML) models are equipped with parameters that need to be fixed in advance; such parameters are often called hyperparameters. Needless to say, the prediction performance of ML models depends significantly on the choice of hyperparameters. In this talk, we will show several application examples where a bi/trilevel optimization approach is used for hyperparameter selection. One of them is a trilevel hyperparameter learning model in the adversarial setting, and we will present its solution method and experimental results.
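
For reference, hyperparameter selection is commonly cast as the bilevel program below (notation ours, added purely for illustration); the trilevel adversarial model mentioned in the talk nests a further attacker problem inside the training problem:

$$
\min_{\lambda} \; L_{\mathrm{val}}\bigl(w^{*}(\lambda)\bigr)
\quad \text{s.t.} \quad
w^{*}(\lambda) \in \operatorname*{arg\,min}_{w} \; L_{\mathrm{train}}(w, \lambda),
$$

where $\lambda$ denotes the hyperparameters and $w$ the model parameters.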

10:30-11:00 | Pierre Ablin
Bilevel optimization for Machine Learning


Abstract. Bilevel optimization, the problem of minimizing a value function that involves the arg-minimum of another function, appears in many areas of machine learning of practical importance, like hyperparameter search or implicit deep learning. This talk aims to describe machine learning problems in which bilevel optimization naturally appears, explain the differences with classical single-level optimization, and give an overview of recent algorithmic advances to solve the problem at scale.
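
In symbols, the value-function form described above reads (our formulation, included for clarity):

$$
\min_{x} \; F\bigl(x, y^{*}(x)\bigr)
\quad \text{where} \quad
y^{*}(x) \in \operatorname*{arg\,min}_{y} \; G(x, y),
$$

so the outer objective is evaluated at an arg-minimum of the inner problem, which is what distinguishes it from classical single-level optimization.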

11:00-11:30 | Gabriele Iommazzo
Cycle-based formulations in Distance Geometry


Abstract. The distance geometry problem asks to find a realization of a given simple edge-weighted graph in a Euclidean space of given dimension $K$, where the edges are realized as straight segments of lengths equal (or as close as possible) to the edge weights. The problem is often modelled as a mathematical programming formulation involving decision variables that determine the position of the vertices in the given Euclidean space. Solution algorithms are generally constructed using local or global nonlinear optimization techniques. We present a new modelling technique for this problem where, instead of deciding vertex positions, the formulations decide the length of the segments representing the edges in each cycle in the graph, projected in every dimension. We compare computational results from protein conformation instances obtained using the new cycle-based formulation and the existing edge-based formulation. While edge-based formulations take less time to reach termination, cycle-based formulations are generally better on solution quality measures.
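
For context, the standard edge-based formulation referred to above can be written as an unconstrained penalty model (our rendering, not the cycle-based formulation of the talk, which instead takes the per-dimension edge lengths along each cycle as decision variables):

$$
\min_{x_1,\dots,x_n \in \mathbb{R}^K} \; \sum_{\{u,v\} \in E} \bigl( \lVert x_u - x_v \rVert^2 - d_{uv}^2 \bigr)^{2}.
$$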

11:30-12:00 | Dan Garber
Efficiency of Gradient Methods for Low-Rank Recovery under Strict Complementarity


Abstract. Low-rank matrix or tensor recovery is a fundamental task in machine learning and statistics with many important applications. While convex relaxations for these tasks are often well understood and effective from a statistical point of view, they are considered impractical for high-dimensional instances, since large high-rank matrices may need to be stored and factorized, which is computationally prohibitive. In this talk I will challenge this common belief by surveying a few results indicating that, under a classical condition known as strict complementarity, gradient methods, at least in the proximity of optimal solutions, need to store and manipulate only low-rank matrices/tensors, which is computationally efficient. Moreover, strict complementarity often implies linear convergence rates for such methods, even when the objective is not strongly convex. I will motivate the strict complementarity assumption from both theoretical and empirical perspectives.

Lunch Break
13:30-15:00 | Session Chair: Akiko Takeda
13:30-14:00 | Marco Cuturi
Sparse Monge Maps


Abstract. We propose a new formulation of the Monge problem that results, by design, in sparse transport maps in $R^d$, i.e. such that $T(x) - x$ is a sparse vector in $R^d$.
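
For reference, the classical Monge problem mentioned above (our notation) seeks a map pushing a source measure $\mu$ onto a target measure $\nu$ at minimal cost,

$$
\min_{T \,:\, T_{\sharp}\mu = \nu} \; \int c\bigl(x, T(x)\bigr)\, d\mu(x),
$$

and the proposed formulation is designed so that the displacements $T(x) - x$ are sparse.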

14:00-14:30 | Arnaud Doucet
Diffusion Schrödinger Bridge Matching


Abstract. Solving transport problems, i.e. finding a map transporting one given distribution to another, has numerous applications in machine learning. Novel mass transport methods motivated by generative modeling have recently been proposed, e.g. Denoising Diffusion Models (DDMs) and Flow Matching Models (FMMs) implement such a transport through a Stochastic Differential Equation or an Ordinary Differential Equation. However, while it is desirable in many applications to approximate the deterministic dynamic Optimal Transport (OT) map which admits attractive properties, DDMs and FMMs are not guaranteed to provide transports close to the OT map. In contrast, Schrödinger bridges (SBs) compute stochastic dynamic mappings which recover entropy-regularized versions of OT. Unfortunately, existing numerical methods approximating SBs either scale poorly with dimension or accumulate errors across iterations. In this work, we introduce Iterative Markovian Fitting (IMF), a new methodology for solving SB problems, and Diffusion Schrödinger Bridge Matching (DSBM), a novel numerical algorithm for computing IMF iterates. DSBM significantly improves over previous SB numerics and recovers as special/limiting cases various recent transport methods. We demonstrate the performance of DSBM on a variety of problems.

14:30-15:00 | David Martinez-Rubio
Accelerated and Sparse Algorithms for Approximate Personalized PageRank


Abstract. It has recently been shown that ISTA, an unaccelerated optimization first-order method, presents sparse updates for the $\ell_1$-regularized personalized PageRank problem, leading to cheap iteration complexity. In this talk I’ll explain our work on accelerated optimization algorithms for this problem that also perform sparse updates leading to faster convergence for certain parameter regimes. Further, we design a conjugate directions algorithm that achieves an exact solution while exploiting sparsity. Our findings apply beyond PageRank and work for any quadratic objective whose Hessian is a positive-definite $M$-matrix.
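
For orientation, here is a generic ISTA sketch for an $\ell_1$-regularized quadratic (the PageRank-specific matrix and the sparsity-exploiting bookkeeping from the talk are not reproduced; names and parameters are illustrative):

    import numpy as np

    def soft_threshold(z, t):
        """Proximal operator of t * ||.||_1 (componentwise soft-thresholding)."""
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def ista_quadratic(Q, b, alpha, n_iters=500):
        """Minimize 0.5 x^T Q x - b^T x + alpha * ||x||_1 for symmetric PSD Q.

        For personalized PageRank, Q would be a positive-definite M-matrix
        built from the graph; here Q is just any PSD matrix.
        """
        L = np.linalg.eigvalsh(Q).max()           # Lipschitz constant of the gradient
        x = np.zeros(Q.shape[0])
        for _ in range(n_iters):
            grad = Q @ x - b                      # gradient of the smooth part
            x = soft_threshold(x - grad / L, alpha / L)
        return x

    # Tiny usage example on a random PSD matrix.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    x_hat = ista_quadratic(A @ A.T + np.eye(5), rng.standard_normal(5), alpha=0.1)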

Coffee Break
15:30-17:30 | Session Chair: Sebastian Pokutta
15:30-16:00 | Akiko Yoshise
Creating Collaborative Data Representations Using Matrix Manifold Optimization


Abstract. The trade-off between performance and privacy is a pain in the neck for centralized machine learning methods. Fed-DR-Filter and Data Collaboration Analysis (DCA) can overcome this difficulty through Collaborative Data Representation (CDR). We propose an alternative algorithm for CDR creation, utilizing matrix manifold optimization. We devise machine learning models in the DCA setting to evaluate algorithms. The results show that our algorithm outperforms the state-of-the-art approach in mean recognition performance within acceptable computation time.

16:00-16:30 | Elias Wirth
Pivoting Frank-Wolfe algorithm


Abstract. The Frank-Wolfe algorithm (FW) is an iterative process for solving constrained convex optimization problems. FW variants possess a range of attractive features, including affine-invariance, ease-of-implementation, projection-free operations, and the representation of the current iterate as a sparse convex combination of vertices of the feasible region. This study focuses on the latter characteristic and presents the cardinality-guaranteed meta-algorithm (CGM), which enhances existing FW variants by ensuring that the iterates consist of convex combinations of at most $n + 1$ vertices, where $n$ denotes the dimension of the ambient space. CGM provides bounds on the size of the active set while preserving the beneficial properties of FW variants, including convergence rate guarantees.
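
To make the "sparse convex combination of vertices" point concrete, here is a bare-bones Frank-Wolfe iteration on the probability simplex (plain FW with the standard step size, not the CGM meta-algorithm of the talk; all names are illustrative):

    import numpy as np

    def frank_wolfe_simplex(grad_f, x0, n_iters=100):
        """Plain Frank-Wolfe on the probability simplex.

        The linear minimization oracle over the simplex returns a vertex
        (a standard basis vector), so the iterate is always a convex
        combination of the vertices visited so far.
        """
        x = x0.copy()
        active_vertices = {int(np.argmax(x0))}       # vertices in the current combination
        for k in range(n_iters):
            g = grad_f(x)
            i = int(np.argmin(g))                    # LMO: best simplex vertex e_i
            v = np.zeros_like(x)
            v[i] = 1.0
            gamma = 2.0 / (k + 2.0)                  # standard open-loop step size
            x = (1.0 - gamma) * x + gamma * v
            active_vertices.add(i)
        return x, active_vertices

    # Example: minimize ||x - c||^2 over the simplex.
    c = np.array([0.1, 0.7, 0.2, 0.0])
    x_star, verts = frank_wolfe_simplex(lambda x: 2.0 * (x - c),
                                        np.array([1.0, 0.0, 0.0, 0.0]))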

16:30-17:00 | Hiroshi Hirai
Finding Hall blockers by matrix scaling


Abstract. For a given nonnegative matrix $A=(A_{ij})$, the matrix scaling problem asks whether $A$ can be scaled to a doubly stochastic matrix $D_1AD_2$ for some positive diagonal matrices $D_1,D_2$. The Sinkhorn algorithm is a simple iterative algorithm, which alternately repeats row normalization $A_{ij} \leftarrow A_{ij}/\sum_{j}A_{ij}$ and column normalization $A_{ij} \leftarrow A_{ij}/\sum_{i}A_{ij}$. Under this algorithm, $A$ converges to a doubly stochastic matrix in the limit if and only if the bipartite graph associated with $A$ has a perfect matching. This property can be used to decide the existence of a perfect matching in a given bipartite graph $G$, which is identified with its $0,1$-matrix $A_G$. Linial, Samorodnitsky, and Wigderson showed that $O(n^2 \log n)$ iterations for $A_G$ decide whether $G$ has a perfect matching. Here $n$ is the number of vertices in one of the color classes of $G$. In this paper, we show an extension of this result: if $G$ has no perfect matching, then a polynomial number of Sinkhorn iterations identifies a Hall blocker, i.e., a vertex subset $X$ whose neighborhood $\Gamma(X)$ satisfies $|X| > |\Gamma(X)|$, which is a certificate of the nonexistence of a perfect matching. Specifically, we show that $O(n^2 \log n)$ iterations can identify one Hall blocker, and that further polynomial iterations can also identify all parametric Hall blockers $X$ maximizing $(1-\lambda) |X| - \lambda |\Gamma(X)|$ for $\lambda \in [0,1]$. The former result is based on an interpretation of the Sinkhorn algorithm as alternating minimization for geometric programming. The latter is based on an interpretation as alternating minimization for the KL-divergence (Csiszár and Tusnády 1984, Gietl and Reffel 2013) and its limiting behavior for a nonscalable matrix (Aas 2014). We also relate the Sinkhorn limit to parametric network flow, the principal partition of polymatroids, and the Dulmage-Mendelsohn decomposition of a bipartite graph. Joint work with Koyo Hayashi and Keiya Sakabe.
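
As a small illustration of the Sinkhorn iteration and of how a missing perfect matching shows up in its behavior, here is a toy sketch (illustrative code and example, not the authors' implementation; the blocker identification in the talk is considerably more refined):

    import numpy as np

    def sinkhorn(A, n_iters=1000):
        """Alternate row and column normalization of a nonnegative matrix A."""
        A = A.astype(float).copy()
        for _ in range(n_iters):
            A /= A.sum(axis=1, keepdims=True)   # row normalization
            A /= A.sum(axis=0, keepdims=True)   # column normalization
        return A

    # 0,1-matrix of a bipartite graph with no perfect matching:
    # rows 0 and 1 only neighbor column 0, so X = {0, 1} is a Hall blocker
    # (|X| = 2 > 1 = |Gamma(X)|).
    A_G = np.array([[1, 0, 0],
                    [1, 0, 0],
                    [1, 1, 1]], dtype=float)
    S = sinkhorn(A_G)
    # The scaled matrix never becomes doubly stochastic: the rows of the
    # blocker keep row sums near 1/2, signaling the violated Hall condition.
    print(S.sum(axis=1))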

17:00-17:30 | Jean-Philippe Chancelier
Topological Conditional Separation


Abstract. Pearl’s d-separation is a foundational notion for studying conditional independence between random variables. We define topological conditional separation and show that it is equivalent to d-separation, extended beyond acyclic graphs, be they finite or infinite. Joint work with Michel De Lara and Benjamin Heymann.

Dinner

August 9

Time | Speaker | Talk Title (for the abstract, click the title or see below)
10:00-12:00 | Session Chair: Masashi Sugiyama
10:00-10:30 | Satoru Iwata
Discrete Optimization and Machine Learning for Design and Discovery of Chemical Reactions


Abstract. Recent developments of quantum calculation enable us to explore the potential energy surface to extract chemical reaction networks. Our current research project AICReDD aims at utilizing this advancement to design and/or discover new chemical reactions. This naturally requires methods in discrete optimization and machine learning. One important example is how to select diverse molecules from unexplored areas of chemical space. We have so far developed a new approach for selecting a subset of diverse molecules from a given molecular list by combining two existing techniques in machine learning and discrete optimization: graph neural networks (GNNs) for learning vector representation of molecules and a diverse-selection framework called submodular function maximization. This method, called SubMo-GNN, first trains a GNN with property prediction tasks, and then the trained GNN transforms molecular graphs into molecular vectors, which capture both properties and structures of molecules. Finally, to obtain a subset of diverse molecules, we define a submodular function, which quantifies the diversity of molecular vectors, and find a subset of molecular vectors with a large submodular function value. This can be done efficiently by using the greedy algorithm which finds a subset with a guaranteed approximation ratio to the function value.
Maximization of this class of submodular functions has been investigated for the maximum a posteriori (MAP) inference of determinantal point processes (DPPs), which is crucial for selecting diverse items in many machine learning applications. Although DPP MAP inference is NP-hard, the greedy algorithm often finds high-quality solutions, and many researchers have studied its efficient implementation. Combining a recently developed “fast greedy” algorithm for this problem with a classical technique called “lazy greedy” for general submodular function maximization, we have devised the currently fastest method in practice.
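
A minimal sketch of the greedy step for a submodular diversity objective (the GNN embedding stage and the fast/lazy greedy engineering of SubMo-GNN are not reproduced; the regularized log-det score below is just one standard diversity surrogate, chosen for illustration):

    import numpy as np

    def logdet_diversity(vectors, subset, eps=1e-3):
        """Diversity score: log det of the subset's regularized Gram matrix."""
        if not subset:
            return 0.0
        V = vectors[list(subset)]
        return float(np.linalg.slogdet(V @ V.T + eps * np.eye(len(subset)))[1])

    def greedy_select(vectors, k):
        """Pick k items by greedily maximizing the marginal gain of the score."""
        selected = []
        for _ in range(k):
            base = logdet_diversity(vectors, selected)
            gains = {i: logdet_diversity(vectors, selected + [i]) - base
                     for i in range(len(vectors)) if i not in selected}
            selected.append(max(gains, key=gains.get))
        return selected

    # Example: select 3 diverse rows out of 10 random "molecular embeddings".
    emb = np.random.default_rng(1).standard_normal((10, 4))
    chosen = greedy_select(emb, k=3)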

10:30-11:00 | Pierre-Louis Poirion
Random subspace methods for unconstrained non-convex optimization


Abstract. In this talk, we present a randomized subspace regularized Newton method for a non-convex function. We show that our method has global convergence under appropriate assumptions, and its convergence rate is the same as that of the full regularized Newton method. Furthermore, we can obtain a local linear convergence rate, under some additional assumptions.
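
A toy sketch of the basic idea, taking a regularized Newton step restricted to a random subspace (subspace dimension, regularization, and step size are illustrative choices, not the authors' exact scheme):

    import numpy as np

    def random_subspace_newton(f_grad, f_hess, x0, dim=5, reg=1e-2,
                               n_iters=50, step=1.0):
        """At each iteration, draw a random subspace P and take a regularized
        Newton step for the restriction of f to x + range(P)."""
        x = x0.copy()
        d = x0.size
        rng = np.random.default_rng(0)
        for _ in range(n_iters):
            P = rng.standard_normal((d, dim)) / np.sqrt(dim)  # random sketching matrix
            g = P.T @ f_grad(x)                               # subspace gradient
            H = P.T @ f_hess(x) @ P                           # subspace Hessian
            s = np.linalg.solve(H + reg * np.eye(dim), -g)    # regularized Newton step
            x = x + step * (P @ s)
        return x

    # Example: a simple non-convex test function f(x) = sum(x^4) - sum(x^2).
    f_grad = lambda x: 4 * x**3 - 2 * x
    f_hess = lambda x: np.diag(12 * x**2 - 2)
    x_min = random_subspace_newton(f_grad, f_hess, np.full(20, 0.5), dim=5)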

11:00-11:30 | Akiyoshi Shioura
Characterization and Algorithm for Bivariate Multi-Unit Assignment Valuations


Abstract. A multi-unit assignment valuation is a function represented by a weighted bipartite graph. In this paper, we provide a characterization of such a function in terms of maximizer sets of perturbed functions. We then present an algorithm that checks whether a given bivariate function is a multi-unit assignment valuation, and if the answer is yes, computes a weighted bipartite graph representing the function.

11:30-12:00 | Christoph Spiegel
Flag Algebras in Additive Combinatorics


Abstract. We explore the Ramsey multiplicity problem for additive structures by extending Razborov’s flag algebra calculus to additive structures in finite field vector spaces.

Lunch Break
13:30-15:00 | Session Chair: Satoru Iwata
13:30-14:00 | Masashi Sugiyama
Importance-Weighting Approach to Distribution Shift Adaptation


Abstract. Standard machine learning methods suffer from distribution shifts between training and test data. In this talk, I will first give an overview of the classical importance-weighting approach to distribution shift adaptation, which consists of an importance estimation step and an importance-weighted training step. Then, I will present a more recent approach that simultaneously estimates the importance weights and trains a predictor. Finally, I will discuss a more challenging scenario of continuous distribution shifts, where the data distributions change continuously over time.
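
A minimal illustration of the classical two-step recipe described above, with a deliberately naive histogram-based density-ratio estimate (all names and modeling choices are ours):

    import numpy as np

    def estimate_importance(x_train, x_test, bins=20):
        """Step 1: estimate w(x) = p_test(x) / p_train(x) with simple histograms."""
        edges = np.histogram_bin_edges(np.concatenate([x_train, x_test]), bins=bins)
        p_tr, _ = np.histogram(x_train, bins=edges, density=True)
        p_te, _ = np.histogram(x_test, bins=edges, density=True)
        idx = np.clip(np.digitize(x_train, edges) - 1, 0, bins - 1)
        return p_te[idx] / np.maximum(p_tr[idx], 1e-12)

    def weighted_least_squares(x, y, w):
        """Step 2: importance-weighted training of a linear model y ~ a*x + b."""
        X = np.column_stack([x, np.ones_like(x)])
        return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

    # Covariate shift example: training inputs centered at 0, test inputs at 1.
    rng = np.random.default_rng(0)
    x_tr, x_te = rng.normal(0, 1, 500), rng.normal(1, 0.5, 500)
    y_tr = np.sin(x_tr) + 0.1 * rng.standard_normal(500)
    w = estimate_importance(x_tr, x_te)
    a, b = weighted_least_squares(x_tr, y_tr, w)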

14:00-14:30 | Salvador Abreu
Hybrid Constraint-Based Metaheuristics


Abstract. We discuss an architecture for constructing problem-specific solvers using multiple hybrid metaheuristics. (to be completed)

14:30-15:00 | Katsuki Fujisawa
The challenge to Graph500 benchmark: history and results


Abstract. Graph500 is a benchmark for shortest-path search performance on very large graphs, requiring the integration of many technologies, from algorithms to HPC. Our research team started working on this benchmark in 2011 and has been ranked number one worldwide for 17 terms from 2014 to 2023, using supercomputers such as Fugaku. The latest results show that we achieved a performance of 137 TeraTEPS (Traversed Edges Per Second) using 93% of all nodes of the supercomputer Fugaku, on a supergiant graph with 4.4 trillion nodes and 70.4 trillion edges. In this talk, I will explain our efforts on and results for Graph500.

Coffee Break
15:30-17:30 | Session Chair: Dan Garber
15:30-16:00 | Michel de Lara
Hidden Convexity in the $l_0$ Pseudonorm


Abstract. The so-called $l_0$ pseudonorm counts the number of nonzero components of a vector. In this talk, we review a series of recent results on a class of Capra (Constant Along Primal Rays) conjugacies (and polarities) that reveal hidden convexity in the $l_0$ pseudonorm.

16:00-16:30 | Takashi Tsuchiya
Closing Nonzero Duality Gaps of Singular Semidefinite Programs by Way of Perturbation


Abstract. One of the major difficulties in semidefinite programming is the existence of nonzero duality gaps, where strong duality fails. In this talk, we conduct a perturbation analysis of such singular semidefinite programs and discuss the hidden continuous structure behind nonzero duality gaps in semidefinite programming. We also discuss its implications for the design of SDP algorithms. This is joint work with Bruno Lourenco, Masakazu Muramatsu and Takayuki Okuno.

16:30-17:00 | Sebastian Pokutta
Improved local models and new Bell inequalities via Frank-Wolfe algorithms


Abstract. In Bell scenarios with two outcomes per party, we algorithmically consider the two sides of the membership problem for the local polytope: constructing local models and deriving separating hyperplanes, that is, Bell inequalities. We take advantage of the recent developments in so-called Frank–Wolfe algorithms to significantly increase the convergence rate of existing methods. As an application, we study the threshold value for the nonlocality of two-qubit Werner states under projective measurements. Here, we improve on both the upper and lower bounds present in the literature. Importantly, our bounds are entirely analytical; moreover, they yield refined bounds on the value of the Grothendieck constant of order three: $1.4367 \leq K_G(3) \leq 1.4546$. We also demonstrate the efficiency of our approach in multipartite Bell scenarios, and present the first local models for all projective measurements with visibilities noticeably higher than the entanglement threshold. We make our entire code accessible as a Julia library called BellPolytopes.jl.

17:00-17:30 | Antoine Deza
Kissing Polytopes


Abstract. We investigate the following question: how close can two disjoint lattice polytopes contained in a fixed hypercube be? This question stems from various contexts where the minimal distance between such polytopes appears in complexity bounds of optimization algorithms. We provide nearly matching lower and upper bounds on this distance and discuss its exact computation. We also give similar bounds in the case of disjoint rational polytopes whose binary encoding length is prescribed. Joint work with Shmuel Onn, Sebastian Pokutta, and Lionel Pournin.

Previous Editions

[ DO x ML 2018 | DO x ML 2019 | DO x ML 2020 | DO x ML 2021 ]