Nonlinear acceleration algorithms improve the performance of iterative methods, such as gradient descent, using the information contained in past iterates. However, their efficiency is still not entirely understood even in the quadratic case. In this paper, we clarify the convergence analysis by giving general properties that share several classes of nonlinear acceleration: Anderson acceleration (and variants), quasi-Newton methods (such as Broyden Type-I or Type-II, SR1, DFP, and BFGS) and Krylov methods (Conjugate Gradient, MINRES, GMRES). In particular, we propose a generic family of algorithms that contains all the previous methods and prove its optimal rate of convergence when minimizing quadratic functions. We also propose multi-secants updates for the quasi-Newton methods listed above. We provide a Matlab code implementing the algorithm.

Extra-gradient with player sampling for provable fast convergence in n-player games. S. Jelassi, C. D. Enrich, D. Scieur, A. Mensch, J. Bruna.

Data-driven model training is increasingly relying on finding Nash equilibria with provable techniques, e.g., for GANs and multi-agent RL. In this paper, we analyse a new extra-gradient method, that performs gradient extrapolations and updates on a random subset of players at each iteration. This approach provably exhibits the same rate of convergence as full extra-gradient in non-smooth convex games. We propose an additional variance reduction mechanism for this to hold for smooth convex games. Our approach makes extrapolation amenable to massive multiplayer settings, and brings empirical speed-ups, in particular when using cyclic sampling schemes. We demonstrate the efficiency of player sampling on large-scale non-smooth and non-strictly convex games. We show that the joint use of extrapolation and player sampling allows to train better GANs on CIFAR10.

2018

Online Regularized Nonlinear Acceleration. D. Scieur, E. Oyallon, A. d’Aspremont, F. Bach.

Regularized nonlinear acceleration (RNA) estimates the minimum of a function by post-processing iterates from an algorithm such as the gradient method. It can be seen as a regularized version of Anderson acceleration, a classical acceleration scheme from numerical analysis. The new scheme provably improves the rate of convergence of fixed step gradient descent, and its empirical performance is comparable to that of quasi-Newton methods. However, RNA cannot accelerate faster multistep algorithms like Nesterov’s method and often diverges in this context. Here, we adapt RNA to overcome these issues, so that our scheme can be used on fast algorithms such as gradient methods with momentum. We show optimal complexity bounds for quadratics and asymptotically optimal rates on general convex minimization problems. Moreover, this new scheme works online, i.e., extrapolated solution estimates can be reinjected at each iteration, significantly improving numerical performance over classical accelerated methods.

Nonlinear acceleration of cnns. D. Scieur, E. Oyallon, A. d’Aspremont, F. Bach.

The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleration method capable of improving the rate of convergence of many optimization schemes such as gradient descend, SAGA or SVRG. Until now, its analysis is limited to convex problems, but empirical observations shows that RNA may be extended to wider settings. In this paper, we investigate further the benefits of RNA when applied to neural networks, in particular for the task of image recognition on CIFAR10 and ImageNet. With very few modifications of exiting frameworks, RNA improves slightly the optimization process of CNNs, after training.

2017

Application of Stochastic Dual Dynamic Programming to the Real-Time Dispatch of Storage under Renewable Supply Uncertainty. A. Papavasiliou, Y. Mou, L. Cambier, D. Scieur.

This paper presents a multi-stage stochastic programming formulation of transmission-constrained economic dispatch subject to multi-area renewable production uncertainty, with a focus on optimizing the dispatch of storage in real-time operations. This problem is resolved using stochastic dual dynamic programming. The applicability of the proposed approach is demonstrated on a realistic case study of the German power system calibrated against the solar and wind power integration levels of 2013-2014, with a 24-hour horizon and 15-minute time step. The value of the stochastic solution relative to the cost of a deterministic policy amounts to 1.1%, while the value of perfect foresight relative to the cost of the stochastic programming policy amounts to 0.8%. The relative performance of various alternative real-time dispatch policies is analyzed, and the sensitivity of the results is explored.

The goal of this master thesis will be firstly to analyze with precision the behavior of the cubic regularization of Newton’s method on strongly convex functions. During this analysis we will find that the regularisation does not work as well as expected when the function is also smooth, unlike the gradient method. We will thus propose some variants of the original algorithm in order to have better global performances.

Conference Articles

2019

Nonlinear Acceleration of Primal-Dual Algorithms. R. Bollapragada, D. Scieur, A. d’Aspremont.

In The 22nd International Conference on Artificial Intelligence and Statistics.

We describe a convergence acceleration scheme for multi-step optimization algorithms. The extrapolated solution is written as a nonlinear average of the iterates produced by the original optimization algorithm. Our scheme does not need the underlying fixed-point operator to be symmetric, hence handles e.g. algorithms with momentum terms such as Nesterov’s accelerated method, or primal-dual methods such as Chambolle-Pock. The weights are computed via a simple linear system and we analyze performance in both online and offline modes. We use Crouzeix’s conjecture to show that acceleration is controlled by the solution of a Chebyshev problem on the numerical range of a nonsymmetric operator modelling the behavior of iterates near the optimum. Numerical experiments are detailed on image processing and logistic regression problems.

2018

Optimal Management of Storage for Offsetting Solar Power Uncertainty using Multistage Stochastic Programming. T. Kaneda, B. Losseau, A. Papavasiliou, D. Scieur, L. Cambier, P. Henneaux, N. Leemput.

In Proceedings of 20th Power Systems Computation Conference (to appear).

Africa has recently engaged in implementing an aggressive renewable energy integration plan. A major challenge in the deployment of renewable power is the management of excess energy. The use of battery storage has been considered as a technically attractive solution. This paper tackles this operational problem using stochastic dual dynamical programming. We present an open-source MATLAB toolbox for multistage stochastic programming which employs stochastic dual dynamic programming. We use the toolbox in order to compare the stochastic solution to a greedy policy which operates batteries without future foresight as a benchmark. We consider a case study of storage management in Burkina Faso. We quantify the benefits of the stochastic solution and test the sensitivity of our results to the optimization horizon of the stochastic program.

2017

Integration Methods and Accelerated Optimization Algorithms. D. Scieur, V. Roulet, F. Bach, A. d’Aspremont.

In Advances In Neural Information Processing Systems.

We show that accelerated optimization methods can be seen as particular instances of multi-step integration schemes from numerical analysis, applied to the gradient flow equation. In comparison with recent advances in this vein, the differential equation considered here is the basic gradient flow and we show that multi-step schemes allow integration of this differential equation using larger step sizes, thus intuitively explaining acceleration results.

Nonlinear Acceleration of Stochastic Algorithms. D. Scieur, A. d’Aspremont, F. Bach.

In Advances In Neural Information Processing Systems.

Extrapolation methods use the last few iterates of an optimization algorithm to produce a better estimate of the optimum. They were shown to achieve optimal convergence rates in a deterministic setting using simple gradient iterates. Here, we study extrapolation methods in a stochastic setting, where the iterates are produced by either a simple or an accelerated stochastic gradient algorithm. We first derive convergence bounds for arbitrary, potentially biased perturbations, then produce asymptotic bounds using the ratio between the variance of the noise and the accuracy of the current point. Finally, we apply this acceleration technique to stochastic algorithms such as SGD, SAGA, SVRG and Katyusha in different settings, and show significant performance gains.

2016

Regularized Nonlinear Acceleration. D. Scieur, A. d’Aspremont, F. Bach.

In Advances In Neural Information Processing Systems.

We describe a convergence acceleration technique for generic optimization problems. Our scheme computes estimates of the optimum from a nonlinear average of the iterates produced by any optimization method. The weights in this average are computed via a simple and small linear system, whose solution can be updated online. This acceleration scheme runs in parallel to the base algorithm, providing improved estimates of the solution on the fly, while the original optimization method is running. Numerical experiments are detailed on classical classification problems.