Optimizing the Optimizers (NIPS 2016)
Barcelona, Spain, December 10, 2016
Optimization problems in machine learning have aspects that make them more challenging than traditional optimization settings, such as stochasticity and parameters with side effects (e.g., the batch size and batch structure). The field has invented many different approaches to deal with these demands. Unfortunately - and intriguingly - this extra functionality seems to invariably necessitate the introduction of tuning parameters: step sizes, decay rates, cycle lengths, batch sampling distributions, and so on. Such parameters are not present, or at least not as prominent, in classic optimization methods. But getting them right is frequently crucial, and requires inconvenient human “babysitting”.
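As a purely illustrative sketch of where such parameters appear (the function `sgd` and its defaults below are hypothetical choices, not part of any particular library or of the workshop material), a minimal stochastic gradient descent loop already exposes a step size, a decay rate, a batch size and a batch sampling rule as free knobs:

```python
import numpy as np

def sgd(grad_fn, w0, data, step_size=0.1, decay=0.99, batch_size=32,
        n_steps=1000, seed=0):
    """Minimal SGD loop; every keyword argument is a hand-tuned 'fiddle factor'."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    lr = step_size
    for _ in range(n_steps):
        # Batch sampling distribution: here simply uniform with replacement.
        batch = data[rng.choice(len(data), size=batch_size)]
        w = w - lr * grad_fn(w, batch)   # noisy gradient step
        lr *= decay                      # hand-chosen step-size decay schedule
    return w

# Toy usage: least-squares regression on synthetic data. The solution quality
# depends noticeably on step_size, decay and batch_size -- the "babysitting"
# referred to above.
X = np.random.default_rng(1).normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5])
data = np.hstack([X, y[:, None]])
grad = lambda w, b: 2.0 * b[:, :3].T @ (b[:, :3] @ w - b[:, 3]) / len(b)
w_hat = sgd(grad, np.zeros(3), data)
```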
Recent work has increasingly tried to eliminate such fiddle factors, typically by statistical estimation. This includes the automatic selection of external parameters such as the batch size or batch structure, which have not traditionally been treated as part of the optimization task. Several different strategies have now been proposed, but they are not always compatible with each other, and they lack a common framework that would foster both conceptual and algorithmic interoperability. This workshop aims to provide a forum for the nascent community studying the automation of parameter tuning in optimization routines.
Among the questions to be addressed by the workshop are:
Is the prominence of tuning parameters a fundamental feature of stochastic optimization problems? Why do classic optimization methods manage to do well with virtually no free parameters?
In which precise sense can the “optimization of optimization algorithms” be phrased as an inference / learning problem?
Should, and can, parameters be inferred at design-time (by a human), at compile-time (by an external compiler with access to a meta-description of the problem) or run-time (by the algorithm itself)?
What are generic ways to learn parameters of algorithms, and inherent difficulties for doing so? Is the goal to specialize to a particular problem, or to generalize over many problems?
The workshop is organized by Maren Mahsereci, Alex Davies and Philipp Hennig.
Schedule
The workshop will be held on Saturday, 10 December 2016, in Area 2.
| Time | Event | Material |
|---|---|---|
| 09:00-09:10 | Opening Remarks | |
| 09:10-09:30 | | |
| 09:30-10:00 | | |
| 10:00-10:30 | | |
| 10:30-11:00 | Coffee Break | |
| 11:00-11:30 | | |
| 11:30-12:00 | Spotlights | (see below) |
| 12:00-12:45 | Poster Session | |
| 12:45-14:15 | Lunch Break | |
| 14:15-14:40 | | |
| 14:40-15:00 | | |
| 15:00-15:30 | Coffee Break | |
| 15:30-15:50 | | |
| 15:50-16:20 | | |
| 16:20-17:00 | Panel Discussion | |
Accepted Papers
(in alphabetical order, by first author’s surname)
Ömer Deniz Akyildiz, Víctor Elvira, Jesus Fernandez-Bes, Joaquín Miguez. On the Relationship between Online Optimizers and Recursive Filters
Matt Bonakdarpour and Panagiotis (Panos) Toulis. Statistical Perspectives of Stochastic Optimization
Anirban Chaudhuri, David Wolpert, Brendan Tracey. Stochastic Optimization and Machine Learning: Cross-Validation for Cross-Entropy Method
Kamil Ciosek and Shimon Whiteson. Off-Environment RL with Rare Events
Guilherme França and José Bento. Tuning Over-Relaxed ADMM
Tobias Glasmachers. Small Stochastic Average Gradient Steps
Ke Li and Jitendra Malik. Learning to Optimize
Ben London. Generalization Bounds for Randomized Learning with Application to Stochastic Gradient Descent
Matteo Pirotta and Marcello Restelli. Cost-Sensitive Approach for Batch Size Optimization