$P(A|B) = \frac{P(B|A) P(A)}{P(B)}$
$P(A|B) \propto P(B|A) P(A)$
$P(H|E) = \frac{P(E|H) P(H)}{P(E)}$
$P(H|E) \propto P(E|H) P(H)$
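As a quick numeric sanity check of Bayes' rule, here is a minimal sketch; all the probability values are made-up for illustration:

```python
# Hypothetical numbers: H = "configuration is good",
# E = "validation loss improved".
p_h = 0.3                      # prior P(H)
p_e_given_h = 0.9              # likelihood P(E|H)
p_e_given_not_h = 0.2          # P(E|not H)

# Evidence by total probability: P(E) = P(E|H)P(H) + P(E|~H)P(~H)
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Bayes' rule: P(H|E) = P(E|H)P(H) / P(E)
p_h_given_e = p_e_given_h * p_h / p_e
print(round(p_h_given_e, 3))  # → 0.659
```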
(for more information refer to MI-SPI)

* ARD stands for automatic relevance determination
Let us denote our observations up to the $n$-th step by $D_n$
$f \sim $ some random process, e.g. $\mathcal{GP}(\mu(x), \mathbf{K}(x,x')) $
$$P(f \mid D_n) \propto P(D_n \mid f)\, P(f)$$
So we have a posterior random process representing our beliefs about the function we observe.
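Conditioning a GP prior on observations can be sketched with scikit-learn's `GaussianProcessRegressor`; the toy objective and kernel settings below are illustrative assumptions, not part of the slides:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy observations D_n of an unknown function (illustrative example).
X = np.array([[0.0], [1.0], [2.0]])
y = np.sin(X).ravel()

# GP prior f ~ GP(0, K) with a fixed RBF kernel; fit() conditions on D_n.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                              optimizer=None, alpha=1e-10)
gp.fit(X, y)

# Posterior mean and standard deviation at query points:
# uncertainty is near zero at an observed point, large far from the data.
mu, sigma = gp.predict(np.array([[1.0], [5.0]]), return_std=True)
```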
What now?
We can query this posterior random process with an acquisition function
Let $x^+$ denote the best observed configuration so far
$\qquad\mathrm{PI}(x) = P(f(x) \geq f(x^+) + \xi)$
$\qquad \mathrm{EI}(x) = \mathbb{E}(\max\{0, f_{t+1}(x)-f(x^+)\} \mid D_t)$
$\qquad \mathrm{UCB}(x) =\mu(x) + \kappa\sigma(x)$
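For a Gaussian posterior $\mathcal{N}(\mu(x), \sigma^2(x))$, all three acquisition functions above have closed forms. A minimal sketch (the helper name and default $\xi$, $\kappa$ values are my own choices):

```python
import numpy as np
from scipy.stats import norm

def acquisitions(mu, sigma, f_best, xi=0.01, kappa=2.0):
    """PI, EI, and UCB for a Gaussian posterior N(mu, sigma^2).

    Maximization convention; f_best = f(x+) is the incumbent value.
    """
    sigma = np.maximum(sigma, 1e-12)     # avoid division by zero
    z = (mu - f_best - xi) / sigma
    pi = norm.cdf(z)                     # P(f(x) >= f(x+) + xi)
    ei = (mu - f_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)
    ucb = mu + kappa * sigma
    return pi, ei, ucb
```

A point with high mean and high uncertainty scores well under all three; $\xi$ and $\kappa$ trade off exploration against exploitation.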
M = [(x, evaluate(x)) for x in initial_sample]
while not stopping_criterion():
    model = fit_model(M)
    x = select_point(model)
    y = evaluate(x)
    M.add((x, y))
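The loop above can be made concrete. A minimal runnable sketch, assuming scikit-learn's GP as the model and UCB maximized over a random candidate pool (the 1-D objective, kernel, budget, and $\kappa$ are all made-up choices):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def evaluate(x):
    # Made-up expensive objective to maximize; optimum at x = 0.7.
    return float(-(x - 0.7) ** 2)

# Initial design
X = list(rng.uniform(0, 1, size=3))
y = [evaluate(x) for x in X]

for _ in range(10):                          # stopping criterion: fixed budget
    gp = GaussianProcessRegressor(kernel=RBF(0.2), optimizer=None, alpha=1e-6)
    gp.fit(np.array(X).reshape(-1, 1), np.array(y))
    cand = rng.uniform(0, 1, size=(200, 1))  # random candidate pool
    mu, sigma = gp.predict(cand, return_std=True)
    x_next = float(cand[np.argmax(mu + 2.0 * sigma)])  # UCB with kappa = 2
    X.append(x_next)
    y.append(evaluate(x_next))

best = X[int(np.argmax(y))]
```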

Random Online Adaptive Racing (ROAR)
Sequential Model-based Algorithm Configuration (SMAC)
Tree-structured Parzen Estimator (TPE)
GP EI per second


BROCHU, Eric, Vlad M. CORA and Nando DE FREITAS, 2010. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning.
SNOEK, Jasper, Hugo LAROCHELLE and Ryan P. ADAMS. Practical Bayesian Optimization of Machine Learning Algorithms.
HUTTER, Frank, Holger H. HOOS and Kevin LEYTON-BROWN. Sequential Model-Based Optimization for General Algorithm Configuration.
Machine Learning lectures from UBC in 2013