Algorithms

class blocks.algorithms.AdaDelta(decay_rate=0.95, epsilon=1e-06)[source]

Bases: blocks.algorithms.StepRule

Adapts the step size over time using only first order information.

Parameters:
  • decay_rate (float, optional) – Decay rate in [0, 1]. Defaults to 0.95.
  • epsilon (float, optional) – Stabilizing constant for RMS. Defaults to 1e-6.

Notes

For more information, see [ADADELTA].

[ADADELTA] Matthew D. Zeiler, ADADELTA: An Adaptive Learning Rate Method, arXiv:1212.5701.
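
For intuition, the update from [ADADELTA] can be restated as a small NumPy sketch. This mirrors the paper's formulas rather than the Blocks implementation itself, and the helper name is made up for illustration.

import numpy as np

def adadelta_step(gradient, state, decay_rate=0.95, epsilon=1e-6):
    # state holds running averages of squared gradients and squared steps.
    mean_sq_grad, mean_sq_step = state
    # Decaying average of squared gradients.
    mean_sq_grad = decay_rate * mean_sq_grad + (1 - decay_rate) * gradient ** 2
    # Scale the gradient by the ratio of the two running RMS values.
    step = (np.sqrt(mean_sq_step + epsilon)
            / np.sqrt(mean_sq_grad + epsilon)) * gradient
    # Decaying average of squared steps.
    mean_sq_step = decay_rate * mean_sq_step + (1 - decay_rate) * step ** 2
    return step, (mean_sq_grad, mean_sq_step)

The parameter is then updated as parameter -= step, matching the subtraction convention of GradientDescent below.
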
compute_step(parameter, previous_step)[source]

Build a Theano expression for the step for a parameter.

This method is called by the default implementation of compute_steps(); it saves one from writing a loop over the parameters each time.

Parameters:
  • parameter (TensorSharedVariable) – The parameter.
  • previous_step (TensorVariable) – Some quantity related to the gradient of the cost with respect to the parameter, either the gradient itself or a step in a related direction.
Returns:

  • step (Variable) – Theano variable for the step to take.
  • updates (list) – A list of tuples representing updates to be performed. This is useful for stateful rules such as Momentum, which need to update shared variables after each iteration.

class blocks.algorithms.AdaGrad(learning_rate=0.002, epsilon=1e-06)[source]

Bases: blocks.algorithms.StepRule

Implements the AdaGrad learning rule.

Parameters:
  • learning_rate (float, optional) – Step size. Defaults to 0.002.
  • epsilon (float, optional) – Stabilizing constant for the root of the sum of squared gradients. Defaults to 1e-6.

Notes

For more information, see [ADAGRAD].

[ADAGRAD] Duchi, J., Hazan, E., Singer, Y., Adaptive subgradient methods for online learning and stochastic optimization, http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
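
The AdaGrad rule accumulates the squared gradients seen so far and divides each step by the root of that sum. A minimal NumPy sketch of the textbook rule follows; whether epsilon enters inside or outside the square root in Blocks is not stated here, so its placement below is an assumption.

import numpy as np

def adagrad_step(gradient, sum_sq_grad, learning_rate=0.002, epsilon=1e-6):
    # Accumulate the sum of squared gradients over all steps so far.
    sum_sq_grad = sum_sq_grad + gradient ** 2
    # Scale each coordinate by one over the root of that sum.
    step = learning_rate * gradient / (np.sqrt(sum_sq_grad) + epsilon)
    return step, sum_sq_grad
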
compute_step(parameter, previous_step)[source]

Build a Theano expression for the step for a parameter.

This method is called by the default implementation of compute_steps(); it saves one from writing a loop over the parameters each time.

Parameters:
  • parameter (TensorSharedVariable) – The parameter.
  • previous_step (TensorVariable) – Some quantity related to the gradient of the cost with respect to the parameter, either the gradient itself or a step in a related direction.
Returns:

  • step (Variable) – Theano variable for the step to take.
  • updates (list) – A list of tuples representing updates to be performed. This is useful for stateful rules such as Momentum, which need to update shared variables after each iteration.

class blocks.algorithms.Adam(learning_rate=0.002, beta1=0.9, beta2=0.999, epsilon=1e-08, decay_factor=1)[source]

Bases: blocks.algorithms.StepRule

Adam optimizer as described in [King2014].

[King2014] Diederik Kingma, Jimmy Ba, Adam: A Method for Stochastic Optimization, http://arxiv.org/abs/1412.6980
Parameters:
  • learning_rate (float, optional) – Step size. Default value is set to 0.002.
  • beta1 (float, optional) – Exponential decay rate for the first moment estimates. Default value is set to 0.9.
  • beta2 (float, optional) – Exponential decay rate for the second moment estimates. Default value is set to 0.999.
  • epsilon (float, optional) – Default value is set to 1e-8.
  • decay_factor (float, optional) – Default value is set to 1.
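
The update from [King2014] can be sketched in NumPy as follows. The decay_factor argument is omitted for brevity and the placement of epsilon is an assumption, so treat this as an illustration of the rule rather than the Blocks implementation.

import numpy as np

def adam_step(gradient, m, v, t, learning_rate=0.002,
              beta1=0.9, beta2=0.999, epsilon=1e-8):
    # Return the step to subtract and the updated moment estimates.
    t += 1
    m = beta1 * m + (1 - beta1) * gradient          # first moment estimate
    v = beta2 * v + (1 - beta2) * gradient ** 2     # second moment estimate
    m_hat = m / (1 - beta1 ** t)                    # bias correction
    v_hat = v / (1 - beta2 ** t)
    step = learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
    return step, m, v, t
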
compute_step(parameter, previous_step)[source]

Build a Theano expression for the step for a parameter.

This method is called by the default implementation of compute_steps(); it saves one from writing a loop over the parameters each time.

Parameters:
  • parameter (TensorSharedVariable) – The parameter.
  • previous_step (TensorVariable) – Some quantity related to the gradient of the cost with respect to the parameter, either the gradient itself or a step in a related direction.
Returns:

  • step (Variable) – Theano variable for the step to take.
  • updates (list) – A list of tuples representing updates to be performed. This is useful for stateful rules such as Momentum, which need to update shared variables after each iteration.

class blocks.algorithms.BasicMomentum(momentum=0.0)[source]

Bases: blocks.algorithms.StepRule

Accumulates step with exponential discount.

Parameters:momentum (float, optional) – The momentum coefficient. Defaults to 0.

Notes

This step rule is intended to be used in conjunction with another step rule, _e.g._ Scale. For an all-batteries-included experience, look at Momentum.
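
A minimal sketch of such a combination using CompositeRule (documented below); the ordering shown, scaling before accumulating momentum, is an assumption, and Momentum bundles an equivalent chain for you.

from blocks.algorithms import CompositeRule, Scale, BasicMomentum

# Scale the raw gradient first, then accumulate it with momentum.
step_rule = CompositeRule([Scale(learning_rate=0.01),
                           BasicMomentum(momentum=0.9)])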

compute_step(parameter, previous_step)[source]

Build a Theano expression for the step for a parameter.

This method is called by the default implementation of compute_steps(); it saves one from writing a loop over the parameters each time.

Parameters:
  • parameter (TensorSharedVariable) – The parameter.
  • previous_step (TensorVariable) – Some quantity related to the gradient of the cost with respect to the parameter, either the gradient itself or a step in a related direction.
Returns:

  • step (Variable) – Theano variable for the step to take.
  • updates (list) – A list of tuples representing updates to be performed. This is useful for stateful rules such as Momentum, which need to update shared variables after each iteration.

class blocks.algorithms.BasicRMSProp(decay_rate=0.9, max_scaling=100000.0)[source]

Bases: blocks.algorithms.StepRule

Scales the step size by a running average of the recent step norms.

Parameters:
  • decay_rate (float, optional) – How fast the running average decays, value in [0, 1] (lower is faster). Defaults to 0.9.
  • max_scaling (float, optional) – Maximum scaling of the step size, in case the running average is really small. Needs to be greater than 0. Defaults to 1e5.

Notes

This step rule is intended to be used in conjunction with another step rule, _e.g._ Scale. For an all-batteries-included experience, look at RMSProp.

In general, this step rule should be used _before_ other step rules, because it has normalization properties that may undo their work. For instance, it should be applied first when used in conjunction with Scale.

For more information, see [Hint2014].
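
Following the note above, a sketch that places BasicRMSProp before Scale in a CompositeRule chain; the constants are illustrative, and RMSProp below packages this combination.

from blocks.algorithms import CompositeRule, BasicRMSProp, Scale

# BasicRMSProp comes first so that Scale acts on the already-normalized step.
step_rule = CompositeRule([BasicRMSProp(decay_rate=0.9, max_scaling=1e5),
                           Scale(learning_rate=0.001)])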

compute_step(parameter, previous_step)[source]

Build a Theano expression for the step for a parameter.

This method is called by the default implementation of compute_steps(); it saves one from writing a loop over the parameters each time.

Parameters:
  • parameter (TensorSharedVariable) – The parameter.
  • previous_step (TensorVariable) – Some quantity related to the gradient of the cost with respect to the parameter, either the gradient itself or a step in a related direction.
Returns:

  • step (Variable) – Theano variable for the step to take.
  • updates (list) – A list of tuples representing updates to be performed. This is useful for stateful rules such as Momentum, which need to update shared variables after each iteration.

class blocks.algorithms.CompositeRule(components)[source]

Bases: blocks.algorithms.StepRule

Chains several step rules.

Parameters:components (list of StepRule) – The learning rules to be chained. The rules will be applied in the order given.
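
For example, gradient clipping followed by a fixed learning rate can be chained as below; the constants are illustrative only.

from blocks.algorithms import CompositeRule, StepClipping, Scale

# Clip the step norm first, then scale it by the learning rate.
step_rule = CompositeRule([StepClipping(threshold=10.0),
                           Scale(learning_rate=0.01)])
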
compute_steps(previous_steps)[source]

Build a Theano expression for steps for all parameters.

Override this method if you want to process the steps with respect to all parameters as a whole, not parameter-wise.

Parameters:previous_steps (OrderedDict) – An OrderedDict of (TensorSharedVariable, TensorVariable) pairs. The keys are the parameters being trained, the values are the expressions for quantities related to gradients of the cost with respect to the parameters, either the gradients themselves or steps in related directions.
Returns:
  • steps (OrderedDict) – A dictionary of the proposed steps in the same form as previous_steps.
  • updates (list) – A list of tuples representing updates to be performed.
class blocks.algorithms.GradientDescent(cost=None, parameters=None, step_rule=None, gradients=None, known_grads=None, consider_constant=None, **kwargs)[source]

Bases: blocks.algorithms.UpdatesAlgorithm

A base class for all gradient descent algorithms.

By “gradient descent” we mean a training algorithm of the following form:

for batch in data:
    steps = step_rule.compute_steps(parameters,
                                    gradients_wr_parameters)
    for parameter in parameters:
        parameter -= steps[parameter]

Note that the step is subtracted, not added! This is done in order to make step rule chaining possible.

Parameters:
  • cost (TensorVariable, optional) – The objective to be minimized. Unused if gradients is specified.
  • parameters (list of TensorSharedVariable, optional) – The parameters to be tuned. If not provided, inferred from the keys of gradients (in which case gradients must be an OrderedDict).
  • step_rule (instance of StepRule, optional) – An object encapsulating most of the algorithm’s logic. Its compute_steps method is called to get Theano expressions for the steps. Note that the step rule might have state, e.g. to remember a weighted sum of gradients from previous steps, as is done in gradient descent with momentum. If None, an instance of Scale is created.
  • gradients (OrderedDict or list of 2-tuples, optional) – A dictionary mapping a parameter to an expression for the cost’s gradient with respect to the parameter, or equivalently, a list of (parameter, gradient) tuples. If None, the gradients are computed automatically using theano.gradient.grad().
  • known_grads (dict, optional) – A passthrough to theano.tensor.grad’s known_grads argument. Useful when you know the [approximate] gradients of some sub-expressions and would like Theano to use that information to compute parameter gradients. Only makes sense when gradients is None.
  • consider_constant (list, optional) – A passthrough to theano.tensor.grad’s consider_constant argument. A list of expressions through which gradients will not be backpropagated. Only makes sense when gradients is None.
gradients

OrderedDict – The gradient dictionary.

step_rule

instance of StepRule – The step rule.

Notes

Changing updates attribute or calling add_updates after the initialize method is called will have no effect.

If a cost and parameters are provided, gradients are taken immediately upon construction, and changes to these attributes after construction will have no effect.

gradients must be an OrderedDict if parameters is unspecified because ordinary dictionaries have an unpredictable iteration order due to hash randomization (which is enabled by default since versions 2.7.3 and 3.2.3 of Python). This source of variability, when combined with Theano’s heuristic graph optimizations, can cause serious reproducibility issues.
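
A minimal end-to-end sketch with a toy linear least-squares cost; the variable and source names ('features', 'targets') and all constants are illustrative, not prescribed by the library.

import numpy
import theano
from theano import tensor
from blocks.algorithms import GradientDescent, Scale

# A toy linear model with a single weight vector.
x = tensor.matrix('features')
y = tensor.vector('targets')
W = theano.shared(numpy.zeros(5, dtype=theano.config.floatX), name='W')
cost = ((x.dot(W) - y) ** 2).mean()

algorithm = GradientDescent(cost=cost, parameters=[W],
                            step_rule=Scale(learning_rate=0.1))
algorithm.initialize()  # compiles the Theano update function
algorithm.process_batch({
    'features': numpy.ones((8, 5), dtype=theano.config.floatX),
    'targets': numpy.zeros(8, dtype=theano.config.floatX)})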

class blocks.algorithms.Momentum(learning_rate=1.0, momentum=0.0)[source]

Bases: blocks.algorithms.CompositeRule

Accumulates step with exponential discount.

Combines BasicMomentum and Scale to form the usual momentum step rule.

Parameters:
  • learning_rate (float, optional) – The learning rate by which the previous step is scaled. Defaults to 1.
  • momentum (float, optional) – The momentum coefficient. Defaults to 0.
learning_rate

SharedVariable – A variable for learning rate.

momentum

SharedVariable – A variable for momentum.

See also

SharedVariableModifier
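
Because learning_rate and momentum are shared variables, they can be changed during training, typically by an extension such as SharedVariableModifier. A hand-rolled sketch:

import numpy
from blocks.algorithms import Momentum

step_rule = Momentum(learning_rate=0.1, momentum=0.9)
# Anneal the learning rate later during training.
step_rule.learning_rate.set_value(
    numpy.asarray(0.01, dtype=step_rule.learning_rate.dtype))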

class blocks.algorithms.RMSProp(learning_rate=1.0, decay_rate=0.9, max_scaling=100000.0)[source]

Bases: blocks.algorithms.CompositeRule

Scales the step size by a running average of the recent step norms.

Combines BasicRMSProp and Scale to form the step rule described in [Hint2014].

[Hint2014] Geoff Hinton, Neural Networks for Machine Learning, lecture 6a, http://cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
Parameters:
  • learning_rate (float, optional) – The learning rate by which the previous step is scaled. Defaults to 1.
  • decay_rate (float, optional) – How fast the running average decays (lower is faster). Defaults to 0.9.
  • max_scaling (float, optional) – Maximum scaling of the step size, in case the running average is really small. Defaults to 1e5.
learning_rate

SharedVariable – A variable for learning rate.

decay_rate

SharedVariable – A variable for decay rate.

See also

SharedVariableModifier

class blocks.algorithms.RemoveNotFinite(scaler=1)[source]

Bases: blocks.algorithms.StepRule

A step rule that skips steps with non-finite elements.

Replaces a step (the parameter update of a single shared variable) that contains non-finite elements (such as inf or NaN) with a step that instead rescales the parameter by scaler.

Parameters:scaler (float, optional) – The scaling applied to the parameter in case the step contains non-finite elements. Defaults to 1, which means that parameters will not be changed.

Notes

This rule should be applied last!

This trick was originally used in the GroundHog framework.
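
Following the notes above, a sketch that places RemoveNotFinite at the end of a CompositeRule chain; the other rules and constants are illustrative.

from blocks.algorithms import (CompositeRule, StepClipping, Scale,
                               RemoveNotFinite)

# RemoveNotFinite goes last so that it inspects the final step to be applied.
step_rule = CompositeRule([StepClipping(threshold=10.0),
                           Scale(learning_rate=0.01),
                           RemoveNotFinite(scaler=0.1)])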

compute_step(parameter, previous_step)[source]

Build a Theano expression for the step for a parameter.

This method is called by the default implementation of compute_steps(); it saves one from writing a loop over the parameters each time.

Parameters:
  • parameter (TensorSharedVariable) – The parameter.
  • previous_step (TensorVariable) – Some quantity related to the gradient of the cost with respect to the parameter, either the gradient itself or a step in a related direction.
Returns:

  • step (Variable) – Theano variable for the step to take.
  • updates (list) – A list of tuples representing updates to be performed. This is useful for stateful rules such as Momentum, which need to update shared variables after each iteration.

class blocks.algorithms.Restrict(step_rule, variables)[source]

Bases: blocks.algorithms.StepRule

Applies a given StepRule only to certain variables.

Example applications include clipping steps on only certain parameters, or scaling a certain kind of parameter’s updates (e.g. adding an additional scalar multiplier to the steps taken on convolutional filters).

Parameters:
  • step_rule (StepRule) – The StepRule to be applied on the given variables.
  • variables (iterable) – A collection of Theano variables on which to apply step_rule. Variables not appearing in this collection will not have step_rule applied to them.
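
A sketch of clipping steps for selected parameters only; the conv_filters variable is hypothetical and stands for whatever shared variables should receive the extra treatment.

import numpy
import theano
from blocks.algorithms import CompositeRule, Restrict, StepClipping, Scale

# A hypothetical parameter that should get extra step clipping.
conv_filters = theano.shared(numpy.zeros((3, 3), dtype=theano.config.floatX),
                             name='conv_filters')

step_rule = CompositeRule([Restrict(StepClipping(threshold=1.0),
                                    [conv_filters]),
                           Scale(learning_rate=0.01)])
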
compute_steps(previous_steps)[source]

Build a Theano expression for steps for all parameters.

Override this method if you want to process the steps with respect to all parameters as a whole, not parameter-wise.

Parameters:previous_steps (OrderedDict) – An OrderedDict of (TensorSharedVariable, TensorVariable) pairs. The keys are the parameters being trained, the values are the expressions for quantities related to gradients of the cost with respect to the parameters, either the gradients themselves or steps in related directions.
Returns:
  • steps (OrderedDict) – A dictionary of the proposed steps in the same form as previous_steps.
  • updates (list) – A list of tuples representing updates to be performed.
class blocks.algorithms.Scale(learning_rate=1.0)[source]

Bases: blocks.algorithms.StepRule

A step in the direction proportional to the previous step.

If used in GradientDescent alone, this step rule implements steepest descent.

Parameters:learning_rate (float) – The learning rate by which the previous step is multiplied to produce the step.
learning_rate

TensorSharedVariable – The shared variable storing the learning rate used.

compute_step(parameter, previous_step)[source]

Build a Theano expression for the step for a parameter.

This method is called by the default implementation of compute_steps(); it saves one from writing a loop over the parameters each time.

Parameters:
  • parameter (TensorSharedVariable) – The parameter.
  • previous_step (TensorVariable) – Some quantity related to the gradient of the cost with respect to the parameter, either the gradient itself or a step in a related direction.
Returns:

  • step (Variable) – Theano variable for the step to take.
  • updates (list) – A list of tuples representing updates to be performed. This is useful for stateful rules such as Momentum, which need to update shared variables after each iteration.

class blocks.algorithms.StepClipping(threshold=None)[source]

Bases: blocks.algorithms.StepRule

Rescales an entire step if its L2 norm exceeds a threshold.

When the previous steps are the gradients, this step rule performs gradient clipping.

Parameters:threshold (float, optional) – The maximum permitted L2 norm for the step. The step will be rescaled to be not higher than this quantity. If None, no rescaling will be applied.
threshold

tensor.TensorSharedVariable – The shared variable storing the clipping threshold used.

compute_steps(previous_steps)[source]

Build a Theano expression for steps for all parameters.

Override this method if you want to process the steps with respect to all parameters as a whole, not parameter-wise.

Parameters:previous_steps (OrderedDict) – An OrderedDict of (TensorSharedVariable, TensorVariable) pairs. The keys are the parameters being trained, the values are the expressions for quantities related to gradients of the cost with respect to the parameters, either the gradients themselves or steps in related directions.
Returns:
  • steps (OrderedDict) – A dictionary of the proposed steps in the same form as previous_steps.
  • updates (list) – A list of tuples representing updates to be performed.
class blocks.algorithms.StepRule[source]

Bases: object

A rule to compute steps for a gradient descent algorithm.

compute_step(parameter, previous_step)[source]

Build a Theano expression for the step for a parameter.

This method is called by the default implementation of compute_steps(); it saves one from writing a loop over the parameters each time.

Parameters:
  • parameter (TensorSharedVariable) – The parameter.
  • previous_step (TensorVariable) – Some quantity related to the gradient of the cost with respect to the parameter, either the gradient itself or a step in a related direction.
Returns:

  • step (Variable) – Theano variable for the step to take.
  • updates (list) – A list of tuples representing updates to be performed. This is useful for stateful rules such as Momentum, which need to update shared variables after each iteration.

compute_steps(previous_steps)[source]

Build a Theano expression for steps for all parameters.

Override this method if you want to process the steps with respect to all parameters as a whole, not parameter-wise.

Parameters:previous_steps (OrderedDict) – An OrderedDict of (TensorSharedVariable, TensorVariable) pairs. The keys are the parameters being trained, the values are the expressions for quantities related to gradients of the cost with respect to the parameters, either the gradients themselves or steps in related directions.
Returns:
  • steps (OrderedDict) – A dictionary of the proposed steps in the same form as previous_steps.
  • updates (list) – A list of tuples representing updates to be performed.
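
To write a custom rule it is usually enough to subclass StepRule and override compute_step(); a minimal, purely illustrative sketch:

from blocks.algorithms import StepRule

class HalveStep(StepRule):
    """Illustrative rule that scales every proposed step by 0.5."""
    def compute_step(self, parameter, previous_step):
        # No shared state here, so the list of extra updates is empty.
        return 0.5 * previous_step, []
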
class blocks.algorithms.TrainingAlgorithm[source]

Bases: object

Base class for training algorithms.

A training algorithm object has a simple life-cycle. First it is initialized by calling its initialize() method. At this stage, for instance, Theano functions can be compiled. After that the process_batch() method is repeatedly called with a batch of training data as a parameter.

initialize(**kwargs)[source]

Initialize the training algorithm.

process_batch(batch)[source]

Process a batch of training data.

Parameters:batch (dict) – A dictionary of (source name, data) pairs.

class blocks.algorithms.UpdatesAlgorithm(updates=None, theano_func_kwargs=None, on_unused_sources='raise', **kwargs)[source]

Bases: blocks.algorithms.TrainingAlgorithm

Base class for algorithms that use Theano functions with updates.

Parameters:
  • updates (list of tuples or OrderedDict) – The updates that should be performed.
  • theano_func_kwargs (dict, optional) – A passthrough to theano.function for additional arguments. Useful for passing profile or mode arguments to the theano function that will be compiled for the algorithm.
  • on_unused_sources (str, one of 'raise' (default), 'ignore', 'warn') – Controls behavior when not all sources in a batch are used (i.e. there is no variable with a matching name in the inputs of the computational graph of the updates).
updates

list of TensorSharedVariable updates – Updates to be performed for every batch. It is required that the updates use the old values of the optimized parameters.

Notes

Changing updates attribute or calling add_updates after the initialize method is called will have no effect.

add_updates(updates)[source]

Add updates to the training process.

The updates will be done _before_ the parameters are changed.

Parameters:updates (list of tuples or OrderedDict) – The updates to add.
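
A sketch of registering an extra update before initialize() is called; algorithm stands for an UpdatesAlgorithm instance built elsewhere (e.g. the GradientDescent object above) and the counter is hypothetical.

import numpy
import theano

# Hypothetical counter of processed batches, incremented on every update.
batches_seen = theano.shared(numpy.int64(0), name='batches_seen')
algorithm.add_updates([(batches_seen, batches_seen + 1)])
algorithm.initialize()  # must come after add_updates to take effect
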
initialize()[source]

Initialize the training algorithm.

process_batch(batch)[source]

Process a batch of training data.

Parameters:batch (dict) – A dictionary of (source name, data) pairs.

class blocks.algorithms.VariableClipping(threshold, axis=None)[source]

Bases: blocks.algorithms.StepRule

Clip the maximum norm of individual variables along certain axes.

This StepRule can be used to implement L2 norm constraints on e.g. the weight vectors of individual hidden units, convolutional filters or entire weight tensors. Combine with Restrict (and possibly CompositeRule), to apply such constraints only to certain variables and/or apply different norm constraints to different variables.

Parameters:
  • threshold (float) – Maximum norm for a given (portion of a) tensor.
  • axis (int or iterable, optional) – An integer single axis, or an iterable collection of integer axes over which to sum in order to calculate the L2 norm. If None (the default), the norm is computed over all elements of the tensor.

Notes

Because of the way the StepRule API works, this particular rule implements norm clipping of the value after update in the following way: it computes parameter - previous_step, scales it to have (possibly axes-wise) norm(s) of at most threshold, then subtracts that value from parameter to yield an ‘equivalent step’ that respects the desired norm constraints. This procedure implicitly assumes one is doing simple (stochastic) gradient descent, and so steps computed by this step rule may not make sense for use in other contexts.

Investigations into max-norm regularization date from [Srebro2005]. The first appearance of this technique as a regularization method for the weight vectors of individual hidden units in feed-forward neural networks may be [Hinton2012].

[Srebro2005] Nathan Srebro and Adi Shraibman. “Rank, Trace-Norm and Max-Norm”. 18th Annual Conference on Learning Theory (COLT), June 2005.
[Hinton2012] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan R. Salakhutdinov. “Improving neural networks by preventing co-adaptation of feature detectors”. arXiv:1207.0580.
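
A sketch of a per-hidden-unit max-norm constraint combined with Restrict and CompositeRule; the weight matrix W and its layout (hidden units along the second axis, so the norm is summed over axis 0) are assumptions of this example.

import numpy
import theano
from blocks.algorithms import CompositeRule, Scale, Restrict, VariableClipping

# Hypothetical weight matrix whose columns are the incoming weight vectors
# of individual hidden units.
W = theano.shared(numpy.zeros((100, 50), dtype=theano.config.floatX), name='W')

# Take the usual scaled step, then constrain each column of W to have an
# L2 norm of at most 3.
step_rule = CompositeRule([Scale(learning_rate=0.01),
                           Restrict(VariableClipping(threshold=3.0, axis=0),
                                    [W])])
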
compute_step(parameter, previous_step)[source]

Build a Theano expression for the step for a parameter.

This method is called by the default implementation of compute_steps(); it saves one from writing a loop over the parameters each time.

Parameters:
  • parameter (TensorSharedVariable) – The parameter.
  • previous_step (TensorVariable) – Some quantity related to the gradient of the cost with respect to the parameter, either the gradient itself or a step in a related direction.
Returns:

  • step (Variable) – Theano variable for the step to take.
  • updates (list) – A list of tuples representing updates to be performed. This is useful for stateful rules such as Momentum, which need to update shared variables after each iteration.