Managing the computation graph¶
Theano constructs computation graphs of mathematical expressions. Bricks help you build these graphs, but they do more than that. When you apply a brick to a Theano variable, it automatically annotates this Theano variable, in two ways:
- It defines the role this variable plays in the computation graph e.g. it will label weight matrices and biases as parameters, keep track of which variables were the in- and outputs of your bricks, and more.
- It constructs auxiliary variables. These are variables which are not outputs of your brick, but might still be of interest. For example, if you are training a neural network, you might be interested to know the norm of your weight matrices, so Blocks attaches these as auxiliary variables to the graph.
Using annotations¶
The ComputationGraph
class provides an interface to this annotated
graph. For example, let’s say we want to train an autoencoder using weight decay
on some of the layers.
>>> from theano import tensor
>>> x = tensor.matrix('features')
>>> from blocks.bricks import MLP, Logistic, Rectifier
>>> from blocks.initialization import IsotropicGaussian, Constant
>>> mlp = MLP(activations=[Rectifier()] * 2 + [Logistic()],
... dims=[784, 256, 128, 784],
... weights_init=IsotropicGaussian(), biases_init=Constant(0.01))
>>> y_hat = mlp.apply(x)
>>> from blocks.bricks.cost import BinaryCrossEntropy
>>> cost = BinaryCrossEntropy().apply(x, y_hat)
Our Theano computation graph is now defined by our loss, cost
. We initialize
the managed graph.
>>> from blocks.graph import ComputationGraph
>>> cg = ComputationGraph(cost)
We will find that there are many variables in this graph.
>>> print(cg.variables)
[TensorConstant{0}, b, W_norm, b_norm, features, TensorConstant{1.0}, ...]
To apply weight decay, we only need the weights matrices. These have been tagged
with the WEIGHT
role. So let’s create a filter that finds these for us.
>>> from blocks.filter import VariableFilter
>>> from blocks.roles import WEIGHT
>>> print(VariableFilter(roles=[WEIGHT])(cg.variables))
[W, W, W]
Note that the variables in cg.variables
are ordered according to the topological
order of their apply nodes. This means that for a feedforward network the
parameters will be returned in the order of our layers.
But let’s imagine for a second that we are actually dealing with a far more complicated network, and we want to apply weight decay to the parameters of one layer in particular. To do that, we can filter the variables by the bricks that created them.
>>> second_layer = mlp.linear_transformations[1]
>>> from blocks.roles import PARAMETER
>>> var_filter = VariableFilter(roles=[PARAMETER], bricks=[second_layer])
>>> print(var_filter(cg.variables))
[b, W]
Note
There are a variety of different roles that you can filter by. You might have
noted already that there is a hierarchy to many of them: Filtering by
PARAMETER
will also return variables of the child
roles WEIGHT
and BIAS
.
We can also see what auxiliary variables our bricks have created. These might be of interest to monitor during training, for example.
>>> print(cg.auxiliary_variables)
[W_norm, b_norm, W_norm, b_norm, W_norm, b_norm]