Create your own brick
This tutorial explains how to create a custom brick, which is useful if you want to group several specific operations (which can be bricks themselves) into a single one so that you can easily reuse it.
The first part of this tutorial lists the required and optional components of a brick, while the second part describes the construction of a simple toy brick.
This tutorial assumes that you are already familiar with bricks and how to use them from a user's point of view.
Brick ingredients and recipe
All the bricks in Blocks inherit directly or indirectly from the Brick class. There is already a rich inheritance hierarchy of bricks implemented in Blocks, so you should consider which level of that hierarchy to inherit from. Bear in mind that multiple inheritance is often possible and is encouraged whenever it makes sense.
Here are examples of possible bricks to inherit from:
- Sequence: a sequence of bricks.
- Initializable: a brick that defines a common initialization scheme (weights and biases) for all its children.
- Feedforward: declares an interface for bricks with one input and one output.
- Linear: a linear transformation with optional bias. Inherits from Initializable and Feedforward.
- BaseRecurrent: the base class for recurrent bricks. Check the tutorial about RNNs for more information.
- many more!
If you want to create a brick from scratch, simply inheriting from Brick, you should consider overriding the following methods (strictly speaking, all of them are optional; check the docstring of Brick for a precise description of the life-cycle of a brick):
- Brick.__init__(): you should pass the attributes of your brick as arguments. This is also where you should create any "children bricks" that belong to your brick (in that case, you have to pass the children bricks to super().__init__). The initialization of the attributes can be lazy, as described later in the tutorial.
- apply(): you need to implement a method that actually performs the operation of the brick, taking the inputs of the brick as arguments and returning its outputs. It can have any name; for simple bricks it is often named apply. You should decorate it with the application() decorator, as explained in the next section. If you design a recurrent brick, you should instead decorate it with the recurrent() decorator, as explained in the tutorial about RNNs.
- Brick._allocate(): implement this method to allocate the shared variables (often representing parameters) of the brick. By convention, the built-in bricks in Blocks allocate their shared variables filled with NaN values, and we recommend that you do the same.
- Brick._initialize(): implement this method to initialize the shared variables of your brick. This method is called after allocation.
- Brick._push_allocation_config(): consider overriding this method if you want to change the configuration of the children bricks before they allocate their parameters.
- Brick._push_initialization_config(): consider overriding this method if you want to change the initialization schemes of the children before they get initialized. If the children bricks need to be initialized with the same scheme, then you should inherit your brick from Initializable, which automatically pushes the initialization schemes of your brick (provided as the weights_init and biases_init arguments of the constructor) to the children bricks.
- get_dim(): implementing this method is useful if you want to provide a simple way to get the dimensions of the inputs and outputs of the brick.
If you want to inherit from a specific brick, check its docstring to identify the particular methods to override and the attributes to define.
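As a rough mental model of the life-cycle described above, the following pure-Python sketch mimics the order in which the hooks run. ToyBrick and ToyLinear are hypothetical names, not Blocks classes; the sketch assumes a simplified order (push configuration, let the children run, then call the brick's own hook) and replaces Theano shared variables with nested lists. The Brick docstring remains the authoritative description.

```python
import math


class ToyBrick(object):
    """Hypothetical, simplified stand-in for the Brick life-cycle.

    It only mimics the order in which the hooks run; it is not Blocks code.
    """

    def __init__(self, children=None):
        self.children = list(children or [])
        self.parameters = []
        self.allocated = False
        self.initialized = False

    def allocate(self):
        # Configure the children, let them allocate, then allocate self.
        self._push_allocation_config()
        for child in self.children:
            child.allocate()
        self.parameters = []
        self._allocate()
        self.allocated = True

    def initialize(self):
        # Allocation always happens before initialization.
        if not self.allocated:
            self.allocate()
        self._push_initialization_config()
        for child in self.children:
            child.initialize()
        self._initialize()
        self.initialized = True

    # Default hooks do nothing; subclasses override only what they need.
    def _push_allocation_config(self):
        pass

    def _allocate(self):
        pass

    def _push_initialization_config(self):
        pass

    def _initialize(self):
        pass


class ToyLinear(ToyBrick):
    """Hypothetical brick holding one weight matrix as nested lists."""

    def __init__(self, input_dim, output_dim, weight_value, **kwargs):
        super(ToyLinear, self).__init__(**kwargs)
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.weight_value = weight_value

    def _allocate(self):
        # Follow the Blocks convention: allocate NaN placeholders.
        self.parameters.append(
            [[float('nan')] * self.output_dim for _ in range(self.input_dim)])

    def _initialize(self):
        # Overwrite the placeholders in place with a constant value.
        W = self.parameters[0]
        for row in W:
            for j in range(len(row)):
                row[j] = self.weight_value


brick = ToyLinear(input_dim=3, output_dim=2, weight_value=2.0)
brick.allocate()
print(math.isnan(brick.parameters[0][0][0]))  # True: allocated, not initialized
brick.initialize()
print(brick.parameters[0])  # [[2.0, 2.0], [2.0, 2.0], [2.0, 2.0]]
```

Keeping allocation separate from initialization is what lets a brick exist with NaN placeholders until an initialization scheme fills them, which is why initialize() allocates first when needed.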
Application methods
The apply() method listed above is probably the most important method of your brick, because it is the one that actually takes Theano tensors as inputs, processes them and returns output tensors. You should decorate it with the application() decorator, which names the variables and registers the auxiliary variables of the operation you implement.
It is used as follows:
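>>> import theano
>>> from theano import tensor
>>> from blocks.bricks.base import Brick, application
>>> class Foo(Brick):
...     @application(inputs=['input1', 'input2'], outputs=['output'])
...     def apply(self, input1, input2):
...         y = input1 + input2
...         return y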
In the case above, the decorator automatically copies the Theano variables passed as input1 and input2 and names the copies foo_apply_input1 and foo_apply_input2, while the output of the method is named foo_apply_output (foo is the name of the brick, which defaults to the class name in lower case). It also adds roles and names to the tag attributes of the variables, as shown below:
>>> foo = Foo()
>>> i1 = tensor.matrix('i1')
>>> i2 = tensor.matrix('i2')
>>> y = foo.apply(i1, i2)
>>> theano.printing.debugprint(y)
Elemwise{identity} [id A] 'foo_apply_output'
 |Elemwise{add,no_inplace} [id B] ''
   |Elemwise{identity} [id C] 'foo_apply_input1'
   | |i1 [id D]
   |Elemwise{identity} [id E] 'foo_apply_input2'
     |i2 [id F]
>>> print(y.name)
foo_apply_output
>>> print(y.tag.name)
output
>>> print(y.tag.roles)
[OUTPUT]
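The naming rule above can be reproduced without Theano. In this sketch, toy_application and Tagged are hypothetical stand-ins for @application and for tagged Theano variables; it assumes that a brick's default name is its class name in lower case, reads the input names from the method signature, and handles a single output only, for brevity.

```python
import functools
import inspect
from collections import namedtuple

# Hypothetical stand-in for a Theano variable that carries a name and a role.
Tagged = namedtuple('Tagged', ['value', 'name', 'role'])


def toy_application(outputs):
    """Hypothetical decorator mimicking the naming done by @application."""
    def decorator(func):
        # Input names come from the method signature (minus `self`).
        arg_names = list(inspect.signature(func).parameters)[1:]

        @functools.wraps(func)
        def wrapper(self, *args):
            # Assumed default brick name: the class name in lower case.
            prefix = '{}_{}_'.format(type(self).__name__.lower(),
                                     func.__name__)
            # Record tagged views of the inputs; the caller's objects
            # are not modified.
            self.tagged_inputs = [Tagged(arg, prefix + name, 'INPUT')
                                  for name, arg in zip(arg_names, args)]
            result = func(self, *args)
            return Tagged(result, prefix + outputs[0], 'OUTPUT')
        return wrapper
    return decorator


class Foo(object):
    @toy_application(outputs=['output'])
    def apply(self, input1, input2):
        return input1 + input2


foo = Foo()
y = foo.apply(1, 2)
print(y)  # Tagged(value=3, name='foo_apply_output', role='OUTPUT')
print([t.name for t in foo.tagged_inputs])
# ['foo_apply_input1', 'foo_apply_input2']
```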
Under the hood, the @application decorator creates an object of class Application, named apply, which becomes an attribute of the brick class (as opposed to its instances):
>>> print(type(Foo.apply))
<class 'blocks.bricks.base.Application'>
Application properties
In the previous examples, the names of the inputs and outputs of the application methods were provided directly as arguments of the @application decorator, because they were common to all instances of the class. If, on the other hand, these names need to be defined differently for particular instances of the class, you should use the apply.property decorator. Say that we want the inputs attribute of apply to be given by the instance attribute self.fancy_name; then we should write:
>>> class Foo(Brick):
...     def __init__(self, fancy_name, **kwargs):
...         super(Foo, self).__init__(**kwargs)
...         self.fancy_name = fancy_name
...
...     @application
...     def apply(self, input_):
...         return input_
...
...     @apply.property('inputs')
...     def apply_inputs(self):
...         # Any Python code can compute the names; inputs is a list
...         return [self.fancy_name]
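Both behaviours seen so far (Foo.apply being an object stored on the class, and apply.property supplying per-instance values) can be modelled with Python's descriptor protocol. Below is a hypothetical sketch: ToyApplication and ToyBoundApplication are not Blocks classes, and the sketch assumes that class access returns the shared application object while instance access returns a bound view whose missing attributes are looked up among the registered properties.

```python
class ToyApplication(object):
    """Hypothetical stand-in for an application with per-brick properties."""

    def __init__(self, func):
        self.func = func
        self.properties = {}

    def property(self, name):
        # Register a getter that computes attribute `name` from the brick.
        def decorator(getter):
            self.properties[name] = getter
            return getter
        return decorator

    def __get__(self, instance, owner):
        # Class access: the shared application object itself.
        if instance is None:
            return self
        # Instance access: a view bound to that particular brick.
        return ToyBoundApplication(self, instance)


class ToyBoundApplication(object):
    def __init__(self, application, brick):
        self.application = application
        self.brick = brick

    def __call__(self, *args):
        return self.application.func(self.brick, *args)

    def __getattr__(self, name):
        # Only reached for attributes not found normally: try the properties.
        properties = self.application.properties
        if name in properties:
            return properties[name](self.brick)
        raise AttributeError(name)


class Foo(object):
    def __init__(self, fancy_name):
        self.fancy_name = fancy_name

    @ToyApplication
    def apply(self, input_):
        return input_

    @apply.property('inputs')
    def apply_inputs(self):
        return [self.fancy_name]


print(type(Foo.apply).__name__)  # ToyApplication
print(Foo('x').apply.inputs)     # ['x']
print(Foo('y').apply.inputs)     # ['y']
print(Foo('x').apply(42))        # 42
```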
Using application calls
You may want to save particular variables defined in the apply method in order to use them later, for example to monitor them during training. To do so, add an application_call argument to your apply method and call its add_auxiliary_variable method to register your variables of interest, as shown in this example:
>>> class Foo(Brick):
...     @application
...     def apply(self, x, application_call):
...         application_call.add_auxiliary_variable(x.mean())
...         return x + 1
add_auxiliary_variable annotates the variable x.mean() as an auxiliary variable, so you can later retrieve it from the computational graph built with ComputationGraph, or select it with a VariableFilter. For the Foo brick defined above, we retrieve x.mean() as follows:
>>> from blocks.graph import ComputationGraph
>>> x = tensor.fmatrix('x')
>>> y = Foo().apply(x)
>>> cg = ComputationGraph(y)
>>> print(cg.auxiliary_variables)
[mean]
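How does the application_call argument reach your method? A plausible mechanism, sketched here as an assumption, is to inspect the method signature and insert a fresh per-call record at the position of the application_call parameter. ToyApplicationCall and toy_apply are hypothetical names, and the inputs are plain Python lists instead of Theano tensors.

```python
import inspect


class ToyApplicationCall(object):
    """Hypothetical record of one call, loosely mimicking ApplicationCall."""

    def __init__(self):
        self.auxiliary_variables = []

    def add_auxiliary_variable(self, value, name=None):
        self.auxiliary_variables.append((name, value))


def toy_apply(func, brick, *args):
    """Call `func`, injecting a fresh call record if it asks for one."""
    call = ToyApplicationCall()
    arg_names = list(inspect.signature(func).parameters)[1:]  # drop `self`
    args = list(args)
    if 'application_call' in arg_names:
        args.insert(arg_names.index('application_call'), call)
    return func(brick, *args), call


class Foo(object):
    def apply(self, x, application_call):
        # x is a plain list of numbers here instead of a Theano tensor.
        application_call.add_auxiliary_variable(sum(x) / len(x), name='mean')
        return [v + 1 for v in x]


output, call = toy_apply(Foo.apply, Foo(), [1.0, 2.0, 3.0])
print(output)                    # [2.0, 3.0, 4.0]
print(call.auxiliary_variables)  # [('mean', 2.0)]
```

Because the record is created per call, each application of the brick collects its own auxiliary variables.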
Lazy initialization
Instead of forcing the user to provide all the brick attributes as arguments to the Brick.__init__() method, you can let users specify them later, after the creation of the brick. To enable this mechanism, called lazy initialization, you need to decorate the constructor with the lazy() decorator:
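>>> from blocks.bricks.base import lazy
>>> class Bar(Brick):
...     @lazy(allocation=['attr1', 'attr2'])
...     def __init__(self, attr1, attr2, **kwargs):
...         super(Bar, self).__init__(**kwargs)
...         self.attr1, self.attr2 = attr1, attr2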
This allows the user to specify attr1 and attr2 after the creation of the brick. For example, the following ChainOfTwoFeedforward brick is composed of two Feedforward bricks, and you do not need to specify the input_dim of brick2 when you create it.
>>> from blocks.bricks import Feedforward, Linear
>>> from blocks.initialization import Constant
>>> class ChainOfTwoFeedforward(Feedforward):
...     """Two sequential Feedforward bricks."""
...     def __init__(self, brick1, brick2, **kwargs):
...         self.brick1 = brick1
...         self.brick2 = brick2
...         children = [self.brick1, self.brick2]
...         kwargs.setdefault('children', []).extend(children)
...         super(ChainOfTwoFeedforward, self).__init__(**kwargs)
...
...     @property
...     def input_dim(self):
...         return self.brick1.input_dim
...
...     @input_dim.setter
...     def input_dim(self, value):
...         self.brick1.input_dim = value
...
...     @property
...     def output_dim(self):
...         return self.brick2.output_dim
...
...     @output_dim.setter
...     def output_dim(self, value):
...         self.brick2.output_dim = value
...
...     def _push_allocation_config(self):
...         # brick2 consumes what brick1 produces
...         self.brick2.input_dim = self.brick1.get_dim('output')
...
...     @application
...     def apply(self, x):
...         return self.brick2.apply(self.brick1.apply(x))
Note how get_dim is used to retrieve the output dimension of brick1, which is then pushed to brick2 as its input_dim. You can now use a ChainOfTwoFeedforward brick as follows:
>>> brick1 = Linear(input_dim=3, output_dim=2, use_bias=False,
... weights_init=Constant(2))
>>> brick2 = Linear(output_dim=4, use_bias=False, weights_init=Constant(2))
>>>
>>> seq = ChainOfTwoFeedforward(brick1, brick2)
>>> seq.initialize()
>>> brick2.input_dim
2
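The idea behind lazy() can be sketched in a few lines: allocation arguments that the caller omits are filled with a placeholder, and allocation refuses to proceed until every placeholder has been replaced. toy_lazy, MISSING and ToyBrick are hypothetical names; Blocks' own decorator differs in its details.

```python
import functools
import inspect


class _Missing(object):
    """Sentinel type marking a configuration value that is not known yet."""

    def __repr__(self):
        return 'MISSING'


MISSING = _Missing()


def toy_lazy(allocation):
    """Hypothetical decorator letting `allocation` arguments be omitted."""
    def decorator(init):
        arg_names = list(inspect.signature(init).parameters)[1:]  # drop self

        @functools.wraps(init)
        def wrapper(self, *args, **kwargs):
            given = set(arg_names[:len(args)]) | set(kwargs)
            # Fill each allocation argument the caller left out.
            for name in allocation:
                if name not in given:
                    kwargs[name] = MISSING
            self.allocation_args = list(allocation)
            init(self, *args, **kwargs)
        return wrapper
    return decorator


class ToyBrick(object):
    @toy_lazy(allocation=['input_dim', 'output_dim'])
    def __init__(self, input_dim, output_dim):
        self.input_dim = input_dim
        self.output_dim = output_dim

    def allocate(self):
        # Refuse to allocate while any allocation argument is still missing.
        missing = [name for name in self.allocation_args
                   if getattr(self, name) is MISSING]
        if missing:
            raise ValueError('missing configuration: {}'.format(missing))
        return (self.input_dim, self.output_dim)


brick = ToyBrick(output_dim=4)  # input_dim is not known yet
brick.input_dim = 2             # ...it is set later, e.g. by a parent brick
print(brick.allocate())         # (2, 4)
```

This mirrors what happens to brick2 above: it is created without an input_dim, and the parent fills it in during _push_allocation_config, before allocation.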
Example
For the sake of the tutorial, let’s consider a toy operation that takes two batch inputs and multiplies them respectively by two matrices, resulting in two outputs.
The first step is to identify which brick to inherit from. We are clearly implementing a variant of the Linear brick. Unlike Linear, however, ours has two inputs and two outputs, which means that we cannot inherit from Feedforward, which requires a single input and a single output. Our brick will have to manage two shared variables representing the matrices by which the inputs are multiplied. As we want to initialize them with the same scheme, we should inherit from Initializable, which stores the initialization schemes (and automatically pushes them to any children bricks). The initialization schemes are provided as the weights_init and biases_init arguments of the constructor of our brick (through kwargs).
>>> from blocks.bricks import Initializable
>>> from blocks.bricks.base import application, lazy
>>> from blocks.roles import add_role, WEIGHT
>>> from blocks.utils import shared_floatx_nans
>>> class ParallelLinear(Initializable):
...     r"""Two linear transformations without biases.
...
...     Brick which applies two linear (affine) transformations by
...     multiplying its two inputs with two weight matrices, resulting in
...     two outputs.
...     The two inputs, weights and outputs can have different dimensions.
...
...     Parameters
...     ----------
...     input_dim{1,2} : int
...         The dimensions of the two inputs.
...     output_dim{1,2} : int
...         The dimensions of the two outputs.
...     """
...     @lazy(allocation=['input_dim1', 'input_dim2',
...                       'output_dim1', 'output_dim2'])
...     def __init__(self, input_dim1, input_dim2, output_dim1, output_dim2,
...                  **kwargs):
...         super(ParallelLinear, self).__init__(**kwargs)
...         self.input_dim1 = input_dim1
...         self.input_dim2 = input_dim2
...         self.output_dim1 = output_dim1
...         self.output_dim2 = output_dim2
...
...     def __allocate(self, input_dim, output_dim, number):
...         # Allocate a NaN-filled weight matrix, tag it and register it
...         W = shared_floatx_nans((input_dim, output_dim),
...                                name='W' + number)
...         add_role(W, WEIGHT)
...         self.parameters.append(W)
...         self.add_auxiliary_variable(W.norm(2), name='W' + number + '_norm')
...
...     def _allocate(self):
...         self.__allocate(self.input_dim1, self.output_dim1, '1')
...         self.__allocate(self.input_dim2, self.output_dim2, '2')
...
...     def _initialize(self):
...         W1, W2 = self.parameters
...         self.weights_init.initialize(W1, self.rng)
...         self.weights_init.initialize(W2, self.rng)
...
...     @application(inputs=['input1_', 'input2_'], outputs=['output1',
...                                                          'output2'])
...     def apply(self, input1_, input2_):
...         """Apply the two linear transformations.
...
...         Parameters
...         ----------
...         input{1,2}_ : :class:`~tensor.TensorVariable`
...             The two inputs on which to apply the transformations.
...
...         Returns
...         -------
...         output{1,2} : :class:`~tensor.TensorVariable`
...             The two inputs multiplied by their respective matrices.
...
...         """
...         W1, W2 = self.parameters
...         output1 = tensor.dot(input1_, W1)
...         output2 = tensor.dot(input2_, W2)
...         return output1, output2
...
...     def get_dim(self, name):
...         if name == 'input1_':
...             return self.input_dim1
...         if name == 'input2_':
...             return self.input_dim2
...         if name == 'output1':
...             return self.output_dim1
...         if name == 'output2':
...             return self.output_dim2
...         return super(ParallelLinear, self).get_dim(name)
You can test the brick as follows:
>>> import numpy
>>> input_dim1, input_dim2, output_dim1, output_dim2 = 10, 5, 2, 1
>>> batch_size1, batch_size2 = 1, 2
>>>
>>> x1_mat = 3 * numpy.ones((batch_size1, input_dim1),
... dtype=theano.config.floatX)
>>> x2_mat = 4 * numpy.ones((batch_size2, input_dim2),
... dtype=theano.config.floatX)
>>>
>>> x1 = theano.tensor.matrix('x1')
>>> x2 = theano.tensor.matrix('x2')
>>> parallel1 = ParallelLinear(input_dim1, input_dim2, output_dim1,
... output_dim2, weights_init=Constant(2))
>>> parallel1.initialize()
>>> output1, output2 = parallel1.apply(x1, x2)
>>>
>>> f1 = theano.function([x1, x2], [output1, output2])
>>> f1(x1_mat, x2_mat)
[array([[ 60., 60.]]...), array([[ 40.],
[ 40.]]...)]
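The expected values can be checked by hand: every entry of x1_mat is 3, every entry of x2_mat is 4, and Constant(2) fills both weight matrices with 2, so each output entry is a sum of input_dim identical products:

```latex
(x_1 W_1)_{ij} = \sum_{k=1}^{\mathrm{input\_dim1} = 10} 3 \cdot 2 = 60,
\qquad
(x_2 W_2)_{ij} = \sum_{k=1}^{\mathrm{input\_dim2} = 5} 4 \cdot 2 = 40
```

The shapes follow the same way: output1 is batch_size1 x output_dim1 = 1 x 2 and output2 is batch_size2 x output_dim2 = 2 x 1.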
One can also build the brick out of Linear children bricks:
>>> class ParallelLinear2(Initializable):
...     def __init__(self, input_dim1, input_dim2, output_dim1, output_dim2,
...                  **kwargs):
...         self.linear1 = Linear(input_dim1, output_dim1,
...                               use_bias=False, **kwargs)
...         self.linear2 = Linear(input_dim2, output_dim2,
...                               use_bias=False, **kwargs)
...         children = [self.linear1, self.linear2]
...         kwargs.setdefault('children', []).extend(children)
...         super(ParallelLinear2, self).__init__(**kwargs)
...
...     @application(inputs=['input1_', 'input2_'], outputs=['output1',
...                                                          'output2'])
...     def apply(self, input1_, input2_):
...         output1 = self.linear1.apply(input1_)
...         output2 = self.linear2.apply(input2_)
...         return output1, output2
...
...     def get_dim(self, name):
...         # Map our names onto the names used by the Linear children
...         mapping = {'input1_': (self.linear1, 'input_'),
...                    'output1': (self.linear1, 'output'),
...                    'input2_': (self.linear2, 'input_'),
...                    'output2': (self.linear2, 'output')}
...         if name in mapping:
...             child, child_name = mapping[name]
...             return child.get_dim(child_name)
...         return super(ParallelLinear2, self).get_dim(name)
You can test this new version as follows:
>>> parallel2 = ParallelLinear2(input_dim1, input_dim2, output_dim1,
... output_dim2, weights_init=Constant(2))
>>> parallel2.initialize()
>>> # The weights_init initialization scheme is pushed to the children
>>> # bricks. We can verify it as follows.
>>> w = parallel2.weights_init
>>> w0 = parallel2.children[0].weights_init
>>> w1 = parallel2.children[1].weights_init
>>> print(w == w0 == w1)
True
>>>
>>> output1, output2 = parallel2.apply(x1, x2)
>>>
>>> f2 = theano.function([x1, x2], [output1, output2])
>>> f2(x1_mat, x2_mat)
[array([[ 60., 60.]]...), array([[ 40.],
[ 40.]]...)]
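The check above relies on Initializable sharing its weights_init with its children. A minimal sketch of that idea, assuming a scheme is only pushed when the parent actually has one and that pushing recurses down the tree; ToyInitializable is a hypothetical class, and a plain object() stands in for a scheme such as Constant(2):

```python
class ToyInitializable(object):
    """Hypothetical sketch of how an Initializable-like brick shares schemes."""

    def __init__(self, weights_init=None, children=None):
        self.weights_init = weights_init
        self.children = list(children or [])

    def push_initialization_config(self):
        # Copy our scheme onto every initializable child, then recurse so
        # that grandchildren receive it as well.
        for child in self.children:
            if isinstance(child, ToyInitializable):
                if self.weights_init is not None:
                    child.weights_init = self.weights_init
                child.push_initialization_config()


scheme = object()  # stands in for an initialization scheme like Constant(2)
leaf1, leaf2 = ToyInitializable(), ToyInitializable()
parent = ToyInitializable(weights_init=scheme, children=[leaf1, leaf2])
parent.push_initialization_config()
print(leaf1.weights_init is scheme, leaf2.weights_init is scheme)  # True True
```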
Actually, it was not even necessary to create a custom brick for this particular operation, as Blocks has a brick called Parallel, which applies the same prototype brick to several inputs. In our case, the prototype brick we want to apply to our two inputs is a Linear brick with no bias:
>>> from blocks.bricks.parallel import Parallel
>>> parallel3 = Parallel(
... prototype=Linear(use_bias=False),
... input_names=['input1_', 'input2_'],
... input_dims=[input_dim1, input_dim2],
... output_dims=[output_dim1, output_dim2], weights_init=Constant(2))
>>> parallel3.initialize()
>>>
>>> output1, output2 = parallel3.apply(x1, x2)
>>>
>>> f3 = theano.function([x1, x2], [output1, output2])
>>> f3(x1_mat, x2_mat)
[array([[ 60., 60.]]...), array([[ 40.],
[ 40.]]...)]
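Conceptually, Parallel builds one child per input from the prototype and configures each copy with its own dimensions. The sketch below assumes the children are independent deep copies of the prototype; toy_parallel and ToyLinear are hypothetical, and ToyLinear hard-codes constant weights so that it reproduces the outputs above on plain lists.

```python
import copy


class ToyLinear(object):
    """Hypothetical bias-free linear map on nested lists, constant weights."""

    def __init__(self, input_dim=None, output_dim=None, weight=2.0):
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.weight = weight

    def apply(self, x):
        # Every weight equals self.weight, so each output entry is
        # weight * (sum of the input row), repeated output_dim times.
        return [[self.weight * sum(row)] * self.output_dim for row in x]


def toy_parallel(prototype, input_dims, output_dims):
    """Build one configured copy of `prototype` per input (hypothetical)."""
    children = []
    for input_dim, output_dim in zip(input_dims, output_dims):
        child = copy.deepcopy(prototype)  # the prototype itself is untouched
        child.input_dim, child.output_dim = input_dim, output_dim
        children.append(child)
    return children


prototype = ToyLinear()
children = toy_parallel(prototype, input_dims=[10, 5], output_dims=[2, 1])
x1 = [[3.0] * 10]       # batch of 1, input_dim1 = 10
x2 = [[4.0] * 5] * 2    # batch of 2, input_dim2 = 5
outputs = [child.apply(x) for child, x in zip(children, [x1, x2])]
print(outputs)  # [[[60.0, 60.0]], [[40.0], [40.0]]]
```

Copying the prototype rather than sharing it is what lets each child hold its own dimensions (and, in a real brick, its own parameters).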