# Create your own brick

This tutorial explains how to create a custom brick, which is useful if you want to group several specific operations (which can be bricks themselves) into a single one so that you can easily reuse it.

The first part of this tutorial lists the requirements and optional components that a brick should/can implement while the second part describes the construction of a simple toy brick.

This tutorial assumes that you are already familiar with
*bricks* and how to use them from a user point of view.

## Bricks ingredients and recipe

All the bricks in Blocks inherit directly or indirectly from the `Brick`
class. There is already a rich inheritance hierarchy of bricks implemented
in Blocks, so you should consider which brick level to inherit from. Bear in
mind that multiple inheritance is often possible and advocated whenever it
makes sense.

Here are examples of possible bricks to inherit from:

- `Sequence`: a sequence of bricks.
- `Initializable`: a brick that defines the same initialization scheme (weights and biases) for all its children.
- `Feedforward`: declares an interface for bricks with one input and one output.
- `Linear`: a linear transformation with optional bias. Inherits from `Initializable` and `Feedforward`.
- `BaseRecurrent`: the base class for recurrent bricks. Check the *tutorial about rnns* for more information.
- many more!

Let’s say that you want to create a brick from scratch, simply inheriting
from `Brick`. In that case you should consider overriding the following
methods (strictly speaking, all these methods are optional; check the
docstring of `Brick` for a precise description of the life-cycle of a brick):

- `Brick.__init__()`: you should pass the attributes of your brick as arguments. It is also in this method that you should create the potential “children bricks” that belong to your brick (in that case, you have to pass the children bricks to `super().__init__`). The initialization of the attributes can be lazy, as described later in the tutorial.
- `apply()`: you need to implement a method that actually implements the operation of the brick, taking as arguments the inputs of the brick and returning its outputs. It can have any name and for simple bricks is often named `apply`. You should decorate it with the `application()` decorator, as explained in the next section. If you design a recurrent brick, you should instead decorate it with the `recurrent()` decorator, as explained in the *tutorial about rnns*.
- `Brick._allocate()`: you should implement this method to allocate the shared variables (often representing parameters) of the brick. In Blocks, by convention, the built-in bricks allocate their shared variables with nan values and we recommend that you do the same.
- `Brick._initialize()`: you should implement this method to initialize the shared variables of your brick. This method is called after the allocation.
- `Brick._push_allocation_config()`: you should consider overriding this method if you want to change the configuration of the children bricks before they allocate their parameters.
- `Brick._push_initialization_config()`: you should consider overriding this method if you want to change the initialization schemes of the children before they get initialized. If the children bricks need to be initialized with the same scheme, then you should inherit your brick from `Initializable`, which automatically pushes the initialization schemes of your brick (provided as arguments `weights_init` and `biases_init` of the constructor) to the children bricks.
- `get_dim()`: implementing this function is useful if you want to provide a simple way to get the dimensions of the inputs and outputs of the brick.

If you want to inherit from a specific brick, check its docstring to identify the particular methods to override and the attributes to define.
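The order in which the methods above are called can be sketched in plain Python. This is only a toy model: `ToyBrick` is a hypothetical class that records method names in a list, it is not Blocks' implementation, and the relative ordering between a parent and its children is an assumption of this sketch. It only illustrates that configuration is pushed before allocation and that initialization happens after allocation.

```python
class ToyBrick:
    """Hypothetical stand-in for a brick; it only logs life-cycle steps."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = list(children or [])
        self.allocated = False

    def allocate(self, log):
        # Configuration is pushed to the children before they allocate.
        log.append(self.name + '._push_allocation_config')
        for child in self.children:
            child.allocate(log)
        log.append(self.name + '._allocate')
        self.allocated = True

    def initialize(self, log):
        # Initialization always happens after allocation.
        if not self.allocated:
            self.allocate(log)
        log.append(self.name + '._push_initialization_config')
        for child in self.children:
            child.initialize(log)
        log.append(self.name + '._initialize')


log = []
parent = ToyBrick('parent', children=[ToyBrick('child')])
parent.initialize(log)
for step in log:
    print(step)
```

Calling `initialize` on the parent is enough here: it triggers the allocation first and then walks through the children.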

### Application methods

The `apply()` method listed above is probably the most important method of
your brick because it is the one that actually takes theano tensors as
inputs, processes them and returns output tensors. You should decorate it
with the `application()` decorator, which names variables and registers
auxiliary variables of the operation you implement. It is used as follows:

```
>>> from blocks.bricks.base import application, Brick
>>> class Foo(Brick):
...     @application(inputs=['input1', 'input2'], outputs=['output'])
...     def apply(self, input1, input2):
...         y = input1 + input2
...         return y
```

In the case above, it will automatically rename the theano tensor variable
`input1` to `foo_apply_input1`, `input2` to `foo_apply_input2` and the
output of the method to `foo_apply_output`. It will also add roles and names
to the tag attributes of the variables, as shown below:

```
>>> import theano
>>> from theano import tensor
>>> foo = Foo()
>>> i1 = tensor.matrix('i1')
>>> i2 = tensor.matrix('i2')
>>> y = foo.apply(i1, i2)
>>> theano.printing.debugprint(y)
Elemwise{identity} [id A] 'foo_apply_output'
 |Elemwise{add,no_inplace} [id B] ''
   |Elemwise{identity} [id C] 'foo_apply_input1'
   | |i1 [id D]
   |Elemwise{identity} [id E] 'foo_apply_input2'
     |i2 [id F]
>>> print(y.name)
foo_apply_output
>>> print(y.tag.name)
output
>>> print(y.tag.roles)
[OUTPUT]
```

Under the hood, the `@application` decorator creates an object of class
`Application`, named `apply`, which becomes an attribute of the brick class
(as opposed to an attribute of individual instances):

```
>>> print(type(Foo.apply))
<class 'blocks.bricks.base.Application'>
```
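How a decorator can turn a method into an object stored on the class can be sketched in plain Python. The names `ToyApplication`, `toy_application` and `ToyFoo` are hypothetical and this is a deliberately simplified illustration of the general pattern, not the code of Blocks' `Application` class.

```python
class ToyApplication:
    """Hypothetical, simplified stand-in for an application object."""

    def __init__(self, function):
        self.function = function
        self.name = function.__name__

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed on the class: return the wrapper object itself.
            return self
        # Accessed on an instance: return a callable bound to that instance.
        return lambda *args, **kwargs: self.function(instance, *args, **kwargs)


def toy_application(function):
    """Decorator that replaces the method by a ToyApplication object."""
    return ToyApplication(function)


class ToyFoo:
    @toy_application
    def apply(self, input1, input2):
        return input1 + input2


print(type(ToyFoo.apply).__name__)
print(ToyFoo().apply(1, 2))
```

Because the wrapper lives on the class, it is a natural place to hang per-method metadata such as input and output names.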

#### Application properties

In the previous examples, the names of the arguments of the application
methods were directly provided as arguments of the `@application` decorator
because they were common to all instances of the classes. On the other hand,
if these names need to be defined differently for particular instances of the
class, you should use the `apply.property` decorator. Let’s say that we want
to name the inputs using the string stored in `self.fancy_name`; then we
should write:

```
>>> class Foo(Brick):
...     def __init__(self, fancy_name, **kwargs):
...         super(Foo, self).__init__(**kwargs)
...         self.fancy_name = fancy_name
...     @application
...     def apply(self, input):
...         ...
...     @apply.property('inputs')
...     def apply_inputs(self):
...         # Note that you can use any python code to define the names;
...         # the property returns the list of input names.
...         return [self.fancy_name]
```

#### Using application calls

You may want to save particular variables defined in the `apply` method in
order to use them later, for example to monitor them during training. For
that, you need to pass `application_call` as an argument of your `apply`
function and use the `add_auxiliary_variable` function to register your
variables of interest, as shown in this example:

```
>>> class Foo(Brick):
...     @application
...     def apply(self, x, application_call):
...         application_call.add_auxiliary_variable(x.mean())
...         return x + 1
```

`add_auxiliary_variable` annotates the variable `x.mean()` as an auxiliary
variable and you can thus later retrieve it with the computational graph
`ComputationGraph` and filters `VariableFilter`. In the case of the `Foo`
brick defined above, we retrieve `x.mean()` as follows:

```
>>> from blocks.graph import ComputationGraph
>>> x = tensor.fmatrix('x')
>>> y = Foo().apply(x)
>>> cg = ComputationGraph(y)
>>> print(cg.auxiliary_variables)
[mean]
```

### Lazy initialization

Instead of forcing the user to provide all the brick attributes as arguments
to the `Brick.__init__()` method, you could let them specify these attributes
later, after the creation of the brick. To enable this mechanism, called lazy
initialization, you need to decorate the constructor with the `lazy()`
decorator:

```
>>> @lazy(allocation=['attr1', 'attr2'])
... def __init__(self, attr1, attr2):
...     ...
```

This allows the user to specify `attr1` and `attr2` after the creation of
the brick. For example, the following `ChainOfTwoFeedforward` brick is
composed of two `Feedforward` bricks for which you do not need to specify the
`input_dim` of `brick2` directly at its creation.

```
>>> class ChainOfTwoFeedforward(Feedforward):
...     """Two sequential Feedforward bricks."""
...     def __init__(self, brick1, brick2, **kwargs):
...         self.brick1 = brick1
...         self.brick2 = brick2
...         children = [self.brick1, self.brick2]
...         kwargs.setdefault('children', []).extend(children)
...         super(ChainOfTwoFeedforward, self).__init__(**kwargs)
...
...     @property
...     def input_dim(self):
...         return self.brick1.input_dim
...
...     @input_dim.setter
...     def input_dim(self, value):
...         self.brick1.input_dim = value
...
...     @property
...     def output_dim(self):
...         return self.brick2.output_dim
...
...     @output_dim.setter
...     def output_dim(self, value):
...         self.brick2.output_dim = value
...
...     def _push_allocation_config(self):
...         # The output dimension of brick1 is the input dimension of brick2.
...         self.brick2.input_dim = self.brick1.get_dim('output')
...
...     @application
...     def apply(self, x):
...         return self.brick2.apply(self.brick1.apply(x))
```

Note how `get_dim` is used to retrieve the output dimension of `brick1`,
which is then set as the `input_dim` of `brick2`. You can now use a
`ChainOfTwoFeedforward` brick as follows.

```
>>> from blocks.bricks import Linear
>>> from blocks.initialization import Constant
>>> brick1 = Linear(input_dim=3, output_dim=2, use_bias=False,
...                 weights_init=Constant(2))
>>> brick2 = Linear(output_dim=4, use_bias=False, weights_init=Constant(2))
>>>
>>> seq = ChainOfTwoFeedforward(brick1, brick2)
>>> seq.initialize()
>>> brick2.input_dim
2
```
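The idea behind lazy initialization can be sketched in plain Python. The decorator `toy_lazy`, the sentinel `NOT_SET` and the class `ToyLinear` below are hypothetical names introduced for illustration; this is a minimal sketch of the general mechanism (omitted arguments are replaced by a placeholder that can be filled in later), not the implementation of Blocks' `lazy()` decorator.

```python
import functools
import inspect

NOT_SET = object()  # hypothetical sentinel meaning "to be specified later"


def toy_lazy(allocation):
    """Let the listed constructor arguments be omitted at creation time."""
    def decorator(init):
        arg_names = inspect.getfullargspec(init).args[1:]  # skip `self`

        @functools.wraps(init)
        def wrapper(self, *args, **kwargs):
            # Names of the arguments the caller actually provided.
            given = set(arg_names[:len(args)]) | set(kwargs)
            for name in allocation:
                if name not in given:
                    kwargs[name] = NOT_SET
            return init(self, *args, **kwargs)
        return wrapper
    return decorator


class ToyLinear:
    @toy_lazy(allocation=['input_dim', 'output_dim'])
    def __init__(self, input_dim, output_dim):
        self.input_dim = input_dim
        self.output_dim = output_dim


layer = ToyLinear(output_dim=4)            # input_dim is left unspecified
unset_at_creation = layer.input_dim is NOT_SET
layer.input_dim = 2                        # filled in later, e.g. by a parent
print(unset_at_creation, layer.input_dim, layer.output_dim)
```

A parent brick can rely on the same pattern to set a child's missing dimensions before allocation, which is what `_push_allocation_config` does in `ChainOfTwoFeedforward`.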

## Example

For the sake of the tutorial, let’s consider a toy operation that takes two batch inputs and multiplies them respectively by two matrices, resulting in two outputs.

The first step is to identify which brick to inherit from. Clearly we are
implementing a variant of the `Linear` brick. Contrary to `Linear`, ours has
two inputs and two outputs, which means that we cannot inherit from
`Feedforward`, which requires a single input and a single output. Our brick
will have to manage two shared variables representing the matrices to
multiply the inputs with. As we want to initialize them with the same scheme,
we should inherit from `Initializable`, which automatically pushes the
initialization schemes to the children. The initialization schemes are
provided as arguments `weights_init` and `biases_init` of the constructor of
our brick (in the `kwargs`).

```
>>> from theano import tensor
>>> from blocks.bricks import Initializable
>>> from blocks.bricks.base import application, lazy
>>> from blocks.roles import add_role, WEIGHT
>>> from blocks.utils import shared_floatx_nans
>>> class ParallelLinear(Initializable):
...     r"""Two linear transformations without biases.
...
...     Brick which applies two linear (affine) transformations by
...     multiplying its two inputs with two weight matrices, resulting in
...     two outputs.
...     The two inputs, weights and outputs can have different dimensions.
...
...     Parameters
...     ----------
...     input_dim{1,2} : int
...         The dimensions of the two inputs.
...     output_dim{1,2} : int
...         The dimensions of the two outputs.
...     """
...     @lazy(allocation=['input_dim1', 'input_dim2',
...                       'output_dim1', 'output_dim2'])
...     def __init__(self, input_dim1, input_dim2, output_dim1, output_dim2,
...                  **kwargs):
...         super(ParallelLinear, self).__init__(**kwargs)
...         self.input_dim1 = input_dim1
...         self.input_dim2 = input_dim2
...         self.output_dim1 = output_dim1
...         self.output_dim2 = output_dim2
...
...     def __allocate(self, input_dim, output_dim, number):
...         # Allocate one weight matrix filled with nan values.
...         W = shared_floatx_nans((input_dim, output_dim),
...                                name='W'+number)
...         add_role(W, WEIGHT)
...         self.parameters.append(W)
...         self.add_auxiliary_variable(W.norm(2), name='W'+number+'_norm')
...
...     def _allocate(self):
...         self.__allocate(self.input_dim1, self.output_dim1, '1')
...         self.__allocate(self.input_dim2, self.output_dim2, '2')
...
...     def _initialize(self):
...         W1, W2 = self.parameters
...         self.weights_init.initialize(W1, self.rng)
...         self.weights_init.initialize(W2, self.rng)
...
...     @application(inputs=['input1_', 'input2_'], outputs=['output1',
...                  'output2'])
...     def apply(self, input1_, input2_):
...         """Apply the two linear transformations.
...
...         Parameters
...         ----------
...         input{1,2}_ : :class:`~tensor.TensorVariable`
...             The two inputs on which to apply the transformations
...
...         Returns
...         -------
...         output{1,2} : :class:`~tensor.TensorVariable`
...             The two inputs multiplied by their respective matrices
...
...         """
...         W1, W2 = self.parameters
...         output1 = tensor.dot(input1_, W1)
...         output2 = tensor.dot(input2_, W2)
...         return output1, output2
...
...     def get_dim(self, name):
...         if name == 'input1_':
...             return self.input_dim1
...         if name == 'input2_':
...             return self.input_dim2
...         if name == 'output1':
...             return self.output_dim1
...         if name == 'output2':
...             return self.output_dim2
...         return super(ParallelLinear, self).get_dim(name)
```

You can test the brick as follows:

```
>>> import numpy
>>> import theano
>>> from blocks.initialization import Constant
>>> input_dim1, input_dim2, output_dim1, output_dim2 = 10, 5, 2, 1
>>> batch_size1, batch_size2 = 1, 2
>>>
>>> x1_mat = 3 * numpy.ones((batch_size1, input_dim1),
...                         dtype=theano.config.floatX)
>>> x2_mat = 4 * numpy.ones((batch_size2, input_dim2),
...                         dtype=theano.config.floatX)
>>>
>>> x1 = theano.tensor.matrix('x1')
>>> x2 = theano.tensor.matrix('x2')
>>> parallel1 = ParallelLinear(input_dim1, input_dim2, output_dim1,
...                            output_dim2, weights_init=Constant(2))
>>> parallel1.initialize()
>>> output1, output2 = parallel1.apply(x1, x2)
>>>
>>> f1 = theano.function([x1, x2], [output1, output2])
>>> f1(x1_mat, x2_mat)
[array([[ 60.,  60.]]...), array([[ 40.],
       [ 40.]]...)]
```
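The values printed above can be checked by hand: every entry of the first output is a sum of 10 products 3 × 2, and every entry of the second output is a sum of 5 products 4 × 2. The following plain-Python sketch reproduces this arithmetic without Theano or Blocks; the helper `matmul` is a hypothetical function written for this check only.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


input_dim1, input_dim2, output_dim1, output_dim2 = 10, 5, 2, 1
batch_size1, batch_size2 = 1, 2

# Inputs filled with 3 and 4, weights filled with 2, as in the test above.
x1 = [[3.0] * input_dim1 for _ in range(batch_size1)]
x2 = [[4.0] * input_dim2 for _ in range(batch_size2)]
W1 = [[2.0] * output_dim1 for _ in range(input_dim1)]
W2 = [[2.0] * output_dim2 for _ in range(input_dim2)]

out1 = matmul(x1, W1)  # each entry is 10 * 3 * 2
out2 = matmul(x2, W2)  # each entry is 5 * 4 * 2
print(out1)  # [[60.0, 60.0]]
print(out2)  # [[40.0], [40.0]]
```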

One can also create the brick using `Linear` children bricks:

```
>>> class ParallelLinear2(Initializable):
...     def __init__(self, input_dim1, input_dim2, output_dim1, output_dim2,
...                  **kwargs):
...         self.linear1 = Linear(input_dim1, output_dim1,
...                               use_bias=False, **kwargs)
...         self.linear2 = Linear(input_dim2, output_dim2,
...                               use_bias=False, **kwargs)
...         children = [self.linear1, self.linear2]
...         kwargs.setdefault('children', []).extend(children)
...         super(ParallelLinear2, self).__init__(**kwargs)
...
...     @application(inputs=['input1_', 'input2_'], outputs=['output1',
...                  'output2'])
...     def apply(self, input1_, input2_):
...         output1 = self.linear1.apply(input1_)
...         output2 = self.linear2.apply(input2_)
...         return output1, output2
...
...     def get_dim(self, name):
...         # Linear names its variables 'input_' and 'output', so strip
...         # the index before delegating to the children.
...         if name in ['input1_', 'output1']:
...             return self.linear1.get_dim(name.replace('1', ''))
...         if name in ['input2_', 'output2']:
...             return self.linear2.get_dim(name.replace('2', ''))
...         return super(ParallelLinear2, self).get_dim(name)
```

You can test this new version as follows:

```
>>> parallel2 = ParallelLinear2(input_dim1, input_dim2, output_dim1,
...                             output_dim2, weights_init=Constant(2))
>>> parallel2.initialize()
>>> # The weights_init initialization scheme is pushed to the children
>>> # bricks. We can verify it as follows.
>>> w = parallel2.weights_init
>>> w0 = parallel2.children[0].weights_init
>>> w1 = parallel2.children[1].weights_init
>>> print(w == w0 == w1)
True
>>>
>>> output1, output2 = parallel2.apply(x1, x2)
>>>
>>> f2 = theano.function([x1, x2], [output1, output2])
>>> f2(x1_mat, x2_mat)
[array([[ 60.,  60.]]...), array([[ 40.],
       [ 40.]]...)]
```

Actually it was not even necessary to create a custom brick for this
particular operation, as Blocks has a brick called `Parallel`, which applies
the same prototype brick to several inputs. In our case the prototype brick
we want to apply to our two inputs is a `Linear` brick with no bias:

```
>>> from blocks.bricks.parallel import Parallel
>>> parallel3 = Parallel(
...     prototype=Linear(use_bias=False),
...     input_names=['input1_', 'input2_'],
...     input_dims=[input_dim1, input_dim2],
...     output_dims=[output_dim1, output_dim2], weights_init=Constant(2))
>>> parallel3.initialize()
>>>
>>> output1, output2 = parallel3.apply(x1, x2)
>>>
>>> f3 = theano.function([x1, x2], [output1, output2])
>>> f3(x1_mat, x2_mat)
[array([[ 60.,  60.]]...), array([[ 40.],
       [ 40.]]...)]
```