Understanding TimeDistributed and TimeDistributedDense in Keras
Published: 2019-06-09


From the official code:

class TimeDistributed(Wrapper):
    """This wrapper applies a layer to every temporal slice of an input.

    The input should be at least 3D, and the dimension of index one
    will be considered to be the temporal dimension.

    Consider a batch of 32 samples,
    where each sample is a sequence of 10 vectors of 16 dimensions.
    The batch input shape of the layer is then `(32, 10, 16)`,
    and the `input_shape`, not including the samples dimension, is `(10, 16)`.

    You can then use `TimeDistributed` to apply a `Dense` layer
    to each of the 10 timesteps, independently:

    ```python
        # as the first layer in a model
        model = Sequential()
        model.add(TimeDistributed(Dense(8), input_shape=(10, 16)))
        # now model.output_shape == (None, 10, 8)
    ```

    The output will then have shape `(32, 10, 8)`.

    In subsequent layers, there is no need for the `input_shape`:

    ```python
        model.add(TimeDistributed(Dense(32)))
        # now model.output_shape == (None, 10, 32)
    ```

    The output will then have shape `(32, 10, 32)`.

    `TimeDistributed` can be used with arbitrary layers, not just `Dense`,
    for instance with a `Conv2D` layer:

    ```python
        model = Sequential()
        model.add(TimeDistributed(Conv2D(64, (3, 3)),
                                  input_shape=(10, 299, 299, 3)))
    ```

    # Arguments
        layer: a layer instance.
    """

So, basically, TimeDistributedDense was introduced first, in early versions of Keras, to apply a Dense layer stepwise to sequences. TimeDistributed is a Keras wrapper that makes it possible to take any static (non-sequential) layer and apply it in a sequential manner. One example is applying a pretrained convolutional layer to a short video clip with TimeDistributed(conv_layer), where conv_layer is applied to each frame of the clip. This produces a sequence of outputs that can then be consumed by a following recurrent or TimeDistributed layer.

It's good to know that TimeDistributedDense is deprecated and that it's better to use TimeDistributed(Dense).

TimeDistributed 

RNNs are capable of a number of different types of input / output combinations, as seen below

The TimeDistributedDense layer allows you to build models with one-to-many and many-to-many architectures. This is because the output function for each of the "many" outputs must be the same function applied at each timestep. The TimeDistributedDense layer lets you apply that one Dense function to every output over time.

If you did not use this, you would only have one final output, and so you would use a normal Dense layer. That means you are building either a one-to-one or a many-to-one network, since there is only one Dense layer for the output.

 ======================================================

As fchollet said:

TimeDistributedDense applies a same Dense (fully-connected) operation to every timestep of a 3D tensor.

But I think you still haven't caught the point. The most common scenario for TimeDistributedDense is using a recurrent NN for a tagging task, e.g. POS labeling or slot filling.

In this kind of task:

For each sample, the input is a sequence (a1, a2, a3, a4, ..., aN) and the output is a sequence (b1, b2, b3, b4, ..., bN) of the same length. bi can be viewed as the label of ai.
Push a1 into a recurrent NN to get output b1. Then push a2 and the hidden output of a1 to get b2, and so on.

To model this in Keras, you just need to use a TimeDistributedDense after an RNN or LSTM layer (with return_sequences=True) so that the cost function is calculated on the output of every timestep. If you don't use TimeDistributedDense and set return_sequences=False on the RNN, the cost is calculated only on the last timestep's output and you can only get the last label, bN.

I am also new to Keras, but I am trying to use it for sequence labeling, and I found this could only be done with TimeDistributedDense. If I got something wrong, please correct me.
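
The tagging setup described above could be sketched roughly as follows, using the non-deprecated TimeDistributed(Dense) form. This is only a sketch under assumptions: the vocabulary size, sequence length, tag count, and layer widths are hypothetical values chosen for illustration, and the import path assumes the Keras bundled with TensorFlow 2.x.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sizes, chosen only for illustration.
VOCAB_SIZE = 1000  # number of distinct tokens
SEQ_LEN = 20       # (padded) sentence length N
NUM_TAGS = 5       # number of possible labels per token

model = keras.Sequential([
    keras.Input(shape=(SEQ_LEN,), dtype="int32"),
    layers.Embedding(VOCAB_SIZE, 32),
    # return_sequences=True keeps one output vector per timestep.
    layers.LSTM(64, return_sequences=True),
    # The same Dense layer is applied to every timestep's output.
    layers.TimeDistributed(layers.Dense(NUM_TAGS, activation="softmax")),
])
# With per-timestep outputs, the loss is computed over all timesteps.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Two random "sentences" of token ids, just to check shapes.
tokens = np.random.randint(0, VOCAB_SIZE, size=(2, SEQ_LEN))
tag_probs = model.predict(tokens, verbose=0)
print(tag_probs.shape)  # one distribution over tags per token
```

Each of the N input tokens gets its own probability distribution over tags, which is the (b1, ..., bN) output of the tagging task.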

 ======================================================

It's quite easy to understand. Let's not think in terms of tensors and such for a second.

It all depends on the return_sequences parameter of the LSTM layer.

If return_sequences=False (the default), we get the LSTM output corresponding only to THE LAST TIME STEP.
Now, when we apply model.add(Dense( )), we are connecting only the LSTM output at the last time step to the Dense layer. (This approach encodes the overall sequence into a compact vector.
Given a sequence of 50 words, my LSTM will output only one word.)

Q) WHEN NOT TO USE TIMEDISTRIBUTED?

A) In my experience, in an encoder-decoder model.
If you want to squeeze all your input information into a single vector, we DON'T use TIMEDISTRIBUTED.
Only the final unrolled step of the LSTM layer will be the output. This final step holds the compact information of the whole input sequence, which is useful for tasks like classification, summarization, etc.
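
As a rough illustration of this many-to-one case, the sketch below builds a small sequence classifier. The sizes and the name `classifier` are hypothetical, and the import path assumes the Keras bundled with TensorFlow 2.x.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sizes, for illustration only.
TIMESTEPS, FEATURES, NUM_CLASSES = 50, 16, 3

classifier = keras.Sequential([
    keras.Input(shape=(TIMESTEPS, FEATURES)),
    # return_sequences defaults to False: only the last timestep's
    # output (a single 32-dimensional vector per sample) is returned.
    layers.LSTM(32),
    # A plain Dense layer on that single vector; no TimeDistributed needed.
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
print(classifier.output_shape)  # (None, 3)
```

The time axis is gone from the output shape: 50 input steps are summarized into one prediction per sample.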
-----------------------------------------------------------However !-----------------------------------------------------------------------

If return_sequences is set to True, the LSTM outputs at every time step. So I must use TimeDistributed to ensure that the Dense layer is connected to the LSTM output at each time step. Otherwise, an error occurs! (That is the case in the Keras version this answer was written for; newer releases let Dense act on the last axis of a 3D input.)

Also keep in mind that, just as the LSTM is unrolled, so is the Dense layer; i.e. the Dense layer at each time step is the same one. It's not as if there are 50 different Dense layers for 50 time steps.
There's nothing to be confused about.
This time, the model generates a sequence whose length matches the number of timesteps. So, given a set of 50 input words, the LSTM will output 50 words.
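
One way to check that there really is only one Dense layer is to count parameters for different sequence lengths. The sketch below leaves out the LSTM to isolate the wrapped Dense layer; the helper `build` and all sizes are hypothetical, and the import path assumes the Keras bundled with TensorFlow 2.x.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build(timesteps, features=16, units=8):
    """Hypothetical helper: a model holding one TimeDistributed(Dense)."""
    return keras.Sequential([
        keras.Input(shape=(timesteps, features)),
        layers.TimeDistributed(layers.Dense(units)),
    ])


short_model = build(timesteps=5)
long_model = build(timesteps=50)

# Both models have 16 * 8 weights + 8 biases = 136 parameters,
# regardless of how many timesteps the Dense layer is applied to.
print(short_model.count_params(), long_model.count_params())  # 136 136
print(long_model.output_shape)  # (None, 50, 8)
```

If there were a separate Dense layer per time step, the 50-step model would have ten times as many parameters as the 5-step one.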

Q) WHEN TO USE TIMEDISTRIBUTED?

A) In a word generation task (like generating Shakespeare), where, given a sequence of words, we train the model to predict the next set of words.
EXAMPLE: if the nth training input to the LSTM network is 'I want to' and the output of the network is 'want to eat', then each word in ['want', 'to', 'eat'] is the output of the LSTM at one timestep.

 ======================================================

Let's say you have time-series data with N rows and 700 columns which you want to feed to a SimpleRNN(200, return_sequences=True) layer in Keras. Before you feed it to the RNN, you need to reshape the data into a 3D tensor, so it becomes N×700×1.
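
The reshape step might look like this in NumPy. The value of N and the random data are placeholders, not part of the original answer.

```python
import numpy as np

N = 4  # hypothetical number of samples (rows)
data_2d = np.random.rand(N, 700)  # N rows, 700 columns

# Add a trailing feature axis: (samples, timesteps, features).
data_3d = data_2d.reshape(N, 700, 1)
print(data_3d.shape)  # (4, 700, 1)
```

The values are unchanged; each of the 700 columns simply becomes a timestep carrying one feature.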

 


In an RNN, your columns (the "700 columns") are the timesteps. Your data is processed from t=1 to t=700. After feeding the data to the RNN, it now has 700 outputs, h1 to h700, not h1 to h200. Remember that the shape of your data is now N×700×200, which is samples (the rows) × timesteps (the columns) × channels.
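
To make the shapes concrete, here is a NumPy sketch of a simple tanh recurrence that keeps the output of every timestep. It approximates what a SimpleRNN with 200 units computes rather than reproducing Keras itself; the random weights, the scale factor, and N are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, F, H = 4, 700, 1, 200  # samples, timesteps, input features, RNN units

x = rng.standard_normal((N, T, F))
W_x = rng.standard_normal((F, H)) * 0.1  # input-to-hidden weights (random stand-ins)
W_h = rng.standard_normal((H, H)) * 0.1  # hidden-to-hidden weights (random stand-ins)
b = np.zeros(H)

h = np.zeros((N, H))           # initial hidden state
outputs = np.empty((N, T, H))  # one hidden vector per timestep
for t in range(T):
    # The same weights are reused at every timestep.
    h = np.tanh(x[:, t, :] @ W_x + h @ W_h + b)
    outputs[:, t, :] = h

print(outputs.shape)     # (4, 700, 200): all 700 outputs are kept
last_only = outputs[:, -1, :]
print(last_only.shape)   # (4, 200): only the final timestep
```

The number 200 is the size of each output vector, not the number of outputs; there is one such vector for each of the 700 timesteps.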

Then, when you apply a TimeDistributedDense, you're applying a Dense layer on each timestep, which means you're applying a Dense layer on each of h1, h2, ..., ht respectively. In other words, you're applying the fully-connected operation over the channels (the "200") of each timestep separately, from h1 to h700: from the 1st 1×1×200 slice to the 700th 1×1×200 slice.
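
The per-timestep Dense operation can be sketched in NumPy as follows. The output width UNITS and the random values standing in for RNN outputs and Dense weights are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, H, UNITS = 4, 700, 200, 10  # UNITS is a hypothetical Dense width

h = rng.standard_normal((N, T, H))   # stand-in for the RNN outputs
W = rng.standard_normal((H, UNITS))  # one shared weight matrix
b = rng.standard_normal(UNITS)       # one shared bias vector

# Apply the same Dense transform to each timestep, one at a time.
per_step = np.stack([h[:, t, :] @ W + b for t in range(T)], axis=1)

# Equivalent vectorized form: matmul acts on the last axis only.
vectorized = h @ W + b

print(per_step.shape)                      # (4, 700, 10)
print(np.allclose(per_step, vectorized))   # True
```

Only the channel axis (200) is mixed by the weights; the timestep axis (700) passes through untouched.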

Why are we doing this? Because you don't want to flatten the RNN output.

Why not flatten the RNN output? Because you want to keep each timestep's values separate.

Why keep each timestep's values separate? Because:

  • you only want values to interact within their own timestep
  • you don't want arbitrary interaction between different timesteps and channels.
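
A quick parameter count shows what flattening would cost. The Dense width UNITS is a hypothetical choice; the other sizes come from the example above.

```python
T, H = 700, 200  # timesteps and channels from the example
UNITS = 10       # hypothetical Dense width

# Time-distributed Dense: one (H x UNITS) matrix plus biases,
# shared across all 700 timesteps.
time_distributed_params = H * UNITS + UNITS

# Flatten followed by Dense: every (timestep, channel) pair gets its
# own weight to every unit, so timesteps and channels are all mixed.
flatten_dense_params = T * H * UNITS + UNITS

print(time_distributed_params)  # 2010
print(flatten_dense_params)     # 1400010
```

Besides the much larger weight count, the flattened variant produces a single vector of UNITS values per sample instead of one per timestep, so the sequence structure is lost.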

 

 

References:

https://datascience.stackexchange.com/questions/10836/the-difference-between-dense-and-timedistributeddense-of-keras

https://github.com/keras-team/keras/blob/master/keras/layers/wrappers.py#L43

https://github.com/keras-team/keras/issues/1029

https://stackoverflow.com/questions/42398645/timedistributed-vs-timedistributeddense-keras

Reposted from: https://www.cnblogs.com/jins-note/p/10637805.html
