{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "keras_time_distribution.ipynb",
"version": "0.3.2",
"provenance": [],
"collapsed_sections": []
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"[View in Colaboratory](https://colab.research.google.com/github/gmihaila/deep_learning_toolbox/blob/master/keras_time_distribution.ipynb)"
]
},
{
"metadata": {
"id": "gd0OaMEViQ7c",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Keras Time Distribution\n",
"\n",
"A short explanation of the Time Distribution layer\n",
"\n",
"The most common scenario for using TimeDistributedDense is using a recurrent NN for tagging task.e.g. POS labeling or slot filling task.\n",
"\n",
"In this kind of task:\n",
"For each sample, the input is a sequence (a1,a2,a3,a4...aN) and the output is a sequence (b1,b2,b3,b4...bN) with the same length. bi could be viewed as the label of ai.\n",
"Push a1 into a recurrent nn to get output b1. Than push a2 and the hidden output of a1 to get b2...\n",
"\n",
"If you want to model this by Keras, you just need to used a TimeDistributedDense after a RNN or LSTM layer(with return_sequence=True) to make the cost function is calculated on all time-step output. If you don't use TimeDistributedDense ans set the return_sequence of RNN=False, then the cost is calculated on the last time-step output and you could only get the last bN.\n",
"\n",
"RNNs are capable of a number of different types of input / output combinations, as seen below:\n",
"\n",
"![alt text](https://camo.githubusercontent.com/feb6f5f4f5f39c0d4e6f5657f6224db45258fe81/68747470733a2f2f6b617270617468792e6769746875622e696f2f6173736574732f726e6e2f64696167732e6a706567)\n",
"\n",
"The TimeDistributedDense layer allows you to build models that do the one-to-many and many-to-many architectures. This is because the output function for each of the \"many\" outputs must be the same function applied to each timestep. The TimeDistributedDense layers allows you to apply that Dense function across every output over time. This is important because it needs to be the same dense function applied at every time step.\n",
"\n",
"If you didn't not use this, you would only have one final output - and so you use a normal dense layer. This means you are doing either a one-to-one or a many-to-one network, since there will only be one dense layer for the output."
]
},
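{
"metadata": {},
"cell_type": "markdown",
"source": [
"The sketch below (not part of the original issue; all layer sizes are illustrative, assuming the standalone keras package) contrasts the two cases described above: with return_sequences=True the LSTM emits one vector per time step, so TimeDistributed(Dense) can produce a label for every ai; with return_sequences=False only the last time step survives, so a plain Dense layer yields only bN."
]
},
{
"metadata": {},
"cell_type": "code",
"source": [
"# Sketch only: illustrative sizes, standalone Keras API\n",
"from keras.models import Sequential\n",
"from keras.layers import LSTM, Dense, TimeDistributed\n",
"\n",
"timesteps, n_features, n_units = 10, 8, 5\n",
"\n",
"# return_sequences=True -> one output vector per time step\n",
"many_to_many = Sequential([\n",
"    LSTM(32, return_sequences=True, input_shape=(timesteps, n_features)),\n",
"    TimeDistributed(Dense(n_units, activation='softmax'))\n",
"])\n",
"print(many_to_many.output_shape)  # (None, 10, 5): a label bi for every ai\n",
"\n",
"# return_sequences=False (the default) -> only the last time step's output\n",
"many_to_one = Sequential([\n",
"    LSTM(32, input_shape=(timesteps, n_features)),\n",
"    Dense(n_units, activation='softmax')\n",
"])\n",
"print(many_to_one.output_shape)  # (None, 5): only the last bN"
],
"execution_count": 0,
"outputs": []
},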
{
"metadata": {
"id": "StmoOUNbiNcK",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"TimeDistributed(Dense(n_units, activation='softmax'))"
],
"execution_count": 0,
"outputs": []
},
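{
"metadata": {},
"cell_type": "markdown",
"source": [
"Putting it all together, here is a minimal end-to-end sketch of the many-to-many tagging setup (again not from the original issue; n_tags, the sizes, and the random dummy data are all assumptions for illustration). The loss is computed on every time step's output, which is exactly what TimeDistributed buys you."
]
},
{
"metadata": {},
"cell_type": "code",
"source": [
"import numpy as np\n",
"from keras.models import Sequential\n",
"from keras.layers import LSTM, Dense, TimeDistributed\n",
"\n",
"n_samples, timesteps, n_features, n_tags = 100, 10, 8, 5  # illustrative sizes\n",
"\n",
"model = Sequential([\n",
"    LSTM(32, return_sequences=True, input_shape=(timesteps, n_features)),\n",
"    # The same Dense(softmax) is applied independently at every time step\n",
"    TimeDistributed(Dense(n_tags, activation='softmax'))\n",
"])\n",
"model.compile(optimizer='adam', loss='categorical_crossentropy')\n",
"\n",
"# Dummy data: one one-hot tag bi per input step ai\n",
"X = np.random.random((n_samples, timesteps, n_features))\n",
"y = np.eye(n_tags)[np.random.randint(n_tags, size=(n_samples, timesteps))]\n",
"model.fit(X, y, epochs=1, batch_size=32)"
],
"execution_count": 0,
"outputs": []
},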
{
"metadata": {
"id": "kFVfcI66jDxH",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"ref: https://github.com/keras-team/keras/issues/1029 "
]
}
]
}