{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "---\n", "---\n", "# Sequence Modeling: Recurrent and Recursive Nets\n", "---\n", "---\n", "\n", "![image.png](img/chengjun.png)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "**Recurrent neural networks**, or RNNs (Rumelhart et al., 1986a), are a family of neural networks for processing sequential data. \n", "\n", "Much as a convolutional network is a neural network that is specialized for processing a grid of values $X$ such as an image, a recurrent neural network is a neural network that is specialized for processing a sequence of values $x^{(1)}, \\ldots, x^{(τ)}$. \n", "\n", "\n", "Most recurrent networks can also process sequences of *variable length*." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "**RNN Applications: series of data**\n", "\n", "- Time series prediction\n", "- Language modeling (text generation)\n", "- Text sentiment analysis \n", "- Named entity recognition\n", "- Translation\n", "- Speech recognition \n", "- Anomaly detection in time series \n", "- Music composition\n", "- ...\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![image.png](img/dl42.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![image.png](img/dl43.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![rnn.gif](img/rnn.gif)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![image.png](img/dl44.png)\n", "\n", "http://karpathy.github.io/2015/05/21/rnn-effectiveness/" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { 
"slideshow": { "slide_type": "subslide" } }, "source": [ "![image.png](img/dl45.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![image.png](img/dl46.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## RNN in a Nutshell\n", "\n", "![image.png](img/dl47.png)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2020-08-14T02:15:44.625106Z", "start_time": "2020-08-14T02:15:43.921157Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "import torch\n", "import torch.nn as nn\n", "from torch.autograd import Variable\n", "\n", "# One hot encoding for each char in 'hello'\n", "h = [1, 0, 0, 0]\n", "e = [0, 1, 0, 0]\n", "l = [0, 0, 1, 0]\n", "o = [0, 0, 0, 1]" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2020-08-14T02:16:18.224895Z", "start_time": "2020-08-14T02:16:18.217153Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# One cell RNN input_dim (4) -> output_dim (2). 
sequence: 5\n", "cell = nn.RNN(input_size=4, hidden_size=2, batch_first=True)\n", "\n", "# Initial hidden state: (num_layers * num_directions, batch, hidden_size), regardless of batch_first\n", "hidden = Variable(torch.randn(1, 1, 2))" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2020-08-14T02:17:19.798441Z", "start_time": "2020-08-14T02:17:19.770322Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sequence input size torch.Size([1, 5, 4]) out size torch.Size([1, 5, 2])\n" ] } ], "source": [ "# Build the sequence 'hello' from the one-hot vectors: shape (seq_len, input_size)\n", "inputs = Variable(torch.Tensor([h, e, l, l, o]))\n", "# Reshape to (batch, seq_len, input_size), the layout expected when batch_first=True\n", "inputs = inputs.view(1, 5, -1)\n", "# Propagate input through RNN\n", "out, hidden = cell(inputs, hidden)\n", "print(\"sequence input size\", inputs.size(), \"out size\", out.size())" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2020-08-14T02:17:59.151950Z", "start_time": "2020-08-14T02:17:59.143847Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "batch input size torch.Size([3, 5, 4]) out size torch.Size([3, 5, 2])\n" ] } ], "source": [ "# One cell RNN input_dim (4) -> output_dim (2). 
sequence: 5, batch 3\n", "# One batch of 3 sequences: 'hello', 'eolll', 'lleel'\n", "# shape = (3, 5, 4)\n", "inputs = Variable(torch.Tensor([[h, e, l, l, o],\n", "                                [e, o, l, l, l],\n", "                                [l, l, e, e, l]]))\n", "\n", "# hidden: (num_layers * num_directions, batch, hidden_size), regardless of batch_first\n", "hidden = Variable(torch.randn(1, 3, 2))\n", "\n", "# Propagate input through RNN\n", "# Input: (batch, seq_len, input_size) when batch_first=True\n", "# B x S x I\n", "out, hidden = cell(inputs, hidden)\n", "print(\"batch input size\", inputs.size(), \"out size\", out.size())" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2020-06-01T06:33:22.833157Z", "start_time": "2020-06-01T06:33:22.828395Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "tensor([[[1., 0., 0., 0.],\n", " [0., 1., 0., 0.],\n", " [0., 0., 1., 0.],\n", " [0., 0., 1., 0.],\n", " [0., 0., 0., 1.]],\n", "\n", " [[0., 1., 0., 0.],\n", " [0., 0., 0., 1.],\n", " [0., 0., 1., 0.],\n", " [0., 0., 1., 0.],\n", " [0., 0., 1., 0.]],\n", "\n", " [[0., 0., 1., 0.],\n", " [0., 0., 1., 0.],\n", " [0., 1., 0., 0.],\n", " [0., 1., 0., 0.],\n", " [0., 0., 1., 0.]]])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "inputs" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2020-06-01T06:34:41.312481Z", "start_time": "2020-06-01T06:34:41.307833Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "tensor([[[0.4196, 0.8705],\n", " [0.3543, 0.2878],\n", " [0.5785, 0.2704]]], grad_fn=)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hidden" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2020-06-01T06:33:14.483945Z", "start_time": "2020-06-01T06:33:14.477237Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": 
[ { "data": { "text/plain": [ "tensor([[[-0.4925, 0.2061],\n", " [-0.0136, 0.5975],\n", " [ 0.5275, 0.0605],\n", " [ 0.1702, 0.2726],\n", " [ 0.4196, 0.8705]],\n", "\n", " [[-0.6582, 0.6415],\n", " [ 0.6723, 0.6984],\n", " [ 0.4791, 0.5166],\n", " [ 0.4216, 0.3644],\n", " [ 0.3543, 0.2878]],\n", "\n", " [[ 0.4837, -0.5505],\n", " [-0.1786, 0.0664],\n", " [-0.1517, 0.7002],\n", " [ 0.2128, 0.7917],\n", " [ 0.5785, 0.2704]]], grad_fn=)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "out" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![image.png](img/dl48.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![image.png](img/dl49.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![image.png](img/dl50.png)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2020-08-14T02:20:13.089851Z", "start_time": "2020-08-14T02:20:13.079932Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "import sys\n", "import torch\n", "import torch.nn as nn\n", "from torch.autograd import Variable\n", "\n", "torch.manual_seed(777) # reproducibility\n", "#             0    1    2    3    4\n", "idx2char = ['h', 'i', 'e', 'l', 'o']\n", "\n", "# Teach hihell -> ihello\n", "x_data = [0, 1, 0, 2, 3, 3] # hihell\n", "one_hot_lookup = [[1, 0, 0, 0, 0], # 0\n", "                  [0, 1, 0, 0, 0], # 1\n", "                  [0, 0, 1, 0, 0], # 2\n", "                  [0, 0, 0, 1, 0], # 3\n", "                  [0, 0, 0, 0, 1]] # 4\n", "\n", "y_data = [1, 0, 2, 3, 3, 4] # ihello\n", "x_one_hot = [one_hot_lookup[x] for x in x_data]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2020-08-14T02:20:42.148489Z", "start_time": "2020-08-14T02:20:42.138397Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "tensor([1, 0, 2, 3, 3, 4])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# There is only one batch of samples, so convert it to tensors once\n", "inputs = Variable(torch.Tensor(x_one_hot))\n", "labels = Variable(torch.LongTensor(y_data))\n", "\n", "num_classes = 5\n", "input_size = 5 # one-hot size\n", "hidden_size = 5 # output from the RNN. 5 to directly predict one-hot\n", "batch_size = 1 # one sentence\n", "sequence_length = 1 # Note: feed one character at a time\n", "num_layers = 1 # one-layer rnn\n", "labels" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2020-08-14T02:22:10.188154Z", "start_time": "2020-08-14T02:22:10.183466Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Model(nn.Module):\n", "    def __init__(self):\n", "        super(Model, self).__init__()\n", "        self.rnn = nn.RNN(input_size=input_size,\n", "                          hidden_size=hidden_size, batch_first=True)\n", "\n", "    def forward(self, hidden, x):\n", "        # Reshape input (batch first)\n", "        x = x.view(batch_size, sequence_length, input_size)\n", "        # Propagate input through RNN\n", "        # Input: (batch, seq_len, input_size)\n", "        # hidden: (num_layers * num_directions, batch, hidden_size)\n", "        out, hidden = self.rnn(x, hidden)\n", "        return hidden, out.view(-1, num_classes)\n", "\n", "    def init_hidden(self):\n", "        # Initialize the hidden state (a vanilla RNN has no cell state)\n", "        # Shape: (num_layers * num_directions, batch, hidden_size)\n", "        return Variable(torch.zeros(num_layers, batch_size, hidden_size))" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2020-08-14T02:22:18.046504Z", "start_time": "2020-08-14T02:22:18.042001Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model(\n", " (rnn): RNN(5, 5, batch_first=True)\n", ")\n" ] } ], "source": [ "# Instantiate RNN model\n", "model = 
Model()\n", "print(model)\n", "\n", "# Set loss and optimizer function\n", "# CrossEntropyLoss = LogSoftmax + NLLLoss\n", "criterion = nn.CrossEntropyLoss()\n", "optimizer = torch.optim.Adam(model.parameters(), lr=0.1)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2020-08-14T02:22:24.405144Z", "start_time": "2020-08-14T02:22:24.048391Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "predicted string: llllll, epoch: 1, loss: 10.155\n", "predicted string: llllll, epoch: 2, loss: 9.137\n", "predicted string: llllll, epoch: 3, loss: 8.355\n", "predicted string: llllll, epoch: 4, loss: 7.577\n", "predicted string: llllll, epoch: 5, loss: 6.876\n", "predicted string: lhelll, epoch: 6, loss: 6.327\n", "predicted string: ihelll, epoch: 7, loss: 6.014\n", "predicted string: ihelll, epoch: 8, loss: 5.787\n", "predicted string: ihelll, epoch: 9, loss: 5.477\n", "predicted string: ihelll, epoch: 10, loss: 5.274\n", "predicted string: ihelll, epoch: 11, loss: 5.041\n", "predicted string: ihello, epoch: 12, loss: 4.827\n", "predicted string: ihello, epoch: 13, loss: 4.676\n", "predicted string: ihello, epoch: 14, loss: 4.550\n", "predicted string: ihello, epoch: 15, loss: 4.430\n", "predicted string: ihello, epoch: 16, loss: 4.305\n", "predicted string: ihello, epoch: 17, loss: 4.164\n", "predicted string: ihelll, epoch: 18, loss: 4.003\n", "predicted string: ihelll, epoch: 19, loss: 3.860\n", "predicted string: ihelll, epoch: 20, loss: 3.879\n", "predicted string: ihelll, epoch: 21, loss: 3.768\n", "predicted string: ihelll, epoch: 22, loss: 3.642\n", "predicted string: ihelll, epoch: 23, loss: 3.599\n", "predicted string: ihello, epoch: 24, loss: 3.577\n", "predicted string: ihello, epoch: 25, loss: 3.544\n", "predicted string: ihello, epoch: 26, loss: 3.498\n", "predicted string: ihello, epoch: 27, loss: 3.439\n", "predicted string: ihello, epoch: 28, loss: 
3.371\n", "predicted string: ihello, epoch: 29, loss: 3.303\n", "predicted string: ihello, epoch: 30, loss: 3.240\n", "predicted string: ihello, epoch: 31, loss: 3.162\n", "predicted string: ihello, epoch: 32, loss: 3.147\n", "predicted string: ihello, epoch: 33, loss: 3.178\n", "predicted string: ihello, epoch: 34, loss: 3.116\n", "predicted string: ihello, epoch: 35, loss: 3.042\n", "predicted string: ihello, epoch: 36, loss: 3.020\n", "predicted string: ihello, epoch: 37, loss: 3.015\n", "predicted string: ihello, epoch: 38, loss: 2.998\n", "predicted string: ihello, epoch: 39, loss: 2.977\n", "predicted string: ihello, epoch: 40, loss: 2.966\n", "predicted string: ihello, epoch: 41, loss: 2.961\n", "predicted string: ihello, epoch: 42, loss: 2.950\n", "predicted string: ihello, epoch: 43, loss: 2.930\n", "predicted string: ihello, epoch: 44, loss: 2.904\n", "predicted string: ihello, epoch: 45, loss: 2.888\n", "predicted string: ihello, epoch: 46, loss: 2.888\n", "predicted string: ihello, epoch: 47, loss: 2.879\n", "predicted string: ihello, epoch: 48, loss: 2.860\n", "predicted string: ihello, epoch: 49, loss: 2.857\n", "predicted string: ihello, epoch: 50, loss: 2.859\n", "predicted string: ihello, epoch: 51, loss: 2.852\n", "predicted string: ihello, epoch: 52, loss: 2.840\n", "predicted string: ihello, epoch: 53, loss: 2.834\n", "predicted string: ihello, epoch: 54, loss: 2.834\n", "predicted string: ihello, epoch: 55, loss: 2.824\n", "predicted string: ihello, epoch: 56, loss: 2.817\n", "predicted string: ihello, epoch: 57, loss: 2.817\n", "predicted string: ihello, epoch: 58, loss: 2.814\n", "predicted string: ihello, epoch: 59, loss: 2.808\n", "predicted string: ihello, epoch: 60, loss: 2.805\n", "predicted string: ihello, epoch: 61, loss: 2.805\n", "predicted string: ihello, epoch: 62, loss: 2.801\n", "predicted string: ihello, epoch: 63, loss: 2.796\n", "predicted string: ihello, epoch: 64, loss: 2.795\n", "predicted string: ihello, epoch: 65, loss: 
2.793\n", "predicted string: ihello, epoch: 66, loss: 2.789\n", "predicted string: ihello, epoch: 67, loss: 2.786\n", "predicted string: ihello, epoch: 68, loss: 2.786\n", "predicted string: ihello, epoch: 69, loss: 2.783\n", "predicted string: ihello, epoch: 70, loss: 2.780\n", "predicted string: ihello, epoch: 71, loss: 2.780\n", "predicted string: ihello, epoch: 72, loss: 2.778\n", "predicted string: ihello, epoch: 73, loss: 2.776\n", "predicted string: ihello, epoch: 74, loss: 2.775\n", "predicted string: ihello, epoch: 75, loss: 2.774\n", "predicted string: ihello, epoch: 76, loss: 2.772\n", "predicted string: ihello, epoch: 77, loss: 2.770\n", "predicted string: ihello, epoch: 78, loss: 2.769\n", "predicted string: ihello, epoch: 79, loss: 2.768\n", "predicted string: ihello, epoch: 80, loss: 2.766\n", "predicted string: ihello, epoch: 81, loss: 2.765\n", "predicted string: ihello, epoch: 82, loss: 2.764\n", "predicted string: ihello, epoch: 83, loss: 2.763\n", "predicted string: ihello, epoch: 84, loss: 2.762\n", "predicted string: ihello, epoch: 85, loss: 2.761\n", "predicted string: ihello, epoch: 86, loss: 2.759\n", "predicted string: ihello, epoch: 87, loss: 2.759\n", "predicted string: ihello, epoch: 88, loss: 2.758\n", "predicted string: ihello, epoch: 89, loss: 2.757\n", "predicted string: ihello, epoch: 90, loss: 2.756\n", "predicted string: ihello, epoch: 91, loss: 2.755\n", "predicted string: ihello, epoch: 92, loss: 2.754\n", "predicted string: ihello, epoch: 93, loss: 2.753\n", "predicted string: ihello, epoch: 94, loss: 2.752\n", "predicted string: ihello, epoch: 95, loss: 2.751\n", "predicted string: ihello, epoch: 96, loss: 2.750\n", "predicted string: ihello, epoch: 97, loss: 2.750\n", "predicted string: ihello, epoch: 98, loss: 2.749\n", "predicted string: ihello, epoch: 99, loss: 2.748\n", "predicted string: ihello, epoch: 100, loss: 2.747\n", "Learning finished!\n" ] } ], "source": [ "# Train the model\n", "for epoch in range(100):\n", " 
# Reset the gradients, the accumulated loss, and the hidden state at the start of each epoch\n", "    optimizer.zero_grad()\n", "    loss = 0\n", "    hidden = model.init_hidden()\n", "\n", "    sys.stdout.write(\"predicted string: \")\n", "    for input, label in zip(inputs, labels):\n", "        # Feed one character at a time, carrying the hidden state forward\n", "        hidden, output = model(hidden, input)\n", "        val, idx = output.max(1)\n", "        sys.stdout.write(idx2char[idx.data[0]])\n", "        # Sum the per-character losses over the whole sequence\n", "        loss += criterion(output, label.view(-1))\n", "\n", "    print(\", epoch: %d, loss: %1.3f\" % (epoch + 1, loss.item()))\n", "\n", "    loss.backward()\n", "    optimizer.step()\n", "\n", "print(\"Learning finished!\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Unfolding one to n sequences\n", "\n", "![image.png](img/dl51.png)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2020-06-01T08:25:01.865952Z", "start_time": "2020-06-01T08:25:01.860982Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "idx2char = ['h', 'i', 'e', 'l', 'o']\n", "# Teach hihell -> ihello\n", "x_data = [[0, 1, 0, 2, 3, 3]] # hihell\n", "\n", "# Note: x_one_hot now has shape (1, 6, 5): (batch, seq_len, input_size)\n", "x_one_hot = [[[1, 0, 0, 0, 0], # h 0\n", "              [0, 1, 0, 0, 0], # i 1\n", "              [1, 0, 0, 0, 0], # h 0\n", "              [0, 0, 1, 0, 0], # e 2\n", "              [0, 0, 0, 1, 0], # l 3\n", "              [0, 0, 0, 1, 0]]] # l 3\n", "\n", "y_data = [1, 0, 2, 3, 3, 4] # ihello" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "ExecuteTime": { "end_time": "2020-06-01T08:25:02.313130Z", "start_time": "2020-06-01T08:25:02.309602Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# There is only one batch of samples, so convert it to tensors once\n", "inputs = Variable(torch.Tensor(x_one_hot))\n", "labels = Variable(torch.LongTensor(y_data))\n", "\n", "num_classes = 5\n", "input_size = 5 # one-hot size\n", "hidden_size = 5 # output from the RNN. 
5 to directly predict one-hot\n", "batch_size = 1 # one sentence\n", "sequence_length = 6 # Note: |ihello| == 6\n", "num_layers = 1 # one-layer rnn" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2020-08-14T02:24:12.800185Z", "start_time": "2020-08-14T02:24:12.794329Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class RNN(nn.Module):\n", "    def __init__(self, num_classes, input_size, hidden_size, num_layers):\n", "        super(RNN, self).__init__()\n", "        self.num_classes = num_classes\n", "        self.num_layers = num_layers\n", "        self.input_size = input_size\n", "        self.hidden_size = hidden_size\n", "        # sequence_length is read from the notebook-level setting above\n", "        self.sequence_length = sequence_length\n", "        self.rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size,\n", "                          batch_first=True)\n", "\n", "    def forward(self, x):\n", "        # Initialize the hidden state (a vanilla RNN has no cell state)\n", "        # (num_layers * num_directions, batch, hidden_size), regardless of batch_first\n", "        h_0 = Variable(torch.zeros(\n", "            self.num_layers, x.size(0), self.hidden_size))\n", "        # Reshape input; view() returns a new tensor, so the result must be assigned\n", "        x = x.view(x.size(0), self.sequence_length, self.input_size)\n", "        # Propagate input through RNN\n", "        # Input: (batch, seq_len, input_size)\n", "        # h_0: (num_layers * num_directions, batch, hidden_size)\n", "        out, _ = self.rnn(x, h_0)\n", "        return out.view(-1, self.num_classes)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": "2020-06-01T08:25:03.394452Z", "start_time": "2020-06-01T08:25:03.389212Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "RNN(\n", " (rnn): RNN(5, 5, batch_first=True)\n", ")\n" ] } ], "source": [ "# Instantiate RNN model\n", "rnn = RNN(num_classes, input_size, hidden_size, num_layers)\n", "print(rnn)\n", "\n", "# Set loss and optimizer function\n", "# CrossEntropyLoss = LogSoftmax + NLLLoss\n", "criterion = torch.nn.CrossEntropyLoss()\n", "optimizer = torch.optim.Adam(rnn.parameters(), lr=0.1)" ] }, { 
"cell_type": "code", "execution_count": 26, "metadata": { "ExecuteTime": { "end_time": "2020-06-01T08:25:04.341055Z", "start_time": "2020-06-01T08:25:04.173109Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "epoch: 1, loss: 1.544\n", "Predicted string: lellll\n", "epoch: 2, loss: 1.315\n", "Predicted string: lillll\n", "epoch: 3, loss: 1.169\n", "Predicted string: lieloo\n", "epoch: 4, loss: 1.041\n", "Predicted string: liello\n", "epoch: 5, loss: 0.942\n", "Predicted string: lhello\n", "epoch: 6, loss: 0.865\n", "Predicted string: ihello\n", "epoch: 7, loss: 0.801\n", "Predicted string: ihelll\n", "epoch: 8, loss: 0.752\n", "Predicted string: ihelll\n", "epoch: 9, loss: 0.721\n", "Predicted string: ihelll\n", "epoch: 10, loss: 0.693\n", "Predicted string: ihelll\n", "epoch: 11, loss: 0.667\n", "Predicted string: ihelll\n", "epoch: 12, loss: 0.648\n", "Predicted string: ihelll\n", "epoch: 13, loss: 0.631\n", "Predicted string: ihelll\n", "epoch: 14, loss: 0.617\n", "Predicted string: ihelll\n", "epoch: 15, loss: 0.605\n", "Predicted string: ihelll\n", "epoch: 16, loss: 0.595\n", "Predicted string: ihelll\n", "epoch: 17, loss: 0.588\n", "Predicted string: ihelll\n", "epoch: 18, loss: 0.582\n", "Predicted string: ihelll\n", "epoch: 19, loss: 0.574\n", "Predicted string: ihelll\n", "epoch: 20, loss: 0.568\n", "Predicted string: ihelll\n", "epoch: 21, loss: 0.564\n", "Predicted string: ihelll\n", "epoch: 22, loss: 0.561\n", "Predicted string: ihelll\n", "epoch: 23, loss: 0.557\n", "Predicted string: ihelll\n", "epoch: 24, loss: 0.554\n", "Predicted string: ihelll\n", "epoch: 25, loss: 0.552\n", "Predicted string: ihelll\n", "epoch: 26, loss: 0.550\n", "Predicted string: ihelll\n", "epoch: 27, loss: 0.547\n", "Predicted string: ihelll\n", "epoch: 28, loss: 0.545\n", "Predicted string: ihelll\n", "epoch: 29, loss: 0.543\n", "Predicted string: ihelll\n", "epoch: 30, loss: 0.542\n", "Predicted string: 
ihelll\n", "epoch: 31, loss: 0.540\n", "Predicted string: ihelll\n", "epoch: 32, loss: 0.539\n", "Predicted string: ihelll\n", "epoch: 33, loss: 0.538\n", "Predicted string: ihelll\n", "epoch: 34, loss: 0.537\n", "Predicted string: ihelll\n", "epoch: 35, loss: 0.536\n", "Predicted string: ihelll\n", "epoch: 36, loss: 0.535\n", "Predicted string: ihelll\n", "epoch: 37, loss: 0.535\n", "Predicted string: ihelll\n", "epoch: 38, loss: 0.534\n", "Predicted string: ihelll\n", "epoch: 39, loss: 0.533\n", "Predicted string: ihelll\n", "epoch: 40, loss: 0.533\n", "Predicted string: ihelll\n", "epoch: 41, loss: 0.533\n", "Predicted string: ihelll\n", "epoch: 42, loss: 0.532\n", "Predicted string: ihelll\n", "epoch: 43, loss: 0.532\n", "Predicted string: ihelll\n", "epoch: 44, loss: 0.531\n", "Predicted string: ihelll\n", "epoch: 45, loss: 0.531\n", "Predicted string: ihelll\n", "epoch: 46, loss: 0.531\n", "Predicted string: ihelll\n", "epoch: 47, loss: 0.530\n", "Predicted string: ihelll\n", "epoch: 48, loss: 0.530\n", "Predicted string: ihelll\n", "epoch: 49, loss: 0.530\n", "Predicted string: ihelll\n", "epoch: 50, loss: 0.529\n", "Predicted string: ihelll\n", "epoch: 51, loss: 0.529\n", "Predicted string: ihelll\n", "epoch: 52, loss: 0.529\n", "Predicted string: ihelll\n", "epoch: 53, loss: 0.529\n", "Predicted string: ihelll\n", "epoch: 54, loss: 0.529\n", "Predicted string: ihelll\n", "epoch: 55, loss: 0.528\n", "Predicted string: ihelll\n", "epoch: 56, loss: 0.528\n", "Predicted string: ihelll\n", "epoch: 57, loss: 0.528\n", "Predicted string: ihelll\n", "epoch: 58, loss: 0.528\n", "Predicted string: ihelll\n", "epoch: 59, loss: 0.528\n", "Predicted string: ihelll\n", "epoch: 60, loss: 0.528\n", "Predicted string: ihelll\n", "epoch: 61, loss: 0.527\n", "Predicted string: ihelll\n", "epoch: 62, loss: 0.527\n", "Predicted string: ihelll\n", "epoch: 63, loss: 0.527\n", "Predicted string: ihelll\n", "epoch: 64, loss: 0.527\n", "Predicted string: ihelll\n", "epoch: 65, 
loss: 0.527\n", "Predicted string: ihelll\n", "epoch: 66, loss: 0.527\n", "Predicted string: ihelll\n", "epoch: 67, loss: 0.526\n", "Predicted string: ihelll\n", "epoch: 68, loss: 0.526\n", "Predicted string: ihelll\n", "epoch: 69, loss: 0.526\n", "Predicted string: ihelll\n", "epoch: 70, loss: 0.526\n", "Predicted string: ihelll\n", "epoch: 71, loss: 0.526\n", "Predicted string: ihelll\n", "epoch: 72, loss: 0.526\n", "Predicted string: ihelll\n", "epoch: 73, loss: 0.526\n", "Predicted string: ihelll\n", "epoch: 74, loss: 0.526\n", "Predicted string: ihelll\n", "epoch: 75, loss: 0.525\n", "Predicted string: ihelll\n", "epoch: 76, loss: 0.525\n", "Predicted string: ihelll\n", "epoch: 77, loss: 0.525\n", "Predicted string: ihelll\n", "epoch: 78, loss: 0.525\n", "Predicted string: ihelll\n", "epoch: 79, loss: 0.525\n", "Predicted string: ihelll\n", "epoch: 80, loss: 0.525\n", "Predicted string: ihelll\n", "epoch: 81, loss: 0.525\n", "Predicted string: ihelll\n", "epoch: 82, loss: 0.525\n", "Predicted string: ihelll\n", "epoch: 83, loss: 0.524\n", "Predicted string: ihelll\n", "epoch: 84, loss: 0.524\n", "Predicted string: ihelll\n", "epoch: 85, loss: 0.524\n", "Predicted string: ihelll\n", "epoch: 86, loss: 0.524\n", "Predicted string: ihelll\n", "epoch: 87, loss: 0.524\n", "Predicted string: ihelll\n", "epoch: 88, loss: 0.524\n", "Predicted string: ihelll\n", "epoch: 89, loss: 0.524\n", "Predicted string: ihelll\n", "epoch: 90, loss: 0.523\n", "Predicted string: ihello\n", "epoch: 91, loss: 0.523\n", "Predicted string: ihello\n", "epoch: 92, loss: 0.523\n", "Predicted string: ihello\n", "epoch: 93, loss: 0.522\n", "Predicted string: ihello\n", "epoch: 94, loss: 0.522\n", "Predicted string: ihello\n", "epoch: 95, loss: 0.521\n", "Predicted string: ihello\n", "epoch: 96, loss: 0.520\n", "Predicted string: ihello\n", "epoch: 97, loss: 0.518\n", "Predicted string: ihello\n", "epoch: 98, loss: 0.514\n", "Predicted string: ihello\n", "epoch: 99, loss: 0.510\n", "Predicted 
string: ihello\n", "epoch: 100, loss: 0.513\n", "Predicted string: ihello\n", "Learning finished!\n" ] } ], "source": [ "# Train the model\n", "for epoch in range(100):\n", "    outputs = rnn(inputs)\n", "    optimizer.zero_grad()\n", "    loss = criterion(outputs, labels)\n", "    loss.backward()\n", "    optimizer.step()\n", "    # Pick the highest-scoring class at each time step\n", "    _, idx = outputs.max(1)\n", "    idx = idx.data.numpy()\n", "    result_str = [idx2char[c] for c in idx.squeeze()]\n", "    print(\"epoch: %d, loss: %1.3f\" % (epoch + 1, loss.item()))\n", "    print(\"Predicted string: \", ''.join(result_str))\n", "\n", "print(\"Learning finished!\")\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## RNN with Embeddings\n", "\n", "![image.png](img/dl52.png)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "ExecuteTime": { "end_time": "2020-06-01T09:50:31.516078Z", "start_time": "2020-06-01T09:50:31.510366Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Lab 12 RNN\n", "import torch\n", "import torch.nn as nn\n", "from torch.autograd import Variable\n", "\n", "torch.manual_seed(777) # reproducibility\n", "\n", "idx2char = ['h', 'i', 'e', 'l', 'o']\n", "\n", "# Teach hihell -> ihello\n", "x_data = [[0, 1, 0, 2, 3, 3]] # hihell\n", "y_data = [1, 0, 2, 3, 3, 4] # ihello\n", "\n", "# There is only one batch of samples, so convert it to tensors once\n", "# The inputs are integer indices (LongTensor) because nn.Embedding looks them up\n", "inputs = Variable(torch.LongTensor(x_data))\n", "labels = Variable(torch.LongTensor(y_data))" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "ExecuteTime": { "end_time": "2020-06-01T09:50:32.208432Z", "start_time": "2020-06-01T09:50:32.205454Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "num_classes = 5\n", "input_size = 5\n", "# Note: add embedding size\n", "embedding_size = 10 # embedding size\n", "hidden_size = 5 # output from the RNN. 
5 to directly predict one-hot\n", "batch_size = 1 # one sentence\n", "sequence_length = 6 # |ihello| == 6\n", "num_layers = 1 # one-layer rnn" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "ExecuteTime": { "end_time": "2020-06-01T09:56:09.319605Z", "start_time": "2020-06-01T09:56:09.314098Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Model(nn.Module):\n", "    def __init__(self):\n", "        super(Model, self).__init__()\n", "        # input_size: number of distinct tokens in the vocabulary;\n", "        # embedding_size: dimensionality of the space the tokens are embedded into\n", "        self.embedding = nn.Embedding(input_size, embedding_size)\n", "        self.rnn = nn.RNN(input_size=embedding_size,\n", "                          hidden_size=hidden_size, batch_first=True)\n", "        self.fc = nn.Linear(hidden_size, num_classes)\n", "\n", "    def forward(self, x):\n", "        # Initialize the hidden state (a vanilla RNN has no cell state)\n", "        # (num_layers * num_directions, batch, hidden_size)\n", "        h_0 = Variable(torch.zeros(\n", "            num_layers, x.size(0), hidden_size))\n", "\n", "        emb = self.embedding(x)\n", "        emb = emb.view(batch_size, sequence_length, -1)\n", "\n", "        # Propagate embedding through RNN\n", "        # Input: (batch, seq_len, embedding_size)\n", "        # h_0: (num_layers * num_directions, batch, hidden_size)\n", "        out, _ = self.rnn(emb, h_0)\n", "        # Flatten to (batch * seq_len, hidden_size) before the linear layer\n", "        return self.fc(out.view(-1, hidden_size))" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "ExecuteTime": { "end_time": "2020-06-01T09:56:09.992643Z", "start_time": "2020-06-01T09:56:09.987817Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model(\n", " (embedding): Embedding(5, 10)\n", " (rnn): RNN(10, 5, batch_first=True)\n", " (fc): Linear(in_features=5, out_features=5, bias=True)\n", ")\n" ] } ], "source": [ "# Instantiate RNN model\n", "model = Model()\n", "print(model)\n", "\n", "# Set loss and optimizer function\n", "# CrossEntropyLoss = LogSoftmax + NLLLoss\n", "criterion = torch.nn.CrossEntropyLoss()\n", "optimizer = 
torch.optim.Adam(model.parameters(), lr=0.1)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "ExecuteTime": { "end_time": "2020-06-01T09:56:10.972110Z", "start_time": "2020-06-01T09:56:10.780673Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "epoch: 1, loss: 0.007\n", "Predicted string: ihello\n", "epoch: 2, loss: 0.007\n", "Predicted string: ihello\n", "epoch: 3, loss: 0.007\n", "Predicted string: ihello\n", "epoch: 4, loss: 0.006\n", "Predicted string: ihello\n", "epoch: 5, loss: 0.006\n", "Predicted string: ihello\n", "epoch: 6, loss: 0.006\n", "Predicted string: ihello\n", "epoch: 7, loss: 0.006\n", "Predicted string: ihello\n", "epoch: 8, loss: 0.006\n", "Predicted string: ihello\n", "epoch: 9, loss: 0.006\n", "Predicted string: ihello\n", "epoch: 10, loss: 0.006\n", "Predicted string: ihello\n", "epoch: 11, loss: 0.006\n", "Predicted string: ihello\n", "epoch: 12, loss: 0.006\n", "Predicted string: ihello\n", "epoch: 13, loss: 0.006\n", "Predicted string: ihello\n", "epoch: 14, loss: 0.006\n", "Predicted string: ihello\n", "epoch: 15, loss: 0.005\n", "Predicted string: ihello\n", "epoch: 16, loss: 0.005\n", "Predicted string: ihello\n", "epoch: 17, loss: 0.005\n", "Predicted string: ihello\n", "epoch: 18, loss: 0.005\n", "Predicted string: ihello\n", "epoch: 19, loss: 0.005\n", "Predicted string: ihello\n", "epoch: 20, loss: 0.006\n", "Predicted string: ihello\n", "epoch: 21, loss: 0.034\n", "Predicted string: ihello\n", "epoch: 22, loss: 0.261\n", "Predicted string: iheloo\n", "epoch: 23, loss: 0.422\n", "Predicted string: ihelll\n", "epoch: 24, loss: 0.605\n", "Predicted string: iheloo\n", "epoch: 25, loss: 0.812\n", "Predicted string: iheloo\n", "epoch: 26, loss: 0.713\n", "Predicted string: iheloo\n", "epoch: 27, loss: 0.533\n", "Predicted string: iheloo\n", "epoch: 28, loss: 0.341\n", "Predicted string: iheloo\n", "epoch: 29, loss: 0.242\n", "Predicted string: 
iheloo\n", "epoch: 30, loss: 0.306\n", "Predicted string: ihelll\n", "epoch: 31, loss: 0.429\n", "Predicted string: ihelll\n", "epoch: 32, loss: 0.501\n", "Predicted string: ihelll\n", "epoch: 33, loss: 0.511\n", "Predicted string: ihelll\n", "epoch: 34, loss: 0.465\n", "Predicted string: ihelll\n", "epoch: 35, loss: 0.370\n", "Predicted string: ihelll\n", "epoch: 36, loss: 0.215\n", "Predicted string: ihelll\n", "epoch: 37, loss: 0.272\n", "Predicted string: ihelli\n", "epoch: 38, loss: 0.272\n", "Predicted string: iheloo\n", "epoch: 39, loss: 0.266\n", "Predicted string: iheloo\n", "epoch: 40, loss: 0.276\n", "Predicted string: iheloo\n", "epoch: 41, loss: 0.271\n", "Predicted string: iheloo\n", "epoch: 42, loss: 0.222\n", "Predicted string: iheloo\n", "epoch: 43, loss: 0.164\n", "Predicted string: iheloo\n", "epoch: 44, loss: 0.115\n", "Predicted string: ihello\n", "epoch: 45, loss: 0.082\n", "Predicted string: ihello\n", "epoch: 46, loss: 0.067\n", "Predicted string: ihello\n", "epoch: 47, loss: 0.064\n", "Predicted string: ihello\n", "epoch: 48, loss: 0.068\n", "Predicted string: ihello\n", "epoch: 49, loss: 0.074\n", "Predicted string: ihello\n", "epoch: 50, loss: 0.071\n", "Predicted string: ihello\n", "Learning finished!\n" ] } ], "source": [ "# Train the model\n", "for epoch in range(50):\n", " outputs = model(inputs)\n", " optimizer.zero_grad()\n", " loss = criterion(outputs, labels)\n", " loss.backward()\n", " optimizer.step()\n", " _, idx = outputs.max(1)\n", " idx = idx.data.numpy()\n", " result_str = [idx2char[c] for c in idx.squeeze()]\n", " print(\"epoch: %d, loss: %1.3f\" % (epoch + 1, loss.item()))\n", " print(\"Predicted string: \", ''.join(result_str))\n", "\n", "print(\"Learning finished!\")\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Under the hood: RNN\n", "\n", "![image.png](img/dl53.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { 
"slideshow": { "slide_type": "subslide" } }, "source": [ "![image.png](img/dl54.png)\n", "\n", "https://towardsdatascience.com/animated-rnn-lstm-and-gru-ef124d06cf45" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![image.png](img/dl55.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "\n", "## Under the hood: LSTMs \n", "\n", "Long Short Term Memory networks \n", "\n", "![image.png](img/dl56.png)\n", "\n", "http://colah.github.io/posts/2015-08-Understanding-LSTMs/" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![image.png](img/dl57.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![image.png](img/dl58.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![image.png](img/dl59.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![image.png](img/dl60.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![image.png](img/dl61.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Under the hood: GRU\n", "\n", "![image.png](img/dl62.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![image.png](img/dl63.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "
\n", "\n", "**Practical PyTorch: Classifying Names with a Character-Level RNN**\n", "\n", "We will train a basic character-level RNN to classify words. It reads each word as a sequence of characters, outputting a prediction and a \"hidden state\" at each step and feeding that hidden state into the next step. We take the final prediction as the output, i.e. the class the word belongs to.\n", "\n", "Specifically, we'll train on a few thousand surnames from 18 languages of origin and predict which language a name comes from based on its spelling.\n", "\n", "https://github.com/spro/practical-pytorch/blob/master/char-rnn-classification/char-rnn-classification.ipynb\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Exercise: Sentiment analysis on movie reviews\n", "\n", "The sentiment labels are:\n", "\n", "```\n", "0 - negative\n", "1 - somewhat negative\n", "2 - neutral\n", "3 - somewhat positive\n", "4 - positive\n", "```\n", "\n", "https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews/data" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![image.png](img/chengjun2.png)" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }