<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>DL-With-PyTorch | Haobin Tan</title><link>https://haobin-tan.netlify.app/tags/dl-with-pytorch/</link><atom:link href="https://haobin-tan.netlify.app/tags/dl-with-pytorch/index.xml" rel="self" type="application/rss+xml"/><description>DL-With-PyTorch</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Tue, 27 Oct 2020 00:00:00 +0000</lastBuildDate><image><url>https://haobin-tan.netlify.app/media/icon_hu7d15bc7db65c8eaf7a4f66f5447d0b42_15095_512x512_fill_lanczos_center_3.png</url><title>DL-With-PyTorch</title><link>https://haobin-tan.netlify.app/tags/dl-with-pytorch/</link></image><item><title>Deep Learning with PyTorch</title><link>https://haobin-tan.netlify.app/docs/ai/pytorch/dl-with-pytorch/</link><pubDate>Sun, 18 Oct 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/pytorch/dl-with-pytorch/</guid><description>&lt;h2 id="book">Book&lt;/h2>
&lt;p>&lt;a href="https://pytorch.org/deep-learning-with-pytorch">Deep Learning with PyTorch&lt;/a>&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/deep-learning-thumbnail.png" alt="img" style="zoom:50%;" />
&lt;h2 id="code">Code&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="https://github.com/deep-learning-with-pytorch/dlwpt-code">Github repo&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Almost all of our example notebooks contain the following boilerplate in the first cell (some lines may be missing in early chapters):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="o">%&lt;/span>&lt;span class="n">matplotlib&lt;/span> &lt;span class="n">inline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">matplotlib&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">pyplot&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">plt&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">numpy&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">np&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">torch&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">torch.nn&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">nn&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">torch.nn.functional&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">F&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">torch.optim&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">optim&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">set_printoptions&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">edgeitems&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">manual_seed&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">123&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>Conventions&lt;/p>
&lt;ul>
&lt;li>Variables named with a &lt;code>_t&lt;/code> suffix are tensors stored in CPU memory.&lt;/li>
&lt;li>Variables named with a &lt;code>_g&lt;/code> suffix are tensors stored in GPU memory.&lt;/li>
&lt;li>Variables named with a &lt;code>_a&lt;/code> suffix are NumPy arrays.&lt;/li>
&lt;/ul>
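&lt;p>A minimal sketch of the naming convention in practice. The variable names and values are illustrative only, and the &lt;code>_g&lt;/code> copy is only created when a CUDA device is available:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import numpy as np
import torch

points_t = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])  # _t: tensor in CPU memory
points_a = points_t.numpy()  # _a: NumPy array (shares memory with points_t)

if torch.cuda.is_available():
    points_g = points_t.to(device="cuda")  # _g: copy of the tensor in GPU memory
&lt;/code>&lt;/pre>&lt;/div>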
&lt;/li>
&lt;/ul></description></item><item><title>Pretrained Networks</title><link>https://haobin-tan.netlify.app/docs/ai/pytorch/dl-with-pytorch/p1ch2/</link><pubDate>Sun, 18 Oct 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/pytorch/dl-with-pytorch/p1ch2/</guid><description>&lt;h2 id="pretrained-network-for-object-recognition">Pretrained Network for Object Recognition&lt;/h2>
&lt;h3 id="use-pretrained-network-in-torchvision">Use pretrained network in &lt;code>TorchVision&lt;/code>&lt;/h3>
&lt;p>The &lt;a href="https://github.com/pytorch/vision">&lt;strong>TorchVision&lt;/strong> project&lt;/a>&lt;/p>
&lt;ul>
&lt;li>contains a few of the best-performing neural network architectures for computer vision, such as
&lt;ul>
&lt;li>AlexNet (&lt;a href="http://mng.bz/lo6z">http://mng.bz/lo6z&lt;/a>)&lt;/li>
&lt;li>ResNet (&lt;a href="https://arxiv.org/pdf/1512.03385.pdf">https://arxiv.org/pdf/1512.03385.pdf&lt;/a>)&lt;/li>
&lt;li>Inception v3 (&lt;a href="https://arxiv.org/pdf/1512.00567.pdf">https://arxiv.org/pdf/1512.00567.pdf&lt;/a>)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>provides easy access to datasets such as ImageNet, along with other utilities for getting up to speed with computer vision applications in PyTorch.&lt;/li>
&lt;/ul>
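&lt;p>The predefined models can be found in &lt;code>torchvision.models&lt;/code>. The listing below was produced with the TorchVision release current at the time of writing; newer releases contain additional models:&lt;/p>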
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">torchvision&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">models&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">dir&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">models&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-gdscript3" data-lang="gdscript3">&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;AlexNet&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;DenseNet&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;GoogLeNet&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;GoogLeNetOutputs&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;Inception3&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;InceptionOutputs&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;MNASNet&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;MobileNetV2&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;ResNet&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;ShuffleNetV2&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;SqueezeNet&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;VGG&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;_GoogLeNetOutputs&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;_InceptionOutputs&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;__builtins__&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;__cached__&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;__doc__&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;__file__&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;__loader__&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;__name__&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;__package__&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;__path__&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;__spec__&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;_utils&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;alexnet&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;densenet&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;densenet121&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;densenet161&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;densenet169&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;densenet201&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;detection&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;googlenet&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;inception&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;inception_v3&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;mnasnet&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;mnasnet0_5&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;mnasnet0_75&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;mnasnet1_0&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;mnasnet1_3&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;mobilenet&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;mobilenet_v2&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;quantization&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;resnet&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;resnet101&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;resnet152&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;resnet18&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;resnet34&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;resnet50&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;resnext101_32x8d&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;resnext50_32x4d&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;segmentation&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;shufflenet_v2_x0_5&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;shufflenet_v2_x1_0&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;shufflenet_v2_x1_5&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;shufflenet_v2_x2_0&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;shufflenetv2&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;squeezenet&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;squeezenet1_0&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;squeezenet1_1&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;utils&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;vgg&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;vgg11&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;vgg11_bn&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;vgg13&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;vgg13_bn&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;vgg16&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;vgg16_bn&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;vgg19&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;vgg19_bn&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;video&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;wide_resnet101_2&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;wide_resnet50_2&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ul>
&lt;li>
&lt;p>The &lt;strong>capitalized names&lt;/strong> (e.g. ResNet) refer to Python classes that implement a number of popular models. They differ in their architecture, that is, in the arrangement of the operations occurring between the input and the output.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>For example, create an instance of the &lt;code>AlexNet&lt;/code> class:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># create an instance of AlexNet class&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">alexnet&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">models&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">AlexNet&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>But wait! If we fed data through this network now, it would produce &amp;hellip; garbage! &amp;#x1f622;&lt;/p>
&lt;p>&lt;strong>That’s because the network is uninitialized: its weights, the numbers by which inputs are added and multiplied, have not been trained on anything, so the network itself is a blank (or rather, random) slate.&lt;/strong> We’d need to either train it from scratch or load weights from prior training.&lt;/p>
&lt;p>To use models with predefined numbers of layers and units, and optionally download and load pretrained weights into them, we use the &lt;strong>lowercase names&lt;/strong> in the &lt;code>models&lt;/code> module.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>The &lt;strong>lowercase names&lt;/strong> are convenience functions that return models instantiated from those classes, sometimes with different parameter sets.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>For instance, &lt;code>resnet101&lt;/code> returns an instance of ResNet with 101 layers, &lt;code>resnet18&lt;/code> has 18 layers, and so on.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Create an instance of the network and pass an argument that will instruct the function to download the weights of &lt;code>resnet101&lt;/code> trained on the ImageNet dataset, with 1.2 million images and 1,000 categories:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">resnet&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">models&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">resnet101&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">pretrained&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
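&lt;p>The difference between the two kinds of names can be sketched as follows, assuming TorchVision is installed. Every model here is built without pretrained weights, so nothing is downloaded and all weights are random. Note that since TorchVision 0.13 the &lt;code>pretrained=True&lt;/code> flag is deprecated in favor of a &lt;code>weights&lt;/code> argument, for example &lt;code>models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">from torchvision import models

# Capitalized name: the class itself. The instance has random weights.
alexnet = models.AlexNet()

# Lowercase names: factory functions with a predefined configuration.
# Called without arguments, they build the model with random weights.
resnet18 = models.resnet18()
resnet101 = models.resnet101()


def count_params(model):
    """Number of learnable parameters in a model."""
    return sum(p.numel() for p in model.parameters())


print(type(resnet18).__name__, type(resnet101).__name__)  # ResNet ResNet
print(count_params(resnet18), count_params(resnet101))
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>Both factory functions return instances of the same &lt;code>ResNet&lt;/code> class; only the configuration (and therefore the parameter count) differs.&lt;/p>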
&lt;h3 id="load-and-show-an-image-from-the-local-filesystem">Load and show an image from the local filesystem&lt;/h3>
&lt;p>Use Pillow (&lt;a href="https://pillow.readthedocs.io/en/stable">https://pillow.readthedocs.io/en/stable&lt;/a>), an image-manipulation module for Python:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">PIL&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">Image&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># assume that the variable IMG_PATH holds the path of the image&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">img&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Image&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">open&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">IMG_PATH&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">img&lt;/span> &lt;span class="c1"># show the image inline&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="set-eval-mode-before-inference">Set &lt;code>eval&lt;/code> mode before inference&lt;/h3>
&lt;p>In order to do inference, we need to put the network in &lt;code>eval&lt;/code> mode:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">resnet&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">eval&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;em>(If we forget to do that, pretrained models that contain layers such as batch normalization and dropout will not produce meaningful answers, just because of the way those layers work internally.)&lt;/em>&lt;/p>
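&lt;p>The effect of &lt;code>eval()&lt;/code> can be seen on a toy model (a hypothetical two-layer stand-in, not ResNet). In training mode, dropout randomly zeroes activations; in evaluation mode it passes its input through unchanged, so repeated forward passes give identical results. Wrapping inference in &lt;code>torch.no_grad()&lt;/code> additionally skips gradient bookkeeping.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

model.train()  # training mode: dropout is active
out_train = model(x)  # some activations zeroed, the rest scaled by 1 / (1 - p)

model.eval()  # evaluation mode: dropout passes inputs through unchanged
with torch.no_grad():  # no gradient tracking needed for inference
    out_eval_1 = model(x)
    out_eval_2 = model(x)
    linear_only = model[0](x)

print(model.training)  # False
print(torch.equal(out_eval_1, out_eval_2))  # True
&lt;/code>&lt;/pre>&lt;/div>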
&lt;h3 id="retrieve-image-label">Retrieve image label&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Load a text file listing the labels in the same order they were presented to the network during training.&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Pick out the label at the index that produced the highest score from the network.&lt;/strong>&lt;/li>
&lt;/ol>
&lt;p>(Almost all models meant for image recognition produce output in a similar form: one score per class.)&lt;/p>
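&lt;p>The two steps can be sketched as follows. The label file and the scores in &lt;code>out&lt;/code> are made up for illustration; with a real ImageNet model, the file would list 1,000 class names and &lt;code>out&lt;/code> would be the network's output for a preprocessed image batch. Applying &lt;code>softmax&lt;/code> turns the raw scores into percentages that sum to 100.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import os
import tempfile
import torch

# Hypothetical label file: one class name per line, in training order.
path = os.path.join(tempfile.mkdtemp(), "labels.txt")
with open(path, "w") as f:
    f.write("tabby cat\ngolden retriever\ngoldfish\n")

# Step 1: load the labels in the same order as during training.
with open(path) as f:
    labels = [line.strip() for line in f]

# Stand-in for the network output: shape (batch_size, num_classes).
out = torch.tensor([[1.0, 3.5, 0.2]])

# Step 2: pick the label at the index with the highest score.
_, index = torch.max(out, 1)
percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100
print(labels[index.item()], percentage[index.item()].item())
&lt;/code>&lt;/pre>&lt;/div>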
&lt;h2 id="torch-hub">Torch Hub&lt;/h2>
&lt;p>Torch Hub is &lt;strong>a mechanism through which authors can publish a model on GitHub, with or without pretrained weights, and expose it through an interface that PyTorch understands.&lt;/strong> This makes loading a pretrained model from a third party as easy as loading a TorchVision model.&lt;/p>
&lt;p>All it takes is placing a file named &lt;strong>hubconf.py&lt;/strong> in the root directory of the GitHub repository. For example, the &lt;a href="https://github.com/pytorch/vision">TorchVision&lt;/a> repository contains such a &lt;strong>hubconf.py&lt;/strong>.&lt;/p>
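&lt;p>A minimal sketch of what such a file can contain and how it is loaded. The entry point &lt;code>tiny_classifier&lt;/code> and its model are hypothetical. To avoid needing GitHub access, the file is written to a temporary directory and loaded with &lt;code>source="local"&lt;/code>, which assumes a recent PyTorch release; for a published repository the call would instead look like &lt;code>torch.hub.load("owner/repo", "tiny_classifier")&lt;/code>, where &lt;code>owner/repo&lt;/code> is a placeholder.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import os
import tempfile
import textwrap
import torch

# Contents of a hypothetical hubconf.py placed at the repository root.
HUBCONF = textwrap.dedent('''
    dependencies = ["torch"]  # packages required to load the models

    import torch.nn as nn

    def tiny_classifier(num_classes=10):
        """Entry point: returns an untrained toy classifier."""
        return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, num_classes))
''')

repo_dir = tempfile.mkdtemp()
with open(os.path.join(repo_dir, "hubconf.py"), "w") as f:
    f.write(HUBCONF)

# Extra keyword arguments are forwarded to the entry point.
model = torch.hub.load(repo_dir, "tiny_classifier", source="local", num_classes=3)
print(model)
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>Each public function in &lt;strong>hubconf.py&lt;/strong> becomes a loadable entry point, and the &lt;code>dependencies&lt;/code> list is checked before the entry point is called.&lt;/p>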
&lt;p>Torch Hub is quite new, and there are only a few models published this way. We can get at them by Googling “github.com hubconf.py.”&lt;/p></description></item><item><title>PyTorch Tensor</title><link>https://haobin-tan.netlify.app/docs/ai/pytorch/dl-with-pytorch/p1ch3/</link><pubDate>Mon, 19 Oct 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/pytorch/dl-with-pytorch/p1ch3/</guid><description>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">torch&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="the-world-as-floating-point-numbers">The world as floating-point numbers&lt;/h2>
&lt;p>Neural networks transform floating-point representations into other floating-point representations. The starting and ending representations are typically human interpretable, but the intermediate representations are less so.&lt;/p>
&lt;p>To handle and store data, PyTorch introduces a fundamental data structure: the &lt;strong>tensor&lt;/strong>. In the context of deep learning, tensors refer to the generalization of vectors and matrices to an arbitrary number of dimensions.&lt;/p>
&lt;img src="https://drek4537l1klr.cloudfront.net/stevens2/Figures/CH03_F02_Stevens2_GS.png">
&lt;h2 id="tensors-multidimensional-arrays">Tensors: Multidimensional arrays&lt;/h2>
&lt;p>Another name for tensor is &lt;strong>multidimensional array&lt;/strong>. Compared to NumPy arrays, PyTorch tensors have a few superpowers, such as the ability to:&lt;/p>
&lt;ul>
&lt;li>perform very fast operations on graphical processing units (GPUs)&lt;/li>
&lt;li>distribute operations on multiple devices or machines&lt;/li>
&lt;li>keep track of the graph of computations that created them.&lt;/li>
&lt;/ul>
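&lt;p>Two of these abilities in a minimal sketch: placing a tensor on a GPU when one is available (falling back to the CPU otherwise), and the computation graph that PyTorch records for tensors created with &lt;code>requires_grad=True&lt;/code>, which autograd then walks backwards to compute gradients.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

# Use a GPU if one is available, otherwise stay on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.ones(3, device=device, requires_grad=True)

# Operations on x are recorded in a computation graph.
y = (x * 2).sum()
print(y.grad_fn)  # the operation that produced y

# Walking the graph backwards computes dy/dx.
y.backward()
print(x.grad)  # all elements equal 2 (on the chosen device)
&lt;/code>&lt;/pre>&lt;/div>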
&lt;h3 id="tensor-construction">Tensor construction&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>From a Python list:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">a&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tensor&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">list&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">9&lt;/span>&lt;span class="p">)))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">a&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">tensor([0, 1, 3, 3, 4, 5, 6, 7, 8])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>Using PyTorch's constructor functions, e.g. &lt;code>torch.ones&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">a&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ones&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">4&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">a&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">tensor([[1., 1., 1., 1.],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [1., 1., 1., 1.],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [1., 1., 1., 1.]])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ul>
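&lt;p>A few more ways to build tensors, as a sketch using standard &lt;code>torch&lt;/code> functions: &lt;code>torch.arange&lt;/code> produces the same values as the list-based example, and &lt;code>torch.from_numpy&lt;/code> wraps an existing NumPy array without copying it, so changes to the array show up in the tensor.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import numpy as np
import torch

a = torch.arange(9)  # same values as torch.tensor(list(range(9)))
z = torch.zeros(2, 3)  # 2 x 3 tensor filled with 0.
r = torch.rand(2, 3)  # 2 x 3 tensor of uniform samples in [0, 1)

arr = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(arr)  # shares memory with arr, keeps its float64 dtype
arr[0] = 10.0
print(t)  # the first element is now 10
&lt;/code>&lt;/pre>&lt;/div>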
&lt;h3 id="the-essence-of-tensors">The essence of tensors&lt;/h3>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-18%2023.24.14.png">
&lt;ul>
&lt;li>Python lists or tuples of numbers are collections of Python objects that are &lt;em>individually&lt;/em> allocated in memory.&lt;/li>
&lt;li>PyTorch tensors or NumPy arrays are views over (typically) &lt;em>contiguous&lt;/em> memory blocks containing &lt;em>unboxed&lt;/em> C numeric types rather than Python objects.&lt;/li>
&lt;/ul>
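&lt;p>The contiguous layout can be inspected directly. In this sketch with made-up values, the tensor's elements live in one block of 4-byte &lt;code>float32&lt;/code> values, and transposing it creates a view over the same memory with swapped strides rather than a copy:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

t = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
print(t.is_contiguous())  # True
print(t.element_size())  # 4 bytes per float32 element
print(t.stride())  # (2, 1): skip 2 elements for the next row, 1 for the next column

t_view = t.t()  # transpose: a view, no data is copied
print(t_view.data_ptr() == t.data_ptr())  # True: same memory block
print(t_view.stride())  # (1, 2): only the strides were swapped
print(t_view.is_contiguous())  # False
&lt;/code>&lt;/pre>&lt;/div>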
&lt;h3 id="indexing-tensors">Indexing tensors&lt;/h3>
&lt;p>Use range indexing notation, just as with standard Python lists.&lt;/p>
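&lt;p>Unlike Python lists, tensors also accept one range per dimension. A short sketch with a 3 x 2 tensor of made-up coordinates:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

rows = points[1:]  # all rows after the first, all columns
same_rows = points[1:, :]  # the same, with the column range written out
first_col = points[1:, 0]  # first column of the rows after the first
with_batch = points[None]  # adds a leading dimension of size 1
&lt;/code>&lt;/pre>&lt;/div>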
&lt;h2 id="tensor-element-types">Tensor element types&lt;/h2>
&lt;h3 id="specifying-the-numeric-type-with-dtype">Specifying the numeric type with &lt;code>dtype&lt;/code>&lt;/h3>
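&lt;p>The &lt;code>dtype&lt;/code> argument to tensor constructors (that is, functions like &lt;code>tensor&lt;/code>, &lt;code>zeros&lt;/code>, and &lt;code>ones&lt;/code>) specifies the numerical data (d) type that will be contained in the tensor. &lt;strong>The default data type for tensors is 32-bit floating-point&lt;/strong> (&lt;code>torch.float32&lt;/code>); one notable exception is &lt;code>torch.tensor&lt;/code> called on integer data, which infers a 64-bit integer type (&lt;code>torch.int64&lt;/code>).&lt;/p>
&lt;p>For example:&lt;/p>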
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">double_points&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ones&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">dtype&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">double&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="typical-dtype">Typical &lt;code>dtype&lt;/code>&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Computations in neural networks are typically executed with &lt;strong>32-bit floating-point&lt;/strong> precision.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Tensors can be used as indexes in other tensors. In this case, PyTorch expects indexing tensors to have a &lt;strong>64-bit integer (&lt;code>int64&lt;/code>)&lt;/strong> data type.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Predicates on tensors, such as &lt;code>points &amp;gt; 1.0&lt;/code>, produce &lt;code>bool&lt;/code> tensors indicating whether each individual element satisfies the condition.&lt;/p>
&lt;/li>
&lt;/ul>
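&lt;p>The three typical cases side by side, as a sketch with made-up values: a default &lt;code>float32&lt;/code> tensor, an &lt;code>int64&lt;/code> tensor used as an index, and a &lt;code>bool&lt;/code> tensor produced by a predicate, which can itself be used as a mask:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

points = torch.tensor([[0.5, 2.0], [1.5, 0.2], [3.0, 1.0]])
print(points.dtype)  # torch.float32

idx = torch.tensor([0, 2])  # integer lists become int64 tensors
print(idx.dtype)  # torch.int64
selected = points[idx]  # rows 0 and 2

mask = points > 1.0
print(mask.dtype)  # torch.bool
print(points[mask])  # the elements greater than 1.0: 2.0, 1.5 and 3.0
&lt;/code>&lt;/pre>&lt;/div>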
&lt;h3 id="casting-dtype">Casting &lt;code>dtype&lt;/code>&lt;/h3>
&lt;p>Cast the tensor to the right type using the corresponding casting method.&lt;/p>
&lt;p>For example, cast &lt;code>torch.int&lt;/code> to &lt;code>torch.double&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">points&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">zeros&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">dtype&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">int&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">points&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">points&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">double&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Or use the more convenient &lt;code>to&lt;/code> method:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">points&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">points&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">to&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">double&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-primary-100 dark:bg-primary-900">
&lt;span class="pr-3 pt-1 text-primary-600 dark:text-primary-300">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="m11.25 11.25l.041-.02a.75.75 0 0 1 1.063.852l-.708 2.836a.75.75 0 0 0 1.063.853l.041-.021M21 12a9 9 0 1 1-18 0a9 9 0 0 1 18 0m-9-3.75h.008v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">When mixing input types in operations, the inputs are converted to the larger type automatically.&lt;/span>
&lt;/div>
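&lt;p>A minimal sketch of this promotion, assuming PyTorch's default dtypes (&lt;code>int64&lt;/code> for integer literals, &lt;code>float32&lt;/code> for floating-point literals). &lt;code>torch.result_type&lt;/code> reports the promoted type without running the operation:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

# assumption: the default dtypes have not been changed with torch.set_default_dtype
ints = torch.tensor([1, 2, 3])                                 # torch.int64 by default
floats = torch.tensor([0.5, 0.5, 0.5])                         # torch.float32 by default
doubles = torch.tensor([0.5, 0.5, 0.5], dtype=torch.float64)

print((ints + floats).dtype)              # integer + floating point gives torch.float32
print((floats + doubles).dtype)           # float32 + float64 gives torch.float64
print(torch.result_type(ints, doubles))   # torch.float64, computed without adding anything
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>The first case shows that "larger" refers to the kind of type as much as to its size: combining &lt;code>int64&lt;/code> with &lt;code>float32&lt;/code> yields &lt;code>float32&lt;/code>.&lt;/p>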
&lt;h2 id="the-tensor-api">The Tensor API&lt;/h2>
&lt;p>First, &lt;strong>the vast majority of operations on and between tensors are available in the &lt;code>torch&lt;/code> module and can also be called as methods of a tensor object&lt;/strong>. There is no difference between the two forms; they can be used interchangeably.&lt;/p>
&lt;p>Example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">a&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ones&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">a_transpose&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">transpose&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">a&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># call from the torch module&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">a&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">a_transpose&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">(torch.Size([3, 2]), torch.Size([2, 3]))
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">a&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ones&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">a_transpose&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">a&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">transpose&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># method of the tensor object&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">a&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">a_transpose&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">(torch.Size([3, 2]), torch.Size([2, 3]))
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The online docs (&lt;a href="http://pytorch.org/docs">http://pytorch.org/docs&lt;/a>) are exhaustive and well organized, with the tensor operations divided into groups:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Creation&lt;/strong> ops: Functions for constructing a tensor, like &lt;code>ones&lt;/code> and &lt;code>from_numpy&lt;/code>&lt;/li>
&lt;li>&lt;strong>Indexing, slicing, joining, mutating&lt;/strong> ops: Functions for changing the shape, stride, or content of a tensor, like &lt;code>transpose&lt;/code>&lt;/li>
&lt;li>&lt;strong>Math&lt;/strong> ops: Functions for manipulating the content of the tensor through computations
&lt;ul>
&lt;li>&lt;strong>Pointwise&lt;/strong> ops: Functions for obtaining a new tensor by applying a function to each element independently, like &lt;code>abs&lt;/code> and &lt;code>cos&lt;/code>&lt;/li>
&lt;li>&lt;strong>Reduction&lt;/strong> ops: Functions for computing aggregate values by iterating through tensors, like &lt;code>mean&lt;/code>, &lt;code>std&lt;/code>, and &lt;code>norm&lt;/code>&lt;/li>
&lt;li>&lt;strong>Comparison&lt;/strong> ops: Functions for evaluating numerical predicates over tensors, like &lt;code>equal&lt;/code> and &lt;code>max&lt;/code>&lt;/li>
&lt;li>&lt;strong>Spectral&lt;/strong> ops: Functions for transforming into and operating in the frequency domain, like &lt;code>stft&lt;/code> and &lt;code>hamming_window&lt;/code>&lt;/li>
&lt;li>&lt;strong>Other&lt;/strong> operations: Special functions operating on vectors, like &lt;code>cross&lt;/code>, or matrices, like &lt;code>trace&lt;/code>&lt;/li>
&lt;li>&lt;strong>BLAS&lt;/strong> and &lt;strong>LAPACK&lt;/strong> operations: Functions following the Basic Linear Algebra Subprograms (BLAS) specification for scalar, vector-vector, matrix-vector, and matrix-matrix operations&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Random sampling&lt;/strong>: Functions for generating values by drawing randomly from probability distributions, like &lt;code>randn&lt;/code> and &lt;code>normal&lt;/code>&lt;/li>
&lt;li>&lt;strong>Serialization&lt;/strong>: Functions for saving and loading tensors, like &lt;code>load&lt;/code> and &lt;code>save&lt;/code>&lt;/li>
&lt;li>&lt;strong>Parallelism&lt;/strong>: Functions for controlling the number of threads for parallel CPU execution, like &lt;code>set_num_threads&lt;/code>&lt;/li>
&lt;/ul>
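&lt;p>A small sketch touching one function from several of these groups. The tensor values are arbitrary, and serialization goes through an in-memory &lt;code>io.BytesIO&lt;/code> buffer (an assumption made here to avoid writing a file to disk):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import io
import torch

torch.manual_seed(0)                                  # make the random sampling reproducible

t = torch.tensor([[-1.0, 2.0], [3.0, -4.0]])          # creation
t_abs = torch.abs(t)                                  # pointwise: absolute value of each element
row_means = t.mean(dim=1)                             # reduction: one mean per row, [0.5, -0.5]
col_max = t.max(dim=0).values                         # maximum of each column, [3., 2.]
same = torch.equal(t, t.clone())                      # comparison: same shape and same elements
noise = torch.randn(2, 2)                             # random sampling from a standard normal

buffer = io.BytesIO()                                 # serialization: in-memory stand-in for a file
torch.save(t, buffer)
buffer.seek(0)
restored = torch.load(buffer)

print(t_abs, row_means, col_max, same, noise, restored, sep="\n")
print(torch.get_num_threads())                        # parallelism: current CPU thread count
&lt;/code>&lt;/pre>&lt;/div>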
&lt;h2 id="tensors-scenic-views-of-storage">Tensors: Scenic views of storage&lt;/h2>
&lt;p>Values in tensors are allocated in contiguous chunks of memory managed by &lt;code>torch.Storage&lt;/code> instances.&lt;/p>
&lt;ul>
&lt;li>A &lt;strong>&lt;code>storage&lt;/code>&lt;/strong> is a one-dimensional array of numerical data: that is, a contiguous block of memory containing numbers of a given type&lt;/li>
&lt;li>A PyTorch &lt;code>Tensor&lt;/code> instance is a &lt;strong>view&lt;/strong> of such a Storage instance that is capable of indexing into that storage using an offset and per-dimension strides.&lt;/li>
&lt;/ul>
&lt;p>Multiple tensors can index the same storage even if they index into the data differently. For example:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-19%2015.46.59.png">
&lt;p>The underlying memory is allocated &lt;strong>only once&lt;/strong>, so creating alternate tensor views of the data is fast regardless of the size of the data managed by the &lt;code>Storage&lt;/code> instance. &amp;#x1f44f;&lt;/p>
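&lt;p>A short sketch of two tensors sharing one storage: indexing a row returns a view, so writing through the view also changes the original tensor. The values are the same &lt;code>points&lt;/code> used in the examples below:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
second_point = points[1]          # a view onto row 1, no data is copied

second_point[0] = 10.0            # write through the view...
print(points)                     # ...and the change shows up in points[1, 0]

# Same memory block: the view starts 2 elements (8 bytes for float32) after points.
print(second_point.data_ptr() == points.data_ptr() + 2 * points.element_size())  # True
&lt;/code>&lt;/pre>&lt;/div>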
&lt;h3 id="indexing-into-storage">Indexing into &lt;code>storage&lt;/code>&lt;/h3>
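&lt;p>The storage for a given tensor is accessible using the &lt;code>.storage()&lt;/code> method (the output below comes from an older PyTorch release; in PyTorch 2.x this method returns a deprecated &lt;code>TypedStorage&lt;/code>, and &lt;code>.untyped_storage()&lt;/code> is the recommended replacement):&lt;/p>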
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">points&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tensor&lt;/span>&lt;span class="p">([[&lt;/span>&lt;span class="mf">4.0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">1.0&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mf">5.0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">3.0&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mf">2.0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">1.0&lt;/span>&lt;span class="p">]])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">points&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">storage&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl"> 4.0
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> 1.0
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> 5.0
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> 3.0
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> 2.0
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> 1.0
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">[torch.FloatStorage of size 6]
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Even though the tensor reports itself as having three rows and two columns, the storage under the hood is a &lt;strong>contiguous array of size 6&lt;/strong>. In this sense, the tensor just knows how to translate a pair of indices into a location in the storage.&lt;/p>
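&lt;p>To make the index translation concrete, here is a sketch that checks every element of the 3×2 tensor against the flat storage, assuming the row-major layout shown above (each row occupies two consecutive slots). In PyTorch 2.x, &lt;code>storage()&lt;/code> may print a &lt;code>TypedStorage&lt;/code> deprecation warning, but indexing it still works:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
storage = points.storage()        # flat storage holding the 6 values in row order
n_rows, n_cols = points.shape

for i in range(n_rows):
    for j in range(n_cols):
        # element (i, j) sits i full rows plus j slots into the storage
        assert points[i, j].item() == storage[i * n_cols + j]

print(storage.tolist())           # [4.0, 1.0, 5.0, 3.0, 2.0, 1.0]
&lt;/code>&lt;/pre>&lt;/div>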
&lt;p>Changing a value in the storage changes the content of every tensor that refers to it:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">points&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">tensor([[4., 1.],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [5., 3.],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [2., 1.]])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">points_storage&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mf">2.0&lt;/span> &lt;span class="c1"># change the value of an element of a storage&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">points&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">tensor([[2., 1.],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [5., 3.],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [2., 1.]])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="modifying-stored-values-in-place-operations">Modifying stored values: In-place operations&lt;/h3>
&lt;p>Methods &lt;strong>with a trailing underscore&lt;/strong> in their name, like &lt;code>zero_&lt;/code>, operate &lt;strong>in place&lt;/strong>: they modify the input instead of creating and returning a new output tensor.&lt;/p>
&lt;p>Any method &lt;strong>without the trailing underscore&lt;/strong> leaves the source tensor &lt;strong>unchanged&lt;/strong> and instead returns a &lt;strong>new&lt;/strong> tensor.&lt;/p>
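&lt;p>For contrast, a minimal sketch of an out-of-place method next to its in-place twin, using &lt;code>add&lt;/code> and &lt;code>add_&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

a = torch.ones(3, 2)

b = a.add(1)                            # out of place: returns a new tensor
print(a[0, 0].item(), b[0, 0].item())   # 1.0 2.0  (a is unchanged)

a.add_(1)                               # in place: modifies a itself
print(a[0, 0].item())                   # 2.0
print(a.data_ptr() != b.data_ptr())     # True: b has its own storage
&lt;/code>&lt;/pre>&lt;/div>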
&lt;p>Example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">a&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ones&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">a&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">tensor([[1., 1.],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [1., 1.],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [1., 1.]])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">a&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">zero_&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="c1"># in-place zeroing a&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">a&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">tensor([[0., 0.],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [0., 0.],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [0., 0.]])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="-tensor-metadata-size-offset-and-stride">🧐 Tensor metadata: Size, offset, and stride&lt;/h2>
&lt;p>In order to index into a storage, tensors rely on a few pieces of information that, together with their storage, unequivocally define them:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>size/shape&lt;/strong>: a tuple indicating how many elements across each dimension the tensor represents.&lt;/li>
&lt;li>&lt;strong>(storage) offset&lt;/strong>: index in the storage corresponding to the first element in the tensor.&lt;/li>
&lt;li>&lt;strong>stride&lt;/strong>: number of elements in the storage that need to be skipped over to obtain the next element along each dimension.&lt;/li>
&lt;/ul>
&lt;img src="https://miro.medium.com/max/3916/1*pEDjDU4TgEJvtVFOhseIuA.png">
&lt;p>Example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">points&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tensor&lt;/span>&lt;span class="p">([[&lt;/span>&lt;span class="mf">4.0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">1.0&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mf">5.0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">3.0&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mf">2.0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">1.0&lt;/span>&lt;span class="p">]])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">second_point&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">points&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ul>
&lt;li>
&lt;p>Size/Shape&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">second_point&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">size&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">second_point&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>Offset&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">second_point&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">storage_offset&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>Stride&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">second_point.stride()
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ul>
&lt;p>This indirection between Tensor and Storage makes some operations inexpensive, like transposing a tensor or extracting a subtensor, because they do not lead to memory reallocations. &amp;#x1f44d; Instead, they consist of allocating a new Tensor object with a different value for size, storage offset, or stride.&lt;/p>
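&lt;p>The three pieces of metadata combine into one address rule: element &lt;code>[i, j]&lt;/code> lives at storage position &lt;code>storage_offset + i * stride[0] + j * stride[1]&lt;/code>. The sketch below checks this rule on a subtensor and on its transpose, both of which reuse the original memory. &lt;code>storage_index&lt;/code> is a hypothetical helper defined only for this illustration:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

points = torch.tensor([[3.0, 1.0, 2.0], [4.0, 1.0, 7.0]])
flat = points.flatten()             # points is contiguous with offset 0, so flat follows storage order

def storage_index(t, i, j):
    # hypothetical helper: position of t[i, j] in the underlying storage
    return t.storage_offset() + i * t.stride(0) + j * t.stride(1)

sub = points[:, 1:]                 # columns 1 and 2: offset 1, stride (3, 1)
sub_t = sub.t()                     # transposed subtensor: offset 1, stride (1, 3)

for view in (sub, sub_t):
    for i in range(view.shape[0]):
        for j in range(view.shape[1]):
            assert view[i, j].item() == flat[storage_index(view, i, j)].item()
    print(view.shape, view.storage_offset(), view.stride())

# No reallocation: both views start one float32 (4 bytes) past the start of points.
print(sub.data_ptr() == sub_t.data_ptr() == points.data_ptr() + points.element_size())
&lt;/code>&lt;/pre>&lt;/div>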
&lt;h3 id="cloning-a-tensor">Cloning a tensor&lt;/h3>
&lt;ul>
&lt;li>Use &lt;code>.clone()&lt;/code> to copy a tensor's data into a new, independent storage&lt;/li>
&lt;li>Changing the cloned tensor won&amp;rsquo;t change the original tensor&lt;/li>
&lt;/ul>
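&lt;p>A minimal sketch of cloning a row of the &lt;code>points&lt;/code> tensor used earlier: the clone gets its own storage, so writes to it do not reach the original:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
second_point = points[1].clone()      # copy row 1 into a new, independent storage

second_point[0] = 10.0                # modify the clone only
print(points[1, 0].item())            # 5.0: the original is unchanged
print(second_point.storage_offset())  # 0: the clone starts at the beginning of its own storage
&lt;/code>&lt;/pre>&lt;/div>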
&lt;h3 id="transposing-without-copying">Transposing without copying&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>For two-dimensional tensors, we can use the &lt;code>t&lt;/code> method, a shorthand alternative to &lt;code>transpose&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">points&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tensor&lt;/span>&lt;span class="p">([[&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mi">4&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">7&lt;/span>&lt;span class="p">]])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">points&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">tensor([[3, 1, 2],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [4, 1, 7]])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">points_t&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">points&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">t&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">points_t&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">tensor([[3, 4],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [1, 1],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [2, 7]])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>These two tensors share the same storage&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="nb">id&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">points&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">storage&lt;/span>&lt;span class="p">())&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="nb">id&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">points_t&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">storage&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">True
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>They differ only in shape and stride:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Increasing the first index by one in &lt;code>points&lt;/code> (for example, going from &lt;code>points[0, 0]&lt;/code> to &lt;code>points[1, 0]&lt;/code>) skips along the storage by three elements, the length of one row, while increasing the second index (from &lt;code>points[0, 0]&lt;/code> to &lt;code>points[0, 1]&lt;/code>) skips along the storage by one. In other words, the storage holds the elements of the tensor &lt;strong>sequentially, row by row.&lt;/strong>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">points.shape, points.stride()
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">(torch.Size([2, 3]), (3, 1))
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>The transpose from &lt;code>points&lt;/code> into &lt;code>points_t&lt;/code> looks like this:&lt;/p>
&lt;img src="https://drek4537l1klr.cloudfront.net/stevens2/v-12/Figures/p1ch3_transpose.png">
&lt;p>Transposing swaps the order of the elements in the stride. After that, increasing the row index (the first index of the tensor) skips along the storage by one, just as increasing the column index did in &lt;code>points&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">points_t&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">points_t&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">stride&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">(torch.Size([3, 2]), (1, 3))
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This is the very definition of transposing. &lt;strong>No new memory is allocated&lt;/strong>: transposing is obtained only by creating a new Tensor instance with different stride ordering than the original.&lt;/p>
&lt;/li>
&lt;/ul>
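&lt;p>Because &lt;code>points_t&lt;/code> is only a view, a write into it is visible through &lt;code>points&lt;/code>. A quick sketch, reusing the integer tensor from above:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

points = torch.tensor([[3, 1, 2], [4, 1, 7]])
points_t = points.t()

print(points_t.data_ptr() == points.data_ptr())   # True: same memory, offset 0 in both
points_t[0, 1] = 40                               # points_t[0, 1] is the same cell as points[1, 0]
print(points[1, 0].item())                        # 40
&lt;/code>&lt;/pre>&lt;/div>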
&lt;h3 id="transposing-in-higher-dimensions">Transposing in higher dimensions&lt;/h3>
&lt;p>We can transpose a multidimensional array by specifying the two dimensions along which transposing should occur:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">some_t&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ones&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">4&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">5&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">some_t&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">torch.Size([3, 4, 5])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">transpose_t&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">some_t&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">transpose&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">transpose_t&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">torch.Size([5, 4, 3])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="moving-tensors-between-cpu-and-gpu">Moving tensors between CPU and GPU&lt;/h2>
&lt;h3 id="managing-a-tensors-device-attribute">Managing a tensor’s &lt;code>device&lt;/code> attribute&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Create a tensor on the GPU by passing the &lt;code>device&lt;/code> argument to the constructor:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># create a tensor on the GPU &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">points_gpu&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tensor&lt;/span>&lt;span class="p">([[&lt;/span>&lt;span class="mf">4.0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">1.0&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mf">5.0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">3.0&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mf">2.0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">1.0&lt;/span>&lt;span class="p">]],&lt;/span> &lt;span class="n">device&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;cuda&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>Move a tensor between CPU and GPU using the &lt;code>to&lt;/code> method:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">points&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tensor&lt;/span>&lt;span class="p">([[&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mi">4&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">7&lt;/span>&lt;span class="p">]])&lt;/span> &lt;span class="c1"># tensor on CPU&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">points_gpu&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">points&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">to&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">device&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;cuda&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># copy the tensor from CPU to GPU&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">points_cpu&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">points_gpu&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">to&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">device&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;cpu&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># copy the tensor from GPU to CPU&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>If our machine has more than one GPU, we can also choose which GPU to allocate the tensor on by passing a zero-based index identifying the GPU, for example as part of the device string:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">points_gpu&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">points&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">to&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">device&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;cuda:0&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>We can also use the shorthand methods &lt;code>cpu&lt;/code> and &lt;code>cuda&lt;/code> instead of the &lt;code>to&lt;/code> method to achieve the same goal:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">a&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ones&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">a_gpu&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">a&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">cuda&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="c1"># cpu -&amp;gt; gpu(cuda:0)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">a_gpu&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">a&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">cuda&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># explicitly specify which GPU&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">a_cpu&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">a_gpu&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">cpu&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="c1"># gpu -&amp;gt; cpu&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ul>
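&lt;p>The GPU snippets above fail on a machine without CUDA. A common pattern, sketched here under that assumption, is to choose the device once and pass it around; this version falls back to the CPU when no GPU is visible:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

# Use the first GPU if PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]], device=device)
doubled = 2 * points                  # the result lives on the same device as points
doubled_cpu = doubled.to("cpu")       # copy back to host memory (returns doubled itself on a CPU-only machine)

print(points.device, doubled.device, doubled_cpu.device)
&lt;/code>&lt;/pre>&lt;/div>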
&lt;h2 id="numpy-interoperability">NumPy interoperability&lt;/h2>
&lt;p>PyTorch tensors can be converted to NumPy arrays and vice versa very efficiently:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>PyTorch tensor &amp;ndash;&amp;gt; NumPy array: &lt;code>numpy()&lt;/code>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">points&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ones&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">4&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># pytorch tensor&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">points&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">tensor([[1., 1., 1., 1.],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [1., 1., 1., 1.],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [1., 1., 1., 1.]])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">points_np&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">points&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">numpy&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="c1"># numpy array&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">points_np&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">array([[1., 1., 1., 1.],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [1., 1., 1., 1.],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [1., 1., 1., 1.]], dtype=float32)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>‼️ Note:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>The returned array shares the &lt;strong>same&lt;/strong> underlying buffer with the tensor storage. This means the &lt;code>numpy()&lt;/code> method runs at essentially no cost, as long as the data sits in CPU RAM.&lt;/p>
&lt;/li>
&lt;li>
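&lt;p>&lt;strong>It also means that modifying the NumPy array changes the originating tensor.&lt;/strong> If the tensor is allocated on the GPU, &lt;code>numpy()&lt;/code> raises an error instead; copy the tensor to the CPU first with &lt;code>cpu()&lt;/code>. Because this creates a new copy of the data in CPU memory, the resulting NumPy array does not share memory with the GPU tensor.&lt;/p>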
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">points_np&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">][&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">2&lt;/span> &lt;span class="c1"># changing an element of np array will also change tensor&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">points&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">tensor([[1., 2., 1., 1.],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [1., 1., 1., 1.],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [1., 1., 1., 1.]])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>NumPy array &amp;ndash;&amp;gt; PyTorch tensor: &lt;code>from_numpy()&lt;/code>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">points&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">from_numpy&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">points_np&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">points&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">tensor([[1., 2., 1., 1.],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [1., 1., 1., 1.],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [1., 1., 1., 1.]])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>It also uses the same buffer-sharing strategy, i.e., modifying the PyTorch tensor changes the originating NumPy array:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">points&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">][&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">3&lt;/span> &lt;span class="c1"># change element of tensor will also change np array&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">points_np&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">array([[1., 2., 1., 1.],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [1., 3., 1., 1.],
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [1., 1., 1., 1.]], dtype=float32)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ul>
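&lt;p>To make the sharing rules concrete, here is a minimal sketch (the values are arbitrary) that checks which conversions share memory: &lt;code>numpy()&lt;/code> and &lt;code>torch.from_numpy()&lt;/code> return views of the same buffer, while &lt;code>torch.tensor()&lt;/code> always copies. It also shows that a tensor requiring gradients has to be detached first and that a GPU tensor has to be copied to the CPU first.&lt;/p>

```python
import numpy as np
import torch

points = torch.ones(3, 4)

# Zero-copy conversions (CPU tensors only): these objects share one buffer
points_np = points.numpy()
points_back = torch.from_numpy(points_np)

# torch.tensor() always copies the data, so this tensor is independent
points_copy = torch.tensor(points_np)

points_np[0, 0] = 42.0
print(points[0, 0].item())       # 42.0, same buffer
print(points_back[0, 0].item())  # 42.0, same buffer
print(points_copy[0, 0].item())  # 1.0, separate copy

# A tensor that requires gradients must be detached before calling numpy()
weights = torch.ones(2, requires_grad=True)
weights_np = weights.detach().numpy()

# A GPU tensor must be copied to the CPU first; the result no longer shares memory
if torch.cuda.is_available():
    points_gpu = points.to("cuda")
    host_np = points_gpu.cpu().numpy()
```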
&lt;h2 id="serializing-tensors">Serializing tensors&lt;/h2>
&lt;p>If the data inside is valuable, we will want to save it to a file and load it back at some point. After all, we don’t want to have to retrain a model from scratch every time we start running our program.&lt;/p>
&lt;p>PyTorch uses &lt;code>pickle&lt;/code> under the hood to serialize the tensor object, plus dedicated serialization code for the storage.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Save the &lt;code>points&lt;/code> tensor to an &lt;strong>ourpoints.t&lt;/strong> file:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># assuming the PATH variable holds the path of ourpoints.t file&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">save&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">points&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">PATH&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>Load &lt;code>points&lt;/code> back:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">points&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">load&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">PATH&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
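&lt;li>
&lt;p>As a self-contained sketch of the same round trip, the snippet below writes to an in-memory &lt;code>io.BytesIO&lt;/code> buffer instead of a file on disk (an assumption made only so the example runs anywhere; a path such as &lt;code>ourpoints.t&lt;/code> works the same way). The &lt;code>map_location="cpu"&lt;/code> argument loads the tensor onto the CPU even if it was saved from a GPU.&lt;/p>

```python
import io

import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

# Serialize into an in-memory buffer (a file path would work the same way)
buffer = io.BytesIO()
torch.save(points, buffer)

# Rewind the buffer and deserialize, forcing the result onto the CPU
buffer.seek(0)
restored = torch.load(buffer, map_location="cpu")

print(torch.equal(points, restored))  # True
```

&lt;/li>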
&lt;/ul></description></item><item><title>Real-world Data Representation Using Tensors</title><link>https://haobin-tan.netlify.app/docs/ai/pytorch/dl-with-pytorch/p1ch4/</link><pubDate>Wed, 21 Oct 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/pytorch/dl-with-pytorch/p1ch4/</guid><description>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">torch&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="images">Images&lt;/h2>
&lt;p>An image is represented as a collection of scalars arranged in a regular grid with a height and a width (in pixels).&lt;/p>
&lt;ul>
&lt;li>&lt;strong>grayscale&lt;/strong> image: single scalar per grid point (the pixel)&lt;/li>
&lt;li>&lt;strong>multi-color&lt;/strong> image: multiple scalars per grid point, which would typically represent different colors.
&lt;ul>
&lt;li>The most common way to encode color into numbers is &lt;strong>RGB&lt;/strong>, where a color is defined by three numbers representing the intensity of red, green, and blue.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="loading-an-image-file">Loading an image file&lt;/h3>
&lt;p>Loading a PNG image using the &lt;code>imageio&lt;/code> module:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">imageio&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Assume tha PATH variable holds the path of the image&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">img_arr&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">imageio&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">imread&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">PATH&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>At this point, &lt;code>img_arr&lt;/code> (of shape H x W x C) is a NumPy array-like object with three dimensions:&lt;/p>
&lt;ul>
&lt;li>two spatial dimensions, height (H) and width (W)&lt;/li>
&lt;li>a third dimension corresponding to the red, green, and blue channels (C)&lt;/li>
&lt;/ul>
&lt;h3 id="change-the-layout-to-pytorch-supported-layout">Change the layout to PyTorch supported layout&lt;/h3>
&lt;p>&lt;strong>PyTorch modules dealing with image data require tensors to be laid out as C × H × W : channels, height, and width, respectively.&lt;/strong>&lt;/p>
&lt;p>We can use the tensor’s &lt;code>permute&lt;/code> method, passing the old dimension index for each new position, to get to an appropriate layout. Given an input tensor H × W × C as obtained previously, we get a proper layout by putting dimension 2 (C) first, followed by dimensions 0 (H) and 1 (W):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">img&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">from_numpy&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">img_arr&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># np arr -&amp;gt; torch tensor&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">img&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">permute&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># adjust to pytorch required layout&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-primary-100 dark:bg-primary-900">
&lt;span class="pr-3 pt-1 text-primary-600 dark:text-primary-300">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="m11.25 11.25l.041-.02a.75.75 0 0 1 1.063.852l-.708 2.836a.75.75 0 0 0 1.063.853l.041-.021M21 12a9 9 0 1 1-18 0a9 9 0 0 1 18 0m-9-3.75h.008v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">Note: the &lt;code>permute()&lt;/code> operation does NOT make a copy of the tensor data. Instead, &lt;code>out&lt;/code> uses the &lt;strong>same&lt;/strong> underlying storage as &lt;code>img&lt;/code> and only plays with the size and stride information at the tensor level.&lt;/span>
&lt;/div>
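&lt;p>A quick way to check this claim is to compare data pointers and strides. The sketch below uses a small random &lt;code>uint8&lt;/code> array as a stand-in for the image returned by &lt;code>imageio&lt;/code> (an assumption, so that no image file is needed):&lt;/p>

```python
import numpy as np
import torch

# Stand-in for an H x W x C image (4 x 5 pixels, 3 channels); values are arbitrary
img_arr = np.random.randint(0, 256, size=(4, 5, 3), dtype=np.uint8)

img = torch.from_numpy(img_arr)  # H x W x C, shares memory with img_arr
out = img.permute(2, 0, 1)       # C x H x W view, no data copied

print(img.shape, out.shape)              # torch.Size([4, 5, 3]) torch.Size([3, 4, 5])
print(out.data_ptr() == img.data_ptr())  # True: same underlying storage
print(img.stride(), out.stride())        # (15, 3, 1) (1, 15, 3)
print(out.is_contiguous())               # False

# Writing through the permuted view changes the original array as well
out[0, 0, 0] = 7
print(img_arr[0, 0, 0])  # 7
```

&lt;p>If a later operation needs contiguous memory (for example &lt;code>view()&lt;/code>), calling &lt;code>out.contiguous()&lt;/code> makes a copy laid out in C × H × W order.&lt;/p>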
&lt;h4 id="create-a-dataset-of-multiple-images">&lt;strong>Create a dataset of multiple images&lt;/strong>&lt;/h4>
&lt;p>To create a dataset of multiple images to use as an input for our neural networks, we store the images in a &lt;strong>batch&lt;/strong> along the &lt;strong>first&lt;/strong> dimension to obtain an &lt;strong>N&lt;/strong> × C × H × W tensor.&lt;/p>
&lt;p>How to do this?&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Pre-allocate&lt;/strong> a tensor of appropriate size.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">batch&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">zeros&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">batch_size&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">256&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">256&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">dtype&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">uint8&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ul>
&lt;li>&lt;code>dtype=torch.uint8&lt;/code>: we’re expecting each color to be represented as an &lt;strong>8-bit integer&lt;/strong>, as in most photographic formats from standard consumer cameras.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Fill it with images loaded from a directory&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>Now we can load all PNG images from an input directory and store them in the tensor (this assumes every image is 256 × 256 pixels, so that it fits into the pre-allocated slots):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">os&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># assume data_dir is our input directory &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">filenames&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">name&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">name&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">os&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">listdir&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">data_dir&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">os&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">path&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">splitext&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">name&lt;/span>&lt;span class="p">)[&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s1">&amp;#39;.png&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">for&lt;/span> &lt;span class="n">i&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">filename&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">enumerate&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">filenames&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">img_arr&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">imageio&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">imread&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">os&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">path&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">join&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">data_dir&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">filename&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">img_t&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">from_numpy&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">img_arr&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">img_t&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">img_t&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">permute&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">img_t&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">img_t&lt;/span>&lt;span class="p">[:&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="c1"># just keep the first three channels (RGB)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">batch&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">img_t&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="normalizing-the-data">Normalizing the data&lt;/h3>
&lt;p>&lt;strong>Neural networks usually train best when the input data ranges roughly from 0 to 1, or from -1 to 1.&lt;/strong>&lt;/p>
&lt;p>So a typical thing we’ll want to do is&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Cast a tensor to floating-point&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Normalize the values of the pixels&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>It depends on what range of the input we decide should lie between 0 and 1 (or -1 and 1)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>One possibility is to just divide the values of the pixels by 255 (the largest value representable as an 8-bit unsigned integer)&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">batch&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">batch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">float&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="c1"># cast to floating point tensor&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">batch&lt;/span> &lt;span class="o">/=&lt;/span> &lt;span class="mf">255.0&lt;/span> &lt;span class="c1"># normalize&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>Another possibility for normalization is to compute the mean and standard deviation of the input data and scale it so that &lt;strong>the output has zero mean and unit standard deviation across each channel&lt;/strong>:
&lt;/p>
$$
\forall x \in \text{dataset}: \quad x:= \frac{x - \text{mean}}{\text{standard deviation}}
$$
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">n_channels&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">batch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="c1"># shpae is: N x C x H x W&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">for&lt;/span> &lt;span class="n">c&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_channels&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">mean&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">mean&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">batch&lt;/span>&lt;span class="p">[:,&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">std&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">std&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">batch&lt;/span>&lt;span class="p">[:,&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">batch&lt;/span>&lt;span class="p">[:,&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">batch&lt;/span>&lt;span class="p">[:,&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="n">mean&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="n">std&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-primary-100 dark:bg-primary-900">
&lt;span class="pr-3 pt-1 text-primary-600 dark:text-primary-300">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="m11.25 11.25l.041-.02a.75.75 0 0 1 1.063.852l-.708 2.836a.75.75 0 0 0 1.063.853l.041-.021M21 12a9 9 0 1 1-18 0a9 9 0 0 1 18 0m-9-3.75h.008v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">In working with images, it is good practice to &lt;strong>compute the mean and standard deviation on all the training data in advance&lt;/strong> and then subtract and divide by these fixed, precomputed quantities.&lt;/span>
&lt;/div>
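&lt;p>The per-channel loop above can also be written with broadcasting. The sketch below computes the statistics once, on a hypothetical random training batch, and reuses the same fixed values for a validation batch; the tensor names and sizes are illustrative assumptions:&lt;/p>

```python
import torch

torch.manual_seed(0)

# Hypothetical uint8 image batches in N x C x H x W layout
train_batch = torch.randint(0, 256, (8, 3, 32, 32), dtype=torch.uint8)
val_batch = torch.randint(0, 256, (2, 3, 32, 32), dtype=torch.uint8)

# Cast to floating point and scale to [0, 1]
train = train_batch.float() / 255.0
val = val_batch.float() / 255.0

# Per-channel statistics, computed once on the training data only.
# Reducing over N, H and W leaves one value per channel; keepdim=True
# keeps the shape 1 x C x 1 x 1 so it broadcasts against N x C x H x W.
mean = train.mean(dim=(0, 2, 3), keepdim=True)
std = train.std(dim=(0, 2, 3), keepdim=True)

# The same fixed statistics are applied to every split
train_norm = (train - mean) / std
val_norm = (val - mean) / std

print(mean.shape)                      # torch.Size([1, 3, 1, 1])
print(train_norm.mean(dim=(0, 2, 3)))  # approximately 0 for each channel
print(train_norm.std(dim=(0, 2, 3)))   # approximately 1 for each channel
```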
&lt;h2 id="tabular-data">Tabular data&lt;/h2>
&lt;p>&lt;strong>Spreadsheet, CSV file, or database: a table containing one row per sample (or record), where columns contain one piece of information about our sample.&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>There’s &lt;strong>no&lt;/strong> meaning to the order in which samples appear in the table (such a table is a collection of &lt;strong>independent&lt;/strong> samples)&lt;/li>
&lt;li>Tabular data is typically &lt;strong>not homogeneous&lt;/strong>: different columns don’t have the same type.&lt;/li>
&lt;/ul>
&lt;p>PyTorch tensors, on the other hand, are homogeneous. Information in PyTorch is typically encoded as a number, usually floating-point (though integer and Boolean types are supported as well).&lt;/p>
&lt;h3 id="continuous-ordinal-and-categorical-values">Continuous, ordinal, and categorical values&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Type of values&lt;/th>
&lt;th>Have order?&lt;/th>
&lt;th>Have numerical meaning?&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>categorical&lt;/td>
&lt;td>❌&lt;/td>
&lt;td>❌&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>ordinal&lt;/td>
&lt;td>✅&lt;/td>
&lt;td>❌&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>continuous&lt;/td>
&lt;td>✅&lt;/td>
&lt;td>✅&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>&lt;em>continuous&lt;/em> values&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>strictly ordered&lt;/p>
&lt;/li>
&lt;li>
&lt;p>a difference between various values has a strict meaning&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Example&lt;/p>
&lt;p>&lt;em>Stating that package A is 2 kilograms heavier than package B, or that package B came from 100 miles farther away than A has a fixed meaning, regardless of whether package A is 3 kilograms or 10, or if B came from 200 miles away or 2,000.&lt;/em>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The literature actually divides continuous values further&lt;/p>
&lt;ul>
&lt;li>&lt;em>&lt;strong>ratio scale&lt;/strong>&lt;/em>: it makes sense to say something is twice as heavy or three times farther away&lt;/li>
&lt;li>&lt;em>&lt;strong>interval scale&lt;/strong>&lt;/em>: differences are meaningful, but ratios are not. For example, the time of day has a notion of difference, but it is not reasonable to claim that 6:00 is twice as late as 3:00&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>&lt;em>ordinal&lt;/em> values&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>The strict ordering we have with continuous values remains, but the fixed relationship between values no longer applies.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Example:&lt;/p>
&lt;p>&lt;em>Ordering a small, medium, or large drink, with small mapped to the value 1, medium 2, and large 3. The large drink is bigger than the medium, in the same way that 3 is bigger than 2, &lt;strong>but it doesn’t tell us anything about how much bigger&lt;/strong>.&lt;/em>&lt;/p>
&lt;p>&lt;em>If we were to convert our 1, 2, and 3 to the actual volumes (say, 8, 12, and 24 fluid ounces), then they would switch to being interval values.&lt;/em>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>We can’t “do math” on the values outside of ordering them (&lt;em>trying to average large = 3 and small = 1 does not result in a medium drink!&lt;/em>)&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>&lt;em>categorical&lt;/em> values&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>have neither ordering nor numerical meaning to their values. These are often just enumerations of possibilities assigned arbitrary numbers.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Example&lt;/p>
&lt;p>&lt;em>Assigning water to 1, coffee to 2, soda to 3, and milk to 4. There’s no real logic to placing water first and milk last; they simply need distinct values to dif- ferentiate them. We could assign coffee to 10 and milk to –3, and there would be no significant change&lt;/em>&lt;/p>
&lt;/li>
&lt;/ul>
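&lt;p>The drink examples above can be turned into tensors with plain lookup tables. In the sketch below (the dictionaries and sample rows are illustrative), the ordinal mapping must follow the size order, whereas the categorical codes are arbitrary labels: sorting the sizes is meaningful, but comparing or averaging the drink codes is not.&lt;/p>

```python
import torch

# Ordinal column: the codes must follow the order small, medium, large
size_to_code = {"small": 1, "medium": 2, "large": 3}
# Categorical column: any distinct codes would do; the numbers carry no meaning
drink_to_code = {"water": 1, "coffee": 2, "soda": 3, "milk": 4}

sizes = ["large", "small", "medium"]
drinks = ["coffee", "milk", "water"]

size_t = torch.tensor([size_to_code[s] for s in sizes])
drink_t = torch.tensor([drink_to_code[d] for d in drinks])

print(size_t)                # tensor([3, 1, 2])
print(size_t.sort().values)  # tensor([1, 2, 3]): sorting by size is meaningful
print(drink_t)               # tensor([2, 4, 1]): these codes only distinguish drinks
```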
&lt;h3 id="loading-tabular-data">Loading tabular data&lt;/h3>
&lt;p>Python offers several options for quickly loading a CSV file. Three popular options are&lt;/p>
&lt;ul>
&lt;li>The &lt;code>csv&lt;/code> module that ships with Python&lt;/li>
&lt;li>NumPy&lt;/li>
&lt;li>Pandas (most time- and memory-efficient)&lt;/li>
&lt;/ul>
&lt;p>Since PyTorch has excellent NumPy interoperability, we’ll go with that.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">csv&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># assume PATH variable holds the csv file&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">tabular_data_numpy&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">loadtxt&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">PATH&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">dtype&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">float32&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="c1"># type of the np arr should be&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">delimiter&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;;&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="c1"># delimiter used to separate values in each orw&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">skiprows&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span> &lt;span class="c1"># the first line should not be read since it contains the col names&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Convert the NumPy array to a PyTorch tensor:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">tabular_data_tensor&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">from_numpy&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">tabular_data_numpy&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Get the column names from the header row:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">col_list&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nb">next&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">csv&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">reader&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">open&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">PATH&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="n">delimiter&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;;&amp;#39;&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="one-hot-encoding">One-hot encoding&lt;/h3>
&lt;p>Assume that the score/class takes one of 10 integer values, 0 to 9. We could build a &lt;strong>one-hot&lt;/strong> encoding of the scores: encode each of the 10 scores in a vector of 10 elements, with all elements set to 0 but one, at a different index for each score. &lt;em>For example, a score of 0 could be mapped onto the vector &lt;code>(1,0,0,0,0,0,0,0,0,0)&lt;/code>, a score of 4 onto &lt;code>(0,0,0,0,1,0,0,0,0,0)&lt;/code>, and so on.&lt;/em> Note that one-hot encoding implies no ordering or distance between the scores, i.e., it treats them as categorical values.&lt;/p>
&lt;p>We can achieve one-hot encoding using the &lt;code>scatter_&lt;/code> method, which fills the tensor with values from a source tensor along the indices provided as arguments:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># assume that we already have the score tensor&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">score&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">tensor([6, 6, ..., 7, 6])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">score&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">torch.Size([4898])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">score_onehot&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">zeros&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">score&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="mi">10&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># in our case: score.shape[0] = 4898&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">score_onehot&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">scatter_&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">score&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">unsqueeze&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="mf">1.0&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;blockquote>
&lt;p>&lt;code>scatter_(dim, index, src)&lt;/code>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>dim&lt;/code>: The dimension along which the following two arguments are specified&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>index&lt;/code>: A column tensor indicating the indices of the elements to scatter&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Required to have the &lt;strong>same&lt;/strong> number of dimensions as the tensor we scatter into.&lt;/p>
&lt;p>Since &lt;code>score_onehot&lt;/code> has two dimensions (4,898 × 10), we need to add an extra dummy dimension to &lt;code>score&lt;/code> using &lt;code>unsqueeze(1)&lt;/code>, turning its shape from (4,898) into (4,898 × 1).&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>src&lt;/code>: A tensor containing the elements to scatter, or a single scalar to scatter (1.0 in this case)&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>In other words, the previous invocation reads, “For each row, take the index of the score label (which coincides with the score in our case) and use it as the column index to set the value 1.0.” The end result is a tensor encoding categorical information.&lt;/p>
&lt;/blockquote>
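&lt;p>A minimal sketch to check the &lt;code>scatter_&lt;/code> call on a handful of hypothetical scores (standing in for the 4,898 wine scores), and to compare it with PyTorch’s built-in &lt;code>torch.nn.functional.one_hot&lt;/code>, which returns an integer tensor and therefore needs a cast to float:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch
import torch.nn.functional as F

# Hypothetical scores standing in for the wine quality labels (integers 0..9)
score = torch.tensor([6, 6, 5, 7, 6])

# scatter_: for every row i, write 1.0 into column score[i]
score_onehot = torch.zeros(score.shape[0], 10)
score_onehot.scatter_(1, score.unsqueeze(1), 1.0)

# Built-in alternative; one_hot returns int64, so cast to float32
builtin_onehot = F.one_hot(score, num_classes=10).float()

print(score_onehot[0])                              # 1.0 at index 6, zeros elsewhere
print(torch.equal(score_onehot, builtin_onehot))    # True
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>Both variants assume the labels are integer (int64) indices between 0 and 9; &lt;code>scatter_&lt;/code> writes into a preallocated tensor, while &lt;code>one_hot&lt;/code> allocates a new one.&lt;/p>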
&lt;h3 id="when-to-categorise">When to categorise?&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Categorical&lt;/strong>: we lose the ordering and hope that the model can recover it during training, which is only realistic if we have few categories&lt;/li>
&lt;li>&lt;strong>Continuous&lt;/strong>: we introduce an arbitrary notion of distance between values&lt;/li>
&lt;/ul>
&lt;p>
&lt;figure >
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%e6%88%aa%e5%b1%8f2020-10-20%2022.44.02.png" alt="截屏2020-10-20 22.44.02" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="text">Text&lt;/h2>
&lt;p>🎯 Goal: turn text into tensors of numbers that a neural network can process.&lt;/p>
&lt;h3 id="converting-text-to-numbers">Converting text to numbers&lt;/h3>
&lt;p>There are two particularly intuitive levels at which networks operate on text:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>character&lt;/strong> level: processing one character at a time&lt;/li>
&lt;li>&lt;strong>word&lt;/strong> level: individual words are the finest-grained entities to be seen by the network.&lt;/li>
&lt;/ul>
&lt;p>The technique with which we encode text information into tensor form is the same whether we operate at the character level or the word level.&lt;/p>
&lt;h3 id="one-hot-encoding-characters">One-hot-encoding characters&lt;/h3>
&lt;p>First we will load the text:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># assume PATH variable holds the txt file&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="nb">open&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">PATH&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">encoding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;utf8&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">f&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">text&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">f&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Encoding&lt;/strong> of the character: Every written character is represented by a code (a sequence of bits of appropriate length so that each character can be uniquely identified).&lt;/p>
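&lt;p>To make this concrete: Python’s built-in &lt;code>ord&lt;/code> returns the integer code point of a character, and &lt;code>str.encode&lt;/code> shows the bytes used to store it. A short sketch; the sample characters are our own choice, the last one being the directional opening quote that appears in the book text. ASCII only covers code points 0 to 127, which is why that quote falls outside it:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python"># Sample characters (our own choice); "\u201c" is the directional opening quote
sample = ["a", "A", ",", "\u201c"]

codes = [ord(ch) for ch in sample]                    # integer code points
utf8_bytes = [ch.encode("utf-8") for ch in sample]    # byte sequences in UTF-8
non_ascii = [ch for ch in sample if ord(ch) >= 128]   # ASCII only covers 0..127

print(codes)        # [97, 65, 44, 8220]
print(utf8_bytes)   # [b'a', b'A', b',', b'\xe2\x80\x9c']
print(non_ascii)    # ['“']
&lt;/code>&lt;/pre>&lt;/div>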
&lt;p>We are going to one-hot encode our characters. Depending on the task at hand, we could&lt;/p>
&lt;ul>
&lt;li>make all of the characters lowercase, to reduce the number of different characters in our encoding&lt;/li>
&lt;li>screen out punctuation, numbers, or other characters that aren’t relevant to our expected kinds of text.&lt;/li>
&lt;/ul>
&lt;p>At this point, we need to parse through the characters in the text and provide a one-hot encoding for each of them: Each character will be represented by a vector of length equal to the number of different characters in the encoding. This vector will contain &lt;strong>all zeros except a one at the index corresponding to the location of the character in the encoding.&lt;/strong>&lt;/p>
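&lt;p>Instead of reserving all 128 ASCII slots, the encoding can also be built from the characters that actually occur in the text, so that the vector length equals the number of distinct characters. A minimal sketch, assuming a hypothetical one-line &lt;code>text&lt;/code> in place of the loaded file and lowercasing as suggested above; &lt;code>chars&lt;/code>, &lt;code>char2index&lt;/code> and &lt;code>onehot&lt;/code> are illustrative names:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

# Hypothetical stand-in for the loaded text
text = "Impossible, Mr. Bennet, impossible, when I am not acquainted with him"

# Encoding built from the distinct (lowercased) characters that occur in the text
chars = sorted(set(text.lower()))
char2index = {ch: i for i, ch in enumerate(chars)}

# One row per character, one column per distinct character in the encoding
onehot = torch.zeros(len(text), len(chars))
for i, ch in enumerate(text.lower()):
    onehot[i, char2index[ch]] = 1.0

print(onehot.shape)                 # (number of characters, number of distinct characters)
print(onehot.sum(dim=1).unique())   # every row contains exactly one 1
&lt;/code>&lt;/pre>&lt;/div>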
&lt;p>For the sake of simplicity, we first split our text into a list of lines and pick an arbitrary line to focus on:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">lines&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">text&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">split&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="se">\n&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># split text into a list of lines&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">line&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">lines&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">200&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="c1"># pick arbitrary line&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">letter_t&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">zeros&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">line&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="mi">128&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># 128 hardcoded due to the limits of ASCII&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">for&lt;/span> &lt;span class="n">i&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">letter&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">enumerate&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">line&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">lower&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">strip&lt;/span>&lt;span class="p">()):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The text uses directional double quotes, which are not valid ASCII, &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># so we screen them out here.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">letter_index&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nb">ord&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">letter&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">if&lt;/span> &lt;span class="nb">ord&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">letter&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">&amp;lt;&lt;/span> &lt;span class="mi">128&lt;/span> &lt;span class="k">else&lt;/span> &lt;span class="mi">0&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">letter_t&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">][&lt;/span>&lt;span class="n">letter_index&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">1&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="one-hot-encoding-whole-words">One-hot encoding whole words&lt;/h3>
&lt;p>We’ll define a helper function &lt;code>clean_words&lt;/code>, which takes text and returns it as a list of lowercase words stripped of punctuation.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">clean_words&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">input_str&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">punctuation&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s1">&amp;#39;.,;:&amp;#34;!?”“_-&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">word_list&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">input_str&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">lower&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">replace&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="se">\n&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">split&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">word_list&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">word&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">strip&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">punctuation&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">word&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">word_list&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">word_list&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>When we call it on our “Impossible, Mr. Bennet” line, we get the following:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">words_in_line&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">clean_words&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">line&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">line&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">words_in_line&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">(&amp;#39;“Impossible, Mr. Bennet, impossible, when I am not acquainted with him&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [&amp;#39;impossible&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;mr&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;bennet&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;impossible&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;when&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;i&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;am&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;not&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;acquainted&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;with&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;him&amp;#39;])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now, let&amp;rsquo;s build a mapping of all words in &lt;code>text&lt;/code> to indexes in our encoding:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">word_list&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nb">sorted&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">clean_words&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">text&lt;/span>&lt;span class="p">)))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">word2index_dict&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">{&lt;/span>&lt;span class="n">word&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">i&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">word&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">enumerate&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">word_list&lt;/span>&lt;span class="p">)}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>word2index_dict&lt;/code> is now a dictionary with words as keys and integer indices as values. We will use it to efficiently find the index of a word as we one-hot encode it. For example, let&amp;rsquo;s look up the index of the word &amp;ldquo;possible&amp;rdquo;:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">word2index_dict&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;possible&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">10421
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Let&amp;rsquo;s see how we can one-hot encode the words of the sentence &amp;ldquo;Impossible, Mr. Bennet, impossible, when I am not acquainted with him&amp;rdquo;:&lt;/p>
&lt;ol>
&lt;li>create an empty tensor&lt;/li>
&lt;li>assign the one-hot-encoded value of each word in the sentence&lt;/li>
&lt;/ol>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># create an empty tensor&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">word_t&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">zeros&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">words_in_line&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">word2index_dict&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># assign the one-hot-encoded values of the word in the sentence&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">for&lt;/span> &lt;span class="n">i&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">word&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">enumerate&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">words_in_line&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">word_index&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">word2index_dict&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">word&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">word_t&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">][&lt;/span>&lt;span class="n">word_index&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">1&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="si">:&lt;/span>&lt;span class="s2">2&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2"> &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">word_index&lt;/span>&lt;span class="si">:&lt;/span>&lt;span class="s2">4&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2"> &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">word&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl"> 0 6925 impossible
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> 1 8832 mr
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> 2 1906 bennet
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> 3 6925 impossible
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> 4 14844 when
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> 5 6769 i
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> 6 714 am
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> 7 9198 not
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> 8 312 acquainted
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> 9 15085 with
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">10 6387 him
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">word_t&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">torch.Size([11, 15514])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The choice between character-level and word-level encoding forces us to make a trade-off:&lt;/p>
&lt;ul>
&lt;li>In many languages, &lt;strong>there are significantly fewer characters than words&lt;/strong>: representing characters has us representing just a few classes, while representing words requires us to represent a very large number of classes&lt;/li>
&lt;li>On the other hand, words convey much more meaning than individual characters, so a representation of words is considerably more informative by itself.&lt;/li>
&lt;/ul>
&lt;h3 id="text-embeddings">Text embeddings&lt;/h3>
&lt;p>An &lt;strong>embedding&lt;/strong> is an effective way to map individual words into a space with a fixed number of dimensions (let&amp;rsquo;s say, 100) in a way that facilitates downstream learning. &lt;strong>Ideally, the embedding is generated such that words used in &lt;em>similar&lt;/em> contexts are mapped to &lt;em>nearby&lt;/em> regions of the embedding space.&lt;/strong>&lt;/p>
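&lt;p>In PyTorch, such a mapping is commonly held in a &lt;code>torch.nn.Embedding&lt;/code> layer, a trainable lookup table with one row per word. A minimal sketch, assuming a vocabulary of 15,514 words (the size printed for &lt;code>word_t&lt;/code> above) and 100 dimensions; the table is randomly initialised rather than trained, so it only shows the mechanics. It also shows that looking up a row is equivalent to multiplying a one-hot row vector by the weight matrix, so an embedding acts as a dense, lower-dimensional replacement for one-hot vectors:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch
import torch.nn.functional as F

torch.manual_seed(0)

vocab_size = 15514    # number of distinct words, as in the word_t example above
embedding_dim = 100   # the fixed number of dimensions

# Trainable lookup table: one 100-dimensional row per word (random here, not trained)
embedding = torch.nn.Embedding(vocab_size, embedding_dim)

# Indices of "impossible", "mr", "bennet" as printed in the one-hot example above
word_indices = torch.tensor([6925, 8832, 1906])

dense = embedding(word_indices)             # shape: (3, 100)

# The same vectors via one-hot rows times the weight matrix
one_hot = F.one_hot(word_indices, num_classes=vocab_size).float()   # shape: (3, 15514)
via_matmul = one_hot @ embedding.weight                             # shape: (3, 100)

print(dense.shape)                          # torch.Size([3, 100])
print(torch.allclose(dense, via_matmul))    # True
&lt;/code>&lt;/pre>&lt;/div>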
&lt;p>Embeddings are often generated using neural networks that try to predict a word from nearby words (the context) in a sentence. In this case, we could start from one-hot-encoded words and use a (usually rather shallow) neural network to generate the embedding. Once the embedding is available, we can use it for downstream tasks.&lt;/p>
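&lt;p>One common way to do this is a continuous-bag-of-words (CBOW) style model, one of the two architectures of word2vec: average the embeddings of the surrounding words and train a shallow network to predict the word in the middle. The following is only an illustrative sketch; the toy corpus, window size, dimension and training settings are all our own assumptions, and a useful embedding would be trained on a large corpus:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy corpus; in practice this would be the cleaned words of a whole book
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
word2index = {w: i for i, w in enumerate(vocab)}
ids = torch.tensor([word2index[w] for w in corpus])

# (context, target) pairs with a window of one word on each side of the target
contexts = torch.stack([ids[:-2], ids[2:]], dim=1)   # shape: (10, 2)
targets = ids[1:-1]                                  # shape: (10,)


class CBOW(nn.Module):
    """Shallow network: average the context embeddings, then score every word in the vocabulary."""

    def __init__(self, vocab_size, dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)   # the word vectors we are after
        self.out = nn.Linear(dim, vocab_size)        # only needed for the training task

    def forward(self, context):
        return self.out(self.embed(context).mean(dim=1))


cbow = CBOW(len(vocab), dim=8)
optimizer = torch.optim.Adam(cbow.parameters(), lr=0.05)
criterion = nn.CrossEntropyLoss()

losses = []
for _ in range(200):
    optimizer.zero_grad()
    loss = criterion(cbow(contexts), targets)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

word_vectors = cbow.embed.weight.detach()   # one 8-dimensional vector per vocabulary word
print(f"loss went from {losses[0]:.3f} to {losses[-1]:.3f}")
print(word_vectors.shape)                   # torch.Size([7, 8])
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>After training, only &lt;code>cbow.embed.weight&lt;/code> is kept: row &lt;code>word2index[w]&lt;/code> is the vector for word &lt;code>w&lt;/code>, and the prediction layer is discarded.&lt;/p>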
&lt;p>One interesting aspect of the resulting embeddings is that similar words end up not only clustered together, but also having consistent spatial relationships with other words. For example, if we were to take the embedding vector for &lt;em>apple&lt;/em> and begin to add and subtract the vectors for other words, we could begin to perform analogies like &lt;em>apple&lt;/em> - &lt;em>red&lt;/em> - &lt;em>sweet&lt;/em> + &lt;em>yellow&lt;/em> + &lt;em>sour&lt;/em> and end up with a vector very similar to the one for &lt;em>lemon&lt;/em>.&lt;/p></description></item><item><title>The Mechanics of Learning</title><link>https://haobin-tan.netlify.app/docs/ai/pytorch/dl-with-pytorch/p1ch5/</link><pubDate>Sat, 24 Oct 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/pytorch/dl-with-pytorch/p1ch5/</guid><description>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">torch&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="learning-is-just-parameter-estimation">Learning is just parameter estimation&lt;/h2>
&lt;p>
&lt;figure >
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%e6%88%aa%e5%b1%8f2020-10-22%2011.17.44.png" alt="截屏2020-10-22 11.17.44" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;ul>
&lt;li>Given
&lt;ul>
&lt;li>input data&lt;/li>
&lt;li>corresponding desired outputs (ground truth)&lt;/li>
&lt;li>initial values for the weights&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>The model is fed input data (forward pass)&lt;/li>
&lt;li>A measure of the error is evaluated by comparing the resulting outputs to the ground truth&lt;/li>
&lt;li>In order to optimize the parameters of the model (its &lt;strong>weights&lt;/strong>)
&lt;ul>
&lt;li>The change in the error following a unit change in weights (that is, the gradient of the error with respect to the parameters) is computed using the chain rule for the derivative of a composite function (backward pass)&lt;/li>
&lt;li>The value of the weights is then updated in the direction that leads to a decrease in the error&lt;/li>
&lt;li>The procedure is repeated until the error, evaluated on unseen data, falls below an acceptable level.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="a-simple-linear-model">A simple linear model&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">t_c = w * t_u + b
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ul>
&lt;li>&lt;code>w&lt;/code>: weight, tells us how much a given input influences the output.&lt;/li>
&lt;li>&lt;code>b&lt;/code>: bias, tells us what the output would be if the input were zero.&lt;/li>
&lt;/ul>
&lt;p>Now we need to estimate &lt;code>w&lt;/code> and &lt;code>b&lt;/code>, the parameters in our model, based on the data we have. We must do it so that the temperatures we obtain from running the readings in unknown units &lt;code>t_u&lt;/code> through the model are close to the temperatures we actually measured in Celsius (&lt;code>t_c&lt;/code>). That sounds like fitting a line through a set of measurements!&lt;/p>
&lt;p>Let’s flesh it out again:&lt;/p>
&lt;ul>
&lt;li>We have a model with some unknown parameters, and we need to estimate those parameters so that the error between predicted outputs and measured values is as low as possible.&lt;/li>
&lt;li>We need to define a precise measure of the error. Such a measure, which we refer to as the &lt;strong>loss function&lt;/strong>, should be high if the error is high and should ideally be as low as possible for a perfect match.&lt;/li>
&lt;li>Our optimization process should therefore aim at finding &lt;code>w&lt;/code> and &lt;code>b&lt;/code> so that the loss function is at a minimum.&lt;/li>
&lt;/ul>
&lt;h2 id="modeling-with-pytorch">Modeling with PyTorch&lt;/h2>
&lt;p>We can define the model as a Python function:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">model&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">t_u&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">w&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">b&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> t_u: input tensor
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> w: weight parameter
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> b: bias parameter
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">w&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">t_u&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">b&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>For the loss function we choose the &lt;strong>mean squared error&lt;/strong> (MSE): we build a tensor of differences, square it element-wise, and finally produce a scalar loss by averaging all of the elements in the resulting tensor:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">loss_fn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">t_p&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="n">t_c&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">squared_diffs&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">t_p&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="n">t_c&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">**&lt;/span> &lt;span class="mi">2&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">squared_diffs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">mean&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="down-along-the-gradient">Down along the gradient&lt;/h2>
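&lt;p>We’ll optimize the loss function with respect to the parameters using the &lt;strong>gradient descent&lt;/strong> algorithm, which is a very simple idea that scales up surprisingly well to large neural network models with millions of parameters: compute the gradient of the loss with respect to each parameter, take a small step in the opposite direction (scaled by a learning rate), and repeat.&lt;/p>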
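&lt;p>A minimal sketch of plain gradient descent for our linear model, with the gradients of the mean squared error derived by hand (exactly the bookkeeping that &lt;code>autograd&lt;/code> takes over). The data are hypothetical: points generated from the line &lt;code>t_c = 2.0 * t_u - 1.0&lt;/code> rather than the book’s thermometer readings, so we can check that the procedure recovers &lt;code>w = 2&lt;/code> and &lt;code>b = -1&lt;/code>; the learning rate and number of steps are also our own choices:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

# Hypothetical toy data generated from the line t_c = 2.0 * t_u - 1.0
# (not the book's thermometer readings), so the expected answer is known
t_u = torch.tensor([0.0, 1.0, 2.0, 3.0, 4.0])
t_c = 2.0 * t_u - 1.0


def model(t_u, w, b):
    return w * t_u + b


def loss_fn(t_p, t_c):
    return ((t_p - t_c) ** 2).mean()


def grad_manual(t_u, t_c, t_p):
    """Hand-derived MSE gradients: dL/dw = mean(2 (t_p - t_c) t_u), dL/db = mean(2 (t_p - t_c))."""
    diff = 2 * (t_p - t_c)
    return torch.stack([(diff * t_u).mean(), diff.mean()])


params = torch.tensor([1.0, 0.0])   # initial guess for w and b
learning_rate = 0.05                # small enough for this data; too large a step diverges

for _ in range(2000):
    t_p = model(t_u, *params)                      # forward pass
    grad = grad_manual(t_u, t_c, t_p)              # gradient of the loss
    params = params - learning_rate * grad         # step against the gradient

print(params)                                      # close to tensor([ 2., -1.])
print(loss_fn(model(t_u, *params), t_c).item())    # close to 0
&lt;/code>&lt;/pre>&lt;/div>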
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">params&lt;/span> &lt;span class="o">-=&lt;/span> &lt;span class="n">learning_rate&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">params&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">grad&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="pytorchs-autograd">PyTorch&amp;rsquo;s &lt;code>autograd&lt;/code>&lt;/h3>
&lt;p>PyTorch provides a mechanism called &lt;code>autograd&lt;/code>: PyTorch tensors can remember where they come from, in terms of the operations and parent tensors that originated them, and they can automatically provide the chain of derivatives of such operations with respect to their inputs. This means that:&lt;/p>
&lt;ul>
&lt;li>we won’t need to differentiate our model by hand &amp;#x1f44f;&lt;/li>
&lt;li>given a forward expression, no matter how nested, PyTorch will automatically provide the gradient of that expression with respect to its input parameters &amp;#x1f44f;&lt;/li>
&lt;/ul>
&lt;h4 id="applying-autograd">Applying &lt;code>autograd&lt;/code>&lt;/h4>
&lt;p>In order to activate &lt;code>autograd&lt;/code>, we need to initialize the parameters tensor with &lt;code>requires_grad=True&lt;/code>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">params&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tensor&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="mf">1.0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.0&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">requires_grad&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="using-the-grad-attribute">Using the &lt;code>grad&lt;/code> attribute&lt;/h4>
&lt;p>&lt;code>requires_grad=True&lt;/code> tells PyTorch to track the entire family tree of tensors resulting from operations on &lt;code>params&lt;/code>. In other words, &lt;strong>any tensor that has &lt;code>params&lt;/code> as an ancestor will have access to the chain of functions that were called to get from &lt;code>params&lt;/code> to that tensor.&lt;/strong> If these functions are differentiable (and most PyTorch tensor operations are), the derivative of a scalar result (such as the loss) with respect to &lt;code>params&lt;/code> is &lt;em>automatically&lt;/em> populated in the &lt;code>grad&lt;/code> attribute of the &lt;code>params&lt;/code> tensor once we call &lt;code>backward()&lt;/code> on that result.&lt;/p>
&lt;p>In general, all PyTorch tensors have an attribute named &lt;code>grad&lt;/code>. Normally, it’s &lt;code>None&lt;/code> at the beginning:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">params&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">grad&lt;/span> &lt;span class="ow">is&lt;/span> &lt;span class="kc">None&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">True=
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>All we have to do to populate it is to start with a tensor with &lt;code>requires_grad&lt;/code> set to &lt;code>True&lt;/code>, then call the model and compute the loss, and then call &lt;code>backward()&lt;/code> on the &lt;code>loss&lt;/code> tensor:&lt;/strong>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">loss&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">model&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">t_u&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="n">params&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="n">t_c&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">loss&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">backward&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>At this point, the &lt;code>grad&lt;/code> attribute of &lt;code>params&lt;/code> contains the derivatives of the &lt;code>loss&lt;/code> with respect to each element of params.&lt;/p>
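&lt;p>To convince ourselves that &lt;code>autograd&lt;/code> computes what we expect, here is a small self-contained check on hypothetical toy data (not the book’s measurements). It compares &lt;code>params.grad&lt;/code> with the hand-derived MSE gradients, and shows that a second &lt;code>backward()&lt;/code> call without zeroing adds to the stored gradient:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

# Hypothetical toy data (not the book's measurements)
t_u = torch.tensor([0.0, 1.0, 2.0, 3.0, 4.0])
t_c = torch.tensor([-1.0, 1.0, 3.0, 5.0, 7.0])


def model(t_u, w, b):
    return w * t_u + b


def loss_fn(t_p, t_c):
    return ((t_p - t_c) ** 2).mean()


params = torch.tensor([1.0, 0.0], requires_grad=True)

loss = loss_fn(model(t_u, *params), t_c)
loss.backward()
first_grad = params.grad.clone()
print(first_grad)                               # tensor([-8., -2.])

# Hand-derived MSE gradients for comparison
with torch.no_grad():
    diff = 2 * (model(t_u, *params) - t_c)
    expected = torch.stack([(diff * t_u).mean(), diff.mean()])
print(torch.allclose(first_grad, expected))     # True

# A second backward pass without zeroing adds to the stored gradient
loss = loss_fn(model(t_u, *params), t_c)
loss.backward()
print(torch.allclose(params.grad, 2 * expected))   # True
&lt;/code>&lt;/pre>&lt;/div>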
&lt;p>What happened under the hood?&lt;/p>
&lt;p>When we compute our &lt;code>loss&lt;/code> while the parameters &lt;code>w&lt;/code> and &lt;code>b&lt;/code> require gradients, in addition to performing the actual computation, PyTorch creates the autograd graph with the operations (in black circles) as nodes:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-22%2023.20.57.png" alt="截屏2020-10-22 23.20.57" style="zoom:100%;" />
&lt;p>When we call &lt;code>loss.backward()&lt;/code>, PyTorch traverses this graph in the reverse direction to compute the gradients:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-22%2023.22.02.png" alt="截屏2020-10-22 23.22.02" style="zoom:100%;" />
&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-yellow-100 dark:bg-yellow-900">
&lt;span class="pr-3 pt-1 text-red-400">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="M12 9v3.75m-9.303 3.376c-.866 1.5.217 3.374 1.948 3.374h14.71c1.73 0 2.813-1.874 1.948-3.374L13.949 3.378c-.866-1.5-3.032-1.5-3.898 0zM12 15.75h.007v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">&lt;p>Note! Calling &lt;code>backward&lt;/code> will lead derivatives to accumulate at leaf nodes. We need to *&lt;strong>zero the gradient explicitly*&lt;/strong> after using it for parameter updates. We can do this easily using the inplace &lt;code>zero_&lt;/code> method:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">if&lt;/span> &lt;span class="n">params&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">grad&lt;/span> &lt;span class="ow">is&lt;/span> &lt;span class="ow">not&lt;/span> &lt;span class="kc">None&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">params&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">grad&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">zero_&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/span>
&lt;/div>
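&lt;p>A tiny sketch of the accumulation, using a hypothetical scalar &lt;code>x&lt;/code> and the loss &lt;code>x ** 2&lt;/code> instead of the model above: calling &lt;code>backward()&lt;/code> twice without zeroing doubles the stored gradient, and &lt;code>zero_()&lt;/code> resets it.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

x = torch.tensor(3.0, requires_grad=True)

# d(x ** 2)/dx = 2 * x = 6 at x = 3
(x ** 2).backward()
first = x.grad.clone()

# A second backward pass (on a freshly built graph) adds to x.grad
(x ** 2).backward()
accumulated = x.grad.clone()

# Zero the gradient in place before it is used again
x.grad.zero_()
after_zero = x.grad.clone()

print(first.item(), accumulated.item(), after_zero.item())  # 6.0 12.0 0.0
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>Accumulation is deliberate: it lets gradients from several backward passes (for example, several mini-batches) be summed before a single update.&lt;/p>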
&lt;p>Now our &lt;code>autograd&lt;/code>-enabled training code looks like this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">training_loop&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_epochs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">learning_rate&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">params&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">t_u&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">t_c&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_epochs&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">params&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">grad&lt;/span> &lt;span class="ow">is&lt;/span> &lt;span class="ow">not&lt;/span> &lt;span class="kc">None&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">params&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">grad&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">zero_&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># forward pass&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">t_p&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">t_u&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="n">params&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># backward pass&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">t_p&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">t_c&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">backward&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># update params&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">with&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">no_grad&lt;/span>&lt;span class="p">():&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">params&lt;/span> &lt;span class="o">-=&lt;/span> &lt;span class="n">learning_rate&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">params&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">grad&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># logging&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="o">%&lt;/span> &lt;span class="mi">500&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Epoch &lt;/span>&lt;span class="si">%d&lt;/span>&lt;span class="s1">, Loss &lt;/span>&lt;span class="si">%f&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span> &lt;span class="o">%&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">epoch&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">float&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">loss&lt;/span>&lt;span class="p">)))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">params&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="pytorchs-optimizers">PyTorch&amp;rsquo;s optimizers&lt;/h2>
&lt;p>There are several optimization strategies and tricks that can assist convergence, especially when models get complicated. The &lt;code>torch&lt;/code> module has an &lt;code>optim&lt;/code> submodule where we can find classes implementing different optimization algorithms (the exact list depends on the installed PyTorch version):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">torch.optim&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">optim&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">dir&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">optim&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">[&amp;#39;ASGD&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;Adadelta&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;Adagrad&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;Adam&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;AdamW&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;Adamax&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;LBFGS&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;Optimizer&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;RMSprop&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;Rprop&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;SGD&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;SparseAdam&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;__builtins__&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;__cached__&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;__doc__&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;__file__&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;__loader__&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;__name__&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;__package__&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;__path__&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;__spec__&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;lr_scheduler&amp;#39;]
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Every optimizer constructor takes a list of parameters (that is, PyTorch tensors, typically with &lt;code>requires_grad&lt;/code> set to &lt;code>True&lt;/code>) as its first argument. All parameters passed to the optimizer are retained inside the optimizer object, so the optimizer can update their values and access their &lt;code>grad&lt;/code> attribute:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-23%2015.00.49.png" alt="截屏2020-10-23 15.00.49" style="zoom:67%;" />
&lt;p>Each optimizer exposes two methods:&lt;/p>
&lt;ul>
&lt;li>
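&lt;p>&lt;code>zero_grad&lt;/code>: resets the &lt;code>grad&lt;/code> attribute of all the parameters passed to the optimizer upon construction. (Since PyTorch 2.0 it sets &lt;code>grad&lt;/code> to &lt;code>None&lt;/code> by default instead of filling it with zeros; pass &lt;code>set_to_none=False&lt;/code> to get zeros.)&lt;/p>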
&lt;/li>
&lt;li>
&lt;p>&lt;code>step&lt;/code>: updates the value of those parameters according to the optimization strategy implemented by the specific optimizer.&lt;/p>
&lt;/li>
&lt;/ul>
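&lt;p>For plain SGD (no momentum, no weight decay), &lt;code>step()&lt;/code> should amount to the manual update from the previous training loop, &lt;code>params -= learning_rate * params.grad&lt;/code>. A sketch that checks this on made-up toy data, with a linear model written inline:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch
import torch.optim as optim

# Hypothetical toy data, for illustration only
t_u = torch.tensor([1.0, 2.0, 3.0, 4.0])
t_c = torch.tensor([3.0, 5.0, 7.0, 9.0])
learning_rate = 1e-2

params = torch.tensor([1.0, 0.0], requires_grad=True)
optimizer = optim.SGD([params], lr=learning_rate)

# Linear model w * t_u + b with a mean-squared-error loss
loss = ((params[0] * t_u + params[1] - t_c) ** 2).mean()

optimizer.zero_grad()
loss.backward()

# The update we would otherwise write by hand under torch.no_grad()
expected = (params - learning_rate * params.grad).detach()

optimizer.step()  # plain SGD: params -= lr * params.grad
print(params.detach())  # matches expected: [1.2, 0.07]
&lt;/code>&lt;/pre>&lt;/div>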
&lt;p>Let&amp;rsquo;s use an optimizer in our training loop:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># initialize parameters&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">params&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tensor&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="mf">1.0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.0&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">requires_grad&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># choose learning rate and optimizer&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">learning_rate&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mf">1e-2&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">optimizer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">optim&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">SGD&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="n">params&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">lr&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">learning_rate&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">training_loop&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_epochs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">optimizer&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">params&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">t_u&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">t_c&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_epochs&lt;/span>&lt;span class="o">+&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">t_p&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">t_u&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="n">params&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">t_p&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">t_c&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># zero_grad before backward!&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">zero_grad&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">backward&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># # update params&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">step&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="o">%&lt;/span> &lt;span class="mi">500&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s1">&amp;#39;Epoch: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">epoch&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s1">, loss: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="nb">float&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">loss&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">params&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="training-validation-and-overfitting">Training, validation, and overfitting&lt;/h2>
&lt;p>A highly adaptable model will tend to use its many parameters to make sure the loss is minimal at the data points, but we’ll have no guarantee that the model behaves well away from or in between the data points. 🤪&lt;/p>
&lt;p>&lt;strong>Overfitting&lt;/strong>: the loss evaluated at independent data points (not used for training) is higher than expected.&lt;/p>
&lt;p>To detect overfitting:&lt;/p>
&lt;ul>
&lt;li>We must take a few data points out of our dataset (the &lt;strong>validation set&lt;/strong>) and fit our model only on the remaining data points (the &lt;strong>training set&lt;/strong>).&lt;/li>
&lt;li>While we’re fitting the model, we evaluate the loss once on the training set and once on the validation set.&lt;/li>
&lt;li>When we’re trying to decide whether we’ve done a good job of fitting our model to the data, we must look at both!&lt;/li>
&lt;/ul>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-23%2017.12.52.png" alt="截屏2020-10-23 17.12.52" style="zoom:80%;" />
&lt;h3 id="evaluating-the-training-loss">Evaluating the training loss&lt;/h3>
&lt;p>If the training loss is not decreasing, there are two likely explanations:&lt;/p>
&lt;ul>
&lt;li>the model is too simple for the data;&lt;/li>
&lt;li>our data just doesn’t contain meaningful information that lets the model explain the output.&lt;/li>
&lt;/ul>
&lt;h3 id="generalizing-to-the-validation-set">Generalizing to the validation set&lt;/h3>
&lt;p>&lt;strong>If the training loss and the validation loss diverge, we’re overfitting.&lt;/strong> Overfitting is really a problem of making sure that the model behaves sensibly in between data points, in a way that is consistent with the process we’re trying to approximate.&lt;/p>
&lt;p>How to avoid overfitting?&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Make sure we get enough data for the process&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Make our model simple&lt;/p>
&lt;blockquote>
&lt;p>A simpler model may not fit the training data as perfectly as a more complicated model would, but it will likely behave more regularly in between data points.&lt;/p>
&lt;/blockquote>
&lt;/li>
&lt;li>
&lt;p>Make sure that a model capable of fitting the training data behaves as regularly as possible in between the data points, for example by:&lt;/p>
&lt;ul>
&lt;li>Adding penalization terms to the loss function, to make it cheaper for the model to behave more smoothly and change more slowly (up to a point).&lt;/li>
&lt;li>Adding noise to the input samples, to artificially create new data points in between training data samples and force the model to try to fit those, too.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
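&lt;p>Both regularization ideas fit in a few lines. The sketch below is an illustration, not code from the source text: &lt;code>penalized_loss&lt;/code> adds a hypothetical L2 penalty on the weight (strength &lt;code>l2_lambda&lt;/code>, a made-up value) to the mean squared error, and &lt;code>add_input_noise&lt;/code> jitters the inputs with Gaussian noise of made-up scale &lt;code>noise_std&lt;/code>. Both helper names are hypothetical.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch


def loss_fn(t_p, t_c):
    # Mean squared error
    return ((t_p - t_c) ** 2).mean()


def penalized_loss(t_p, t_c, w, l2_lambda=1e-3):
    # Hypothetical L2 penalty: large weights make the loss more expensive
    return loss_fn(t_p, t_c) + l2_lambda * (w ** 2).sum()


def add_input_noise(t_u, noise_std=0.05):
    # Hypothetical augmentation: jitter inputs to create nearby data points
    return t_u + noise_std * torch.randn_like(t_u)


torch.manual_seed(0)  # reproducible noise
t_u = torch.tensor([1.0, 2.0, 3.0, 4.0])  # toy data, for illustration only
t_c = torch.tensor([3.0, 5.0, 7.0, 9.0])
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

t_p = w * add_input_noise(t_u) + b
loss = penalized_loss(t_p, t_c, w)
loss.backward()  # w.grad includes the penalty term 2 * l2_lambda * w
print(float(loss), w.grad, b.grad)
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>Both knobs need tuning: too strong a penalty or too much noise pushes the model toward underfitting.&lt;/p>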
&lt;p>We’ve got a nice &lt;em>trade-off&lt;/em>:&lt;/p>
&lt;ul>
&lt;li>We need the model to have enough capacity to fit the training set.&lt;/li>
&lt;li>We need the model to avoid overfitting.&lt;/li>
&lt;/ul>
&lt;p>Therefore, to choose the right size for a neural network model in terms of parameters, we follow two steps:&lt;/p>
&lt;ol>
&lt;li>increase the size until it fits,&lt;/li>
&lt;li>then scale it down until it stops overfitting.&lt;/li>
&lt;/ol>
&lt;h3 id="splitting-a-dataset">Splitting a dataset&lt;/h3>
&lt;p>To split the data, we can use PyTorch&amp;rsquo;s &lt;code>randperm&lt;/code> function.&lt;/p>
&lt;blockquote>
&lt;p>Shuffling the elements of a tensor amounts to finding a permutation of its indices. &lt;code>torch.randperm(n)&lt;/code> returns a random permutation of the integers from 0 to &lt;code>n - 1&lt;/code>.&lt;/p>
&lt;/blockquote>
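&lt;p>A quick check of what &lt;code>torch.randperm&lt;/code> returns and of the slicing pattern used for the split: the indices form a permutation, and the training and validation indices are disjoint and together cover every sample. The sample count of 11 and the seed are arbitrary choices for illustration.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

torch.manual_seed(0)  # fixed seed so the split is reproducible
n_samples = 11        # arbitrary sample count for illustration
n_val = int(0.2 * n_samples)  # 2 validation samples

shuffled_indices = torch.randperm(n_samples)
train_indices = shuffled_indices[:-n_val]
val_indices = shuffled_indices[-n_val:]

# A permutation: sorting it gives back 0, 1, ..., n_samples - 1
is_permutation = torch.equal(shuffled_indices.sort().values, torch.arange(n_samples))

# The two index sets are disjoint and together cover every sample
train_set = set(train_indices.tolist())
val_set = set(val_indices.tolist())
print(is_permutation, len(train_set), len(val_set), train_set.isdisjoint(val_set))
# True 9 2 True
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>One caveat of this slicing pattern: if &lt;code>n_val&lt;/code> rounds down to 0 (fewer than 5 samples), the training slice &lt;code>[:-0]&lt;/code> is empty and the validation slice &lt;code>[-0:]&lt;/code> is the whole tensor, so very small datasets need a guard.&lt;/p>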
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">n_samples&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">t_u&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">n_val&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nb">int&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mf">0.2&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">n_samples&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">shuffled_indices&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">randperm&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_samples&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">train_indices&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">shuffled_indices&lt;/span>&lt;span class="p">[:&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">n_val&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">val_indices&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">shuffled_indices&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">n_val&lt;/span>&lt;span class="p">:]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># training set&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">train_t_u&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">t_u&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">train_indices&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">train_t_c&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">t_c&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">train_indices&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># validation set&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">val_t_u&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">t_u&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">val_indices&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">val_t_c&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">t_c&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">val_indices&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Our training loop doesn’t really change. We just want to additionally evaluate the validation loss at every epoch, to have a chance to recognize whether we’re overfitting:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">training_loop&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_epochs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">optimizer&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">params&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">train_t_u&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">val_t_u&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">train_t_c&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">val_t_c&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_epochs&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">train_t_p&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">train_t_u&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="n">params&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">train_loss&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">train_t_p&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">train_t_c&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">val_t_p&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">val_t_u&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="n">params&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">val_loss&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">val_t_p&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">val_t_c&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">zero_grad&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">train_loss&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">backward&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">step&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="o">&amp;lt;=&lt;/span> &lt;span class="mi">3&lt;/span> &lt;span class="ow">or&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="o">%&lt;/span> &lt;span class="mi">500&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;Epoch &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">epoch&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">, Training loss &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">train_loss&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">item&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="si">:&lt;/span>&lt;span class="s2">.4f&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">,&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34; Validation loss &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">val_loss&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">item&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="si">:&lt;/span>&lt;span class="s2">.4f&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">params&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="observing-the-training">Observing the training&lt;/h3>
&lt;p>Our main goal: &lt;strong>both the training loss and the validation loss decreasing&lt;/strong>. While ideally both losses would be roughly the same value, as long as the validation loss stays reasonably close to the training loss, we know that our model is continuing to learn things about our data that generalize beyond the training set.&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-23%2018.05.47.png" alt="截屏2020-10-23 18.05.47" style="zoom:100%;" />
&lt;h3 id="switching-autograd-off-for-validation">Switching &lt;code>autograd&lt;/code> off for validation&lt;/h3>
&lt;p>We only ever call &lt;code>backward&lt;/code> on &lt;code>train_loss&lt;/code>, so errors only ever backpropagate based on the training set. The validation set is used to provide an &lt;strong>independent evaluation&lt;/strong> of the accuracy of the model’s output on data that wasn’t used for training.&lt;/p>
&lt;p>Since we never call &lt;code>backward&lt;/code> on &lt;code>val_loss&lt;/code>, we could in fact just call &lt;code>model&lt;/code> and &lt;code>loss_fn&lt;/code> as plain functions, without tracking the computation; skipping the graph construction saves memory and computation. PyTorch allows us to switch off autograd when we don&amp;rsquo;t need it, using the &lt;code>torch.no_grad&lt;/code> context manager.&lt;/p>
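&lt;p>The effect is easy to check: a tensor computed inside &lt;code>torch.no_grad()&lt;/code> has &lt;code>requires_grad&lt;/code> set to &lt;code>False&lt;/code> and no &lt;code>grad_fn&lt;/code>. PyTorch also provides &lt;code>torch.set_grad_enabled(mode)&lt;/code>, which takes a boolean and is handy when one code path serves both training and validation. A sketch with a hypothetical helper &lt;code>compute_loss&lt;/code> and toy tensors:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

params = torch.tensor([1.0, 0.0], requires_grad=True)
t_u = torch.tensor([1.0, 2.0, 3.0])  # toy data, for illustration only
t_c = torch.tensor([3.0, 5.0, 7.0])


def compute_loss(t_u, t_c, is_train):
    # Hypothetical helper: autograd records the computation only if is_train
    with torch.set_grad_enabled(is_train):
        t_p = params[0] * t_u + params[1]
        return ((t_p - t_c) ** 2).mean()


with torch.no_grad():
    val_loss = ((params[0] * t_u + params[1] - t_c) ** 2).mean()
print(val_loss.requires_grad)  # False: no graph was built

train_loss = compute_loss(t_u, t_c, is_train=True)
val_loss_2 = compute_loss(t_u, t_c, is_train=False)
print(train_loss.requires_grad, val_loss_2.requires_grad)  # True False
&lt;/code>&lt;/pre>&lt;/div>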
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">training_loop&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_epochs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">optimizer&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">params&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">train_t_u&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">val_t_u&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">train_t_c&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">val_t_c&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_epochs&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">train_t_p&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">train_t_u&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="n">params&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">train_loss&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">train_t_p&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">train_t_c&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">with&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">no_grad&lt;/span>&lt;span class="p">():&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">val_t_p&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">val_t_u&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="n">params&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">val_loss&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">val_t_p&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">val_t_c&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Checks that our output requires_grad args are &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># forced to False inside this block&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">assert&lt;/span> &lt;span class="n">val_loss&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">requires_grad&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="kc">False&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">zero_grad&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">train_loss&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">backward&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">step&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="o">&amp;lt;=&lt;/span> &lt;span class="mi">3&lt;/span> &lt;span class="ow">or&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="o">%&lt;/span> &lt;span class="mi">500&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;Epoch &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">epoch&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">, Training loss &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">train_loss&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">item&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="si">:&lt;/span>&lt;span class="s2">.4f&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">,&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34; Validation loss &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">val_loss&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">item&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="si">:&lt;/span>&lt;span class="s2">.4f&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">params&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="run-with-autograd-enabled-or-disabled">Run with &lt;code>autograd&lt;/code> enabled or disabled&lt;/h3>
&lt;p>Using the related &lt;code>set_grad_enabled&lt;/code> context, we can also condition the code to run with &lt;code>autograd&lt;/code> enabled or disabled, according to a Boolean expression—typically indicating whether we are running in &lt;em>training&lt;/em> or &lt;em>inference&lt;/em> mode.&lt;/p>
&lt;p>For instance, we could define a &lt;code>calc_forward&lt;/code> function that takes data as input and runs &lt;code>model&lt;/code> and &lt;code>loss_fn&lt;/code> with or without autograd according to a Boolean &lt;code>is_train&lt;/code> argument:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">cal_forward&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">t_u&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">t_c&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">is_train&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">with&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">set_grad_enabled&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">is_train&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">t_p&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">t_u&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="n">params&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">t_p&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">t_c&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">loss&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Using Neural Network to Fit Data</title><link>https://haobin-tan.netlify.app/docs/ai/pytorch/dl-with-pytorch/p1ch6/</link><pubDate>Mon, 26 Oct 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/pytorch/dl-with-pytorch/p1ch6/</guid><description>&lt;h2 id="artficial-neurons">Artificial neurons&lt;/h2>
&lt;p>At the core of deep learning are neural networks: &lt;strong>mathematical entities capable of representing complicated functions through a composition of simpler functions.&lt;/strong>&lt;/p>
&lt;p>The basic building block of these complicated functions is the &lt;strong>neuron&lt;/strong>.&lt;/p>
&lt;ul>
&lt;li>At its core, it is nothing but a linear transformation of the input (for example, multiplying the input by a number [the &lt;em>weight&lt;/em>] and adding a constant [the &lt;em>bias&lt;/em>]) followed by the application of a fixed nonlinear function (referred to as the &lt;em>activation function&lt;/em>).&lt;/li>
&lt;li>Mathematically, we can write this out as &lt;em>o&lt;/em> = &lt;em>f&lt;/em>(&lt;em>w&lt;/em> * &lt;em>x&lt;/em> + &lt;em>b&lt;/em>), with &lt;em>x&lt;/em> as our input, &lt;em>w&lt;/em> our weight or scaling factor, and &lt;em>b&lt;/em> as our bias or offset. &lt;em>f&lt;/em> is our activation function, set to the hyperbolic tangent, or tanh function here.&lt;/li>
&lt;/ul>
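&lt;p>As a minimal sketch, a single neuron with a tanh activation can be written directly with tensor operations. The weight and bias values here are arbitrary, chosen only for illustration:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

# A single artificial neuron: o = f(w * x + b) with f = tanh.
# The values of w and b are arbitrary, picked only for illustration.
w = torch.tensor(0.5)
b = torch.tensor(-1.0)


def neuron(x):
    # Linear transformation (scale by w, offset by b), then the tanh activation.
    return torch.tanh(w * x + b)


x = torch.tensor([0.0, 2.0, 10.0])
o = neuron(x)
print(o)
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>Because every operation is elementwise, the same function handles a whole tensor of inputs at once, and every output lies in the range of tanh, (-1, 1).&lt;/p>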
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-24%2012.46.23.png" alt="截屏2020-10-24 12.46.23" style="zoom:80%;" />
&lt;h3 id="composing-a-multilayer-network">Composing a multilayer network&lt;/h3>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-24%2014.17.52.png" alt="截屏2020-10-24 14.17.52" style="zoom:80%;" />
&lt;p>is made up of a composition of functions like those we just discussed&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">x_1 = f(w_0 * x + b_0)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">x_2 = f(w_1 * x_1 + b_1)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">...
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">y = f(w_n * x_n + b_n)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>where the output of a layer of neurons is used as an input for the following layer.&lt;/p>
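&lt;p>The same composition can be sketched in PyTorch for two layers. The layer sizes (1 input, 4 hidden units, 1 output), the random seed, and the random weights are assumptions made for illustration; with several neurons per layer, &lt;em>w&lt;/em> becomes a matrix and the product becomes a matrix multiplication:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

torch.manual_seed(0)

# Illustrative sizes: 1 input feature -> 4 hidden units -> 1 output.
w_0, b_0 = torch.randn(4, 1), torch.randn(4)
w_1, b_1 = torch.randn(1, 4), torch.randn(1)


def f(t):
    # The same activation function is applied after every layer.
    return torch.tanh(t)


x = torch.tensor([[0.5]])     # one sample with one feature, shape (1, 1)
x_1 = f(x @ w_0.T + b_0)      # first layer, shape (1, 4)
y = f(x_1 @ w_1.T + b_1)      # second layer takes x_1 as its input, shape (1, 1)
print(x_1.shape, y.shape)
&lt;/code>&lt;/pre>&lt;/div>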
&lt;h3 id="the-error-function">The error function&lt;/h3>
&lt;ul>
&lt;li>Unlike the linear model, neural networks do not have the property of a convex error surface, even when trained with the same squared-error loss function.&lt;/li>
&lt;li>There’s no single right answer for each parameter we’re attempting to approximate. Instead, we are trying to get all of the parameters, when acting in concert, to produce a useful output.&lt;/li>
&lt;li>Since that useful output is only going to &lt;em>approximate&lt;/em> the truth, there will be some level of imperfection. Where and how imperfections manifest is somewhat arbitrary, and by implication the parameters that control the output (and, hence, the imperfections) are somewhat arbitrary as well. 🤪&lt;/li>
&lt;/ul>
&lt;h3 id="activation-functions">Activation functions&lt;/h3>
&lt;p>The simplest unit in (deep) neural networks is a linear operation (scaling + offset) followed by an activation function. The activation function plays two important roles:&lt;/p>
&lt;ul>
&lt;li>In the inner parts of the model, it allows &lt;strong>the output function to have different slopes at different values&lt;/strong>, something a linear function by definition cannot do. By cleverly composing these differently sloped parts for many outputs, neural networks can approximate arbitrary functions.&lt;/li>
&lt;li>At the last layer of the network, it has the role of &lt;strong>concentrating&lt;/strong> the outputs of the preceding linear operation into a given range.
&lt;ul>
&lt;li>Capping the output range&lt;/li>
&lt;li>Compressing the output range&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
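&lt;p>The difference between capping and compressing can be seen in a small sketch. Using &lt;code>torch.clamp&lt;/code> to cap and &lt;code>torch.tanh&lt;/code> to compress is our own choice of illustration:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

x = torch.tensor([-3.0, -0.5, 0.0, 0.5, 3.0])

# Capping: values outside [-1, 1] are clamped to the boundary; values inside pass through unchanged.
capped = torch.clamp(x, min=-1.0, max=1.0)

# Compressing: tanh squeezes every input smoothly into the open interval (-1, 1).
compressed = torch.tanh(x)

print(capped)
print(compressed)
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>Capping leaves in-range values untouched and cuts off everything else, while compressing changes every value but preserves the ordering of the inputs.&lt;/p>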
&lt;p>Some activation functions:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-24%2015.26.02.png" alt="Screenshot 2020-10-24 15.26.02" style="zoom:90%;" />
&lt;p>ReLU (Rectified Linear Unit) is currently considered one of the best-performing general activation functions. The LeakyReLU function modifies the standard ReLU to have a small positive slope, rather than being strictly zero for negative inputs (typically this slope is 0.01, but it’s shown here with slope 0.1 for clarity).&lt;/p>
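&lt;p>A quick sketch contrasting the two on a few sample inputs. The slope of 0.1 is chosen to match the figure; &lt;code>nn.LeakyReLU&lt;/code> itself defaults to 0.01:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch
from torch import nn

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

relu = nn.ReLU()
leaky = nn.LeakyReLU(negative_slope=0.1)  # 0.1 to match the figure; the default is 0.01

print(relu(x))   # negative inputs become exactly 0
print(leaky(x))  # negative inputs are scaled by 0.1 instead of being zeroed
&lt;/code>&lt;/pre>&lt;/div>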
&lt;h3 id="choosing-the-best-activation-function">Choosing the best activation function&lt;/h3>
&lt;p>By definition, activation functions are&lt;/p>
&lt;ul>
&lt;li>&lt;strong>nonlinear&lt;/strong>: The nonlinearity allows the overall network to approximate more complex functions.&lt;/li>
&lt;li>&lt;strong>differentiable&lt;/strong>: so that gradients can be computed through them.&lt;/li>
&lt;/ul>
&lt;p>The following are true of these functions:&lt;/p>
&lt;ul>
&lt;li>They have &lt;strong>at least one sensitive range&lt;/strong>, where nontrivial changes to the input result in a corresponding nontrivial change to the output. This is needed for training.&lt;/li>
&lt;li>Many of them &lt;strong>have an insensitive (or saturated) range&lt;/strong>, where changes to the input result in little or no change to the output.&lt;/li>
&lt;/ul>
&lt;p>Often (but far from universally so), the activation function will have at least one of these:&lt;/p>
&lt;ul>
&lt;li>A lower bound that is approached (or met) as the input goes to negative infinity&lt;/li>
&lt;li>A similar-but-inverse upper bound for positive infinity&lt;/li>
&lt;/ul>
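&lt;p>The sensitive and saturated ranges show up directly in the gradients. In this sketch, the sample points 0 (in the sensitive range of tanh) and 10 (in its saturated range) are arbitrary choices; autograd computes the derivative 1 - tanh(&lt;em>x&lt;/em>)&lt;sup>2&lt;/sup> at each point:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

# One point in the sensitive range of tanh (0) and one in its saturated range (10).
x = torch.tensor([0.0, 10.0], requires_grad=True)
y = torch.tanh(x)

# Summing yields a scalar, so backward() stores d tanh(x) / dx for each element in x.grad.
y.sum().backward()
print(x.grad)
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>A gradient this close to zero means that an input deep in the saturated range barely moves the parameters during training, which is why at least one sensitive range is needed.&lt;/p>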
&lt;h3 id="-what-learning-means-for-a-neural-network">🤔 What &lt;em>learning&lt;/em> means for a neural network&lt;/h3>
&lt;p>Building models out of stacks of linear transformations followed by differentiable activations leads to models that can approximate highly nonlinear processes and whose parameters we can estimate surprisingly well through gradient descent, even when dealing with models with millions of parameters. What makes using deep neural networks so attractive is that &lt;strong>it saves us from worrying too much about the exact function that represents our data&lt;/strong>. With a deep neural network model, we have a &lt;em>universal approximator&lt;/em> and a method to estimate its parameters. &amp;#x1f44f;&lt;/p>
&lt;p>Training consists of finding acceptable values for these weights and biases so that the resulting network correctly carries out a task. By &lt;em>carrying out a task successfully&lt;/em>, we mean obtaining a correct output on unseen data produced by the same data-generating process used for training data. A successfully trained network, through the values of its weights and biases, will capture the inherent structure of the data in the form of meaningful numerical representations that work correctly for previously unseen data.&lt;/p>
&lt;p>Deep neural networks give us the ability to approximate highly nonlinear phenomena &lt;strong>without&lt;/strong> having an explicit model for them. Instead, starting from a generic, untrained model, we specialize it on a task by providing it with a set of inputs and outputs and a loss function from which to backpropagate. Specializing a generic model to a task using examples is what we refer to as &lt;strong>learning&lt;/strong>, because the model wasn’t built with that specific task in mind—no rules describing how that task worked were encoded in the model.&lt;/p>
&lt;h2 id="the-pytorch-nn-module">The PyTorch &lt;code>nn&lt;/code> module&lt;/h2>
&lt;p>&lt;code>torch.nn&lt;/code>&lt;/p>
&lt;ul>
&lt;li>submodule dedicated to neural networks&lt;/li>
&lt;li>contains the building blocks needed to create all sorts of neural network architectures. Those building blocks are called &lt;strong>modules&lt;/strong> in PyTorch parlance (such building blocks are often referred to as &lt;strong>layers&lt;/strong> in other frameworks).&lt;/li>
&lt;/ul>
&lt;p>A module&lt;/p>
&lt;ul>
&lt;li>can have one or more &lt;code>Parameter&lt;/code> instances as attributes, which are tensors whose values are optimized during the training process&lt;/li>
&lt;li>can also have one or more submodules (subclasses of &lt;code>nn.Module&lt;/code>) as attributes, and it will be able to track their parameters as well.&lt;/li>
&lt;/ul>
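&lt;p>A minimal sketch of both kinds of attributes. The module name &lt;code>ScaledLinear&lt;/code> and its sizes are hypothetical; the point is that &lt;code>named_parameters&lt;/code> reports the directly assigned &lt;code>nn.Parameter&lt;/code> as well as the parameters of the &lt;code>nn.Linear&lt;/code> submodule:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch
from torch import nn


class ScaledLinear(nn.Module):
    # Hypothetical module: a linear submodule followed by a learnable per-output scale.

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 2)             # submodule: its weight and bias are tracked too
        self.scale = nn.Parameter(torch.ones(2))  # Parameter attribute: tracked directly

    def forward(self, x):
        return self.linear(x) * self.scale


module = ScaledLinear()
for name, param in module.named_parameters():
    print(name, tuple(param.shape))

out = module(torch.zeros(4, 3))  # a batch of 4 samples with 3 features each
print(out.shape)
&lt;/code>&lt;/pre>&lt;/div>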
&lt;h3 id="using-__call__-rather-than-forward">Using &lt;code>__call__&lt;/code> rather than &lt;code>forward&lt;/code>&lt;/h3>
&lt;ul>
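&lt;li>&lt;code>nn.Module&lt;/code> defines a &lt;code>__call__&lt;/code> method, which all of its subclasses, including every PyTorch-provided one, inherit. This allows us to instantiate an &lt;code>nn.Linear&lt;/code> and call it as if it were a function.&lt;/li>
&lt;li>From user code, we should not call &lt;code>forward&lt;/code> directly: &lt;code>__call__&lt;/code> performs important work before and after calling &lt;code>forward&lt;/code> (for example, running registered hooks), and calling &lt;code>forward&lt;/code> directly skips that work.&lt;/li>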
&lt;/ul>
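&lt;p>The difference is easy to observe with a forward hook, one of the things &lt;code>__call__&lt;/code> takes care of. In this minimal sketch, the single-unit &lt;code>nn.Linear&lt;/code> and the list-appending hook are illustrative choices:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch
from torch import nn

model = nn.Linear(1, 1)
calls = []

# A forward hook is invoked by __call__ after forward returns; this one only records that it ran.
model.register_forward_hook(lambda module, inputs, output: calls.append("hook"))

x = torch.ones(1, 1)
model(x)          # goes through __call__: forward runs, then the hook
model.forward(x)  # bypasses __call__: forward runs, but the hook is skipped

print(calls)  # ['hook']
&lt;/code>&lt;/pre>&lt;/div>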
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">y&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># correct&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">y&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">forward&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># Don&amp;#39;t do it!&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="dealing-with-batches">Dealing with batches&lt;/h3>
&lt;p>PyTorch &lt;code>nn.Module&lt;/code> and its subclasses are designed to operate on &lt;em>multiple&lt;/em> samples at the same time.&lt;/p>
&lt;ul>
&lt;li>Modules expect the zeroth dimension of the input to be the number of samples in the &lt;em>batch&lt;/em>.&lt;/li>
&lt;li>For example, we can create an input tensor of size &lt;em>B&lt;/em> × &lt;em>N&lt;/em>&lt;sub>in&lt;/sub>, where
&lt;ul>
&lt;li>&lt;em>B&lt;/em>: the size of the batch&lt;/li>
&lt;li>&lt;em>N&lt;/em>&lt;sub>in&lt;/sub>: the number of input features&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
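&lt;p>For instance, a sketch with an &lt;code>nn.Linear(1, 1)&lt;/code> module and ten scalar samples (the values from &lt;code>torch.arange&lt;/code> are placeholders). &lt;code>unsqueeze(1)&lt;/code> adds the extra dimension so that the input has shape &lt;em>B&lt;/em> × &lt;em>N&lt;/em>&lt;sub>in&lt;/sub> = 10 × 1:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch
from torch import nn

linear_model = nn.Linear(1, 1)  # N_in = 1 input feature, 1 output feature

# Ten scalar samples; unsqueeze(1) turns shape (10,) into (B, N_in) = (10, 1).
t_u = torch.arange(10, dtype=torch.float32).unsqueeze(1)
t_p = linear_model(t_u)

print(t_u.shape, t_p.shape)  # the zeroth (batch) dimension is preserved in the output
&lt;/code>&lt;/pre>&lt;/div>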
&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-primary-100 dark:bg-primary-900">
&lt;span class="pr-3 pt-1 text-primary-600 dark:text-primary-300">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="m11.25 11.25l.041-.02a.75.75 0 0 1 1.063.852l-.708 2.836a.75.75 0 0 0 1.063.853l.041-.021M21 12a9 9 0 1 1-18 0a9 9 0 0 1 18 0m-9-3.75h.008v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">&lt;p>The reason we want to do this batching is multifaceted:&lt;/p>
&lt;ul>
&lt;li>It makes sure the computation we’re asking for is big enough to saturate the computing resources we’re using to perform it.
&lt;ul>
&lt;li>GPUs in particular are highly parallelized, so a single input on a small model will leave most of the computing units idle. By providing batches of inputs, the calculation can be spread across the otherwise-idle units, which means the batched results come back just as quickly as a single result would.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Some advanced models use statistical information from the entire batch, and those statistics get better with larger batch sizes.&lt;/li>
&lt;/ul>&lt;/span>
&lt;/div>
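&lt;p>Batch normalization is one well-known example of a component that uses statistics from the entire batch; naming it here is our illustration, since the note above does not say which models it has in mind. In this sketch (sizes, seed, and data are arbitrary), &lt;code>nn.BatchNorm1d&lt;/code> in training mode normalizes each feature with the mean and variance of the current batch:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch
from torch import nn

torch.manual_seed(0)

bn = nn.BatchNorm1d(num_features=3)  # one set of statistics per feature
bn.train()                           # training mode: use statistics of the current batch

x = torch.randn(16, 3) * 5 + 2       # a batch of 16 samples with 3 features each
y = bn(x)

# After normalization each feature has mean ~0 and (biased) standard deviation ~1 over the batch.
mean = y.mean(dim=0)
std = ((y - mean) ** 2).mean(dim=0).sqrt()
print(mean, std)
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>With only a handful of samples per batch, these estimates of mean and variance become noisy, which is why such statistics improve as the batch grows.&lt;/p>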
&lt;h3 id="loss-functions">Loss functions&lt;/h3>
&lt;p>&lt;strong>Loss functions in &lt;code>nn&lt;/code> are still subclasses of &lt;code>nn.Module&lt;/code>, so we will create an instance and call it as a function.&lt;/strong>&lt;/p>
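&lt;p>For example, &lt;code>nn.MSELoss&lt;/code> is instantiated once and then called on predictions and targets. In this sketch the tensors are made-up values; the comparison shows that, with the default &lt;code>reduction="mean"&lt;/code>, it computes the mean of the squared differences:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch
from torch import nn

loss_fn = nn.MSELoss()  # create an instance of the loss module ...

t_p = torch.tensor([[1.0], [2.0], [4.0]])  # made-up predictions
t_c = torch.tensor([[1.0], [3.0], [2.0]])  # made-up targets

loss = loss_fn(t_p, t_c)  # ... and call it like a function

# With the default reduction="mean", this matches the mean of the squared differences.
manual = ((t_p - t_c) ** 2).mean()
print(loss.item(), manual.item())
&lt;/code>&lt;/pre>&lt;/div>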
&lt;p>Our training loop looks like this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">training_loop&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_epochs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">optimizer&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">t_u_train&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">t_u_val&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">t_c_train&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">t_c_val&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_epochs&lt;/span>&lt;span class="o">+&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># forward pass in training set&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">t_p_train&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">t_u_train&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss_train&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">t_p_train&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">t_c_train&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># forward pass in validation set&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">with&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">no_grad&lt;/span>&lt;span class="p">():&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">t_p_val&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">t_u_val&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss_val&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">t_p_val&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">t_c_val&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">zero_grad&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss_train&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">backward&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">step&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="mi">1&lt;/span> &lt;span class="ow">or&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="o">%&lt;/span> &lt;span class="mi">1000&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;Epoch &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">epoch&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">, Training loss &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">loss_train&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">item&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="si">:&lt;/span>&lt;span class="s2">.4f&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">,&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34; Validation loss &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">loss_val&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">item&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="si">:&lt;/span>&lt;span class="s2">.4f&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>, and we want to use Mean Square Error (MSE) as our loss function:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">linear_model&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">optimizer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">optim&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">SGD&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">linear_model&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">parameters&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="n">lr&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mf">1e-2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">training_loop&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_epochs&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3000&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">optimizer&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">linear_model&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss_fn&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">MSELoss&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="n">t_u_train&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">t_un_train&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">t_u_val&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">t_un_val&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">t_c_train&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">t_c_train&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">t_c_val&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">t_c_val&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="building-neural-networks-using-pytorch">Building neural networks using PyTorch&lt;/h2>
&lt;h3 id="nnsequential-container">&lt;code>nn.Sequential&lt;/code> container&lt;/h3>
&lt;p>&lt;code>nn&lt;/code> provides a simple way to concatenate modules through the &lt;code>nn.Sequential&lt;/code> container. For example, let’s build the simplest possible neural network: a linear module, followed by an activation function, feeding into another linear module.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">seq_model&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Sequential&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">13&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="c1"># 1 input feature to 13 hidden features&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Tanh&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="c1"># pass them through a tanh activation&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">13&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="c1"># linearly combine the resulting 13 numbers into 1 output feature&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">seq_model&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">Sequential(
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> (0): Linear(in_features=1, out_features=13, bias=True)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> (1): Tanh()
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> (2): Linear(in_features=13, out_features=1, bias=True)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="inspecting-parameters">Inspecting parameters&lt;/h3>
&lt;p>Calling &lt;code>model.parameters()&lt;/code> will collect weight and bias from both the first and second linear modules. It’s instructive to inspect the parameters in this case by printing their shapes:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="n">param&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">param&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">seq_model&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">parameters&lt;/span>&lt;span class="p">()]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">[torch.Size([13, 1]), torch.Size([13]), torch.Size([1, 13]), torch.Size([1])]
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>We can also use &lt;code>named_parameters&lt;/code> to identify parameters by name:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">for&lt;/span> &lt;span class="n">name&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">param&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">seq_model&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">named_parameters&lt;/span>&lt;span class="p">():&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">name&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">param&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">0.weight: torch.Size([13, 1])
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">0.bias: torch.Size([13])
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">2.weight: torch.Size([1, 13])
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">2.bias: torch.Size([1])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>Sequential&lt;/code> also accepts an &lt;code>OrderedDict&lt;/code>, in which we can name each module passed to &lt;code>Sequential&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">collections&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">OrderedDict&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">seq_model&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Sequential&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">OrderedDict&lt;/span>&lt;span class="p">([(&lt;/span>&lt;span class="s1">&amp;#39;hidden_linear&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">8&lt;/span>&lt;span class="p">)),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;hidden_activation&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Tanh&lt;/span>&lt;span class="p">()),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;output_linear&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">8&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">))]))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">seq_model&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">Sequential(
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> (hidden_linear): Linear(in_features=1, out_features=8, bias=True)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> (hidden_activation): Tanh()
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> (output_linear): Linear(in_features=8, out_features=1, bias=True)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">for&lt;/span> &lt;span class="n">name&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">param&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">seq_model&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">named_parameters&lt;/span>&lt;span class="p">():&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">name&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">param&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">hidden_linear.weight: torch.Size([8, 1])
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">hidden_linear.bias: torch.Size([8])
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">output_linear.weight: torch.Size([1, 8])
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">output_linear.bias: torch.Size([1])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>We can also access a particular &lt;code>Parameter&lt;/code> by using submodules as attributes:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">seq_model&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">output_linear&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">bias&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">Parameter containing:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">tensor([-0.0328], requires_grad=True)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Learning from Images</title><link>https://haobin-tan.netlify.app/docs/ai/pytorch/dl-with-pytorch/p1ch7/</link><pubDate>Mon, 26 Oct 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/pytorch/dl-with-pytorch/p1ch7/</guid><description>&lt;h2 id="dataset-of-images">Dataset of images&lt;/h2>
&lt;p>The &lt;code>torchvision&lt;/code> module can:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>automatically download a dataset&lt;/p>
&lt;/li>
&lt;li>
&lt;p>load it as a collection of PyTorch tensors&lt;/p>
&lt;/li>
&lt;li>
&lt;p>For example, to download the CIFAR-10 dataset:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">torchvision&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">datasets&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">data_path&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s1">&amp;#39;../data-unversioned/p1ch7/&amp;#39;&lt;/span> &lt;span class="c1"># root directory&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Instantiates a dataset for the training data; &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># TorchVision downloads the data if it is not present.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">cifar10&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">datasets&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CIFAR10&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">data_path&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">train&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">download&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># With train=False, this gets us a dataset for the validation data&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">cifar10_val&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">datasets&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CIFAR10&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">data_path&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">train&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">False&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">download&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ul>
&lt;p>The &lt;code>datasets&lt;/code> submodule:&lt;/p>
&lt;ul>
&lt;li>gives us precanned access to the most popular computer vision datasets, such as MNIST, Fashion-MNIST, CIFAR-100, SVHN, COCO, and Omniglot.&lt;/li>
&lt;li>In each case, the dataset is returned as a subclass of &lt;code>torch.utils.data.Dataset&lt;/code>.&lt;/li>
&lt;/ul>
&lt;h3 id="dataset-class">&lt;code>Dataset&lt;/code> class&lt;/h3>
&lt;p>&lt;code>torch.utils.data.Dataset&lt;/code>:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Concept: does NOT necessarily hold the data, but provides uniform access to it through &lt;code>__len__&lt;/code> and &lt;code>__getitem__&lt;/code>&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-26%2017.18.11.png" alt="截屏2020-10-26 17.18.11" style="zoom:80%;" />
&lt;/li>
&lt;li>
&lt;p>is an object that must implement two methods:&lt;/p>
&lt;ul>
&lt;li>&lt;code>__len__&lt;/code>: returns the number of items in the dataset&lt;/li>
&lt;li>&lt;code>__getitem__&lt;/code>: returns the item at a given index, consisting of a sample and its corresponding label (an integer index)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
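&lt;p>As a minimal sketch of this contract, any object implementing &lt;code>__len__&lt;/code> and &lt;code>__getitem__&lt;/code> can act as a dataset; subclassing &lt;code>torch.utils.data.Dataset&lt;/code> makes the intent explicit. The class name &lt;code>InMemoryDataset&lt;/code> and the fake data are hypothetical, chosen only for illustration:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch
from torch.utils.data import Dataset


class InMemoryDataset(Dataset):
    # Hypothetical example: keeps (sample, label) pairs in memory
    def __init__(self, samples, labels):
        assert len(samples) == len(labels)
        self.samples = samples
        self.labels = labels

    def __len__(self):
        # Number of items in the dataset
        return len(self.samples)

    def __getitem__(self, index):
        # One item: a sample and its integer label
        return self.samples[index], self.labels[index]


samples = torch.rand(4, 3, 32, 32)  # four fake 3 x 32 x 32 images
labels = [0, 2, 2, 0]               # fake integer class labels
ds = InMemoryDataset(samples, labels)

img, label = ds[1]
print(len(ds), img.shape, label)    # 4 torch.Size([3, 32, 32]) 2
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>Nothing in this contract requires the data to live in memory: &lt;code>__getitem__&lt;/code> could just as well read an image file from disk on each call.&lt;/p>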
&lt;h3 id="dataset-transformations">Dataset transformations&lt;/h3>
&lt;p>&lt;code>torchvision.transforms&lt;/code>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>defines a set of composable, function-like objects that can be passed as an argument to a &lt;code>torchvision&lt;/code> dataset&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">torchvision&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">transforms&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">dir&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">transforms&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">[&amp;#39;CenterCrop&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;ColorJitter&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;Compose&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;ConvertImageDtype&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;FiveCrop&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;Grayscale&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;Lambda&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;LinearTransformation&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;Normalize&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;PILToTensor&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;Pad&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;RandomAffine&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;RandomApply&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;RandomChoice&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;RandomCrop&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;RandomErasing&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;RandomGrayscale&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;RandomHorizontalFlip&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;RandomOrder&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;RandomPerspective&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;RandomResizedCrop&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;RandomRotation&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;RandomSizedCrop&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;RandomVerticalFlip&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;Resize&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;Scale&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;TenCrop&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;ToPILImage&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;ToTensor&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;__builtins__&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;__cached__&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;__doc__&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;__file__&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;__loader__&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;__name__&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;__package__&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;__path__&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;__spec__&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;functional&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;functional_pil&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;functional_tensor&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;#39;transforms&amp;#39;]
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>are applied to the data after it is loaded but before it is returned by &lt;code>__getitem__&lt;/code>.&lt;/p>
&lt;/li>
&lt;/ul>
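&lt;p>To make the second point concrete, here is a sketch of where a transform plugs in. &lt;code>TransformedDataset&lt;/code> and &lt;code>compose&lt;/code> are hypothetical helpers written for illustration (the latter chains callables in the spirit of &lt;code>transforms.Compose&lt;/code>); torchvision datasets follow a similar pattern with their &lt;code>transform&lt;/code> argument, but this is not their actual code:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch
from torch.utils.data import Dataset


class TransformedDataset(Dataset):
    # Hypothetical wrapper: applies an optional transform to each sample
    # after it is fetched but before __getitem__ returns it
    def __init__(self, base, transform=None):
        self.base = base
        self.transform = transform

    def __len__(self):
        return len(self.base)

    def __getitem__(self, index):
        sample, label = self.base[index]
        if self.transform is not None:
            sample = self.transform(sample)
        return sample, label


def compose(*fns):
    # Chain callables left to right, similar in spirit to transforms.Compose
    def apply(x):
        for fn in fns:
            x = fn(x)
        return x
    return apply


base = [(torch.full((3, 2, 2), 255.0), 0)]  # one fake 3 x 2 x 2 "image"
ds = TransformedDataset(base, transform=compose(lambda t: t / 255.0,
                                               lambda t: t - 0.5))
img, label = ds[0]
print(img.unique())  # tensor([0.5000])
&lt;/code>&lt;/pre>&lt;/div>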
&lt;h4 id="totensor">&lt;code>ToTensor&lt;/code>&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>turns NumPy arrays and PIL images into tensors.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>also takes care to lay out the dimensions of the output tensor as &lt;em>C × H × W&lt;/em>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Once instantiated, it can be called like a function with the PIL image as the argument, returning a tensor as output&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">torchvision&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">transforms&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">to_tensor&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">transforms&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ToTensor&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">img_t&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">to_tensor&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">img&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ul>
&lt;p>We can pass the transform directly as an argument to the dataset constructor:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">tensor_cifar10&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">datasets&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CIFAR10&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">data_path&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">train&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">download&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">False&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">transform&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">transforms&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ToTensor&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ul>
&lt;li>
&lt;p>At this point, accessing an element of the dataset will return a tensor, rather than a PIL image&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">img_t&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">_&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">tensor_cifar10&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">99&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">type&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">img_t&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">torch.Tensor
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>Whereas the values in the original PIL image ranged from 0 to 255 (8 bits per channel), the &lt;code>ToTensor&lt;/code> transform turns the data into 32-bit floating-point values per channel, scaling them down to the range 0.0 to 1.0.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">img_t&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">min&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="n">img_t&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">(tensor(0.), tensor(1.))
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ul>
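&lt;p>The two effects of &lt;code>ToTensor&lt;/code> described above (C × H × W layout and scaling to [0.0, 1.0]) can be reproduced by hand for a uint8 NumPy array in H × W × C layout. This is a sketch on fake data, not torchvision&amp;rsquo;s actual implementation; for such an array, &lt;code>transforms.ToTensor()(hwc)&lt;/code> should give the same values:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import numpy as np
import torch

# A fake 32 x 32 RGB image in the H x W x C, uint8 layout used by PIL/NumPy
hwc = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Move channels first (C x H x W), convert to float32,
# and scale the 0..255 range down to 0.0..1.0
chw = torch.from_numpy(hwc).permute(2, 0, 1).float() / 255.0

print(chw.shape, chw.dtype)  # torch.Size([3, 32, 32]) torch.float32
&lt;/code>&lt;/pre>&lt;/div>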
&lt;h4 id="normalizing-data">Normalizing data&lt;/h4>
&lt;p>We can chain transforms using &lt;code>transforms.Compose&lt;/code>, which lets us handle normalization and data augmentation transparently, directly in the data loader. It’s good practice to normalize the dataset so that each channel has zero mean and unit standard deviation. Also, giving every channel the same distribution ensures that channel information can be mixed and updated through gradient descent using the same learning rate.&lt;/p>
&lt;p>&lt;code>transforms.Normalize&lt;/code>: given the mean value and the standard deviation of each channel across the dataset, it applies the following transform to every channel &lt;code>c&lt;/code>: &lt;code>v_n[c] = (v[c] - mean[c]) / stdev[c]&lt;/code>.&lt;/p>
&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-primary-100 dark:bg-primary-900">
&lt;span class="pr-3 pt-1 text-primary-600 dark:text-primary-300">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="m11.25 11.25l.041-.02a.75.75 0 0 1 1.063.852l-.708 2.836a.75.75 0 0 0 1.063.853l.041-.021M21 12a9 9 0 1 1-18 0a9 9 0 0 1 18 0m-9-3.75h.008v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">Note that the values of &lt;code>mean&lt;/code> and &lt;code>stdev&lt;/code> must be computed offline in advance (they are not computed by the transform).&lt;/span>
&lt;/div>
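&lt;p>As a sketch of what the formula does, the snippet below applies it by hand to a fake C × H × W image, hardcoding the CIFAR-10 per-channel statistics reported in this section as the precomputed values. Reshaping the statistics to 3 × 1 × 1 lets them broadcast over each channel&amp;rsquo;s pixels; &lt;code>transforms.Normalize(mean, std)&lt;/code> should produce the same result on this tensor:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

# Precomputed per-channel statistics (the CIFAR-10 values from this section)
mean = torch.tensor([0.4915, 0.4823, 0.4468])
std = torch.tensor([0.2470, 0.2435, 0.2616])

img_t = torch.rand(3, 32, 32)  # fake image, C x H x W, values in [0, 1]

# v_n[c] = (v[c] - mean[c]) / stdev[c], applied to every pixel of channel c;
# view(3, 1, 1) makes the statistics broadcast over the H x W dimensions
img_n = (img_t - mean.view(3, 1, 1)) / std.view(3, 1, 1)
print(img_n.shape)  # torch.Size([3, 32, 32])
&lt;/code>&lt;/pre>&lt;/div>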
&lt;p>Steps for normalization:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Stack all the tensors returned by the dataset along an extra dimension&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">imgs&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">stack&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="n">img_t&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">img_t&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">_&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">tensor_cifar10&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">dim&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">imgs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">torch.Size([3, 32, 32, 50000])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>(Channels x Height x Width x #images)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Compute the mean and standard deviation per channel:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Mean&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Recall that view(3, -1) keeps the three channels and &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># merges all the remaining dimensions into one, figuring out the appropriate size. &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Here our 3 × 32 × 32 image is transformed into a 3 × 1,024 vector, &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># and then the mean is taken over the 1,024 elements of each channel.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">imgs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">view&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">mean&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">dim&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">tensor([0.4915, 0.4823, 0.4468])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>Standard deviation&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">imgs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">view&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">std&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">dim&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">tensor([0.2470, 0.2435, 0.2616])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Instantiate the &lt;code>Normalize&lt;/code> transform with these values and chain it after &lt;code>ToTensor&lt;/code> in &lt;code>transforms.Compose&lt;/code>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">transforms&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Normalize&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="mf">0.4915&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.4823&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.4468&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="mf">0.2470&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.2435&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.2616&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">transform&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">transforms&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Compose&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="n">transforms&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ToTensor&lt;/span>&lt;span class="p">(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">transforms&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Normalize&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="mf">0.4915&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.4823&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.4468&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="mf">0.247&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.2435&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.2616&lt;/span>&lt;span class="p">))])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">transformed_cifar10&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">datasets&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CIFAR10&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">data_path&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">train&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">download&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">False&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">transform&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">transform&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ol>
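&lt;p>Stacking all 50,000 training images materializes a float32 tensor of roughly 614 MB (3 × 32 × 32 × 50,000 values × 4 bytes). If that is too much memory, one alternative (not the approach used above) is to accumulate per-channel sums and sums of squares in a single pass. A sketch on fake data, checked against the stacking approach; the accumulators use float64 to limit round-off in the sum-of-squares formula:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

torch.manual_seed(0)
# Fake stand-in for the dataset: 100 images of shape 3 x 32 x 32
images = [torch.rand(3, 32, 32) for _ in range(100)]

# Reference: the stacking approach (3 x 32 x 32 x 100)
stacked = torch.stack(images, dim=3)
ref_mean = stacked.view(3, -1).mean(dim=1)
ref_std = stacked.view(3, -1).std(dim=1)

# Streaming alternative: per-channel running sums and sums of squares
n = 0
s = torch.zeros(3, dtype=torch.float64)
sq = torch.zeros(3, dtype=torch.float64)
for img in images:
    flat = img.view(3, -1).double()
    s += flat.sum(dim=1)
    sq += (flat ** 2).sum(dim=1)
    n += flat.shape[1]

mean = s / n
# Unbiased variance (divide by n - 1), matching the default of Tensor.std
var = (sq - n * mean ** 2) / (n - 1)
std = var.sqrt()
&lt;/code>&lt;/pre>&lt;/div>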
&lt;h2 id="classifier">Classifier&lt;/h2>
&lt;p>Assume that we’ll pick out all the birds and airplanes from our CIFAR-10 dataset and build a neural network that can tell birds and airplanes apart. This is a classification problem.&lt;/p>
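&lt;p>One way to build such a two-class dataset, sketched here with fake data: in CIFAR-10, airplane is class 0 and bird is class 2, so we keep only those samples and remap their labels to 0 (airplane) and 1 (bird). The list &lt;code>fake_cifar10&lt;/code> is a hypothetical stand-in; with the real data, the same comprehension would iterate over a dataset such as &lt;code>transformed_cifar10&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

# CIFAR-10 class indices: 0 = airplane, 2 = bird
label_map = {0: 0, 2: 1}  # remap to 0 = airplane, 1 = bird

# Fake stand-in for a CIFAR-10 dataset: (image tensor, label) pairs
fake_cifar10 = [(torch.rand(3, 32, 32), label) for label in [0, 1, 2, 9, 2, 0]]

# Keep only airplanes and birds, relabelling them as 0 and 1
cifar2 = [(img, label_map[label])
          for img, label in fake_cifar10
          if label in label_map]

print([label for _, label in cifar2])  # [0, 1, 1, 0]
&lt;/code>&lt;/pre>&lt;/div>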
&lt;h3 id="a-fully-connected-model">A fully connected model&lt;/h3>
&lt;p>An image is just a set of numbers laid out in a spatial configuration. In theory, if we take the image pixels and flatten them into a long 1D vector, we could treat those numbers as input features, as illustrated in the following figure:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-26 23.29.01.png" alt="截屏2020-10-26 23.29.01" style="zoom:80%;" />
&lt;p>In our case, each image is 3 × 32 × 32 (C × H × W), which gives 3,072 input features per sample. Let&amp;rsquo;s build a simple fully connected neural network:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">torch.nn&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">nn&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">n_input&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">3072&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">n_hidden&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">512&lt;/span> &lt;span class="c1"># just arbitrary choice&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">n_out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">2&lt;/span> &lt;span class="c1"># there&amp;#39;re 2 classes: bird and airplan&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">model&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Sequential&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_in&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_hidden&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Tanh&lt;/span>&lt;span class="p">(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_hidden&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_out&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="output-of-a-classifier">Output of a classifier&lt;/h3>
&lt;p>We need to recognize that the output is &lt;strong>categorical&lt;/strong>: it&amp;rsquo;s either a bird or an airplane.&lt;/p>
&lt;p>In the ideal case, the network would output &lt;code>torch.tensor([1.0, 0.0])&lt;/code> for an airplane and &lt;code>torch.tensor([0.0, 1.0])&lt;/code> for a bird. Practically speaking, since our classifier will not be perfect, we can expect the network to output something in between. The key realization in this case is that we can interpret our output as &lt;strong>probabilities&lt;/strong>: the first entry is the probability of “airplane,” and the second is the probability of “bird.”&lt;/p>
&lt;p>Casting the problem in terms of probabilities imposes a few extra constraints on the outputs of our network:&lt;/p>
&lt;ul>
&lt;li>Each element of the output must be in the [0.0, 1.0] range (a probability of an outcome cannot be less than 0 or greater than 1).&lt;/li>
&lt;li>The elements of the output must add up to 1.0 (we’re certain that one of the two outcomes will occur).&lt;/li>
&lt;/ul>
&lt;p>A function that satisfies both constraints is the &lt;strong>softmax&lt;/strong>: we take the elements of the vector, compute the elementwise exponential, and divide each element by the sum of the exponentials.&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-26%2023.39.08.png" alt="截屏2020-10-26 23.39.08" style="zoom:80%;" />
&lt;p>In code:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">softmax&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">exp&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">exp&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">sum&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>nn&lt;/code> module makes softmax available as a module, &lt;code>nn.Softmax&lt;/code>, which requires us to specify the dimension along which the softmax function is applied. Now we add a softmax at the end of our model:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">model&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Sequential&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_in&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_hidden&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Tanh&lt;/span>&lt;span class="p">(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_hidden&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_out&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Softmax&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">dim&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>After training, we will be able to get the label as an index by computing the &lt;code>argmax&lt;/code> of the output probabilities: that is, the index at which we get the maximum probability. Conveniently, when supplied with a dimension, &lt;code>torch.max&lt;/code> returns the maximum element along that dimension as well as the index at which that value occurs.&lt;/p>
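&lt;p>To make the role of &lt;code>dim=1&lt;/code> and &lt;code>torch.max&lt;/code> concrete, here is a minimal sketch on a made-up batch of logits (the numbers are arbitrary and stand in for the output of the last &lt;code>nn.Linear&lt;/code>). It also shows why the hand-written &lt;code>softmax&lt;/code> above is not enough for batches: it normalizes over every element of the tensor rather than within each row.&lt;/p>

```python
import torch
import torch.nn as nn

def softmax(x):
    # hand-written version: normalizes over ALL elements of x
    return torch.exp(x) / torch.exp(x).sum()

# Made-up logits for a batch of 3 images and 2 classes (airplane, bird)
logits = torch.tensor([[2.0, 0.5],
                       [0.1, 1.3],
                       [3.0, -1.0]])

probs = nn.Softmax(dim=1)(logits)      # softmax within each row (class dimension)
print(probs.sum(dim=1))                # every row sums to 1
print(softmax(logits).sum())           # here the whole batch sums to 1 instead

# torch.max with a dimension returns (max values, indices of the max values)
max_prob, index = torch.max(probs, dim=1)
print(index)                           # tensor([0, 1, 0])
```

&lt;p>Because softmax preserves the ordering of its inputs, the argmax of the probabilities equals the argmax of the raw logits, so the softmax is not needed just to pick a label.&lt;/p>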
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">index&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">dim&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="loss-for-classifying">Loss for classifying&lt;/h3>
&lt;p>We want to penalize misclassifications. What we need to maximize is the probability associated with the correct class, which is referred to as the &lt;strong>likelihood&lt;/strong>. In other words, we want a loss function that is&lt;/p>
&lt;ul>
&lt;li>high when the likelihood is low: so low that the alternatives have a higher probability.&lt;/li>
&lt;li>low when the likelihood is higher than the alternatives, and we’re not really fixated on driving the probability up to 1.&lt;/li>
&lt;/ul>
&lt;p>A loss function that behaves this way is the &lt;strong>negative log likelihood (NLL)&lt;/strong>:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-27%2012.43.16.png" alt="截屏2020-10-27 12.43.16" style="zoom:80%;" />
&lt;p>PyTorch has an &lt;code>nn.NLLLoss&lt;/code> class.&lt;/p>
&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-yellow-100 dark:bg-yellow-900">
&lt;span class="pr-3 pt-1 text-red-400">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="M12 9v3.75m-9.303 3.376c-.866 1.5.217 3.374 1.948 3.374h14.71c1.73 0 2.813-1.874 1.948-3.374L13.949 3.378c-.866-1.5-3.032-1.5-3.898 0zM12 15.75h.007v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">&lt;p>Gotcha ahead!!!&lt;/p>
&lt;p>&lt;code>nn.NLLLoss&lt;/code> does NOT take probabilities but rather takes a tensor of &lt;strong>log probabilities&lt;/strong> as input. It then computes the NLL of our model given the batch of data.&lt;/p>
&lt;/span>
&lt;/div>
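&lt;p>A small sketch of what this gotcha means in practice, using arbitrary logits and labels: given log probabilities, &lt;code>nn.NLLLoss&lt;/code> picks the entry of the correct class for each sample, negates it, and (with the default &lt;code>reduction='mean'&lt;/code>) averages over the batch. Feeding it plain probabilities raises no error but returns a meaningless number, here a negative one.&lt;/p>

```python
import torch
import torch.nn as nn

# Arbitrary logits for 2 samples and 2 classes, plus the correct class indices
logits = torch.tensor([[2.0, 0.5],
                       [0.1, 1.3]])
labels = torch.tensor([0, 1])

loss_fn = nn.NLLLoss()

log_probs = torch.log_softmax(logits, dim=1)
loss = loss_fn(log_probs, labels)

# The same number by hand: -log p(correct class), averaged over the batch
manual = -log_probs[torch.arange(len(labels)), labels].mean()
print(loss.item(), manual.item())

# Wrong input: probabilities instead of log probabilities
wrong = loss_fn(torch.softmax(logits, dim=1), labels)
print(wrong.item())  # negative, so clearly not a negative log likelihood
```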
&lt;p>The workaround is to use &lt;code>nn.LogSoftmax&lt;/code> instead of &lt;code>nn.Softmax&lt;/code>: it outputs log probabilities directly and takes care to make the calculation numerically stable.&lt;/p>
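&lt;p>Why not simply take the log of the softmax output? A minimal sketch with one deliberately extreme, made-up logit: in float32 the small probability underflows to zero, so its log is no longer finite, whereas &lt;code>torch.log_softmax&lt;/code> (the functional form of &lt;code>nn.LogSoftmax&lt;/code>) works in log space and returns the exact value.&lt;/p>

```python
import torch

# One sample whose two logits differ by 200
logits = torch.tensor([[200.0, 0.0]])

# Naive: exp(-200) underflows to 0 in float32, so the second log probability is not finite
naive = torch.log(torch.softmax(logits, dim=1))

# Stable: log-softmax is computed directly in log space
stable = torch.log_softmax(logits, dim=1)

print(naive)
print(stable)
```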
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">model&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Sequential&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_in&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_hidden&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Tanh&lt;/span>&lt;span class="p">(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_hidden&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">LogSoftmax&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">dim&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">loss&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">NLLLoss&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># compute the NLL loss for a single sample:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">img&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">label&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">cifar2&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="c1"># cifar2 is the modified dataset containing only birds and airplanes&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">img&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">view&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">unsqueeze&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">loss&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tensor&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="n">label&lt;/span>&lt;span class="p">]))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-primary-100 dark:bg-primary-900">
&lt;span class="pr-3 pt-1 text-primary-600 dark:text-primary-300">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="m11.25 11.25l.041-.02a.75.75 0 0 1 1.063.852l-.708 2.836a.75.75 0 0 0 1.063.853l.041-.021M21 12a9 9 0 1 1-18 0a9 9 0 0 1 18 0m-9-3.75h.008v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">&lt;p>A more convenient way is to use &lt;code>nn.CrossEntropyLoss&lt;/code>, which is equivalent to the combination of &lt;code>nn.LogSoftmax&lt;/code> and &lt;code>nn.NLLLoss&lt;/code>. This cross entropy can be interpreted as a negative log likelihood of the predicted distribution under the target distribution as an outcome.&lt;/p>
&lt;p>In this case, we drop the last &lt;code>nn.LogSoftmax&lt;/code> layer from the network and use &lt;code>nn.CrossEntropyLoss&lt;/code> as a loss:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">model&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Sequential&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_in&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_hidden&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Tanh&lt;/span>&lt;span class="p">(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_hidden&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">loss_fn&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CrossEntropyLoss&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The resulting loss will be exactly the same as with &lt;code>nn.LogSoftmax&lt;/code> and &lt;code>nn.NLLLoss&lt;/code>. It’s just more convenient to do it all in one pass, with the only gotcha being that the output of our model will NOT be interpretable as probabilities (or log probabilities). We’ll need to &lt;strong>explicitly&lt;/strong> pass the output through a softmax to obtain those.&lt;/p>
&lt;/span>
&lt;/div>
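&lt;p>A quick numerical check of this equivalence, as a sketch with random stand-in model outputs (&lt;code>torch.manual_seed&lt;/code> only makes the run repeatable): the two losses agree, and a separate softmax turns the raw outputs back into probabilities.&lt;/p>

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 2)             # stand-in for raw model outputs (no softmax layer)
labels = torch.tensor([0, 1, 1, 0])

ce = nn.CrossEntropyLoss()(logits, labels)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), labels)
print(ce.item(), nll.item())           # same value

# Probabilities have to be obtained explicitly
probs = torch.softmax(logits, dim=1)
print(probs)
```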
&lt;h3 id="training-the-classifier">Training the classifier&lt;/h3>
&lt;p>Training the classifier is similar to the process we&amp;rsquo;ve learned before:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">torch&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">torch.nn&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">nn&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">model&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Sequential&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_in&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_hidden&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Tanh&lt;/span>&lt;span class="p">(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_hidden&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">loss_fn&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CrossEntropyLoss&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">optimizer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">optim&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">SGD&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">model&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">parameters&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="n">lr&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">learning_rate&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">loss_fn&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">NLLLoss&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">n_epochs&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">100&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">for&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_epochs&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">img&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">label&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">cifar2&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># forward&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">img&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">view&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">unsqueeze&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tensor&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="n">label&lt;/span>&lt;span class="p">]))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">zero_grad&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># backward&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">backward&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># update&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">step&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s1">&amp;#39;Epoch: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">epoch&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s1">, Loss: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="nb">float&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">loss&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="si">:&lt;/span>&lt;span class="s1">4.3f&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="data-loader">Data loader&lt;/h2>
&lt;p>The &lt;code>torch.utils.data&lt;/code> module has a class that helps with shuffling and organizing the data in minibatches: &lt;code>DataLoader&lt;/code>. The job of a data loader is to &lt;strong>sample minibatches from a dataset, giving us the flexibility to choose from different sampling strategies.&lt;/strong>&lt;/p>
&lt;p>A very common strategy is uniform sampling after shuffling the data at each epoch:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-27%2015.23.55.png" alt="截屏2020-10-27 15.23.55" style="zoom:90%;" />
&lt;p>The &lt;code>DataLoader&lt;/code> constructor takes a &lt;code>Dataset&lt;/code> object as input, along with &lt;code>batch_size&lt;/code> and a &lt;code>shuffle&lt;/code> Boolean that indicates whether the data needs to be shuffled at the beginning of each epoch:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">train_loader&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">utils&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">data&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DataLoader&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">cifar2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">batch_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">shuffle&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># training set&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">val_loader&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">utils&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">data&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DataLoader&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">cifar2_val&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">batch_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">shuffle&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">False&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># validation set&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>A DataLoader can be iterated over, so we can use it directly in the inner loop of our new training code:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">for&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_epochs&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">imgs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">labels&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">train_loader&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">batch_size&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">imgs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">outputs&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">imgs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">view&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">batch_size&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">outputs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">labels&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">zero_grad&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">backward&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">step&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Due to the shuffling, this now prints the loss for a random batch&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;Epoch: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">epoch&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">, Loss: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="nb">float&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">loss&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="si">:&lt;/span>&lt;span class="s2">4.3f&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="parameters-of-the-model">Parameters of the model&lt;/h2>
&lt;p>PyTorch offers a quick way to determine how many parameters a model has through the &lt;code>parameters()&lt;/code> method of &lt;code>nn.Module&lt;/code>.&lt;/p>
&lt;p>To find out how many elements are in each tensor instance, we can call the &lt;code>numel&lt;/code> method. Summing those gives us our total count. Depending on our use case, counting parameters might require us to check whether a parameter has &lt;code>requires_grad&lt;/code> set to &lt;code>True&lt;/code>, as well. We might want to differentiate the number of trainable parameters from the overall model size.&lt;/p>
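&lt;p>The counts reported for this model can be checked by hand. This sketch assumes the two-layer model used earlier, &lt;code>nn.Linear(3072, 512)&lt;/code>, &lt;code>nn.Tanh()&lt;/code>, &lt;code>nn.Linear(512, 2)&lt;/code>, which is consistent with the reported numbers, and freezes the first layer purely to illustrate the difference between the overall size and the number of trainable parameters.&lt;/p>

```python
import torch.nn as nn

# Assumed architecture: 32*32*3 = 3072 inputs, 512 hidden units, 2 output classes
model = nn.Sequential(nn.Linear(3072, 512), nn.Tanh(), nn.Linear(512, 2))

# 3072*512 weights + 512 biases + 512*2 weights + 2 biases
total = sum(p.numel() for p in model.parameters())

# Freeze the first layer: it still counts toward the model size, but is no longer trainable
for p in model[0].parameters():
    p.requires_grad_(False)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(total, trainable)  # 1574402 1026
```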
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">numel_list&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">numel&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">p&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">parameters&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">if&lt;/span> &lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">requires_grad&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="kc">True&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">sum&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">numel_list&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="n">numel_list&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">(1574402, [1572864, 512, 1024, 2])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="727-the-limits-of-going-fully-connected">7.2.7 The limits of going fully connected&lt;/h2>
&lt;p>The model we trained above takes every single input value (that is, every single component of our RGB image) and computes a linear combination of it with all the other values for every output feature.&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-27%2016.45.55.png" alt="截屏2020-10-27 16.45.55" style="zoom:80%;" />
&lt;ul>
&lt;li>On one hand, we allow any pixel, in combination with every other pixel in the image, to be potentially relevant for our task.&lt;/li>
&lt;li>On the other hand, we aren’t utilizing the relative position of neighboring or faraway pixels, since we are treating the image as one big vector of numbers.&lt;/li>
&lt;/ul>
&lt;p>The problem with our fully connected network is that it is NOT &lt;strong>translation invariant&lt;/strong>: a pattern it has learned to recognize at one position in the image is not automatically recognized at another position. The solution to our current set of problems is to change our model to use convolutional layers.&lt;/p></description></item><item><title>Using Convolution to Generalize</title><link>https://haobin-tan.netlify.app/docs/ai/pytorch/dl-with-pytorch/p1ch8/</link><pubDate>Tue, 27 Oct 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/pytorch/dl-with-pytorch/p1ch8/</guid><description>&lt;h2 id="convolutions">Convolutions&lt;/h2>
&lt;p>Convolutions deliver &lt;strong>locality&lt;/strong> and &lt;strong>translation invariance&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>If we want to recognize patterns corresponding to objects, we will likely need to look at how nearby pixels are arranged, and we will be less interested in how pixels that are far from each other appear in combination.
&lt;ul>
&lt;li>In order to translate this intuition into mathematical form, we could compute the weighted sum of a pixel with its &lt;strong>immediate neighbors&lt;/strong>, rather than with all other pixels in the image.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="what-convolutions-do">What convolutions do&lt;/h3>
&lt;p>&lt;strong>Translation invariant&lt;/strong>: we want these localized patterns to have an effect on the output regardless of their location in the image.&lt;/p>
&lt;p>Convolution is defined for a 2D image as the scalar product of a weight matrix, the kernel, with every neighborhood in the input. The following figure illustrates applying a 3x3 kernel on a 2D image:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-27%2022.10.23.png" alt="截屏2020-10-27 22.10.23" style="zoom:80%;" />
&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-primary-100 dark:bg-primary-900">
&lt;span class="pr-3 pt-1 text-primary-600 dark:text-primary-300">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="m11.25 11.25l.041-.02a.75.75 0 0 1 1.063.852l-.708 2.836a.75.75 0 0 0 1.063.853l.041-.021M21 12a9 9 0 1 1-18 0a9 9 0 0 1 18 0m-9-3.75h.008v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">&lt;ul>
&lt;li>The weights in the kernel are NOT known in advance, but they are initialized randomly and updated through backpropagation.&lt;/li>
&lt;li>It is the &lt;strong>SAME&lt;/strong> kernel, and thus each weight in the kernel, is reused across the whole image.
&lt;ul>
&lt;li>Thinking back to autograd, this means the use of each weight has a history spanning the entire image. Thus, the derivative of the loss with respect to a convolution weight includes contributions from the entire image.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/span>
&lt;/div>
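&lt;p>Reusing the same kernel everywhere is what lets the response follow a pattern around the image. In this sketch (a made-up 8 × 8 image containing a 2 × 2 square, and a randomly initialized kernel), shifting the input two pixels to the right shifts the convolution output two pixels to the right as well, without changing the values themselves.&lt;/p>

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(1, 1, kernel_size=3, bias=False)   # one randomly initialized 3x3 kernel

img = torch.zeros(1, 1, 8, 8)
img[0, 0, 2:4, 2:4] = 1.0                            # a small square "pattern"
shifted = torch.roll(img, shifts=2, dims=3)          # same pattern, 2 pixels further right

with torch.no_grad():
    out = conv(img)
    out_shifted = conv(shifted)

# Moving the input moves the response by the same amount
print(torch.allclose(torch.roll(out, shifts=2, dims=3), out_shifted, atol=1e-6))  # True
```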
&lt;p>In summary, by using convolutions, we get&lt;/p>
&lt;ul>
&lt;li>Local operations on neighborhoods &amp;#x1f44f;&lt;/li>
&lt;li>Translation invariance &amp;#x1f44f;&lt;/li>
&lt;li>Models with a lot fewer parameters &amp;#x1f44f;
&lt;ul>
&lt;li>With a convolution layer, the number of parameters depends on
&lt;ul>
&lt;li>the size of the convolution kernel (3x3, 5x5, and so on)&lt;/li>
&lt;li>how many convolution filters (or output channels) we decide to use in our model.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
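&lt;p>The parameter savings are easy to quantify. As a sketch, the convolution below has 3 input and 16 output channels with a 3 × 3 kernel, and it is compared with the first layer of the fully connected model discussed earlier, &lt;code>nn.Linear(3072, 512)&lt;/code>.&lt;/p>

```python
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3)
fc = nn.Linear(3 * 32 * 32, 512)

# 16 kernels of shape 3x3x3 plus 16 biases, independent of the image size
n_conv = sum(p.numel() for p in conv.parameters())
# one weight per (input value, output feature) pair plus 512 biases
n_fc = sum(p.numel() for p in fc.parameters())

print(n_conv, n_fc)  # 448 1573376
```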
&lt;h3 id="convolutions-in-pytorch">Convolutions in PyTorch&lt;/h3>
&lt;p>The &lt;code>torch.nn&lt;/code> module provides convolutions for 1, 2, and 3 dimensions:&lt;/p>
&lt;ul>
&lt;li>&lt;code>nn.Conv1d&lt;/code> for time series&lt;/li>
&lt;li>&lt;code>nn.Conv2d&lt;/code> for images&lt;/li>
&lt;li>&lt;code>nn.Conv3d&lt;/code> for volumes or videos&lt;/li>
&lt;/ul>
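&lt;p>The three classes differ mainly in the shape of input they expect. A sketch with random tensors (the channel counts and sizes are arbitrary choices for illustration): each layer takes a batch dimension, then channels, then one, two, or three spatial dimensions, and without padding a kernel of size 3 shrinks each spatial dimension by 2.&lt;/p>

```python
import torch
import torch.nn as nn

x1 = torch.randn(8, 4, 100)          # B x C x L (e.g. 4 signals over 100 time steps)
x2 = torch.randn(8, 3, 32, 32)       # B x C x H x W (RGB images)
x3 = torch.randn(8, 1, 16, 32, 32)   # B x C x D x H x W (volumes or video clips)

y1 = nn.Conv1d(4, 16, kernel_size=3)(x1)
y2 = nn.Conv2d(3, 16, kernel_size=3)(x2)
y3 = nn.Conv3d(1, 16, kernel_size=3)(x3)

print(y1.shape)  # torch.Size([8, 16, 98])
print(y2.shape)  # torch.Size([8, 16, 30, 30])
print(y3.shape)  # torch.Size([8, 16, 14, 30, 30])
```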
&lt;p>For image data, we will use &lt;code>nn.Conv2d&lt;/code>. The arguments we provide to &lt;code>nn.Conv2d&lt;/code> are&lt;/p>
&lt;ul>
&lt;li>
&lt;p>the number of input features/channels (since we’re dealing with &lt;em>multichannel&lt;/em> images: that is, &lt;strong>more than one value per pixel&lt;/strong>)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>the number of output features&lt;/p>
&lt;/li>
&lt;li>
&lt;p>the size of the kernel&lt;/p>
&lt;blockquote>
&lt;p>It is very common to have kernel sizes that are the same in all directions, so PyTorch has a shortcut for this: whenever &lt;code>kernel_size=3&lt;/code> is specified for a 2D convolution, it means 3 × 3 (provided as a tuple (3, 3) in Python).&lt;/p>
&lt;/blockquote>
&lt;/li>
&lt;/ul>
&lt;p>For example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">in_ch&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">3&lt;/span> &lt;span class="c1"># 3 input features epr pixel (the RGB channels)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">out_ch&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">16&lt;/span> &lt;span class="c1"># arbitrary number of channels in the output&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">conv&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Conv2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">in_ch&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">out_ch&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">kernel_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">conv&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1))
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>In addition, we need to add the zeroth batch dimension with &lt;code>unsqueeze&lt;/code> if we want to call the &lt;code>conv&lt;/code> module with one input image, since &lt;code>nn.Conv2d&lt;/code> expects a &lt;em>B × C × H × W&lt;/em> shaped tensor as input:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># cifar2 is a modified cifar10 which contains only airplanes and birds&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">img&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">_&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">cifar2&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">output&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">conv&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">img&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">unsqueeze&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">dim&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">img&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">unsqueeze&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">output&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">(torch.Size([1, 3, 32, 32]), torch.Size([1, 16, 30, 30]))
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="padding-the-boundary">Padding the boundary&lt;/h3>
&lt;p>By default, PyTorch slides the convolution kernel only over positions where it fits entirely within the input picture, giving &lt;code>width - kernel_width + 1&lt;/code> horizontal positions and &lt;code>height - kernel_height + 1&lt;/code> vertical positions (30 × 30 for a 3 × 3 kernel on a 32 × 32 image, as above). PyTorch also lets us &lt;em>pad&lt;/em> the image by creating &lt;em>ghost&lt;/em> pixels around the border that have value zero as far as the convolution is concerned.&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-28%2017.40.10.png" alt="Screenshot 2020-10-28 17.40.10" style="zoom:80%;" />
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">conv&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Conv2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">kernel_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">padding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">output&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">conv&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">img&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">unsqueeze&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">img&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">unsqueeze&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">output&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">(torch.Size([1, 3, 32, 32]), torch.Size([1, 1, 32, 32]))
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>🤔 Reasons to pad convolutions&lt;/p>
&lt;ul>
&lt;li>Doing so helps us separate the matters of convolution and changing image sizes, so we have one less thing to remember.&lt;/li>
&lt;li>When we have more elaborate structures such as skip connections or U-Nets, we want the tensors before and after a few convolutions to be of compatible size so that we can add them or take differences.&lt;/li>
&lt;/ul>
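&lt;p>The position count above is the stride-1, no-padding case of the general output-size rule, &lt;code>out = (size + 2 * padding - kernel_size) // stride + 1&lt;/code> (assuming dilation 1). The sketch below checks that rule against &lt;code>nn.Conv2d&lt;/code>, using a random tensor as a stand-in for a CIFAR image and a hypothetical helper &lt;code>conv_out_size&lt;/code>. It also shows that &lt;code>padding = kernel_size // 2&lt;/code> keeps the spatial size unchanged for odd kernel sizes:&lt;/p>

```python
import torch
import torch.nn as nn

def conv_out_size(size, kernel_size, padding=0, stride=1):
    # Hypothetical helper: output length along one spatial axis (dilation 1)
    return (size + 2 * padding - kernel_size) // stride + 1

img = torch.randn(1, 3, 32, 32)  # random stand-in for one batched CIFAR image

results = {}
for k in (3, 5):
    for pad in (0, k // 2):  # no padding vs. size-preserving padding
        conv = nn.Conv2d(3, 1, kernel_size=k, padding=pad)
        out_hw = tuple(conv(img).shape[-2:])
        expected = conv_out_size(32, k, padding=pad)
        results[(k, pad)] = (out_hw, expected)
        print(f"kernel={k} padding={pad}: actual {out_hw}, formula {expected}")
```

&lt;p>With a 3 × 3 kernel and &lt;code>padding=1&lt;/code> the output stays 32 × 32, matching the padded example above.&lt;/p>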
&lt;h3 id="detecting-features-with-convolutions">Detecting features with convolutions&lt;/h3>
&lt;p>With deep learning, we let kernels be estimated from data in whatever way the discrimination is most effective. The job of a convolutional neural network is to estimate the kernels of a set of filter banks in successive layers that transform a multichannel image into another multichannel image, where different channels correspond to different features (such as one channel for the average, another channel for vertical edges, and so on).&lt;/p>
&lt;p>The following figure shows how training automatically learns the kernels:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-28%2020.54.38.png" alt="Screenshot 2020-10-28 20.54.38" style="zoom:80%;" />
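&lt;p>To make "different channels correspond to different features" concrete, the toy sketch below (not taken from the book's notebooks) fixes the weights of a two-channel &lt;code>nn.Conv2d&lt;/code> by hand: output channel 0 averages its 3 × 3 neighborhood and output channel 1 responds to vertical edges. It is applied to a synthetic 5 × 6 image containing a single vertical edge. In a trained convnet these weights would not be set by hand but estimated from data:&lt;/p>

```python
import torch
import torch.nn as nn

# Toy 1-channel image: dark left half (0), bright right half (1),
# i.e. one vertical edge between columns 2 and 3
img = torch.zeros(1, 1, 5, 6)
img[..., 3:] = 1.0

# Hand-set (not learned) kernels in one conv layer:
# channel 0 averages a 3x3 neighborhood, channel 1 detects vertical edges
conv = nn.Conv2d(1, 2, kernel_size=3, bias=False)
with torch.no_grad():
    conv.weight[0] = torch.full((1, 3, 3), 1.0 / 9.0)
    conv.weight[1] = torch.tensor([[[-1.0, 0.0, 1.0],
                                    [-1.0, 0.0, 1.0],
                                    [-1.0, 0.0, 1.0]]])
    out = conv(img)

print(out.shape)     # torch.Size([1, 2, 3, 4])
# Edge channel: nonzero only where the 3x3 window straddles the edge
print(out[0, 1])
# Average channel: ramps up smoothly from 0 to 1 across the edge
print(out[0, 0])
```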
&lt;h3 id="pooling">Pooling&lt;/h3>
&lt;h4 id="from-large-to-small-downsampling">From large to small: downsampling&lt;/h4>
&lt;p>Max pooling takes non-overlapping 2 × 2 tiles and keeps the maximum of each tile as the new pixel at the reduced scale.&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-28%2021.00.33.png" alt="Screenshot 2020-10-28 21.00.33" />
&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-primary-100 dark:bg-primary-900">
&lt;span class="pr-3 pt-1 text-primary-600 dark:text-primary-300">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="m11.25 11.25l.041-.02a.75.75 0 0 1 1.063.852l-.708 2.836a.75.75 0 0 0 1.063.853l.041-.021M21 12a9 9 0 1 1-18 0a9 9 0 0 1 18 0m-9-3.75h.008v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">&lt;p>💡 Intuition of max pooling:&lt;/p>
&lt;p>The output images from a convolution layer, especially since they are followed by an activation just like any other linear layer, tend to have a high magnitude where certain features corresponding to the estimated kernel are detected (such as vertical lines). By keeping the highest value in the 2 × 2 neighborhood as the downsampled output, we ensure that the features that are found survive the downsampling, at the expense of the weaker responses.&lt;/p>
&lt;/span>
&lt;/div>
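&lt;p>The 2 × 2 tiling can be spelled out with a reshape: splitting each spatial axis into (blocks, 2) gives every tile its own pair of axes, and taking the maximum over them reproduces &lt;code>F.max_pool2d&lt;/code>. This is an illustrative sketch, assuming even height and width, a contiguous input, and PyTorch 1.7 or later (for &lt;code>Tensor.amax&lt;/code>); a random tensor stands in for an image:&lt;/p>

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 32, 32)  # random stand-in for a batched CIFAR image
B, C, H, W = x.shape

# Split each spatial axis into (blocks, 2) so that every non-overlapping
# 2 x 2 tile gets its own pair of axes, then take the max over those axes
tiles = x.view(B, C, H // 2, 2, W // 2, 2)
manual = tiles.amax(dim=(3, 5))

builtin = F.max_pool2d(x, kernel_size=2)
print(manual.shape)                  # torch.Size([1, 3, 16, 16])
print(torch.equal(manual, builtin))  # True
```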
&lt;p>Max pooling is provided by the &lt;code>nn.MaxPool2d&lt;/code> module. It takes as input the size of the neighborhood over which to apply the pooling operation; the stride defaults to that same size, so the windows do not overlap. To downsample our image by half, we use a size of 2.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">pool&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">MaxPool2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">output&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pool&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">img&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">unsqueeze&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">dim&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">img&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">unsqueeze&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">output&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">(torch.Size([1, 3, 32, 32]), torch.Size([1, 3, 16, 16]))
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="combining-convolutions-and-downsampling">Combining convolutions and downsampling&lt;/h3>
&lt;p>Combining convolutions and downsampling can help us recognize larger structures:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-28%2021.22.54.png" alt="Screenshot 2020-10-28 21.22.54" style="zoom:120%;" />
&lt;ol>
&lt;li>We start by applying a set of 3 × 3 kernels to our 8 × 8 image, obtaining a multichannel output image of the same size.&lt;/li>
&lt;li>Then we scale down the output image by half, obtaining a 4 × 4 image, and apply another set of 3 × 3 kernels to it.&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>The second set of kernels
&lt;ul>
&lt;li>operates on a 3 × 3 neighborhood of something that has been scaled down by half, so it effectively maps back to 8 × 8 neighborhoods of the input.&lt;/li>
&lt;li>takes the output of the first set of kernels (features like averages, edges, and so on) and extracts additional features on top of those.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>To summarize:&lt;/p>
&lt;ul>
&lt;li>the first set of kernels operates on small neighborhoods and extracts first-order, low-level features,&lt;/li>
&lt;li>while the second set of kernels effectively operates on wider neighborhoods, producing features that are compositions of the previous features.&lt;/li>
&lt;/ul>
&lt;p>This is a very powerful mechanism that provides convolutional neural networks with the ability to see into very complex scenes &amp;#x1f4aa;&lt;/p>
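&lt;p>Both steps can be traced with a small sketch. Below, an &lt;code>nn.Sequential&lt;/code> mirrors the two-stage scheme on an 8 × 8 single-channel image (the channel counts are arbitrary choices), and a hypothetical helper &lt;code>receptive_field&lt;/code> applies the usual recurrence, in which each layer widens the field by &lt;code>(kernel_size - 1) * jump&lt;/code> and &lt;code>jump&lt;/code> is the product of the strides of the preceding layers. The result, 8, matches the claim that the second set of kernels effectively sees 8 × 8 neighborhoods of the input:&lt;/p>

```python
import torch
import torch.nn as nn

# Shape trace of the two-stage scheme on an 8 x 8 single-channel image
# (channel counts 1 -> 4 -> 4 are arbitrary, for illustration only)
stages = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1),  # 8 x 8 -> 8 x 8
    nn.MaxPool2d(2),                            # 8 x 8 -> 4 x 4
    nn.Conv2d(4, 4, kernel_size=3, padding=1),  # 4 x 4 -> 4 x 4
)
x = torch.randn(1, 1, 8, 8)
for layer in stages:
    x = layer(x)
    print(type(layer).__name__, tuple(x.shape))

def receptive_field(layers):
    # Hypothetical helper; layers is a list of (kernel_size, stride) pairs
    field, jump = 1, 1
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump  # widen by the kernel's reach
        jump *= stride                     # later layers step farther
    return field

rf = receptive_field([(3, 1), (2, 2), (3, 1)])
print(rf)  # 8
```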
&lt;h2 id="subclassing-nnmodule">Subclassing &lt;code>nn.Module&lt;/code>&lt;/h2>
&lt;p>To subclass &lt;code>nn.Module&lt;/code>:&lt;/p>
&lt;ul>
&lt;li>we need to define a &lt;code>forward&lt;/code> function that takes the inputs to the module and returns the output. (This is where we define our module’s computation.)
&lt;ul>
&lt;li>With PyTorch, if we use standard torch operations, &lt;code>autograd&lt;/code> will take care of the backward pass automatically &amp;#x1f44f;; and indeed, an &lt;code>nn.Module&lt;/code> never comes with a &lt;code>backward&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>To use other submodules (premade ones like convolutions, or customized ones), we typically define them in the constructor &lt;code>__init__&lt;/code> and assign them to &lt;code>self&lt;/code> for use in the &lt;code>forward&lt;/code> function. Before we can do that, we need to call &lt;code>super().__init__()&lt;/code>.&lt;/li>
&lt;/ul>
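&lt;p>Both rules show up in a deliberately tiny module. In this sketch, &lt;code>TinyNet&lt;/code> and &lt;code>Broken&lt;/code> are hypothetical names used only for illustration: &lt;code>TinyNet&lt;/code> defines only &lt;code>forward&lt;/code> and still gets gradients from &lt;code>autograd&lt;/code>, its &lt;code>nn.Linear&lt;/code> submodule is registered just by being assigned to &lt;code>self&lt;/code>, and &lt;code>Broken&lt;/code> shows that assigning a submodule before calling &lt;code>super().__init__()&lt;/code> raises an &lt;code>AttributeError&lt;/code>:&lt;/p>

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical example module
    def __init__(self):
        super().__init__()         # must run before assigning submodules
        self.fc = nn.Linear(4, 1)  # registered automatically as a submodule

    def forward(self, x):
        # only the forward computation is written; there is no backward method
        return torch.tanh(self.fc(x))

model = TinyNet()
print([name for name, _ in model.named_parameters()])  # ['fc.weight', 'fc.bias']

loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()                    # autograd derives the backward pass
print(model.fc.weight.grad.shape)  # torch.Size([1, 4])

class Broken(nn.Module):  # hypothetical: forgets to call super().__init__()
    def __init__(self):
        self.fc = nn.Linear(4, 1)

try:
    Broken()
    error_message = None
except AttributeError as err:  # nn.Module refuses the assignment
    error_message = str(err)
print(error_message)
```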
&lt;p>For example, let&amp;rsquo;s model the following network:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-10-28%2022.15.03.png" alt="Screenshot 2020-10-28 22.15.03" style="zoom:90%;" />
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">Net&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Module&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">super&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="fm">__init__&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Conv2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">16&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">kernel_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">padding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">act1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Tanh&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">pool1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">MaxPool2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Conv2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">16&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">8&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">kernel_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">padding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">act2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Tanh&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">pool2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">MaxPool2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">8&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">32&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">act3&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Tanh&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">32&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">forward&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">pool1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">act1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">)))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">pool2&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">act2&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv2&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">)))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># we leave the batch dimension as -1 in the call to view, &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># since in principle we don’t know how many samples will be in the batch.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">out&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">view&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">8&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">act3&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc2&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">out&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="keep-track-of-parameters-and-submodules">Keep track of parameters and submodules&lt;/h3>
&lt;p>Assigning an instance of &lt;code>nn.Module&lt;/code> to an attribute in an &lt;code>nn.Module&lt;/code> automatically registers the module as a submodule. We can also call arbitrary methods of an &lt;code>nn.Module&lt;/code> subclass.&lt;/p>
&lt;p>Registration allows &lt;code>Net&lt;/code> to access the parameters of its submodules without further action by the user:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">model&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Net&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">numel_list&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">numel&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">p&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">parameters&lt;/span>&lt;span class="p">()]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">sum&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">numel_list&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="n">numel_list&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">(18090, [432, 16, 1152, 8, 16384, 32, 64, 2])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="the-functional-api">The functional API&lt;/h3>
&lt;p>Looking back at the implementation of the &lt;code>Net&lt;/code> class, it appears a bit of a waste that we are also registering submodules that have &lt;strong>no&lt;/strong> parameters, like &lt;code>nn.Tanh&lt;/code> and &lt;code>nn.MaxPool2d&lt;/code>. It would be easier to call these &lt;em>&lt;strong>directly&lt;/strong>&lt;/em> in the &lt;code>forward&lt;/code> function, just as we called &lt;code>view&lt;/code>.&lt;/p>
&lt;p>PyTorch has &lt;em>functional&lt;/em> counterparts for every &lt;code>nn&lt;/code> module.&lt;/p>
&lt;blockquote>
&lt;p>By “functional” here we mean “having no internal state”; in other words, “whose output value is solely and fully determined by the values of the input arguments.”&lt;/p>
&lt;/blockquote>
&lt;p>&lt;code>torch.nn.functional&lt;/code> provides many functions that work like the modules we find in &lt;code>nn&lt;/code>. Instead of working on the input arguments and stored parameters like their module counterparts, they take both inputs and parameters as arguments to the function call. For instance, the functional counterpart of &lt;code>nn.Linear&lt;/code> is &lt;code>nn.functional.linear&lt;/code>, a function with signature &lt;code>linear(input, weight, bias=None)&lt;/code>; the weight and bias parameters are arguments to the function.&lt;/p>
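&lt;p>To see the difference concretely, the sketch below (random input, arbitrary sizes) passes the parameters owned by an &lt;code>nn.Linear&lt;/code> module to &lt;code>F.linear&lt;/code> explicitly and checks that both produce the same output. The manual line spells out the computation: the input times the transposed weight, plus the bias:&lt;/p>

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

linear = nn.Linear(4, 3)  # module: owns a 3 x 4 weight and a bias of size 3
x = torch.randn(2, 4)

out_module = linear(x)
# Functional call: the same parameters are passed in explicitly
out_functional = F.linear(x, linear.weight, linear.bias)
# What the function computes, written out by hand
out_manual = torch.matmul(x, linear.weight.t()) + linear.bias

print(torch.equal(out_module, out_functional))  # True
print(torch.allclose(out_module, out_manual))   # True
```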
&lt;p>Let&amp;rsquo;s switch to the functional counterparts of pooling and activation, since they have no parameters:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">torch.nn.functional&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">F&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">Net&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Module&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">super&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="fm">__init__&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Conv2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">16&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">kernel_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">padding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Conv2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">16&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">8&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">kernel_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">padding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">8&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">32&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">32&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">forward&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">F&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max_pool2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tanh&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">F&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max_pool2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tanh&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv2&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">out&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">view&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">8&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tanh&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc2&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">out&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="training-the-convnet">Training the convnet&lt;/h2>
&lt;p>The training loop consists of two nested loops:&lt;/p>
&lt;ul>
&lt;li>an outer one over the &lt;em>epochs&lt;/em> and
&lt;ul>
&lt;li>an inner one over the &lt;code>DataLoader&lt;/code>, which produces batches from our &lt;code>Dataset&lt;/code>. In each iteration of the inner loop, we have to:
&lt;ol>
&lt;li>Feed the inputs through the model (the forward pass).&lt;/li>
&lt;li>Compute the loss (also part of the forward pass).&lt;/li>
&lt;li>Zero any old gradients.&lt;/li>
&lt;li>Call &lt;code>loss.backward()&lt;/code> to compute the gradients of the loss with respect to all parameters (the backward pass).&lt;/li>
&lt;li>Have the optimizer take a step toward lower loss.&lt;/li>
&lt;/ol>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">datetime&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">training_loop&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_epochs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">optimizer&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">train_loader&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_epochs&lt;/span>&lt;span class="o">+&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss_train&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mf">0.0&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">imgs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">labels&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">train_loader&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># feeds a batch through our model&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">outputs&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">imgs&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># computes the loss we wish to minimize&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">outputs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">labels&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># get rid of the gradients from the last round&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">zero_grad&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># perform the backward step&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># (we compute the gradients of all parameters we want the network to learn)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">backward&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># update the model&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">step&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># sum the losses over the epoch&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss_train&lt;/span> &lt;span class="o">+=&lt;/span> &lt;span class="n">loss&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">item&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="c1"># use .item() to escape the gradients&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="ow">or&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="o">%&lt;/span> &lt;span class="mi">10&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">datetime&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">datetime&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">now&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s1">: &amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="sa">f&lt;/span>&lt;span class="s1">&amp;#39;Epoch &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">epoch&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s1">, Training loss: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">loss_train&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">train_loader&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">train_loader&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">utils&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">data&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DataLoader&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">cifar2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">batch_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">shuffle&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">model&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Net&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">optimizer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">optim&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">SGD&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">model&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">parameters&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="n">lr&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mf">1e-2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">loss_fn&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CrossEntropyLoss&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">training_loop&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_epochs&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">100&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">optimizer&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">model&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">loss_fn&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">train_loader&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">train_loader&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="measuring-accuracy">Measuring accuracy&lt;/h3>
&lt;p>Measure the accuracies on the training set and validation set:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">train_loader&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">utils&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">data&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DataLoader&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">cifar2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">batch_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">shuffle&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">False&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">val_loader&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">utils&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">data&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DataLoader&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">cifar2_val&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">batch_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">shuffle&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">False&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">validate&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">model&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">train_loader&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">val_loader&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">name&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">loader&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="p">[(&lt;/span>&lt;span class="s2">&amp;#34;train&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">train_loader&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;val&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">val_loader&lt;/span>&lt;span class="p">)]:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">correct&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">total&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">0&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">with&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">no_grad&lt;/span>&lt;span class="p">():&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">imgs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">labels&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">loader&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">outputs&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">imgs&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">predicted&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">outputs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">dim&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">total&lt;/span> &lt;span class="o">+=&lt;/span> &lt;span class="n">labels&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">correct&lt;/span> &lt;span class="o">+=&lt;/span> &lt;span class="nb">int&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="n">predicted&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="n">labels&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">sum&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s1">&amp;#39;Accuracy &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">name&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s1"> : &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">correct&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="n">total&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="si">:&lt;/span>&lt;span class="s1">.2f&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">validate&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">model&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">train_loader&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">val_loader&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="saving-and-loading-model">Saving and loading model&lt;/h3>
&lt;p>Save the model parameters to a file:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># assume that the data_path is already specified&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># and we want to save our model with the name &amp;#34;birds_vs_airplanes.pt&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">save&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">model&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">state_dict&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="n">data_path&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="s1">&amp;#39;birds_vs_airplanes.pt&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;strong>birds_vs_airplanes.pt&lt;/strong> file now contains all the parameters of &lt;code>model&lt;/code>: the weights and biases of the two convolution modules and the two linear modules. &lt;strong>No structure, just the weights.&lt;/strong>&lt;/p>
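&lt;p>To see what "just the weights" means in practice, here is a minimal sketch that inspects the state dict of a small stand-in model. The model is a hypothetical &lt;code>nn.Sequential&lt;/code> with one convolution and one linear layer, not the &lt;code>Net&lt;/code> class above, and it saves to an in-memory &lt;code>io.BytesIO&lt;/code> buffer instead of a file. The state dict maps parameter names to tensors. It records nothing about layer types or how &lt;code>forward&lt;/code> wires them together, which is why the class definition is needed again at load time.&lt;/p>

```python
import io

import torch
import torch.nn as nn

# Hypothetical stand-in model, assumed for illustration only
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.Flatten(),                 # no parameters, so no state dict entries
    nn.Linear(16 * 4 * 4, 2),
)

state = model.state_dict()
for name, tensor in state.items():
    print(name, tuple(tensor.shape))
# 0.weight (16, 3, 3, 3)
# 0.bias (16,)
# 2.weight (2, 256)
# 2.bias (2,)

# Round-trip through an in-memory buffer instead of a file
buffer = io.BytesIO()
torch.save(state, buffer)
buffer.seek(0)
restored = torch.load(buffer)

print(list(restored.keys()))
```

&lt;p>The keys are derived from the attribute names (here, positions in the &lt;code>Sequential&lt;/code>). That is why &lt;code>load_state_dict&lt;/code> only succeeds on a module with the same layout.&lt;/p>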
&lt;p>When we deploy the model in production, we’ll need to keep the &lt;code>Net&lt;/code> class handy, create an instance, and then load the parameters back into it:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">loaded_model&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Net&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">loaded_model&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">load_state_dict&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">load&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">data_path&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="s1">&amp;#39;birds_vs_airplanes.pt&amp;#39;&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">&amp;lt;All keys matched successfully&amp;gt;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="training-on-gpu">Training on GPU&lt;/h3>
&lt;p>&lt;code>nn.Module&lt;/code> implements a &lt;code>.to&lt;/code> method that moves all of its parameters (and buffers) to the GPU, or casts their type when you pass a &lt;code>dtype&lt;/code> argument. There is a subtle difference between &lt;code>Module.to&lt;/code> and &lt;code>Tensor.to&lt;/code>:&lt;/p>
&lt;ul>
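&lt;li>&lt;code>Module.to&lt;/code> is &lt;strong>in place&lt;/strong>: the module instance itself is modified (and returned).&lt;/li>
&lt;li>&lt;code>Tensor.to&lt;/code> is &lt;strong>out of place&lt;/strong>: like a computation such as &lt;code>Tensor.tanh&lt;/code>, it returns a new tensor and leaves the original unchanged. If no conversion is needed, it returns the original tensor.&lt;/li>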
&lt;/ul>
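&lt;p>The difference can be checked without a GPU. Casting the &lt;code>dtype&lt;/code> goes through the same &lt;code>.to&lt;/code> call as moving the device. Here is a minimal sketch using a throwaway &lt;code>nn.Linear&lt;/code> layer and a small tensor, both chosen only for illustration. It assumes the default floating-point dtype is &lt;code>torch.float32&lt;/code>.&lt;/p>

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)
t = torch.ones(3)

# Module.to modifies the module in place and returns the same object
same_layer = layer.to(dtype=torch.float64)
print(same_layer is layer)              # True
print(layer.weight.dtype)               # torch.float64

# Tensor.to returns a new tensor and leaves the original untouched
t64 = t.to(dtype=torch.float64)
print(t64 is t)                         # False
print(t.dtype, t64.dtype)               # torch.float32 torch.float64

# When no conversion is needed, Tensor.to hands back the tensor itself
print(t.to(dtype=torch.float32) is t)   # True
```

&lt;p>In practice, this is why &lt;code>model.to(device)&lt;/code> works without reassignment. A batch, on the other hand, must be rebound, as in &lt;code>imgs = imgs.to(device)&lt;/code>.&lt;/p>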
&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-primary-100 dark:bg-primary-900">
&lt;span class="pr-3 pt-1 text-primary-600 dark:text-primary-300">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="m11.25 11.25l.041-.02a.75.75 0 0 1 1.063.852l-.708 2.836a.75.75 0 0 0 1.063.853l.041-.021M21 12a9 9 0 1 1-18 0a9 9 0 0 1 18 0m-9-3.75h.008v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">&lt;p>📝 &lt;strong>Good practice:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Create the &lt;code>Optimizer&lt;/code> after moving the parameters to the appropriate device.&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Move things to the GPU if one is available. A good pattern is to set a &lt;code>device&lt;/code> variable depending on &lt;code>torch.cuda.is_available()&lt;/code>:&lt;/strong>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">device&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">device&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;cuda&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">if&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">cuda&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">is_available&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">else&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">device&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;cpu&amp;#39;&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ul>
&lt;/span>
&lt;/div>
&lt;p>Let&amp;rsquo;s amend the training loop so that it moves the tensors we get from the data loader to the GPU with the &lt;code>Tensor.to&lt;/code> method. The loop relies on the global &lt;code>device&lt;/code> variable set up with the pattern above.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">datetime&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">training_loop&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_epochs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">optimizer&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">train_loader&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_epochs&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss_train&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mf">0.0&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">imgs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">labels&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">train_loader&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Move imgs and labels tensors to the device we&amp;#39;re training on&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">imgs&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">imgs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">to&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">device&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">device&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">labels&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">labels&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">to&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">device&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">device&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">outputs&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">imgs&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">outputs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">labels&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">zero_grad&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">backward&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">step&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss_train&lt;/span> &lt;span class="o">+=&lt;/span> &lt;span class="n">loss&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">item&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="ow">or&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="o">%&lt;/span> &lt;span class="mi">10&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">datetime&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">datetime&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">now&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s1">: &amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="sa">f&lt;/span>&lt;span class="s1">&amp;#39;Epoch &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">epoch&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s1">, Training loss: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">loss_train&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">train_loader&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now we can instantiate our model, move it to &lt;code>device&lt;/code>, and run it:&lt;/p>
&lt;p>(Note: If you forget to move either the model or the inputs to the GPU, you will get errors about tensors not being on the same device, because the PyTorch operators do not support mixing GPU and CPU inputs.)&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">train_loader&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">utils&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">data&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DataLoader&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">cifar2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">batch_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">64&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">shuffle&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># moves our model (all parameters) to the GPU&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">model&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Net&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">to&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">device&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">device&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Good practice:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># create the Optimizer after moving the parameters to the appropriate device&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">optimizer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">optim&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">SGD&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">model&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">parameters&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="n">lr&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mf">1e-2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">loss_fn&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CrossEntropyLoss&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">training_loop&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_epochs&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">100&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">optimizer&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">model&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">loss_fn&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">train_loader&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">train_loader&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>When loading network weights, PyTorch will attempt to load the weights onto the same device they were saved from: weights on the GPU will be restored to the GPU. Since we don’t know whether we want the same device, we have two options:&lt;/p>
&lt;ul>
&lt;li>we could move the network to the CPU before saving it,&lt;/li>
&lt;li>or move it to the desired device after restoring.&lt;/li>
&lt;/ul>
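&lt;p>Both options can be sketched in one round trip. The model is a hypothetical &lt;code>nn.Linear&lt;/code> standing in for &lt;code>Net&lt;/code>, and an in-memory &lt;code>io.BytesIO&lt;/code> buffer replaces the file path. The tensors are copied to the CPU before saving (option 1), and the restored module is then moved to whichever device is available (option 2).&lt;/p>

```python
import io

import torch
import torch.nn as nn

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

model = nn.Linear(4, 2).to(device)  # hypothetical stand-in for Net

# Option 1: copy every tensor in the state dict to the CPU before saving
cpu_state = {name: t.cpu() for name, t in model.state_dict().items()}
buffer = io.BytesIO()
torch.save(cpu_state, buffer)

# Option 2: restore into a fresh (CPU) module, then move it to the device
buffer.seek(0)
loaded_model = nn.Linear(4, 2)
loaded_model.load_state_dict(torch.load(buffer))
loaded_model = loaded_model.to(device)

print(next(loaded_model.parameters()).device)
```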
&lt;p>It is a bit more concise to &lt;strong>instruct PyTorch to override the device information when loading weights&lt;/strong>. This is done by passing the &lt;code>map_location&lt;/code> keyword argument to &lt;code>torch.load&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">loaded_model&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">load_state_dict&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">load&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">data_path&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="s1">&amp;#39;birds_vs_airplanes.pt&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">map_location&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">device&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="model-design">Model design&lt;/h2>
&lt;h3 id="width-memory-capacity">Width: memory capacity&lt;/h3>
&lt;p>&lt;strong>Width&lt;/strong> of the network: the number of neurons per layer, or channels per convolution.&lt;/p>
&lt;p>Making a model wider is very easy in PyTorch: we just &lt;strong>specify a larger number of output channels&lt;/strong>, taking care to change the forward function (and the input size of the first fully connected layer) to reflect the fact that the flattened vector fed to the fully connected layers is now longer.&lt;/p>
&lt;p>For example, we change the number of output channels in the first convolution from 16 to 32:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">NetwWidth&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Module&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans1&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">32&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">super&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="fm">__init__&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">n_chans1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">n_chans1&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Conv2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">kernel_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">padding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Conv2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_chans1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans1&lt;/span> &lt;span class="o">//&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">kernel_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">padding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">n_chans1&lt;/span> &lt;span class="o">//&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">32&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">32&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">forward&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">F&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max_pool2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tanh&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">F&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max_pool2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tanh&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv2&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">out&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">view&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">n_chans1&lt;/span> &lt;span class="o">//&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tanh&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc2&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">out&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The greater the capacity, the more variability in the inputs the model will be able to manage; but at the same time, the more likely overfitting will be, since the model can use a greater number of parameters to memorize unessential aspects of the input.&lt;/p>
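&lt;p>One way to make "capacity" concrete is to count parameters. The sketch below re-declares the width-parameterized network from above (named &lt;code>NetWidth&lt;/code> here) so that it runs on its own, assuming 32×32 RGB inputs as implied by the &lt;code>8 * 8&lt;/code> in &lt;code>fc1&lt;/code>, and compares 16 and 32 channels in the first convolution. The helper &lt;code>count_params&lt;/code> is a hypothetical name introduced for illustration.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch
import torch.nn.functional as F
from torch import nn

class NetWidth(nn.Module):
    def __init__(self, n_chans1=32):
        super().__init__()
        self.n_chans1 = n_chans1
        self.conv1 = nn.Conv2d(3, n_chans1, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(n_chans1, n_chans1 // 2, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(8 * 8 * n_chans1 // 2, 32)
        self.fc2 = nn.Linear(32, 2)

    def forward(self, x):
        out = F.max_pool2d(torch.tanh(self.conv1(x)), 2)    # 32x32 -> 16x16
        out = F.max_pool2d(torch.tanh(self.conv2(out)), 2)  # 16x16 -> 8x8
        out = out.view(-1, 8 * 8 * self.n_chans1 // 2)      # flatten for the linear layers
        out = torch.tanh(self.fc1(out))
        return self.fc2(out)

def count_params(model):
    # total number of scalar parameters (weights and biases)
    return sum(p.numel() for p in model.parameters())

narrow, wide = NetWidth(n_chans1=16), NetWidth(n_chans1=32)
print(count_params(narrow))  # 18090
print(count_params(wide))    # 38386

x = torch.randn(4, 3, 32, 32)  # hypothetical batch of four 32x32 RGB images
print(wide(x).shape)           # torch.Size([4, 2])
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>Doubling the first layer's channels roughly doubles the parameter count here, mostly because &lt;code>fc1&lt;/code> receives a flattened vector twice as long.&lt;/p>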
&lt;h3 id="regularization-helping-to-converge-and-generalize">Regularization: helping to converge and generalize&lt;/h3>
&lt;p>Training a model involves two critical steps:&lt;/p>
&lt;ul>
&lt;li>optimization, when we need the loss to decrease on the training set;&lt;/li>
&lt;li>generalization, when the model has to work not only on the training set but also on data it has not seen before, like the validation set.&lt;/li>
&lt;/ul>
&lt;p>The mathematical tools aimed at easing these two steps are sometimes subsumed under the label &lt;strong>regularization&lt;/strong>.&lt;/p>
&lt;h4 id="weight-penalties">Weight penalties&lt;/h4>
&lt;p>The first way to stabilize generalization is to &lt;strong>add a regularization term to the loss&lt;/strong>.&lt;/p>
&lt;ul>
&lt;li>This term is crafted so that the weights of the model tend to stay small on their own, limiting how much training makes them grow. In other words, it is a penalty on larger weight values.&lt;/li>
&lt;li>This gives the loss a smoother topography, and there’s relatively less to gain from fitting individual samples.&lt;/li>
&lt;/ul>
&lt;p>The most popular regularization terms are:&lt;/p>
&lt;ul>
&lt;li>L2 regularization: the sum of squares of all weights in the model&lt;/li>
&lt;li>L1 regularization: the sum of the absolute values of all weights in the model&lt;/li>
&lt;/ul>
&lt;p>Both of them are &lt;strong>scaled by a (small) factor&lt;/strong>, which is a hyperparameter we set prior to training.&lt;/p>
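&lt;p>To make the two terms concrete, here is a small sketch that computes both penalties for a toy &lt;code>nn.Linear(2, 1)&lt;/code> whose weights are set by hand. The layer, its values, and the scaling factor of 0.001 are illustrative assumptions.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch
from torch import nn

layer = nn.Linear(2, 1)
with torch.no_grad():
    layer.weight.copy_(torch.tensor([[1.0, -2.0]]))  # hand-picked weights
    layer.bias.copy_(torch.tensor([0.5]))            # hand-picked bias

params = list(layer.parameters())
l2_term = sum(p.pow(2).sum() for p in params)  # 1 + 4 + 0.25 = 5.25
l1_term = sum(p.abs().sum() for p in params)   # 1 + 2 + 0.5 = 3.5

factor = 0.001  # hypothetical scaling hyperparameter, chosen before training
print(factor * l2_term.item(), factor * l1_term.item())
&lt;/code>&lt;/pre>&lt;/div>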
&lt;p>Here we&amp;rsquo;ll focus on L2 regularization.&lt;/p>
&lt;ul>
&lt;li>L2 regularization is also referred to as &lt;strong>weight decay&lt;/strong>.&lt;/li>
&lt;li>Adding L2 regularization to the loss function is equivalent to decreasing each weight by an amount proportional to its current value during the optimization step.&lt;/li>
&lt;li>Note that weight decay applies to all parameters of the network, such as biases.&lt;/li>
&lt;/ul>
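&lt;p>The "proportional to its current value" claim follows from the gradient of the penalty: the derivative of &lt;code>lambda * w**2&lt;/code> with respect to &lt;code>w&lt;/code> is &lt;code>2 * lambda * w&lt;/code>. The sketch below checks this with autograd on a hand-made weight vector and shows that a plain gradient step driven only by the penalty multiplies every weight by the same factor &lt;code>1 - 2 * lr * lambda&lt;/code>. The values of &lt;code>lr&lt;/code> and &lt;code>l2_lambda&lt;/code> are illustrative assumptions, and the data loss is deliberately left out to isolate the penalty's effect.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch

l2_lambda = 0.001  # hypothetical penalty strength
lr = 0.1           # hypothetical learning rate
w = torch.tensor([1.0, -2.0, 0.5], requires_grad=True)

penalty = l2_lambda * w.pow(2).sum()
penalty.backward()
print(w.grad)  # equals 2 * l2_lambda * w

# One gradient step using only the penalty shrinks each weight by the same factor
with torch.no_grad():
    w_new = w - lr * w.grad
print(torch.allclose(w_new, w.detach() * (1 - 2 * lr * l2_lambda)))  # True
&lt;/code>&lt;/pre>&lt;/div>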
&lt;p>In PyTorch, we could implement regularization pretty easily by adding a term to the loss.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">training_loop_l2reg&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_epochs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">optimizer&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">train_loader&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_epochs&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">imgs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">labels&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">train_loader&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">imgs&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">imgs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">to&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">device&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">labels&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">labels&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">to&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">device&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">outputs&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">imgs&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">loss_fn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">outputs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">labels&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># L2 regularization&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">l2_lambda&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mf">0.001&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">l2_norm&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nb">sum&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">pow&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mf">2.0&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">sum&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">p&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">parameters&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">loss&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">l2_lambda&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">l2_norm&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">zero_grad&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">backward&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">optimizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">step&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">loss_train&lt;/span> &lt;span class="o">+=&lt;/span> &lt;span class="n">loss&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">item&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">peoch&lt;/span> &lt;span class="o">==&lt;/span>&lt;span class="mi">1&lt;/span> &lt;span class="ow">or&lt;/span> &lt;span class="n">epoch&lt;/span> &lt;span class="o">%&lt;/span> &lt;span class="mi">10&lt;/span> &lt;span class="o">==&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">datatime&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">datatime&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">now&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s1">, Epoch: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">epoch&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s1">,&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="sa">f&lt;/span>&lt;span class="s1">&amp;#39;Training loss: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">loss_train&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">train_loader&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-primary-100 dark:bg-primary-900">
&lt;span class="pr-3 pt-1 text-primary-600 dark:text-primary-300">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="m11.25 11.25l.041-.02a.75.75 0 0 1 1.063.852l-.708 2.836a.75.75 0 0 0 1.063.853l.041-.021M21 12a9 9 0 1 1-18 0a9 9 0 0 1 18 0m-9-3.75h.008v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">The SGD optimizer in PyTorch already has a &lt;code>weight_decay&lt;/code> parameter that corresponds to &lt;code>2 * lambda&lt;/code>, and it directly performs weight decay during the update as described previously.&lt;/span>
&lt;/div>
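&lt;p>The correspondence between an explicit L2 term and &lt;code>weight_decay&lt;/code> can be checked numerically. This sketch takes one step on two identical copies of a toy &lt;code>nn.Linear&lt;/code> model (a hypothetical stand-in for the real network) with random data: copy (a) adds &lt;code>l2_lambda * sum of squares&lt;/code> to the loss and uses plain SGD, while copy (b) uses no penalty in the loss but sets &lt;code>weight_decay=2 * l2_lambda&lt;/code>. It assumes plain SGD without momentum.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch
from torch import nn

torch.manual_seed(0)
l2_lambda = 0.001  # strength of the explicit L2 term
lr = 0.1

# Two identical copies of a toy model
model_a = nn.Linear(4, 1)
model_b = nn.Linear(4, 1)
model_b.load_state_dict(model_a.state_dict())

x, y = torch.randn(8, 4), torch.randn(8, 1)
loss_fn = nn.MSELoss()

# (a) explicit L2 term in the loss, plain SGD
opt_a = torch.optim.SGD(model_a.parameters(), lr=lr)
loss_a = loss_fn(model_a(x), y) + l2_lambda * sum(p.pow(2).sum() for p in model_a.parameters())
opt_a.zero_grad()
loss_a.backward()
opt_a.step()

# (b) no penalty in the loss, SGD's built-in weight_decay set to 2 * lambda
opt_b = torch.optim.SGD(model_b.parameters(), lr=lr, weight_decay=2 * l2_lambda)
loss_b = loss_fn(model_b(x), y)
opt_b.zero_grad()
loss_b.backward()
opt_b.step()

same = all(torch.allclose(pa, pb, atol=1e-6)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)  # True
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>This equivalence holds for plain SGD; with other optimizers, such as Adam, adding a penalty to the loss and setting &lt;code>weight_decay&lt;/code> are not interchangeable in general.&lt;/p>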
&lt;h4 id="dropout">Dropout&lt;/h4>
&lt;p>💡Idea of dropout: &lt;strong>zero out a random fraction of outputs from neurons across the network, where the randomization happens at each training iteration&lt;/strong>.&lt;/p>
&lt;p>This procedure effectively generates slightly different models with different neuron topologies at each iteration, giving neurons in the model less chance to coordinate in the memorization process that happens during overfitting.&lt;/p>
&lt;p>In PyTorch, we can implement dropout in a model:&lt;/p>
&lt;ul>
&lt;li>by adding an &lt;code>nn.Dropout&lt;/code> module between the nonlinear activation function and the linear or convolutional module of the subsequent layer. (As an argument, we need to specify the probability with which inputs will be zeroed out.)&lt;/li>
&lt;li>In the case of convolutions, we’ll use the specialized &lt;code>nn.Dropout2d&lt;/code> or &lt;code>nn.Dropout3d&lt;/code>, which zero out entire channels of the input.&lt;/li>
&lt;/ul>
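&lt;p>A quick way to see the channel-wise behavior of &lt;code>nn.Dropout2d&lt;/code> is to feed it a tensor of ones. This sketch assumes an input of shape (batch, channels, height, width) and an illustrative &lt;code>p=0.5&lt;/code>; in training mode, every channel comes out either entirely zero or entirely scaled by &lt;code>1 / (1 - p)&lt;/code>, so the expected activation is unchanged.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch
from torch import nn

torch.manual_seed(0)
drop2d = nn.Dropout2d(p=0.5)   # a freshly created module is in training mode
x = torch.ones(1, 8, 4, 4)     # one sample, 8 channels, 4x4 feature maps
out = drop2d(x)

# Each channel is either all zeros or all 1 / (1 - p) = 2.0
for c in range(out.shape[1]):
    print(c, out[0, c].unique().tolist())
&lt;/code>&lt;/pre>&lt;/div>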
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">NetDropout&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Module&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans1&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">32&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">super&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="fm">__init__&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">n_chans1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">n_chans1&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Conv2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">kernel_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">padding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv1_dropout&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Dropout2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">p&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mf">0.4&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Conv2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_chans1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans1&lt;/span> &lt;span class="o">//&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">kernel_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">padding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv2_dropout&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Dropout2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">p&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mf">0.4&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">n_chans1&lt;/span> &lt;span class="o">//&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">32&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">32&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">forward&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">F&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max_pool2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tanh&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv1_dropout&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">F&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max_pool2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tanh&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv2&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv2_dropout&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">out&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">view&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">n_chans1&lt;/span> &lt;span class="o">//&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tanh&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc2&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">out&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Note&lt;/p>
&lt;ul>
&lt;li>dropout is normally &lt;strong>active during training&lt;/strong>,&lt;/li>
&lt;li>during the evaluation of a trained model in production, dropout is bypassed or, equivalently, assigned a probability equal to zero.
&lt;ul>
&lt;li>This is controlled through the &lt;code>training&lt;/code> attribute of the Dropout module. Recall that PyTorch lets us switch between the two modes by calling &lt;code>model.train()&lt;/code> or &lt;code>model.eval()&lt;/code> on any &lt;code>nn.Module&lt;/code> subclass; the call propagates to all submodules.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
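&lt;p>The train/eval switch can be observed on a tiny model. This sketch uses a hypothetical &lt;code>nn.Sequential&lt;/code> of a linear layer, a tanh, and an &lt;code>nn.Dropout&lt;/code> with an illustrative &lt;code>p=0.4&lt;/code>: after &lt;code>model.eval()&lt;/code>, the dropout submodule reports &lt;code>training == False&lt;/code> and passes its input through unchanged, so repeated forward passes give identical outputs.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.Tanh(), nn.Dropout(p=0.4))
x = torch.randn(3, 4)

model.train()                  # sets .training = True on every submodule
print(model[2].training)       # True
y1, y2 = model(x), model(x)    # each pass draws a new random dropout mask

model.eval()                   # sets .training = False on every submodule
print(model[2].training)       # False
z1, z2 = model(x), model(x)    # dropout is bypassed, so outputs are deterministic
print(torch.equal(z1, z2))     # True
&lt;/code>&lt;/pre>&lt;/div>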
&lt;h4 id="batch-normalization">Batch normalization&lt;/h4>
&lt;p>Batch normalization has multiple beneficial effects on training:&lt;/p>
&lt;ul>
&lt;li>it allows us to increase the learning rate;&lt;/li>
&lt;li>it makes training less dependent on initialization and acts as a regularizer, thus representing an alternative to dropout.&lt;/li>
&lt;/ul>
&lt;p>💡 Main idea behind batch normalization: &lt;strong>rescale the inputs to the activations of the network so that minibatches have a certain desirable distribution&lt;/strong>.&lt;/p>
&lt;p>In practical terms:&lt;/p>
&lt;ul>
&lt;li>batch normalization shifts and scales an intermediate input using the mean and standard deviation collected at that intermediate location over the samples of the minibatch.&lt;/li>
&lt;li>The regularization effect is a result of the fact that an individual sample and its downstream activations are always seen by the model as shifted and scaled, depending on the statistics across the randomly extracted minibatch.&lt;/li>
&lt;li>The authors of the original batch normalization paper suggest that using batch normalization eliminates or at least alleviates the need for dropout.&lt;/li>
&lt;/ul>
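&lt;p>The shift-and-scale step can be reproduced by hand. This sketch compares &lt;code>nn.BatchNorm2d&lt;/code> in training mode (with its default initial affine parameters, weight 1 and bias 0) against a manual per-channel normalization that uses the mean and biased variance taken over the batch, height, and width dimensions. The input tensor is random illustrative data deliberately given a non-zero mean and large spread.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(16, 3, 8, 8) * 5 + 2   # minibatch: 16 samples, 3 channels, 8x8

bn = nn.BatchNorm2d(num_features=3)     # training mode by default
out = bn(x)

# Manual version: per-channel statistics collected over the minibatch
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = ((x - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)   # biased variance
manual = (x - mean) / torch.sqrt(var + bn.eps)

print(torch.allclose(out, manual, atol=1e-4))   # True
print(out.mean(dim=(0, 2, 3)))                  # close to 0 for every channel
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>In evaluation mode the module instead uses the running mean and variance it accumulated during training, so a single sample's output no longer depends on the rest of its minibatch.&lt;/p>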
&lt;p>In PyTorch:&lt;/p>
&lt;ul>
&lt;li>Batch normalization is provided through the &lt;code>nn.BatchNorm1d&lt;/code>, &lt;code>nn.BatchNorm2d&lt;/code>, and &lt;code>nn.BatchNorm3d&lt;/code> modules, depending on the dimensionality of the input.&lt;/li>
&lt;li>The natural location is between the linear transformation (convolution, in this case) and the activation.&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">NetBatchNorm&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Module&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans1&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">32&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">super&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="fm">__init__&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">n_chans1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">n_chans1&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Conv2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">kernel_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">padding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv1_batchnorm&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">BatchNorm2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">num_features&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">n_chans1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Conv2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_chans1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans1&lt;/span> &lt;span class="o">//&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">kernel_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">padding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv2_batchnorm&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">BatchNorm2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">num_features&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">n_chans1&lt;/span> &lt;span class="o">//&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">n_chans1&lt;/span> &lt;span class="o">//&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">32&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">32&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">forward&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv1_batchnorm&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">F&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max_pool2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tanh&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv_batchnorm&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv2&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">F&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max_pool2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tanh&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">out&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">view&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">n_chans1&lt;/span> &lt;span class="o">//&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tanh&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc2&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">out&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Note:&lt;/p>
&lt;p>Just as for dropout, batch normalization needs to behave &lt;strong>differently&lt;/strong> during training and inference.&lt;/p>
&lt;ul>
&lt;li>As minibatches are processed, PyTorch normalizes each one with its own mean and variance and, in addition, updates running estimates of the mean and variance that approximate the statistics of the whole dataset.&lt;/li>
&lt;li>This way, when the user calls &lt;code>model.eval()&lt;/code> on a model that contains a batch normalization module, the running estimates are frozen and used for normalization instead. To unfreeze the running estimates and return to using minibatch statistics, we call &lt;code>model.train()&lt;/code>, just as we did for dropout.&lt;/li>
&lt;/ul>
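&lt;p>A small experiment makes the train/eval difference visible. It is a sketch on random data (the tensor sizes and the seed are arbitrary choices); it relies only on the &lt;code>running_mean&lt;/code> buffer of &lt;code>nn.BatchNorm2d&lt;/code>, which PyTorch updates in training mode using a default momentum of 0.1.&lt;/p>

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# running_mean starts at 0 and running_var at 1; every training-mode call
# blends the current minibatch statistics into them (default momentum=0.1).
bn = nn.BatchNorm2d(num_features=4)
x = torch.randn(8, 4, 5, 5) * 3 + 2  # synthetic minibatch, mean ~2, std ~3

bn.train()
y_train = bn(x)  # normalized with this minibatch's own mean and variance
mean_after_train = bn.running_mean.clone()  # has moved from 0 toward ~2

bn.eval()
y_eval = bn(x)  # normalized with the frozen running estimates instead
mean_after_eval = bn.running_mean.clone()

print(torch.equal(mean_after_train, mean_after_eval))  # True: eval mode does not update them
print(y_train.mean().item(), y_eval.mean().item())
```

&lt;p>In training mode the output mean is essentially zero because the minibatch statistics are used; in eval mode the running estimates (updated only once here) are used, so the same input produces a different output.&lt;/p>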
&lt;h3 id="depth-going-deeper-to-learn-more-complex-structures">Depth: going deeper to learn more complex structures&lt;/h3>
&lt;p>The second fundamental dimension along which to make a model larger and more capable is &lt;strong>depth&lt;/strong>.&lt;/p>
&lt;ul>
&lt;li>With depth, the complexity of the function the network is able to approximate generally increases.&lt;/li>
&lt;li>Depth allows a model to deal with hierarchical information, which is needed when we must understand the context in order to say something about an input.&lt;/li>
&lt;/ul>
&lt;p>Another way to think about depth: &lt;strong>increasing depth is related to increasing the length of the sequence of operations that the network will be able to perform when processing input.&lt;/strong>&lt;/p>
&lt;h4 id="skip-connections">Skip connections&lt;/h4>
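&lt;p>Adding depth to a model generally makes training harder to converge. During backpropagation, the derivative of the loss with respect to a parameter in an early layer is a product of many factors, one for each operation between that parameter and the output. The bottom line is that such a long chain of multiplications tends to make the contribution of the parameter to the &lt;em>&lt;strong>gradient vanish&lt;/strong>&lt;/em>, leading to ineffective training of that layer, since that parameter and others like it won’t be properly updated.&lt;/p>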
&lt;p>Residual networks use a simple trick to allow very deep networks to be trained successfully: a &lt;strong>skip connection&lt;/strong> that short-circuits a block of layers.&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-11-01%2015.04.00.png" alt="截屏2020-11-01 15.04.00" style="zoom:90%;" />
&lt;p>&amp;#x261d;&amp;#xfe0f; &lt;strong>A skip connection is nothing but the addition of the input to the output of a block of layers.&lt;/strong>&lt;/p>
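&lt;p>Since a skip connection is only an addition, it can be sketched as a generic wrapper around any block whose output shape matches its input shape. &lt;code>Residual&lt;/code> below is a hypothetical helper written for illustration, not a PyTorch class. The example also shows what the shortcut buys: when the wrapped block outputs zeros, the whole residual block reduces to the identity, so the layers only have to learn a correction on top of their input.&lt;/p>

```python
import torch
import torch.nn as nn

class Residual(nn.Module):
    """Hypothetical helper (not part of PyTorch): adds a skip connection
    around any block whose output shape matches its input shape."""
    def __init__(self, block):
        super().__init__()
        self.block = block

    def forward(self, x):
        return self.block(x) + x  # the skip connection: input added to the block's output

conv = nn.Conv2d(4, 4, kernel_size=3, padding=1)  # same channels + padding=1 keeps the shape
res = Residual(conv)
x = torch.randn(1, 4, 8, 8)

# Zero the wrapped block: it now outputs zeros, so the residual block is the identity.
with torch.no_grad():
    conv.weight.zero_()
    conv.bias.zero_()
print(torch.equal(res(x), x))  # True
```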
&lt;p>Let’s add one layer to our simple convolutional model, and let’s use ReLU as the activation for a change. The vanilla module with an extra layer looks like this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">NetDepth&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Module&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans1&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">32&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">super&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="fm">__init__&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">n_chans1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">n_chans1&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Conv2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">kernel_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">padding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Conv2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_chans1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans1&lt;/span> &lt;span class="o">//&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">kernel_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">padding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv3&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Conv2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_chans1&lt;/span> &lt;span class="o">//&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans1&lt;/span> &lt;span class="o">//&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">kernel_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">padding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">4&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">4&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">n_chans1&lt;/span> &lt;span class="o">//&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">32&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">32&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">forward&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">F&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max_pool2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">relu&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">F&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max_pool2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">relu&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv2&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">F&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max_pool2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">relu&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv3&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">out&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">view&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">4&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">4&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">n_chans1&lt;/span> &lt;span class="o">//&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">relu&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc2&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">out&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Adding a skip connection à la ResNet to this model amounts to adding the input of the third convolutional layer (the output of the second block, stored as &lt;code>out1&lt;/code>) to that layer’s activated output, before pooling:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">NetRes&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Module&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans1&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">32&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">super&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="fm">__init__&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">n_chans1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">n_chans1&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Conv2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">kernel_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">padding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Conv2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_chans1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans1&lt;/span> &lt;span class="o">//&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">kernel_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">padding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv3&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Conv2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_chans1&lt;/span> &lt;span class="o">//&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans1&lt;/span> &lt;span class="o">//&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">kernel_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">padding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">4&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">4&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">n_chans1&lt;/span> &lt;span class="o">//&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">32&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">32&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">forward&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">F&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max_pool2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">relu&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">F&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max_pool2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">relu&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv2&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">out&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Adding a skip connection is &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># adding the output of the first layer in the forward function &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># to the input of the third layer&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">F&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max_pool2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">relu&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv3&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">out1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">out&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">view&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">4&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">4&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">n_chans1&lt;/span> &lt;span class="o">//&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">relu&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc2&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">out&lt;/span>
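&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>In other words, we’re feeding the output of the earlier activations (&lt;code>out1&lt;/code>) straight into the last stage of the network, in addition to passing it through the standard feed-forward path (&lt;code>conv3&lt;/code>). Because &lt;code>out1&lt;/code> is added unchanged, this shortcut is also referred to as &lt;strong>identity mapping&lt;/strong>.&lt;/p>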
&lt;p>In general, a skip connection simply adds an earlier intermediate output, arithmetically, to a downstream intermediate output.&lt;/p>
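&lt;p>Because the addition is elementwise, the two tensors being added must have the same shape. The sketch below replays the middle of &lt;code>NetRes.forward&lt;/code> with standalone layers, assuming &lt;code>n_chans1=32&lt;/code> and 32x32 input images (which is what the &lt;code>4 * 4&lt;/code> in &lt;code>fc1&lt;/code> implies after three 2x poolings); the random tensor &lt;code>h&lt;/code> is a stand-in for the output of the first block.&lt;/p>

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n_chans1 = 32
conv2 = nn.Conv2d(n_chans1, n_chans1 // 2, kernel_size=3, padding=1)
conv3 = nn.Conv2d(n_chans1 // 2, n_chans1 // 2, kernel_size=3, padding=1)

# Stand-in for the first block's output on a 32x32 image:
# conv1 produces 32 channels, and one 2x pooling halves 32x32 to 16x16.
h = torch.randn(1, n_chans1, 16, 16)

out1 = F.max_pool2d(torch.relu(conv2(h)), 2)  # shape (1, 16, 8, 8)
out = torch.relu(conv3(out1)) + out1          # conv3 keeps the shape, so the sum works
print(out.shape)                              # torch.Size([1, 16, 8, 8])

# A skip around conv2 would not work: conv2 maps 32 -> 16 channels,
# so its output cannot be added to its own input.
try:
    torch.relu(conv2(h)) + h
    mismatch = False
except RuntimeError:
    mismatch = True
print(mismatch)  # True
```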
&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-primary-100 dark:bg-primary-900">
&lt;span class="pr-3 pt-1 text-primary-600 dark:text-primary-300">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="m11.25 11.25l.041-.02a.75.75 0 0 1 1.063.852l-.708 2.836a.75.75 0 0 0 1.063.853l.041-.021M21 12a9 9 0 1 1-18 0a9 9 0 0 1 18 0m-9-3.75h.008v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">&lt;p>How does this alleviate the issues with vanishing gradients?&lt;/p>
&lt;p>Thinking about backpropagation, a skip connection, or a sequence of skip connections in a deep network, creates a direct path from the parameters of the earlier layers (those far from the output) to the loss. This makes their contribution to the gradient of the loss more direct, as partial derivatives of the loss with respect to those parameters have a chance not to be multiplied by a long chain of other operations.&lt;/p>
&lt;/span>
&lt;/div>
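&lt;p>The effect can be checked numerically on a toy model. The sketch below assumes a stack of &lt;code>nn.Linear&lt;/code> layers with sigmoid activations (chosen because the sigmoid’s derivative is at most 0.25, which makes vanishing gradients easy to see), not the convolutional networks of this chapter, and compares the gradient norm reaching the first layer with and without a skip connection around each layer.&lt;/p>

```python
import torch
import torch.nn as nn

def first_layer_grad_norm(depth, residual, width=16, seed=0):
    """Toy experiment: norm of the gradient reaching the first layer's weight
    in a stack of `depth` Linear+sigmoid layers, optionally with skip connections."""
    torch.manual_seed(seed)
    layers = nn.ModuleList([nn.Linear(width, width) for _ in range(depth)])
    out = torch.randn(32, width)
    for layer in layers:
        h = torch.sigmoid(layer(out))
        out = h + out if residual else h  # residual: add the layer's input back
    out.sum().backward()
    return layers[0].weight.grad.norm().item()

for depth in (2, 10, 30):
    plain = first_layer_grad_norm(depth, residual=False)
    res = first_layer_grad_norm(depth, residual=True)
    print(f"depth={depth:2d}  plain={plain:.2e}  residual={res:.2e}")
```

&lt;p>In the plain stack the first-layer gradient shrinks by orders of magnitude as depth grows, while with skip connections it remains many orders of magnitude larger at the same depth, because the identity path contributes a term that is not multiplied through every layer.&lt;/p>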
&lt;p>It has been observed that skip connections have a beneficial effect on convergence, especially in the initial phases of training. Also, the loss landscape of deep residual networks is a lot smoother than that of feed-forward networks of the same depth and width.&amp;#x1f44f;&lt;/p>
&lt;h4 id="building-very-deep-models-in-pytorch">Building very deep models in Pytorch&lt;/h4>
&lt;p>The standard strategy is:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>define a building block, such as a &lt;code>(Conv2d, ReLU, Conv2d) + skip connection&lt;/code> block&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>build the network dynamically in a &lt;code>for&lt;/code> loop&lt;/strong>.&lt;/p>
&lt;/li>
&lt;/ol>
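&lt;p>The two steps can be sketched as follows. &lt;code>ConvReluConvBlock&lt;/code> and &lt;code>make_deep_stack&lt;/code> are hypothetical names chosen for this illustration. Note that a new block is instantiated on every loop iteration: repeating a single instance, for example with &lt;code>n_blocks * [block]&lt;/code>, would make every position in the stack share the same weights.&lt;/p>

```python
import torch
import torch.nn as nn

class ConvReluConvBlock(nn.Module):
    """Hypothetical building block for step 1: (Conv2d, ReLU, Conv2d) + skip connection."""
    def __init__(self, n_chans):
        super().__init__()
        self.conv_a = nn.Conv2d(n_chans, n_chans, kernel_size=3, padding=1)
        self.conv_b = nn.Conv2d(n_chans, n_chans, kernel_size=3, padding=1)

    def forward(self, x):
        out = self.conv_b(torch.relu(self.conv_a(x)))
        return out + x  # skip connection

def make_deep_stack(n_blocks, n_chans):
    """Step 2: build the network in a for loop, one fresh block per iteration,
    so that every block has its own parameters."""
    blocks = []
    for _ in range(n_blocks):
        blocks.append(ConvReluConvBlock(n_chans))
    return nn.Sequential(*blocks)

stack = make_deep_stack(n_blocks=100, n_chans=8)
x = torch.randn(2, 8, 16, 16)
print(len(stack), stack(x).shape)  # 100 torch.Size([2, 8, 16, 16])
```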
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-11-01%2015.45.31.png" alt="截屏2020-11-01 15.45.31" style="zoom:90%;" />
&lt;p>We first create a module subclass whose sole job is to provide the computation for one &lt;strong>block&lt;/strong>, that is, one group of convolution, batch normalization, activation, and skip connection:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">ResBlock&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Module&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">super&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ResBlock&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="fm">__init__&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Conv2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_chans&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">kernel_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">padding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">bias&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">False&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">batch_norm&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">BatchNorm2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">num_features&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">n_chans&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Use custom initializations as in the ResNet paer&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">init&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">kaiming_normal_&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">weight&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">nonlinearity&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;relu&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The batch norm is initialized to produce output distributions &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># that initially have 0 mean and 0.5 variance&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">init&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">constant_&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">batch_norm&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">weight&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.5&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">init&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">zeros_&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">batch_norm&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">bias&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">forward&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">batch_norm&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">relu&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">out&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>We’d now like to generate a 100-block network.&lt;/p>
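&lt;p>Because that network repeats the block many times, it is worth first checking what the batch-norm initialization in &lt;code>ResBlock&lt;/code> does on its own. This is a minimal check under stated assumptions: it uses a standalone &lt;code>nn.BatchNorm2d&lt;/code> with 4 channels and a random input of shape &lt;code>(16, 4, 8, 8)&lt;/code>, both chosen arbitrarily for illustration. In training mode the layer normalizes each channel with the batch statistics and then applies &lt;code>weight&lt;/code> and &lt;code>bias&lt;/code>, so setting them to 0.5 and 0 should give per-channel outputs with mean close to 0 and standard deviation close to 0.5 (variance 0.25).&lt;/p>

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same initialization as in ResBlock, on an arbitrary 4-channel layer
bn = nn.BatchNorm2d(num_features=4)
torch.nn.init.constant_(bn.weight, 0.5)
torch.nn.init.zeros_(bn.bias)

# Illustrative input whose channels are far from zero mean and unit variance
x = 3.0 * torch.randn(16, 4, 8, 8) + 5.0

bn.train()  # training mode: normalize with the statistics of the current batch
with torch.no_grad():
    y = bn(x)

# Per-channel statistics over the batch and spatial dimensions
channel_mean = y.mean(dim=(0, 2, 3))
channel_std = y.std(dim=(0, 2, 3))
print(channel_mean)  # every entry close to 0
print(channel_std)   # every entry close to 0.5
```

&lt;p>In evaluation mode (&lt;code>bn.eval()&lt;/code>) the layer normalizes with its running statistics instead, so the output statistics can differ from these.&lt;/p>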
&lt;ul>
&lt;li>First, in &lt;code>__init__&lt;/code>, we create an &lt;code>nn.Sequential&lt;/code> from a list of &lt;code>ResBlock&lt;/code> instances. &lt;code>nn.Sequential&lt;/code> ensures that the output of one block is used as the input to the next. It also registers every block as a submodule, so all the parameters in the blocks are visible to &lt;code>NetResDeep&lt;/code>.&lt;/li>
&lt;li>Then, in &lt;code>forward&lt;/code>, we simply call the &lt;code>nn.Sequential&lt;/code>, which runs the input through all the blocks in order and returns the output.&lt;/li>
&lt;/ul>
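&lt;p>The two points above (chaining blocks and exposing their parameters) can be illustrated with a minimal sketch. To keep it small, it uses &lt;code>nn.Linear(8, 8)&lt;/code> layers as stand-ins for &lt;code>ResBlock&lt;/code>; &lt;code>TinyNet&lt;/code> and &lt;code>count_params&lt;/code> are hypothetical names introduced only for illustration. It also shows a pitfall worth checking in this kind of code: building the list as &lt;code>n_blocks * [Block()]&lt;/code> repeats a single module object, so every position shares the same weights, whereas a list comprehension creates independent blocks.&lt;/p>

```python
import torch
import torch.nn as nn


def count_params(module):
    # parameters() yields each distinct Parameter only once, even if a
    # module is registered under several names
    return sum(p.numel() for p in module.parameters())


class TinyNet(nn.Module):
    # Hypothetical container: stores its blocks in an nn.Sequential attribute
    def __init__(self, n_blocks):
        super().__init__()
        self.blocks = nn.Sequential(*[nn.Linear(8, 8) for _ in range(n_blocks)])

    def forward(self, x):
        # Sequential feeds the output of each block into the next one
        return self.blocks(x)


n_blocks = 4
net = TinyNet(n_blocks)
out = net(torch.randn(2, 8))
print(out.shape)          # torch.Size([2, 8])
print(count_params(net))  # 288 = 4 blocks * (8 * 8 weights + 8 biases)

# Pitfall: list multiplication repeats ONE module object n_blocks times
shared = nn.Sequential(*(n_blocks * [nn.Linear(8, 8)]))
print(shared[0] is shared[1])  # True: all positions use the same weights
print(count_params(shared))    # 72: parameters of a single Linear(8, 8)
```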
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">NetResDeep&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Module&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans1&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">32&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_blocks&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">super&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="fm">__init__&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">n_chans1&lt;/span> &lt;span class="o">=&lt;/span>&lt;span class="n">n_chans1&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Conv2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">n_chans1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">kernel_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">padding&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># create a list of ResBlocks&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">resblocks&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Sequential&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">*&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">n_blocks&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">ResBlock&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_chans&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">n_chans&lt;/span>&lt;span class="p">)]))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">n_chans1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">32&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Linear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">32&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">forward&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">F&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max_pool2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">relu&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">conv1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># traverse the list of blocks&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">resblocks&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">F&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">max_pool2d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">out&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">veiw&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">8&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">n_chans1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">torch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">relu&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fc2&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">out&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">out&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item></channel></rss>