PyTorch provides the torch.nn module to help us create and train neural networks. We will first train a basic neural network on the MNIST dataset without using any features from this module, relying only on basic PyTorch tensor functionality. We will then add one feature from torch.nn at a time.

torch.nn provides many more classes and modules for implementing and training neural networks.

The nn package contains the following modules and classes:

S.No. Class and module: Description
1. torch.nn.Parameter: A type of tensor that is to be considered a module parameter.
2. Containers
1) torch.nn.Module: The base class for all neural network modules.
2) torch.nn.Sequential: A sequential container to which modules are added in the same order as they are passed to the constructor.
3) torch.nn.ModuleList: Holds sub-modules in a list.
4) torch.nn.ModuleDict: Holds sub-modules in a dictionary.
5) torch.nn.ParameterList: Holds parameters in a list.
6) torch.nn.ParameterDict: Holds parameters in a dictionary.
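The idea behind a sequential container can be illustrated without PyTorch. The sketch below is a toy stand-in, not torch.nn.Sequential itself; the names TinySequential, double and add_one are made up for this illustration. It only shows that layers run in the order in which they were passed to the constructor.

```python
class TinySequential:
    """Toy stand-in for a sequential container (not torch.nn.Sequential)."""

    def __init__(self, *layers):
        # Keep the layers in exactly the order they were passed in.
        self.layers = list(layers)

    def __call__(self, x):
        # Feed the output of each layer into the next one.
        for layer in self.layers:
            x = layer(x)
        return x


def double(x):
    return 2 * x


def add_one(x):
    return x + 1


model = TinySequential(double, add_one)
print(model(3))  # double first, then add one: 2 * 3 + 1 = 7
```

Swapping the two layers changes the result, which is why the constructor order matters.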
3. Convolution layers
1) torch.nn.Conv1d: Applies a 1D convolution over an input signal composed of several input planes.
2) torch.nn.Conv2d: Applies a 2D convolution over an input signal composed of several input planes.
3) torch.nn.Conv3d: Applies a 3D convolution over an input signal composed of several input planes.
4) torch.nn.ConvTranspose1d: Applies a 1D transposed convolution operator over an input image composed of several input planes.
5) torch.nn.ConvTranspose2d: Applies a 2D transposed convolution operator over an input image composed of several input planes.
6) torch.nn.ConvTranspose3d: Applies a 3D transposed convolution operator over an input image composed of several input planes.
7) torch.nn.Unfold: Extracts sliding local blocks from a batched input tensor.
8) torch.nn.Fold: Combines an array of sliding local blocks into a large containing tensor.
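To make the 1D convolution row concrete, here is a minimal plain-Python sketch for a single input plane with no padding. The helper name conv1d is an assumption of this example and is not the torch.nn API. As is usual in deep learning, the kernel is not flipped, so this is strictly a cross-correlation.

```python
def conv1d(signal, kernel, stride=1):
    """Slide `kernel` over `signal` and take the dot product at each position."""
    k = len(kernel)
    out = []
    # The last valid window starts at len(signal) - k.
    for start in range(0, len(signal) - k + 1, stride):
        window = signal[start:start + k]
        out.append(sum(w * x for w, x in zip(kernel, window)))
    return out


signal = [1, 2, 3, 4, 5]
kernel = [1, 0, -1]
result = conv1d(signal, kernel)
print(result)  # [-2, -2, -2]
```

With an input of length 5 and a kernel of length 3, the output has (5 - 3) / 1 + 1 = 3 elements.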
4. Pooling layers
1) torch.nn.MaxPool1d: Applies 1D max pooling over an input signal composed of several input planes.
2) torch.nn.MaxPool2d: Applies 2D max pooling over an input signal composed of several input planes.
3) torch.nn.MaxPool3d: Applies 3D max pooling over an input signal composed of several input planes.
4) torch.nn.MaxUnpool1d: Computes a partial inverse of MaxPool1d.
5) torch.nn.MaxUnpool2d: Computes a partial inverse of MaxPool2d.
6) torch.nn.MaxUnpool3d: Computes a partial inverse of MaxPool3d.
7) torch.nn.AvgPool1d: Applies 1D average pooling over an input signal composed of several input planes.
8) torch.nn.AvgPool2d: Applies 2D average pooling over an input signal composed of several input planes.
9) torch.nn.AvgPool3d: Applies 3D average pooling over an input signal composed of several input planes.
10) torch.nn.FractionalMaxPool2d: Applies 2D fractional max pooling over an input signal composed of several input planes.
11) torch.nn.LPPool1d: Applies 1D power-average pooling over an input signal composed of several input planes.
12) torch.nn.LPPool2d: Applies 2D power-average pooling over an input signal composed of several input planes.
13) torch.nn.AdaptiveMaxPool1d: Applies 1D adaptive max pooling over an input signal composed of several input planes.
14) torch.nn.AdaptiveMaxPool2d: Applies 2D adaptive max pooling over an input signal composed of several input planes.
15) torch.nn.AdaptiveMaxPool3d: Applies 3D adaptive max pooling over an input signal composed of several input planes.
16) torch.nn.AdaptiveAvgPool1d: Applies 1D adaptive average pooling over an input signal composed of several input planes.
17) torch.nn.AdaptiveAvgPool2d: Applies 2D adaptive average pooling over an input signal composed of several input planes.
18) torch.nn.AdaptiveAvgPool3d: Applies 3D adaptive average pooling over an input signal composed of several input planes.
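The difference between max pooling and average pooling is easiest to see on a short list. The following is a plain-Python sketch for one input plane; pool1d and mean are hypothetical helper names, and the stride defaulting to the kernel size is a choice made here for simplicity.

```python
def mean(window):
    return sum(window) / len(window)


def pool1d(values, kernel_size, stride=None, reducer=max):
    """Reduce each window of `kernel_size` elements to a single number."""
    if stride is None:
        stride = kernel_size  # non-overlapping windows by default
    return [
        reducer(values[i:i + kernel_size])
        for i in range(0, len(values) - kernel_size + 1, stride)
    ]


values = [1, 3, 2, 8, 5, 4]
max_pooled = pool1d(values, 2)
avg_pooled = pool1d(values, 2, reducer=mean)
print(max_pooled)  # [3, 8, 5]
print(avg_pooled)  # [2.0, 5.0, 4.5]
```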
5. Padding layers
1) torch.nn.ReflectionPad1d: Pads the input tensor using the reflection of the input boundary.
2) torch.nn.ReflectionPad2d: Pads the input tensor using the reflection of the input boundary.
3) torch.nn.ReplicationPad1d: Pads the input tensor using replication of the input boundary.
4) torch.nn.ReplicationPad2d: Pads the input tensor using replication of the input boundary.
5) torch.nn.ReplicationPad3d: Pads the input tensor using replication of the input boundary.
6) torch.nn.ZeroPad2d: Pads the input tensor boundaries with zeros.
7) torch.nn.ConstantPad1d: Pads the input tensor boundaries with a constant value.
8) torch.nn.ConstantPad2d: Pads the input tensor boundaries with a constant value.
9) torch.nn.ConstantPad3d: Pads the input tensor boundaries with a constant value.
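The three padding strategies differ only in what they write into the new border cells. Below is a minimal 1D sketch in plain Python; the function names are illustrative and the same pad width is applied to both sides, which is a simplification of what the torch.nn padding layers accept.

```python
def constant_pad(values, pad, value=0):
    """Fill the border with a fixed value (zero padding when value=0)."""
    return [value] * pad + list(values) + [value] * pad


def replication_pad(values, pad):
    """Repeat the edge elements."""
    return [values[0]] * pad + list(values) + [values[-1]] * pad


def reflection_pad(values, pad):
    """Mirror the input around its edges, without repeating the edge itself.

    Assumes pad < len(values).
    """
    left = values[pad:0:-1]
    right = values[-2:-pad - 2:-1]
    return list(left) + list(values) + list(right)


data = [1, 2, 3, 4]
print(constant_pad(data, 2))     # [0, 0, 1, 2, 3, 4, 0, 0]
print(replication_pad(data, 2))  # [1, 1, 1, 2, 3, 4, 4, 4]
print(reflection_pad(data, 2))   # [3, 2, 1, 2, 3, 4, 3, 2]
```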
6. Non-linear activations (weighted sum, non-linearity)
1) torch.nn.ELU: Applies the element-wise function:
ELU(x) = max(0, x) + min(0, α*(exp(x) - 1))
2) torch.nn.Hardshrink: Applies the hard shrinkage function element-wise:
Hardshrink(x) = x if x > λ or x < -λ, otherwise 0
3) torch.nn.LeakyReLU: Applies the element-wise function:
LeakyReLU(x) = max(0, x) + negative_slope*min(0, x)
4) torch.nn.LogSigmoid: Applies the element-wise function:
LogSigmoid(x) = log(1 / (1 + exp(-x)))
5) torch.nn.MultiheadAttention: Allows the model to attend to information from different representation subspaces.
6) torch.nn.PReLU: Applies the element-wise function:
PReLU(x) = max(0, x) + a*min(0, x), where a is a learnable parameter
7) torch.nn.ReLU: Applies the rectified linear unit function element-wise:
ReLU(x) = max(0, x)
8) torch.nn.ReLU6: Applies the element-wise function:
ReLU6(x) = min(max(0, x), 6)
9) torch.nn.RReLU: Applies the randomized leaky rectified linear unit function element-wise.
10) torch.nn.SELU: Applies the element-wise function:
SELU(x) = scale*(max(0, x) + min(0, α*(exp(x) - 1)))
Here α = 1.6732632423543772848170429916717 and scale = 1.0507009873554804934193349852946.
11) torch.nn.CELU: Applies the element-wise function:
CELU(x) = max(0, x) + min(0, α*(exp(x/α) - 1))
12) torch.nn.Sigmoid: Applies the element-wise function:
Sigmoid(x) = 1 / (1 + exp(-x))
13) torch.nn.Softplus: Applies the element-wise function:
Softplus(x) = (1/β)*log(1 + exp(β*x))
14) torch.nn.Softshrink: Applies the soft shrinkage function element-wise:
Softshrink(x) = x - λ if x > λ, x + λ if x < -λ, otherwise 0
15) torch.nn.Softsign: Applies the element-wise function:
Softsign(x) = x / (1 + |x|)
16) torch.nn.Tanh: Applies the element-wise function:
Tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
17) torch.nn.Tanhshrink: Applies the element-wise function:
Tanhshrink(x) = x - tanh(x)
18) torch.nn.Threshold: Thresholds each element of the input Tensor. Threshold is defined as:
y = x if x > threshold, otherwise value
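The LeakyReLU and SELU formulas from the table can be checked by hand with a few lines of plain Python. This is a scalar sketch of the formulas only, not the torch.nn implementation; the function and constant names are chosen for this example, and the default negative_slope of 0.01 is assumed here.

```python
import math

# Constants from the SELU row, rounded to double precision.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def leaky_relu(x, negative_slope=0.01):
    # Positive inputs pass through; negative inputs are scaled down.
    return max(0.0, x) + negative_slope * min(0.0, x)


def selu(x):
    # Scaled identity for x > 0, scaled exponential curve for x <= 0.
    return SELU_SCALE * (max(0.0, x) + min(0.0, SELU_ALPHA * (math.exp(x) - 1.0)))


print(leaky_relu(3.0))   # 3.0
print(leaky_relu(-2.0))  # -0.02
print(selu(1.0))         # 1.0507009873554805
```

For very negative inputs, selu approaches -scale*α (about -1.758), which is the lower bound of the function.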
7. Non-linear activations (other)
1) torch.nn.Softmin: Applies the softmin function to an n-dimensional input Tensor, rescaling it so that the elements of the n-dimensional output Tensor lie in the range [0, 1] and sum to 1. Softmin is defined as:
Softmin(x_i) = exp(-x_i) / Σ_j exp(-x_j)
2) torch.nn.Softmax: Applies the softmax function to an n-dimensional input Tensor, rescaling it so that the elements of the n-dimensional output Tensor lie in the range [0, 1] and sum to 1. Softmax is defined as:
Softmax(x_i) = exp(x_i) / Σ_j exp(x_j)
3) torch.nn.Softmax2d: Applies softmax over features to each spatial location.
4) torch.nn.LogSoftmax: Applies the LogSoftmax function to an n-dimensional input Tensor. The LogSoftmax function can be defined as:
LogSoftmax(x_i) = log(exp(x_i) / Σ_j exp(x_j))
5) torch.nn.AdaptiveLogSoftmaxWithLoss: A strategy for training models with large output spaces. It is most effective when the label distribution is highly imbalanced.
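Softmax, softmin and log-softmax are closely related, as the following plain-Python sketch shows. The function names are illustrative. Subtracting the maximum before exponentiating is a standard trick to avoid overflow; it is a choice made in this sketch and does not change the result.

```python
import math


def softmax(xs):
    m = max(xs)  # shift for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmin(xs):
    # Softmin is softmax applied to the negated inputs.
    return softmax([-x for x in xs])


def log_softmax(xs):
    m = max(xs)
    log_total = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_total for x in xs]


scores = [1.0, 2.0, 3.0]
probs = softmax(scores)
print(probs)            # largest score gets the largest probability
print(softmin(scores))  # smallest score gets the largest probability
```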
8. Normalization layers
1) torch.nn.BatchNorm1d: Applies batch normalization over a 2D or 3D input.
2) torch.nn.BatchNorm2d: Applies batch normalization over a 4D input.
3) torch.nn.BatchNorm3d: Applies batch normalization over a 5D input.
4) torch.nn.GroupNorm: Applies group normalization over a mini-batch of inputs.
5) torch.nn.SyncBatchNorm: Applies batch normalization over an N-dimensional input.
6) torch.nn.InstanceNorm1d: Applies instance normalization over a 3D input.
7) torch.nn.InstanceNorm2d: Applies instance normalization over a 4D input.
8) torch.nn.InstanceNorm3d: Applies instance normalization over a 5D input.
9) torch.nn.LayerNorm: Applies layer normalization over a mini-batch of inputs.
10) torch.nn.LocalResponseNorm: Applies local response normalization over an input signal composed of several input planes, where the channel occupies the second dimension.
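The normalization layers above differ mainly in which elements they compute statistics over. The core step they share can be sketched in plain Python as follows. The function name normalize, and the eps, gamma and beta arguments with these defaults, are assumptions of this sketch; running statistics and learnable parameters are left out.

```python
import math


def normalize(values, eps=1e-5, gamma=1.0, beta=0.0):
    """Shift to zero mean and scale to (roughly) unit variance."""
    mean = sum(values) / len(values)
    # Biased variance: divide by the number of elements.
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


normalized = normalize([1.0, 2.0, 3.0, 4.0])
print(normalized)
```

The small eps keeps the division safe when all values are identical.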
9. Recurrent layers
1) torch.nn.RNN: Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence. Each layer computes the following function for each element in the input sequence:
h_t = tanh(W_ih x_t + b_ih + W_hh h_(t-1) + b_hh)
2) torch.nn.LSTM: Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence. For each element in the input sequence, each layer computes the update listed under torch.nn.LSTMCell.
3) torch.nn.GRU: Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence. For each element in the input sequence, each layer computes the update listed under torch.nn.GRUCell.
4) torch.nn.RNNCell: An Elman RNN cell with tanh or ReLU non-linearity. It computes:
h' = tanh(W_ih x + b_ih + W_hh h + b_hh)
If the non-linearity is set to ReLU, ReLU is used in place of tanh.
5) torch.nn.LSTMCell: A long short-term memory (LSTM) cell. It computes:
i = σ(W_ii x + b_ii + W_hi h + b_hi)
f = σ(W_if x + b_if + W_hf h + b_hf)
g = tanh(W_ig x + b_ig + W_hg h + b_hg)
o = σ(W_io x + b_io + W_ho h + b_ho)
c' = f * c + i * g
h' = o * tanh(c')
where σ is the sigmoid function and * is the Hadamard product.
6) torch.nn.GRUCell: A gated recurrent unit (GRU) cell. It computes:
r = σ(W_ir x + b_ir + W_hr h + b_hr)
z = σ(W_iz x + b_iz + W_hz h + b_hz)
n = tanh(W_in x + b_in + r * (W_hn h + b_hn))
h' = (1 - z) * n + z * h
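The Elman RNN update is small enough to trace by hand. Here is a scalar sketch in plain Python, where the weights are single numbers rather than matrices; rnn_cell and run_rnn are hypothetical names and the weight values are arbitrary.

```python
import math


def rnn_cell(x, h, w_ih, w_hh, b_ih=0.0, b_hh=0.0):
    """One Elman step: h' = tanh(w_ih*x + b_ih + w_hh*h + b_hh)."""
    return math.tanh(w_ih * x + b_ih + w_hh * h + b_hh)


def run_rnn(inputs, h0=0.0, **weights):
    """Apply the cell to each element, carrying the hidden state forward."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_cell(x, h, **weights)
        states.append(h)
    return states


states = run_rnn([1.0, 0.5], w_ih=0.5, w_hh=0.1)
print(states)  # one hidden state per input element
```

The second state depends on the first, which is what distinguishes a recurrent layer from applying the same function to each element independently.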
10. Linear layers
1) torch.nn.Identity: A placeholder identity operator that is argument-insensitive.
2) torch.nn.Linear: Applies a linear transformation to the incoming data:
y = x A^T + b
3) torch.nn.Bilinear: Applies a bilinear transformation to the incoming data:
y = x1^T A x2 + b
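A linear transformation of a single input vector can be written in a few lines of plain Python. In this sketch the weight is stored with one row per output feature, so each output is the dot product of a row with the input plus a bias. The name linear and the example values are illustrative.

```python
def linear(x, weight, bias):
    """Compute one output per row of `weight`: dot(row, x) + bias."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weight, bias)
    ]


x = [1.0, 2.0]                                   # 2 input features
weight = [[1.0, 0.0], [0.5, 0.5], [-1.0, 1.0]]   # 3 output features
bias = [0.0, 1.0, 0.0]
y = linear(x, weight, bias)
print(y)  # [1.0, 2.5, 1.0]
```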
11. Dropout layers
1) torch.nn.Dropout: Used for regularization and for preventing the co-adaptation of neurons. During training, the outputs are scaled by a factor of 1/(1 - p), where p is the probability of zeroing an element. This means the module computes an identity function during evaluation.
2) torch.nn.Dropout2d: If adjacent pixels within feature maps are correlated, torch.nn.Dropout will not regularize the activations and will only decrease the effective learning rate. In this case, torch.nn.Dropout2d is used to promote independence between feature maps.
3) torch.nn.Dropout3d: If adjacent pixels within feature maps are correlated, torch.nn.Dropout will not regularize the activations and will only decrease the effective learning rate. In this case, torch.nn.Dropout3d is used to promote independence between feature maps.
4) torch.nn.AlphaDropout: Applies Alpha Dropout over the input. Alpha Dropout is a type of dropout that maintains the self-normalizing property.
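The training and evaluation behaviour of dropout can be sketched in plain Python. The function name, the training flag and the rng argument are assumptions of this example. Rejecting p = 1 is a simplification made here to avoid a division by zero.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Zero each element with probability p and rescale the survivors."""
    if not 0.0 <= p < 1.0:
        raise ValueError("this sketch requires 0 <= p < 1")
    if not training or p == 0.0:
        return list(values)  # identity during evaluation
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
rng = random.Random(0)  # fixed seed so the run is repeatable
train_out = dropout(activations, p=0.5, training=True, rng=rng)
eval_out = dropout(activations, p=0.5, training=False)
print(train_out)  # each element is either 0.0 or twice its input
print(eval_out)   # [1.0, 2.0, 3.0, 4.0]
```

Scaling the surviving elements during training keeps the expected value of each output equal to its input, which is why no rescaling is needed at evaluation time.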
12. Sparse layers
1) torch.nn.Embedding: Stores word embeddings and retrieves them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings.
2) torch.nn.EmbeddingBag: Computes sums or means of "bags" of embeddings without instantiating the intermediate embeddings.
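An embedding layer is essentially a lookup table indexed by integers. The plain-Python sketch below uses a made-up table with four entries of dimension two; embed and embed_bag_mean are illustrative names, not the torch.nn API.

```python
# Hypothetical table: row i is the embedding vector for index i.
embedding_table = [
    [0.0, 0.1],
    [1.0, 1.1],
    [2.0, 2.1],
    [3.0, 3.1],
]


def embed(indices, table):
    """Return one embedding vector per index."""
    return [table[i] for i in indices]


def embed_bag_mean(indices, table):
    """Average the embeddings of a bag of indices, one dimension at a time."""
    dim = len(table[0])
    return [sum(table[i][d] for i in indices) / len(indices) for d in range(dim)]


print(embed([2, 0], embedding_table))  # [[2.0, 2.1], [0.0, 0.1]]
bag = embed_bag_mean([1, 3], embedding_table)
print(bag)  # close to [2.0, 2.1]
```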
13. Distance functions
1) torch.nn.CosineSimilarity: Returns the cosine similarity between x1 and x2, computed along dim.
2) torch.nn.PairwiseDistance: Computes the batch-wise pairwise distance between vectors v1 and v2 using the p-norm:
||x||_p = (Σ_i |x_i|^p)^(1/p)
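Both distance functions can be written directly from their definitions. This is a plain-Python sketch for a single pair of vectors; the names and the eps guard against division by zero are choices made for this example, and the library versions may handle small values differently.

```python
import math


def cosine_similarity(x1, x2, eps=1e-8):
    dot = sum(a * b for a, b in zip(x1, x2))
    norm1 = math.sqrt(sum(a * a for a in x1))
    norm2 = math.sqrt(sum(b * b for b in x2))
    return dot / max(norm1 * norm2, eps)


def pairwise_distance(v1, v2, p=2.0):
    """p-norm of the element-wise difference."""
    return sum(abs(a - b) ** p for a, b in zip(v1, v2)) ** (1.0 / p)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal vectors)
print(pairwise_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```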
14. Loss functions
1) torch.nn.L1Loss: Creates a criterion that measures the mean absolute error between each element in the input x and target y. The unreduced loss can be described as:
l(x, y) = L = {l_1, ..., l_N}, l_n = |x_n - y_n|,
where N is the batch size.
2) torch.nn.MSELoss: Creates a criterion that measures the mean squared error between each element in the input x and target y. The unreduced loss can be described as:
l(x, y) = L = {l_1, ..., l_N}, l_n = (x_n - y_n)^2,
where N is the batch size.
3) torch.nn.CrossEntropyLoss: This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class. It is helpful when training a classification problem with C classes.
4) torch.nn.CTCLoss: The Connectionist Temporal Classification loss calculates the loss between a continuous time series and a target sequence.
5) torch.nn.NLLLoss: The negative log-likelihood loss is used to train a classification problem with C classes.
6) torch.nn.PoissonNLLLoss: The negative log-likelihood loss with a Poisson distribution of the target:
target ~ Poisson(input), loss(input, target) = input - target*log(input) + log(target!)
7) torch.nn.KLDivLoss: The Kullback-Leibler divergence is a useful distance measure for continuous distributions, and it is also useful when performing direct regression over the space of continuous output distributions.
8) torch.nn.BCELoss: Creates a criterion that measures the binary cross entropy between the target and the output. The unreduced loss can be described as:
l(x, y) = L = {l_1, ..., l_N}, l_n = -w_n [y_n*log(x_n) + (1 - y_n)*log(1 - x_n)],
where N is the batch size.
9) torch.nn.BCEWithLogitsLoss: Combines a Sigmoid layer and the BCELoss in one single class. Combining the operations into one layer lets us take advantage of the log-sum-exp trick for numerical stability.
10) torch.nn.MarginRankingLoss: Creates a criterion that measures the loss given inputs x1 and x2, two 1D mini-batch Tensors, and a 1D mini-batch label tensor y containing 1 or -1. The loss function for each sample in the mini-batch is:
loss(x1, x2, y) = max(0, -y*(x1 - x2) + margin)
11) torch.nn.HingeEmbeddingLoss: Measures the loss given an input tensor x and a labels tensor y containing 1 or -1. It is used for measuring whether two inputs are similar or dissimilar. The loss function is defined as:
l_n = x_n if y_n = 1, and l_n = max(0, margin - x_n) if y_n = -1
12) torch.nn.MultiLabelMarginLoss: Creates a criterion that optimizes a multi-class multi-classification hinge loss between input x and output y.
13) torch.nn.SmoothL1Loss: Creates a criterion that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise. It is also known as Huber loss:
z_n = 0.5*(x_n - y_n)^2 if |x_n - y_n| < 1, otherwise |x_n - y_n| - 0.5
14) torch.nn.SoftMarginLoss: Creates a criterion that optimizes the two-class classification logistic loss between input tensor x and target tensor y containing 1 or -1.
15) torch.nn.MultiLabelSoftMarginLoss: Creates a criterion that optimizes the multi-label one-versus-all loss based on max-entropy between input x and target y of size (N, C).
16) torch.nn.CosineEmbeddingLoss: Creates a criterion that measures the loss given input tensors x1 and x2 and a label tensor y with values 1 or -1. It is used for measuring whether two inputs are similar or dissimilar, using the cosine distance.
17) torch.nn.MultiMarginLoss: Creates a criterion that optimizes a multi-class classification hinge loss between input x and output y.
18) torch.nn.TripletMarginLoss: Creates a criterion that measures the triplet loss given input tensors x1, x2, x3 and a margin with a value greater than 0. It is used for measuring the relative similarity between samples. A triplet is composed of an anchor, a positive example, and a negative example:
L(a, p, n) = max{d(a_i, p_i) - d(a_i, n_i) + margin, 0}
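Several of the loss formulas above can be evaluated by hand on a tiny batch. The plain-Python sketch below returns the unreduced per-element terms and leaves the averaging to a separate helper. All function names are illustrative, the BCE weights w_n are assumed to be 1, and the default margins are assumptions of this sketch.

```python
import math


def l1_terms(xs, ys):
    return [abs(x - y) for x, y in zip(xs, ys)]


def mse_terms(xs, ys):
    return [(x - y) ** 2 for x, y in zip(xs, ys)]


def bce_terms(xs, ys):
    # xs must be probabilities strictly between 0 and 1.
    return [-(y * math.log(x) + (1 - y) * math.log(1 - x)) for x, y in zip(xs, ys)]


def margin_ranking(x1, x2, y, margin=0.0):
    return max(0.0, -y * (x1 - x2) + margin)


def triplet_margin(d_ap, d_an, margin=1.0):
    # d_ap: anchor-positive distance, d_an: anchor-negative distance.
    return max(d_ap - d_an + margin, 0.0)


def mean(terms):
    return sum(terms) / len(terms)


inputs = [1.0, 2.0, 3.0]
targets = [1.0, 1.0, 5.0]
print(l1_terms(inputs, targets))         # [0.0, 1.0, 2.0]
print(mse_terms(inputs, targets))        # [0.0, 1.0, 4.0]
print(mean(l1_terms(inputs, targets)))   # 1.0
```

The ranking and triplet losses are zero once the inputs are ordered correctly by at least the margin, so only violations contribute to the gradient.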
15. Vision layers
1) torch.nn.PixelShuffle: Rearranges the elements of a tensor of shape (*, C×r^2, H, W) into a tensor of shape (*, C, H×r, W×r).
2) torch.nn.Upsample: Upsamples given multi-channel 1D, 2D or 3D data.
3) torch.nn.UpsamplingNearest2d: Applies 2D nearest-neighbor upsampling to an input signal composed of multiple input channels.
4) torch.nn.UpsamplingBilinear2d: Applies 2D bilinear upsampling to an input signal composed of multiple input channels.
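Nearest-neighbor upsampling simply repeats each pixel. Here is a minimal plain-Python sketch for a single-channel image stored as a list of rows; the function name and the integer scale argument are assumptions of this example.

```python
def upsample_nearest_2d(image, scale):
    """Repeat every pixel `scale` times horizontally and vertically."""
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(scale)]
        # Copy the widened row once per vertical repeat.
        out.extend(list(wide) for _ in range(scale))
    return out


image = [[1, 2],
         [3, 4]]
upsampled = upsample_nearest_2d(image, 2)
for row in upsampled:
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```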
16. DataParallel layers (multi-GPU, distributed)
1) torch.nn.DataParallel: Implements data parallelism at the module level.
2) torch.nn.DistributedDataParallel: Implements distributed data parallelism, based on the torch.distributed package, at the module level.
3) torch.nn.DistributedDataParallelCPU: Implements distributed data parallelism for the CPU at the module level.
17. Utilities
1) torch.nn.utils.clip_grad_norm_: Clips the gradient norm of an iterable of parameters.
2) torch.nn.utils.clip_grad_value_: Clips the gradients of an iterable of parameters at the specified value.
3) torch.nn.utils.parameters_to_vector: Converts parameters to one vector.
4) torch.nn.utils.vector_to_parameters: Converts one vector to the parameters.
5) torch.nn.utils.weight_norm: Applies weight normalization to a parameter in the given module.
6) torch.nn.utils.remove_weight_norm: Removes the weight normalization and re-parameterization from a module.
7) torch.nn.utils.spectral_norm: Applies spectral normalization to a parameter in the given module.
8) torch.nn.utils.rnn.PackedSequence: Holds the data and the list of batch_sizes of a packed sequence.
9) torch.nn.utils.rnn.pack_padded_sequence: Packs a Tensor containing padded sequences of variable length.
10) torch.nn.utils.rnn.pad_packed_sequence: Pads a packed batch of variable-length sequences.
11) torch.nn.utils.rnn.pad_sequence: Pads a list of variable-length Tensors with a padding value.
12) torch.nn.utils.rnn.pack_sequence: Packs a list of variable-length Tensors.
13) torch.nn.utils.remove_spectral_norm: Removes the spectral normalization and re-parameterization from a module.
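The two gradient clipping utilities behave differently: one rescales all gradients together so that their overall norm stays within a limit, and the other clamps each gradient independently. The plain-Python sketch below works on a flat list of numbers and returns new lists, whereas the library functions modify gradients in place. The function names here are illustrative.

```python
import math


def clip_grad_norm(grads, max_norm):
    """Rescale all gradients together if their L2 norm exceeds max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return list(grads), total_norm


def clip_grad_value(grads, clip_value):
    """Clamp each gradient to the range [-clip_value, clip_value]."""
    return [max(-clip_value, min(clip_value, g)) for g in grads]


grads = [3.0, -4.0]
clipped, norm_before = clip_grad_norm(grads, max_norm=1.0)
print(norm_before)                  # 5.0
print(clipped)                      # direction preserved, norm close to 1.0
print(clip_grad_value(grads, 3.5))  # [3.0, -3.5]
```

Norm clipping preserves the direction of the gradient, while value clipping can change it.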


