
#Batch gradient descent code
Let’s start off by importing a few useful libraries.

```python
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import torch.nn as nn
```

Next, let’s import our dataset and do a little bit of preprocessing on it. The dataset we’ll be working with is the Pima Indians Diabetes dataset. We’ll import it, split it into a train and a test set, and then standardize both sets while converting them into PyTorch tensors.

```python
df = pd.read_csv(r' ')  # path to the Pima Indians Diabetes CSV (elided in the source)

# The exact column selection didn't survive scraping; the last column is assumed to be the label.
X = df.iloc[:, :-1]
y = df.iloc[:, -1]
X = X.values
y = y.values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

scaler = StandardScaler()
scaler.fit(X_train)
X_train = torch.tensor(scaler.transform(X_train))
X_test = torch.tensor(scaler.transform(X_test))
y_train = torch.tensor(y_train)
y_test = torch.tensor(y_test)
```

Now, we’re going to need our neural network. We’ll build a single-layer feed forward neural network, consisting of 4 nodes in its hidden layer.

```python
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden_linear = nn.Linear(8, 4)   # 8 input features -> 4 hidden nodes
        self.output_linear = nn.Linear(4, 1)   # 4 hidden nodes -> 1 output
        self.sigmoid = nn.Sigmoid()

    def forward(self, X):
        hidden_output = self.sigmoid(self.hidden_linear(X))
        output = self.sigmoid(self.output_linear(hidden_output))
        return output
```

Let’s create a function to show accuracy as a metric (our loss is BCE). I like doing this because BCE isn’t really human readable, but accuracy is very human friendly. We’ll also set up a few variables to reuse; our print_epoch variable just tells our code how often we want to see our metrics (i.e., BCE and accuracy).
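Neither the accuracy helper nor the variable values survived in this copy of the article, so here is a minimal sketch of both. The helper name `accuracy`, the 0.5 decision threshold, and the values of `lr`, `epochs`, and `print_epoch` are assumptions for illustration, not taken from the original.

```python
# Hypothetical reconstruction: the original helper and values were lost from this copy.
def accuracy(y_pred, y_true):
    # Threshold the sigmoid outputs at 0.5 and compare them with the true labels.
    predicted_classes = (y_pred.reshape(-1) > 0.5).float()
    return (predicted_classes == y_true.reshape(-1).float()).float().mean().item()

# A few variables to reuse (values are illustrative).
lr = 0.01          # learning rate
epochs = 1000      # passes over the training data
print_epoch = 100  # print BCE and accuracy every print_epoch epochs
```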

Let’s instantiate our Model class and set our loss (BCE) and optimizer.

```python
model = Model()
BCE = nn.BCELoss()
# The optimizer class was lost from this copy of the code; plain SGD is assumed here.
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
```

Awesome, we can finally train our model. Let’s first do it without batch gradient descent and then with.

```python
train_loss = []
test_loss = []

for epoch in range(epochs):
    model.train()
    y_pred = model(X_train.float())
    loss = BCE(y_pred, y_train.reshape(-1, 1).float())
    train_loss.append(loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if epoch % print_epoch == 0:
        print('Train: epoch: {} - loss: {:.5f}; acc: {:.3f}'.format(
            epoch, loss.item(), accuracy(y_pred, y_train)))
```

Before we get into the results, you’ll see that the batched code is similar, but we have a few extra elements.
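The batched training loop itself is missing from this copy, so the following is a minimal sketch of what those extra elements could look like. The `DataLoader`/`TensorDataset` setup and the `batch_size` value are assumptions; the running `iteration_loss` and `iteration_accuracy` averages mirror the names that appear in the original print statement.

```python
# A sketch only: the DataLoader setup and batch_size are assumed, not taken from the article.
from torch.utils.data import TensorDataset, DataLoader

batch_size = 64  # hypothetical value
train_loader = DataLoader(TensorDataset(X_train.float(), y_train.float()),
                          batch_size=batch_size, shuffle=True)

for epoch in range(epochs):
    iteration_loss = 0.0
    iteration_accuracy = 0.0

    model.train()
    for i, (X_batch, y_batch) in enumerate(train_loader):
        y_pred = model(X_batch)
        loss = BCE(y_pred, y_batch.reshape(-1, 1))

        iteration_loss += loss.item()
        iteration_accuracy += accuracy(y_pred, y_batch)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()  # the weights are updated once per batch, not once per epoch

    if epoch % print_epoch == 0:
        print('Train: epoch: {} - loss: {:.5f}; acc: {:.3f}'.format(
            epoch, iteration_loss / (i + 1), iteration_accuracy / (i + 1)))
```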
#Batch gradient descent how to
Our example dataset is going to be a regression line made up of 1000 points. Let’s look at it by using matplotlib to plot it. Although we’re only plotting 64 random points, those 64 points give us a very good understanding of the shape and direction of the 1000 points. The argument batch gradient descent makes is that, given a good representation of a problem (and that good representation is assumed to be present when we have a lot of data), a small random batch (e.g., 64 data points) is sufficient to generalize the larger dataset. Now that we’ve gone over the what and the why, let’s go over the how: we’ll end this article off with how to implement batch gradient descent in code.
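The plot and its code didn’t survive in this copy, so below is a self-contained sketch of the idea. The `make_regression` parameters, the use of `random.sample` to pick the 64 points, and the side-by-side layout are assumptions rather than the article’s exact code.

```python
import random

import matplotlib.pyplot as plt
from sklearn.datasets import make_regression

# Build an illustrative 1000-point regression dataset (parameters are assumed).
X, y = make_regression(n_samples=1000, n_features=1, noise=10, random_state=42)

# Pick 64 random points, the way a batch would be drawn.
idx = random.sample(range(len(X)), 64)

fig, (ax_full, ax_batch) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
ax_full.scatter(X[:, 0], y, s=5)
ax_full.set_title('All 1000 points')
ax_batch.scatter(X[idx, 0], y[idx], s=15, color='tab:orange')
ax_batch.set_title('64 random points')
plt.show()
```

Even though the right-hand panel uses only 6.4% of the data, it traces out essentially the same line as the full dataset, which is the intuition batch gradient descent relies on.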


One of the questions I had when I first came across batch gradient descent was, “we’re asked to gather as much data as we can, only to break that data up into small chunks? I don’t get it… ” I’m going to go over an example (with code) to show why breaking our data into smaller chunks actually works. Before I show the example, we’re going to have to import a few libraries.

```python
from sklearn.datasets import make_regression
import matplotlib.pyplot as plt
import random
from IPython import display
import time
```

Now that we’ve imported our libraries, we’re going to use sklearn to make an example dataset.
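The dataset-creation call itself is missing from this copy. A minimal sketch is below; it is the same `make_regression` call assumed in the plotting sketch earlier, and its parameters (other than the 1000 samples the article mentions) are assumptions.

```python
# Parameters besides n_samples=1000 are assumptions, not taken from the article.
X, y = make_regression(n_samples=1000, n_features=1, noise=10, random_state=42)
```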

Performing calculations on small batches of the data, rather than all of our data at once, is beneficial in a few ways. Think about if we had a million 4K images: always holding a million 4K images in memory is extremely taxing. Another benefit: better generalization = less chance of overfitting.
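To put a rough number on the memory point, here is a back-of-the-envelope estimate; the uncompressed 8-bit RGB assumption is mine, not the article’s.

```python
# Back-of-the-envelope estimate, assuming uncompressed 8-bit RGB frames.
bytes_per_4k_image = 3840 * 2160 * 3            # ~24.9 MB per image
total_bytes = 1_000_000 * bytes_per_4k_image    # a million images
print(total_bytes / 1e12, 'TB')                 # ~24.88 TB
```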
#Batch gradient descent update
The question you’re probably asking right now is, “what is batch gradient descent and how does it differ from normal gradient descent?” Batch gradient descent splits the training data up into smaller chunks (batches) and performs forward propagation and backpropagation batch by batch. This allows us to update our weights multiple times in a single epoch.
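To make “multiple updates per epoch” concrete, here is a tiny illustration; the dataset size and batch size below are hypothetical numbers, not from the article.

```python
import math

n_samples = 50_000   # hypothetical dataset size
batch_size = 64      # hypothetical batch size

updates_per_epoch_plain = 1                                    # full-batch gradient descent
updates_per_epoch_batched = math.ceil(n_samples / batch_size)  # 782 with these numbers
print(updates_per_epoch_plain, updates_per_epoch_batched)
```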
