torchlib.optim package

Submodules

torchlib.optim.learning_rate module

class torchlib.optim.learning_rate.LrFinder(device='cpu', plotdir=None, logf=None)

Bases: object

find(dataloader, model, optimizer, criterion, nin=1, nout=1, nbgc=1, lr_init=1e-08, lr_final=100.0, beta=0.98, gamma=4.0)

Find learning rate.

Find a suitable learning rate; see How Do You Find A Good Learning Rate.

During training, two types of losses are computed.

The average loss is:

\[\rm{avg\_loss}_i=\beta * \rm{avg\_loss}_{i-1}+(1-\beta) * \rm{loss}_i \]

The smoothed loss is:

\[\rm{smt\_loss}_{i}=\frac{\rm{avg\_loss}_{i}}{1-\beta^{i+1}} \]

If \(i > 1\) and \(\rm{smt\_loss} > \gamma * \rm{best\_loss}\), stop.

If \(\rm{smt\_loss} < \rm{best\_loss}\) or \(i = 1\), let \(\rm{best\_loss} = \rm{smt\_loss}\).
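A minimal sketch of this smoothing and stopping rule (illustrative only, with assumed variable names; not the LrFinder internals):

def smooth_and_stop(losses, beta=0.98, gamma=4.0):
    # Illustrative sketch of the rule above; not torchlib's actual implementation.
    avg_loss, best_loss, smt_losses = 0.0, None, []
    for i, loss in enumerate(losses, start=1):
        avg_loss = beta * avg_loss + (1 - beta) * loss   # exponential moving average
        smt_loss = avg_loss / (1 - beta ** i)            # bias-corrected (smoothed) loss
        if best_loss is not None and smt_loss > gamma * best_loss:
            break                                        # loss is exploding: stop the search
        if best_loss is None or smt_loss < best_loss:
            best_loss = smt_loss
        smt_losses.append(smt_loss)
    return smt_losses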

Parameters
  • dataloader (DataLoader) – The dataloader that contains a dataset for training.

  • model (Module) – Your network module.

  • optimizer (Optimizer) – The optimizer such as SGD, Adam…

  • criterion (Loss) – The criterion/loss function used for training the model.

  • nin (int, optional) – The number of inputs of the model; the first nin elements are inputs and the rest are targets (can be None) used for computing the loss. (the default is 1)

  • nout (int, optional) – The number of model outputs used for computing the loss. It only takes effect when the model has multiple outputs, i.e. the output is a tuple or list with several tensor elements (>=1); the first nout elements are used for computing the loss and the rest are ignored. (the default is 1)

  • nbgc (int, optional) – The number of batches for gradient accumulation (the default is 1, which means no accumulation)

  • lr_init (float, optional) – The initial learning rate (the default is 1e-8)

  • lr_final (float, optional) – The final learning rate (the default is 100.0)

  • beta (float, optional) – The weight for the exponential moving average of the loss (the default is 0.98)

  • gamma (float, optional) – The exploding factor \(\gamma\). (the default is 4.)

Returns

  • lrs (list) – Learning rates during training.

  • smt_losses (list) – Smoothed losses during training.

  • avg_losses (list) – Average losses during training.

  • losses (list) – Original losses during training.

Examples

import torch as th
from torch.utils.data import DataLoader, TensorDataset
from torchlib.optim.learning_rate import LrFinder

device = 'cuda:1'
# device = 'cpu'

num_epochs = 30
X = th.randn(100, 2, 3, 4)
Y = th.randn(100, 1, 3, 4)

trainds = TensorDataset(X, Y)
# trainds = TensorDataset(X)

model = th.nn.Conv2d(2, 1, 1)
model.to(device)

trainld = DataLoader(trainds, batch_size=10, shuffle=False)

criterion = th.nn.MSELoss(reduction='mean')

optimizer = th.optim.SGD(model.parameters(), lr=1e-1)

lrfinder = LrFinder(device)
# lrfinder = LrFinder(device, plotdir='./')

lrfinder.find(trainld, model, optimizer, criterion, nin=1,
              nbgc=1, lr_init=1e-8, lr_final=10., beta=0.98)

lrfinder.plot(lrmod='Linear')
lrfinder.plot(lrmod='Log')

plot(lrmod='log', loss='smoothed')

Plot the loss-lr curve.

Plot the loss-learning rate curve.

Parameters
  • lrmod (str, optional) – 'log': use a log scale, i.e. plot against log10(lr) instead of lr (the default); 'linear': use the original lr.

  • loss (str, optional) – Specify which type of loss will be plotted. (the default is 'smoothed')
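Continuing the example above, the two documented lrmod values would be used as:

lrfinder.plot(lrmod='log', loss='smoothed')     # log10(lr) on the x-axis (default)
lrfinder.plot(lrmod='linear', loss='smoothed')  # original lr on the x-axis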

torchlib.optim.learning_rate.gammalr(x, k=2, t=2, a=1)
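gammalr is undocumented here; judging only from its name and parameters (k, t, a), one plausible reading is a gamma-distribution-shaped curve. A sketch under that assumption (not necessarily torchlib's actual formula):

import math

def gammalr_sketch(x, k=2, t=2, a=1):
    # Assumed gamma-density shape: a * x^(k-1) * exp(-x/t) / (Gamma(k) * t^k).
    # This is a guess based on the parameter names, not torchlib's gammalr.
    return a * x ** (k - 1) * math.exp(-x / t) / (math.gamma(k) * t ** k)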

torchlib.optim.lr_scheduler module

class torchlib.optim.lr_scheduler.GaussianLR(optimizer, t_eta_max, sigma1, sigma2, eta_start=1e-06, eta_stop=1e-05, last_epoch=-1)

Bases: torch.optim.lr_scheduler._LRScheduler

Set the learning rate of each parameter group using a double Gaussian kernel schedule:

[Equation image: _images/GaussianLREquation.png]

where \(\eta_{\max}\) is set to the initial lr and \(T_{cur}\) is the number of epochs since the last restart (as in SGDR).

When last_epoch=-1, sets initial lr as lr. Notice that because the schedule is defined recursively, the learning rate can be simultaneously modified outside this scheduler by other operators.

The maximum learning rate is the base learning rate set in the optimizer.
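In case the equation image above is unavailable, here is a minimal sketch of one plausible double-Gaussian form, written purely as an interpretation of the parameters below (it is not necessarily the exact equation GaussianLR implements):

import math

def double_gaussian_lr(t, eta_max=1e-2, t_eta_max=50, sigma1=15, sigma2=100,
                       eta_start=1e-6, eta_stop=1e-5):
    # Assumed form: a Gaussian warm-up branch (width sigma1) rising from eta_start
    # to eta_max at t_eta_max, then a Gaussian annealing branch (width sigma2)
    # decaying towards eta_stop. Not necessarily GaussianLR's exact equation.
    if t <= t_eta_max:
        return eta_start + (eta_max - eta_start) * math.exp(-(t - t_eta_max) ** 2 / (2 * sigma1 ** 2))
    return eta_stop + (eta_max - eta_stop) * math.exp(-(t - t_eta_max) ** 2 / (2 * sigma2 ** 2))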

Parameters
  • optimizer (Optimizer) – Wrapped optimizer.

  • t_eta_max (int) – The iteration at which the learning rate reaches its maximum value \(\eta_{\max}\).

  • sigma1 (int) – Controls the shape of the warm-up phase.

  • sigma2 (int) – Controls the shape of the annealing phase.

  • eta_start (float) – Starting learning rate. Default: 1e-6.

  • eta_stop (float) – Stopping learning rate. Default: 1e-5.

  • last_epoch (int) – The index of last epoch. Default: -1.

Examples

[Figure: _images/DoubleGaussianKernelLR.png (double Gaussian kernel learning-rate schedule)]

The results shown in the above figure can be obtained with the following code.

import torch as th
import torchlib as tl
import matplotlib; matplotlib.use('TkAgg')
import matplotlib.pyplot as plt

lr = 1e-2
# lr = 1e-1

num_epochs = 500
batch_size = 8
num_batch = 750

params = [th.nn.parameter.Parameter(th.zeros(128), requires_grad=True),
          th.nn.parameter.Parameter(th.zeros(128), requires_grad=True),
          ]

optimizer = th.optim.Adam(params, lr=lr)
# optimizer = th.optim.SGD(params, lr=lr, momentum=0.9)
scheduler = tl.optim.lr_scheduler.GaussianLR(optimizer, t_eta_max=50, sigma1=15, sigma2=100, eta_start=1e-4, eta_stop=1e-3, last_epoch=-1)

print(optimizer)

# Run the epoch/batch loops only to record the scheduled learning rate;
# no loss or backward pass is computed in this demo.
lrs = []
for n in range(num_epochs):
    for b in range(num_batch):

        optimizer.step()

        # lrs.append(optimizer.param_groups[0]['lr'])

    scheduler.step()
    lrs.append(optimizer.param_groups[0]['lr'])

plt.figure()
plt.plot(lrs)
plt.xlabel('Iteration')
plt.ylabel('Learning rate')
plt.grid()
plt.show()

get_lr()

class torchlib.optim.lr_scheduler.MountainLR(optimizer, total_epoch, peak_epoch, period_epoch, last_epoch=-1)

Bases: torch.optim.lr_scheduler._LRScheduler

Set the learning rate of each parameter group using a mountain-shaped schedule:

\[\frac{|x-P|}{N}\left(\cos\left(\frac{2(x-P)}{T}\right)-2\right) \]

where \(P\), \(N\), and \(T\) correspond to the peak_epoch, total_epoch, and period_epoch parameters.

When last_epoch=-1, sets initial lr as lr. Notice that because the schedule is defined recursively, the learning rate can be simultaneously modified outside this scheduler by other operators.

The maximum learning rate is the base learning rate set in the optimizer.
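A worked sketch that simply evaluates the expression above (how MountainLR scales and shifts this shape onto the base learning rate is not shown here and is not assumed):

import math

def mountain_shape(x, peak_epoch=300, total_epoch=500, period_epoch=50):
    # Evaluates (|x - P| / N) * (cos(2 * (x - P) / T) - 2): zero at the peak epoch
    # and increasingly negative away from it, with a cosine ripple of period pi * T.
    d = x - peak_epoch
    return (abs(d) / total_epoch) * (math.cos(2 * d / period_epoch) - 2)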

Parameters
  • optimizer (Optimizer) – Wrapped optimizer.

  • total_epoch (int) – The total number of epochs \(N\).

  • peak_epoch (int) – The epoch \(P\) at which the schedule peaks.

  • period_epoch (int) – The period parameter \(T\) of the cosine term.

  • last_epoch (int) – The index of last epoch. Default: -1.

Examples

[Figure: _images/MountainLR.png (mountain learning-rate schedule)]

The results shown in the above figure can be obtained with the following code.

import torch as th
import torchlib as tl
import matplotlib; matplotlib.use('TkAgg')
import matplotlib.pyplot as plt

lr = 1e-2
# lr = 1e-1

num_epochs = 500
batch_size = 8
num_batch = 750

params = [th.nn.parameter.Parameter(th.zeros(128), requires_grad=True),
          th.nn.parameter.Parameter(th.zeros(128), requires_grad=True),
          ]

optimizer = th.optim.Adam(params, lr=lr)
scheduler = tl.optim.lr_scheduler.MountainLR(optimizer, total_epoch=num_epochs, peak_epoch=300, period_epoch=50, last_epoch=-1)

print(optimizer)

# Run the epoch/batch loops only to record the scheduled learning rate;
# no loss or backward pass is computed in this demo.
lrs = []
for n in range(num_epochs):
    for b in range(num_batch):

        optimizer.step()

        # lrs.append(optimizer.param_groups[0]['lr'])

    scheduler.step()
    lrs.append(optimizer.param_groups[0]['lr'])

plt.figure()
plt.plot(lrs)
plt.xlabel('Iteration')
plt.ylabel('Learning rate')
plt.grid()
plt.show()

get_lr()

torchlib.optim.save_load module

torchlib.optim.save_load.device_transfer(obj, name, device)
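device_transfer is undocumented here; a common use for such a helper is moving the tensors held inside an object (for example an optimizer's state loaded from a checkpoint) onto a device. A hedged usage sketch under that assumption, with the meaning of the name argument guessed:

import torch as th
from torchlib.optim.save_load import device_transfer

model = th.nn.Linear(4, 2)
optimizer = th.optim.Adam(model.parameters(), lr=1e-3)

# Assumed behaviour: move the tensors stored inside `optimizer` onto the device.
# The role of the `name` argument is not documented here; 'optimizer' is a guess.
device_transfer(optimizer, 'optimizer', 'cuda:0')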

Module contents