torchbox.optim package

Submodules

torchbox.optim.learning_rate module

class torchbox.optim.learning_rate.LrFinder(device='cpu', plotdir=None, logf=None)

Bases: object

find(dataloader, model, optimizer, criterion, nin=1, nout=1, nbgc=1, lr_init=1e-08, lr_final=100.0, beta=0.98, gamma=4.0)

Find learning rate

Find a learning rate by increasing it from lr_init towards lr_final while recording the loss; see How Do You Find A Good Learning Rate.

During training, two types of loss are computed.

The average loss is:

\[\rm{avg\_loss}_i=\beta * \rm{avg\_loss}_{i-1}+(1-\beta) * \rm{loss}_i \]

The smoothed loss is:

\[\rm{smt\_loss}_{i}=\frac{\rm{avg\_loss}_{i}}{1-\beta^{i+1}} \]

If \(i > 1\) and \(\rm{smt\_loss} > \gamma * \rm{best\_loss}\), stop.

If \(\rm{smt\_loss} < \rm{best\_loss}\) or \(i = 1\), let \(\rm{best\_loss} = \rm{smt\_loss}\).
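The two formulas and the stopping rule above can be sketched in plain Python. This is a minimal sketch, not LrFinder's code: it assumes avg_loss starts at 0 and that i counts batches from 0 (so the bias correction \(1-\beta^{i+1}\) makes the first smoothed loss equal to the first raw loss), and it initialises best_loss from the first batch, which may differ by one batch from LrFinder's exact indexing. smooth_and_check is a hypothetical name.

```python
def smooth_and_check(losses, beta=0.98, gamma=4.0):
    """Smooth raw losses and stop once the smoothed loss explodes.

    Hypothetical helper. Assumes avg_loss starts at 0, the batch index i
    starts at 0, and best_loss is initialised from the first batch.
    """
    avg_loss, best_loss = 0.0, None
    avg_losses, smt_losses = [], []
    for i, loss in enumerate(losses):
        # exponentially weighted average of the raw losses
        avg_loss = beta * avg_loss + (1 - beta) * loss
        # bias correction: compensates for avg_loss having started at 0
        smt_loss = avg_loss / (1 - beta ** (i + 1))
        avg_losses.append(avg_loss)
        smt_losses.append(smt_loss)
        # stop once the smoothed loss exceeds gamma times the best loss so far
        if best_loss is not None and smt_loss > gamma * best_loss:
            break
        # track the lowest smoothed loss seen so far
        if best_loss is None or smt_loss < best_loss:
            best_loss = smt_loss
    return avg_losses, smt_losses


# usage: the fourth loss explodes, so only four values are returned
avg, smt = smooth_and_check([1.0, 1.0, 1.0, 1000.0, 1000.0], beta=0.5)
```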

Parameters:
  • dataloader (DataLoader) – The dataloader that contains a dataset for training.

  • model (Module) – Your network module.

  • optimizer (Optimizer) – The optimizer such as SGD, Adam…

  • criterion (Loss) – The criterion/loss used for training model.

  • nin (int, optional) – The number of inputs of the model: the first nin elements of each batch are inputs, and the rest are targets (can be None) used for computing the loss. (the default is 1)

  • nout (int, optional) – The number of model outputs used for computing the loss. It only takes effect when the model has multiple outputs, i.e. the output is a tuple or list of tensors; the first nout elements are used for computing the loss and the rest are ignored. (the default is 1)

  • nbgc (int, optional) – The number of batches for gradient accumulation (the default is 1, which means no accumulation)

  • lr_init (float, optional) – The initial learning rate (the default is 1e-8)

  • lr_final (float, optional) – The final learning rate (the default is 100.0)

  • beta (float, optional) – The smoothing factor \(\beta\) of the weighted average loss (the default is 0.98)

  • gamma (float, optional) – The exploding factor \(\gamma\). (the default is 4.)
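The parameters lr_init and lr_final bound the sweep. The cited article grows the learning rate geometrically, multiplying it by a constant factor after every batch. Assuming LrFinder follows the same scheme (not confirmed on this page), the schedule can be sketched as below; lr_sweep and num_batches are hypothetical names.

```python
def lr_sweep(lr_init=1e-8, lr_final=100.0, num_batches=100):
    """Learning rates growing geometrically from lr_init to lr_final.

    Hypothetical helper. Assumes one learning rate per batch, multiplied by
    a constant factor q after every batch.
    """
    if num_batches < 2:
        return [lr_init]
    # constant factor so that lr_init * q ** (num_batches - 1) == lr_final
    q = (lr_final / lr_init) ** (1.0 / (num_batches - 1))
    return [lr_init * q ** k for k in range(num_batches)]


# 11 batches from 1e-8 to 1e2: the rate grows tenfold per batch
lrs = lr_sweep(1e-8, 100.0, 11)
```

A geometric rather than linear sweep spends as many batches between 1e-8 and 1e-7 as between 1 and 10, which is why the loss is usually inspected on a log10(lr) axis.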

Returns:

  • lrs (list) – Learning rates during training.

  • smt_losses (list) – Smoothed losses during training.

  • avg_losses (list) – Average losses during training.

  • losses (list) – Original losses during training.
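One common rule of thumb from the cited article is to pick a learning rate about one order of magnitude below the one at which the smoothed loss is minimal. The sketch below applies that rule to the returned lists; suggest_lr and factor are hypothetical, and this is not a torchbox function.

```python
def suggest_lr(lrs, smt_losses, factor=10.0):
    """Return the learning rate at the minimum smoothed loss divided by factor.

    Hypothetical helper implementing a common rule of thumb.
    """
    if not lrs or len(lrs) != len(smt_losses):
        raise ValueError('lrs and smt_losses must be non-empty and of equal length')
    # index of the lowest smoothed loss
    i_min = min(range(len(smt_losses)), key=smt_losses.__getitem__)
    return lrs[i_min] / factor


lr = suggest_lr([1e-3, 1e-2, 1e-1, 1.0], [2.0, 1.0, 0.5, 3.0])  # about 1e-2
```

Assuming find returns the four lists in the order listed above, it could be used as lrs, smt_losses, avg_losses, losses = lrfinder.find(...) followed by suggest_lr(lrs, smt_losses).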

Examples

import torch as th
from torch.utils.data import DataLoader, TensorDataset
from torchbox.optim.learning_rate import LrFinder

# use a GPU when one is available, otherwise fall back to the CPU
device = 'cuda:0' if th.cuda.is_available() else 'cpu'

X = th.randn(100, 2, 3, 4)
Y = th.randn(100, 1, 3, 4)

trainds = TensorDataset(X, Y)
# trainds = TensorDataset(X)

model = th.nn.Conv2d(2, 1, 1)
model.to(device)

trainld = DataLoader(trainds, batch_size=10, shuffle=False)

criterion = th.nn.MSELoss(reduction='mean')

optimizer = th.optim.SGD(model.parameters(), lr=1e-1)

lrfinder = LrFinder(device)
# lrfinder = LrFinder(device, plotdir='./')

lrfinder.find(trainld, model, optimizer, criterion, nin=1,
              nbgc=1, lr_init=1e-8, lr_final=10., beta=0.98)

# lrmod accepts 'linear' or 'log', see plot() below
lrfinder.plot(lrmod='linear')
lrfinder.plot(lrmod='log')

plot(lrmod='log', loss='smoothed')

Plot the loss-lr curve

Plot the recorded loss against the learning rate.

Parameters:
  • lrmod (str, optional) – 'log' (default) plots against log10(lr) instead of lr; 'linear' plots against the original lr.

  • loss (str, optional) – Specifies which type of loss to plot (the default is 'smoothed').

torchbox.optim.learning_rate.gammalr(x, k=2, t=2, a=1)

torchbox.optim.lr_scheduler module

class torchbox.optim.lr_scheduler.GaussianLR(optimizer, t_eta_max, sigma1, sigma2, eta_start=1e-06, eta_stop=1e-05, last_epoch=-1)

Bases: _LRScheduler

Set the learning rate of each parameter group using a double Gaussian kernel schedule

[Figure: GaussianLR schedule equation]

where \(\eta_{max}\) is set to the initial lr and \(T_{cur}\) is the number of epochs since the last restart in SGDR.

When last_epoch=-1, the initial lr is set to lr. Note that because the schedule is defined recursively, the learning rate can be modified simultaneously outside this scheduler by other operators.

The maximum learning rate \(\eta_{\max}\) is the base learning rate set in the optimizer.

Parameters:
  • optimizer (Optimizer) – Wrapped optimizer.

  • t_eta_max (int) – The iteration at which the learning rate reaches its maximum value \(\eta_{\max}\).

  • sigma1 (int) – Controls the shape of the warm-up phase.

  • sigma2 (int) – Controls the shape of the annealing phase.

  • eta_start (float) – Starting learning rate. Default: 1e-6.

  • eta_stop (float) – Stopping learning rate. Default: 1e-5.

  • last_epoch (int) – The index of last epoch. Default: -1.
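The kernel itself is only given as an image on this page. To illustrate how t_eta_max, sigma1, sigma2, eta_start and eta_stop could interact, the sketch below assumes one plausible double-Gaussian form: a Gaussian of width sigma1 rising from about eta_start to eta_max before t_eta_max, and a Gaussian of width sigma2 decaying towards eta_stop afterwards. This is an assumption, not torchbox's formula, and double_gaussian_lr is a hypothetical helper.

```python
import math


def double_gaussian_lr(t, eta_max, t_eta_max, sigma1, sigma2, eta_start, eta_stop):
    """Assumed double-Gaussian schedule (not necessarily GaussianLR's formula).

    Before t_eta_max: Gaussian of width sigma1 between eta_start and eta_max.
    After t_eta_max: Gaussian of width sigma2 between eta_stop and eta_max.
    """
    if t <= t_eta_max:
        sigma, eta_floor = sigma1, eta_start  # warm-up side
    else:
        sigma, eta_floor = sigma2, eta_stop  # annealing side
    # Gaussian bump equal to 1 at t_eta_max
    g = math.exp(-((t - t_eta_max) ** 2) / (2.0 * sigma ** 2))
    return eta_floor + (eta_max - eta_floor) * g


# same settings as the example below, with eta_max = 1e-2
lrs = [double_gaussian_lr(t, 1e-2, 50, 15, 100, 1e-4, 1e-3) for t in range(501)]
```

For a plain-PyTorch equivalent, such a function divided by eta_max could be passed as lr_lambda to torch.optim.lr_scheduler.LambdaLR, which multiplies each group's initial lr by the returned factor.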

Examples

[Figure: GaussianLR learning rate schedule (DoubleGaussianKernelLR.png)]

The results shown in the figure above can be obtained with the following code.

import torch as th
import torchbox as tb
# import matplotlib; matplotlib.use('TkAgg')  # optional: choose an interactive backend
import matplotlib.pyplot as plt

lr = 1e-2  # base lr of the optimizer, i.e. the maximum learning rate
num_epochs = 500
num_batch = 750

# optimizers require an ordered collection (e.g. a list) of parameters, not a set
params = [th.nn.Parameter(th.zeros(128)), th.nn.Parameter(th.zeros(128))]

optimizer = th.optim.Adam(params, lr=lr)
# optimizer = th.optim.SGD(params, lr=lr, momentum=0.9)
scheduler = tb.optim.lr_scheduler.GaussianLR(optimizer, t_eta_max=50, sigma1=15, sigma2=100, eta_start=1e-4, eta_stop=1e-3, last_epoch=-1)

print(optimizer)

lrs = []
for n in range(num_epochs):
    for b in range(num_batch):
        # the forward and backward passes of a real batch would go here
        optimizer.step()

    scheduler.step()  # update the learning rate once per epoch
    lrs.append(optimizer.param_groups[0]['lr'])

plt.figure()
plt.plot(lrs)
plt.xlabel('Epoch')
plt.ylabel('Learning rate')
plt.grid()
plt.show()

get_lr()

class torchbox.optim.lr_scheduler.MountainLR(optimizer, total_epoch, peak_epoch, period_epoch, last_epoch=-1)

Bases: _LRScheduler

Set the learning rate of each parameter group using a double Gaussian kernel schedule based on

\[\frac{|x-P|}{N} \left(-2 + \cos\left(\frac{2(x-P)}{T}\right)\right) \]

where \(\eta_{max}\) is set to the initial lr and \(T_{cur}\) is the number of epochs since the last restart in SGDR.

When last_epoch=-1, the initial lr is set to lr. Note that because the schedule is defined recursively, the learning rate can be modified simultaneously outside this scheduler by other operators.

The maximum learning rate \(\eta_{\max}\) is the base learning rate set in the optimizer.

Parameters:
  • optimizer (Optimizer) – Wrapped optimizer.

  • total_epoch (int) – Total number of epochs.

  • peak_epoch (int) – Epoch at which the learning rate reaches its peak.

  • period_epoch (int) – Period, in epochs, of the oscillation in the schedule.

  • last_epoch (int) – The index of last epoch. Default: -1.
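To see the shape of the MountainLR kernel, it can be evaluated directly. The mapping used below (x as the epoch index, P as peak_epoch, N as total_epoch, T as period_epoch) is a guess from the signature, and how torchbox turns the kernel value into a learning rate is not shown on this page; mountain_kernel is a hypothetical helper.

```python
import math


def mountain_kernel(x, P, N, T):
    """Evaluate (|x - P| / N) * (-2 + cos(2 * (x - P) / T)) for a scalar x."""
    return abs(x - P) / N * (-2.0 + math.cos(2.0 * (x - P) / T))


# assumed mapping: peak_epoch=300, total_epoch=500, period_epoch=50
vals = [mountain_kernel(x, 300, 500, 50) for x in range(500)]
```

Since -2 + cos(...) lies in [-3, -1], the kernel is 0 at x = P and at most -|x - P| / N elsewhere, so it peaks at P and falls off on both sides with cosine ripples whose spacing is set by T.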

Examples

[Figure: MountainLR learning rate schedule (MountainLR.png)]

The results shown in the figure above can be obtained with the following code.

import torch as th
import torchbox as tb
# import matplotlib; matplotlib.use('TkAgg')  # optional: choose an interactive backend
import matplotlib.pyplot as plt

lr = 1e-2  # base lr of the optimizer, i.e. the maximum learning rate
num_epochs = 500
num_batch = 750

# optimizers require an ordered collection (e.g. a list) of parameters, not a set
params = [th.nn.Parameter(th.zeros(128)), th.nn.Parameter(th.zeros(128))]

optimizer = th.optim.Adam(params, lr=lr)
scheduler = tb.optim.lr_scheduler.MountainLR(optimizer, total_epoch=num_epochs, peak_epoch=300, period_epoch=50, last_epoch=-1)

print(optimizer)

lrs = []
for n in range(num_epochs):
    for b in range(num_batch):
        # the forward and backward passes of a real batch would go here
        optimizer.step()

    scheduler.step()  # update the learning rate once per epoch
    lrs.append(optimizer.param_groups[0]['lr'])

plt.figure()
plt.plot(lrs)
plt.xlabel('Epoch')
plt.ylabel('Learning rate')
plt.grid()
plt.show()

get_lr()

torchbox.optim.mamls_solver module

class torchbox.optim.mamls_solver.MAML(net, alpha=0.01)

Bases: object

copy_weights()
forward(x, adapted_weight=None, **kwards)
update_base(grads)
zero_grad()
class torchbox.optim.mamls_solver.MetaSGD(net)

Bases: object

copy_weights()
forward(x, adapted_weight=None, **kwards)
update_base(grads)
zero_grad()
torchbox.optim.mamls_solver.mamls_test_epoch(mmodel, mdl, criterions, criterionws=None, nsteps_base=1, epoch=None, logf='terminal', device='cuda:0', **kwargs)

Test one epoch using MAML or MetaSGD

Parameters:
  • mmodel (Module) – the network model

  • mdl (MetaDataLoader) – the meta dataloader for testing \(\{(x_s, y_s, x_q, y_q)\}\)

  • criterions (list or tuple) – list of loss functions

  • criterionws (list or tuple, optional) – list of float loss weights, by default None

  • nsteps_base (int, optional) – the number of fast-adaptation steps in the inner loop, by default 1

  • epoch (int or None, optional) – current epoch index, by default None

  • logf (str or object, optional) – where to print the log: a file path or 'terminal' (default)

  • device (str, optional) – device for testing, by default 'cuda:0'

  • kwargs – other arguments passed to forward

torchbox.optim.mamls_solver.mamls_train_epoch(mmodel, mdl, criterions, criterionws=None, optimizer=None, scheduler=None, nsteps_base=1, epoch=None, logf='terminal', device='cuda:0', **kwargs)

Train one epoch using MAML or MetaSGD

Parameters:
  • mmodel (Module) – the network model

  • mdl (MetaDataLoader) – the meta dataloader for training \(\{(x_s, y_s, x_q, y_q)\}\)

  • criterions (list or tuple) – list of loss functions

  • criterionws (list or tuple, optional) – list of float loss weights, by default None

  • optimizer (Optimizer or None, optional) – optimizer for the meta learner; None (default) means th.optim.Adam(model.parameters(), lr=0.001)

  • scheduler (LrScheduler or None, optional) – scheduler for the meta learner; None (default) means a fixed learning rate

  • nsteps_base (int, optional) – the number of fast-adaptation steps in the inner loop, by default 1

  • epoch (int or None, optional) – current epoch index, by default None

  • logf (str or object, optional) – where to print the log: a file path or 'terminal' (default)

  • device (str, optional) – device for training, by default 'cuda:0'

  • kwargs – other arguments passed to forward
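The functions above delegate the meta-learning details to the MAML and MetaSGD classes. As a self-contained illustration of what one MAML meta-update computes (nsteps_base inner fast-adaptation steps with step size alpha on the support loss, then an outer update through the adapted weights using the query loss), here is a toy sketch on 1-D quadratic tasks with hand-written gradients. It is not torchbox's implementation; maml_meta_step and meta_lr are hypothetical names, and meta_lr stands in for the meta learner's optimizer.

```python
def maml_meta_step(w, tasks, alpha=0.01, meta_lr=0.001, nsteps_base=1):
    """One second-order MAML meta-update for toy 1-D quadratic tasks.

    Each task is a pair (c_s, c_q): support loss (w - c_s) ** 2 and query
    loss (w - c_q) ** 2, so every derivative can be written by hand.
    """
    meta_grad = 0.0
    for c_s, c_q in tasks:
        # inner loop: fast adaptation on the support loss
        w_fast = w
        for _ in range(nsteps_base):
            w_fast = w_fast - alpha * 2.0 * (w_fast - c_s)
        # d w_fast / d w: each inner step multiplies it by (1 - 2 * alpha)
        dfast_dw = (1.0 - 2.0 * alpha) ** nsteps_base
        # chain rule through the inner loop for the query loss
        meta_grad += 2.0 * (w_fast - c_q) * dfast_dw
    # outer loop: average the meta-gradient over tasks and update w
    return w - meta_lr * meta_grad / len(tasks)


# tasks whose support and query targets coincide pull w towards their mean, 2.0
w = 0.0
for _ in range(500):
    w = maml_meta_step(w, [(1.0, 1.0), (3.0, 3.0)], alpha=0.1, meta_lr=0.1)
```

MetaSGD differs in that it also learns the inner step size (per parameter) instead of keeping alpha fixed.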

torchbox.optim.mamls_solver.mamls_valid_epoch(mmodel, mdl, criterions, criterionws=None, nsteps_base=1, epoch=None, logf='terminal', device='cuda:0', **kwargs)

Validate one epoch using MAML or MetaSGD

Parameters:
  • mmodel (Module) – the network model

  • mdl (MetaDataLoader) – the meta dataloader for validation \(\{(x_s, y_s, x_q, y_q)\}\)

  • criterions (list or tuple) – list of loss functions

  • criterionws (list or tuple, optional) – list of float loss weights, by default None

  • nsteps_base (int, optional) – the number of fast-adaptation steps in the inner loop, by default 1

  • epoch (int or None, optional) – current epoch index, by default None

  • logf (str or object, optional) – where to print the log: a file path or 'terminal' (default)

  • device (str, optional) – device for validation, by default 'cuda:0'

  • kwargs – other arguments passed to forward

torchbox.optim.save_load module

torchbox.optim.save_load.device_transfer(obj, name, device)
torchbox.optim.save_load.get_parameters(model, optimizer=None, scheduler=None, epoch=None)

Collect the epoch and the state dicts of the model, optimizer and scheduler into a dict

Parameters:
  • model (object) – the model object

  • optimizer (object or None, optional) – the torch.optim.Optimizer, by default None

  • scheduler (object or None, optional) – th.optim.lr_scheduler, by default None

  • epoch (int or None, optional) – epoch number, by default None

Returns:

A dict with keys 'epoch', 'network' (model.state_dict), 'optimizer' (optimizer.state_dict) and 'scheduler' (scheduler.state_dict).

Return type:

dict

torchbox.optim.save_load.load_model(modelfile, model=None, optimizer=None, scheduler=None, mode='parameter', device='cpu')

Load a model from a file

Parameters:
  • modelfile (str) – the model file path

  • model (object or None) – the model object or None (default)

  • optimizer (object or None, optional) – the torch.optim.Optimizer, by default None

  • scheduler (object or None, optional) – th.optim.lr_scheduler, by default None

  • mode (str, optional) – the mode in which the model was saved: 'model' means the file contains the model structure and parameters, 'parameter' (default) means it contains only the parameters

  • device (str, optional) – load model to the specified device

torchbox.optim.save_load.save_model(modelfile, model, optimizer=None, scheduler=None, epoch=None, mode='parameter')

Save a model to a file

Parameters:
  • modelfile (str) – model file path

  • model (object) – the model object or parameter dict

  • optimizer (object or None, optional) – the torch.optim.Optimizer, by default None

  • scheduler (object or None, optional) – th.optim.lr_scheduler, by default None

  • epoch (int or None, optional) – epoch number, by default None

  • mode (str, optional) – saving mode, 'model' means saving model structure and parameters, 'parameter' means only saving parameters (default)

Returns:

0 on success

Return type:

int
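The exact file layout written by save_model is not spelled out here. Assuming that 'parameter' mode stores a dict like the one described for get_parameters (keys 'epoch', 'network', 'optimizer', 'scheduler'), a plain-PyTorch round trip of such a checkpoint could look like the sketch below. It uses only torch.save, torch.load and load_state_dict, not torchbox's functions; the model, optimizer and file name are hypothetical stand-ins, and map_location plays the role that load_model's device argument presumably plays.

```python
import os
import tempfile

import torch as th

# hypothetical model and optimizer standing in for your own
model = th.nn.Linear(4, 2)
optimizer = th.optim.Adam(model.parameters(), lr=1e-3)

# checkpoint dict with keys like those documented for get_parameters
checkpoint = {
    'epoch': 10,
    'network': model.state_dict(),
    'optimizer': optimizer.state_dict(),
}
path = os.path.join(tempfile.mkdtemp(), 'checkpoint.pth')
th.save(checkpoint, path)

# restore into freshly constructed objects, loading tensors onto the CPU
model2 = th.nn.Linear(4, 2)
optimizer2 = th.optim.Adam(model2.parameters(), lr=1e-3)
state = th.load(path, map_location='cpu')
model2.load_state_dict(state['network'])
optimizer2.load_state_dict(state['optimizer'])
start_epoch = state['epoch'] + 1  # resume from the next epoch
```

Storing only state dicts (rather than the whole module object) keeps the file independent of the model's class definition at save time, which is the usual reason to prefer a 'parameter'-style checkpoint.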

torchbox.optim.solver module

torchbox.optim.solver.demo_epoch(model, x, bs, logf='stdout', device='cuda:0', **kwargs)

Test one epoch

Parameters:
  • model (Module) – an instance of torch.nn.Module

  • x (tensor) – the input data

  • bs (int) – batch size

  • logf (str or object, optional) – where to print the log: a file object or 'stdout' (default)

  • device (str, optional) – device for testing, by default 'cuda:0'

  • kwargs – other arguments passed to forward

See also: train_epoch(), valid_epoch(), save_model(), load_model().

torchbox.optim.solver.test_epoch(model, dl, criterions, criterionws=None, epoch=None, logf='stdout', device='cuda:0', **kwargs)

Test one epoch

Parameters:
  • model (Module) – an instance of torch.nn.Module

  • dl (DataLoader) – the testing dataloader

  • criterions (list or tuple) – list of loss functions

  • criterionws (list or tuple, optional) – list of float loss weights, by default None

  • epoch (int or None, optional) – epoch index, by default None

  • logf (str or object, optional) – where to print the log: a file object or 'stdout' (default)

  • device (str, optional) – device for testing, by default 'cuda:0'

  • kwargs – other arguments passed to forward

See also: train_epoch(), valid_epoch(), save_model(), load_model().

torchbox.optim.solver.train_epoch(model, dl, criterions, criterionws=None, optimizer=None, scheduler=None, epoch=None, logf='stdout', device='cuda:0', **kwargs)

Train one epoch

Parameters:
  • model (Module) – an instance of torch.nn.Module

  • dl (DataLoader) – the dataloader for training

  • criterions (list or tuple) – list of loss functions

  • criterionws (list or tuple, optional) – list of float loss weights, by default None

  • optimizer (Optimizer or None, optional) – an instance of torch.optim.Optimizer; None (default) means th.optim.Adam(model.parameters(), lr=0.001)

  • scheduler (LrScheduler or None, optional) – a learning rate scheduler from torch.optim.lr_scheduler; None (default) means a fixed learning rate

  • epoch (int or None, optional) – epoch index, by default None

  • logf (str or object, optional) – where to print the log: a file object or 'stdout' (default)

  • device (str, optional) – device for training, by default 'cuda:0'

  • kwargs – other arguments passed to forward

See also: valid_epoch(), test_epoch(), save_model(), load_model().
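train_epoch, valid_epoch and test_epoch take a list of criterions together with per-criterion weights. Presumably the total loss is the weighted sum of the individual losses; the sketch below shows that combination, assuming every weight is 1.0 when criterionws is None (not confirmed on this page). weighted_loss, mse and mae are hypothetical helpers; the same function works unchanged with PyTorch loss modules and tensors.

```python
def weighted_loss(criterions, criterionws, output, target):
    """Combine losses as sum_k criterionws[k] * criterions[k](output, target).

    Assumption: criterionws=None means every weight is 1.0.
    """
    if criterionws is None:
        criterionws = [1.0] * len(criterions)
    if len(criterionws) != len(criterions):
        raise ValueError('criterions and criterionws must have the same length')
    return sum(w * crit(output, target) for crit, w in zip(criterions, criterionws))


# plain-Python stand-ins for loss functions, for illustration only
def mse(pred, true):
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)


def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


total = weighted_loss([mse, mae], [1.0, 0.5], [1.0, 2.0], [1.0, 4.0])  # 2.0 + 0.5
```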

torchbox.optim.solver.valid_epoch(model, dl, criterions, criterionws=None, epoch=None, logf='stdout', device='cuda:0', **kwargs)

Validate one epoch

Parameters:
  • model (Module) – an instance of torch.nn.Module

  • dl (DataLoader) – the validation dataloader

  • criterions (list or tuple) – list of loss functions

  • criterionws (list or tuple, optional) – list of float loss weights, by default None

  • epoch (int or None, optional) – epoch index, by default None

  • logf (str or object, optional) – where to print the log: a file object or 'stdout' (default)

  • device (str, optional) – device for validation, by default 'cuda:0'

  • kwargs – other arguments passed to forward

See also: train_epoch(), test_epoch(), save_model(), load_model().

Module contents