
Today I noticed that a senior labmate's code used the amp package (from NVIDIA's apex), but I ran into problems getting apex to work. I then learned that PyTorch has integrated amp natively, so I studied how to use amp in PyTorch.

Official docs: https://pytorch.org/docs/stable/amp.html?highlight=amp

torch.cuda.amp

Purpose:

torch.cuda.amp provides convenient methods for mixed-precision training, which speeds up training. Some operations in a network, such as linear layers and convolutions, run much faster in float16, while others, such as reductions, need the dynamic range of float32. Mixed precision matches each operation to the most suitable precision wherever possible.
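
To see this per-op dtype selection in action, here is a minimal sketch (the tensor names are made up for illustration, and it needs a CUDA device): under autocast, a matmul runs in float16 while a reduction such as sum is kept in float32.

import torch
from torch.cuda.amp import autocast

a = torch.randn(8, 8, device="cuda")  # float32 by default
b = torch.randn(8, 8, device="cuda")

with autocast():
    c = torch.mm(a, b)  # matmul autocasts to float16
    s = c.sum()         # reductions autocast to float32

print(c.dtype)  # torch.float16
print(s.dtype)  # torch.float32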

Usage:

from torch.cuda.amp import autocast, GradScaler
  • Typical Mixed Precision Training
# Create model and optimizer in default (float32) precision.
model = Net().cuda()
optimizer = optim.SGD(model.parameters(), ...)

# Create a GradScaler once, at the beginning of training.
scaler = GradScaler()

for epoch in epochs:
    for input, target in data:
        optimizer.zero_grad()

        # Run the forward pass under autocast.
        with autocast():
            output = model(input)
            loss = loss_fn(output, target)

        # Scale the loss and call backward() on the scaled loss.
        scaler.scale(loss).backward()

        # step() first unscales the gradients, then calls optimizer.step()
        # (skipped if the gradients contain infs or NaNs).
        scaler.step(optimizer)

        # Update the scale factor for the next iteration.
        scaler.update()

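The scaling step matters because float16 has a narrow dynamic range: gradients with small magnitudes can underflow to zero. Scaling the loss scales every gradient by the same factor, keeping them representable; scaler.step() unscales the gradients before the parameter update and skips the update if any gradient contains infs or NaNs, and scaler.update() then adapts the scale factor for the next iteration.
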
        
  • Working with Unscaled Gradients
scaler = GradScaler()

for epoch in epochs:
    for input, target in data:
        optimizer.zero_grad()

        with autocast():
            output = model(input)
            loss = loss_fn(output, target)

        scaler.scale(loss).backward()

        # Unscale the gradients of the optimizer's assigned params in place.
        scaler.unscale_(optimizer)

        # The gradients are now unscaled, so clip as usual.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

        scaler.step(optimizer)
        scaler.update()
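
Two caveats: clipping must operate on unscaled gradients (otherwise max_norm would effectively be scaled as well), and unscale_ should be called at most once per optimizer per step() call, only after all gradients for that optimizer's parameters have been accumulated.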
  • Working with Scaled Gradients (gradient accumulation)
scaler = GradScaler()

for epoch in epochs:
    for i, (input, target) in enumerate(data):
        with autocast():
            output = model(input)
            loss = loss_fn(output, target)
            # Normalize the loss so the accumulated gradients match
            # those of one large batch.
            loss = loss / iters_to_accumulate

        # Accumulate scaled gradients.
        scaler.scale(loss).backward()

        if (i + 1) % iters_to_accumulate == 0:
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()

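Here the gradients accumulate in scaled form across iters_to_accumulate iterations; because update() only runs after a full effective batch, the scale factor stays constant while they accumulate. If you also want to clip, call scaler.unscale_(optimizer) right before scaler.step(optimizer) inside the if block.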

  • Working with Multiple Models, Losses, and Optimizers

scaler = GradScaler()

for epoch in epochs:
    for input, target in data:
        optimizer0.zero_grad()
        optimizer1.zero_grad()

        with autocast():
            output0 = model0(input)
            output1 = model1(input)
            loss0 = loss_fn(2 * output0 + 3 * output1, target)
            loss1 = loss_fn(3 * output0 - 5 * output1, target)

        # retain_graph is unrelated to amp; it is needed because both
        # backward() calls share parts of the graph.
        scaler.scale(loss0).backward(retain_graph=True)
        scaler.scale(loss1).backward()

        # Explicit unscaling is optional; use it only for optimizers whose
        # gradients you want to inspect or modify before stepping.
        scaler.unscale_(optimizer0)

        scaler.step(optimizer0)
        scaler.step(optimizer1)

        # Call update() once, after all optimizers have stepped.
        scaler.update()
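
Each scaler.step(optimizer) checks that optimizer's gradients for infs/NaNs independently, so one optimizer may skip its step in an iteration while the other proceeds; scaler.update() must be called only once, after all optimizers used in the iteration have been stepped.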

 
