
I looked around and found that material on the original GPT-2 is fairly scarce, and the source code itself carries few comments, so I read through the code myself and annotated it. This post records those notes.

GPT-2 overview:

GPT-2 is an open-source deep learning model developed by OpenAI. It is based on the Transformer architecture and uses only the Transformer's decoder stack. Source code: https://github.com/openai/gpt-2

Using GPT-2:

1. After cloning gpt-2, first download a pre-trained model using download_model.py; run in a terminal:

python3 download_model.py 124M 

This downloads the 124M model; other sizes (355M, 774M, 1558M) are available too. Note that "M" refers to the number of parameters in millions, not megabytes.
These are the downloaded files:

encoder.py uses the following files (see the loading sketch after this list):
encoder.json: the token-to-id vocabulary mapping
vocab.bpe: the BPE merge rules

Hyperparameters:
hparams.json

Pre-trained model:
checkpoint
model.ckpt.data-00000-of-00001  # stores the variable (weight) values
model.ckpt.index  # index mapping variable names into the data file
model.ckpt.meta  # stores the graph structure
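
As a quick sanity check, here is a small sketch of my own (not part of the repo) that loads these files the same way the sample script below does; it assumes you run it from the repo root:

import json
import os
import sys

sys.path.append('src')          # the repo keeps encoder.py and model.py under src/
import encoder
import model

models_dir = 'models'
model_name = '124M'

# encoder.json + vocab.bpe -> BPE tokenizer
enc = encoder.get_encoder(model_name, models_dir)
ids = enc.encode("Hello world")
print(ids)                      # the prompt as a short list of token ids
print(enc.decode(ids))          # round-trips back to "Hello world"

# hparams.json -> overrides the defaults in model.py
hparams = model.default_hparams()
with open(os.path.join(models_dir, model_name, 'hparams.json')) as f:
    hparams.override_from_dict(json.load(f))
print(hparams.n_vocab, hparams.n_ctx)   # 50257 1024 for the 124M model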

2. Find interactive_conditional_samples.py under /gpt-2/src/ and run it directly to start generating text.
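
Because the script passes interact_model to fire.Fire, each keyword argument is also exposed as a command-line flag, so a typical invocation looks like:

python3 src/interactive_conditional_samples.py --model_name=124M --top_k=40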

GPT-2 code annotations:

Taking interactive_conditional_samples.py as the example:
interactive_conditional_samples.py

#!/usr/bin/env python3

import fire
import json
import os
import numpy as np
import tensorflow as tf

import model, sample, encoder

def interact_model(
    model_name='124M',
    seed=None,
    nsamples=2,
    batch_size=2,
    length=None,
    temperature=1,  # sampling temperature: the higher, the more random the output
    top_k=0,  # keep only the k most likely tokens at each step (0 = no restriction)
    top_p=1,  # nucleus-sampling threshold (1 = no restriction)
    models_dir='models',
):
    """
    Interactively run the model
    :model_name=124M : String, which model to use
    :seed=None : Integer seed for random number generators, fix seed to reproduce
     results
    :nsamples=2 : Number of samples to return total
    :batch_size=2 : Number of batches (only affects speed/memory).  Must divide nsamples.
    :length=None : Number of tokens in generated text, if None (default), is
     determined by model hyperparameters
    :temperature=1 : Float value controlling randomness in boltzmann
     distribution. Lower temperature results in less random completions. As the
     temperature approaches zero, the model will become deterministic and
     repetitive. Higher temperature results in more random completions.
    :top_k=0 : Integer value controlling diversity. 1 means only 1 word is
     considered for each step (token), resulting in deterministic completions,
     while 40 means 40 words are considered at each step. 0 (default) is a
     special setting meaning no restrictions. 40 generally is a good value.
    :top_p=1 : Float value for nucleus sampling: restricts choices to the
     smallest set of tokens whose cumulative probability reaches top_p.
     1 (default) means no restriction.
    :models_dir : path to parent folder containing model subfolders
     (i.e. contains the <model_name> folder)
    """
    models_dir = os.path.expanduser(os.path.expandvars(models_dir))
    if batch_size is None:
        batch_size = 1
    assert nsamples % batch_size == 0

    enc = encoder.get_encoder(model_name, models_dir)   # load encoder.json + vocab.bpe into a BPE tokenizer
    hparams = model.default_hparams()
    with open(os.path.join(models_dir, model_name, 'hparams.json')) as f:
        hparams.override_from_dict(json.load(f))  # override the defaults with the values from hparams.json

    if length is None:  # length: number of tokens to generate
        length = hparams.n_ctx // 2  # default to half the context window
    elif length > hparams.n_ctx:
        raise ValueError("Can't get samples longer than window size: %s" % hparams.n_ctx)

    with tf.Session(graph=tf.Graph()) as sess:
        context = tf.placeholder(tf.int32, [batch_size, None])
        np.random.seed(seed)
        tf.set_random_seed(seed)
        output = sample.sample_sequence(
            hparams=hparams, length=length,
            context=context,
            batch_size=batch_size,
            temperature=temperature, top_k=top_k, top_p=top_p
        )

        saver = tf.train.Saver()   # Saver restores (and can save) model weights
        ckpt = tf.train.latest_checkpoint(os.path.join(models_dir, model_name))  # locate the newest checkpoint
        saver.restore(sess, ckpt)

        while True:
            raw_text = input("Model prompt >>> ")
            while not raw_text:
                print('Prompt should not be empty!')
                raw_text = input("Model prompt >>> ")
            context_tokens = enc.encode(raw_text)  # encode the prompt into BPE token ids
            generated = 0
            for _ in range(nsamples // batch_size):
                out = sess.run(output, feed_dict={
                    context: [context_tokens for _ in range(batch_size)]
                })[:, len(context_tokens):]   # run the sampling graph, then strip the prompt tokens from the output
                for i in range(batch_size):
                    generated += 1
                    text = enc.decode(out[i])
                    print("=" * 40 + " SAMPLE " + str(generated) + " " + "=" * 40)
                    print(text)
            print("=" * 80)

if __name__ == '__main__':
    fire.Fire(interact_model)
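
To get a feel for the temperature and top_k parameters documented above, here is a small self-contained numpy sketch of my own (not from the repo) that mimics what the sampler does to a toy logit vector:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([3.0, 2.0, 1.0, 0.5])

# temperature: dividing by T > 1 flattens the distribution, T < 1 sharpens it
for T in (0.5, 1.0, 2.0):
    print(T, softmax(logits / T).round(3))

# top_k = 2: push everything outside the 2 largest logits down to -1e10,
# so softmax gives those tokens essentially zero probability
k = 2
kth_largest = np.sort(logits)[-k]                 # value of the k-th largest logit
filtered = np.where(logits < kth_largest, -1e10, logits)
print(softmax(filtered).round(3))                 # only the two top tokens keep mass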


sample.py

import tensorflow as tf

import model

def top_k_logits(logits, k):  # top-k filtering: keep only the k most likely logits
    if k == 0:
        # no truncation
        return logits

    def _top_k():
        values, _ = tf.nn.top_k(logits, k=k)
        min_values = values[:, -1, tf.newaxis]
        return tf.where(
            logits < min_values,
            tf.ones_like(logits, dtype=logits.dtype) * -1e10,
            logits
        )
    return tf.cond(
       tf.equal(k, 0),
       lambda: logits,
       lambda: _top_k(),
    )
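
# Illustrative note (not in the original source): with k=2 and
# logits = [[1.0, 3.0, 2.0, 0.5]], tf.nn.top_k returns values [[3.0, 2.0]],
# so min_values = 2.0 and the result is [[-1e10, 3.0, 2.0, -1e10]];
# after softmax, only those two tokens can ever be sampled.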


def top_p_logits(logits, p):  # top-p (nucleus) filtering; p == 1 leaves the logits unchanged
    """Nucleus sampling"""
    batch, _ = logits.shape.as_list()
    sorted_logits = tf.sort(logits, direction='DESCENDING', axis=-1)
    cumulative_probs = tf.cumsum(tf.nn.softmax(sorted_logits, axis=-1), axis=-1)
    indices = tf.stack([
        tf.range(0, batch),
        # number of indices to include
        tf.maximum(tf.reduce_sum(tf.cast(cumulative_probs <= p, tf.int32), axis=-1) - 1, 0),
    ], axis=-1)
    min_values = tf.gather_nd(sorted_logits, indices)  # per batch row, the logit value at the nucleus cutoff
    return tf.where(  # tf.where: where the condition is True take x, otherwise take y
        logits < min_values,
        tf.ones_like(logits) * -1e10,
        logits,
    )


def sample_sequence(*, hparams, length, start_token=None, batch_size=None, context=None, temperature=1, top_k=0, top_p=1):
    if start_token is None:
        assert context is not None, 'Specify exactly one of start_token and context!'
    else:
        assert context is None, 'Specify exactly one of start_token and context!'
        context = tf.fill([batch_size, 1], start_token)  # fill a [batch_size, 1] tensor with the start token

    def step(hparams, tokens, past=None):
        lm_output = model.model(hparams=hparams, X=tokens, past=past, reuse=tf.AUTO_REUSE)   # tf.AUTO_REUSE shares the variable scope, so every step reuses the same weights

        logits = lm_output['logits'][:, :, :hparams.n_vocab]
        presents = lm_output['present']
        presents.set_shape(model.past_shape(hparams=hparams, batch_size=batch_size))
        return {
            'logits': logits,
            'presents': presents,
        }

    with tf.name_scope('sample_sequence'):
        def body(past, prev, output):
            next_outputs = step(hparams, prev, past=past)  # shape=(1, ?, 50257)
            logits = next_outputs['logits'][:, -1, :]  / tf.to_float(temperature)  # keep only the last position's logits (the next-token distribution), scaled by temperature
            logits = top_k_logits(logits, k=top_k)
            logits = top_p_logits(logits, p=top_p)   # after filtering, only the surviving logits can receive probability mass
            samples = tf.multinomial(logits, num_samples=1, output_dtype=tf.int32)  # sample exactly one token id per batch row
            return [
                next_outputs['presents'] if past is None else tf.concat([past, next_outputs['presents']], axis=-2), # 'presents' is each layer's [k, v]; append it to the running cache
                samples,
                tf.concat([output, samples], axis=1)
            ]

        past, prev, output = body(None, context, context)

        def cond(*args):
            return True

        _, _, tokens = tf.while_loop(  # loop_vars are both each iteration's outputs and the next iteration's inputs
            cond=cond, body=body,
            maximum_iterations=length - 1,
            loop_vars=[
                past,
                prev,
                output
            ],
            shape_invariants=[
                tf.TensorShape(model.past_shape(hparams=hparams, batch_size=batch_size)),
                tf.TensorShape([batch_size, None]),
                tf.TensorShape([batch_size, None]),
            ],
            back_prop=False,
        )

        return tokens
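
The nucleus cutoff in top_p_logits is easiest to see in a quick numpy re-enactment (my own sketch, same logic as the TF code above):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([3.0, 2.0, 1.0, 0.5])
p = 0.9

sorted_logits = np.sort(logits)[::-1]             # descending, like tf.sort
cum_probs = np.cumsum(softmax(sorted_logits))     # ≈ [0.63, 0.86, 0.95, 1.0]
cutoff_idx = max((cum_probs <= p).sum() - 1, 0)   # the index tf.gather_nd picks above
min_value = sorted_logits[cutoff_idx]             # here 2.0: keep the top two tokens
filtered = np.where(logits < min_value, -1e10, logits)
print(filtered)                                   # [3.0, 2.0, -1e10, -1e10]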


model.py

import numpy as np
import tensorflow as tf
from tensorflow.contrib.training import HParams

def default_hparams():
    return HParams(
        n_vocab=0,   # vocabulary size; overridden by hparams.json (50257 for the released models)
        n_ctx=1024,  # maximum context window, in tokens
        n_embd=768,  # embedding / hidden dimension
        n_head=12,   # number of attention heads
        n_layer=12,  # number of transformer blocks
    )

def shape_list(x):
    """Deal with dynamic shape in tensorflow cleanly."""
    static = x.shape.as_list()
    dynamic = tf.shape(x)
    return [dynamic[i] if s is None else s for i, s in enumerate(static)]  # use the static size where known, fall back to the dynamic shape otherwise

def softmax(x, axis=-1):
    x = x - tf.reduce_max(x, axis=axis, keepdims=True)
    ex = tf.exp(x)
    return ex / tf.reduce_sum(ex, axis=axis, keepdims=True)

def gelu(x):
    # tanh approximation of the Gaussian Error Linear Unit (GELU) activation
    return 0.5*x*(1+tf.tanh(np.sqrt(2/np.pi)*(x+0.044715*tf.pow(x, 3))))

def norm(x, scope, *, axis=-1, epsilon=1e-5):  # layer normalization
    """Normalize to mean = 0, std = 1, then do a diagonal affine transform."""
    with tf.variable_scope(scope):
        n_state = x.shape[-1].value
        g = tf.get_variable('g', [n_state], initializer=tf.constant_initializer(1))
        b = tf.get_variable('b', [n_state], initializer=tf.constant_initializer(0))
        u = tf.reduce_mean(x, axis=axis, keepdims=True)
        s = tf.reduce_mean(tf.square(x-u), axis=axis, keepdims=True)
        x = (x - u) * tf.rsqrt(s + epsilon)
        x = x*g + b
        return x

def split_states(x, n):
    """Reshape the last dimension of x into [n, x.shape[-1]/n]."""
    *start, m = shape_list(x)  # *start captures every dimension except the last
    return tf.reshape(x, start + [n, m//n])  # prepares the split into attention heads

def merge_states(x):
    """Smash the last two dimensions of x into a single dimension."""
    *start, a, b = shape_list(x)
    return tf.reshape(x, start + [a*b])

def conv1d(x, scope, nf, *, w_init_stdev=0.02):
    with tf.variable_scope(scope):
        *start, nx = shape_list(x)
        w = tf.get_variable('w', [1, nx, nf], initializer=tf.random_normal_initializer(stddev=w_init_stdev))
        b = tf.get_variable('b', [nf], initializer=tf.constant_initializer(0))
        c = tf.reshape(tf.matmul(tf.reshape(x, [-1, nx]), tf.reshape(w, [-1, nf]))+b, start+[nf])
        return c

def attention_mask(nd, ns, *, dtype):
    """1's in the lower triangle, counting from the lower right corner.

    Same as tf.matrix_band_part(tf.ones([nd, ns]), -1, ns-nd), but doesn't produce garbage on TPUs.
    """
    i = tf.range(nd)[:,None]
    j = tf.range(ns)
    m = i >= j - ns + nd
    return tf.cast(m, dtype)
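
# Illustrative note (not in the original source): attention_mask(nd=3, ns=5) is
#   [[1, 1, 1, 0, 0],
#    [1, 1, 1, 1, 0],
#    [1, 1, 1, 1, 1]]
# i.e. with a cached past of length ns - nd = 2, each new query position may
# attend to every past position and to itself, but never to future positions.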


def attn(x, scope, n_state, *, past, hparams):  # masked (decoder-style) multi-head self-attention
    assert x.shape.ndims == 3  # Should be [batch, sequence, features]
    assert n_state % hparams.n_head == 0
    if past is not None:
        assert past.shape.ndims == 5  # Should be [batch, 2, heads, sequence, features], where 2 is [k, v]

    def split_heads(x):
        # From [batch, sequence, features] to [batch, heads, sequence, features]
        return tf.transpose(split_states(x, hparams.n_head), [0, 2, 1, 3])  # tf.transpose permutes the input's dimensions into the given order

    def merge_heads(x):
        # Reverse of split_heads
        return merge_states(tf.transpose(x, [0, 2, 1, 3]))

    def mask_attn_weights(w):
        # w has shape [batch, heads, dst_sequence, src_sequence], where information flows from src to dst.
        _, _, nd, ns = shape_list(w)
        b = attention_mask(nd, ns, dtype=w.dtype)
        b = tf.reshape(b, [1, 1, nd, ns])
        w = w*b - tf.cast(1e10, w.dtype)*(1-b)
        return w

    def multihead_attn(q, k, v):
        # q, k, v have shape [batch, heads, sequence, features]
        w = tf.matmul(q, k, transpose_b=True)  # k is transposed here: attention scores q @ k^T
        w = w * tf.rsqrt(tf.cast(v.shape[-1].value, w.dtype))

        w = mask_attn_weights(w)
        w = softmax(w)
        a = tf.matmul(w, v) # w @ v collapses the src dimension, so a has q's shape even when past extends k and v
        return a

    with tf.variable_scope(scope):
        c = conv1d(x, 'c_attn', n_state*3)  # shape=(?, ?, 2304): one projection to 3*n_state that tf.split below separates into q, k, v
        q, k, v = map(split_heads, tf.split(c, 3, axis=2))   # list(map(square, [1,2,3,4,5])) -> [1, 4, 9, 16, 25]
        present = tf.stack([k, v], axis=1)
        if past is not None:
            pk, pv = tf.unstack(past, axis=1)
            k = tf.concat([pk, k], axis=-2)
            v = tf.concat([pv, v], axis=-2)
        a = multihead_attn(q, k, v)  # a shape=(?, 12, ?, 64)
        a = merge_heads(a)  # merge the attention heads back together
        a = conv1d(a, 'c_proj', n_state) # output projection (conv1d here is just a position-wise linear layer)
        return a, present  # attention output, plus this step's [k, v] for caching


def mlp(x, scope, n_state, *, hparams):
    with tf.variable_scope(scope):
        nx = x.shape[-1].value
        h = gelu(conv1d(x, 'c_fc', n_state))
        h2 = conv1d(h, 'c_proj', nx)
        return h2


def block(x, scope, *, past, hparams):
    with tf.variable_scope(scope):
        nx = x.shape[-1].value  ## shape [None, None, 768]
        a, present = attn(norm(x, 'ln_1'), 'attn', nx, past=past, hparams=hparams)
        x = x + a
        m = mlp(norm(x, 'ln_2'), 'mlp', nx*4, hparams=hparams)  # position-wise MLP (feed-forward block), 4x wider inside
        x = x + m
        return x, present

def past_shape(*, hparams, batch_size=None, sequence=None):
    return [batch_size, hparams.n_layer, 2, hparams.n_head, sequence, hparams.n_embd // hparams.n_head]

def expand_tile(value, size):
    """Add a new axis of given size."""
    value = tf.convert_to_tensor(value, name='value')
    ndims = value.shape.ndims  # number of dimensions
    return tf.tile(tf.expand_dims(value, axis=0), [size] + [1]*ndims)  # tf.expand_dims adds a leading axis; tf.tile then copies the value 'size' times along it ('+' concatenates the two lists)

def positions_for(tokens, past_length):  # absolute positions of the new tokens, offset by the cached past length
    batch_size = tf.shape(tokens)[0]  # dynamic batch size
    nsteps = tf.shape(tokens)[1]
    return expand_tile(past_length + tf.range(nsteps), batch_size) # tf.range(limit, delta=1, dtype=None, name='range');
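
# Illustrative note (not in the original source): with past_length=2 and a
# 3-token input, positions_for yields [2, 3, 4] for every batch row; new
# tokens continue numbering where the cached past left off.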


def model(hparams, X, past=None, scope='model', reuse=False):
    with tf.variable_scope(scope, reuse=reuse):  # variable scope 'model', optionally reusing existing variables
        results = {}
        batch, sequence = shape_list(X)  # batch and sequence adapt to X's (possibly dynamic) shape

        wpe = tf.get_variable('wpe', [hparams.n_ctx, hparams.n_embd],  # positional-embedding weights
                             initializer=tf.random_normal_initializer(stddev=0.01))  # get_variable reuses an existing variable instead of redefining it
        wte = tf.get_variable('wte', [hparams.n_vocab, hparams.n_embd],  # token-embedding weights
                             initializer=tf.random_normal_initializer(stddev=0.02))
        past_length = 0 if past is None else tf.shape(past)[-2]
        h = tf.gather(wte, X) + tf.gather(wpe, positions_for(X, past_length)) # tf.gather(params, indices) takes slices of params at the given indices: here, token embeddings plus positional embeddings

        # Transformer
        presents = []
        pasts = tf.unstack(past, axis=1) if past is not None else [None] * hparams.n_layer   # split the cached past into one [k, v] tensor per layer
        assert len(pasts) == hparams.n_layer
        for layer, past in enumerate(pasts):
            h, present = block(h, 'h%d' % layer, past=past, hparams=hparams)
            presents.append(present)
        results['present'] = tf.stack(presents, axis=1)
        h = norm(h, 'ln_f')   # shape=(?, ?, 768)

        # Language model loss.  Do tokens <n predict token n?
        h_flat = tf.reshape(h, [batch*sequence, hparams.n_embd])
        logits = tf.matmul(h_flat, wte, transpose_b=True)  # decode by reusing the token-embedding matrix: a score over all n_vocab tokens for every position
        logits = tf.reshape(logits, [batch, sequence, hparams.n_vocab]) # reshape back to [batch, sequence, n_vocab]
        results['logits'] = logits
        return results
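
To close the loop, here is a minimal forward-pass sketch of my own (not part of the repo) showing the shapes involved; it assumes TensorFlow 1.x with src/ on the import path, randomly initialized weights, and n_vocab=50257 as in the released models:

import tensorflow as tf
import model

hparams = model.default_hparams()
hparams.override_from_dict({'n_vocab': 50257})

X = tf.placeholder(tf.int32, [1, None])   # [batch, sequence] of token ids
out = model.model(hparams=hparams, X=X)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())   # random weights, only to inspect shapes
    logits, present = sess.run(
        [out['logits'], out['present']],
        feed_dict={X: [[0, 1, 2]]})
    print(logits.shape)    # (1, 3, 50257): one next-token distribution per position
    print(present.shape)   # (1, 12, 2, 12, 3, 64): per-layer [k, v] cache, cf. past_shape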

