[Completed] cy's Memo (20240609~20240721) | 电子爱好者


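Preface

The first day after the holiday, and the midterm review is over. Everything ahead is about to arrive right on schedule; time cracks like a whip, driving me onto a perch I'm not ready for, and it seems I haven't prepared for any of it.

For the past few days I haven't rested well and my form has slipped, but at least the work at hand has reached a stopping point, so I can take a few days to recover and lie low.

Things have gone much quieter; everyone is busy.

Every feast must end. Yet the crossing points of fate will meet again in the future.

Moved by this, a song.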

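"Ode on Climbing High on the Fifth Day of the Fifth Month", by 囚生 (Qiusheng)
At Duanwu I climb high and pray for blessings; soft words linger round the rafters, yet the illness does not rest.
Flowers bloom, leaves fall, and people slowly wither; the sun sets, the moon rises, and living feels like a prison.
Through thickets a muddy path winds around the graceful peak; distant hills cast lovely silhouettes drifting across the blue sky.
Do you not see that full moons come often? Yet in vain we sigh that a full moon cannot be kept.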

Contents

Preface
20240611
20240612
20240613
20240614
20240615~20240616
20240617
20240618
20240619
20240620
20240621
20240622
20240623
20240624
20240625
20240626
20240627
20240628
20240629
20240630
20240701
20240702
20240703
20240704
20240705~20240706
20240707
20240708
20240709
20240710
20240711
20240712
20240713
20240714
20240715
20240716
20240717
20240718
20240719
20240720
20240721
Afterword

20240611

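The value weight matrix for the first layer and first head is shown below.

v_layer0_head0 = v_layer0[0]
v_layer0_head0.shape
# torch.Size([128, 4096])

Now use the value weights to get the attention values for each token. The result has size [17x128], where 17 is the number of tokens in the prompt and 128 is the dimension of each token's value vector.

v_per_token = torch.matmul(token_embeddings, v_layer0_head0.T)
v_per_token.shape
# torch.Size([17, 128])

Multiplying the masked, softmaxed attention scores by each token's values gives an attention output of shape [17x128].

qkv_attention = torch.matmul(qk_per_token_after_masking_after_softmax, v_per_token)
qkv_attention.shape
# torch.Size([17, 128])

We now have the attention values for the first head of the first layer.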
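To see the whole single-head computation in one place, here is a NumPy sketch with toy sizes (model width 64 and head size 16 instead of 4096 and 128, random weights). It leaves RoPE out and follows the same steps as the cells above: project to q, k, v, scale the dot products, apply the causal mask, softmax each row, and weight the values.

```python
import numpy as np


def causal_attention_one_head(x, wq, wk, wv):
    """x: [seq, dim]; wq/wk/wv: [head_dim, dim], the same layout as v_layer0_head0."""
    q = x @ wq.T  # [seq, head_dim]
    k = x @ wk.T  # [seq, head_dim]
    v = x @ wv.T  # [seq, head_dim]
    scores = q @ k.T / np.sqrt(q.shape[-1])  # [seq, seq], scaled dot products
    # Positions above the diagonal (future tokens) get -inf, so softmax gives them weight 0
    scores = scores + np.triu(np.full(scores.shape, -np.inf), k=1)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights


rng = np.random.default_rng(0)
seq_len, dim, head_dim = 17, 64, 16  # toy sizes, not Llama 3's 4096 / 128
x = rng.standard_normal((seq_len, dim))
wq, wk, wv = (rng.standard_normal((head_dim, dim)) for _ in range(3))
out, w = causal_attention_one_head(x, wq, wk, wv)
print(out.shape)  # (17, 16)
```

Each row of `w` sums to 1 and is zero to the right of the diagonal, which is what the `torch.triu(mask, diagonal=1)` trick achieves in the original cells.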

Next, run a loop that performs exactly the same math as the cells above, but for every head in the first layer.

qkv_attention_store = []
for head in range(n_heads):
    q_layer0_head = q_layer0[head]
    k_layer0_head = k_layer0[head//4]  # key weights are shared across 4 heads
    v_layer0_head = v_layer0[head//4]  # value weights are shared across 4 heads
    q_per_token = torch.matmul(token_embeddings, q_layer0_head.T)
    k_per_token = torch.matmul(token_embeddings, k_layer0_head.T)
    v_per_token = torch.matmul(token_embeddings, v_layer0_head.T)

    # RoPE for queries: view consecutive pairs as complex numbers and rotate them by freqs_cis
    q_per_token_split_into_pairs = q_per_token.float().view(q_per_token.shape[0], -1, 2)
    q_per_token_as_complex_numbers = torch.view_as_complex(q_per_token_split_into_pairs)
    q_per_token_split_into_pairs_rotated = torch.view_as_real(q_per_token_as_complex_numbers * freqs_cis[:len(tokens)])
    q_per_token_rotated = q_per_token_split_into_pairs_rotated.view(q_per_token.shape)

    # Same rotation for keys
    k_per_token_split_into_pairs = k_per_token.float().view(k_per_token.shape[0], -1, 2)
    k_per_token_as_complex_numbers = torch.view_as_complex(k_per_token_split_into_pairs)
    k_per_token_split_into_pairs_rotated = torch.view_as_real(k_per_token_as_complex_numbers * freqs_cis[:len(tokens)])
    k_per_token_rotated = k_per_token_split_into_pairs_rotated.view(k_per_token.shape)

    # Scaled scores, causal mask, softmax, then weight the values
    qk_per_token = torch.matmul(q_per_token_rotated, k_per_token_rotated.T) / (128)**0.5
    mask = torch.full((len(tokens), len(tokens)), float("-inf"), device=tokens.device)
    mask = torch.triu(mask, diagonal=1)
    qk_per_token_after_masking = qk_per_token + mask
    qk_per_token_after_masking_after_softmax = torch.nn.functional.softmax(qk_per_token_after_masking, dim=1).to(torch.bfloat16)
    qkv_attention = torch.matmul(qk_per_token_after_masking_after_softmax, v_per_token)
    qkv_attention_store.append(qkv_attention)

len(qkv_attention_store)
# 32
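The `head//4` indexing relies on grouped-query attention: several query heads share one key/value head. A minimal sketch of that mapping, assuming Llama 3 8B's configuration of 32 query heads (`n_heads`) and 8 key/value heads (`n_kv_heads`), which is what makes the group size 4; `kv_head_for` is a hypothetical helper name.

```python
n_heads, n_kv_heads = 32, 8           # assumed Llama 3 8B values
group_size = n_heads // n_kv_heads    # 4 query heads per K/V head


def kv_head_for(query_head: int) -> int:
    """Index of the shared K/V head used by a query head (the loop's head//4)."""
    return query_head // group_size


mapping = {h: kv_head_for(h) for h in range(n_heads)}
print(mapping[0], mapping[3], mapping[4], mapping[31])  # 0 0 1 7
```

Writing the divisor as `n_heads // n_kv_heads` rather than a literal 4 would keep the loop correct for a model with a different head ratio.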

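We now have a qkv_attention matrix for every one of the 32 heads in the first layer. We are almost at the end: next, concatenate all the attention outputs into one large matrix of size [17x4096].

stacked_qkv_attention = torch.cat(qkv_attention_store, dim=-1)
stacked_qkv_attention.shape
# torch.Size([17, 4096])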
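The loop multiplies by `freqs_cis`, which this excerpt never defines. Below is a NumPy sketch of how rotary-embedding (RoPE) factors are commonly built and applied, assuming Llama 3's `rope_theta = 500000` and a head dimension of 128. It mirrors the `view_as_complex` / `view_as_real` trick with plain complex arithmetic; it is an illustration, not the original notebook's code.

```python
import numpy as np


def rope_freqs_cis(seq_len, head_dim=128, theta=500000.0):
    """Unit complex numbers e^(i * pos * freq), one per (position, dimension pair)."""
    exponents = np.arange(head_dim // 2) / (head_dim // 2)  # 0, 1/64, ..., 63/64
    freqs = 1.0 / (theta ** exponents)                      # [head_dim/2]
    angles = np.outer(np.arange(seq_len), freqs)             # [seq, head_dim/2]
    return np.exp(1j * angles)


def apply_rope(x, freqs_cis):
    """x: [seq, head_dim]. Treat consecutive pairs as complex numbers and rotate them."""
    pairs = x.reshape(x.shape[0], -1, 2)                 # like .view(seq, -1, 2)
    as_complex = pairs[..., 0] + 1j * pairs[..., 1]      # like torch.view_as_complex
    rotated = as_complex * freqs_cis
    return np.stack([rotated.real, rotated.imag], axis=-1).reshape(x.shape)  # like view_as_real + view


rng = np.random.default_rng(0)
q = rng.standard_normal((17, 128))
k = rng.standard_normal((17, 128))
fc = rope_freqs_cis(17)
q_rot, k_rot = apply_rope(q, fc), apply_rope(k, fc)
```

The rotation leaves vector lengths unchanged and leaves position 0 untouched; because only relative angles survive in q·k, the score between positions m and n depends on m - n.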

20240612

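On the evening of the qualifier on the 8th, I made up 8 sets of 30 walking lunges (+20 kg; I finally had a use for the 20 kg plates). The 9th and 10th were busy and I barely ran. By the time last night's meeting ended it was already 9:30 and the track lights were off when I got out, so I went back disappointed. My body really was tired; at least I can finally get a good night's sleep and build my form back up.

Even though I've stayed up late a lot recently, my appetite is great: four dishes a meal. I'm eating a lot and feel a little roll growing around my belly. Add the hot weather, and I can't be bothered with the core routine I used to do every night, so my abs seem to be fading away again. I also haven't done pull-ups for ages and I'm back to grinding out only one or two. Ouch. At least I've tried hard to control my eating, not snacking much before bed, and my meals are fairly regular; but I'm badly short on sleep and my sleep quality has been poor.

Today's weather was hard to describe: overcast most of the day and stifling. I was hoping it would just rain, but instead the sun came out in the afternoon. How miserable! The lab was full of people coughing; I felt I was about to get heat-sick myself and drank water like crazy.

In the evening I still pushed myself through a hard session, since I hadn't trained seriously for many days. The first three laps felt very comfortable, as if I could hold 3'50" pace for the full 10 km, but my cardio soon gave out and it turned into four interval segments (3K@3'49" + 2K@3'47" + 2K@4'15" + 3K@3'48"), with my heart rate fully recovering between them. Honestly it was a struggle; I was soaked through, and the dread of last summer's training came flooding back. My ankle also felt slightly off afterwards. I should run this intensity less, since it's a bit injurious, and once the plum rain season officially starts I'll certainly have to stop running for a good while.

PS: I caught a strong runner, AX, a second-year PhD at the School of Statistics with a 3:29 full marathon. His form is textbook and he's clearly been running for years; he must be capable of sub-40 for 10K. I asked him to join the Gaobai (intercollegiate 100-mile relay) regional race in the second half of the year and he happily agreed. He looks a lot like my undergrad roommate YC; maybe he's from Qinghai too.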

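The output weight matrix is one of the final steps.

The last thing to do for layer-0 attention is to multiply by the following weight matrix.

w_layer0 = model["layers.0.attention.wo.weight"]
w_layer0.shape
# torch.Size([4096, 4096])

This is a simple linear layer, so we just do a matrix multiplication (matmul).

embedding_delta = torch.matmul(stacked_qkv_attention, w_layer0.T)
embedding_delta.shape
# torch.Size([17, 4096])

We now have the change in the embeddings produced by attention; it should be added to the original token embeddings.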

embedding_after_edit = token_embeddings_unnormalized + embedding_delta
embedding_after_edit.shape
# torch.Size ([17, 4096])

Normalize the edited embedding, then run it through a feed-forward neural network.

embedding_after_edit_normalized = rms_norm(embedding_after_edit, model["layers.0.ffn_norm.weight"])
embedding_after_edit_normalized.shape
# torch.Size([17, 4096])
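`rms_norm` is called here but not defined in this excerpt. A minimal NumPy sketch of the usual RMSNorm formula, x / sqrt(mean(x²) + eps) · weight, assuming eps = 1e-5 (Llama 3's `norm_eps`); the helper actually used may differ in details such as dtype handling.

```python
import numpy as np


def rms_norm(x, weight, eps=1e-5):
    """Divide each row by its root-mean-square, then scale by a learned per-feature weight."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * weight


rng = np.random.default_rng(1)
x = rng.standard_normal((17, 64))      # toy [tokens, dim]
y = rms_norm(x, np.ones(64))
print(np.mean(y ** 2, axis=-1)[:3])   # each close to 1.0
```

Unlike LayerNorm there is no mean subtraction and no bias; only the scale of each token vector is normalized.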

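Load the feed-forward weights and implement the feed-forward network.

Llama 3 uses a SwiGLU feed-forward network, an architecture that is very good at adding non-linearity where the model needs it. Using this kind of feed-forward network is fairly standard in LLMs today.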
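A NumPy sketch of the SwiGLU block, (silu(x·w1ᵀ) ⊙ x·w3ᵀ)·w2ᵀ, using the same weight layout as the `w1`, `w2`, `w3` tensors loaded next. The sizes (width 64, hidden 256) and random weights are placeholders, not Llama 3's.

```python
import numpy as np


def silu(x):
    """SiLU / swish: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def swiglu_ffn(x, w1, w2, w3):
    """x: [seq, dim]; w1, w3: [hidden, dim]; w2: [dim, hidden]."""
    gate = silu(x @ w1.T)      # [seq, hidden], the non-linear gate
    up = x @ w3.T              # [seq, hidden], the linear branch
    return (gate * up) @ w2.T  # [seq, dim], project back down


rng = np.random.default_rng(2)
dim, hidden = 64, 256                      # toy sizes
x = rng.standard_normal((17, dim))
w1 = rng.standard_normal((hidden, dim)) * 0.1
w3 = rng.standard_normal((hidden, dim)) * 0.1
w2 = rng.standard_normal((dim, hidden)) * 0.1
ffn_out = swiglu_ffn(x, w1, w2, w3)
print(ffn_out.shape)  # (17, 64)
```

The gate branch decides, element by element, how much of the linear branch passes through; that multiplicative gating is where the extra non-linearity comes from.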

20240613

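My Achilles tendon was clearly sore today; the injury hasn't fully healed. Maybe it's no big deal, but for now I can't run at last night's intensity again. Slower is better.

I planned to go out and indulge tonight, so I went for a run a bit after 5 pm. The sun hadn't set and the heat was oppressive; even 5'40" pace felt heavy. Soon Wu Zheng came over and I ran with him for half an hour. Having company really makes it much more comfortable: we gradually sped up to under 4'50", my heart rate stayed well under control, and I finished 7.09 km at a 5'08" average with an average heart rate of 138 bpm. No complaints. This rhythm is really comfortable, but running alone is truly boring.

I'll probably have to ease off for a few days and take it slowly. I always feel my dream is drifting further away.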

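w1 = model["layers.0.feed_forward.w1.weight"]
w2 = model["layers.0.feed_forward.w2.weight"]
w3 = model["layers.0.feed_forward.w3.weight"]
# SwiGLU: (silu(x @ w1.T) * (x @ w3.T)) @ w2.T
output_after_feedforward = torch.matmul(
    torch.nn.functional.silu(torch.matmul(embedding_after_edit_normalized, w1.T))
    * torch.matmul(embedding_after_edit_normalized, w3.T),
    w2.T,
)
output_after_feedforward.shape
# torch.Size([17, 4096])

We finally have new, edited embeddings for every token after the first layer, and only 31 layers remain before we are done (one for loop away).

You can think of this edited embedding as carrying information about all the queries asked at the first layer. Each successive layer encodes increasingly complex queries about the question being asked, until we get an embedding that knows everything about the next token we need.

layer_0_embedding = embedding_after_edit + output_after_feedforward
layer_0_embedding.shape
# torch.Size([17, 4096])

Everything we just did for one layer can now be done for all layers in one go.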


final_embedding = token_embeddings_unnormalized
for layer in range(n_layers):
    qkv_attention_store = []
    layer_embedding_norm = rms_norm(final_embedding, model[f"layers.{layer}.attention_norm.weight"])
    q_layer = model[f"layers.{layer}.attention.wq.weight"]
    q_layer = q_layer.view(n_heads, q_layer.shape[0] // n_heads, dim)
    k_layer = model[f"layers.{layer}.attention.wk.weight"]
    k_layer = k_layer.view(n_kv_heads, k_layer.shape[0] // n_kv_heads, dim)
    v_layer = model[f"layers.{layer}.attention.wv.weight"]
    v_layer = v_layer.view(n_kv_heads, v_layer.shape[0] // n_kv_heads, dim)
    w_layer = model[f"layers.{layer}.attention.wo.weight"]
    for head in range(n_heads):
        q_layer_head = q_layer[head]
        k_layer_head = k_layer[head // 4]  # 4 query heads share each K/V head
        v_layer_head = v_layer[head // 4]
        q_per_token = torch.matmul(layer_embedding_norm, q_layer_head.T)
        k_per_token = torch.matmul(layer_embedding_norm, k_layer_head.T)
        v_per_token = torch.matmul(layer_embedding_norm, v_layer_head.T)
        # RoPE: rotate consecutive pairs of q and k as complex numbers
        q_per_token_split_into_pairs = q_per_token.float().view(q_per_token.shape[0], -1, 2)
        q_per_token_as_complex_numbers = torch.view_as_complex(q_per_token_split_into_pairs)
        q_per_token_split_into_pairs_rotated = torch.view_as_real(q_per_token_as_complex_numbers * freqs_cis)
        q_per_token_rotated = q_per_token_split_into_pairs_rotated.view(q_per_token.shape)
        k_per_token_split_into_pairs = k_per_token.float().view(k_per_token.shape[0], -1, 2)
        k_per_token_as_complex_numbers = torch.view_as_complex(k_per_token_split_into_pairs)
        k_per_token_split_into_pairs_rotated = torch.view_as_real(k_per_token_as_complex_numbers * freqs_cis)
        k_per_token_rotated = k_per_token_split_into_pairs_rotated.view(k_per_token.shape)
        # Scaled dot-product scores with a causal mask (no attending to future tokens)
        qk_per_token = torch.matmul(q_per_token_rotated, k_per_token_rotated.T) / (128)**0.5
        mask = torch.full((len(token_embeddings_unnormalized), len(token_embeddings_unnormalized)), float("-inf"))
        mask = torch.triu(mask, diagonal=1)
        qk_per_token_after_masking = qk_per_token + mask
        qk_per_token_after_masking_after_softmax = torch.nn.functional.softmax(qk_per_token_after_masking, dim=1).to(torch.bfloat16)
        qkv_attention = torch.matmul(qk_per_token_after_masking_after_softmax, v_per_token)
        qkv_attention_store.append(qkv_attention)

    # Concatenate the heads, project with wo, and add the residual
    stacked_qkv_attention = torch.cat(qkv_attention_store, dim=-1)
    embedding_delta = torch.matmul(stacked_qkv_attention, w_layer.T)
    embedding_after_edit = final_embedding + embedding_delta
    # SwiGLU feed-forward block with its own residual connection
    embedding_after_edit_normalized = rms_norm(embedding_after_edit, model[f"layers.{layer}.ffn_norm.weight"])
    w1 = model[f"layers.{layer}.feed_forward.w1.weight"]
    w2 = model[f"layers.{layer}.feed_forward.w2.weight"]
    w3 = model[f"layers.{layer}.feed_forward.w3.weight"]
    output_after_feedforward = torch.matmul(torch.nn.functional.silu(torch.matmul(embedding_after_edit_normalized, w1.T)) * torch.matmul(embedding_after_edit_normalized, w3.T), w2.T)
    final_embedding = embedding_after_edit + output_after_feedforward

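We now have the final embedding, the model's best guess for the next token. Its shape is the same as the regular token embeddings, [17x4096], where 17 is the number of tokens and 4096 is the embedding dimension.

final_embedding = rms_norm(final_embedding, model["norm.weight"])
final_embedding.shape
# torch.Size([17, 4096])

Decode this embedding into a token value.

Use the output decoder to turn the final embedding into a token.

model["output.weight"].shape
# torch.Size([128256, 4096])

Use the embedding of the last token to predict the next value. The example prompt asks for "the answer to the ultimate question of life, the universe, and everything"; according to The Hitchhiker's Guide to the Galaxy the answer is 42, and most modern LLMs answer 42, so getting 42 here should validate the whole pipeline.

logits = torch.matmul(final_embedding[-1], model["output.weight"].T)
logits.shape
# torch.Size([128256])

The model predicts token ID 2983 as the next token. Is that the token ID for 42? Here is the last code cell.

next_token = torch.argmax(logits, dim=-1)
next_token
# tensor(2983)

And finally, run it.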

20240614

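Back to 40 minutes for 10K. As long as I'm willing, I can still do it.

I hadn't planned to go this fast; I hadn't even changed clothes or shoes, and I started at only 5'00" pace. XR joined partway through, and the little guy just wouldn't stay quietly behind me, pushing me faster and faster. I guessed that 4'10" pace in summer would be tough for him, and sure enough he blew up before 3000 m. After a rest he tried to catch up again, blew up again within two laps, and couldn't go on.

The first 5000 m took 20'49". After 13 laps I stopped for a sip of water and a brief reset. Over the last 12 laps I sped up and brought the average pace back to exactly 4'00"; it felt good, about 80 to 90% effort, average heart rate 167 bpm. The injury didn't flare up, which is great. As a summer workout, I'm fairly satisfied.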

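Viewing a table in SQL Server:

SELECT TOP (100) * FROM [DATABASE_NAME].[dbo].[TABLE_NAME]  -- database.schema.table; dbo is the default schema

SQL Server's syntax differs from MySQL's in quite a few ways: there is no SHOW statement, and double quotes cannot be used to wrap string values (use single quotes). My first time with this old workhorse, and I stumbled a bit.

About MDF files: this is a data format commonly used in the automotive industry. It usually comes with a companion LDF file that serves as a log, and the files are typically very large.

Reading MDF files in Python:

Using the asammdf package: it can read MDF files directly, as well as common compressed formats (zip, bz2, gzip).

>>> import bz2, gzip, zipfile
>>> from io import BytesIO
>>> from asammdf import MDF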
>>> mdf = MDF(version='3.30') # new MDF object with version 3.30
>>> mdf = MDF('path/to/file.mf4') # MDF loaded from file
>>> mdf = MDF(BytesIO(data)) # MDF from file contents
>>> mdf = MDF(zipfile.ZipFile('data.zip')) # MDF creating using the first valid MDF from archive
>>> mdf = MDF(bz2.BZ2File('path/to/data.bz2', 'rb')) # MDF from bz2 object
>>> mdf = MDF(gzip.GzipFile('path/to/data.gzip', 'rb')) # MDF from gzip object

Using mdfreader:

>>> import mdfreader
>>> yop=mdfreader.Mdf('NameOfFile')
>>> yop.keys() # list channels names
# list channels grouped by raster or master channel
>>> yop.masterChannelList
>>> yop.plot('channelName') or yop.plot({'channel1','channel2'})
>>> yop.resample(0.1) or yop.resample()
>>> yop.export_to_csv(sampling=0.01)
>>> yop.export_to_NetCDF()
>>> yop.export_to_hdf5()
>>> yop.export_to_matlab()
>>> yop.export_to_excel()
>>> yop.export_to_xlsx()
>>> yop.convert_to_pandas() # converts data groups into pandas dataframes
>>> yop.write() # writes mdf file
# drops all the channels except the one in argument
>>> yop.keep_channels(['channel1','channel2','channel3'])
>>> yop.get_channel_data('channelName') # returns channel numpy array
>>> yop=mdfreader.Mdf()  # create an empty Mdf object
# add channel in Mdf object
>>> yop.add_channel(channel_name, data, master_channel, master_type, unit='lumen', description='what you want')
>>> yop.write('filename') # change version with yop.MDFVersionNumber or specifically use write3/4()

20240615~20240616

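The weekend warmed up to 34°C; summer is truly here. Last night I went to my aunt's place for some good food, ate my fill and took some home too: three fresh-pork-and-egg-yolk zongzi and half a braised chicken. The canteen suits me less and less lately; I really don't know what to eat.

This afternoon Wu Zheng and I got free tickets to the International Film Festival and went along for the fun: A Distant Cry from Spring (远山的呼唤), starring Ken Takakura. Honestly, I had never heard of Ken Takakura and I rarely go to the cinema, but I had nothing urgent these two days and didn't feel like staying cooped up on campus.

Japanese films always feel small in scope to me. Not just films, most of their other literary works too. A translator of Japanese literature once said you should read Japanese literature sparingly, because the more you read, the smaller you get. Of course, as film art, this classic academic style of filmmaking is beyond reproach, but the story lacks emotional pull. It is too small; once it ends, that's it. Maybe the era is also too distant for us to resonate strongly.

Also, some of the values in Japanese fiction are rather warped, and some things are hard to understand. Take Itachi in Naruto, who massacres the Uchiha clan to prevent a rebellion. I have never understood how that can be praised. However much you sacrifice the individual for the greater good, there is no need for such outrageous means; killing a few ringleaders would have done. Sparing not even the old, women and children, killing one's own father and mother: nothing like that can ever be tied to justice. Another example is how Titan powers are inherited in Attack on Titan: by eating a person. That is absurd. To inherit the power, a son has to eat his father; a plot like that would be unthinkable in Chinese literature.

In short, the island nation's culture is rather extreme. Another example is the Hakone Ekiden. Its 6th leg is a 20 km downhill, and to hold their position runners have to sprint almost the whole way, which is brutal on the body. Almost every runner collapses at the end of the 6th leg and is carried off on a stretcher, yet every university must send someone to shoulder that leg as a sacrifice, and hardly any school puts its ace there. Something like this is hard to imagine in China: every family treasures its child, and what parent would want theirs to be cannon fodder?

Grind queen LXY apparently just got back to campus today and immediately ran 15 km at 5'00" pace; clearly she trained plenty during her week at home. Back in the evening I did strength work: 4 sets of 30 forward lunges + 4 sets of 30 reverse lunges (+20 kg), then 5000 m to loosen up the legs. It was a messy run; my mind was unsettled.

PS: Tongxin (童心佬) recently built a real-time bullet-comment (danmaku) system for H5 Magic Tower games, first used in Above the Throne (王座之上). It works really well and greatly improves the experience. Sadly the game isn't finished: it has only 144 floors so far, with top-notch art and a superb story, and reportedly it won't be done until early 2025, at roughly 280 floors. These creators' passion is something else.

Yitong set up a Qwen2 backend on the cluster that is reachable from outside. Inference takes about 30 seconds; with two A100 cards he got the 72B Qwen2 running.

Project repository: https://github.com/QwenLM/Qwen2; the models are open-sourced on Hugging Face.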

from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://10.2.170.105:43858/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="/gemini/data-1/Qwen2-72B-Instruct-GPTQ-Int4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me something about large language models."},
    ]
)
print(chat_response.choices[0].message.content)

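The OpenAI Python client can be pointed at any OpenAI-compatible server (here vLLM) just by setting base_url, which makes a deployed model easy to share. Quite handy.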
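What the client does can be seen without a running server. A stdlib-only sketch of the HTTP request an OpenAI-style client sends to an OpenAI-compatible server such as vLLM: a JSON POST to `{base_url}/chat/completions` with a Bearer token. `build_chat_request` is a hypothetical helper; it builds the request but does not send it, and the real client adds further headers and options.

```python
import json
import urllib.request


def build_chat_request(base_url, model, messages, api_key="EMPTY"):
    """Hypothetical helper: build (not send) an OpenAI-style chat completion request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


req = build_chat_request(
    "http://10.2.170.105:43858/v1",
    "/gemini/data-1/Qwen2-72B-Instruct-GPTQ-Int4",
    [{"role": "user", "content": "Tell me something about large language models."}],
)
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req); not done here.
```

This is also why `openai_api_key = "EMPTY"` works: a vLLM server started without an API key does not check the token.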

20240617

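A rare quiet stretch, time to catch up on the frontier. It feels like ages since I dug deeply into anything; I no longer have the passion I once did.

A rare cool night. I ran into Jiawei; he has actually been training lately, we just rarely cross paths. My total for the first 15 days of June is 105 km at an average pace of 4'21", with only a day or two off. It can't match the training quality of my peak in March, but it's passable.

When I went down at eight, LXY was leading Jiawei, YZZ and QSH from the School of International Education on an easy jog. She ran 15 km again today, which is a bit absurd; is she trying to make up everything she missed during her week at home?

The plan for tonight was to take XR on a 4'00" tempo run and see how long he could hold it, but since I rarely get to meet Jiawei and he wanted to do speed endurance, I didn't want to spoil his fun and decided on the spot to join him for an inverted-pyramid interval session. Jiawei did 1600 m ×1 + 1200 m ×2 + 800 m ×3; I did 1200 m ×1 + 800 m ×1 + 400 m ×5. XR fell off my pace for a lap in each of the first two reps and then stopped. Of course I wasn't much better: Jiawei dropped me for a lap in every rep, and I simply couldn't keep up. The 800 m in particular I ran in 2'38", after which my legs went completely soft and I could barely lift them; the five 400 m reps that followed, at 1'18"~1'20", were all run to exhaustion, and there was no way I could have added even one more lap. Jiawei was as unbeatable as ever; his last 800 m was even 2'30". Scary. I rewarded him with a bottle of 沁柠水 lemon drink. This is what summer training should be.

Sadly, this may be the last summer. The Jiawei who was a freshman just starting university will soon be a senior. However fast you run, you can't outrun time. Oh well.

Qwen2-72B-int4 runs, just about, but still crashes easily; the free lunch isn't that tasty. Yitong also tested inference with the full-precision 7B model: it takes roughly half as long as the 72B, but on cost-effectiveness it still loses to the 72B. In theory the output difference between the int4 and full-precision versions shouldn't be large, at least for outputs where one answer clearly dominates. Frankly, in this era the differences between large models don't matter much to ordinary people; so-called better or worse is just an empirical result, with little theory behind it.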

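On the K-Sparse AutoEncoder and LLM interpretability research

https://cdn.openai.com/papers/sparse-autoencoders.pdf
https://github.com/openai/sparse_autoencoder
https://openai.com/index/extracting-concepts-from-gpt-4/

This is essentially a compression step (the model's activations are encoded into a sparse code), but by observing which sparse features get activated, you can roughly explain how the LLM's parameters contribute to its output.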
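A NumPy sketch of the k-sparse idea from Makhzani and Frey's k-sparse autoencoder paper (arXiv:1312.5663): compute the hidden layer, keep only the k largest activations per example, and reconstruct from those. The sizes, random weights and untied encoder/decoder are assumptions for illustration; OpenAI's GPT-4 autoencoders add further details (normalization, training tricks) not shown here.

```python
import numpy as np


def k_sparse(z, k):
    """Keep the k largest activations in each row and zero out the rest."""
    out = np.zeros_like(z)
    top = np.argpartition(z, -k, axis=-1)[..., -k:]          # indices of the k largest
    np.put_along_axis(out, top, np.take_along_axis(z, top, axis=-1), axis=-1)
    return out


def forward(x, w_enc, b_enc, w_dec, b_dec, k):
    """Encode, keep only k active latents per example, decode."""
    z = x @ w_enc + b_enc                  # [batch, n_latents]
    z_sparse = k_sparse(z, k)
    x_hat = z_sparse @ w_dec + b_dec       # reconstruction from the k active latents
    return z_sparse, x_hat


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))                        # 4 toy "activation" vectors
w_enc, b_enc = rng.standard_normal((16, 32)), np.zeros(32)
w_dec, b_dec = rng.standard_normal((32, 16)), np.zeros(16)
k = 3
z = x @ w_enc + b_enc
z_sparse, x_hat = forward(x, w_enc, b_enc, w_dec, b_dec, k)
print((z_sparse != 0).sum(axis=1))  # [3 3 3 3]
```

Which latents survive for a given input is what the interpretability work inspects: each latent is a candidate "feature", and the inputs that switch it on hint at what it represents.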

LlamaIndex RAG Response Mode

https://docs.llamaindex.ai/en/latest/getting_started/concepts/
https://docs.llamaindex.ai/en/latest/getting_started/starter_example/
data => index (chunks => embedding vectors)
- embedding model: 'text-embedding-ada-002'
  - POST https://api.openai.com/v1/embeddings
  - from llama_index.embeddings.openai import OpenAIEmbedding
- query => index (embedding vector)
- retrieve similarity_top_k (default 2)
- settings
  - embed_model: 'text-embedding-ada-002'
  - llm: gpt-3.5-turbo
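The retrieve step in the outline (embed the query, compare it with every chunk's embedding, keep the top `similarity_top_k`) can be sketched in pure Python. The 3-dimensional vectors are made-up stand-ins for real `text-embedding-ada-002` embeddings, and cosine similarity is an assumption here; LlamaIndex's vector stores do their own scoring and bookkeeping.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_vec, chunk_vecs, similarity_top_k=2):
    """Return the indices of the similarity_top_k chunks most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)), key=lambda i: cosine(query_vec, chunk_vecs[i]), reverse=True)
    return ranked[:similarity_top_k]


# Made-up 3-d "embeddings" for four chunks
chunks = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(retrieve([1.0, 0.05, 0.0], chunks))  # [0, 1]
```

With the default of 2, two chunks come back, which matches `len(response.source_nodes) # 2` in the run below.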

from llama_index.core.response_synthesizers.type import ResponseMode

# Evaluated as the last expression of a notebook cell, so the tuple's repr is displayed
(ResponseMode.REFINE, ResponseMode.COMPACT, ResponseMode.SIMPLE_SUMMARIZE, ResponseMode.TREE_SUMMARIZE, ResponseMode.GENERATION)

# (<ResponseMode.REFINE: 'refine'>,
#  <ResponseMode.COMPACT: 'compact'>,
#  <ResponseMode.SIMPLE_SUMMARIZE: 'simple_summarize'>,
#  <ResponseMode.TREE_SUMMARIZE: 'tree_summarize'>,
#  <ResponseMode.GENERATION: 'generation'>)

from dotenv import load_dotenv
load_dotenv('./.env')


import os
os.environ['HTTP_PROXY'] = 'http://127.0.0.1:7890'
os.environ['HTTPS_PROXY'] = 'http://127.0.0.1:7890'

# import logging
# import sys
# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))


from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
documents[0].text_template, documents[0].metadata_seperator, documents[0].metadata_template

# ('{metadata_str}\n\n{content}', '\n', '{key}: {value}')

index = VectorStoreIndex.from_documents(documents, show_progress=True)
# index.docstore.docs['9655ddd8-a3e7-46a5-b709-dca099020c81']
# index.vector_store.data.embedding_dict['9655ddd8-a3e7-46a5-b709-dca099020c81']

query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
response.response # 'The author worked on writing short stories and programming, starting with early programming experiences on an IBM 1401 using Fortran in 9th grade. Later, the author transitioned to working with microcomputers, building simple games and a word processor on a TRS-80 in the early 1980s.'

response.metadata 
"""
{'8aefc79e-2d08-4041-8f7e-294bf4f74b6a': {'file_path': '/home/whaow/workspaces/llm_aigc/tutorials/rag/data/paul_graham_essay.txt',
  'file_name': 'paul_graham_essay.txt',
  'file_type': 'text/plain',
  'file_size': 75042,
  'creation_date': '2024-06-15',
  'last_modified_date': '2024-06-15'},
 '4a01aa3a-d5ac-49f2-aed6-dbd0ad477bfe': {'file_path': '/home/whaow/workspaces/llm_aigc/tutorials/rag/data/paul_graham_essay.txt',
  'file_name': 'paul_graham_essay.txt',
  'file_type': 'text/plain',
  'file_size': 75042,
  'creation_date': '2024-06-15',
  'last_modified_date': '2024-06-15'}}
"""

len(response.source_nodes) # 2

20240618

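Just as I expected, the school is splitting, but it has little to do with the earlier plan to drop the Management Science master's admissions. wyl let slip that they simply think the two-year Management Science master's is useless (no money, no output), so they'd rather shift the quota to MEM to collect more tuition and recruit more easily. In fact, a new school is being created on orders from above. The school itself doesn't really want to split: splitting is good at the university level but bad at the school level. Still, the small can't fight the big, so there's no way around it. The school has been quite fragmented these past years anyway, so splitting may be for the best.

Last hard session before the plum rain: 3 sets of 400 m variable-pace laps (3 fast laps, 2 slow laps; fast laps under 1'20", slow laps under 2'30"), then 2000 m + 1200 m + 800 m intervals. My form wasn't great, since I was pushing hard alone right after a meeting, hadn't changed clothes or shoes, and had just done a hard speed session yesterday. But I forced it through, because 400 m is a distance I can always handle: in winter training, however tired I was, a dozen or so sets of 400 m variable pace were easy, and I always felt great afterwards. Today I finally got that good feeling back.

LXY ran over 19 km today, 50 km over three days. Frighteningly strong. XR is as listless as ever, with none of May's edge, while I feel I'm finally about to shake off the shadow of injury for good. It may come back, but I'll live in the moment; given time, getting back to my peak, or even beyond it, is within reach.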

Using Whisper

import moviepy.editor as mp  # moviepy 1.x import path (moviepy 2.x no longer provides moviepy.editor)
import whisper
import os
import subprocess


# Download the video with yt-dlp
def download_video(url, output_path='video.mp4'):
    subprocess.run(['yt-dlp', '-o', output_path, url], check=True)  # raise if the download fails
    return output_path

# Extract the audio track from the video
def extract_audio(video_path, output_path='audio.wav'):
    video = mp.VideoFileClip(video_path)
    video.audio.write_audiofile(output_path)
    video.close()  # release the underlying ffmpeg reader
    return output_path

# Transcribe the audio
def transcribe_audio_with_whisper(audio_path):
    model = whisper.load_model("large")
    # result = model.transcribe(audio_path, language='en')  # force a language instead of auto-detecting
    result = model.transcribe(audio_path)
    return result['text']


video_url = 'https://www.bilibili.com/video/BV12J4m1N7wU/'
video_path = './video.mp4'
audio_path = './audio.wav'

print("Downloading video...")
download_video(video_url, video_path)

print("Extracting audio...")
extract_audio(video_path, audio_path)

print("Transcribing audio with Whisper...")
transcription = transcribe_audio_with_whisper(audio_path)

print("Transcription:")
print(transcription)

# Clean up temporary files
# os.remove(video_path)
# os.remove(audio_path)

Output:

Downloading video...
[BiliBili] Extracting URL: https://www.bilibili.com/video/BV12J4m1N7wU/
[BiliBili] 12J4m1N7wU: Downloading webpage
[BiliBili] BV12J4m1N7wU: Extracting videos in anthology
[BiliBili] 1254605812: Extracting chapters
[BiliBili] Format(s) 1080P 高码率, 1080P 高清, 720P 高清 are missing; you have to login or become premium member to download them. Use --cookies-from-browser or --cookies for the authentication. See  https://github.com/yt-dlp/yt-dlp/wiki/FAQ#how-do-i-pass-cookies-to-yt-dlp  for how to manually pass cookies
[info] BV12J4m1N7wU: Downloading 1 format(s): 100110+30280
[download] Destination: ./video.f100110.mp4
[download] 100% of  425.01KiB in 00:00:00 at 7.64MiB/s     
[download] Destination: ./video.f30280.m4a
[download] 100% of  274.45KiB in 00:00:00 at 10.31MiB/s    
[Merger] Merging formats into "./video.mp4"
Deleting original file ./video.f30280.m4a (pass -k to keep)
Deleting original file ./video.f100110.mp4 (pass -k to keep)
Extracting audio...
MoviePy - Writing audio in ./audio.wav
                                                                    
MoviePy - Done.
Transcribing audio with Whisper...
Transcription:
 Saying goodbye, I don't think is ever nice, but saying goodbye without Feeling sad or feeling hurt or whatever that would that would just mean that the time we spent together was Right great and had a great time. So it was always clear that it will be tough and I know it will be tough

Dependencies to install:

!pip install pytube moviepy openai-whisper
!pip install you-get
!pip install yt-dlp

yt-dlp in particular wraps video downloading for all the major platforms; it just updates quite frequently.

20240619

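First day of the plum rain season. The rain hasn't been heavy yet, and it's said it will keep raining into July, but it's unbearably muggy. On top of that, Miss GXJ has been in the lab these days with a bad cough, so nobody dares turn on the air conditioner (the first thing she does every time she comes in is switch it off…). We can only open the windows and doors for air, but it's still sweltering; CHT has taken to fanning himself with a folding fan. Pure misery.

After dinner I meant to go play ball (Jiawei and XR both finished their last exams today), but once I got there I felt the lab was cooler. The rain stopped at eight and I went down for an easy 4 laps around campus: 10 km @ 4'26", average heart rate 166 bpm. On the third lap the rain picked up. I was in street clothes and old shoes; the overall rhythm was OK and my heart rate stayed fairly steady, but by the end my T-shirt was soaked and it felt like too much effort.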

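Easy mileage is indispensable, but easy runs in summer, especially after the plum rain ends, are torture and eat up time. Besides preparing for a sub-3 marathon in the second half of the year, a major goal of this summer's training is to run 16 km in under an hour. The final on December 8 is a 10-person × 16 km relay, and judging from past years, 16 km in under an hour (3'45" average pace) is what counts as a competitive time. In last year's final, with 50 teams, apart from the top five schools, places 6 through 12 all finished just over 9 hours 55 minutes, and from 15th place (Beihang) onward total times exceeded 10 hours (3'45" average pace). The 12th-place team, Zhejiang University, covered the 160 km in 9:55:30 at an average pace of 3'43", and their captain YLS sat right at that average. So far, though, even for Jiawei an hour for 16 km is no easy task, let alone for the others. My closest attempt was the Yangzhou Half Marathon three months ago, where the first 16 km took 62'50", still a good way off the target.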
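The pace figures in this paragraph are just total time divided by distance. A tiny helper to check them; `pace` and `hms` are names made up here, and the result is rounded to the nearest second.

```python
def hms(h=0, m=0, s=0):
    """Convert hours/minutes/seconds to seconds."""
    return h * 3600 + m * 60 + s


def pace(total_seconds, km):
    """Average pace per km, rounded to the nearest second, formatted as m'ss"."""
    sec_per_km = round(total_seconds / km)
    return f"{sec_per_km // 60}'{sec_per_km % 60:02d}\""


print(pace(hms(h=1), 16))               # 3'45"  (16 km in one hour)
print(pace(hms(h=9, m=55, s=30), 160))  # 3'43"  (160 km in 9:55:30)
print(pace(hms(m=62, s=50), 16))        # 3'56"  (16 km in 62'50")
```

So the 62'50" for 16 km works out to about 11 seconds per km slower than the 3'45" target.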

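This year the cut-off for the top eight at the Shanghai regional will probably be a team average of about 38'30" for 10 km; at least that level is well within our reach. I'd really, really love to make the final once, but it isn't something one person can do, and like last year all sorts of things can go wrong. If we miss out again on this last attempt it won't feel complete, but I have to keep a level head.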

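https://pytorch.org/get-started/previous-versions/

torch releases before 2.0 top out at an older Python (3.10 from torch 1.11 onward, 3.9 before that), while many newer acceleration packages (accelerate, deepspeed) want Python 3.10 or later, so old torch versions are awkward to combine with them.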

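Recently I went back over 狂神 (KuangShen)'s Elasticsearch tutorial. For retrieval itself, I think old methods aren't necessarily worse than new ones: sparse retrieval isn't necessarily worse than today's mainstream dense retrieval. At least in terms of interpretability, dense retrieval offers none, especially at the token level, and token-level dot-product similarity isn't actually much better than sentence-level similarity either.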
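To make the interpretability point concrete: with sparse retrieval every score decomposes into per-term contributions you can read off. A pure-Python sketch of Okapi BM25 (Elasticsearch's default similarity) with the usual defaults k1 = 1.2 and b = 0.75. The IDF follows Lucene's form, but Lucene's exact constant factors may differ, so these numbers are not expected to match `_score`; `bm25_explain` and the sample docs are made up.

```python
import math
from collections import Counter


def bm25_explain(query_terms, docs, k1=1.2, b=0.75):
    """docs: list of token lists. Returns (total score, {term: contribution}) per doc."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    results = []
    for d in docs:
        tf = Counter(d)
        contrib = {}
        for t in query_terms:
            if tf[t] == 0:
                continue  # a term absent from the doc contributes nothing
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)  # length normalization
            contrib[t] = idf * tf[t] * (k1 + 1) / norm
        results.append((sum(contrib.values()), contrib))
    return results


docs = [["sparse", "retrieval", "bm25"], ["dense", "retrieval"], ["sparse", "sparse", "vectors"]]
results = bm25_explain(["sparse", "retrieval"], docs)
for score, parts in results:
    print(round(score, 3), {t: round(c, 3) for t, c in parts.items()})
```

Each printed dict says exactly which query term raised a document's score and by how much, something a dense embedding score cannot offer.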

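This tutorial is based on Elasticsearch 7.6.1. Note that ES7 syntax differs considerably from the ES6 API; when the tutorial was published, the latest version was ES 7.6.2 (released 2020-04-01);
ES is a tool for full-text search:

SQL: fuzzy search with like %keyword% is very slow on large data sets, and even adding an index helps only so much;
Elasticsearch: a search engine (baidu, github, taobao)
Some concepts ES involves:
- The ik analyzer
- Operating ES via REST
- CRUD
- Integrating ES with SpringBoot

Lucene: a Java library for adding full-text search to small and medium-sized applications;
Nutch: a web search application built on top of the Lucene core; Nutch is used more widely than Lucene
Big data has two problems to solve, storage and computation (MapReduce):

In 2004 Doug Cutting developed a distributed file storage system (NDFS) based on GFS;
In 2005 Doug Cutting implemented MapReduce in the Nutch search engine;
After joining Yahoo, Doug Cutting combined MapReduce and NDFS to create Hadoop, becoming the father of Hadoop;
A BigTable-style store (HBase) was later added to the Hadoop ecosystem

Back to the topic:

Lucene is an information-retrieval toolkit (a jar), not a complete search engine system;
Lucene contains the index structure, tools for reading and writing the index, sorting, search rules, and utility classes;
The relationship between Lucene and ES:
- ES wraps and extends Lucene; it is fairly easy to pick up, easier than Redis

A distributed full-text search engine, highly scalable;
Near-real-time updates and search;
ES is RESTful (accessed with GET, POST, DELETE, PUT);
ES handles complex data analysis; the ELK stack (Elasticsearch + Logstash + Kibana)

While indexing, Solr suffers IO blocking and query performance drops; Elasticsearch has a clear advantage when indexing is in progress;
In traditional projects Elasticsearch reportedly brings about a 50× speedup;
Elasticsearch works right after unzipping; Solr needs configuration
Solr uses ZooKeeper for distributed management; Elasticsearch has distribution built in
Solr supports more data formats (json, xml, csv); Elasticsearch supports only json
Solr has more features than Elasticsearch
Solr queries fast but is slow to update its index (e.g. inserts and deletes are slow); Elasticsearch queries are slower, but it is fast for real-time search, which is why Facebook, Sina and others use it for search
Solr is the solution for traditional search applications; Elasticsearch suits newer real-time search applications
Solr is more mature; Elasticsearch currently evolves quickly;

Requires JDK 1.8 or later; this is the minimum;
Elasticsearch clients and GUI tools;
SpringBoot's default dependency is not ES7 yet; you need to change the version yourself
Download: https://www.elastic.co/cn/downloads/elasticsearch
You can learn directly on Windows or Linux, without a virtual machine or any other setup;
Unzipped directory of elastic 7.x.x:

bin: startup scripts
config: configuration files
- log4j2: logging configuration
- jvm.options: JVM settings
- elasticsearch.yml: the ES configuration file; the default port is 9200
lib: related jars
logs: logs
modules: feature modules
plugins: plugins (e.g. ik)

Start it and access port 9200

Double-click: bin/elasticsearch.bat

Visit: localhost:9200

You should see a JSON string like this: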

{
    "name" : "CAOYANG",
    "cluster_name" : "elasticsearch",
    "cluster_uuid" : "xEDZ4q2JSxq54mJM2UiTxQ",
    "version" : {
  	  "number" : "7.9.2",
  	  "build_flavor" : "default",
  	  "build_type" : "zip",
  	  "build_hash" : "d34da0ea4a966c4e49417f2da2f244e3e97b4e6e",
  	  "build_date" : "2020-09-23T00:45:33.626720Z",
  	  "build_snapshot" : false,
  	  "lucene_version" : "8.6.2",
   	  "minimum_wire_compatibility_version" : "6.8.0",
  	  "minimum_index_compatibility_version" : "6.0.0-beta1"
    },
    "tagline" : "You Know, for Search"
}

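Install the visualization plugin ES head:

Download: https://github.com/mobz/elasticsearch-head
- Installation steps:
```
git clone https://github.com/mobz/elasticsearch-head.git
cd elasticsearch-head
npm install
npm run start
open http://localhost:9100/
```
- Connection test: visit http://localhost:9100/
  - You will hit a cross-origin (CORS) problem; configure CORS in config/elasticsearch.yml under the ES install directory by adding at the end of the file, then restart ES:
    - http.cors.enabled: true
    - http.cors.allow-origin: "*"
- Reconnect test: visit http://localhost:9100/
  - The GUI appears, with some health status shown on the page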

20240620

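Ten months later, I'm back at the 129 track.

A few days ago Jiawei said he hadn't trained at 129 yet this year and should find time to go. Today's session was 2000 m × 5 with 3-minute rests; elite group pace 3'25", advanced group pace 3'40". Visibly brutal. The weather was awful too: from 5 pm, bursts of heavy rain, then sunshine right after, steaming and extremely uncomfortable. But summer is the season for pushing limits; in fact almost all of this past week has been speed work, risking my neck every day.

Lazhu said there was no elite group tonight, so we all ran with the advanced group; the pacers were Yong-ge and Lazhu.

In the first 2 reps, Jiawei ran in lane 2 the whole way, clearly unhappy with the pace; on the last lap of the second rep he passed the pacers outright and left everyone in the dust.

At the start of the third rep I already felt I was done, and sure enough I blew up at 1200 m and dropped out early to rest.

On the fourth rep I forced myself through the full 2000 m, with my cardio at its absolute limit. On the last lap Jiawei again blew past the pacers and I tried to surge and follow, but he dropped me in the last 100 m. Still, I got a taste of dropping the pacer.

For the fifth rep, Lazhu said he couldn't run any more: four reps today and that's it. One guy still wanted to run, so Lazhu had Jiawei lead him. I thought about it and went along for a stretch, and again blew up at 1200 m. Jiawei ran that last 2000 m in under 7'10". Really, he hadn't gone all-out on the first four reps. His current form should be beyond his previous peak; if there had been an elite group tonight, I believe he could have done at least three or four reps with it.

Overall, tonight was a high-quality interval session. First time this year running shirtless: mist rose over the track, and after each rep I felt as if I'd just climbed out of water. Wipe off the sweat and charge again. Who knows how much water I lost; I drank 13 refills of my 500 ml thermos today. What a thrill.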

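Kibana is an open-source analytics and visualization UI for ES, used to search, view and interact with data stored in ES indices
Download: https://www.elastic.co/cn/kibana
After downloading, unzipping takes a while; it is a standard project

Benefit: the whole ELK stack is basically ready to use out of the box;
Start bin/kibana.bat, then visit localhost:5601
On the page you can write queries directly in the console and run them; results are shown right on the page

Chinese UI: add i18n.locale: "zh-CN" at the end of config/kibana.yml (the same idea as internationalization in SpringBoot)

Understanding ES core concepts

Elasticsearch is document-oriented; the smallest unit of indexing and search is the document. Compared with a relational database:

Database v.s. index (indices; think of it as a database)
Table v.s. type (types; deprecated in 7.x and to be removed entirely in 8.x)
Row v.s. document (documents)
Column v.s. field (fields)
All data in ES is JSON

Behind the scenes ES splits each index into multiple shards, and each shard can be moved between servers in the cluster
A single ES node is already a cluster; the default cluster name is elasticsearch
Important properties of an ES document (think of it as data in JSON or YAML format):

Self-contained: a document holds both the fields and their values, i.e. key-value pairs
Can be hierarchical: a document can contain sub-documents, so a complex logical entity is just a JSON object
Flexible structure: documents do not depend on a predefined schema. In a relational database, columns must be defined before use; in Elasticsearch fields are flexible: a field can sometimes be omitted, or a new field added dynamically

ES index: i.e. the database

An ES index is a very large collection of documents. The index stores the mapped fields and other settings, and its data lives on shards, usually spread over different nodes of the cluster, so if one node goes down no data is lost (provided replica shards are configured);
A shard is a Lucene index: a directory of files containing an inverted index. The inverted index structure lets Elasticsearch tell you which documents contain a given keyword without scanning every document;
- Results are ranked by the sum of the weights (think of them as scores) of the matched keywords
- The inverted index filters out completely irrelevant data

Summary:

In ES the word "index" (library) is used constantly and means the database. An index is split into multiple shards, each shard is a Lucene index, so one ES index is made of multiple Lucene indices, and a Lucene index is an inverted index;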
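A pure-Python sketch of the inverted-index idea: map every term to the sorted list of document ids that contain it (a posting list), then answer a query by intersecting posting lists, without ever touching documents that share no term with the query. The lowercase/whitespace analyzer is a toy stand-in for a real Lucene analyzer, and the sample documents are made up.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """docs: {doc_id: text}. Returns {term: sorted list of doc ids} (posting lists)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index, query):
    """AND semantics: ids of documents containing every query term."""
    postings = [set(index.get(t, [])) for t in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


docs = {
    1: "Elasticsearch is built on Lucene",
    2: "Lucene is a Java library",
    3: "Redis is a key value store",
}
index = build_inverted_index(docs)
print(index["lucene"])                   # [1, 2]
print(search_all(index, "lucene java"))  # [2]
```

A real Lucene index also stores term frequencies and positions in each posting, which is what relevance scoring and phrase queries build on.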

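The IK analyzer in detail

Tokenization: similar to jieba and nltk in Python
IK analyzer download: https://github.com/medcl/elasticsearch-analysis-ik/releases

Make sure the version matches your ES version

After downloading, just unzip it and move it into an ik directory under plugins/ in elasticsearch7.x.x

The IK project's README says:

Copy and unzip the release file #{project_path}/elasticsearch-analysis-ik/target/releases/elasticsearch-analysis-ik-*.zip into your elasticsearch plugin directory, e.g. plugins/ik
Be careful to download the right file: after unzipping it should contain a config folder, 1 properties file, 1 policy file, and 5 jar files

Once configured, restart ES and you will see the ik analyzer being loaded;

You can list the loaded plugins with elasticsearch-plugin list (add the bin folder of the ES directory to your PATH)

Start using IK by writing this in the Kibana console:

Compare the different tokenization results: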

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "中国人民军队"
}

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "中国人民军队"
}

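+ `ik_smart` is the coarsest split: the result contains only 中国人民军队 ("the Chinese People's Army"), i.e. as few cuts as possible;
+ `ik_max_word` is the finest-grained split, exhausting every possibility in the dictionary: it produces 中国 (China), 人民 (people), 军队 (army) and the combinations of these;
+ If you search `超级喜欢狂神说` ("really like KuangShen Says"), you will find that even with the coarsest split, 狂神说 is broken into three separate characters;
+ So you need to configure a user dictionary: the config file is at ik/config/IKAnalyzer.cfg.xml
  * e.g. create a kuang.dic file, then add the configuration to IKAnalyzer.cfg.xml
    - `<entry key="ext_dict">kuang.dic</entry>`
    - `<entry key="ext_stopwords">kuang_stop.dic</entry>`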
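The coarse versus fine-grained difference can be illustrated with a toy dictionary. This is not IK's actual algorithm (IK has its own segmenters and ambiguity handling); it only contrasts greedy longest-match splitting, roughly the spirit of `ik_smart`, with listing every dictionary word found, roughly the spirit of `ik_max_word`, and shows why adding 狂神说 to a user dictionary keeps it whole. `coarsest_split`, `all_words` and the vocabulary are hypothetical.

```python
def coarsest_split(text, vocab, max_len=8):
    """Greedy forward maximum matching: always take the longest dictionary word."""
    out, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab or j == i + 1:  # fall back to a single character
                out.append(text[i:j])
                i = j
                break
    return out


def all_words(text, vocab, max_len=8):
    """Every dictionary word that occurs anywhere in the text."""
    return [text[i:j] for i in range(len(text))
            for j in range(i + 1, min(len(text), i + max_len) + 1)
            if text[i:j] in vocab]


vocab = {"中国", "中国人", "中国人民", "人民", "军队", "人民军队", "中国人民军队", "超级", "喜欢"}
print(coarsest_split("中国人民军队", vocab))              # ['中国人民军队']
print(all_words("中国人民军队", vocab))
print(coarsest_split("超级喜欢狂神说", vocab))            # ['超级', '喜欢', '狂', '神', '说']
print(coarsest_split("超级喜欢狂神说", vocab | {"狂神说"}))  # ['超级', '喜欢', '狂神说']
```

The last two lines mirror what the `ext_dict` entry does: once the word is in the dictionary, the coarse split keeps it in one piece.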

REST-style operations

PUT: create a document (with a specified document id), localhost:9200/index_name/type_name/document_id

Example:

PUT /test1/type1/1
{
	"name": "狂神说",
	"age": 3
}

Result: the index was created automatically and the data was added successfully

{
  _index: test1,
  _type: type1,
  _id: 1,
  _score: 1,
  name: "狂神说",
  age: 3
}

Data types:
- String: text and keyword
- Numeric: long, integer, short, byte, double, float, half_float, scaled_float
- Date: date
- Boolean: boolean
- Binary: binary

Specifying field types: create an index with field rules (like creating a table in MySQL)

PUT /test2
{
  "mappings": {
    "properties": {
	  "name": {
	    "type": "text"
	  },
	  "age": {
	    "type": "long"
	  },
	  "birthday": {
	    "type": "date"
	  }
	}
  }
}

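The response shows acknowledged: true, meaning the field rules were created successfully

POST: create a document (random document id), localhost:9200/index_name/type_name
POST: update a document, localhost:9200/index_name/type_name/document_id/_update (the typeless 7.x form is localhost:9200/index_name/_update/document_id)

Example:

POST /test3/_doc/1/_update
{
  "doc": {
    "name": "法外狂徒张三"
  }
}

This updates the name value in the _doc type of index test3

Note: if no type is specified, the default type is _doc

DELETE: delete a document, localhost:9200/index_name/type_name/document_id

DELETE test1: delete an index
ES decides from the request whether to delete an index or a document

GET: get a document (by document id), localhost:9200/index_name/type_name/document_id

Get an index's rules (mappings and settings): GET test2
- GET on a table returns the table's information; GET on an index returns the index's information
GET _cat/health: view health information
GET _cat/indices?v: index status
GET _cat/... exposes a lot of information about the current state of ES

POST: query all data, localhost:9200/index_name/type_name/_search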

20240621

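The Lvye canteen had braised small yellow croaker today. It's been ages since the canteen served fish; they're a bit small, but only 3 yuan each. Rare value for money.

Honestly, I've been starving lately, but I'm trying hard not to eat outside mealtimes. I hadn't been to Lvye for a clay-pot meal in half a month, yet the auntie still remembered to give me two heaped bowls of rice. I was stuffed. At 8 pm I did an easy 10 km at 4'31" average pace, average heart rate 162 bpm. I held the pace steady but it was hard work; in this weather a run is never going to feel easy.

PS: Taking my shirt off last night, next to the others I really am a pale, scrawny "white-cut chicken" (I am a lot paler than these guys who are out in the sun all year). I've skimped on core work lately, my abs have vanished completely, and my chest has gone soft too; from the side it's unwatchable. My weight isn't high, though: two days ago after dinner I weighed exactly 67 kg in clothes, so my net weight is about where it was before the Yangzhou Half, yet I look much chubbier than everyone else (honestly I don't think I can get any lighter; a net weight under 65 kg definitely won't do).
PS: Jiawei really looks great when he runs. I've trained so long and I'm much faster than before, but my form is still a total mess. Can't keep a straight face about it. The last photo is a classic moment, Jiawei: "Did none of you eat dinner? Why so slow?" In fact Jiawei really hadn't eaten dinner last night. Ridiculous.

I tested several derived models of Qwen1.5 and Qwen2. The results are pretty similar; so far I haven't seen any clear difference. But a typical problem with small models is large output error: they easily mix Chinese and English. The large models' outputs are clearly much more fluent.

As for the quantized versions, earlier tests on ChatGLM1 and 2 showed that the outputs really barely change; up to now I

Notably, Qwen2-72B-int4 still answers fairly complex arithmetic questions well and isn't easily misled by the typical trap questions from before (e.g. the variants of the age problem). I recently read about the K-Sparse AutoEncoder (arXiv:1312.5663; it's a fairly old idea, a bit like one-hot). It can be used for interpretability: in a single inference, only a small fraction of the LLM's parameters are really activated, and among the enormous number of parameters only a sparse subset actually does the work. For example, answering a legal question uses one part of the parameters, answering a physics question uses another. This matches intuition; perhaps different parts of the human brain also govern different knowledge.

In any case, such explanations lack theoretical support, reproducibility and transferability. I think that to this day we still know very little about LLMs: we don't know why they do what they do, or how far they can go. We just rely on human tuning experience and mindlessly add more and more parameters, trying to exhaustively quantify the world with this "human-wave tactic". That is actually rather foolish: in essence it tries to make everything in the world clearly separable in an ultra-high-dimensional space, but the instances along the timeline are endless.

Current research mostly tells self-consistent stories; so-called challenging benchmarks come in endless varieties and don't even take much time to validate. Only if someone builds an AI that can live in human society for a day, a week, a month, a year, even a whole lifetime (it doesn't even need a body, it only needs to live on the internet for a while without being found out) would that kind of Turing test be truly meaningful.

Another musing: is the autoregressive path we are on really the right one? When people compose output, I think they tend to connect dots into lines and expand lines into surfaces: from the question they first think of a few points, then find a way to string them together. If autoregression is one-directional generation, this would be multi-center, multi-directional generation. In a model architecture, output would first generate the "points", then generate outward from them in several directions: a two-step process. Since autoregression shows clear flaws (besides weak interpretability, it is hard to control), this multi-center, multi-directional generation looks more reasonable.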

A simple Qwen1.5 test script

def qwen_1_5_demo():
    import time
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Record whether CUDA is visible (useful when running as a batch job)
    with open("cuda_avail.txt", 'w') as f:
        f.write(str(torch.cuda.is_available()))
    # model_path = r"F:\model\huggingface\Qwen\Qwen1.5-0.5B-Chat"  # local Windows copy
    model_path = "/nfsshare/home/caoyang/code/model/huggingface/Qwen/Qwen1.5-0.5B-Chat"
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype="auto",
        device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # System prompt: "You are a math teacher; answer the arithmetic question."
    system_prompt = "你是一个数学教师，你需要对算术问题进行作答"
    # Age trap: "I am 18 and my younger sister is half my age. How old is she when I am 60?" (correct answer: 51)
    user_prompt = "我今年18岁，我的妹妹的年龄是我的一半，那么当我60岁时，我的妹妹多少岁？"
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    # Put the inputs on whatever device device_map="auto" chose for the model
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
    t = time.time()
    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=512
    )
    # Keep only the newly generated tokens, dropping the echoed prompt
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    t = time.time() - t
    with open("resp.txt", 'w', encoding="utf-8") as f:
        f.write(str(response) + '\n' + str(t))

qwen_1_5_demo()

20240622

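All the alumni events and next week's commencement add some life to the campus, despite the not-so-great weather.

Rest day from running; strength training planned. No rain in the morning, but the heat hung heavy and it was extremely oppressive; even so, some people recklessly went running in the morning. It rained at noon, which helped a little. In the afternoon people played football on the track field the whole time, and it looked like the rain outside was fairly heavy. At five I went to borrow a kettlebell for strength work, but the attendant wanted to knock off early, so I let it go. Skipping one session won't kill me, will it?

Back in the evening I did a serious round of the core work and push-ups I'd neglected for ages. It was really hard; without the air conditioner I couldn't have kept going. I increasingly feel the Earth is no longer fit for humans. As I remember it, the extreme heat started in 2022, when in late July and early August I didn't dare walk outside during the day; now it happens every year, and this year there will surely be extreme heat again once the plum rain ends.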

Basic document operations (important!)

Basic operations
```
PUT /kuangshen/user/1
{
  "name": "xxx",
  "age": 12,
  "desc": "xxx",
  "tags": ["a", "b", "c"]
}
```
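- kuangshen is the _index, user is the _type, 1 is the _id
- Simple search, GET kuangshen/user/1: fetches the document with id 1 from the user table of the kuangshen database;
  - There is a _version field showing how many times the document has been modified (it starts at 1 on creation)
  - You can add conditions to a search: GET kuangshen/user/_search?q=name:狂神说Java. Note that this is an exact search: with less text (e.g. searching "狂神说") you won't find "狂神说Java"
  - As mentioned earlier, strings come as keyword or text: keyword is not analyzed, text is analyzed
- POST kuangshen/user/1/_update {"doc":{"name": "xxxx"}}: updates data. Note that the new data goes under the doc key, as a dict; several fields can be updated at once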

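Complex operations: select (sorting, pagination, highlighting, fuzzy queries, exact queries)

The hits field holds all query results; when several results match, each one comes with a _score (the relevance), and they are listed in descending order;

Example 1: a query body (a JSON object) that finds every result whose name field contains "狂神"

GET kuangshen/user/_search
{
  "query": {
    "match": {
      "name": "狂神"
    }
  },

  "_source": ["name", "desc"], // result filtering: return only the name and desc fields
  "sort": [ // sorting
    {
      "age": {
        "order": "asc"
      }
    }
  ],

  // pagination: start at hit number "from" and return "size" hits (the current page)
  "from": 0,
  "size": 2
}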

Example 2: multi-condition queries with the bool field

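bool.must: every clause below must match (AND)
bool.should: at least one clause below should match (OR); when must or filter clauses are also present, should clauses become optional and only raise the score
bool.must_not: none of the clauses below may match
bool.filter: filter conditions, e.g. range with lte, lt, gt, gte for ≤, <, >, ≥; filters do not affect the score
match: every result containing the query string (after analysis) is returned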

GET kuangshen/user/_search
{
  "query": {
    "bool": {
      "must": [
  	  {
  	    "match": {
  	      "name": "狂神说"
  	    }
  	  },
  	  {
  	    "match": {
  		  "age": 23
  		}
  	  }
  	],
  	"should": [
  	  {
  	    "match": {
  		  "age": 13
  		}
  	  },
  	  {
  	    "match": {
  		  "age": 12
  		}
  	  }
  	],
  	"filter": { // filter condition
  	  "range": {
  	    "age": {
  		  "gte": 10,
  		  "lt": 25
  		}
  	  }
  	}
    }
  }
}
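A toy model of the bool semantics listed above may make the must/should/filter interplay concrete. This is a hypothetical in-memory sketch, not the Elasticsearch API: `bool_query`, its predicate clauses and its integer score are invented for illustration. It encodes the rule that should means "at least one" only when there is no must/filter clause, which is why in Example 2 the should clauses merely boost documents that already passed must and filter.

```python
def bool_query(docs, must=(), should=(), must_not=(), filters=()):
    """Toy, in-memory model of Elasticsearch bool semantics (not the real engine).

    Every clause is a predicate doc -> bool. The score is just the number of
    matching must/should clauses, a crude stand-in for the real relevance score;
    filters and must_not never change it.
    """
    # With no must/filter clause, at least one should clause has to match;
    # otherwise should clauses are optional and only raise the score.
    min_should = 1 if should and not must and not filters else 0
    hits = []
    for doc in docs:
        if not all(p(doc) for p in must):
            continue
        if not all(p(doc) for p in filters):
            continue
        if any(p(doc) for p in must_not):
            continue
        n_should = sum(p(doc) for p in should)
        if n_should < min_should:
            continue
        hits.append((doc, sum(p(doc) for p in must) + n_should))
    # highest score first, like the hits list
    return sorted(hits, key=lambda pair: -pair[1])


def age_is(n):
    return lambda d: d["age"] == n


docs = [
    {"name": "狂神说", "age": 23},
    {"name": "狂神说Java", "age": 12},
    {"name": "kuangshen", "age": 30},
]
# should + filter: the range filter decides membership, should only boosts age 12
hits = bool_query(docs, should=[age_is(13), age_is(12)],
                  filters=[lambda d: 10 <= d["age"] < 25])
# should alone: at least one should clause must match
only_should = bool_query(docs, should=[age_is(13), age_is(12)])
print([(d["age"], s) for d, s in hits])    # [(12, 1), (23, 0)]
print([d["age"] for d, _ in only_should])  # [12]
```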

Example 3: matching several terms

GET kuangshen/user/_search
{
  "query": {
    "match": {
      "tags": "a b" // returns every doc whose tags field matches "a" or "b"
    }
  }
}

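Separate multiple terms with spaces. A document is returned if it matches any one of them, and each hit comes with a score: the higher the score, the better the match.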

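Exact queries!
- A term query looks up the exact term directly in the inverted index
On analysis (tokenization):
- term: exact lookup; replacing match with term in the examples above gives an exact rather than a "contains" query
- match: uses the analyzer (the query text is analyzed first, and the lookup runs on the analyzed terms)
- text strings are analyzed; keyword strings are not
Exact matching against several values: bool.should + term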
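The term/match difference comes down to whether the query string goes through the same analyzer as the indexed text. The toy inverted index below is a hypothetical sketch, not Elasticsearch code. Its analyzer lowercases and splits CJK text into single characters, which is roughly what the standard analyzer does for Chinese. With it, a term query for the whole string "狂神说Java" finds nothing, while match finds both documents.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase, one token per CJK character, latin/digit runs kept whole."""
    return re.findall(r"[\u4e00-\u9fff]|[a-z0-9]+", text.lower())


def build_index(docs, field):
    index = defaultdict(set)  # term -> ids of the docs containing it
    for doc_id, doc in enumerate(docs):
        for tok in analyze(doc[field]):
            index[tok].add(doc_id)
    return index


def term_query(index, value):
    # term: the value is looked up as-is, with no analysis
    return sorted(index.get(value, set()))


def match_query(index, text):
    # match: analyze the query, then OR together the postings of its tokens
    ids = set()
    for tok in analyze(text):
        ids |= index.get(tok, set())
    return sorted(ids)


docs = [{"name": "狂神说Java"}, {"name": "狂神说"}, {"name": "Python"}]
index = build_index(docs, "name")
print(match_query(index, "狂神"))       # [0, 1]
print(term_query(index, "狂神说Java"))  # [] (the whole string was never stored as one token)
print(term_query(index, "java"))        # [0]
```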

Highlighting:

GET kuangshen/user/_search
{
  "query": {
    "match": {
      "tags": "a b" // returns every doc whose tags field matches "a" or "b"
    }
  },
  "highlight": { // highlight name in the results
    "pre_tags": "<p class='key' style='color:red'>",
    "post_tags": "</p>", // custom highlight tags
    "fields": {
      "name": {}
    }
  }
}

Integrating ES with Spring Boot in detail

Official documentation: https://www.elastic.co/guide/index.html

Look for the high-level client under Java Rest Client

① Add the native dependency: in STS4 you can simply pick elasticsearch under NoSQL, but the default may be ES6, in which case change it to the ES7 dependency;

<dependency>
	<groupId>org.elasticsearch.client</groupId>
	<artifactId>elasticsearch-rest-high-level-client</artifactId>
	<version>7.6.2</version>
</dependency>

② Initialization

Creation: the builder can take several new HttpHost(...) arguments (one per cluster node; for local testing one is enough)

RestHighLevelClient client = new RestHighLevelClient(
								RestClient.builder(
									new HttpHost("localhost",9200,"http"),
									new HttpHost("localhost",9201,"http")));

Closing: client.close();

③ ES configuration class:

@Configuration
public class ElasticSearchConfig {
	@Bean
	public RestHighLevelClient restHighLevelClient() {
		RestHighLevelClient client = new RestHighLevelClient(
										RestClient.builder(
											new HttpHost("127.0.0.1",9200,"http")));
		return client;
	}
}

④ Test class:

@SpringBootTest
class KuangshenEsApiApplicationTests {
	@Autowired
	private RestHighLevelClient restHighLevelClient;
	
	@Test
	void contextLoads() {
	}
}

Index API operations in detail

Test class (continued)

Example 1: running the test code below creates a new empty index kuang_index in ES, checks that it exists, and finally deletes it

@SpringBootTest
class KuangshenEsApiApplicationTests {
	@Autowired
	@Qualifier("restHighLevelClient")
	private RestHighLevelClient client;
	
	@Test
	void testCreateIndex() throws IOException { // test index creation
		// 1. create the index request
		CreateIndexRequest request = new CreateIndexRequest("kuang_index");
		
		// 2. execute the request with the client
		CreateIndexResponse createIndexResponse = client.indices().create(request,RequestOptions.DEFAULT);
		
		System.out.println(createIndexResponse);
	}
	
	@Test 
	void testExistIndex() throws IOException { // test fetching the index
		GetIndexRequest request = new GetIndexRequest("kuang_index");
		boolean exists = client.indices().exists(request,RequestOptions.DEFAULT);
		
		System.out.println(exists);
	}
	
	@Test
	void testDeleteIndex() throws IOException { // test deleting the index
		DeleteIndexRequest request = new DeleteIndexRequest("kuang_index");
		AcknowledgedResponse delete = client.indices().delete(request,RequestOptions.DEFAULT);
		System.out.println(delete.isAcknowledged());			
	}
}

20240623

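Broke my diet and ate freely all day. Actually the feasting started last night, with almost nothing burned off: all eating and no training, a bit sinful.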

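Still some light rain this morning. Tomorrow is the graduation ceremony, and the lab building is in full swing with decorations and rehearsals. But what feelings could they really have for this place? Master's students need no mention, since for them it is just a short stop in life. As for undergraduates, after years of everything, it's good enough if they hold few grudges. Only those of us who have stayed too long feel some attachment. Still, don't air the family's dirty laundry: insiders may grumble among themselves, but outsiders have no right to talk about this place. This is a bad president, but it's the president I chose (a nod to Allende).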

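Fortunately the rain cleared away some of the heat, and the evening was very cool. Unfortunately I only got to campus after nine. I borrowed a 20 kg kettlebell from the equipment room for strength work: 30 forward lunges × 4 sets + 30 reverse lunges × 4 sets (+20 kg), then a 2000 m easy jog to loosen up. XR and a few others had already finished. LXY and YZZ ran a big loop around campus, but only a little over 6K, which isn't her style. Two days of rain must have bottled her up, and that little mileage hardly makes up for it.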

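Still, I saw Andy and AX running. Andy probably ran a bit over 10K, and both have great form, which I envy. AX especially must have been running for a long time. After my strength work I jogged 5 laps with him to cool down at 4’25" pace; he was clearly chatting with ease while I found it a bit hard. I'm not sure of AX's real level, but he may be no weaker than me, and if so, that would be great.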

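Two days of rest is fine too, a chance to reset, and tonight loosened up the legs. Tomorrow shouldn't rain and the evening should be cool. If all goes well I'll test my 10,000 m time to check recent training; I may already be back at my peak.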

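In essence, a K-SAE maps the input into a much higher-dimensional space. This is an "upsampling" step that yields a sparse code in which only k units are active. It then combines that sparse code back into a dense, low-dimensional hidden representation that is propagated onward.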
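Reading "K-SAE" as a TopK sparse autoencoder (an assumption, since the note does not define it), a minimal forward pass keeps only the k largest latent pre-activations. The sizes, the zero biases and the tied initialization below are illustrative choices, not taken from any specific paper's code.

```python
import numpy as np


def topk_sae_forward(x, W_enc, b_enc, W_dec, b_pre, k):
    """Forward pass of a TopK sparse autoencoder (illustrative sketch).

    x: (d,) dense input; W_enc: (d, m) with m >> d; W_dec: (m, d).
    Only the k largest latent pre-activations are kept, so z is sparse.
    """
    pre = (x - b_pre) @ W_enc + b_enc      # (m,) wide latent pre-activations
    top = np.argpartition(pre, -k)[-k:]    # indices of the k largest entries
    z = np.zeros_like(pre)
    z[top] = np.maximum(pre[top], 0.0)     # keep them (ReLU); everything else stays 0
    x_hat = z @ W_dec + b_pre              # decode back to the dense d-dim space
    return z, x_hat


rng = np.random.default_rng(0)
d, m, k = 16, 256, 8
W_enc = rng.standard_normal((d, m)) / np.sqrt(d)
W_dec = W_enc.T.copy()                     # tied initialization (illustrative choice)
z, x_hat = topk_sae_forward(rng.standard_normal(d), W_enc, np.zeros(m), W_dec, np.zeros(d), k)
print(np.count_nonzero(z), x_hat.shape)    # at most k non-zeros; shape (16,)
```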

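There is also the KV-cache question: why is BERT-style bidirectional attention no longer popular? Because bidirectional attention cannot use a KV cache to cut computation.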

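A KV cache significantly improves inference/generate performance and reduces latency;
the longer the generated sequence, the more GPU memory the cache occupies;
- GPT 8K vs. 32K context: input/output prices differ by a factor of two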
KV-cache Memory Usage

$$2 \times \text{precision} \times n_{\text{layers}} \times d_{\text{model}} \times \text{seqlen} \times \text{batch}$$
- 2 = two matrices for K and V
- precision = bytes per parameter (e.g., 4 for fp32)
- $n_{\text{layers}}$ = layers in the model
- $d_{\text{model}}$ = dimension of embeddings
- seqlen = length of context in tokens
- batch = batch size
- OPT-30B: $2 \times 2 \times 48 \times 128 \times 1024 \times 7168$
  - precision: 2 (fp16 inference)
  - 48 layers, 128 batch
  - K/V shape: seqlen 1024, d_model 7168 (7*1024)
    - https://github/meta-llama/llama3/blob/main/llama/model.py#L129-L144
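Plugging the OPT-30B numbers into the formula gives a sense of scale. The sketch assumes standard multi-head attention, where K and V each have $d_{\text{model}}$ columns per layer. Models with grouped-query attention, such as the Llama 3 code linked above, cache fewer KV heads, so their real figure is smaller.

```python
def kv_cache_bytes(precision_bytes, n_layers, d_model, seqlen, batch):
    """KV-cache size in bytes: 2 (K and V) x precision x layers x d_model x seqlen x batch."""
    return 2 * precision_bytes * n_layers * d_model * seqlen * batch


# OPT-30B numbers from the note: fp16 (2 bytes), 48 layers, d_model 7168, seqlen 1024, batch 128
size = kv_cache_bytes(precision_bytes=2, n_layers=48, d_model=7168, seqlen=1024, batch=128)
print(size)          # 180388626432
print(size / 2**30)  # 168.0 (GiB)
```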
Bidirectional vs. Unidirectional
- BERT (Bidirectional Encoder Representations from Transformers): bidirectional attention
- GPT: unidirectional attention;
Taking multi-turn dialogue as an example, let's see from the computational-complexity angle why decoder-only is better
Definitions
- $L$: past sequence length
- $\ell$: length of the new input
- $d$: embedding dimension
decoder only
- KV cache: $K_{past}, V_{past}$
- For each new input, compute the new keys and values $K_{new}, V_{new}$ (projecting $\ell$ tokens costs $O(\ell\cdot d^2)$), and also the query $Q_{new}$
- Compute attention:
  - $Q=Q_{new}\in \mathbb R^{\ell\times d}$
  - $K=[K_{past}, K_{new}]\in \mathbb R^{(L+\ell)\times d}$
  - $V=[V_{past}, V_{new}]\in \mathbb R^{(L+\ell)\times d}$
  - $A=QK^T\in \mathbb R^{\ell\times(L+\ell)}$
    - $q_i$ computes its score vector against the first $L+i$ keys;
  - $\text{softmax}(A)\cdot V\in \mathbb R^{\ell\times d}$
For encoder-decoder (bidirectional attention)
- At every turn, the whole context (history plus the new input) has to be encoded again, because earlier tokens also attend to the new ones; with unidirectional attention, only the newly added message needs to be encoded.

A concrete example:

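import numpy as np

np.random.seed(0)  # fixed seed; the printed outputs below came from an unseeded run, so values will differ
L, l, d = 5, 2, 3
K_past = np.random.randn(L, d)
V_past = np.random.randn(L, d)
Q_past = np.random.randn(L, d)

Q_new = np.random.randn(l, d)
K_new = np.random.randn(l, d)
V_new = np.random.randn(l, d)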

def create_custom_matrix(n):
    # start from a matrix filled with -inf
    matrix = np.full((n, n), -np.inf)
    
    # set the lower triangle (including the diagonal) to 0
    lower_triangle_indices = np.tril_indices(n)
    matrix[lower_triangle_indices] = 0
    
    return matrix

M1 = create_custom_matrix(5)
M1
"""
array([[  0., -inf, -inf, -inf, -inf],
       [  0.,   0., -inf, -inf, -inf],
       [  0.,   0.,   0., -inf, -inf],
       [  0.,   0.,   0.,   0., -inf],
       [  0.,   0.,   0.,   0.,   0.]])"""

import scipy.special  # makes sp.special available even on older SciPy versions
import scipy as sp
sp.special.softmax((Q_past.dot(K_past.T))/np.sqrt(d) + M1, axis=1)
"""array([[1.   , 0.   , 0.   , 0.   , 0.   ],
       [0.622, 0.378, 0.   , 0.   , 0.   ],
       [0.592, 0.352, 0.056, 0.   , 0.   ],
       [0.629, 0.271, 0.022, 0.079, 0.   ],
       [0.532, 0.147, 0.039, 0.079, 0.203]])"""

M2 = create_custom_matrix(7)
M2
"""array([[  0., -inf, -inf, -inf, -inf, -inf, -inf],
       [  0.,   0., -inf, -inf, -inf, -inf, -inf],
       [  0.,   0.,   0., -inf, -inf, -inf, -inf],
       [  0.,   0.,   0.,   0., -inf, -inf, -inf],
       [  0.,   0.,   0.,   0.,   0., -inf, -inf],
       [  0.,   0.,   0.,   0.,   0.,   0., -inf],
       [  0.,   0.,   0.,   0.,   0.,   0.,   0.]])"""

Q = np.concatenate([Q_past, Q_new], axis=0)
K = np.concatenate([K_past, K_new], axis=0)
sp.special.softmax((Q.dot(K.T))/np.sqrt(3) + M2, axis=1)
"""array([[1.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ],
       [0.622, 0.378, 0.   , 0.   , 0.   , 0.   , 0.   ],
       [0.592, 0.352, 0.056, 0.   , 0.   , 0.   , 0.   ],
       [0.629, 0.271, 0.022, 0.079, 0.   , 0.   , 0.   ],
       [0.532, 0.147, 0.039, 0.079, 0.203, 0.   , 0.   ],
       [0.245, 0.136, 0.156, 0.119, 0.122, 0.222, 0.   ],
       [0.162, 0.211, 0.233, 0.112, 0.13 , 0.079, 0.072]])"""

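Comparing the two sp.special.softmax results (causal, decoder-only attention), the first five rows, which belong to the past tokens, are identical, so what was computed for them can be cached. With bidirectional attention (no mask), however: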

sp.special.softmax((Q_past.dot(K_past.T))/np.sqrt(3), axis=1)
"""array([[0.353, 0.169, 0.114, 0.157, 0.206],
       [0.218, 0.132, 0.537, 0.067, 0.046],
       [0.287, 0.171, 0.027, 0.141, 0.374],
       [0.443, 0.191, 0.015, 0.055, 0.296],
       [0.532, 0.147, 0.039, 0.079, 0.203]])"""
sp.special.softmax((Q.dot(K.T))/np.sqrt(3), axis=1)
"""array([[0.227, 0.109, 0.074, 0.101, 0.132, 0.148, 0.21 ],
       [0.159, 0.097, 0.393, 0.049, 0.033, 0.188, 0.081],
       [0.229, 0.137, 0.022, 0.113, 0.3  , 0.044, 0.156],
       [0.406, 0.175, 0.014, 0.051, 0.272, 0.012, 0.07 ],
       [0.404, 0.112, 0.029, 0.06 , 0.154, 0.063, 0.178],
       [0.201, 0.112, 0.128, 0.097, 0.1  , 0.182, 0.18 ],
       [0.162, 0.211, 0.233, 0.112, 0.13 , 0.079, 0.072]])"""

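That is, once the new tokens' Q and K are added, the result is completely different: the attention weights of every row, including the past tokens, change, so nothing computed earlier can be reused.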
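The caching argument can be checked end to end, including the multiplication by V, which the example above stops short of. The sketch below computes full causal attention from scratch. It then computes attention for only the new queries against the cached $K_{past}, V_{past}$ plus $K_{new}, V_{new}$, and confirms that the new rows agree. A hand-written softmax is used so that NumPy is the only dependency; the function names are illustrative.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # -inf entries stay -inf, exp gives 0
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def causal_attention(Q, K, V):
    """Full causal attention over all positions, recomputed from scratch."""
    n, d = Q.shape
    mask = np.where(np.tril(np.ones((n, n), dtype=bool)), 0.0, -np.inf)
    return softmax(Q @ K.T / np.sqrt(d) + mask, axis=1) @ V


def cached_attention(Q_new, K_past, V_past, K_new, V_new):
    """Attention for the new tokens only, reusing the cached K/V of past tokens."""
    L, l, d = K_past.shape[0], Q_new.shape[0], Q_new.shape[1]
    K = np.concatenate([K_past, K_new], axis=0)  # (L + l, d)
    V = np.concatenate([V_past, V_new], axis=0)  # (L + l, d)
    mask = np.full((l, L + l), -np.inf)
    for i in range(l):
        mask[i, : L + i + 1] = 0.0               # new query i sees all past keys plus new keys 0..i
    return softmax(Q_new @ K.T / np.sqrt(d) + mask, axis=1) @ V


rng = np.random.default_rng(0)
L, l, d = 5, 2, 3
Q_past, K_past, V_past = (rng.standard_normal((L, d)) for _ in range(3))
Q_new, K_new, V_new = (rng.standard_normal((l, d)) for _ in range(3))

full = causal_attention(np.vstack([Q_past, Q_new]),
                        np.vstack([K_past, K_new]),
                        np.vstack([V_past, V_new]))
incremental = cached_attention(Q_new, K_past, V_past, K_new, V_new)
print(np.allclose(full[L:], incremental))  # True
```

The cached path only computes an $\ell \times (L+\ell)$ score matrix per turn instead of $(L+\ell) \times (L+\ell)$, which is where the savings come from.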

20240624

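Registration for a lousy conference costs two or three hundred euros. It can be reimbursed, but it's still absurd: in a tiny place like Switzerland a simple meal costs four or five hundred. Still, I have to go, and seeing the scenery will be nice. Someone said that after a trip there, they'd rather be a dog there in the next life than come back here to toil like an ox or a horse.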

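Some are born in Rome, some are born as beasts of burden (a pun: 罗马 "Rome" and 骡马 "mules and horses" sound alike). For a while I loved Xiao Yuehan's (小约翰) "Weird Little Countries" series. I got into Xiao Yuehan through his "Hardcore Tough Guys" series and first listened just for the jokes. But many episodes of the Weird Little Countries series moved me: Allende of Chile, Castro and Che Guevara of Cuba, Sankara of Burkina Faso, Nasser of Egypt. Small countries have had upright leaders like these too. Most, however, were tyrants (Africa's so-called "four great benevolent rulers", Gaddafi, Haiti's Duvalier) or frogs at the bottom of a well (Albania, Gambia). The latter are of course contemptible, but the former were often too rigid and broke easily. The exception is Duterte of the Philippines, though he hardly counts as a gentleman: he simply fought violence with violence, suited to local conditions. Instead, the middle-of-the-road types fare better: Benin swinging left and right, Suriname clinging to the Netherlands, and Thailand's odd system in which the monarchy and the military government check each other. In short, many rotten things don't bear close scrutiny; under the glossy surface lie filthy buns soaked in human blood.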

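In the evening, hmy's graduation farewell dinner. In the end she went to Huawei in Shenzhen. wyl was unusually generous and treated us all. At past dinners he always ordered a pile of green vegetables; given his age he can't eat much greasy food, but it still felt stingy. Today, seeing that we didn't seem full, he ordered three or four hearty dishes on his own, and we were stuffed.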

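Back on campus I found the track still open and immediately regretted eating so much, so I hurried to digest, determined to run properly anyway. Right after warming up, a downpour hit. I told the attendant it was cool tonight and I really wanted to train, and asked him to delay closing. XR and I went at it shirtless, but it was a rain battle, the track lights were off and it was late, so I charged alone in the dark, still not fully digested. In the end I ran only 5000 m, in 19:32, with the attendant urging us to finish. XR blew up before 7 laps. How can the road to getting stronger not involve getting dropped? He clearly has far more talent than I do; if he sticks with this summer training, he will surely surpass me.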

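PS: Lately SXY has become oddly serious about running, even running in the rain; I'm not sure what set her off. But without some goal, nobody keeps running for long.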

Notes on the triviaqa_unfiltered data

It contains three JSON files: unfiltered-web-train.json, unfiltered-web-dev.json, unfiltered-web-test-without-answers.json

All the JSON files share the same top-level keys, as follows:

Data <class 'list'> 87622
Domain <class 'str'> unfiltered-web
Split <class 'str'> train
VerifiedEval <class 'bool'> False
Version <class 'float'> 1.0
---
Data <class 'list'> 11313
Domain <class 'str'> unfiltered-web
Split <class 'str'> dev
VerifiedEval <class 'bool'> False
Version <class 'float'> 1.0
---
Data <class 'list'> 10832
Domain <class 'str'> unfiltered-web
Split <class 'str'> test
VerifiedEval <class 'bool'> False
Version <class 'float'> 1.0

All the useful information is in the Data field: a list whose elements are dicts with the keys ['Answer', 'EntityPages', 'Question', 'QuestionId', 'QuestionSource', 'SearchResults']; the test split lacks the Answer key

import os
import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline
data_root = r"D:\data\TQA"
data_root_unfiltered = os.path.join(data_root, "triviaqa-unfiltered")
data_root_rc = os.path.join(data_root, "triviaqa_rc")

data_paths_unfiltered = {
    "train": os.path.join(data_root_unfiltered, "unfiltered-web-train.json"),
    "dev": os.path.join(data_root_unfiltered, "unfiltered-web-dev.json"),
    "test": os.path.join(data_root_unfiltered, "unfiltered-web-test-without-answers.json"),
}
# Inspect the top-level keys of each JSON file
def easy_check_json(data):
    for key in data:
        datum = data[key]
        data_type = type(datum)
        print(key, data_type, end=' ')
        if isinstance(datum, list):
            print(len(datum))
        elif isinstance(datum, dict):
            print(list(datum.keys()))
        else:
            print(datum)
            
# Inspect the records under the Data key of a JSON file
# ['Answer', 'EntityPages', 'Question', 'QuestionId', 'QuestionSource', 'SearchResults']
def easy_check_data(data):
    for datum in data:
        answer = datum.get("Answer")  # the test split has no Answer key
        entity_pages = datum["EntityPages"]
        question = datum["Question"]
        question_id = datum["QuestionId"]
        question_source = datum["QuestionSource"]
        search_results = datum["SearchResults"]
        splitline = '-' * 32 + '\n'
        print(splitline + "Answer: ")
        print(answer)
        print(splitline + "EntityPages: ")
        print(entity_pages)
        print(splitline + "Question: ")
        print(question)
        print(splitline + "QuestionId: ")
        print(question_id)
        print(splitline + "QuestionSource: ")
        print(question_source)
        print(splitline + "SearchResults: ")
        print(search_results)
        a = input()  # press Enter for the next record; type anything to stop
        if a: break
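For anything beyond eyeballing, the Data list can be flattened into rows. `flatten_triviaqa` is a hypothetical helper, and reading the `Value` sub-field assumes Answer is a dict that contains it, so confirm with easy_check_data first. Using `.get` keeps the same code working on the test split, which has no Answer.

```python
def flatten_triviaqa(records):
    """Turn the Data list into dict rows with QuestionId, Question and the answer string.

    Assumes Answer, when present, is a dict with a 'Value' entry; the test split
    has no Answer key at all, so answer becomes None there.
    """
    rows = []
    for rec in records:
        answer = rec.get("Answer")
        rows.append({
            "QuestionId": rec["QuestionId"],
            "Question": rec["Question"],
            "Answer": answer.get("Value") if isinstance(answer, dict) else None,
        })
    return rows


# made-up records shaped like the train and test splits
records = [
    {"QuestionId": "q1", "Question": "Who?", "Answer": {"Value": "Me", "Aliases": ["me"]}},
    {"QuestionId": "q2", "Question": "What?"},
]
rows = flatten_triviaqa(records)
print(rows)
```

The rows can then go straight into `pd.DataFrame(rows)` for filtering and plotting.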

20240625

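An overseas wire transfer. The recipient is a company account, and the online channels can't get it approved, so on a rainy day I had to go out twice, probably for nothing. With SHA (shared charges), the receiving bank also ate 10% of the amount, and I have to find a way to make up the difference. I should have just had someone pay with a credit card. Even with OUR it seems I'd still get charged: I specifically asked the teller, and the terms say OUR does not cover the receiving bank's extra fees, it just charges an extra 25 dollars up front. Looks like I'll be paying this tuition in full.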

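Easy 10K in the evening, average pace 4’32", average heart rate 159 bpm. I went out without socks this morning, in my badly worn VAPORFLYs: the outsole is heavily worn and very unstable, and the turns nearly wore the skin off my soles. Still a bit tired. But tonight AK was godlike again: he ran 10K in flip-flops, a tiny bit faster than me. He'd damn well drop you even in flip-flops.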

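PS: SXY's reason was different from what I had guessed, and I shouldn't speculate, but perhaps the quarter marathon on April 14 got to her. Many people start running after a setback. I'm not entirely like that, but after May 2021 I trained noticeably more and soon ran a 41:30 10K PB in June 2021; my next real PB only came at the end of last year. Before that, running was mostly just a way to pass the time. To go far, you don't necessarily have to run.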

TensorBoard graph demo

from IPython.display import Image

import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.tensorboard import SummaryWriter
from torchvision.models import resnet50

# create a TensorBoard writer
writer = SummaryWriter('runs/model_visualization')

# create a model (the weights don't matter for drawing the graph)
model = resnet50(weights=None)

# a random tensor standing in for the input data
inputs = torch.randn(1, 3, 224, 224)

# add the model graph (traced with these inputs) to TensorBoard
writer.add_graph(model, inputs)

# close the writer
writer.close()

!tensorboard --logdir=runs

class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        # 28*28 ==> 784
        # fc1 => fc21
        # fc1 => fc22
        self.fc1 = nn.Linear(784, 400)
        # mu
        self.fc21 = nn.Linear(400, 20)
        # logvar
        self.fc22 = nn.Linear(400, 20)
        self.fc3 = nn.Linear(20, 400)
        self.fc4 = nn.Linear(400, 784)
    def encode(self, x):
        h1 = F.relu(self.fc1(x))
        return self.fc21(h1), self.fc22(h1)
    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5*logvar)
        eps = torch.randn_like(std)
        return mu + eps*std
    def decode(self, z):
        h3 = F.relu(self.fc3(z))
        return torch.sigmoid(self.fc4(h3))
    # DAG
    def forward(self, x):
        mu, logvar = self.encode(x.view(-1, 784))
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar
writer = SummaryWriter('runs/model_visualization')
vae = VAE()
inputs = torch.randn(1, 28, 28)
writer.add_graph(vae, inputs)
writer.close()  # flush the graph to disk before launching TensorBoard
!tensorboard --logdir=runs
writer = SummaryWriter()
# writer.add_hparams(vars(args), {'0': 0})

20240626

Some clamor, some calm.

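Downstairs the party glitters with lights and wine; upstairs it's deserted. The rain picked up sharply in the evening, which was oddly reassuring.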

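In the afternoon the rain paused for a while and I slipped out for a haircut. On my way back it picked up again, driving off much of the black-, blue- and red-gowned crowd on the track who had just come from the ceremony. On a whim I ran two shirtless 5000 m reps with 2 minutes' rest: the first forefoot-striking @4’13", the second heel-striking @4’05". I was wearing flat shoes, so forefoot running chafed uncomfortably. My watch's VO2max is back to 62 and my heart rate was fine. Overall satisfied: even if I'm not at my peak yet, it isn't far off.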

PS: Damn, it's still pouring at 11:30 pm. How am I supposed to get home!!!

CVPR 2024 BEST PAPER (arxiv.2309.07906v3)

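Uses a diffusion model to predict 2D motion. Although this Best Paper was only just announced, the work is actually fairly old. It is essentially a video generation task, and by now plenty of work is more impressive.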

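What's interesting is that, unlike conventional generation, it is more of a prediction task: it predicts where each pixel of the original image will be at the next time step. A strict one-to-one mapping is of course impossible, and the result would predictably be crude, so the motion is smoothed and rendered in Fourier space.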

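In diffusion research today, the focus for static generation is mainly on training-free methods. With training or fine-tuning, many image-editing operations are actually easy to achieve. The key question is how to get efficient, practical image editing without training, by editing directly on the semantic representations.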

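In earlier experiments by Yitong's group, simply replacing simple objects had a decent chance of success, i.e. replacing a dog with a cat or a suit with workwear without changing the rest of the image. As soon as color, shape or count was involved, though, it stopped working.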

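In diffusion there is an operation that runs counter to NLP intuition: fusing image semantics. Generation is conditioned on a positive prompt and a negative prompt, and the mainstream approach simply takes a weighted linear combination of the two noise predictions. The weight is a preset hyperparameter; some works learn it, but for fast deployment it is usually not trained. Strictly speaking, with the usual weights above 1 this is an extrapolation rather than a convex combination. In text it's hard to imagine fusing semantics with such a crude weighted sum, yet, puzzlingly, it works very well for image generation.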
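The fixed-weight fusion described above matches classifier-free guidance. Written as $\hat\epsilon = \epsilon_{neg} + w(\epsilon_{pos} - \epsilon_{neg}) = (1-w)\,\epsilon_{neg} + w\,\epsilon_{pos}$, it is a convex combination only for $w \in [0, 1]$. Commonly used scales such as 7.5 push past $\epsilon_{pos}$. The sketch below uses made-up 2-element arrays in place of real U-Net noise predictions; `guided_noise` is an illustrative name.

```python
import numpy as np


def guided_noise(eps_pos, eps_neg, guidance_scale):
    """Classifier-free-guidance style fusion of two noise predictions.

    eps_pos: prediction conditioned on the positive prompt;
    eps_neg: prediction for the negative (or empty) prompt;
    guidance_scale: fixed hyperparameter w.
    """
    return eps_neg + guidance_scale * (eps_pos - eps_neg)


eps_pos, eps_neg = np.array([1.0, 2.0]), np.array([0.0, 1.0])
print(guided_noise(eps_pos, eps_neg, 0.5))  # [0.5 1.5]: inside the segment, a convex mix
print(guided_noise(eps_pos, eps_neg, 7.5))  # [7.5 8.5]: beyond eps_pos, an extrapolation
```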

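An earlier work did region-wise image generation: for example, a full moon in the top left, a tree in the top right, a person in the bottom left and a lake in the bottom right, i.e. position-controlled output. They wrote 4 prompts (one per region), sampled an initial noise for each, and then simply cropped the 4 noises to the required positions and stitched them together. Jaw-dropping, yet the results were surprisingly good, although reproducing the results claimed in the paper is hard.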

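Another direction is making diffusion faster. Some studies found that steps can be skipped early on: instead of denoising slowly step by step, the sampler can jump ahead. This is faster and barely affects the result, because in the early steps the image is mostly noise and carries little information.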
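One way to express "skip early, go slow late" is a sub-sampled timestep schedule with large gaps at high noise and small gaps near $t = 0$. The quadratic spacing below is a hypothetical choice, similar in spirit to DDIM-style sub-sampled schedules, not the method of any specific paper. T=1000 and 20 steps are assumed values.

```python
import numpy as np


def skip_step_schedule(T=1000, num_steps=20):
    """Hypothetical sampling schedule that visits few high-noise timesteps.

    Timesteps are spaced quadratically in [0, T-1]: the gaps are large near t = T-1
    (early, noisy steps) and small near t = 0 (late steps that add detail).
    Returned in sampling order, i.e. descending.
    """
    t = (np.linspace(0.0, 1.0, num_steps) ** 2) * (T - 1)
    return np.unique(np.round(t).astype(int))[::-1]


steps = skip_step_schedule()
print(steps)  # 20 timesteps instead of 1000, densest near t = 0
```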

Earlier notes on the gpt-4o tokenizer:

import tiktoken
import langdetect
import regex as re

# https://platform.openai/tokenizer
T1 = tiktoken.get_encoding('cl100k_base')
T2 = tiktoken.get_encoding('o200k_base')

T1.encode('i love chatgpt') # [72, 3021, 6369, 70, 418]
T2.encode('i love chatgpt') # [72, 3047, 7999, 70, 555]

T1.n_vocab, T2.n_vocab # (100277, 200019)

len(T1.encode('你好，我的名字是GPT-4o。我是一种新型的语言模型，很高兴见到你!')) # 34
len(T2.encode('你好，我的名字是GPT-4o。我是一种新型的语言模型，很高兴见到你!')) # 24

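So the gpt-4o tokenizer (o200k_base) compresses tokens: the old one was close to character level for Chinese, while the new one merges many more multi-character words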

token_ids_len1 = 0
token_ids_len2 = 0
with open('./xyj.txt', 'r', encoding='utf-8') as f:
    for line in f.readlines():
        token_ids_len1 += len(T1.encode(line))
        token_ids_len2 += len(T2.encode(line))
token_ids_len1, token_ids_len2, token_ids_len1/token_ids_len2, token_ids_len2/token_ids_len1
# (990508, 703303, 1.408365953223575, 0.7100427255509294)

Tested on a long text, o200k_base needs about 71% as many tokens as cl100k_base, i.e. roughly 29% fewer.

https://chat.openai/share/f4227bf3-bc46-43d1-a982-f7dea78702ed
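- Prompt: translate the sentence "给主人留下些什么吧" into English
The likely cause: to compress tokens, the tokenizer was first trained on the broadest base corpus without much filtering. When the transformer itself was trained, the low-quality data in that base corpus was filtered out, so tokens that occur frequently only in the base corpus end up under-trained;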
https://gist.github/ctlllll/4451e94f3b2ca415515f3ee369c8c374

T2.encode("给主人留下些什么吧") # [177431]
T1.encode("给主人留下些什么吧") # [90112, 36668, 17792, 40198, 247, 17297, 98184, 6271, 222, 82696, 7305, 100]

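The pretrained tokenizer encodes this whole phrase as a single token, which is remarkable. It also reflects a current concern. We can tolerate some errors, which are unavoidable, but we want them to align with human preferences, i.e. to be errors a human might also make. We don't want to build an agent that even we can't understand.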

length_dict = {}

for i in range(T2.n_vocab):
    try:
        length_dict[i] = len(T2.decode([i]))
    except:
        pass
      
# Sort by length
length_dict = dict(sorted(length_dict.items(), key=lambda item: -item[1]))
# print(length_dict)
# Print the 20 longest tokens that langdetect classifies as Chinese
tot = 0
for item in length_dict:
    try:
        if langdetect.detect(T2.decode([item])) == "zh-cn":
            print(item, T2.decode([item]))
            tot += 1
    except:
        pass
    if tot == 20:
        break

"""185118 _日本毛片免费视频观看
116852  中国福利彩票天天
128031 久久免费热在线精品
154809 无码不卡高清免费v
172750  大发快三大小单双
177431 给主人留下些什么吧
181679  qq的天天中彩票
184969 _日本一级特黄大片
187822  大发快三开奖结果
49649  彩神争霸邀请码
89409 免费视频在线观看
122333 无码不卡高清免费
122712 无码一区二区三区
128600  大发时时彩计划
133274 】【：】【“】【
135161  大发时时彩开奖
149168  大发时时彩怎么
160029  大发快三是国家
160131  大发快三是不是
176039 精品一区二区三区"""

Absurd. The long tokens in English and other languages, by contrast, are nowhere near as outrageous:

length_dict = {}

for i in range(T2.n_vocab):
    try:
        length_dict[i] = len(T2.decode([i]))
    except:
        pass
      
# Sort by length
length_dict = dict(sorted(length_dict.items(), key=lambda item: -item[1]))
# print(length_dict)
# Print the 20 longest tokens that are not just whitespace/punctuation
tot = 0
pattern = r'^[\s\W_=+\\-]*$'
for item in length_dict:
    try:
        # print(T2.decode([item]), re.match(pattern, T2.decode([item])))
        if not re.match(pattern, T2.decode([item])):
            print(item, T2.decode([item]))
            tot += 1
    except:
        pass
    if tot == 20:
        break

"""161518 abcdefghijklmnopqrstuvwxyz
184150 ABCDEFGHIJKLMNOPQRSTUVWXYZ
130756  verantwoordelijkheid
150141  สำนักเลขานุการองค์กร
106123  telecommunications
133739  selbstverständlich
135127  วิเคราะห์บอลวันนี้
154976 .onreadystatechange
166459  significativamente
184611  Telecommunications
193348  Wahrscheinlichkeit
197767  disproportionately
88004  unterschiedlichen
100106  interdisciplinary
117361 .githubusercontent
132622  responsabilidades
134381  Herausforderungen
135128  multidisciplinary
138955  STDMETHODCALLTYPE
198090  commercialization"""

20240627

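Everyone has strengths and weaknesses. The two across from me are nearly driven numb by operations research; hard to imagine people approaching thirty still being tortured by final exams. I ran a test: I converted the problems to LaTeX and had gpt-4o solve them. It can say something about the theoretical analysis but fails at proofs. I also tested glm4, and the results are actually similar, but in long derivations of more complex theory it easily becomes inconsistent with its own notation (hallucination). I also tried some number-theory proofs, which were even more dismal.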

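Day 26 of resumed training, back at my peak! In the evening, a rain battle at Fudan South Campus with 129, and I dropped everyone (Jiawei was absent; otherwise would I get to run wild?). The workout was an inverted pyramid. Elite group: 4K@3’55" + 3K@3’45" + 2K@3’35" + 1K@3’25", with rests of 4 min + 3 min + 2 min; the advanced group runs each rep 20 seconds slower. I was confident I could finish the advanced group's workout easily even without the rests, but nobody was running the elite group, so I took on the elite group alone!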

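Rep 1, 4K: in great form. I stayed with the advanced group for only half a lap, then pulled away from the pack and finished 4K in under 15 minutes, more than 10 seconds faster than even the elite pace, with clearly something left. That's no worse than my March peak, and this was a summer night in the rain.
Rep 2, 3K: a bit of a struggle, but I held 3’45" for the full 3000 m, matching the elite pace.
Rep 3, 2K: running on fumes, but with 70% done, how could I give up the remaining 30%? Anaerobic work is my weakness, and I can hardly hold anything under 3’40" any more. This rep fell off the elite pace but was still well ahead of the advanced group.
Rep 4, 1K: couldn't surge at all, legs like lead; after just one lap I wanted to quit. I eased off a little on the second lap to recover and finished at 3’37". Barely acceptable: much slower than the elite pace, but at least the workout was done.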

I've really got it back, haven't I? We will win. (Though XR is still sluggish; I can't carry him, I just can't carry him.)

A few cases of LLMs solving operations-research problems; the results are actually similar. Here is one solving the Rosenbrock function:

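I also had it explain the steps of the interior-point method, which was not accurate either, with many mathematical symbols used inconsistently: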

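I tried number-theory proofs: a total mess, just like back in the day when I couldn't prove a number-theory problem and frantically wrote bogus proofs: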

20240628

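Last night before bed my ankle felt slightly off, and I worried the high intensity would bring back the old injury, but it's fine this morning.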

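Went to my aunt's for a heavy meat feast: salmon, beef, roast chicken, bone soup, again all eating and no training. Training has been hard lately, so I should eat more. Salmon is in season and fairly cheap right now.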

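I had planned to train with Heima (黑马) at Century Park tomorrow morning. This is the easiest Heima workout I've seen so far. My level usually falls between groups D and E, but this plan's group D is really easy for me, and I might even have a go at group C. Then AK told me it's actually an easy run. Me: ???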

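Thunderstorms tomorrow morning, so I'll just lie flat; why go looking for trouble. Haste makes waste. The road ahead is long, and the closer I get to my peak, the more I should slow down.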

On positional encoding:

The original Transformer uses absolute positional encoding:

attention mechanism (the most distinctive part of the Transformer)
- $X\in\mathbb R^{\ell\times d}$
- $W_k\in\mathbb R^{d\times d_k},\ W_q\in\mathbb R^{d\times d_k},\ W_v\in\mathbb R^{d\times d_v}$
- $Q=XW_q\in\mathbb R^{\ell\times d_k},\ K=XW_k\in\mathbb R^{\ell\times d_k},\ V=XW_v\in\mathbb R^{\ell\times d_v}$

$$\text{Attention}(Q,K,V)=\text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$

$$A_{ij}=\frac{\exp\left(\frac{q_i^Tk_j}{\sqrt{d_k}}\right)}{\sum_{j'}\exp\left(\frac{q_i^Tk_{j'}}{\sqrt{d_k}}\right)}$$

$A_{ij}$ (the attention weights; $QK^T$ gives the attention scores) is the attention weight between the token at position $i$ and the token at position $j$:
- if $x_i$/$x_j$, or $q_i$/$q_j$ ($k_i$/$k_j$), encode no positional information, this weight is independent of position, which is clearly a serious flaw for sequence modeling
- after all, the meaning of a sentence depends on the order of its tokens
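The claim that attention alone ignores word order can be checked numerically. Without positional information, self-attention is permutation-equivariant: shuffling the input tokens only shuffles the output rows in the same way. The sizes below are arbitrary, and `self_attention` is a plain NumPy sketch of the formula above.

```python
import numpy as np


def self_attention(X, Wq, Wk, Wv):
    """softmax(Q K^T / sqrt(d_k)) V with no positional information."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Wk.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)            # row-wise softmax: the A_ij above
    return A @ V


rng = np.random.default_rng(0)
ell, d, d_k, d_v = 6, 8, 4, 4
X = rng.standard_normal((ell, d))
Wq, Wk = rng.standard_normal((d, d_k)), rng.standard_normal((d, d_k))
Wv = rng.standard_normal((d, d_v))

perm = rng.permutation(ell)  # shuffle the token order
out = self_attention(X, Wq, Wk, Wv)
out_shuffled = self_attention(X[perm], Wq, Wk, Wv)
# each token gets exactly the same output wherever it sits: order is invisible
print(np.allclose(out[perm], out_shuffled))  # True
```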

BERT: additive (absolute) position encoding (learnable position encoding)

# modeling_bert.py
embeddings = inputs_embeds + token_type_embeddings + self.position_embeddings(position_ids)

GPT: additive (absolute) position encoding (learnable position encoding)

# modeling_gpt.py
if inputs_embeds is None:
    inputs_embeds = self.wte(input_ids)
position_embeds = self.wpe(position_ids)
hidden_states = inputs_embeds + position_embeds + token_type_embeds

But Llama switches to relative position encoding (rotary position embedding, RoPE)

From absolute to relative position encoding
- Absolute position encoding: the encoding of position pos_i depends only on the value of pos_i;
- Relative position encoding: (usually no per-position encoding is needed) the relative distance between positions is encoded directly
  - relative position of pos=0 and pos=1: $f(|0-1|)$
  - relative position of pos=1 and pos=3: $f(|1-3|)$
  - the matrix formed by these offsets is called the id matrix;
RoPE
- Rotary position embedding: a relative, non-additive position encoding embedded directly into the attention computation;
- $R^d_{\Theta,m}$: the rotation matrix for position $m$; not learnable, i.e. fixed and global;
  - $m\theta_i$: the rotation angle of the $i$-th 2D pair, where $\theta_i = 10000^{-2(i-1)/d}$ is that pair's frequency

$$\begin{aligned} f(q,m)^Tf(k,n)&=(R_mq)^T(R_nk)\\ &=q^T(R^T_mR_n)k\\ &=q^TR_{n-m}k \end{aligned}$$

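# Simplified excerpt of Llama's Attention.forward (model.py); freqs_cis = precomputed exp(i*m*theta_j), global and not learned
def apply_rotary_emb(xq, xk, freqs_cis):
    # view head_dim as head_dim/2 complex numbers built from consecutive pairs
    xq_ = torch.view_as_complex(xq.float().reshape(*xq.shape[:-1], -1, 2))
    xk_ = torch.view_as_complex(xk.float().reshape(*xk.shape[:-1], -1, 2))
    freqs_cis = freqs_cis.view(1, xq_.shape[1], 1, xq_.shape[-1])  # broadcast over batch and heads
    xq_out = torch.view_as_real(xq_ * freqs_cis).flatten(3)  # rotate, then back to real pairs
    xk_out = torch.view_as_real(xk_ * freqs_cis).flatten(3)
    return xq_out.type_as(xq), xk_out.type_as(xk)
xq, xk, xv = self.wq(x), self.wk(x), self.wv(x)
xq, xk = xq.view(bsz, seqlen, n_heads, head_dim), xk.view(bsz, seqlen, n_heads, head_dim)
xq, xk = apply_rotary_emb(xq, xk, freqs_cis=freqs_cis)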
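The identity $f(q,m)^Tf(k,n)=q^TR_{n-m}k$ can be verified numerically. The sketch below is a single-vector NumPy version of the rotation. It pairs consecutive dimensions as complex numbers, like the `view_as_complex` trick in the Llama code, with the 0-based frequency $\theta_i = 10000^{-2i/d}$. `rope` is an illustrative name, not the Llama function.

```python
import numpy as np


def rope(x, m, base=10000.0):
    """Apply RoPE to a single vector x (even length d) placed at position m.

    Consecutive pairs (x[2i], x[2i+1]) are treated as complex numbers and
    multiplied by exp(1j * m * theta_i), with theta_i = base ** (-2i / d) (0-based i).
    """
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)  # (d/2,) frequencies
    xc = x[0::2] + 1j * x[1::2]                # (d/2,) complex pairs
    rotated = xc * np.exp(1j * m * theta)      # rotate pair i by angle m * theta_i
    out = np.empty_like(x)
    out[0::2], out[1::2] = rotated.real, rotated.imag
    return out


rng = np.random.default_rng(0)
q, k = rng.standard_normal(64), rng.standard_normal(64)

# the same offset n - m = 7 at very different absolute positions gives the same score
s1 = rope(q, 3) @ rope(k, 10)
s2 = rope(q, 103) @ rope(k, 110)
print(np.isclose(s1, s2))  # True
```

Rotations also preserve vector norms, which is one reason RoPE can be applied to q and k without changing their scale.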

20240629

A day in a steamer. Lots of people went to ISPO at Longyang Road to grab freebies; in this weather you're better off just staying home.

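In the evening, 8 sets of lunges (+20 kg), 4 forward and 4 reverse; it was so hot my forearms were covered in dense droplets. Then 10 easy laps. The first 1000 m took 5’03", but then Brother Hu showed up, which put a lot of pressure on me, and I forced a hard stretch after that; my chest felt tight and awful. After finishing I clumsily deleted the record. Ugh.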

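This month: 194.2 km (@4’17"/km). I'd counted on reaching 200 km tonight, but tomorrow is forecast to rain all day (someone is holding a conference tomorrow; what a date to pick). I'm a bit OCD about this, so I must find a way to make up the remaining 5.8 km tomorrow.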

Document API operations in detail

First create a User bean class with two fields, name and age
Keep writing the test class (sigh)

Example 1: test adding a document

@SpringBootTest
class KuangshenEsApiApplicationTests {
	@Autowired
	@Qualifier("restHighLevelClient")
	private RestHighLevelClient client;
	
	
	// test adding a document
	@Test
	void testAddDocument() throws IOException {
		// create the object
		User user = new User("狂神说",3);
		// create the request
		IndexRequest request = new IndexRequest("kuang_index");
		
		// rule: put /kuang_index/_doc/1
		request.id("1");
		request.timeout(TimeValue.timeValueSeconds(1));
		request.timeout("1s"); // the same timeout as the line above, in string form
		
		// put our data into the request as JSON (the core of it all!)
		request.source(JSON.toJSONString(user),XContentType.JSON);
		
		// send the request with the client and get the response
		IndexResponse indexResponse = client.index(request,RequestOptions.DEFAULT);
		
		System.out.println(indexResponse.toString());
		System.out.println(indexResponse.status()); // the status the console command returns: CREATED
		
		
	}

}

Example 2: test fetching a document to check whether it exists

@Test
void testExistsDocument() throws IOException {
	GetRequest getRequest = new GetRequest("kuang_index","1");
	
	// skip fetching the _source context: not needed just to check existence
	getRequest.fetchSourceContext(new FetchSourceContext(false));
	getRequest.storedFields("_none_");
	
	boolean exists = client.exists(getRequest,RequestOptions.DEFAULT);
	
	System.out.println(exists);
	
}

Example 3: test fetching document information

@Test
void testGetDocument() throws IOException {
	GetRequest getRequest = new GetRequest("kuang_index","1");
	GetResponse getResponse = client.get(getRequest,RequestOptions.DEFAULT);
	System.out.println(getResponse.getSourceAsString());
	System.out.println(getResponse); // prints the full response, same as the console command
	
}

Example 4: test updating document information

@Test
void testUpdateDocument() throws IOException {
	UpdateRequest updateRequest = new UpdateRequest("kuang_index","1");
	updateRequest.timeout("1s");
	
	User user = new User("狂神说Java",18);
	
	updateRequest.doc(JSON.toJSONString(user),XContentType.JSON);
	
	UpdateResponse update = client.update(updateRequest,RequestOptions.DEFAULT);
	System.out.println(update.status());
	
}

Example 5: test deleting a document

@Test
void testDeleteDocument() throws IOException {
	DeleteRequest request = new DeleteRequest("kuang_index","3");
	request.timeout("1s");
	
	DeleteResponse deleteResponse = client.delete(request,RequestOptions.DEFAULT);

	System.out.println(deleteResponse.status());
	
}

Example 6: a special case; real projects usually insert data in bulk

@Test
void testBulkRequest() throws IOException {
	BulkRequest bulkRequest = new BulkRequest();
	bulkRequest.timeout("10s");
	
	ArrayList<User> userList = new ArrayList<>();
	
	userList.add(new User("kuangshen1",3));
	userList.add(new User("kuangshen2",3));
	userList.add(new User("kuangshen3",3));
	userList.add(new User("kuangshen4",3));
	userList.add(new User("kuangshen5",3));
	userList.add(new User("kuangshen6",3));
	
	// batch request
	for (int i = 0; i<userList.size(); i++) {
		bulkRequest.add(
			new IndexRequest("kuang_index").id(""+(i+1)).source(JSON.toJSONString(userList.get(i)),XContentType.JSON)
		);
	}
	
	BulkResponse bulkResponse = client.bulk(bulkRequest,RequestOptions.DEFAULT);
	System.out.println(bulkResponse.hasFailures()); // whether any item failed; false means success
}

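Example 7: search

SearchRequest: the search request
SearchSourceBuilder: builds the search conditions
HighlightBuilder: builds highlighting
TermQueryBuilder: builds an exact (term) query
MatchQueryBuilder: builds a match query
xxxQueryBuilder: each corresponds to one of the console commands seen in the non-Spring Boot part above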

@Test
void testSearch() throws IOException {
	SearchRequest searchRequest = new SearchRequest("kuang_index");
	// build the search conditions
	SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
	//sourceBuilder.highlighter(); highlighting
	// query conditions can be built with the QueryBuilders utility
	// QueryBuilders.termQuery: exact match
	// QueryBuilders.matchAllQuery(): match everything
	
	TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name","kuangshen1");
	
	// MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
	sourceBuilder.query(termQueryBuilder);
	sourceBuilder.timeout(new TimeValue(60,TimeUnit.SECONDS));
	
	searchRequest.source(sourceBuilder);
	SearchResponse searchResponse = client.search(searchRequest,RequestOptions.DEFAULT);
	
	System.out.println(JSON.toJSONString(searchResponse.getHits())); // remember the hits key?
	
	
	for (SearchHit documentFields : searchResponse.getHits()) {
		System.out.println(documentFields.getSourceAsMap());
	}
}

20240630

Heavy rain all day long. I never expected it, but I completed the 200K plan after all (June total 200.2 km, average pace 4’16"50; past year 1696.3 km, average pace 4’24"40).

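I was in terrible shape tonight. I came over in the rain at 8 am and sat until 7 pm while it poured non-stop, going out only for two meals. I didn't want to move at all and my body was stiff. Going down for dinner in the evening I found the rain had stopped and the sky was lit by a deep red sunset. On a whim I wanted to go run in my slippers, but I was afraid of chafing my feet too badly, so I didn't.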

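Three segments: 3K@408 + 2K@408 + 1K@357 (heart rate 156 bpm + 151 bpm + 159 bpm). I've been squeezed too hard these past two days and am exhausted in body and mind; back home I didn't even want to do core work and couldn't keep it up. I was soaked within minutes and couldn't take my shirt off, so the run was uncomfortable. If not for scraping together this 200K, I'd have quit after two laps.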

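PS: SXY finally broke 30 minutes for 5K. Hard work pays off, and calling it beginner's luck would disrespect her madness of the past two weeks. This level is still modest, but compared with her past performances it is a miracle. Then again, the more carried away you get, the harder you fall; you can't get fat in one bite. But if you never eat you'll never get fat either, unless you slap your face until it swells to look fat.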

Llama source code analysis

Source: https://github/facebookresearch/llama.git
- the default main branch is llama2
- the llama_v1 branch is llama1
- weights application page (very easy):
  - https://ai.meta/resources/models-and-libraries/llama-downloads/
Parameters
- 7B
  - dim: 4096
  - multiple_of: 256 (SwiGLU)
  - n_layers: 32
  - n_heads: 32

# 4096 / 32 == 2^12/2^5 == 2^7 == 128
self.head_dim = args.dim // args.n_heads

llama2: https://github/facebookresearch/llama
rmsnorm: https://github/bzhangGo/rmsnorm
- layernorm: https://pytorch/docs/stable/generated/torch.nn.LayerNorm.html
Swish / SiLU (SwiGLU is the gated linear unit built on it)
- https://arxiv/abs/1710.05941v1
- https://pytorch/docs/stable/generated/torch.nn.functional.silu.html
References:
- https://akgeni.medium/llama-concepts-explained-summary-a87f0bd61964

About RMSNorm

Pre-normalization Using RMSNorm llama
- root mean square norm
- before attn
- before ffn
- compared with layernorm (re-centering and re-scaling), it saves computation time;
  - rms norm only does re-scaling
  $$\bar{a}_i = \frac{a_i}{\text{RMS}(\mathbf{a})} \odot g_i, \quad \text{where}~~ \text{RMS}(\mathbf{a}) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} a_i^2}.$$
- It follows naturally that, with the gain $g_i = 1$, the output $\bar{\mathbf{a}}$ of rms norm has $\ell_2$ norm $\sqrt n$

$$\|\bar{\mathbf{a}}\|_2=\sqrt{\frac{\sum_i a_i^2}{\frac1n\sum_i a_i^2}}=\sqrt{n}$$

class RMSNorm(torch.nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def _norm(self, x):
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

    def forward(self, x):
        output = self._norm(x.float()).type_as(x)
        return output * self.weight
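A quick NumPy check of the $\sqrt n$ property, with gain 1 and eps set to 0 so that the code matches the formula exactly. The real module above adds eps=1e-6 for stability, which perturbs the norm only negligibly. `rms_norm` is an illustrative name.

```python
import numpy as np


def rms_norm(a, g=None, eps=0.0):
    """a_bar = a / RMS(a) * g along the last axis (eps=0 matches the formula exactly)."""
    rms = np.sqrt(np.mean(a ** 2, axis=-1, keepdims=True) + eps)
    out = a / rms
    return out if g is None else out * g


rng = np.random.default_rng(0)
n = 10
a = rng.standard_normal((4, n)) * 5 + 3  # arbitrary scale and offset
a_bar = rms_norm(a)
print(np.linalg.norm(a_bar, axis=-1))    # every row has norm sqrt(10) ≈ 3.1623
print(a_bar.mean(axis=-1))               # unlike LayerNorm, the mean is not removed
```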

import numpy as np
import torch
from torch import nn

bs, seq_len, embedding_dim = 20, 5, 10
x = torch.randn(bs, seq_len, embedding_dim)

ln = nn.LayerNorm(embedding_dim)

x_ln = ln(x)

print(x_ln[1, 3, :].mean())
print(x_ln[1, 2, :].std(unbiased=False))


About SwiGLU

Inspiration of using SwiGLU in LLaMA is taken from PaLM.
The activation inside SwiGLU is Swish (with β = 1), also called SiLU (Sigmoid Linear Unit):

$$\text{silu}(x)=x\cdot\sigma(x)=x\cdot\frac{1}{1+\exp(-x)}=\frac{x}{1+\exp(-x)}$$

def sigmoid(x):
    return  1/(1 + np.exp(-x))

def swish(x):
    return x*sigmoid(x)

import matplotlib.pyplot as plt
plt.rcParams['figure.dpi'] = 120

x = np.arange(-5, 5, .01)
plt.plot(x, swish(x))
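SwiGLU itself is the gated feed-forward block built on that activation: $\text{FFN}(x) = W_2\big(\text{silu}(xW_1) \odot xW_3\big)$. The `multiple_of: 256` parameter listed earlier controls the hidden size. The LLaMA code takes 2/3 of 4·dim and rounds up to a multiple of 256, which gives 11008 for dim 4096. Below is a NumPy sketch under those assumptions; the weights are random and the names are illustrative, not the Llama module.

```python
import numpy as np


def silu(x):
    return x / (1.0 + np.exp(-x))


def ffn_hidden_dim(dim, multiple_of=256):
    """LLaMA-style hidden size: 2/3 of 4*dim, rounded up to a multiple of multiple_of."""
    hidden = int(2 * (4 * dim) / 3)
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)


def swiglu_ffn(x, W1, W2, W3):
    """FFN(x) = W2(silu(x W1) * (x W3)): the Swish-gated linear unit."""
    return (silu(x @ W1) * (x @ W3)) @ W2


print(ffn_hidden_dim(4096))                   # 11008 for the 7B config

rng = np.random.default_rng(0)
dim = 8
hidden = ffn_hidden_dim(dim, multiple_of=4)   # 24
W1, W3 = rng.standard_normal((dim, hidden)), rng.standard_normal((dim, hidden))
W2 = rng.standard_normal((hidden, dim))
y = swiglu_ffn(rng.standard_normal((3, dim)), W1, W2, W3)
print(y.shape)                                # (3, 8)
```

The 2/3 factor keeps the parameter count close to a plain 4·dim FFN, since SwiGLU uses three weight matrices instead of two.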

20240701

Shanghai's summer has officially begun! Summer training means running the hottest paces and getting dropped in the coldest way.

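Today was a 10 km warm-up + 800 m sprints, really junk mileage. I was in terrible shape: a 420 pace pushed my heart rate to 190. It was hot, and since the plan was an easy jog I ran in a dress shirt and nearly suffocated; glancing at my watch midway nearly scared me to death. I finally struggled through 10 km in three segments, wanting to take off the shirt several times but too embarrassed to. At the end Jiawei showed up and I ran two laps shirtless with him, but every surge cramped my calf. Extremely sluggish. Once I've properly recovered for a couple of days, I'll have a duel to the death with Jiawei. (I think I can just about arm-wrestle with him at the moment, although today he started under 320 pace, and having overeaten at dinner I nearly choked.)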

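All three pairs of glasses I have on hand are broken: one lost both arms, one snapped at the bridge, and one lost an arm. There's a spare pair, but its prescription isn't strong enough. Probably it's because over the past half year I've often flung my glasses onto the track mid-run, but they're still far too fragile. Ugh. Anyway, if things go smoothly I'll get the visa done in the next two weeks and go home for a couple of days; staying here just means wyl squeezing me hard, which is annoying. Better to go home, live like a retiree for a couple of days and recuperate; I can't keep grinding myself down like this.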

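PS: LXY ran 228 km total in June (average pace 5’19"), more than me even though she was away from campus for a week. I lose. Her first-half total is 1180 km (average pace 5’14"), 150 km more than mine. Utterly beaten.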

Latent Diffusion Model v.s. Pixel-space Diffusion Model

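The former maps high-dimensional features to a low-dimensional latent representation, trains there, and then decodes back. Its advantage is computational efficiency, but its image quality falls short of the latter, which is generally used for high-quality image generation.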

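For some short video generation, e.g. applying a slight force to an object, the resulting short-term motion is generally a simple harmonic oscillation, and its vibration modes can be modeled with the fast Fourier transform (FFT).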
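To see why a few Fourier frequencies can describe such motion, here is a small sketch on a synthetic 1-D signal standing in for one pixel's displacement over time. The 2 Hz frequency, the damping and the 100 Hz sampling rate are made-up values. The FFT peak recovers the oscillation frequency.

```python
import numpy as np

fs = 100.0                        # samples per second (assumed)
t = np.arange(1000) / fs          # 10 s of "pixel displacement"
f_true, gamma = 2.0, 0.1          # hypothetical 2 Hz oscillation with light damping
x = np.exp(-gamma * t) * np.cos(2 * np.pi * f_true * t)

spectrum = np.abs(np.fft.rfft(x))          # magnitude of each frequency component
freqs = np.fft.rfftfreq(len(x), d=1 / fs)  # frequency of each bin (0.1 Hz resolution)
f_peak = freqs[np.argmax(spectrum)]
print(f_peak)                              # close to 2.0
```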

There are also methods that generate video purely with diffusion, but they struggle to avoid:

incoherent motion
unrealistic temporal variation in textures
violations of physical constraints like preservation of mass

A recent key paper on VLMs

What matters when building vision-language models? arxiv.2405.02246

Topic: Diffusion Models for Robotics

Title	Arxiv	GitHub
Learning Universal Policies via Text-Guided Video Generation	https://arxiv/abs/2302.00111	https://universal-policy.github.io/unipi/
Scaling Robot Learning with Semantically Imagined Experience	https://arxiv/abs/2302.11550	https://diffusion-rosie.github.io/
Synthetic Experience Replay BV1wx4y1P7hD	https://arxiv/abs/2303.06614	https://github/conglu1997/SynthER
Diffusion Model-Augmented Behavioral Cloning	https://arxiv/abs/2302.13335	https://nturobotlearninglab.github.io/dbc/
Planning with Diffusion for Flexible Behavior Synthesis	https://arxiv/abs/2205.09991	https://diffusion-planning.github.io/
Is Conditional Generative Modeling all you need for Decision-Making?	https://arxiv/abs/2211.15657	https://anuragajay.github.io/decision-diffuser/
Imitating Human Behaviour with Diffusion Models	https://arxiv/abs/2301.10677	https://github/microsoft/Imitating-Human-Behaviour-w-Diffusion
Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning	https://arxiv/abs/2208.06193	https://github/Zhendong-Wang/Diffusion-Policies-for-Offline-RL
Efficient Diffusion Policies for Offline Reinforcement Learning	https://arxiv/abs/2305.20081	https://github/sail-sg/edp
Diffusion Policy: Visuomotor Policy Learning via Action Diffusion BV1Cu411Y7d7	https://arxiv/abs/2303.04137	https://github/real-stanford/diffusion_policy
Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning	https://arxiv/abs/2307.01849	https://github/LostXine/crossway_diffusion

20240702

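I presented Generative Image Dynamics to them, and going through it once was quite rewarding. Afterwards I felt it's still something of a patchwork: the 2015 theoretical work also came from their lab (implemented in C++ and OpenGL), and optical flow is an old staple of CV. But I have to admit it is genuinely interpretable, far more meaningful than the mindless parameter-stacking that's everywhere, and fairly easy to follow. Thinking about it, fundamental subjects like physics and mathematics are more interesting; too bad I lack the talent and can only grind away in this overcrowded field.

The evening meeting ended at 9:15. When I reached the track, Jiawei and XR were still there. Jiawei was charging around shirtless and said that in this weather he couldn't even hold 4:00 pace, scraping together 10 km in bits and pieces. XR still looks badly out of shape, almost down to LXY's level, and less and less ambitious. I'd been counting on him to catch up with me in the second half of the year and had high hopes for him, sigh. No hope for this year's Gaobai (the intercollegiate 100 km relay). My Song, far away in heaven, has been missing for almost two months; I hope the next appearance is a full-strength comeback.

It was too late, so I only ran 4.36 km at an average pace of 4'26" with a heart rate of 157 bpm. I arranged with Jiawei that if it doesn't rain tomorrow we'll have a proper all-out session. Maybe I'm overreaching, but unless you give it everything, who knows whether you can do it? This summer, I just want to crush all of you, or be crushed by all of you.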

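CLIP is trained contrastively on large numbers of image-text pairs, so it is naturally good at grasping the image-level semantics that VQA-style questions probe; but when you need to work on one particular object or region in the image, it is much less useful.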
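For reference, a NumPy sketch of the symmetric contrastive (InfoNCE) objective that CLIP-style models optimize over a batch of matched image-text pairs. The embeddings are random stand-ins, and the temperature 0.07 is an assumed value (CLIP learns it). Each image is pulled toward its own caption and pushed away from the others in the batch, which rewards whole-image alignment rather than region-level grounding.

```python
import numpy as np

def log_softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def clip_loss(img_emb, txt_emb, temperature=0.07):
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # (batch, batch); diagonal = matched pairs
    idx = np.arange(len(logits))
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # each image picks its text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # each text picks its image
    return (loss_i2t + loss_t2i) / 2

rng = np.random.default_rng(0)
txt = rng.normal(size=(8, 32))
aligned_img = txt + 0.01 * rng.normal(size=(8, 32))   # images close to their captions
shuffled_img = aligned_img[::-1]                       # wrong pairing
print(clip_loss(aligned_img, txt), clip_loss(shuffled_img, txt))
```

The loss only ever compares one pooled vector per image with one per caption, so nothing in it asks the model to localize objects.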

About pinned data transfer

Host (CPU)
- pinned memory is allocated on the host (CPU);
HtoD: host to device
DtoH: device to host

As you can see in the figure, pinned memory is used as a staging area for transfers from the device to the host. We can avoid the cost of the transfer between pageable and pinned host arrays by directly allocating our host arrays in pinned memory.

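import time
import torch

# Create a large tensor so the timing difference is visible (20000 x 20000 float32 = 1.6 GB)
size = (20000, 20000)
torch.zeros(1, device="cuda")  # warm-up: create the CUDA context before timing

# Tensor in ordinary (pageable) memory
normal_tensor = torch.FloatTensor(*size)
# Copy the pageable tensor to the GPU and time it
t0 = time.time()
normal_tensor_gpu = normal_tensor.to("cuda")
torch.cuda.synchronize()
print("HtoD pageable:", time.time() - t0)

# Tensor in pinned (page-locked) memory
pinned_tensor = torch.FloatTensor(*size).pin_memory()
# Copy the pinned tensor to the GPU and time it
t0 = time.time()
pinned_tensor_gpu = pinned_tensor.to("cuda", non_blocking=True)
torch.cuda.synchronize()  # non_blocking returns immediately; wait for the copy before stopping the timer
print("HtoD pinned:", time.time() - t0)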

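Device to host

size = (20000, 20000)
gpu_tensor = torch.randn(*size, device="cuda")
torch.cuda.synchronize()  # make sure randn has finished before starting the timer

# Copy to ordinary (pageable) memory and time it; .to("cpu") blocks until done
t0 = time.time()
normal_tensor_cpu = gpu_tensor.to("cpu")
print("DtoH pageable:", time.time() - t0)

# To use pinned memory, first allocate a pinned tensor on the CPU
pinned_tensor_cpu = torch.empty(*size).pin_memory()

# Make sure all GPU work has finished
torch.cuda.synchronize()

# Copy into the pinned tensor without blocking, and time it
t0 = time.time()
pinned_tensor_cpu.copy_(gpu_tensor, non_blocking=True)
torch.cuda.synchronize()  # wait for the transfer to finish
print("DtoH pinned:", time.time() - t0)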

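non-blocking

Use tensor.to(non_blocking=True) when it's applicable to overlap data transfers
- with non_blocking=True, the copy to the GPU is issued asynchronously and does not block the CPU; for host-to-device copies this only gives real overlap when the source tensor is in pinned memory.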

cudaMemcpy(d_a, a, numBytes, cudaMemcpyHostToDevice);
increment<<<1,N>>>(d_a);
cudaMemcpy(a, d_a, numBytes, cudaMemcpyDeviceToHost);

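d_a: pointer to device memory
The first line copies data from the host (CPU memory) to the device (GPU memory). Note that this call runs on the host and blocks: the host CPU is busy copying the data to the device, so execution only moves on to the second line after the first line has finished.
- cudaMemcpy(void* dst, const void* src, size_t count, cudaMemcpyKind kind)
The second line launches the kernel, which then executes on the device; note that launching and executing are two separate steps. As soon as the kernel has been launched, the host CPU moves straight on to the third line without waiting for it to execute.
The third line copies data from the device back to the host, but this transfer can only start after the kernel from the second line has finished executing on the device.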

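import torch
from torch.cuda.amp import GradScaler

# model, optimizer, criterion and dataloader are assumed to be defined already
model.train()
scaler = GradScaler()

for i, (features, target) in enumerate(dataloader):
    # these two calls are non-blocking and can overlap with host work
    # (when the DataLoader uses pin_memory=True)
    features = features.to('cuda:0', non_blocking=True)
    target = target.to('cuda:0', non_blocking=True)

    # Reset the gradients to None
    optimizer.zero_grad(set_to_none=True)

    # Forward pass with mixed precision
    with torch.cuda.amp.autocast(): # autocast as a context manager
        output = model(features)
        loss = criterion(output, target)

    # Scaled backward pass, then optimizer step and scale update
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()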

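When you set non_blocking=True, the data transfer (the CPU-to-GPU copy) is asynchronous, meaning it does not block the program. So while features and target are being copied to the GPU, the CPU can keep executing the code that follows, up to the point where those values are actually needed for computation (this requires pinned source memory, e.g. DataLoader(pin_memory=True)).
With asynchronous transfers, if the copies of features and target have not finished by the time execution reaches model(features), the GPU waits for them to complete before starting the computation, because the copies and the model's kernels are queued on the same CUDA stream. This waiting is handled automatically. If the copies finish before the model starts computing, there is no wait at all.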

CUDA programming

https://github/NVIDIA-developer-blog/code-samples.git
- code-samples/series/cuda-cpp/optimize-data-transfers/bandwidthtest.cu

$ nvcc bandwidthtest.cu -o a.out
$ ./a.out

Device: NVIDIA GeForce RTX 4090
Transfer size (MB): 16

Pageable transfers
  Host to Device bandwidth (GB/s): 5.959241
  Device to Host bandwidth (GB/s): 5.124604

Pinned transfers
  Host to Device bandwidth (GB/s): 13.453977
  Device to Host bandwidth (GB/s): 13.369578

20240703

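A night of falling apart on my own: 2k@3'55" + 4k@4'03" + 1k@3'47" + 0.45k@3'56" + 0.45k@3'42" + 0.9k@3'43" + 1.2k@3'50"
A thunderstorm suddenly hit at 7:40 pm and lasted about half an hour, followed by the classic heat and humidity. I rested well last night and thought I could at least hold 4:00 pace for 10 km today, but by 2 km the left side of my chest started to hurt. I ran the second block, 4k, with XR and YY. After 4 laps I asked XR how he was doing and whether he could hold on; he said it was nothing, and then he dropped out before 6 laps.
After the 2k and 4k blocks I wanted to push through one more 4k to finish, but I was running on fumes and fell apart; on my own there was simply no way to hold it. Miserable.
Very frustrated; I'll be back. Even in this weather I should be able to finish a 10 km in under 40 minutes.
PS: The canteen dishes were good tonight, with crown daisy and wood-ear mushrooms scrambled with eggs, neither of which had appeared this year. Only after eating did I realize they were probably left over from the summer-camp kids' lunch. Slow on the uptake.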

Refreshing a page

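from selenium import webdriver
from selenium.webdriver.chrome.service import Service
chrome_driver_path = '/path/to/chromedriver'  # path to the Chrome driver
driver = webdriver.Chrome(service=Service(chrome_driver_path))  # create a Chrome browser instance (Selenium 4 style)
driver.get('http://www.example')  # open the page
driver.refresh()  # refresh the page
driver.quit()  # close the browser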

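In Selenium, to open a link in a new tab you can hold Keys.CONTROL (Windows/Linux) or Keys.COMMAND (Mac) while clicking the link. Below is a Python example (Selenium 4 API) showing how to open a link in a new tab:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
# Path to the Chrome driver
driver_path = 'path/to/your/chromedriver'
# Launch Chrome
driver = webdriver.Chrome(service=Service(driver_path))
# Open the web page
driver.get('http://example')
# Locate the link element
link = driver.find_element(By.LINK_TEXT, 'your_link_text')
# Hold CONTROL (use Keys.COMMAND on Mac) while clicking, so the link opens in a new tab
ActionChains(driver).key_down(Keys.CONTROL).click(link).key_up(Keys.CONTROL).perform()

# Wait until the new tab exists, then switch to it (the last window handle)
WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2))
new_window = driver.window_handles[-1]
driver.switch_to.window(new_window)
print(driver.title)  # do something, e.g. check the title or other content
driver.quit()  # close the browser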

Navigating back

from selenium import webdriver
driver = webdriver.Chrome()  # start the browser
driver.get('http://www.example/page1')  # open the first page
driver.get('http://www.example/page2')  # open the second page
driver.back()  # go back to the first page
driver.back()  # go back to the browser's initial start page
driver.quit()  # close the browser

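Green data (ESG; economics these days loves flashy research topics): scraping research-report PDFs. Not yet debugged; the main open problem is handling PDFs that fail to load, which requires checking for the alert.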

# -*- coding: utf-8 -*-
# @author: caoyang
# @email: caoyang@stu.sufe.edu

import sys
sys.path.append("../")

import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

from base import BaseCrawler

class ESG(BaseCrawler):
	url_host = "https://i-esg/"
	url_report = url_host + "esg/esgReport"

	xpaths = {
		"report_table": "//table[@class=\"vxe-table--header\"]",	# <table> listing all the PDF reports
		"report_link": "//a[@class=\"line-clamp-1\"]",	# <a> linking to a PDF report
		"download_div": "//viewer-download-controls[@id=\"download\"]",	# Element that contains the download button
		
		"next_page_icon": "//i[@class=\"vxe-pager--btn-icon vxe-icon-arrow-right\"]",	# Icon clicked to turn to the next page
		"current_page_button": "//button[@class=\"vxe-pager--num-btn is--active\"]",	# Button showing the current page number (unique)
		"other_page_button": "//button[@class=\"vxe-pager--num-btn\"]",	# Buttons showing the other page numbers (many)
		"download_button": "//cr-icon-button[@id=\"download\"]",	# Button clicked to download
		"close_icon": "//span[@class=\"is-icon-close\"]",	# Icon clicked to close the report
	}
	
	def __init__(self):
		pass

	def run(self):
		driver = self.initialize_driver(browser = "chrome",
										headless = False,
										timeout = 60,
										)
		driver.get(self.url_report)
		BaseCrawler.check_element_by_xpath(driver, xpath=self.xpaths["report_table"])
		report_links = driver.find_elements_by_xpath(self.xpaths["report_link"])
		for report_link in report_links:
			report_link.click()
			# TODO: Check warning if PDF is not successfully loaded
			BaseCrawler.check_element_by_xpath(driver, xpath=self.xpaths["download_button"])
			driver.find_element_by_xpath(self.xpaths["download_button"]).click()	# Download
		driver.quit()

esg = ESG()
esg.run()

20240704

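Today's training: 3000 m @4'00" + 10 × 400 m @3'20" + 800 m cool-down (@3'45")

First official day after the end of the plum-rain season, and it's already a 38 °C furnace; during the day you simply can't be out in the sun. Terrifying. Temperatures like this used to arrive in mid-to-late July at the earliest; this year it starts at full blast. The Earth really is no longer fit to live on.

The plan for tonight was an easy jog, and I didn't even change clothes or shoes, since yesterday's session was hard and I was exhausted. But even trying to jog, I started at 4'0x pace at the slowest and finished a 3000 m in 12'02", then shifted into my second form. After 5 × 400 m (three at 1'20"; I eased off on the other two, which were only 1'26" and 1'29"), I clearly felt spent. Luckily Jiawei arrived just in time for his 5 × 800 m intervals, so I ran 400 m intervals alongside him (two laps for him, one for me, with me pacing his first lap). As a result, all of my last 5 × 400 m were under 1'20", and all of Jiawei's 5 × 800 m were under 2'40".

Finally I jogged an 800 m with Jiawei to finish, exactly 3 minutes. Good training quality; a fairly satisfying evening.

Summer training has only just begun, so why has everyone gone so lazy lately? That said, after a four-day break SXY finally ran a bit tonight, and not a short distance either, though clearly to exhaustion, stopping before even making it back to the start. No need to force it.

ESG backup. About iframes: if switch_to_frame doesn't work, just take the URL in the iframe tag's src attribute and visit it directly.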

# -*- coding: utf-8 -*-
# @author: caoyang
# @email: caoyang@stu.sufe.edu

import sys
sys.path.append("../")

import time
import logging
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

from base import BaseCrawler

# Initialize a logger
def initialize_logger(file_path, mode = 'w'):
	logger = logging.getLogger()
	logger.setLevel(logging.INFO)
	formatter = logging.Formatter("%(asctime)s | %(filename)s | %(levelname)s | %(message)s")
	file_handler = logging.FileHandler(file_path, mode=mode, encoding="utf8")
	file_handler.setFormatter(formatter)
	logger.addHandler(file_handler)
	console = logging.StreamHandler()
	console.setLevel(logging.INFO)
	console.setFormatter(formatter)
	logger.addHandler(console)
	return logger

# Terminate the given logger
def terminate_logger(logger):
	for handler in logger.handlers[:]:
		logger.removeHandler(handler)


class ESG(BaseCrawler):
	url_host = "https://i-esg/"
	url_report = url_host + "esg/esgReport"

	xpaths = {
		"report_table": "//table[@class=\"vxe-table--header\"]",	# <table> listing all the PDF reports
		"report_link": "//a[@class=\"line-clamp-1\"]",	# <a> linking to a PDF report
		"download_div": "//viewer-download-controls[@id=\"download\"]",	# Element that contains the download button
		
		"next_page_icon": "//i[@class=\"vxe-pager--btn-icon vxe-icon-arrow-right\"]",	# Icon clicked to turn to the next page
		"current_page_button": "//button[@class=\"vxe-pager--num-btn is--active\"]",	# Button showing the current page number (unique)
		"other_page_button": "//button[@class=\"vxe-pager--num-btn\"]",	# Buttons showing the other page numbers (many)
		"download_button": "//cr-icon-button[@id=\"download\"]",	# Button clicked to download
		"close_icon": "//span[@class=\"is-icon-close\"]",	# Icon clicked to close the report
		"alert_div": "//div[@slot=\"body\"]",	# <div> with the alert text shown when the PDF fails to load
		"alert_button": "//cr-button[@class=\"action-button\"]",	# Button in the alert that reloads the page
		"pdf_id_span": "//span[@id=\"title\"]",	# <span> containing the PDF id (i.e. the file name)
		"pdf_iframe": "//iframe[@class=\"vc-iframe-page\"]",	# <iframe> containing the report PDF
	}
	
	def __init__(self):
		pass

	def run(self):
		# Start Chromedriver
		driver = self.initialize_driver(browser = "chrome",
										headless = False,
										timeout = 60,
										)
		driver.get(self.url_report)	# Visit the report URL

		logging.info("Waiting for the report table ...")
		BaseCrawler.check_element_by_xpath(driver, xpath=self.xpaths["report_table"])

		def _is_pdf_page_loaded(_driver):
			_flag_1 = _driver.find_element_by_xpath(self.xpaths["alert_div"]).is_displayed()
			_flag_2 = _driver.find_element_by_xpath(self.xpaths["download_button"]).is_displayed()
			return _flag_1 or _flag_2
		
		report_links = driver.find_elements_by_xpath(self.xpaths["report_link"])
		for report_link in report_links:
			report_link_html = report_link.get_attribute("outerHTML")
			report_title = self.tag_regex.sub(str(), report_link_html)
			logging.info(f"Click into report: {report_title}")
			report_link.click()	# Click to view the PDF pages
			while True:
				BaseCrawler.check_element_by_xpath(driver, xpath=self.xpaths["pdf_iframe"])
				logging.info("Switch to iframe ...")
				pdf_iframe = driver.find_element_by_xpath(self.xpaths["pdf_iframe"])
				pdf_iframe_html = pdf_iframe.get_attribute("outerHTML")

				with open("iframe.html", 'w', encoding="utf8") as f:
					f.write(pdf_iframe_html)
				driver.switch_to.frame(pdf_iframe)	# Switch into the iframe element found above

				# Debug stop: dump the iframe HTML and exit; remove these two lines to run the rest of the loop
				driver.quit()
				exit()
				logging.info("  - ok!")

				time.sleep(5)
				# WebDriverWait(driver, 10).until(_is_pdf_page_loaded)
				
				flag_pdf_page_loaded = None
				try:
					alert_div = driver.find_element_by_xpath(self.xpaths["alert_div"])
					flag_pdf_page_loaded = False
					logging.info("Page fails to be loaded ...")
				except Exception:
					try:
						download_button = driver.find_element_by_xpath(self.xpaths["download_button"])
						flag_pdf_page_loaded = True
						logging.info("Page loaded successfully ...")
					except Exception:
						html = driver.page_source
						with open("debug.html", 'w', encoding="utf8") as f:
							f.write(html)
						logging.error("Cannot find alert box or download button, see debug.html for details")
				if flag_pdf_page_loaded:
					pdf_id_span = driver.find_element_by_xpath(self.xpaths["pdf_id_span"])
					pdf_id_span_html = pdf_id_span.get_attribute("outerHTML")
					pdf_id = self.tag_regex.sub(str(), pdf_id_span_html)
					logging.info(f"PDF id: {pdf_id}")
					logging.info("Click download button ...")
					driver.find_element_by_xpath(self.xpaths["download_button"]).click()	# Click the download button
					logging.info("  - ok!")
					break
				elif flag_pdf_page_loaded is False:
					alert_div_html = alert_div.get_attribute("outerHTML")
					alert_information = self.tag_regex.sub(str(), alert_div_html)
					logging.info(f"Page alert: {alert_information}")
					logging.info("Click reload button ...")
					driver.find_element_by_xpath(self.xpaths["alert_button"]).click()
					logging.info("  - ok!")
				else:
					break	# Neither the alert box nor the download button was found; give up on this report
		driver.quit()

time_string = time.strftime("%Y%m%d%H%M%S")
logger = initialize_logger(f"esg_{time_string}.log")
esg = ESG()
esg.run()
terminate_logger(logger)

20240705~20240706

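Relentless heat. The longest road every day is the bike ride from Sanmen Road to school. To make it a bit more bearable, yesterday I decided to get up early and leave before 7 a.m., so at least I'd get baked a bit more gently on the way. Then I gave up, lol; I just can't get up.
Last night I made up a strength session: 8 sets × 30 lunges (+20 kg), all forward lunges because it was so hot, followed by 2000 m @4'54". I had just come back from eating out and my stomach was painfully full; after the strength work my whole body felt off, and the 2000 m was miserable.
Tonight I made up a 5000 m tempo @4'13" + 1000 m cool-down @4'19", and for the last hundred-odd meters I sprinted a stretch with Jiawei. He was on fire tonight: 8 × 400 m + 4 × 800 m intervals, with laps by my eye under 1'20". Wow. After 5000 m shirtless I felt like my upper body was about to catch fire, yet he knocked out that many intervals alone. Iron lungs, iron legs.
XR has also done quite a few 400 m intervals these past two days: about 6 yesterday, all around 1'20", and today about 6 more with Jiawei. The kid has suddenly become diligent again. I still believe in him; one more core runner in the second half of the year means a bit more of a chance.
The most unbelievable is still SXY. Last night SXY ran 5 km @7'51" over on West Beijing Road; not fast, and clearly with lots of breaks, but the fastest pace reached 4'36". Then tonight SXY did an out-and-back along the Binjiang North route: 15.79 km in exactly 2 hours. In tonight's heat, for that long: either doping or crazy.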

PPO:

$$\nabla f(x)=f(x)\nabla \log f(x)$$

DQN -> TRPO -> PPO

DQN (arXiv 2013; Nature 2015)
- unstable; off-policy (learns from an experience-replay buffer)
TRPO (2015): Trust Region Policy Optimization
PPO (2017): Proximal Policy Optimization

$$
\begin{aligned}
\nabla_\theta J(\pi_\theta)&=E_{\tau\sim \pi}\left[\sum_{t=0}^T\nabla_\theta\log \pi_\theta(a_t\mid s_t)\,G_t\right]\\
G_t&=R_t+\gamma R_{t+1}+\gamma^2 R_{t+2} + \cdots = \sum_{k=t}^T\gamma^{k-t}R_k
\end{aligned}
$$

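$G_t$: reward-to-go (RTG).
- $R_k$: the immediate reward at time step $k$;
- the (discounted) sum of all rewards from the current time step $t$ up to a future time step $T$;
- commonly used in policy gradient methods;
- when computing each action's advantage, subtracting a baseline from the RTG lowers the variance of the estimate of how good the action is, without biasing it.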
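The log-derivative identity $\nabla f = f\nabla\log f$ is what turns the gradient of an expectation into an expectation of a gradient, and the effect of a baseline can be checked numerically. Below is a NumPy sketch on a toy problem rather than an RL environment: sample $x\sim\mathcal N(\mu,1)$ with "reward" $x^2$, so the exact gradient is $\nabla_\mu \mathbb E[x^2]=2\mu$ and the score is $\nabla_\mu\log p(x)=x-\mu$. The baseline $b=\mu^2+1=\mathbb E[x^2]$ is chosen for this toy only.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 1.5
x = rng.normal(loc=mu, scale=1.0, size=1_000_000)

reward = x ** 2                 # plays the role of G_t
score = x - mu                  # grad_mu log N(x; mu, 1)
baseline = mu ** 2 + 1          # E[x^2], a constant baseline for this toy

plain = reward * score                       # REINFORCE-style per-sample gradient
with_baseline = (reward - baseline) * score  # same expectation, since E[score] = 0

print("exact gradient:", 2 * mu)
print("no baseline:  ", plain.mean(), "variance:", plain.var())
print("with baseline:", with_baseline.mean(), "variance:", with_baseline.var())
```

Both estimates center on 3.0, while the per-sample variance drops from about 51.6 to about 28 (the analytic values for this toy).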

def compute_rtgs(self, batch_rews):
    # The rewards-to-go (rtg) per episode per batch to return.
    # The shape will be (total number of timesteps in the batch)
    batch_rtgs = []
    # Iterate through each episode backwards to maintain same order
    # in batch_rtgs
    for ep_rews in reversed(batch_rews):
        discounted_reward = 0 # The discounted reward so far
        for rew in reversed(ep_rews):
            discounted_reward = rew + discounted_reward * self.gamma
            batch_rtgs.insert(0, discounted_reward)
    # Convert the rewards-to-go into a tensor
    batch_rtgs = torch.tensor(batch_rtgs, dtype=torch.float)
    return batch_rtgs
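A standalone version of the same backward recursion makes the ordering easy to check (here a plain function with `gamma` passed in and a list returned instead of a tensor, both changes for illustration only): iterating episodes and rewards in reverse while inserting at the front keeps the output aligned with the flattened batch.

```python
def compute_rtgs(batch_rews, gamma):
    # Same recursion as above: G_t = r_t + gamma * G_{t+1}, reset at each episode boundary
    batch_rtgs = []
    for ep_rews in reversed(batch_rews):
        discounted_reward = 0.0
        for rew in reversed(ep_rews):
            discounted_reward = rew + discounted_reward * gamma
            batch_rtgs.insert(0, discounted_reward)
    return batch_rtgs

# Two episodes: three steps with reward 1, then one step with reward 2
print(compute_rtgs([[1, 1, 1], [2]], gamma=0.5))  # [1.75, 1.5, 1.0, 2.0]
```

The second episode's value is just 2.0: the discounted sum does not leak across the episode boundary.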

Advantage function.

$$A^\pi(s,a)=Q^\pi(s,a)-V_{\phi_k}(s)$$

def evaluate(self, batch_obs):
    # Query critic network for a value V for each obs in batch_obs.
    V = self.critic(batch_obs).squeeze()
    return V
  
# Calculate V_{phi, k}
V = self.evaluate(batch_obs)
# ALG STEP 5
# Calculate advantage
A_k = batch_rtgs - V.detach()

# Normalize advantages
A_k = (A_k - A_k.mean()) / (A_k.std() + 1e-10)

on policy vs. off policy

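Policy gradient methods are generally on-policy. PPO uses importance sampling (the ratio $\pi_\theta(a\mid s)/\pi_{\theta_{old}}(a\mid s)$) so that a batch collected by the previous policy can be reused for several epochs of updates, which makes it slightly off-policy; it is still usually classified as on-policy, since the clipping keeps the new policy close to the one that collected the data.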
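The importance ratio enters PPO through its clipped surrogate objective. A NumPy sketch of that loss, assuming per-sample log-probabilities under the new and old policies and advantages like `A_k` above; `clip_eps=0.2` is the commonly used default, not a value taken from this note.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # Importance ratio pi_theta(a|s) / pi_theta_old(a|s), computed in log space
    ratio = np.exp(logp_new - logp_old)
    surr1 = ratio * advantages
    surr2 = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Pessimistic element-wise minimum; negate because optimizers minimize
    return -np.minimum(surr1, surr2).mean()

adv = np.array([1.0, -1.0])
logp_old = np.log(np.array([0.5, 0.5]))
logp_new = np.log(np.array([1.0, 1.0]))   # new policy doubles both probabilities (ratio = 2)
print(ppo_clip_loss(logp_new, logp_old, adv))
```

With ratio 2, the positive-advantage sample is capped at 1.2 while the negative one keeps the unclipped -2, so the loss is about 0.4: the objective refuses to reward moving too far, but still penalizes it.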

20240707

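Learned a couple of tricks from my aunt: this morning I bound some minced meat with starch into meatballs. Not bad, actually; at least it's real meat. At this time of year, eating at school or outside is like opening a blind box: the holiday canteen serves Schrödinger's fast food, and food outside is never very fresh in summer. The week before last at Shudiyuan I noticed the mushrooms already looked off. Dishes really don't keep in summer, but there's no way around it. I think there's no more honest restaurant near campus than this Shudiyuan on Handan Road; I've eaten there for four or five years, about once a week, and at least I've never had an upset stomach, emmm.
In the evening I wanted to train properly and see whether I could run a sub-40-minute 10 km in one go, but I couldn't hold on for even 6 laps, averaging 3'50". After that I ran several ragged sets and barely scraped together 10 km. In this weather it's really, really hard to run a long distance in one go; you dehydrate so badly that without drinking, your heart and lungs give out fast. Please, just one slightly cooler night.
So far, in the first week of July: 51.3 km in total at an average pace of 4'07". Intervals have been the main theme, but I only wore carbon-plated shoes on Wednesday and tonight; for everyday runs I wear them less to reduce dependence. Summer isn't the season for results anyway, so putting on carbon plates means I wanted to run seriously, and neither of those two nights was satisfying. Too bad Jiawei didn't come tonight; with someone to run with I might have held on, but he went very hard yesterday and I didn't want to drag him out.
LXY hasn't run on the track for 6 straight days since the 1st: either a complete personality change or secret training. Tonight LXY seemed to be there and bought water for the youngsters XR, YY and ZYY; I was probably just included in passing. Without glasses I can't even make people out; I wanted to say thanks afterwards but felt it would be too abrupt, so I let it go.
Hopefully the visa gets done smoothly next week. Although wyl is keeping a close watch this summer, I still want to go home for a couple of days first, eat all of this summer's good food, and then come back and contentedly eat dirt, emmm, yes, roughly that. Thinking about it, what I fear most this summer is getting hurt again, in every sense. There probably won't be a next time, and I don't really want one. Ideals and reality always have to be balanced.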

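ADB notes

(1) Make a phone call
adb shell am start -a android.intent.action.CALL tel:<phone number>
(2) Send a text message
adb shell am start -a android.intent.action.SENDTO -d sms:<phone number> --es sms_body <message text>
adb shell input keyevent 22   # KEYCODE_DPAD_RIGHT: typically moves focus to the send button
adb shell input keyevent 66   # KEYCODE_ENTER: press it
(3) Is the screen locked?
Input: adb shell dumpsys window policy|findstr mShowingLockscreen   (findstr is the Windows filter; use grep on Linux/macOS)
Output: mShowingLockscreen=true mShowingDream=false mDreamingLockscreen=true mDreamingSleepToken=null (currently on the lock screen)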

V: Verbose (lowest; outputs the most)
D: Debug
I: Info
W: Warning
E: Error
F: Fatal
S: Silent (highest; outputs nothing)
Filtering logs by a level outputs messages at that level and above.
For example: adb logcat *:W

(4) Screenshot
import os
os.system('adb shell screencap -p /sdcard/1.png')
os.system('adb pull /sdcard/1.png')

(5) Swipe

adb shell input swipe {x1} {y1} {x2} {y2} {duration}

"adb shell input keyevent {}",	# key event
"adb shell input swipe {} {} {} {}",	# swipe event
"adb shell screencap -p /sdcard/{}",	# screenshot
"adb pull /sdcard/{}",	# copy the image to the computer
"adb shell dumpsys window policy|findstr mShowingLockscreen",	# check whether the screen is locked

adb shell input swipe 539 1800 541 1800 1

adb shell input keyevent 26 # KEYCODE_POWER: wake the screen
adb shell input keyevent 82 # KEYCODE_MENU: dismiss the lock screen (on devices without a secure lock)
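To drive these commands from Python without `os.system`, one option is `subprocess.run` with an argument list, plus a small parser for the `dumpsys` output in (3); filtering in Python also avoids the Windows-only `findstr`. This is a sketch: `run_adb`, `is_lockscreen_showing` and `swipe_args` are hypothetical helper names, and the parser only looks for the `mShowingLockscreen=true` field shown in the sample output above, whose name may differ across Android versions.

```python
import re
import subprocess

def run_adb(*args):
    # Run `adb <args...>` and return stdout as text; raises CalledProcessError on failure
    result = subprocess.run(["adb", *args], capture_output=True, text=True, check=True)
    return result.stdout

def is_lockscreen_showing(dumpsys_output):
    # Look for "mShowingLockscreen=true" in `adb shell dumpsys window policy` output
    match = re.search(r"mShowingLockscreen=(true|false)", dumpsys_output)
    return match is not None and match.group(1) == "true"

def swipe_args(x1, y1, x2, y2, duration_ms):
    # Argument list for `adb shell input swipe x1 y1 x2 y2 duration`
    return ["shell", "input", "swipe", str(x1), str(y1), str(x2), str(y2), str(duration_ms)]

sample = "mShowingLockscreen=true mShowingDream=false mDreamingLockscreen=true mDreamingSleepToken=null"
print(is_lockscreen_showing(sample))  # True
print(swipe_args(539, 1800, 541, 1800, 1))

# With a device attached, for example:
# if is_lockscreen_showing(run_adb("shell", "dumpsys", "window", "policy")):
#     run_adb("shell", "input", "keyevent", "26")  # KEYCODE_POWER
```

Passing an argument list instead of one shell string sidesteps quoting problems, for example an SMS body containing spaces.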

Key codes

Phone keys
KEYCODE_CALL	Call key	5
KEYCODE_ENDCALL	End-call key	6
KEYCODE_HOME	Home key	3
KEYCODE_MENU	Menu key	82
KEYCODE_BACK	Back key	4
KEYCODE_SEARCH	Search key	84
KEYCODE_CAMERA	Camera key	27
KEYCODE_FOCUS	Camera focus key	80
KEYCODE_POWER	Power key	26
KEYCODE_NOTIFICATION	Notification key	83
KEYCODE_MUTE	Microphone mute key	91
KEYCODE_VOLUME_MUTE	Speaker mute key	164
KEYCODE_VOLUME_UP	Volume up key	24
KEYCODE_VOLUME_DOWN	Volume down key	25

Control keys
KEYCODE_ENTER	Enter key	66
KEYCODE_ESCAPE	Esc key	111
KEYCODE_DPAD_CENTER	D-pad center (OK)	23
KEYCODE_DPAD_UP	D-pad up	19
KEYCODE_DPAD_DOWN	D-pad down	20
KEYCODE_DPAD_LEFT	D-pad left	21
KEYCODE_DPAD_RIGHT	D-pad right	22
KEYCODE_MOVE_HOME	Move cursor to start	122
KEYCODE_MOVE_END	Move cursor to end	123
KEYCODE_PAGE_UP	Page up	92
KEYCODE_PAGE_DOWN	Page down	93
KEYCODE_DEL	Backspace	67
KEYCODE_FORWARD_DEL	Forward delete	112
KEYCODE_INSERT	Insert key	124
KEYCODE_TAB	Tab key	61
KEYCODE_NUM_LOCK	Num Lock	143
KEYCODE_CAPS_LOCK	Caps Lock	115
KEYCODE_BREAK	Break/Pause key	121
KEYCODE_SCROLL_LOCK	Scroll Lock	116
KEYCODE_ZOOM_IN	Zoom in	168
KEYCODE_ZOOM_OUT	Zoom out	169

Modifier keys
KEYCODE_ALT_LEFT	Left Alt	57
KEYCODE_ALT_RIGHT	Right Alt	58
KEYCODE_CTRL_LEFT	Left Control	113
KEYCODE_CTRL_RIGHT	Right Control	114
KEYCODE_SHIFT_LEFT	Left Shift	59
KEYCODE_SHIFT_RIGHT	Right Shift	60

20240708

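This afternoon I'll get the passport sorted first, then figure out how to get the visa. There are only a little over two months, which is tight, but the schedule was always tight; nothing I can do.
Yesterday Miss GXJ caught me out again. I actually haven't left my shoes in the cubicle for many days; yesterday it was only because I went down to run and forgot my card. When I came back everyone had left and the door was locked, and I couldn't find a staff member on the ground floor to let me in, so I just went home, planning to tidy up this morning. When I arrived this morning I found all my shoes stuffed into my drawer. The kicker: I ran right into her as I stepped out of the elevator, hurried guiltily to the room, and it was already too late. Painfully awkward.
LXY has started to ramp up: probably around 15 km in total tonight. I really didn't feel like running today; a touch of running burnout, or more precisely, fear of this weather. Seven or eight straight days of hard sessions in a steamer have worn me out. I started with an 800 m alongside Jiawei in 2'33", my fastest 800 m so far. Jiawei is even scarier: that was his second rep (2'30"), and his first was 2'20". I'm completely demoralized. Afterwards I topped up to 5 km, exhausted; I really didn't want to be tortured any further.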

LoRA: Low-Rank Adaptation of large language models
- motivated by the observation that pretrained models can still learn effectively after a random projection to a smaller subspace (a low "intrinsic dimension")
- parameter efficient
  - PEFT
- https://arxiv/abs/2106.09685
Implementation details
- an adapter on top of a pretrained model
  - the adapter is small
  - the pretrained model is large
    - large language models
    - large vision models
- freezes the pretrained model weights
- injects trainable rank-decomposition matrices
  - into each layer of the Transformer architecture
Basic idea
- In a transformer, the most important part is the self-attention module
  - $W_q$, $W_k$, $W_v$, $W_o$ denote the learnable query/key/value/output projection matrices
  - denote these model parameters collectively as $\Phi$
- In full fine-tuning (nothing frozen), the model is initialized with the pretrained weights $\Phi$ and, after fine-tuning via backpropagation and gradient descent, ends up at $\Phi+\Delta\Phi$
  - every downstream task must learn its own $\Delta\Phi$, with $|\Delta\Phi|=|\Phi|$
- LoRA, as a parameter-efficient method, encodes the task-specific increment $\Delta\Phi=\Delta\Phi(\Theta)$ with a much smaller set of parameters $\Theta$
  - $|\Theta| \ll |\Delta\Phi|=|\Phi|$
  - LoRA uses a low-rank representation to encode $\Delta\Phi$

$$W\in\mathbb R^{A\times B}$$

$$\Delta W=W_AW_B,\quad W_A\in \mathbb R^{A\times r},\ W_B\in \mathbb R^{r\times B}$$

The parameter count drops from $A\times B$ to $r\times (A+B)$
- A=100, B=500, r=5
- 5*(100+500) / (100*500) = 3000/50000 = 6%
- this is what parameter efficiency means

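Pseudocode (made runnable, with a random stand-in for the pretrained weight):

import math
import torch
import torch.nn as nn

input_dim = 768  # e.g., the hidden size of the pre-trained model
output_dim = 768  # e.g., the output size of the layer
rank = 8  # The rank 'r' for the low-rank adaptation
alpha = 16  # LoRA scaling hyperparameter (illustrative value); the update is scaled by alpha / rank

W = torch.randn(input_dim, output_dim)  # stand-in for the frozen pretrained weight (input_dim x output_dim)

W_A = nn.Parameter(torch.empty(input_dim, rank)) # LoRA weight A
W_B = nn.Parameter(torch.empty(rank, output_dim)) # LoRA weight B

# Initialization of LoRA weights: random A, zero B, so W_A @ W_B = 0 at the start
nn.init.kaiming_uniform_(W_A, a=math.sqrt(5))
nn.init.zeros_(W_B)

def regular_forward_matmul(x, W):
    h = x @ W
    return h

def lora_forward_matmul(x, W, W_A, W_B):
    h = x @ W  # regular matrix multiplication with the frozen weight
    h = h + (x @ W_A @ W_B) * (alpha / rank)  # add the scaled low-rank update
    return h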
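The same idea wrapped around a frozen layer, as a NumPy sketch (the class name `LoRALinear` and the init scale for `W_A` are illustrative, not taken from the paper or any library). Because `W_B` starts at zero, the adapted layer initially reproduces the pretrained output exactly, and only `r * (A + B)` parameters would be trained; with the A=100, B=500, r=5 example that is 3000 trainable values.

```python
import numpy as np

class LoRALinear:
    def __init__(self, W, rank, alpha, rng):
        in_dim, out_dim = W.shape
        self.W = W                                                # frozen pretrained weight (A x B)
        self.W_A = rng.normal(scale=1.0 / np.sqrt(in_dim), size=(in_dim, rank))  # trainable, random init
        self.W_B = np.zeros((rank, out_dim))                      # trainable, zero init
        self.scaling = alpha / rank

    def __call__(self, x):
        # Frozen path plus scaled low-rank update; (x @ W_A) @ W_B never forms the A x B matrix
        return x @ self.W + (x @ self.W_A) @ self.W_B * self.scaling

    def n_trainable(self):
        return self.W_A.size + self.W_B.size

    def merged_weight(self):
        # For inference the update can be folded into W, so there is no extra latency
        return self.W + self.W_A @ self.W_B * self.scaling

rng = np.random.default_rng(0)
W = rng.normal(size=(100, 500))
layer = LoRALinear(W, rank=5, alpha=10, rng=rng)
x = rng.normal(size=(4, 100))

print(layer.n_trainable(), W.size)   # 3000 50000
print(np.allclose(layer(x), x @ W))  # True at initialization
```

Zero-initializing `W_B` (rather than `W_A`) matters: gradients still reach `W_B` through the random `W_A`, whereas zeroing both would leave the update stuck at zero.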

20240709

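Did a hard session alone tonight: 3K@3'45" + 1K@3'48" + 1K@3'31" + 2K@4'08" + 2K@4'07" + 1K@3'36". Very happy with the first 3000 m (11'16"). I wore my two-year-old 361° Feibiao, whose forefoot carbon plate is already broken, so I ran on my heels. Didn't expect heel striking to be this fast: without much effort I was easily under 3'50". Maybe the daily hard sessions really have brought some improvement.
Jiawei and XR both took a rest day today. LXY again ran 12 km-plus, a bit scary: once started, seemingly tireless. Andy has been running a lot lately too.
I've realized that lately it's not the heat I'm afraid of, it's running with Jiawei. With him I'm never at my own pace; I have to go all out with nothing in reserve, which is really tough, especially in this weather. Today nobody was holding me to anything, and it felt great to set my own rhythm. But lately my thighs always feel on the verge of cramping after runs, and heel striking in the Feibiao isn't very stable: the forefoot and heel are separate pieces, so the sole feels a bit wobbly during the transition.
PS: Get the annoying paperwork done ASAP. I don't think I need an agency; I can do it myself. Tomorrow I'll go to Jiushi to confirm what documents are needed. A culture/sports visa should be much easier to get than a tourist visa.

ESG script backup (the current problem is that scrolling with JS sometimes doesn't land in the right place; for now I only have a very brute-force workaround and don't know how to make it more graceful):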

# -*- coding: utf-8 -*-
# @author: caoyang
# @email: caoyang@stu.sufe.edu

import sys
sys.path.append("../")

import time
import logging
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

from base import BaseCrawler

# Initialize a logger
def initialize_logger(file_path, mode = 'w'):
	logger = logging.getLogger()
	logger.setLevel(logging.INFO)
	formatter = logging.Formatter("%(asctime)s | %(filename)s | %(levelname)s | %(message)s")
	file_handler = logging.FileHandler(file_path, mode=mode, encoding="utf8")
	file_handler.setFormatter(formatter)
	logger.addHandler(file_handler)
	console = logging.StreamHandler()
	console.setLevel(logging.INFO)
	console.setFormatter(formatter)
	logger.addHandler(console)
	return logger

# Terminate the given logger
def terminate_logger(logger):
	for handler in logger.handlers[:]:
		logger.removeHandler(handler)


class ESG(BaseCrawler):
	url_host = "https://i-esg/"
	url_report = url_host + "esg/esgReport"

	xpaths = {
		"report_table": "//table[@class=\"vxe-table--header\"]",	# <table> listing all the PDF reports
		"report_link": "//a[@class=\"line-clamp-1\"]",	# <a> linking to a PDF report
		"download_div": "//viewer-download-controls[@id=\"download\"]",	# Element that contains the download button
		"next_page_icon": "//i[@class=\"vxe-pager--btn-icon vxe-icon-arrow-right\"]",	# Icon clicked to turn to the next page
		"current_page_button": "//button[@class=\"vxe-pager--num-btn is--active\"]",	# Button showing the current page number (unique)
		"other_page_button": "//button[@class=\"vxe-pager--num-btn\"]",	# Buttons showing the other page numbers (many)
		"download_button": "//cr-icon-button[@id=\"download\"]",	# Button clicked to download
		"close_icon": "//div[@class=\"vc-tabs__item is-top is-active is-closable\"]//span[@class=\"is-icon-close\"]",	# Icon clicked to close the active report tab
		"alert_div": "//div[@slot=\"body\"]",	# <div> with the alert text shown when the PDF fails to load
		"alert_button": "//cr-button[@class=\"action-button\"]",	# Button in the alert that reloads the page
		"pdf_id_span": "//span[@id=\"title\"]",	# <span> containing the PDF id (i.e. the file name)
		"pdf_iframe": "//iframe[@class=\"vc-iframe-page\"]",	# <iframe> containing the report PDF
		"scroll_to_top_icon": "//i[@class=\"vc-icon ico-bx:arrow-to-top iconfont\"]",	# Icon clicked to return to the top of the report table
		# Not all report entries (i.e. <tr> in <table>) on a page are visible in the browser at once,
		# so you need to scroll to make the remaining entries visible.
		# The following XPaths indicate where to scroll.
		# For page k (k = 1, 2, ...; k counts page visits, not the page number), the row ids range from 12 + (k - 1) * 50 to 61 + (k - 1) * 50, 50 per page
		# //*[@id="root"]/section/section/div/div/div[2]/div[2]/div/div[1]/div[1]/div[2]/div[1]/div[2]/table/tbody/tr
		# "intermediate_tr_formatter": "//tr[@rowid=\"row_{}\"]".format,	# Old XPath, deprecated
		"intermediate_tr_formatter": "//tbody/tr[{}]".format,	# Ranges from 1 to 50, i.e. tr[1]-tr[50]
	}
	
	def __init__(self):
		pass


	def run(self,
			start_page = 1,
			start_tr = 12,
			):
		# Start Chromedriver
		driver = self.initialize_driver(browser = "chrome",
										headless = False,
										timeout = 60,
										)
		driver.get(self.url_report)	# Visit the report URL
		BaseCrawler.check_element_by_xpath(driver, xpath=self.xpaths["report_table"])	# Check that the report table has loaded
		
		with open("pdf_url_list.txt", 'w', encoding="utf8") as f:
			f.write("title\turl\n")

		current_page = 0	# page number: 1, 2, ...
		n_scroll = 4

		skip_flag = True
		
		while True:
			current_page += 1	# Turn to the next page
			logging.info(f"Current page: {current_page}")
			if current_page >= start_page:
				if current_page > start_page:
					skip_flag = False
				report_links = driver.find_elements_by_xpath(self.xpaths["report_link"])
				n_links = len(report_links)
				logging.info(f"{n_links} entries on page {current_page}")
				scroll_at = [n_links // n_scroll * x for x in range(1, n_scroll + 1)]
				for i, report_link in enumerate(report_links):
					if skip_flag and i < start_tr:
						continue
					if i in scroll_at:
						logging.info(f"  - Scrolling at tr {i} ...")
						if not i == scroll_at[-1]:
							intermediate_tr = driver.find_element_by_xpath(self.xpaths["intermediate_tr_formatter"](min(n_links, i)))	# tr[] indices are 1-based; min() keeps the index within this page's rows
						else:
							intermediate_tr = driver.find_element_by_xpath(self.xpaths["next_page_icon"])
						driver.execute_script("arguments[0].scrollIntoView(true);", intermediate_tr)
						logging.info("  - ok!")
					report_link_html = report_link.get_attribute("outerHTML")
					report_title = self.tag_regex.sub(str(), report_link_html)
					logging.info(f"Click into report: {report_title}")
					report_link.click()	# Click to view the PDF pages
					pdf_iframe_url = None	# Stays None if the report does not open as a PDF
					try:
						BaseCrawler.check_element_by_xpath(driver, xpath=self.xpaths["pdf_iframe"])
						is_pdf = True
					except Exception:
						is_pdf = False
					if is_pdf:
						pdf_iframe = driver.find_element_by_xpath(self.xpaths["pdf_iframe"])
						pdf_iframe_html = pdf_iframe.get_attribute("outerHTML")
						pdf_iframe_soup = BeautifulSoup(pdf_iframe_html, "html.parser")
						pdf_iframe_url = pdf_iframe_soup.find("iframe").attrs["src"]
						time.sleep(2)
					logging.info("Close report ...")
					driver.find_element_by_xpath(self.xpaths["close_icon"]).click()	# Close the report
					logging.info("ok!")
					time.sleep(2)
					with open("pdf_url_list.txt", 'a', encoding="utf8") as f:
						f.write(f"{report_title}\t{pdf_iframe_url}\n")

			logging.info("Scrolling to the bottom ...")
			next_page_icon = driver.find_element_by_xpath(self.xpaths["next_page_icon"])
			driver.execute_script("arguments[0].scrollIntoView(true);", next_page_icon)
			logging.info("ok!")
			time.sleep(2)
			logging.info("Click to the next page ...")
			next_page_icon.click()
			logging.info("ok!")
			time.sleep(2)
			logging.info("Scroll to the top ...")
			driver.find_element_by_xpath(self.xpaths["scroll_to_top_icon"]).click()
			logging.info("  - ok!")
			time.sleep(2)	
			# TODO: How to define termination conditions?


		driver.quit()

time_string = time.strftime("%Y%m%d%H%M%S")
logger = initialize_logger(f"./logging/esg_{time_string}.log")
esg = ESG()
esg.run(start_page = 2,
		start_tr = 0,
		)
terminate_logger(logger)

20240710

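One second the evening was clear, the next a gale kicked up and rain came pouring down. Spoiled the fun.
On the way to Jiushi during the day my Achilles tendon hurt a bit, a bad sign: after more than a month of rehab training without trouble, it seems to be flaring up again. I meant to rest for the day, but close to nine I still went to the track for an easy jog. Jiawei happened to come for a walk after class, and Auntie Jingxiang had just finished her session, probably 8 sets of intervals; she plans to train tomorrow evening too, though it looks like it will rain all day tomorrow.
I jogged for a while at 5:00 pace and it actually felt fine, probably nothing too serious. Near lap 7 the weather suddenly turned. At the first couple of drops I thought it was just ordinary light rain and that a rain run would be fun, but within seconds huge raindrops were pelting my face. I fled in disarray to the track office, the rain didn't let up for a long time, and in the end I hurried back to the lab building in the rain.
I've been too fixated on mileage and pace, always feeling that skipping a single day will keep me from hitting 200 km, and lately I've been pushing too hard and can never slow down. Really not good; I need more rest days. A pity, since it gets noticeably cooler starting tomorrow and the day after.
Maybe go play some ball instead ~

I recently looked through some interview questions; these days NLP has nearly turned into rote memorization of stock answers.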

llama-7b + lora

# https protocol
!pip install -q git+https://github.com/huggingface/transformers.git
# ssh protocol
!pip install -q git+ssh://git@github.com/huggingface/transformers.git

Check the transformers version:

import transformers
transformers.__version__	# '4.30.0.dev0'

Using llama in hf (Hugging Face)
llama => alpaca
lora on alpaca
inference
- standard alpaca prompt format
https://github/tloen/alpaca-lora/blob/main/generate.py
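For reference, the Stanford Alpaca prompt template as I recall it (the alpaca-lora repo keeps it in `templates/alpaca.json`; verify the exact wording there before relying on it). `build_alpaca_prompt` and `extract_response` are hypothetical helpers; the model's answer is whatever it generates after the `### Response:` line.

```python
def build_alpaca_prompt(instruction, input_text=None):
    if input_text:
        # Variant with extra context
        return (
            "Below is an instruction that describes a task, paired with an input that "
            "provides further context. Write a response that appropriately completes "
            "the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    # Variant without input
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

def extract_response(generated_text):
    # Keep only what the model produced after the response marker
    return generated_text.split("### Response:")[-1].strip()

prompt = build_alpaca_prompt("Tell me about alpacas.")
print(prompt)
```

At inference time the tokenized prompt is passed to `model.generate`, and the decoded output is cut at the response marker.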

from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig

# import os
# os.environ['HTTP_PROXY'] = 'http://127.0.0.1:7890'
# os.environ['HTTPS_PROXY'] = 'http://127.0.0.1:7890'

model = LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)

"""
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/whaow/anaconda3/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
CUDA SETUP: CUDA runtime path found: /home/whaow/anaconda3/lib/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.9
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/whaow/anaconda3/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Loading checkpoint shards:   0%|          | 0/33 [00:00<?, ?it/s]
"""

LlamaModel architecture

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=31999)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear8bitLt(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear8bitLt(in_features=11008, out_features=4096, bias=False)
          (up_proj): Linear8bitLt(in_features=4096, out_features=11008, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
"""
LlamaTokenizer(name_or_path='decapoda-research/llama-7b-hf', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=True), 'eos_token': AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=True), 'unk_token': AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=True)}, clean_up_tokenization_spaces=False)
"""

from peft import PeftModel
model = PeftModel.from_pretrained(model, "tloen/alpaca-lora-7b") # printing the model now shows a more complex structure

from peft import mapping
from peft.utils import other

print('model_type', model.config.model_type)
print(model.peft_config['default'].target_modules)

# default LoRA target modules per model type
other.TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING

"""
model_type llama
['q_proj', 'k_proj', 'v_proj', 'o_proj']
{'t5': ['q', 'v'],
 'mt5': ['q', 'v'],
 'bart': ['q_proj', 'v_proj'],
 'gpt2': ['c_attn'],
 'bloom': ['query_key_value'],
 'blip-2': ['q', 'v', 'q_proj', 'v_proj'],
 'opt': ['q_proj', 'v_proj'],
 'gptj': ['q_proj', 'v_proj'],
 'gpt_neox': ['query_key_value'],
 'gpt_neo': ['q_proj', 'v_proj'],
 'bert': ['query', 'value'],
 'roberta': ['query', 'value'],
 'xlm-roberta': ['query', 'value'],
 'electra': ['query', 'value'],
 'deberta-v2': ['query_proj', 'value_proj'],
 'deberta': ['in_proj'],
 'layoutlm': ['query', 'value'],
 'llama': ['q_proj', 'v_proj'],
 'chatglm': ['query_key_value']}
"""

alpaca examples:

def generate_prompt(instruction, input=None):
    if input:
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:"""
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:"""

generation_config = GenerationConfig(
    temperature=1.5,
    # nucleus sampling
    top_p=0.8,
    num_beams=4,
    # note: temperature/top_p only take effect with do_sample=True; as written this runs beam search
)
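As a side note on the GenerationConfig above: temperature rescales the logits, and top_p keeps only the smallest set of most likely tokens whose probability mass reaches p. A minimal NumPy sketch of that filtering step (our own simplified illustration, not the transformers code path; the logits are made-up toy values):

```python
import numpy as np

def temperature_top_p_probs(logits, temperature=1.0, top_p=1.0):
    """Return the sampling distribution after temperature scaling and nucleus (top-p) filtering."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())   # numerically stable softmax
    probs /= probs.sum()

    order = np.argsort(-probs)              # token ids sorted by probability, descending
    cumulative = np.cumsum(probs[order])
    # keep the smallest prefix whose cumulative mass reaches top_p (the top token is always kept)
    cutoff = np.searchsorted(cumulative, top_p) + 1
    keep = order[:cutoff]

    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

logits = [2.0, 1.0, 0.5, -1.0]
print(np.count_nonzero(temperature_top_p_probs(logits, temperature=1.0, top_p=0.8)))  # 2
print(np.count_nonzero(temperature_top_p_probs(logits, temperature=1.5, top_p=0.8)))  # 3
```

With temperature=1.5 the distribution flattens, so the same top_p=0.8 keeps more candidate tokens.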

def inference(instruction, input=None):
    prompt = generate_prompt(instruction, input)
#     print(prompt)
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"].cuda()
    generation_output = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=256
    )
    for s in generation_output.sequences:
        output = tokenizer.decode(s)
        print("Response:", output.split("### Response:")[1].strip())

inference(input("Instruction: "))

"""
Instruction: tell me some jokes.
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
tell me some jokes.

### Response:
Response: Q: Why don't scientists trust atoms?
A: Because they make up everything!

Q: What do you call a bear with no teeth?
A: A gummy bear!

Q: What did the fish say when it swam into a wall?
A: Dam!
"""

20240711

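Tonight may be the coolest night of this summer, so by rights I should have run a time trial. But Aunt Jingxiang had invited an EMBA student from Tongji's School of Economics and Management to do a variable-pace workout (now I can roughly estimate Jingxiang's age: you have to be over 35 to apply for an EMBA, and this student still calls Jingxiang "big sister", so...), and my injury was also in an unclear state, so I dropped the plan to test my time.
First I ran a stretch with Jiawei (really a warm-up; we compromised: I wanted to cruise 3 to 5 km at 4'00"/km, he wanted 800 m intervals). I did 1500 m @ 3'33", and Jiawei kept going to 2000 m (last two laps 1'16" + 1'12"). I led him through this segment and was in very good shape, not straining at all, holding a steady 3'35" to 3'40" pace almost the whole way. Once the weather cools down, anything under 3'40" no longer feels that hard. On lap 4 Jiawei surged and dropped me; I tried to go with him but soon faded. Still, it felt like I could hold 3'40" pace for at least 10 laps.
Afterwards I paced Jingxiang and Maggie through the variable-pace workout: [(3 min @ 4'10" + 2 min @ 5'30") × 6 reps] × 2 sets, with 10 minutes between sets (easy jog recovery). This was of course easy for Jiawei and me, but we hadn't done aerobic running in a long while, so getting in some long aerobic mileage on a rare cool day was nice. Jiawei led at the front, I brought up the rear, and later XR joined for a stretch. Maggie fell apart after 2 reps of the second set; she really is a bit weaker. Aunt Jingxiang is genuinely strong (she is representing SJTU's Antai school in the Gobi Challenge, aiming to break three hours, and has a personal coach): she finished the whole workout, and Jiawei took the last rep under 3'55". About 12 km in total. It didn't feel especially easy for me, since I've rarely run this far lately, and the longer it went the harder it got. (A heavy shower came down midway, but not as suddenly as last night's; we could just about push through.)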

Taking two rest days; I can't keep relying on luck.

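The main significance of the CLIP model lies in its cross-modal learning: it can process and understand images together with their text descriptions. As a result, CLIP is not limited to a fixed dataset and predefined categories on visual tasks; it can understand concepts or objects it never saw during training.
In addition, CLIP can use natural-language descriptions for zero-shot learning, i.e. recognizing images directly from text descriptions without any additional model training.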

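Zero-shot learning means the model tries to predict classes that never appeared even once in its training data.
For example, an image classifier trained to classify dogs and cats is expected to do well on exactly that task, classifying dogs and cats. We would not normally expect a model trained on dogs and cats to detect raccoons well. CLIP, however, often performs well on tasks it was never directly trained on, and this is what is called "zero-shot learning".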
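A minimal sketch of how zero-shot classification works with CLIP-style embeddings, assuming the image and text encoders have already produced vectors: each class name is turned into a text prompt, and the image is assigned to the prompt with the highest cosine similarity. The prompt template and the toy embeddings below are illustrative assumptions, not outputs of a real CLIP model:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def zero_shot_classify(image_emb, label_text_embs, labels):
    """Pick the label whose text embedding has the highest cosine similarity with the image."""
    sims = l2_normalize(label_text_embs) @ l2_normalize(image_emb)
    return labels[int(np.argmax(sims))], sims

labels = ["dog", "cat", "raccoon"]
prompts = [f"a photo of a {label}" for label in labels]   # assumed prompt template
# Toy stand-ins for text_encoder(prompts) and image_encoder(image)
label_text_embs = np.array([[1.0, 0.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0, 0.0],
                            [0.0, 0.0, 1.0, 0.5]])
image_emb = np.array([0.1, 0.2, 0.9, 0.4])

pred, sims = zero_shot_classify(image_emb, label_text_embs, labels)
print(prompts)
print(pred)   # raccoon
```

No "raccoon" training labels are needed: adding a class only means adding one more prompt.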

The CLIP model has two main components, an image encoder and a text encoder, which work together to map images and text into a shared feature space.

Image encoder

Usually a convolutional neural network (CNN) or a Vision Transformer (ViT). It is trained to process image data and extract the important visual features.

Text encoder

Usually based on the Transformer architecture and designed for text. It is trained to process text data and extract the important textual features.

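Both encoders output embedding vectors (high-dimensional feature representations), which are then optimized with a contrastive loss so that an image and its matching text description lie close together in the feature space, while unrelated texts lie far away. In this way CLIP learns to align images and text in the same feature space, enabling cross-modal understanding and processing.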

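Given a batch of N images and their corresponding text descriptions, we get N×N image-text pairs. Among them, the N correct pairs should have high cosine similarity, while the remaining N²−N incorrect pairs should have low cosine similarity.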

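First, we pass the images through the image encoder (a ViT or ResNet model) to get image embeddings of size N×I, and pass the texts through the text encoder to get text embeddings of size N×T.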

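To measure how similar the representations are, we want to take the dot product of each image embedding with the corresponding text embedding. But the two vectors are I-dimensional and T-dimensional respectively. To bring them to the same dimension, we introduce two projection (linear) layers, one for images and one for text, that map both to a common dimension D. After the projection layers we have two matrices of shape N×D.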

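Next, we multiply the two matrices (the image matrix times the transpose of the text matrix) to get an N×N matrix whose rows correspond to images and columns to texts; each entry is the similarity between an image embedding and a text embedding.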
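The shapes in these steps can be checked with a small NumPy sketch. The random features stand in for encoder outputs, the projection matrices are random here although CLIP learns them, and all sizes are assumed toy values. After L2-normalizing the projected embeddings, each dot product is a cosine similarity:

```python
import numpy as np

rng = np.random.default_rng(0)
N, I, T, D = 4, 768, 512, 256              # batch size and embedding dims (assumed values)

image_features = rng.normal(size=(N, I))   # stand-in for image_encoder(images): N x I
text_features = rng.normal(size=(N, T))    # stand-in for text_encoder(texts):  N x T

W_image = rng.normal(size=(I, D)) / np.sqrt(I)   # image projection layer (learned in practice)
W_text = rng.normal(size=(T, D)) / np.sqrt(T)    # text projection layer (learned in practice)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

image_emb = l2_normalize(image_features @ W_image)   # N x D
text_emb = l2_normalize(text_features @ W_text)      # N x D

similarity = image_emb @ text_emb.T   # N x N: row i = image i, column j = text j
print(similarity.shape)               # (4, 4)
```

CLIP additionally scales this matrix by a learned temperature before the loss; that detail is omitted here.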

CLIP loss function

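We want the vectors of matching images and texts to be aligned: their dot products (the diagonal entries of the matrix) should be as close to 1 as possible, and all other entries should be pushed toward 0.

So for a given caption, we take the softmax of its dot products over all images and then compute the cross-entropy loss.

Likewise, for a given image, we repeat the process over all captions.

Finally, we average the two losses and update the weights via backpropagation. This is how CLIP is built and trained.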

import torch
import torch.nn.functional as F
from typing import Tuple

# Cross-entropy where the correct match for row/column i is index i (the diagonal)
def contrastive_loss(logits, dim):
    neg_ce = torch.diag(F.log_softmax(logits, dim=dim))
    return -neg_ce.mean()

# Symmetric CLIP loss: average of the caption->image and image->caption losses
def clip_loss(similarity: torch.Tensor) -> torch.Tensor:
    caption_loss = contrastive_loss(similarity, dim=0)
    image_loss = contrastive_loss(similarity, dim=1)
    return (caption_loss + image_loss) / 2.0

# Retrieval accuracy in both directions (rows = images, columns = captions)
def metrics(similarity: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
    y = torch.arange(len(similarity)).to(similarity.device)
    img2cap_match_idx = similarity.argmax(dim=1)
    cap2img_match_idx = similarity.argmax(dim=0)

    img_acc = (img2cap_match_idx == y).float().mean()
    cap_acc = (cap2img_match_idx == y).float().mean()

    return img_acc, cap_acc

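This loss makes CLIP naturally suited to text-image retrieval, and combined with an image generation model (such as DALL·E) it can be used to build text-to-image generation.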
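Text-to-image retrieval then reduces to ranking images by cosine similarity against a text embedding. A toy sketch with hand-made vectors (retrieve_images is a hypothetical helper, and the embeddings are illustrative, not real CLIP outputs):

```python
import numpy as np

def retrieve_images(text_emb, image_embs, k=2):
    """Return indices of the top-k images for a text query, ranked by cosine similarity."""
    text_emb = text_emb / np.linalg.norm(text_emb)
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = image_embs @ text_emb
    return np.argsort(-scores)[:k].tolist(), scores

image_embs = np.array([[1.0, 0.0, 0.0],
                       [0.6, 0.8, 0.0],
                       [0.0, 0.0, 1.0]])
query = np.array([1.0, 1.0, 0.0])   # toy text embedding
top, scores = retrieve_images(query, image_embs, k=2)
print(top)   # [1, 0]
```

In a real system the image embeddings would be computed once and indexed, so each query costs a single matrix-vector product.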

20240712

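In the evening: 8 sets × 30 lunges (+20 kg), 4 sets forward and 4 backward. The backward ones flowed better this time, relying less on the two-foot stance in between; the last two sets had no rest and I went to failure. A 20 kg kettlebell no longer feels too hard. Then five easy laps to cool down. XR ran about 8K+ tonight. His speed endurance may not actually be worse than mine (in 8 × 400 m intervals he can hold around 1'15", quite good), but he lacks mileage, and his aerobic level still lags behind mine. I still hope he reaches my level before the Gaobai relay.
The Achilles tendon isn't looking great; it needs a transition period, and I can't rush it.
PS: XR always tilts his head back in photos, emmm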

pip install peft

Installing from source lets you customize some of the configuration:

pip install git+https://github.com/huggingface/peft.git

Using PEFT models in Transformers:

🤗 Transformers natively supports some PEFT methods, meaning you can load adapter weights stored locally or on the Hub and easily run or train them with a few lines of code. The following methods are supported:

Low Rank Adapters
IA3
AdaLoRA

If you want to use other PEFT methods, such as prompt learning or prompt tuning, or about the 🤗 PEFT library in general, please refer to the documentation.

Loading PEFT adapters

from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "ybelkada/opt-350m-lora"
model = AutoModelForCausalLM.from_pretrained(peft_model_id)

You can load a PEFT adapter with either an AutoModelFor class or the base model class like OPTForCausalLM or LlamaForCausalLM.

You can also load a PEFT adapter by calling the load_adapter method:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-350m"
peft_model_id = "ybelkada/opt-350m-lora"

model = AutoModelForCausalLM.from_pretrained(model_id)
model.load_adapter(peft_model_id)

Quantization (8-bit / 4-bit)

The bitsandbytes integration supports 8bit and 4bit precision data types, which are useful for loading large models because it saves memory (see the bitsandbytes integration guide to learn more). Add the load_in_8bit or load_in_4bit parameters to [~PreTrainedModel.from_pretrained] and set device_map="auto" to effectively distribute the model to your hardware:

from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "ybelkada/opt-350m-lora"
model = AutoModelForCausalLM.from_pretrained(peft_model_id, device_map="auto", load_in_8bit=True)

Adding a new adapter

You can use [~peft.PeftModel.add_adapter] to add a new adapter to a model with an existing adapter as long as the new adapter is the same type as the current one. For example, if you have an existing LoRA adapter attached to a model:

from transformers import AutoModelForCausalLM, OPTForCausalLM, AutoTokenizer
from peft import LoraConfig

model_id = "facebook/opt-350m"
model = AutoModelForCausalLM.from_pretrained(model_id)

lora_config = LoraConfig(
    target_modules=["q_proj", "k_proj"],
    init_lora_weights=False
)

model.add_adapter(lora_config, adapter_name="adapter_1")

To add a new adapter:

# attach new adapter with same config
model.add_adapter(lora_config, adapter_name="adapter_2")

Now you can use [~peft.PeftModel.set_adapter] to set which adapter to use (tokenizer and inputs are prepared as in the next example):

# use adapter_1
model.set_adapter("adapter_1")
output_1 = model.generate(**inputs)
print(tokenizer.decode(output_1[0], skip_special_tokens=True))

# use adapter_2
model.set_adapter("adapter_2")
output_2 = model.generate(**inputs)
print(tokenizer.decode(output_2[0], skip_special_tokens=True))

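Enabling and disabling adapters (the example sets init_lora_weights = False so the adapter starts from random weights instead of the default no-op initialization, which makes the effect visible):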

Once you’ve added an adapter to a model, you can enable or disable the adapter module. To enable the adapter module:

from transformers import AutoModelForCausalLM, OPTForCausalLM, AutoTokenizer
from peft import PeftConfig

model_id = "facebook/opt-350m"
adapter_model_id = "ybelkada/opt-350m-lora"
tokenizer = AutoTokenizer.from_pretrained(model_id)
text = "Hello"
inputs = tokenizer(text, return_tensors="pt")

model = AutoModelForCausalLM.from_pretrained(model_id)
peft_config = PeftConfig.from_pretrained(adapter_model_id)

# to initiate with random weights
peft_config.init_lora_weights = False

model.add_adapter(peft_config)
model.enable_adapters()
output = model.generate(**inputs)

To disable the adapter module:

model.disable_adapters()
output = model.generate(**inputs)
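Why set init_lora_weights = False here? With PEFT's default LoRA initialization the B matrix starts at zero, so a freshly added adapter leaves the output unchanged and enabling or disabling it makes no visible difference. A NumPy sketch of the LoRA forward pass, y = xWᵀ + (alpha/r)·xAᵀBᵀ, with toy sizes chosen for illustration (lora_forward is a hypothetical helper, not a PEFT function):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha, r, enabled=True):
    """y = x W^T + (alpha / r) * x A^T B^T; the adapter path is skipped when disabled."""
    y = x @ W.T
    if enabled:
        y = y + (alpha / r) * (x @ A.T @ B.T)
    return y

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 8, 4, 16
x = rng.normal(size=(2, d_in))
W = rng.normal(size=(d_out, d_in))        # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01     # LoRA A: small random init
B_default = np.zeros((d_out, r))          # LoRA B: zero init => adapter starts as a no-op
B_random = rng.normal(size=(d_out, r))    # mimics init_lora_weights=False (random B)

base = lora_forward(x, W, A, B_default, alpha, r, enabled=False)
print(np.allclose(lora_forward(x, W, A, B_default, alpha, r), base))  # True
print(np.allclose(lora_forward(x, W, A, B_random, alpha, r), base))   # False
```

Only A and B (r·(d_in + d_out) numbers) are trained; W stays frozen.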

Training a PEFT adapter

PEFT adapters are supported by the [Trainer] class so that you can train an adapter for your specific use case. It only requires adding a few more lines of code. For example, to train a LoRA adapter:

If you aren’t familiar with fine-tuning a model with [Trainer], take a look at the Fine-tune a pretrained model tutorial.

Define your adapter configuration with the task type and hyperparameters (see [~peft.LoraConfig] for more details about what the hyperparameters do).

from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)
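To get a feel for how small r=64 is in practice, the number of extra trainable parameters can be estimated as r·(in_features + out_features) per adapted Linear layer. The sketch below assumes llama-7b (32 layers, hidden size 4096) with LoRA on q_proj and v_proj, the llama default in the PEFT target-module mapping shown earlier; the actual target modules depend on your config and PEFT version:

```python
def lora_param_count(shapes, r):
    """Extra trainable parameters added by LoRA: r * (in_features + out_features) per adapted Linear."""
    return sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes)

num_layers, hidden = 32, 4096
shapes = [(hidden, hidden), (hidden, hidden)] * num_layers   # q_proj and v_proj in every layer (assumed)
extra = lora_param_count(shapes, r=64)
print(extra)                                                     # 33554432
print(f"{100 * extra / 6_738_415_616:.2f}% of the base model")   # 0.50% of the base model
```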

Add adapter to the model.

model.add_adapter(peft_config)

Now you can pass the model to [Trainer]!

trainer = Trainer(model=model, ...)
trainer.train()

To save your trained adapter and load it back:

model.save_pretrained(save_dir)
model = AutoModelForCausalLM.from_pretrained(save_dir)

Adding extra trainable layers on top of a PEFT adapter

You can also fine-tune additional trainable adapters on top of a model that has adapters attached by passing modules_to_save in your PEFT config. For example, if you want to also fine-tune the lm_head on top of a model with a LoRA adapter:

from transformers import AutoModelForCausalLM, OPTForCausalLM, AutoTokenizer
from peft import LoraConfig
model_id = "facebook/opt-350m"
model = AutoModelForCausalLM.from_pretrained(model_id)
lora_config = LoraConfig(
    target_modules=["q_proj", "k_proj"],
    modules_to_save=["lm_head"],
)
model.add_adapter(lora_config)

20240713

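It rained in the morning, and SXY still went to Century Park like a lunatic to finish an online half marathon, 2 hours 33 minutes. I really don't know what to say. A phrase came to mind: bad at it, but loves to play. Once you've had your fill of craziness, settle down for a while; running in the rain for nearly two hours is really not taking care of yourself.
After the rain stopped at noon there was a band of white along the horizon, very bright and open. The new canteen closed today, leaving only Lüye (Green Leaf) and the halal canteen on the main campus, and life keeps getting harder. At noon Lüye still had 5-yuan braised fish, as much as you like, but by evening neither place had anything worth eating. So I decided to eat out, and before leaving I ran on the track for a while. LXY reappeared after many days (I later learned they had a fever recently and were laid up for two days, then ran 17 km today, and fast: the first 4 km at about 4'40" pace, then 12+ km with YZZ at a 5-something pace, including a loop around campus. Honestly I'm a bit envious, but I really don't have the courage to run with them again).
There were surprisingly many people on the track. A running club was training, not the MBA Gobi Challenge team, probably outsiders, and of average level. When two guys passed me, one said loudly "current pace 3'47"", as if showing off for others to hear. Too bad I was wearing slippers: I only followed them for one lap, because my left sole was too slippery to stay stable. Otherwise I'd have blown past them in slippers and given them a little SUFE shock.
In the end: 4000 m @ 4'24" in slippers. My heart rate was surprisingly low, only 130 bpm, not sure why. I felt OK today. Tomorrow is the last cool day; if I'm in good form I'd like to test 10K or 5000 m to see where I stand, and a bit of motivation wouldn't hurt either.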

A small trick: creating variables in bulk

# Note: this only works at module level, where locals() is the same dict as globals()
createVar = locals()
for i in range(10):
    createVar["Li" + str(i)] = [i]
    print(createVar["Li" + str(i)])
    print(f"Li{i} =", createVar["Li" + str(i)])
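Writing into locals() only works at module level, where locals() returns the same dict as globals(); inside a function, changes to the dict returned by locals() are not guaranteed to create variables. A plain dict is usually the more robust way to get "variables" with generated names (lists is just an illustrative name):

```python
# Keep dynamically named values in a dict instead of creating variables via locals().
lists = {f"Li{i}": [i] for i in range(10)}
for name, value in lists.items():
    print(f"{name} =", value)

print(lists["Li3"])   # [3]
```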

ADB keycodes (part 2):

Numeric keypad
KEYCODE_NUMPAD_0 		Numpad key '0'
KEYCODE_NUMPAD_1 		Numpad key '1'
KEYCODE_NUMPAD_2 		Numpad key '2'
KEYCODE_NUMPAD_3 		Numpad key '3'
KEYCODE_NUMPAD_4 		Numpad key '4'
KEYCODE_NUMPAD_5 		Numpad key '5'
KEYCODE_NUMPAD_6 		Numpad key '6'
KEYCODE_NUMPAD_7 		Numpad key '7'
KEYCODE_NUMPAD_8 		Numpad key '8'
KEYCODE_NUMPAD_9 		Numpad key '9'
KEYCODE_NUMPAD_ADD 		Numpad key '+'
KEYCODE_NUMPAD_SUBTRACT 	Numpad key '-'
KEYCODE_NUMPAD_MULTIPLY 	Numpad key '*'
KEYCODE_NUMPAD_DIVIDE 		Numpad key '/'
KEYCODE_NUMPAD_EQUALS 		Numpad key '='
KEYCODE_NUMPAD_COMMA 		Numpad key ','
KEYCODE_NUMPAD_DOT 		Numpad key '.'
KEYCODE_NUMPAD_LEFT_PAREN 	Numpad key '('
KEYCODE_NUMPAD_RIGHT_PAREN 	Numpad key ')'
KEYCODE_NUMPAD_ENTER 		Numpad Enter
 
Function keys
KEYCODE_F1 	Key F1
KEYCODE_F2 	Key F2
KEYCODE_F3 	Key F3
KEYCODE_F4 	Key F4
KEYCODE_F5 	Key F5
KEYCODE_F6 	Key F6
KEYCODE_F7 	Key F7
KEYCODE_F8 	Key F8
KEYCODE_F9 	Key F9
KEYCODE_F10 	Key F10
KEYCODE_F11 	Key F11
KEYCODE_F12 	Key F12

Media keys
KEYCODE_MEDIA_PLAY 		Media key: Play
KEYCODE_MEDIA_STOP 		Media key: Stop
KEYCODE_MEDIA_PAUSE 		Media key: Pause
KEYCODE_MEDIA_PLAY_PAUSE 	Media key: Play/Pause
KEYCODE_MEDIA_FAST_FORWARD 	Media key: Fast forward
KEYCODE_MEDIA_REWIND 		Media key: Rewind
KEYCODE_MEDIA_NEXT 		Media key: Next track
KEYCODE_MEDIA_PREVIOUS 		Media key: Previous track
KEYCODE_MEDIA_CLOSE 		Media key: Close
KEYCODE_MEDIA_EJECT 		Media key: Eject
KEYCODE_MEDIA_RECORD 		Media key: Record

Gamepad buttons
KEYCODE_BUTTON_1 	Generic gamepad button #1
KEYCODE_BUTTON_2 	Generic gamepad button #2
KEYCODE_BUTTON_3 	Generic gamepad button #3
KEYCODE_BUTTON_4 	Generic gamepad button #4
KEYCODE_BUTTON_5 	Generic gamepad button #5
KEYCODE_BUTTON_6 	Generic gamepad button #6
KEYCODE_BUTTON_7 	Generic gamepad button #7
KEYCODE_BUTTON_8 	Generic gamepad button #8
KEYCODE_BUTTON_9 	Generic gamepad button #9
KEYCODE_BUTTON_10 	Generic gamepad button #10
KEYCODE_BUTTON_11 	Generic gamepad button #11
KEYCODE_BUTTON_12 	Generic gamepad button #12
KEYCODE_BUTTON_13 	Generic gamepad button #13
KEYCODE_BUTTON_14 	Generic gamepad button #14
KEYCODE_BUTTON_15 	Generic gamepad button #15
KEYCODE_BUTTON_16 	Generic gamepad button #16
KEYCODE_BUTTON_A 	Gamepad button A
KEYCODE_BUTTON_B 	Gamepad button B
KEYCODE_BUTTON_C 	Gamepad button C
KEYCODE_BUTTON_X 	Gamepad button X
KEYCODE_BUTTON_Y 	Gamepad button Y
KEYCODE_BUTTON_Z 	Gamepad button Z
KEYCODE_BUTTON_L1 	Gamepad button L1
KEYCODE_BUTTON_L2 	Gamepad button L2
KEYCODE_BUTTON_R1 	Gamepad button R1
KEYCODE_BUTTON_R2 	Gamepad button R2
KEYCODE_BUTTON_MODE 	Gamepad button Mode
KEYCODE_BUTTON_SELECT 	Gamepad button Select
KEYCODE_BUTTON_START 	Gamepad button Start
KEYCODE_BUTTON_THUMBL 	Left Thumb Button
KEYCODE_BUTTON_THUMBR 	Right Thumb Button
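These keycodes are usually sent with adb shell input keyevent. A small sketch that builds the command line (keyevent_argv is a hypothetical helper, and the device serial in the second call is an assumed example), leaving the actual execution to subprocess:

```python
import shlex

def keyevent_argv(keycode, serial=None):
    """Build the argv for `adb shell input keyevent <KEYCODE>`, optionally targeting one device."""
    argv = ["adb"]
    if serial:
        argv += ["-s", serial]   # choose a device when several are attached
    argv += ["shell", "input", "keyevent", keycode]
    return argv

cmd = keyevent_argv("KEYCODE_MEDIA_PLAY_PAUSE")
print(shlex.join(cmd))   # adb shell input keyevent KEYCODE_MEDIA_PLAY_PAUSE
print(shlex.join(keyevent_argv("KEYCODE_F1", serial="emulator-5554")))
# To actually send it: subprocess.run(cmd, check=True)
```

Passing an argv list (rather than one shell string) avoids quoting problems when the command is executed.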

20240714

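I replayed Rumu Dongfeng's (如沐东风) Gugong Zhizhang (《咕工智障》). In difficulty, mechanics, art, story, gameplay and completeness alike, it is hard to find anything that can match it (the only thing comparable might be Tongxin's (童心佬) magic tower games). Last time I only cleared it; on this second playthrough I read all of the writing carefully.

The first two chapters depict an absurd religion that worships keys, the bloodshed of the battlefield and the decadence of the nobility. They seem unrelated, yet also seem intricately tied to the later plot, a kind of montage technique. From chapter three on, the protagonist 9922 (an "artificial idiot") starts fighting the Mechspirits (械灵) that are wiping out artificial intelligences, which raises questions about the conflict between AI and human society (the good and evil of AI). Up to the end of the first half of chapter four, the story's main theme seems to be a critique of class stratification, condemning how those above exploit those below. But in the second half of chapter four the pen turns, and the spearhead points at the underclass who never fight back: the monster names (沸物, 魔宇打工人, 被害妄想症 "persecution complex", 社交恐惧者 "social phobe") and the area names (荒烟村, 拓岩镇, 华炳街, 览伟桥, 摩宇城) all hint at this. Under the impact of AI, humans are sorted into rigid ranks, with the "meat-eaters" (those in power) and high-level Mechspirits ruling over ninth-class citizens and low-level Mechspirits. At the close of chapter four's second half, the bunny-eared girl Qingyun dies of her wounds and the monsters turn into the Seven Deadly Sins (pride, envy, wrath, sloth, greed, gluttony, lust), growing ever more abstract, hinting that 9922 is gradually going mad. The final bosses are Emperor Aokasite (奥卡斯特大帝) and Emperor Aoboluksi (奥波卢克斯大帝); who knew a JS pixel game could deliver fights this thrilling. The numbers 9922 and 2211, along with Qingyun, run through the entire game. It is profound yet humorous, with sharply clashing styles, and it seems to hint that all of this was just 9922's dream.

The story isn't finished, but Dongfeng has stopped working on it; perhaps if it had gone any further it couldn't have been published. That is a kind of deliberate blank space too.

PS: Lüye had bitter melon today: awful, small portions and pricey. In the evening I scraped together 10 km. Apart from the first 3000 m @ 3'45", which went fine, the rest was truly pathetic: even jogging at 4'30" my heart rate hit 170, and my upper-body muscles ached a bit. Probably I didn't sleep well, and it was quite muggy, not that cool. Oh well.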

Comparing llama2-7b and llama3-8b:

First, llama2-7b comes in a Meta version and a version converted to the Hugging Face format.

llama2-7b
- https://huggingface.co/meta-llama/Llama-2-7b/tree/main
  - checkpoint = torch.load(ckpt_path, map_location="cpu")
  - model = Transformer(model_args)
  - model.load_state_dict(checkpoint, strict=False)
  - automatically uses bfloat16 (if supported)
- https://huggingface.co/meta-llama/Llama-2-7b-hf/tree/main
  - pytorch_model.bin.index.json
https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md

import torch
from transformers import AutoModelForCausalLM

llama2_id = 'meta-llama/Llama-2-7b-hf'
llama3_id = "meta-llama/Meta-Llama-3-8B"

# bfloat16: 2 bytes per parameter
# 7b => ~14GB of weights
llama2 = AutoModelForCausalLM.from_pretrained(llama2_id, torch_dtype=torch.bfloat16, device_map='auto')

# 8b => ~16GB of weights
llama3 = AutoModelForCausalLM.from_pretrained(llama3_id, torch_dtype=torch.bfloat16, device_map='auto')
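The 14GB / 16GB figures in the comments come from parameters × bytes per parameter. A rough weight-only estimate (it ignores activations, the KV cache and framework overhead; the helper name is ours):

```python
# Bytes per parameter for common dtypes (weights only).
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(num_params: float, dtype: str = "bfloat16") -> float:
    """Rough weight-only memory footprint in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

print(weight_memory_gb(7e9))           # 14.0
print(weight_memory_gb(8e9))           # 16.0
print(weight_memory_gb(7e9, "int8"))   # 7.0
```

With the exact counts printed below (6,738,415,616 and 8,030,261,248 parameters), this gives about 13.5 GB and 16.1 GB.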

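With device_map='auto' the model is split across GPUs automatically; the relevant part of the loading logic: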

if device_map != "sequential":
    # Compute a `max_memory` dictionary for [`infer_auto_device_map`] that will balance the use of each available GPU.
    max_memory = get_balanced_memory(
        model,
        dtype=target_dtype,
        low_zero=(device_map == "balanced_low_0"),
        max_memory=max_memory,
        **device_map_kwargs,
    )
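Once per-device budgets are known, layers are assigned to devices. A greatly simplified greedy sketch of that idea (greedy_device_map is hypothetical; accelerate's infer_auto_device_map is considerably more involved, e.g. it avoids splitting certain modules across devices). Sizes are integers in MB to keep the arithmetic exact:

```python
def greedy_device_map(layer_sizes, device_capacity):
    """Assign consecutive layers to devices in order, moving on when a device is full.

    layer_sizes: memory per layer (MB); device_capacity: dict device -> budget (MB).
    """
    device_map, devices = {}, list(device_capacity.items())
    d, used = 0, 0
    for idx, size in enumerate(layer_sizes):
        while d < len(devices) and used + size > devices[d][1]:
            d, used = d + 1, 0            # current device is full: move to the next one
        if d == len(devices):
            raise MemoryError(f"layer {idx} does not fit on any device")
        device_map[f"layers.{idx}"] = devices[d][0]
        used += size
    return device_map

# Toy example: 8 layers of 400 MB over two GPUs with a 2000 MB budget each.
dm = greedy_device_map([400] * 8, {"cuda:0": 2000, "cuda:1": 2000})
print(dm)
```

Layers 0 to 4 fill cuda:0 exactly, and the remaining three go to cuda:1.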

Print the number of trainable parameters:

def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_params = 0
    for _, param in model.named_parameters():
        all_params += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(f"trainable params: {trainable_params} || all params: {all_params} || trainable%: {100 * trainable_params / all_params}")

print_trainable_parameters(llama2)
# trainable params: 6738415616 || all params: 6738415616 || trainable%: 100.0
print_trainable_parameters(llama3)
# trainable params: 8030261248 || all params: 8030261248 || trainable%: 100.0

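In fact all parameters are trainable in both models; the only difference is the parameter count.

The model structures are also essentially the same; only the sizes of corresponding layers differ: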

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)

V.S.

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=128256, bias=False)
)
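The totals printed by print_trainable_parameters can be reproduced exactly from the layer shapes in the two dumps above (no biases, one weight vector of size hidden per RMSNorm, and separate embed_tokens and lm_head matrices). A small sketch, with the config values read off the dumps:

```python
def llama_param_count(vocab, hidden, intermediate, layers, num_heads, num_kv_heads):
    head_dim = hidden // num_heads
    kv_dim = num_kv_heads * head_dim
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim   # q_proj, o_proj + k_proj, v_proj
    mlp = 3 * hidden * intermediate                    # gate_proj, up_proj, down_proj
    norms = 2 * hidden                                 # input / post-attention RMSNorm
    per_layer = attn + mlp + norms
    embed_and_head = 2 * vocab * hidden                # embed_tokens + lm_head (untied)
    return layers * per_layer + embed_and_head + hidden  # + final norm

print(llama_param_count(32000, 4096, 11008, 32, 32, 32))   # 6738415616 (llama2-7b)
print(llama_param_count(128256, 4096, 14336, 32, 32, 8))   # 8030261248 (llama3-8b)
```

Most of the extra ~1.3B parameters in llama3-8b come from the larger vocabulary (embedding + lm_head) and the wider MLP; GQA actually shrinks k_proj and v_proj.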

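Note the LlamaSdpaAttention here: it is the Llama attention implementation that calls PyTorch's torch.nn.functional.scaled_dot_product_attention (SDPA).

torch supports three SDPA backends (kernels): flash attention, memory-efficient and math, and their results can differ slightly. The key reason is that flash attention uses fused operations, and floating-point addition and multiplication are not necessarily associative or distributive (with fused multiply-add operations, a different evaluation order can give different results). These differences should not noticeably affect output quality.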
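A quick pure-Python illustration of why evaluation order matters in floating point (the values are arbitrary examples, not taken from an attention kernel):

```python
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left, right, left == right)   # 0.6000000000000001 0.6 False

# Summation order: adding a small term to a huge one can lose it entirely.
print(sum([1e17, 1.0, -1e17]))      # 0.0
print(sum([1e17, -1e17, 1.0]))      # 1.0
```

A fused kernel that accumulates partial sums in a different order hits exactly this effect, just at the scale of many small rounding differences.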

attention mechanism
- $X\in\mathbb R^{\ell\times d}$
- $W_k\in\mathbb R^{d\times d_k},\ W_q\in\mathbb R^{d\times d_k},\ W_v\in\mathbb R^{d\times d_v}$
- $Q=XW_q\in\mathbb R^{\ell\times d_k},\ K=XW_k\in\mathbb R^{\ell\times d_k},\ V=XW_v\in\mathbb R^{\ell\times d_v}$

$$\text{Attention}(Q,K,V)=\text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$
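A direct NumPy transcription of the formula above for a single head, with an optional causal mask as used in decoder-only models such as LLaMA. Shapes follow the definitions (X is ℓ×d, Q and K are ℓ×d_k, V is ℓ×d_v); the sizes and random weights are toy assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, W_q, W_k, W_v, causal=False):
    """Scaled dot-product attention for one head: softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v          # (l, d_k), (l, d_k), (l, d_v)
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # (l, l)
    if causal:                                   # position i may only attend to positions <= i
        mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
l, d, d_k, d_v = 5, 16, 8, 8
X = rng.normal(size=(l, d))
out, w = attention(X, rng.normal(size=(d, d_k)), rng.normal(size=(d, d_k)),
                   rng.normal(size=(d, d_v)), causal=True)
print(out.shape)   # (5, 8)
```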

# self attn
self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
# k/v output dim = num_key_value_heads * head_dim
# llama2-7b (no GQA, 32 kv heads): 32*(4096/32) = 4096
# llama3-8b (GQA, 8 kv heads):     8*(4096/32) = 1024
self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
self.o_proj = nn.Linear(self.hidden_size, self.hidden_size, bias=config.attention_bias)

# GQA: number of query heads sharing each key/value head
# llama2-7b: 32/32 = 1 (plain multi-head attention)
# llama3-8b: 32/8 = 4, i.e. 4 query heads per kv head
self.num_key_value_groups = self.num_heads // self.num_key_value_heads

# mlp
self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=config.mlp_bias)
self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=config.mlp_bias)
self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=config.mlp_bias)

# mlp forward
# hf
down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
# meta
self.w2(F.silu(self.w1(x)) * self.w3(x))
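The HF and Meta versions of the MLP are the same SwiGLU computation under different names. A NumPy sketch with toy sizes, assuming the usual checkpoint correspondence gate_proj = w1, up_proj = w3, down_proj = w2:

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))   # SiLU(x) = x * sigmoid(x)

rng = np.random.default_rng(0)
hidden, inter = 8, 20                       # toy sizes (4096 / 11008 in llama2-7b)
gate_w = rng.normal(size=(inter, hidden))   # gate_proj.weight  (Meta: w1)
up_w = rng.normal(size=(inter, hidden))     # up_proj.weight    (Meta: w3)
down_w = rng.normal(size=(hidden, inter))   # down_proj.weight  (Meta: w2)
x = rng.normal(size=(3, hidden))

# HF naming: down_proj(act_fn(gate_proj(x)) * up_proj(x))
hf_out = (silu(x @ gate_w.T) * (x @ up_w.T)) @ down_w.T

# Meta naming: w2(F.silu(w1(x)) * w3(x))
w1, w2, w3 = gate_w, down_w, up_w
meta_out = (silu(x @ w1.T) * (x @ w3.T)) @ w2.T

print(hf_out.shape, np.allclose(hf_out, meta_out))   # (3, 8) True
```

The gate branch goes through SiLU and multiplies the up branch elementwise, which is why this MLP has three weight matrices instead of two.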

model	heads	layers	dim	head_dim
llama2-7b	32	32	4096	128 (4096/32)
llama2-13b	40	40	5120	128 (5120/40)
llama2-70b	64	80	8192	128 (8192/64)
llama3-8b	32	32	4096	128 (4096/32)

vocab_size (Embedding):
- llama2: 32000
- llama3: 128256
GQA (k_proj, v_proj)
- head_dim: hidden_size/num_heads
  - llama2: 4096/32 = 128
  - llama3: 4096/32 = 128
- k_proj/v_proj out_features = num_key_value_heads * head_dim
  - llama2: 32*(4096/32) = 4096
  - llama3: 8*128 = 1024 (a visible reduction in learnable parameters for k_proj and v_proj)
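With GQA, each key/value head is shared by num_key_value_groups query heads, so before computing attention the k/v tensors are expanded along the head axis. A NumPy sketch of that expansion (it mirrors the idea of the repeat_kv helper in the transformers Llama code, but this version is our own):

```python
import numpy as np

def repeat_kv(kv, n_rep):
    """Expand (batch, n_kv_heads, seq, head_dim) to (batch, n_kv_heads * n_rep, seq, head_dim)."""
    if n_rep == 1:
        return kv
    return np.repeat(kv, n_rep, axis=1)   # each kv head is reused by n_rep consecutive query heads

batch, seq, head_dim = 1, 5, 128
num_heads, num_kv_heads = 32, 8           # llama3-8b: 32 query heads, 8 kv heads
k = np.random.default_rng(0).normal(size=(batch, num_kv_heads, seq, head_dim))
k_full = repeat_kv(k, num_heads // num_kv_heads)

print(k_full.shape)                            # (1, 32, 5, 128)
print(np.array_equal(k_full[:, 4], k[:, 1]))   # True: query head 4 uses kv head 4 // 4 = 1
```

The projection weights and the KV cache only store the 8 kv heads; the expansion happens at compute time.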

20240715

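Drove from Pudong to Jiading and back to school. Shanghai's roads are really hard to drive: noticeably narrower than back home, and on some old roads the lanes shift a lot after you cross an intersection, so if you don't know the road and aren't using navigation, one slip and you're driving against traffic. Worse, these glasses are slightly under-prescribed and I couldn't see the rear-view mirror clearly. Hope I didn't get a ticket, emmm

The weather is turning unpleasant again. LXY is still putting in 13K+, and YZZ has become their regular running buddy; having someone to run with really is nice. I got back to school at 9:30 pm and jogged 3000 m @ 4'41". Mileage for the first half of July: 101.3 km at an average pace of 4'12", barely over 100. Although I don't like to admit it, I'm right on the edge of injury: the old injury again, two centimeters above the inside of my right ankle, along with pain in the right Achilles tendon. This past week the intensity was already much lower than in the first week of July; hopefully it eases. This weakness is really hard to get rid of. It isn't affecting me much yet and hasn't recurred in the last month and a half, but I'm afraid it will stab me in the back at a critical moment. People are always like this: no tears until they see the coffin.

PS: Isn't a certain someone the same? If I were a girl and my mom found out that instead of staying home when it rained, I'd gone out and run in the rain like a lunatic for two hours and come home with a cold, she'd break my legs. The willpower is rare, but the behavior really isn't to be encouraged. Sure, "go crazy now or you'll be too old to", but going this crazy, do you think you're Superman?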

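Andrew Ng: Agentic Reasoning

Some people don't reject prompt engineering at all and think it has real substance and ingenuity; I also feel this is more like …

In fact, prompt learning is not a concept that appeared only after LLMs; people were already doing prompt learning roughly seven or eight years ago. A classic result (one that quite a few works arrived at) is that the most effective prompts are often ungrammatical.

Those conclusions came mostly from simple NLP tasks (sentiment analysis, natural language inference) and differ somewhat from some current empirical results, though the models themselves are also different. Still, from current experience, efficient prompts really don't match an intuitive sense of natural language. This raises a question: are we aligning values, wanting LLMs that match human values (or rather, human habits), or do we just want some interpretable technique to cleverly draw out the knowledge contained in LLMs?

Obviously we want the former, because the latter approach will most likely go stale as models are retrained and updated. Moreover, from current experience prompt design differs across base models, and frankly there is no convincing argument for its scientific soundness.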

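Prompt Engineering (LLM-based agents)
- modern: LLMs rewrite everything
- effective
- for engineer & research
The Agentic talk from this April, and translation-agent from June (about 4k stars as of now), a concrete practice of it:
- https://github.com/andrewyng/translation-agent
  - https://github.com/andrewyng/translation-agent/blob/main/src/translation_agent/utils.py (this utils file records some of the prompt designs)
workflow
- decomposition and abstraction of complex tasks;
  - completing relatively simple subtasks step by step is simpler and more effective than having the LLM produce the whole complex task in one shot;
  - a mirror of real-world human experience;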
Agentic Reasoning design patterns
- Reflection
- Tool use
- Planning
- Multi-Agent Collaboration

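Below, a translation agent serves as the main example. It is actually simple, three steps: first translate (translation_1), then a "teacher" step gives a reflection, and finally both are combined into the final result (translation_2). That is a workflow.

prompt & workflow

prompt: the concrete business requirements;
(agentic) workflow: corresponds to a decomposition and abstraction of the task;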

def translate(
    source_lang,
    target_lang,
    source_text,
    country,
    max_tokens=MAX_TOKENS_PER_CHUNK,
):
    if ...
        final_translation = one_chunk_translate_text(
                source_lang, target_lang, source_text, country
            )
    
        return final_translation
    else:
        source_text_chunks = text_splitter.split_text(source_text)

        translation_2_chunks = multichunk_translation(
            source_lang, target_lang, source_text_chunks, country
        )

        return "".join(translation_2_chunks)

onechunk

one_chunk_initial_translation
one_chunk_reflect_on_translation
one_chunk_translate_text

def one_chunk_translate_text(
    source_lang: str, target_lang: str, source_text: str, country: str = ""
) -> str:
    translation_1 = one_chunk_initial_translation(
        source_lang, target_lang, source_text
    )

    reflection = one_chunk_reflect_on_translation(
        source_lang, target_lang, source_text, translation_1, country
    )
    translation_2 = one_chunk_improve_translation(
        source_lang, target_lang, source_text, translation_1, reflection
    )
    return translation_2
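The three-step workflow can be sketched independently of any LLM provider by passing in a callable. The prompts below are placeholders written for illustration, not the prompts from translation-agent's utils.py, and fake_llm is a stub so the control flow can be run offline:

```python
from typing import Callable

def translate_with_reflection(llm: Callable[[str], str], source_lang: str, target_lang: str,
                              source_text: str, country: str = "") -> str:
    """Translate -> reflect -> improve, each step being one LLM call (prompts are illustrative)."""
    translation_1 = llm(f"Translate this {source_lang} text into {target_lang}:\n{source_text}")
    style = f" as spoken in {country}" if country else ""
    reflection = llm(
        f"Critique this {target_lang}{style} translation of the {source_lang} text and list "
        f"concrete suggestions.\nSource: {source_text}\nTranslation: {translation_1}"
    )
    translation_2 = llm(
        f"Improve the translation using the suggestions.\nSource: {source_text}\n"
        f"Translation: {translation_1}\nSuggestions: {reflection}"
    )
    return translation_2

# A fake LLM that records prompts, so the workflow runs without any API key.
calls = []
def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"<answer {len(calls)}>"

print(translate_with_reflection(fake_llm, "English", "Chinese", "Hello", country="China"))  # <answer 3>
print(len(calls))   # 3
```

Each step's output is fed into the next prompt, which is all the "workflow" amounts to at this level.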

multichunk

def multichunk_translation(
    source_lang, target_lang, source_text_chunks, country: str = ""
):
    translation_1_chunks = multichunk_initial_translation(
        source_lang, target_lang, source_text_chunks
    )

    reflection_chunks = multichunk_reflect_on_translation(
        source_lang,
        target_lang,
        source_text_chunks,
        translation_1_chunks,
        country,
    )

    translation_2_chunks = multichunk_improve_translation(
        source_lang,
        target_lang,
        source_text_chunks,
        translation_1_chunks,
        reflection_chunks,
    )

    return translation_2_chunks

Splitting into chunks:
from langchain_text_splitters import RecursiveCharacterTextSplitter

def calculate_chunk_size(token_count: int, token_limit: int) -> int:
    """
    Calculate the chunk size based on the token count and token limit.

    Args:
        token_count (int): The total number of tokens.
        token_limit (int): The maximum number of tokens allowed per chunk.

    Returns:
        int: The calculated chunk size.

    Description:
        This function calculates the chunk size based on the given token count and token limit.
        If the token count is less than or equal to the token limit, the function returns the token count as the chunk size.
        Otherwise, it calculates the number of chunks needed to accommodate all the tokens within the token limit.
        The chunk size is determined by dividing the token limit by the number of chunks.
        If there are remaining tokens after dividing the token count by the token limit,
        the chunk size is adjusted by adding the remaining tokens divided by the number of chunks.

    Example:
        >>> calculate_chunk_size(1000, 500)
        500
        >>> calculate_chunk_size(1530, 500)
        389
        >>> calculate_chunk_size(2242, 500)
        496
    """

    if token_count <= token_limit:
        return token_count

    num_chunks = (token_count + token_limit - 1) // token_limit
    chunk_size = token_count // num_chunks

    remaining_tokens = token_count % token_limit
    if remaining_tokens > 0:
        chunk_size += remaining_tokens // num_chunks

    return chunk_size
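calculate_chunk_size only computes a target size; the actual splitting is delegated to the text splitter. For token lists, a hypothetical helper that guarantees every chunk stays within the limit while keeping sizes nearly equal could look like this:

```python
def split_evenly(tokens, token_limit):
    """Split a token list into the fewest chunks of at most token_limit tokens, with near-equal sizes."""
    num_chunks = max(1, -(-len(tokens) // token_limit))   # ceiling division
    base, extra = divmod(len(tokens), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)              # spread the remainder over the first chunks
        chunks.append(tokens[start:start + size])
        start += size
    return chunks

chunks = split_evenly(list(range(1530)), 500)
print([len(c) for c in chunks])   # [383, 383, 382, 382]
```

For 1530 tokens and a limit of 500 this gives 4 chunks, the same count calculate_chunk_size uses internally.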

Call: translate(source_lang, target_lang, source_text, country)

20240716

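A brief spell of cool, then endless scorching. In the afternoon I went out to handle paperwork and nearly melted; waiting at traffic lights with no shade, I was like an ant on a hot pan (in the literal, physical sense).

Lots of lunatics lately. LXY did 18K+ tonight, a true heat warrior: either not running at all or running into the ground. YZZ probably only kept them company for a bit over 10K; he hadn't run much before, and after a summer of this drilling, the Gaobai relay in the second half of the year won't let him off. One more able-bodied recruit is one more. (I was honestly stunned; LXY must have been holding back back in early January. Anyway, make me run 18K in this weather and I'd rather you just kill me.)

After my evening meeting, a progression run in everyday clothes: 5K @ 4'19" + 2K @ 4'17". Not too hard, but to avoid aggravating the injury I didn't add more; I felt that another 2K would set my right ankle hurting. The second half of the month will be mainly aerobic, without forcing speed, while trying to keep up 200K a month, the bare minimum for marathon preparation.

Today Yitong and I discussed a problem: LLMs plus requirements analysis. We tried having GLM draw UML diagrams. In image generation, rendering text inside the image has always been a notoriously hard problem; the relatively workable approach is still to add box-type prompt regions that specify where the text goes. Requirements analysis is harder still: the model has to read the requirements document and draw a coherent set of UML diagrams directly. There is currently no benchmark for evaluating this ability, but intuitively few-shot should work, since UML is essentially structured, and at least before LLMs much requirements analysis was handled with template rules. In fact wyl's earlier grant project was on processing heterogeneous requirements; sadly, times have changed.

When locating elements with XPath in JS, document.evaluate returns an iterator here, so call .iterateNext() repeatedly to collect all matched nodes: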

function selectWithXPath(xpath) {
  var results = document.evaluate(xpath, document, null, XPathResult.ANY_TYPE, null);
  var nodes = [];
  var item;
  
  while (item = results.iterateNext()) {
    nodes.push(item);
  }
  
  return nodes;
}
 
// Usage example
var elements = selectWithXPath("//div[@class='my-class']");
elements.forEach(function(element) {
  console.log(element); // operate on each located DOM element here
});

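On the details of RoPE for long-context processing (reposted from 数据派THU): a recent paper (https://arxiv.org/abs/2405.14591) argues mathematically that a longer training length in itself calls for a larger base, independent of the training strategy.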

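The RoPE frequencies are $\theta_i=b^{-2i/d}$, where the base $b$ defaults to $10000$. One mainstream approach to long context is to pretrain on short texts with $b=10000$, then increase $b$ and fine-tune on long texts. Since RoPE already extrapolates reasonably in length, fine-tuning with a larger $b$ starts from a lower loss and converges faster than fine-tuning without the change.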
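To see what changing $b$ does, a small NumPy sketch computing $\theta_i=b^{-2i/d}$ for $d=128$ and two bases (10000 is the default above; 500000 is the rope_theta used by Llama 3, chosen here only for comparison):

```python
import numpy as np

def rope_frequencies(d=128, base=10000.0):
    """theta_i = base ** (-2i/d) for i = 0, ..., d/2 - 1 (one frequency per 2-D rotation pair)."""
    i = np.arange(d // 2)
    return base ** (-2.0 * i / d)

for base in (10_000.0, 500_000.0):
    theta = rope_frequencies(base=base)
    longest_wavelength = 2 * np.pi / theta[-1]   # tokens needed for the slowest pair to complete one turn
    print(f"base={base:>9.0f}  theta_0={theta[0]:.1f}  theta_last={theta[-1]:.3e}  "
          f"longest wavelength ≈ {longest_wavelength:,.0f} tokens")
```

The fastest pair ($\theta_0=1$) is unaffected; a larger base mainly slows down the low-frequency pairs, stretching them over longer distances.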

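Is increasing $b$ purely a consequence of the short-then-long training strategy, so that training on long texts from the start would make it unnecessary? Evidently not.

In essence, RoPE is a block-diagonal matrix whose diagonal blocks are rotation matrices with angles $n\theta_i$:

$$R_n=\operatorname{diag}\big(\mathcal R(n\theta_0),\ \mathcal R(n\theta_1),\ \dots,\ \mathcal R(n\theta_{d/2-1})\big),\qquad \mathcal R(\phi)=\begin{pmatrix}\cos\phi&-\sin\phi\\ \sin\phi&\cos\phi\end{pmatrix}$$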

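Applying $q\mapsto R_m q$ at position $m$ and $k\mapsto R_n k$ at position $n$ adds absolute position information to $q$ and $k$; since $R_m^\top R_n=R_{n-m}$, the attention score $(R_m q)^\top(R_n k)=q^\top R_{n-m}k$ depends only on the relative position.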
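The relative-position property can be checked numerically. A NumPy sketch that builds the block-diagonal $R_n$ explicitly (toy $d=8$, random $q,k$) and also checks that $\operatorname{tr}(R_s)=2\sum_i\cos(s\theta_i)$, the cosine sum evaluated by the search code below:

```python
import numpy as np

def rope_matrix(n, d=8, base=10000.0):
    """Block-diagonal R_n with 2x2 rotation blocks of angle n * theta_i."""
    theta = base ** (-2.0 * np.arange(d // 2) / d)
    R = np.zeros((d, d))
    for i, t in enumerate(theta):
        c, s = np.cos(n * t), np.sin(n * t)
        R[2 * i:2 * i + 2, 2 * i:2 * i + 2] = [[c, -s], [s, c]]
    return R

m, n, d = 3, 10, 8
R_m, R_n = rope_matrix(m, d), rope_matrix(n, d)
q, k = np.random.default_rng(0).normal(size=(2, d))

# The score between rotated q (position m) and rotated k (position n) depends only on n - m.
print(np.isclose((R_m @ q) @ (R_n @ k), q @ rope_matrix(n - m, d) @ k))   # True

# tr(R_s) = 2 * sum_i cos(s * theta_i)
s = n - m
theta = 10000.0 ** (-2.0 * np.arange(d // 2) / d)
print(np.isclose(np.trace(rope_matrix(s, d)), 2 * np.cos(s * theta).sum()))   # True
```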

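Besides injecting position information into the model, we would like RoPE to have two ideal properties for better results:

Long-range decay: on average, tokens at nearby positions receive more attention (RoPE does achieve this);
Semantic aggregation: on average, semantically similar tokens receive more attention.

Semantic aggregation means that when $k$ and $q$ are close, no matter how large their relative distance $n-m$ is, their attention $q^\top R_{n-m}k$ should on average be larger (at least larger than for two random tokens). To get a quantitative conclusion, we simplify further and assume the components of $q$ are i.i.d. with mean $\mu$ and variance $\sigma^2$.

Now consider two different kinds of $k$. One adds a zero-mean perturbation $\epsilon$ to $q$, written $\tilde k=q+\epsilon$, representing a token semantically close to $q$; the other assumes $k$ and $q$ are i.i.d., representing two random tokens. By the second ideal property, we want

$$\mathbb{E}_{q,\epsilon}\left[q^\top R_{n-m}\tilde k\right]\ge\mathbb{E}_{q,k}\left[q^\top R_{n-m}k\right]\qquad(3)$$

In fact the left-hand side of (3) can be rewritten identically (taking $\epsilon$ independent of $q$):

$$\mathbb{E}_{q,\epsilon}\left[q^\top R_{n-m}\tilde k\right]=\mathbb{E}\left[q^\top R_{n-m}q\right]=\mu^2\,\mathbf 1^\top R_{n-m}\mathbf 1+\sigma^2\operatorname{tr}(R_{n-m})=\mathbb{E}_{q,k}\left[q^\top R_{n-m}k\right]+2\sigma^2\sum_{i=0}^{d/2-1}\cos\big((n-m)\theta_i\big)$$

Therefore, if the maximum training length is $L$, then $n-m\le L-1$, and by the second ideal property we get the approximate condition

$$\sum_{i=0}^{d/2-1}\cos\left(s\,b^{-2i/d}\right)\ge 0,\qquad s=0,1,\dots,L-1$$

Here $L$ is the maximum length (a hyperparameter) and $d$ is the per-head dimension (head_dim, 128 in Llama); the only tunable quantity is $b$. The larger $b$ is, the slower the decay and the longer the continuous non-negative interval, so there always exists a smallest $b$ that satisfies the inequality, and that is the optimal choice.

The rest is a numerical search; some simulation code (JAX, searched on GPU):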


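from functools import partial
import numpy as np
import jax.numpy as jnp
import jax

# f(m, b) = sum_i cos(m * b^(-2i/d)) for every m in the array m (d is static for jit)
@partial(jax.jit, static_argnums=(2,))
def f(m, b, d=128):
    i = jnp.arange(d / 2)
    return jnp.cos(m[:, None] * b ** (-2 * i[None] / d)).sum(axis=1)

# Minimum of f over all relative distances 0..L-1, vectorized over (L, b)
@np.vectorize
def fmin(L, b):
    return f(np.arange(L), b).min()

# Smallest b with fmin(L, b) >= 0, found by successively refining a grid on (0, B]
def bmin(L):
    B = 1000 * L
    for k in range(1, 6):
        bs = np.linspace(0, 1, 10**k + 1)[1:] * B
        ys = fmin(L, bs)
        for b, y in zip(bs, ys):
            if y >= 0:
                B = b
                break
    return B

bmin(1024 * 128)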

20240717

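So today is LXY's birthday; no wonder they ran so much last night. I bet they wanted to run 25K last night (today a bit less than yesterday, but still 16K+), or maybe it's "run 18K, stay 18 forever". At noon they invited people in the group chat for cake, but I couldn't bring myself to shamelessly go freeload at the business school, and showing up empty-handed would be excruciatingly awkward. I've had exactly one experience like that before, and it went badly; I don't want to embarrass myself again.

Tonight I paced sister Jingxiang through a variable-pace workout: [(4 min fast @ 4'10" + 2 min slow @ 5'30") × 5 reps] × 2 sets, with a 7-minute jog between sets, 65 minutes in total without stopping, average pace 4'34", average heart rate 166 bpm, total distance 14.26 km. This is my longest aerobic run in nearly a month, with quality I'm quite happy with, and the injury didn't flare up. (Sister Jingxiang is really fierce; from now on I won't call her auntie, just sister. A true trail-race podium finisher: today's volume wasn't easy even for me, my chest felt unbearably tight afterwards, and she still got through it without a single break, although she was suffering too.)

Jiawei stood completely still during the 7 minutes, so after the 5 reps he was still very relaxed and even blasted an 800 m in 2'30". XR and AX both wilted: the former dropped off within three reps, and AX lasted fewer than four before throwing up. Honestly I'm a bit disappointed; I thought they would at least finish one full set. The intensity really isn't high, it's just harder to run in the heat. Can we really make the Gaobai finals? A ship isn't built in a day.

PS: Lately I've seen lots of people going to Japan. If you go abroad, go to Europe (the US isn't that interesting either, and the visa is hard to get); Japan isn't that different from China, not very interesting.

Andy has put on a bit of weight lately; his legs used to be really skinny. Maybe it's just the camera angle (jk

Final version of the ESG script; it's quite stable by now: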

# -*- coding: utf-8 -*-
# @author: caoyang
# @email: caoyang@stu.sufe.edu

import sys
sys.path.append("../")

import time
import logging
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

from base import BaseCrawler

# Initialize a logger
def initialize_logger(file_path, mode = 'w'):
	logger = logging.getLogger()
	logger.setLevel(logging.INFO)
	formatter = logging.Formatter("%(asctime)s | %(filename)s | %(levelname)s | %(message)s")
	file_handler = logging.FileHandler(file_path, mode=mode, encoding="utf8")
	file_handler.setFormatter(formatter)
	logger.addHandler(file_handler)
	console = logging.StreamHandler()
	console.setLevel(logging.INFO)
	console.setFormatter(formatter)
	logger.addHandler(console)
	return logger

# Terminate the given logger
def terminate_logger(logger):
	for handler in logger.handlers[:]:
		logger.removeHandler(handler)


class ESG(BaseCrawler):
	url_host = "https://i-esg/"
	url_report = url_host + "esg/esgReport"

	xpaths = {
		"report_table": "//table[@class=\"vxe-table--header\"]",	# <table> which lists all the PDF reports
		"report_link": "//a[@class=\"line-clamp-1\"]",	# <a> which links to the PDF reports
		"download_div": "//viewer-download-controls[@id=\"download\"]",	# <div> which contains the download button 
		"next_page_icon": "//i[@class=\"vxe-pager--btn-icon vxe-icon-arrow-right\"]",	# Icon which is clicked to turn to the next page
		"current_page_button": "//button[@class=\"vxe-pager--num-btn is--active\"]",	# Button which shows the current page number (unique)
		"other_page_button": "//button[@class=\"vxe-pager--num-btn\"]",	# Buttons which show the other page numbers (many)
		"download_button": "//cr-icon-button[@id=\"download\"]",	# Button which is clicked to download
		"close_icon": "//div[@class=\"vc-tabs__item is-top is-active is-closable\"]//span[@class=\"is-icon-close\"]",	# Icon which is clicked to close
		"alert_div": "//div[@slot=\"body\"]",	# <div> which contains the alerting text which indicates that the PDF is not successfully loaded
		"alert_button": "//cr-button[@class=\"action-button\"]",	# Button which is clicked to reload the page in alert 
		"pdf_id_span": "//span[@id=\"title\"]",	# <span> which contains the pdf id (i.e. filename)
		"pdf_iframe": "//iframe[@class=\"vc-iframe-page\"]",	# <iframe> which contains the report PDF content
		"content_div": "//div[@class=\"zlibDetailWrapper\"]",	# Some reports are not PDF, you can directly find the content on the webpage
		"scroll_to_top_icon": "//i[@class=\"vc-icon ico-bx:arrow-to-top iconfont\"]",	# Icon which is clicked to return to the top of the report table
		# Not all report entries (i.e. <tr> in <table>) on one page are visible in the browser at once,
		# so you need to scroll somewhere to make the remaining entries visible.
		# The following XPaths are used to indicate where to scroll
		# As for page k (k = 1, 2, ..., here k means the times when we visit the page, not the page number), the row ids range from 12 + (k - 1) * 50 ~ 61 + (k - 1) * 50, totally 50 on each page
		# //*[@id="root"]/section/section/div/div/div[2]/div[2]/div/div[1]/div[1]/div[2]/div[1]/div[2]/table/tbody/tr
		# "intermediate_tr_formatter": "//tr[@rowid=\"row_{}\"]".format,	# Old XPath, which is deprecated
		"intermediate_tr_formatter": "//tbody/tr[{}]".format,	# range from 1 to 50, i.e. tr[1]-tr[50]
	}
	
	def __init__(self):
		pass


	def run(self,
			start_page = 1,
			start_tr = 0,
			):
		global_interval = 3
		save_path = "./pdf_url_list.txt"
		# Start Chromedriver
		driver = self.initialize_driver(browser = "chrome",
										headless = False,
										timeout = 60,
										)
		driver.get(self.url_report)	# Visit the report URL
		BaseCrawler.check_element_by_xpath(driver, xpath=self.xpaths["report_table"])	# Wait until the report table has loaded
		with open(save_path, 'w', encoding="utf8") as f:
			f.write("title\turl\n")
	
		current_page = 0	# page number: 1, 2, ...
		skip_flag = True
		
		while True:
			current_page += 1	# Turn to the next page
			logging.info(f"Current page: {current_page}")
			if current_page >= start_page:
				if current_page > start_page:
					skip_flag = False
				report_links = driver.find_elements_by_xpath(self.xpaths["report_link"])
				n_links = len(report_links)
				logging.info(f"{n_links} entries on page {current_page}")
				for i, report_link in enumerate(report_links):
					if skip_flag and i < start_tr:
						continue
					report_link_html = report_link.get_attribute("outerHTML")
					report_title = self.tag_regex.sub(str(), report_link_html)
					logging.info(f"Click into report: {report_title}")
					
					threshold = 1	# How many entries back to scroll when `report_link` cannot be clicked
					while True:
						# `report_link` may be invisible
						try:
							report_link.click()	# Click to view the PDF pages
							logging.info("  - Successful!")
							break
						except Exception:
							threshold += 1
							logging.info(f"  - Failure: {report_link.is_displayed()} - {threshold}")
							driver.execute_script("arguments[0].scrollIntoView(true);", report_links[max(i - threshold, 0)])
					try:
						BaseCrawler.check_element_by_xpath(driver, xpath=self.xpaths["pdf_iframe"], timeout=5)
						is_pdf = True
					except:
						is_pdf = False

					close_flag = True
					if is_pdf:
						pdf_iframe = driver.find_element_by_xpath(self.xpaths["pdf_iframe"])
						pdf_iframe_html = pdf_iframe.get_attribute("outerHTML")
						pdf_iframe_soup = BeautifulSoup(pdf_iframe_html, "html.parser")
						pdf_iframe_url = pdf_iframe_soup.find("iframe").attrs["src"]
						with open(save_path, 'a', encoding="utf8") as f:
							f.write(f"{report_title}\t{pdf_iframe_url}\n")
					else:
						logging.info("This is not a PDF")
						try:
							content_div = driver.find_element_by_xpath(self.xpaths["content_div"])
							content_div_html = content_div.get_attribute("outerHTML")
							content_text = self.tag_regex.sub(str(), content_div_html)
							logging.info("  - Write to " + f"./notpdf/{report_title}.txt")
							with open(f"./notpdf/{report_title}.txt", 'w', encoding="utf8") as f:
								f.write(content_text)
							logging.info("  - ok!")
							with open(save_path, 'a', encoding="utf8") as f:
								f.write(f"{report_title}\tNone\n")
						except Exception as exception:
							exception_string = str(exception).replace('\n', ';')
							with open(save_path, 'a', encoding="utf8") as f:
								f.write(f"{report_title}\tException: {exception_string}\n")
							close_flag = False
					if close_flag:
						time.sleep(global_interval)
						logging.info("Close report ...")
						driver.find_element_by_xpath(self.xpaths["close_icon"]).click()	# Close the report
						logging.info("ok!")

			logging.info("Scrolling to the bottom ...")
			next_page_icon = driver.find_element_by_xpath(self.xpaths["next_page_icon"])
			driver.execute_script("arguments[0].scrollIntoView(true);", next_page_icon)
			logging.info("ok!")
			time.sleep(global_interval)
			logging.info("Click to the next page ...")
			next_page_icon.click()
			logging.info("ok!")
			time.sleep(global_interval)
			logging.info("Scroll to the top ...")
			while True:
				try:
					BaseCrawler.check_element_by_xpath(driver, xpath=self.xpaths["scroll_to_top_icon"])
					scroll_icon = driver.find_element_by_xpath(self.xpaths["scroll_to_top_icon"])
					break
				except Exception:
					logging.info("  - `scroll_to_top_icon` not found!")
					next_page_icon = driver.find_element_by_xpath(self.xpaths["next_page_icon"])
					driver.execute_script("arguments[0].scrollIntoView(true);", next_page_icon)
					time.sleep(global_interval)	# `scroll_icon` is not defined yet here, so just wait and retry
			scroll_icon.click()
			logging.info("  - ok!")
			time.sleep(global_interval)	
			# TODO: How to define termination conditions?
		driver.quit()

time_string = time.strftime("%Y%m%d%H%M%S")
logger = initialize_logger(f"./logging/esg_{time_string}.log")
esg = ESG()
esg.run(start_page = 93,
		start_tr = 48,
		)
terminate_logger(logger)

The script above only collects the PDF URLs; the PDFs are then downloaded as follows:

# -*- coding: utf-8 -*-
# @author: caoyang
# @email: caoyang@stu.sufe.edu

import requests
import time
import random

def headers_to_dict(headers: str) -> dict:
	lines = headers.splitlines()
	headers_dict = {}
	for line in lines:
		key, value = line.strip().split(':', 1)
		headers_dict[key.strip()] = value.strip()
	return headers_dict


def download_pdf(url, filename = None):
	r = requests.get(url)
	if filename is None:
		filename = url.split('/')[-1]
	with open(f"./pdf/{filename}", "wb") as f:
		f.write(r.content)


with open("./pdf_url_list_2.txt", 'r', encoding="utf8") as f:
	lines = f.read().splitlines()

for i, line in enumerate(lines[1:]):
	if i < 13:	# Skip entries that were already handled (resume an interrupted run)
		continue
	print(i, line)
	title, url = line.split('\t')
	if url == "None" or url.startswith("Exception"):
		continue
	download_pdf(url, filename = None)
	time.sleep(random.randint(10, 30))
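
The download loop above skips rows whose URL column is `None` (HTML reports saved as text) or starts with `Exception`, and resumes from a hard-coded index. A minimal sketch of the same filtering as a reusable function, assuming the tab-separated `title\turl` format written by the crawler; `parse_url_list` and its `start` parameter are hypothetical names introduced only for illustration, and the example rows are made up:

```python
def parse_url_list(lines, start=0):
    """Return (index, title, url, filename) for every downloadable row.

    `lines` is the crawler's TSV file split into lines (header first);
    `start` mimics the `if i < 13: continue` resume trick.
    """
    rows = []
    for i, line in enumerate(lines[1:]):  # skip the "title\turl" header
        if i < start:
            continue
        title, url = line.split('\t', 1)
        # Non-PDF reports were written as "None", failures as "Exception: ..."
        if url == "None" or url.startswith("Exception"):
            continue
        filename = url.split('/')[-1]  # same default as download_pdf()
        rows.append((i, title, url, filename))
    return rows


lines = [
    "title\turl",
    "Report A\thttps://example.com/files/a.pdf",
    "Report B\tNone",
    "Report C\tException: timeout",
    "Report D\thttps://example.com/files/d.pdf",
]
for row in parse_url_list(lines):
    print(row)
```

Keeping the parsing separate from the `requests` call makes it easy to check which files would be fetched before starting a long, rate-limited download run.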

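A few details:

- Most entries open a PDF viewer when clicked; a few open an HTML document whose content has to be copied from the page; a very few trigger the download of an archive containing the document, so set the browser to download automatically to avoid tedious manual steps.
- About the scrollIntoView method: in my experience it can only scroll down, not up. To scroll the page up you need a negative pixel offset such as window.scrollTo(0, -100), but for the scrollbar of an embedded iframe page like the ESG one, that approach does not work, so there is no way (at least none that I found) to drive the iframe's scrollbar upward directly. Fortunately the page has a back-to-top button, which saves the trouble.
- One last annoyance: the table currently has nearly 500 pages, and turning them one at a time is far too slow, especially when starting from the middle. I found no simple way to jump on the page itself, and editing the page source directly (i.e. manually activating a page button) didn't work either. I don't know a good workaround yet; turning pages manually only jumps five at a time, which is also slow.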

20240718

My parents have been giving me some worries lately.

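First facepalm of the day: the night before last, on my way back, I bought fruit at the Apple Orchard fruit shop. Yesterday morning, the loudspeaker at the shop's door was announcing a store upgrade with 12% off everything; I figured that's not much of a difference. By the time I got back in the evening it had become 32% off (???), and this morning the sign said 50% off everything. These days I buy a week's worth of fruit at a time and restock when one piece is left; maybe I should "buy the dip" now to average down my cost (dizzy).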

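Easy run of 16 laps tonight. From now on it's easy running to maintain fitness, with at most one intensity session a week, mainly because I'm worried about the injury. It occasionally hurts at rest; running is fine, and whatever I do, fast or slow, there's no obvious pain, but it has definitely left a lingering problem, which is frustrating. Jiawei went to Tongji to solo the elite group's 800 m × 8 (150 s recovery): the first 2 reps in 2'32", the last 2 in 2'38". It actually wasn't too hot tonight, but it's still an impressive performance. His build looks even leaner; he really isn't suited to focusing on the marathon. Maybe in the second half of the year he'll get the 5000 m under 17 minutes, or even to the national Level 2 standard. At distances up to 10,000 m, the gap between us is still too big.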

FlashAttention explained

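The key innovation of FlashAttention is an Online-Softmax-style computation that makes tiling and recomputation (tile & recompute) of self-attention possible.
- Tiling means splitting a large matrix into blocks that together cover every cell of the target matrix that needs to be computed.
  - There is a whole line of research on matrix multiplication algorithms behind this. In practice, however, the hidden dimension is not especially large (parameter growth mostly comes from adding more layers), so the asymptotically "better" algorithms are not necessarily effective. Standard matrix multiplication costs $O(n^3)$; some block-decomposition tricks trade extra additions for fewer multiplications and can get below $O(n^{2.5})$, but the constant factors hidden in the big-O are enormous.
- Operation fusing reduces the number of memory reads and writes and therefore the runtime (in practice most of the time goes to memory reads and writes rather than to computation).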
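
To make the tiling idea concrete, here is a minimal pure-Python sketch of blocked matrix multiplication: the output is covered by `tile x tile` blocks, and each block accumulates partial products over tiles of the shared dimension. This loop structure is what lets a GPU kernel keep one tile at a time in fast on-chip memory. It only illustrates tiling under simple assumptions (lists of lists, arbitrary tile size, hypothetical function names); it is not FlashAttention's kernel.

```python
def matmul_tiled(A, B, tile=2):
    """C = A @ B computed block by block (A: n x k, B: k x m)."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # row block of C
        for j0 in range(0, m, tile):      # column block of C
            for p0 in range(0, k, tile):  # tile of the shared dimension
                # Accumulate the partial product of one A tile and one B tile
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        s = 0.0
                        for p in range(p0, min(p0 + tile, k)):
                            s += A[i][p] * B[p][j]
                        C[i][j] += s
    return C


def matmul_naive(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, 0, 2], [0, 1, 0], [3, 0, 1]]
print(matmul_tiled(A, B) == matmul_naive(A, B))  # True
```

The result is identical to the naive product; only the order in which the work is done changes, which is exactly why tiling can be applied without changing the output.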

The following is reposted from: https://zhuanlan.zhihu/p/626079753

When the input sequence is long, Transformer computation is slow and memory-hungry, because the time and memory complexity of self-attention grow quadratically with the sequence length.

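In standard attention, the intermediate results $S = QK^\top$ and $P = \mathrm{softmax}(S)$ are usually written to and read back from high-bandwidth memory (HBM), and each takes $O(N^2)$ memory. The analysis of HBM accesses gives:

FlashAttention: $O(N^2 d^2 M^{-1})$ HBM accesses

Standard attention: $O(Nd + N^2)$ HBM accesses

Here $N$ is the sequence length, $d$ the head dimension and $M$ the size of on-chip SRAM. Typically $N \gg d$ (e.g. $N = 1024$ and $d = 64$ in GPT-2), and $d^2$ is many times smaller than $M$, so $N^2 d^2 M^{-1}$ is far below $N^2$ and FlashAttention is much faster. The figure below compares forward + backward GFLOPs, HBM accesses and runtime of the two on GPT-2 (A100 GPU):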

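The main memory types on a GPU are HBM and SRAM: HBM is large but slow to access, while SRAM is small but fast. For example, an A100 GPU has 40-80 GB of HBM with 1.5-2.0 TB/s of bandwidth, while each of its 108 streaming multiprocessors has 192 KB of on-chip SRAM, with an estimated bandwidth of about 19 TB/s. On-chip SRAM is thus an order of magnitude faster than HBM but many orders of magnitude smaller.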
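
Plugging the numbers above into the two bounds gives a feel for the gap. This is a back-of-the-envelope sketch that ignores the constants hidden in the big-O, and it assumes $M$ is the A100's 192 KB of SRAM per SM counted in fp16 elements (2 bytes each):

```python
def hbm_accesses(N, d, M):
    """Asymptotic HBM access counts for one attention head (constants ignored)."""
    standard = N * d + N * N      # O(Nd + N^2)
    flash = N * N * d * d / M     # O(N^2 d^2 / M)
    return standard, flash


N, d = 1024, 64                   # GPT-2 sequence length and head dimension
M = 192 * 1024 // 2               # 192 KB of SRAM, counted in fp16 elements
standard, flash = hbm_accesses(N, d, M)
print(standard, round(flash), round(standard / flash, 1))  # 1114112 43691 25.5
```

The ratio is driven by $d^2/M$ (here $4096/98304 = 1/24$): as long as $d^2$ is much smaller than $M$, the $N^2$ term is effectively divided down.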

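In short, FlashAttention aims not to save FLOPs but to reduce HBM accesses. Importantly, FlashAttention computes exact attention: its outputs in training and inference are the same as standard attention (up to floating-point rounding), so it is transparent to users, whereas approximate acceleration methods (e.g. sparse or low-rank attention) cannot guarantee this.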

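import torch

x, y, z = (torch.randn(1024) for _ in range(3))  # example inputs

# Separate kernel launches: each op reads its inputs from and writes its result to HBM
a = x + y  # kernel 1
b = a * z  # kernel 2
c = torch.relu(b)  # kernel 3

# Optimized version: the three ops fused into one kernel
# Defined with TorchScript, whose JIT fuser may fuse element-wise ops like these (mainly on GPU)
@torch.jit.script
def fused_kernel(x, y, z):
    a = x + y
    b = a * z
    c = torch.relu(b)
    return c

print(torch.allclose(fused_kernel(x, y, z), c))  # same result as the unfused version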
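
A rough way to see why fusion helps is to count how many full tensors each version moves through global memory. The sketch assumes an idealized model (not a measurement): every separate kernel reads its inputs from HBM and writes its output back, while the fused kernel keeps `a` and `b` in registers. The tensor size and the helper `memory_traffic` are assumptions for illustration.

```python
def memory_traffic(kernels):
    """Number of full-tensor reads + writes, given (inputs, outputs) per kernel."""
    return sum(len(inputs) + len(outputs) for inputs, outputs in kernels)


# Same computation as above: a = x + y; b = a * z; c = relu(b)
unfused = [(("x", "y"), ("a",)), (("a", "z"), ("b",)), (("b",), ("c",))]
fused = [(("x", "y", "z"), ("c",))]    # a and b never leave on-chip memory

n_elems = 4096 * 4096                  # assumed tensor size
bytes_per_elem = 2                     # fp16
for name, kernels in (("unfused", unfused), ("fused", fused)):
    t = memory_traffic(kernels)
    print(f"{name}: {t} tensor transfers, {t * n_elems * bytes_per_elem / 2**20:.0f} MiB")
# unfused: 8 tensor transfers, 256 MiB
# fused: 4 tensor transfers, 128 MiB
```

For memory-bound element-wise ops the runtime roughly tracks this traffic, so halving the transfers is where the speedup comes from.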

Below is the so-called block-wise (tiled) softmax procedure:

import torch

x = torch.tensor([1, 2, 3, 4], dtype=torch.float)
# 1. Standard softmax
torch.softmax(x, dim=-1) # tensor([0.0321, 0.0871, 0.2369, 0.6439])
# 2. Safe softmax: subtract the max before exp to avoid overflow
m = torch.max(x)
f = torch.exp(x - m)
l = torch.sum(f)
f / l # tensor([0.0321, 0.0871, 0.2369, 0.6439])
# 3. Block-wise: process x as two blocks, each with its own local max and sum
x_1 = x[:2]
x_2 = x[2:]
m_1 = torch.max(x_1)
m_2 = torch.max(x_2)
f_1 = torch.exp(x_1 - m_1)
f_2 = torch.exp(x_2 - m_2)
l_1 = torch.sum(f_1)
l_2 = torch.sum(f_2)
# Global max from the block maxima, then rescale each block by exp(m_i - m)
m = torch.max(m_1, m_2)
f = torch.cat((torch.exp(m_1 - m) * f_1, torch.exp(m_2 - m) * f_2))
l = torch.exp(m_1 - m) * l_1 + torch.exp(m_2 - m) * l_2
f/l # tensor([0.0321, 0.0871, 0.2369, 0.6439])
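
Extending the block-wise softmax to attention gives the core recurrence of FlashAttention's forward pass. The sketch below makes simplifying assumptions (one query vector, plain Python lists, a hypothetical `block_size` parameter): it streams over key/value blocks while keeping only a running max `m`, a running denominator `l` and an unnormalized output `o`. Whenever the max grows, the old `l` and `o` are rescaled by `exp(m_old - m_new)`, just like `torch.exp(m_1 - m)` in the snippet above. It illustrates the idea only; it is not the actual CUDA kernel.

```python
import math


def attention_naive(q, K, V):
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q)) for k in K]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    total = sum(w)
    return [sum(wj * v[c] for wj, v in zip(w, V)) / total for c in range(len(V[0]))]


def attention_online(q, K, V, block_size=2):
    m = float("-inf")            # running max of the scores seen so far
    l = 0.0                      # running softmax denominator
    o = [0.0] * len(V[0])        # running unnormalized output
    for start in range(0, len(K), block_size):
        K_blk, V_blk = K[start:start + block_size], V[start:start + block_size]
        s = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q)) for k in K_blk]
        m_new = max(m, max(s))
        scale = math.exp(m - m_new)          # rescales old statistics (0.0 on the first block)
        p = [math.exp(sj - m_new) for sj in s]
        l = l * scale + sum(p)
        o = [oc * scale + sum(pj * v[c] for pj, v in zip(p, V_blk)) for c, oc in enumerate(o)]
        m = m_new
    return [oc / l for oc in o]              # normalize once at the end


q = [0.5, -1.0, 2.0]
K = [[1.0, 0.0, 0.5], [0.2, 0.3, -1.0], [2.0, 1.0, 0.0], [-0.5, 0.5, 1.5], [0.0, 2.0, 1.0]]
V = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0], [2.0, 2.0], [-1.0, 0.5]]
naive = attention_naive(q, K, V)
online = attention_online(q, K, V, block_size=2)
print(all(abs(a - b) < 1e-9 for a, b in zip(naive, online)))  # True
```

Only one block of `K`/`V` plus three small running statistics are needed at any time, which is what allows the real kernel to keep its working set in SRAM.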

20240719

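Rest day from running. Today I finally submitted that baffling causal-LLM grant proposal. syr from Tongji is as unaccommodating as ever, and I think wyl is pretty fed up too. I indulged in a big dinner and ate until I was stuffed; in this heat I didn't really want to eat that much, but eating less felt like not getting my money's worth. Back home, strength training: 30 lunges × 8 sets (+20 kg), 4 sets forward and 4 backward, plus pull-up bar and core work, then a 2000 m easy cool-down jog @5'00".
It's been scorching for two or three days, and even the evenings are uncomfortably hot. XR seems to have been slacking lately; I haven't seen him the past two days. LXY was running quite fast, by eye under 4'50", with at least 12K of volume. I've always felt my easy-running form is awkward: an unstable core and a lot of vertical and lateral bounce. Andy, AX and LXY, on the other hand, run easy very naturally; Andy in particular has a tiny arm swing and an almost motionless upper body, which looks great.
PS: The plan to get up early for Huanggang Park seems to have fallen through, which is fine by me.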

From DDP to FSDP

FSDP: Fully Sharded Data Parallel, by Facebook;
- fsdp unit & sharding
  - fsdp unit：model parallel
  - sharding：os + g + p
- https://docs.google/presentation/d/1ntPSYgWphl8sErwjUl0AztOY1i4SZmQuvmGhkeRElA/edit#slide=id.g2318fd43235_0_292
Use this tutorial to review the basic distributed-training concepts, terms and methods covered throughout the series;
Shard parameters, gradients, and optimizer states across all data-parallel processes
- GPU memory:
  - P: Parameters
  - G: Gradients
  - OS: Optimizer states
- Ignore features/activations/embeddings for now
- All three are related to the optimizer
  - Constructing the optimizer wraps the parameters: optimizer = optim.Adam(model.parameters(), lr=0.001)
  - loss.backward() => parameters.grad
  - optimizer.step() => optimizer states
    - momentum: exponential moving average of the gradient
    - variance: exponential moving average of the squared gradient

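import torch

# A tiny model and one training step; Adam creates its state lazily on the first step()
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
model(torch.randn(8, 4)).sum().backward()
optimizer.step()

for group in optimizer.param_groups:
    for p in group['params']:
        state = optimizer.state[p]

        # Exponential moving average of gradient values
        m = state['exp_avg']  # momentum (first moment)

        # Exponential moving average of squared gradient values
        v = state['exp_avg_sq']  # variance (second moment)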
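
Both states are updated on every `optimizer.step()`. A minimal sketch of one Adam update for a single scalar parameter, following the standard Adam rule with bias correction (the defaults `betas=(0.9, 0.999)` and `eps=1e-8` match `torch.optim.Adam`; the helper `adam_step` and the dict-based `state` are hypothetical stand-ins for `optimizer.state[p]`). It shows why Adam keeps two extra fp32 values for every parameter:

```python
import math


def adam_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One Adam update for a scalar; `state` plays the role of optimizer.state[p]."""
    beta1, beta2 = betas
    state["step"] = state.get("step", 0) + 1
    # exp_avg: EMA of gradients (momentum); exp_avg_sq: EMA of squared gradients (variance)
    state["exp_avg"] = beta1 * state.get("exp_avg", 0.0) + (1 - beta1) * grad
    state["exp_avg_sq"] = beta2 * state.get("exp_avg_sq", 0.0) + (1 - beta2) * grad * grad
    # Bias correction for the zero initialization of both averages
    m_hat = state["exp_avg"] / (1 - beta1 ** state["step"])
    v_hat = state["exp_avg_sq"] / (1 - beta2 ** state["step"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


state = {}
p = adam_step(1.0, 0.5, state)
print(state["exp_avg"], state["exp_avg_sq"], p)
```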

GPU memory usage under mixed precision, for a model with $x$ parameters (fp16), in bytes:
- Parameters: $2x$
- Gradients: $2x$
- Optimizer states (Adam, all in fp32): $12x = 4x + 4x + 4x$
  - Parameters copy: $4x$
  - Momentum: $4x$
  - Variance: $4x$
- See https://arxiv/abs/1910.02054 (ZeRO: Memory Optimizations Toward Training Trillion Parameter Models)
  - ZeRO: Zero Redundancy Optimizer
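
The $2x + 2x + 12x = 16x$ bytes above are the starting point of ZeRO's analysis. A small calculator, assuming the ZeRO paper's accounting (K = 12 bytes of optimizer state per parameter, activations ignored, N data-parallel GPUs), shows how sharding the optimizer states (stage 1), then the gradients (stage 2), then the parameters (stage 3, the full sharding that FSDP does) shrinks per-GPU memory. The function name is hypothetical:

```python
def zero_memory_bytes(num_params, num_gpus, k=12):
    """Per-GPU model-state memory in bytes for DDP and the three ZeRO stages."""
    psi = num_params
    return {
        "ddp (no sharding)": (2 + 2 + k) * psi,
        "stage 1 (os)": 2 * psi + 2 * psi + k * psi / num_gpus,
        "stage 2 (os + g)": 2 * psi + (2 + k) * psi / num_gpus,
        "stage 3 (os + g + p)": (2 + 2 + k) * psi / num_gpus,
    }


# The example used in the ZeRO paper: 7.5B parameters on 64 GPUs
for name, nbytes in zero_memory_bytes(7.5e9, 64).items():
    print(f"{name:22s} {nbytes / 1e9:6.1f} GB")
```

With these inputs the four rows come out as 120.0, 31.4, 16.6 and 1.9 GB per GPU, which is why fully sharded training can fit models that plain data parallelism cannot.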

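import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Launch with torchrun, which sets LOCAL_RANK and the rendezvous environment variables
dist.init_process_group("nccl")
device_id = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(device_id)

my_module = torch.nn.Linear(1024, 1024).cuda()  # any nn.Module works here
sharded_module = FSDP(my_module)  # parameters are sharded across all ranks
# Build the optimizer after wrapping, so it sees the sharded parameters
optim = torch.optim.Adam(sharded_module.parameters(), lr=0.0001)
input = torch.randn(8, 1024, device="cuda")
sharded_module(input).sum().backward()
optim.step()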

Tensor parallelism: computing a blocked matrix product in parallel

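import torch

A = torch.arange(1, 7).reshape(2, 3).to(torch.float)
B = torch.arange(1, 7).reshape(3, 2).to(torch.float)
A, B, A@B

# Split B by columns; each device would hold one column block
B1 = B[:, 0].view(-1, 1)
B2 = B[:, 1].view(-1, 1)

# Each partial product gives the matching column of A @ B
A @ B1, A @ B2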
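
Splitting B by columns, as above, is column parallelism: each device produces a slice of the output columns and the slices are concatenated. The other common split is row parallelism: A is split by columns and B by rows, each device computes a full-size partial product, and the partials are summed (an all-reduce in a real multi-GPU system). A minimal pure-Python sketch of both, using the same matrix values; the `matmul` helper is a hypothetical stand-in for each device's local computation:

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 2, 3], [4, 5, 6]]          # same values as torch.arange(1, 7).reshape(2, 3)
B = [[1, 2], [3, 4], [5, 6]]        # same values as torch.arange(1, 7).reshape(3, 2)
full = matmul(A, B)

# Column parallel: device k holds column k of B and outputs column k of the result
cols = [matmul(A, [[row[k]] for row in B]) for k in range(2)]
col_parallel = [[cols[0][i][0], cols[1][i][0]] for i in range(2)]   # concatenate

# Row parallel: device 0 gets A[:, :2] and B[:2, :], device 1 gets A[:, 2:] and B[2:, :]
part0 = matmul([row[:2] for row in A], B[:2])
part1 = matmul([row[2:] for row in A], B[2:])
row_parallel = [[x + y for x, y in zip(r0, r1)] for r0, r1 in zip(part0, part1)]  # "all-reduce"

print(full, col_parallel == full, row_parallel == full)  # [[22, 28], [49, 64]] True True
```

In Megatron-style transformer layers the two are commonly paired (column-parallel first matmul, row-parallel second) so that only one all-reduce is needed per block.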

Pipeline parallelism

Model Parallelism using multiple GPUs

The figure represents a model with 4 layers placed on 4 different GPUs (vertical axis).
The horizontal axis represents training this model through time demonstrating that only 1 GPU is utilized at a time
Summary: at any moment, only one GPU is doing computation.

$F_{i,j}$:

$i$ denotes the GPU card, i.e. the model part
$j$ denotes the data split (micro-batch)
To alleviate this problem, pipeline parallelism splits the input minibatch into multiple microbatches and pipelines the execution of these microbatches across multiple GPUs.
The figure represents a model with 4 layers placed on 4 different GPUs (vertical axis). The horizontal axis represents training this model through time demonstrating that the GPUs are utilized much more efficiently. However, there still exists a bubble (as demonstrated in the figure) where certain GPUs are not utilized.
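
Using the $F_{i,j}$ notation above, the forward pass of a GPipe-style pipeline can be simulated in a few lines: stage $i$ can run micro-batch $j$ at time step $i + j$. The sketch below assumes equal-cost stages and looks at forward passes only (the function names are hypothetical). It prints the timeline and measures the idle fraction, which works out to $(p-1)/(m+p-1)$ for $p$ stages and $m$ micro-batches; with $m = 1$ it reproduces the "only one GPU busy at a time" case.

```python
def gpipe_forward_schedule(num_stages, num_microbatches):
    """grid[i][t] = micro-batch j that stage/GPU i runs at time t (None = bubble)."""
    num_steps = num_stages + num_microbatches - 1
    grid = [[None] * num_steps for _ in range(num_stages)]
    for i in range(num_stages):
        for j in range(num_microbatches):
            grid[i][i + j] = j        # F_{i,j} starts once stage i-1 has finished it
    return grid


def bubble_fraction(grid):
    cells = [c for row in grid for c in row]
    return sum(c is None for c in cells) / len(cells)


for m in (1, 4, 8):
    grid = gpipe_forward_schedule(num_stages=4, num_microbatches=m)
    print(f"micro-batches={m}: bubble={bubble_fraction(grid):.3f}")

for i, row in enumerate(gpipe_forward_schedule(4, 4)):
    print(f"GPU {i}: " + " ".join(f"F{i},{j}" if j is not None else " -  " for j in row))
```

More micro-batches shrink the bubble (0.750, 0.429, 0.273 for 1, 4, 8 micro-batches on 4 stages), which is what the `chunks` argument below controls.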

Pipelined Execution

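import os
import torch
import torch.nn as nn
import torch.distributed.rpc
# Note: torch.distributed.pipeline.sync is deprecated in recent PyTorch releases
from torch.distributed.pipeline.sync import Pipe

# Need to initialize RPC framework first.
os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = '29500'
torch.distributed.rpc.init_rpc('worker', rank=0, world_size=1)

# Build pipe: two stages on two GPUs
fc1 = nn.Linear(16, 8).cuda(0)
fc2 = nn.Linear(8, 4).cuda(1)
model = nn.Sequential(fc1, fc2)
# chunks: number of micro-batches (default: 1)
model = Pipe(model, chunks=8)

input = torch.rand(16, 16).cuda(0)
output_rref = model(input)
# Pipe returns an RRef; fetch the output tensor (on cuda:1) locally
output = output_rref.local_value()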

fsdp

PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel
- https://arxiv/pdf/2304.11277
Workflow
- FSDP Unit [Vertically “Splitting”]
  - layer/module/stage
- Sharding [Horizontally “Splitting”]
  - os + g + p
- All-Gather
- Reduce-Scatter
split our FSDP-Unit parameters across GPUs
all-gather per FSDP-unit => Forward/Backward
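
The two collectives in this flow can be mimicked with plain lists. Under the simplifying assumptions of this sketch (4 simulated ranks, a flat parameter vector divisible by the world size, hypothetical helper names), each rank stores one shard; before forward/backward an all-gather rebuilds the full parameters of the FSDP unit, and after backward a reduce-scatter sums the gradients across ranks while leaving each rank only the gradient shard it owns (real implementations usually also divide by the world size to average):

```python
def shard(flat, world_size):
    n = len(flat) // world_size
    return [flat[r * n:(r + 1) * n] for r in range(world_size)]


def all_gather(shards):
    """Every rank ends up with the concatenation of all shards."""
    full = [x for s in shards for x in s]
    return [list(full) for _ in shards]


def reduce_scatter(full_grads_per_rank):
    """Sum full-size gradients across ranks, then give rank r only shard r."""
    world_size = len(full_grads_per_rank)
    summed = [sum(vals) for vals in zip(*full_grads_per_rank)]
    return shard(summed, world_size)


world_size = 4
params = [float(i) for i in range(8)]           # one FSDP unit, flattened
param_shards = shard(params, world_size)        # what each rank keeps between uses
gathered = all_gather(param_shards)             # materialized for forward/backward
assert all(g == params for g in gathered)

# Pretend rank r computed gradient (r + 1) for every parameter on its own batch
local_grads = [[float(r + 1)] * len(params) for r in range(world_size)]
grad_shards = reduce_scatter(local_grads)
print(param_shards[1], grad_shards[1])          # [2.0, 3.0] [10.0, 10.0]
```

Between uses each rank holds only 1/N of the unit's parameters and gradients, which is where the memory saving over DDP's full replica comes from.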

The torch FSDP API usage is identical to the FSDP snippet shown earlier in this entry.

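DDP keeps a full copy of the model on every node, plus that node's batch, so the batch size can't get very large. FSDP keeps only a shard of the model (parameters, gradients and optimizer states) on each node; in exchange the nodes have to communicate (all-gather / reduce-scatter), trading time for space, so each node has more room for larger batches.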

20240720

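After two days of recovery I wanted to do a hard session and see whether I could run 10,000 m under 40 minutes in the dog days of summer. I planned to get up early to catch the cool, but before 6:30 the sun was already blazing and I was sweating before I even got to campus. Between the air conditioning and the track, I chose the air conditioning.
At 8:30 pm a breeze came through and it felt fairly cool, so I eagerly warmed up. The first 2000 m felt great, but at some point in lap 6 my whole body suddenly felt on fire, visibly flushed red, and I quickly started to fade. I held on to 3000 m @11'44" and stopped, very disappointed.
Jiawei came to the track at nine (he had already finished his own training). He has been training hard on the quiet these past two days: yesterday at 4 pm, under the blazing sun, he ran 10 km @4'15" with a firefighter he used to train with at 129, then added 4 × 400 m intervals; today he paced Sister Jingxiang's best friend through 1600 m intervals and did an easy loop around campus with LXY (10 km @???). At nine he ran over to the track and brought me a few slices of watermelon, a powerful supplement that got me through to 10 km total, which is passable. I forgot my towel and felt completely wrung out. It's really tough; I honestly don't want to grind through this kind of weather anymore.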

LoRA basics:

LoRA: Low-Rank Adaptation of Large Language Models
- A random projection to a smaller subspace
- parameter efficient
  - PEFT
- https://arxiv/abs/2106.09685
Implementation details
- It is an adaptor of a pretrained model
  - the adaptor is small
  - the pretrained model is large
    - large language models
    - large vision models
- freezes pre-trained model weights
- injects trainable rank decomposition matrices
  - into each layer of transformer Architecture
Basic idea
- In a transformer, the most important component is the self-attention module
  - Wq, Wk, Wv, Wo denote the learnable query/key/value/output projection matrices
  - Denote these as the model parameters $\Phi$
- In full fine-tuning (nothing frozen), the model is initialized with the pretrained weights $\Phi$ and, after fine-tuning, ends up at $\Phi+\Delta\Phi$ (via backpropagation and gradient descent)
  - Every downstream task has to learn its own $\Delta\Phi$, with $|\Delta\Phi|=|\Phi|$
- LoRA, as a parameter-efficient method, encodes the task-specific increment $\Delta\Phi=\Delta\Phi(\Theta)$ with a much smaller set of parameters $\Theta$
  - $|\Theta| \ll |\Delta\Phi|=|\Phi|$
  - LoRA uses a low-rank representation to encode $\Delta\Phi$

Full fine-tuning vs. LoRA

$$\begin{aligned} &h = f_W(x)\\ &\Downarrow\\ &h = f_{W+\Delta W}(x) \end{aligned}$$

$$W \in \mathbb{R}^{A\times B}$$

$$\Delta W = W_A W_B,\quad W_A \in \mathbb{R}^{A\times r},\quad W_B \in \mathbb{R}^{r\times B}$$

The number of parameters to learn drops from $A\times B$ to $r\times(A+B)$
- A=100, B=500, r=5
- 5*(100+500) / (100*500) == 3000/50000 == 6%
- This is what parameter efficiency means
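
The same count for a more realistic case, assuming a square 4096 × 4096 attention projection (a typical hidden size for 7B-class models) and the rank r = 8 used in the code below; `lora_param_ratio` is a hypothetical helper:

```python
def lora_param_ratio(A, B, r):
    """Trainable LoRA parameters r*(A+B) relative to the full A*B update."""
    return r * (A + B) / (A * B)


print(lora_param_ratio(100, 500, 5))            # the example above: 0.06
print(f"{lora_param_ratio(4096, 4096, 8):.4%}")
```

For the square case the ratio is $2r/A$, i.e. $16/4096 = 1/256$, well under 1% of the full update per adapted matrix.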

Code sketch:

$r$ is a hyperparameter: a trade-off between model complexity, adaptation capacity, and the risk of underfitting or overfitting
- A smaller $r$ leads to a simpler low-rank matrix, which results in fewer parameters to learn during adaptation.
  - This can lead to faster training and potentially reduced computational requirements.
- However, with a smaller $r$, the capacity of the low-rank matrix to capture task-specific information decreases.
  - This may result in lower adaptation quality, and the model might not perform as well on the new task compared to a higher $r$.

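import math
import torch
import torch.nn as nn

input_dim = 768  # e.g., the hidden size of the pre-trained model
output_dim = 768  # e.g., the output size of the layer
rank = 8  # The rank 'r' for the low-rank adaptation
alpha = 16 / rank  # scaling of the LoRA update (alpha / r in the paper; 16 is an assumed alpha)

W = torch.randn(input_dim, output_dim)  # placeholder for the frozen pretrained weight (input_dim x output_dim)

W_A = nn.Parameter(torch.empty(input_dim, rank)) # LoRA weight A
W_B = nn.Parameter(torch.empty(rank, output_dim)) # LoRA weight B

# Initialization of LoRA weights: random A, zero B, so W_A @ W_B starts at zero
nn.init.kaiming_uniform_(W_A, a=math.sqrt(5))
nn.init.zeros_(W_B)

def regular_forward_matmul(x, W):
    h = x @ W
    return h

def lora_forward_matmul(x, W, W_A, W_B):
    h = x @ W  # regular matrix multiplication
    h = h + x @ (W_A @ W_B) * alpha  # add the scaled low-rank update
    return h

x = torch.randn(2, input_dim)
print(torch.allclose(lora_forward_matmul(x, W, W_A, W_B), regular_forward_matmul(x, W)))  # True at initialization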

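After training, the low-rank update can be folded into the frozen weight, $W' = W + \alpha W_A W_B$, so the adapted layer runs as a single matmul with no extra inference latency. A pure-Python check of that identity with small random matrices; the sizes, the `alpha` value and the helpers `matmul`/`add`/`rand` are assumptions for illustration:

```python
import random


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def add(X, Y, scale=1.0):
    """X + scale * Y, element-wise."""
    return [[a + scale * b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def rand(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
in_dim, out_dim, rank, alpha = 6, 4, 2, 2.0
x, W = rand(3, in_dim), rand(in_dim, out_dim)        # batch of 3, frozen weight
W_A, W_B = rand(in_dim, rank), rand(rank, out_dim)   # "trained" LoRA factors

# Unmerged: h = xW + alpha * x W_A W_B (two paths)
h_lora = add(matmul(x, W), matmul(matmul(x, W_A), W_B), alpha)
# Merged: h = x (W + alpha * W_A W_B) (one matmul at inference)
W_merged = add(W, matmul(W_A, W_B), alpha)
h_merged = matmul(x, W_merged)

max_err = max(abs(a - b) for ra, rb in zip(h_lora, h_merged) for a, b in zip(ra, rb))
print(max_err < 1e-9)  # True
```

Keeping the factors unmerged during training is what keeps the trainable parameter count small; merging is only a deployment-time step, and the factors can be subtracted again to swap in another task's adapter.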
Below is an example of how the model structure differs before and after injecting LoRA:

20240721

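@REM Concatenate all .txt files in the current directory into a.txt
type *.txt > a.txt

@REM Clean up the "shortcut virus": unhide folders (names without an extension), then re-hide system files
attrib -s -h *. /S /D
attrib +s +h System~1
attrib +s +h Recycled
attrib +s +h +a ntldr

@REM "Encrypt" a folder (it is only renamed and hidden behind a password prompt, not actually encrypted)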
cls
@ECHO OFF
title Folder Private
if EXIST "HTG Lock" goto UNLOCK
if NOT EXIST Private goto MDLock
:CONFIRM
echo Are you sure you want to lock and hide the Private folder? (Y/N)
set/p "cho=>"
if /i "%cho%"=="Y" goto LOCK
if /i "%cho%"=="N" goto END
echo Invalid choice.
goto CONFIRM
:LOCK
ren Private "HTG Lock"
attrib +h +s "HTG Lock"
echo Folder locked
goto End
:UNLOCK
echo Enter the password to unlock the folder
set/p "pass=>"
if NOT "%pass%"=="YOUR_PASSWORD_HERE" goto FAIL
attrib -h -s "HTG Lock"
ren "HTG Lock" Private
echo Folder Unlocked successfully
goto End
:FAIL
echo Invalid password
goto end
:MDLock
md Private
echo Private created successfully
goto End
:End

@REM System junk cleanup
@echo off
echo Do not close this window
echo Cleaning up system junk files...
del /f /s /q %systemdrive%\*.tmp
del /f /s /q %windir%\prefetch\*.*
rd /s /q %windir%\temp & md %windir%\temp
del /f /s /q "%userprofile%\Local Settings\Temp\*.*"
del /f /s /q %systemdrive%\*._mp
del /f /s /q %windir%\*.bak
del /f /s /q %systemdrive%\*.log
del /f /s /q %systemdrive%\*.gid
del /f /s /q %systemdrive%\*.chk
del /f /s /q %systemdrive%\*.old
del /f /s /q %systemdrive%\recycled\*.*
del /f /q %userprofile%\cookies\*.*
del /f /s /q %userprofile%\recent\*.*

echo System junk cleanup finished
pause >nul

Some notes on quantization:

HuggingFace bitsandbytes
GPTQ: data compression, GPU, https://arxiv/pdf/2210.17323
- GPTQ is a post-training quantization (PTQ) method for 4-bit quantization that focuses primarily on GPU inference and performance.
- for quantizing the weights of transformer-based models
- quantizes the weights column by column, using second-order (Hessian) information from a small calibration set to update the not-yet-quantized weights and compensate for the rounding error
- The idea behind the method is that it will try to compress all weights to a 4-bit quantization by minimizing the squared error of each layer's output on the calibration data.
  - During inference, it will dynamically dequantize its weights to float16 for improved performance whilst keeping memory low.
GGUF: ggml, CPU
- C++
- llama.cpp, https://github/ggerganov/llama.cpp
AWQ: Activation-aware Weight Quantization, https://arxiv/abs/2306.00978
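
To get a feel for what 4-bit weight quantization does, here is the simplest baseline: symmetric round-to-nearest (RTN) quantization with one absmax scale per group. This is not GPTQ (GPTQ additionally uses calibration data to compensate rounding errors column by column) and not AWQ; the group size of 4, the integer range [-7, 7] and the helper names are assumptions for illustration. The last lines are plain memory arithmetic for an 8B-parameter model at 16, 8 and 4 bits, ignoring scales and other overhead:

```python
def quantize_int4(weights, group_size=4):
    """Symmetric absmax quantization to integers in [-7, 7], one scale per group."""
    q, scales = [], []
    for g in range(0, len(weights), group_size):
        group = weights[g:g + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0   # avoid a zero scale
        scales.append(scale)
        q.extend(max(-7, min(7, round(w / scale))) for w in group)
    return q, scales


def dequantize_int4(q, scales, group_size=4):
    """What happens at inference time: map the integers back to floats."""
    return [qi * scales[i // group_size] for i, qi in enumerate(q)]


w = [0.12, -0.50, 0.33, 0.07, 1.20, -0.95, 0.02, 0.40]
q, scales = quantize_int4(w)
w_hat = dequantize_int4(q, scales)
print(q)
print(max(abs(a - b) for a, b in zip(w, w_hat)))   # at most scale / 2 within each group

params = 8e9
print({bits: params * bits / 8 / 1e9 for bits in (16, 8, 4)})   # {16: 16.0, 8: 8.0, 4: 4.0} (GB)
```

The per-group rounding error is what GPTQ and AWQ try to reduce, each in its own way, while keeping the same 4-bit storage.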

Environment setup:

# Latest HF transformers version for Mistral-like models
# !pip install git+https://github/huggingface/transformers.git
# !pip install accelerate bitsandbytes xformers

# GPTQ Dependencies
# !pip install optimum
# !pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
# (or build auto-gptq from source)

# GGUF Dependencies
# !pip install 'ctransformers[cuda]'

from torch import bfloat16
from transformers import pipeline

# Load in your LLM without any compression tricks
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
# model_id = "HuggingFaceH4/zephyr-7b-beta"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=bfloat16,
    device_map="auto"
)

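Afterword

Unfortunately, it seems we missed each other.

There are no coincidences in this world, only carefully laid plans.

You never beat around the bush; you have always been straightforward. You came quietly and left quietly, so it is not my place to make a fuss.

The wind picked up in the evening; it is time for it to cool down.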


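Copyright notice: The article "【完结】cyのMemo（20240609~20240721）" was voluntarily contributed by a user, and its views are the author's own. For reposting, contact the author and cite the source: https://m.elefans.com/dianzi/1725925347a1049302.html. This site only provides information storage, does not own the content and bears no related legal liability. Any content found to be plagiarized, infringing or illegal will be removed once verified.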
