BitNet b1.58

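An LLM that runs inference with 1-bit (more precisely, 1.58-bit) weights. The authors report that it matches or exceeds the performance of a conventional 16-bit model with the same model size and the same number of training tokens.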

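This model is a 1.58-bit LLM in which every parameter takes one of three values $(-1, 0, 1)$. BitNet itself was already known, but this paper adds $0$ to the values a BitNet parameter can take, which makes feature filtering possible and improves performance.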
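The role of the $0$ value can be illustrated with a small sketch. It assumes an absmean-style rule (scale the weights by their mean absolute value, round, then clip to $[-1, 1]$); the function names `absmean_ternarize` and `ternary_matvec` are hypothetical and exist only for this illustration, not in any BitNet implementation. Python is used for readability, not speed.

```python
def absmean_ternarize(weights, eps=1e-8):
    """Map a 2-D list of floats to ternary values in {-1, 0, 1}.

    Assumption: the scale is the mean absolute value of all weights.
    Returns the ternary matrix and the scale that was used.
    """
    flat = [w for row in weights for w in row]
    gamma = sum(abs(w) for w in flat) / len(flat)
    scale = gamma + eps  # eps avoids division by zero for an all-zero matrix
    ternary = [
        [max(-1, min(1, round(w / scale))) for w in row]  # round, then clip
        for row in weights
    ]
    return ternary, gamma


def ternary_matvec(ternary, x):
    """Multiply a ternary matrix by a vector using only add and subtract."""
    out = []
    for row in ternary:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0: the input feature is skipped (feature filtering)
        out.append(acc)
    return out


if __name__ == "__main__":
    w = [[0.9, -0.05, -1.2], [0.02, 0.4, -0.6]]
    t, gamma = absmean_ternarize(w)
    print(t, gamma)
    print(ternary_matvec(t, [1.0, 2.0, 3.0]))
```

Weights that are small relative to the scale round to $0$, so the corresponding input features drop out of the sum entirely, while the remaining weights need no multiplication.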

Note that when each parameter takes one of three values $(-1, 0, 1)$, the number of bits required per parameter is \[\log_{2}3=\frac{\log_{10}3}{\log_{10}2}=1.5849\ldots\]
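The bit count can be checked numerically. The following is a minimal sketch; the helper name `bits_per_parameter` is hypothetical and simply evaluates $\log_2$ of the number of distinct values a parameter can take.

```python
import math


def bits_per_parameter(num_levels: int) -> float:
    """Information needed to encode one parameter with num_levels possible values."""
    if num_levels < 1:
        raise ValueError("num_levels must be a positive integer")
    return math.log2(num_levels)


if __name__ == "__main__":
    # 2 levels: binary weights; 3 levels: ternary weights.
    for levels in (2, 3):
        print(levels, bits_per_parameter(levels))
```

This is the information-theoretic lower bound per parameter; a practical storage format may spend slightly more bits than this.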

Source: arXiv:2402.17764v1 [cs.CL]

Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei: The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits, arXiv.org, arXiv:2402.17764v1 [cs.CL], Tue, 27 Feb 2024 18:56:19 UTC.
Junbum Lee: Beomi/BitNet-Transformers, BitNet-Transformers: Huggingface Transformers Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch with Llama(2) Architecture, https://github.com/Beomi/BitNet-Transformers/tree/main
