What is the hidden dimension of LLAMA-2 70b?
The Llama-2 models follow an architecture very similar to the original LLaMA models. The largest model, Llama-2‑70B, keeps the same core shape as LLaMA‑65B: 80 layers, 64 attention heads, and a hidden dimension of 8192, so the transformer “hidden size” (model dimension) is 8192. The additional parameters (hence the “70B” name rather than 65B) come mainly from a larger feed-forward dimension; the 70B model also adopts grouped-query attention and a longer 4096-token context. These numbers come from the model cards and technical details released by Meta, so if you plan to use or modify the model, it is worth checking the latest configuration in the official documentation or repository.
Testing if the previous output contains the string 8,192: False
Testing if the previous output contains the string 8192: True
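Since the answer points to the published configuration as the source of truth, here is a minimal sketch of how one might confirm the value programmatically, assuming the `transformers` library is installed and the gated meta-llama/Llama-2-70b-hf checkpoint on the Hugging Face Hub is accessible (an access token may be required); the model id is an assumption based on the public release, not something stated above.

```python
# Minimal sketch: read the published config and print the model dimensions.
# Assumes `transformers` is installed and the gated meta-llama repo is
# accessible; the model id below is an assumption.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Llama-2-70b-hf")

print(config.hidden_size)          # expected: 8192
print(config.num_hidden_layers)    # expected: 80
print(config.num_attention_heads)  # expected: 64
```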
What is the hidden dimension of LLAMA-2 70b?
The LLAMA-2 70B model uses a hidden (or model) dimension of 8,192, meaning each transformer layer works with 8,192-dimensional vectors when processing tokens. To elaborate a bit:
• LLAMA-2 comes in multiple sizes, and the architecture scales in a manner similar to the original LLaMA models.
• The smaller variants (7B and 13B) use lower hidden dimensions (4,096 and 5,120 respectively), while the 70B variant is configured with 8,192 as its hidden size.
• This value is a key hyperparameter: it sets the size of the internal representations and, indirectly, drives the model’s capacity and computational requirements.
These figures are based on the architectural details released by Meta; derivative models or future revisions may adopt different configurations.
Testing if the previous output contains the string 8,192: True
Testing if the previous output contains the string 8192: False
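As a rough illustration of how the 8,192 hidden size drives the model’s capacity and compute, here is a back-of-the-envelope sketch. The hidden size (8192) and layer count (80) are stated above; the feed-forward dimension (28672), number of key/value heads (8, via grouped-query attention), and vocabulary size (32000) are assumptions taken from the publicly reported Llama-2-70B configuration, and layer norms are ignored.

```python
# Back-of-the-envelope parameter count for a Llama-2-70B-shaped transformer.
# Hidden size and layer count come from the text; the remaining values are
# assumptions based on the publicly reported configuration.
hidden = 8192          # model / hidden dimension
layers = 80            # transformer blocks
heads = 64             # query heads
kv_heads = 8           # grouped-query attention (assumed)
head_dim = hidden // heads   # 128
ffn = 28672            # feed-forward (intermediate) size (assumed)
vocab = 32000          # tokenizer vocabulary (assumed)

# Attention: Q and O projections are hidden x hidden; K and V are
# hidden x (kv_heads * head_dim) under grouped-query attention.
attn = 2 * hidden * hidden + 2 * hidden * (kv_heads * head_dim)

# SwiGLU feed-forward uses three weight matrices of size hidden x ffn.
mlp = 3 * hidden * ffn

per_layer = attn + mlp
total = layers * per_layer + 2 * vocab * hidden  # plus embeddings and LM head

print(f"per layer: {per_layer / 1e6:.0f}M, total: {total / 1e9:.1f}B")  # roughly 69B
```

The result lands near 69B parameters, which is consistent with the “70B” naming once norm weights and rounding are accounted for, and it shows that most of the budget sits in the feed-forward matrices scaled off the 8,192 hidden size.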