Initial Query

What is the hidden dimension of LLAMA-2 70b?

LLM Generation

Query

What is the hidden dimension of LLAMA-2 70b?

Output

The **hidden dimension** (also known as `d_model`) of **LLAMA-2 70B** is **8192**. This value represents the size of the token embeddings and the internal state throughout the transformer layers.

### Key Details:

- **Total Parameters**: ~70 billion.
- **Architecture**: The hidden dimension is consistent with the original LLAMA 65B model, where `d_model = 8192`. The parameter increase to 70B likely stems from adjustments in other components (e.g., more layers or a larger feed-forward expansion factor) while retaining the same hidden dimension.
- **Attention Heads**: Uses **64 attention heads** with a **head dimension of 128** (`d_head = 128`), such that `d_model = heads × d_head = 64 × 128 = 8192`.

This dimension ensures compatibility with hardware optimizations (e.g., tensor cores on GPUs/TPUs) and aligns with standard practices in large transformer models.
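
The figures in the output can be cross-checked against the published Hugging Face configuration. The snippet below is a minimal sketch, not part of the benchmark itself; it assumes `transformers` is installed and that access to the gated `meta-llama/Llama-2-70b-hf` repository has been granted.

```python
# Minimal sketch: read d_model and the attention head layout from the published
# config. Assumes access to the gated meta-llama/Llama-2-70b-hf repository.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Llama-2-70b-hf")

d_model = config.hidden_size            # expected: 8192
n_heads = config.num_attention_heads    # expected: 64
d_head = d_model // n_heads             # expected: 128

assert d_model == n_heads * d_head == 8192
print(d_model, n_heads, d_head)
```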

Check if any of the following conditions are true:

Substring Evaluation

Testing if the previous output contains the string 8,192: False

Substring Evaluation

Testing if the previous output contains the string 8192: True

Final Answer: True
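
The pass condition above is a case-sensitive substring check ORed across the candidate strings. The sketch below reproduces that logic; `output` and the other names are illustrative stand-ins, not the benchmark harness's actual code.

```python
# Illustrative reproduction of the substring evaluation above; `output` stands in
# for the full LLM generation (abridged here).
output = "The **hidden dimension** (also known as `d_model`) of **LLAMA-2 70B** is **8192**. ..."

candidates = ["8,192", "8192"]
results = {s: (s in output) for s in candidates}

print(results)                                  # {'8,192': False, '8192': True}
print("Final Answer:", any(results.values()))   # True
```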