submitted 3 weeks ago by [email protected] to c/[email protected]
[-] [email protected] 4 points 2 weeks ago

To actually read how they did it, here is their model page: https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-1048k

Approach:

  • meta-llama/Meta-Llama-3-8B-Instruct as the base
  • NTK-aware interpolation [1] to initialize an optimal schedule for RoPE theta, followed by empirical RoPE theta optimization
  • Progressive training on increasing context lengths, similar to Large World Model [2] (See details below)
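For anyone curious what "NTK-aware interpolation to initialize RoPE theta" means in practice: the standard NTK-aware trick raises the RoPE base so low-frequency dimensions absorb most of the interpolation. A minimal sketch (the formula is the usual NTK-aware one from [1]; the numbers are just Llama-3's published base theta and head dim, not Gradient's actual empirically-tuned schedule):

```python
# NTK-aware RoPE theta scaling -- a sketch, not Gradient's exact schedule.

def ntk_scaled_theta(base_theta: float, scale: float, head_dim: int) -> float:
    """Raise the RoPE base so high-frequency dims are interpolated less
    than low-frequency ones (the 'NTK-aware' interpolation formula)."""
    return base_theta * scale ** (head_dim / (head_dim - 2))

# Llama-3 ships with base theta 500000 and head_dim 128; stretching the
# native 8k context to 1048k is a 128x scale factor.
orig_ctx, target_ctx = 8_192, 1_048_576
scale = target_ctx / orig_ctx  # 128x extension
new_theta = ntk_scaled_theta(500_000.0, scale, head_dim=128)
print(f"scaled RoPE theta: {new_theta:.3e}")
```

The model page says they then optimize theta empirically on top of this initialization, so treat the scaled value as a starting point, not the final number.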

Infra

We build on top of the EasyContext Blockwise RingAttention library [3] to scalably and efficiently train on contexts up to 1048k tokens on the Crusoe Energy high-performance L40S cluster.

Notably, we layered parallelism on top of Ring Attention with a custom network topology to better leverage large GPU clusters in the face of network bottlenecks from passing many KV blocks between devices. This gave us a 33x speedup in model training (compare 524k and 1048k to 65k and 262k in the table below).
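The "passing KV blocks between devices" part is the core of Ring Attention: each device keeps its query block fixed while key/value blocks rotate around the ring, and partial results are folded in with an online (log-sum-exp) softmax. Here's a toy single-process sketch of that blockwise accumulation — the real library shards KV across GPUs and overlaps communication with compute, none of which is modeled here, and the shapes are made up:

```python
# Toy sketch of the blockwise accumulation behind Ring Attention.
# A loop over j stands in for KV blocks arriving around the device ring.
import numpy as np

rng = np.random.default_rng(0)
n_dev, blk, d = 4, 8, 16  # "ring" size, tokens per block, head dim
q = rng.standard_normal((n_dev * blk, d))
k = rng.standard_normal((n_dev * blk, d))
v = rng.standard_normal((n_dev * blk, d))

def ring_attention(q, k, v, n_dev, blk):
    out = np.zeros_like(q)
    for i in range(n_dev):                 # each "device" owns one Q block
        qi = q[i * blk:(i + 1) * blk]
        acc = np.zeros((blk, q.shape[1]))  # running weighted sum of V
        lse = np.full(blk, -np.inf)        # running log-sum-exp of scores
        for j in range(n_dev):             # KV blocks rotate past this device
            kj = k[j * blk:(j + 1) * blk]
            vj = v[j * blk:(j + 1) * blk]
            s = qi @ kj.T / np.sqrt(q.shape[1])
            new_lse = np.logaddexp(lse, np.logaddexp.reduce(s, axis=1))
            # rescale old accumulator, add this block's contribution
            acc = acc * np.exp(lse - new_lse)[:, None] + np.exp(s - new_lse[:, None]) @ vj
            lse = new_lse
        out[i * blk:(i + 1) * blk] = acc
    return out

# Reference: ordinary full-softmax attention over all tokens at once.
s_full = q @ k.T / np.sqrt(d)
p = np.exp(s_full - s_full.max(axis=1, keepdims=True))
ref = (p / p.sum(axis=1, keepdims=True)) @ v
```

The point of the online rescaling is that no device ever materializes the full attention matrix — which is what makes million-token contexts feasible at all.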

Data

For training data, we generate long contexts by augmenting SlimPajama. We also fine-tune on a chat dataset based on UltraChat [4], following a similar recipe for data augmentation to [2].

[-] [email protected] 1 points 3 weeks ago

Oh hey look, another community to block.

[-] [email protected] 1 points 2 weeks ago

That's cool. Am I reading right that this wouldn't run on consumer grade hardware though?

[-] [email protected] 4 points 2 weeks ago

I believe you'd need roughly 500GB of RAM minimum to run it at full context length. There is chatter that a 125k context used about 40GB.

I know I can load the 70B models into my laptop at lower bits but it consumes about 140GB of RAM.

[-] [email protected] 3 points 2 weeks ago

It is Llama-3-8B, so it's not out of the question, but I'm not sure how much memory you'd need to really go to a 1M context window. They use ring attention to reach the high context window; I'm unfamiliar with it, but it seems to greatly lower the memory requirements.
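You can back-of-envelope the KV cache, since that's what blows up with context. Using Llama-3-8B's published config (32 layers, 8 KV heads via GQA, head dim 128) and assuming fp16 with no quantization or overhead:

```python
# Rough KV-cache size for Llama-3-8B at long context (fp16, no overhead).
# Architecture numbers are the published Llama-3-8B config; the rest is
# plain arithmetic, not a measured figure.

def kv_cache_gb(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per=2):
    # 2x for keys and values; fp16 = 2 bytes per element; result in GiB
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per / 1024**3

weights_gb = 16  # ~8B params at fp16
for ctx in (8_192, 131_072, 1_048_576):
    print(f"{ctx:>9} tokens: ~{kv_cache_gb(ctx):6.1f} GiB KV cache "
          f"(+ ~{weights_gb} GiB weights)")
```

That works out to ~16 GiB of KV cache at 131k tokens (roughly consistent with the 40GB chatter above once you add weights and activations) and ~128 GiB at the full 1048k — so the 500GB figure presumably includes a fatter cache dtype or a lot of overhead.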

this post was submitted on 25 Jun 2024
23 points (92.6% liked)
