MS - What is DeepSpeed?

hyerong 2024. 11. 11. 01:09

DeepSpeed is a deep learning library released by Microsoft.

According to the official development GitHub, it supports training ChatGPT-like models with a single click and delivers a 15x speedup over SOTA RLHF systems, with large cost savings at every scale.

It uses GPU memory and compute resources efficiently, which makes it well suited to training and deploying large language models!

DeepSpeed is said to reduce memory usage and raise training speed by providing model parallelism, mixed precision training, and ZeRO (Zero Redundancy Optimizer).

Let's look at each of these speed-up methods in a bit more detail.

 

  • ZeRO optimization:
    • Partitions optimizer states, gradients, and model parameters across GPUs to minimize GPU memory use so that larger models can be trained. ZeRO comes in Stage 1, Stage 2, and Stage 3, and each stage adds more memory optimization: Stage 1 partitions the optimizer states, Stage 2 also partitions the gradients, and Stage 3 also partitions the parameters.
  • Model parallelism:
    • Trains by splitting the model itself across multiple GPUs, with support for pipeline parallelism and tensor parallelism to improve training efficiency.
  • Mixed precision training:
    • Uses lower-precision floating-point arithmetic such as fp16 or bf16 to speed up training and reduce memory use.
  • Advanced data parallelism:
    • Supports data-parallel training by splitting large batches across multiple GPUs, which shortens training time.
  • Offload:
    • Partially offloads data to CPU memory or NVMe storage to manage GPU memory efficiently.
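
To get a feel for what the ZeRO stages buy you, here is a rough back-of-the-envelope sketch. It assumes mixed-precision training with Adam, where each parameter costs about 2 bytes for the fp16 weight, 2 bytes for the fp16 gradient, and 12 bytes of fp32 optimizer state. The function name `zero_memory_per_gpu_gb` is my own and not part of DeepSpeed, and the estimate ignores activations and temporary buffers.

```python
def zero_memory_per_gpu_gb(num_params: float, num_gpus: int, stage: int) -> float:
    """Estimate per-GPU model-state memory (GB) for a given ZeRO stage.

    Assumes mixed-precision Adam: 2 bytes (fp16 weights) + 2 bytes (fp16
    gradients) + 12 bytes (fp32 weights, momentum, variance) per parameter.
    Activations and temporary buffers are NOT included.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")

    params_bytes, grads_bytes, optim_bytes = 2.0, 2.0, 12.0
    if stage >= 1:  # Stage 1: optimizer states are partitioned
        optim_bytes /= num_gpus
    if stage >= 2:  # Stage 2: gradients are partitioned as well
        grads_bytes /= num_gpus
    if stage >= 3:  # Stage 3: parameters are partitioned as well
        params_bytes /= num_gpus

    return num_params * (params_bytes + grads_bytes + optim_bytes) / 1e9


if __name__ == "__main__":
    # Hypothetical setup: a 7.5B-parameter model on 64 GPUs
    for stage in range(4):
        gb = zero_memory_per_gpu_gb(7.5e9, 64, stage)
        print(f"stage {stage}: {gb:.1f} GB per GPU")
```

Stage 0 here just means plain data parallelism with nothing partitioned, so it serves as the baseline for comparison.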

 

MS DeepSpeed

 

 

- How to install the DeepSpeed library

pip install deepspeed

 

- Basic usage example

To use DeepSpeed with PyTorch, you load a config file and initialize the model through the deepspeed.initialize function.

1. Create the DeepSpeed config file:

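Create a ds_config.json file that specifies settings such as ZeRO and mixed precision. The "fp16" block turns on mixed precision and the "zero_optimization" block selects the ZeRO stage. JSON does not allow comments, so the file must not contain any.

{
  "train_batch_size": 8,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2
  }
}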
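
The same settings can also be generated from Python, which makes it easier to keep the batch-size numbers consistent. As far as I understand, DeepSpeed expects train_batch_size to equal the micro batch size per GPU times the gradient accumulation steps times the number of GPUs. The sketch below derives it that way. The helper name `make_ds_config` and the example values are my own assumptions, not something from the DeepSpeed docs.

```python
import json


def make_ds_config(micro_batch_per_gpu: int, grad_accum_steps: int,
                   world_size: int, zero_stage: int = 2) -> dict:
    """Build a DeepSpeed-style config dict with a consistent batch size.

    train_batch_size is derived as
    micro_batch_per_gpu * grad_accum_steps * world_size.
    """
    return {
        "train_batch_size": micro_batch_per_gpu * grad_accum_steps * world_size,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        "fp16": {"enabled": True},
        "zero_optimization": {"stage": zero_stage},
    }


# Hypothetical setup: 2 GPUs, micro batch of 2, 2 accumulation steps
config = make_ds_config(micro_batch_per_gpu=2, grad_accum_steps=2, world_size=2)

# Serialize to JSON text; this string can be saved as ds_config.json
config_text = json.dumps(config, indent=2)
print(config_text)
```

With these example values the derived train_batch_size comes out to 8, the same as in the hand-written file.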

 

2. Model initialization code:

To initialize a PyTorch model with DeepSpeed, load the config file and call the deepspeed.initialize function as follows.

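import deepspeed
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(768, 10)

    def forward(self, x):
        return self.layer(x)

# Path to the config file
ds_config = "ds_config.json"

# Initialize the model and DeepSpeed.
# The optimizer is passed in explicitly because the config file defines none.
model = MyModel()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model_engine, optimizer, _, _ = deepspeed.initialize(
    config=ds_config, model=model, optimizer=optimizer
)

loss_fn = nn.CrossEntropyLoss()

# Training loop.
# data_loader is assumed to be defined elsewhere and to yield (inputs, labels).
for inputs, labels in data_loader:
    # Move the batch to the engine's device; fp16 is enabled, so cast inputs to half
    inputs = inputs.to(model_engine.device).half()
    labels = labels.to(model_engine.device)
    outputs = model_engine(inputs)
    loss = loss_fn(outputs, labels)
    model_engine.backward(loss)  # use the engine's backward, not loss.backward()
    model_engine.step()          # optimizer step and gradient reset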
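
One thing worth knowing about fp16 training: very small gradient values can underflow to zero in half precision, which is why fp16 is normally paired with loss scaling (as far as I know, DeepSpeed's fp16 mode takes care of this inside the engine). The standalone sketch below only illustrates the idea using Python's standard struct module. The helper names `to_fp16` and `scaled_roundtrip` are made up for this example and are not DeepSpeed APIs.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def scaled_roundtrip(grad: float, loss_scale: float) -> float:
    """Scale the value before the fp16 cast, then unscale in full precision."""
    return to_fp16(grad * loss_scale) / loss_scale


tiny_grad = 1e-8           # hypothetical gradient value, too small for fp16
loss_scale = 2.0 ** 16     # hypothetical scale factor

# Without scaling, the value underflows and is lost
print(to_fp16(tiny_grad))  # 0.0

# With scaling, the value survives the fp16 cast and is recovered afterwards
print(scaled_roundtrip(tiny_grad, loss_scale))
```

The recovered value is not bit-exact, since fp16 only keeps about three decimal digits, but it is no longer zero.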

 

 


 

📌 A good YouTube lecture for learning more about DeepSpeed 🔍

https://www.youtube.com/watch?v=g_O3O4ExaUY 

 

 

The biggest obstacle I ran into while fine-tuning LLMs was memory (CUDA out of memory), so next time I fine-tune I will try training with DeepSpeed.