
pytorch zero to all (2)

[PyTorch Zero to All lecture notes] Lecture 3: Gradient Descent
Lecture topic: Lecture 3, Gradient Descent. Lecture goal: understand the gradient descent algorithm and the process of finding the value of W that minimizes the loss, and discuss how to update the weights with the update equation and how to compute the gradient.
- 🎯 The goal of training or learning in machine learning is to find the optimal value of W that minimizes the loss function.
- 🧭 The gradient descent algorithm provides a systematic way to identify the optimal value of W by iteratively updating the w.. 2024. 1. 14.
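The excerpt above describes the core loop: repeatedly move W against the gradient of the loss. Here is a minimal sketch of that update, assuming the one-weight linear model y_hat = x * w used in the lecture series; the training data, learning rate, and epoch count are illustrative assumptions, not values from the post.

```python
# Minimal gradient descent sketch for the linear model y_hat = x * w.
# Data, learning rate, and epoch count are illustrative assumptions.

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = 1.0  # initial guess for the weight


def forward(x):
    """Model prediction: y_hat = x * w."""
    return x * w


def gradient(x, y):
    """d(loss)/dw for squared error (x*w - y)^2, i.e. 2x(x*w - y)."""
    return 2 * x * (x * w - y)


for epoch in range(10):
    for x, y in zip(x_data, y_data):
        w -= 0.01 * gradient(x, y)  # update rule: w := w - lr * grad
    print(f"epoch {epoch}: w = {w:.4f}")
```

Run as-is, w converges toward 2.0, the weight that makes the loss zero on this toy data.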
[pytorch zero to all ๊ฐ•์˜ ๋‚ด์šฉ ์ •๋ฆฌ] 2๊ฐ• Linear Model - ์„ ํ˜• ๋ชจ๋ธ ์บก์Šคํ†ค ์ฃผ์ œ๊ฐ€ LLM์„ ์ด์šฉํ•œ ๊ฒ€์ƒ‰ ์—”์ง„ ์ œ์ž‘์œผ๋กœ ์ขํ˜€์ง€๋ฉด์„œ ํŒŒ์ดํ† ์น˜ ์Šคํ„ฐ๋””๋ฅผ ๊ฒจ์šธ๋ฐฉํ•™๋™์•ˆ ์‹œ์ž‘ํ–ˆ์Šต๋‹ˆ๋‹ค. ๊ต์ˆ˜๋‹˜๊ป˜์„œ ๊ณต์œ ํ•ด์ฃผ์‹  pytorch zero to all ๊ฐ•์˜๋ฅผ ์ˆ˜๊ฐ•ํ•˜๋ฉด์„œ ์ •๋ฆฌํ•œ ๋‚ด์šฉ์„ ๊ณต์œ ํ•˜๊ณ ์ž ํ•ฉ๋‹ˆ๋‹ค. ์ˆ˜ํ•™์ ์ธ ๋‚ด์šฉ๊ณผ ์›๋ฆฌ์— ๋Œ€ํ•ด์„œ๋Š” ๊ฐ„๋‹จํžˆ ์ •๋ฆฌํ•˜๊ณ  ๊ฐ•์˜์˜ ์ˆฒ์„ ๋ณด๋Š” ์ฃผ์ œ์œ„์ฃผ๋กœ ์ •๋ฆฌํ•œ ๋ถ€๋ถ„์ด๋‹ˆ ์ €์ฒ˜๋Ÿผ ํŒŒ์ดํ† ์น˜์— ์ œ๋กœ๋ฒ ์ด์Šค์˜€๋˜ ๋ถ„๋“ค๊ป˜์„œ๋Š” ํ•œ๋ฒˆ ์ฝ๊ณ  ํŒŒ์ดํ† ์น˜ ์Šคํ„ฐ๋””๋ฅผ ์‹œ์ž‘ํ•˜์‹œ๋Š”๊ฒŒ ๋„์›€์ด ๋  ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค. 13๊ฐ•๊นŒ์ง€ ๋‚ด์šฉ์„ ์ „๋ถ€ ์˜ฌ๋ฆฌ๊ณ  ์ดํ›„ ์ถ”๊ฐ€์ ์ธ ์Šคํ„ฐ๋””๋ฅผ ์ง„ํ–‰ํ• ๋•Œ๋งˆ๋‹ค ์‹œ๊ฐ„์„ ๋‚ด์–ด ๊ณต๋ถ€ ๋‚ด์šฉ์„ ์ •๋ฆฌํ•˜๊ณ ์ž ํ•ฉ๋‹ˆ๋‹ค. ๊ฐ•์˜ ๋ชฉํ‘œ : ํŒŒ์ดํ† ์น˜์˜ ์„ ํ˜• ๋ชจ๋ธ ๊ฐœ๋…์„ ์ด์•ผ๊ธฐํ•˜๊ณ  ์ง€๋„ ๋จธ์‹ ๋Ÿฌ๋‹์—์„œ ์„ ํ˜• ๋ชจ๋ธ์ด ์‚ฌ์šฉ๋˜๋Š” ๋ฐฉ๋ฒ•๊ณผ ๋ชจ๋ธํ•™์Šต ๋ฐ ํ‰๊ฐ€ ๊ณผ์ •์„ ์„ค๋ช…ํ•œ๋‹ค. ๋˜ํ•œ ์†์‹ค ๊ณ„์‚ฐ๊ณผ ํ‰๊ท ์ œ๊ณฑ์˜ค์ฐจ(MSE)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ.. 2024. 1. 7.
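To make the loss computation concrete, here is a minimal sketch in the spirit of Lecture 2: evaluate the MSE of the linear model y_hat = x * w over a grid of candidate weights. The data and the weight grid are illustrative assumptions.

```python
# Minimal sketch of a linear model and MSE loss over candidate weights.
# The toy data and the weight grid are illustrative assumptions.

import numpy as np

x_data = np.array([1.0, 2.0, 3.0])
y_data = np.array([2.0, 4.0, 6.0])

for w in np.arange(0.0, 4.1, 0.5):
    y_pred = x_data * w                    # linear model: y_hat = x * w
    mse = np.mean((y_pred - y_data) ** 2)  # mean squared error
    print(f"w = {w:.1f}  MSE = {mse:.2f}")
```

Printing the MSE for each candidate w shows the loss bottoming out at w = 2.0, which is the picture gradient descent then automates in Lecture 3.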