๊ด€๋ฆฌ ๋ฉ”๋‰ด


[๋”ฅ๋Ÿฌ๋‹] 8. RNN ๋ณธ๋ฌธ

๐Ÿค– AI/Deep Learning

[๋”ฅ๋Ÿฌ๋‹] 8. RNN

jimingee 2024. 7. 1. 18:55

[ ๋ชฉ์ฐจ ]

 

1. ์ˆœ์ฐจ๋ฐ์ดํ„ฐ๋ž€?

2. Recurrent Neural Network

3. Vanilla RNN (๊ฐ€์žฅ ๊ฐ„๋‹จํ•œ ํ˜•ํƒœ์˜ RNN ๋ชจ๋ธ)


1. ์ˆœ์ฐจ ๋ฐ์ดํ„ฐ๋ž€?

RNN (Recurrent Neural Network): a model for processing sequential data, such as time-series data.

 

 

์ˆœ์ฐจ ๋ฐ์ดํ„ฐ(Sequential Data) - ์˜ˆ์‹œ: ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ, ์ž์—ฐ์–ด ๋ฐ์ดํ„ฐ

  • ์ˆœ์„œ(Order)๋ฅผ ๊ฐ€์ง€๊ณ  ๋‚˜ํƒ€๋‚˜๋Š” ๋ฐ์ดํ„ฐ
  • ๋ฐ์ดํ„ฐ ๋‚ด ๊ฐ ๊ฐœ์ฒด๊ฐ„์˜ ์ˆœ์„œ๊ฐ€ ์ค‘์š”
  • ์˜ˆ) ๋‚ ์งœ์— ๋”ฐ๋ฅธ ๊ธฐ์˜จ ๋ฐ์ดํ„ฐ, ๋‹จ์–ด๋“ค๋กœ ์ด๋ฃจ์–ด์ง„ ๋ฌธ์žฅ, DNA ์—ผ๊ธฐ ์„œ์—ด, ์ƒ˜ํ”Œ๋ง๋œ ์†Œ๋ฆฌ ์‹ ํ˜ธ ๋“ฑ
  • ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ (Time-Series Data)
    • ์ผ์ •ํ•œ ์‹œ๊ฐ„ ๊ฐ„๊ฒฉ์„ ๊ฐ€์ง€๊ณ  ์–ป์–ด๋‚ธ ๋ฐ์ดํ„ฐ
    • ์˜ˆ) ์—ฐ๋„๋ณ„ ๋Œ€ํ•œ๋ฏผ๊ตญ์˜ ํ‰๊ท  ๊ธฐ์˜จ, ์‹œ๊ฐ„๋ณ„ ์ฃผ์‹ ๊ฐ€๊ฒฉ ๊ธฐ๋ก ๋“ฑ
  • ์ž์—ฐ์–ด ๋ฐ์ดํ„ฐ (Natural Language)
    • ์ธ๋ฅ˜๊ฐ€ ๋งํ•˜๋Š” ์–ธ์–ด๋ฅผ ์˜๋ฏธ
    • ์ฃผ๋กœ ๋ฌธ์žฅ ๋‚ด์—์„œ ๋‹จ์–ด๊ฐ€ ๋“ฑ์žฅํ•˜๋Š” ์ˆœ์„œ์— ์ฃผ๋ชฉ

 

๋”ฅ๋Ÿฌ๋‹์„ ํ™œ์šฉํ•œ ์ˆœ์ฐจ ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ ์˜ˆ์‹œ

1. ๊ฒฝํ–ฅ์„ฑ ํŒŒ์•… : ์ฃผ๊ฐ€ ์˜ˆ์ธก, ๊ธฐ์˜จ ์˜ˆ์ธก ๋“ฑ, ๋‹ค์–‘ํ•œ ์‹œ๊ณ„์—ด ํŠน์ง•์„ ๊ฐ€์ง€๋Š” ๋ฐ์ดํ„ฐ์— ์ ์šฉ ๊ฐ€๋Šฅ

2. ์Œ์•… ์žฅ๋ฅด ๋ถ„์„ : ์˜ค๋””์˜ค ํŒŒ์ผ์€ ๋ณธ์งˆ์ ์œผ๋กœ ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ, ์ŒํŒŒ ํ˜•ํƒœ ๋“ฑ์„ ๋ถ„์„ํ•˜์—ฌ ์˜ค๋””์˜ค ํŒŒ์ผ์˜ ์žฅ๋ฅด๋ฅผ ๋ถ„์„

3. ๊ฐ•์ˆ˜๋Ÿ‰์˜ˆ์ธก(Precipitation Forecasting) : ๊ตฌ๊ธ€์—์„œ ์ด๋ฏธ์ง€ ์ฒ˜๋ฆฌ ๊ธฐ์ˆ ๊ณผ ๊ฒฐํ•ฉํ•˜์—ฌ ์ฃผ๋„์ ์œผ๋กœ ์—ฐ๊ตฌ (์˜ˆ: MetNet)

4. ์Œ์„ฑ ์ธ์‹ (Speech Recognition) : ์Œ์„ฑ์— ํฌํ•จ๋œ ๋‹จ์–ด๋‚˜ ์†Œ๋ฆฌ๋ฅผ ์ถ”์ถœ (์˜ˆ: Apple Siri, Google Assistant)

5. ๋ฒˆ์—ญ๊ธฐ (Translator) : ๋‘ ์–ธ์–ด๊ฐ„ ๋ฌธ์žฅ ๋ฒˆ์—ญ, ๋”ฅ๋Ÿฌ๋‹ ๋ฐœ์ „ ์ดํ›„ ๋ฒˆ์—ญ์˜ ์ž์—ฐ์Šค๋Ÿฌ์›€ ํ–ฅ์ƒ (์˜ˆ: ๊ตฌ๊ธ€ ๋ฒˆ์—ญ, ๋„ค์ด๋ฒ„ ํŒŒํŒŒ๊ณ  ๋“ฑ)

6. ์ฑ—๋ด‡ (chatbot) : ์‚ฌ์šฉ์ž์˜ ์งˆ๋ฌธ์— ์‚ฌ๋žŒ์ฒ˜๋Ÿผ ์‘๋‹ตํ•˜๋Š” ํ”„๋กœ๊ทธ๋žจ, ์‚ฌ์šฉ์ž์˜ ์งˆ๋ฌธ์„ ๋ถ„์„ ํ›„ ์งˆ๋ฌธ์— ์ ์ ˆํ•œ ์‘๋‹ต ์ƒ์„ฑ

 

 

 

2. Recurrent Neural Network

 

[ Why a fully connected layer cannot handle sequential data ]

  • An FC layer has a fixed number of input and output nodes
  • In sequential data, the number of elements in each example can vary (e.g., every sentence has a different number of words)
  • In addition, an FC layer cannot take order into account

 

 

 

RNN ( Recurrent Neural Network )

  • A deep learning model for processing sequential data
  • The RNN's defining component -> the hidden state: the key mechanism that implements the recurrent structure

 

[ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ (์ˆœ์ฐจ์  ๋ฐ์ดํ„ฐ) ๊ตฌ์กฐ ]

  • $x_1,x_2,x_3, ... , x_n $๊ณผ ๊ฐ™์ด ๋ฐ์ดํ„ฐ์˜ ๋‚˜์—ด
  • ๊ฐ $x_t$์˜ ์˜๋ฏธ
    • ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ : ์ผ์ • ์‹œ๊ฐ„ ๊ฐ„๊ฒฉ์œผ๋กœ ๋‚˜๋ˆ ์ง„ ๋ฐ์ดํ„ฐ ๊ฐœ์ฒด ํ•˜๋‚˜
    • ์ž์—ฐ์–ด ๋ฐ์ดํ„ฐ : ๋ฌธ์žฅ ๋‚ด์˜ ๊ฐ ๋‹จ์–ด

 

  • ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ์˜ ๋ฒกํ„ฐ ๋ณ€ํ™˜
    • ์ž…๋ ฅ ๋ฐ์ดํ„ฐ์˜ ๊ฐ $x_t$๋Š” ๋ฒกํ„ฐ ํ˜•ํƒœ
    • ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ์˜ ๊ฒฝ์šฐ, ๊ฐ ๋ฐ์ดํ„ฐ๋ฅผ ์ด๋ฃจ๋Š” Feature ๊ฐ’๋“ค์„ ์›์†Œ๋กœ ํ•˜๋Š” ๋ฒกํ„ฐ๋กœ ๋ณ€ํ™˜

 

 

  • ์ž์—ฐ์–ด ๋ฐ์ดํ„ฐ์˜ ๋ฒกํ„ฐ ๋ณ€ํ™˜
    • ์ž„๋ฒ ๋”ฉ(Embedding)๊ฐ ๋‹จ์–ด๋“ค์„ ์ˆซ์ž๋กœ ์ด๋ฃจ์–ด์ง„ ๋ฒกํ„ฐ๋กœ ๋ณ€ํ™˜
    • ๋Œ€ํ‘œ์ ์ธ ์ž„๋ฒ ๋”ฉ ๊ธฐ๋ฒ•
      • One-hot Encoding : ํ•˜๋‚˜์˜ ์š”์†Œ๋งŒ 1์ด๊ณ  ๋‚˜๋จธ์ง€๋Š” ๋ชจ๋‘ 0์ธ ํฌ์†Œ ๋ฒกํ„ฐ
      • Word2Vec ์ฃผ์–ด์ง„ ๋‹จ์–ด๋“ค์„ ๋ฒกํ„ฐ๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๊ธฐ๊ณ„ ํ•™์Šต ๋ชจ๋ธ (๋‹จ์–ด๊ฐ„์˜ ์—ฐ๊ด€์„ฑ ํ‘œํ˜„)
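As a rough illustration (the toy vocabulary and words here are made up), one-hot encoding can be sketched in a few lines of NumPy:

```python
import numpy as np

# hypothetical toy vocabulary, for illustration only
vocab = ["i", "love", "deep", "learning"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    # sparse vector: a single 1 at the word's index, 0 everywhere else
    vec = np.zeros(len(vocab))
    vec[word_to_idx[word]] = 1.0
    return vec

print(one_hot("deep"))  # [0. 0. 1. 0.]
```

Word2Vec, in contrast, would map each word to a dense vector learned from data, so that related words end up close together.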

 

 

 

 

3. Vanilla RNN

Vanilla RNN์˜ ๊ตฌ์กฐ

  • ๊ฐ€์žฅ ๊ฐ„๋‹จํ•œ ํ˜•ํƒœ์˜ RNN๋ชจ๋ธ
  • ๋‚ด๋ถ€์— ์„ธ๊ฐœ์˜ FC Layer๋กœ ๊ตฌ์„ฑ
    • $W_{hh}$ : hidden state($h_{t-1}$)๋ฅผ ๋ณ€ํ™˜ํ•˜๋Š” Layer์˜ ๊ฐ€์ค‘์น˜ ํ–‰๋ ฌ
    • $W_{xh}$ : ํ•œ ์‹œ์ ์˜ ์ž…๋ ฅ๊ฐ’($x_t$)์„ ๋ณ€ํ™˜ํ•˜๋Š” Layer์˜ ๊ฐ€์ค‘์น˜ ํ–‰๋ ฌ
    • $W_{hy}$ : ํ•œ ์‹œ์ ์˜ ์ถœ๋ ฅ๊ฐ’($y_t$)์„ ๋ณ€ํ™˜ํ•˜๋Š” Layer์˜ ๊ฐ€์ค‘์น˜ ํ–‰๋ ฌ

 

[ Vanilla RNN computation — hidden state and output ]

 

  • ํ˜„์žฌ ์ž…๋ ฅ๊ฐ’($x_t$)์— ๋Œ€ํ•œ ์ƒˆ๋กœ์šด hidden state ($h_t$) ๊ณ„์‚ฐ
  • $ h_t = tanh(h_{t-1} W_{hh} + x_t W_{xh} $

 

 

 

 

  • ํ˜„์žฌ ์ž…๋ ฅ๊ฐ’($x_t$)์— ๋Œ€ํ•œ ์ƒˆ๋กœ์šด ์ถœ๋ ฅ๊ฐ’ ($y_t$) ๊ณ„์‚ฐ
  • $ y_t = W_{hy}h_t $
  • ์•ž์„œ ๊ณ„์‚ฐํ•œ hidden state($h_t$) ์ด์šฉ

 

 

 

 

+) tanh is the hyperbolic tangent function, used here as the activation function (it adds non-linearity)

 

 

 

[ ์‹œ๊ฐ„ ์ˆœ์œผ๋กœ ๋ณด๋Š” Vanilla RNN์˜ ์—ฐ์‚ฐ ๊ณผ์ • ]

  • ๋ชจ๋ธ์— ๋“ค์–ด์˜ค๋Š” ๊ฐ ์‹œ์ ์˜ ๋ฐ์ดํ„ฐ $x_t$๋งˆ๋‹ค ์•ž์„œ ์„ค๋ช…ํ•œ ์—ฐ์‚ฐ ๊ณผ์ •์„ ์ˆ˜ํ–‰
  • ์ž…๋ ฅ๊ฐ’์— ๋”ฐ๋ผ ๋ฐ˜๋ณตํ•ด์„œ ์ถœ๋ ฅ๊ฐ’($y_n$)๊ณผ hidden state($h_n$)๋ฅผ ๊ณ„์‚ฐ
  • ์ด์ „ ์‹œ์ ์— ์ƒ์„ฑ๋œ hidden state๋ฅผ ๋‹ค์Œ ์‹œ์ ์— ์‚ฌ์šฉ -> recurrent
  • ์—ฌ๊ธฐ์„œ RNN ๋ชจ๋ธ์€ ๋™์ผํ•œ RNN ๋ชจ๋ธ - ์ž…๋ ฅ ์‹œ์ ($x_n$)์ด ๋‹ค๋ฆ„์„ ํ‘œํ˜„ํ•˜๊ธฐ ์œ„ํ•ด ์˜†์œผ๋กœ ํŽผ์ณ์„œ ํ‘œํ˜„ํ–ˆ์Œ

 

  • Meaning of the hidden state
    • Compresses and stores the correlations and trends of the inputs received up to a given time $t$
    • Because the model keeps carrying this value internally, it can be viewed as a kind of memory
  • Parameter Sharing
    • The same RNN model and hidden state are used at every time step
    • The FC layers that compute the hidden state and the output are reused by the inputs of every time step
    • Those three FC layers are the entirety of the model's parameters

 

 

[ Types of Vanilla RNN ]

Several kinds of RNN exist depending on how the inputs and outputs are arranged:

  • many-to-one : only the output of a single time step is used
  • many-to-many : inputs at multiple time steps and outputs at multiple time steps are used (the number of time steps used for input and for output may be the same or different)

 

 

  • Encoder-Decoder : a structure that takes the inputs, encodes them into a particular hidden state, and then generates new outputs from that hidden state

 

 

 

[ Problems of the Vanilla RNN ]

 

  • An RNN produces its outputs in time order
  • The loss is computed by comparing each time step's output with the true value (one loss value per time step)
  • The back-propagation algorithm therefore runs along the time axis → Back-propagation Through Time (BPTT)
  • When the input becomes very long (many time steps) -> the gradient propagated between early inputs and late outputs is very likely to become extremely small
  • The vanishing gradient problem arises easily → long-term dependencies become hard to handle
  • Models such as LSTM and GRU were proposed to solve these problems of the RNN

 

 

 

[ Implementing a Vanilla RNN classification model — with a single SimpleRNN layer ]

  • In TensorFlow, the Vanilla RNN is implemented as SimpleRNN
  • Training a Vanilla RNN on the IMDb dataset
    • Dataset used: IMDb — contains movie information and user reviews
    • Stanford University built this two-class dataset by labeling each user review as positive or negative based on its star rating
  • The final dense layer of the model has a single node -> with the sigmoid activation its output lies in 0~1 and can be interpreted as a probability
        
import os
import tensorflow as tf
from tensorflow.keras import layers, Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

def load_data(num_words, max_len): # load the IMDb dataset
    (X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=num_words) # num_words: number of words to use

    # unify every sentence to the same length -> pad with 0s (or truncate) up to maxlen words
    X_train = pad_sequences(X_train, maxlen=max_len) 
    X_test = pad_sequences(X_test, maxlen=max_len)
    
    return X_train, X_test, y_train, y_test

' Vanilla RNN model implementation '
def build_rnn_model(num_words, embedding_len):
    model = Sequential()
    
    model.add(layers.Embedding(input_dim=num_words, output_dim=embedding_len))
    model.add(layers.SimpleRNN(units=16)) # size of the hidden state
    model.add(layers.Dense(units=1, activation='sigmoid')) # classification
    # why a single node in the dense layer? => sigmoid outputs a value in 0~1, which can be interpreted as a probability
    
    return model

def main(model=None, epochs=5):
    # number of words to load from the IMDb dataset
    num_words = 6000
    
    # maximum number of words per sentence
    max_len = 130
    
    # length of each embedded vector
    embedding_len = 100
    
    X_train, X_test, y_train, y_test = load_data(num_words, max_len)
    
    if model is None:
        model = build_rnn_model(num_words, embedding_len)
    
    # ๋ชจ๋ธ ์ตœ์ ํ™”
    optimizer = Adam(learning_rate=0.001)
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    
    # ๋ชจ๋ธ ํ›ˆ๋ จ
    hist = model.fit(X_train, y_train, epochs=epochs,batch_size=100,validation_split=0.2,shuffle=True,verbose=2)
    
    # ๋ชจ๋ธ ํ…Œ์ŠคํŠธ
    test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
    print()
    print("ํ…Œ์ŠคํŠธ Loss: {:.5f}, ํ…Œ์ŠคํŠธ ์ •ํ™•๋„: {:.3f}%".format(test_loss, test_acc * 100))
    
    return optimizer, hist

if __name__=="__main__":
    main()

 

[ ์ฝ”๋“œ ์ˆ˜ํ–‰ ๊ฒฐ๊ณผ ] 

ํ…Œ์ŠคํŠธ Loss: 0.49907, ํ…Œ์ŠคํŠธ ์ •ํ™•๋„: 81.984%

 

 

 

[ Implementing a Vanilla RNN prediction model — with a single SimpleRNN layer ]

  • Build a model that predicts the monthly number of airline passengers from the airline-passengers dataset
  • Dataset used: monthly airline passenger counts from January 1949 to December 1960 (time-series data)
  • When training an RNN-based model on time-series data, the concept of a window size is used
    • window size: the number of data points used for one training example
    • As in the figure, if 4 out of 10 total data points are used per training step, the window size is 4

 

import os
import tensorflow as tf
from tensorflow.keras import layers, Sequential
from tensorflow.keras.optimizers import Adam

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

def load_data(window_size):
    raw_data = pd.read_csv("./airline-passengers.csv")
    raw_passengers = raw_data["Passengers"].to_numpy()

    # ๋ฐ์ดํ„ฐ์˜ ํ‰๊ท ๊ณผ ํ‘œ์ค€ํŽธ์ฐจ ๊ฐ’์œผ๋กœ ์ •๊ทœํ™”(ํ‘œ์ค€ํ™”)
    mean_passenger = raw_passengers.mean()
    stdv_passenger = raw_passengers.std(ddof=0)
    raw_passengers = (raw_passengers - mean_passenger) / stdv_passenger
    data_stat = {"month": raw_data["Month"], "mean": mean_passenger, "stdv": stdv_passenger}

    ''' ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ ์…‹ ๊ตฌ์„ฑ '''
    # window_size๊ฐœ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๋ถˆ๋Ÿฌ์™€ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ(X)๋กœ ์„ค์ •ํ•˜๊ณ 
    # window_size๋ณด๋‹ค ํ•œ ์‹œ์  ๋’ค์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์˜ˆ์ธกํ•  ๋Œ€์ƒ(y)์œผ๋กœ ์„ค์ •ํ•˜์—ฌ ๋ฐ์ดํ„ฐ์…‹ ๊ตฌ์„ฑ
    X, y = [], []
    for i in range(len(raw_passengers) - window_size):
        cur_passenger = raw_passengers[i:i + window_size]
        target = raw_passengers[i + window_size]

        X.append(list(cur_passenger))
        y.append(target)

    X = np.array(X)
    y = np.array(y)

    # ๊ฐ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ๋Š” sequence ๊ธธ์ด๊ฐ€ window_size์ด๊ณ , featuer ๊ฐœ์ˆ˜๋Š” 1๊ฐœ๊ฐ€ ๋˜๋„๋ก ๋งˆ์ง€๋ง‰์— ์ƒˆ๋กœ์šด ์ฐจ์› ์ถ”๊ฐ€
    # ์ฆ‰, (์ „์ฒด ๋ฐ์ดํ„ฐ ๊ฐœ์ˆ˜, window_size) -> (์ „์ฒด ๋ฐ์ดํ„ฐ ๊ฐœ์ˆ˜, window_size, 1)์ด ๋˜๋„๋ก ๋ณ€ํ™˜
    X = X[:, :, np.newaxis]

    # ํ•™์Šต ๋ฐ์ดํ„ฐ 80%, ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ 20%
    total_len = len(X)
    train_len = int(total_len * 0.8)

    X_train, y_train = X[:train_len], y[:train_len]
    X_test, y_test = X[train_len:], y[train_len:]

    return X_train, X_test, y_train, y_test, data_stat

''' Vanilla RNN model implementation '''
def build_rnn_model(window_size):
    model = Sequential()

    model.add(layers.SimpleRNN(units=4, input_shape=(window_size, 1))) # time-series data rather than text, so there is no embedding layer -> input_shape must be specified explicitly
    model.add(layers.Dense(units=1)) # no activation function -> the raw output is used directly as the predicted value (regression)

    return model
    
def plot_result(X_true, y_true, y_pred, data_stat):
    # convert the standardized values back to the original scale
    y_true_orig = (y_true * data_stat["stdv"]) + data_stat["mean"]
    y_pred_orig = (y_pred * data_stat["stdv"]) + data_stat["mean"]

    # keep only the dates used in the test data
    test_month = data_stat["month"][-len(y_true):]

    # ๋ชจ๋ธ์˜ ์˜ˆ์ธก๊ฐ’, ์‹ค์ œ๊ฐ’ ๊ทธ๋ž˜ํ”„
    fig = plt.figure(figsize=(8, 6))
    ax = plt.gca()
    ax.plot(y_true_orig, color="b", label="True")
    ax.plot(y_pred_orig, color="r", label="Prediction")
    ax.set_xticks(list(range(len(test_month))))
    ax.set_xticklabels(test_month, rotation=45)
    ax.set_title("RNN Result")
    ax.legend(loc="upper left")
    plt.savefig("airline_rnn.png")

def main(model=None, epochs=100):
    tf.random.set_seed(2022)

    window_size = 4
    X_train, X_test, y_train, y_test, data_stat = load_data(window_size)

    if model is None:
        model = build_rnn_model(window_size)

    # ๋ชจ๋ธ ์ตœ์ ํ™”
    optimizer = Adam(learning_rate=0.001)
    model.compile(optimizer=optimizer, loss='MeanSquaredError')

    # ๋ชจ๋ธ ํ•™์Šต
    hist = model.fit(X_train, y_train, batch_size=8,epochs=epochs,shuffle=True,verbose=2)
    
    # ๋ชจ๋ธ ํ…Œ์ŠคํŠธ
    test_loss = model.evaluate(X_test, y_test, verbose=0)
    print()
    print("ํ…Œ์ŠคํŠธ MSE: {:.5f}".format(test_loss))
    print()
    
    y_pred = model.predict(X_test)
    plot_result(X_test, y_test, y_pred, data_stat)

    return optimizer, hist

if __name__ == "__main__":
    main()

 

 

 

[ Implementing a Deep Vanilla RNN model — with multiple SimpleRNN layers ]

  • A SimpleRNN, like a convolutional layer, is a single layer, so several of them can be stacked
  • A model made of several SimpleRNN layers -> a deep RNN model
  • The experiment compares the performance of a model with one SimpleRNN layer against one with two
    • Dataset: a simple time series built with numpy by combining two sine functions
    • The window size is 50, and prediction performance is measured with the Mean Squared Error (MSE)
  • The Deep Vanilla RNN uses the many-to-many RNN structure -> the N outputs produced at the time steps are fed as the input to the next RNN layer
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow as tf
from tensorflow.keras import layers, Sequential
from tensorflow.keras.optimizers import Adam
import numpy as np

def load_data(num_data, window_size):
    freq1, freq2, offsets1, offsets2 = np.random.rand(4, num_data, 1)
    ## freq1, freq2: frequencies of the two sine functions
    ## offsets: delays before each sine function starts

    time = np.linspace(0, 1, window_size + 1)
    series = 0.5 * np.sin((time - offsets1) * (freq1 * 10 + 10))
    series += 0.1 * np.sin((time - offsets2) * (freq2 * 10 + 10)) # combine the two sine functions into one series
    series += 0.1 * (np.random.rand(num_data, window_size + 1) - 0.5)
    
    num_train = int(num_data * 0.8)
    X_train, y_train = series[:num_train, :window_size], series[:num_train, -1]
    X_test, y_test = series[num_train:, :window_size], series[num_train:, -1]
    
    X_train = X_train[:, :, np.newaxis]
    X_test = X_test[:, :, np.newaxis]
    
    return X_train, X_test, y_train, y_test

''' 1๊ฐœ์˜ SimpleRNN layer๋ฅผ ๊ฐ€์ง€๋Š” RNN ๋ชจ๋ธ '''
def build_rnn_model(window_size):
    model = Sequential()

    model.add(layers.SimpleRNN(units=20,input_shape=(window_size, 1)))
    model.add(layers.Dense(units=1))

    return model

''' 2๊ฐœ์˜ SimpleRNN layer๋ฅผ ๊ฐ€์ง€๋Š” Deep RNN ๋ชจ๋ธ '''
def build_deep_rnn_model(window_size):
    model = Sequential()

    # return sequences : RNN์˜ ์ข…๋ฅ˜ ์ค‘ many-to-many๋ฅผ ์‚ฌ์šฉํ•˜๊ฒ ๋‹ค๋Š” ์˜๋ฏธ
    # ์‹ฌ์ธต RNN ์ด๋ฏ€๋กœ many-to-many RNN ๋ชจ๋ธ ์ƒ์„ฑ (์ƒ์„ฑ๋œ 20๊ฐœ์˜ output์ด ๋‹ค์Œ simple RNN์˜ ์ž…๋ ฅ์œผ๋กœ ์“ฐ์ž„)
    model.add(layers.SimpleRNN(units=20,return_sequences=True,input_shape=(window_size, 1))) 
    model.add(layers.SimpleRNN(units=20))
    model.add(layers.Dense(units=1))

    return model

def run_model(model, X_train, X_test, y_train, y_test, epochs=20, name=None):
    # model optimization
    optimizer = Adam(learning_rate=0.001)
    model.compile(optimizer=optimizer, loss='mse')

    # train the model
    hist = model.fit(X_train, y_train, epochs=epochs, batch_size=256, shuffle=True, verbose=2)
    
    # test the model
    test_loss = model.evaluate(X_test, y_test, verbose=0)
    print("[{}] Test MSE: {:.5f}".format(name, test_loss))
    print()

    return optimizer, hist
    
def main():
    tf.random.set_seed(2022)
    np.random.seed(2022)

    window_size = 50
    X_train, X_test, y_train, y_test = load_data(10000, window_size)

    rnn_model = build_rnn_model(window_size)
    run_model(rnn_model, X_train, X_test, y_train, y_test, name="RNN")

    deep_rnn_model = build_deep_rnn_model(window_size)
    run_model(deep_rnn_model, X_train, X_test, y_train, y_test, name="Deep RNN")


if __name__ == "__main__":
    main()

 

 

 

 

[ Implementing a model with an Encoder-Decoder structure using SimpleRNN ]

  • The Encoder's outputs are not used; only the Encoder's hidden state is taken and used as the Decoder's initial hidden state
  • The __call__ method: the function that performs the model's computation when an actual input is given
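In plain Python, defining `__call__` is what lets an object be invoked like a function, which is why `model(encoder_x, decoder_x)` below runs the model's computation (the `Scaler` class is a toy example, not part of Keras):

```python
class Scaler:
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        # runs when the instance itself is called like a function
        return self.factor * x

double = Scaler(2)
print(double(21))  # 42
```

Keras builds on the same mechanism: calling a `Model` instance routes through `__call__`, which in turn invokes the `call` method defined in the class below.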
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras import layers, Sequential, Input


class EncoderDecoder(Model):
    def __init__(self, hidden_dim, encoder_input_shape, decoder_input_shape, num_classes):
        super(EncoderDecoder, self).__init__()
        
        # SimpleRNN์œผ๋กœ ์ด๋ฃจ์–ด์ง„ Encoder
        self.encoder = layers.SimpleRNN(units=hidden_dim,return_state=True,input_shape=encoder_input_shape)
        # return state: hidden state๋ฅผ ์‚ฌ์šฉํ•˜๊ธฐ ์œ„ํ•œ ํŒŒ๋ผ๋ฏธํ„ฐ
                                        
        # SimpleRNN์œผ๋กœ ์ด๋ฃจ์–ด์ง„ Decoder
        self.decoder = layers.SimpleRNN(units=hidden_dim,return_sequences=True,input_shape=decoder_input_shape)
        # decoder๋„ input shape ์ง€์ •ํ•  ํ•„์š” ์žˆ์Œ
        
        self.dense = layers.Dense(num_classes, activation="softmax")
        
    def call(self, encoder_inputs, decoder_inputs):
        # Encoder์— ์ž…๋ ฅ๊ฐ’์„ ๋„ฃ์–ด Decoder์˜ ์ดˆ๊ธฐ state๋กœ ์‚ฌ์šฉํ•  hidden state ๋ฐ˜ํ™˜
        encoder_outputs, encoder_state = self.encoder(encoder_inputs)
        
        # Decoder์— ์ž…๋ ฅ๊ฐ’์„ ๋„ฃ๊ณ , ์ดˆ๊ธฐ state๋Š” Encoder์—์„œ ์–ป์–ด๋‚ธ state(hidden state)๋กœ ์„ค์ •
        decoder_outputs = self.decoder(decoder_inputs, initial_state = [encoder_state])
        
        outputs = self.dense(decoder_outputs)
        
        return outputs


def main():
    # size of the hidden state
    hidden_dim = 20
    
    # shape of each example fed into the Encoder
    encoder_input_shape = (10, 1) # each Encoder example - sequence length: 10, feature size per time step: 1
    
    # shape of each example fed into the Decoder
    decoder_input_shape = (30, 1)
    
    # number of classes to classify
    num_classes = 5

    model = EncoderDecoder(hidden_dim, encoder_input_shape, decoder_input_shape, num_classes)
    
    # ๋ชจ๋ธ์— ๋„ฃ์–ด์ค„ ๊ฐ€์ƒ์˜ ๋ฐ์ดํ„ฐ ์ƒ์„ฑ
    encoder_x, decoder_x = tf.random.uniform(shape=encoder_input_shape), tf.random.uniform(shape=decoder_input_shape)
    encoder_x, decoder_x = tf.expand_dims(encoder_x, axis=0), tf.expand_dims(decoder_x, axis=0)
    y = model(encoder_x, decoder_x)

    # ๋ชจ๋ธ ์ •๋ณด ์ถœ๋ ฅ
    model.summary()

if __name__ == "__main__":
    main()

 

[ ์ฝ”๋“œ ์‹คํ–‰ ๊ฒฐ๊ณผ ] 

 
