๊ด€๋ฆฌ ๋ฉ”๋‰ด

Done is Better Than Perfect

[๋”ฅ๋Ÿฌ๋‹] 9. LSTM, GRU ๋ณธ๋ฌธ

๐Ÿค– AI/Deep Learning

[๋”ฅ๋Ÿฌ๋‹] 9. LSTM, GRU

jimingee 2024. 7. 10. 15:56

 

Vanilla RNN์˜ ๋‹จ์ ์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ์ œ์•ˆ๋œ ๋ชจ๋ธ์ธ LSTM๊ณผ  GRU์— ๋Œ€ํ•ด ์•Œ์•„๋ณด๊ฒ ๋‹ค.

LSTM๊ณผ GRU๋Š” ๋‚ด๋ถ€ ์—ฐ์‚ฐ ๋ฐฉ์‹๋งŒ Vanilla RNN๊ณผ ๋‹ค๋ฅด๋‹ค. ์ฆ‰, ์ž…๋ ฅ๊ฐ’๊ณผ ์ถœ๋ ฅ๊ฐ’์„ Vanilla RNN์™€ ๋™์ผํ•˜๊ฒŒ ์‚ฌ์šฉํ•˜๋ฉด ๋œ๋‹ค.

 

[ ๋ชฉ์ฐจ ]

1. LSTM ์†Œ๊ฐœ

2. GRU ์†Œ๊ฐœ

3. RNN ๋ชจ๋ธ์˜ ํ™œ์šฉ


1. LSTM

  • Vanilla RNN์˜ ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ณ ์ž ๋“ฑ์žฅ
  • Long Short Term Memory(์žฅ๋‹จ๊ธฐ ๋ฉ”๋ชจ๋ฆฌ)์˜ ์•ฝ์ž → ์žฅ๊ธฐ ์˜์กด์„ฑ๊ณผ ๋‹จ๊ธฐ ์˜์กด์„ฑ์„ ๋ชจ๋‘ ๊ธฐ์–ตํ•  ์ˆ˜ ์žˆ์Œ
  • ์ƒˆ๋กœ ๊ณ„์‚ฐ๋œ hidden state $h_t$ ๋ฅผ ์ถœ๋ ฅ๊ฐ’ $y_t$ ์œผ๋กœ๋„ ์‚ฌ์šฉ
  • LSTM์˜ ๊ตฌ์„ฑ์š”์†Œ : cell state, forget gate, input gate, output gate

 

 

LSTM์˜ ๊ตฌ์„ฑ์š”์†Œ

 

Cell state

  • ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ํ•ต์‹ฌ ์žฅ์น˜
  • ์žฅ๊ธฐ์ ์œผ๋กœ ๊ธฐ์–ตํ•  ์ •๋ณด๋ฅผ ์กฐ์ ˆ

 

 

 

Gate

3์ข…๋ฅ˜์˜ ๊ฒŒ์ดํŠธ๋ฅผ 4๊ฐœ์˜ FC Layer๋กœ ๊ตฌ์„ฑ

  • $ W_f $: ๋ง๊ฐ๊ฒŒ์ดํŠธ(Forget Gate)
  • $ W_i, W_C $: ์ž…๋ ฅ๊ฒŒ์ดํŠธ(Input Gate)
  • $ W_o $: ์ถœ๋ ฅ๊ฒŒ์ดํŠธ(Output Gate)

 

Forget Gate

  • ๊ธฐ์กด cell state์—์„œ ์žŠ์„ ์ •๋ณด๋ฅผ ๊ฒฐ์ •
  • $f_t =  \sigma (W_f[h_{t-1}, x_t]$
    • $ \sigma $ : sigmoid ํ•จ์ˆ˜
    • $ [h_{t-1}, x_t] $: $ h_{t-1}$ ๋ฒกํ„ฐ์™€ $ x_t $ ๋ฒกํ„ฐ๋ฅผ concatenateํ•˜๋Š”์—ฐ์‚ฐ

 

Input Gate

  • ํ˜„์žฌ ์ž…๋ ฅ ๋ฐ›์€ ์ •๋ณด์—์„œ cell state์—
    ์ €์žฅํ•  ์ •๋ณด ๊ฒฐ์ •
  • $ i_t = \sigma(W_i [h_{t-1}, x_t]) $
  • $ \tilde{ C } = tanh(W_c[h_{t-1}, x_t ]) $

 

์ƒˆ๋กœ์šด Cell state

  • Forget Gate์™€ Input Gate์˜ ์ •๋ณด๋ฅผ
    ํ†ตํ•ด cell state ๊ฐฑ์‹ 
  • $ C_t = f_t * C_{t-1} + i_t * \tilde{C}_t $
    • * ์—ฐ์‚ฐ์ž๋Š” ๋ฒกํ„ฐ์˜ ๊ฐ ์›์†Œ๋ณ„๋กœ ๊ณฑํ•˜๋Š” ์—ฐ์‚ฐ (Hadamard Product)
    • ์˜ˆ) [1 2 3] * [4 5 6] = [4 10 18]

 

Output Gate

  • ๋‹ค์Œ hidden state $ h_t $์™€ ์ถœ๋ ฅ๊ฐ’ $ y_t $์„ ๊ณ„์‚ฐ 
    (์ƒˆ๋กœ ๊ณ„์‚ฐ๋œ cell state ์‚ฌ์šฉ)
  • $ \o_t = \sigma(W_o[h_{t-1}, x_t]) $
  • $ h_t = o_t * tanh(C_t) = y_t $
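Putting the four equations together, one LSTM time step is just a few lines of NumPy. A minimal sketch (the `lstm_step` helper, the toy dimensions, and the omission of bias terms are illustrative assumptions, not from the original post):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, W_i, W_C, W_o):
    """One LSTM step following the equations above (biases omitted)."""
    v = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ v)                  # forget gate
    i_t = sigmoid(W_i @ v)                  # input gate
    C_tilde = np.tanh(W_C @ v)              # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde      # new cell state (Hadamard products)
    o_t = sigmoid(W_o @ v)                  # output gate
    h_t = o_t * np.tanh(C_t)                # new hidden state = output y_t
    return h_t, C_t

# toy dimensions: hidden size 4, input size 3
rng = np.random.default_rng(0)
H, D = 4, 3
W_f, W_i, W_C, W_o = (rng.normal(size=(H, H + D)) for _ in range(4))
h_t, C_t = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H),
                     W_f, W_i, W_C, W_o)
print(h_t.shape, C_t.shape)  # (4,) (4,)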

 


2. GRU

  • Short for Gated Recurrent Unit
  • A streamlined variant of the LSTM
  • Like the LSTM, the newly computed hidden state $h_t$ is also used as the output $y_t$
  • Simplifies the LSTM's three gates down to two and removes the cell state
    • Fewer parameters, so it trains faster than the LSTM (see the parameter-count sketch below)
    • Nevertheless, its performance is generally on par with the LSTM

 

 

 

GRU์˜ ๊ตฌ์„ฑ์š”์†Œ

  • 2์ข…๋ฅ˜์˜ ๊ฒŒ์ดํŠธ๋ฅผ 2๊ฐœ์˜ FC Layer๋กœ ๊ตฌ์„ฑ
  • $ W_r $ : ๋ฆฌ์…‹ ๊ฒŒ์ดํŠธ(Reset Gate)
  • $ W_z $ : ์—…๋ฐ์ดํŠธ ๊ฒŒ์ดํŠธ(Update Gate)

 

 

 

Reset Gate

  • ๊ธฐ์กด hidden state์˜ ์ •๋ณด๋ฅผ ์–ผ๋งˆ๋‚˜ ์ดˆ๊ธฐํ™”ํ• ์ง€ ๊ฒฐ์ •ํ•˜๋Š” ๊ฒŒ์ดํŠธ
  • $ r_t = \sigma(W_r [h_{t-1}, x_t] ) $

  

 

 

Update Gate

  • ๊ธฐ์กด hidden state์˜ ์ •๋ณด๋ฅผ ์–ผ๋งˆ๋‚˜ ์‚ฌ์šฉํ• ์ง€ ๊ฒฐ์ •ํ•˜๋Š” ๊ฒŒ์ดํŠธ
  • $ z_t = \sigma(W_x[h_{t-1}, x_t] ) $

 

 

 

 

 

์ƒˆ๋กœ์šด hidden state ๊ณ„์‚ฐ ( 2 step์œผ๋กœ ์„ค๋ช…)

 

Step 1)

  • Reset Gate์˜ ๊ฒฐ๊ณผ๋ฅผ ํ†ตํ•ด ์ƒˆ๋กœ์šด hidden state์˜ ํ›„๋ณด $\tilde{h}_t $ ๊ณ„์‚ฐ
  • $ \tilde{h}_t = tanh (W_i [r_t * h_{t-1}, x_t] ) $
    • *๋Š” hadamard product
    • $ r_t * h_{t-1} $ ๋ถ€๋ถ„์ด ์ ์„ ์„ ํ†ตํ•ด $ W_i $๋กœ ์ „๋‹ฌ๋จ

 

Step 2)

  • Update Gate์˜ ๊ฒฐ๊ณผ๋ฅผ ํ†ตํ•ด ์ƒˆ๋กœ์šด hidden state ๊ณ„์‚ฐ
  • $ h_t = (1-z_t) * h_{t-1} + z_t * \tilde{h}_t $
  • Update Gate์˜ ์ •๋ณด $ z_t $๊ฐ€ ์ƒˆ๋กœ์šด hidden state ์ •๋ณด $ \tilde{h}_t $ , ์ด์ „ hidden state ์ •๋ณด $ h_{t-1} $๋ฅผ ์–ผ๋งˆ๋‚˜ ์‚ฌ์šฉํ•  ์ง€ ๊ฒฐ์ •ํ•จ
  • Update Gate์˜ ์ •๋ณด๋งŒ์ด ์ƒˆ๋กœ์šด hidden state ๊ณ„์‚ฐ์— ์‚ฌ์šฉ๋จ
    • GRU์˜ Update Gate๊ฐ€ LSTM์˜ Forget Gate์™€ Input Gate๋ฅผ ํ•˜๋‚˜๋กœ ํ•ฉ์นœ ๊ฒƒ๊ณผ ์œ ์‚ฌํ•œ ์—ญํ• 

 

 

[ ์žฅ๊ธฐ ์˜์กด์„ฑ ๋ฌธ์ œ ํ™•์ธ - Vanilla RNN vs LSTM vs GRU ๋น„๊ต ]

  • RNN์€ ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ์™€ ๊ฐ™์€ ๊ฒฝํ–ฅ์„ฑ์„ ํ•™์Šตํ•˜๋Š” ๋ฐ ํ›Œ๋ฅญํ•œ ์„ฑ๋Šฅ ์ œ๊ณต. ํ•˜์ง€๋งŒ ๋ชจ๋ธ ํ•™์Šต์— ์‚ฌ์šฉํ•˜๋Š” ๋ฐ์ดํ„ฐ์˜ ๊ธธ์ด(sequence)๊ฐ€ ๊ธธ์–ด์งˆ์ˆ˜๋ก ํ•™์Šต ์„ฑ๋Šฅ์ด ์ €ํ•˜๋˜๋Š” ๋ฌธ์ œ ์žˆ์Œ.
  • ๊ฑฐ๋ฆฌ๊ฐ€ ๋จผ ์ž…๋ ฅ๊ฐ’๊ณผ ์ถœ๋ ฅ๊ฐ’ ์‚ฌ์ด์— ์—ญ์ „ํŒŒ ๋˜๋Š” ๊ธฐ์šธ๊ธฐ ๊ฐ’์ด ์ ์  0์— ์ˆ˜๋ ดํ•˜๋Š” ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค ๋ฌธ์ œ ๋•Œ๋ฌธ, ๊ฒฐ๊ณผ์ ์œผ๋กœ ๊ธธ์ด๊ฐ€ ๊ธด ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ์—์„œ ์žฅ๊ธฐ ์˜์กด์„ฑ์„ ํ•™์Šตํ•˜๋Š” ๋ฐ ์•ฝ์ ์ด ์žˆ์Œ -> ์ด๋ฅผ ๋ณด์™„ํ•˜๊ธฐ ์œ„ํ•ด LSTM, GRU ๋ชจ๋ธ ์ œ์•ˆ๋จ
  • ์‹œ๊ณ„์—ญ ๋ฐ์ดํ„ฐ ํšŒ๊ท€ ๋ฌธ์ œ
  • ๋ฐ์ดํ„ฐ ์…‹ : 1980.01.01 ~ 1990.12.31์˜ ๋ฉœ๋ฒ„๋ฅธ ์ง€์—ญ ์ตœ์ € ๊ธฐ์˜จ ๋ฐ์ดํ„ฐ ์…‹
  • SimpleRNN, LSTM, GRU๋Š” ์ž…๋ ฅ ํ˜•ํƒœ๊ฐ€ ๋™์ผํ•จ
 
import tensorflow as tf
from tensorflow.keras import layers, Sequential
from tensorflow.keras.optimizers import Adam

import pandas as pd
import numpy as np

def load_data(window_size):
    raw_data = pd.read_csv("./daily-min-temperatures.csv")
    raw_temps = raw_data["Temp"]

    mean_temp = raw_temps.mean()
    stdv_temp = raw_temps.std(ddof=0)
    raw_temps = (raw_temps - mean_temp) / stdv_temp # normalize the data

    # window size๋งŒํผ x,y ๊ฐ’ ์ €์žฅ
    X, y = [], []
    for i in range(len(raw_temps) - window_size):
        cur_temps = raw_temps[i:i + window_size]
        target = raw_temps[i + window_size]

        X.append(list(cur_temps))
        y.append(target)

    X = np.array(X)
    y = np.array(y)
    X = X[:, :, np.newaxis]

    total_len = len(X)
    train_len = int(total_len * 0.8)

    X_train, y_train = X[:train_len], y[:train_len]
    X_test, y_test = X[train_len:], y[train_len:]

    return X_train, X_test, y_train, y_test


''' 1. SimpleRNN + fully-connected layer model '''
def build_rnn_model(window_size):
    model = Sequential()

    model.add(layers.SimpleRNN(units=128, input_shape=(window_size,1)))
    model.add(layers.Dense(units=32, activation='relu'))
    model.add(layers.Dense(units=1))

    return model


''' 2. LSTM + fully-connected layer model '''
def build_lstm_model(window_size):
    model = Sequential()

    model.add(layers.LSTM(units=128, input_shape=(window_size,1)))
    model.add(layers.Dense(units=32, activation='relu'))
    model.add(layers.Dense(units=1))

    return model

''' 3. GRU + fully-connected layer model '''
def build_gru_model(window_size):
    model = Sequential()

    model.add(layers.GRU(units=128, input_shape=(window_size,1)))
    model.add(layers.Dense(units=32, activation='relu'))
    model.add(layers.Dense(units=1))

    return model

def run_model(model, X_train, X_test, y_train, y_test, epochs=10, model_name=None):
    # ๋ชจ๋ธ ์ตœ์ ํ™”
    optimizer = Adam(learning_rate=0.001)
    model.compile(optimizer=optimizer, loss='mse')

    # ๋ชจ๋ธ ํ•™์Šต(hyperparameter ์„ค์ •)
    hist = model.fit(X_train, y_train, batch_size=64, epochs=epochs, shuffle=True, verbose=2)
    
    # ๋ชจ๋ธ ํ…Œ์ŠคํŠธ
    test_loss = model.evaluate(X_test, y_test, verbose=0)
    
    return test_loss, optimizer, hist

def main(window_size):
    tf.random.set_seed(2022)
    X_train, X_test, y_train, y_test = load_data(window_size)

    rnn_model = build_rnn_model(window_size)
    lstm_model = build_lstm_model(window_size)
    gru_model = build_gru_model(window_size)

    rnn_test_loss, _, _ = run_model(rnn_model, X_train, X_test, y_train, y_test, model_name="RNN")
    lstm_test_loss, _, _ = run_model(lstm_model, X_train, X_test, y_train, y_test, model_name="LSTM")
    gru_test_loss, _, _ = run_model(gru_model, X_train, X_test, y_train, y_test, model_name="GRU")
    
    return rnn_test_loss, lstm_test_loss, gru_test_loss

if __name__ == "__main__":
    # 10์ผ์น˜ ๋ฐ์ดํ„ฐ๋ฅผ ๋ณด๊ณ  ๋‹ค์Œ๋‚ ์˜ ๊ธฐ์˜จ ์˜ˆ์ธก -> window size = 10
    rnn_10_test_loss, lstm_10_test_loss, gru_10_test_loss = main(10)
    
    # 300์ผ์น˜ ๋ฐ์ดํ„ฐ๋ฅผ ๋ณด๊ณ  ๋‹ค์Œ๋‚ ์˜ ๊ธฐ์˜จ ์˜ˆ์ธก -> window size = 300
    rnn_300_test_loss, lstm_300_test_loss, gru_300_test_loss = main(300)
    
    print("=" * 20, "์‹œ๊ณ„์—ด ๊ธธ์ด๊ฐ€ 10 ์ธ ๊ฒฝ์šฐ", "=" * 20)
    print("[RNN ] ํ…Œ์ŠคํŠธ MSE = {:.5f}".format(rnn_10_test_loss))
    print("[LSTM] ํ…Œ์ŠคํŠธ MSE = {:.5f}".format(lstm_10_test_loss))
    print("[GRU ] ํ…Œ์ŠคํŠธ MSE = {:.5f}".format(gru_10_test_loss))
    print()
    
    print("=" * 20, "์‹œ๊ณ„์—ด ๊ธธ์ด๊ฐ€ 300 ์ธ ๊ฒฝ์šฐ", "=" * 20)
    print("[RNN ] ํ…Œ์ŠคํŠธ MSE = {:.5f}".format(rnn_300_test_loss))
    print("[LSTM] ํ…Œ์ŠคํŠธ MSE = {:.5f}".format(lstm_300_test_loss))
    print("[GRU ] ํ…Œ์ŠคํŠธ MSE = {:.5f}".format(gru_300_test_loss))
    print()

 

 

[ ์ฝ”๋“œ ์‹คํ–‰ ๊ฒฐ๊ณผ ]

  • ์‹œ๊ณ„์—ด ๊ธธ์ด(window size)๊ฐ€ 10์ผ ๋•Œ, ๋ชจ๋ธ 3๊ฐœ ์„ฑ๋Šฅ ์ฐจ์ด ๋ณ„๋กœ ์—†์Œ
  • ์‹œ๊ณ„์—ด ๊ธธ์ด(window size)๊ฐ€ 300์ผ ๋•Œ, LSTM๊ณผ GRU๋Š” ์‹œ๊ณ„์—ด ๊ธธ์ด (window size)๊ฐ€ ๊ธธ์–ด๋„ ์„ฑ๋Šฅ์— ์˜ํ–ฅ์„ ๋œ ๋ฐ›์Œ
  • RNN์˜ ๊ฒฝ์šฐ ์žฅ๊ธฐ ์˜์กด์„ฑ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜์ง€ ๋ชปํ•ด์„œ ์„ฑ๋Šฅ์ด ๋‚ฎ์Œ
==================== ์‹œ๊ณ„์—ด ๊ธธ์ด๊ฐ€ 10 ์ธ ๊ฒฝ์šฐ ====================
[RNN ] ํ…Œ์ŠคํŠธ MSE = 0.30041
[LSTM] ํ…Œ์ŠคํŠธ MSE = 0.30050
[GRU ] ํ…Œ์ŠคํŠธ MSE = 0.29302

==================== ์‹œ๊ณ„์—ด ๊ธธ์ด๊ฐ€ 300 ์ธ ๊ฒฝ์šฐ ====================
[RNN ] ํ…Œ์ŠคํŠธ MSE = 0.33759
[LSTM] ํ…Œ์ŠคํŠธ MSE = 0.29616
[GRU ] ํ…Œ์ŠคํŠธ MSE = 0.29959

 

 


 

 

[ RNN ๊ธฐ๋ฐ˜ ๋ชจ๋ธ์„ ํ†ตํ•œ ๋ถ„๋ฅ˜ ์ž‘์—… ]

  • ๋ฌธ์žฅ์„ ํ†ตํ•ด ๋ณ„์  ์˜ˆ์ธก (5๊ฐœ์˜ ํด๋ž˜์Šค ๋ถ„๋ฅ˜ ๋ชจ๋ธ)
  • ๋ฐ์ดํ„ฐ์…‹ : ์•„๋งˆ์กด์˜ ์‹ํ’ˆ ๋ฆฌ๋ทฐ ๋ฐ์ดํ„ฐ (์ „์ฒด ๋ฐ์ดํ„ฐ์…‹์—์„œ ์‹ค์ œ ๋ฆฌ๋ทฐ ๋ฌธ์žฅ๊ณผ ํ•ด๋‹น ๋ฆฌ๋ทฐ์–ด์˜ ํ‰๊ฐ€ ์ ์ˆ˜ ๋งŒ์„ ์ถ”์ถœํ•œ ๋ฐ์ดํ„ฐ ์…‹)
  • ๋ฌธ์žฅ ๋‚ด ๊ฐ ๋‹จ์–ด๋ฅผ ์ˆซ์ž๋กœ ๋ณ€ํ™˜ํ•˜๋Š” Tokenizer ์ ์šฉ
  • ๊ฐ RNN ๋ชจ๋ธ ๊ตฌ์„ฑ (Embedding, RNN, Dense)์€ ๋™์ผํ•˜๊ฒŒ ๊ตฌ์„ฑ
    • 1. Embedding layer : max_features์—์„œ embedding_size๋กœ ๊ฐ ๋ฌธ์žฅ์„ ๊ตฌ์„ฑํ•˜๋Š” ๋ฒกํ„ฐ์˜ ํฌ๊ธฐ๋ฅผ ์ค„์ด๋Š” embedding layer
    • 2. RNN layer : hidden state์˜ ํฌ๊ธฐ๋ฅผ 20์œผ๋กœ ์„ค์ •ํ•œ RNN ๊ธฐ๋ฐ˜ layer
    • 3. Dense layer : ๊ฐ ๊ฒฐ๊ณผ๊ฐ’์„ 5๊ฐœ์˜ ํด๋ž˜์Šค๋กœ ๋ถ„๋ฅ˜ํ•˜๋Š” dense layer
import tensorflow as tf
from tensorflow.keras import layers, Sequential
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split

import pandas as pd


def load_data(max_len):
    data = pd.read_csv("./review_score.csv")
    # ์ž…๋ ฅ ๋ฐ์ดํ„ฐ: ๋ฆฌ๋ทฐ ๋ฌธ์žฅ / ๋ผ๋ฒจ ๋ฐ์ดํ„ฐ: ํ•ด๋‹น ๋ฆฌ๋ทฐ์˜ ํ‰์ 
    X = data['Review']
    y = data['Score']
    y = y - 1 # shift labels from 1-5 to 0-4

    # ๋ฌธ์žฅ ๋‚ด ๊ฐ ๋‹จ์–ด๋ฅผ ์ˆซ์ž๋กœ ๋ณ€ํ™˜ํ•˜๋Š” Tokenizer ์ ์šฉ # ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(X) # build the word -> integer vocabulary
    X = tokenizer.texts_to_sequences(X) # convert each sentence to an integer sequence

    # ์ „์ฒด ๋‹จ์–ด ์ค‘์—์„œ ๊ฐ€์žฅ ํฐ ์ˆซ์ž๋กœ mapping๋œ ๋‹จ์–ด์˜ ์ˆซ์ž ๊ฐ€์ ธ์˜ด
    # ์ฆ‰, max_features๋Š” ์ „์ฒด ๋ฐ์ดํ„ฐ์…‹์— ๋“ฑ์žฅํ•˜๋Š” ๊ฒน์น˜์ง€ ์•Š๋Š” ๋‹จ์–ด์˜ ๊ฐœ์ˆ˜ + 1๊ณผ ๋™์ผ
    max_features = max([max(_in) for _in in X]) + 1

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # ๋ชจ๋“  ๋ฌธ์žฅ๋“ค์„ ๊ฐ€์žฅ ๊ธด ๋ฌธ์žฅ์˜ ๋‹จ์–ด ๊ฐœ์ˆ˜๊ฐ€ ๋˜๊ฒŒ padding ์ถ”๊ฐ€
    X_train = pad_sequences(X_train, maxlen=max_len)
    X_test = pad_sequences(X_test, maxlen=max_len)

    return X_train, X_test, y_train, y_test, max_features


''' 1. SimpleRNN-based model '''
def build_rnn_model(max_features, embedding_size):
    model = Sequential()

    model.add(layers.Embedding(max_features, embedding_size)) 
    model.add(layers.SimpleRNN(units=20)) 
    model.add(layers.Dense(units=5, activation='softmax'))

    return model


''' 2. LSTM ๊ธฐ๋ฐ˜ ๋ชจ๋ธ '''
def build_lstm_model(max_features, embedding_size):
    model = Sequential()

    model.add(layers.Embedding(max_features, embedding_size))
    model.add(layers.LSTM(units=20)) # hidden state size: 20
    model.add(layers.Dense(units=5, activation='softmax'))

    return model


''' 3. GRU ๊ธฐ๋ฐ˜ ๋ชจ๋ธ '''
def build_gru_model(max_features, embedding_size):
    model = Sequential()

    model.add(layers.Embedding(max_features, embedding_size))
    model.add(layers.GRU(units=20)) # hidden state size: 20
    model.add(layers.Dense(units=5, activation='softmax'))

    return model


def run_model(model, X_train, X_test, y_train, y_test, epochs=10):
    # ๋ชจ๋ธ ์ตœ์ ํ™” 
    # label์ด one-hot vector๋กœ ์ด๋ฃจ์–ด์ง„ ๊ฒฝ์šฐ์—๋Š” loss๋กœ categorical_crossentropy ์‚ฌ์šฉ
    # ํ˜„์žฌ ๋ฐ์ดํ„ฐ ์…‹์ด 0-4์˜ ์ •์ˆ˜ ๋ผ๋ฒจ๋กœ ์ด๋ฃจ์–ด์กŒ๊ธฐ ๋•Œ๋ฌธ์— loss๋กœ sparse_categorical_crossentropy ์‚ฌ์šฉํ•จ
    optimizer = Adam(learning_rate=0.001)
    model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

    # ๋ชจ๋ธ ํ•™์Šต
    hist = model.fit(X_train, y_train, batch_size=256,epochs=epochs,shuffle=True, verbose=2)
    
    # ๋ชจ๋ธ ํ…Œ์ŠคํŠธ
    test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)

    return test_loss, test_acc, optimizer, hist

def main():
    tf.random.set_seed(2022)
    max_len = 150
    embedding_size = 128

    X_train, X_test, y_train, y_test, max_features = load_data(max_len)
    rnn_model = build_rnn_model(max_features, embedding_size)
    lstm_model = build_lstm_model(max_features, embedding_size)
    gru_model = build_gru_model(max_features, embedding_size)

    rnn_test_loss, rnn_test_acc, _, _ = run_model(rnn_model, X_train, X_test, y_train, y_test)
    lstm_test_loss, lstm_test_acc, _, _ = run_model(lstm_model, X_train, X_test, y_train, y_test)
    gru_test_loss, gru_test_acc, _, _ = run_model(gru_model, X_train, X_test, y_train, y_test)

    print()
    print("=" * 20, "๋ชจ๋ธ ๋ณ„ Test Loss์™€ ์ •ํ™•๋„", "=" * 20)
    print("[RNN ] ํ…Œ์ŠคํŠธ Loss: {:.5f}, ํ…Œ์ŠคํŠธ Accuracy: {:.3f}%".format(rnn_test_loss, rnn_test_acc * 100))
    print("[LSTM] ํ…Œ์ŠคํŠธ Loss: {:.5f}, ํ…Œ์ŠคํŠธ Accuracy: {:.3f}%".format(lstm_test_loss, lstm_test_acc * 100))
    print("[GRU ] ํ…Œ์ŠคํŠธ Loss: {:.5f}, ํ…Œ์ŠคํŠธ Accuracy: {:.3f}%".format(gru_test_loss, gru_test_acc * 100))

if __name__ == "__main__":
    main()

 

 

 

[ ์ฝ”๋“œ ์ˆ˜ํ–‰ ๊ฒฐ๊ณผ ]

  • ์„ฑ๋Šฅ : LSTM > GRU > Vanilla RNN
==================== ๋ชจ๋ธ ๋ณ„ Test Loss์™€ ์ •ํ™•๋„ ====================
[RNN ] ํ…Œ์ŠคํŠธ Loss: 1.39075, ํ…Œ์ŠคํŠธ Accuracy: 63.370%
[LSTM] ํ…Œ์ŠคํŠธ Loss: 1.11475, ํ…Œ์ŠคํŠธ Accuracy: 67.670%
[GRU ] ํ…Œ์ŠคํŠธ Loss: 1.27642, ํ…Œ์ŠคํŠธ Accuracy: 66.050%

 


 

[ RNN ๊ธฐ๋ฐ˜ ๋ชจ๋ธ์„ ํ†ตํ•œ ํšŒ๊ท€ ์ž‘์—… ]

  • ํ•œ๋‹ฌ ์ด์ƒ์˜ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋งˆ์ง€๋ง‰ ๋‚ ์งœ ๋‹ค์Œ๋‚ ์˜ ์ข…๊ฐ€๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๋ชจ๋ธ
  • ๋ฐ์ดํ„ฐ์…‹ : Apple ์ฃผ๊ฐ€ ๋ฐ์ดํ„ฐ์…‹ - ๋‚ ์งœ๋ณ„๋กœ ์‹œ์ž‘๊ฐ€, ์ผ ์ตœ๊ณ ๊ฐ€, ์ผ ์ตœ์ €๊ฐ€, ์ข…๊ฐ€
    • 1980.12.12 ~ 2020.04.01 ์˜ ์ฃผ๊ฐ€ ๊ธฐ๋ก
    • ๊ฐ ์‹œ์ ์˜ feature ๊ฐœ์ˆ˜๋ฅผ 4๊ฐœ๋กœ ๊ตฌ์„ฑ - ์‹œ์ž‘๊ฐ€, ์ผ ์ตœ๊ณ ๊ฐ€, ์ผ ์ตœ์ €๊ฐ€, ์ข…๊ฐ€
  • RNN ๋ชจ๋ธ ๊ตฌ์„ฑ
    • RNN Layer : hidden_state ํฌ๊ธฐ 256, input_shape=(window_size, num_features)
    • 3๊ฐœ์˜ Dense Layer : ๊ฐ๊ฐ node์˜ ๊ฐœ์ˆ˜ 64, 16, 1๊ฐœ / ํ™œ์„ฑํ™” ํ•จ์ˆ˜ : relu
    • +) ๋ฐ์ดํ„ฐ๊ฐ€ ์ˆ˜์น˜ ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ์ด๋ฏ€๋กœ Embedding layer ํ•„์š” ์—†์Œ
    • ๋ชจ๋ธ์— ํ•œ๋ฒˆ์— ๋„ฃ์–ด์ค„ ์‹œ์ ์˜ ๊ฐœ์ˆ˜, window size๋Š” 30๊ฐœ๋กœ ์„ค์ •
import tensorflow as tf
from tensorflow.keras import layers, Sequential
from tensorflow.keras.optimizers import Adam

from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker


def load_data(window_size):
    raw_data_df = pd.read_csv("./AAPL.csv", index_col="Date")
    
    # ๋ฐ์ดํ„ฐ ํ‘œ์ค€ํ™”
    scaler = StandardScaler()
    raw_data = scaler.fit_transform(raw_data_df)
    plot_data = {"mean": scaler.mean_[3], "var": scaler.var_[3], "date": raw_data_df.index}

    # ์ž…๋ ฅ ๋ฐ์ดํ„ฐ(X): ์‹œ์ž‘๊ฐ€, ์ผ ์ตœ๊ณ ๊ฐ€, ์ผ ์ตœ์ €๊ฐ€, ์ข…๊ฐ€ ๋ฐ์ดํ„ฐ
    # ๋ผ๋ฒจ ๋ฐ์ดํ„ฐ(y): ์ข…๊ฐ€ ๋ฐ์ดํ„ฐ(4๋ฒˆ์งธ ์ปฌ๋Ÿผ)
    raw_X = raw_data[:, :4]
    raw_y = raw_data[:, 3]

    # ๋ฐ์ดํ„ฐ์…‹ ๊ตฌ์„ฑ
    # ์ž…๋ ฅ ๋ฐ์ดํ„ฐ(X): window_size๊ฐœ์˜ ๋ฐ์ดํ„ฐ
    # ์˜ˆ์ธกํ•  ๋Œ€์ƒ(y): window_size๋ณด๋‹ค ํ•œ ์‹œ์  ๋’ค์˜ ๋ฐ์ดํ„ฐ
    X, y = [], []
    for i in range(len(raw_X) - window_size):
        cur_prices = raw_X[i:i + window_size, :]
        target = raw_y[i + window_size]

        X.append(list(cur_prices))
        y.append(target)

    X = np.array(X)
    y = np.array(y)

    # ํ•™์Šต ๋ฐ์ดํ„ฐ 80%, ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ 20%
    total_len = len(X)
    train_len = int(total_len * 0.8)

    X_train, y_train = X[:train_len], y[:train_len]
    X_test, y_test = X[train_len:], y[train_len:]

    return X_train, X_test, y_train, y_test, plot_data


''' SimpleRNN ๊ธฐ๋ฐ˜ ๋ชจ๋ธ '''
def build_rnn_model(window_size, num_features):
    model = Sequential()
    model.add(layers.SimpleRNN(units=256, input_shape=(window_size, num_features)))
    model.add(layers.Dense(units=64, activation='relu'))
    model.add(layers.Dense(units=16, activation='relu'))
    model.add(layers.Dense(units=1)) 

    return model


''' LSTM ๊ธฐ๋ฐ˜ ๋ชจ๋ธ '''
def build_lstm_model(window_size, num_features):
    model = Sequential()
    model.add(layers.LSTM(units=256, input_shape=(window_size, num_features)))
    model.add(layers.Dense(units=64, activation='relu'))
    model.add(layers.Dense(units=16, activation='relu'))
    model.add(layers.Dense(units=1))

    return model


''' GRU ๊ธฐ๋ฐ˜ ๋ชจ๋ธ '''
def build_gru_model(window_size, num_features):
    model = Sequential()
    model.add(layers.GRU(units=256, input_shape=(window_size, num_features)))
    model.add(layers.Dense(units=64, activation='relu'))
    model.add(layers.Dense(units=16, activation='relu'))
    model.add(layers.Dense(units=1))

    return model

def run_model(model, X_train, X_test, y_train, y_test, epochs=10, name=None):
    # ๋ชจ๋ธ ์ตœ์ ํ™”
    optimizer = Adam(learning_rate=0.001)
    model.compile(optimizer=optimizer, loss='mse')

    # ๋ชจ๋ธ ํ•™์Šต(ํ•™์Šต์„ ์œ„ํ•œ hyperparameter ์„ค์ •)
    hist = model.fit(X_train, y_train, batch_size=128, epochs=epochs, shuffle=True, verbose=2)
    
    # ๋ชจ๋ธ ํ…Œ์ŠคํŠธ
    test_loss = model.evaluate(X_test, y_test, verbose=0)
    print("[{}] ํ…Œ์ŠคํŠธ loss: {:.5f}".format(name, test_loss))
    print()

    return optimizer, hist


def plot_result(model, X_true, y_true, plot_data, name):
    y_pred = model.predict(X_true)

    # ํ‘œ์ค€ํ™”๋œ ๊ฒฐ๊ณผ๋ฅผ ์›๋ž˜ ๊ฐ’์œผ๋กœ ๋ณ€ํ™˜
    y_true_orig = (y_true * np.sqrt(plot_data["var"])) + plot_data["mean"]
    y_pred_orig = (y_pred * np.sqrt(plot_data["var"])) + plot_data["mean"]

    # ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ์—์„œ ์‚ฌ์šฉํ•œ ๋‚ ์งœ๋“ค
    test_date = plot_data["date"][-len(y_true):]

    # ๋ชจ๋ธ์˜ ์˜ˆ์ธก๊ฐ’, ์‹ค์ œ๊ฐ’ ๊ทธ๋ž˜ํ”„ ์ƒ์„ฑ
    fig = plt.figure(figsize=(12, 8))
    ax = plt.gca()
    ax.plot(y_true_orig, color="b", label="True")
    ax.plot(y_pred_orig, color="r", label="Prediction")
    ax.set_xticks(list(range(len(test_date))))
    ax.set_xticklabels(test_date, rotation=45)
    ax.xaxis.set_major_locator(ticker.MultipleLocator(100))
    ax.yaxis.set_major_locator(ticker.MultipleLocator(100))
    ax.set_title("{} Result".format(name))
    ax.legend(loc="upper left")
    plt.tight_layout()
    plt.savefig("apple_stock_{}.png".format(name.lower()))
    plt.close(fig)

def main():
    tf.random.set_seed(2022)

    window_size = 30
    X_train, X_test, y_train, y_test, plot_data = load_data(window_size)
    num_features = X_train[0].shape[1]

    rnn_model = build_rnn_model(window_size, num_features)
    lstm_model = build_lstm_model(window_size, num_features)
    gru_model = build_gru_model(window_size, num_features)

    run_model(rnn_model, X_train, X_test, y_train, y_test, name="RNN")
    run_model(lstm_model, X_train, X_test, y_train, y_test, name="LSTM")
    run_model(gru_model, X_train, X_test, y_train, y_test, name="GRU")

    plot_result(rnn_model, X_test, y_test, plot_data, name="RNN")
    plot_result(lstm_model, X_test, y_test, plot_data, name="LSTM")
    plot_result(gru_model, X_test, y_test, plot_data, name="GRU")

if __name__ == "__main__":
    main()

 

[ ์ฝ”๋“œ ์ˆ˜ํ–‰ ๊ฒฐ๊ณผ ]

  • Loss, ์˜ˆ์ธก๊ฒฐ๊ณผ ๊ทธ๋ž˜ํ”„์—์„œ ์„ฑ๋Šฅ : GRU > LSTM > RNN ์ˆœ์œผ๋กœ ์ข‹์Œ
  • GRU, LSTM์— ๋น„ํ•ด RNN์˜ ์„ฑ๋Šฅ ํ˜„์ €ํžˆ ๋–จ์–ด์ง -> GRU, LSTM์€ ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค ๋ฌธ์ œ ํ•ด๊ฒฐ๋กœ ์„ฑ๋Šฅ์ด ๋” ์ข‹์Œ
[RNN] ํ…Œ์ŠคํŠธ loss: 0.90601
[LSTM] ํ…Œ์ŠคํŠธ loss: 0.08191
[GRU] ํ…Œ์ŠคํŠธ loss: 0.03845

 

RNN ๊ธฐ๋ฐ˜ ๋ชจ๋ธ ์˜ˆ์ธก ๊ฒฐ๊ณผ ๊ทธ๋ž˜ํ”„

 


3. RNN ๋ชจ๋ธ์˜ ํ™œ์šฉ

 

RNN/LSTM/GRU๋ชจ๋ธ์€ ํšŒ๊ท€๋ถ„์„๊ณผ ๋ถ„๋ฅ˜์— ๋ชจ๋‘ ํ™œ์šฉ ๊ฐ€๋Šฅ

  • ํšŒ๊ท€ ๋ถ„์„ : ๊ฐ ์‹œ์ ์˜ ์ถœ๋ ฅ๊ฐ’์ด ์–ด๋А ์ •๋„์ผ์ง€ ์˜ˆ์ธก (์˜ˆ: ์ฃผ๊ฐ€ ์˜ˆ์ธก, ๊ธฐ์˜จ ์˜ˆ์ธก )
  • ๋ถ„๋ฅ˜ ์ž‘์—… : ๊ฐ ์‹œ์ ์˜ ๋ฐ์ดํ„ฐ๊ฐ€ ์–ด๋А ํด๋ž˜์Šค์ผ์ง€ ์˜ˆ์ธก (์˜ˆ: ๋ฌธ์žฅ์—์„œ ๋‹ค์Œ ๋‹จ์–ด ์˜ˆ์ธก, ๊ฐ ๋‹จ์–ด์˜ ํ’ˆ์‚ฌ ์˜ˆ์ธก)

 

๋ชจ๋ธ ํ•™์Šต์„ ์œ„ํ•œ ์†์‹ค ํ•จ์ˆ˜ ๊ณ„์‚ฐ

  • ๊ฐ ์‹œ์ ๋ณ„ ์˜ˆ์ธก๊ฐ’ $\hat{y}_t$์™€ ์‹ค์ œ๊ฐ’ $y_t$ ์„ ํ†ตํ•ด ์‹œ์ ๋ณ„ ์†์‹ค ํ•จ์ˆ˜๊ฐ’ ๊ณ„์‚ฐ -> $L_t$
  • $L_t$๋ฅผ ๋ชจ๋‘ ๋”ํ•˜์—ฌ ์ตœ์ข… ์†์‹ค๊ฐ’ ๊ณ„์‚ฐ -> $L  = \sum_{t=1}^{T} L_t$
  • ์ฃผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ์†์‹คํ•จ์ˆ˜(f) 
    • ํšŒ๊ท€๋ถ„์„ : Mean Squared Error(MSE)
    • ๋ถ„๋ฅ˜ : Cross Entropy

 
