์ผ | ์ | ํ | ์ | ๋ชฉ | ๊ธ | ํ |
---|---|---|---|---|---|---|
1 | 2 | 3 | ||||
4 | 5 | 6 | 7 | 8 | 9 | 10 |
11 | 12 | 13 | 14 | 15 | 16 | 17 |
18 | 19 | 20 | 21 | 22 | 23 | 24 |
25 | 26 | 27 | 28 | 29 | 30 | 31 |
- electrochemical models
- m1 anaconda ์ค์น
- gradient descent
- ์๊ทน์ฌ
- ์ค์คํธ๋ฆฌ์
- anaconda ๊ฐ์ํ๊ฒฝ
- set method
- ์ด์ฐจ์ ์ง
- Machine learning
- ๋ฏธ๋์์ ํด์ธ๊ตํ
- Deeplearning
- ์ฒญ์ถ ํ์ดํ
- ๊ตํํ์
- ํน๋ณ ๋ฉ์๋
- fluent python
- Python
- ์ ๋ฝ ๊ตํํ์
- set add
- li-ion
- ๋์23์ด
- ๋ฏธ๋์์ ์ฅํ์
- Andrew ng
- Linear Regression
- ์ ๋ฝ
- ๋ฅ๋ฌ๋
- 2022๋
- cost function
- special method
- ์ ํํ๊ท
- fatigue fracture
- Today
- Total
Done is Better Than Perfect
[๋ฅ๋ฌ๋] 9. LSTM, GRU ๋ณธ๋ฌธ
Vanilla RNN์ ๋จ์ ์ ํด๊ฒฐํ๊ธฐ ์ํด ์ ์๋ ๋ชจ๋ธ์ธ LSTM๊ณผ GRU์ ๋ํด ์์๋ณด๊ฒ ๋ค.
LSTM๊ณผ GRU๋ ๋ด๋ถ ์ฐ์ฐ ๋ฐฉ์๋ง Vanilla RNN๊ณผ ๋ค๋ฅด๋ค. ์ฆ, ์ ๋ ฅ๊ฐ๊ณผ ์ถ๋ ฅ๊ฐ์ Vanilla RNN์ ๋์ผํ๊ฒ ์ฌ์ฉํ๋ฉด ๋๋ค.
[ ๋ชฉ์ฐจ ]
1. LSTM ์๊ฐ
2. GRU ์๊ฐ
3. RNN ๋ชจ๋ธ์ ํ์ฉ
1. LSTM
- Vanilla RNN์ ๊ธฐ์ธ๊ธฐ ์์ค ๋ฌธ์ ๋ฅผ ํด๊ฒฐํ๊ณ ์ ๋ฑ์ฅ
- Long Short Term Memory(์ฅ๋จ๊ธฐ ๋ฉ๋ชจ๋ฆฌ)์ ์ฝ์ → ์ฅ๊ธฐ ์์กด์ฑ๊ณผ ๋จ๊ธฐ ์์กด์ฑ์ ๋ชจ๋ ๊ธฐ์ตํ ์ ์์
- ์๋ก ๊ณ์ฐ๋ hidden state $h_t$ ๋ฅผ ์ถ๋ ฅ๊ฐ $y_t$ ์ผ๋ก๋ ์ฌ์ฉ
- LSTM์ ๊ตฌ์ฑ์์ : cell state, forget gate, input gate, output gate
LSTM์ ๊ตฌ์ฑ์์
Cell state
- ๊ธฐ์ธ๊ธฐ ์์ค ๋ฌธ์ ๋ฅผ ํด๊ฒฐํ๊ธฐ ์ํ ํต์ฌ ์ฅ์น
- ์ฅ๊ธฐ์ ์ผ๋ก ๊ธฐ์ตํ ์ ๋ณด๋ฅผ ์กฐ์
Gates
Three kinds of gates, built from four FC (fully-connected) layers:
- $W_f$: forget gate
- $W_i, W_C$: input gate
- $W_o$: output gate
Forget Gate
- ๊ธฐ์กด cell state์์ ์์ ์ ๋ณด๋ฅผ ๊ฒฐ์
- $f_t = \sigma (W_f[h_{t-1}, x_t]$
- $ \sigma $ : sigmoid ํจ์
- $ [h_{t-1}, x_t] $: $ h_{t-1}$ ๋ฒกํฐ์ $ x_t $ ๋ฒกํฐ๋ฅผ concatenateํ๋์ฐ์ฐ
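To make the dimensions concrete, here is a minimal NumPy shape check of the concatenation and the forget gate; the sizes `hidden_dim` and `input_dim` are illustrative assumptions, and bias terms are omitted:

```python
import numpy as np

# Illustrative sizes, not from the post: hidden state of 4, input of 3.
hidden_dim, input_dim = 4, 3
h_prev = np.zeros(hidden_dim)                         # h_{t-1}, shape (4,)
x_t = np.ones(input_dim)                              # x_t, shape (3,)
hx = np.concatenate([h_prev, x_t])                    # [h_{t-1}, x_t], shape (7,)

W_f = np.zeros((hidden_dim, hidden_dim + input_dim))  # forget-gate weights
f_t = 1.0 / (1.0 + np.exp(-(W_f @ hx)))               # sigmoid(W_f [h, x]), shape (4,)
print(hx.shape, f_t.shape)                            # (7,) (4,)
```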
Input Gate
- ํ์ฌ ์
๋ ฅ ๋ฐ์ ์ ๋ณด์์ cell state์
์ ์ฅํ ์ ๋ณด ๊ฒฐ์ - $ i_t = \sigma(W_i [h_{t-1}, x_t]) $
- $ \tilde{ C } = tanh(W_c[h_{t-1}, x_t ]) $
์๋ก์ด Cell state
- Forget Gate์ Input Gate์ ์ ๋ณด๋ฅผ
ํตํด cell state ๊ฐฑ์ - $ C_t = f_t * C_{t-1} + i_t * \tilde{C}_t $
- * ์ฐ์ฐ์๋ ๋ฒกํฐ์ ๊ฐ ์์๋ณ๋ก ๊ณฑํ๋ ์ฐ์ฐ (Hadamard Product)
- ์) [1 2 3] * [4 5 6] = [4 10 18]
Output Gate
- ๋ค์ hidden state $ h_t $์ ์ถ๋ ฅ๊ฐ $ y_t $์ ๊ณ์ฐ
(์๋ก ๊ณ์ฐ๋ cell state ์ฌ์ฉ) - $ \o_t = \sigma(W_o[h_{t-1}, x_t]) $
- $ h_t = o_t * tanh(C_t) = y_t $
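Putting the four equations together, a minimal single-step LSTM sketch in NumPy might look like the following. Everything here (sizes, random weights, the absence of bias terms) is an illustrative assumption, not the exact setup of any library:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_dim, input_dim = 4, 3                  # illustrative sizes
rng = np.random.default_rng(0)

# One weight matrix per gate, each acting on the concatenated [h_{t-1}, x_t].
W_f = rng.normal(size=(hidden_dim, hidden_dim + input_dim))  # forget gate
W_i = rng.normal(size=(hidden_dim, hidden_dim + input_dim))  # input gate
W_C = rng.normal(size=(hidden_dim, hidden_dim + input_dim))  # candidate cell state
W_o = rng.normal(size=(hidden_dim, hidden_dim + input_dim))  # output gate

def lstm_step(x_t, h_prev, C_prev):
    hx = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ hx)                   # what to erase from C_{t-1}
    i_t = sigmoid(W_i @ hx)                   # what to write into the cell state
    C_tilde = np.tanh(W_C @ hx)               # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde        # * is the Hadamard product
    o_t = sigmoid(W_o @ hx)
    h_t = o_t * np.tanh(C_t)                  # also used as the output y_t
    return h_t, C_t

h, C = np.zeros(hidden_dim), np.zeros(hidden_dim)
h, C = lstm_step(rng.normal(size=input_dim), h, C)
print(h.shape, C.shape)                       # (4,) (4,)
```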
2. GRU
- Short for Gated Recurrent Unit
- A lighter-weight version of the LSTM
- Like the LSTM, the newly computed hidden state $h_t$ is also used as the output $y_t$
- Simplifies the LSTM's three gates down to two and does not use a cell state
- Has fewer parameters, so it trains faster than an LSTM (see the parameter-count sketch below)
- Even so, its performance is generally on par with the LSTM's
GRU์ ๊ตฌ์ฑ์์
- 2์ข ๋ฅ์ ๊ฒ์ดํธ๋ฅผ 2๊ฐ์ FC Layer๋ก ๊ตฌ์ฑ
- $ W_r $ : ๋ฆฌ์ ๊ฒ์ดํธ(Reset Gate)
- $ W_z $ : ์ ๋ฐ์ดํธ ๊ฒ์ดํธ(Update Gate)
Reset Gate
- ๊ธฐ์กด hidden state์ ์ ๋ณด๋ฅผ ์ผ๋ง๋ ์ด๊ธฐํํ ์ง ๊ฒฐ์ ํ๋ ๊ฒ์ดํธ
- $ r_t = \sigma(W_r [h_{t-1}, x_t] ) $
Update Gate
- ๊ธฐ์กด hidden state์ ์ ๋ณด๋ฅผ ์ผ๋ง๋ ์ฌ์ฉํ ์ง ๊ฒฐ์ ํ๋ ๊ฒ์ดํธ
- $ z_t = \sigma(W_x[h_{t-1}, x_t] ) $
์๋ก์ด hidden state ๊ณ์ฐ ( 2 step์ผ๋ก ์ค๋ช )
Step 1)
- Reset Gate์ ๊ฒฐ๊ณผ๋ฅผ ํตํด ์๋ก์ด hidden state์ ํ๋ณด $\tilde{h}_t $ ๊ณ์ฐ
- $ \tilde{h}_t = tanh (W_i [r_t * h_{t-1}, x_t] ) $
- *๋ hadamard product
- $ r_t * h_{t-1} $ ๋ถ๋ถ์ด ์ ์ ์ ํตํด $ W_i $๋ก ์ ๋ฌ๋จ
Step 2)
- Use the update gate's output to compute the new hidden state
- $h_t = (1-z_t) * h_{t-1} + z_t * \tilde{h}_t$
- The update gate's output $z_t$ decides how much of the candidate $\tilde{h}_t$ and how much of the previous hidden state $h_{t-1}$ to use
- Only the update gate's output is used in computing the new hidden state
- The GRU's update gate plays a role similar to the LSTM's forget gate and input gate merged into one (see the single-step sketch below)
[ ์ฅ๊ธฐ ์์กด์ฑ ๋ฌธ์ ํ์ธ - Vanilla RNN vs LSTM vs GRU ๋น๊ต ]
- RNN์ ์๊ณ์ด ๋ฐ์ดํฐ์ ๊ฐ์ ๊ฒฝํฅ์ฑ์ ํ์ตํ๋ ๋ฐ ํ๋ฅญํ ์ฑ๋ฅ ์ ๊ณต. ํ์ง๋ง ๋ชจ๋ธ ํ์ต์ ์ฌ์ฉํ๋ ๋ฐ์ดํฐ์ ๊ธธ์ด(sequence)๊ฐ ๊ธธ์ด์ง์๋ก ํ์ต ์ฑ๋ฅ์ด ์ ํ๋๋ ๋ฌธ์ ์์.
- ๊ฑฐ๋ฆฌ๊ฐ ๋จผ ์ ๋ ฅ๊ฐ๊ณผ ์ถ๋ ฅ๊ฐ ์ฌ์ด์ ์ญ์ ํ ๋๋ ๊ธฐ์ธ๊ธฐ ๊ฐ์ด ์ ์ 0์ ์๋ ดํ๋ ๊ธฐ์ธ๊ธฐ ์์ค ๋ฌธ์ ๋๋ฌธ, ๊ฒฐ๊ณผ์ ์ผ๋ก ๊ธธ์ด๊ฐ ๊ธด ์๊ณ์ด ๋ฐ์ดํฐ์์ ์ฅ๊ธฐ ์์กด์ฑ์ ํ์ตํ๋ ๋ฐ ์ฝ์ ์ด ์์ -> ์ด๋ฅผ ๋ณด์ํ๊ธฐ ์ํด LSTM, GRU ๋ชจ๋ธ ์ ์๋จ
- ์๊ณ์ญ ๋ฐ์ดํฐ ํ๊ท ๋ฌธ์
- ๋ฐ์ดํฐ ์ : 1980.01.01 ~ 1990.12.31์ ๋ฉ๋ฒ๋ฅธ ์ง์ญ ์ต์ ๊ธฐ์จ ๋ฐ์ดํฐ ์
- SimpleRNN, LSTM, GRU๋ ์ ๋ ฅ ํํ๊ฐ ๋์ผํจ
```python
import tensorflow as tf
from tensorflow.keras import layers, Sequential
from tensorflow.keras.optimizers import Adam
import pandas as pd
import numpy as np

def load_data(window_size):
    raw_data = pd.read_csv("./daily-min-temperatures.csv")
    raw_temps = raw_data["Temp"]

    # Normalize the data
    mean_temp = raw_temps.mean()
    stdv_temp = raw_temps.std(ddof=0)
    raw_temps = (raw_temps - mean_temp) / stdv_temp

    # Store (X, y) pairs, one window at a time
    X, y = [], []
    for i in range(len(raw_temps) - window_size):
        cur_temps = raw_temps[i:i + window_size]
        target = raw_temps[i + window_size]
        X.append(list(cur_temps))
        y.append(target)

    X = np.array(X)
    y = np.array(y)
    X = X[:, :, np.newaxis]

    # 80% train / 20% test split
    total_len = len(X)
    train_len = int(total_len * 0.8)
    X_train, y_train = X[:train_len], y[:train_len]
    X_test, y_test = X[train_len:], y[train_len:]

    return X_train, X_test, y_train, y_test

''' 1. SimpleRNN + fully-connected layer model '''
def build_rnn_model(window_size):
    model = Sequential()
    model.add(layers.SimpleRNN(units=128, input_shape=(window_size, 1)))
    model.add(layers.Dense(units=32, activation='relu'))
    model.add(layers.Dense(units=1))
    return model

''' 2. LSTM + fully-connected layer model '''
def build_lstm_model(window_size):
    model = Sequential()
    model.add(layers.LSTM(units=128, input_shape=(window_size, 1)))
    model.add(layers.Dense(units=32, activation='relu'))
    model.add(layers.Dense(units=1))
    return model

''' 3. GRU + fully-connected layer model '''
def build_gru_model(window_size):
    model = Sequential()
    model.add(layers.GRU(units=128, input_shape=(window_size, 1)))
    model.add(layers.Dense(units=32, activation='relu'))
    model.add(layers.Dense(units=1))
    return model

def run_model(model, X_train, X_test, y_train, y_test, epochs=10, model_name=None):
    # Model optimization
    optimizer = Adam(learning_rate=0.001)
    model.compile(optimizer=optimizer, loss='mse')

    # Model training (hyperparameter settings)
    hist = model.fit(X_train, y_train, batch_size=64, epochs=epochs, shuffle=True, verbose=2)

    # Model evaluation
    test_loss = model.evaluate(X_test, y_test, verbose=0)
    return test_loss, optimizer, hist

def main(window_size):
    tf.random.set_seed(2022)
    X_train, X_test, y_train, y_test = load_data(window_size)

    rnn_model = build_rnn_model(window_size)
    lstm_model = build_lstm_model(window_size)
    gru_model = build_gru_model(window_size)

    rnn_test_loss, _, _ = run_model(rnn_model, X_train, X_test, y_train, y_test, model_name="RNN")
    lstm_test_loss, _, _ = run_model(lstm_model, X_train, X_test, y_train, y_test, model_name="LSTM")
    gru_test_loss, _, _ = run_model(gru_model, X_train, X_test, y_train, y_test, model_name="GRU")

    return rnn_test_loss, lstm_test_loss, gru_test_loss

if __name__ == "__main__":
    # Predict the next day's temperature from 10 days of data -> window size = 10
    rnn_10_test_loss, lstm_10_test_loss, gru_10_test_loss = main(10)

    # Predict the next day's temperature from 300 days of data -> window size = 300
    rnn_300_test_loss, lstm_300_test_loss, gru_300_test_loss = main(300)

    print("=" * 20, "Sequence length 10", "=" * 20)
    print("[RNN ] Test MSE = {:.5f}".format(rnn_10_test_loss))
    print("[LSTM] Test MSE = {:.5f}".format(lstm_10_test_loss))
    print("[GRU ] Test MSE = {:.5f}".format(gru_10_test_loss))
    print()
    print("=" * 20, "Sequence length 300", "=" * 20)
    print("[RNN ] Test MSE = {:.5f}".format(rnn_300_test_loss))
    print("[LSTM] Test MSE = {:.5f}".format(lstm_300_test_loss))
    print("[GRU ] Test MSE = {:.5f}".format(gru_300_test_loss))
    print()
```
[ ์ฝ๋ ์คํ ๊ฒฐ๊ณผ ]
- ์๊ณ์ด ๊ธธ์ด(window size)๊ฐ 10์ผ ๋, ๋ชจ๋ธ 3๊ฐ ์ฑ๋ฅ ์ฐจ์ด ๋ณ๋ก ์์
- ์๊ณ์ด ๊ธธ์ด(window size)๊ฐ 300์ผ ๋, LSTM๊ณผ GRU๋ ์๊ณ์ด ๊ธธ์ด (window size)๊ฐ ๊ธธ์ด๋ ์ฑ๋ฅ์ ์ํฅ์ ๋ ๋ฐ์
- RNN์ ๊ฒฝ์ฐ ์ฅ๊ธฐ ์์กด์ฑ ๋ฌธ์ ๋ฅผ ํด๊ฒฐํ์ง ๋ชปํด์ ์ฑ๋ฅ์ด ๋ฎ์
==================== ์๊ณ์ด ๊ธธ์ด๊ฐ 10 ์ธ ๊ฒฝ์ฐ ====================
[RNN ] ํ
์คํธ MSE = 0.30041
[LSTM] ํ
์คํธ MSE = 0.30050
[GRU ] ํ
์คํธ MSE = 0.29302
==================== ์๊ณ์ด ๊ธธ์ด๊ฐ 300 ์ธ ๊ฒฝ์ฐ ====================
[RNN ] ํ
์คํธ MSE = 0.33759
[LSTM] ํ
์คํธ MSE = 0.29616
[GRU ] ํ
์คํธ MSE = 0.29959
[ RNN ๊ธฐ๋ฐ ๋ชจ๋ธ์ ํตํ ๋ถ๋ฅ ์์ ]
- ๋ฌธ์ฅ์ ํตํด ๋ณ์ ์์ธก (5๊ฐ์ ํด๋์ค ๋ถ๋ฅ ๋ชจ๋ธ)
- ๋ฐ์ดํฐ์ : ์๋ง์กด์ ์ํ ๋ฆฌ๋ทฐ ๋ฐ์ดํฐ (์ ์ฒด ๋ฐ์ดํฐ์ ์์ ์ค์ ๋ฆฌ๋ทฐ ๋ฌธ์ฅ๊ณผ ํด๋น ๋ฆฌ๋ทฐ์ด์ ํ๊ฐ ์ ์ ๋ง์ ์ถ์ถํ ๋ฐ์ดํฐ ์ )
- ๋ฌธ์ฅ ๋ด ๊ฐ ๋จ์ด๋ฅผ ์ซ์๋ก ๋ณํํ๋ Tokenizer ์ ์ฉ
- ๊ฐ RNN ๋ชจ๋ธ ๊ตฌ์ฑ (Embedding, RNN, Dense)์ ๋์ผํ๊ฒ ๊ตฌ์ฑ
- 1. Embedding layer : max_features์์ embedding_size๋ก ๊ฐ ๋ฌธ์ฅ์ ๊ตฌ์ฑํ๋ ๋ฒกํฐ์ ํฌ๊ธฐ๋ฅผ ์ค์ด๋ embedding layer
- 2. RNN layer : hidden state์ ํฌ๊ธฐ๋ฅผ 20์ผ๋ก ์ค์ ํ RNN ๊ธฐ๋ฐ layer
- 3. Dense layer : ๊ฐ ๊ฒฐ๊ณผ๊ฐ์ 5๊ฐ์ ํด๋์ค๋ก ๋ถ๋ฅํ๋ dense layer
```python
import tensorflow as tf
from tensorflow.keras import layers, Sequential
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split
import pandas as pd

def load_data(max_len):
    data = pd.read_csv("./review_score.csv")

    # Input data: review sentences / label data: the rating of each review
    X = data['Review']
    y = data['Score']
    y = y - 1  # shift labels from 1-5 to 0-4

    # Apply a Tokenizer that maps each word in a sentence to an integer  # NLP preprocessing
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(X)            # build the word -> integer mapping
    X = tokenizer.texts_to_sequences(X)  # sentences as integer sequences

    # Take the largest integer any word was mapped to.
    # I.e. max_features equals the number of distinct words in the dataset + 1.
    max_features = max([max(_in) for _in in X]) + 1

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # Pad every sentence to the length of the longest one
    X_train = pad_sequences(X_train, maxlen=max_len)
    X_test = pad_sequences(X_test, maxlen=max_len)

    return X_train, X_test, y_train, y_test, max_features

''' 1. SimpleRNN-based model '''
def build_rnn_model(max_features, embedding_size):
    model = Sequential()
    model.add(layers.Embedding(max_features, embedding_size))
    model.add(layers.SimpleRNN(units=20))
    model.add(layers.Dense(units=5, activation='softmax'))
    return model

''' 2. LSTM-based model '''
def build_lstm_model(max_features, embedding_size):
    model = Sequential()
    model.add(layers.Embedding(max_features, embedding_size))
    model.add(layers.LSTM(units=20))  # hidden state size: 20
    model.add(layers.Dense(units=5, activation='softmax'))
    return model

''' 3. GRU-based model '''
def build_gru_model(max_features, embedding_size):
    model = Sequential()
    model.add(layers.Embedding(max_features, embedding_size))
    model.add(layers.GRU(units=20))  # hidden state size: 20
    model.add(layers.Dense(units=5, activation='softmax'))
    return model

def run_model(model, X_train, X_test, y_train, y_test, epochs=10):
    # Model optimization
    # With one-hot labels you would use categorical_crossentropy as the loss;
    # since the labels here are the integers 0-4, use sparse_categorical_crossentropy.
    optimizer = Adam(learning_rate=0.001)
    model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

    # Model training
    hist = model.fit(X_train, y_train, batch_size=256, epochs=epochs, shuffle=True, verbose=2)

    # Model evaluation
    test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
    return test_loss, test_acc, optimizer, hist

def main():
    tf.random.set_seed(2022)
    max_len = 150
    embedding_size = 128

    X_train, X_test, y_train, y_test, max_features = load_data(max_len)

    rnn_model = build_rnn_model(max_features, embedding_size)
    lstm_model = build_lstm_model(max_features, embedding_size)
    gru_model = build_gru_model(max_features, embedding_size)

    rnn_test_loss, rnn_test_acc, _, _ = run_model(rnn_model, X_train, X_test, y_train, y_test)
    lstm_test_loss, lstm_test_acc, _, _ = run_model(lstm_model, X_train, X_test, y_train, y_test)
    gru_test_loss, gru_test_acc, _, _ = run_model(gru_model, X_train, X_test, y_train, y_test)

    print()
    print("=" * 20, "Test loss and accuracy per model", "=" * 20)
    print("[RNN ] Test Loss: {:.5f}, Test Accuracy: {:.3f}%".format(rnn_test_loss, rnn_test_acc * 100))
    print("[LSTM] Test Loss: {:.5f}, Test Accuracy: {:.3f}%".format(lstm_test_loss, lstm_test_acc * 100))
    print("[GRU ] Test Loss: {:.5f}, Test Accuracy: {:.3f}%".format(gru_test_loss, gru_test_acc * 100))

if __name__ == "__main__":
    main()
```
[ ์ฝ๋ ์ํ ๊ฒฐ๊ณผ ]
- ์ฑ๋ฅ : LSTM > GRU > Vanilla RNN
==================== ๋ชจ๋ธ ๋ณ Test Loss์ ์ ํ๋ ====================
[RNN ] ํ
์คํธ Loss: 1.39075, ํ
์คํธ Accuracy: 63.370%
[LSTM] ํ
์คํธ Loss: 1.11475, ํ
์คํธ Accuracy: 67.670%
[GRU ] ํ
์คํธ Loss: 1.27642, ํ
์คํธ Accuracy: 66.050%
[ RNN ๊ธฐ๋ฐ ๋ชจ๋ธ์ ํตํ ํ๊ท ์์ ]
- ํ๋ฌ ์ด์์ ์ ๋ ฅ ๋ฐ์ดํฐ๋ฅผ ์ฌ์ฉํ์ฌ ๋ง์ง๋ง ๋ ์ง ๋ค์๋ ์ ์ข ๊ฐ๋ฅผ ์์ธกํ๋ ๋ชจ๋ธ
- ๋ฐ์ดํฐ์
: Apple ์ฃผ๊ฐ ๋ฐ์ดํฐ์
- ๋ ์ง๋ณ๋ก ์์๊ฐ, ์ผ ์ต๊ณ ๊ฐ, ์ผ ์ต์ ๊ฐ, ์ข
๊ฐ
- 1980.12.12 ~ 2020.04.01 ์ ์ฃผ๊ฐ ๊ธฐ๋ก
- ๊ฐ ์์ ์ feature ๊ฐ์๋ฅผ 4๊ฐ๋ก ๊ตฌ์ฑ - ์์๊ฐ, ์ผ ์ต๊ณ ๊ฐ, ์ผ ์ต์ ๊ฐ, ์ข ๊ฐ
- RNN ๋ชจ๋ธ ๊ตฌ์ฑ
- RNN Layer : hidden_state ํฌ๊ธฐ 256, input_shape=(window_size, num_features)
- 3๊ฐ์ Dense Layer : ๊ฐ๊ฐ node์ ๊ฐ์ 64, 16, 1๊ฐ / ํ์ฑํ ํจ์ : relu
- +) ๋ฐ์ดํฐ๊ฐ ์์น ์๊ณ์ด ๋ฐ์ดํฐ์ด๋ฏ๋ก Embedding layer ํ์ ์์
- ๋ชจ๋ธ์ ํ๋ฒ์ ๋ฃ์ด์ค ์์ ์ ๊ฐ์, window size๋ 30๊ฐ๋ก ์ค์
```python
import tensorflow as tf
from tensorflow.keras import layers, Sequential
from tensorflow.keras.optimizers import Adam
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

def load_data(window_size):
    raw_data_df = pd.read_csv("./AAPL.csv", index_col="Date")

    # Standardize the data
    scaler = StandardScaler()
    raw_data = scaler.fit_transform(raw_data_df)
    plot_data = {"mean": scaler.mean_[3], "var": scaler.var_[3], "date": raw_data_df.index}

    # Input data (X): open, daily high, daily low, close
    # Label data (y): close (4th column)
    raw_X = raw_data[:, :4]
    raw_y = raw_data[:, 3]

    # Build the dataset
    # Input (X): window_size consecutive timesteps
    # Target (y): the value one timestep after the window
    X, y = [], []
    for i in range(len(raw_X) - window_size):
        cur_prices = raw_X[i:i + window_size, :]
        target = raw_y[i + window_size]
        X.append(list(cur_prices))
        y.append(target)

    X = np.array(X)
    y = np.array(y)

    # 80% train / 20% test split
    total_len = len(X)
    train_len = int(total_len * 0.8)
    X_train, y_train = X[:train_len], y[:train_len]
    X_test, y_test = X[train_len:], y[train_len:]

    return X_train, X_test, y_train, y_test, plot_data

''' SimpleRNN-based model '''
def build_rnn_model(window_size, num_features):
    model = Sequential()
    model.add(layers.SimpleRNN(units=256, input_shape=(window_size, num_features)))
    model.add(layers.Dense(units=64, activation='relu'))
    model.add(layers.Dense(units=16, activation='relu'))
    model.add(layers.Dense(units=1))
    return model

''' LSTM-based model '''
def build_lstm_model(window_size, num_features):
    model = Sequential()
    model.add(layers.LSTM(units=256, input_shape=(window_size, num_features)))
    model.add(layers.Dense(units=64, activation='relu'))
    model.add(layers.Dense(units=16, activation='relu'))
    model.add(layers.Dense(units=1))
    return model

''' GRU-based model '''
def build_gru_model(window_size, num_features):
    model = Sequential()
    model.add(layers.GRU(units=256, input_shape=(window_size, num_features)))
    model.add(layers.Dense(units=64, activation='relu'))
    model.add(layers.Dense(units=16, activation='relu'))
    model.add(layers.Dense(units=1))
    return model

def run_model(model, X_train, X_test, y_train, y_test, epochs=10, name=None):
    # Model optimization
    optimizer = Adam(learning_rate=0.001)
    model.compile(optimizer=optimizer, loss='mse')

    # Model training (hyperparameters for training)
    hist = model.fit(X_train, y_train, batch_size=128, epochs=epochs, shuffle=True, verbose=2)

    # Model evaluation
    test_loss = model.evaluate(X_test, y_test, verbose=0)
    print("[{}] Test loss: {:.5f}".format(name, test_loss))
    print()
    return optimizer, hist

def plot_result(model, X_true, y_true, plot_data, name):
    y_pred = model.predict(X_true)

    # Undo the standardization to recover the original scale
    y_true_orig = (y_true * np.sqrt(plot_data["var"])) + plot_data["mean"]
    y_pred_orig = (y_pred * np.sqrt(plot_data["var"])) + plot_data["mean"]

    # Dates covered by the test data
    test_date = plot_data["date"][-len(y_true):]

    # Plot the model's predictions against the actual values
    fig = plt.figure(figsize=(12, 8))
    ax = plt.gca()
    ax.plot(y_true_orig, color="b", label="True")
    ax.plot(y_pred_orig, color="r", label="Prediction")
    ax.set_xticks(list(range(len(test_date))))
    ax.set_xticklabels(test_date, rotation=45)
    ax.xaxis.set_major_locator(ticker.MultipleLocator(100))
    ax.yaxis.set_major_locator(ticker.MultipleLocator(100))
    ax.set_title("{} Result".format(name))
    ax.legend(loc="upper left")
    plt.tight_layout()
    plt.savefig("apple_stock_{}.png".format(name.lower()))

def main():
    tf.random.set_seed(2022)
    window_size = 30
    X_train, X_test, y_train, y_test, plot_data = load_data(window_size)
    num_features = X_train[0].shape[1]

    rnn_model = build_rnn_model(window_size, num_features)
    lstm_model = build_lstm_model(window_size, num_features)
    gru_model = build_gru_model(window_size, num_features)

    run_model(rnn_model, X_train, X_test, y_train, y_test, name="RNN")
    run_model(lstm_model, X_train, X_test, y_train, y_test, name="LSTM")
    run_model(gru_model, X_train, X_test, y_train, y_test, name="GRU")

    plot_result(rnn_model, X_test, y_test, plot_data, name="RNN")
    plot_result(lstm_model, X_test, y_test, plot_data, name="LSTM")
    plot_result(gru_model, X_test, y_test, plot_data, name="GRU")

if __name__ == "__main__":
    main()
```
[ ์ฝ๋ ์ํ ๊ฒฐ๊ณผ ]
- Loss, ์์ธก๊ฒฐ๊ณผ ๊ทธ๋ํ์์ ์ฑ๋ฅ : GRU > LSTM > RNN ์์ผ๋ก ์ข์
- GRU, LSTM์ ๋นํด RNN์ ์ฑ๋ฅ ํ์ ํ ๋จ์ด์ง -> GRU, LSTM์ ๊ธฐ์ธ๊ธฐ ์์ค ๋ฌธ์ ํด๊ฒฐ๋ก ์ฑ๋ฅ์ด ๋ ์ข์
[RNN] ํ
์คํธ loss: 0.90601
[LSTM] ํ
์คํธ loss: 0.08191
[GRU] ํ
์คํธ loss: 0.03845
3. RNN ๋ชจ๋ธ์ ํ์ฉ
RNN/LSTM/GRU๋ชจ๋ธ์ ํ๊ท๋ถ์๊ณผ ๋ถ๋ฅ์ ๋ชจ๋ ํ์ฉ ๊ฐ๋ฅ
- ํ๊ท ๋ถ์ : ๊ฐ ์์ ์ ์ถ๋ ฅ๊ฐ์ด ์ด๋ ์ ๋์ผ์ง ์์ธก (์: ์ฃผ๊ฐ ์์ธก, ๊ธฐ์จ ์์ธก )
- ๋ถ๋ฅ ์์ : ๊ฐ ์์ ์ ๋ฐ์ดํฐ๊ฐ ์ด๋ ํด๋์ค์ผ์ง ์์ธก (์: ๋ฌธ์ฅ์์ ๋ค์ ๋จ์ด ์์ธก, ๊ฐ ๋จ์ด์ ํ์ฌ ์์ธก)
๋ชจ๋ธ ํ์ต์ ์ํ ์์ค ํจ์ ๊ณ์ฐ
- ๊ฐ ์์ ๋ณ ์์ธก๊ฐ $\hat{y}_t$์ ์ค์ ๊ฐ $y_t$ ์ ํตํด ์์ ๋ณ ์์ค ํจ์๊ฐ ๊ณ์ฐ -> $L_t$
- $L_t$๋ฅผ ๋ชจ๋ ๋ํ์ฌ ์ต์ข ์์ค๊ฐ ๊ณ์ฐ -> $L = \sum_{t=1}^{T} L_t$
- ์ฃผ๋ก ์ฌ์ฉํ๋ ์์คํจ์(f)
- ํ๊ท๋ถ์ : Mean Squared Error(MSE)
- ๋ถ๋ฅ : Cross Entropy
'๐ค AI > Deep Learning' ์นดํ ๊ณ ๋ฆฌ์ ๋ค๋ฅธ ๊ธ
[๋ฅ๋ฌ๋] 8. RNN (1) | 2024.07.01 |
---|---|
[๋ฅ๋ฌ๋] 7. CNN (0) | 2024.06.27 |
[๋ฅ๋ฌ๋] 6. ๋ฅ๋ฌ๋ ๋ชจ๋ธ ํ์ต์ ๋ฌธ์ ์ pt.3 : ๊ณผ์ ํฉ (0) | 2024.06.22 |
[๋ฅ๋ฌ๋] 5. ๋ฅ๋ฌ๋ ๋ชจ๋ธ ํ์ต์ ๋ฌธ์ ์ pt.2 : ๊ธฐ์ธ๊ธฐ ์์ค, ๊ฐ์ค์น ์ด๊ธฐํ ๋ฐฉ๋ฒ (2) | 2024.06.12 |
[๋ฅ๋ฌ๋] 4. ๋ฅ๋ฌ๋ ๋ชจ๋ธ ํ์ต์ ๋ฌธ์ ์ pt.1 : ์ต์ ํ ์๊ณ ๋ฆฌ์ฆ (0) | 2024.06.10 |