[๋”ฅ๋Ÿฌ๋‹] 6. ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ ํ•™์Šต์˜ ๋ฌธ์ œ์  pt.3 : ๊ณผ์ ํ•ฉ

jimingee 2024. 6. 22. 20:28

4. ๊ณผ์ ํ•ฉ ๋ฌธ์ œ์™€ ๋ฐฉ์ง€ ๊ธฐ๋ฒ•

 

๊ณผ์ ํ•ฉ ๋ฌธ์ œ (overfitting) : ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ํ•™์Šต ๋ฐ์ดํ„ฐ์— ๊ณผํ•˜๊ฒŒ ์ ํ•ฉํ•œ ์ƒํƒœ.
                                            ํ•™์Šต ๋ฐ์ดํ„ฐ๊ฐ€ ์•„๋‹Œ ๋‹ค๋ฅธ ๋ฐ์ดํ„ฐ์—์„œ ์ •ํ™•ํ•œ ์˜ˆ์ธก์„ ์ƒ์„ฑํ•˜์ง€ ๋ชปํ•จ (์ผ๋ฐ˜ํ™” ํ•˜์ง€ ๋ชปํ•จ)

  • ๊ณผ์ ํ•ฉ ๋ฐœ์ƒ ์›์ธ : 
    • ๋ฐ์ดํ„ฐ์˜ ํผ์ง„ ์ •๋„, ์ฆ‰ ๋ถ„์‚ฐ(variance)์ด ๋†’์€ ๊ฒฝ์šฐ
    • ๋„ˆ๋ฌด ๋งŽ์ด ํ•™์Šต ๋ฐ์ดํ„ฐ๋ฅผ ํ•™์Šต์‹œํ‚จ ๊ฒฝ์šฐ (epochs๊ฐ€ ๋งค์šฐ ํฐ ๊ฒฝ์šฐ)
    • ํ•™์Šต์— ์‚ฌ์šฉ๋œ ํŒŒ๋ผ๋ฏธํ„ฐ๊ฐ€ ๋„ˆ๋ฌด ๋งŽ์€ ๊ฒฝ์šฐ
    • ๋ฐ์ดํ„ฐ์— ๋น„ํ•ด ๋ชจ๋ธ์ด ๋„ˆ๋ฌด ๋ณต์žกํ•œ ๊ฒฝ์šฐ
    • ๋ฐ์ดํ„ฐ์— ๋…ธ์ด์ฆˆ & ์ด์ƒ์น˜(outlier)๊ฐ€ ๋„ˆ๋ฌด ๋งŽ์€ ๊ฒฝ์šฐ

 

  • ๊ณผ์ ํ•ฉ ํ˜„์ƒ ๋ฐฉ์ง€ ๊ธฐ๋ฒ• : ์ •๊ทœํ™” (Regularization), ๋“œ๋กญ์•„์›ƒ (Dropout), ๋ฐฐ์น˜ ์ •๊ทœํ™” (Batch Normalization)

 

 

 

1. Regularization

As a model becomes more complex, it gains more parameters, and their absolute values tend to grow. -> Adding a penalty term to the existing loss function makes it possible to find an optimum while keeping the weights small.

The network is trained in the direction that makes the total loss, including this penalty term, smaller (a short sketch that verifies the penalty on a single layer follows the list below).

 

  • L1 regularization (Lasso)
    • Defines the penalty term as the sum of the absolute values of the weights.
    • $ \text{Total Loss} = \text{Loss} + \lambda \sum_w |w| $
    • Drives some of the model's weights to exactly 0 so that only the meaningful weights remain -> produces a sparse model
    • Typical rate for applying L1 regularization to the weights: 0.001 ~ 0.005
    • tf.keras.layers.Dense(kernel_regularizer = tf.keras.regularizers.l1(ratio))
  • L2 regularization (Ridge)
    • Defines the penalty term as the sum of the squares of the weights.
    • $ \text{Total Loss} = \text{Loss} + \lambda \sum_w w^2 $
    • Pushes the weight values toward 0 as training progresses, constraining large weights more strongly.
    • Compared with L1 regularization, fewer weights converge to exactly 0.
    • Balances the weights so that no single weight dominates; this effect is called weight decay.
    • Typical rate for applying L2 regularization to the weights: 0.001 ~ 0.005
    • tf.keras.layers.Dense(kernel_regularizer = tf.keras.regularizers.l2(ratio))
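
[ Sketch: inspecting the regularization penalty on a single layer ]

A minimal sketch, not part of the original post's experiment (TF 2.x assumed; the layer size and lambda = 0.002 are illustrative). Keras exposes a regularized layer's penalty term through layer.losses, and its value matches lambda * sum(|w|) computed by hand:

import tensorflow as tf

# One Dense layer with an L1 penalty on its kernel weights (lambda = 0.002, illustrative)
layer = tf.keras.layers.Dense(4, kernel_regularizer=tf.keras.regularizers.l1(0.002))

x = tf.ones((1, 3))  # dummy input so the layer builds its weight matrix
_ = layer(x)

penalty = layer.losses[0]                             # penalty term collected by Keras
manual = 0.002 * tf.reduce_sum(tf.abs(layer.kernel))  # lambda * sum(|w|) by hand
print(float(penalty), float(manual))                  # the two values match

During model.fit, Keras adds every collected penalty to the main loss, which is exactly the Total Loss formula above.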

 

[ ๊ธฐ๋ณธ ๋ชจ๋ธ vs L1 ์ •๊ทœํ™” ์ ์šฉ ๋ชจ๋ธ vs L2 ์ •๊ทœํ™” ์ ์šฉ ๋ชจ๋ธ ๋น„๊ต ]

import numpy as np
import tensorflow as tf
from visual import *
import logging, os
logging.disable(logging.WARNING)

# ๋ฐ์ดํ„ฐ๋ฅผ ์ „์ฒ˜๋ฆฌํ•˜๋Š” ํ•จ์ˆ˜ - one hot ์ž„๋ฒ ๋”ฉ
def sequences_shaping(sequences, dimension):
    
    results = np.zeros((len(sequences), dimension))
    for i, word_indices in enumerate(sequences):
        results[i, word_indices] = 1.0 
    
    return results

''' ๊ธฐ๋ณธ ๋ชจ๋ธ '''
def Basic(word_num):
    
    basic_model = tf.keras.Sequential([ 
        tf.keras.layers.Dense(256, activation = 'relu', input_shape=(word_num,)), 
        tf.keras.layers.Dense(128, activation = 'relu'),
        tf.keras.layers.Dense(1, activation= 'sigmoid')
        ])
    
    return basic_model


''' ๊ธฐ๋ณธ ๋ชจ๋ธ์— L1 ์ •๊ทœํ™” ์ ์šฉ (์ž…๋ ฅ์ธต๊ณผ ํžˆ๋“ ์ธต์—๋งŒ ์ ์šฉ) '''
def L1(word_num):
    
    l1_model = tf.keras.Sequential([ 
        tf.keras.layers.Dense(256, activation = 'relu', input_shape=(word_num,), kernel_regularizer = tf.keras.regularizers.l1(0.002)), 
        tf.keras.layers.Dense(128, activation = 'relu', kernel_regularizer = tf.keras.regularizers.l1(0.002)),
        tf.keras.layers.Dense(1, activation= 'sigmoid')
        ])
    
    return l1_model

''' ๊ธฐ๋ณธ ๋ชจ๋ธ์— L2 ์ •๊ทœํ™” ์ ์šฉ (์ž…๋ ฅ์ธต๊ณผ ํžˆ๋“ ์ธต์—๋งŒ ์ ์šฉ) '''
def L2(word_num):
    
    l2_model = tf.keras.Sequential([ 
        tf.keras.layers.Dense(256, activation = 'relu', input_shape=(word_num,), kernel_regularizer = tf.keras.regularizers.l2(0.002)), 
        tf.keras.layers.Dense(128, activation = 'relu', kernel_regularizer = tf.keras.regularizers.l2(0.002)),
        tf.keras.layers.Dense(1, activation= 'sigmoid')
        ])
    
    return l2_model


''' ์„ธ ๋ชจ๋ธ์„ ๋ถˆ๋Ÿฌ์˜จ ํ›„ ํ•™์Šต์‹œํ‚ค๊ณ  ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด ํ‰๊ฐ€ (binary crossentropy ๊ฐ’ ์ถœ๋ ฅ) '''

def main():
    
    word_num = 100
    data_num = 25000
    
    # Load the IMDB dataset built into Keras and preprocess it
    (train_data, train_labels), (test_data, test_labels) = tf.keras.datasets.imdb.load_data(num_words = word_num)
    
    train_data = sequences_shaping(train_data, dimension = word_num)
    test_data = sequences_shaping(test_data, dimension = word_num)
    
    basic_model = Basic(word_num)  # basic model
    l1_model = L1(word_num)     # model with L1 regularization
    l2_model = L2(word_num)     # model with L2 regularization
    
    # ๋ชจ๋ธ ์ตœ์ ํ™”
    basic_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy','binary_crossentropy'])
    l1_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy','binary_crossentropy'])
    l2_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy','binary_crossentropy'])
    
    basic_model.summary()
    l1_model.summary()
    l2_model.summary()
    
    # ๋ชจ๋ธ ํ•™์Šต
    basic_history = basic_model.fit(train_data, train_labels, epochs=20, batch_size=500, validation_data=(test_data, test_labels), verbose=0)
    print('\n')
    l1_history = l1_model.fit(train_data, train_labels, epochs=20, batch_size=500, validation_data=(test_data, test_labels), verbose=0)
    print('\n')
    l2_history = l2_model.fit(train_data, train_labels, epochs=20, batch_size=500, validation_data=(test_data, test_labels), verbose=0)
    
    # ๋ชจ๋ธ ํ‰๊ฐ€
    scores_basic = basic_model.evaluate(test_data, test_labels)
    scores_l1 = l1_model.evaluate(test_data, test_labels)
    scores_l2 = l2_model.evaluate(test_data, test_labels)
    
    print('\nscores_basic: ', scores_basic[-1])
    print('scores_l1: ', scores_l1[-1])
    print('scores_l2: ', scores_l2[-1])
    
    Visulaize([('Basic', basic_history),('L1 Regularization', l1_history), ('L2 Regularization', l2_history)])
    
    return basic_history, l1_history, l2_history

if __name__ == "__main__":
    main()

[ Execution results ]

  • The basic model without regularization shows a large gap between the training and validation crossentropy values -> overfitting occurs
  • For the L1- and L2-regularized models, the gap between training and validation crossentropy is small -> overfitting is mitigated
  • The numbers below confirm that the L1- and L2-regularized models generalize better
### output ###
scores_basic:  0.7418451
scores_l1:  0.56926525
scores_l2:  0.56637627

2. Dropout

  • At each layer, a fixed fraction of neurons is randomly dropped so that only the remaining neurons are trained.
  • Randomly zeroing out some perceptrons (neurons) during training keeps the model from relying too heavily on particular weights.
  • With dropout, the set of nodes and weights being trained changes on every pass.
  • Can be used alongside the other regularization techniques; they complement one another.
  • A dropped neuron's signal is also cut off during backpropagation. Dropout is NOT used at test time (signals pass through every neuron); see the sketch after this list.
  • Typical dropout probability: 0.1 ~ 0.5
  • tf.keras.layers.Dropout(prob)
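
[ Sketch: dropout behaves differently in training and test mode ]

A minimal sketch, not part of the original post's experiment (TF 2.x assumed; the tensor shape and the rate 0.5 are illustrative). With training=True, Dropout zeroes a random subset of activations and scales the survivors by 1/(1 - rate); with training=False (the default at inference), it passes values through unchanged:

import tensorflow as tf

tf.random.set_seed(0)  # for a reproducible drop pattern

drop = tf.keras.layers.Dropout(0.5)
x = tf.ones((1, 8))

print(drop(x, training=True).numpy())   # some entries are 0.0, the survivors are scaled to 2.0
print(drop(x, training=False).numpy())  # all ones: dropout is disabled at test time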

[ ๊ธฐ๋ณธ ๋ชจ๋ธ vs dropout ์ ์šฉ ๋ชจ๋ธ ๋น„๊ต ]

import numpy as np
import tensorflow as tf
from visual import *
import logging, os
logging.disable(logging.WARNING)

# ๋ฐ์ดํ„ฐ๋ฅผ ์ „์ฒ˜๋ฆฌํ•˜๋Š” ํ•จ์ˆ˜
def sequences_shaping(sequences, dimension):
    
    results = np.zeros((len(sequences), dimension))
    for i, word_indices in enumerate(sequences):
        results[i, word_indices] = 1.0 
        
    return results
    
''' ๊ธฐ๋ณธ ๋ชจ๋ธ ์ƒ์„ฑ '''
def Basic(word_num):
    
    basic_model = tf.keras.Sequential([ 
        tf.keras.layers.Dense(256, activation = 'relu', input_shape=(word_num,)), 
        tf.keras.layers.Dense(128, activation = 'relu'),
        tf.keras.layers.Dense(1, activation= 'sigmoid')
        ])
    
    return basic_model
    
''' ๊ธฐ๋ณธ ๋ชจ๋ธ์— ๋“œ๋กญ ์•„์›ƒ ๋ ˆ์ด์–ด ์ถ”๊ฐ€ '''
def Dropout(word_num):
    
    dropout_model = tf.keras.Sequential([ 
        tf.keras.layers.Dense(256, activation = 'relu', input_shape=(word_num,)),
        tf.keras.layers.Dropout(0.3), 
        tf.keras.layers.Dense(128, activation = 'relu'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(1, activation= 'sigmoid')
        ])
    
    return dropout_model

''' ๋‘ ๋ชจ๋ธ์„ ๋ถˆ๋Ÿฌ์˜จ ํ›„ ํ•™์Šต์‹œํ‚ค๊ณ  ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด ํ‰๊ฐ€(binary crossentropy ์ ์ˆ˜ ์ถœ๋ ฅ) '''
def main():
    
    word_num = 100
    data_num = 25000
    
    # Load the IMDB dataset built into Keras and preprocess it
    (train_data, train_labels), (test_data, test_labels) = tf.keras.datasets.imdb.load_data(num_words = word_num)
    
    train_data = sequences_shaping(train_data, dimension = word_num)
    test_data = sequences_shaping(test_data, dimension = word_num)
    
    basic_model = Basic(word_num)   # basic model
    dropout_model = Dropout(word_num)  # model with dropout
    
    basic_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy','binary_crossentropy'])
    dropout_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy','binary_crossentropy'])
    
    basic_model.summary()
    dropout_model.summary()
    
    basic_history = basic_model.fit(train_data, train_labels, epochs=20, batch_size=500, validation_data=(test_data, test_labels), verbose=0)
    print('\n')
    dropout_history = dropout_model.fit(train_data, train_labels, epochs=20, batch_size=500, validation_data=(test_data, test_labels), verbose=0)
    
    scores_basic = basic_model.evaluate(test_data, test_labels)
    scores_dropout = dropout_model.evaluate(test_data, test_labels)
    
    print('\nscores_basic: ', scores_basic[-1])
    print('scores_dropout: ', scores_dropout[-1])
    
    Visulaize([('Basic', basic_history),('Dropout', dropout_history)])
    
    return basic_history, dropout_history

if __name__ == "__main__":
    main()

[ Execution results ]

  • dropout์„ ์ ์šฉํ•˜์ง€ ์•Š์€ basic ๋ชจ๋ธ์€ train์˜ crossentropy ๊ฐ’๊ณผ validation์˜ crossentropy ๊ฐ’์ด ์ฐจ์ด ํผ -> overfitting๋ฐœ์ƒ
  • dropout์„ ์ ์šฉํ•œ ๋ชจ๋ธ์˜ train, validation์˜ cross entropy ๊ฐ’ ์ฐจ์ด๊ฐ€ ํฌ์ง€ ์•Š์Œ -> overfitting์ด ์™„ํ™”๋˜์—ˆ์Œ
  • ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ์—์„œ droupout์„ ์‚ฌ์šฉํ•œ ๋ชจ๋ธ์˜ binary crossentropy ์ ์ˆ˜๊ฐ€ ๋” ๋‚ฎ์Œ -> dropout ์ ์šฉ ๋ชจ๋ธ์˜ ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ์ด ๋” ์ข‹์Œ
### output ###
scores_basic:  0.7272758
scores_dropout:  0.60718566

3. Batch Normalization

  • Applies normalization not only to the input data but also to the inputs of the hidden layers inside the network.
  • Unifies the distribution of the values at each layer (scaling).
  • Advantages of batch normalization (a short sketch follows this list):
    • Normalization happens at every layer, so the model depends far less on the initial weights. (Weight initialization matters less.)
    • Suppresses overfitting (reduces the need for dropout and L1/L2 regularization).
    • The key benefit is faster training.

[ ๊ธฐ๋ณธ ๋ชจ๋ธ vs  ๋ฐฐ์น˜ ์ •๊ทœํ™” ์ ์šฉ ๋ชจ๋ธ ๋น„๊ต ]

  • Batch normalization works as a layer of its own, placed between a Dense layer and its activation function
  • Therefore, when building the basic model, the activation is created as a separate Activation layer (which plays exactly the same role as the activation argument) so the two models match layer for layer
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import logging
import os
from visual import *
logging.disable(logging.WARNING)

np.random.seed(42)
tf.random.set_seed(42)

# ๊ธฐ๋ณธ ๋ชจ๋ธ
def generate_basic_model():
    basic_model = tf.keras.Sequential([
                  tf.keras.layers.Flatten(input_shape=(28, 28)),
                  tf.keras.layers.Dense(256),
                  tf.keras.layers.Activation('relu'),
                  tf.keras.layers.Dense(128),
                  tf.keras.layers.Activation('relu'),
                  tf.keras.layers.Dense(512),
                  tf.keras.layers.Activation('relu'),
                  tf.keras.layers.Dense(64),
                  tf.keras.layers.Activation('relu'),
                  tf.keras.layers.Dense(128),
                  tf.keras.layers.Activation('relu'),
                  tf.keras.layers.Dense(256),
                  tf.keras.layers.Activation('relu'),
                  tf.keras.layers.Dense(10, activation='softmax')])
    return basic_model

''' Model with batch normalization (applied between each Dense layer and its activation) '''
def generate_batch_norm_model():
    bn_model = tf.keras.Sequential([
                tf.keras.layers.Flatten(input_shape=(28, 28)),
                tf.keras.layers.Dense(256),
                tf.keras.layers.BatchNormalization(),
                tf.keras.layers.Activation('relu'),
                tf.keras.layers.Dense(128),
                tf.keras.layers.BatchNormalization(),
                tf.keras.layers.Activation('relu'),
                tf.keras.layers.Dense(512),
                tf.keras.layers.BatchNormalization(),
                tf.keras.layers.Activation('relu'),
                tf.keras.layers.Dense(64),
                tf.keras.layers.BatchNormalization(),
                tf.keras.layers.Activation('relu'),
                tf.keras.layers.Dense(128),
                tf.keras.layers.BatchNormalization(),
                tf.keras.layers.Activation('relu'),
                tf.keras.layers.Dense(256),
                tf.keras.layers.BatchNormalization(),
                tf.keras.layers.Activation('relu'),
                tf.keras.layers.Dense(10, activation='softmax')])
    return bn_model


def main():
    # Load the MNIST dataset and preprocess it
    mnist = tf.keras.datasets.mnist
    (train_data, train_labels), (test_data, test_labels) = mnist.load_data()
    train_data, test_data = train_data / 255.0, test_data / 255.0

    base_model = generate_basic_model() # basic model
    bn_model = generate_batch_norm_model() # model with batch normalization

    
    base_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    bn_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

    base_model.summary()
    bn_model.summary()
    
    base_history = base_model.fit(train_data, train_labels, epochs=20, batch_size=500, validation_data=(test_data, test_labels), verbose=0)
    bn_history = bn_model.fit(train_data, train_labels, epochs=20, batch_size=500, validation_data=(test_data, test_labels), verbose=0)

    score_basic = base_model.evaluate(test_data, test_labels)
    score_bn = bn_model.evaluate(test_data, test_labels)

    print('\naccuracy_basic : ', score_basic[-1])
    print('\naccuracy_bn : ', score_bn[-1])

    Visulaize([('Basic', base_history), ('Batch Normalization', bn_history)])

    return base_history, bn_history

if __name__ == "__main__":
    main()

[ Execution results ]

  • batch norm์„ ์‚ฌ์šฉํ•œ ๋ชจ๋ธ์˜ loss ๊ฐ€ ๋” ์ž‘์Œ
  • base model์€ epoch๊ฐ€ ์ฆ๊ฐ€ํ•จ์— ๋”ฐ๋ผ loss๊ฐ€ ๊ฐ์†Œํ•˜๊ธฐ๋„ ํ•˜๊ณ , ์ฆ๊ฐ€ํ•˜๊ธฐ๋„ ํ•จ -> ํ•™์Šต์ด ์•ˆ์ •์ ์œผ๋กœ ์ด๋ฃจ์–ด์ง€์ง€ ์•Š์Œ
  • batch norm model์€ epoch๊ฐ€ ์ฆ๊ฐ€ํ•จ์— ๋”ฐ๋ผ loss๊ฐ€ ๊ฐ์†Œํ•˜๋Š” ๊ฒฝํ–ฅ์„ ๋ณด์ž„ -> ํ•™์Šต์ด ์•ˆ์ •์ ์œผ๋กœ ์ด๋ฃจ์–ด์ง


