๊ด€๋ฆฌ ๋ฉ”๋‰ด

Done is Better Than Perfect

[๋”ฅ๋Ÿฌ๋‹] 5. ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ ํ•™์Šต์˜ ๋ฌธ์ œ์  pt.2 : ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค, ๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ• ๋ณธ๋ฌธ

๐Ÿค– AI/Deep Learning

[๋”ฅ๋Ÿฌ๋‹] 5. ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ ํ•™์Šต์˜ ๋ฌธ์ œ์  pt.2 : ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค, ๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ•

jimingee 2024. 6. 12. 17:45

๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ ํ•™์Šต์˜ ๋ฌธ์ œ์ ์œผ๋กœ ์•„๋ž˜์˜ 4๊ฐ€์ง€๊ฐ€ ์žˆ๋‹ค.

1. ํ•™์Šต ์†๋„ ๋ฌธ์ œ์™€ ์ตœ์ ํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜

2. ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค ๋ฌธ์ œ

3. ์ดˆ๊ธฐ๊ฐ’ ์„ค์ • ๋ฌธ์ œ

4. ๊ณผ์ ํ•ฉ ๋ฌธ์ œ

 

์ด๋ฒˆ ์žฅ์—์„œ๋Š” 2. ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค ๋ฌธ์ œ, 3. ๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™” ์„ค์ • ๋ฌธ์ œ์™€ ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ๊ธฐ๋ฒ•์— ๋Œ€ํ•ด ์ž์„ธํžˆ ์•Œ์•„๋ณด๋„๋ก ํ•˜๊ฒ ๋‹ค.


 

2. ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค ๋ฌธ์ œ์™€ ๋ฐฉ์ง€ ๊ธฐ๋ฒ•

 

๊ธฐ์šธ๊ธฐ ์†Œ์‹ค (Vanishing Gradient) 

  • ๋ฐœ์ƒ ์›์ธ : ๊ธฐ์šธ๊ธฐ๊ฐ€ 0์ธ ๊ฐ’์„ ์ „๋‹ฌํ•˜๋ฉฐ ์ค‘๊ฐ„ ์ „๋‹ฌ๊ฐ’์ด ์‚ฌ๋ผ์ง€๋Š” ๋ฌธ์ œ
    • ๊ธฐ์šธ๊ธฐ๊ฐ€ ์†Œ์‹ค๋˜๋Š” ๋ฌธ์ œ๊ฐ€ ๋ฐ˜๋ณต๋˜๋ฉฐ ํ•™์Šต์ด ์ž˜ ์ด๋ฃจ์–ด์ง€์ง€ ์•Š์Œ 
    • ๊นŠ์€ ์ธต์˜ ๋ชจ๋ธ์—์„œ ์—ญ์ „ํŒŒ ์‹œ์— ์ „๋‹ฌ๋˜๋Š” ์†์‹ค ํ•จ์ˆ˜(loss function)์˜ gradient ๊ฐ’์— ํ™œ์„ฑํ™” ํ•จ์ˆ˜์ธ sigmoid ํ•จ์ˆ˜์˜ 0์— ๊ฐ€๊นŒ์šด ๊ธฐ์šธ๊ธฐ ๊ฐ’์ด ๊ณ„์†ํ•ด์„œ ๊ณฑํ•ด์ง€๋ฉด์„œ ๊ฒฐ๊ตญ ๊ฐ€์ค‘์น˜ ์—…๋ฐ์ดํŠธ๊ฐ€ ์ž˜ ์•ˆ๋˜๋Š” ๋ฌธ์ œ
  • ํ•ด๊ฒฐ ๋ฐฉ๋ฒ• :
    • ReLU : ๊ธฐ์กด์— ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋กœ ์‚ฌ์šฉํ•˜๋˜ sigmoid ํ•จ์ˆ˜ ๋Œ€์‹  ReLU ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ•ด๊ฒฐ
    • Tanh : ๋‚ด๋ถ€ hidden layer์—๋Š” ReLU๋ฅผ ์ ์šฉํ•˜๊ณ , output layer์—์„œ๋งŒ Tanh ์ ์šฉ
  • ReLU๊ฐ€ sigmoid ๋ณด๋‹ค ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค ๋ฌธ์ œ์— ๊ฐ•ํ•œ ์ด์œ 
    • sigmoid ํ•จ์ˆ˜๋Š” ์ž…๋ ฅ๊ฐ’์ด ๋งค์šฐ ํฌ๊ฑฐ๋‚˜ ์ž‘๋‹ค๋ฉด ๊ธฐ์šธ๊ธฐ๋„ 0์— ๊ฐ€๊นŒ์›Œ์ง
    • ReLU ํ•จ์ˆ˜๋Š” ์–‘์ˆ˜ ์ž…๋ ฅ ๊ฐ’์— ๋Œ€ํ•ด ์ผ์ •ํ•œ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ฐ–๊ณ  ์žˆ์œผ๋ฏ€๋กœ ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค ๋ฌธ์ œ๋ฅผ ๋ฐฉ์ง€ํ•  ์ˆ˜ ์žˆ์Œ


[ hidden layer์˜ activation function์ด sigmoid์ธ ๋ชจ๋ธ VS relu์ธ ๋ชจ๋ธ  ๋น„๊ต]

import tensorflow as tf
import logging
logging.disable(logging.WARNING)  # silence TensorFlow warnings

''' 1. A model with 10+ layers whose hidden-layer activation is `relu` '''
def make_model_relu():
    
    model_relu = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    
    return model_relu
    
''' 2. A model with 10+ layers whose hidden-layer activation is `sigmoid` '''
def make_model_sig():
    
    model_sig = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(32, activation='sigmoid'),
        tf.keras.layers.Dense(32, activation='sigmoid'),
        tf.keras.layers.Dense(32, activation='sigmoid'),
        tf.keras.layers.Dense(32, activation='sigmoid'),
        tf.keras.layers.Dense(32, activation='sigmoid'),
        tf.keras.layers.Dense(32, activation='sigmoid'),
        tf.keras.layers.Dense(32, activation='sigmoid'),
        tf.keras.layers.Dense(32, activation='sigmoid'),
        tf.keras.layers.Dense(32, activation='sigmoid'),
        tf.keras.layers.Dense(32, activation='sigmoid'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    
    return model_sig

''' 3. ๋‘ ๋ชจ๋ธ์„ ๋ถˆ๋Ÿฌ์˜จ ํ›„ ํ•™์Šต์‹œํ‚ค๊ณ  ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด ํ‰๊ฐ€ '''
def main():
   
    # MNIST ๋ฐ์ดํ„ฐ๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๊ณ  ์ „์ฒ˜๋ฆฌ
    mnist = tf.keras.datasets.mnist
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0
    
    model_relu = make_model_relu()  # model whose hidden layers use relu
    model_sig = make_model_sig()    # model whose hidden layers use sigmoid
    
    # ๋ชจ๋ธ ์ตœ์ ํ™”
    model_relu.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model_sig.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    model_relu.summary()
    model_sig.summary()
    
    # ๋ชจ๋ธ ํ•™์Šต
    model_relu_history = model_relu.fit(x_train, y_train, epochs=5, verbose=0)
    print('\n')
    model_sig_history = model_sig.fit(x_train, y_train, epochs=5, verbose=0)
    
    # ๋ชจ๋ธ ํ‰๊ฐ€
    scores_relu = model_relu.evaluate(x_test, y_test)
    scores_sig = model_sig.evaluate(x_test, y_test)
    
    print('\naccuracy_relu: ', scores_relu[-1])
    print('accuracy_sig: ', scores_sig[-1])
    
    return model_relu_history, model_sig_history

if __name__ == "__main__":
    main()

 

[ ์ฝ”๋“œ ์‹คํ–‰ ๊ฒฐ๊ณผ ]

  • hidden layer์˜ ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋กœ ReLU๋ฅผ ์„ ํƒํ•œ ๋ชจ๋ธ์˜ ์ •ํ™•๋„๊ฐ€ ๋” ๋†’์Œ.
  • ๋”ฐ๋ผ์„œ, ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค ๋ฌธ์ œ์—์„œ ReLU ํ•จ์ˆ˜๋ฅผ ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋กœ ์„ ํƒํ•˜๋Š” ๊ฒƒ์ด ๋”์šฑ ์ •ํ™•๋„๊ฐ€ ๋†’์Œ
### output ###
accuracy_relu:  0.9632
accuracy_sig:  0.7123


3. ์ดˆ๊ธฐ๊ฐ’ ์„ค์ • ๋ฌธ์ œ์™€ ๋ฐฉ์ง€ ๊ธฐ๋ฒ•

 

๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™” (weight initialization)

  • ํ™œ์„ฑํ™” ํ•จ์ˆ˜์˜ ์ž…๋ ฅ ๊ฐ’์ด ๋„ˆ๋ฌด ์ปค์ง€๊ฑฐ๋‚˜ ์ž‘์•„์ง€์ง€ ์•Š๊ฒŒ ๋งŒ๋“ค์–ด์ฃผ๋Š” ๊ฒƒ์ด ํ•ต์‹ฌ (๊ธฐ์šธ๊ธฐ ์†Œ์‹ค ๋ฐฉ์ง€)
  • ์ดˆ๊ธฐํ™” ์„ค์ • ๋ฌธ์ œ ํ•ด๊ฒฐ ๋ฐฉ๋ฒ• :
    • ํ‘œ์ค€ ์ •๊ทœ๋ถ„ํฌ๋ฅผ ์ด์šฉํ•œ ์ดˆ๊ธฐํ™” (๋ถ„์‚ฐ์„ ์ค„์ด๊ธฐ ์œ„ํ•ด ํ‘œ์ค€ํŽธ์ฐจ๋ฅผ 0.01๋กœ ํ•˜๋Š” ์ •๊ทœ๋ถ„ํฌ๋กœ ์ดˆ๊ธฐํ™”)
    • Xavier ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ• + sigmoid ํ•จ์ˆ˜ : ํ‘œ์ค€ ์ •๊ทœ ๋ถ„ํฌ๋ฅผ ์ž…๋ ฅ ๊ฐœ์ˆ˜์˜ ์ œ๊ณฑ๊ทผ์œผ๋กœ ๋‚˜๋ˆ„์–ด ์คŒ. sigmoid์™€ ๊ฐ™์€ S์ž ํ•จ์ˆ˜์˜ ๊ฒฝ์šฐ ์ถœ๋ ฅ ๊ฐ’๋“ค์ด ์ •๊ทœ ๋ถ„ํฌ ํ˜•ํƒœ๋ฅผ ๊ฐ€์ ธ์•ผ ์•ˆ์ •์ ์œผ๋กœ ํ•™์Šต ๊ฐ€๋Šฅ
    • Xavier ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ• + ReLU ํ•จ์ˆ˜ : ReLU ํ•จ์ˆ˜์—๋Š” Xavier ์ดˆ๊ธฐํ™”๊ฐ€ ๋ถ€์ ํ•ฉ. ๋ ˆ์ด์–ด๋ฅผ ๊ฑฐ์ณ๊ฐˆ์ˆ˜๋ก ๊ฐ’์ด 0์— ์ˆ˜๋ ด
    • He ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ• : ํ‘œ์ค€์ •๊ทœ๋ถ„ํฌ๋ฅผ ์ž…๋ ฅ ๊ฐœ์ˆ˜ ์ ˆ๋ฐ˜์˜ ์ œ๊ณฑ๊ทผ์œผ๋กœ ๋‚˜๋ˆ„์–ด์คŒ. 10์ธต ๋ ˆ์ด์–ด์—์„œ๋„ ํ‰๊ท ๊ณผ ํ‘œ์ค€ํŽธ์ฐจ๊ฐ€ 0์œผ๋กœ ์ˆ˜๋ ดํ•˜์ง€ ์•Š์Œ.
  • ์ ์ ˆํ•œ ๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ• :
    • Sigmoid, tanh์˜ ๊ฒฝ์šฐ, Xavier ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ•์ด ํšจ์œจ์ .
    • ReLU๊ณ„์˜ ํ™œ์„ฑํ™” ํ•จ์ˆ˜ ์‚ฌ์šฉ ์‹œ,  He ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ•์ด ํšจ์œจ์ 
    • ์ตœ๊ทผ ๋Œ€๋ถ€๋ถ„์˜ ๋ชจ๋ธ์—์„œ๋Š” He ์ดˆ๊ธฐํ™”๋ฅผ ์ฃผ๋กœ ์„ ํƒ

 

[ 1. ํ‘œ์ค€ ์ •๊ทœ ๋ถ„ํฌ๋ฅผ ์ด์šฉํ•œ ๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™” VS ํ‘œ์ค€ํŽธ์ฐจ๋ฅผ 0.01๋กœ ํ•˜๋Š” ์ •๊ทœ๋ถ„ํฌ๋กœ ๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™” ]

import numpy as np
from visual import *  # course-provided plotting helper (a stand-in sketch is given below)
np.random.seed(100)

def sigmoid(x):
    result = 1 / (1 + np.exp(-x))
    return result

def main():
    # 100๊ฐœ์˜ ๋…ธ๋“œ๋ฅผ ๊ฐ€์ง„ ๋ชจ๋ธ์— ๋“ค์–ด๊ฐˆ 1000๊ฐœ์˜ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ
    x_1 = np.random.randn(1000,100) 
    x_2 = np.random.randn(1000,100) 
    
    node_num = 100
    hidden_layer_size = 5
    
    activations_1 = {}
    activations_2 = {}
    
    for i in range(hidden_layer_size):
        if i != 0:
            x_1 = activations_1[i-1]
            x_2 = activations_2[i-1]
        
        # ๊ฐ€์ค‘์น˜ ์ •์˜
        w_1 = np.random.randn(100,100)*1 + 0 # ํ‘œ์ค€ ์ •๊ทœ ๋ถ„ํฌ - N(0,1)
        w_2 = np.random.randn(100,100)*0.01 + 0 # ํ‘œ์ค€ํŽธ์ฐจ๊ฐ€ 0.01์ธ ์ •๊ทœ๋ถ„ํฌ - N(0,0.01)
        
        a_1 = np.dot(x_1, w_1)
        a_2 = np.dot(x_2, w_2)
        
        ## sigmoid ํ†ต๊ณผ
        z_1 = sigmoid(a_1)
        z_2 = sigmoid(a_2)
        
        activations_1[i] = z_1
        activations_2[i] = z_2
        
    Visual(activations_1,activations_2)
    
    return activations_1, activations_2

if __name__ == "__main__":
    main()
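
The visual module here is a helper supplied by the course environment, so the script above will not run on its own. A minimal stand-in, assuming Visual simply plots a histogram of each layer's activations, might look like this:

# visual.py -- hypothetical stand-in for the course-provided helper
import matplotlib.pyplot as plt

def Visual(*activation_dicts):
    # one row of subplots per model, one histogram per hidden layer
    rows = len(activation_dicts)
    cols = max(len(d) for d in activation_dicts)
    fig, axes = plt.subplots(rows, cols, figsize=(3 * cols, 3 * rows), squeeze=False)
    for r, acts in enumerate(activation_dicts):
        for c, z in acts.items():
            axes[r][c].hist(z.flatten(), bins=30)
            axes[r][c].set_title(f'model {r + 1} / layer {c + 1}')
    plt.tight_layout()
    plt.show()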

 

[ ์ฝ”๋“œ ์‹คํ–‰ ๊ฒฐ๊ณผ - activation ๊ฒฐ๊ณผ์˜ ๋ถ„ํฌ๋„ ]

  • ํ‘œ์ค€ ์ •๊ทœ ๋ถ„ํฌ๋กœ ๊ฐ€์ค‘์น˜๋ฅผ ์ดˆ๊ธฐํ™”ํ•œ ๋ชจ๋ธ -> activation ๊ฒฐ๊ณผ๊ฐ’์ด 0, 1 ๊ฐ’์œผ๋กœ ๋ชฐ๋ฆผ
  • ํ‘œ์ค€ํŽธ์ฐจ 0.01์ธ ์ •๊ทœ๋ถ„ํฌ๋กœ ๊ฐ€์ค‘์น˜๋ฅผ ์ดˆ๊ธฐํ™”ํ•œ ๋ชจ๋ธ -> activation ๊ฒฐ๊ณผ๊ฐ’์ด 0.5 ์ฃผ๋ณ€์œผ๋กœ ๋ชฐ๋ฆผ

์ •๊ทœ๋ถ„ํฌ์˜ sigmoid ๊ฒฐ๊ณผ ๋ถ„ํฌ๋„ (์™ผ์ชฝ- ํ‘œ์ค€ ์ •๊ทœ ๋ถ„ํฌ / ์˜ค๋ฅธ์ชฝ- ํ‘œ์ค€ํŽธ์ฐจ๊ฐ€ 0.01์ธ ์ •๊ทœ ๋ถ„ํฌ

  • activation ๊ฐ’์ด ์–‘๊ทน๋‹จ(0๋˜๋Š” 1)์œผ๋กœ ๋ชฐ๋ฆฌ๋Š” ํ˜„์ƒ์€ ์ข‹์ง€ ์•Š์Œ
    • ํ™œ์„ฑํ™” ํ•จ์ˆ˜(์˜ˆ, sigmoid ํ•จ์ˆ˜)์˜ ๊ธฐ์šธ๊ธฐ๊ฐ€ 0์— ์ˆ˜๋ ด -> ํ•™์Šต์ด ์ž˜ ์ด๋ฃจ์–ด ์ง€์ง€ ์•Š์Œ

 

 

[ 2. Xavier ๋ฐฉ๋ฒ•์„ ์ด์šฉํ•œ ๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™” - ํ™œ์„ฑํ™” ํ•จ์ˆ˜(sigmoid & relu)์™€ ๊ฒฐํ•ฉํ–ˆ์„ ๋•Œ ๋น„๊ต]

  • Xavier ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ•์€ ์•ž ๋ ˆ์ด์–ด์˜ ๋…ธ๋“œ๊ฐ€ n๊ฐœ์ผ ๋•Œ ํ‘œ์ค€ ํŽธ์ฐจ๊ฐ€ $ \frac{1}{ \sqrt{n}}$ ์ธ ๋ถ„ํฌ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ
  • Xavier ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•˜๋ฉด ์•ž ๋ ˆ์ด์–ด์˜ ๋…ธ๋“œ๊ฐ€ ๋งŽ์„์ˆ˜๋ก ๋‹ค์Œ ๋ ˆ์ด์–ด์˜ ๋…ธ๋“œ์˜ ์ดˆ๊นƒ๊ฐ’์œผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ๊ฐ€์ค‘์น˜๊ฐ€ ์ข๊ฒŒ ํผ์ง.
import numpy as np
from visual import *  # course-provided plotting helper (see the stand-in sketch above)
np.random.seed(100)

def sigmoid(x):
    result = 1 / (1 + np.exp(-x))
    return result

def relu(x):
    result = np.maximum(0,x)
    return result


def main():
    # 100๊ฐœ์˜ ๋…ธ๋“œ๋ฅผ ๊ฐ€์ง„ ๋ชจ๋ธ์— ๋“ค์–ด๊ฐˆ 1000๊ฐœ์˜ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ
    x_sig = np.random.randn(1000,100)
    x_relu = np.random.randn(1000,100)
    
    node_num = 100
    hidden_layer_size = 5
    
    activations_sig = {}
    activations_relu = {}
    
    for i in range(hidden_layer_size):
        if i != 0:
            x_sig = activations_sig[i-1]
            x_relu = activations_relu[i-1]
        
        # Xavier ๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™” - ํ‘œ์ค€ ํŽธ์ฐจ๊ฐ€ 1/root(n)์ธ ์ •๊ทœ๋ถ„ํฌ
        w_sig = np.random.randn(100,100)*(1/np.sqrt(node_num))+0 
        w_relu = np.random.randn(100,100)*(1/np.sqrt(node_num))+0 
        
        a_sig = np.dot(x_sig, w_sig)
        a_relu = np.dot(x_relu, w_relu)
        
        z_sig = sigmoid(a_sig)  # sigmoid activation
        z_relu = relu(a_relu)   # relu activation
        
        activations_sig[i] = z_sig
        activations_relu[i] = z_relu
        
    Visual(activations_sig, activations_relu)
    
    return activations_sig, activations_relu

if __name__ == "__main__":
    main()

 

 

[ ์ฝ”๋“œ ์‹คํ–‰ ๊ฒฐ๊ณผ - activation ๊ฒฐ๊ณผ์˜ ๋ถ„ํฌ๋„ ]

  • (์™ผ์ชฝ : sigmoid + Xavier ์ดˆ๊ธฐํ™”) activation ๊ฒฐ๊ณผ ๊ฐ’์ด ์–ด๋А ํ•œ์ชฝ์œผ๋กœ ๋ชฐ๋ฆฌ์ง€ ์•Š๊ณ  ๊ณ ๋ฅด๊ฒŒ ๋ถ„ํฌ๋จ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Œ
  • (์˜ค๋ฅธ์ชฝ : ReLU + Xavier ์ดˆ๊ธฐํ™”) activation ๊ฒฐ๊ณผ ๊ฐ’์ด ํ•œ์ชฝ(0)์œผ๋กœ ๋ชฐ๋ฆผ -> ReLU ํ•จ์ˆ˜์—๋Š” Xavier ์ดˆ๊ธฐํ™”๊ฐ€ ๋ถ€์ ํ•ฉ

Xavier ๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™” (์™ผ์ชฝ - sigmoid ํ™œ์„ฑํ™” ํ•จ์ˆ˜ / ์˜ค๋ฅธ์ชฝ - relu ํ™œ์„ฑํ™” ํ•จ์ˆ˜)


[ 3. He ๋ฐฉ๋ฒ•์„ ์ด์šฉํ•œ ๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™” ]

  • He ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ• : ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋กœ ReLU๋ฅผ ์“ธ ๋•Œ ํ™œ์„ฑํ™” ๊ฒฐ๊ด๊ฐ’๋“ค์ด ํ•œ์ชฝ์œผ๋กœ ์น˜์šฐ์น˜๋Š” ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋‚˜์˜จ ๋ฐฉ๋ฒ•
  • ์•ž ๋ ˆ์ด์–ด์˜ ๋…ธ๋“œ๊ฐ€ n๊ฐœ์ผ ๋•Œ ํ‘œ์ค€ ํŽธ์ฐจ๊ฐ€ $ \frac{\sqrt{2}}{\sqrt{n}}$์ธ ๋ถ„ํฌ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ
  • Xavier ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ•์€ ํ‘œ์ค€ ํŽธ์ฐจ๊ฐ€ $ \frac{1}{\sqrt{n}}$.
    • ReLU๋Š” ์Œ์˜ ์˜์—ญ์— ๋Œ€ํ•œ ํ•จ์ˆซ๊ฐ’์ด 0์ด๋ผ์„œ ๋” ๋„“๊ฒŒ ๋ถ„ํฌ์‹œํ‚ค๊ธฐ ์œ„ํ•ด $ \sqrt{2} $๋ฐฐ์˜ ๊ณ„์ˆ˜๊ฐ€ ํ•„์š”ํ•˜๋‹ค๊ณ  ์ดํ•ดํ•  ์ˆ˜ ์žˆ์Œ.
import numpy as np
from visual import *  # course-provided plotting helper (see the stand-in sketch above)
np.random.seed(100)
    
def relu(x):
    result = np.maximum(0,x)
    return result

def main():
    # 100๊ฐœ์˜ ๋…ธ๋“œ๋ฅผ ๊ฐ€์ง„ ๋ชจ๋ธ์— ๋“ค์–ด๊ฐˆ 1000๊ฐœ์˜ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ - ํ‘œ์ค€์ •๊ทœ๋ถ„ํฌ ๋”ฐ๋ฆ„
    x_relu = np.random.randn(1000,100)
    
    node_num = 100
    hidden_layer_size = 5
    
    activations_relu = {}
    
    for i in range(hidden_layer_size):
        if i != 0:
            x_relu = activations_relu[i-1]
            
        # He ๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™” - ํ‘œ์ค€ ํŽธ์ฐจ๊ฐ€ root(2)/root(n)์ธ ์ •๊ทœ๋ถ„ํฌ
        w_relu = np.random.randn(100,100)*np.sqrt(2/node_num)+0
        
        a_relu = np.dot(x_relu,w_relu)
        
        z_relu = relu(a_relu)
        
        activations_relu[i] = z_relu
        
    Visual(activations_relu)
    
    return activations_relu    

if __name__ == "__main__":
    main()


[ ์ฝ”๋“œ ์‹คํ–‰ ๊ฒฐ๊ณผ - activation ๊ฒฐ๊ณผ์˜ ๋ถ„ํฌ๋„ ]

  • ์•ž์„  'ReLU + Xavier๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ•'๋ณด๋‹ค 'ReLU + He ๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ•'์ด activation ๊ฒฐ๊ณผ๊ฐ€ ๊ณ ๋ฅด๊ฒŒ ๋ถ„ํฌ๋˜์–ด ์žˆ์Œ

์™ผ์ชฝ - ReLU+Xavier ๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ• / ์˜ค๋ฅธ์ชฝ - ReLU+He  ๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ•
