๐Ÿค– AI/Deep Learning

[๋”ฅ๋Ÿฌ๋‹] 2. Backpropagation์˜ ํ•™์Šต

jimingee 2024. 6. 7. 14:59

 

๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์—์„œ์˜ ํ•™์Šต : loss function์„ ์ตœ์†Œํ™”ํ•˜๊ธฐ ์œ„ํ•ด ์ตœ์ ํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜ (gradient descent ๋“ฑ) ์‚ฌ์šฉ

  • ๋”ฅ๋Ÿฌ๋‹์—์„œ๋Š” ์—ญ์ „ํŒŒ(backpropagation)์„ ํ†ตํ•ด ๊ฐ ๊ฐ€์ค‘์น˜๋“ค์˜ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ตฌํ•  ์ˆ˜ ์žˆ์Œ
  • backpropagation ์ •์˜ : ๋ชฉํ‘œ target ๊ฐ’๊ณผ ์‹ค์ œ ๋ชจ๋ธ์ด ์˜ˆ์ธกํ•œ output ๊ฐ’์ด ์–ผ๋งˆ๋‚˜ ์ฐจ์ด๋‚˜๋Š”์ง€ ๊ตฌํ•œ ํ›„, ์˜ค์ฐจ ๊ฐ’์„ ๋‹ค์‹œ ๋’ค๋กœ ์ „ํŒŒํ•ด๊ฐ€๋ฉฐ ๋ณ€์ˆ˜๋“ค์„ ๊ฐฑ์‹ ํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜

 

1. Backpropagation์˜ ํ•™์Šต ๋‹จ๊ณ„

backpropataion์€ ์ฒด์ธ ๋ฃฐ(chain rule)์„ ์‚ฌ์šฉํ•˜์—ฌ ์†์‹ค ํ•จ์ˆ˜์˜ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ณ„์‚ฐํ•˜๊ณ , ์ด๋ฅผ ํ†ตํ•ด ๊ฐ€์ค‘์น˜๋ฅผ ์—…๋ฐ์ดํŠธ

 

1. Forward Propagation

  • ์ž…๋ ฅ ๋ฐ์ดํ„ฐ๊ฐ€ ์‹ ๊ฒฝ๋ง์„ ํ†ต๊ณผํ•˜๋ฉด์„œ ๊ฐ ์ธต์—์„œ์˜ ์ถœ๋ ฅ ๊ณ„์‚ฐ
  • ๊ฐ ๋‰ด๋Ÿฐ์˜ ์ถœ๋ ฅ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ณ„์‚ฐ๋จ : 

$$ z = W \cdot x + b $$

$$ a = \sigma(z) $$

where $ W $ is the weight, $ x $ the input, $ b $ the bias, and $ \sigma $ the activation function.
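
To make this concrete, here is a minimal NumPy sketch of the forward pass for one layer; the array shapes (4 samples, 3 inputs, 2 neurons) are illustrative assumptions, and the weights are applied in row-vector form, as in the full example at the end of this post.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.random.randn(4, 3)    # input batch: 4 samples, 3 features (assumed shapes)
W = np.random.randn(3, 2)    # weights of this layer: 3 inputs -> 2 neurons
b = np.zeros((1, 2))         # bias

z = x.dot(W) + b             # z = W·x + b (row-vector convention)
a = sigmoid(z)               # a = sigma(z), the layer's output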

 

 

2. Loss Calculation

  • ์ถœ๋ ฅ์ธต์—์„œ ๋‚˜์˜จ ์˜ˆ์ธก๊ฐ’๊ณผ ์‹ค์ œ ๋ ˆ์ด๋ธ”์„ ๋น„๊ตํ•˜์—ฌ ์†์‹ค ํ•จ์ˆ˜ ๊ณ„์‚ฐ
  • ์˜ˆ๋ฅผ ๋“ค์–ด, ํ‰๊ท  ์ œ๊ณฑ ์˜ค์ฐจ(MSE)์˜ ๊ฒฝ์šฐ :

$$ L = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

where $ y_i $ is the true value and $ \hat{y}_i $ the predicted value for the $ i $-th sample.
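
A minimal NumPy sketch of this loss (the sample values are made up for illustration):

import numpy as np

def mean_squared_error(y_true, y_pred):
    # L = (1/n) * sum_i (y_i - y_hat_i)^2
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([[0.0], [1.0]])          # actual labels (illustrative)
y_pred = np.array([[0.2], [0.7]])          # model predictions (illustrative)
print(mean_squared_error(y_true, y_pred))  # (0.2**2 + 0.3**2) / 2 = 0.065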

 

 

3. ์ถœ๋ ฅ์ธต์—์„œ์˜ ๊ธฐ์šธ๊ธฐ ๊ณ„์‚ฐ

  • Compute the gradient of the loss function $ L $ with respect to the output-layer weights $ W $.
  • When the output layer's activation function is $ \sigma $, the required partial derivatives are:

$$ \frac{\partial L}{\partial \hat{y}} = \hat{y} - y $$

$$  \frac{\partial \hat{y}}{\partial z} = \sigma'(z) $$

$$ \frac{\partial z}{\partial W} = a_{\text{previous}} $$

where $ a_{\text{previous}} $ is the activation of the previous layer.
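
Below is a minimal NumPy sketch that evaluates these three factors and combines them with the chain rule; the shapes and values are illustrative assumptions (one sample, two neurons in the previous layer, one output neuron).

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

a_previous = np.array([[0.5, 0.8]])   # activation of the previous layer (1 x 2, assumed)
W_out = np.random.randn(2, 1)         # output-layer weights
b_out = np.zeros((1, 1))              # output-layer bias
y = np.array([[1.0]])                 # target value (assumed)

z = a_previous.dot(W_out) + b_out     # forward pass through the output layer
y_hat = sigmoid(z)

dL_dyhat = y_hat - y                  # dL/dy_hat
dyhat_dz = y_hat * (1 - y_hat)        # sigma'(z), written in terms of the sigmoid output
dL_dW = a_previous.T.dot(dL_dyhat * dyhat_dz)   # chain rule: dz/dW is a_previous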

 

 

 

 

4. Backpropagation Step

  • ์ถœ๋ ฅ์ธต์—์„œ ๊ณ„์‚ฐ๋œ ๊ธฐ์šธ๊ธฐ๋ฅผ ์ด์ „ ์ธต์œผ๋กœ ์ „ํŒŒ
  • ์ฒด์ธ ๋ฃฐ์„ ์‚ฌ์šฉํ•˜์—ฌ ์ด์ „ ์ธต์˜ ๊ฐ€์ค‘์น˜์— ๋Œ€ํ•œ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ณ„์‚ฐํ•จ:

$$ \frac{\partial L}{\partial W} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial W} $$

  • Repeating this yields the gradient for the weights of every layer. At each intermediate layer $ l $, the error is computed as:

$$ \delta^l = \left( \delta^{l+1} \cdot (W^{l+1})^{\top} \right) \odot \sigma'(z^l) $$

where $ \delta^l $ is the error at layer $ l $, $ \delta^{l+1} $ is the error at the next layer, $ W^{l+1} $ is the weight matrix of the next layer, and $ \odot $ is the element-wise product.
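
A minimal NumPy sketch of this recursion, using the same row-vector convention as the full example below (which is why the transpose of $ W^{l+1} $ appears); the shapes are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

delta_next = np.array([[0.1, -0.2]])   # delta^{l+1}: error at the next layer (1 x 2, assumed)
W_next = np.random.randn(3, 2)         # W^{l+1}: weights from layer l (3 units) to layer l+1 (2 units)
z_l = np.random.randn(1, 3)            # pre-activation z^l of layer l

sigma_prime = sigmoid(z_l) * (1 - sigmoid(z_l))   # sigma'(z^l)
delta_l = delta_next.dot(W_next.T) * sigma_prime  # (delta^{l+1} · W^{l+1 T}) ⊙ sigma'(z^l)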

 

 

5. Weight Update

  • ๊ฐ ์ธต์˜ ๊ฐ€์ค‘์น˜์™€ ๋ฐ”์ด์–ด์Šค๋ฅผ ๊ธฐ์šธ๊ธฐ์™€ ํ•™์Šต๋ฅ  $ \eta $๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์—…๋ฐ์ดํŠธํ•จ:

$$ W := W - \eta \frac{\partial L}{\partial W} $$ 

$$b := b - \eta \frac{\partial L}{\partial b} $$ 

์ด ๊ณผ์ •์„ ๋ชจ๋“  ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด ๋ฐ˜๋ณตํ•˜์—ฌ ๊ฐ€์ค‘์น˜์™€ ๋ฐ”์ด์–ด์Šค๋ฅผ ์ ์ง„์ ์œผ๋กœ ์กฐ์ •ํ•จ์œผ๋กœ์จ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ด.

๋”ฅ๋Ÿฌ๋‹ ํ”„๋ ˆ์ž„์›Œํฌ๋Š” ์ด ๋ณต์žกํ•œ ๊ณผ์ •์„ ์ž๋™์œผ๋กœ ์ฒ˜๋ฆฌํ•ด์ฃผ๊ธฐ ๋•Œ๋ฌธ์—, ์ƒ๋Œ€์ ์œผ๋กœ ์‰ฝ๊ฒŒ ๋ชจ๋ธ์„ ํ•™์Šตํ•  ์ˆ˜ ์žˆ์Œ.

 

 

 

Expressing the backpropagation process described above in code:

(The example is a 2-layer network consisting of one hidden layer and one output layer, using the sigmoid activation function and mean squared error (MSE) as the loss.)

 

* backward ํ•จ์ˆ˜์˜ ๊ฐ€์ค‘์น˜์™€ ๋ฐ”์ด์–ด์Šค ์—…๋ฐ์ดํŠธ ๋ถ€๋ถ„ ์ค‘์ ์ ์œผ๋กœ ๋ณด๊ธฐ!

import numpy as np

# Activation function (sigmoid) and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # expects x to already be a sigmoid output: sigma'(z) = sigma(z) * (1 - sigma(z))
    return x * (1 - x)

# Loss function (MSE) and its derivative
def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mean_squared_error_derivative(y_true, y_pred):
    # dL/dy_pred (the constant factor 2/n is omitted)
    return y_pred - y_true

# ์‹ ๊ฒฝ๋ง ํด๋ž˜์Šค ์ •์˜
class SimpleNeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # ๊ฐ€์ค‘์น˜์™€ ๋ฐ”์ด์–ด์Šค ์ดˆ๊ธฐํ™”
        self.weights_input_hidden = np.random.randn(input_size, hidden_size)
        self.bias_hidden = np.zeros((1, hidden_size))
        self.weights_hidden_output = np.random.randn(hidden_size, output_size)
        self.bias_output = np.zeros((1, output_size))
        
    def forward(self, X):
        # Forward propagation
        self.hidden_input = np.dot(X, self.weights_input_hidden) + self.bias_hidden
        self.hidden_output = sigmoid(self.hidden_input)
        self.output_input = np.dot(self.hidden_output, self.weights_hidden_output) + self.bias_output
        self.output = sigmoid(self.output_input)
        return self.output

    def backward(self, X, y, output, learning_rate):
        # Error (delta) at the output layer
        error = mean_squared_error_derivative(y, output)
        d_output = error * sigmoid_derivative(output)

        # Error (delta) at the hidden layer, propagated back through the output weights
        error_hidden = d_output.dot(self.weights_hidden_output.T)
        d_hidden = error_hidden * sigmoid_derivative(self.hidden_output)

        # Update weights and biases by gradient descent
        self.weights_hidden_output -= self.hidden_output.T.dot(d_output) * learning_rate
        self.bias_output -= np.sum(d_output, axis=0, keepdims=True) * learning_rate
        self.weights_input_hidden -= X.T.dot(d_hidden) * learning_rate
        self.bias_hidden -= np.sum(d_hidden, axis=0, keepdims=True) * learning_rate

    def train(self, X, y, epochs, learning_rate):
        for epoch in range(epochs):
            output = self.forward(X)
            self.backward(X, y, output, learning_rate)
            if epoch % 100 == 0:
                loss = mean_squared_error(y, output)
                print(f'Epoch {epoch}, Loss: {loss}')

# ๋ฐ์ดํ„ฐ ์ƒ์„ฑ (์˜ˆ: XOR ๊ฒŒ์ดํŠธ)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# ์‹ ๊ฒฝ๋ง ์ดˆ๊ธฐํ™”
input_size = 2
hidden_size = 2
output_size = 1
learning_rate = 0.1
epochs = 10000

nn = SimpleNeuralNetwork(input_size, hidden_size, output_size)

# ์‹ ๊ฒฝ๋ง ํ•™์Šต
nn.train(X, y, epochs, learning_rate)

# Print the predictions
print("Predictions:")
print(nn.forward(X))

 
