๊ด€๋ฆฌ ๋ฉ”๋‰ด

Done is Better Than Perfect

05. Regularization ๋ณธ๋ฌธ

๐Ÿค– AI/Machine Learning

05. Regularization

jimingee 2022. 3. 25. 12:51

๋ชฉ์ฐจ

    ์ €๋ฒˆ ํฌ์ŠคํŠธ์—์„œ๋Š” ์ง€๋„ํ•™์Šต์˜ Linear Regression๋ชจ๋ธ๊ณผ Logistic Regression ๋ชจ๋ธ์„ ๋ฐฐ์› ๋‹ค.

    ์ด๋ฒˆ์—๋Š” hํ•จ์ˆ˜๋ฅผ ๋” ์ž์„ธํžˆ ์•Œ์•„๋ณด๋„๋ก ํ•˜์ž.๐Ÿ˜€

    1.  Overfitting Problem

    ๊ฐ€์žฅ ์™ผ์ชฝ ๊ทธ๋ž˜ํ”„์˜ ๊ฒฝ์šฐ,  hํ•จ์ˆ˜(๊ฐ€์„ค ํ•จ์ˆ˜)๋ฅผ θ์— ๋Œ€ํ•œ 1์ฐจ ๋ฐฉ์ •์‹์œผ๋กœ ์ •์˜ โžก ๋ฐ์ดํ„ฐ์˜ ์˜ˆ์ธก์ด ์ผ์น˜ํ•˜์ง€ ์•Š๋Š”๋‹ค.

    ์ผ๋ฐ˜์ ์œผ๋กœ ๋„ˆ๋ฌด ๋‹จ์ˆœํ•˜๊ฑฐ๋‚˜ ๋„ˆ๋ฌด ์ ์€ ๊ธฐ๋Šฅ์„ ์‚ฌ์šฉํ•˜๋Š” ๊ธฐ๋Šฅ ๋•Œ๋ฌธ์— ๋ฐœ์ƒํ•œ๋‹ค.

    ์ด๋Ÿฌํ•œ ๊ฒฝ์šฐ๋ฅผ Underfit ๋˜๋Š” High Bias๋ผ ํ•œ๋‹ค.

     

    ๊ฐ€์žฅ ์˜ค๋ฅธ์ชฝ ๊ทธ๋ž˜ํ”„์˜ ๊ฒฝ์šฐ, hํ•จ์ˆ˜๋ฅผ ๋‹ค์ฐจ์›๋ฐฉ์ •์‹์œผ๋กœ ์ •์˜ โžก ๊ฐ๊ฐ์˜ ๋ฐ์ดํ„ฐ ๊ฒฐ๊ณผ ๊ฐ’์„ ๋งŒ์กฑํ•˜๋Š” ํ˜•ํƒœ๋ฅผ ๋ˆ๋‹ค๊ณ  ํ•  ์ˆ˜ ์žˆ๋‹ค.

    training data set์—์„œ๋Š” ์ตœ์ ํ™”๊ฐ€ ์ž˜ ๋˜์—ˆ๋‹ค๊ณ  ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ์ง€๋งŒ ์ƒˆ๋กœ์šด data์— ๋Œ€ํ•œ ์ •ํ™•๋„๋Š” ์žฅ๋‹ดํ•  ์ˆ˜ ์—†๋‹ค.

    ์ฆ‰, ๋„ˆ๋ฌด ์ƒ˜ํ”Œ๋ฐ์ดํ„ฐ์— ๊ณผํ•˜๊ฒŒ ์ตœ์ ํ™”๋˜์–ด์žˆ์–ด ์ผ๋ฐ˜ํ™”ํ•˜๊ธฐ ์–ด๋ ต๋‹ค๊ณ  ํ•  ์ˆ˜ ์žˆ๋‹ค.

    ์ด๋Ÿฌํ•œ ๊ฒฝ์šฐ๋ฅผ Overfit ๋˜๋Š” High variance๋ผ ํ•œ๋‹ค.

     

    ์ด์ œ ๊ฐ€์šด๋ฐ ๊ทธ๋ž˜ํ”„๋ฅผ ๋ณด๋„๋ก ํ•˜์ž. 

    h ํ•จ์ˆ˜๋ฅผ 2์ฐจ ๋ฐฉ์ •์‹์œผ๋กœ ์ •์˜ํ•˜์—ฌ dataset์— ์ ํ•ฉํ•˜๋ฉด์„œ feature๊ฐ„์˜ ๊ด€๊ณ„๋ฅผ ์ž˜ ๋‚˜ํƒ€๋‚ด๊ณ  ์žˆ๋‹ค.

    ์ด๋Ÿฌํ•œ ๊ฒฝ์šฐ๋ฅผ Just Right๋ผ๊ณ  ํ•˜๋ฉฐ, ์ตœ์ ํ™”๊ฐ€ ์ ์ ˆํžˆ ๋˜์—ˆ๋‹ค๊ณ  ํ•  ์ˆ˜ ์žˆ๋‹ค.

     


    ์ด์ œ Logistic Regression์˜ ์˜ˆ๋ฅผ ๋ณด๋„๋ก ํ•˜์ž.

    Linear Regression์˜ overfitting์„ ์ดํ•ดํ–ˆ๋‹ค๋ฉด ๋˜‘๊ฐ™์€ ๋กœ์ง์œผ๋กœ ์ดํ•ด๊ฐ€ ๊ฐ€๋Šฅํ•˜๋‹ค.

    ๋” ๋งŽ์€ feature๋ฅผ ์‚ฝ์ž… ํ• ์ˆ˜๋ก, training data set์— ์ž˜ fit ์ด ๋œ๋‹ค.
    ํ•˜์ง€๋งŒ ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ๊ฐ€ ์ž…๋ ฅ๋˜์—ˆ์„ ๋•Œ ํšŒ๊ท€ ๋˜๋Š” ๋ถ„๋ฅ˜๋ชจ๋ธ์—์„œ ์˜ค์ฐจ๊ฐ€ ๋ฐœ์ƒํ•  ๊ฐ€๋Šฅ์„ฑ์ด ๋งค์šฐ ํฌ๋‹ค. 
    ์ด์™€ ๊ฐ™์ด training data์— ์ง€๋‚˜์น˜๊ฒŒ fit ๋˜์–ด ์ผ๋ฐ˜์ ์ธ ์ถ”์„ธ๋ฅผ ํ‘œํ˜„ํ•˜์ง€ ๋ชปํ•˜๋Š” ๋ฌธ์ œ๋ฅผ overfitting์ด๋ผ ํ•œ๋‹ค.

     

    ์—ฌ๊ธฐ, overfitting์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ๋ฐฉ๋ฒ•์ด ์žˆ๋‹ค.

     

    1. feature์˜ ์ˆ˜ ์ค„์ด๊ธฐ

    • ์ค‘์š”ํ•œ feature๋งŒ ๋‚จ๊ธฐ๊ธฐ
    • model selection algorithm(๋ชจ๋ธ์ด ๋นผ์•ผ ํ•˜๋Š” feature ์•Œ๋ ค์ค€๋‹ค.)

    2. regularization(์ •๊ทœํ™”)

    • ๋ชจ๋“  feature๋Š” ์œ ์ง€ํ•˜๋˜, parameter θ์˜ ๊ทœ๋ชจ(magnitude) ์ค„์ด๊ธฐ

     

     

    2.  Cost Function

    ์ •๊ทœํ™”์˜ ๊ฐœ๋…์„ ์•Œ๊ธฐ ์œ„ํ•ด์„œ cost function์˜ ์„ค๋ช…์ด ํ•„์š”ํ•˜๋‹ค. 

     

    ๊ทธ๋ฆผ์˜ ์˜ค๋ฅธ์ชฝ ๊ทธ๋ž˜ํ”„๋Š” Linear Regressino์—์„œ ๊ณผ์ ํ•ฉ์ด ๋œ ์˜ˆ์‹œ์ด๋‹ค.

    ์ด๋•Œ, θ3, θ4์— ๊ฐ๊ฐ 1000์„ ๊ณฑํ•œ cost function์„ ์‚ฌ์šฉํ•œ๋‹ค๊ณ  ๊ฐ€์ •ํ•ด๋ณด์ž. ์ด cost function์€ θ์˜ ๊ฐ€์žฅ ์ž‘์€ ๊ฐ’์„ ๊ตฌํ•˜๊ธฐ ๋•Œ๋ฌธ์— parameter(θ3,θ4)์˜ ๊ฐ’์€ ๊ฑฐ์˜ 0์— ๊ฐ€๊นŒ์šด ๊ฐ’์ด ๋  ๊ฒƒ์ด๋‹ค. 

    ๊ฒฐ๊ตญ, h ํ•จ์ˆ˜์—์„œ ๋’ค์— 2๊ฐœ ํ•ญ์ด 0์— ๊ทผ์ ‘ํ•˜๋ฏ€๋กœ 2์ฐจ ๋ฐฉ์ •์‹์œผ๋กœ ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ๋‹ค. 

     

    n๊ฐœ์˜ parameters์—์„œ  ์ผ๋ถ€ parameter๋ฅผ (0์— ๊ทผ์‚ฌํ•œ)์ž‘์€ ๊ฐ’์œผ๋กœ ๋งŒ๋“ค์–ด h ํ•จ์ˆ˜๊ฐ€ ์‹ฌํ”Œํ•ด์ง€๋„๋ก ํ•œ๋‹ค.
    ์ด ๋ฐฉ๋ฒ•์„ overfitting์„ ์ ๊ฒŒ ๋ฐœ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ๋Š” ์ •๊ทœํ™”๋ผ๊ณ  ํ•œ๋‹ค.

     

    3.  Regularization

     

    ์ด๋ฅผ ๊ณต์‹์œผ๋กœ ํ‘œํ˜„ํ•˜๋ฉด cost function์— regulation์‹์„ ์ถ”๊ฐ€ํ•˜์—ฌ Jํ•จ์ˆ˜๋กœ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋‹ค. 

    ์œ„์˜ ์‹์—์„œ λ(lambda)๋ฅผ regularization parameter์ด๋ผ ํ•œ๋‹ค.  λ๋Š” cost function๊ฐ€ ์ž˜ ์ ์šฉ์ด ๋ ์ˆ˜ ์žˆ๋„๋ก ์กฐ์ ˆํ•œ๋‹ค.

    ** λ๊ฐ€ ๋„ˆ๋ฌด ํฐ ๊ฒฝ์šฐ์—๋Š” θ์˜ parameter ๊ฐ’์ด ์ „๋ถ€ 0์ด ๋˜์–ด underfittingํ•˜๋Š” ๊ฒฐ๊ณผ๋ฅผ ์ดˆ๋ž˜ํ•œ๋‹ค. 

     

     

     

    4.  Regularized Linear Regression

    ์ด์ œ linear regression๊ณผ logistic regression์— regularize๋ฅผ ์ ์šฉํ•ด๋ณด์ž.

    Linear regression์˜ ์ตœ์  $ \theta $parameter๋ฅผ ์ฐพ๋Š” ๋ฐฉ๋ฒ•์€ gradient descent์™€ normal equation, 2๊ฐ€์ง€ ๋ฐฉ๋ฒ•์ด ์žˆ๋‹ค.

     

    Gradient Descent

    ์•ž์—์„œ cost function์— $ {\lambda} $ ํ•ญ์„ ์ถ”๊ฐ€ํ•˜์—ฌ ์ •๊ทœํ™”๋œ cost function์„ ๋งŒ๋“ค์—ˆ๋‹ค.  ์ด์ œ gradient descent ๊ณต์‹์— ์ ์šฉํ•ด ๋ณด์ž.

    $ \theta_{0} $๋Š” ์ •๊ทœํ™”๋˜์ง€ ์•Š์•˜์œผ๋ฏ€๋กœ, ์‹์„ ๋ถ„๋ฆฌํ•˜๋„๋ก ์ฃผ์˜ํ•ด์•ผ ํ•œ๋‹ค. 

     

    

    ์‹์„ ํ•˜๋‚˜๋กœ ํ‘œํ˜„ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

    ์ด ์‹์—์„œ ํฅ๋ฏธ๋กœ์šด ์ ์€ ํ•ญ์ƒ $ 1 - \alpha \lambda / m < 1$์ด๋‹ค. ๊ทธ๋ž˜์„œ $ \theta_{j} $๊ฐ€ update ํ•  ๋•Œ๋งˆ๋‹ค ์ค„์–ด๋“ ๋‹ค.

    ์˜ค๋ฅธ์ชฝ ํ•ญ์€ ๊ธฐ์กด์˜ linear regression์˜ gradient descent์™€ ๋˜‘๊ฐ™๋‹ค.

     

    Normal Equation

    X matrix๋Š” m x (n+1)์˜ ํฌ๊ธฐ๋ฅผ ๊ฐ€์ง„๋‹ค. Normal equation์— regularize๋ฅผ ์ ์šฉํ•˜๋ ค๋ฉด $ X^{T}X $ํ•ญ ๋’ค์— $ \lambda L $์„ ๋”ํ•œ๋‹ค. 

    ์ด ๋•Œ L matrix๋Š” ์ฒซ๋ฒˆ์งธ ๊ฐ’๋งŒ 0์ธ identity matrix์ด๋‹ค. [์ฐจ์›์€ (n+1) x (n+1)]

     

     

    m < n์ด๋ฉด, $ X^{T}X $๋Š” non-invertible, singular์ด๋‹ค. 

    ํ•˜์ง€๋งŒ  ์ •๊ทœํ™”๋ฅผ ํ•ด์ฃผ๋ฉด,  $ X^{T}X + \lambda*L $์€ invertible ํ•ด์ง„๋‹ค.

     

     

     

    5.  Regularized Logistic Regression

    Logistic regression์—์„œ regularize๋ฅผ ํ•˜๋Š” ๋ฐฉ์‹์„ ์•Œ์•„๋ณด๋„๋ก ํ•˜์ž.

    [์œ„์— ์‹๋„ ๋‚ด๊ฐ€ ์ •๋ฆฌํ•˜๊ธฐ ] https://wikidocs.net/4331 ์ฒ˜๋Ÿผ ๊น”๋”ํ•˜๊ฒŒ!!!

    Regularized ๋œ cost function์€ ๊ทธ๋ฆผ์˜ ์•„๋ž˜ $ J(\theta) $ ๊ณผ ๊ฐ™๋‹ค. ๊ธฐ์กด cost function์— $ \lambda $ ํ•ญ์ด ์ถ”๊ฐ€๋˜์—ˆ๋‹ค. 

    $ theta_{0} $๋ฅผ ์ •๊ทœํ™”ํ•˜์ง€ ์•Š์€ ๊ฒƒ์— ์ฃผ์˜ํ•ด์•ผ ํ•œ๋‹ค.

     

    ์ด๋ฅผ gradient descent ์— ์ ์šฉํ•˜๋ฉด, 

    ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ $ \theta_{0} $๋Š” ์ •๊ทœํ™”๋˜์ง€ ์•Š์•˜์œผ๋ฏ€๋กœ, ์‹์„ ๋ถ„๋ฆฌํ•œ๋‹ค.

    ์ด ๊ณต์‹์€ regularized linear regression์˜  gradient descent์™€ ์‹์ด ๊ฐ™์ง€๋งŒ $h$ ํ•จ์ˆ˜๊ฐ€ ๋‹ฌ๋ผ ๋‹ค๋ฅธ ํ•จ์ˆ˜์ด๋‹ค. 

     

    ์œ„์˜ ์‹์„ ๊ฐ„๋‹จํ•˜๊ฒŒ ์ •๋ฆฌํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

     

     

     

    ๊น”๋”ํ•˜๊ฒŒ ์‹ ์ •๋ฆฌ!!1

     

    + ์ฐธ๊ณ  ์ž๋ฃŒ

    https://towardsdatascience.com/regularization-an-important-concept-in-machine-learning-5891628907ea

     


    https://wikidocs.net/4329

     


     

    '๐Ÿค– AI > Machine Learning' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

    04. Logistic Regression  (1) 2022.03.23
    03. Linear Regression with Multiple Variable  (1) 2022.02.10
    02. Linear regression with one variable  (1) 2022.02.04
    01. introduction  (0) 2022.01.20
    Machine Learning ์ •๋ฆฌ  (0) 2022.01.20
    Comments