๊ด€๋ฆฌ ๋ฉ”๋‰ด

Done is Better Than Perfect

[๋”ฅ๋Ÿฌ๋‹] 7. CNN ๋ณธ๋ฌธ

๐Ÿค– AI/Deep Learning

[๋”ฅ๋Ÿฌ๋‹] 7. CNN

jimingee 2024. 6. 27. 22:12

1. ์ด๋ฏธ์ง€์™€ Convolution ์—ฐ์‚ฐ

 

๊ธฐ์กด์˜ ๋”ฅ๋Ÿฌ๋‹์—์„œ ์‚ฌ์šฉํ•˜๋Š” Fully-connected Layer๋Š” 1์ฐจ์› ๋ฐ์ดํ„ฐ (์„ ํ˜• ๋ฐ์ดํ„ฐ)๋ฅผ input์œผ๋กœ ์š”๊ตฌํ•จ

  • ์ด๋ฏธ์ง€๋ฅผ ๋‹จ์ˆœํ•˜๊ฒŒ 1์ฐจ์›์œผ๋กœ ๋ฐ”๊พธ๋ฉด 2์ฐจ์› ์ƒ์—์„œ ๊ฐ€์ง€๋Š” ์ •๋ณด (์‚ฌ๋ฌผ ๊ฐ„์˜ ๊ฑฐ๋ฆฌ ๊ด€๊ณ„, ์ƒ‰์˜ ๋ณ€ํ™” ๋“ฑ)๋ฅผ ํฌ๊ธฐํ•ด์•ผ ํ•จ
  • ์ฆ‰, ๊ณต๊ฐ„ ์ •๋ณด (spatial information)๊ฐ€ ๋ฌด๋„ˆ์ง (-> FC layer๋กœ ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์—†์Œ)

๋”ฐ๋ผ์„œ ์ด๋ฏธ์ง€ ์ฒ˜๋ฆฌ์— ํŠนํ™”๋œ ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ ๋“ฑ์žฅ -> CNN (Convolutional Neural Network)

 

 

  • CNN์˜ ๋Œ€ํ‘œ ๊ตฌ์„ฑ์š”์†Œ 
    • Convolutional Layer
    • Pooling Layer
    • ๋ถ„๋ฅ˜๊ธฐ (classifier) : fully-connected layer๋กœ ๊ตฌ์„ฑ

 

[ Convolution ์—ฐ์‚ฐ ] 

  • CNN์„ ๊ตฌํ˜„ํ•˜๋Š” ํ•ต์‹ฌ ์—ฐ์‚ฐ
  • ์ปค๋„(kernel)๊ณผ Convolution ์—ฐ์‚ฐ
  • ์ด๋ฏธ์ง€(input)์™€ ์ปค๋„(kernel = filter) ๊ฐ„์˜ convolution ์—ฐ์‚ฐ์œผ๋กœ ์ฒ˜๋ฆฌ

 

 

  • 2์ฐจ์› ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ : ํ–‰๋ ฌ๋กœ ํ‘œํ˜„ ( ํ–‰๋ ฌ์˜ ๊ฐ ์›์†Œ๋Š” ํ•ด๋‹น ์œ„์น˜์˜ ์ด๋ฏธ์ง€ ํ”ฝ์…€ ๊ฐ’ )
  • Convolution kernel : ํ–‰๋ ฌ๋กœ ํ‘œํ˜„

Convolution ์—ฐ์‚ฐ์€ 2์ฐจ์› ์ƒ์—์„œ ์—ฐ์‚ฐ์ด ์ด๋ฃจ์–ด์ง€๋ฏ€๋กœ ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ๋ฅผ ๋ณ€ํ˜•์—†์ด ๊ทธ๋Œ€๋กœ ์‚ฌ์šฉ ๊ฐ€๋Šฅ

 

 

convolution ์—ฐ์‚ฐ ๊ณผ์ •

  • ํ–‰๋ ฌ ๋‚ด๋ถ€์˜ ์š”์†Œ๋“ค์„ ์š”์†Œ ๋ณ„๋กœ(element-wise) ๊ณฑํ•ด์„œ ๋”ํ•จ -> ์„ ํ˜• ์—ฐ์‚ฐ
  • kernel์„ ์ด๋ฏธ์ง€ ์˜์—ญ ๋‚ด์—์„œ convolution ์—ฐ์‚ฐ ์ˆ˜ํ–‰
  • ์—ฐ์‚ฐ ๊ณผ์ • : ์ปค๋„์ด ์ด๋ฏธ์ง€์˜ ๋…ธ๋ž€์ƒ‰ ์˜์—ญ์— ๊ฒน์ณ์ง

 

 

[ Feature Map (Activation Map) ]

  • The result of a convolution is called a feature map or activation map ( a feature map holds the extracted image features )
  • The region where the kernel and the image overlap: the receptive field

 

[ ์ปฌ๋Ÿฌ ์ด๋ฏธ์ง€์˜ convolution ์—ฐ์‚ฐ ]

  • ์•ž์„  ์˜ˆ์‹œ๋Š” ์ด๋ฏธ์ง€์˜ ์ฑ„๋„์ด 1๊ฐœ ํ‘๋ฐฑ ์ด๋ฏธ์ง€
  • ์ปฌ๋Ÿฌ ์ด๋ฏธ์ง€๋Š” ์ฑ„๋„์ด 3๊ฐœ   ์ปค๋„๋„ ์ฑ„๋„์„ 3๊ฐœ๋กœ ์ค€๋น„
  • ๊ฐ ์ฑ„๋„ ๋ณ„๋กœ Convolution ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•˜๊ณ  ๊ฐ ๊ฒฐ๊ณผ(Feature Map)๋ฅผ ์š”์†Œ๋ณ„๋กœ(element-wise) ๋”ํ•ด ํ•˜๋‚˜์˜ Feature Map์„ ์ƒ์„ฑ

 

 

 

[ Convolution ์—ฐ์‚ฐ ํ™•์žฅ ]

  • ์ปค๋„์„ ์—ฌ๋Ÿฌ๊ฐœ ๋‘๋ฉด Feature Map๋„ ์—ฌ๋Ÿฌ๊ฐœ ์ƒ์„ฑ
    • ๋…ธ๋ž€์ƒ‰ ํ–‰๋ ฌ : Filter 1(kernel 1)์„ ์‚ฌ์šฉํ•œ feature map (3๊ฐœ์˜ feature map์„ ๋”ํ•ด ํ•˜๋‚˜์˜ feature map ํ˜•์„ฑ)
    • ์ฃผํ™ฉ์ƒ‰ ํ–‰๋ ฌ : Filter 2(kernel 2)์„ ์‚ฌ์šฉํ•œ feature map
    • output : ์ด 2๊ฐœ์˜ ์ฑ„๋„์„ ๊ฐ€์ง€๋Š” feature map ์ƒ์„ฑ
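Shape-wise, stacking the maps from several filters is what gives the output its channel dimension. A tiny NumPy sketch with two hypothetical 2x2 filters on a 3x3 single-channel image:

```python
import numpy as np

def conv2d(image, kernel):
    # plain 2-D convolution: stride 1, no padding
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.ones((3, 3))
filter1 = np.ones((2, 2))          # hypothetical "Filter 1"
filter2 = np.full((2, 2), 2.0)     # hypothetical "Filter 2"

# each filter yields its own feature map; stacking them adds a channel axis
output = np.stack([conv2d(image, filter1), conv2d(image, filter2)], axis=-1)
print(output.shape)  # (2, 2, 2): height x width x channels (= number of filters)
```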

 

 

 

2. Convolutional Neural Network

  • The kernels used so far are all learnable -> that is, each value in a kernel matrix is a weight
  • A layer made of such kernels is called a convolutional layer
  • Stacking these layers forms a CNN

  • First convolutional layer: the layer between the red and blue feature maps
    ( it uses 6 kernels, so the second, blue feature map has 6 channels ) 
  • Second convolutional layer: uses 10 kernels, so the green feature map has 10 channels
  • One convolutional layer can produce many channels -> stacking such layers builds the CNN

 

Convolutional Layer

  • A layer that uses kernels to extract features from an image
  • Hyperparameters that can be tuned in a convolutional layer: number of kernels, kernel size, stride, and so on

 

[ Layer์˜ ์—ญํ•  ]

  • ์ด๋ฏธ์ง€๊ฐ€ ๊ฐ€์ง€๋Š” ํŠน์ • Feature๋ฅผ ๋ฝ‘์•„๋‚ด๋„๋ก ์ปค๋„์„ ํ•™์Šต
  • ์ปค๋„์— ๋”ฐ๋ผ ์ถ”์ถœํ•˜๋Š” Feature๋ฅผ ๋‹ค๋ฅด๊ฒŒ ํ•™์Šต (Feture ์˜ˆ์‹œ : ์ด๋ฏธ์ง€๋‚ด์˜ ๋Œ€๊ฐ์„ , ์›ํ˜•, ์ƒ‰์กฐ ๋“ฑ๋“ฑ)
  • ๊ทธ๋ฆผ์—์„œ ํ–‰๋ ฌ์˜ ํ•œ์นธ = Feature Map ํ•˜๋‚˜์˜ ๊ฒฐ๊ณผ, ์ฆ‰ ์ด 64๊ฐœ์˜ chennel์ด ์žˆ์Œ

 

Convolution ์—ฐ์‚ฐ๊ณผ์ •์„ ์กฐ์ ˆํ•˜๊ธฐ ์œ„ํ•œ Hyper parameter  =>  stride, padding

 

[ Stride ] 

  • Controls how many cells the kernel moves across the image at each step
  • All earlier convolution examples moved 1 cell ( the figure below shows a stride of 2 )

 

[ Padding ] 

  • In the earlier examples, every convolution made the resulting feature map smaller
  • Adding padding prevents the feature map from shrinking
  • It also lets the image border contribute as evenly as the interior
  • The padding value is usually 0, which is called zero padding

 

 

[ Convolutional Layer ์˜์˜ ]

  • ์™œ ์ด๋ฏธ์ง€ ํŠน์ง•์„ ์ž˜ ๋ฝ‘์•„๋‚ด๋Š”๊ฐ€?
    • Convolution ์—ฐ์‚ฐ์€ ํ•˜๋‚˜์˜ ์ปค๋„์ด ํ”ฝ์…€ ๊ฐ„์˜ ์ •๋ณด๋ฅผ ๋ณด๊ฒŒ ๋งŒ๋“ฆ
    • ์ฆ‰, ํ•˜๋‚˜์˜ ์ปค๋„์ด ์ด๋ฏธ์ง€ ์ „์ฒด ์˜์—ญ์„ ํ•™์Šต
  • Parameter Sharing
    • ์ปค๋„์ด ๊ฐ€์ง„ Parameter๋ฅผ ์ด๋ฏธ์ง€์˜ ๋ชจ๋“  ์˜์—ญ์—์„œ ๊ณต์œ 
    • Parameter ๊ฐœ์ˆ˜๋ฅผ FC Layer์— ๋น„ํ•ด ๊ทน์ ์œผ๋กœ ์ค„์ž„ ๊ณผ์ ํ•ฉ๋ฐฉ์ง€์—์œ ๋ฆฌ
      • FC layer์˜ ๊ฒฝ์šฐ) ์ด๋ฏธ์ง€ 1*64, ์ปค๋„ 1*9 ๋ผ๋ฉด, ์ด 64*9๊ฐœ์˜ ๊ฐ€์ค‘์น˜ ํ•„์š”
      • Convolution Layer์˜ ๊ฒฝ์šฐ) ์ด๋ฏธ์ง€ 8*8, ์ปค๋„ 3*3 ๋ผ๋ฉด, ์ด 9๊ฐœ(kernel ํ–‰๋ ฌ์˜ ๋‚ด๋ถ€ ๊ฐ’)์˜ ๊ฐ€์ค‘์น˜ ํ•„์š” 

 

 

[Convolutional Layer ํ™œ์„ฑํ™” ํ•จ์ˆ˜ ]

  • Convolution ์—ฐ์‚ฐ์€ ์„ ํ˜•์—ฐ์‚ฐ ( ๋ชจ๋‘ ๊ณฑ์…ˆ๊ณผ ๋ง์…ˆ์œผ๋กœ๋งŒ ์ด๋ฃจ์–ด์ง )
  • ๋”ฐ๋ผ์„œ, FC Layer์ฒ˜๋Ÿผ ๋น„์„ ํ˜•์„ฑ์„ ์ถ”๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด ํ™œ์„ฑํ™” ํ•จ์ˆ˜ ์‚ฌ์šฉ (CNN์€ ์ฃผ๋กœ ReLUํ•จ์ˆ˜ ์‚ฌ์šฉ)

 

 

 

[ Pooling Layer ]

  • A layer used in nearly every CNN alongside convolution
  • Operates channel by channel
  • Unlike the convolution operation, the per-channel results are not summed together 
  • Main role: shrink the feature map to reduce the downstream parameter count -> keeps overfitting in check

 

 

  • Max Pooling
    • Splits the given image or feature map into smaller, non-overlapping regions
    • In the figure above, each region is 2x2
    • Takes the maximum of each region to build a new feature map

 

 

  • Average Pooling
    • Almost identical to max pooling, but each region's average is computed to build the new feature map

 

 

[ Pooling Layer ์ •๋ฆฌ ]

  • ์ผ๋ฐ˜์ ์œผ๋กœ Max Pooling์„ ๋งŽ์ด ์‚ฌ์šฉ ( Feature Map์— ์กด์žฌํ•˜๋Š” Feature ์ค‘ ๊ฐ€์žฅ ์˜ํ–ฅ๋ ฅ์ด ํฐ Feature๋งŒ ์‚ฌ์šฉ )
  • Feature Map์˜ ์ฑ„๋„์ด ์—ฌ๋Ÿฌ ๊ฐœ๋ฉด ๊ฐ ์ฑ„๋„๋ณ„๋กœ Pooling ์—ฐ์‚ฐ ์ˆ˜ํ–‰ (pooling์˜ ๊ฒฐ๊ณผ ์ฑ„๋„๋ณ„๋กœ ํ•ฉ์น˜์ง€ ์•Š์Œ)
  • ์ถ”๊ฐ€ Pooling Layer
    • Global Average Pooling: ์ „์ฒด Feature Map์—์„œ ํ•˜๋‚˜์˜ ํ‰๊ท ๊ฐ’ ๊ณ„์‚ฐ
    • Global Max Pooling: ์ „์ฒด Feature Map์—์„œ ํ•˜๋‚˜์˜ ์ตœ๋Œ€๊ฐ’์„ ๊ณ„์‚ฐ
    • ์—ฌ๊ธฐ์„  Global Average Pooling์„ ๋งŽ์ด ์‚ฌ์šฉ (๊ทธ๋ฆผ์˜ pooling์€ global average pooling)

 

[ Classifier ] 

  • CNNs are generally used for image classification
  • The feature map is passed through fully-connected layers to perform the classification
  • For this, the feature map is first converted to 1-D
    • Ways to convert to 1-D:
      - simply flatten the feature map 
      - use global average pooling or similar (number of channels = length of the vector)
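The two conversion routes give very different vector lengths. A shape-only sketch for a hypothetical 4x4 feature map with 64 channels:

```python
import numpy as np

fmap = np.ones((4, 4, 64))      # (height, width, channels)

flat = fmap.reshape(-1)         # route 1: simple flatten
gap = fmap.mean(axis=(0, 1))    # route 2: global average pooling, one value per channel

print(flat.shape)  # (1024,) = 4 * 4 * 64
print(gap.shape)   # (64,) - vector length = number of channels
```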

 

 

[ Tensorflow ๋กœ Convolution Layer ๊ตฌํ˜„ ]

import tensorflow as tf
from tensorflow import keras

''' ๋ฐฉ๋ฒ• 1 : Tensorflow๋กœ conv2d ์‚ฌ์šฉ '''

# input : 1๋กœ ๊ตฌ์„ฑ๋œ 3x3 ํฌ๊ธฐ์˜ ๊ฐ„๋‹จํ•œ ํ–‰๋ ฌ (3x3 x1 ์ด๋ฏธ์ง€๊ฐ€ 1๊ฐœ)
inp = tf.ones((1, 3, 3, 1)) 

# Filter : 1๋กœ ๊ฐ€๋“์ฐฌ 2x2์˜ ํฌ๊ธฐ๋ฅผ ๊ฐ€์ง„ ํ–‰๋ ฌ
filter = tf.ones((2, 2, 1, 1)) 

# stride : [๋†’์ด, ๋„ˆ๋น„]์˜ ํ˜•์‹์œผ๋กœ ์ž…๋ ฅ - 1์นธ์”ฉ ์ด๋™ํ•˜๋„๋ก 1, 1์„ ์ž…๋ ฅ
stride = [1, 1] # [๋†’์ด, ๋„ˆ๋น„]

# ์ค€๋น„๋œ ์ž…๋ ฅ๊ฐ’, filter, stride๋กœ Convolution ์—ฐ์‚ฐ ์ˆ˜ํ–‰ (padding์„ 'VALID'์œผ๋กœ ์„ค์ • = ํŒจ๋”ฉ์„ ํ•˜์ง€ ์•Š์Œ)
output = tf.nn.conv2d(inp, filter, stride, padding = 'VALID') 
print(output)
# [[  [[4.] [4.]]
#     [[4.] [4.]]  ]], shape=(1, 2, 2, 1), dtype=float32)

## ๊ฒฐ๊ณผ : Padding์ด ์—†๋Š” ์ƒํƒœ์—์„œ Convolution์„ ์ˆ˜ํ–‰ํ•˜๋‹ˆ ์ž…๋ ฅ์˜ ํฌ๊ธฐ(3x3)๋ณด๋‹ค ์ถœ๋ ฅ์˜ ํฌ๊ธฐ(2x2)๊ฐ€ ์ž‘์•„์ง


# padding์˜ต์…˜์„ 'VALID'๊ฐ€ ์•„๋‹Œ 'SAME'์œผ๋กœ ์„ค์ • (์ž…๋ ฅ๊ณผ ์ถœ๋ ฅ์˜ ํ˜•ํƒœ๊ฐ€ ๊ฐ™๋„๋ก ํŒจ๋”ฉ์„ ์ ์šฉ)
output = tf.nn.conv2d(inp, filter, stride, padding = 'SAME')  
print(output)
#  [[ [[4.] [4.] [2.]]
#     [[4.] [4.] [2.]]
#     [[2.] [2.] [1.]] ]], shape=(1, 3, 3, 1), dtype=float32)

## ๊ฒฐ๊ณผ : Convolution Layer์—์„œ padding์„ 'SAME'์œผ๋กœ ์„ค์ •ํ•˜๋ฉด ์—ฌ๋Ÿฌ๋ฒˆ ์—ฐ์‚ฐํ•ด๋„ ํฌ๊ธฐ๋Š” ์ค„์–ด๋“ค์ง€ ์•Š์Œ


''' padding์„ ์ง์ ‘ ์„ค์ •ํ•ด์„œ ์ „๋‹ฌ '''

# ์œ„,์•„๋ž˜,์˜ค๋ฅธ์ชฝ,์™ผ์ชฝ์— padding์„ ๊ฐ๊ฐ ํ•œ ์นธ์”ฉ ์ถ”๊ฐ€
padding = [[0, 0], [1, 1], [1, 1], [0, 0]] # [[0, 0], [pad_top, pad_bottom], [pad_left, pad_right], [0, 0]]
output1 = tf.nn.conv2d(inp, filter, stride, padding = padding) 
print(output1)
# [[ [[1.]  [2.]  [2.]  [1.]]
#    [[2.]  [4.]  [4.]  [2.]]
#    [[2.]  [4.]  [4.]  [2.]]
#    [[1.]  [2.]  [2.]  [1.]] ]]


''' ๋ฐฉ๋ฒ• 1 : Tensorflow.Keras๋กœ Conv2D ์‚ฌ์šฉ '''

input_shape=(1, 3, 3, 1)

x = tf.ones(input_shape) # one 3x3 x1 image (1, height, width, 1)
print(x)

y = tf.keras.layers.Conv2D( filters = 1, # number of filters
                            kernel_size = [2, 2], # same as "kernel_size = 2" (height, width)
                            strides = (1, 1), 
                            padding = 'same', # keras.layers.Conv2D takes lowercase 'same' / 'valid'
                            activation = 'relu', 
                            input_shape = input_shape[1:]) (x) # input: x
print(y)
# example output (exact values vary from run to run: the kernel is randomly initialized)
# [[ [[0.36910588] [0.36910588] [0.54728895]]
#    [[0.36910588] [0.36910588] [0.54728895]]
#    [[0.8551657 ] [0.8551657 ] [0.6025906 ]] ]], shape=(1, 3, 3, 1), dtype=float32)

 

 

[ MLP model (stacked fully-connected layers) vs CNN model: a comparison ]

  • Implementing an image classification model
  • Dataset: CIFAR-10 
    • Each sample is a 32x32 colour image
    • Every image belongs to one of 10 classes, such as airplane, automobile or bird
    • 50,000 training images and 10,000 test images
import tensorflow as tf
from tensorflow.keras import layers, Sequential, Input
from tensorflow.keras.optimizers import Adam

import numpy as np
import matplotlib.pyplot as plt

SEED = 42

def load_cifar10_dataset():
    train_X = np.load("./dataset/cifar10_train_X.npy")
    train_y = np.load("./dataset/cifar10_train_y.npy")
    test_X = np.load("./dataset/cifar10_test_X.npy")
    test_y = np.load("./dataset/cifar10_test_y.npy")
    train_X, test_X = train_X / 255.0, test_X / 255.0
    
    return train_X, train_y, test_X, test_y
    
''' MLP ๋ชจ๋ธ '''
def build_mlp_model(img_shape, num_classes=10): 
    model = Sequential()
    model.add(Input(shape=img_shape))
    model.add(layers.Flatten()) # 2-D image -> 1-D
    model.add(layers.Dense(units=4096, activation='relu'))
    model.add(layers.Dense(units=1024, activation='relu'))
    model.add(layers.Dense(units=256, activation='relu'))
    model.add(layers.Dense(units=64, activation='relu'))
    model.add(layers.Dense(units=num_classes, activation='softmax'))

    return model

''' CNN ๋ชจ๋ธ '''
def build_cnn_model(img_shape, num_classes=10): 
    model = Sequential()
    model.add(layers.Conv2D(filters=16, kernel_size=(3,3), padding='same', activation='relu', input_shape = img_shape)) # the first convolutional layer must specify input_shape
    model.add(layers.Conv2D(filters=32, kernel_size=(3,3), padding='same',activation='relu'))
    model.add(layers.MaxPool2D(pool_size=(2,2), strides=(2,2))) # pooling: halves the image size
    model.add(layers.Conv2D(filters=64, kernel_size=(3,3), padding='same', strides=(2,2),activation='relu')) # strides=(2,2) halves the feature map in height and width - same effect as max pooling
    model.add(layers.Conv2D(filters=64, kernel_size=(3,3), padding='same', strides=(2,2),activation='relu'))
    model.add(layers.MaxPool2D(pool_size=(2,2), strides=(2,2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(units=128, activation='relu')) # fully connected layer
    model.add(layers.Dense(units=num_classes, activation='softmax'))
    
    return model
    
def plot_history(hist):
    train_loss = hist.history["loss"]
    train_acc = hist.history["accuracy"]
    valid_loss = hist.history["val_loss"]
    valid_acc = hist.history["val_accuracy"]
    
    fig = plt.figure(figsize=(8, 6))
    plt.plot(train_loss)
    plt.plot(valid_loss)
    plt.title('Loss')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['Train', 'Valid'], loc='upper right')
    plt.savefig("loss.png")
    
    fig = plt.figure(figsize=(8, 6))
    plt.plot(train_acc)
    plt.plot(valid_acc)
    plt.title('Accuracy')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.legend(['Train', 'Valid'], loc='upper left')
    plt.savefig("accuracy.png")
    
def run_model(model, train_X, train_y, test_X, test_y, epochs=10):
    optimizer = Adam(learning_rate=0.001)
    model.summary()
    model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    hist = model.fit(train_X, train_y, epochs=epochs, batch_size=64, validation_split=0.2, shuffle=True, verbose=2)
    
    plot_history(hist)
    test_loss, test_acc = model.evaluate(test_X, test_y)
    print("Test Loss: {:.5f}, Test Accuracy: {:.3f}%".format(test_loss, test_acc * 100))
    
    return optimizer, hist

def main():
    tf.random.set_seed(SEED)
    np.random.seed(SEED)
    
    train_X, train_y, test_X, test_y = load_cifar10_dataset()
    img_shape = train_X[0].shape

    mlp_model = build_mlp_model(img_shape)
    cnn_model = build_cnn_model(img_shape)
    
    print("=" * 30, "MLP model", "=" * 30)
    run_model(mlp_model, train_X, train_y, test_X, test_y)
    
    print()
    print("=" * 30, "CNN model", "=" * 30)
    run_model(cnn_model, train_X, train_y, test_X, test_y)

if __name__ == "__main__":
    main()

 

 

[ ์ฝ”๋“œ ๊ฒฐ๊ณผ ํ•ด์„ ]

## MLP ๋ชจ๋ธ
Test Loss: 1.87062, Test Accuracy: 34.200%

## CNN ๋ชจ๋ธ
Test Loss: 1.35545, Test Accuracy: 50.800%

 

  • loss์™€ accuracy๋ฅผ ๋น„๊ตํ–ˆ์„ ๋•Œ, CNN ๋ชจ๋ธ์ด MLP ๋ชจ๋ธ๋ณด๋‹ค ์„ฑ๋Šฅ์ด ์ข‹์Œ
  • Trainable params = ์‹ค์ œ ๋ชจ๋ธ ํ•™์Šต์— ์‚ฌ์šฉ๋˜๋Š” ํŒŒ๋ผ๋ฏธํ„ฐ์˜ ๊ฐœ์ˆ˜
  • MLP ๋ชจ๋ธ์—์„œ ํ•„์š”ํ•œ parameter์˜ ๊ฐœ์ˆ˜(17,061,834)๋ณด๋‹ค CNN ๋ชจ๋ธ์—์„œ ํ•„์š”ํ•œ parameter์˜ ๊ฐœ์ˆ˜(94,698)ํ˜„์ €ํžˆ ์ž‘์Œ

 

 

 

3. ๋Œ€ํ‘œ์ ์ธ CNN ๋ชจ๋ธ

 

  • LeNet (1990)
    • A model for postal code recognition
    • Its subsampling plays the same role as pooling
    • A final full connection (FC layer) forms the classifier

 

 

  • AlexNet (2012)
    • Won the 2012 ImageNet Challenge, surpassing earlier models by a wide margin
    • Introduced the ReLU activation function
    • Used GPUs for deep learning training - most deep learning models have been trained on GPUs ever since
    • Because of the GPU limits of the day, the model was split in two and trained across 2 GPUs

 

 

  • VGGNet (2014)
    • Standardized every kernel to 3x3
    • This restrains parameter growth while allowing many more layers to be stacked
    • More layers (i.e. a deeper model) generally means better performance

 

 

[ 16๊ฐœ์˜ layer๋กœ ์ด๋ฃจ์–ด์ง„ VGGNet, VGG-16 ๊ตฌํ˜„ ]

  • VGGNet๋ถ€ํ„ฐ๋Š” Layer ๊ฐœ์ˆ˜๊ฐ€ ๋งŽ์ด ๋Š˜์–ด๋‚จ์— ๋”ฐ๋ผ Block ๋‹จ์œ„๋กœ ๋ชจ๋ธ์„ ๊ตฌ์„ฑ.
  • ๊ฐ Block์€ 2๊ฐœ ํ˜น์€ 3๊ฐœ์˜ Convolutional Layer์™€ Max Pooling Layer๋กœ ๊ตฌ์„ฑ.
  • parameter๊ฐ€ ์กด์žฌํ•˜๋Š” layer๋งŒ layer ๊ฐœ์ˆ˜๋ฅผ ์…ˆ - pooling, flatten layer๋Š” layer ๊ฐœ์ˆ˜์— ํฌํ•จ X
  • trainable params๊ฐ€ 138,357,544 ๊ฐœ -> ๋”ฅ๋Ÿฌ๋‹ ์ธต์ด ๊นŠ์–ด์งˆ์ˆ˜๋ก ํ•„์š”ํ•œ parameter ๊ฐœ์ˆ˜ ๊ธฐํ•˜๊ธ‰์ˆ˜์ ์œผ๋กœ ์ฆ๊ฐ€
import tensorflow as tf
from tensorflow.keras import Sequential, layers

def build_vgg16():
    model = Sequential()
    
    # ์ฒซ๋ฒˆ์งธ Block
    model.add(layers.Conv2D(filters=64, kernel_size=(3,3), padding='same', activation='relu', input_shape=(224, 224, 3)))
    model.add(layers.Conv2D(filters=64, kernel_size=(3,3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(2)) # pooling: halves the image size
    
    # ๋‘๋ฒˆ์งธ Block
    model.add(layers.Conv2D(filters=128, kernel_size=(3,3), padding='same', activation='relu'))
    model.add(layers.Conv2D(filters=128, kernel_size=(3,3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(2))
    
    # ์„ธ๋ฒˆ์งธ Block
    model.add(layers.Conv2D(filters=256, kernel_size=(3,3), padding='same', activation='relu'))
    model.add(layers.Conv2D(filters=256, kernel_size=(3,3), padding='same', activation='relu'))
    model.add(layers.Conv2D(filters=256, kernel_size=(3,3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(2))
    
    # ๋„ค๋ฒˆ์งธ Block
    model.add(layers.Conv2D(filters=512, kernel_size=(3,3), padding='same', activation='relu'))
    model.add(layers.Conv2D(filters=512, kernel_size=(3,3), padding='same', activation='relu'))
    model.add(layers.Conv2D(filters=512, kernel_size=(3,3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(2))
    
    # ๋‹ค์„ฏ๋ฒˆ์งธ Block
    model.add(layers.Conv2D(filters=512, kernel_size=(3,3), padding='same', activation='relu'))
    model.add(layers.Conv2D(filters=512, kernel_size=(3,3), padding='same', activation='relu'))
    model.add(layers.Conv2D(filters=512, kernel_size=(3,3), padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(2))
    
    # Fully Connected Layer
    model.add(layers.Flatten())
    model.add(layers.Dense(4096, activation="relu"))
    model.add(layers.Dense(4096, activation="relu"))
    model.add(layers.Dense(1000, activation="softmax"))
    
    return model

def main():
    model = build_vgg16()
    model.summary()
    
if __name__ == "__main__":
    main()

 

  • ResNet (2015)
    • Pushed the layer count up to 152
    • A phenomenon that inevitably appears in deep models: vanishing gradients
    • Vanishing gradient
      • Arises when gradients shrink toward 0 during backpropagation
      • Training takes very long or stalls entirely
      • The residual connection structure was added to solve this
  • Residual connection: a structure for solving the vanishing gradient problem
    • Using residual connections, the layer count could be increased dramatically
    • A shortcut that bypasses the usual convolutional layers 
      • The input feature map $x$ travels along the shortcut and is added to the convolutional layers' feature map $F(x)$
      • The identity path contributes a constant gradient of 1, so the gradient cannot shrink to 0: vanishing gradients are prevented

  • The weight layers in the figure play the same role as convolutional layers
  • Pooling layers do not affect the weight count, so the figure omits them

 

 

[ ResNet ๊ตฌํ˜„ ]

  • Residual Connection์€ ๋ณดํ†ต ResNet์˜ ๊ฐ Block ๋‹จ์œ„๋กœ ๊ตฌํ˜„. ๋”ฐ๋ผ์„œ ์ผ๋ฐ˜์ ์œผ๋กœ Residual Connection์„ ๊ฐ€์ง€๋Š” ๋ถ€๋ถ„์„ Residual Block์ด๋ผ ํ•˜์—ฌ Block ๋‹จ์œ„๋กœ ๊ตฌํ˜„ํ•œ ํ›„์— ์ด๋“ค์„ ์—ฐ๊ฒฐํ•˜๋Š” ์‹์œผ๋กœ ๋ชจ๋“ˆํ™” ํ•˜์—ฌ ์ „์ฒด ๋ชจ๋ธ ๊ตฌํ˜„
import tensorflow as tf
from tensorflow.keras import layers, Model, Sequential

''' Residual Block module '''
class ResidualBlock(Model):
    def __init__(self, num_kernels, kernel_size):
        super(ResidualBlock, self).__init__()

        # 2๊ฐœ์˜ Conv2D Layer
        self.conv1 = layers.Conv2D(filters=num_kernels, kernel_size=kernel_size, padding='same',activation='relu')
        self.conv2 = layers.Conv2D(filters=num_kernels, kernel_size=kernel_size, padding='same',activation='relu')
        
        # Relu Layer : ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋„ layerํ˜•์‹์œผ๋กœ ์ทจ๊ธ‰ ๊ฐ€๋Šฅ
        self.relu = layers.Activation("relu") 
        
        # Add Layer : ๋‘๊ฐœ์˜ ํ…์„œ๋ฅผ ๋”ํ•˜๋Š” layer
        self.add = layers.Add()


    def call(self, input_tensor):
        x = self.conv1(input_tensor) # output of the first convolutional layer
        x = self.conv2(x) # output of the second convolutional layer

        x = self.add([x, input_tensor]) # the shortcut addition (original input + two-convolution output)
        x = self.relu(x) # apply ReLU after the addition
        
        return x
        
def build_resnet(input_shape, num_classes):
    model = Sequential()
    
    model.add(layers.Conv2D(64, kernel_size=(3, 3), padding="same", activation="relu", input_shape=input_shape))
    model.add(layers.MaxPool2D(2))
    
    model.add(ResidualBlock(64, (3, 3)))
    model.add(ResidualBlock(64, (3, 3)))
    model.add(ResidualBlock(64, (3, 3)))
    
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dense(num_classes, activation="softmax"))
    
    return model
    
def main():
    input_shape = (32, 32, 3)
    num_classes = 10

    model = build_resnet(input_shape, num_classes)
    model.summary()

if __name__=="__main__":
    main()

 

์ง€๊ธˆ๊นŒ์ง€ ๋‚˜์˜จ ๋ชจ๋ธ์€ ๋ชจ๋‘ ๋ถ„๋ฅ˜ ๋ชจ๋ธ์ž„. ๋ถ„๋ฅ˜ ์ž‘์—…์ด ์•„๋‹Œ ๊ฒฝ์šฐ์— ์‚ฌ์šฉํ•˜๋Š” ๋ชจ๋ธ์€?

  • ์ผ๋ฐ˜์ ์œผ๋กœ ๋ถ„๋ฅ˜ ๋ชจ๋ธ๊ณผ ์œ ์‚ฌํ•˜๊ฒŒ CNN ๊ตฌ์„ฑ
  • but, ๋ชจ๋ธ์˜ ์ถœ๋ ฅ๊ฐ’, ์†์‹คํ•จ์ˆ˜, ๋ฐ์ดํ„ฐ์…‹ ๊ตฌ์„ฑ ๋“ฑ์ด ์™„์ „ํžˆ ๋‹ค๋ฅด๊ฒŒ ์ด๋ฃจ์–ด์ง
  • ์˜ˆ) YOLO(๊ฐ์ฒด ์ธ์‹), R-CNN(๊ฐ์ฒด ์ธ์‹), U-Net(์ด๋ฏธ์ง€ segmentation) ๋“ฑ