Preface
For a project I need to revisit Deeplabv3, a paper I read some time ago, and along the way I will walk through its implementation code.
GitHub code
https://github.com/yanjingke/deeplabv3
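What is Deeplabv3?
Deeplabv3 is a classic semantic segmentation model. It pushed semantic segmentation to a new level of performance and combines many advanced ideas. The structure of Deeplabv3 is shown in the figure below:
Advantages and summary of Deeplabv3:
1. It introduces atrous (dilated) convolution, which adjusts the field-of-view of the filters. As shown in the figure below, the standard convolution on the left has a 3x3 kernel and a 3x3 receptive field. Inserting zeros between the kernel elements gives the atrous convolution on the right (rate 2, one zero between neighbouring weights): only the 3x3 weights take part in the computation, but the receptive field grows to 5x5.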
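The relation between the atrous rate and the receptive field can be checked with a few lines of Python. This is a minimal sketch and not code from the repository: `effective_kernel_size` and `dilate_kernel_1d` are hypothetical helper names, and the 1D kernel stands in for one axis of a 2D kernel.

```python
def effective_kernel_size(kernel_size, rate):
    """Width of the window covered by a kernel whose taps are `rate` apart."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def dilate_kernel_1d(weights, rate):
    """Insert (rate - 1) zeros between neighbouring weights of a 1D kernel."""
    dilated = []
    for i, w in enumerate(weights):
        dilated.append(w)
        if i < len(weights) - 1:
            dilated.extend([0] * (rate - 1))
    return dilated


print(effective_kernel_size(3, 1))     # 3
print(effective_kernel_size(3, 2))     # 5
print(dilate_kernel_1d([1, 2, 3], 2))  # [1, 0, 2, 0, 3]
```

The number of trainable weights stays at three per axis; only the spacing between them changes.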
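2. It uses the ASPP structure (Atrous Spatial Pyramid Pooling). ASPP captures multi-scale information effectively: atrous convolution layers with several atrous rates are applied in parallel, so the model performs better on objects of different scales. The whole ASPP structure is the part in the yellow box in the figure, and it consists of two parts, (a) and (b).
(a) one 1x1 convolution and three 3x3 atrous convolutions with atrous rates (6, 12, 18);
(b) the final feature map goes through global average pooling + 1x1 convolution + BN + bilinear upsample, in order to include more global information.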
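To make the parallel layout of (a) and (b) concrete, here is a toy 1D version in plain Python. It is only an illustration under simplifying assumptions: the signal has one channel, all kernel weights are fixed to 1.0, BN and ReLU are left out, and the names `dilated_conv_1d` and `toy_aspp_1d` are made up for this sketch.

```python
def dilated_conv_1d(signal, weights, rate):
    """'same'-padded 1D convolution with a 3-tap kernel whose taps are `rate` apart."""
    n = len(signal)
    out = []
    for i in range(n):
        total = 0.0
        for k, w in enumerate(weights):
            j = i + (k - 1) * rate  # taps at i - rate, i, i + rate
            if 0 <= j < n:          # positions outside the signal act as zero padding
                total += w * signal[j]
        out.append(total)
    return out


def toy_aspp_1d(signal, rates=(6, 12, 18)):
    """Return the five parallel ASPP branches as a list of channels."""
    n = len(signal)
    branches = []
    # (b) image-level branch: global average, broadcast back to full length
    mean = sum(signal) / n
    branches.append([mean] * n)
    # (a) 1x1 convolution branch (weight 1.0 here, so it is the identity)
    branches.append([1.0 * v for v in signal])
    # (a) three dilated 3-tap branches, one per atrous rate
    for rate in rates:
        branches.append(dilated_conv_1d(signal, [1.0, 1.0, 1.0], rate))
    return branches


signal = [float(i) for i in range(40)]
channels = toy_aspp_1d(signal)
print(len(channels), len(channels[0]))  # 5 40
```

Every branch keeps the input length, which is what allows the branches to be stacked along the channel axis afterwards.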
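Summary:
Overall, in the Encoder, Deeplabv3 uses the ASPP structure: atrous convolutions with different rates extract features in parallel, and their outputs are concatenated. A 1x1 convolution then compresses the resulting feature layer.
In the Decoder, the feature layer compressed by the Encoder is passed in, upsampled, and concatenated with a shallower backbone feature layer that is passed to the Decoder directly and processed first (a 1x1 convolution in the code below). Further convolutions and a final Upsample (bilinear interpolation) then produce the output.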
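The interpolation behind the Upsample step can be illustrated in one dimension. The sketch below is a simplified stand-in, not the repository's code: `resize_linear` is a hypothetical helper that uses an align-corners convention (first and last samples are preserved), which is not necessarily the sampling convention of the TensorFlow resize call used later. Bilinear interpolation applies the same idea along both image axes.

```python
def resize_linear(values, out_len):
    """Resize a 1D list to out_len samples with linear interpolation."""
    n = len(values)
    if out_len == 1 or n == 1:
        return [values[0]] * out_len
    out = []
    for i in range(out_len):
        pos = i * (n - 1) / (out_len - 1)  # position in input coordinates
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


print(resize_linear([0.0, 4.0], 5))  # [0.0, 1.0, 2.0, 3.0, 4.0]
```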
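Deeplabv3 code
Backbone network: mobilenetv2
Here I use mobilenetv2 as the backbone. Its advantage over mobilenetv1 is that it adds a residual structure (the inverted residual block).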
# tf, K, Reshape and Softmax are needed by the encoder/decoder code further down
import tensorflow as tf
from keras import backend as K
from keras.models import Model
from keras import layers
from keras.layers import Input
from keras.layers import Lambda
from keras.layers import Activation
from keras.layers import Concatenate
from keras.layers import Add
from keras.layers import Dropout
from keras.layers import BatchNormalization
from keras.layers import Conv2D
from keras.layers import DepthwiseConv2D
from keras.layers import ZeroPadding2D
from keras.layers import GlobalAveragePooling2D
from keras.layers import Reshape
from keras.layers import Softmax
from keras.activations import relu


def _make_divisible(v, divisor, min_value=None):
    # Round v to the nearest multiple of divisor, never below min_value
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Rounding down must not shrink v by more than 10%
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


def relu6(x):
    return relu(x, max_value=6)


def _inverted_res_block(inputs, expansion, stride, alpha, filters, block_id,
                        skip_connection, rate=1):
    # K.int_shape replaces inputs.shape[-1].value, which only exists in TF 1.x
    in_channels = K.int_shape(inputs)[-1]
    pointwise_conv_filters = int(filters * alpha)
    pointwise_filters = _make_divisible(pointwise_conv_filters, 8)
    x = inputs
    prefix = 'expanded_conv_{}_'.format(block_id)
    if block_id:
        # Expand: 1x1 convolution that widens the channels
        x = Conv2D(expansion * in_channels, kernel_size=1, padding='same',
                   use_bias=False, activation=None,
                   name=prefix + 'expand')(x)
        x = BatchNormalization(epsilon=1e-3, momentum=0.999,
                               name=prefix + 'expand_BN')(x)
        x = Activation(relu6, name=prefix + 'expand_relu')(x)
    else:
        prefix = 'expanded_conv_'
    # Depthwise: 3x3 depthwise convolution, optionally dilated
    x = DepthwiseConv2D(kernel_size=3, strides=stride, activation=None,
                        use_bias=False, padding='same',
                        dilation_rate=(rate, rate),
                        name=prefix + 'depthwise')(x)
    x = BatchNormalization(epsilon=1e-3, momentum=0.999,
                           name=prefix + 'depthwise_BN')(x)
    x = Activation(relu6, name=prefix + 'depthwise_relu')(x)
    # Project: 1x1 convolution back to a narrow layer, no activation
    x = Conv2D(pointwise_filters, kernel_size=1, padding='same',
               use_bias=False, activation=None,
               name=prefix + 'project')(x)
    x = BatchNormalization(epsilon=1e-3, momentum=0.999,
                           name=prefix + 'project_BN')(x)
    if skip_connection:
        return Add(name=prefix + 'add')([inputs, x])
    # if in_channels == pointwise_filters and stride == 1:
    #    return Add(name='res_connect_' + str(block_id))([inputs, x])
    return x


def mobilenetV2(inputs, alpha=1):
    first_block_filters = _make_divisible(32 * alpha, 8)
    # 416,416 -> 208,208
    x = Conv2D(first_block_filters, kernel_size=3, strides=(2, 2),
               padding='same', use_bias=False, name='Conv')(inputs)
    x = BatchNormalization(epsilon=1e-3, momentum=0.999, name='Conv_BN')(x)
    x = Activation(relu6, name='Conv_Relu6')(x)
    x = _inverted_res_block(x, filters=16, alpha=alpha, stride=1,
                            expansion=1, block_id=0, skip_connection=False)
    # 208,208 -> 104,104
    x = _inverted_res_block(x, filters=24, alpha=alpha, stride=2,
                            expansion=6, block_id=1, skip_connection=False)
    x = _inverted_res_block(x, filters=24, alpha=alpha, stride=1,
                            expansion=6, block_id=2, skip_connection=True)
    # low-level feature layer that is later passed to the decoder
    skip1 = x
    # 104,104 -> 52,52
    x = _inverted_res_block(x, filters=32, alpha=alpha, stride=2,
                            expansion=6, block_id=3, skip_connection=False)
    x = _inverted_res_block(x, filters=32, alpha=alpha, stride=1,
                            expansion=6, block_id=4, skip_connection=True)
    x = _inverted_res_block(x, filters=32, alpha=alpha, stride=1,
                            expansion=6, block_id=5, skip_connection=True)
    #---------------------------------------------------------------#
    # every block from here on uses stride=1; blocks 7-16 are dilated
    # (rate 2, then 4), so the spatial size stays at 52,52
    x = _inverted_res_block(x, filters=64, alpha=alpha, stride=1,
                            expansion=6, block_id=6, skip_connection=False)
    x = _inverted_res_block(x, filters=64, alpha=alpha, stride=1, rate=2,
                            expansion=6, block_id=7, skip_connection=True)
    x = _inverted_res_block(x, filters=64, alpha=alpha, stride=1, rate=2,
                            expansion=6, block_id=8, skip_connection=True)
    x = _inverted_res_block(x, filters=64, alpha=alpha, stride=1, rate=2,
                            expansion=6, block_id=9, skip_connection=True)
    x = _inverted_res_block(x, filters=96, alpha=alpha, stride=1, rate=2,
                            expansion=6, block_id=10, skip_connection=False)
    x = _inverted_res_block(x, filters=96, alpha=alpha, stride=1, rate=2,
                            expansion=6, block_id=11, skip_connection=True)
    x = _inverted_res_block(x, filters=96, alpha=alpha, stride=1, rate=2,
                            expansion=6, block_id=12, skip_connection=True)
    x = _inverted_res_block(x, filters=160, alpha=alpha, stride=1, rate=2,  # 1!
                            expansion=6, block_id=13, skip_connection=False)
    x = _inverted_res_block(x, filters=160, alpha=alpha, stride=1, rate=4,
                            expansion=6, block_id=14, skip_connection=True)
    x = _inverted_res_block(x, filters=160, alpha=alpha, stride=1, rate=4,
                            expansion=6, block_id=15, skip_connection=True)
    x = _inverted_res_block(x, filters=320, alpha=alpha, stride=1, rate=4,
                            expansion=6, block_id=16, skip_connection=False)
    return x, skip1
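The size comments in mobilenetV2 can be reproduced by following the strides alone. The sketch below is an illustration, not part of the repository: `STRIDES` is transcribed by hand from the calls above, `trace_spatial_size` is a hypothetical helper, and it relies on the fact that a padding='same' layer outputs ceil(input / stride) positions per axis.

```python
import math

# (layer name, stride) in the order used by mobilenetV2 above
STRIDES = [("Conv", 2), ("block 0", 1), ("block 1", 2), ("block 2", 1),
           ("block 3", 2)] + [("block %d" % i, 1) for i in range(4, 17)]


def trace_spatial_size(input_size, strides=STRIDES):
    """Spatial size after each layer, assuming padding='same' everywhere."""
    size = input_size
    sizes = []
    for name, stride in strides:
        size = math.ceil(size / stride)
        sizes.append((name, size))
    return sizes


sizes = dict(trace_spatial_size(416))
print(sizes["block 2"])          # 104, spatial size of skip1
print(sizes["block 16"])         # 52, spatial size of the backbone output
print(416 // sizes["block 16"])  # 8, the overall output stride
```

Because blocks 6 to 16 keep stride 1, the backbone stops downsampling at 1/8 of the input size, and blocks 7 to 16 use dilation to keep enlarging the receptive field.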
deeplabv3 encoder part
# Excerpt from the model-building function; input_shape, alpha and classes
# are assumed to be defined by the enclosing function.
img_input = Input(shape=input_shape)

# backbone output x: (52, 52, 320) for a 416x416 input
x, skip1 = mobilenetV2(img_input, alpha)
size_before = tf.keras.backend.int_shape(x)

# average over all spatial positions, then use expand_dims to restore a 1x1 map
# shape = 320
b4 = GlobalAveragePooling2D()(x)
# 1x1x320
b4 = Lambda(lambda t: K.expand_dims(t, 1))(b4)
b4 = Lambda(lambda t: K.expand_dims(t, 1))(b4)
# compress the filters
b4 = Conv2D(256, (1, 1), padding='same',
            use_bias=False, name='image_pooling')(b4)
b4 = BatchNormalization(name='image_pooling_BN', epsilon=1e-5)(b4)
b4 = Activation('relu')(b4)
# enlarge h and w directly with resize_images
# (TF 1.x API; the TF 2.x equivalent is tf.image.resize)
# b4 = 52,52,256
b4 = Lambda(lambda t: tf.image.resize_images(t, size_before[1:3]))(b4)

# adjust the channels
b0 = Conv2D(256, (1, 1), padding='same', use_bias=False, name='aspp0')(x)
b0 = BatchNormalization(name='aspp0_BN', epsilon=1e-5)(b0)
b0 = Activation('relu', name='aspp0_activation')(b0)

# The rate values depend on the output stride (OS). SepConv_BN (not listed
# in this post) applies a 3x3 dilated convolution first, then a 1x1
# convolution for compression; its dilation rate is the rate value.
b1 = SepConv_BN(x, 256, 'aspp1',
                rate=6, depth_activation=True, epsilon=1e-5)
b2 = SepConv_BN(x, 256, 'aspp2',
                rate=12, depth_activation=True, epsilon=1e-5)
b3 = SepConv_BN(x, 256, 'aspp3',
                rate=18, depth_activation=True, epsilon=1e-5)

x = Concatenate()([b4, b0, b1, b2, b3])

# compress with conv2d
# 52,52,256
x = Conv2D(256, (1, 1), padding='same',
           use_bias=False, name='concat_projection')(x)
x = BatchNormalization(name='concat_projection_BN', epsilon=1e-5)(x)
x = Activation('relu')(x)
x = Dropout(0.1)(x)

# the spatial size of skip1 is 104,104
# x becomes 104,104,256 after resizing
skip_size = K.int_shape(skip1)
# the lambda must use its own argument xx, not the outer variable x
x = Lambda(lambda xx: tf.image.resize_images(xx, skip_size[1:3]))(x)
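The channel bookkeeping of the ASPP concatenation can be verified without building the model. This is a small sketch under the assumption of a 416x416 input (so every branch is 52x52); `concat_last_axis` is a hypothetical helper, not a Keras function.

```python
def concat_last_axis(shapes):
    """Shape produced by concatenating (h, w, c) feature maps along the channel axis."""
    h, w, _ = shapes[0]
    if any(s[:2] != (h, w) for s in shapes):
        raise ValueError("all branches need the same spatial size")
    return (h, w, sum(s[2] for s in shapes))


# b4, b0, b1, b2, b3: each branch is 52x52 with 256 channels
branches = [(52, 52, 256)] * 5
concatenated = concat_last_axis(branches)
print(concatenated)  # (52, 52, 1280)

# concat_projection is a 1x1 convolution without bias:
# in_channels * out_channels weights
projection_weights = concatenated[2] * 256
print(projection_weights)  # 327680
```

This also shows why the image-pooling branch b4 has to be resized back to the backbone's spatial size before the concatenation.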
deeplabv3 decoder part
# 104, 104, 48
dec_skip1 = Conv2D(48, (1, 1), padding='same',
                   use_bias=False, name='feature_projection0')(skip1)
dec_skip1 = BatchNormalization(
    name='feature_projection0_BN', epsilon=1e-5)(dec_skip1)
dec_skip1 = Activation('relu')(dec_skip1)
# 104,104,304 (256 encoder channels + 48 skip channels)
x = Concatenate()([x, dec_skip1])
x = SepConv_BN(x, 256, 'decoder_conv0',
               depth_activation=True, epsilon=1e-5)
x = SepConv_BN(x, 256, 'decoder_conv1',
               depth_activation=True, epsilon=1e-5)

size_before3 = tf.keras.backend.int_shape(img_input)
# 104,104,classes (104,104,2 with two classes)
x = Conv2D(classes, (1, 1), padding='same')(x)
# 416,416,classes (416,416,2 with two classes)
x = Lambda(lambda xx: tf.image.resize_images(xx, size_before3[1:3]))(x)
# flatten the spatial dimensions: (416*416, classes)
x = Reshape((-1, classes))(x)
x = Softmax()(x)
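The shapes in the decoder comments can be traced step by step. The sketch below is illustrative only: `decoder_shapes` is a hypothetical helper, and its defaults (416x416 input, two classes, skip1 taken at 1/4 resolution) are read off the comments in the code above.

```python
def decoder_shapes(input_size=416, classes=2, skip_stride=4,
                   encoder_channels=256, skip_channels=48):
    """Return the (h, w, c) shape after each decoder step."""
    s = input_size // skip_stride  # spatial size of skip1
    return {
        "encoder output, resized": (s, s, encoder_channels),
        "dec_skip1": (s, s, skip_channels),
        "concat": (s, s, encoder_channels + skip_channels),
        "decoder_conv1": (s, s, 256),
        "class scores": (s, s, classes),
        "resized to input": (input_size, input_size, classes),
        "reshaped": (input_size * input_size, classes),
    }


shapes = decoder_shapes()
print(shapes["concat"])    # (104, 104, 304)
print(shapes["reshaped"])  # (173056, 2)
```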
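Composition of the LOSS function
Semantic segmentation classifies every pixel, so the network output has shape [416 (image height), 416 (image width), nclasses]. Softmax is then used to estimate the probability that each pixel belongs to each class.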
x = Conv2D(classes, (1, 1), padding='same')(x)
size_before3 = tf.keras.backend.int_shape(img_input)
x = Lambda(lambda xx: tf.image.resize_images(xx, size_before3[1:3]))(x)
x = Reshape((-1, classes))(x)
x = Softmax()(x)

inputs = img_input
model = Model(inputs, x, name='deeplabv3plus')
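The post does not list the loss itself. Assuming it is the categorical cross-entropy that is commonly paired with a softmax output (an assumption, not something the source confirms), the per-pixel computation could look like the plain-Python sketch below. The names `softmax` and `pixelwise_cross_entropy` are defined here for illustration only, and the function takes raw scores and applies softmax itself.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pixelwise_cross_entropy(logits, labels):
    """Mean cross-entropy over pixels.

    logits: per-pixel score lists with shape (num_pixels, num_classes),
            the layout produced by Reshape((-1, classes)).
    labels: one integer class id per pixel.
    """
    losses = []
    for pixel_logits, label in zip(logits, labels):
        probs = softmax(pixel_logits)
        losses.append(-math.log(probs[label]))
    return sum(losses) / len(losses)


# Two classes, three pixels: uniform scores cost log(2) per pixel.
uniform = [[0.0, 0.0]] * 3
print(pixelwise_cross_entropy(uniform, [0, 1, 0]))
```

Flattening the spatial dimensions first means the loss treats every pixel as an independent classification sample.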
Source: blog.csdn.net. Author: 快了的程序猿小可哥. Copyright belongs to the original author; please contact the author before reposting.
Original link: blog.csdn.net/qq_35914625/article/details/108172252