[Ascend CANN Training Camp, Season 2] [Model Camp] Training LeNet with the ModelArts Plugin for PyCharm

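The hands-on assignment for the Model Camp of the Ascend CANN Training Camp has been published: https://bbs.huaweicloud.com/forum/forum.php?mod=viewthread&tid=135955&fromuid=446160

The task is: train LeNet on MNIST using a Huawei Cloud Ascend 910, then upload a screenshot of the loss and a screenshot of the inference accuracy.

姐夫丁大大 has already completed this assignment with AI Gallery and ModelArts; you can head over to https://bbs.huaweicloud.com/blogs/281479 and simply follow along.

This post completes the assignment with the ModelArts plugin for PyCharm instead.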

1. Prepare the PyCharm ModelArts plugin, a local MindSpore environment, and the MindSpore LeNet source code

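For installing and configuring the ModelArts plugin in PyCharm, see Zhang Xiaobai (张小白)'s blog: https://bbs.huaweicloud.com/blogs/207322 and search for "PyCharm Kit的登场".

With PyCharm configured, go to the MindSpore website: https://www.mindspore.cn/tutorial/training/zh-CN/r1.2/index.html

Open Tutorials -> Training -> Quick Start -> Implementing an Image Classification Application:

It turns out there are quite a few ways to play with LeNet: download the Notebook and run it locally, run it on ModelArts, or run it on HuaweiCloud (https://www.hiascend.com/zh/college/onlineExperiment/codeLabMindSpore/mindSpore , which appears to be a sandbox lab).

Since the plan is to use PyCharm, let's look at the source code first:

Open the link highlighted in yellow, https://gitee.com/mindspore/docs/tree/r1.2/tutorials/tutorial_code/lenet :

There appear to be only two source files: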

lenet.py

# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Lenet Tutorial
This sample code is applicable to CPU, GPU and Ascend.
"""
import os
import argparse
import mindspore.dataset as ds
import mindspore.nn as nn
from mindspore import context, Model, load_checkpoint, load_param_into_net
from mindspore.common.initializer import Normal
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor
import mindspore.dataset.vision.c_transforms as CV
import mindspore.dataset.transforms.c_transforms as C
from mindspore.dataset.vision import Inter
from mindspore.nn.metrics import Accuracy
from mindspore import dtype as mstype
from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits
from utils.dataset import download_dataset

def create_dataset(data_path, batch_size=32, repeat_size=1,
                   num_parallel_workers=1):
    """ create dataset for train or test
    Args:
        data_path: Data path
        batch_size: The number of data records in each group
        repeat_size: The number of replicated data records
        num_parallel_workers: The number of parallel workers
    """
    # define dataset
    mnist_ds = ds.MnistDataset(data_path)

    # define operation parameters
    resize_height, resize_width = 32, 32
    rescale = 1.0 / 255.0
    shift = 0.0
    rescale_nml = 1 / 0.3081
    shift_nml = -1 * 0.1307 / 0.3081

    # define map operations
    resize_op = CV.Resize((resize_height, resize_width), interpolation=Inter.LINEAR)  # Resize images to (32, 32)
    rescale_nml_op = CV.Rescale(rescale_nml, shift_nml) # normalize images
    rescale_op = CV.Rescale(rescale, shift) # rescale images
    hwc2chw_op = CV.HWC2CHW() # change shape from (height, width, channel) to (channel, height, width) to fit network.
    type_cast_op = C.TypeCast(mstype.int32) # change data type of label to int32 to fit network

    # apply map operations on images
    mnist_ds = mnist_ds.map(operations=type_cast_op, input_columns="label", num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(operations=resize_op, input_columns="image", num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(operations=rescale_op, input_columns="image", num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(operations=rescale_nml_op, input_columns="image", num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(operations=hwc2chw_op, input_columns="image", num_parallel_workers=num_parallel_workers)

    # apply DatasetOps
    buffer_size = 10000
    mnist_ds = mnist_ds.shuffle(buffer_size=buffer_size)  # 10000 as in LeNet train script
    mnist_ds = mnist_ds.batch(batch_size, drop_remainder=True)
    mnist_ds = mnist_ds.repeat(repeat_size)

    return mnist_ds


class LeNet5(nn.Cell):
    """Lenet network structure."""
    # define the operator required
    def __init__(self, num_class=10, num_channel=1):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(num_channel, 6, 5, pad_mode='valid')
        self.conv2 = nn.Conv2d(6, 16, 5, pad_mode='valid')
        self.fc1 = nn.Dense(16 * 5 * 5, 120, weight_init=Normal(0.02))
        self.fc2 = nn.Dense(120, 84, weight_init=Normal(0.02))
        self.fc3 = nn.Dense(84, num_class, weight_init=Normal(0.02))
        self.relu = nn.ReLU()
        self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
        self.flatten = nn.Flatten()

    # use the preceding operators to construct networks
    def construct(self, x):
        x = self.max_pool2d(self.relu(self.conv1(x)))
        x = self.max_pool2d(self.relu(self.conv2(x)))
        x = self.flatten(x)
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x


def train_net(network_model, epoch_size, data_path, repeat_size, ckpoint_cb, sink_mode):
    """Define the training method."""
    print("============== Starting Training ==============")
    # load training dataset
    ds_train = create_dataset(os.path.join(data_path, "train"), 32, repeat_size)
    network_model.train(epoch_size, ds_train, callbacks=[ckpoint_cb, LossMonitor()], dataset_sink_mode=sink_mode)


def test_net(network, network_model, data_path):
    """Define the evaluation method."""
    print("============== Starting Testing ==============")
    # load the saved model for evaluation
    param_dict = load_checkpoint("checkpoint_lenet-1_1875.ckpt")
    # load parameter to the network
    load_param_into_net(network, param_dict)
    # load testing dataset
    ds_eval = create_dataset(os.path.join(data_path, "test"))
    acc = network_model.eval(ds_eval, dataset_sink_mode=False)
    print("============== Accuracy:{} ==============".format(acc))


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='MindSpore LeNet Example')
    parser.add_argument('--device_target', type=str, default="CPU", choices=['Ascend', 'GPU', 'CPU'],
                        help='device where the code will be implemented (default: CPU)')
    args = parser.parse_args()
    context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
    dataset_sink_mode = not args.device_target == "CPU"
    # download mnist dataset
    download_dataset()
    # learning rate setting
    lr = 0.01
    momentum = 0.9
    dataset_size = 1
    mnist_path = "./MNIST_Data"
    # define the loss function
    net_loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
    train_epoch = 1
    # create the network
    net = LeNet5()
    # define the optimizer
    net_opt = nn.Momentum(net.trainable_params(), lr, momentum)
    config_ck = CheckpointConfig(save_checkpoint_steps=1875, keep_checkpoint_max=10)
    # save the network model and parameters for subsequence fine-tuning
    ckpoint = ModelCheckpoint(prefix="checkpoint_lenet", config=config_ck)
    # group layers into an object with training and evaluation features
    model = Model(net, net_loss, net_opt, metrics={"Accuracy": Accuracy()})

    train_net(model, train_epoch, mnist_path, dataset_size, ckpoint, dataset_sink_mode)
    test_net(net, model, mnist_path)
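
The `16 * 5 * 5` input size of `fc1` in `LeNet5` follows from simple layer arithmetic. Here is a minimal sketch in plain Python (no MindSpore needed), assuming the 32x32 input produced by `create_dataset`, unpadded ('valid') 5x5 convolutions with stride 1, and 2x2 max pooling with stride 2. The helper names `conv_valid`, `pool` and `lenet_flatten_size` are made up for this illustration:

```python
def conv_valid(size, kernel, stride=1):
    """Output width/height of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def pool(size, kernel=2, stride=2):
    """Output width/height of a max-pooling layer without padding."""
    return (size - kernel) // stride + 1


def lenet_flatten_size(input_size=32):
    """Number of features that reach fc1 for a square input image."""
    size = conv_valid(input_size, 5)  # conv1: 32 -> 28
    size = pool(size)                 # pool:  28 -> 14
    size = conv_valid(size, 5)        # conv2: 14 -> 10
    size = pool(size)                 # pool:  10 -> 5
    channels = 16                     # output channels of conv2
    return channels * size * size


print(lenet_flatten_size())  # 400, i.e. 16 * 5 * 5
```

With the raw 28x28 MNIST images the same arithmetic gives 256 features, so `fc1` as written relies on the resize to 32x32 in `create_dataset`.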

utils/dataset.py

# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""download MNIST dataset"""
import os
import sys
import requests
from urllib.parse import urlparse
import gzip

def unzipfile(gzip_path):
    """unzip dataset file
    Args:
        gzip_path: dataset file path
    """
    open_file = open(gzip_path.replace('.gz', ''), 'wb')
    gz_file = gzip.GzipFile(gzip_path)
    open_file.write(gz_file.read())
    gz_file.close()

def download_progress(url, file_name):
    """download mnist dataset
    Args:
        url: download url
        file_name: dataset name
    """
    res = requests.get(url, stream=True, verify=False)
    # get mnist dataset size
    total_size = int(res.headers["Content-Length"])
    temp_size = 0
    with open(file_name, "wb+") as f:
        for chunk in res.iter_content(chunk_size=1024):
            temp_size += len(chunk)
            f.write(chunk)
            f.flush()
            done = int(100 * temp_size / total_size)
            # show download progress
            sys.stdout.write("\r[{}{}] {:.2f}%".format("█" * done, " " * (100 - done), 100 * temp_size / total_size))
            sys.stdout.flush()
    print("\n============== {} is already ==============".format(file_name))
    unzipfile(file_name)
    os.remove(file_name)

def download_dataset():
    """Download the dataset from http://yann.lecun.com/exdb/mnist/."""
    print("************** Downloading the MNIST dataset **************")
    train_path = "./MNIST_Data/train/"
    test_path = "./MNIST_Data/test/"
    train_path_check = os.path.exists(train_path)
    test_path_check = os.path.exists(test_path)
    if not train_path_check and not test_path_check:
        os.makedirs(train_path)
        os.makedirs(test_path)
    train_url = {"http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz", "http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz"}
    test_url = {"http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz", "http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz"}
    for url in train_url:
        url_parse = urlparse(url)
        # split the file name from url
        file_name = os.path.join(train_path, url_parse.path.split('/')[-1])
        if not os.path.exists(file_name.replace('.gz', '')):
            download_progress(url, file_name)
    for url in test_url:
        url_parse = urlparse(url)
        # split the file name from url
        file_name = os.path.join(test_path,url_parse.path.split('/')[-1])
        if not os.path.exists(file_name.replace('.gz', '')):
            download_progress(url, file_name)
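
One detail in `unzipfile` above: the output file is opened but never explicitly closed. A possible variant, offered as a sketch rather than as the tutorial's code, uses context managers and streams the data instead of reading it all at once. The name `gunzip_file` is hypothetical:

```python
import gzip
import shutil


def gunzip_file(gzip_path):
    """Decompress foo.gz to foo in the same directory and return the new path."""
    if gzip_path.endswith(".gz"):
        out_path = gzip_path[:-3]
    else:
        out_path = gzip_path + ".out"
    # Both handles are closed when the with-block exits, even if an error occurs.
    with gzip.open(gzip_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

It could be called in place of `unzipfile(file_name)` inside `download_progress`; the rest of the download logic would stay unchanged.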

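Copy these two files into a project of your own; mine is called littleMi:

Zhang Xiaobai already has MindSpore installed on this machine. For the quick installation procedure, see: https://www.mindspore.cn/install/

Just run the installation command given there.

The MindSpore version currently on this machine (a laptop) is 1.2.0-rc1.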
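
The version can also be checked from Python. This is a small sketch; `mindspore_version` is a helper made up here, and it returns None when MindSpore is not importable so that it can run on any machine:

```python
def mindspore_version():
    """Return the installed MindSpore version string, or None if unavailable."""
    try:
        import mindspore
    except ImportError:
        return None
    # getattr keeps the helper safe even if the attribute were missing.
    return getattr(mindspore, "__version__", None)


print(mindspore_version())
```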

2. Train LeNet locally on the CPU

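First, run it on the local CPU:

It looks like the code needs no changes.

Let's just run it.

It seems to have thrown an error.

I looked through the code and found nothing obviously wrong.

Run it again:

This time it seems to be slowly downloading the dataset...

Wait patiently for the download to finish.

Once the download completes, training starts right away...

...

When the 1875 steps of this single epoch have finished, the results are: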

loss is 0.052374754

'Accuracy': 0.9599358974358975
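
Both the step count and the odd-looking accuracy value can be cross-checked with a little arithmetic. The sketch below assumes the standard MNIST split of 60000 training and 10000 test images (an assumption, not stated in this post), together with the batch size of 32 and `drop_remainder=True` from `create_dataset`:

```python
TRAIN_IMAGES = 60000  # standard MNIST training set size (assumed)
TEST_IMAGES = 10000   # standard MNIST test set size (assumed)
BATCH_SIZE = 32       # default batch_size in create_dataset

# Steps per epoch: incomplete batches are dropped.
steps_per_epoch = TRAIN_IMAGES // BATCH_SIZE

# Evaluation drops the last partial batch too, so fewer than 10000 images are scored.
evaluated = (TEST_IMAGES // BATCH_SIZE) * BATCH_SIZE

# Number of correct predictions implied by the reported accuracy.
reported_accuracy = 0.9599358974358975
correct = round(reported_accuracy * evaluated)

print(steps_per_epoch, evaluated, correct)  # 1875 9984 9584
```

This matches the 1875 steps seen during training and the `checkpoint_lenet-1_1875.ckpt` name hardcoded in `test_net`, and it suggests that 9584 of the 9984 evaluated images were classified correctly.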

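The assignment requirements:

Loss converging below 0.5 earns 3 points; inference accuracy above 90% earns 4 points.

So the result already meets the requirements.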
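
The two thresholds can be written down as a tiny check. This only illustrates the rule quoted above; the function name `score` is made up, and the actual grading is based on the uploaded screenshots:

```python
def score(loss, accuracy):
    """Points earned under the two quoted criteria."""
    points = 0
    if loss < 0.5:       # loss converged below 0.5
        points += 3
    if accuracy > 0.9:   # inference accuracy above 90%
        points += 4
    return points


print(score(0.052374754, 0.9599358974358975))  # 7
```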

However, this run was not on an Ascend 910.

3. Train LeNet in the Ascend 910 environment

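Next, let's move this code over to the Ascend 910:

Change the device in LeNet.py to Ascend (search for the several occurrences of CPU in the code and replace them with Ascend).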
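
A blanket replacement also touches the expression that decides `dataset_sink_mode`. The sketch below, in plain Python, compares the original line from lenet.py with the replaced line that appears in the modified code later in this post. The two function names are made up for the comparison:

```python
def sink_mode_original(device_target):
    # Original line: dataset_sink_mode = not args.device_target == "CPU"
    return not device_target == "CPU"


def sink_mode_after_replace(device_target):
    # After replacing CPU with Ascend: not args.device_target == "Ascend"
    return not device_target == "Ascend"


for target in ("CPU", "GPU", "Ascend"):
    print(target, sink_mode_original(target), sink_mode_after_replace(target))
```

With `device_target=Ascend` the replaced expression evaluates to False, so the job trains without dataset sinking. Training still works that way; as far as I understand, it also means `LossMonitor` reports the loss at every step.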

Use OBS Browser+ to create a bucket for the PyCharm ModelArts plugin to use at run time:

Log in:

Create the bucket: mindspore-lenet

Also upload the dataset that was downloaded during the local CPU training run:

In PyCharm, open the menu item Edit Training Job Configurations and fill in the fields:

AI Engine: select Ascend-Powered-Engine -> MindSpore 1.1.1 python3.7 aarch64

Boot File Path: enter C:\Users\zhang\PycharmProjects\littleMi\LeNet.py

Code Directory: enter C:\Users\zhang\PycharmProjects\littleMi

OBS Path: enter /mindspore-lenet/

Data Path in OBS: enter /mindspore-lenet/MNIST_Data/

Running Parameters: enter device_target=Ascend
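
Note that the plugin fields use bucket-rooted paths such as /mindspore-lenet/MNIST_Data/, while the modified code later in this post addresses the same location as obs://mindspore-lenet/MNIST_Data/. A small hypothetical helper, `to_obs_url`, makes the correspondence explicit. It is a sketch only and not part of ModelArts or moxing:

```python
def to_obs_url(plugin_path):
    """Turn a bucket-rooted path like /bucket/dir/ into obs://bucket/dir/."""
    return "obs://" + plugin_path.lstrip("/")


print(to_obs_url("/mindspore-lenet/MNIST_Data/"))  # obs://mindspore-lenet/MNIST_Data/
```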

After uploading the code, the PyCharm plugin automatically passes two arguments, data_url and train_url, to the script, so the LeNet code needs to accept them:

parser.add_argument('--data_url', type=str, default="./MNIST_Data",help='path where the dataset is saved')
parser.add_argument('--train_url', type=str, default="",help='train_url')
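
Put together with the existing `--device_target` option, the argument handling might look like the sketch below. The OBS paths in the sample `argv` are illustrative values only, and the use of `parse_known_args` (to tolerate any extra arguments the platform might add) is my own assumption, not something the original code does:

```python
import argparse


def build_parser():
    """Parser with the original option plus the two ModelArts arguments."""
    parser = argparse.ArgumentParser(description='MindSpore LeNet Example')
    parser.add_argument('--device_target', type=str, default="CPU",
                        choices=['Ascend', 'GPU', 'CPU'])
    parser.add_argument('--data_url', type=str, default="./MNIST_Data",
                        help='path where the dataset is saved')
    parser.add_argument('--train_url', type=str, default="", help='train_url')
    return parser


# Sample command line; every value here is illustrative.
argv = ["--device_target=Ascend",
        "--data_url=obs://mindspore-lenet/MNIST_Data/",
        "--train_url=obs://mindspore-lenet/output/",
        "--unexpected_flag=1"]

# parse_known_args returns unrecognized arguments instead of exiting with an error.
args, unknown = build_parser().parse_known_args(argv)
print(args.device_target, args.data_url, args.train_url, unknown)
```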

In addition, since the dataset is already on OBS, there is no need to call download_dataset; it can be copied over from OBS directly with mox:

import moxing as mox

    #context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
    context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")

    dataset_sink_mode = not args.device_target == "Ascend"
    # download mnist dataset
    #download_dataset()
    mox.file.copy_parallel(src_url="obs://mindspore-lenet/MNIST_Data/", dst_url="MNIST_Data")
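
`moxing` is normally preinstalled on ModelArts and usually absent on a local machine, so a guarded import can keep one script usable in both places. This is a sketch under that assumption; `fetch_dataset` is a hypothetical helper, and its local branch simply treats the source as an ordinary directory and copies it with `shutil` (the `dirs_exist_ok` argument needs Python 3.8 or later):

```python
import os
import shutil


def fetch_dataset(src_url, dst_dir):
    """Copy the dataset to dst_dir, using moxing when it can be imported."""
    try:
        import moxing as mox  # available on ModelArts, usually not locally
    except ImportError:
        mox = None
    if mox is not None:
        # Same call as in the snippet above.
        mox.file.copy_parallel(src_url=src_url, dst_url=dst_dir)
    elif os.path.abspath(src_url) != os.path.abspath(dst_dir):
        # Local fallback: plain directory copy.
        shutil.copytree(src_url, dst_dir, dirs_exist_ok=True)
    return dst_dir
```

On ModelArts this would be called as `fetch_dataset("obs://mindspore-lenet/MNIST_Data/", "MNIST_Data")`; on a laptop the first argument would be a local folder.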

Now click Apply and Run:

PyCharm then uploads the code: