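The hands-on assignment for the model track of the Ascend CANN Training Camp has been released: https://bbs.huaweicloud.com/forum/forum.php?mod=viewthread&tid=135955&fromuid=446160
The task: train the LeNet network on MNIST using an Ascend 910 on Huawei Cloud, then upload a screenshot of the loss and a screenshot of the inference accuracy.
"Brother-in-law" Ding Dada has already completed this assignment with AI Gallery and ModelArts; you can simply follow his write-up: https://bbs.huaweicloud.com/blogs/281479
This post instead completes the assignment with the ModelArts plugin for PyCharm.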
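1. Prepare the PyCharm ModelArts plugin, a local MindSpore environment, and the MindSpore LeNet source code
For installing and configuring the ModelArts plugin in PyCharm, see Zhang Xiaobai's blog post: https://bbs.huaweicloud.com/blogs/207322 (search the page for "PyCharm Kit的登场", the section where the PyCharm Kit is introduced).
With PyCharm configured, go to the MindSpore website: https://www.mindspore.cn/tutorial/training/zh-CN/r1.2/index.html
Open Tutorials -> Training -> Quick Start -> Implementing an Image Classification Application:
It turns out there are quite a few ways to play with LeNet: download the Notebook and run it locally, run it on ModelArts, or run it on HuaweiCloud (https://www.hiascend.com/zh/college/onlineExperiment/codeLabMindSpore/mindSpore , which appears to be a sandbox lab).
Since the plan is to use PyCharm, let's look at the source code first:
Open the link highlighted in yellow on that page, https://gitee.com/mindspore/docs/tree/r1.2/tutorials/tutorial_code/lenet :
There appear to be only two source files: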
lenet.py
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Lenet Tutorial
This sample code is applicable to CPU, GPU and Ascend.
"""
import os
import argparse
import mindspore.dataset as ds
import mindspore.nn as nn
from mindspore import context, Model, load_checkpoint, load_param_into_net
from mindspore.common.initializer import Normal
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor
import mindspore.dataset.vision.c_transforms as CV
import mindspore.dataset.transforms.c_transforms as C
from mindspore.dataset.vision import Inter
from mindspore.nn.metrics import Accuracy
from mindspore import dtype as mstype
from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits
from utils.dataset import download_dataset
def create_dataset(data_path, batch_size=32, repeat_size=1,
                   num_parallel_workers=1):
    """ create dataset for train or test
    Args:
        data_path: Data path
        batch_size: The number of data records in each group
        repeat_size: The number of replicated data records
        num_parallel_workers: The number of parallel workers
    """
    # define dataset
    mnist_ds = ds.MnistDataset(data_path)
    # define operation parameters
    resize_height, resize_width = 32, 32
    rescale = 1.0 / 255.0
    shift = 0.0
    rescale_nml = 1 / 0.3081
    shift_nml = -1 * 0.1307 / 0.3081
    # define map operations
    resize_op = CV.Resize((resize_height, resize_width), interpolation=Inter.LINEAR)  # Resize images to (32, 32)
    rescale_nml_op = CV.Rescale(rescale_nml, shift_nml)  # normalize images
    rescale_op = CV.Rescale(rescale, shift)  # rescale images
    hwc2chw_op = CV.HWC2CHW()  # change shape from (height, width, channel) to (channel, height, width) to fit network.
    type_cast_op = C.TypeCast(mstype.int32)  # change data type of label to int32 to fit network
    # apply map operations on images
    mnist_ds = mnist_ds.map(operations=type_cast_op, input_columns="label", num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(operations=resize_op, input_columns="image", num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(operations=rescale_op, input_columns="image", num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(operations=rescale_nml_op, input_columns="image", num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(operations=hwc2chw_op, input_columns="image", num_parallel_workers=num_parallel_workers)
    # apply DatasetOps
    buffer_size = 10000
    mnist_ds = mnist_ds.shuffle(buffer_size=buffer_size)  # 10000 as in LeNet train script
    mnist_ds = mnist_ds.batch(batch_size, drop_remainder=True)
    mnist_ds = mnist_ds.repeat(repeat_size)
    return mnist_ds
class LeNet5(nn.Cell):
    """Lenet network structure."""
    # define the operator required
    def __init__(self, num_class=10, num_channel=1):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(num_channel, 6, 5, pad_mode='valid')
        self.conv2 = nn.Conv2d(6, 16, 5, pad_mode='valid')
        self.fc1 = nn.Dense(16 * 5 * 5, 120, weight_init=Normal(0.02))
        self.fc2 = nn.Dense(120, 84, weight_init=Normal(0.02))
        self.fc3 = nn.Dense(84, num_class, weight_init=Normal(0.02))
        self.relu = nn.ReLU()
        self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
        self.flatten = nn.Flatten()
    # use the preceding operators to construct networks
    def construct(self, x):
        x = self.max_pool2d(self.relu(self.conv1(x)))
        x = self.max_pool2d(self.relu(self.conv2(x)))
        x = self.flatten(x)
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x
def train_net(network_model, epoch_size, data_path, repeat_size, ckpoint_cb, sink_mode):
    """Define the training method."""
    print("============== Starting Training ==============")
    # load training dataset
    ds_train = create_dataset(os.path.join(data_path, "train"), 32, repeat_size)
    network_model.train(epoch_size, ds_train, callbacks=[ckpoint_cb, LossMonitor()], dataset_sink_mode=sink_mode)
def test_net(network, network_model, data_path):
    """Define the evaluation method."""
    print("============== Starting Testing ==============")
    # load the saved model for evaluation
    param_dict = load_checkpoint("checkpoint_lenet-1_1875.ckpt")
    # load parameter to the network
    load_param_into_net(network, param_dict)
    # load testing dataset
    ds_eval = create_dataset(os.path.join(data_path, "test"))
    acc = network_model.eval(ds_eval, dataset_sink_mode=False)
    print("============== Accuracy:{} ==============".format(acc))
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='MindSpore LeNet Example')
    parser.add_argument('--device_target', type=str, default="CPU", choices=['Ascend', 'GPU', 'CPU'],
                        help='device where the code will be implemented (default: CPU)')
    args = parser.parse_args()
    context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
    dataset_sink_mode = not args.device_target == "CPU"
    # download mnist dataset
    download_dataset()
    # learning rate setting
    lr = 0.01
    momentum = 0.9
    dataset_size = 1
    mnist_path = "./MNIST_Data"
    # define the loss function
    net_loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
    train_epoch = 1
    # create the network
    net = LeNet5()
    # define the optimizer
    net_opt = nn.Momentum(net.trainable_params(), lr, momentum)
    config_ck = CheckpointConfig(save_checkpoint_steps=1875, keep_checkpoint_max=10)
    # save the network model and parameters for subsequence fine-tuning
    ckpoint = ModelCheckpoint(prefix="checkpoint_lenet", config=config_ck)
    # group layers into an object with training and evaluation features
    model = Model(net, net_loss, net_opt, metrics={"Accuracy": Accuracy()})
    train_net(model, train_epoch, mnist_path, dataset_size, ckpoint, dataset_sink_mode)
    test_net(net, model, mnist_path)
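Two details of lenet.py can be checked with plain arithmetic: why fc1 expects 16 * 5 * 5 inputs, and what the two Rescale steps in create_dataset amount to. The sketch below uses no MindSpore at all; the helper names (conv_valid, max_pool, lenet_flatten_size, normalize) are made up for illustration, and it assumes stride-1 convolutions and unpadded pooling, as configured in the listing.

```python
def conv_valid(size, kernel):
    """Output size of a stride-1 convolution with pad_mode='valid'."""
    return size - kernel + 1


def max_pool(size, kernel=2, stride=2):
    """Output size of a max-pooling layer without padding."""
    return (size - kernel) // stride + 1


def lenet_flatten_size(input_size=32):
    """Number of features that reach fc1 for a square input image."""
    size = conv_valid(input_size, 5)  # conv1: 32 -> 28
    size = max_pool(size)             # pool:  28 -> 14
    size = conv_valid(size, 5)        # conv2: 14 -> 10
    size = max_pool(size)             # pool:  10 -> 5
    return 16 * size * size           # conv2 produces 16 channels


def normalize(pixel):
    """Apply the two Rescale steps of create_dataset to one raw pixel value."""
    rescaled = pixel * (1.0 / 255.0) + 0.0                    # rescale_op
    return rescaled * (1 / 0.3081) + (-1 * 0.1307 / 0.3081)   # rescale_nml_op


print(lenet_flatten_size())  # 400
```

So the 32x32 resize in create_dataset is what makes 16 * 5 * 5 = 400 work out, and the two Rescale steps together compute (pixel / 255 - 0.1307) / 0.3081, with 0.1307 and 0.3081 serving as mean and standard deviation.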
utils/dataset.py
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""download MNIST dataset"""
import os
import sys
import requests
from urllib.parse import urlparse
import gzip
def unzipfile(gzip_path):
    """unzip dataset file
    Args:
        gzip_path: dataset file path
    """
    open_file = open(gzip_path.replace('.gz', ''), 'wb')
    gz_file = gzip.GzipFile(gzip_path)
    open_file.write(gz_file.read())
    gz_file.close()
def download_progress(url, file_name):
    """download mnist dataset
    Args:
        url: download url
        file_name: dataset name
    """
    res = requests.get(url, stream=True, verify=False)
    # get mnist dataset size
    total_size = int(res.headers["Content-Length"])
    temp_size = 0
    with open(file_name, "wb+") as f:
        for chunk in res.iter_content(chunk_size=1024):
            temp_size += len(chunk)
            f.write(chunk)
            f.flush()
            done = int(100 * temp_size / total_size)
            # show download progress
            sys.stdout.write("\r[{}{}] {:.2f}%".format("█" * done, " " * (100 - done), 100 * temp_size / total_size))
            sys.stdout.flush()
    print("\n============== {} is already ==============".format(file_name))
    unzipfile(file_name)
    os.remove(file_name)
def download_dataset():
    """Download the dataset from http://yann.lecun.com/exdb/mnist/."""
    print("************** Downloading the MNIST dataset **************")
    train_path = "./MNIST_Data/train/"
    test_path = "./MNIST_Data/test/"
    train_path_check = os.path.exists(train_path)
    test_path_check = os.path.exists(test_path)
    if not train_path_check and not test_path_check:
        os.makedirs(train_path)
        os.makedirs(test_path)
    train_url = {"http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz", "http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz"}
    test_url = {"http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz", "http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz"}
    for url in train_url:
        url_parse = urlparse(url)
        # split the file name from url
        file_name = os.path.join(train_path, url_parse.path.split('/')[-1])
        if not os.path.exists(file_name.replace('.gz', '')):
            download_progress(url, file_name)
    for url in test_url:
        url_parse = urlparse(url)
        # split the file name from url
        file_name = os.path.join(test_path, url_parse.path.split('/')[-1])
        if not os.path.exists(file_name.replace('.gz', '')):
            download_progress(url, file_name)
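The unzipfile helper above reads the whole archive into memory and never closes the output file explicitly. As a possible alternative (not part of the tutorial code), the decompression step could be sketched with context managers and a streaming copy. The name unzip_gz is hypothetical, and the sketch assumes the path ends in ".gz", which holds for the four MNIST archives.

```python
import gzip
import shutil


def unzip_gz(gzip_path):
    """Decompress `gzip_path` next to itself and return the output path."""
    if not gzip_path.endswith(".gz"):
        raise ValueError("expected a path ending in .gz: {}".format(gzip_path))
    out_path = gzip_path[:-len(".gz")]
    # Both files are closed when the with-block exits, even on errors.
    with gzip.open(gzip_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream in chunks instead of read()
    return out_path
```

Called as unzip_gz("./MNIST_Data/train/train-labels-idx1-ubyte.gz"), it would write ./MNIST_Data/train/train-labels-idx1-ubyte, the same file name that unzipfile produces.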
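Copy these two files into a self-created project named littleMi:
Zhang Xiaobai already has MindSpore installed on this machine. For quick installation instructions, see: https://www.mindspore.cn/install/
Just run the installation command given there.
The MindSpore version currently installed on this machine (a laptop) is 1.2.0-rc1.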
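One way to check the installed version from Python is sketched below. The helper name installed_version is made up for illustration; it relies only on the `__version__` attribute that the mindspore package exposes and returns None when the module cannot be imported.

```python
import importlib


def installed_version(module_name="mindspore"):
    """Return the module's __version__ string, or None if it is not installed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


print(installed_version())
```

On the laptop described here this should print the version noted above; on a machine without MindSpore it prints None.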
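2. Train the LeNet network on the local CPU
First, run it on the local CPU:
The code does not appear to need any changes.
Let's run it.
It seems to report an error.
A look at the code shows nothing obviously wrong.
Run it again:
Now it seems to be slowly downloading the dataset...
Wait patiently for the download to finish.
Once the download completes, training starts right away...
...
After the 1875 training steps of this single epoch finish: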
loss is 0.052374754
'Accuracy': 0.9599358974358975
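The numbers in this output can be reproduced by hand. The sketch below assumes the standard MNIST split sizes (60000 training and 10000 test images) and that the Accuracy metric is simply correct predictions divided by evaluated samples. The batch size of 32 and drop_remainder=True come from create_dataset.

```python
TRAIN_SAMPLES = 60000  # size of the MNIST training split
TEST_SAMPLES = 10000   # size of the MNIST test split
BATCH_SIZE = 32        # default batch_size of create_dataset

# batch(..., drop_remainder=True) discards the last incomplete batch.
steps_per_epoch = TRAIN_SAMPLES // BATCH_SIZE
eval_batches = TEST_SAMPLES // BATCH_SIZE
eval_samples = eval_batches * BATCH_SIZE

reported_accuracy = 0.9599358974358975
correct = round(reported_accuracy * eval_samples)

print(steps_per_epoch)  # 1875
print(eval_samples)     # 9984
print(correct)          # 9584
```

This is where the 1875 in save_checkpoint_steps=1875 and in the checkpoint file name checkpoint_lenet-1_1875.ckpt comes from (epoch 1, step 1875). It also shows that the reported accuracy corresponds to 9584 correct predictions out of 9984 evaluated images, because 16 test images fall into the dropped final batch.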
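Assignment requirements:
3 points if the loss converges below 0.5; 4 points if the inference accuracy reaches 90% or higher.
This run already meets both requirements.
But it was not run on an Ascend 910.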
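3. Train the LeNet network in the Ascend 910 environment
Next, move this code onto the Ascend 910:
Change the device in LeNet.py to Ascend (search for the occurrences of CPU in the code and replace them with Ascend).
Use OBS Browser+ to create a bucket for the PyCharm ModelArts plugin to use at run time:
Log in:
Create a bucket: mindspore-lenet
Also upload the dataset that was downloaded earlier during the local CPU training: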
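Before uploading, it may be worth confirming that the local MNIST_Data directory is complete. The following sketch checks for the four uncompressed files that utils/dataset.py leaves behind (the archive names from the download URLs without the .gz suffix). The helper name missing_mnist_files is hypothetical.

```python
import os

# File names derived from the download URLs in utils/dataset.py, minus ".gz".
EXPECTED_FILES = {
    "train": ["train-images-idx3-ubyte", "train-labels-idx1-ubyte"],
    "test": ["t10k-images-idx3-ubyte", "t10k-labels-idx1-ubyte"],
}


def missing_mnist_files(root="./MNIST_Data"):
    """Return the expected dataset files that do not exist under `root`."""
    missing = []
    for split, names in EXPECTED_FILES.items():
        for name in names:
            path = os.path.join(root, split, name)
            if not os.path.isfile(path):
                missing.append(path)
    return missing


print(missing_mnist_files())
```

An empty list means the train and test folders are ready to be uploaded as they are.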
Fill in the fields under the PyCharm menu -> Edit Training Job Configurations:
AI Engine: select Ascend-Powered-Engine -> MindSpore 1.1.1 python3.7 aarch64
Boot File Path: enter C:\Users\zhang\PycharmProjects\littleMi\LeNet.py
Code Directory: enter C:\Users\zhang\PycharmProjects\littleMi
OBS Path: enter /mindspore-lenet/
Data Path in OBS: enter /mindspore-lenet/MNIST_Data/
Running Parameters: enter device_target=Ascend
After PyCharm uploads the code, the job is automatically given two extra parameters, data_url and train_url, so the LeNet script has to declare these two arguments:
parser.add_argument('--data_url', type=str, default="./MNIST_Data", help='path where the dataset is saved')
parser.add_argument('--train_url', type=str, default="", help='train_url')
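Put together, the argument parsing might look like the sketch below. It assumes the platform passes the values as --name=value options on the command line. The argv list, including the output path and the extra flag, is purely illustrative, and parse_known_args is a defensive choice of this sketch rather than something the original script uses.

```python
import argparse


def build_parser():
    """Parser with the tutorial's device option plus the two ModelArts paths."""
    parser = argparse.ArgumentParser(description='MindSpore LeNet Example')
    parser.add_argument('--device_target', type=str, default="Ascend",
                        choices=['Ascend', 'GPU', 'CPU'],
                        help='device where the code will be implemented')
    parser.add_argument('--data_url', type=str, default="./MNIST_Data",
                        help='path where the dataset is saved')
    parser.add_argument('--train_url', type=str, default="", help='train_url')
    return parser


# Hypothetical command line, for illustration only.
argv = [
    "--device_target=Ascend",
    "--data_url=/mindspore-lenet/MNIST_Data/",
    "--train_url=/mindspore-lenet/output/",
    "--some_extra_flag=1",
]
args, unknown = build_parser().parse_known_args(argv)
print(args.data_url)  # /mindspore-lenet/MNIST_Data/
print(unknown)        # ['--some_extra_flag=1']
```

With parse_args instead of parse_known_args, any option the script does not declare would abort the job with an "unrecognized arguments" error, which is why the two add_argument lines are needed in the first place.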
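In addition, since the dataset is already on OBS, there is no need to call download_dataset; it can be copied straight from OBS with mox:
import moxing as mox
# context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")
# Note: a side effect of the CPU -> Ascend search-and-replace. This is False when
# device_target is "Ascend", so dataset sink mode is off for this run
# (the tutorial's original condition was: not args.device_target == "CPU").
dataset_sink_mode = not args.device_target == "Ascend"
# download mnist dataset
# download_dataset()
mox.file.copy_parallel(src_url="obs://mindspore-lenet/MNIST_Data/", dst_url="MNIST_Data")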
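The copy step could also be wrapped so that the same script still runs on a laptop where moxing is not installed. This is only a sketch: fetch_dataset is a hypothetical helper, the local-directory fallback is an assumption that is not in the original post, and the only MoXing call used is the mox.file.copy_parallel shown above.

```python
import os
import shutil


def fetch_dataset(src_url, dst_dir="MNIST_Data"):
    """Copy the dataset directory at `src_url` to `dst_dir` and return `dst_dir`."""
    try:
        import moxing as mox  # present inside ModelArts training jobs
    except ImportError:
        mox = None

    if mox is not None:
        # Same call as in the post; src_url is typically an obs:// path.
        mox.file.copy_parallel(src_url=src_url, dst_url=dst_dir)
    elif os.path.isdir(src_url):
        # Local fallback (assumption): plain directory copy, skipped if present.
        if not os.path.exists(dst_dir):
            shutil.copytree(src_url, dst_dir)
    else:
        raise RuntimeError("moxing is unavailable and {} is not a local directory".format(src_url))
    return dst_dir
```

Inside the training job this would be called as fetch_dataset("obs://mindspore-lenet/MNIST_Data/"), replacing the hard-coded copy_parallel line.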
Now click Apply and Run:
PyCharm then uploads the code:
The ModelArts log shows the following:
Startup:
Copying the dataset with MoXing:
Training starts:
Wait for step 1875 to finish:
epoch: 1 step: 1875, loss is 0.18389125
============== Accuracy:{'Accuracy': 0.9458133012820513} ==============
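Assignment requirements:
3 points if the loss converges below 0.5; 4 points if the inference accuracy reaches 90% or higher.
This run also meets the assignment requirements.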
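For anyone who would rather check the log mechanically than by eye, here is a small sketch that pulls the last loss value and the accuracy out of log lines in the format shown above and compares them with the two thresholds. The function name check_log and the regular expressions are illustrative and only tested against these two line formats.

```python
import ast
import re

LOSS_RE = re.compile(r"loss is ([0-9.eE+-]+)")
ACC_RE = re.compile(r"Accuracy:(\{.*?\})")


def check_log(lines, max_loss=0.5, min_accuracy=0.9):
    """Check the last reported loss and the accuracy against the thresholds."""
    last_loss, accuracy = None, None
    for line in lines:
        match = LOSS_RE.search(line)
        if match:
            last_loss = float(match.group(1))
        match = ACC_RE.search(line)
        if match:
            # The metrics are printed as a Python dict literal.
            accuracy = ast.literal_eval(match.group(1))["Accuracy"]
    return {
        "loss_ok": last_loss is not None and last_loss < max_loss,
        "accuracy_ok": accuracy is not None and accuracy >= min_accuracy,
    }


log = [
    "epoch: 1 step: 1875, loss is 0.18389125",
    "============== Accuracy:{'Accuracy': 0.9458133012820513} ==============",
]
print(check_log(log))  # {'loss_ok': True, 'accuracy_ok': True}
```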
Time to hand it in!
(End of article. Thanks for reading.)