Model Training
The yolov5 in the airockchip/yolov5 repository is optimized for deployment on rknpu devices:
- The Focus/SPPF blocks are optimized for better performance with identical results
- The output nodes are changed and post_process is removed from the model (post-processing is unfriendly to quantization)
- ReLU is used instead of SiLU as the activation layer (only replaced when training a new model with this repository)
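The Focus block is a pure pixel rearrangement, which is why it can be swapped for an equivalent, NPU-friendlier layer without changing results. A minimal standalone sketch (plain Python, not from the repository) of the slicing it performs:

```python
def focus_slice(img):
    """Split one H x W channel (list of lists) into 4 half-resolution phases,
    as the YOLOv5 Focus block does before its convolution."""
    return [
        [row[0::2] for row in img[0::2]],  # even rows, even cols
        [row[0::2] for row in img[1::2]],  # odd rows, even cols
        [row[1::2] for row in img[0::2]],  # even rows, odd cols
        [row[1::2] for row in img[1::2]],  # odd rows, odd cols
    ]

img = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 test "image"
phases = focus_slice(img)
# Every input pixel appears exactly once across the 4 output phases:
assert sorted(v for ch in phases for row in ch for v in row) == list(range(16))
```

Since no value is computed, only moved, any layer that produces the same rearrangement more efficiently on the NPU is a drop-in replacement.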
Clone the airockchip/yolov5 repository with the following commands:
git clone https://github.com/airockchip/yolov5.git
cd yolov5
This section retrains a model with airockchip/yolov5 for testing; you can also take a model trained with the official Ultralytics repository and then use this repository to export a model adapted for rknpu.
# You can add a dataset config file for a custom dataset, modify the model config file, and so on, then train the model with this repository
# This tutorial simply retrains on coco128, using the pretrained weights yolov5s.pt
(toolkit2) root@ubuntu:~/yolov5$ python3 train.py --data coco128.yaml --weights yolov5s.pt --img 640
# Training automatically downloads the weights and dataset; a lot of information is printed when training finishes, including:
...
Optimizer stripped from runs/train/exp/weights/last.pt, 14.9MB
Optimizer stripped from runs/train/exp/weights/best.pt, 14.9MB
Validating runs/train/exp/weights/best.pt...
...
# Training analysis and weights are saved in the runs/train/exp/ directory
The final model is saved as best.pt in the runs/train/exp/weights/ directory; next, export this model to torchscript or onnx.
# export.py parameter notes
# --weights specifies the path to the weights file
# --rknpu specifies the platform (supported platforms: rk1808, rv1109, rv1126, rk3399pro, rk3566, rk3568, rk3588, rv1103, rv1106)
# --include sets the export format; torchscript is exported by default, or onnx can be specified
python3 export.py --weights runs/train/exp/weights/best.pt --rknpu rk3588
export: data=data/coco128.yaml, weights=['runs/train/exp4/weights/best.pt'], imgsz=[640, 640],
batch_size=1, device=cpu, half=False, inplace=False, train=False, keras=False, optimize=False, int8=False,
dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100,
topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['torchscript'], rknpu=rk3588
YOLOv5 🚀 v6.2-4-g23a20ef3 Python-3.9.18 torch-2.1.2+cu121 CPU
Fusing layers...
Model summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
---> save anchors for RKNN
[[10.0, 13.0], [16.0, 30.0], [33.0, 23.0], [30.0, 61.0], [62.0, 45.0], [59.0, 119.0], [116.0, 90.0], [156.0, 198.0], [373.0, 326.0]]
PyTorch: starting from runs/train/exp4/weights/best.pt with output shape (1, 255, 80, 80) (14.2 MB)
TorchScript: starting export with torch 2.1.2+cu121...
TorchScript: export success, saved as runs/train/exp4/weights/best.torchscript (27.9 MB)
Export complete (2.82s)
Results saved to /mnt/f/wsl_file/wsl_ai/yolov5/airockchip/yolov5-6.2.1/runs/train/exp4/weights
Detect: python detect.py --weights runs/train/exp4/weights/best.torchscript
Validate: python val.py --weights runs/train/exp4/weights/best.torchscript
PyTorch Hub: model = torch.hub.load('ultralytics/yolov5', 'custom', 'runs/train/exp4/weights/best.torchscript')
Visualize: https://netron.app
# Alternatively, use the following command to export an onnx model; best.onnx is generated in the runs/train/exp/weights directory. Note that onnx may need to be installed in your environment.
python3 export.py --weights runs/train/exp/weights/best.pt --include onnx --rknpu rk3588
The exported torchscript model is saved as best.torchscript in the model directory runs/train/exp/weights; an onnx export instead generates best.onnx in the same directory.
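Because post_process is removed from the exported graph, host-side code must decode the raw outputs itself using the anchors saved during export. A sketch of the standard YOLOv5 per-cell box decode (plain Python with hypothetical values, not the repository's implementation):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(tx, ty, tw, th, grid_x, grid_y, anchor_w, anchor_h, stride):
    """YOLOv5-style decode of one raw prediction into a center-size box."""
    bx = (2.0 * sigmoid(tx) - 0.5 + grid_x) * stride   # box center x
    by = (2.0 * sigmoid(ty) - 0.5 + grid_y) * stride   # box center y
    bw = (2.0 * sigmoid(tw)) ** 2 * anchor_w           # box width
    bh = (2.0 * sigmoid(th)) ** 2 * anchor_h           # box height
    return bx, by, bw, bh

# Zero raw offsets at cell (10, 10) on the stride-8 map with anchor (10, 13):
print(decode_box(0, 0, 0, 0, 10, 10, 10.0, 13.0, 8))  # (84.0, 84.0, 10.0, 13.0)
```

This decode has to be applied per output head, per anchor, before any confidence filtering.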
Model Conversion
For testing, use the conversion script described earlier and run it directly to export an rknn model file, or use the tools here for model conversion, model evaluation, model deployment, and so on.
# Run the conversion program to convert the torchscript or onnx model into an rknn model; the onnx model is tested here.
# lubancat-4 (pass rk3588), lubancat-2 (pass rk3568)
# python onnx2rknn.py <onnx_model> <TARGET_PLATFORM> <dtype(optional)> <output_rknn_path(optional)>
(toolkit2) root@ubuntu:~/yolov5$ python onnx2rknn.py ./yolov5n.onnx rk3588 i8
I rknn-toolkit2 version: 2.2.0
--> Config model
done
--> Loading model
I Loading : 100%|██████████████████████████████████████████████| 121/121 [00:00<00:00, 73212.75it/s]
done
--> Building model
I OpFusing 0: 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 1144.06it/s]
I OpFusing 1 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 846.42it/s]
I OpFusing 2 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 308.64it/s]
W build: found outlier value, this may affect quantization accuracy
const name abs_mean abs_std outlier value
onnx::Conv_347 0.68 0.89 -11.603
I GraphPreparing : 100%|███████████████████████████████████████| 149/149 [00:00<00:00, 10361.63it/s]
I Quantizating : 100%|████████████████████████████████████████████| 149/149 [00:13<00:00, 11.03it/s]
# omitted..........
I rknn buiding done.
done
--> Export rknn model
done
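A hypothetical skeleton of what an onnx2rknn.py-style converter looks like: the argument layout mirrors the usage line above, and the calls follow the public rknn-toolkit2 API (config / load_onnx / build / export_rknn), sketched here without on-device verification. The normalization values and dataset path are assumptions for illustration:

```python
import sys

def parse_args(argv):
    # usage: onnx2rknn.py <onnx_model> <TARGET_PLATFORM> <dtype(optional)> <output(optional)>
    if len(argv) < 3:
        raise SystemExit("usage: onnx2rknn.py <onnx_model> <TARGET_PLATFORM> [dtype] [output]")
    model, platform = argv[1], argv[2]             # e.g. rk3588 for lubancat-4
    dtype = argv[3] if len(argv) > 3 else "i8"     # i8 -> quantized build
    out = argv[4] if len(argv) > 4 else model.rsplit(".", 1)[0] + ".rknn"
    return model, platform, dtype, out

def convert(model, platform, dtype, out):
    from rknn.api import RKNN                      # needs the toolkit2 environment
    rknn = RKNN()
    rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]],
                target_platform=platform)          # normalization assumed here
    rknn.load_onnx(model=model)
    rknn.build(do_quantization=(dtype == "i8"), dataset="./dataset.txt")
    rknn.export_rknn(out)
    rknn.release()

if __name__ == "__main__" and len(sys.argv) >= 3:
    convert(*parse_args(sys.argv))
```

For a quantized (i8) build, dataset.txt lists calibration images; check the actual script shipped with the tutorial for the exact values it uses.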
Model Testing
Then connect to the board with the rknn-toolkit2 tool for model testing, evaluation, and so on. The board must be connected to the PC over USB or the network; confirm that the adb connection works and that rknn_server has been started.
# Connect to the board to test the model; the result is saved to result.jpg
(toolkit2) root@ubuntu:~/yolov5$ python test.py
*************************
all device(s) with adb mode:
192.168.103.152:5555
*************************
I NPUTransfer: Starting NPU Transfer Client, Transfer version 2.1.0 (b5861e7@2020-11-23T11:50:36)
D RKNNAPI: ==============================================
D RKNNAPI: RKNN VERSION:
D RKNNAPI: API: 1.6.0 (535b468 build@2023-12-11T09:05:46)
D RKNNAPI: DRV: rknn_server: 1.5.0 (17e11b1 build: 2023-05-18 21:43:39)
D RKNNAPI: DRV: rknnrt: 1.6.0 (9a7b5d24c@2023-12-13T17:31:11)
D RKNNAPI: ==============================================
D RKNNAPI: Input tensors:
D RKNNAPI: index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, w_stride = 0,
# omitted....
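Before inference, test scripts typically letterbox the source image into the 640x640 input shown in the tensor dump above: scale to fit, then pad the remainder. A small geometry sketch (plain Python; the 810x1080 size is that of the yolov5 sample image bus.jpg, assumed here for illustration):

```python
def letterbox_geometry(src_w, src_h, dst=640):
    """Compute the scale and padding that fit src into a dst x dst input."""
    scale = min(dst / src_w, dst / src_h)              # keep aspect ratio
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_w, pad_h = (dst - new_w) / 2, (dst - new_h) / 2  # symmetric padding
    return scale, new_w, new_h, pad_w, pad_h

# 810x1080 fits 640x640 as a 480x640 image with 80 px of padding on each side:
print(letterbox_geometry(810, 1080))
```

The same scale and padding must be inverted afterwards to map decoded boxes back into source-image coordinates.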
--> Running model
done
class score xmin, ymin, xmax, ymax
--------------------------------------------------
person 0.884 [ 209, 244, 286, 506]
person 0.868 [ 478, 238, 559, 526]
person 0.825 [ 110, 238, 230, 534]
person 0.339 [ 79, 354, 122, 516]
bus 0.705 [ 94, 129, 553, 468]
Save results to result.jpg!
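Since post_process was stripped from the graph for quantization friendliness, results like the table above come from host-side confidence filtering plus non-maximum suppression. A minimal pure-Python NMS sketch (illustrative only, not the test.py implementation):

```python
def iou(a, b):
    """IoU of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_thres=0.45):
    """Keep indices of the highest-scoring boxes, dropping heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thres]
    return keep

# The second box nearly duplicates the first and is suppressed:
boxes = [(209, 244, 286, 506), (212, 240, 290, 510), (478, 238, 559, 526)]
print(nms(boxes, [0.884, 0.600, 0.868]))  # [0, 2]
```

The 0.45 default matches the iou_thres printed in the export log earlier; in practice NMS is usually run per class.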