Model Training
The yolov5 in the airockchip/yolov5 repository is optimized for deployment on rknpu devices:
- The Focus/SPPF blocks are optimized for better performance with identical results
- The output nodes are changed and post_process is removed from the model (post-processing is unfriendly to quantization)
- ReLU is used instead of SiLU as the activation layer (only replaced when training a new model with this repository)
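The Focus block is a pure pixel rearrangement, which is why it can be swapped for an equivalent, NPU-friendlier layer without changing results. A minimal standalone sketch (plain Python, not from the repository) of the slicing it performs:

```python
def focus_slice(img):
    """Split one H x W channel (list of lists) into 4 half-resolution phases,
    as the YOLOv5 Focus block does before its convolution."""
    return [
        [row[0::2] for row in img[0::2]],  # even rows, even cols
        [row[0::2] for row in img[1::2]],  # odd rows, even cols
        [row[1::2] for row in img[0::2]],  # even rows, odd cols
        [row[1::2] for row in img[1::2]],  # odd rows, odd cols
    ]

img = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 test "image"
phases = focus_slice(img)
# Every input pixel appears exactly once across the 4 output phases:
assert sorted(v for ch in phases for row in ch for v in row) == list(range(16))
```

Since no value is computed, only moved, any layer that produces the same rearrangement more efficiently on the NPU is a drop-in replacement.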
Clone the airockchip/yolov5 repository with the following commands:
git clone https://github.com/airockchip/yolov5.git
cd yolov5
This section retrains a model with airockchip/yolov5 for testing; you can also take a model trained with the official Ultralytics repository and then use this repository to export a model adapted for rknpu.
# You can add a dataset config file for a custom dataset, modify the model config file, and so on, then train the model with this repository
# This tutorial simply retrains on coco128, using the pretrained weights yolov5s.pt
(toolkit2) root@ubuntu:~/yolov5$ python3 train.py --data coco128.yaml --weights yolov5s.pt --img 640
# Training automatically downloads the weights and dataset; a lot of information is printed when training finishes, including:
...
Optimizer stripped from runs/train/exp/weights/last.pt, 14.9MB
Optimizer stripped from runs/train/exp/weights/best.pt, 14.9MB
Validating runs/train/exp/weights/best.pt...
...
# Training analysis and weights are saved in the runs/train/exp/ directory
The final model is saved as best.pt in the runs/train/exp/weights/ directory; next, export this model to torchscript or onnx.
# export.py parameter notes
# --weights specifies the path to the weights file
# --rknpu specifies the platform (supported platforms: rk1808, rv1109, rv1126, rk3399pro, rk3566, rk3568, rk3588, rv1103, rv1106)
# --include sets the export format; torchscript is exported by default, or onnx can be specified
python3 export.py --weights runs/train/exp/weights/best.pt --rknpu rk3588
export: data=data/coco128.yaml, weights=['runs/train/exp4/weights/best.pt'], imgsz=[640, 640],
batch_size=1, device=cpu, half=False, inplace=False, train=False, keras=False, optimize=False, int8=False,
dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100,
topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['torchscript'], rknpu=rk3588
YOLOv5 🚀 v6.2-4-g23a20ef3 Python-3.9.18 torch-2.1.2+cu121 CPU
Fusing layers...
Model summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
---> save anchors for RKNN
[[10.0, 13.0], [16.0, 30.0], [33.0, 23.0], [30.0, 61.0], [62.0, 45.0], [59.0, 119.0], [116.0, 90.0], [156.0, 198.0], [373.0, 326.0]]
PyTorch: starting from runs/train/exp4/weights/best.pt with output shape (1, 255, 80, 80) (14.2 MB)
TorchScript: starting export with torch 2.1.2+cu121...
TorchScript: export success, saved as runs/train/exp4/weights/best.torchscript (27.9 MB)
Export complete (2.82s)
Results saved to /mnt/f/wsl_file/wsl_ai/yolov5/airockchip/yolov5-6.2.1/runs/train/exp4/weights
Detect: python detect.py --weights runs/train/exp4/weights/best.torchscript
Validate: python val.py --weights runs/train/exp4/weights/best.torchscript
PyTorch Hub: model = torch.hub.load('ultralytics/yolov5', 'custom', 'runs/train/exp4/weights/best.torchscript')
Visualize: https://netron.app
# Alternatively, use the following command to export an onnx model; best.onnx is generated in the runs/train/exp/weights directory. Note that onnx may need to be installed in your environment.
python3 export.py --weights runs/train/exp/weights/best.pt --include onnx --rknpu rk3588
The exported torchscript model is saved as best.torchscript in the model directory runs/train/exp/weights; an onnx export instead generates best.onnx in the same directory.
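Because post_process is removed from the exported graph, host-side code must decode the raw outputs itself using the anchors saved during export. A sketch of the standard YOLOv5 per-cell box decode (plain Python with hypothetical values, not the repository's implementation):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(tx, ty, tw, th, grid_x, grid_y, anchor_w, anchor_h, stride):
    """YOLOv5-style decode of one raw prediction into a center-size box."""
    bx = (2.0 * sigmoid(tx) - 0.5 + grid_x) * stride   # box center x
    by = (2.0 * sigmoid(ty) - 0.5 + grid_y) * stride   # box center y
    bw = (2.0 * sigmoid(tw)) ** 2 * anchor_w           # box width
    bh = (2.0 * sigmoid(th)) ** 2 * anchor_h           # box height
    return bx, by, bw, bh

# Zero raw offsets at cell (10, 10) on the stride-8 map with anchor (10, 13):
print(decode_box(0, 0, 0, 0, 10, 10, 10.0, 13.0, 8))  # (84.0, 84.0, 10.0, 13.0)
```

This decode has to be applied per output head, per anchor, before any confidence filtering.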
Model Conversion
For testing, use the conversion script described earlier and run it directly to export an rknn model file, or use the tools here for model conversion, model evaluation, model deployment, and so on.
# Run the conversion program to convert the torchscript or onnx model into an rknn model; the onnx model is tested here.
# lubancat-4 (pass rk3588), lubancat-2 (pass rk3568)
# python onnx2rknn.py <onnx_model> <TARGET_PLATFORM> <dtype(optional)> <output_rknn_path(optional)>
(toolkit2) root@ubuntu:~/yolov5$ python onnx2rknn.py ./yolov5n.onnx rk3588 i8
I rknn-toolkit2 version: 2.2.0
--> Config model
done
--> Loading model
I Loading : 100%|██████████████████████████████████████████████| 121/121 [00:00<00:00, 73212.75it/s]
done
--> Building model
I OpFusing 0: 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 1144.06it/s]
I OpFusing 1 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 846.42it/s]
I OpFusing 2 : 100%|█████████████████████████████████████████████| 100/100 [00:00<00:00, 308.64it/s]
W build: found outlier value, this may affect quantization accuracy
const name abs_mean abs_std outlier value
onnx::Conv_347 0.68 0.89 -11.603
I GraphPreparing : 100%|███████████████████████████████████████| 149/149 [00:00<00:00, 10361.63it/s]
I Quantizating : 100%|████████████████████████████████████████████| 149/149 [00:13<00:00, 11.03it/s]
# omitted..........
I rknn buiding done.
done
--> Export rknn model
done
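A hypothetical skeleton of what an onnx2rknn.py-style converter looks like: the argument layout mirrors the usage line above, and the calls follow the public rknn-toolkit2 API (config / load_onnx / build / export_rknn), sketched here without on-device verification. The normalization values and dataset path are assumptions for illustration:

```python
import sys

def parse_args(argv):
    # usage: onnx2rknn.py <onnx_model> <TARGET_PLATFORM> <dtype(optional)> <output(optional)>
    if len(argv) < 3:
        raise SystemExit("usage: onnx2rknn.py <onnx_model> <TARGET_PLATFORM> [dtype] [output]")
    model, platform = argv[1], argv[2]             # e.g. rk3588 for lubancat-4
    dtype = argv[3] if len(argv) > 3 else "i8"     # i8 -> quantized build
    out = argv[4] if len(argv) > 4 else model.rsplit(".", 1)[0] + ".rknn"
    return model, platform, dtype, out

def convert(model, platform, dtype, out):
    from rknn.api import RKNN                      # needs the toolkit2 environment
    rknn = RKNN()
    rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]],
                target_platform=platform)          # normalization assumed here
    rknn.load_onnx(model=model)
    rknn.build(do_quantization=(dtype == "i8"), dataset="./dataset.txt")
    rknn.export_rknn(out)
    rknn.release()

if __name__ == "__main__" and len(sys.argv) >= 3:
    convert(*parse_args(sys.argv))
```

For a quantized (i8) build, dataset.txt lists calibration images; check the actual script shipped with the tutorial for the exact values it uses.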
Model Testing
Then connect to the board with the rknn-toolkit2 tool for model testing, evaluation, and so on. The board must be connected to the PC over USB or the network; confirm that the adb connection works and that rknn_server has been started.
# Connect to the board to test the model; the result is saved to result.jpg
(toolkit2) root@ubuntu:~/yolov5$ python test.py
*************************
all device(s) with adb mode:
192.168.103.152:5555
*************************
I NPUTransfer: Starting NPU Transfer Client, Transfer version 2.1.0 (b5861e7@2020-11-23T11:50:36)
D RKNNAPI: ==============================================
D RKNNAPI: RKNN VERSION:
D RKNNAPI: API: 1.6.0 (535b468 build@2023-12-11T09:05:46)
D RKNNAPI: DRV: rknn_server: 1.5.0 (17e11b1 build: 2023-05-18 21:43:39)
D RKNNAPI: DRV: rknnrt: 1.6.0 (9a7b5d24c@2023-12-13T17:31:11)
D RKNNAPI: ==============================================
D RKNNAPI: Input tensors:
D RKNNAPI: index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, w_stride = 0,
# omitted....
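Before inference, test scripts typically letterbox the source image into the 640x640 input shown in the tensor dump above: scale to fit, then pad the remainder. A small geometry sketch (plain Python; the 810x1080 size is that of the yolov5 sample image bus.jpg, assumed here for illustration):

```python
def letterbox_geometry(src_w, src_h, dst=640):
    """Compute the scale and padding that fit src into a dst x dst input."""
    scale = min(dst / src_w, dst / src_h)              # keep aspect ratio
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_w, pad_h = (dst - new_w) / 2, (dst - new_h) / 2  # symmetric padding
    return scale, new_w, new_h, pad_w, pad_h

# 810x1080 fits 640x640 as a 480x640 image with 80 px of padding on each side:
print(letterbox_geometry(810, 1080))
```

The same scale and padding must be inverted afterwards to map decoded boxes back into source-image coordinates.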
--> Running model
done
class score xmin, ymin, xmax, ymax
--------------------------------------------------
person 0.884 [ 209, 244, 286, 506]
person 0.868 [ 478, 238, 559, 526]
person 0.825 [ 110, 238, 230, 534]
person 0.339 [ 79, 354, 122, 516]
bus 0.705 [ 94, 129, 553, 468]
Save results to result.jpg!
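Since post_process was stripped from the graph for quantization friendliness, results like the table above come from host-side confidence filtering plus non-maximum suppression. A minimal pure-Python NMS sketch (illustrative only, not the test.py implementation):

```python
def iou(a, b):
    """IoU of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_thres=0.45):
    """Keep indices of the highest-scoring boxes, dropping heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thres]
    return keep

# The second box nearly duplicates the first and is suppressed:
boxes = [(209, 244, 286, 506), (212, 240, 290, 510), (478, 238, 559, 526)]
print(nms(boxes, [0.884, 0.600, 0.868]))  # [0, 2]
```

The 0.45 default matches the iou_thres printed in the export log earlier; in practice NMS is usually run per class.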