Hands-On: Deploying the Qwen2.5-0.5B-Instruct Model on the MetaX C500 under Anolis OS 23.2

This walkthrough deploys Alibaba's Qwen2.5-0.5B-Instruct chat model on the domestic MetaX C500 GPU under Anolis OS 23.2, relying on the vLLM framework for high-performance inference. The exercise demonstrates that a domestic GPU can serve lightweight LLM workloads and offers a cost-effective deployment option; the adapted scripts have been open-sourced, so beginners can quickly try out deploying and using models on the MetaX software stack.
I. Prerequisites
1. Operating system: Anolis OS 23.2 is recommended.
2. Install the MXMACA driver and SDK, then confirm that the installed version matches your target release.
Note (detailed MACA installation steps are in the manual): 曦云系列_通用计算GPU_快速上手指南_CN_V08 (Xiyun Series General-Purpose Compute GPU Quick Start Guide, CN V08)
For convenience, the online learning platform provides prebuilt images that bundle the operating system with a matching MACA version; community users can start from the MetaX-Anolis OS 23 Base image for the rest of this walkthrough.
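Before moving on, it is worth confirming the OS release and that the driver can see the card. A minimal check: /etc/os-release is standard, while the mx-smi utility (a MetaX analogue of nvidia-smi) is assumed to ship with the MXMACA stack, so treat that tool name as an assumption if your SDK version differs:

# Confirm the OS release is Anolis OS 23.2
cat /etc/os-release
# List GPUs visible to the MXMACA driver (mx-smi assumed to be on PATH)
mx-smi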

II. vLLM Installation and Deployment
1. Pre-deployment requirement: install the necessary dependency packages (rust, cargo), for example as shown below.
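A minimal sketch of installing the Rust toolchain on Anolis OS, assuming the rust and cargo packages are available from the configured dnf repositories (rustup is an alternative if they are not):

# Install rust and cargo from the system repositories
dnf install -y rust cargo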
2. Confirm the system environment variables (see the sketch below).
Note: the default installation manual omits the ompi entry from LD_LIBRARY_PATH; this is easy to miss, so pay special attention to it!
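A sketch of the relevant environment, assuming MXMACA is installed under /opt/maca with Open MPI bundled in its ompi subdirectory (both paths are assumptions; adjust them to your actual install prefix):

# Make the MACA runtime and the bundled Open MPI libraries resolvable
export MACA_PATH=/opt/maca
export PATH=${MACA_PATH}/bin:${MACA_PATH}/ompi/bin:${PATH}
export LD_LIBRARY_PATH=${MACA_PATH}/lib:${MACA_PATH}/ompi/lib:${LD_LIBRARY_PATH}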
3. Install the vLLM 0.6.2 inference engine.
(1) Download the vLLM and PyTorch dependency packages from the software center:
- maca-pytorch2.1-py38-2.29.0.4-x86_64.tar.xz
- mxc500-vllm-py38-2.27.0.9-linux-x86_64.tar.xz
(2) Install vLLM's PyTorch dependency packages
(base) [root@openhydra-deploy-zhangjinnan-84956f99bb-kv9bm metax]# tar xvf maca-pytorch2.1-py38-2.29.0.4-x86_64.tar.xz
2.29.0.4/.layerspec/
2.29.0.4/.layerspec/Dockerfile
2.29.0.4/.layerspec/conda-py38-x86_64.tar
2.29.0.4/.layerspec/install.sh
2.29.0.4/.layerspec/config
2.29.0.4/wheel/triton-2.1.0+metax2.29.0.4-cp38-cp38-linux_x86_64.whl
2.29.0.4/wheel/torch-2.1.2+metax2.29.0.4-cp38-cp38-linux_x86_64.whl
2.29.0.4/wheel/torchvision-0.15.1+metax2.29.0.4-cp38-cp38-linux_x86_64.whl
2.29.0.4/wheel/torchaudio-2.0.1+metax2.29.0.4-cp38-cp38-linux_x86_64.whl
2.29.0.4/wheel/apex-0.1+metax2.29.0.4-cp38-cp38-linux_x86_64.whl
2.29.0.4/wheel/flash_attn-2.6.3+metax2.29.0.4torch2.1-cp38-cp38-linux_x86_64.whl
2.29.0.4/wheel/fused_dense_lib-2.6.3+metax2.29.0.4torch2.1-cp38-cp38-linux_x86_64.whl
2.29.0.4/wheel/dropout_layer_norm-0.1+metax2.29.0.4torch2.1-cp38-cp38-linux_x86_64.whl
2.29.0.4/wheel/rotary_emb-0.1+metax2.29.0.4torch2.1-cp38-cp38-linux_x86_64.whl
2.29.0.4/wheel/xentropy_cuda_lib-0.1+metax2.29.0.4torch2.1-cp38-cp38-linux_x86_64.whl
2.29.0.4/wheel/xformers-0.0.22+metax2.29.0.4torch2.1-cp38-cp38-linux_x86_64.whl
2.29.0.4/wheel/flashinfer-0.1.5+metax2.29.0.4torch2.1-cp38-cp38-linux_x86_64.whl
2.29.0.4/wheel/mcspconv-2.1.0+metax2.29.0.4torch2.1-cp38-cp38-linux_x86_64.whl
(base) [root@openhydra-deploy-zhangjinnan-84956f99bb-kv9bm metax]# cd 2.29.0.4/wheel/
(base) [root@openhydra-deploy-zhangjinnan-84956f99bb-kv9bm wheel]# pip install *
Processing ./apex-0.1+metax2.29.0.4-cp38-cp38-linux_x86_64.whl
Processing ./dropout_layer_norm-0.1+metax2.29.0.4torch2.1-cp38-cp38-linux_x86_64.whl
Processing ./flash_attn-2.6.3+metax2.29.0.4torch2.1-cp38-cp38-linux_x86_64.whl
Processing ./flashinfer-0.1.5+metax2.29.0.4torch2.1-cp38-cp38-linux_x86_64.whl
Processing ./fused_dense_lib-2.6.3+metax2.29.0.4torch2.1-cp38-cp38-linux_x86_64.whl
Processing ./mcspconv-2.1.0+metax2.29.0.4torch2.1-cp38-cp38-linux_x86_64.whl
Processing ./rotary_emb-0.1+metax2.29.0.4torch2.1-cp38-cp38-linux_x86_64.whl
Processing ./torch-2.1.2+metax2.29.0.4-cp38-cp38-linux_x86_64.whl
Processing ./torchaudio-2.0.1+metax2.29.0.4-cp38-cp38-linux_x86_64.whl
Processing ./torchvision-0.15.1+metax2.29.0.4-cp38-cp38-linux_x86_64.whl
Processing ./triton-2.1.0+metax2.29.0.4-cp38-cp38-linux_x86_64.whl
Processing ./xentropy_cuda_lib-0.1+metax2.29.0.4torch2.1-cp38-cp38-linux_x86_64.whl
Processing ./xformers-0.0.22+metax2.29.0.4torch2.1-cp38-cp38-linux_x86_64.whl
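Before moving on to vLLM itself, a quick import check confirms that the MetaX PyTorch build installed cleanly; the version string should carry the metax suffix seen in the wheel name:

python -c "import torch; print(torch.__version__)"   # expected: 2.1.2+metax2.29.0.4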

(3) Install vLLM
(base) [root@openhydra-deploy-zhangjinnan-84956f99bb-kv9bm metax]# tar xvf mxc500-vllm-py38-2.27.0.9-linux-x86_64.tar.xz
mxc500-vllm-2.27.0.9/.layerspec/
mxc500-vllm-2.27.0.9/.layerspec/config
mxc500-vllm-2.27.0.9/.layerspec/Dockerfile
mxc500-vllm-2.27.0.9/.layerspec/install.sh
mxc500-vllm-2.27.0.9/minSDKVersion
mxc500-vllm-2.27.0.9/wheel/flash_attn_vllm-2.6.3+263metax2.27.1.2torch2.1-cp38-cp38-linux_x86_64.whl
mxc500-vllm-2.27.0.9/wheel/ray-2.9.3+maca2.27.0.9-cp38-cp38-linux_x86_64.whl
mxc500-vllm-2.27.0.9/wheel/vllm-0.6.2+maca2.27.0.11torch2.1-cp38-cp38-linux_x86_64.whl
(base) [root@openhydra-deploy-zhangjinnan-84956f99bb-kv9bm metax]# cd mxc500-vllm-2.27.0.9/wheel/
(base) [root@openhydra-deploy-zhangjinnan-84956f99bb-kv9bm wheel]# ls
flash_attn_vllm-2.6.3+263metax2.27.1.2torch2.1-cp38-cp38-linux_x86_64.whl  vllm-0.6.2+maca2.27.0.11torch2.1-cp38-cp38-linux_x86_64.whl
ray-2.9.3+maca2.27.0.9-cp38-cp38-linux_x86_64.whl
(base) [root@openhydra-deploy-zhangjinnan-84956f99bb-kv9bm wheel]# pip install *
Processing ./flash_attn_vllm-2.6.3+263metax2.27.1.2torch2.1-cp38-cp38-linux_x86_64.whl
Processing ./ray-2.9.3+maca2.27.0.9-cp38-cp38-linux_x86_64.whl
Processing ./vllm-0.6.2+maca2.27.0.11torch2.1-cp38-cp38-linux_x86_64.whl
Requirement already satisfied: ninja in /opt/miniconda3/lib/python3.8/site-packages (from flash-attn-vllm==2.6.3+263metax2.27.1.2torch2.1) (1.11.1.4)
Requirement already satisfied: torch in /opt/miniconda3/lib/python3.8/site-packages (from flash-attn-vllm==2.6.3+263metax2.27.1.2torch2.1) (2.1.2+metax2.29.0.4)
Requirement already satisfied: packaging in /opt/miniconda3/lib/python3.8/site-packages (from flash-attn-vllm==2.6.3+263metax2.27.1.2torch2.1) (24.2)
Requirement already satisfied: einops in /opt/miniconda3/lib/python3.8/site-packages (from flash-attn-vllm==2.6.3+263metax2.27.1.2torch2.1) (0.8.1)
Requirement already satisfied: jsonschema in /opt/miniconda3/lib/python3.8/site-packages (from ray==2.9.3+maca2.27.0.9) (4.23.0)

(4) Verify the vLLM installation

(base) [root@openhydra-deploy-zhangjinnan-84956f99bb-kv9bm ~]# python -c "import vllm; print(f'vLLM version:{vllm.__version__}')"
/opt/miniconda3/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 'libpng16.so.16: cannot open shared object file: No such file or directory'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
/opt/miniconda3/lib/python3.8/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
  warnings.warn(_BETA_TRANSFORMS_WARNING)
/opt/miniconda3/lib/python3.8/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
  warnings.warn(_BETA_TRANSFORMS_WARNING)
vLLM version:0.6.2
III. Qwen2.5-0.5B Model Deployment Verification
A simple test program verifies that vLLM can launch the Qwen2.5-0.5B model; the test1.py file is as follows:
from vllm import LLM, SamplingParams
# Initialize the model (the 0.5B model keeps verification fast;
# the path points at already-downloaded local weights)
llm = LLM(model="/root/notebook/model/Qwen/Qwen2___5-0___5B-Instruct")
# The following form would instead download from Hugging Face by default:
# llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
# Configure generation parameters
sampling_params = SamplingParams(temperature=0, max_tokens=50)
# Run inference (the prompt "AI的未来是" means "The future of AI is")
outputs = llm.generate(["AI的未来是"], sampling_params)
# Print the result
print("Output:", outputs[0].outputs[0].text)

Note: by default vLLM downloads from Hugging Face, which fails here with the following error:
requests.exceptions.ConnectTimeout: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /Qwen/Qwen1.5-0.5B-Chat/resolve/main/config.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f9f0dbb2310>, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: f6f97878-b718-4ac3-ad03-c63903ea0693)')

Solution: download the model weights to a local directory with modelscope instead (see the sketch below); reference: 模型的下载 · 文档中心 (the ModelScope documentation on model downloads).
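A minimal download sketch using the modelscope SDK (install it first with pip install modelscope); the cache_dir below mirrors the local path used by test1.py and is otherwise arbitrary. Note that ModelScope replaces the dots in the model name with triple underscores on disk, which is why the directory ends up named Qwen2___5-0___5B-Instruct:

from modelscope import snapshot_download

# Fetch Qwen2.5-0.5B-Instruct from ModelScope into a local cache directory
model_dir = snapshot_download(
    "Qwen/Qwen2.5-0.5B-Instruct",
    cache_dir="/root/notebook/model",  # assumption: same root used in test1.py
)
print("weights downloaded to:", model_dir)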

Featured Comments (2)

HKUST (registered member), posted 2025-4-15 09:15:
Is there any follow-up coverage?

shibozhang (administrator), posted 2025-4-15 18:25, replying to HKUST:
Hello, the post content has been updated.