Feature and Model Support Matrix



This page summarizes UCM (Unified Cache Manager) compatibility across models and inference frameworks. Use it as a reference for model selection, deployment planning, and feature validation.

Legend 🧭

Symbol	Description
✅	Fully supported
❌	Not supported
🟡	Not tested or verified

Model Support and Feature Compatibility 🧩

Prefix Cache Support

This section shows prefix cache support for each model across the supported inference frameworks. Use it to check framework compatibility for deployments that rely on prefix cache.

Model	vLLM (v0.17.0)	vLLM-Ascend (v0.17.0rc1)	SGLang (v0.5.9)
DeepSeek V3.2	✅	✅	✅
DeepSeek R1	✅	✅	✅
DeepSeek V3/3.1	✅	✅	✅
Qwen3.5	❌	❌	❌
Qwen3	✅	✅	✅
Qwen3-Moe	✅	✅	✅
Qwen3-Next	❌	❌	❌
Qwen2.5	✅	✅	✅
GLM-5	✅	✅	❌
GLM-4.x	✅	✅	✅
MiniMax-M2.5	✅	✅	✅
Kimi-K2.5	❌	❌	❌

Note: The table lists a representative selection of models. The framework versions in the column headers are the latest versions UCM has been adapted to. See Prefix Cache for more details.
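For scripted pre-deployment checks, the prefix cache table can be encoded as a lookup. The sketch below is a hypothetical helper, not part of UCM; the names `PREFIX_CACHE` and `prefix_cache_status` and the framework label strings are illustrative assumptions. Because the table covers only representative models and the latest adapted versions, the sketch assumes that any model or framework version it does not list counts as "Not tested or verified".

```python
# Hypothetical helper, not part of UCM: the prefix cache table encoded as data.

# Status strings follow the Legend.
SUPPORTED = "Fully supported"
NOT_SUPPORTED = "Not supported"
UNTESTED = "Not tested or verified"

# Column order of the table; the label strings are illustrative.
FRAMEWORKS = ("vLLM v0.17.0", "vLLM-Ascend v0.17.0rc1", "SGLang v0.5.9")

ALL_SUPPORTED = (SUPPORTED, SUPPORTED, SUPPORTED)
NONE_SUPPORTED = (NOT_SUPPORTED, NOT_SUPPORTED, NOT_SUPPORTED)

# One tuple per model, values in FRAMEWORKS order.
PREFIX_CACHE = {
    "DeepSeek V3.2": ALL_SUPPORTED,
    "DeepSeek R1": ALL_SUPPORTED,
    "DeepSeek V3/3.1": ALL_SUPPORTED,
    "Qwen3.5": NONE_SUPPORTED,
    "Qwen3": ALL_SUPPORTED,
    "Qwen3-Moe": ALL_SUPPORTED,
    "Qwen3-Next": NONE_SUPPORTED,
    "Qwen2.5": ALL_SUPPORTED,
    "GLM-5": (SUPPORTED, SUPPORTED, NOT_SUPPORTED),
    "GLM-4.x": ALL_SUPPORTED,
    "MiniMax-M2.5": ALL_SUPPORTED,
    "Kimi-K2.5": NONE_SUPPORTED,
}


def prefix_cache_status(model: str, framework: str) -> str:
    """Look up one cell; pairs outside the table count as untested (assumption)."""
    if model not in PREFIX_CACHE or framework not in FRAMEWORKS:
        return UNTESTED
    return PREFIX_CACHE[model][FRAMEWORKS.index(framework)]


if __name__ == "__main__":
    print(prefix_cache_status("GLM-5", "SGLang v0.5.9"))   # Not supported
    print(prefix_cache_status("GLM-5", "vLLM v0.17.0"))    # Fully supported
    print(prefix_cache_status("Llama 3", "vLLM v0.17.0"))  # Not tested or verified
```

Returning the Legend text instead of a boolean keeps the distinction between "Not supported" and "Not tested or verified", which a deployment script may want to handle differently.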

Inference Enhancement Features

This section shows support for the inference enhancement features, namely Sparse Attention (the GsaOnDevice column), ReRoPE, and CacheBlend, across the listed models. Each feature column states the framework version listed for that feature.

Model	GsaOnDevice (vLLM / vLLM-Ascend v0.11.0)	ReRoPE (vLLM v0.11.0)	CacheBlend (vLLM v0.9.2)
DeepSeek V3.2	✅	✅	✅
DeepSeek R1	✅	✅	✅
DeepSeek V3/3.1	✅	✅	✅
Qwen3	✅	✅	✅
Qwen2.5	✅	✅	✅

Note: See Sparse Attention and ReRoPE for more details.
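Unlike the prefix cache table, each feature here is listed against its own framework version, so a compatibility check has to match the exact version per feature. The following sketch is a hypothetical helper, not part of UCM (`ENHANCEMENTS`, `unverified_features`, and the version label strings are illustrative assumptions). It checks each requested feature independently, because the table does not state whether these features can be enabled together.

```python
# Hypothetical helper, not part of UCM: the inference enhancement table,
# keyed by feature because each feature lists its own framework version.

# Models marked as supported in every feature column of the table.
LISTED_MODELS = frozenset(
    {"DeepSeek V3.2", "DeepSeek R1", "DeepSeek V3/3.1", "Qwen3", "Qwen2.5"}
)

# feature -> (framework versions from the column header, supported models)
ENHANCEMENTS = {
    "GsaOnDevice": (frozenset({"vLLM v0.11.0", "vLLM-Ascend v0.11.0"}), LISTED_MODELS),
    "ReRoPE": (frozenset({"vLLM v0.11.0"}), LISTED_MODELS),
    "CacheBlend": (frozenset({"vLLM v0.9.2"}), LISTED_MODELS),
}


def unverified_features(model: str, framework: str, requested: list[str]) -> list[str]:
    """Return the requested features the table does not mark as supported for
    this model on this exact framework version. Each feature is checked on its
    own; unknown feature names are also reported."""
    missing = []
    for feature in requested:
        frameworks, models = ENHANCEMENTS.get(feature, (frozenset(), frozenset()))
        if framework not in frameworks or model not in models:
            missing.append(feature)
    return missing


if __name__ == "__main__":
    print(unverified_features("Qwen3", "vLLM v0.11.0", ["GsaOnDevice", "ReRoPE"]))  # []
    print(unverified_features("Qwen3", "vLLM v0.11.0", ["ReRoPE", "CacheBlend"]))   # ['CacheBlend']
    print(unverified_features("Qwen3", "vLLM-Ascend v0.11.0", ["ReRoPE"]))          # ['ReRoPE']
```

Reporting the list of unverified features, rather than a single pass/fail value, tells the caller which feature blocks a given model and framework pair.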

Notes and Limitations 📌

This matrix is provided as a compatibility reference for the configurations listed on this page.
Actual behavior may vary depending on hardware, runtime settings, backend changes, and model variants.
This support matrix is updated continuously. For the latest status, see the project's GitHub issues and pull requests.