
# 🚀 Dolphin vLLM Demo

## Introduction

The Dolphin model employs a Swin Encoder + MBart Decoder architecture. In its HuggingFace Transformers config, the `architectures` field is specified as `"VisionEncoderDecoderModel"`, which vLLM does not natively support. To enable vLLM deployment of the Dolphin model, we implemented two vLLM plugins: [vllm-dolphin](https://pypi.org/project/vllm-dolphin/) and [vllm-mbart](https://pypi.org/project/vllm-mbart/). We also provide Dolphin vLLM demos for both offline inference and online deployment.
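
For orientation, a plugin of this kind essentially registers the extra architecture with vLLM's model registry, which vLLM discovers through its `vllm.general_plugins` entry-point group. The sketch below is illustrative only, not the vllm-dolphin source; the `vllm_dolphin.dolphin` module path is hypothetical.

```python
# Illustrative sketch of a vLLM general plugin, NOT the actual
# vllm-dolphin implementation. The module path below is hypothetical.
from vllm import ModelRegistry

def register():
    # Map the architecture name (the one passed via --hf-overrides later
    # in this README) to the plugin's model implementation.
    if "DolphinForConditionalGeneration" not in ModelRegistry.get_supported_archs():
        ModelRegistry.register_model(
            "DolphinForConditionalGeneration",
            "vllm_dolphin.dolphin:DolphinForConditionalGeneration",
        )
```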

## 🛠️ Installation

```bash
# Install vLLM (quote the spec so the shell doesn't treat >= as a redirect)
pip install "vllm>=0.9.0"

# Install vllm-dolphin
pip install vllm-dolphin==0.1
```
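
After installation you can check that the plugin is visible to vLLM before starting a full engine. A minimal sanity check, assuming the plugin-loading hook (`vllm.plugins.load_general_plugins`) behaves as in recent vLLM releases:

```python
# Sanity check (assumption: load_general_plugins() is the plugin hook, as in
# vLLM >= 0.9; it is normally invoked automatically at engine startup).
from vllm import ModelRegistry
from vllm.plugins import load_general_plugins

load_general_plugins()
print("DolphinForConditionalGeneration" in ModelRegistry.get_supported_archs())
# Expected: True once vllm-dolphin is installed
```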

## Offline Inference

```bash
# predict element reading order
python deployment/vllm/demo_vllm.py --model ByteDance/Dolphin --image_path ./demo/page_imgs/page_1.jpeg --prompt "Parse the reading order of this document."

# recognize text/latex
python deployment/vllm/demo_vllm.py --model ByteDance/Dolphin --image_path ./demo/element_imgs/block_formula.jpeg --prompt "Read text in the image."
python deployment/vllm/demo_vllm.py --model ByteDance/Dolphin --image_path ./demo/element_imgs/para_1.jpg --prompt "Read text in the image."

# recognize table
python deployment/vllm/demo_vllm.py --model ByteDance/Dolphin --image_path ./demo/element_imgs/table_1.jpeg --prompt "Parse the table in the image."
```
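
Roughly, demo_vllm.py builds a vLLM `LLM` engine with the architecture override and feeds the image through vLLM's encoder/decoder prompt structure. The sketch below condenses that flow; the `"<s>{prompt} <Answer/>"` template and the exact input shape are assumptions here, so treat demo_vllm.py as the authoritative version.

```python
# Condensed offline-inference sketch, not the full demo_vllm.py.
# The prompt template and input structure are assumptions.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="ByteDance/Dolphin",
    # Same override the API server uses below: route the model to the
    # architecture registered by vllm-dolphin.
    hf_overrides={"architectures": ["DolphinForConditionalGeneration"]},
)

image = Image.open("./demo/page_imgs/page_1.jpeg").convert("RGB")
prompt = "Parse the reading order of this document."

outputs = llm.generate(
    {
        # Encoder side carries the image; decoder side carries the prompt.
        "encoder_prompt": {"prompt": "", "multi_modal_data": {"image": image}},
        "decoder_prompt": f"<s>{prompt} <Answer/>",  # hypothetical template
    },
    SamplingParams(temperature=0.0, max_tokens=2048),
)
print(outputs[0].outputs[0].text)
```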

## Online Inference

```bash
# 1. Start the API server
python deployment/vllm/api_server.py --model="ByteDance/Dolphin" --hf-overrides "{\"architectures\": [\"DolphinForConditionalGeneration\"]}"

# 2. Predict
# predict element reading order
python deployment/vllm/api_client.py --image_path ./demo/page_imgs/page_1.jpeg --prompt "Parse the reading order of this document."

# recognize text/latex
python deployment/vllm/api_client.py --image_path ./demo/element_imgs/block_formula.jpeg --prompt "Read text in the image."
python deployment/vllm/api_client.py --image_path ./demo/element_imgs/para_1.jpg --prompt "Read text in the image."

# recognize table
python deployment/vllm/api_client.py --image_path ./demo/element_imgs/table_1.jpeg --prompt "Parse the table in the image."
```
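
If you want to call the server without api_client.py, the request below is a reasonable starting point. It assumes the server exposes a vLLM-demo-style `/generate` endpoint on the default host/port that accepts the prompt plus a base64-encoded image; the endpoint name and field names are assumptions, so check api_server.py and api_client.py for the actual contract.

```python
# Hypothetical raw-HTTP client sketch. The /generate endpoint and the
# "prompt"/"image_base64" field names are assumptions modeled on vLLM's
# demo api_server; verify against deployment/vllm/api_server.py.
import base64
import requests

with open("./demo/page_imgs/page_1.jpeg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    "http://localhost:8000/generate",  # default host/port assumed
    json={
        "prompt": "Parse the reading order of this document.",
        "image_base64": image_b64,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json())
```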