
# 🚀 Dolphin vLLM Demo

## Introduction

The Dolphin model employs a Swin Encoder + MBart Decoder architecture. In its HuggingFace Transformers config, the `architectures` field is specified as `"VisionEncoderDecoderModel"`, which vLLM does not natively support. To enable vLLM deployment of the Dolphin model, we implemented two vLLM plugins: [vllm-dolphin](https://pypi.org/project/vllm-dolphin/) and [vllm-mbart](https://pypi.org/project/vllm-mbart/). We also provide Dolphin vLLM demos for both offline inference and online deployment.
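
For orientation, a plugin of this kind essentially registers the extra architecture with vLLM's model registry, which vLLM discovers through its `vllm.general_plugins` entry-point group. The sketch below is illustrative only, not the vllm-dolphin source; the `vllm_dolphin.dolphin` module path is hypothetical.

```python
# Illustrative sketch of a vLLM general plugin, NOT the actual
# vllm-dolphin implementation. The module path below is hypothetical.
from vllm import ModelRegistry

def register():
    # Map the architecture name (the one passed via --hf-overrides later
    # in this README) to the plugin's model implementation.
    if "DolphinForConditionalGeneration" not in ModelRegistry.get_supported_archs():
        ModelRegistry.register_model(
            "DolphinForConditionalGeneration",
            "vllm_dolphin.dolphin:DolphinForConditionalGeneration",
        )
```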

## 🛠️ Installation

```bash
# Install vLLM (quote the spec so the shell doesn't treat >= as a redirect)
pip install "vllm>=0.9.0"

# Install vllm-dolphin
pip install vllm-dolphin==0.1
```
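
After installation you can check that the plugin is visible to vLLM before starting a full engine. A minimal sanity check, assuming the plugin-loading hook (`vllm.plugins.load_general_plugins`) behaves as in recent vLLM releases:

```python
# Sanity check (assumption: load_general_plugins() is the plugin hook, as in
# vLLM >= 0.9; it is normally invoked automatically at engine startup).
from vllm import ModelRegistry
from vllm.plugins import load_general_plugins

load_general_plugins()
print("DolphinForConditionalGeneration" in ModelRegistry.get_supported_archs())
# Expected: True once vllm-dolphin is installed
```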

## Offline Inference

```bash
# predict element reading order
python deployment/vllm/demo_vllm.py --model ByteDance/Dolphin --image_path ./demo/page_imgs/page_1.jpeg --prompt "Parse the reading order of this document."

# recognize text/latex
python deployment/vllm/demo_vllm.py --model ByteDance/Dolphin --image_path ./demo/element_imgs/block_formula.jpeg --prompt "Read text in the image."
python deployment/vllm/demo_vllm.py --model ByteDance/Dolphin --image_path ./demo/element_imgs/para_1.jpg --prompt "Read text in the image."

# recognize table
python deployment/vllm/demo_vllm.py --model ByteDance/Dolphin --image_path ./demo/element_imgs/table_1.jpeg --prompt "Parse the table in the image."
```
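
Roughly, demo_vllm.py builds a vLLM `LLM` engine with the architecture override and feeds the image through vLLM's encoder/decoder prompt structure. The sketch below condenses that flow; the `"<s>{prompt} <Answer/>"` template and the exact input shape are assumptions here, so treat demo_vllm.py as the authoritative version.

```python
# Condensed offline-inference sketch, not the full demo_vllm.py.
# The prompt template and input structure are assumptions.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="ByteDance/Dolphin",
    # Same override the API server uses below: route the model to the
    # architecture registered by vllm-dolphin.
    hf_overrides={"architectures": ["DolphinForConditionalGeneration"]},
)

image = Image.open("./demo/page_imgs/page_1.jpeg").convert("RGB")
prompt = "Parse the reading order of this document."

outputs = llm.generate(
    {
        # Encoder side carries the image; decoder side carries the prompt.
        "encoder_prompt": {"prompt": "", "multi_modal_data": {"image": image}},
        "decoder_prompt": f"<s>{prompt} <Answer/>",  # hypothetical template
    },
    SamplingParams(temperature=0.0, max_tokens=2048),
)
print(outputs[0].outputs[0].text)
```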

## Online Inference

```bash
# 1. Start the API server
python deployment/vllm/api_server.py --model="ByteDance/Dolphin" --hf-overrides "{\"architectures\": [\"DolphinForConditionalGeneration\"]}"

# 2. Predict
# predict element reading order
python deployment/vllm/api_client.py --image_path ./demo/page_imgs/page_1.jpeg --prompt "Parse the reading order of this document."

# recognize text/latex
python deployment/vllm/api_client.py --image_path ./demo/element_imgs/block_formula.jpeg --prompt "Read text in the image."
python deployment/vllm/api_client.py --image_path ./demo/element_imgs/para_1.jpg --prompt "Read text in the image."

# recognize table
python deployment/vllm/api_client.py --image_path ./demo/element_imgs/table_1.jpeg --prompt "Parse the table in the image."
```
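
If you want to call the server without api_client.py, the request below is a reasonable starting point. It assumes the server exposes a vLLM-demo-style `/generate` endpoint on the default host/port that accepts the prompt plus a base64-encoded image; the endpoint name and field names are assumptions, so check api_server.py and api_client.py for the actual contract.

```python
# Hypothetical raw-HTTP client sketch. The /generate endpoint and the
# "prompt"/"image_base64" field names are assumptions modeled on vLLM's
# demo api_server; verify against deployment/vllm/api_server.py.
import base64
import requests

with open("./demo/page_imgs/page_1.jpeg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    "http://localhost:8000/generate",  # default host/port assumed
    json={
        "prompt": "Parse the reading order of this document.",
        "image_base64": image_b64,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json())
```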