nano-vllm/README.md
GeeeekExplorer 4a8aa090a7 fix
2025-06-14 00:56:07 +08:00

# Nano-vLLM
A lightweight vLLM implementation built from scratch.
## Key Features
* 🚀 **Fast offline inference** - Comparable inference speeds to vLLM
* 📖 **Readable codebase** - Clean implementation in ~1,200 lines of Python code
* ⚡ **Optimization Suite** - Prefix caching, Torch compilation, CUDA graph, etc.
## Installation
```bash
pip install git+https://github.com/GeeeekExplorer/nano-vllm.git
```
## Quick Start
See `example.py` for usage. The API mirrors vLLM's interface with minor differences in the `LLM.generate` method.
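
As a rough illustration of what "mirrors vLLM's interface" could look like in practice, here is a minimal sketch. It assumes the package is importable as `nanovllm` and exposes vLLM-style `LLM` and `SamplingParams` classes; those names, the sampling values, and the helper `run_demo` are assumptions for illustration, not taken from this README. Because `LLM.generate` differs slightly from vLLM, the sketch prints each returned item as-is rather than assuming its shape; `example.py` remains the authoritative reference.

```python
def run_demo(model_path: str) -> None:
    """Generate one completion with Nano-vLLM (assumed vLLM-style API)."""
    # Assumption: module name `nanovllm` and the vLLM-style names
    # LLM / SamplingParams. Imported lazily so that defining this
    # function does not require the package or a GPU.
    from nanovllm import LLM, SamplingParams

    llm = LLM(model_path)  # path to a local model directory
    sampling_params = SamplingParams(temperature=0.6, max_tokens=256)
    prompts = ["Hello, Nano-vLLM."]

    outputs = llm.generate(prompts, sampling_params)

    # The exact return type of generate() is not documented here,
    # so each output is printed without unpacking it.
    for prompt, output in zip(prompts, outputs):
        print(f"Prompt: {prompt!r}")
        print(f"Output: {output!r}")
```

Call it as `run_demo("/path/to/Qwen3-0.6B")`, where the path is a placeholder for a locally downloaded model.
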
## Benchmark
See `bench.py` for the benchmark script.
**Test Configuration:**
- Hardware: RTX 4070
- Model: Qwen3-0.6B
- Total Requests: 256 sequences
- Input Length: Randomly sampled between 100 and 1024 tokens
- Output Length: Randomly sampled between 100 and 1024 tokens
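
The configuration above can be sketched as a synthetic workload generator. This is only an illustration of the sampling scheme, not the contents of `bench.py`: the seed, the vocabulary size, the inclusive bounds, and the helper name `make_workload` are all assumptions.

```python
import random

NUM_SEQS = 256               # total requests
MIN_LEN, MAX_LEN = 100, 1024  # token-length bounds (assumed inclusive)


def make_workload(seed: int = 0, vocab_size: int = 10_000):
    """Return (prompt_token_ids, output_lens) for a synthetic benchmark run.

    `seed` and `vocab_size` are hypothetical parameters chosen for
    reproducibility; they are not taken from the README.
    """
    rng = random.Random(seed)
    # One list of random token IDs per request, with a random input length.
    prompt_token_ids = [
        [rng.randint(0, vocab_size - 1) for _ in range(rng.randint(MIN_LEN, MAX_LEN))]
        for _ in range(NUM_SEQS)
    ]
    # One target output length per request.
    output_lens = [rng.randint(MIN_LEN, MAX_LEN) for _ in range(NUM_SEQS)]
    return prompt_token_ids, output_lens


prompt_token_ids, output_lens = make_workload()
print(f"{len(prompt_token_ids)} requests, {sum(output_lens)} output tokens requested")
```

Fixing the seed keeps the workload identical across engines, which is what makes the output-token counts comparable between runs.
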

**Performance Results:**

| Inference Engine | Output Tokens | Time (s) | Throughput (tokens/s) |
|----------------|-------------|----------|-----------------------|
| vLLM | 133,966 | 98.95 | 1353.86 |
| Nano-vLLM | 133,966 | 101.90 | 1314.65 |
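
The throughput column is output tokens divided by wall-clock time. The short check below recomputes it from the table; the numbers are copied from the rows above, and the helper name `throughput` is introduced here only for illustration.

```python
# (output tokens, time in seconds) copied from the results table
results = {
    "vLLM": (133_966, 98.95),
    "Nano-vLLM": (133_966, 101.90),
}


def throughput(tokens: int, seconds: float) -> float:
    """Output tokens generated per second of wall-clock time."""
    return tokens / seconds


for name, (tokens, seconds) in results.items():
    print(f"{name}: {throughput(tokens, seconds):.2f} tokens/s")

# Relative throughput of Nano-vLLM compared with vLLM on this run.
ratio = throughput(*results["Nano-vLLM"]) / throughput(*results["vLLM"])
print(f"Nano-vLLM / vLLM: {ratio:.1%}")
```

Recomputing from the rounded times matches the reported throughput to within about 0.05 tokens/s, and puts Nano-vLLM at roughly 97% of vLLM's throughput for this configuration.
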