release

2025-06-13 00:41:33 +08:00
parent 98a1551a7d
commit 135d1b38a2
5 changed files with 65 additions and 8 deletions
--- a/README.md
+++ b/README.md
@@ -1 +1,36 @@
-# Nano-VLLM
+# Nano-vLLM
+
+A lightweight vLLM implementation built from scratch.
+
+## Key Features
+
+* 🚀 **Fast offline inference** - Inference speed comparable to vLLM
+* 📖 **Readable codebase** - Clean implementation in under 1,200 lines of Python code
+* ⚡ **Optimization suite** - Prefix caching, Torch compilation, CUDA graphs, etc.
+
+## Installation
+
+```bash
+pip install git+https://github.com/GeeeekExplorer/nano-vllm.git
+```
+
+## Quick Start
+
+See `example.py` for usage. The API mirrors vLLM's interface with minor differences in the `LLM.generate` method.
+
+## Benchmark
+
+See `bench.py` for the benchmark script.
+
+**Test Configuration:**
+- Hardware: RTX 4070
+- Model: Qwen3-0.6B
+- Total Requests: 256 sequences
+- Input Length: Randomly sampled between 100–1024 tokens
+- Output Length: Randomly sampled between 100–1024 tokens
+
+**Performance Results:**
+| Inference Engine | Output Tokens | Time (s) | Throughput (tokens/s) |
+|------------------|---------------|----------|-----------------------|
+| vLLM             | 133,966       | 98.95    | 1353.86               |
+| Nano-vLLM        | 133,966       | 101.90   | 1314.65               |
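The benchmark setup and the throughput column can be sketched in plain Python. This is a minimal illustration, not the contents of `bench.py`. The names `make_workload` and `throughput`, the seed, and the token-ID range `VOCAB_SIZE` are hypothetical choices made here. The sketch draws 256 requests with input and output lengths sampled uniformly from 100 to 1024 tokens. It then recomputes throughput as output tokens divided by wall-clock seconds, using the figures from the results table.

```python
import random

# Hypothetical workload mirroring the stated test configuration:
# 256 requests, input and output lengths sampled uniformly from [100, 1024].
NUM_SEQS = 256
MIN_LEN, MAX_LEN = 100, 1024
VOCAB_SIZE = 10_000  # assumption: token IDs drawn from an arbitrary range


def make_workload(seed: int = 0):
    """Return (prompt_token_ids, max_output_tokens) for a synthetic benchmark."""
    rng = random.Random(seed)  # seeded so repeated runs produce the same workload
    prompts = [
        [rng.randrange(VOCAB_SIZE) for _ in range(rng.randint(MIN_LEN, MAX_LEN))]
        for _ in range(NUM_SEQS)
    ]
    # Output budget per request, also sampled from [MIN_LEN, MAX_LEN] inclusive.
    max_tokens = [rng.randint(MIN_LEN, MAX_LEN) for _ in range(NUM_SEQS)]
    return prompts, max_tokens


def throughput(output_tokens: int, seconds: float) -> float:
    """Output tokens generated per second of wall-clock time."""
    return output_tokens / seconds


prompts, max_tokens = make_workload(seed=0)
print(f"{len(prompts)} requests, {sum(max_tokens)} output tokens requested")

# Figures copied from the results table: (output tokens, time in seconds).
results = {"vLLM": (133_966, 98.95), "Nano-vLLM": (133_966, 101.90)}
for engine, (tokens, secs) in results.items():
    print(f"{engine}: {throughput(tokens, secs):.2f} tokens/s")

# Relative throughput of Nano-vLLM versus vLLM on the same workload.
ratio = throughput(*results["Nano-vLLM"]) / throughput(*results["vLLM"])
print(f"Nano-vLLM / vLLM throughput: {ratio:.1%}")
```

The times in the table are rounded to two decimals. As a result, the recomputed throughputs match the reported ones only to within a few hundredths of a token per second. The ratio shows Nano-vLLM reaching about 97% of vLLM's throughput on this configuration.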