better
README.md
@@ -6,7 +6,7 @@ A lightweight vLLM implementation built from scratch.
* 🚀 **Fast offline inference** - Comparable inference speeds to vLLM
* 📖 **Readable codebase** - Clean implementation in ~ 1,200 lines of Python code
-* ⚡ **Optimization Suite** - Prefix caching, Torch compilation, CUDA graph, etc.
+* ⚡ **Optimization Suite** - Prefix caching, Tensor Parallelism, Torch compilation, CUDA graph, etc.

## Installation
@@ -17,6 +17,14 @@ pip install git+https://github.com/GeeeekExplorer/nano-vllm.git
## Quick Start
See `example.py` for usage. The API mirrors vLLM's interface with minor differences in the `LLM.generate` method.
```python
from nanovllm import LLM, SamplingParams
# Point LLM at a local model directory; enforce_eager=True runs the model
# eagerly instead of capturing CUDA graphs.
llm = LLM("/YOUR/MODEL/PATH", enforce_eager=True, tensor_parallel_size=1)
sampling_params = SamplingParams(temperature=0.6, max_tokens=256)
prompts = ["Hello, Nano-vLLM."]
outputs = llm.generate(prompts, sampling_params)
# Each output is a dict; its "text" field holds the generated completion.
print(outputs[0]["text"])
```
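
The visible change in this commit is the new **Tensor Parallelism** entry in the feature list. As a purely illustrative sketch (not taken from the README): the same Quick Start API should shard the model across GPUs by raising `tensor_parallel_size`, the parameter already shown above, assuming at least two GPUs are visible.

```python
from nanovllm import LLM, SamplingParams

# Hypothetical multi-GPU variant of the Quick Start: assumes 2 GPUs are
# visible; tensor_parallel_size=2 shards the model weights across them.
llm = LLM("/YOUR/MODEL/PATH", tensor_parallel_size=2)
sampling_params = SamplingParams(temperature=0.6, max_tokens=256)
outputs = llm.generate(["Hello, Nano-vLLM."], sampling_params)
print(outputs[0]["text"])
```

As in the example above, `generate` returns one result per prompt, so `outputs[0]` is the completion for the single prompt passed in.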
## Benchmark