nano-vllm

Zijie Tian 29e102720b 🐛 fix: support multiple EOS tokens for GLM-4

GLM-4 uses multiple EOS tokens [151329, 151336, 151338] where 151336
(<|user|>) should also stop generation. Previously only the first EOS
from tokenizer was used, causing generation to always hit max_tokens.

Changes:
- config.py: Change eos type to int | list[int]
- llm_engine.py: Read eos_token_id from hf_config (contains full list)
- scheduler.py: Use set for efficient multi-EOS lookup

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-28 13:23:53 +08:00
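
The multi-EOS handling described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `normalize_eos`, `Seq`, and `is_finished` are hypothetical names, and the assumption is that the scheduler checks each newly sampled token against a set built from the `int | list[int]` EOS value.

```python
from dataclasses import dataclass, field
from typing import Union


def normalize_eos(eos: Union[int, list[int]]) -> frozenset[int]:
    """Accept one EOS id or a list of ids; return a set for O(1) membership tests."""
    if isinstance(eos, int):
        return frozenset({eos})
    return frozenset(eos)


@dataclass
class Seq:
    """Hypothetical stand-in for a sequence tracked by the scheduler."""
    max_tokens: int
    ignore_eos: bool = False
    completion_token_ids: list[int] = field(default_factory=list)


def is_finished(seq: Seq, token_id: int, eos_ids: frozenset[int]) -> bool:
    """Append the new token, then report whether generation should stop."""
    seq.completion_token_ids.append(token_id)
    hit_eos = not seq.ignore_eos and token_id in eos_ids
    hit_limit = len(seq.completion_token_ids) >= seq.max_tokens
    return hit_eos or hit_limit


# EOS ids quoted in the commit message for GLM-4.
glm4_eos = normalize_eos([151329, 151336, 151338])
seq = Seq(max_tokens=256)
is_finished(seq, 151336, glm4_eos)  # stops on <|user|>, not only on the first EOS id
```

Normalizing once to a set keeps the per-token check the same whether the model config supplies a single id or a list.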

block_manager.py

simplify

2025-08-31 20:02:51 +08:00

llm_engine.py

🐛 fix: support multiple EOS tokens for GLM-4

2026-01-28 13:23:53 +08:00

model_runner.py

✨ feat: add GLM-4-9B-Chat-1M model support

2026-01-28 13:15:57 +08:00

scheduler.py

🐛 fix: support multiple EOS tokens for GLM-4

2026-01-28 13:23:53 +08:00

sequence.py

[fix] Fixed needle test bug.

2026-01-05 18:34:09 +08:00