nano-vllm

Files

Zijie Tian c16bfcf40f ♻️ refactor: restructure Observer as base class with InferenceObserver

- Refactor Observer into base class with common enable/disable/reset interface
- Create InferenceObserver subclass for TTFT/TPOT metrics
- Fix TTFT calculation timing: compute after prefill completes instead of
  at decode start (fixes max_tokens=1 returning TTFT=0)
- Integrate InferenceObserver into bench.py and bench_offload.py for
  accurate internal timing metrics vs external wall-clock time
- Add get_summary() and print_summary() methods for structured output

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>

2026-01-28 03:15:33 +08:00
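
The Observer refactor described in the commit message above could be sketched roughly as follows. This is a minimal illustration, not the repository's code. Only the names `Observer`, `InferenceObserver`, `get_summary()` and `print_summary()` and the enable/disable/reset interface come from the commit message. The hook names (`on_request_start`, `on_prefill_end`, `on_decode_step`), the injectable `clock` parameter, the instance-level state, and the summary keys are assumptions made for the example. The point it shows is the timing fix: TTFT is recorded when prefill completes, so a request with max_tokens=1 (no decode steps) still reports a non-zero TTFT.

```python
import time


class Observer:
    """Base class with the shared enable/disable/reset interface."""

    def __init__(self):
        self.enabled = False

    def enable(self):
        self.enabled = True

    def disable(self):
        self.enabled = False

    def reset(self):
        """Subclasses clear their collected state here."""


class InferenceObserver(Observer):
    """Collects TTFT/TPOT timings (hook names are hypothetical)."""

    def __init__(self, clock=time.perf_counter):
        super().__init__()
        self._clock = clock  # injectable so timings can be tested
        self.reset()

    def reset(self):
        self._start = None
        self._prefill_end = None
        self._decode_end = None
        self._decode_tokens = 0

    def on_request_start(self):
        if self.enabled:
            self._start = self._clock()

    def on_prefill_end(self):
        # TTFT is fixed here, when prefill completes, not at decode start.
        if self.enabled:
            self._prefill_end = self._clock()

    def on_decode_step(self):
        if self.enabled:
            self._decode_tokens += 1
            self._decode_end = self._clock()

    def get_summary(self):
        ttft = None
        if self._start is not None and self._prefill_end is not None:
            ttft = self._prefill_end - self._start
        tpot = None
        if self._prefill_end is not None and self._decode_tokens > 0:
            # Average time per decoded token after prefill.
            tpot = (self._decode_end - self._prefill_end) / self._decode_tokens
        return {
            "ttft_s": ttft,
            "tpot_s": tpot,
            "decode_tokens": self._decode_tokens,
        }

    def print_summary(self):
        for key, value in self.get_summary().items():
            print(f"{key}: {value}")
```

Under these assumptions, a benchmark script would call `enable()` before a run, invoke the hooks around prefill and each decode step, and read `get_summary()` afterwards. With no decode steps, `tpot_s` stays `None` while `ttft_s` is still populated.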

block_manager.py

simplify

2025-08-31 20:02:51 +08:00

llm_engine.py

♻️ refactor: restructure Observer as base class with InferenceObserver

2026-01-28 03:15:33 +08:00

model_runner.py

🔀 merge: integrate tzj/minference-exp (GPU-only sparse attention)

2026-01-27 09:25:36 +08:00

scheduler.py

♻️ refactor: restructure Observer as base class with InferenceObserver

2026-01-28 03:15:33 +08:00

sequence.py

[fix] Fixed needle test bug.

2026-01-05 18:34:09 +08:00