Commit Graph

17 Commits

Author SHA1 Message Date
Zijie Tian
c16bfcf40f ♻️ refactor: restructure Observer as base class with InferenceObserver
- Refactor Observer into base class with common enable/disable/reset interface
- Create InferenceObserver subclass for TTFT/TPOT metrics
- Fix TTFT calculation timing: compute after prefill completes instead of
  at decode start (fixes max_tokens=1 returning TTFT=0)
- Integrate InferenceObserver into bench.py and bench_offload.py for
  accurate internal timing metrics vs external wall-clock time
- Add get_summary() and print_summary() methods for structured output

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
2026-01-28 03:15:33 +08:00
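
The Observer refactor described in commit c16bfcf40f can be illustrated with a minimal sketch. Only the names `Observer`, `InferenceObserver`, `get_summary()`, `print_summary()`, and the enable/disable/reset interface come from the commit message. The hook names (`on_request_start`, `on_prefill_end`, `on_decode_step`), the injectable `clock` parameter, the summary keys, and the TPOT formula (decode time divided by number of decode steps) are assumptions made for illustration and may not match the repository's code. The key point it tries to show is the timing fix: TTFT is taken when prefill completes, so a request with `max_tokens=1` that never reaches decode still reports a non-zero TTFT.

```python
import time
from typing import Callable, Dict, Optional


class Observer:
    """Base class providing the common enable/disable/reset interface."""

    def __init__(self) -> None:
        self.enabled = False

    def enable(self) -> None:
        self.enabled = True

    def disable(self) -> None:
        self.enabled = False

    def reset(self) -> None:
        """Subclasses override this to clear their recorded state."""


class InferenceObserver(Observer):
    """Collects TTFT/TPOT timings (hook names here are hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        super().__init__()
        self._clock = clock  # injectable so the timings can be tested
        self.reset()

    def reset(self) -> None:
        self._start: Optional[float] = None
        self._prefill_end: Optional[float] = None
        self._last_decode: Optional[float] = None
        self._decode_steps = 0

    def on_request_start(self) -> None:
        if self.enabled:
            self._start = self._clock()

    def on_prefill_end(self) -> None:
        # TTFT is fixed here, not at decode start, so it is available
        # even when no decode step ever runs (max_tokens=1).
        if self.enabled:
            self._prefill_end = self._clock()

    def on_decode_step(self) -> None:
        if self.enabled:
            self._last_decode = self._clock()
            self._decode_steps += 1

    def get_summary(self) -> Dict[str, Optional[float]]:
        ttft = None
        if self._start is not None and self._prefill_end is not None:
            ttft = self._prefill_end - self._start
        tpot = None
        if self._decode_steps > 0 and self._prefill_end is not None:
            # Assumed definition: average time per decode step after prefill.
            tpot = (self._last_decode - self._prefill_end) / self._decode_steps
        return {"ttft_s": ttft, "tpot_s": tpot, "decode_steps": self._decode_steps}

    def print_summary(self) -> None:
        for key, value in self.get_summary().items():
            print(f"{key}: {value}")
```

In a benchmark script, such an observer would be enabled before generation, fed by the engine at the three hook points, and queried with `get_summary()` afterwards; how bench.py and bench_offload.py actually wire this up is not shown in the commit message.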
Zijie Tian
b8b6478506 [feat] Needs to be optimized with async prefetch. 2025-12-15 06:58:40 +08:00
Zijie Tian
1081ab51ea [refactor] Refactor offload code to multi-chunk. 2025-12-15 01:13:58 +08:00
Zijie Tian
9b8165af5a [fix] Fixed kvcache offload problem. 2025-12-12 01:35:30 +08:00
Zijie Tian
babfa17354 [refactor] Translate into English, avoid Chinese due to Claude. 2025-12-11 00:30:24 +08:00
Zijie Tian
e85c2b4776 [fix] Fixed kvcache offload bugs. 2025-12-10 22:34:00 +08:00
Zijie Tian
0a247ccb1b [feat] Added num_gpu_blocks to limit GPU blocks. 2025-12-10 20:17:42 +08:00
Zijie Tian
01f19ee4a6 [feat] Added logger into nanovllm. 2025-12-10 19:53:38 +08:00
Zijie Tian
0b6f19242d [feat] Added chunked prefill and kvcache offload mechanism. 2025-12-10 03:47:37 +08:00
Zijie Tian
204fe2b38f [feat] Added metric into tqdm bar. 2025-12-10 00:52:13 +08:00
GeeeekExplorer
cde3fc22c2 simplify 2025-06-21 17:19:15 +08:00
GeeeekExplorer
bc0ad5a116 better 2025-06-17 23:33:38 +08:00
GeeeekExplorer
fc778a4da9 better 2025-06-15 10:36:45 +08:00
GeeeekExplorer
98a1551a7d support CUDA_VISIBLE_DEVICES 2025-06-12 23:14:01 +08:00
GeeeekExplorer
fee58d44e4 fix 2025-06-12 01:00:31 +08:00
GeeeekExplorer
08c84ec08d multi file loader 2025-06-12 01:00:09 +08:00
GeeeekExplorer
a5a4909e6a init commit 2025-06-10 00:27:01 +08:00