Zijie Tian
2fe50bab50
[claudesquad] update from 'debug_chunk-2' on 07 Jan 26 03:27 CST
2026-01-07 03:27:27 +08:00
Zijie Tian
f240903013
[docs] Add GPU mutex instructions for multi-instance debugging
Add instructions for Claude instances to check GPU availability before
running CUDA operations, preventing conflicts when multiple instances
debug in parallel on a single GPU.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 01:42:59 +08:00
Zijie Tian
edb5273e34
[WIP] Added basic test for quest.
2026-01-06 22:30:31 +08:00
Zijie Tian
e554d5482b
[refactor] Delete unnecessary test and refactor the offload prefix cache.
2026-01-05 20:31:42 +08:00
Zijie Tian
054aaff403
[fix] Fixed needle test bug.
2026-01-05 18:34:09 +08:00
Zijie Tian
9b52d25866
[docs] Update CLAUDE.md.
2026-01-03 20:46:00 +08:00
Zijie Tian
bf4c63c7ec
[docs] Added Sparse Attn.
2025-12-29 19:56:54 +08:00
Zijie Tian
82ed34fc2d
[opt] Optimize nanovllm performance to be comparable with vllm.
2025-12-25 03:47:07 +08:00
Zijie Tian
16fcf8350b
[WIP] replace merge attention with triton kernel.
2025-12-25 01:07:05 +08:00
Zijie Tian
6ec1b23982
[WIP] NEED to modify communication.
2025-12-24 21:57:51 +08:00
Zijie Tian
782437c486
[WIP] Remove num_prefetch_blocks variable.
2025-12-24 18:22:26 +08:00
Zijie Tian
1907b625b6
[refactor] Remove legacy mode path.
2025-12-22 20:17:56 +08:00
Zijie Tian
08d83185ce
[fix] fix bench*.py.
2025-12-22 19:53:50 +08:00
Zijie Tian
8df0c7517b
[docs] refactor CLAUDE.md.
2025-12-15 21:43:33 +08:00
Zijie Tian
b8b6478506
[feat] Needs to be optimized with async prefetch.
2025-12-15 06:58:40 +08:00
Zijie Tian
1081ab51ea
[refactor] Refactor offload code to multi-chunk.
2025-12-15 01:13:58 +08:00
Zijie Tian
5949537faf
[docs] Start using CLAUDE rules.
2025-12-15 00:20:54 +08:00
Zijie Tian
a37f07943c
[docs] Update the CLAUDE.md.
2025-12-15 00:13:27 +08:00
Zijie Tian
761929390e
[bench] Added vllm vs nano-vllm bench.
2025-12-10 00:44:57 +08:00