Zijie Tian | bf4c63c7ec | [docs] Added Sparse Attn. | 2025-12-29 19:56:54 +08:00
Zijie Tian | 82ed34fc2d | [opt] optimize nanovllm performance to be comparable with vllm. | 2025-12-25 03:47:07 +08:00
Zijie Tian | 16fcf8350b | [WIP] replace merge attention with triton kernel. | 2025-12-25 01:07:05 +08:00
Zijie Tian | 6ec1b23982 | [WIP] NEED to modify communication. | 2025-12-24 21:57:51 +08:00
Zijie Tian | 782437c486 | [WIP] remove num_prefetch_blocks variable. | 2025-12-24 18:22:26 +08:00
Zijie Tian | 1907b625b6 | [refactor] Remove legacy mode path. | 2025-12-22 20:17:56 +08:00
Zijie Tian | 08d83185ce | [fix] fix bench*.py. | 2025-12-22 19:53:50 +08:00
Zijie Tian | 8df0c7517b | [docs] refactor CLAUDE.md. | 2025-12-15 21:43:33 +08:00
Zijie Tian | b8b6478506 | [feat] Need to optimize with async prefetch. | 2025-12-15 06:58:40 +08:00
Zijie Tian | 1081ab51ea | [refactor] Refactor offload code to multi-chunk. | 2025-12-15 01:13:58 +08:00
Zijie Tian | 5949537faf | [docs] Start using CLAUDE rules. | 2025-12-15 00:20:54 +08:00
Zijie Tian | a37f07943c | [docs] Update the CLAUDE.md. | 2025-12-15 00:13:27 +08:00
Zijie Tian | 761929390e | [bench] Added vllm vs nano-vllm bench. | 2025-12-10 00:44:57 +08:00