nano-vllm

Author	SHA1	Message	Date
Zijie Tian	82ed34fc2d	[opt] Optimize nanovllm performance to be comparable with vllm.	2025-12-25 03:47:07 +08:00
Zijie Tian	16fcf8350b	[WIP] replace merge attention with triton kernel.	2025-12-25 01:07:05 +08:00
Zijie Tian	cf5e7df093	[WIP] Added sgDMA operator for scatter kvcache communication.	2025-12-24 23:48:52 +08:00
Zijie Tian	6ec1b23982	[WIP] NEED to modify communication.	2025-12-24 21:57:51 +08:00
Zijie Tian	782437c486	[WIP] remove num_prefetch_blocks variable.	2025-12-24 18:22:26 +08:00
Zijie Tian	b264de903d	[test] Added a simple test_prefill.py.	2025-12-23 00:26:25 +08:00
Zijie Tian	4dcef16c13	[WIP] NEED refactor nanovllm mechanism.	2025-12-22 23:52:56 +08:00
Zijie Tian	051f2295c9	[feat] Added sparse KVcache feature, NEED VERIFY.	2025-12-22 08:51:02 +08:00
Zijie Tian	1081ab51ea	[refactor] Refactor offload code to multi-chunk.	2025-12-15 01:13:58 +08:00
Zijie Tian	61edb8a344	[feat] Finished offload. Still need to optimize performance.	2025-12-12 02:27:40 +08:00
Zijie Tian	babfa17354	[refactor] Translate into English, avoid Chinese due to Claude.	2025-12-11 00:30:24 +08:00
Zijie Tian	190df5f70d	[refactor] Refactor current gpu and cpu block allocation strategy.	2025-12-10 21:23:31 +08:00
Zijie Tian	0a247ccb1b	[feat] Added `num_gpu_blocks` to limit GPU blocks.	2025-12-10 20:17:42 +08:00
Zijie Tian	0b6f19242d	[feat] Added chunked prefill and kvcache offload mechanism.	2025-12-10 03:47:37 +08:00