docs: reorganize documentation files
- Move notes.md to docs/development_notes.md - Move Xattention_analysis.md to docs/xattention_analysis.md - Delete DEBUG_SUMMARY.md (no longer needed) - Update CLAUDE.md with documentation index entries Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
@@ -61,6 +61,8 @@ PYTHONPATH=/home/zijie/Code/nano-vllm:$PYTHONPATH python tests/test_needle.py
|
||||
| [`docs/gpu_only_performance_issue.md`](docs/gpu_only_performance_issue.md) | GPU-only mode slower than offload due to PagedAttention scatter overhead, optimization proposals |
|
||||
| [`docs/offload_accuracy_issue.md`](docs/offload_accuracy_issue.md) | **BUG**: CPU offload mode 66% accuracy vs 100% non-offload on RULER NIAH benchmark |
|
||||
| [`docs/64k_memory_analysis.md`](docs/64k_memory_analysis.md) | 64k inference memory analysis: GPU-only vs offload, OOM root cause (fragmentation), RTX 3090 limitations |
|
||||
| [`docs/xattention_analysis.md`](docs/xattention_analysis.md) | XAttention algorithm analysis: chunked estimation, block sparse attention, integration design |
|
||||
| [`docs/development_notes.md`](docs/development_notes.md) | Development notes and scratchpad for ongoing work |
|
||||
|
||||
## Configuration
|
||||
|
||||
|
||||
Reference in New Issue
Block a user