21 lines
547 B
Markdown
21 lines
547 B
Markdown
# Testing
|
|
|
|
## Chunked Attention Test
|
|
|
|
```bash
|
|
CUDA_VISIBLE_DEVICES=4,5 python tests/test_chunked_attention.py 6 2048 64 2
|
|
# Args: num_gpu_blocks input_len output_len num_prefetch_blocks
|
|
```
|
|
|
|
## CPU Offload Testing
|
|
|
|
```bash
|
|
# Basic test with limited GPU blocks to trigger offload
|
|
CUDA_VISIBLE_DEVICES=4,5 python tests/test_chunked_attention.py 6 2048 64 2
|
|
|
|
# Verify consistency (run multiple times, output should be identical)
|
|
for i in 1 2 3; do
|
|
CUDA_VISIBLE_DEVICES=4,5 python tests/test_chunked_attention.py 6 2048 32 2 2>&1 | tail -3
|
|
done
|
|
```
|