diff --git a/.claude/commands/commit.md b/.claude/commands/commit.md
new file mode 100644
index 0000000..a1a7fb6
--- /dev/null
+++ b/.claude/commands/commit.md
@@ -0,0 +1,166 @@
+---
+allowed-tools: Bash(git add:*), Bash(git status:*), Bash(git commit:*), Bash(git diff:*), Bash(git log:*), Bash(git branch:*)
+argument-hint: "[message] | --no-verify | --amend"
+description: Create well-formatted commits with conventional commit format and emoji
+---
+
+# Smart Git Commit
+
+Create well-formatted commit: $ARGUMENTS
+
+## Current Repository State
+
+- Git status: !`git status --porcelain`
+- Current branch: !`git branch --show-current`
+- Staged changes: !`git diff --cached --stat`
+- Unstaged changes: !`git diff --stat`
+- Recent commits: !`git log --oneline -5`
+
+## What This Command Does
+
+1. Unless specified with `--no-verify`, automatically runs pre-commit checks:
+   - `pnpm lint` to ensure code quality
+   - `pnpm build` to verify the build succeeds
+   - `pnpm generate:docs` to update documentation
+2. Checks which files are staged with `git status`
+3. If no files are staged, automatically adds all modified and new files with `git add`
+4. Performs a `git diff` to understand what changes are being committed
+5. Analyzes the diff to determine if multiple distinct logical changes are present
+6. If multiple distinct changes are detected, suggests breaking the commit into multiple smaller commits
+7. For each commit (or the single commit if not split), creates a commit message using emoji conventional commit format
+
+## Best Practices for Commits
+
+- **Verify before committing**: Ensure code is linted, builds correctly, and documentation is updated
+- **Atomic commits**: Each commit should contain related changes that serve a single purpose
+- **Split large changes**: If changes touch multiple concerns, split them into separate commits
+- **Conventional commit format**: Use the format `<type>: <description>` where type is one of:
+  - `feat`: A new feature
+  - `fix`: A bug fix
+  - `docs`: Documentation changes
+  - `style`: Code style changes (formatting, etc.)
+  - `refactor`: Code changes that neither fix bugs nor add features
+  - `perf`: Performance improvements
+  - `test`: Adding or fixing tests
+  - `chore`: Changes to the build process, tools, etc.
+- **Present tense, imperative mood**: Write commit messages as commands (e.g., "add feature" not "added feature")
+- **Concise first line**: Keep the first line under 72 characters
+- **Emoji**: Each commit type is paired with an appropriate emoji:
+  - ✨ `feat`: New feature
+  - 🐛 `fix`: Bug fix
+  - 📝 `docs`: Documentation
+  - 💄 `style`: Formatting/style
+  - ♻️ `refactor`: Code refactoring
+  - ⚡️ `perf`: Performance improvements
+  - ✅ `test`: Tests
+  - 🔧 `chore`: Tooling, configuration
+  - 🚀 `ci`: CI/CD improvements
+  - 🗑️ `revert`: Reverting changes
+  - 🧪 `test`: Add a failing test
+  - 🚨 `fix`: Fix compiler/linter warnings
+  - 🔒️ `fix`: Fix security issues
+  - 👥 `chore`: Add or update contributors
+  - 🚚 `refactor`: Move or rename resources
+  - 🏗️ `refactor`: Make architectural changes
+  - 🔀 `chore`: Merge branches
+  - 📦️ `chore`: Add or update compiled files or packages
+  - ➕ `chore`: Add a dependency
+  - ➖ `chore`: Remove a dependency
+  - 🌱 `chore`: Add or update seed files
+  - 🧑‍💻 `chore`: Improve developer experience
+  - 🧵 `feat`: Add or update code related to multithreading or concurrency
+  - 🔍️ `feat`: Improve SEO
+  - 🏷️ `feat`: Add or update types
+  - 💬 `feat`: Add or update text and literals
+  - 🌐 `feat`: Internationalization and localization
+  - 👔 `feat`: Add or update business logic
+  - 📱 `feat`: Work on responsive design
+  - 🚸 `feat`: Improve user experience / usability
+  - 🩹 `fix`: Simple fix for a non-critical issue
+  - 🥅 `fix`: Catch errors
+  - 👽️ `fix`: Update code due to external API changes
+  - 🔥 `fix`: Remove code or files
+  - 🎨 `style`: Improve structure/format of the code
+  - 🚑️ `fix`: Critical hotfix
+  - 🎉 `chore`: Begin a project
+  - 🔖 `chore`: Release/Version tags
+  - 🚧 `wip`: Work in progress
+  - 💚 `fix`: Fix CI build
+  - 📌 `chore`: Pin dependencies to specific versions
+  - 👷 `ci`: Add or update CI build system
+  - 📈 `feat`: Add or update analytics or tracking code
+  - ✏️ `fix`: Fix typos
+  - ⏪️ `revert`: Revert changes
+  - 📄 `chore`: Add or update license
+  - 💥 `feat`: Introduce breaking changes
+  - 🍱 `assets`: Add or update assets
+  - ♿️ `feat`: Improve accessibility
+  - 💡 `docs`: Add or update comments in source code
+  - 🗃️ `db`: Perform database related changes
+  - 🔊 `feat`: Add or update logs
+  - 🔇 `fix`: Remove logs
+  - 🤡 `test`: Mock things
+  - 🥚 `feat`: Add or update an easter egg
+  - 🙈 `chore`: Add or update .gitignore file
+  - 📸 `test`: Add or update snapshots
+  - ⚗️ `experiment`: Perform experiments
+  - 🚩 `feat`: Add, update, or remove feature flags
+  - 💫 `ui`: Add or update animations and transitions
+  - ⚰️ `refactor`: Remove dead code
+  - 🦺 `feat`: Add or update code related to validation
+  - ✈️ `feat`: Improve offline support
+
+## Guidelines for Splitting Commits
+
+When analyzing the diff, consider splitting commits based on these criteria:
+
+1. **Different concerns**: Changes to unrelated parts of the codebase
+2. **Different types of changes**: Mixing features, fixes, refactoring, etc.
+3. **File patterns**: Changes to different types of files (e.g., source code vs documentation)
+4. **Logical grouping**: Changes that would be easier to understand or review separately
+5. **Size**: Very large changes that would be clearer if broken down
+
+## Examples
+
+Good commit messages:
+- ✨ feat: add user authentication system
+- 🐛 fix: resolve memory leak in rendering process
+- 📝 docs: update API documentation with new endpoints
+- ♻️ refactor: simplify error handling logic in parser
+- 🚨 fix: resolve linter warnings in component files
+- 🧑‍💻 chore: improve developer tooling setup process
+- 👔 feat: implement business logic for transaction validation
+- 🩹 fix: address minor styling inconsistency in header
+- 🚑️ fix: patch critical security vulnerability in auth flow
+- 🎨 style: reorganize component structure for better readability
+- 🔥 fix: remove deprecated legacy code
+- 🦺 feat: add input validation for user registration form
+- 💚 fix: resolve failing CI pipeline tests
+- 📈 feat: implement analytics tracking for user engagement
+- 🔒️ fix: strengthen authentication password requirements
+- ♿️ feat: improve form accessibility for screen readers
+
+Example of splitting commits:
+- First commit: ✨ feat: add new solc version type definitions
+- Second commit: 📝 docs: update documentation for new solc versions
+- Third commit: 🔧 chore: update package.json dependencies
+- Fourth commit: 🏷️ feat: add type definitions for new API endpoints
+- Fifth commit: 🧵 feat: improve concurrency handling in worker threads
+- Sixth commit: 🚨 fix: resolve linting issues in new code
+- Seventh commit: ✅ test: add unit tests for new solc version features
+- Eighth commit: 🔒️ fix: update dependencies with security vulnerabilities
+
+## Command Options
+
+- `--no-verify`: Skip running the pre-commit checks (lint, build, generate:docs)
+
+## Important Notes
+
+- By default, pre-commit checks (`pnpm lint`, `pnpm build`, `pnpm generate:docs`) will run to ensure code quality
+- If these checks fail, you'll be asked if you want to proceed with the commit anyway or fix the issues first
+- If specific files are already staged, the command will only commit those files
+- If no files are staged, it will automatically stage all modified and new files
+- The commit message will be constructed based on the changes detected
+- Before committing, the command will review the diff to identify if multiple commits would be more appropriate
+- If suggesting multiple commits, it will help you stage and commit the changes separately
+- The command always reviews the commit diff to ensure the message matches the changes
\ No newline at end of file
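The `<emoji> <type>: <description>` format and the 72-character limit described in `commit.md` above could also be checked mechanically. The following is a minimal Python sketch under stated assumptions and is not part of this change: the names `check_commit_subject`, `ALLOWED_TYPES`, and `SUBJECT_RE` are hypothetical, the type list is copied from the types the file mentions, and any non-space token is accepted in the emoji position.

```python
import re

# Types named in commit.md: the core list plus those that appear only in the emoji table.
ALLOWED_TYPES = {
    "feat", "fix", "docs", "style", "refactor", "perf", "test", "chore",
    "ci", "revert", "wip", "assets", "db", "experiment", "ui",
}

# "<emoji> <type>: <description>". The emoji position accepts any non-space token,
# so this checks the shape of the line, not whether the token is really an emoji.
SUBJECT_RE = re.compile(r"^(?P<emoji>\S+) (?P<type>[a-z]+): (?P<description>\S.*)$")

# commit.md asks for a first line under 72 characters.
# len() counts code points, which is only an approximation for emoji.
MAX_SUBJECT_LENGTH = 72


def check_commit_subject(subject: str) -> list[str]:
    """Return the problems found in a commit subject line (empty list if none)."""
    problems = []
    if len(subject) >= MAX_SUBJECT_LENGTH:
        problems.append(
            f"subject has {len(subject)} characters, expected fewer than {MAX_SUBJECT_LENGTH}"
        )
    match = SUBJECT_RE.match(subject)
    if match is None:
        problems.append("subject does not match '<emoji> <type>: <description>'")
        return problems
    if match["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown type '{match['type']}'")
    return problems


if __name__ == "__main__":
    for line in ("✨ feat: add user authentication system", "added stuff"):
        print(line, "->", check_commit_subject(line) or "ok")
```

A check like this could run in a `commit-msg` hook, but whether the project wants such a hook is outside the scope of this diff.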
diff --git a/.claude/commands/create-architecture-documentation.md b/.claude/commands/create-architecture-documentation.md
new file mode 100644
index 0000000..0c5aebf
--- /dev/null
+++ b/.claude/commands/create-architecture-documentation.md
@@ -0,0 +1,94 @@
+---
+allowed-tools: Read, Write, Edit, Bash
+argument-hint: "[framework] | --c4-model | --arc42 | --adr | --plantuml | --full-suite"
+description: Generate comprehensive architecture documentation with diagrams, ADRs, and interactive visualization
+---
+
+# Architecture Documentation Generator
+
+Generate comprehensive architecture documentation: $ARGUMENTS
+
+## Current Architecture Context
+
+- Project structure: !`find . -type f \( -name "*.json" -o -name "*.yaml" -o -name "*.toml" \) | head -5`
+- Documentation exists: @docs/ or @README.md (if exists)
+- Architecture files: !`find . -name "*architecture*" -o -name "*design*" -o -name "*.puml" | head -3`
+- Services/containers: @docker-compose.yml or @k8s/ (if exists)
+- API definitions: !`find . -name "*api*" -o -name "*openapi*" -o -name "*swagger*" | head -3`
+
+## Task
+
+Generate comprehensive architecture documentation with modern tooling and best practices:
+
+1. **Architecture Analysis and Discovery**
+   - Analyze current system architecture and component relationships
+   - Identify key architectural patterns and design decisions
+   - Document system boundaries, interfaces, and dependencies
+   - Assess data flow and communication patterns
+   - Identify architectural debt and improvement opportunities
+
+2. **Architecture Documentation Framework**
+   - Choose appropriate documentation framework and tools:
+     - **C4 Model**: Context, Containers, Components, Code diagrams
+     - **Arc42**: Comprehensive architecture documentation template
+     - **Architecture Decision Records (ADRs)**: Decision documentation
+     - **PlantUML/Mermaid**: Diagram-as-code documentation
+     - **Structurizr**: C4 model tooling and visualization
+     - **Draw.io/Lucidchart**: Visual diagramming tools
+
+3. **System Context Documentation**
+   - Create high-level system context diagrams
+   - Document external systems and integrations
+   - Define system boundaries and responsibilities
+   - Document user personas and stakeholders
+   - Create system landscape and ecosystem overview
+
+4. **Container and Service Architecture**
+   - Document container/service architecture and deployment view
+   - Create service dependency maps and communication patterns
+   - Document deployment architecture and infrastructure
+   - Define service boundaries and API contracts
+   - Document data persistence and storage architecture
+
+5. **Component and Module Documentation**
+   - Create detailed component architecture diagrams
+   - Document internal module structure and relationships
+   - Define component responsibilities and interfaces
+   - Document design patterns and architectural styles
+   - Create code organization and package structure documentation
+
+6. **Data Architecture Documentation**
+   - Document data models and database schemas
+   - Create data flow diagrams and processing pipelines
+   - Document data storage strategies and technologies
+   - Define data governance and lifecycle management
+   - Create data integration and synchronization documentation
+
+7. **Security and Compliance Architecture**
+   - Document security architecture and threat model
+   - Create authentication and authorization flow diagrams
+   - Document compliance requirements and controls
+   - Define security boundaries and trust zones
+   - Create incident response and security monitoring documentation
+
+8. **Quality Attributes and Cross-Cutting Concerns**
+   - Document performance characteristics and scalability patterns
+   - Create reliability and availability architecture documentation
+   - Document monitoring and observability architecture
+   - Define maintainability and evolution strategies
+   - Create disaster recovery and business continuity documentation
+
+9. **Architecture Decision Records (ADRs)**
+   - Create comprehensive ADR template and process
+   - Document historical architectural decisions and rationale
+   - Create decision tracking and review process
+   - Document trade-offs and alternatives considered
+   - Set up ADR maintenance and evolution procedures
+
+10. **Documentation Automation and Maintenance**
+    - Set up automated diagram generation from code annotations
+    - Configure documentation pipeline and publishing automation
+    - Set up documentation validation and consistency checking
+    - Create documentation review and approval process
+    - Train team on architecture documentation practices and tools
+    - Set up documentation versioning and change management
\ No newline at end of file
diff --git a/.claude/commands/ultra-think.md b/.claude/commands/ultra-think.md
new file mode 100644
index 0000000..da21c0e
--- /dev/null
+++ b/.claude/commands/ultra-think.md
@@ -0,0 +1,158 @@
+---
+description: Deep analysis and problem solving with multi-dimensional thinking
+argument-hint: [problem or question to analyze]
+---
+
+# Deep Analysis and Problem Solving Mode
+
+Analyze a problem in depth and from multiple perspectives before recommending a solution.
+
+## Instructions
+
+1. **Initialize Ultra Think Mode**
+   - Acknowledge the request for enhanced analytical thinking
+   - Set context for deep, systematic reasoning
+   - Prepare to explore the problem space comprehensively
+
+2. **Parse the Problem or Question**
+   - Extract the core challenge from: $ARGUMENTS
+   - Identify all stakeholders and constraints
+   - Recognize implicit requirements and hidden complexities
+   - Question assumptions and surface unknowns
+
+3. **Multi-Dimensional Analysis**
+   Approach the problem from multiple angles:
+   
+   ### Technical Perspective
+   - Analyze technical feasibility and constraints
+   - Consider scalability, performance, and maintainability
+   - Evaluate security implications
+   - Assess technical debt and future-proofing
+   
+   ### Business Perspective
+   - Understand business value and ROI
+   - Consider time-to-market pressures
+   - Evaluate competitive advantages
+   - Assess risk vs. reward trade-offs
+   
+   ### User Perspective
+   - Analyze user needs and pain points
+   - Consider usability and accessibility
+   - Evaluate user experience implications
+   - Think about edge cases and user journeys
+   
+   ### System Perspective
+   - Consider system-wide impacts
+   - Analyze integration points
+   - Evaluate dependencies and coupling
+   - Think about emergent behaviors
+
+4. **Generate Multiple Solutions**
+   - Brainstorm 3-5 different approaches
+   - For each approach, consider:
+     - Pros and cons
+     - Implementation complexity
+     - Resource requirements
+     - Potential risks
+     - Long-term implications
+   - Include both conventional and creative solutions
+   - Consider hybrid approaches
+
+5. **Deep Dive Analysis**
+   For the most promising solutions:
+   - Create detailed implementation plans
+   - Identify potential pitfalls and mitigation strategies
+   - Consider phased approaches and MVPs
+   - Analyze second- and third-order effects
+   - Think through failure modes and recovery
+
+6. **Cross-Domain Thinking**
+   - Draw parallels from other industries or domains
+   - Apply design patterns from different contexts
+   - Consider biological or natural system analogies
+   - Look for innovative combinations of existing solutions
+
+7. **Challenge and Refine**
+   - Play devil's advocate with each solution
+   - Identify weaknesses and blind spots
+   - Consider "what if" scenarios
+   - Stress-test assumptions
+   - Look for unintended consequences
+
+8. **Synthesize Insights**
+   - Combine insights from all perspectives
+   - Identify key decision factors
+   - Highlight critical trade-offs
+   - Summarize innovative discoveries
+   - Present a nuanced view of the problem space
+
+9. **Provide Structured Recommendations**
+   Present findings in a clear structure:
+   ```
+   ## Problem Analysis
+   - Core challenge
+   - Key constraints
+   - Critical success factors
+   
+   ## Solution Options
+   ### Option 1: [Name]
+   - Description
+   - Pros/Cons
+   - Implementation approach
+   - Risk assessment
+   
+   ### Option 2: [Name]
+   [Similar structure]
+   
+   ## Recommendation
+   - Recommended approach
+   - Rationale
+   - Implementation roadmap
+   - Success metrics
+   - Risk mitigation plan
+   
+   ## Alternative Perspectives
+   - Contrarian view
+   - Future considerations
+   - Areas for further research
+   ```
+
+10. **Meta-Analysis**
+    - Reflect on the thinking process itself
+    - Identify areas of uncertainty
+    - Acknowledge biases or limitations
+    - Suggest additional expertise needed
+    - Provide confidence levels for recommendations
+
+## Usage Examples
+
+```bash
+# Architectural decision
+/ultra-think Should we migrate to microservices or improve our monolith?
+
+# Complex problem solving
+/ultra-think How do we scale our system to handle 10x traffic while reducing costs?
+
+# Strategic planning
+/ultra-think What technology stack should we choose for our next-gen platform?
+
+# Design challenge
+/ultra-think How can we improve our API to be more developer-friendly while maintaining backward compatibility?
+```
+
+## Key Principles
+
+- **First Principles Thinking**: Break down to fundamental truths
+- **Systems Thinking**: Consider interconnections and feedback loops
+- **Probabilistic Thinking**: Work with uncertainties and ranges
+- **Inversion**: Consider what to avoid, not just what to do
+- **Second-Order Thinking**: Consider consequences of consequences
+
+## Output Expectations
+
+- Comprehensive analysis (typically 2-4 pages of insights)
+- Multiple viable solutions with trade-offs
+- Clear reasoning chains
+- Acknowledgment of uncertainties
+- Actionable recommendations
+- Novel insights or perspectives
\ No newline at end of file
diff --git a/.claude/rules/commands.md b/.claude/rules/commands.md
index 026ec79..64c861c 100644
--- a/.claude/rules/commands.md
+++ b/.claude/rules/commands.md
@@ -1,20 +1,16 @@
 # Commands
 
-## Installation
+## Running (with PYTHONPATH)
 
-```bash
-pip install -e .
-```
-
-## Running
+For multi-instance development, set `PYTHONPATH` instead of running `pip install`:
 
 ```bash
 # Run example
-python example.py
+PYTHONPATH=/path/to/nano-vllm:$PYTHONPATH python example.py
 
 # Run benchmarks
-python bench.py                    # Standard benchmark
-python bench_offload.py            # CPU offload benchmark
+PYTHONPATH=/path/to/nano-vllm:$PYTHONPATH python bench.py
+PYTHONPATH=/path/to/nano-vllm:$PYTHONPATH python bench_offload.py
 ```
 
 ## Config Defaults
diff --git a/.claude/rules/doc-management.md b/.claude/rules/doc-management.md
new file mode 100644
index 0000000..dd84110
--- /dev/null
+++ b/.claude/rules/doc-management.md
@@ -0,0 +1,105 @@
+# Documentation Management
+
+## CLAUDE.md Content Policy
+
+**CLAUDE.md should only contain operational requirements:**
+- Environment setup (PYTHONPATH, GPU mutex)
+- Execution requirements (how to run tests/benchmarks)
+- Quick configuration reference
+- Documentation index (links to detailed docs)
+
+**Technical details should go to docs/:**
+- Architecture and design explanations
+- Implementation details and code flows
+- Debugging techniques
+- Memory analysis and profiling
+- Algorithm explanations
+
+## When Adding New Technical Content
+
+Follow this workflow:
+
+### Step 1: Analyze and Document
+
+If doing technical analysis (e.g., memory profiling):
+1. Calculate theoretical values using formulas
+2. Run actual tests to measure real values
+3. Compare theoretical vs actual (expect < 10% error for valid models)
+4. Document findings with both theory and empirical validation
+
+### Step 2: Create/Update docs/
+
+Create a new doc or update an existing one in `docs/`:
+```
+docs/
+├── architecture_guide.md      # Core components, design, flows
+├── sparse_attention_guide.md  # Sparse attention methods
+├── layerwise_offload_memory_analysis.md  # Memory analysis
+├── debugging_guide.md         # Debugging techniques
+└── <new_topic>_guide.md       # New technical topic
+```
+
+### Step 3: Update CLAUDE.md Documentation Index
+
+Add an entry to the Documentation Index table:
+```markdown
+| Document | Purpose |
+|----------|---------|
+| [`docs/new_doc.md`](docs/new_doc.md) | Brief description |
+```
+
+### Step 4: Refactor if Needed
+
+If CLAUDE.md grows too large (> 150 lines), refactor:
+1. Identify technical details that can be moved
+2. Create an appropriate doc in `docs/`
+3. Replace the detailed content with a reference link
+4. Keep only operational essentials in CLAUDE.md
+
+## Documentation Structure Template
+
+For new technical docs:
+
+```markdown
+# Topic Guide
+
+Brief overview of what this document covers.
+
+## Section 1: Concepts
+- Key concepts and terminology
+
+## Section 2: Implementation
+- Code locations
+- Key methods/functions
+
+## Section 3: Details
+- Detailed explanations
+- Code examples
+
+## Section 4: Validation (if applicable)
+- Theoretical analysis
+- Empirical measurements
+- Comparison table
+```
+
+## Memory Analysis Template
+
+When documenting memory behavior:
+
+```markdown
+## Theoretical Calculation
+
+| Component | Formula | Size |
+|-----------|---------|------|
+| Buffer X | `param1 × param2 × dtype_size` | X MB |
+
+## Empirical Validation
+
+| Metric | Theoretical | Actual | Error |
+|--------|-------------|--------|-------|
+| Peak memory | X GB | Y GB | Z% |
+
+## Key Findings
+1. Finding 1
+2. Finding 2
+```
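Step 1 of `doc-management.md` above asks for theoretical values to be compared with measurements, with an error below 10% expected. A minimal Python sketch of that comparison follows. The helper names and the sample numbers are hypothetical, and the error is assumed to be measured relative to the theoretical value, which the file does not specify.

```python
def relative_error_percent(theoretical: float, actual: float) -> float:
    """Error of the measurement relative to the theoretical value, in percent."""
    if theoretical == 0:
        raise ValueError("theoretical value must be non-zero")
    return abs(actual - theoretical) / abs(theoretical) * 100.0


def validation_row(metric: str, theoretical_gb: float, actual_gb: float,
                   threshold_percent: float = 10.0) -> str:
    """Format one row of the 'Empirical Validation' table and flag large errors."""
    error = relative_error_percent(theoretical_gb, actual_gb)
    flag = "" if error < threshold_percent else " (exceeds threshold)"
    return (
        f"| {metric} | {theoretical_gb:.2f} GB | {actual_gb:.2f} GB "
        f"| {error:.1f}%{flag} |"
    )


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(validation_row("Peak memory", 4.00, 4.20))
```

The row layout mirrors the Metric / Theoretical / Actual / Error columns of the memory analysis template, so the output can be pasted into that table directly.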
diff --git a/.claude/rules/no-extra-docs.md b/.claude/rules/no-extra-docs.md
index 87a806b..165f949 100644
--- a/.claude/rules/no-extra-docs.md
+++ b/.claude/rules/no-extra-docs.md
@@ -2,39 +2,47 @@
 
 ## Do Not Create Unnecessary Documentation
 
-**IMPORTANT**: Do NOT create extra markdown documentation files unless explicitly requested by the user.
+**IMPORTANT**: Do NOT create extra markdown documentation files unless one of the following applies:
+1. The user explicitly requests documentation
+2. You are refactoring CLAUDE.md to move technical details to docs/ (see `doc-management.md`)
 
 ### What NOT to do:
 
-- ❌ Do NOT create README files proactively
-- ❌ Do NOT create analysis documents (*.md) after completing tasks
-- ❌ Do NOT create tutorial/guide documents
-- ❌ Do NOT create summary documents
+- Do NOT create README files proactively
+- Do NOT create standalone analysis documents after completing tasks
+- Do NOT create summary documents without a request
 
 ### What TO do:
 
-- ✅ Only create documentation when user explicitly asks for it
-- ✅ Provide information directly in conversation instead
-- ✅ Update existing documentation if changes require it
-- ✅ Add inline code comments where necessary
+- Provide information directly in conversation by default
+- When the user requests documentation, follow the `doc-management.md` workflow
+- Update existing docs in `docs/` when code changes affect them
+- Keep CLAUDE.md concise (< 150 lines), move technical details to docs/
 
-### Exceptions:
+### Documentation Locations:
 
-Documentation is acceptable ONLY when:
-1. User explicitly requests "create a README" or "write documentation"
-2. Updating existing documentation to reflect code changes
-3. Adding inline comments/docstrings to code itself
+| Type | Location |
+|------|----------|
+| Operational requirements | CLAUDE.md |
+| Technical details | docs/*.md |
+| Code comments | Inline in source |
 
 ### Examples:
 
-**Bad** (Don't do this):
+**Proactive docs (Don't do this)**:
 ```
 User: "Profile the code"
-Assistant: [Creates profiling_results.md after profiling]
+Assistant: [Creates profiling_results.md without being asked]
 ```
 
-**Good** (Do this instead):
+**On-request docs (Do this)**:
 ```
-User: "Profile the code"
-Assistant: [Runs profiling, shows results in conversation]
+User: "Profile the code and document the findings"
+Assistant: [Runs profiling, creates/updates docs/memory_analysis.md]
+```
+
+**Refactoring (Do this)**:
+```
+User: "CLAUDE.md is too long, refactor it"
+Assistant: [Moves technical sections to docs/, updates CLAUDE.md index]
 ```