AEGIS
AI Operations Platform
An AI operations platform integrating custom ROCm kernels,
Mixture-of-Experts routing, web-search RAG, and execution hooks.
A FastAPI backend serves a React frontend for command and chat interfaces.
- Custom HIP/CUDA kernels for GPU-accelerated ops
- Dynamic MoE expert loading with memory budgeting
- Multi-tier caching (L1/L2/L3)
- Cross-platform network diagnostics
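Dynamic expert loading under a memory budget can be sketched as a greedy pick: take experts in routing-weight order until the top-k or the budget is exhausted. This is an illustrative sketch, not AEGIS's actual API; names, sizes, and the greedy rule are assumptions.

```python
# Hypothetical sketch of memory-budgeted MoE expert selection.
# Expert names, sizes, and the greedy policy are illustrative.

def select_experts(routing_weights, expert_sizes_mb, budget_mb, top_k=2):
    """Pick up to top_k experts by routing weight whose combined
    size fits within the memory budget."""
    ranked = sorted(routing_weights, key=routing_weights.get, reverse=True)
    chosen, used = [], 0
    for expert in ranked:
        size = expert_sizes_mb[expert]
        if len(chosen) < top_k and used + size <= budget_mb:
            chosen.append(expert)
            used += size
    return chosen, used

weights = {"e0": 0.50, "e1": 0.30, "e2": 0.15, "e3": 0.05}
sizes = {"e0": 400, "e1": 900, "e2": 350, "e3": 350}
print(select_experts(weights, sizes, budget_mb=800))  # e1 is skipped: too large
```

A production router would also weigh eviction cost and load latency, but the budget check above is the core constraint.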
Stack: PyTorch, FastAPI, React, ROCm
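The multi-tier read path can be sketched as an L1 LRU cache that falls back to the lower tiers and promotes hits. This is a minimal sketch: the dicts standing in for L2 and L3 are placeholders for a vector store and a persistent archive, and the class name is hypothetical.

```python
from collections import OrderedDict

# Illustrative L1/L2/L3 read path; tier stores are plain dicts standing in
# for the real vector index and persistent archive.

class TieredCache:
    def __init__(self, l1_capacity, l2, l3):
        self.l1 = OrderedDict()      # hot in-memory tier, LRU-evicted
        self.l1_capacity = l1_capacity
        self.l2 = l2                 # e.g. vector-store lookup
        self.l3 = l3                 # e.g. persistent archive

    def get(self, key):
        if key in self.l1:           # L1 hit: refresh recency
            self.l1.move_to_end(key)
            return self.l1[key]
        for tier in (self.l2, self.l3):
            if key in tier:          # lower-tier hit: promote into L1
                self._put_l1(key, tier[key])
                return tier[key]
        return None

    def _put_l1(self, key, value):
        self.l1[key] = value
        self.l1.move_to_end(key)
        if len(self.l1) > self.l1_capacity:
            self.l1.popitem(last=False)   # evict least-recently used

cache = TieredCache(l1_capacity=2, l2={"b": 2}, l3={"c": 3})
cache._put_l1("a", 1)
print(cache.get("b"), cache.get("c"), list(cache.l1))  # "a" evicted after two promotions
```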
Pensive
Hierarchical Context Retrieval
A three-tier context management system: an L1 hot cache,
L2 vector retrieval via FAISS, and an L3 persistent archive. Handles 1M+
token contexts with sub-350ms latency.
- Hybrid BM25 + dense vector fusion
- Async dependency chain orchestration
- KV summarization & deduplication
- Multi-hop logic chaining
Stack: FAISS, sentence-transformers, BM25
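The document names hybrid BM25 + dense fusion but not the fusion rule, so as one illustrative choice, here is reciprocal-rank fusion (RRF) over the two ranked lists; the function and document names are assumptions.

```python
# Sketch of reciprocal-rank fusion over two rankings (BM25 and dense).
# RRF is one common fusion rule, chosen here for illustration:
#   score(d) = sum over rankings of 1 / (k + rank(d))

def rrf_fuse(bm25_ranking, dense_ranking, k=60):
    scores = {}
    for ranking in (bm25_ranking, dense_ranking):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["d1", "d2", "d3"]   # lexical ranking
dense = ["d3", "d1", "d4"]  # embedding ranking
print(rrf_fuse(bm25, dense))  # documents ranked by both lists rise to the top
```

RRF needs only ranks, not raw scores, which sidesteps calibrating BM25 scores against cosine similarities.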
mud-puppy
ROCm-First LLM Fine-tuning
A lightweight fine-tuning framework optimized for AMD GPUs. Supports
LoRA, QLoRA, DPO, GRPO, and GPTQ quantization, all without a bitsandbytes dependency.
- Full, LoRA, and QLoRA fine-tuning
- DPO/IPO/KTO/ORPO preference tuning
- Custom ROCm kernels (qgemm, fbgemm)
- Memory-efficient streaming & offloading
Stack: ROCm, TRL, HuggingFace, GPTQ
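The core of LoRA is adapting a frozen weight as W + (alpha/r) * B @ A, training only the low-rank factors. A minimal NumPy sketch of that math, with illustrative shapes and names (not mud-puppy's actual API):

```python
import numpy as np

# Minimal LoRA math sketch: adapted weight = W + (alpha/r) * B @ A, where
# A (r x d_in) and B (d_out x r) are the only trainable matrices.

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 8, 4, 8

W = rng.normal(size=(d_out, d_in))     # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01  # low-rank down-projection
B = np.zeros((d_out, r))               # up-projection, zero-init so training
                                       # starts exactly at the base model

def lora_forward(x):
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(2, d_in))
# With B zero-initialized the adapter is a no-op before training:
assert np.allclose(lora_forward(x), x @ W.T)

# Trainable params: r*(d_in + d_out) vs d_in*d_out for full fine-tuning
print(r * (d_in + d_out), "vs", d_in * d_out)
```

The parameter saving grows with layer size; at transformer scale (d_in = d_out = 4096, r = 16) the adapter is under 1% of the full weight.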