Discover Watt Counts, the largest energy-aware benchmark for sustainable LLM inference across heterogeneous GPU architectures, reducing energy use by up to 70...
Discover CSAttention, a novel sparse attention method that boosts LLM inference speed by 4.6x while maintaining accuracy at high sparsity and long context lengths.