Discover SpecBranch, a novel hybrid speculative decoding method that improves large language model inference speed by up to 4.5× with rollback-aware branch...
ECHO boosts large language model inference with elastic speculative decoding and sparse gating, achieving up to 5.35× speedup in high-concurrency scenarios...