Discover scalable pretraining of large Mixture-of-Experts language models on the Aurora supercomputer with high GPU efficiency and advanced optimization...
Discover Spectral Compact Training, a memory-efficient method enabling large language model training on consumer hardware with truncated SVD and Stiefel QR...
Discover First Logit Boosting, a training-free method to reduce object hallucination in large vision-language models for improved accuracy and reliability.