brightray.ai

Tri Dao· Chief Scientist· Together AI · Princeton University3h

FlashAttention-3 doubles attention throughput on H100 with FP8.

Albert Gu· Assistant Professor· Carnegie Mellon University · Cartesia4h

Editor's pick: an early SSM result promoted ahead of the automated bar.

Tri Dao· Chief Scientist· Together AI · Princeton University5h

Hardware-aware SSM scans approach roofline memory bandwidth on modern GPUs.

Albert Gu· Assistant Professor· Carnegie Mellon University · Cartesia6h

Mamba-2 reframes state-space models as a form of attention via the SSD framework.

Horace He· Engineer· Meta · PyTorch7h

Compiler-fused attention kernels close the eager/graph performance gap.

Tim Dettmers· Research Scientist· Allen Institute for AI · University of Washington8h

4-bit NF4 quantization fine-tunes a 65B model on a single 48GB GPU with no quality loss.

Horace He· Engineer· Meta · PyTorch9h

torch.compile fuses eager PyTorch into optimized kernels with one decorator.

succeeded