Installing flash-attn on an RTX 5080
LLM
2025-11-24
First, I have to thank MinChoi0129 for the information shared in this issue.
As a beginner trying to install the flash-attn package for the first time, I ran into the same failures as many other people. If your Python, PyTorch, and CUDA versions are not chosen carefully, the install falls back to building from source. Building from source not only consumes a huge amount of memory (~40 GB) and time (30 minutes to an hour), but also often fails outright.
So here is one version combination that worked for me.
Hardware:
- GPU: RTX 5080
System:
- WSL 2 (Ubuntu 22.04.5)
Software:
- Python = 3.10
- CUDA = 12.8
- torch = 2.7.1
(pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128)
Flash Attention version:
flash-attn = 2.8.3
Installation command:
pip install flash-attn torch==2.7.1 --no-build-isolation
With this combination, memory usage was almost unchanged and the installation succeeded in about one minute.
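Before running the flash-attn install, it can save time to confirm that the torch wheel you got is actually the cu128 build, since a resolver mix-up here is one common trigger for the source-build fallback. A minimal sketch, assuming the usual local-tag format of `torch.__version__` (e.g. `2.7.1+cu128`):

```python
import re

def cuda_tag(version):
    """Extract the CUDA suffix from a torch version string like '2.7.1+cu128'.

    Returns e.g. '128', or None for a wheel with no '+cu' tag (CPU-only build).
    """
    m = re.search(r"\+cu(\d+)", version)
    return m.group(1) if m else None

# Hypothetical pre-flight check before installing flash-attn:
#   import torch
#   assert cuda_tag(torch.__version__) == "128", "not the cu128 wheel"

print(cuda_tag("2.7.1+cu128"))  # -> 128
print(cuda_tag("2.7.1"))        # -> None
```

If the tag is missing or wrong, reinstalling torch from the cu128 index URL shown above should fix it before you attempt flash-attn.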
