docs: flash-attn usage and install (#1706)
* docs: flash-attn usage and install

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix link

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
parent 96c54dba91
commit be42b03f9b
35
docs/faq/index.md
vendored
@@ -194,3 +194,38 @@ This is a collection of FAQ collected from the user questions on <https://github
Also see [docling#725](https://github.com/docling-project/docling/issues/725).

Source: Issue [docling-core#119](https://github.com/docling-project/docling-core/issues/119)

??? question "How to use flash attention?"

### How to use flash attention?

When running models in Docling on CUDA devices, you can enable the Flash Attention 2 library.

Using environment variables:

```
DOCLING_CUDA_USE_FLASH_ATTENTION2=1
```

Using code:

```python
from docling.datamodel.accelerator_options import (
    AcceleratorOptions,
)
from docling.datamodel.pipeline_options import VlmPipelineOptions

# Request Flash Attention 2 for models running on CUDA devices.
pipeline_options = VlmPipelineOptions(
    accelerator_options=AcceleratorOptions(cuda_use_flash_attention2=True)
)
```

This requires the [flash-attn](https://pypi.org/project/flash-attn/) package to be installed. Below are two alternative ways to install it:

```shell
# Building from source (requires the CUDA development environment)
pip install flash-attn

# Using pre-built wheels (not available for every setup)
FLASH_ATTENTION_SKIP_CUDA_BUILD=TRUE pip install flash-attn
```
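
Because either install route can fail on a given setup, it may help to confirm that the package is importable before turning the option on. The sketch below is not part of Docling: `flash_attn_installed` and `enable_flash_attention_if_available` are hypothetical helper names, and it assumes the environment variable must be set before Docling's accelerator options are created.

```python
import importlib.util
import os
from collections.abc import MutableMapping

# Environment variable named in the FAQ entry above.
FLASH_ATTENTION_ENV_VAR = "DOCLING_CUDA_USE_FLASH_ATTENTION2"


def flash_attn_installed() -> bool:
    """Return True if the flash-attn package (import name `flash_attn`) can be found."""
    return importlib.util.find_spec("flash_attn") is not None


def enable_flash_attention_if_available(
    env: MutableMapping[str, str] = os.environ,
) -> bool:
    """Hypothetical helper: set the variable only when flash-attn is installed.

    Assumption: this runs before Docling's accelerator options are created.
    """
    if flash_attn_installed():
        env[FLASH_ATTENTION_ENV_VAR] = "1"
        return True
    return False
```

Calling `enable_flash_attention_if_available()` at the top of a script leaves the environment untouched when flash-attn is missing, so the same script can run on machines with and without the package.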