Issue with Flash attention while running Janus-Pro-1B model locally on Mac (Solved)

#8
by saurabhksa1 - opened

I was facing this issue on my Mac (Apple Silicon M2, Metal) while running the sample code given by DeepSeek in generation_inference.py. I was constantly getting the error:

ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.

I tried many things, including passing variables and setting environment variables, but to no avail. The issue was finally resolved by removing the following parameter

"_attn_implementation": "flash_attention_2"

from config.json, and it worked.
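
As an alternative to editing config.json, recent transformers versions let you override the attention implementation at load time via the attn_implementation argument to from_pretrained. A minimal sketch (I haven't tested this on the M2, and the exact loading call depends on your script, but the model id is Janus-Pro-1B's Hub path):

    # Alternative sketch: override the attention implementation at load time
    # instead of editing config.json (requires a recent transformers release).
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "deepseek-ai/Janus-Pro-1B",
        attn_implementation="eager",  # or "sdpa"; avoids flash_attention_2 on Mac
        trust_remote_code=True,
    )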

Just posting this here in case someone else faces the issue; it may save some time in troubleshooting.

Flash Attention is not supported on Mac; it is meant for NVIDIA (CUDA) GPUs, which is why the code works once that line is removed.

You can wrap the check in a try/except like this so that the code works in both environments:

    import logging

    logger = logging.getLogger(__name__)
    model_kwargs = {}

    # Only request Flash Attention 2 if the flash_attn package is importable
    # (i.e. on an NVIDIA/CUDA setup); otherwise fall back to default attention.
    try:
        import flash_attn  # noqa: F401
        has_flash_attn = True
        logger.info("Flash Attention 2 is available")
        model_kwargs["attn_implementation"] = "flash_attention_2"
    except ImportError:
        has_flash_attn = False
        logger.info("Flash Attention 2 is not installed - falling back to default attention")
