Strange CUDA Error for Multi-GPU Setup #6
by floschne - opened
Hi,
First of all, thanks for sharing the model and providing such detailed docs!
I'm experiencing a strange CUDA error when running the model on four A40 (46 GB) GPUs with the device map code you provided.
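For reference, this is roughly how I load the model (minimal sketch: the model id below is a placeholder, and the device map is simplified to `device_map="auto"`; in my actual script I use the custom device map from your docs):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id; in practice I load this repo's checkpoint
# with the custom device map from the docs instead of device_map="auto".
model_id = "org/model-name"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",                        # shards layers across the 4 A40s
    attn_implementation="flash_attention_2",  # where varlen_fwd() ends up being called
)
```

The error below is raised during generation: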
```
2025-02-08 23:57:22.458 | ERROR | __main__:main:210 - Error during response generation for sample 0:
varlen_fwd(): incompatible function arguments. The following argument types are supported:
    1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: Optional[torch.Tensor],
        arg4: torch.Tensor, arg5: torch.Tensor, arg6: Optional[torch.Tensor], arg7: Optional[torch.Tensor],
        arg8: Optional[torch.Tensor], arg9: Optional[torch.Tensor], arg10: int, arg11: int, arg12: float,
        arg13: float, arg14: bool, arg15: bool, arg16: int, arg17: int, arg18: float, arg19: bool,
        arg20: Optional[torch.Generator]) -> list[torch.Tensor]

Invoked with: tensor([[[...]]], device='cuda:1', dtype=torch.bfloat16), None,
    tensor([1748], device='cuda:1', dtype=torch.int32), tensor([1748], device='cuda:1', dtype=torch.int32),
    None, None, None, None, 4575867776795852673, 4575867776795852673, 0.0, 0.08838834764831845,
    False, True, -1, -1, 0.0, False, None
```
After some googling, I think it's somehow related to flash-attention, but I'm afraid I can't fix it without some serious monkey patching. I'm using flash-attn==2.7.3, CUDA==12.1, and transformers==4.48.0. Do you know how to fix this?
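For what it's worth, my next step to confirm that flash-attention is the culprit is to swap the attention implementation and see whether the error disappears (hypothetical sketch, assuming the model also supports SDPA; same placeholder model id as above):

```python
# Hypothetical sanity check: load with PyTorch SDPA instead of flash-attention 2
# and check whether the varlen_fwd() error goes away.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="sdpa",  # torch.nn.functional.scaled_dot_product_attention backend
)
```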