Phi-3-small-128k-instruct / triton_flash_blocksparse_attn.py

Commit History

Resolve - 196 [rank0]: triton.runtime.autotuner.OutOfResources: out of resource: shared memory, Required: 180224, Hardware limit: 101376. Reducing block sizes or `num_stages` may help.
794ffcf
verified

moidhassan committed on