MiniMaxAI
/

MiniMax-M1-40k

realolipop committed 9 days ago

Commit 2d768f9 · verified · 1 parent: 759c8f4

Update README.md

Files changed (1)

README.md +25 -0

@@ -153,6 +153,31 @@ foundation for next-generation language model agents to reason and tackle real-w
 \* conducted on the text-only HLE subset.
+### SWE-bench methodology
+We report results obtained with the Agentless scaffold. Unlike the original pipeline, our methodology uses a two-stage localization process with no embedding-based retrieval: coarse-grained file localization first, then fine-grained localization to specific files and code elements. Results for our models are computed on the subset of n=486 SWE-bench Verified tasks that run on our infrastructure. The 14 excluded test cases, which were incompatible with our internal infrastructure, are:
+"astropy__astropy-7606",
+"astropy__astropy-8707",
+"astropy__astropy-8872",
+"django__django-10097",
+"matplotlib__matplotlib-20488",
+"psf__requests-2317",
+"psf__requests-2931",
+"psf__requests-5414",
+"pylint-dev__pylint-6528",
+"pylint-dev__pylint-7277",
+"sphinx-doc__sphinx-10435",
+"sphinx-doc__sphinx-7985",
+"sphinx-doc__sphinx-8269",
+"sphinx-doc__sphinx-8475"
+### TAU-bench methodology
+We evaluate TAU-bench by averaging the pass rate over 5 samples per query, using GPT-4.1 as the user model and no custom tools. Each episode is limited to at most 30 interaction steps.
+We prepend the following general principles to the policy prompt:
+#### General
+- In each round, you need to carefully examine the tools provided to you to determine if any can be used.
+- You must adhere to all of the policies. Pay attention to the details in the terms. Solutions for most situations can be found within these policies.
 ## 3. Deployment Guide
 Download the model from HuggingFace repository:
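
Review note on the methodology added in this hunk: the two reported metrics can be sketched as below. This is a minimal illustration, not the authors' evaluation code. The function names and the input formats (a dict of instance ID to resolved flag, and a dict of query ID to per-sample pass flags) are hypothetical, and reading "average pass rate of 5 samples" as the mean over queries of each query's per-sample pass fraction is an assumption.

```python
from statistics import mean

# The 14 SWE-bench instance IDs that the README lists as excluded.
EXCLUDED_IDS = frozenset({
    "astropy__astropy-7606",
    "astropy__astropy-8707",
    "astropy__astropy-8872",
    "django__django-10097",
    "matplotlib__matplotlib-20488",
    "psf__requests-2317",
    "psf__requests-2931",
    "psf__requests-5414",
    "pylint-dev__pylint-6528",
    "pylint-dev__pylint-7277",
    "sphinx-doc__sphinx-10435",
    "sphinx-doc__sphinx-7985",
    "sphinx-doc__sphinx-8269",
    "sphinx-doc__sphinx-8475",
})


def swe_bench_resolved_rate(resolved_by_id):
    """Fraction of resolved tasks after dropping the excluded instances.

    resolved_by_id: dict mapping instance ID -> bool (hypothetical format).
    """
    kept = [ok for task_id, ok in resolved_by_id.items()
            if task_id not in EXCLUDED_IDS]
    if not kept:
        raise ValueError("no tasks left after exclusion")
    return sum(kept) / len(kept)


def tau_bench_pass_rate(passes_by_query, samples_per_query=5):
    """Mean over queries of the per-query pass fraction (assumed reading).

    passes_by_query: dict mapping query ID -> list of bool, one per sample
    (hypothetical format).
    """
    per_query = []
    for query_id, passes in passes_by_query.items():
        if len(passes) != samples_per_query:
            raise ValueError(f"{query_id}: expected {samples_per_query} samples")
        per_query.append(sum(passes) / len(passes))
    if not per_query:
        raise ValueError("no queries to score")
    return mean(per_query)
```

With all 500 task results as input, the first function scores the 486 tasks that remain once the 14 listed IDs are dropped, so excluded instances count neither for nor against the model.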