Maincoder-1B is a code-focused language model optimized for code generation and completion tasks. The model achieves strong performance on coding benchmarks while maintaining a compact size suitable for local deployment.
Key Features
- Code Generation: Optimized for Python code completion and generation tasks.
- Compact Size: 1 billion parameters, lightweight enough to run on consumer hardware.
- Deep Architecture: Modern transformer architecture with RoPE embeddings, grouped-query attention, QK normalization, and a high depth-to-width ratio.
- Advanced Data Mixing: Pre-trained and mid-trained on custom data mixes developed for high-performance coding.
- MCPO Algorithm: Fine-tuned with a specialised reinforcement learning policy optimisation algorithm to improve training stability and accelerate convergence.
- SOTA Performance: State-of-the-art performance on the Python coding benchmarks HumanEval, HumanEval+, and MBPP+.
Benchmark Results
| Model | HumanEval | HumanEval+ | MBPP+ | MMLU | GSM8K |
|---|---|---|---|---|---|
| Maincode/Maincoder-1B | 0.7622 | 0.7256 | 0.7090 | 0.3054 | 0.2976 |
| deepseek-ai/deepseek-coder-1.3b-instruct | 0.5610 | 0.5305 | 0.6217 | 0.2705 | 0.0413 |
| HuggingFaceTB/SmolLM3-3B | 0.5366 | 0.5000 | 0.6799 | 0.5928 | 0.5505 |
| Qwen/Qwen2.5-Coder-1.5B-Instruct | 0.4634 | 0.4451 | 0.6561 | 0.4984 | 0.4944 |
| Qwen/Qwen3-1.7B | 0.4024 | 0.3780 | 0.5582 | 0.5571 | 0.6865 |
Model Overview
Maincoder uses a modern transformer decoder architecture with:
- Rotary Position Embeddings: RoPE with a base frequency (theta) of 1,000,000.
- RMSNorm: Pre-normalization for stable training.
- Grouped Query Attention: 4:1 ratio of query to key-value heads.
- QK Normalization: RMSNorm applied to attention queries and keys.
- SwiGLU MLP: Gated linear units with SiLU activation.
| Attribute | Value |
|---|---|
| Parameters | 1B |
| Hidden Size | 1536 |
| Layers | 32 |
| Attention Heads | 16 (4 KV heads) |
| Head Dimension | 96 |
| Vocabulary Size | 151,936 |
| Context Length | 2,048 |
| Precision | bfloat16 |
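As a shape-level illustration of how the attention pieces above fit together, the sketch below wires up grouped-query attention with per-head QK RMSNorm using the dimensions from the table. It is illustrative only (random weights; RoPE and the SwiGLU MLP are omitted), not the released implementation, and it assumes a PyTorch version that provides `torch.nn.RMSNorm`.

```python
# Shape sketch of the grouped-query attention described above, using the
# table's dimensions. Random weights, no RoPE, no SwiGLU MLP: illustrative only.
import torch
import torch.nn.functional as F

hidden_size, n_heads, n_kv_heads, head_dim = 1536, 16, 4, 96
batch, seq_len = 1, 16  # any length up to the 2,048-token context

q_proj = torch.nn.Linear(hidden_size, n_heads * head_dim, bias=False)     # 1536 -> 1536
k_proj = torch.nn.Linear(hidden_size, n_kv_heads * head_dim, bias=False)  # 1536 -> 384
v_proj = torch.nn.Linear(hidden_size, n_kv_heads * head_dim, bias=False)  # 1536 -> 384
o_proj = torch.nn.Linear(n_heads * head_dim, hidden_size, bias=False)
q_norm = torch.nn.RMSNorm(head_dim)  # QK normalization: RMSNorm per head
k_norm = torch.nn.RMSNorm(head_dim)

x = torch.randn(batch, seq_len, hidden_size)
q = q_norm(q_proj(x).view(batch, seq_len, n_heads, head_dim)).transpose(1, 2)
k = k_norm(k_proj(x).view(batch, seq_len, n_kv_heads, head_dim)).transpose(1, 2)
v = v_proj(x).view(batch, seq_len, n_kv_heads, head_dim).transpose(1, 2)

# Grouped-query attention: each of the 4 KV heads serves 16 / 4 = 4 query heads.
k = k.repeat_interleave(n_heads // n_kv_heads, dim=1)
v = v.repeat_interleave(n_heads // n_kv_heads, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
out = o_proj(out.transpose(1, 2).reshape(batch, seq_len, n_heads * head_dim))
print(out.shape)  # torch.Size([1, 16, 1536])
```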
Usage
Installation
```bash
pip install transformers torch
```
Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Maincode/Maincoder-1B",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "Maincode/Maincoder-1B",
    trust_remote_code=True,
)

# Code completion example
prompt = '''def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number."""
'''

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.2,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
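The decode above prints the prompt together with its completion. If only the generated continuation is wanted, the prompt tokens can be sliced off first; a small sketch continuing from the variables in the Quick Start:

```python
# Keep only the tokens generated after the prompt.
prompt_len = inputs["input_ids"].shape[1]
completion = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(completion)
```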
Code Completion
```python
# Function completion
prompt = '''def quicksort(arr: list) -> list:
    """Sort a list using the quicksort algorithm."""
'''

# Class completion
prompt = '''class BinarySearchTree:
    """A binary search tree implementation."""
    def __init__(self):
'''

# Algorithm implementation
prompt = '''def dijkstra(graph: dict, start: str, end: str) -> tuple:
    """Find the shortest path using Dijkstra's algorithm.
    Args:
        graph: Adjacency list representation of the graph
        start: Starting node
        end: Target node
    Returns:
        Tuple of (distance, path)
    """
'''
```
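Each of these prompts can be run through the same `generate()` call shown in the Quick Start. A small helper (a sketch that reuses the `model` and `tokenizer` loaded above; the `complete` name is just for illustration) keeps that loop in one place:

```python
def complete(prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a completion for a code prompt (sketch; reuses the Quick Start model/tokenizer)."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=0.2,
        do_sample=True,
    )
    # Return only the newly generated tokens.
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

print(prompt + complete(prompt))
```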
Additional Notes
Reproducibility
Model evaluations were run on 8 AMD MI355X GPUs using the EleutherAI lm-evaluation-harness.
```bash
docker run --rm -it \
  --device=/dev/kfd --device=/dev/dri --group-add=video \
  --ipc=host --security-opt seccomp=unconfined \
  -v $(pwd):/workspace -w /workspace \
  -e HF_TOKEN \
  -e PYTHONHASHSEED=0 \
  -e TORCH_DETERMINISTIC=1 \
  -e ROCBLAS_ATOMICS_MODE="0" \
  -e MIOPEN_FIND_MODE="1" \
  -e CUBLAS_WORKSPACE_CONFIG=":4096:8" \
  -e HF_ALLOW_CODE_EVAL="1" \
  rocm/pytorch:rocm7.1.1_ubuntu24.04_py3.12_pytorch_release_2.9.1 \
  bash -c 'pip install "lm_eval[hf]" && \
    accelerate launch -m lm_eval \
    --model hf --model_args "pretrained=Maincode/Maincoder-1B,trust_remote_code=True,dtype=float32" \
    --tasks humaneval,humaneval_plus,mbpp_plus,mmlu,gsm8k \
    --device cuda:0 --batch_size 32 --seed 42 \
    --confirm_run_unsafe_code'
```
Limitations
- Context length limited to 2,048 tokens (see the truncation sketch after this list)
- Primarily optimized for Python; performance may vary on other languages
- May generate code with bugs or security issues; always review generated code
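Because of the 2,048-token window, long prompts need to be trimmed before generation. A minimal sketch using the tokenizer's built-in truncation (the token budget here is illustrative); left-side truncation keeps the most recent code, which usually matters most for completion:

```python
# Keep the prompt within the 2,048-token context window,
# leaving room for the tokens we plan to generate (illustrative budget).
tokenizer.truncation_side = "left"  # drop the oldest code, keep the most recent
max_new_tokens = 256
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    truncation=True,
    max_length=2048 - max_new_tokens,
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```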
Disclaimer: This model has not undergone any alignment or safety tuning (e.g., RLHF/RLAIF, DPO, or safety fine-tuning). Outputs may be unsafe or biased. Please use appropriate safeguards and evaluate carefully for your use case.
License
This model is released under the Apache 2.0 License.
Citation
```bibtex
@misc{maincoder2025,
  title        = {Maincoder-1B: A High-Performance 1B Parameter Coding Model},
  author       = {Maincode Team},
  year         = {2025},
  organization = {Maincode},
  howpublished = {\url{https://huggingface.co/Maincode/Maincoder-1B}}
}
```
Contact
For questions, issues, or collaboration inquiries, please visit Maincode.