Runnable with vLLM| Model | Link |
|---|---|
| IQuest-Coder-V1-40B-Base-Stage1 | 🤗 Hugging Face |
| IQuest-Coder-V1-40B-Base | 🤗 Hugging Face |
| IQuest-Coder-V1-40B-Instruct | 🤗 Hugging Face |
| IQuest-Coder-V1-40B-Loop-Instruct | 🤗 Hugging Face |
Clarification: Regarding the Performance of IQuest-Coder-V1
IQuest-Coder-V1 is a new family of code large language models (LLMs) designed to advance autonomous software engineering and code intelligence. Built on the innovative code-flow multi-stage training paradigm, IQuest-Coder-V1 captures the dynamic evolution of software logic, delivering state-of-the-art performance across critical dimensions:
The IQuest-Coder-V1 series includes models ranging from 7B to 40B parameters, with both standard and Loop variants:
| Model | Parameters | Layers | Hidden Size | Attention Heads (Q/KV) | Context Length |
|---|---|---|---|---|---|
| IQuest-Coder-V1-7B-Instruct | 7B | 14 | 5120 | 40/8 | 128K |
| IQuest-Coder-V1-7B-Thinking | 7B | 14 | 5120 | 40/8 | 128K |
| IQuest-Coder-V1-14B-Instruct | 14B | 28 | 5120 | 40/8 | 128K |
| IQuest-Coder-V1-14B-Thinking | 14B | 28 | 5120 | 40/8 | 128K |
| IQuest-Coder-V1-40B-Instruct | 40B | 80 | 5120 | 40/8 | 128K |
| IQuest-Coder-V1-40B-Thinking | 40B | 80 | 5120 | 40/8 | 128K |
| IQuest-Coder-V1-40B-Loop-Instruct | 40B | 80 (2 iterations) | 5120 | 40/8 | 128K |
| IQuest-Coder-V1-40B-Loop-Thinking | 40B | 80 (2 iterations) | 5120 | 40/8 | 128K |
Architecture Features:
For more details, please refer to our Technical Report, GitHub.
IQuest-Coder-V1 uses custom modeling code via Hugging Face's auto_map feature. We recommend using transformers>=4.52.4.
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "IQuest/IQuest-Coder-V1-40B-Instruct"
# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype="auto",
device_map="auto"
)
# Prepare the input
prompt = "Write a Python function to calculate the Fibonacci sequence using dynamic programming."
messages = [
{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# Generate response
generated_ids = model.generate(
**model_inputs,
max_new_tokens=8192
)
generated_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
response = tokenizer.decode(generated_ids, skip_special_tokens=True)
print(response)
For complex reasoning tasks, use the Thinking variant:
model_name = "IQuest/IQuest-Coder-V1-40B-Thinking"
# The Thinking model includes explicit reasoning traces
# Use similar code as above, but expect longer, more detailed responses
# with step-by-step problem decomposition
For production deployment, you can use vLLM to create an OpenAI-compatible API endpoint. Please refer to the vLLM PR for implementation details.
vllm serve IQuestLab/IQuest-Coder-V1-40B-Instruct --tensor-parallel-size 8
For Thinking models with reasoning support:
vllm serve IQuestLab/IQuest-Coder-V1-40B-Thinking --reasoning-parser qwen3 --tensor-parallel-size 8

If you find our work helpful, please cite:
@article{iquest-coder-v1-2025,
title={IQuest-Coder-V1 Technical Report},
author={IQuest Coder Team},
url={https://github.com/IQuestLab/IQuest-Coder-V1/blob/main/papers/IQuest_Coder_Technical_Report.pdf}
year={2025}
}
```bibtex
@article{codescaling,
title={Scaling Laws for Code: Every Programming Language Matters}
, author={Yang, Jian and Guo, Shawn and Jing, Lin and Zhang, Wei and Liu, Aishan and Hao, Chuan and Li, Zhoujun and Zhao, Wayne Xin and Liu, Xianglong and Lv, Weifeng and others}, journal={arXiv preprint arXiv:2512.13472}, year={2025} }
@article{close_the_loop,
title={Close the Loop: Synthesizing Infinite Tool-Use Data via Multi-Agent Role-Playing}
, author={Yuwen Li, Wei Zhang, Zelong Huang, Mason Yang, Jiajun Wu, Shawn Guo, Huahao Hu, Lingyi Sun, Jian Yang, Mingjie Tang, Byran Dai}, journal={arXiv preprint arXiv:2512.23611}, year={2025} }
@article{loopcoder,
title={LoopCoder: Scaling Code Intelligence via Looped Language Models}
, author={Jian Yang, Wei Zhang, Shawn Guo, Yizhi Li, Lin Jing, Zhengmao Ye, Shark Liu, Yuyang Song, Jiajun Wu, Che Liu, T. Zheng, Siwei Wu, L. Liao, X. Ma, Chuan Hao, Ran Tao, Yan Xing, Jianzhou Wang, Mingjie Tang, Aishan Liu, Zhoujun Li, Xianglong Liu, Weifeng Lv1, Bryan Dai}, year={2025} }
@article{swe_compress,
title={Context as a Tool: Context Management for Long-Horizon SWE-Agents}
, author={hukai Liu, Jian Yang, Bo Jiang, Yizhi Li, Jinyang Guo, Xianglong Liu, Bryan Dai}, journal={arXiv preprint arXiv:2512.22087}, year={2025} }