🏆 #1 on SWE-bench Lite

Vinsoo Agent

Cloud-Based Secure AI Agent for Software Development

The first to achieve SOTA on SWE-bench Lite using a cost-effective model. This 88.67% result reduces complex engineering costs by ~10×, making advanced AI coding accessible worldwide. We are confident our agent strategies can further reduce costs by 1000×.

88.67% Pass@1

266/300 Solved

qwen3-max Model

Why This Matters

SWE-bench Lite is a challenging benchmark that evaluates AI systems on real-world GitHub issues from popular open-source Python projects. Each task requires understanding complex codebases, diagnosing bugs, implementing fixes, and passing test suites — mirroring the daily work of professional software engineers.

🥇

First Place

Highest score among all evaluated systems on SWE-bench Lite

💰

Cost-Effective Excellence

Accessible to individuals and enterprises worldwide

⚡

Production-Ready

Driving broader adoption of AI programming across society

Abstract

Vinsoo is an AI coding agent launched by AIYouthLab. Using the cost-effective Qwen3-max model, Vinsoo achieved 88.67% on SWE-bench Lite (266/300 tasks).

Vinsoo independently drives the entire development process without human intervention. It is now in invite-only release and will soon be available to everyone. The goal is to dramatically boost productivity and, ultimately, to enable AI to generate income autonomously.

About AIYouthLab

AIYouthLab is a young startup team building cloud-based, secure AI agents for software development.

Founder & CEO: Xiaoyue Yin (LinkedIn).

Our mission is to make software creation accessible to everyone by dramatically lowering the cost and friction of building real products.

SWE-bench Lite Leaderboard

Rank	System	Model	Resolved
🥇 1	Vinsoo Agent	Qwen3-max	88.67%
2	ExpeRepair-v1.0	Claude 4 Sonnet	60.33%
3	Refact.ai Agent	Claude 3.7 + o4-mini	60.00%
4	KGCompass	Claude 4 Sonnet	58.33%
5	SWE-agent	Claude 4 Sonnet	56.67%
6	EntroPO + R2E	Qwen3-Coder-30B	49.67%
7	ExpeRepair-v1.0	-	48.33%
8	SWE-agent	Claude 3.7 Sonnet	48.00%
9	DARS Agent	-	47.00%
10	KGCompass	Claude 3.5 Sonnet	46.00%

* Data from SWE-bench official leaderboard as of January 9, 2026.

Agent Workflow

Vinsoo Agent follows a structured, iterative workflow for each SWE-bench task. This serves as high-level guidance — the Agent adapts its strategy based on task context.

Problem Analysis

Parse the issue description. Identify key symptoms, expected behavior, and affected components.

Repository Investigation

Explore the codebase structure. Locate relevant files, understand dependencies, and map the architecture.

Reproduction Script

Create and execute a script to reproduce the bug. Validate that the issue exists before attempting fixes.

Plan & Implement

Analyze codebase, formulate strategy, and apply modifications via modify_script.

Apply & Test

Implement the fix, run tests, and verify that the original issue is resolved without introducing regressions.

Iterate Until Solved

If validation fails, analyze the failure via create_bug_report, return to Plan & Implement (step 4), and iterate until the issue is solved.

The Agent autonomously decides when to skip steps, repeat phases, or invoke additional reasoning. There is no rigid script — only strategic guidance.
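The iterate-until-solved part of this workflow can be pictured as a small loop. The following is a minimal sketch, not Vinsoo's actual implementation: `solve_task`, `plan_and_implement`, `run_tests`, and the iteration budget are all hypothetical names introduced for illustration, standing in for the agent's planning, modify_script, and test tooling.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    """One pass through plan -> implement -> test."""
    patch: str
    passed: bool
    bug_report: Optional[str] = None


def solve_task(
    issue: str,
    plan_and_implement: Callable[[str, List[str]], str],
    run_tests: Callable[[str], bool],
    max_iterations: int = 5,  # hypothetical budget, not from the source
) -> List[Attempt]:
    """Repeat plan/implement/test until validation passes or the budget ends."""
    attempts: List[Attempt] = []
    bug_reports: List[str] = []
    for i in range(max_iterations):
        # Earlier bug reports feed back into the next planning round.
        patch = plan_and_implement(issue, bug_reports)
        passed = run_tests(patch)
        report = None if passed else f"attempt {i + 1} failed validation"
        attempts.append(Attempt(patch, passed, report))
        if passed:
            break
        bug_reports.append(report)
    return attempts


# Toy stand-ins: the "fix" only passes on the third try.
def toy_planner(issue: str, reports: List[str]) -> str:
    return f"patch-v{len(reports) + 1}"


def toy_tests(patch: str) -> bool:
    return patch == "patch-v3"


history = solve_task("example issue", toy_planner, toy_tests)
print([(a.patch, a.passed) for a in history])
```

In this toy run the loop stops after the third attempt, and each failed attempt leaves a bug report that the next planning round can read.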

System Overview

End-to-End Automation

While some AI coding solutions take a semi-autonomous approach — requiring users to manually invoke tools and guide each step — Vinsoo Agent operates independently from start to finish.

It breaks the complex decision-making process down into deterministic engineering tasks, making the agent's behavior decomposable and traceable.

Systemic Perception

Traditional AI coding tools only see what's in the context window. Vinsoo transforms "imperceptible" data into structured event streams.

This eliminates information barriers between AI and complex scenarios, enabling full system state reasoning.

Technical Approach

Model Choice: Qwen3-max

Vinsoo uses Qwen3-max — selected for its optimal cost-performance ratio.

Context: Extended via DYCODE/COTER

Ultra-Long Context Engineering

Real-world codebases often exceed millions of tokens. Vinsoo achieves effective context scaling through two proprietary strategies:

DYCODE (Dynamic Code Encoding): Retains only mapping encodings for reconstructible information and restores the full content through a reverse decoder when needed.
COTER: Macroscopically regulates dynamic encoding based on global topological data modeling and entropy distribution prediction.
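DYCODE and COTER are proprietary and their internals are not described here. As one possible reading of "retain only a mapping encoding and restore on demand", the sketch below keeps full text in a side store and leaves a short placeholder in the context. `MappingStore`, the `<<ref:...>>` placeholder format, and the use of SHA-256 are assumptions made for illustration only.

```python
import hashlib
from typing import Dict


class MappingStore:
    """Hypothetical side store: the context keeps a short key, not the full text."""

    PREFIX = "<<ref:"
    SUFFIX = ">>"

    def __init__(self) -> None:
        self._blobs: Dict[str, str] = {}

    def encode(self, text: str) -> str:
        # Content-addressed key, so identical text always maps to the same placeholder.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._blobs[key] = text
        return f"{self.PREFIX}{key}{self.SUFFIX}"

    def decode(self, placeholder: str) -> str:
        # "Reverse decoding": look the original text up again by its key.
        key = placeholder[len(self.PREFIX):-len(self.SUFFIX)]
        return self._blobs[key]


store = MappingStore()
file_body = "def add(a, b):\n    return a + b\n" * 50
ref = store.encode(file_body)

print(len(file_body), "characters replaced by", len(ref))
restored = store.decode(ref)
```

The placeholder costs a fixed number of characters regardless of file size, which is the property that makes this kind of encoding attractive for long contexts.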

Automated Validation Pipeline

One of Vinsoo's key innovations is the automated validation pipeline — ensuring code quality through LLM-powered testing.

📝

Test Generation

Auto-generate test cases based on code changes.

→

⚙️

Execute & Record

Hash-based detection for regression tracking.

→

✅

LLM Validate

Beyond simple pass/fail checks.

Runs automatically after each code modification.
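The page does not say how hash-based detection works. One common way to do it, shown here as a sketch under that assumption, is to fingerprint each test case's recorded output and flag any case whose fingerprint changes after an edit. The function names and the sample test names are hypothetical.

```python
import hashlib
from typing import Dict, List


def fingerprint(output: str) -> str:
    """Stable hash of a test case's recorded output."""
    return hashlib.sha256(output.encode("utf-8")).hexdigest()


def detect_changes(baseline: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Names of previously recorded test cases whose output hash now differs."""
    return sorted(
        name
        for name, output in current.items()
        if name in baseline and fingerprint(output) != baseline[name]
    )


# Baseline hashes recorded before the code modification.
baseline = {"test_add": fingerprint("3"), "test_sub": fingerprint("1")}

# Raw outputs recorded after the modification; test_new has no baseline yet.
after_edit = {"test_add": "3", "test_sub": "-1", "test_new": "0"}

changed = detect_changes(baseline, after_edit)
print(changed)
```

A changed hash only says that behavior moved, not whether the move is a regression or the intended fix, which is presumably where the LLM validation step comes in.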

Tool Suite

Vinsoo is equipped with specialized tools covering the complete software engineering workflow:

Code Exploration

search_codebase
read_file_content
read_file_lines
get_directory_structure
read_picture_content

Code Editing

modify_script

Execution

run_command
get_session_log

Testing & Validation

generate_script_test_case_group
add_script_test_case
submit_and_validate_script_testing_records
generate_task_json

LLM Auto-Validation
Hash-Based Detection

Bug Tracking

create_bug_report
transfer_bug
resolve_bug

Role Verification
Lifecycle Logs

Execution Tracing

record_action_start
record_action_complete
get_execution_logs

Full Audit
State Tracing
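Only the tool names are given here, not their signatures. As a rough illustration of how start/complete records can add up to an audit trail, the sketch below reuses the three tracing tool names as methods on a hypothetical `ExecutionTrace` class. The parameters, return values, and log fields are assumptions.

```python
import time
from typing import Dict, List


class ExecutionTrace:
    """Hypothetical in-memory audit log keyed by action id."""

    def __init__(self) -> None:
        self._events: List[Dict[str, object]] = []

    def record_action_start(self, action: str) -> int:
        self._events.append(
            {
                "action": action,
                "status": "started",
                "started_at": time.time(),
                "finished_at": None,
            }
        )
        return len(self._events) - 1  # id used to close the action later

    def record_action_complete(self, action_id: int, ok: bool) -> None:
        event = self._events[action_id]
        event["status"] = "succeeded" if ok else "failed"
        event["finished_at"] = time.time()

    def get_execution_logs(self) -> List[Dict[str, object]]:
        # Return copies so callers cannot rewrite history.
        return [dict(event) for event in self._events]


trace = ExecutionTrace()
step = trace.record_action_start("run_command: pytest -q")
trace.record_action_complete(step, ok=False)
logs = trace.get_execution_logs()
print(logs[0]["action"], logs[0]["status"])
```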
                        

System Infrastructure

Four core components enabling autonomous software development:

📦

Codebase Indexing

Intelligent indexing and semantic search.

21 Languages, Vector Search, Incremental

🌐

Workflow Orchestration

Task scheduling and state management.

Task Tracking, Knowledge Base, LLM Validation

✅

Runtime Detection

Monitors execution and validates outcomes.

Action Monitor, Test Reports, Validation

🖥️

Execution Environment

Isolated sandbox with full audit trails.

Cloud Sandbox, Multi-Runtime, Session Mgmt

Methodology

Our evaluation follows SWE-bench Lite guidelines strictly:

Input: Directly uses the problem_statement from SWE-bench Lite — no additional hints.
Model: Dashscope qwen3-max
Submission: Pass@1 — single attempt per task.
Compliance: No test knowledge used. No hints. No web browsing. Full autonomous operation.
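The constraints above can be made concrete with a small sketch of what goes in and what comes out per task. The dataset field names (`problem_statement`, `hints_text`, `test_patch`, and so on) and the prediction keys (`instance_id`, `model_name_or_path`, `model_patch`) follow the public SWE-bench format. Everything else is an assumption: the helper names, the idea that `repo` and `base_commit` are passed along for checkout, the model label, and the sample task are hypothetical.

```python
import json
from typing import Dict

# Assumption: repo and base_commit are needed to check out the code;
# problem_statement is the only task description the agent sees.
ALLOWED_FIELDS = ("instance_id", "repo", "base_commit", "problem_statement")


def agent_input(task: Dict[str, str]) -> Dict[str, str]:
    """Drop hints, gold patch, and test information before the agent sees the task."""
    return {name: task[name] for name in ALLOWED_FIELDS}


def prediction_line(instance_id: str, patch: str, model_name: str) -> str:
    """One JSON line in the SWE-bench predictions format (single attempt per task)."""
    return json.dumps(
        {
            "instance_id": instance_id,
            "model_name_or_path": model_name,
            "model_patch": patch,
        }
    )


# Made-up task record with the same field names as the dataset.
task = {
    "instance_id": "example__repo-1",
    "repo": "example/repo",
    "base_commit": "0000000",
    "problem_statement": "Calling foo() with an empty list raises IndexError.",
    "hints_text": "withheld",
    "patch": "withheld",
    "test_patch": "withheld",
    "FAIL_TO_PASS": "withheld",
    "PASS_TO_PASS": "withheld",
}

visible = agent_input(task)
line = prediction_line(task["instance_id"], "diff --git a/foo.py b/foo.py", "vinsoo-qwen3-max")
print(sorted(visible))
```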

Evaluation Results

Out of 300 tasks in SWE-bench Lite, Vinsoo Agent solved 266 — achieving an 88.67% success rate.

Metric	Value
Total Tasks	300
Solved Tasks	266
Unsolved Tasks	34
Success Rate	88.67%
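The headline figures follow directly from the task counts. A quick check, using the 60.33% runner-up score from the leaderboard on this page:

```python
total_tasks = 300
solved_tasks = 266
runner_up_rate = 60.33  # second place on the leaderboard, in percent

unsolved_tasks = total_tasks - solved_tasks
success_rate = round(100 * solved_tasks / total_tasks, 2)
margin = round(success_rate - runner_up_rate, 2)

print(unsolved_tasks)  # 34
print(success_rate)    # 88.67
print(margin)          # 28.34
```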

Impact for Developers

This isn't just about benchmarks — it's about real-world impact:

🚀

Performance Breakthrough

88.67% solve rate, more than 28 percentage points ahead of #2.

💰

Cost Breakthrough

SOTA with cost-effective Qwen3-max.

⚡

10x Faster Bug Fixes

Hours of debugging solved in minutes. You just review.

🎯

Focus on Creative Work

Delegate routine work. Focus on architecture and innovation.

🛡️

Reliable, Tested Code

Writes, tests, validates, and iterates. No 'it compiles' handoffs.

Experience Vinsoo Today

The same AI Agent is available in Vinsoo — ready to accelerate your workflow.
