Commit 48e42c7

Author: Tajudeen
Update vision capability detection to include GPT-5 series, o-series models, and Mistral Pixtral
- Added GPT-5 series (gpt-5.1, gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-pro) to vision detection
- Added o-series reasoning models (o1, o3, o4-mini) to vision detection
- Added Mistral Pixtral models (pixtral-large, pixtral-12b) to vision detection
- Updated vision detection in modelRouter.ts and chatThreadService.ts
- Updated routingSmokeTests.ts to include new vision models in test validation
- Updated documentation to reflect expanded vision model support
1 parent b942c6a commit 48e42c7

4 files changed: 47 additions, 9 deletions

docs/CortexIDE-vs-Other-AI-Editors.md

Lines changed: 7 additions & 2 deletions
@@ -340,7 +340,7 @@ This comparison is based on:
 **CortexIDE**: ✅ **Yes** - Extensive model support verified in `modelCapabilities.ts`:
 - **15+ providers**: OpenAI, Anthropic, xAI, Gemini, DeepSeek, Groq, Mistral, OpenRouter, Ollama, vLLM, LM Studio, OpenAI-compatible, LiteLLM, Google Vertex, Microsoft Azure, AWS Bedrock
 - **Reasoning models**: o1, o3, Claude 3.7/4, DeepSeek R1, QwQ, Qwen3, Phi-4
-- **Vision models**: GPT-4o, Claude 3.5/4, Gemini, local VLMs
+- **Vision models**: GPT-4o, GPT-4.1, GPT-5 series, o-series (o1, o3, o4-mini), Claude 3.5/4, Gemini (all models), Pixtral, local VLMs
 - **FIM models**: Codestral, Qwen2.5-coder, StarCoder2
 
 **Cursor**: ✅ **Yes** - Wide model support.
@@ -361,6 +361,8 @@ This comparison is based on:
 - Vision-capable model detection (verified in `modelRouter.ts:1400-1417`)
 - Image QA registry (verified in `imageQARegistryContribution.ts`)
 - Multimodal message handling (verified in `convertToLLMMessageService.ts`)
+- Supports image uploads for: GPT-4o, GPT-4.1, GPT-5 series, o-series, Claude 3.5/4, Gemini (all), Pixtral, local VLMs
+- PDF upload support with text extraction and vision-based processing
 
 **Cursor**: ✅ **Yes** - Vision model support.
 
@@ -500,10 +502,11 @@ For a detailed list of models supported by CortexIDE, see the [Supported Models
 
 CortexIDE supports 15+ providers with 100+ models, including:
 - Reasoning models (o1, o3, Claude 3.7/4, DeepSeek R1, QwQ, Qwen3, Phi-4)
-- Vision models (GPT-4o, Claude 3.5/4, Gemini, local VLMs)
+- Vision models (GPT-4o, GPT-4.1, GPT-5 series, o-series, Claude 3.5/4, Gemini, Pixtral, local VLMs)
 - FIM models (Codestral, Qwen2.5-coder, StarCoder2)
 - Local models (Ollama, vLLM, LM Studio)
 
+
 ## Conclusion
 
 CortexIDE stands out as the **only fully open-source AI code editor** with:
@@ -526,3 +529,5 @@ While other tools excel in specific areas (Cursor's polish, Continue.dev's VS Co
 
 If you find any inaccuracies, please [open an issue](https://github.com/cortexide/cortexide/issues/new) with corrections and sources.
 
+
+

src/vs/workbench/contrib/cortexide/browser/chatThreadService.ts

Lines changed: 14 additions & 1 deletion
@@ -818,7 +818,20 @@ class ChatThreadService extends Disposable implements IChatThreadService {
         return name.includes('3.5') || name.includes('3.7') || name.includes('4') || name.includes('opus') || name.includes('sonnet');
     }
     if (provider === 'openai') {
-        return name.includes('4o') || name.includes('4.1') || name.includes('gpt-4');
+        // GPT-5 series (all variants support vision)
+        if (name.includes('gpt-5') || name.includes('gpt-5.1')) return true;
+        // GPT-4.1 series
+        if (name.includes('4.1')) return true;
+        // GPT-4o series
+        if (name.includes('4o')) return true;
+        // o-series reasoning models (o1, o3, o4-mini support vision)
+        if (name.startsWith('o1') || name.startsWith('o3') || name.startsWith('o4')) return true;
+        // Legacy GPT-4 models
+        if (name.includes('gpt-4')) return true;
+    }
+    if (provider === 'mistral') {
+        // Pixtral models support vision
+        if (name.includes('pixtral')) return true;
     }
     if (provider === 'ollama' || provider === 'vllm') {
         return name.includes('llava') || name.includes('bakllava') || name.includes('vision');
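
Review note: the o-series check in this hunk matches by prefix, so names such as `o1-mini`, `o1-preview`, and `o3-mini` are also classified as vision-capable. Those variants appear to be text-only according to OpenAI's model documentation; treat that as an assumption to verify before relying on it. A minimal sketch of a narrower check follows. The helper name `isVisionOSeries` and the exclusion list are hypothetical and not part of this commit.

```typescript
// Hypothetical helper, not part of the commit.
// Assumption: these o-series variants do not accept image input.
const TEXT_ONLY_O_SERIES: readonly string[] = ['o1-mini', 'o1-preview', 'o3-mini'];

function isVisionOSeries(rawName: string): boolean {
    const name = rawName.toLowerCase();
    // Same prefix test as the commit uses.
    const isOSeries = name.startsWith('o1') || name.startsWith('o3') || name.startsWith('o4');
    if (!isOSeries) {
        return false;
    }
    // Exclude by prefix so dated snapshots of the excluded variants are covered too.
    return !TEXT_ONLY_O_SERIES.some(prefix => name.startsWith(prefix));
}

console.log(isVisionOSeries('o4-mini')); // true
console.log(isVisionOSeries('o3-mini')); // false
```

The `name.includes('gpt-5.1')` test in the hunk is also redundant, since any name containing `gpt-5.1` already contains `gpt-5`.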

src/vs/workbench/contrib/cortexide/common/modelRouter.ts

Lines changed: 14 additions & 1 deletion
@@ -1407,7 +1407,20 @@ export class TaskAwareModelRouter extends Disposable implements ITaskAwareModelR
         return name.includes('3.5') || name.includes('3.7') || name.includes('4') || name.includes('opus') || name.includes('sonnet');
     }
     if (provider === 'openai') {
-        return name.includes('4o') || name.includes('4.1') || name.includes('gpt-4');
+        // GPT-5 series (all variants support vision)
+        if (name.includes('gpt-5') || name.includes('gpt-5.1')) return true;
+        // GPT-4.1 series
+        if (name.includes('4.1')) return true;
+        // GPT-4o series
+        if (name.includes('4o')) return true;
+        // o-series reasoning models (o1, o3, o4-mini support vision)
+        if (name.startsWith('o1') || name.startsWith('o3') || name.startsWith('o4')) return true;
+        // Legacy GPT-4 models
+        if (name.includes('gpt-4')) return true;
+    }
+    if (provider === 'mistral') {
+        // Pixtral models support vision
+        if (name.includes('pixtral')) return true;
     }
     if (provider === 'ollama' || provider === 'vllm') {
         return name.includes('llava') || name.includes('bakllava') || name.includes('vision');
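
Review note: this hunk adds the same lines as the one in chatThreadService.ts, so the two copies can drift apart over time. One possible way to keep them in sync is a single shared predicate that both files call. The sketch below is illustrative only: `isVisionCapable`, `VISION_RULES`, and `localVision` are hypothetical names, only the provider branches fully visible in the diff are reproduced, and the fallback of `false` for unlisted providers is an assumption because the rest of the original function is not shown.

```typescript
type NamePredicate = (name: string) => boolean;

// Local runtimes share one rule, copied from the context lines of the diff.
function localVision(name: string): boolean {
    return name.includes('llava') || name.includes('bakllava') || name.includes('vision');
}

// Hypothetical rule table keyed by lowercase provider name.
const VISION_RULES = new Map<string, NamePredicate>([
    ['openai', name =>
        name.includes('gpt-5') ||   // also covers gpt-5.1, gpt-5-mini, gpt-5-nano, gpt-5-pro
        name.includes('4.1') ||
        name.includes('4o') ||
        name.startsWith('o1') || name.startsWith('o3') || name.startsWith('o4') ||
        name.includes('gpt-4')],
    ['mistral', name => name.includes('pixtral')],
    ['ollama', localVision],
    ['vllm', localVision],
]);

// Assumption: providers without a rule are treated as not vision-capable.
function isVisionCapable(providerName: string, modelName: string): boolean {
    const rule = VISION_RULES.get(providerName.toLowerCase());
    return rule !== undefined && rule(modelName.toLowerCase());
}

console.log(isVisionCapable('mistral', 'pixtral-12b')); // true
console.log(isVisionCapable('mistral', 'mistral-large')); // false
```

A table keyed by provider keeps each provider's rule in one place, so adding a model family becomes a one-line change instead of an edit in two files.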

src/vs/workbench/contrib/cortexide/common/routingSmokeTests.ts

Lines changed: 12 additions & 5 deletions
@@ -174,11 +174,18 @@ async function testImageScreenshot(router: ITaskAwareModelRouter): Promise<Smoke
 
     try {
         const decision = await router.route(context);
-        const isVisionModel = decision.modelSelection.providerName.toLowerCase() === 'gemini' ||
-            decision.modelSelection.modelName.toLowerCase().includes('4o') ||
-            decision.modelSelection.modelName.toLowerCase().includes('4.1') ||
-            decision.modelSelection.modelName.toLowerCase().includes('claude') ||
-            decision.modelSelection.modelName.toLowerCase().includes('llava');
+        const modelName = decision.modelSelection.modelName.toLowerCase();
+        const provider = decision.modelSelection.providerName.toLowerCase();
+        const isVisionModel = provider === 'gemini' ||
+            modelName.includes('gpt-5') ||
+            modelName.includes('4.1') ||
+            modelName.includes('4o') ||
+            modelName.startsWith('o1') ||
+            modelName.startsWith('o3') ||
+            modelName.startsWith('o4') ||
+            modelName.includes('claude') ||
+            modelName.includes('pixtral') ||
+            modelName.includes('llava');
 
         return {
             name: 'Image screenshot',
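
Review note: the predicate in this hunk is a separate, looser check than the one in the router (it accepts any Gemini provider and any name containing `claude`, and ignores the provider for every other test). To see how it classifies the model names listed in the commit message, the predicate can be lifted into a standalone function and run against a small table of cases. The function name `looksLikeVisionModel` and the sample table are hypothetical and exist only for illustration.

```typescript
// Standalone copy of the smoke-test predicate from the hunk above.
function looksLikeVisionModel(providerName: string, rawModelName: string): boolean {
    const provider = providerName.toLowerCase();
    const modelName = rawModelName.toLowerCase();
    return provider === 'gemini' ||
        modelName.includes('gpt-5') ||
        modelName.includes('4.1') ||
        modelName.includes('4o') ||
        modelName.startsWith('o1') ||
        modelName.startsWith('o3') ||
        modelName.startsWith('o4') ||
        modelName.includes('claude') ||
        modelName.includes('pixtral') ||
        modelName.includes('llava');
}

// [provider, model, expected result]; sample names are illustrative.
const samples: Array<[string, string, boolean]> = [
    ['openai', 'gpt-5-mini', true],
    ['openai', 'o4-mini', true],
    ['mistral', 'pixtral-large', true],
    ['mistral', 'codestral', false],
    ['openai', 'gpt-3.5-turbo', false],
];

const failures = samples.filter(([p, m, expected]) => looksLikeVisionModel(p, m) !== expected);
console.log(failures.length === 0 ? 'all cases pass' : failures);
```

One divergence worth noting: a legacy name such as `gpt-4-turbo` satisfies the router's `name.includes('gpt-4')` check but not this predicate, so the smoke test would report a non-vision model if the router selected it for an image task.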
