Reasoning is not working for the Gemini model when using langchain_google_vertexai.ChatVertexAI #34267

@Arpit-Pandey-MachineLearning-ak

Description

Checked other resources

  • This is a bug, not a usage question.
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • This is not related to the langchain-community package.
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Package (Required)

  • langchain
  • langchain-openai
  • langchain-anthropic
  • langchain-classic
  • langchain-core
  • langchain-cli
  • langchain-model-profiles
  • langchain-tests
  • langchain-text-splitters
  • langchain-chroma
  • langchain-deepseek
  • langchain-exa
  • langchain-fireworks
  • langchain-groq
  • langchain-huggingface
  • langchain-mistralai
  • langchain-nomic
  • langchain-ollama
  • langchain-perplexity
  • langchain-prompty
  • langchain-qdrant
  • langchain-xai
  • Other / not sure / general

Example Code (Python)

from typing import Any, Dict, Optional

from langchain_google_vertexai import ChatVertexAI

# `ModelConfigBase`, `creds`, and `InitializeModelClass` are project-specific
# and defined elsewhere in my codebase.


class GeminiChatModelConfig(ModelConfigBase):
    """
    Configuration for Google Gemini models using LangChain.
    """

    def __init__(
        self,
        buffer_token: int,
        answer_token: int,
        memory_token: int,
        max_token_limit: int,
        model_id: str,
        model_kwargs: Optional[Dict[str, Any]] = None,
        streaming: bool = True,
    ):
        model = self._initialize_model(model_id, model_kwargs, streaming)
        super().__init__(
            buffer_token=buffer_token,
            answer_token=answer_token,
            memory_token=memory_token,
            max_token_limit=max_token_limit,
            model=model,
            model_id=model_id,
        )

    def _initialize_model(
        self,
        model_id: str,
        model_kwargs: Optional[Dict[str, Any]] = None,
        streaming: bool = True,
    ):
        if model_kwargs is None:
            model_kwargs = {}

        # Attempt 1: build a thinkingConfig inside model_kwargs. Note that
        # this dict is never actually forwarded to ChatVertexAI below, so
        # only the hard-coded generation_config in the constructor applies.
        thinking_budget = 0
        include_thoughts = False
        if thinking_budget is not None:
            generation_config = model_kwargs.get("generation_config", {})
            thinking_config = {"thinkingBudget": thinking_budget}
            if include_thoughts is not None:
                thinking_config["includeThoughts"] = bool(include_thoughts)
            generation_config["thinkingConfig"] = thinking_config
            model_kwargs["generation_config"] = generation_config

        # Attempt 2: pass generation_config directly to the constructor,
        # trying both snake_case and camelCase key spellings at once.
        return ChatVertexAI(
            model_name=model_id,
            credentials=creds,
            api_version="v1beta1",
            temperature=model_kwargs.get("temperature", 0.3),
            max_output_tokens=model_kwargs.get("max_output_tokens", 4096),
            streaming=streaming,
            generation_config={
                "thinkingConfig": {
                    "include_thoughts": True,
                    "thinking_budget": 0,
                    "max_reasoning_tokens": 0,
                    "thinkingBudget": 0,
                }
            },
        )
        
    @classmethod
    def create_from_model_config(
        cls,
        buffer_token: int,
        answer_token: int,
        max_token_limit: int,
        model_id: str,
        model_kwargs: Optional[Dict[str, Any]] = None,
        memory_token: int = 0,
        streaming: bool = True,
    ) -> "GeminiChatModelConfig":
        return cls(
            buffer_token=buffer_token,
            answer_token=answer_token,
            memory_token=memory_token,
            max_token_limit=max_token_limit,
            model_id=model_id,
            model_kwargs=model_kwargs,
            streaming=streaming
        )

# InitializeModelClass is project-specific; it builds the configured chat
# models (including the GeminiChatModelConfig above) keyed by model name.
initializer = InitializeModelClass()
aws_ai_models = initializer._initialize_models()

llm = aws_ai_models["gemini_2.5_flash"]


response = llm.invoke("Hello! Can you confirm you're working?")
print("Model response:", response)

print("\n=== Response Metadata ===")
print(response.response_metadata)

try:
    # This location varies slightly by version, but usually hides here.
    # Note: AIMessage has no `generation_info` attribute, so the lookup
    # below raises the AttributeError caught by this handler.
    raw_response = response.response_metadata.get('raw_response')
    if not raw_response and response.generation_info:
        raw_response = response.generation_info

    print(raw_response)
except Exception as e:
    print(f"Could not print raw response: {e}")

# Check for Usage Metadata (Token Counts)
# Even if the text is missing, the billing counts usually survive
print("\n=== Token Counts ===")
print(response.usage_metadata)

print("\nThinking tokens metadata search:")
meta = response.response_metadata
print("meta", meta)
print("thinking:", meta.get("thinking"))
print("tokenMetadata:", meta.get("tokenMetadata"))

Error Message and Stack Trace (if applicable)

Description

I'm facing an issue where I cannot modify or disable the thinking level for the gemini-2.5-flash model. I tried multiple approaches, including setting thinkingBudget to 0 inside a thinkingConfig passed via model_kwargs and via the constructor's generation_config, but none of them worked.
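
For context, the shape I would expect to work is sketched below. Note that the thinking_budget constructor parameter is an assumption based on newer langchain-google-vertexai releases; the 2.0.4 version I have installed may not expose it at all:

from langchain_google_vertexai import ChatVertexAI

# Hypothetical sketch, not a confirmed API for v2.0.4: newer releases of
# langchain-google-vertexai reportedly accept a thinking budget directly,
# where 0 disables thinking on models that allow it.
llm = ChatVertexAI(
    model_name="gemini-2.5-flash",
    thinking_budget=0,  # assumption: verify against your installed version
)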

In the output I can see that thinking tokens are still being consumed: input_tokens (10) + output_tokens (17) only account for 27 of the reported total_tokens (48), leaving about 21 tokens that appear to be thinking tokens.

This is the output I am getting:

=== Response Metadata ===
{'safety_ratings': [], 'finish_reason': 'STOP', 'avg_logprobs': 0.0}
Could not print raw response: 'AIMessage' object has no attribute 'generation_info'

=== Token Counts ===
{'input_tokens': 10, 'output_tokens': 17, 'total_tokens': 48}

Thinking tokens metadata search:
meta {'safety_ratings': [], 'finish_reason': 'STOP', 'avg_logprobs': 0.0}
thinking: None
tokenMetadata: None
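
To isolate whether the API itself honors a zero thinking budget (as opposed to the LangChain wrapper dropping it), here is a minimal sketch that bypasses LangChain via the standalone google-genai SDK. It assumes pip install google-genai and ADC credentials; the project and location values are placeholders:

from google import genai
from google.genai import types

# Placeholders: substitute your own project and location.
client = genai.Client(vertexai=True, project="my-project", location="us-central1")

resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Hello! Can you confirm you're working?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)

# If the API honors the budget, thoughts_token_count should be 0 or absent.
print(resp.usage_metadata)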

System Info

System Information

OS: Linux
OS Version: #88~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Oct 14 14:03:14 UTC 2
Python Version: 3.10.12 (main, Nov 4 2025, 08:48:33) [GCC 11.4.0]

Package Information

langchain_core: 0.3.76
langchain: 0.3.27
langsmith: 0.4.28
langchain_aws: 0.2.33
langchain_google_genai: 2.0.0
langchain_google_vertexai: 2.0.4
langchain_openai: 0.2.2
langchain_text_splitters: 0.3.11

Optional packages not installed

langserve

Other Dependencies

anthropic[vertexai]: Installed. No version info available.
async-timeout<5.0.0,>=4.0.0;: Installed. No version info available.
beautifulsoup4: 4.13.5
bedrock-agentcore: Installed. No version info available.
boto3: 1.40.32
google-cloud-aiplatform: 1.129.0
google-cloud-storage: 2.19.0
google-generativeai: 0.7.2
httpx: 0.28.1
httpx-sse: 0.4.3
httpx<1,>=0.23.0: Installed. No version info available.
jsonpatch<2.0,>=1.33: Installed. No version info available.
langchain-anthropic;: Installed. No version info available.
langchain-aws;: Installed. No version info available.
langchain-azure-ai;: Installed. No version info available.
langchain-cohere;: Installed. No version info available.
langchain-community;: Installed. No version info available.
langchain-core<1.0.0,>=0.3.72: Installed. No version info available.
langchain-core<2.0.0,>=0.3.75: Installed. No version info available.
langchain-deepseek;: Installed. No version info available.
langchain-fireworks;: Installed. No version info available.
langchain-google-genai;: Installed. No version info available.
langchain-google-vertexai;: Installed. No version info available.
langchain-groq;: Installed. No version info available.
langchain-huggingface;: Installed. No version info available.
langchain-mistralai: Installed. No version info available.
langchain-mistralai;: Installed. No version info available.
langchain-ollama;: Installed. No version info available.
langchain-openai;: Installed. No version info available.
langchain-perplexity;: Installed. No version info available.
langchain-text-splitters<1.0.0,>=0.3.9: Installed. No version info available.
langchain-together;: Installed. No version info available.
langchain-xai;: Installed. No version info available.
langsmith-pyo3>=0.1.0rc2;: Installed. No version info available.
langsmith>=0.1.17: Installed. No version info available.
langsmith>=0.3.45: Installed. No version info available.
numpy: 1.26.4
openai: 1.107.2
openai-agents>=0.0.3;: Installed. No version info available.
opentelemetry-api>=1.30.0;: Installed. No version info available.
opentelemetry-exporter-otlp-proto-http>=1.30.0;: Installed. No version info available.
opentelemetry-sdk>=1.30.0;: Installed. No version info available.
orjson>=3.9.14;: Installed. No version info available.
packaging>=23.2: Installed. No version info available.
pillow: 9.0.1
playwright: Installed. No version info available.
pydantic: 2.11.9
pydantic<3,>=1: Installed. No version info available.
pydantic<3.0.0,>=2.7.4: Installed. No version info available.
pydantic>=2.7.4: Installed. No version info available.
pytest>=7.0.0;: Installed. No version info available.
PyYAML>=5.3: Installed. No version info available.
requests-toolbelt>=1.0.0: Installed. No version info available.
requests<3,>=2: Installed. No version info available.
requests>=2.0.0: Installed. No version info available.
rich>=13.9.4;: Installed. No version info available.
SQLAlchemy<3,>=1.4: Installed. No version info available.
tenacity!=8.4.0,<10.0.0,>=8.1.0: Installed. No version info available.
tiktoken: 0.11.0
typing-extensions>=4.7: Installed. No version info available.
vcrpy>=7.0.0;: Installed. No version info available.
zstandard>=0.23.0: Installed. No version info available.

Labels

    bug (Related to a bug, vulnerability, unexpected error with an existing feature)
    core (`langchain-core` package issues & PRs)
    langchain (`langchain` package issues & PRs)
