diff --git a/agent-os/interfaces/whatsapp/introduction.mdx b/agent-os/interfaces/whatsapp/introduction.mdx
index 099075680..4b0d7047d 100644
--- a/agent-os/interfaces/whatsapp/introduction.mdx
+++ b/agent-os/interfaces/whatsapp/introduction.mdx
@@ -1,59 +1,65 @@
 ---
 title: WhatsApp
-description: Host agents as WhatsApp applications
+description: Host agents as WhatsApp applications.
 ---

-Use the WhatsApp interface to serve Agents or Teams via WhatsApp. It mounts webhook routes on a FastAPI app and sends responses back to WhatsApp users and threads.
+Use the WhatsApp interface to serve Agents, Teams, or Workflows on WhatsApp. It mounts webhook routes on a FastAPI app and sends responses back to WhatsApp users.

 ## Setup

-Set up your WhatsApp Business API and configure the webhook URL to point to your AgentOS instance.
+Follow the WhatsApp setup guide in the [deploy overview](/deploy/interfaces/whatsapp/overview).

 Required environment variables:

-- `WHATSAPP_ACCESS_TOKEN`
-- `WHATSAPP_PHONE_NUMBER_ID`
-- `WHATSAPP_VERIFY_TOKEN`
+- `WHATSAPP_ACCESS_TOKEN` (from Meta App Dashboard)
+- `WHATSAPP_PHONE_NUMBER_ID` (from WhatsApp API Setup)
+- `WHATSAPP_VERIFY_TOKEN` (a string you create for webhook verification)
 - Optional (production): `WHATSAPP_APP_SECRET` and `APP_ENV=production`

 The user's phone number is automatically used as the `user_id` for runs. This ensures that sessions and memory are appropriately scoped to the user.

-The phone number is also used for the `session_id`, so a single WhatsApp conversation corresponds to a single session. This should be considered when managing session history.
+The phone number is also used for the `session_id` (format: `wa:{phone_number}`), so a single WhatsApp conversation corresponds to a single session.
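As a rough illustration of the scoping described above, the sketch below derives both identifiers from a sender's phone number. It assumes only what the docs state: the phone number is the `user_id`, and the `session_id` is `wa:{phone_number}`. The helper name `scope_ids` is hypothetical and is not part of the `Whatsapp` interface.

```python
from typing import Tuple


def scope_ids(phone_number: str) -> Tuple[str, str]:
    """Return (user_id, session_id) for an inbound WhatsApp sender.

    Hypothetical helper: it mirrors the documented convention only,
    not the interface's actual implementation.
    """
    user_id = phone_number  # the phone number is used directly as the user_id
    session_id = f"wa:{phone_number}"  # documented session_id format
    return user_id, session_id


user_id, session_id = scope_ids("15551234567")
print(user_id, session_id)
```

Because both values come from the same phone number, every message from that number lands in the same session, so history settings such as `num_history_runs` apply across the whole conversation.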
 ## Example Usage

 Create an agent, expose it with the `Whatsapp` interface, and serve via `AgentOS`:

-```python
+```python basic.py
 from agno.agent import Agent
-from agno.models.openai import OpenAIResponses
+from agno.db.sqlite import SqliteDb
+from agno.models.openai import OpenAIChat
 from agno.os import AgentOS
 from agno.os.interfaces.whatsapp import Whatsapp

-image_agent = Agent(
-    model=OpenAIResponses(id="gpt-5.2"),  # Ensure OPENAI_API_KEY is set
-    tools=[OpenAITools(image_model="gpt-image-1")],
-    markdown=True,
+agent_db = SqliteDb(db_file="tmp/persistent_memory.db")
+
+basic_agent = Agent(
+    name="Basic Agent",
+    model=OpenAIChat(id="gpt-4o"),
+    db=agent_db,
     add_history_to_context=True,
+    num_history_runs=3,
+    add_datetime_to_context=True,
+    markdown=True,
 )

 agent_os = AgentOS(
-    agents=[image_agent],
-    interfaces=[Whatsapp(agent=image_agent)],
+    agents=[basic_agent],
+    interfaces=[Whatsapp(agent=basic_agent)],
 )

 app = agent_os.get_app()

 if __name__ == "__main__":
-    agent_os.serve(app="basic:app", port=8000, reload=True)
+    agent_os.serve(app="basic:app", port=7777, reload=True)
 ```

 See the [WhatsApp Examples](/agent-os/usage/interfaces/whatsapp/basic) for more usage patterns.

 ## Core Components

-- `Whatsapp` (interface): Wraps an Agno `Agent` or `Team` for WhatsApp via FastAPI.
+- `Whatsapp` (interface): Wraps an Agno `Agent`, `Team`, or `Workflow` for WhatsApp via FastAPI.
 - `AgentOS.serve`: Serves the FastAPI app using Uvicorn.

 ## `Whatsapp` Interface
@@ -62,18 +68,21 @@ Main entry point for Agno WhatsApp applications.

 ### Initialization Parameters

-| Parameter | Type | Default | Description |
-| --------- | ----------------- | ------- | ---------------------- |
-| `agent` | `Optional[Agent]` | `None` | Agno `Agent` instance. |
-| `team` | `Optional[Team]` | `None` | Agno `Team` instance. |
+| Parameter | Type | Default | Description |
+| --- | --- | --- | --- |
+| `agent` | `Optional[Agent]` | `None` | Agno `Agent` instance. |
+| `team` | `Optional[Team]` | `None` | Agno `Team` instance. |
+| `workflow` | `Optional[Workflow]` | `None` | Agno `Workflow` instance. |
+| `prefix` | `str` | `"/whatsapp"` | Custom FastAPI route prefix for the WhatsApp interface. |
+| `tags` | `Optional[List[str]]` | `None` | FastAPI route tags for API documentation. Defaults to `["Whatsapp"]` if not provided. |

-Provide `agent` or `team`.
+Provide `agent`, `team`, or `workflow`.

 ### Key Method

-| Method | Parameters | Return Type | Description |
-| ------------ | ------------------------ | ----------- | -------------------------------------------------- |
-| `get_router` | `use_async: bool = True` | `APIRouter` | Returns the FastAPI router and attaches endpoints. |
+| Method | Parameters | Return Type | Description |
+| --- | --- | --- | --- |
+| `get_router` | None | `APIRouter` | Returns the FastAPI router and attaches endpoints. |

 ## Endpoints

@@ -81,17 +90,55 @@ Mounted under the `/whatsapp` prefix:

 ### `GET /whatsapp/status`

-- Health/status of the interface.
+- Health/status check for the interface.
+- Returns `{"status": "available"}`.

 ### `GET /whatsapp/webhook`

-- Verifies WhatsApp webhook (`hub.challenge`).
-- Returns `hub.challenge` on success; `403` on token mismatch; `500` if `WHATSAPP_VERIFY_TOKEN` missing.
+- Handles WhatsApp webhook verification (`hub.challenge`).
+- Returns `hub.challenge` on success; `403` on token mismatch; `500` if `WHATSAPP_VERIFY_TOKEN` is not set.

 ### `POST /whatsapp/webhook`

 - Receives WhatsApp messages and events.
-- Validates signature (`X-Hub-Signature-256`); bypassed in development mode.
-- Processes text, image, video, audio, and document messages via the agent/team.
-- Sends replies (splits long messages; uploads and sends generated images).
+- Validates the `X-Hub-Signature-256` header; bypassed when `APP_ENV=development`.
+- Processes text, image, video, audio, and document messages via the agent/team/workflow.
+- Sends replies back to the user (splits long messages at WhatsApp's 4096 character limit; uploads and sends generated images).
 - Responses: `200 {"status": "processing"}` or `{"status": "ignored"}`, `403` invalid signature, `500` errors.
+
+## Session Management
+
+Sessions are scoped by phone number:
+
+- **Session ID format**: `wa:{phone_number}`
+- The user's phone number is used as both the `user_id` and the base for the `session_id`
+- Each WhatsApp conversation maps to a single session
+
+This means all messages from the same phone number share the same conversation context.
+
+## Media Support
+
+**Inbound** (user sends to bot): images, video, audio, and documents. Media is downloaded via the WhatsApp Business API and passed to the agent as image, video, audio, or file inputs.
+
+**Outbound** (agent sends to user): generated images are uploaded to the WhatsApp Media API and sent as native WhatsApp image messages. Text responses are split into batches if they exceed the 4096 character limit.
+
+## Reasoning Support
+
+When the agent produces reasoning content (e.g., from ReasoningTools or ThinkingTools), it is sent as a separate italicized message before the main response.
+
+## Testing the Integration
+
+1. Run the app locally: `python .py` (ensure ngrok is running)
+2. In your Meta App's WhatsApp Setup, configure the webhook URL to `https://YOUR-NGROK-URL/whatsapp/webhook`
+3. Subscribe to the `messages` webhook field
+4. Send a message from your test phone number to the WhatsApp Business number
+
+## Troubleshooting
+
+| Symptom | Cause | Fix |
+| --- | --- | --- |
+| 403 errors on webhook | Invalid signature in production mode | Set `APP_ENV=development` for local testing, or set `WHATSAPP_APP_SECRET` with your Meta App Secret |
+| Webhook verification fails | Token mismatch or app not running | Ensure `WHATSAPP_VERIFY_TOKEN` matches and the app is running before clicking "Verify and save" |
+| No response from the bot | Missing access token or phone number ID | Verify `WHATSAPP_ACCESS_TOKEN` and `WHATSAPP_PHONE_NUMBER_ID` are set correctly |
+| Images not sending | Upload failure | Check `WHATSAPP_ACCESS_TOKEN` has permission to upload media |
+| `500` on webhook verify | `WHATSAPP_VERIFY_TOKEN` not set | Export `WHATSAPP_VERIFY_TOKEN` before running |
diff --git a/agent-os/usage/interfaces/whatsapp/agent-with-media.mdx b/agent-os/usage/interfaces/whatsapp/agent-with-media.mdx
index 8b6ac8929..90b7fa748 100644
--- a/agent-os/usage/interfaces/whatsapp/agent-with-media.mdx
+++ b/agent-os/usage/interfaces/whatsapp/agent-with-media.mdx
@@ -1,6 +1,6 @@
 ---
 title: "WhatsApp Agent with Media Support"
-description: "WhatsApp agent that analyzes images, videos, and audio using multimodal AI"
+description: "WhatsApp agent that analyzes images, videos, audio, and documents using multimodal AI"
 ---

 ## Code
@@ -9,13 +9,14 @@ description: "WhatsApp agent that analyzes images, videos, and audio using multi
 from agno.agent import Agent
 from agno.db.sqlite import SqliteDb
 from agno.models.google import Gemini
-from agno.os.app import AgentOS
+from agno.os import AgentOS
 from agno.os.interfaces.whatsapp import Whatsapp

 agent_db = SqliteDb(db_file="tmp/persistent_memory.db")
+
 media_agent = Agent(
     name="Media Agent",
-    model=Gemini(id="gemini-2.0-flash"),
+    model=Gemini(id="gemini-3-flash-preview"),
     db=agent_db,
     add_history_to_context=True,
     num_history_runs=3,
@@ -42,7 +43,6 @@ if __name__ == "__main__":
     ```bash
     export WHATSAPP_ACCESS_TOKEN=your_whatsapp_access_token
     export WHATSAPP_PHONE_NUMBER_ID=your_phone_number_id
-    export WHATSAPP_WEBHOOK_URL=your_webhook_url
     export WHATSAPP_VERIFY_TOKEN=your_verify_token
     export GOOGLE_API_KEY=your_google_api_key
     export APP_ENV=development
@@ -64,9 +64,8 @@ if __name__ == "__main__":

 ## Key Features

-- **Multimodal AI**: Gemini 2.0 Flash for image, video, and audio processing
+- **Multimodal Analysis**: Gemini for image, video, audio, and document processing
 - **Image Analysis**: Object recognition, scene understanding, text extraction
 - **Video Processing**: Content analysis and summarization
 - **Audio Support**: Voice message transcription and response
-- **Context Integration**: Combines media analysis with conversation history
-
+- **Conversation History**: Combines media analysis with context from last 3 interactions
diff --git a/agent-os/usage/interfaces/whatsapp/agent-with-user-memory.mdx b/agent-os/usage/interfaces/whatsapp/agent-with-user-memory.mdx
index 41be3996b..ac7e7ef4f 100644
--- a/agent-os/usage/interfaces/whatsapp/agent-with-user-memory.mdx
+++ b/agent-os/usage/interfaces/whatsapp/agent-with-user-memory.mdx
@@ -7,13 +7,14 @@ description: "Personalized WhatsApp agent that remembers user information and pr

 ```python cookbook/os/interfaces/whatsapp/agent_with_user_memory.py
 from textwrap import dedent
+
 from agno.agent import Agent
 from agno.db.sqlite import SqliteDb
 from agno.memory.manager import MemoryManager
 from agno.models.google import Gemini
-from agno.os.app import AgentOS
+from agno.os import AgentOS
 from agno.os.interfaces.whatsapp import Whatsapp
-from agno.tools.hackernews import HackerNewsTools
+from agno.tools.websearch import WebSearchTools

 agent_db = SqliteDb(db_file="tmp/persistent_memory.db")

@@ -30,7 +31,7 @@ memory_manager = MemoryManager(
 personal_agent = Agent(
     name="Basic Agent",
     model=Gemini(id="gemini-2.0-flash"),
-    tools=[HackerNewsTools()],
+    tools=[WebSearchTools()],
     add_history_to_context=True,
     num_history_runs=3,
     add_datetime_to_context=True,
@@ -40,10 +41,9 @@ personal_agent = Agent(
     enable_agentic_memory=True,
     instructions=dedent("""
         You are a personal AI friend of the user, your purpose is to chat with the user about things and make them feel good.
-        First introduce yourself and ask for their name then, ask about themeselves, their hobbies, what they like to do and what they like to talk about.
-        Use the HackerNews tools to find latest information about things in the conversations
-        """),
-    debug_mode=True,
+        First introduce yourself and ask for their name then, ask about themselves, their hobbies, what they like to do and what they like to talk about.
+        Use web search to find latest information about things in the conversations
+    """),
 )

 agent_os = AgentOS(
@@ -65,8 +65,8 @@ if __name__ == "__main__":
     ```bash
     export WHATSAPP_ACCESS_TOKEN=your_whatsapp_access_token
     export WHATSAPP_PHONE_NUMBER_ID=your_phone_number_id
-    export WHATSAPP_WEBHOOK_URL=your_webhook_url
     export WHATSAPP_VERIFY_TOKEN=your_verify_token
+    export GOOGLE_API_KEY=your_google_api_key
     export APP_ENV=development
     ```

@@ -86,9 +86,7 @@ if __name__ == "__main__":

 ## Key Features

-- **Memory Management**: Remembers user names, hobbies, preferences, and activities
-- **HackerNews**: Access to current information during conversations
-- **Personalized Responses**: Uses stored memories for contextualized replies
-- **Friendly AI**: Acts as personal AI friend with engaging conversation
-- **Gemini Powered**: Fast, intelligent responses with multimodal capabilities
-
+- **Agentic Memory**: MemoryManager captures user preferences, hobbies, and personal details
+- **Cross-Session Recall**: Remembers user information across conversations
+- **Web Search**: Uses WebSearchTools to find up-to-date information during conversations
+- **Persistent Storage**: SQLite database for both sessions and memory
diff --git a/agent-os/usage/interfaces/whatsapp/basic.mdx b/agent-os/usage/interfaces/whatsapp/basic.mdx
index 7a3e4f3e9..d08132eb3 100644
--- a/agent-os/usage/interfaces/whatsapp/basic.mdx
+++ b/agent-os/usage/interfaces/whatsapp/basic.mdx
@@ -8,14 +8,15 @@ description: "Create a basic AI agent that integrates with WhatsApp Business API
 ```python cookbook/os/interfaces/whatsapp/basic.py
 from agno.agent import Agent
 from agno.db.sqlite import SqliteDb
-from agno.models.openai import OpenAIResponses
+from agno.models.openai import OpenAIChat
 from agno.os import AgentOS
 from agno.os.interfaces.whatsapp import Whatsapp

 agent_db = SqliteDb(db_file="tmp/persistent_memory.db")
+
 basic_agent = Agent(
     name="Basic Agent",
-    model=OpenAIResponses(id="gpt-5.2"),
+    model=OpenAIChat(id="gpt-4o"),
     db=agent_db,
     add_history_to_context=True,
     num_history_runs=3,
@@ -42,7 +43,6 @@ if __name__ == "__main__":
     ```bash
     export WHATSAPP_ACCESS_TOKEN=your_whatsapp_access_token
     export WHATSAPP_PHONE_NUMBER_ID=your_phone_number_id
-    export WHATSAPP_WEBHOOK_URL=your_webhook_url
     export WHATSAPP_VERIFY_TOKEN=your_verify_token
     export OPENAI_API_KEY=your_openai_api_key
     export APP_ENV=development
@@ -64,9 +64,8 @@ if __name__ == "__main__":

 ## Key Features

-- **WhatsApp Integration**: Responds to messages automatically
+- **WhatsApp Integration**: Responds to messages automatically
 - **Conversation History**: Maintains context with last 3 interactions
 - **Persistent Memory**: SQLite database for session storage
 - **DateTime Context**: Time-aware responses
 - **Markdown Support**: Rich text formatting in messages
-
diff --git a/agent-os/usage/interfaces/whatsapp/image-generation-model.mdx b/agent-os/usage/interfaces/whatsapp/image-generation-model.mdx
index e867e07a8..f701b3318 100644
--- a/agent-os/usage/interfaces/whatsapp/image-generation-model.mdx
+++ b/agent-os/usage/interfaces/whatsapp/image-generation-model.mdx
@@ -1,6 +1,6 @@
 ---
 title: "WhatsApp Image Generation Agent (Model-based)"
-description: "WhatsApp agent that generates images using Gemini's built-in capabilities"
+description: "WhatsApp agent that generates images using Gemini's built-in image generation"
 ---

 ## Code
@@ -9,18 +9,18 @@ description: "WhatsApp agent that generates images using Gemini's built-in capab
 from agno.agent import Agent
 from agno.db.sqlite import SqliteDb
 from agno.models.google import Gemini
-from agno.os.app import AgentOS
+from agno.os import AgentOS
 from agno.os.interfaces.whatsapp import Whatsapp

 agent_db = SqliteDb(db_file="tmp/persistent_memory.db")
+
 image_agent = Agent(
     id="image_generation_model",
     db=agent_db,
     model=Gemini(
-        id="gemini-2.0-flash-exp-image-generation",
+        id="models/gemini-2.5-flash-image",
         response_modalities=["Text", "Image"],
     ),
-    debug_mode=True,
 )

 agent_os = AgentOS(
@@ -42,7 +42,6 @@ if __name__ == "__main__":
     ```bash
     export WHATSAPP_ACCESS_TOKEN=your_whatsapp_access_token
     export WHATSAPP_PHONE_NUMBER_ID=your_phone_number_id
-    export WHATSAPP_WEBHOOK_URL=your_webhook_url
     export WHATSAPP_VERIFY_TOKEN=your_verify_token
     export GOOGLE_API_KEY=your_google_api_key
     export APP_ENV=development
@@ -64,9 +63,7 @@ if __name__ == "__main__":

 ## Key Features

-- **Direct Image Generation**: Gemini 2.0 Flash experimental image generation
-- **Text-to-Image**: Converts descriptions into visual content
-- **Multimodal Responses**: Generates both text and images
-- **WhatsApp Integration**: Sends images directly through WhatsApp
-- **Debug Mode**: Enhanced logging for troubleshooting
-
+- **Native Image Generation**: Gemini 2.5 Flash with built-in image generation
+- **Multimodal Responses**: Generates both text and images in a single response
+- **No External Tools**: Image generation is handled directly by the model
+- **WhatsApp Delivery**: Images uploaded and sent as native WhatsApp photos
diff --git a/agent-os/usage/interfaces/whatsapp/image-generation-tools.mdx b/agent-os/usage/interfaces/whatsapp/image-generation-tools.mdx
index 75659f898..b58573ff0 100644
--- a/agent-os/usage/interfaces/whatsapp/image-generation-tools.mdx
+++ b/agent-os/usage/interfaces/whatsapp/image-generation-tools.mdx
@@ -8,19 +8,19 @@ description: "WhatsApp agent that generates images using OpenAI's image generati
 ```python cookbook/os/interfaces/whatsapp/image_generation_tools.py
 from agno.agent import Agent
 from agno.db.sqlite import SqliteDb
-from agno.models.openai import OpenAIResponses
-from agno.os.app import AgentOS
+from agno.models.openai import OpenAIChat
+from agno.os import AgentOS
 from agno.os.interfaces.whatsapp import Whatsapp
 from agno.tools.openai import OpenAITools

 agent_db = SqliteDb(db_file="tmp/persistent_memory.db")
+
 image_agent = Agent(
     id="image_generation_tools",
     db=agent_db,
-    model=OpenAIResponses(id="gpt-5.2"),
+    model=OpenAIChat(id="gpt-4o"),
     tools=[OpenAITools(image_model="gpt-image-1")],
     markdown=True,
-    debug_mode=True,
     add_history_to_context=True,
 )

@@ -43,7 +43,6 @@ if __name__ == "__main__":
     ```bash
     export WHATSAPP_ACCESS_TOKEN=your_whatsapp_access_token
     export WHATSAPP_PHONE_NUMBER_ID=your_phone_number_id
-    export WHATSAPP_WEBHOOK_URL=your_webhook_url
     export WHATSAPP_VERIFY_TOKEN=your_verify_token
     export OPENAI_API_KEY=your_openai_api_key
     export APP_ENV=development
@@ -66,8 +65,6 @@ if __name__ == "__main__":
 ## Key Features

 - **Tool-based Generation**: OpenAI's GPT Image-1 model via external tools
-- **High-Quality Images**: Professional-grade image generation
-- **Conversational Interface**: Natural language interaction for image requests
+- **High-Quality Images**: Professional-grade image generation sent as WhatsApp photos
+- **Conversational Interface**: Natural language interaction for image requests
 - **History Context**: Remembers previous images and conversations
-- **GPT-4o Orchestration**: Intelligent conversation and tool management
-
diff --git a/agent-os/usage/interfaces/whatsapp/reasoning-agent.mdx b/agent-os/usage/interfaces/whatsapp/reasoning-agent.mdx
index 167e11e34..c22232e21 100644
--- a/agent-os/usage/interfaces/whatsapp/reasoning-agent.mdx
+++ b/agent-os/usage/interfaces/whatsapp/reasoning-agent.mdx
@@ -1,6 +1,6 @@
 ---
 title: "WhatsApp Reasoning Finance Agent"
-description: "WhatsApp agent with advanced reasoning and financial analysis capabilities"
+description: "WhatsApp agent with chain-of-thought reasoning and financial analysis capabilities"
 ---

 ## Code
@@ -9,9 +9,9 @@ description: "WhatsApp agent with advanced reasoning and financial analysis capa
 from agno.agent import Agent
 from agno.db.sqlite import SqliteDb
 from agno.models.anthropic.claude import Claude
-from agno.os.app import AgentOS
+from agno.os import AgentOS
 from agno.os.interfaces.whatsapp import Whatsapp
-from agno.tools.thinking import ThinkingTools
+from agno.tools.reasoning import ReasoningTools
 from agno.tools.yfinance import YFinanceTools

 agent_db = SqliteDb(db_file="tmp/persistent_memory.db")
@@ -21,13 +21,8 @@ reasoning_finance_agent = Agent(
     model=Claude(id="claude-3-7-sonnet-latest"),
     db=agent_db,
     tools=[
-        ThinkingTools(add_instructions=True),
-        YFinanceTools(
-            stock_price=True,
-            analyst_recommendations=True,
-            company_info=True,
-            company_news=True,
-        ),
+        ReasoningTools(add_instructions=True),
+        YFinanceTools(),
     ],
     instructions="Use tables to display data. When you use thinking tools, keep the thinking brief.",
     add_datetime_to_context=True,
@@ -53,7 +48,6 @@ if __name__ == "__main__":
     ```bash
     export WHATSAPP_ACCESS_TOKEN=your_whatsapp_access_token
     export WHATSAPP_PHONE_NUMBER_ID=your_phone_number_id
-    export WHATSAPP_WEBHOOK_URL=your_webhook_url
     export WHATSAPP_VERIFY_TOKEN=your_verify_token
     export ANTHROPIC_API_KEY=your_anthropic_api_key
     export APP_ENV=development
@@ -75,9 +69,7 @@ if __name__ == "__main__":

 ## Key Features

-- **Advanced Reasoning**: ThinkingTools for step-by-step financial analysis
-- **Real-time Data**: Stock prices, analyst recommendations, company news
-- **Claude Powered**: Superior analytical and reasoning capabilities
+- **Chain-of-Thought Reasoning**: ReasoningTools for step-by-step financial analysis
+- **Real-time Data**: Stock prices, analyst recommendations, company news via YFinanceTools
+- **Claude Powered**: Analytical and reasoning capabilities
 - **Structured Output**: Well-formatted tables and financial insights
-- **Market Intelligence**: Comprehensive company analysis and recommendations
-
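Reviewer sketch for the `X-Hub-Signature-256` validation that introduction.mdx describes for `POST /whatsapp/webhook`. It assumes Meta's usual webhook scheme, where the header carries `sha256=` followed by the hex HMAC-SHA256 of the raw request body keyed with the app secret (`WHATSAPP_APP_SECRET`). The function name `is_valid_signature` and the sample values are hypothetical; this is not the interface's actual code.

```python
import hashlib
import hmac


def is_valid_signature(payload: bytes, signature_header: str, app_secret: str) -> bool:
    """Check a webhook body against its X-Hub-Signature-256 header (hypothetical helper)."""
    prefix = "sha256="
    if not signature_header.startswith(prefix):
        return False
    # Recompute the HMAC over the raw body, keyed with the app secret.
    expected = hmac.new(app_secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    received = signature_header[len(prefix):]
    # Compare as bytes in constant time to avoid leaking timing information.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


# Illustrative values only; a real secret comes from the Meta App Dashboard.
body = b'{"entry": []}'
secret = "example-app-secret"
header = "sha256=" + hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()

print(is_valid_signature(body, header, secret))
print(is_valid_signature(body + b" ", header, secret))
```

A mismatch here corresponds to the `403` row in the Troubleshooting table: with `APP_ENV=production` the secret must match the Meta App Secret, while `APP_ENV=development` bypasses the check.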