2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
+__pycache__/
+*.pyc
23 changes: 18 additions & 5 deletions SKILL.md
@@ -1,13 +1,15 @@
 ---
 name: google-pse-search
 description: >
-  Web search skill powered by Google PSE (Programmable Search Engine) API.
+  Web search skill powered by Google PSE (Programmable Search Engine) API
+  with optional Tavily search as an alternative or fallback provider.
   Primary search tool for all web queries — Korean and English alike.
   Better search quality than DuckDuckGo, especially for Korean content.
   Triggers: "search for", "find", "look up", news/current events, recent info,
   date-filtered searches, site-specific searches, any general web search request.
   Requires environment variables: GOOGLE_PSE_KEY, GOOGLE_CX_ID.
-  Falls back to duckduckgo-search only on quota exceeded (HTTP 429/403) or API errors.
+  Optional: TAVILY_API_KEY (enables --provider tavily and --provider auto fallback).
+  On quota exceeded (HTTP 429/403), falls back to Tavily (with --provider auto and TAVILY_API_KEY set) or to duckduckgo-search.
 ---
 
 # Google PSE Search
@@ -16,8 +18,11 @@ description: >
 
 ```
 1st: google-pse-search ← this skill (default for all web searches)
+     --provider google (default)  uses Google PSE API
+     --provider tavily            uses Tavily API directly
+     --provider auto              uses Google PSE, falls back to Tavily on 403/429
 2nd: web_fetch ← when URL is known (official docs, specific pages)
-3rd: duckduckgo-search ← fallback on quota exceeded or API errors only
+3rd: duckduckgo-search ← fallback when neither Google PSE nor Tavily is available
 ```
 
 ## Basic Usage
@@ -46,6 +51,12 @@ python $SKILL_DIR/scripts/search.py "query" --exact "must include" --exclude "ex
 
 # Pagination (page 2)
 python $SKILL_DIR/scripts/search.py "query" --start 11
+
+# Use Tavily as search provider (requires TAVILY_API_KEY)
+python $SKILL_DIR/scripts/search.py "query" --provider tavily
+
+# Auto mode: Google PSE with Tavily fallback on quota errors
+python $SKILL_DIR/scripts/search.py "query" --provider auto
 ```
 
 ## Options
@@ -55,21 +66,23 @@ python $SKILL_DIR/scripts/search.py "query" --start 11
 | `--num N` | 5 | Number of results (1–10) |
 | `--lang LANG` | ko | Language code (ko/en/ja/zh) |
 | `--gl GL` | auto by lang | Region code override |
-| `--date DATE` | — | Date restriction (d7/m1/y1 etc.) |
+| `--date DATE` | — | Date restriction (d7/m1/y1 etc.). Tavily maps d1→day, d7/w1→week, m1→month, y1→year; other values are ignored with a stderr warning. |
 | `--exact PHRASE` | — | Phrase that must appear in results |
 | `--exclude TERM` | — | Term to exclude from results |
 | `--site SITE` | — | Restrict to a specific site |
 | `--start N` | 1 | Start index for pagination |
 | `--raw` | — | Print raw JSON (debug) |
+| `--provider PROVIDER` | google | Search provider: `google`, `tavily`, or `auto` |
 
 ## Error Handling
 
 | Error | Cause | Action |
 |-------|-------|--------|
 | Missing env vars | .env not configured | Set GOOGLE_PSE_KEY and GOOGLE_CX_ID |
-| HTTP 403/429 | Quota exceeded | Use duckduckgo-search as fallback |
+| HTTP 403/429 | Quota exceeded | Use `--provider auto` for Tavily fallback, or duckduckgo-search |
 | HTTP 400 | Invalid parameters | Check option values |
 | 0 results | Query or filter issue | Adjust query or remove filters |
+| Unsupported flag warning | `--exact`, `--exclude`, `--start` > 1, or an unmapped `--date` used with Tavily | These options are Google-only and are ignored; a warning is printed to stderr |
 
 ## API Reference
 
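The provider options and the priority list in SKILL.md combine into a small decision table. The sketch below restates it as a pure function so the cases can be checked at a glance. `resolve_provider` and its parameters are hypothetical names, not part of scripts/search.py, and the model is simplified: it assumes every path where the script exits with an error (missing Google credentials, Tavily unavailable, or a 403/429 without an `auto` fallback) ends with the agent switching to duckduckgo-search.

```python
from typing import Optional


def resolve_provider(provider: str, google_key_set: bool, tavily_ready: bool,
                     google_status: Optional[int] = None) -> str:
    """Hypothetical helper: which backend ends up answering a query.

    provider       -- value of --provider ("google", "tavily", or "auto")
    google_key_set -- whether GOOGLE_PSE_KEY is present in the environment
    tavily_ready   -- whether TAVILY_API_KEY is set and tavily-python imports
    google_status  -- HTTP status returned by Google PSE, if a request was made
    Returns "google", "tavily", or "duckduckgo" (the skill-level fallback).
    """
    if provider == "tavily":
        # search.py exits with an error when Tavily is unavailable.
        return "tavily" if tavily_ready else "duckduckgo"
    if provider == "auto" and not google_key_set and tavily_ready:
        return "tavily"  # no Google credentials: go straight to Tavily
    if not google_key_set:
        return "duckduckgo"  # missing Google credentials: the script reports an error
    if google_status in (403, 429):
        if provider == "auto" and tavily_ready:
            return "tavily"  # in-script fallback after a quota/rate-limit response
        return "duckduckgo"
    return "google"


print(resolve_provider("auto", True, True, google_status=429))    # tavily
print(resolve_provider("google", True, True, google_status=429))  # duckduckgo
```

Note that in `auto` mode the script falls back on any 403 from Google PSE, not only quota-related ones, so an invalid API key also routes the query to Tavily.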
2 changes: 2 additions & 0 deletions requirements.txt
@@ -0,0 +1,2 @@
+requests>=2.28
+tavily-python>=0.5
153 changes: 148 additions & 5 deletions scripts/search.py
@@ -1,6 +1,6 @@
 #!/usr/bin/env python3
 """
-Google PSE (Programmable Search Engine) search script.
+Google PSE (Programmable Search Engine) search script with optional Tavily provider.
 
 Usage:
     python search.py "query"
@@ -12,10 +12,15 @@
     python search.py "query" --site example.com
     python search.py "query" --start 11
     python search.py "query" --raw
+    python search.py "query" --provider tavily
+    python search.py "query" --provider auto
 
 Environment variables required:
-    GOOGLE_PSE_KEY   API key
-    GOOGLE_CX_ID     Programmable Search Engine ID
+    GOOGLE_PSE_KEY   API key (for google provider)
+    GOOGLE_CX_ID     Programmable Search Engine ID (for google provider)
+
+Optional environment variables:
+    TAVILY_API_KEY   Tavily API key (enables tavily/auto provider)
 """
 
 import argparse
@@ -32,6 +37,117 @@
 
 API_URL = "https://customsearch.googleapis.com/customsearch/v1"
 
+
+def get_tavily_client():
+    """Return a TavilyClient if tavily-python is installed and TAVILY_API_KEY is set."""
+    api_key = os.environ.get("TAVILY_API_KEY")
+    if not api_key:
+        return None
+    try:
+        from tavily import TavilyClient
+        return TavilyClient(api_key=api_key)
+    except ImportError:
+        return None
+
+
+DATE_TO_TIME_RANGE = {
+    "d1": "day",
+    "d7": "week",
+    "w1": "week",
+    "m1": "month",
+    "y1": "year",
+}
+
+
+def _warn_unsupported_tavily_flags(args):
+    """Emit a stderr warning when Google-only flags are used with Tavily."""
+    unsupported = []
+    if args.date and args.date not in DATE_TO_TIME_RANGE:
+        unsupported.append(f"--date {args.date}")
+    if args.exact:
+        unsupported.append("--exact")
+    if args.exclude:
+        unsupported.append("--exclude")
+    if args.start > 1:
+        unsupported.append("--start")
+    if unsupported:
+        print(
+            f"Warning: {', '.join(unsupported)} not supported by Tavily and will be ignored.",
+            file=sys.stderr,
+        )
+
+
+def tavily_search(args, client=None):
+    """Perform a search using the Tavily API and print results in the same format."""
+    if client is None:
+        client = get_tavily_client()
+    if client is None:
+        print("Error: Tavily is not available.")
+        print("  Ensure TAVILY_API_KEY is set and tavily-python is installed:")
+        print("    pip install tavily-python")
+        sys.exit(1)
+
+    _warn_unsupported_tavily_flags(args)
+
+    kwargs = {
+        "query": args.query,
+        "max_results": min(max(1, args.num), 10),
+        "search_depth": "advanced",
+    }
+
+    if args.site:
+        kwargs["include_domains"] = [args.site]
+
+    if args.date:
+        time_range = DATE_TO_TIME_RANGE.get(args.date)
+        if time_range:
+            kwargs["time_range"] = time_range
+
+    try:
+        response = client.search(**kwargs)
+    except Exception as e:
+        print(f"Error: Tavily search failed: {e}")
+        sys.exit(1)
+
+    if args.raw:
+        print(json.dumps(response, ensure_ascii=False, indent=2))
+        return
+
+    print(format_tavily_results(response, args))
+
+
+def format_tavily_results(data, args):
+    """Format Tavily results into the same Markdown format as Google PSE."""
+    results = data.get("results", [])
+
+    lang_code = args.lang.lower()
+    gl = args.gl if args.gl else LANG_MAP.get(lang_code, ("", lang_code))[1]
+
+    conditions = [f"lang:{lang_code}", f"region:{gl}", "provider:tavily"]
+    if args.site:
+        conditions.append(f"site:{args.site}")
+
+    lines = []
+    lines.append(f'## Search Results: "{args.query}" ({len(results)} results)')
+    lines.append(f'> Filters: {" · ".join(conditions)}')
+    lines.append("")
+
+    if not results:
+        lines.append("No results found. Try adjusting your query or removing filters.")
+        return "\n".join(lines)
+
+    for i, item in enumerate(results, 1):
+        title = item.get("title", "(no title)")
+        url = item.get("url", "")
+        snippet = item.get("content", "").replace("\n", " ").strip()
+        lines.append(f"### {i}. [{title}]({url})")
+        if snippet:
+            lines.append(snippet)
+        lines.append("")
+
+    return "\n".join(lines)
+
+
 LANG_MAP = {
     "ko": ("lang_ko", "kr"),
     "en": ("lang_en", "us"),
@@ -131,6 +247,18 @@ def format_results(data, args):
 
 
 def search(args):
+    provider = getattr(args, "provider", "google")
+
+    if provider == "tavily":
+        tavily_search(args)
+        return
+
+    if provider == "auto" and not os.environ.get("GOOGLE_PSE_KEY"):
+        # No Google credentials; try Tavily directly
+        if get_tavily_client() is not None:
+            tavily_search(args)
+            return
+
     api_key, cx_id = get_env()
     params = build_params(args, api_key, cx_id)
 
@@ -143,20 +271,29 @@ def search(args):
         print("Error: Network connection failed")
         sys.exit(1)
 
+    # On quota/rate-limit errors, fall back to Tavily when provider=auto
+    if resp.status_code in (403, 429) and provider == "auto":
+        client = get_tavily_client()
+        if client is not None:
+            print(f"# Google PSE returned {resp.status_code}, falling back to Tavily...\n",
+                  file=sys.stderr)
+            tavily_search(args, client=client)
+            return
+
     if resp.status_code == 400:
         err = resp.json().get("error", {}).get("message", "")
         print(f"Error: Bad request (400): {err}")
         sys.exit(1)
     elif resp.status_code == 403:
         err = resp.json().get("error", {}).get("message", "")
         if "quota" in err.lower() or "limit" in err.lower():
-            print("Error: Daily quota exceeded (100 requests/day). Use duckduckgo-search as fallback.")
+            print("Error: Daily quota exceeded (100 requests/day). Use --provider tavily/auto (requires TAVILY_API_KEY) or duckduckgo-search as fallback.")
         else:
             print(f"Error: Access denied (403). Check your API key or CX ID.")
             print(f"  Details: {err}")
         sys.exit(1)
     elif resp.status_code == 429:
-        print("Error: Rate limit exceeded (429). Use duckduckgo-search as fallback.")
+        print("Error: Rate limit exceeded (429). Use --provider tavily/auto (requires TAVILY_API_KEY) or duckduckgo-search as fallback.")
         sys.exit(1)
     elif not resp.ok:
         print(f"Error: API error ({resp.status_code}): {resp.text[:200]}")
@@ -185,6 +322,12 @@ def main():
     parser.add_argument("--site", default="", help="Restrict search to a specific site")
     parser.add_argument("--start", type=int, default=1, help="Start index for pagination (default: 1)")
     parser.add_argument("--raw", action="store_true", help="Print raw JSON response (debug)")
+    parser.add_argument(
+        "--provider",
+        choices=["google", "tavily", "auto"],
+        default="google",
+        help="Search provider: google (default), tavily, or auto (Google with Tavily fallback)",
+    )
 
     args = parser.parse_args()
     search(args)
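In this diff, `tavily_search()` builds the request kwargs while `_warn_unsupported_tavily_flags()` separately decides which flags to report, and both either print or hit the network. One way to make the flag mapping unit-testable offline is to fold both into a pure function that returns the kwargs and the list of ignored flags. This is a hypothetical refactor sketch: `build_tavily_kwargs` is not part of the PR. It copies `DATE_TO_TIME_RANGE` and the 1..10 clamp from the diff and uses `argparse.Namespace` to stand in for parsed CLI arguments.

```python
from argparse import Namespace

# Same mapping as DATE_TO_TIME_RANGE in scripts/search.py.
DATE_TO_TIME_RANGE = {"d1": "day", "d7": "week", "w1": "week", "m1": "month", "y1": "year"}


def build_tavily_kwargs(args):
    """Hypothetical pure helper: return (search kwargs, list of ignored flags).

    Mirrors tavily_search() and _warn_unsupported_tavily_flags() but returns
    values instead of printing, so it can be tested without network access.
    """
    kwargs = {
        "query": args.query,
        "max_results": min(max(1, args.num), 10),  # clamp to 1..10, as in the diff
        "search_depth": "advanced",
    }
    ignored = []
    if args.site:
        kwargs["include_domains"] = [args.site]
    if args.date:
        time_range = DATE_TO_TIME_RANGE.get(args.date)
        if time_range:
            kwargs["time_range"] = time_range
        else:
            ignored.append(f"--date {args.date}")  # no Tavily equivalent
    if args.exact:
        ignored.append("--exact")
    if args.exclude:
        ignored.append("--exclude")
    if args.start > 1:
        ignored.append("--start")
    return kwargs, ignored


args = Namespace(query="python release", num=20, site="python.org", date="m3",
                 exact="", exclude="beta", start=1)
kwargs, ignored = build_tavily_kwargs(args)
print(kwargs)
print(ignored)  # ['--date m3', '--exclude']
```

With this split, `tavily_search()` would only print a warning when `ignored` is non-empty and pass `kwargs` to the client, keeping the mapping logic in one place.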