
Tool: CNPq/Lattes Navigator — Conflict of Interest Rules + 5-Year Production Summary #5

@rmarcacini


Objective

Create a Tool for Agents4Gov (LABIC – ICMC/USP) that uses browser-use to navigate public CNPq/Lattes pages, starting from the official search portal:

Start URL: https://buscatextual.cnpq.br/buscatextual/busca.do?metodo=apresentar
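When a `lattes_id` is supplied, the profile can also be opened directly instead of going through the search form. The sketch below assumes that public profile URLs have the form `http://lattes.cnpq.br/<16-digit id>` (verify this against live pages). It also shows a minimal evidence record so that every extracted item keeps its source URL and a short snippet. `profile_url` and `Evidence` are hypothetical helpers, and the browser-use navigation itself is not shown.

```python
import re
from dataclasses import dataclass

# Assumption: public Lattes IDs are 16-digit numbers, as seen in profile URLs.
LATTES_ID_RE = re.compile(r"\d{16}")


def profile_url(lattes_id: str) -> str:
    """Build the direct profile URL for a Lattes ID (assumed URL format)."""
    lattes_id = lattes_id.strip()
    if not LATTES_ID_RE.fullmatch(lattes_id):
        raise ValueError(f"unexpected lattes_id format: {lattes_id!r}")
    return f"http://lattes.cnpq.br/{lattes_id}"


@dataclass
class Evidence:
    """Provenance for one extracted item: source URL, profile section, short snippet."""
    url: str
    section: str  # e.g. "publications", "projects", "advising"
    snippet: str

    @classmethod
    def make(cls, url: str, section: str, text: str, max_len: int = 200) -> "Evidence":
        # Keep snippets minimal: collapse whitespace, then truncate.
        flat = " ".join(text.split())
        return cls(url, section, flat[:max_len])
```

Rejecting malformed IDs up front avoids wasting rate-limited requests on URLs that cannot resolve to a profile.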

Given a list of names and Lattes IDs, the tool will:

  1. Detect potential Conflicts of Interest (COI) between the listed researchers.
  2. Summarize academic production over the last 5 years per researcher.

Scope & Constraints

  • Data sources: Only public CNPq/Lattes pages reachable from the start URL above.

Inputs

  • Researchers (list):
    • name (string)
    • lattes_id (string; as seen in the public Lattes URL)
  • Window: last 5 calendar years relative to the execution date (default); configurable.
  • COI configuration (optional): thresholds and toggles for each rule (see below).
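The inputs above could be modeled as two small dataclasses. This is a sketch only: the field names `window_years`, `r3_require_same_program`, and `r6_min_items` are hypothetical, and the defaults for R3 and for rules outside R1–R6 (treated as OFF unless enabled) are assumptions, not part of the spec.

```python
from dataclasses import dataclass, field


@dataclass
class Researcher:
    """One input entry: display name plus the ID used in the public Lattes URL."""
    name: str
    lattes_id: str


@dataclass
class COIConfig:
    """Hypothetical config shape: window length plus one ON/OFF toggle per rule."""
    window_years: int = 5
    # Core rules default to ON, as the spec requires.
    rules: dict[str, bool] = field(
        default_factory=lambda: {r: True for r in ("R1", "R2", "R3", "R4", "R5", "R6")}
    )
    # R3 detail (assumed default): require the same program, not just the same institution.
    r3_require_same_program: bool = True
    # R6 threshold: minimum number of co-authored items within the window.
    r6_min_items: int = 3

    def is_active(self, rule: str) -> bool:
        # Rules missing from the mapping (e.g. optional R8) are treated as OFF.
        return self.rules.get(rule, False)
```

For example, `COIConfig(rules={"R1": True, "R2": True})` evaluates only co-authorship and advising.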

Conflict of Interest (COI) — Rules & Determination

The tool must evaluate pairwise COI across all input researchers using only publicly available information.
A COI flag is raised when any activated rule is satisfied. Each rule hit must state why it was triggered and include its evidence URLs.

Time Window

  • Default: last 5 calendar years (configurable).
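The window can be computed as a range of calendar years. The sketch below assumes that the current (partial) year counts as one of the five, so a run in 2025 covers 2021 through 2025. Whether the window should instead cover only the last five complete years is a design decision the spec leaves open. The function names are hypothetical.

```python
from datetime import date
from typing import Optional


def window_years(window: int = 5, today: Optional[date] = None) -> range:
    """Calendar years covered by the window, ending with the current year.

    Assumption: the current (partial) year counts as one of the `window` years.
    """
    current = (today or date.today()).year
    return range(current - window + 1, current + 1)


def in_window(year: Optional[int], years: range) -> bool:
    # Items with no parseable year fall outside the window; log a warning for them.
    return year is not None and year in years
```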

Core Rules (activate via config; default = ON)

  1. Co-authorship (R1)

    • Condition: At least 1 co-authored item (journal, conference, chapter, book, patent, software, technical report) within the window.
    • Evidence: Publication entry (title, year, venue) on both profiles and/or shared coauthor list.
  2. Advisor–Advisee Relationship (R2)

    • Condition: One researcher is listed as advisor/supervisor of the other’s Master's, PhD, or postdoctoral work within the window (concluded or ongoing).
    • Evidence: Advising/supervision sections (names, titles, years).
  3. Institutional Overlap (R3)

    • Condition: Concurrent affiliation with the same department or graduate program within the window.
    • Evidence: Affiliation fields (institution, unit/program, time markers).
    • Configurable detail: Require same program or accept same institution as sufficient.
  4. Project Team Overlap (R4)

    • Condition: Participation in the same funded project (research/project section) within the window.
    • Evidence: Project title, sponsor, role, and years as listed publicly.
  5. Committee/Board/Event Overlap (R5)

    • Condition: Publicly listed service on the same committee/board/event organization within the window (when available).
    • Evidence: Activities/Services section with event/committee name and year.
  6. Frequent Co-Authorship (R6, stronger signal)

    • Condition: ≥ 3 co-authored items within the window.
    • Evidence: Publication list corroborating repeated collaboration.
  7. Strong Institutional Proximity (R8)

    • Condition: Same lab/group explicitly named in both profiles within the window.
    • Evidence: Group/lab names in affiliations or projects.
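R1 and R6 both count co-authored items, so a single pass can evaluate both. This sketch treats an item as co-authored when it appears on both profiles under the same normalized match key. It assumes that each publication dict already carries a hypothetical `key` field (DOI, else title+year) and an integer `year`. Two limitations apply: matching through shared coauthor lists is not covered, and an item that has a DOI on one profile but not on the other would be missed.

```python
def coauthorship_rules(pubs_a, pubs_b, years, r6_min_items=3):
    """Evaluate R1 (>= 1 shared item) and R6 (>= r6_min_items shared items) for one pair.

    Assumes each publication dict has a normalized match key under "key"
    and an integer "year"; `years` is the window (e.g. a range of years).
    """
    keys_a = {p["key"] for p in pubs_a if p.get("year") in years}
    shared = {}
    for p in pubs_b:
        if p.get("year") in years and p["key"] in keys_a:
            shared.setdefault(p["key"], p)  # count each shared item once
    items = list(shared.values())
    rules = []
    if items:
        rules.append("R1")
    if len(items) >= r6_min_items:
        rules.append("R6")
    return rules, items
```

The returned `items` provide the evidence (title, year, venue) that must accompany each hit.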

Note: Disambiguation must be conservative. If names/venues are ambiguous, flag with low confidence and include a warning.
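One way to keep name matching conservative is to accept only exact matches after normalization as high confidence. Weaker matches are downgraded to low and carry a warning. This is a heuristic sketch, not a validated disambiguation method. It assumes the last token is the surname, which does not hold for every Brazilian name. It also handles citation-style names such as "SILVA, J.". The function names are hypothetical.

```python
import re
import unicodedata
from typing import Optional


def name_tokens(name: str) -> list[str]:
    """Accent-free, lowercase tokens, given names first ('SILVA, J.' -> ['j', 'silva'])."""
    decomposed = unicodedata.normalize("NFKD", name)
    n = "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()
    if "," in n:  # citation style "SURNAME, Initials"
        last, first = n.split(",", 1)
        n = f"{first} {last}"
    return re.sub(r"[^\w\s]", " ", n).split()


def match_name(a: str, b: str) -> tuple[bool, str, Optional[str]]:
    """Conservative match returning (matched, confidence, warning)."""
    ta, tb = name_tokens(a), name_tokens(b)
    if ta and ta == tb:
        return True, "high", None
    # Same final surname and same first initial: plausible but ambiguous.
    if ta and tb and ta[-1] == tb[-1] and ta[0][0] == tb[0][0]:
        return True, "low", f"ambiguous name match: {a!r} vs {b!r}"
    return False, "low", None
```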


Outputs

Per Researcher

  • person: { name, lattes_id, profile_url, last_update (if available) }
  • production_5y:
    • publications: counts by type; top items (title, year, venue)
    • projects: active/ended (title, role, sponsor, years)
    • advising: MS/PhD/Postdoc concluded and ongoing
    • activities: committee/board/event roles (if public)
    • affiliations_5y: institutions/programs detected
  • coauthors_5y: unique coauthors (name, count)
  • warnings: rate limit, missing sections, parsing ambiguity
  • evidence: list of supporting URLs/snippets
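The publication counts and `coauthors_5y` can be derived from the same filtered list. The sketch assumes a hypothetical publication dict with `type`, `title`, `year`, `venue`, and `coauthors` keys. The spec does not define "top items", so this version simply takes the newest entries.

```python
from collections import Counter


def summarize_production(pubs, years, top_n=5):
    """Build the publications part of production_5y plus coauthors_5y.

    Each pub is assumed to be a dict with keys 'type', 'title', 'year',
    'venue', and 'coauthors' (a list of names).
    """
    recent = [p for p in pubs if p.get("year") in years]
    counts = Counter(p["type"] for p in recent)
    # "Top items" is undefined in the spec; newest-first is an assumption.
    top = sorted(recent, key=lambda p: p["year"], reverse=True)[:top_n]
    coauthors = Counter(name for p in recent for name in p.get("coauthors", []))
    return {
        "publications": {
            "counts_by_type": dict(counts),
            "top_items": [{k: p[k] for k in ("title", "year", "venue")} for p in top],
        },
        "coauthors_5y": [{"name": n, "count": c} for n, c in coauthors.most_common()],
    }
```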

Pairwise COI Matrix

  • pairs: [ { a_lattes_id, b_lattes_id, rules_triggered: [R1, R3, ...], confidence: "high|medium|low", evidence_urls: [...] } ]
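The matrix covers every unordered pair once, so n researchers yield n(n−1)/2 evaluations. This sketch assumes a hypothetical `evaluate_pair` callback that returns `(rules_triggered, confidence, evidence_urls)`. It also assumes that only pairs with at least one hit are listed, a choice the spec leaves open.

```python
from itertools import combinations


def build_coi_matrix(lattes_ids, evaluate_pair):
    """Evaluate each unordered pair once; keep pairs with at least one rule hit.

    `evaluate_pair(a, b)` is a hypothetical callback returning
    (rules_triggered, confidence, evidence_urls).
    """
    pairs = []
    for a, b in combinations(sorted(lattes_ids), 2):
        rules, confidence, urls = evaluate_pair(a, b)
        if rules:
            pairs.append({
                "a_lattes_id": a,
                "b_lattes_id": b,
                "rules_triggered": sorted(rules),
                "confidence": confidence,
                "evidence_urls": urls,
            })
    return {"pairs": pairs}
```

Sorting the IDs first makes the output order deterministic, which keeps results diffable across runs.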

Summary Text (LLM-assisted if enabled)

  • Short, neutral summary of COI findings and 5-year production highlights.

Functional Requirements

  1. Navigation & Parsing (browser-use)

    • Start at: https://buscatextual.cnpq.br/buscatextual/busca.do?metodo=apresentar
    • Search by name or go directly via lattes_id URL when available.
    • Visit each public profile; extract publications, projects, advising, affiliations, activities/services.
    • Record evidence URLs and minimal text snippets for each extracted item.
  2. Time Filtering & Normalization

    • Filter items to the configured window (default: last 5 years); parse both single years and year ranges.
    • Normalize names (Unicode/case), venues, and roles; deduplicate by DOI or title+year.
  3. COI Evaluation

    • Apply the activated rules (R1–R6 by default; optional R7–R8).
    • Assign confidence levels (e.g., exact match = high; fuzzy/ambiguous = low).
    • Attach the reason and evidence URLs to each rule hit.
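Time filtering and deduplication could look like the sketch below. It assumes that year fields appear as "2019", "2019 - 2022", or "2021 - Atual" ("Atual" is Portuguese for "current" and marks an ongoing item). These formats should be checked against real profiles. All function names are hypothetical. A ranged item counts as in the window if it overlaps the window at all.

```python
import re
import unicodedata


def parse_year_range(text, current_year):
    """Parse '2019', '2019 - 2022' or '2021 - Atual' into (start, end), or None."""
    years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", text)]
    if not years:
        return None
    start = years[0]
    if len(years) > 1:
        end = years[1]
    elif re.search(r"\batual\b", text, re.IGNORECASE):
        end = current_year  # ongoing item
    else:
        end = start
    return start, end


def overlaps(rng, first, last):
    """True if an item's (start, end) range intersects the window [first, last]."""
    return rng is not None and rng[0] <= last and rng[1] >= first


def dedup_key(pub):
    """DOI if present, else accent-free, case-folded title words plus year."""
    doi = (pub.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    title = unicodedata.normalize("NFKD", pub["title"])
    title = "".join(c for c in title if not unicodedata.combining(c)).casefold()
    return "t:" + " ".join(re.findall(r"\w+", title)) + f"|{pub['year']}"


def deduplicate(pubs):
    seen = {}
    for p in pubs:
        seen.setdefault(dedup_key(p), p)  # keep the first occurrence
    return list(seen.values())
```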

Expected Behavior (User Flow)

  1. User opens Open WebUI → Tools → CNPq/Lattes Navigator (COI + 5Y Summary).
  2. Provides a list of { name, lattes_id } and optional COI config (rules ON/OFF, window).
  3. Tool navigates from the start URL, finds profiles, extracts public data.
  4. Tool returns:
    • JSON (per-researcher results + pairwise COI matrix)
    • Short summary text (LLM-assisted if enabled)
    • Action log for auditing
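The action log can be a simple append-only list of timestamped entries, returned alongside the JSON results. The class name, the field names (`ts`, `action`, `url`, `status`), and the action labels in the sketch are all assumptions.

```python
import json
import time


class ActionLog:
    """Append-only log of navigation steps, returned with the results for auditing."""

    def __init__(self):
        self.entries = []

    def record(self, action: str, url: str, status: str = "ok", **details):
        self.entries.append({
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),  # UTC timestamp
            "action": action,  # e.g. "open_search", "open_profile", "extract"
            "url": url,
            "status": status,  # e.g. "ok", "retry", "error"
            **details,
        })

    def to_json(self) -> str:
        return json.dumps(self.entries, ensure_ascii=False, indent=2)
```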

Deliverables

  • Folder: tools/cnpq_lattes_navigator/
    • README.md — usage, COI rules, limitations, ethics/compliance
    • requirements.txt — declared dependencies
    • main.py — orchestration: navigation, parsing, COI rules, outputs
    • schema.json — output schema (per-person + pairs)
    • examples/ — sample input and anonymized output JSON
  • Update docs/README.md to reference this tool

Acceptance Criteria

  • Starts navigation from the official search URL and reaches public Lattes profiles.
  • Accepts list of { name, lattes_id }.
  • Extracts and summarizes last 5 years of production per researcher.
  • Applies COI rules (R1–R6; optional R7–R8) and returns pairwise findings with evidence URLs and confidence.
  • Returns validated JSON per schema.json + short human summary.
  • Implements rate limiting, retry/backoff, and transparent action logs.
  • Runs inside Open WebUI Tools (importable, configurable, runnable).
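Rate limiting and retry/backoff can wrap each page load. This sketch combines a fixed minimum interval between requests with exponential backoff plus jitter. The default delays are assumptions rather than values published by CNPq, and catching every `Exception` should be narrowed to transient errors in real code. `RateLimiter` and `with_retry` are hypothetical names. The `sleep` parameter exists so that tests can skip real waiting.

```python
import random
import time
from typing import Optional


class RateLimiter:
    """Enforce a minimum interval between requests (a simple fixed-delay limiter)."""

    def __init__(self, min_interval: float = 2.0):
        self.min_interval = min_interval
        self._last: Optional[float] = None

    def wait(self) -> None:
        if self._last is not None:
            delay = self._last + self.min_interval - time.monotonic()
            if delay > 0:
                time.sleep(delay)
        self._last = time.monotonic()


def with_retry(fn, attempts=4, base_delay=1.0, max_delay=30.0, limiter=None, sleep=time.sleep):
    """Call fn(); on failure, retry with exponential backoff plus jitter."""
    for attempt in range(attempts):
        if limiter is not None:
            limiter.wait()
        try:
            return fn()
        except Exception:  # narrow to transient errors in real code
            if attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```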
