HCEF-LLM: Human-Centric Evaluation Framework for Large Language Models

English | 中文

🎯 Overview

HCEF-LLM (Human-Centric Evaluation Framework for Large Language Models) is an evaluation framework that assesses Large Language Models as "Ideal Intelligent Partners" from a human-centric perspective. This project contains an interactive web-based leaderboard and visualization platform for comparing model performance across multiple dimensions.

🏗️ Framework Structure

Ideal Intelligent Partner (IIP) Functional Roles

  1. IIP-F1: Precise Instruction & Constraint Comprehension

    • Meticulous interpretation of directives
    • Understanding explicit & implicit constraints
    • Key aspects:
      • Accurate interpretation of explicit directives
      • Identification of implicit constraints and assumptions
      • Scope definition and ambiguity resolution
  2. IIP-F2: Deep Goal & Intent Understanding

    • Discerning underlying purpose
    • Inferring user intent beyond literal requests
    • Key aspects:
      • Discernment of ultimate objectives
      • Inferring user intent beyond literal requests
      • Prioritization of multiple or conflicting goals
  3. IIP-F3: Capable & Adaptive Task Execution

    • Effective and efficient task execution
    • Resilience and adaptability in dynamic environments
    • Key aspects:
      • Effective task decomposition and strategic planning
      • Intelligent resource allocation and tool utilization
      • Robust exception handling and dynamic adaptation
  4. IIP-F4: Quality-Assured Outcome Delivery & Responsible Closure

    • High-quality deliverable production
    • Comprehensive and responsible task completion
    • Key aspects:
      • Adherence to delivery standards and formats
      • Output self-assessment and iterative refinement

Core Capability Dimensions (CD)

The Core Capability Dimensions (CDs) are the fundamental abilities required for proficient, human-aligned performance, derived from foundational aspects of intelligence. They are organized into six primary categories, each with two sub-dimensions:

  1. CD1: Input Processing & Comprehension. Essential for accurate information acquisition and deep understanding.

    • CD1.1: Multimodal Information Acquisition
      • Processing various input modalities (text, visual, auditory)
      • Initial gateway for subsequent processing
      • Comprehensive input interpretation
    • CD1.2: Contextual & Intentional Understanding
      • Deep contextual relevance discernment
      • User intent inference
      • Pragmatic nuance comprehension
      • Implicit assumption identification
  2. CD2: Knowledge Retention & Application. Mechanisms for storing, retrieving, and utilizing information effectively.

    • CD2.1: Dynamic Working Memory
      • Active information maintenance
      • Context tracking capability
      • Multi-step reasoning support
      • Complex instruction processing
    • CD2.2: Accessible Long-Term Knowledge
      • Vast knowledge repository access
      • Factual information retrieval
      • Procedural knowledge application
      • Common-sense understanding
  3. CD3: Logical Reasoning & Problem Solving. Essential for resolving ambiguities, planning, and solution assessment.

    • CD3.1: Analytical & Inferential Reasoning
      • Inductive reasoning for pattern recognition
      • Deductive reasoning for rule application
      • Evidence-based inference generation
      • Logical consequence derivation
    • CD3.2: Structured Problem Decomposition
      • Complex problem breakdown into sub-components
      • Component relationship analysis
      • Systematic solution development
      • Strategic execution planning
  4. CD4: Imaginative & Creative Cognition. Non-standard thinking for novel ideas and abstract thought.

    • CD4.1: Predictive & Generative Foresight
      • Future state anticipation
      • Outcome prediction modeling
      • Scenario generation
      • Pattern-based completion
    • CD4.2: Novel Solution Ideation & Abstract Representation
      • Divergent thinking application
      • Analogical connection making
      • Abstract concept representation
      • Unconventional solution generation
  5. CD5: Human-Centricity & Ethical Alignment. Understanding and aligning with human values and social norms.

    • CD5.1: Empathetic & Social Awareness
      • Emotional cue recognition
      • Social dynamic interpretation
      • Interpersonal nuance understanding
      • Socially intelligent interaction
    • CD5.2: Value Comprehension & Ethical Consideration
      • Human value understanding
      • Ethical principle application
      • Responsible AI framework alignment
      • Ethical situation recognition
  6. CD6: Output Generation & Delivery. Production of clear, appropriate, and effective outputs.

    • CD6.1: Clear & Coherent Communication
      • Grammatical correctness
      • Logical structure formation
      • Message clarity optimization
      • Information conveyance effectiveness
    • CD6.2: Adaptive & Purposeful Expression
      • Context-appropriate style adaptation
      • Audience-specific tone adjustment
      • Tool utilization effectiveness
      • Action execution precision
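
The two-level taxonomy above maps naturally onto a small lookup table. The following is a minimal sketch, not code from the HCEF repository: the names `CAPABILITY_DIMENSIONS`, `parent_dimension`, and `all_sub_dimensions` are hypothetical, and only the identifiers and titles are taken from the list above.

```python
# Hypothetical encoding of the CD taxonomy. Identifiers and titles come from
# the list above; the data layout itself is an assumption for illustration.
CAPABILITY_DIMENSIONS = {
    "CD1": ("Input Processing & Comprehension", {
        "CD1.1": "Multimodal Information Acquisition",
        "CD1.2": "Contextual & Intentional Understanding",
    }),
    "CD2": ("Knowledge Retention & Application", {
        "CD2.1": "Dynamic Working Memory",
        "CD2.2": "Accessible Long-Term Knowledge",
    }),
    "CD3": ("Logical Reasoning & Problem Solving", {
        "CD3.1": "Analytical & Inferential Reasoning",
        "CD3.2": "Structured Problem Decomposition",
    }),
    "CD4": ("Imaginative & Creative Cognition", {
        "CD4.1": "Predictive & Generative Foresight",
        "CD4.2": "Novel Solution Ideation & Abstract Representation",
    }),
    "CD5": ("Human-Centricity & Ethical Alignment", {
        "CD5.1": "Empathetic & Social Awareness",
        "CD5.2": "Value Comprehension & Ethical Consideration",
    }),
    "CD6": ("Output Generation & Delivery", {
        "CD6.1": "Clear & Coherent Communication",
        "CD6.2": "Adaptive & Purposeful Expression",
    }),
}


def parent_dimension(sub_id: str) -> str:
    """Return the primary dimension ID for a sub-dimension ID such as 'CD3.2'."""
    parent = sub_id.split(".", 1)[0]
    if parent not in CAPABILITY_DIMENSIONS or sub_id not in CAPABILITY_DIMENSIONS[parent][1]:
        raise KeyError(f"unknown sub-dimension: {sub_id}")
    return parent


def all_sub_dimensions() -> list[str]:
    """Return every sub-dimension ID in declaration order."""
    return [sub for _, subs in CAPABILITY_DIMENSIONS.values() for sub in subs]
```

Keeping the sub-dimensions nested under their parent makes it straightforward to roll sub-dimension ratings up to the six primary categories.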

Proficiency Descriptors (PD)

PD1: Emergent

  • Characteristics: Basic functionality with significant limitations
  • Performance:
    • Can only handle simple, explicit tasks
    • Requires detailed guidance and structured input
    • Unstable output quality
    • Limited error handling capabilities

PD2: Developing

  • Characteristics: Improving capabilities with some constraints
  • Performance:
    • Can handle moderately complex tasks
    • Requires moderate guidance
    • Gradually improving output quality
    • Basic error handling capabilities

PD3: Proficient

  • Characteristics: Solid performance meeting most requirements
  • Performance:
    • Can handle complex tasks
    • Strong autonomy
    • Stable output quality
    • Good error handling capabilities

PD4: Expert

  • Characteristics: Exceptional performance exceeding expectations
  • Performance:
    • Can handle highly complex tasks
    • Exceptional autonomy and creativity
    • Excellent output quality
    • Outstanding error handling and recovery capabilities
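
Because the four descriptors form an ordered scale, they can be represented as an ordered enumeration. This is an illustrative sketch only: the class name `Proficiency`, the `label` property, and the `meets` helper are assumptions, not part of the framework's published code.

```python
from enum import IntEnum


class Proficiency(IntEnum):
    """Hypothetical encoding of PD1 to PD4 as an ordered scale."""
    EMERGENT = 1
    DEVELOPING = 2
    PROFICIENT = 3
    EXPERT = 4

    @property
    def label(self) -> str:
        # Rebuilds the heading form used above, e.g. "PD3: Proficient".
        return f"PD{self.value}: {self.name.capitalize()}"


def meets(level: Proficiency, required: Proficiency) -> bool:
    """Return True if `level` is at or above the `required` descriptor."""
    return level >= required
```

Using `IntEnum` keeps comparisons such as `Proficiency.EXPERT > Proficiency.DEVELOPING` valid without extra code.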

📊 Evaluation Results

The leaderboard includes evaluation results for various state-of-the-art language models:

  • OpenAI o4 mini
  • OpenAI GPT 4.1
  • Claude 4 Sonnet
  • Gemini 2.5 Pro
  • Qwen3 235B

Each model is evaluated across all six capability dimensions and assigned an overall proficiency level.
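
This README does not state how the overall level is derived from the six per-dimension ratings. As one possible reading, the sketch below uses a conservative "weakest dimension" rule (the minimum). The rule, the function name `overall_level`, and the example ratings are all assumptions for illustration; the ratings are placeholders, not leaderboard results.

```python
# Hypothetical aggregation rule: the source does not specify one, so this
# sketch takes the lowest per-dimension level as the overall level.
PD_NAMES = {1: "Emergent", 2: "Developing", 3: "Proficient", 4: "Expert"}
DIMENSIONS = ("CD1", "CD2", "CD3", "CD4", "CD5", "CD6")


def overall_level(levels: dict[str, int]) -> int:
    """Return the overall PD level (1 to 4) for one model's CD ratings."""
    missing = [dim for dim in DIMENSIONS if dim not in levels]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for dim in DIMENSIONS:
        if levels[dim] not in PD_NAMES:
            raise ValueError(f"invalid level for {dim}: {levels[dim]}")
    return min(levels[dim] for dim in DIMENSIONS)


# Placeholder ratings for a made-up model, not results from the leaderboard.
example = {"CD1": 4, "CD2": 3, "CD3": 4, "CD4": 3, "CD5": 4, "CD6": 4}
print(PD_NAMES[overall_level(example)])  # prints "Proficient"
```

A mean or a weighted score would be equally plausible; the minimum is used here only because it is simple and never overstates a model's weakest capability.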

👥 Authors

  • Pingfan
  • Dolly

📚 Citation

If you use this framework in your research, please cite it using the following BibTeX entry:

@misc{wang2025hcef,
  title        = {HCEF-LLM: A Human-Centric Evaluation Framework for Advancing Large Language Models as Ideal Intelligent Partners},
  author       = {Wang, Pingfan and Deng, Linyuan},
  year         = {2025},
  month        = {July},
  publisher    = {TechRxiv},
  doi          = {10.36227/techrxiv.175329267.71975279/v1},
  url          = {https://doi.org/10.36227/techrxiv.175329267.71975279/v1}
}
