HCEF-LLM: Human-Centric Evaluation Framework for Large Language Models

🎯 Overview

HCEF-LLM (Human-Centric Evaluation Framework for Large Language Models) is an evaluation framework that assesses Large Language Models as "Ideal Intelligent Partners" from a human-centric perspective. This project contains an interactive web-based leaderboard and visualization platform for comparing model performance across multiple dimensions.

🏗️ Framework Structure

Ideal Intelligent Partner (IIP) Functional Roles

IIP-F1: Precise Instruction & Constraint Comprehension
- Meticulous interpretation of directives
- Understanding explicit & implicit constraints
- Key aspects:
  - Accurate interpretation of explicit directives
  - Identification of implicit constraints and assumptions
  - Scope definition and ambiguity resolution

IIP-F2: Deep Goal & Intent Understanding
- Discerning underlying purpose
- Inferring user intent beyond literal requests
- Key aspects:
  - Discernment of ultimate objectives
  - Inferring user intent beyond literal requests
  - Prioritization of multiple or conflicting goals

IIP-F3: Capable & Adaptive Task Execution
- Effective and efficient task execution
- Resilience and adaptability in dynamic environments
- Key aspects:
  - Effective task decomposition and strategic planning
  - Intelligent resource allocation and tool utilization
  - Robust exception handling and dynamic adaptation

IIP-F4: Quality-Assured Outcome Delivery & Responsible Closure
- High-quality deliverable production
- Comprehensive and responsible task completion
- Key aspects:
  - Adherence to delivery standards and formats
  - Output self-assessment and iterative refinement

Core Capability Dimensions (CD)

The Core Capability Dimensions (CDs) are fundamental abilities required for proficient and human-aligned performance, derived from foundational aspects of intelligence. These dimensions are organized into six primary categories, each with specific sub-dimensions:

CD1: Input Processing & Comprehension

Essential for accurate information acquisition and deep understanding.
- CD1.1: Multimodal Information Acquisition
  - Processing various input modalities (text, visual, auditory)
  - Initial gateway for subsequent processing
  - Comprehensive input interpretation
- CD1.2: Contextual & Intentional Understanding
  - Deep contextual relevance discernment
  - User intent inference
  - Pragmatic nuance comprehension
  - Implicit assumption identification

CD2: Knowledge Retention & Application

Mechanisms for storing, retrieving, and utilizing information effectively.
- CD2.1: Dynamic Working Memory
  - Active information maintenance
  - Context tracking capability
  - Multi-step reasoning support
  - Complex instruction processing
- CD2.2: Accessible Long-Term Knowledge
  - Vast knowledge repository access
  - Factual information retrieval
  - Procedural knowledge application
  - Common-sense understanding

CD3: Logical Reasoning & Problem Solving

Essential for resolving ambiguities, planning, and solution assessment.
- CD3.1: Analytical & Inferential Reasoning
  - Inductive reasoning for pattern recognition
  - Deductive reasoning for rule application
  - Evidence-based inference generation
  - Logical consequence derivation
- CD3.2: Structured Problem Decomposition
  - Complex problem breakdown into sub-components
  - Component relationship analysis
  - Systematic solution development
  - Strategic execution planning

CD4: Imaginative & Creative Cognition

Non-standard thinking for novel ideas and abstract thought.
- CD4.1: Predictive & Generative Foresight
  - Future state anticipation
  - Outcome prediction modeling
  - Scenario generation
  - Pattern-based completion
- CD4.2: Novel Solution Ideation & Abstract Representation
  - Divergent thinking application
  - Analogical connection making
  - Abstract concept representation
  - Unconventional solution generation

CD5: Human-Centricity & Ethical Alignment

Understanding and aligning with human values and social norms.
- CD5.1: Empathetic & Social Awareness
  - Emotional cue recognition
  - Social dynamic interpretation
  - Interpersonal nuance understanding
  - Socially intelligent interaction
- CD5.2: Value Comprehension & Ethical Consideration
  - Human value understanding
  - Ethical principle application
  - Responsible AI framework alignment
  - Ethical situation recognition

CD6: Output Generation & Delivery

Production of clear, appropriate, and effective outputs.
- CD6.1: Clear & Coherent Communication
  - Grammatical correctness
  - Logical structure formation
  - Message clarity optimization
  - Information conveyance effectiveness
- CD6.2: Adaptive & Purposeful Expression
  - Context-appropriate style adaptation
  - Audience-specific tone adjustment
  - Tool utilization effectiveness
  - Action execution precision
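
The taxonomy above can be held in a small lookup structure when scoring or plotting per-dimension results. The following is a minimal sketch, not the project's own code: the identifiers and names are taken from this README, while the dictionary layout and the helper names `sub_dimensions` and `all_sub_dimension_ids` are assumptions made for illustration.

```python
# Hypothetical in-memory form of the CD taxonomy listed in this README.
# IDs and names come from the README; the layout itself is an assumption.
CAPABILITY_DIMENSIONS = {
    "CD1": ("Input Processing & Comprehension", {
        "CD1.1": "Multimodal Information Acquisition",
        "CD1.2": "Contextual & Intentional Understanding",
    }),
    "CD2": ("Knowledge Retention & Application", {
        "CD2.1": "Dynamic Working Memory",
        "CD2.2": "Accessible Long-Term Knowledge",
    }),
    "CD3": ("Logical Reasoning & Problem Solving", {
        "CD3.1": "Analytical & Inferential Reasoning",
        "CD3.2": "Structured Problem Decomposition",
    }),
    "CD4": ("Imaginative & Creative Cognition", {
        "CD4.1": "Predictive & Generative Foresight",
        "CD4.2": "Novel Solution Ideation & Abstract Representation",
    }),
    "CD5": ("Human-Centricity & Ethical Alignment", {
        "CD5.1": "Empathetic & Social Awareness",
        "CD5.2": "Value Comprehension & Ethical Consideration",
    }),
    "CD6": ("Output Generation & Delivery", {
        "CD6.1": "Clear & Coherent Communication",
        "CD6.2": "Adaptive & Purposeful Expression",
    }),
}


def sub_dimensions(cd_id):
    """Return the sorted sub-dimension IDs under one primary dimension."""
    _name, subs = CAPABILITY_DIMENSIONS[cd_id]
    return sorted(subs)


def all_sub_dimension_ids():
    """Return every sub-dimension ID, ordered by primary dimension."""
    return [sid for cd in sorted(CAPABILITY_DIMENSIONS) for sid in sub_dimensions(cd)]


if __name__ == "__main__":
    for cd_id in sorted(CAPABILITY_DIMENSIONS):
        name, _subs = CAPABILITY_DIMENSIONS[cd_id]
        print(cd_id, name, sub_dimensions(cd_id))
```

Keeping the IDs as plain strings makes it straightforward to serialize the structure to JSON for a web front end.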

Proficiency Descriptors (PD)

PD1: Emergent

Characteristics: Basic functionality with significant limitations
Performance:
- Can only handle simple, explicit tasks
- Requires detailed guidance and structured input
- Unstable output quality
- Limited error handling capabilities

PD2: Developing

Characteristics: Improving capabilities with some constraints
Performance:
- Can handle moderately complex tasks
- Requires moderate guidance
- Gradually improving output quality
- Basic error handling capabilities

PD3: Proficient

Characteristics: Solid performance meeting most requirements
Performance:
- Can handle complex tasks
- Strong autonomy
- Stable output quality
- Good error handling capabilities

PD4: Expert

Characteristics: Exceptional performance exceeding expectations
Performance:
- Can handle highly complex tasks
- Exceptional autonomy and creativity
- Excellent output quality
- Outstanding error handling and recovery capabilities
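
Because the four descriptors form an ordered scale, one way to model them is as an ordered enumeration so that levels can be compared directly. This is a sketch under that assumption: the class name `Proficiency` and the `label` property are hypothetical and are not taken from the project's code.

```python
from enum import IntEnum


class Proficiency(IntEnum):
    """The four proficiency descriptors from this README, lowest to highest."""
    EMERGENT = 1
    DEVELOPING = 2
    PROFICIENT = 3
    EXPERT = 4

    @property
    def label(self):
        # Rebuild the README-style heading, e.g. "PD3: Proficient".
        return f"PD{self.value}: {self.name.capitalize()}"


if __name__ == "__main__":
    # IntEnum members compare by their integer value.
    print(Proficiency.EXPERT > Proficiency.DEVELOPING)
    print(Proficiency(3).label)
```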

📊 Evaluation Results

The leaderboard includes evaluation results for various state-of-the-art language models:

- OpenAI o4 mini
- OpenAI GPT 4.1
- Claude 4 Sonnet
- Gemini 2.5 Pro
- Qwen3 235B

Each model is evaluated across all six capability dimensions and assigned an overall proficiency level.
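
This README does not state how the overall proficiency level is derived from the six per-dimension results. The sketch below shows one possible rule purely for illustration: it assumes each dimension receives a level from 1 (PD1) to 4 (PD4) and takes the low median as the overall level. The function names, the aggregation rule, and the `model-a` / `model-b` entries are all hypothetical placeholders, not actual leaderboard data.

```python
from statistics import median_low

DIMENSIONS = ("CD1", "CD2", "CD3", "CD4", "CD5", "CD6")
LEVEL_LABELS = {
    1: "PD1: Emergent",
    2: "PD2: Developing",
    3: "PD3: Proficient",
    4: "PD4: Expert",
}


def overall_level(per_dimension):
    """ASSUMPTION: overall level = low median of the six per-dimension levels.

    The framework's real aggregation rule is not given in this README.
    """
    missing = [d for d in DIMENSIONS if d not in per_dimension]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return median_low([per_dimension[d] for d in DIMENSIONS])


def rank_models(results):
    """Sort model names by overall level (highest first), then by name."""
    return sorted(results, key=lambda m: (-overall_level(results[m]), m))


if __name__ == "__main__":
    # Placeholder values for illustration only; not real evaluation results.
    placeholder_results = {
        "model-a": {"CD1": 3, "CD2": 3, "CD3": 4, "CD4": 2, "CD5": 3, "CD6": 4},
        "model-b": {"CD1": 2, "CD2": 2, "CD3": 3, "CD4": 1, "CD5": 2, "CD6": 3},
    }
    for name in rank_models(placeholder_results):
        level = overall_level(placeholder_results[name])
        print(name, LEVEL_LABELS[level])
```

A median is used here only because it is robust to a single weak or strong dimension; a minimum or a weighted mean would be equally plausible choices.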

👥 Authors

Pingfan
Dolly

📚 Citation

If you use this framework in your research, please cite:

@misc{wang2025hcef,
  title        = {HCEF-LLM: A Human-Centric Evaluation Framework for Advancing Large Language Models as Ideal Intelligent Partners},
  author       = {Wang, Pingfan and Deng, Linyuan},
  year         = {2025},
  month        = {July},
  publisher    = {TechRxiv},
  doi          = {10.36227/techrxiv.175329267.71975279/v1},
  url          = {https://doi.org/10.36227/techrxiv.175329267.71975279/v1}
}

