aminmomin2/FRI-Task-Planning

FRI-Task-Planning

A comprehensive robotic task planning and execution system that combines computer vision, AI-powered task decomposition, and ROS-based robot control for automated block manipulation tasks.

🎯 Project Overview

This project implements an intelligent robotic system capable of:

  • Computer Vision Block Tracking: Real-time detection and tracking of colored blocks using OpenCV
  • AI-Powered Task Planning: Using Google's Gemini AI to decompose high-level tasks into executable robot actions
  • ROS-Based Robot Control: Coordinated robot arm control for pick-and-place operations
  • 3D Point Cloud Processing: Depth perception and spatial understanding for precise manipulation

🏗️ System Architecture

The system consists of several interconnected modules:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Task Input    │───▶│  Gemini AI      │───▶│  Task Executor  │
│   (User)        │    │  Decomposition  │    │  (ROS Node)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │                       │
                                ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Block Tracker  │───▶│  Point Cloud    │───▶│ Robot Controller│
│  (OpenCV)       │    │  Transformer    │    │  (ROS Node)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘

📁 Project Components

🤖 Core Modules

1. Task Planning & AI Integration

  • gemini-api.py: Interfaces with Google's Gemini AI to decompose high-level tasks into robot-executable subtasks
  • config.py: Manages API keys and configuration settings
  • task_executor.py: ROS node that receives and executes task plans

2. Computer Vision & Block Tracking

  • block_tracking.py: Standalone OpenCV-based block detection and tracking
  • ros_block_tracker.py: ROS-integrated version of block tracking
  • centroid_tracker.py: Object tracking algorithm for maintaining block IDs across frames

3. 3D Perception & Spatial Understanding

  • listener.py: Camera data processing and point cloud generation
  • point_cloud_transformer.py: Coordinate frame transformations for robot integration
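To illustrate the kind of computation these two modules perform, here is a minimal sketch that back-projects a pixel with a depth reading into the camera frame (pinhole model) and then maps it into the robot base frame with a homogeneous transform. The function names, camera intrinsics, and camera pose are all hypothetical; the actual listener.py and point_cloud_transformer.py may obtain these from ROS topics and tf2 instead.

```python
import numpy as np


def deproject_pixel(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth in metres into the camera frame."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])


def make_transform(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


def transform_points(T, points):
    """Apply a 4x4 transform to one point or an (N, 3) array of points."""
    points = np.atleast_2d(points)
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])
    return (T @ homogeneous.T).T[:, :3]


# Hypothetical setup: camera 0.8 m above the table, looking straight down
# (a 180 degree rotation about the x axis), offset 0.4 m along the base x axis.
R_base_camera = np.array([[1.0, 0.0, 0.0],
                          [0.0, -1.0, 0.0],
                          [0.0, 0.0, -1.0]])
T_base_camera = make_transform(R_base_camera, [0.4, 0.0, 0.8])

# Hypothetical intrinsics; the pixel at the principal point lies on the optical axis.
p_camera = deproject_pixel(320, 240, 0.75, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
p_base = transform_points(T_base_camera, p_camera)[0]
print(p_base)
```

In a live system the transform would typically be looked up from tf2 rather than hard-coded, and the intrinsics would come from the camera's calibration.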

4. Robot Control

  • robot_controller.py: High-level robot arm control and pick-and-place operations

🚀 Features

🎨 Multi-Color Block Detection

  • Detects blocks in multiple colors, including red, orange, yellow, green, blue, and purple
  • Robust HSV color space filtering with morphological operations
  • Real-time tracking with unique ID assignment

🧠 Intelligent Task Decomposition

  • Uses Gemini 2.0 Flash Thinking model for task planning
  • Converts natural language commands into structured robot actions
  • Supports "Pick" and "Place" primitive skills
  • Generates JSON-formatted task plans
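As a rough illustration of the decomposition flow, the sketch below builds a prompt that restricts the model to the supported skills and parses a JSON plan out of the reply. The prompt wording and the helper names build_prompt and parse_plan are assumptions, not the contents of gemini-api.py, and the network call to Gemini is deliberately left out.

```python
import json
import re

SKILLS = ("Pick", "Place")
FENCE = "`" * 3  # Markdown code-fence marker that models often wrap JSON in


def build_prompt(task, skills=SKILLS):
    """Compose a prompt asking for a JSON-only plan using the allowed skills."""
    return (
        "Decompose the task into an ordered list of subtasks.\n"
        f"Allowed skills: {', '.join(skills)}.\n"
        'Respond with only a JSON array of objects with keys "subtask" and "skill".\n'
        f"Task: {task}"
    )


def parse_plan(response_text):
    """Extract and decode the JSON array from a model reply."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, response_text, re.DOTALL)
    payload = match.group(1) if match else response_text
    return json.loads(payload)


prompt = build_prompt("Stack the red block on top of the blue block")

# Stand-in for a model reply; a real reply would come from the Gemini client.
reply = FENCE + 'json\n[{"subtask": "Pick the red block", "skill": "Pick"}]\n' + FENCE
plan = parse_plan(reply)
print(plan)
```

Stripping the fence before decoding keeps the parser tolerant of replies that wrap the JSON in Markdown, while plain JSON replies pass through unchanged.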

📊 Advanced Object Tracking

  • Centroid-based object tracking with persistence
  • Movement detection and visualization
  • Trail visualization for motion analysis
  • Robust handling of occlusions and temporary disappearances
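A minimal sketch of centroid tracking with persistence is shown below. It uses greedy nearest-neighbor matching and plain Python; the class name, parameters, and matching strategy are assumptions for illustration, and centroid_tracker.py may differ (for example by using SciPy distance matrices).

```python
import math


class SimpleCentroidTracker:
    """Keeps stable IDs by matching each detection to the nearest known object."""

    def __init__(self, max_disappeared=10, max_distance=50.0):
        self.next_id = 0
        self.objects = {}       # object ID -> last known (x, y)
        self.disappeared = {}   # object ID -> consecutive frames without a match
        self.max_disappeared = max_disappeared
        self.max_distance = max_distance

    def _register(self, centroid):
        self.objects[self.next_id] = centroid
        self.disappeared[self.next_id] = 0
        self.next_id += 1

    def update(self, centroids):
        """Match this frame's centroids to tracked objects and return the tracks."""
        # All (distance, object ID, detection index) pairs, closest first.
        pairs = sorted(
            (math.dist(position, centroid), object_id, index)
            for object_id, position in self.objects.items()
            for index, centroid in enumerate(centroids)
        )
        matched_ids, matched_indices = set(), set()
        for distance, object_id, index in pairs:
            if distance > self.max_distance:
                break  # remaining pairs are even farther apart
            if object_id in matched_ids or index in matched_indices:
                continue
            self.objects[object_id] = centroids[index]
            self.disappeared[object_id] = 0
            matched_ids.add(object_id)
            matched_indices.add(index)

        # Unmatched objects survive for a few frames (occlusion tolerance).
        for object_id in list(self.objects):
            if object_id not in matched_ids:
                self.disappeared[object_id] += 1
                if self.disappeared[object_id] > self.max_disappeared:
                    del self.objects[object_id]
                    del self.disappeared[object_id]

        # Unmatched detections become new objects.
        for index, centroid in enumerate(centroids):
            if index not in matched_indices:
                self._register(centroid)
        return dict(self.objects)


tracker = SimpleCentroidTracker(max_disappeared=2)
tracker.update([(10, 10), (100, 100)])      # two blocks appear: IDs 0 and 1
tracker.update([(12, 11), (98, 103)])       # both move slightly
tracker.update([(99, 104)])                 # block 0 is occluded for one frame
tracks = tracker.update([(14, 12), (99, 104)])  # block 0 reappears with its ID
print(tracks)
```

Greedy matching is simple but not optimal when blocks cross paths; an assignment solver such as the Hungarian algorithm is a common alternative.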

🤖 ROS Integration

  • Full ROS ecosystem integration
  • Point cloud processing and transformation
  • Robot arm pose control
  • Real-time sensor data processing

🛠️ Installation & Setup

Prerequisites

# Python dependencies
pip install opencv-python numpy imutils scipy google-genai python-dotenv

# ROS dependencies (if using ROS)
sudo apt-get install ros-noetic-cv-bridge ros-noetic-tf2-ros
pip install rospkg

Environment Setup

  1. Create a .env file in the project root:
GEMINI_API_KEY=your_gemini_api_key_here
  2. Install the required Python packages:
pip install -r requirements.txt
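For reference, reading the key from the .env file could look like the sketch below. It is a standard-library stand-in for what python-dotenv normally handles; the helper names load_env_file and get_gemini_api_key are hypothetical and not taken from config.py.

```python
import os


def load_env_file(path=".env"):
    """Read KEY=VALUE lines into os.environ; existing variables take precedence."""
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def get_gemini_api_key():
    """Return the Gemini API key or fail early with a clear message."""
    load_env_file()
    key = os.environ.get("GEMINI_API_KEY", "")
    if not key or key == "your_gemini_api_key_here":
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env or the environment")
    return key
```

Failing at startup when the key is missing or still set to the placeholder tends to be easier to debug than an authentication error from the API later on.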

📖 Usage

1. Basic Block Tracking

# Standalone block tracking with webcam
python block_tracking.py

# With video file
python block_tracking.py --video path/to/video.mp4

2. AI Task Planning

# Interactive task planning
python gemini-api.py
# Enter task: "Stack the red block on top of the blue block"

3. ROS-Based System

# Terminal 1: Start ROS core
roscore

# Terminal 2: Start camera listener
python listener.py

# Terminal 3: Start block tracker
python ros_block_tracker.py

# Terminal 4: Start point cloud transformer
python point_cloud_transformer.py

# Terminal 5: Start robot controller
python robot_controller.py

# Terminal 6: Start task executor
python task_executor.py

🔧 Configuration

Block Detection Parameters

  • Color Ranges: HSV thresholds for each color in block_tracking.py
  • Block Size: MIN_BLOCK_AREA and MAX_BLOCK_AREA for size filtering
  • Movement Threshold: movement_threshold for motion detection
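As an illustration of how a movement threshold might be applied, the sketch below flags a block as moving when its centroid shifts by more than movement_threshold pixels between frames and keeps a bounded trail of recent positions. The threshold value, trail length, and helper names are assumptions rather than the settings in block_tracking.py.

```python
import math
from collections import deque

movement_threshold = 5.0  # pixels; assumed value
TRAIL_LENGTH = 20         # hypothetical number of positions kept per block


def has_moved(previous, current, threshold=movement_threshold):
    """True when the centroid shifted farther than the threshold."""
    return math.dist(previous, current) > threshold


def update_trail(trail, centroid):
    """Append the new position and report whether the block moved this frame."""
    moved = bool(trail) and has_moved(trail[-1], centroid)
    trail.append(centroid)
    return moved


trail = deque(maxlen=TRAIL_LENGTH)
statuses = [update_trail(trail, p) for p in [(100, 100), (101, 100), (110, 108)]]
print(statuses)
```

A threshold of a few pixels helps ignore detection jitter; raising it trades sensitivity for stability.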

Robot Parameters

  • Gripper Control: gripper_open and gripper_close positions
  • Pick/Place Heights: pick_height and place_height above surfaces
  • Movement Delays: Timing for robot arm movements
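One way these parameters could fit together is sketched below: a pick is expressed as a hover, descend, grasp, and retreat sequence. The numeric values, the RobotParams and pick_sequence names, and the reading of pick_height as a hover offset above the block are all assumptions; robot_controller.py may define them differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RobotParams:
    gripper_open: float = 0.08    # assumed opening width in metres
    gripper_close: float = 0.02   # assumed closed width in metres
    pick_height: float = 0.10     # assumed hover offset above the block
    place_height: float = 0.12    # assumed hover offset above the target
    move_delay_s: float = 1.0     # assumed pause after each arm motion


def pick_sequence(block_xyz, params):
    """Return ordered (command, argument) steps for picking a block."""
    x, y, z = block_xyz
    hover = (x, y, z + params.pick_height)
    return [
        ("move", hover),                    # approach from above
        ("gripper", params.gripper_open),   # open before descending
        ("move", (x, y, z)),                # descend to the block
        ("gripper", params.gripper_close),  # grasp
        ("move", hover),                    # lift back to the hover pose
    ]


steps = pick_sequence((0.4, 0.0, 0.05), RobotParams())
for command, argument in steps:
    print(command, argument)
```

Keeping the sequence as plain data makes it easy to inspect or log before any command is sent to the arm.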

AI Configuration

  • Model: Gemini 2.0 Flash Thinking (configurable in gemini-api.py)
  • Skills: Currently supports "Pick" and "Place" (extensible)
  • Output Format: JSON array of subtask objects

📊 Output Formats

Task Plan JSON

[
  {
    "subtask": "Pick the red block",
    "skill": "Pick"
  },
  {
    "subtask": "Place the red block on top of the blue block",
    "skill": "Place"
  }
]
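A consumer of this format may want to validate a plan before acting on it. The sketch below checks the structure shown above and dispatches each step to a handler keyed by skill; validate_plan, execute_plan, and the handlers are hypothetical stand-ins, not the logic of task_executor.py.

```python
import json

ALLOWED_SKILLS = {"Pick", "Place"}
REQUIRED_KEYS = {"subtask", "skill"}


def validate_plan(plan):
    """Raise ValueError if the plan does not match the expected structure."""
    if not isinstance(plan, list):
        raise ValueError("plan must be a JSON array")
    for index, step in enumerate(plan):
        if not isinstance(step, dict):
            raise ValueError(f"step {index} must be an object")
        missing = REQUIRED_KEYS - step.keys()
        if missing:
            raise ValueError(f"step {index} is missing keys: {sorted(missing)}")
        if step["skill"] not in ALLOWED_SKILLS:
            raise ValueError(f"step {index} has unsupported skill: {step['skill']!r}")
    return plan


def execute_plan(plan, handlers):
    """Validate the plan, then call the handler for each step in order."""
    return [handlers[step["skill"]](step["subtask"]) for step in validate_plan(plan)]


plan = json.loads("""
[
  {"subtask": "Pick the red block", "skill": "Pick"},
  {"subtask": "Place the red block on top of the blue block", "skill": "Place"}
]
""")

# Placeholder handlers; a real executor would command the robot instead.
handlers = {
    "Pick": lambda subtask: f"PICK: {subtask}",
    "Place": lambda subtask: f"PLACE: {subtask}",
}
results = execute_plan(plan, handlers)
print(results)
```

Validating the whole plan before executing the first step avoids leaving the robot halfway through a task when a later step turns out to be malformed.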

Block Tracking Data

  • Real-time block positions and IDs
  • Movement status and trails
  • Color classification results

🔍 Troubleshooting

Common Issues

  1. Camera not detected: Check camera permissions and device connections
  2. Color detection issues: Adjust HSV ranges in color_ranges dictionary
  3. ROS connection errors: Ensure ROS core is running and topics are published
  4. API key errors: Verify Gemini API key in .env file

Performance Optimization

  • Reduce frame resolution for faster processing
  • Adjust block size thresholds for your environment
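  • Use GPU acceleration for OpenCV operations where available (the opencv-python wheel from PyPI is CPU-only, so CUDA support requires a custom OpenCV build)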

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

🙏 Acknowledgments

  • Google Gemini AI for task planning capabilities
  • OpenCV community for computer vision tools
  • ROS community for robotics framework
  • FRI (Friendly Robotics Initiative) for project inspiration
