
📘 README: CartPole Q‑Learning in Rust

Overview

This project implements a simple tabular Q‑learning agent to solve a CartPole-like environment in Rust. The agent discretizes the continuous 4D state space into buckets and learns a Q-table mapping state → action values, using ε‑greedy exploration.


Code structure

1. Constants & Hyperparameters

  • BUCKETS: Array of four values, one per observation dimension, specifying how many discrete bins to use.
  • LOWER_BOUNDS, UPPER_BOUNDS: Continuous min/max values for each dimension (cart position, cart velocity, pole angle, pole angular velocity).
  • ALPHA, GAMMA, EPSILON, EPISODES: Learning rate, discount factor, exploration rate, and number of training episodes.
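For orientation, the constants section looks roughly like the sketch below. The bucket counts, bounds, and hyperparameter values shown are illustrative placeholders rather than the exact ones in this repository; the README only fixes EPISODES at 1000 and the 500-step cap.

```rust
// Illustrative sketch of the constants; actual values in this repo may differ,
// except EPISODES, which the README states is 1000.
const BUCKETS: [usize; 4] = [3, 3, 6, 6];                 // bins per observation dimension
const LOWER_BOUNDS: [f64; 4] = [-2.4, -3.0, -0.21, -2.0]; // x, x_dot, theta, theta_dot minima
const UPPER_BOUNDS: [f64; 4] = [2.4, 3.0, 0.21, 2.0];     // x, x_dot, theta, theta_dot maxima
const ALPHA: f64 = 0.1;       // learning rate
const GAMMA: f64 = 0.99;      // discount factor
const EPSILON: f64 = 0.1;     // exploration rate
const EPISODES: usize = 1000; // number of training episodes
```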

2. CartPole environment struct

  • Stores current state as a 4‑element array [x, ẋ, θ, θ̇].
  • reset(): Initializes state with small random noise.
  • step(action): Applies physics: simple force-based update for position and angle. Returns (new_state, reward = 1.0, done_flag).
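A minimal sketch of such a struct is shown below. The force magnitude, time step, and simplified dynamics are assumptions for illustration (the actual step() in this repo may integrate the physics differently), and the snippet assumes the rand 0.8 API.

```rust
use rand::Rng;

struct CartPole {
    state: [f64; 4], // [x, x_dot, theta, theta_dot]
}

impl CartPole {
    /// Start a new episode: all four state variables get small uniform noise.
    fn reset(&mut self) -> [f64; 4] {
        let mut rng = rand::thread_rng();
        for s in self.state.iter_mut() {
            *s = rng.gen_range(-0.05..0.05);
        }
        self.state
    }

    /// Apply a force left (action 0) or right (action 1), integrate one step
    /// of simplified dynamics, and return (next_state, reward, done).
    fn step(&mut self, action: usize) -> ([f64; 4], f64, bool) {
        let force = if action == 1 { 10.0 } else { -10.0 };
        let tau = 0.02; // integration time step

        let [x, x_dot, theta, theta_dot] = self.state;
        // Deliberately rough physics: treat the force as cart acceleration and
        // let gravity push the pole further from upright.
        let x_acc = force;
        let theta_acc = 9.8 * theta.sin() + 0.1 * force * theta.cos();

        self.state = [
            x + tau * x_dot,
            x_dot + tau * x_acc,
            theta + tau * theta_dot,
            theta_dot + tau * theta_acc,
        ];

        // Episode ends when the cart leaves the track or the pole falls too far.
        let done = self.state[0].abs() > 2.4 || self.state[2].abs() > 0.21;
        (self.state, 1.0, done)
    }
}
```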

3. State discretization – discretize()

  • Converts each continuous state to a discrete index using min/max normalization and the bucket count.
  • Clamps indices so they stay within [0 .. BUCKETS[i]‑1].
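A sketch of such a function, reusing the hypothetical BUCKETS, LOWER_BOUNDS, and UPPER_BOUNDS constants from above:

```rust
/// Map a continuous observation onto a tuple of bucket indices.
fn discretize(obs: &[f64; 4]) -> (usize, usize, usize, usize) {
    let mut idx = [0usize; 4];
    for i in 0..4 {
        // Normalize into [0, 1], scale by the bucket count, then clamp the index.
        let ratio = (obs[i] - LOWER_BOUNDS[i]) / (UPPER_BOUNDS[i] - LOWER_BOUNDS[i]);
        let bin = (ratio * BUCKETS[i] as f64).floor() as isize;
        idx[i] = bin.clamp(0, BUCKETS[i] as isize - 1) as usize;
    }
    (idx[0], idx[1], idx[2], idx[3])
}
```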

4. Q‑table

  • Stored as a HashMap<State, [f64; 2]>, where State = (usize, usize, usize, usize).
  • Initialized lazily when encountering new states.
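In code, that pattern looks roughly like the following. The q_values helper is purely illustrative; the repo likely inlines the entry call instead.

```rust
use std::collections::HashMap;

type State = (usize, usize, usize, usize);
type QTable = HashMap<State, [f64; 2]>; // index 0 = push left, index 1 = push right

/// Look up the Q-values for a state, inserting [0.0, 0.0] the first time it is seen.
fn q_values(q_table: &mut QTable, s: State) -> &mut [f64; 2] {
    q_table.entry(s).or_insert([0.0; 2])
}
```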

5. Training in main()

  • Loop over EPISODES training episodes, each doing the following (a condensed sketch follows this list):

    • Reset environment.

    • While not done and steps < 500:

      • Discretize current state.

      • Choose an action:

        • With probability EPSILON: random action (exploration).
        • Else: pick action with higher Q-value (greedy).
      • Execute action and observe (next_state, reward, done).

      • Discretize next state and initialize Q entry if needed.

      • Q-learning update:

        td_target = reward + GAMMA * max(Q[next_state])
        td_error  = td_target - Q[current_state][action]
        Q[current_state][action] += ALPHA * td_error
        
      • Update the current state and increment step counter.

    • Every 100 episodes, prints the episode number and the number of steps survived.
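Putting the pieces together, the training loop inside main() has roughly the shape below. This is a condensed sketch that reuses the hypothetical CartPole, discretize(), and QTable definitions (and their imports) from the earlier snippets; the real main() may differ in details.

```rust
let mut env = CartPole { state: [0.0; 4] };
let mut q_table: QTable = HashMap::new();
let mut rng = rand::thread_rng();

for episode in 0..EPISODES {
    let obs = env.reset();
    let mut state = discretize(&obs);
    let mut steps = 0;
    let mut done = false;

    while !done && steps < 500 {
        // ε-greedy action selection.
        let action = if rng.gen::<f64>() < EPSILON {
            rng.gen_range(0..2) // explore: random action
        } else {
            let q = q_table.entry(state).or_insert([0.0; 2]);
            if q[1] > q[0] { 1 } else { 0 } // exploit: greedy action
        };

        let (next_obs, reward, is_done) = env.step(action);
        let next_state = discretize(&next_obs);

        // Q-learning update: Q(s,a) += ALPHA * (r + GAMMA * max_a' Q(s',a') - Q(s,a)).
        let next_q = *q_table.entry(next_state).or_insert([0.0; 2]);
        let best_next = next_q[0].max(next_q[1]);
        let q = q_table.entry(state).or_insert([0.0; 2]);
        q[action] += ALPHA * (reward + GAMMA * best_next - q[action]);

        state = next_state;
        done = is_done;
        steps += 1;
    }

    if (episode + 1) % 100 == 0 {
        println!("Episode {}: survived {} steps", episode + 1, steps);
    }
}
```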

6. Evaluation

  • After training, runs 5 evaluation episodes (sketched after this list):

    • Reset environment.
    • Always pick greedy action (no exploration).
    • Run until done or 500 steps, and print steps survived per evaluation.
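Continuing the same sketch, and again assuming the earlier definitions, the evaluation phase might look like this:

```rust
// Greedy evaluation: no exploration, just follow the learned Q-values.
for run in 0..5 {
    let obs = env.reset();
    let mut state = discretize(&obs);
    let mut steps = 0;
    let mut done = false;

    while !done && steps < 500 {
        let q = q_table.get(&state).copied().unwrap_or([0.0; 2]);
        let action = if q[1] > q[0] { 1 } else { 0 };
        let (next_obs, _reward, is_done) = env.step(action);
        state = discretize(&next_obs);
        done = is_done;
        steps += 1;
    }
    println!("Evaluation run {}: survived {} steps", run + 1, steps);
}
```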

Suggested README Sections


Project Description

A Rust implementation of a tabular Q‑learning agent solving a simplified CartPole environment using state discretization.


Setup & Dependencies

  • Requires Rust and Cargo.
  • Add rand = "0.x" to your Cargo.toml.
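A minimal Cargo.toml along these lines should work; the package name and the rand version shown here are assumptions, so use whatever the project actually pins.

```toml
[package]
name = "cartpole_q_learning" # placeholder; use the actual crate name
version = "0.1.0"
edition = "2021"

[dependencies]
rand = "0.8" # any recent version; the sketches above assume the 0.8 API
```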

Running the Code

cargo run --release

This will train the Q‑learning agent over 1000 episodes and then evaluate it with 5 test runs, printing survival steps.


Code Explanation

  • CartPole struct: Simulates pole physics and returns next state, reward, and terminal signal.
  • discretize() function: Maps continuous observations into discrete indices for Q‑table lookup.
  • Q‑table: A HashMap where keys are discretized state tuples and values are Q-values for two actions.
  • Learning loop: Implements ε‑greedy policy, Q‑learning update rule, and state transitions.
  • Evaluation loop: Tests the learned policy without exploration.

Performance & Potential Enhancements

  • Discretization resolution: Increasing the bucket counts in BUCKETS gives a finer state representation, at the cost of a larger Q‑table.
  • Hyperparameter tuning: Adjust ALPHA, GAMMA, EPSILON, and number of buckets or episodes to improve learning.
  • Alternative policies: Replace ε‑greedy with decaying ε or Boltzmann exploration.
  • Function approximation: For a more advanced version, consider replacing the Q-table with a neural network (DQN).
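As a concrete example of the decaying-ε idea from the list above, the exploration rate can be recomputed per episode instead of kept constant. The decay schedule and floor below are illustrative values, not part of this repo:

```rust
// Decaying ε-greedy: explore a lot early, act almost greedily later.
const MIN_EPSILON: f64 = 0.01; // illustrative floor
const DECAY: f64 = 0.995;      // illustrative per-episode decay factor

fn epsilon_for(episode: usize) -> f64 {
    // Starts at EPSILON (from the constants sketch) and decays geometrically.
    (EPSILON * DECAY.powi(episode as i32)).max(MIN_EPSILON)
}
```

In the training sketch, the check `rng.gen::<f64>() < EPSILON` would then become `rng.gen::<f64>() < epsilon_for(episode)`.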
