This project implements a simple tabular Q‑learning agent to solve a CartPole-like environment in Rust. The agent discretizes the continuous 4D state space into buckets and learns a Q-table mapping state → action values, using ε‑greedy exploration.
- `BUCKETS`: Array of 4 numbers, each specifying how many discrete bins to use per observation dimension.
- `LOWER_BOUNDS`, `UPPER_BOUNDS`: Continuous min/max values for each dimension (cart position, cart velocity, pole angle, pole angular velocity).
- `ALPHA`, `GAMMA`, `EPSILON`, `EPISODES`: Learning rate, discount factor, exploration rate, and number of training episodes.
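For illustration, the constants might be declared like this. The numeric values below are assumptions; only `EPISODES = 1000` matches the run instructions later in this README.

```rust
// Illustrative values only; tune to taste.
const BUCKETS: [usize; 4] = [3, 3, 6, 6];                 // bins per observation dimension (assumed)
const LOWER_BOUNDS: [f64; 4] = [-2.4, -3.0, -0.21, -3.0]; // x, x_dot, theta, theta_dot (assumed)
const UPPER_BOUNDS: [f64; 4] = [2.4, 3.0, 0.21, 3.0];
const ALPHA: f64 = 0.1;       // learning rate (assumed)
const GAMMA: f64 = 0.99;      // discount factor (assumed)
const EPSILON: f64 = 0.1;     // exploration rate (assumed)
const EPISODES: usize = 1000; // matches the training run described below
```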
The `CartPole` struct (sketched below):
- Stores the current state as a 4-element array `[x, ẋ, θ, θ̇]`.
- `reset()`: Initializes the state with small random noise.
- `step(action)`: Applies a simple force-based physics update to the position and angle; returns `(new_state, reward = 1.0, done_flag)`.
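A minimal sketch of such an environment, assuming the rand 0.8 API; the force magnitude, timestep, pole dynamics, and termination bounds below are illustrative assumptions rather than the project's actual values.

```rust
use rand::Rng;

struct CartPole {
    state: [f64; 4], // [x, x_dot, theta, theta_dot]
}

impl CartPole {
    /// Start an episode with small random noise around the upright position.
    fn reset(&mut self) -> [f64; 4] {
        let mut rng = rand::thread_rng();
        for s in self.state.iter_mut() {
            *s = rng.gen_range(-0.05..0.05);
        }
        self.state
    }

    /// Apply a crude force-based update; returns (new_state, reward, done).
    fn step(&mut self, action: usize) -> ([f64; 4], f64, bool) {
        let force = if action == 1 { 10.0 } else { -10.0 }; // assumed push strength
        let dt = 0.02;                                      // assumed timestep
        let [x, x_dot, theta, theta_dot] = self.state;

        // Simplified dynamics: the force accelerates the cart directly and a
        // gravity term tips the pole; this is not the full CartPole physics.
        let x_acc = force;
        let theta_acc = 9.8 * theta.sin() - 0.05 * force * theta.cos();

        self.state = [
            x + dt * x_dot,
            x_dot + dt * x_acc,
            theta + dt * theta_dot,
            theta_dot + dt * theta_acc,
        ];

        // Terminate when the cart or pole leaves the assumed bounds.
        let done = self.state[0].abs() > 2.4 || self.state[2].abs() > 0.21;
        (self.state, 1.0, done)
    }
}
```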
The `discretize()` function (sketched below):
- Converts each continuous state value to a discrete index using min/max normalization and the bucket count.
- Clamps indices so they stay within `[0 .. BUCKETS[i] - 1]`.
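A sketch of this mapping, reusing the constants assumed above:

```rust
/// Discrete state: one bucket index per observation dimension.
type State = (usize, usize, usize, usize);

/// Map a continuous observation to bucket indices via min/max normalization.
fn discretize(obs: &[f64; 4]) -> State {
    let mut idx = [0usize; 4];
    for i in 0..4 {
        // Normalize into [0, 1], scale by the bucket count, then clamp to a valid index.
        let ratio = (obs[i] - LOWER_BOUNDS[i]) / (UPPER_BOUNDS[i] - LOWER_BOUNDS[i]);
        let bucket = (ratio * BUCKETS[i] as f64).floor() as isize;
        idx[i] = bucket.clamp(0, BUCKETS[i] as isize - 1) as usize;
    }
    (idx[0], idx[1], idx[2], idx[3])
}
```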
The Q-table:
- Stored as a `HashMap<State, [f64; 2]>`, where `State = (usize, usize, usize, usize)`.
- Initialized lazily when new states are encountered (see the snippet below).
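Lazy initialization is conveniently expressed with the `HashMap` entry API; a small sketch, reusing the `State` alias from the discretization sketch:

```rust
use std::collections::HashMap;

type QTable = HashMap<State, [f64; 2]>;

/// Return the Q-values for a state, inserting a zero-initialized entry the
/// first time that state is encountered.
fn q_values(q: &mut QTable, s: State) -> &mut [f64; 2] {
    q.entry(s).or_insert([0.0; 2])
}
```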
The training loop (a Rust sketch follows this list):
- Loop for `EPISODES` episodes; in each episode:
  - Reset the environment.
  - While not done and steps < 500:
    - Discretize the current state.
    - Choose an action:
      - With probability `EPSILON`: a random action (exploration).
      - Otherwise: the action with the higher Q-value (greedy).
    - Execute the action and observe `(next_state, reward, done)`.
    - Discretize the next state and initialize its Q entry if needed.
    - Apply the Q-learning update:
      - `td_target = reward + GAMMA * max(Q[next_state])`
      - `td_error = td_target - Q[current_state][action]`
      - `Q[current_state][action] += ALPHA * td_error`
    - Update the current state and increment the step counter.
  - Every 100 episodes, print the episode number and the steps survived.
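A rough sketch of this training loop, reusing the `CartPole`, `discretize`, `State`, and constant sketches above (the real code's structure may differ):

```rust
use rand::Rng;
use std::collections::HashMap;

fn train() -> HashMap<State, [f64; 2]> {
    let mut rng = rand::thread_rng();
    let mut q: HashMap<State, [f64; 2]> = HashMap::new();
    let mut env = CartPole { state: [0.0; 4] };

    for episode in 0..EPISODES {
        let mut state = discretize(&env.reset());
        let mut steps = 0;
        let mut done = false;

        while !done && steps < 500 {
            // ε-greedy action selection over the two actions.
            let qs = *q.entry(state).or_insert([0.0; 2]);
            let action: usize = if rng.gen::<f64>() < EPSILON {
                rng.gen_range(0..2)
            } else if qs[0] >= qs[1] {
                0
            } else {
                1
            };

            // Step the environment and discretize the next observation.
            let (obs, reward, is_done) = env.step(action);
            let next_state = discretize(&obs);
            let next_max = q
                .entry(next_state)
                .or_insert([0.0; 2])
                .iter()
                .cloned()
                .fold(f64::NEG_INFINITY, f64::max);

            // Q-learning update.
            let entry = q.entry(state).or_insert([0.0; 2]);
            let td_target = reward + GAMMA * next_max;
            let td_error = td_target - entry[action];
            entry[action] += ALPHA * td_error;

            state = next_state;
            steps += 1;
            done = is_done;
        }

        // Periodic progress report.
        if (episode + 1) % 100 == 0 {
            println!("episode {:>4}: survived {} steps", episode + 1, steps);
        }
    }
    q
}
```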
- After training, runs 5 evaluation episodes (see the sketch after this list):
  - Reset the environment.
  - Always pick the greedy action (no exploration).
  - Run until done or 500 steps, and print the steps survived per evaluation.
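A matching evaluation sketch under the same assumptions:

```rust
use std::collections::HashMap;

fn evaluate(q: &HashMap<State, [f64; 2]>) {
    let mut env = CartPole { state: [0.0; 4] };
    for run in 0..5 {
        let mut state = discretize(&env.reset());
        let mut steps = 0;
        loop {
            // Purely greedy: no exploration during evaluation.
            let qs = q.get(&state).copied().unwrap_or([0.0; 2]);
            let action = if qs[0] >= qs[1] { 0 } else { 1 };
            let (obs, _reward, done) = env.step(action);
            state = discretize(&obs);
            steps += 1;
            if done || steps >= 500 {
                break;
            }
        }
        println!("evaluation {}: survived {} steps", run + 1, steps);
    }
}
```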
A Rust implementation of a tabular Q‑learning agent solving a simplified CartPole environment using state discretization.
- Requires Rust and Cargo.
- Add `rand = "0.x"` to your `Cargo.toml`.
`cargo run --release`

This will train the Q-learning agent over 1000 episodes and then evaluate it with 5 test runs, printing the steps survived.
- `CartPole` struct: Simulates the pole physics and returns the next state, reward, and terminal signal.
- `discretize()` function: Maps continuous observations into discrete indices for Q-table lookup.
- Q-table: A `HashMap` where keys are discretized state tuples and values are the Q-values for the two actions.
- Learning loop: Implements the ε-greedy policy, the Q-learning update rule, and state transitions.
- Evaluation loop: Tests the learned policy without exploration.
- Discretization resolution: You can increase the `BUCKETS` sizes for a finer state representation, though that also increases the Q-table size.
- Hyperparameter tuning: Adjust `ALPHA`, `GAMMA`, `EPSILON`, and the number of buckets or episodes to improve learning.
- Alternative policies: Replace ε-greedy with decaying ε or Boltzmann exploration (a small sketch follows this list).
- Function approximation: For a more advanced version, consider replacing the Q-table with a neural network (DQN).
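For example, a decaying-ε schedule only needs a per-episode epsilon; the start value, floor, and decay factor below are arbitrary assumptions:

```rust
/// Exponentially decay epsilon toward a small floor as training progresses.
fn epsilon_for_episode(episode: usize) -> f64 {
    let start = 1.0;        // explore heavily at first (assumed)
    let floor = 0.05;       // keep a little exploration forever (assumed)
    let decay: f64 = 0.995; // per-episode decay factor (assumed)
    (start * decay.powi(episode as i32)).max(floor)
}
```

In the training sketch above, `EPSILON` would then be replaced by `epsilon_for_episode(episode)`.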