This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the Mechanistic Interpretability Hackathon research sprint on January 25, 2023

Investigating Agent Behavior in Different RL Methods

To achieve the desired behavior of an agent that learns from its mistakes and improves its performance, we need to become more familiar with the concept of Reinforcement Learning (RL). Implementing such a self-learning system is easier than we might think. We already know how these agents are built, but what does their behavior look like? We show an example by experimenting with three RL algorithms. Let us assume we are in Baghdad and need to get to Fallujah as fast as we can. Two roads lead out of Baghdad: routes 10 and 11. After we arrive in eastern Fallujah, there are only two bridges we can choose from to cross the Euphrates River and reach western Fallujah. The traffic is unpredictable, so the road or bridge we choose may have a traffic jam. The time needed to cross each bridge depends on the first action (route 10 or 11), so the crossing times differ between the two cases. In addition, we are sometimes redirected and end up on the other route or bridge even though we did not choose it. Our goal is to build a strategy that gains the most reward over the journey. Each reward is the negative of the time needed to travel that road or bridge.
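The journey described above can be read as a small two-step decision problem with random rewards. The following is a minimal Python sketch of that reading, not the authors' implementation. The bridge names, the travel-time ranges, the uniform traffic model, and the redirect probability are all illustrative assumptions, since the description gives no concrete values.

```python
import random

# Everything numeric here is an ASSUMPTION for illustration only;
# the write-up does not give travel times or a redirect probability.
ROUTES = ("route_10", "route_11")
BRIDGES = ("bridge_a", "bridge_b")  # hypothetical names; the source does not name the bridges
REDIRECT_PROB = 0.1                 # assumed chance of ending up on the option we did not choose

# Assumed (min, max) travel time in minutes for each road.
ROUTE_TIME = {"route_10": (20, 60), "route_11": (30, 45)}

# Bridge crossing time depends on the route actually taken.
BRIDGE_TIME = {
    ("route_10", "bridge_a"): (5, 30),
    ("route_10", "bridge_b"): (10, 15),
    ("route_11", "bridge_a"): (5, 10),
    ("route_11", "bridge_b"): (8, 25),
}


def other(choice, options):
    """Return the alternative to `choice` in a two-element tuple."""
    return options[1] if choice == options[0] else options[0]


def maybe_redirect(choice, options, rng):
    """With probability REDIRECT_PROB, swap the chosen option for the other one."""
    if rng.random() < REDIRECT_PROB:
        return other(choice, options)
    return choice


def run_episode(route_choice, bridge_policy, rng):
    """Simulate one trip and return the total reward (negative travel time).

    `bridge_policy` maps the route actually taken to the bridge we try to use.
    """
    route = maybe_redirect(route_choice, ROUTES, rng)
    total = -rng.uniform(*ROUTE_TIME[route])      # reward for the road segment
    bridge = maybe_redirect(bridge_policy[route], BRIDGES, rng)
    total -= rng.uniform(*BRIDGE_TIME[(route, bridge)])  # reward for the bridge
    return total


def estimate_return(route_choice, bridge_policy, episodes=2000, seed=0):
    """Average the total reward of a fixed strategy over many simulated trips."""
    rng = random.Random(seed)
    rewards = [run_episode(route_choice, bridge_policy, rng) for _ in range(episodes)]
    return sum(rewards) / episodes


if __name__ == "__main__":
    policy = {"route_10": "bridge_b", "route_11": "bridge_a"}
    for route in ROUTES:
        print(route, round(estimate_return(route, policy), 2))
```

Averaging over many simulated trips gives a rough estimate of how good a fixed strategy is under these assumed numbers. An RL algorithm would instead learn the route and bridge choices from the observed rewards.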

By Al-Hitawi Mohammed Abed, Saif Ali and Bertold Pal