Reinforcement Learning Simulation Features Realism and Adaptability

Too Long; Didn't Read

This system models a Continuous Double Auction (CDA) market using reinforcement learning agents—liquidity-taking (LT) and market-making (MM)—each operating independently with unique parameters. By running agents in parallel and diversifying their behavior, the simulation achieves greater realism and dynamic trading insights.


This is Part 3 of an 11-part series based on the research paper “Reinforcement Learning In Agent-based Market Simulation: Unveiling Realistic Stylized Facts And Behavior”. Use the table of links below to navigate to the other parts.

Part 1: Abstract & Introduction

Part 2: Important Concepts

Part 3: System Description

Part 4: Agents & Simulation Details

Part 5: Experiment Design

Part 6: Continual Learning

Part 7: Experiment Results

Part 8: Market and Agent Responsiveness to External Events

Part 9: Conclusion & References

Part 10: Additional Simulation Results

Part 11: Simulation Configuration

3. System and Agents

3.1 System Description

The system contains a matching engine that maintains the limit order books (LOBs) and settles trades, as well as a brokerage center that tracks each agent’s account, including the agent’s buying power and assets. All agents submit market and limit orders to the matching engine through their brokerage accounts. The matching engine runs a CDA market model: it updates the LOB as orders arrive and streams the latest state to each trading agent in real time.
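
To make the division of labor between the matching engine and the brokerage center concrete, here is a minimal sketch in Python. It is not the authors' implementation: the class names `LimitOrderBook` and `Brokerage`, the method names, and the price-time priority rule are all assumptions made for illustration, and real-time streaming of the LOB state to agents is left out.

```python
import heapq
import itertools


class LimitOrderBook:
    """Toy CDA book. Price-time priority is an assumption, not from the paper."""

    def __init__(self):
        self.bids = []  # heap entries: (-price, seq, agent_id, qty)
        self.asks = []  # heap entries: (price, seq, agent_id, qty)
        self._seq = itertools.count()  # arrival counter used as time priority

    def submit(self, agent_id, side, qty, price=None):
        """Match an incoming order; price=None means a market order.

        Returns a list of trades as (buyer, seller, price, qty) tuples.
        """
        trades = []
        opposite = self.asks if side == "buy" else self.bids
        while qty > 0 and opposite:
            key, seq, resting_id, resting_qty = opposite[0]
            resting_price = key if side == "buy" else -key
            if price is not None:
                crosses = (price >= resting_price if side == "buy"
                           else price <= resting_price)
                if not crosses:
                    break
            fill = min(qty, resting_qty)
            buyer, seller = ((agent_id, resting_id) if side == "buy"
                             else (resting_id, agent_id))
            trades.append((buyer, seller, resting_price, fill))
            qty -= fill
            if fill == resting_qty:
                heapq.heappop(opposite)
            else:
                # Same key and seq, so the partially filled order keeps its place.
                heapq.heapreplace(
                    opposite, (key, seq, resting_id, resting_qty - fill))
        if qty > 0 and price is not None:
            # The unfilled part of a limit order rests in the book.
            # The unfilled part of a market order is simply dropped here.
            own = self.bids if side == "buy" else self.asks
            key = -price if side == "buy" else price
            heapq.heappush(own, (key, next(self._seq), agent_id, qty))
        return trades

    def best_bid(self):
        return -self.bids[0][0] if self.bids else None

    def best_ask(self):
        return self.asks[0][0] if self.asks else None


class Brokerage:
    """Hypothetical account ledger: cash (buying power) and shares (assets)."""

    def __init__(self):
        self.accounts = {}

    def open_account(self, agent_id, cash, shares=0):
        self.accounts[agent_id] = {"cash": cash, "shares": shares}

    def settle(self, trades):
        for buyer, seller, price, qty in trades:
            self.accounts[buyer]["cash"] -= price * qty
            self.accounts[buyer]["shares"] += qty
            self.accounts[seller]["cash"] += price * qty
            self.accounts[seller]["shares"] -= qty
```

In this sketch, an agent that posts a sell limit order at 101.0 provides liquidity, and a later market buy from another agent executes against it at that resting price; passing the returned trades to `Brokerage.settle` then moves cash and shares between the two accounts.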


The agents in this system are of two types: liquidity-taking (LT) agents and market-making (MM) agents. Each instance is formulated as an RL agent with its own parameters and reward function. Each agent independently observes the system, selects actions (the orders it submits), receives feedback (the reward), and uses that feedback to adapt and optimize its own strategy. The reward formulation differs between agents; we provide details in the next section.
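
The observe, act, receive reward, update cycle described above can be sketched as follows. This is only an illustration: tabular Q-learning is swapped in as a simple stand-in and is not claimed to be the learning algorithm used in the paper, and the class `TabularAgent`, the three-action set, and the two reward functions are hypothetical. The actual reward formulations are given in the next section of the paper.

```python
import random
from collections import defaultdict


class TabularAgent:
    """Hypothetical stand-in for one RL trading agent (tabular Q-learning)."""

    ACTIONS = ("buy", "sell", "hold")

    def __init__(self, reward_fn, lr=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.reward_fn = reward_fn      # each agent carries its own reward
        self.lr = lr                    # learning rate
        self.gamma = gamma              # discount factor
        self.epsilon = epsilon          # exploration probability
        self.rng = random.Random(seed)
        self.q = defaultdict(float)     # Q-values keyed by (state, action)

    def act(self, state):
        """Epsilon-greedy action selection from the agent's own Q-table."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.ACTIONS)
        return max(self.ACTIONS, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, outcome, next_state):
        """Turn market feedback into a reward and apply one Q-learning update."""
        reward = self.reward_fn(outcome)
        best_next = max(self.q[(next_state, a)] for a in self.ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.lr * (target - self.q[(state, action)])
        return reward


# Illustrative reward functions only; they are not taken from the paper.
def lt_reward(outcome):
    return outcome["pnl"]


def mm_reward(outcome):
    return outcome["spread_captured"] - 0.1 * abs(outcome["inventory"])


lt_agent = TabularAgent(lt_reward, lr=0.05, epsilon=0.2, seed=1)
mm_agent = TabularAgent(mm_reward, lr=0.20, epsilon=0.05, seed=2)
```

Because each instance holds its own Q-table, reward function, and hyperparameters, two agents that see the same market state can still learn different strategies.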


We highlight two aspects of our work that we believe improve the realism of the simulation compared to prior work. First, every agent runs in its own thread; once launched, the threads run in parallel and none waits for any other. Second, all agents are heterogeneous: even agents that belong to the same category use different sets of hyperparameters, which results in significantly different behavior for each agent.
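
A rough sketch of these two design choices, one thread per agent and per-agent hyperparameters, is shown below. Everything in it is an assumption made for illustration: the function names, the hyperparameter names and ranges, and the shared queue that stands in for the matching engine. Note also that CPython threads interleave rather than execute Python code truly simultaneously, so this shows the asynchronous structure, not the performance characteristics of the authors' system.

```python
import queue
import random
import threading
import time


def sample_hyperparameters(rng):
    """Draw one agent's hyperparameters (names and ranges are hypothetical)."""
    return {
        "learning_rate": 10 ** rng.uniform(-4, -2),
        "risk_aversion": rng.uniform(0.0, 1.0),
        "decision_interval": rng.uniform(0.001, 0.005),  # seconds between orders
    }


def run_agent(agent_id, params, orders, n_orders):
    """Agent loop: act on its own clock, never waiting for other agents."""
    rng = random.Random(agent_id)
    for _ in range(n_orders):
        side = rng.choice(("buy", "sell"))
        orders.put((agent_id, side))             # stand-in for an order submission
        time.sleep(params["decision_interval"])  # each agent has its own pace


def run_simulation(n_agents=4, n_orders=5, seed=0):
    master = random.Random(seed)
    orders = queue.Queue()  # thread-safe stand-in for the matching engine inbox
    params = {i: sample_hyperparameters(master) for i in range(n_agents)}
    threads = [
        threading.Thread(target=run_agent, args=(i, params[i], orders, n_orders))
        for i in range(n_agents)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    submitted = []
    while not orders.empty():
        submitted.append(orders.get())
    return params, submitted
```

Since each thread sleeps for its own `decision_interval`, the order in which submissions arrive in the queue varies from run to run, which is the kind of asynchrony a turn-based simulation loop would not produce.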


Authors:

(1) Zhiyuan Yao, Stevens Institute of Technology, Hoboken, New Jersey, USA (zyao9@stevens.edu);

(2) Zheng Li, Stevens Institute of Technology, Hoboken, New Jersey, USA (zli149@stevens.edu);

(3) Matthew Thomas, Stevens Institute of Technology, Hoboken, New Jersey, USA (mthomas3@stevens.edu);

(4) Ionut Florescu, Stevens Institute of Technology, Hoboken, New Jersey, USA (ifloresc@stevens.edu).


This paper is available on arXiv under a CC BY-NC-SA 4.0 license.