Safe Reinforcement Learning-Based Vibration Control: Overcoming Training Risks with LQR Guidance

ICCMS 2025

Indian Institute of Technology Delhi
Structural vibration risks in infrastructure demand robust controllers. Model-based strategies like LQR effectively stabilize these systems, but they require accurate models, which are often impractical to obtain for real structures. RL offers a model-free alternative, but direct deployment is hazardous because the agent's exploratory actions during training can damage the structure.

How can RL be trained safely on physical (true) systems?

We present a framework blending LQR policy guidance with RL.
  • The LQR controller, designed from arbitrarily chosen model parameters, guides the RL agent during exploration, improving safety without relying on accurate system identification.
  • The RL policy learns directly on the physical (true) system using reward signals; the LQR actions are linearly mixed with the RL actions for guidance (see the sketch after this list).
  • Experiments demonstrate that the blended controller reduces training-time vibration compared to pure RL, and ultimately surpasses LQR in vibration suppression after learning, all while maintaining low control effort.
This method opens practical paths for safe, model-free vibration control in real-time applications where training risks are a major concern.
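A minimal sketch of the action-blending step is shown below. The mixing weight `beta`, the policy interface, and the function names are illustrative assumptions for exposition, not the exact implementation used in the study.

```python
import numpy as np

def blended_action(state, rl_policy, K_lqr, beta=0.5):
    """Linearly mix the LQR action with the RL policy's action.

    state     : current state vector [x, x_dot]
    rl_policy : callable mapping state -> control force (the learning agent)
    K_lqr     : LQR gain computed from the (possibly wrong) assumed model
    beta      : mixing weight; beta=1 -> pure LQR, beta=0 -> pure RL
    """
    u_lqr = -K_lqr @ state      # classical state-feedback action
    u_rl = rl_policy(state)     # exploratory action from the RL agent
    return beta * u_lqr + (1.0 - beta) * u_rl
```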

Figure 1: Overview of the LQR-Guided Reinforcement Learning framework: a classical spring-mass-damper system is controlled using a methodology that integrates Linear Quadratic Regulator (LQR) guidance within a reinforcement learning paradigm, leveraging both analytical control theory and data-driven learning.

Performance of LQR-Guided RL

An LQR controller is designed from an assumed linear dynamical system whose parameter values are chosen arbitrarily. This LQR controller is used to guide the RL agent during its training and testing phases on the true system. The assumed system and the true (nonlinear) system are as follows: $$\text{Assumed system}\hspace{2em}1.6\ddot{x} - 0.5\dot{x} + 181x = u - 1.6\ddot{x}_{g}$$ $$\text{True system}\hspace{3em}\ddot{x} + 0.4\dot{x} + 100x + x^3 = u - \ddot{x}_{g}$$ where $x$, $u$, and $\ddot{x}_{g}$ are the displacement, control force, and ground acceleration, respectively.
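As an illustration, the sketch below casts the assumed oscillator into state-space form, solves the continuous-time Riccati equation for the LQR gain, and defines the true nonlinear dynamics used for simulation. The Q/R weights, the integration scheme, and the state-space construction are assumptions made for demonstration, not the values used in the experiments.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# --- LQR gain from the ASSUMED (incorrect) model: 1.6*x'' - 0.5*x' + 181*x = u ---
m, c, k = 1.6, -0.5, 181.0                    # arbitrarily chosen parameters
A = np.array([[0.0, 1.0],
              [-k / m, -c / m]])              # state = [x, x_dot]
B = np.array([[0.0],
              [1.0 / m]])
Q = np.diag([1.0, 1.0])                       # assumed state weights
R = np.array([[0.01]])                        # assumed control-effort weight

P = solve_continuous_are(A, B, Q, R)
K_lqr = np.linalg.solve(R, B.T @ P)           # state feedback: u = -K_lqr @ state

# --- TRUE nonlinear system: x'' + 0.4*x' + 100*x + x^3 = u - xg_dd ---
def true_system_step(state, u, xg_dd, dt=0.01):
    """One explicit-Euler step of the true (nonlinear) oscillator."""
    x, xd = state
    xdd = -0.4 * xd - 100.0 * x - x**3 + u - xg_dd
    return np.array([x + dt * xd, xd + dt * xdd])
```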

Training phase

To assess the effectiveness of the proposed approach, the acceleration response of the dynamical system during the training phase is compared between the LQR-guided reinforcement learning (RL) policy and the non-guided RL policy. As shown in Figure 2, incorporating the LQR prior leads to a significantly reduced acceleration response relative to the non-guided RL policy.
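The training loop underlying this comparison can be sketched as follows, building on the blending and dynamics sketches above. The agent interface (`.act`, `.update`), the reward definition (penalizing squared acceleration and control effort), and the episode structure are illustrative assumptions, not the exact formulation used in the study.

```python
def run_training_episode(rl_agent, K_lqr, beta, xg_series, dt=0.01):
    """Roll out one episode on the true system with the blended controller.

    rl_agent is assumed to expose .act(state) and .update(state, u, reward, next_state);
    any actor-critic or Q-learning agent with this interface would fit.
    """
    state = np.zeros(2)                               # [x, x_dot], starting at rest
    for xg_dd in xg_series:                           # ground-acceleration record
        u = float(blended_action(state, rl_agent.act, K_lqr, beta))
        next_state = true_system_step(state, u, xg_dd, dt)

        # Acceleration of the true system under the applied force
        x, xd = next_state
        acc = -0.4 * xd - 100.0 * x - x**3 + u - xg_dd

        reward = -(acc**2) - 0.01 * u**2              # penalize vibration and effort
        rl_agent.update(state, u, reward, next_state)
        state = next_state
```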

Figure 2: Comparison of acceleration response during training of RL controller and LQR-Guided RL controller

Testing phase

The LQR-guided RL policy and the standalone LQR policy are tested on the true nonlinear system. As illustrated in Figure 3, the LQR-guided RL policy exhibits lower acceleration responses and requires reduced control effort compared to the LQR policy. These results indicate that LQR guidance improves safety during training of the RL policy, and that the trained RL policy achieves superior performance compared to the standalone LQR policy.

Figure 3: Comparison of displacement, velocity, acceleration response and control force during testing for the case of LQR-Guided RL and LQR

Abstract

Structural vibrations induced by external excitations pose significant risks, including safety hazards for occupants, structural damage, and increased maintenance costs. While conventional model-based control strategies such as the Linear Quadratic Regulator (LQR) and Model Predictive Control (MPC) effectively mitigate vibrations, their reliance on accurate system models necessitates tedious system identification. This identification process can be avoided by using model-free reinforcement learning (RL) methods for the vibration control task.
RL controllers derive their policies solely from observed structural behaviour, eliminating the requirement for an explicit structural model. For an RL controller to be truly model-free, its training must occur on the actual physical system rather than in simulation. However, during this training phase, the RL controller lacks prior knowledge and exerts essentially random control forces on the structure, which can potentially harm it. To mitigate this risk, we propose guiding the RL controller with a Linear Quadratic Regulator (LQR) controller. While LQR control typically relies on an accurate structural model for optimal performance, our observations indicate that even an LQR controller based on an entirely incorrect model outperforms the uncontrolled scenario. Motivated by this finding, we introduce a hybrid control framework that integrates the LQR and RL controllers. In this approach, the LQR policy is derived from a randomly selected model and its parameters. As this LQR policy requires neither the true nor an approximate structural model, the overall framework remains model-free. This hybrid approach eliminates dependence on explicit system models while minimizing the exploration risks inherent in naive RL implementations. To the best of our knowledge, this is the first study to address the critical training-safety challenge of RL-based vibration control and to provide a validated solution.

BibTeX


Not yet added