Dynamic Resource Matching in Manufacturing Using Deep Reinforcement Learning
arXiv:2603.27066v1
Matching is central to allocating resources effectively in many industries, manufacturing among them. Capacity sharing within manufacturing has drawn growing attention in recent years.
Abstract
This paper addresses the challenge of dynamically matching demand and capacity types of manufacturing resources. We formulate this multi-period, many-to-many manufacturing resource-matching problem as a sequential decision-making process. Because the state and action spaces are large, accurately modeling the joint distribution of the various demand types is impractical.
Introduction
To tackle the curse of dimensionality and the complexity of the transition dynamics in resource matching, we employ a model-free deep reinforcement learning approach. This lets us learn optimal matching policies without an explicit model of the underlying transition dynamics.
Methodology
We add two penalties to the traditional Q-learning algorithm:
- Domain Knowledge-Based Penalty: This penalty is guided by prior policies, enabling a more informed decision-making process.
- Infeasibility Penalty: This penalty ensures that actions conform to the prevailing demand-supply constraints, thus preventing impractical solutions.
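The two penalties can be illustrated with a minimal tabular sketch. This is not the paper's algorithm: the toy state (a demand level and a capacity level), the three-valued action set, the greedy prior policy, the penalty weights `kappa` and `big_m`, and every function name below are assumptions made only to show how a prior-policy penalty and an infeasibility penalty could enter a standard Q-learning update.

```python
import random

ACTIONS = [0, 1, 2]  # hypothetical action set: units matched in one period


def prior_policy(state):
    """Assumed domain-knowledge policy: match as much as is feasible."""
    demand, capacity = state
    return min(demand, capacity)


def is_feasible(state, action):
    """An action is feasible if it respects both demand and capacity."""
    demand, capacity = state
    return action <= min(demand, capacity)


def shaped_reward(reward, action, prior_action, feasible, kappa=1.0, big_m=10.0):
    """Base reward minus the two illustrative penalties."""
    penalty = 0.0
    if action != prior_action:  # deviates from the prior policy
        penalty += kappa
    if not feasible:  # violates the demand-supply constraints
        penalty += big_m
    return reward - penalty


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """Standard one-step Q-learning update on a dict-backed table."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]


def train(episodes=200, horizon=10, epsilon=0.2, seed=0):
    """Epsilon-greedy training loop on the toy problem."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        state = (rng.randint(0, 2), rng.randint(0, 2))
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda b: Q.get((state, b), 0.0))
            feasible = is_feasible(state, action)
            base = float(action) if feasible else 0.0
            r = shaped_reward(base, action, prior_policy(state), feasible)
            # Toy dynamics: next demand and capacity are drawn independently.
            next_state = (rng.randint(0, 2), rng.randint(0, 2))
            q_update(Q, state, action, r, next_state, ACTIONS)
            state = next_state
    return Q
```

In this sketch both penalties act through the reward signal, so the update rule itself is unchanged. Whether the paper applies them to the reward, to the Q-value target, or elsewhere is not stated in this summary.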
Theoretical Results
We provide theoretical foundations for the convergence of our domain knowledge-informed Q-learning approach, which offers performance guarantees for smaller-sized problems. For larger-scale scenarios, we enhance our methodology by integrating it into the deep deterministic policy gradient (DDPG) algorithm, resulting in what we term the domain knowledge-informed DDPG (DKDDPG).
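With DDPG the actor outputs a continuous action, so the penalties need continuous counterparts. The sketch below shows one possible form for a many-to-many matching action, written as a matrix `x` whose entry `x[i][j]` is the quantity of demand type `i` matched to capacity type `j`. The constraint set, the squared-distance form of the prior-policy term, and the weights `lam_prior` and `lam_infeas` are assumptions for illustration, not the DKDDPG definition from the paper.

```python
def infeasibility(x, demand, capacity):
    """Total violation of x >= 0, row sums <= demand, column sums <= capacity."""
    negative = sum(max(0.0, -v) for row in x for v in row)
    over_demand = sum(max(0.0, sum(row) - d) for row, d in zip(x, demand))
    over_capacity = sum(max(0.0, sum(col) - c) for col, c in zip(zip(*x), capacity))
    return negative + over_demand + over_capacity


def prior_gap(x, x_prior):
    """Squared distance between the action and the prior policy's action."""
    return sum(
        (v - p) ** 2
        for row, prior_row in zip(x, x_prior)
        for v, p in zip(row, prior_row)
    )


def penalized_reward(reward, x, x_prior, demand, capacity,
                     lam_prior=0.5, lam_infeas=5.0):
    """Reward seen by the critic after subtracting both illustrative penalties."""
    return (reward
            - lam_prior * prior_gap(x, x_prior)
            - lam_infeas * infeasibility(x, demand, capacity))


# Usage: the second demand type is over-served by one unit.
x = [[1.0, 0.0], [0.0, 2.0]]
x_prior = [[1.0, 0.0], [0.0, 1.0]]
value = penalized_reward(3.0, x, x_prior, demand=[1.0, 1.0], capacity=[2.0, 2.0])
```

A soft penalty of this kind keeps the reward differentiable almost everywhere in the action, which suits gradient-based actor updates. A projection onto the feasible set would be an alternative design.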
Computational Study
Our computational experiments covered both small- and large-scale scenarios. In them, DKDDPG consistently outperformed traditional DDPG and other reinforcement learning algorithms, achieving higher cumulative rewards and greater efficiency in both time and number of episodes.
Conclusion
Our study illustrates the potential of deep reinforcement learning, and of the DKDDPG framework in particular, for optimizing resource allocation in manufacturing. By addressing the core challenges of dynamic resource matching, namely large state and action spaces and hard-to-model demand, the approach supports more efficient manufacturing processes.
