Generative Models on Phase Space
Summary: arXiv:2604.02415v1
Abstract: Deep generative models such as diffusion and flow matching are powerful machine learning tools capable of learning and sampling from high-dimensional distributions. They are particularly useful when the training data appears to be concentrated on a submanifold of the data embedding space. For high-energy physics data, consisting of collections of relativistic energy-momentum 4-vectors, this submanifold can enforce extremely strong physically-motivated priors, such as energy and momentum conservation. If these constraints are learned only approximately, rather than exactly, this can inhibit the interpretability and reliability of such generative models.
To remedy this deficiency, we introduce generative models which are, by construction, confined at every step of their sampling trajectory to the manifold of massless N-particle Lorentz-invariant phase space in the center-of-momentum frame. In the case of diffusion models, the “pure noise” forward process endpoint corresponds to the uniform distribution on phase space, which provides a clear starting point from which to identify how correlations among the particles emerge during the reverse (de-noising) process. We demonstrate that our models are able to learn both few-particle and many-particle distributions with various singularity structures, paving the way for future interpretability studies using generative models trained on simulated jet data.
Introduction
Generative models have become a standard tool for learning and sampling from high-dimensional distributions. In high-energy physics, the data are collections of relativistic energy-momentum 4-vectors that must satisfy hard physical constraints such as energy and momentum conservation. A model that learns these constraints only approximately generates samples that violate them slightly, which undermines its reliability and interpretability. This article summarizes generative models that satisfy the constraints exactly by sampling directly on phase space.
Core Concepts
- Generative Models: These models are designed to generate new data points from a learned distribution, making them ideal for tasks such as sampling and data augmentation.
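- Phase Space: In high-energy physics, phase space is the set of all kinematically allowed states of a system. Here it is the manifold of N massless energy-momentum 4-vectors whose total energy is fixed and whose total three-momentum vanishes, that is, massless N-particle Lorentz-invariant phase space in the center-of-momentum frame.
- Diffusion Models: Generative models that use a forward process to gradually add noise to the data and a learned reverse process to remove it, so that samples drawn from noise are mapped back onto the data distribution. In this work the “pure noise” endpoint of the forward process is the uniform distribution on phase space.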
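To make the phase-space constraints concrete, here is a minimal sketch of a membership check. The function name `on_massless_phase_space`, the `(E, px, py, pz)` tuple layout, and the tolerance are illustrative assumptions, not taken from the paper. The manifold is cut out of the 4N momentum components by N mass-shell conditions and 4 conservation conditions, leaving 3N - 4 free dimensions.

```python
import math


def on_massless_phase_space(momenta, sqrt_s, tol=1e-9):
    """Check whether 4-vectors (E, px, py, pz) lie on massless CoM phase space.

    Hypothetical helper for illustration only; `sqrt_s` is the total energy
    in the center-of-momentum frame.
    """
    scale = max(sqrt_s, 1.0)
    total = [sum(p[k] for p in momenta) for k in range(4)]

    # Energy-momentum conservation: total 4-momentum equals (sqrt_s, 0, 0, 0).
    conserved = abs(total[0] - sqrt_s) <= tol * scale and all(
        abs(total[k]) <= tol * scale for k in (1, 2, 3)
    )

    # Mass-shell condition for massless particles: E^2 - |p|^2 = 0.
    massless = all(
        abs(e * e - (px * px + py * py + pz * pz)) <= tol * scale**2
        for e, px, py, pz in momenta
    )
    return conserved and massless


# Two back-to-back massless particles with total energy 100.
back_to_back = [(50.0, 0.0, 0.0, 50.0), (50.0, 0.0, 0.0, -50.0)]
print(on_massless_phase_space(back_to_back, 100.0))  # True

# Same energies, but |p| < E, so the particles are not massless.
off_shell = [(50.0, 0.0, 0.0, 49.0), (50.0, 0.0, 0.0, -49.0)]
print(on_massless_phase_space(off_shell, 100.0))  # False
```

A model that learns the constraints only approximately would produce samples that fail this check at tight tolerances. A model confined to the manifold by construction passes it up to floating-point rounding.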
Methodology
The proposed generative models are built on massless N-particle Lorentz-invariant phase space in the center-of-momentum frame. Confining the sampling process to this manifold makes the models respect the physical constraints by construction, which supports reliability and interpretability. The approach has three key aspects:
- Manifold Constraint: Every step of the sampling trajectory stays on the phase-space manifold, so energy and momentum conservation hold exactly rather than being learned approximately.
- Reverse Process Analysis: Because the reverse (de-noising) process of the diffusion model starts from the uniform distribution on phase space, it offers a clear reference point for identifying how correlations among the particles emerge as samples are de-noised.
- Singularity Structures: The study demonstrates that the models can learn both few-particle and many-particle distributions with various singularity structures.
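The uniform distribution on massless phase space, which the abstract identifies as the “pure noise” endpoint, can be sampled with the standard RAMBO algorithm. The sketch below is an illustration under stated assumptions: the source does not say how the authors draw uniform samples, and the function name `rambo_massless` and its signature are hypothetical.

```python
import math
import random


def rambo_massless(n, sqrt_s, rng=random):
    """Draw n massless 4-momenta (E, px, py, pz) uniformly on CoM phase space.

    Illustrative RAMBO sketch; not necessarily the sampler used in the paper.
    """
    if n < 2:
        raise ValueError("need at least two particles")

    # Step 1: isotropic massless momenta with energy density ~ E * exp(-E).
    q = []
    for _ in range(n):
        cos_theta = 2.0 * rng.random() - 1.0
        sin_theta = math.sqrt(1.0 - cos_theta**2)
        phi = 2.0 * math.pi * rng.random()
        # 1 - random() lies in (0, 1], so the logarithm is always finite.
        energy = -math.log((1.0 - rng.random()) * (1.0 - rng.random()))
        q.append((
            energy,
            energy * sin_theta * math.cos(phi),
            energy * sin_theta * math.sin(phi),
            energy * cos_theta,
        ))

    # Step 2: boost to the rest frame of the total momentum Q and rescale
    # so that the total energy equals sqrt_s.
    Q = [sum(v[k] for v in q) for k in range(4)]
    mass = math.sqrt(Q[0]**2 - Q[1]**2 - Q[2]**2 - Q[3]**2)
    b = [-Q[k] / mass for k in (1, 2, 3)]
    gamma = Q[0] / mass
    a = 1.0 / (1.0 + gamma)
    x = sqrt_s / mass

    momenta = []
    for e, qx, qy, qz in q:
        bq = b[0] * qx + b[1] * qy + b[2] * qz
        momenta.append((
            x * (gamma * e + bq),
            x * (qx + b[0] * e + a * bq * b[0]),
            x * (qy + b[1] * e + a * bq * b[1]),
            x * (qz + b[2] * e + a * bq * b[2]),
        ))
    return momenta


events = rambo_massless(4, 100.0, random.Random(0))
total = [sum(p[k] for p in events) for k in range(4)]
print(total)  # approximately [100.0, 0.0, 0.0, 0.0], up to rounding
```

Because the boost and the rescaling preserve masslessness, every returned configuration satisfies the phase-space constraints up to floating-point rounding. In the massless case RAMBO assigns the same weight to every event, so the samples are uniform on phase space without reweighting.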
Conclusion
Generative models that are confined by construction to massless N-particle Lorentz-invariant phase space satisfy energy and momentum conservation exactly, which improves their interpretability and reliability compared with models that learn these constraints only approximately. The authors demonstrate that such models can learn both few-particle and many-particle distributions, and they present the work as paving the way for future interpretability studies using generative models trained on simulated jet data.
