A Multimodal Vision Transformer-based Modeling Framework for Prediction of Fluid Flows in Energy Systems
arXiv:2604.02483v1
Abstract
Computational fluid dynamics (CFD) simulations of complex fluid flows in energy systems are prohibitively expensive due to strong nonlinearities and multiscale-multiphysics interactions. In this work, we present a transformer-based modeling framework for prediction of fluid flows, and demonstrate it for high-pressure gas injection phenomena relevant to reciprocating engines.
Introduction
The demand for efficient energy systems has highlighted the importance of accurately predicting fluid flows within these systems. Traditional CFD approaches, while precise, often come with significant computational costs. This paper introduces a novel approach that employs a hierarchical Vision Transformer (SwinV2-UNet) architecture, aimed at improving the prediction of fluid flows through the integration of multimodal datasets from multi-fidelity simulations.
Model Architecture
The proposed framework is designed to handle complex fluid dynamics by incorporating auxiliary tokens that encode data modalities and time increments. This allows the model to adaptively learn from varying data sources and resolutions, providing a more comprehensive understanding of fluid behavior under different conditions.
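As a rough illustration of the auxiliary-token idea, the sketch below prepends one modality token and one time-increment token to a sequence of patch tokens. It is a minimal sketch under stated assumptions, not the paper's implementation: the embedding width, the number of modalities, the random lookup table standing in for a learned embedding, the sinusoidal time encoding, and the names `modality_table`, `time_increment_token`, and `add_auxiliary_tokens` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 16       # hypothetical token width
NUM_MODALITIES = 4   # hypothetical number of data modalities

# Random stand-in for a learned embedding table: one vector per modality id.
modality_table = rng.normal(size=(NUM_MODALITIES, EMBED_DIM))


def time_increment_token(dt, dim=EMBED_DIM):
    """Sinusoidal encoding of a scalar time increment (an assumed choice)."""
    half = dim // 2
    freqs = 1.0 / (10000.0 ** (np.arange(half) / half))
    angles = dt * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])


def add_auxiliary_tokens(patch_tokens, modality_id, dt):
    """Prepend a modality token and a time-increment token to the patch tokens."""
    aux = np.stack([modality_table[modality_id], time_increment_token(dt)])
    return np.concatenate([aux, patch_tokens], axis=0)


# 64 patch tokens, e.g. an 8x8 grid of patches from one flow snapshot.
patch_tokens = rng.normal(size=(64, EMBED_DIM))
tokens = add_auxiliary_tokens(patch_tokens, modality_id=2, dt=0.5)
print(tokens.shape)  # (66, 16)
```

Carrying the modality and time increment as extra tokens, rather than as separate input channels, lets the same attention layers condition on them without changing the patch layout. This is one plausible design and may differ from what the authors use.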
Methodology
The framework is evaluated on two primary tasks:
- Spatiotemporal Rollouts: The model autoregressively predicts the flow state at future times, allowing for dynamic forecasting of fluid behavior.
- Feature Transformation: The model infers unobserved fields/views from observed ones, enhancing its ability to reconstruct missing flow-field information.
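The first task, the autoregressive rollout, can be sketched as a loop that feeds each prediction back in as the next input. This is a minimal sketch: `rollout` is a hypothetical helper, and `toy_step` is a placeholder that shifts and damps a field. It stands in for the trained network only to make the loop runnable.

```python
import numpy as np


def rollout(step_fn, initial_state, num_steps, dt):
    """Autoregressive rollout: each prediction becomes the next input."""
    states = [initial_state]
    for _ in range(num_steps):
        states.append(step_fn(states[-1], dt))
    return np.stack(states)


def toy_step(state, dt):
    """Placeholder for a trained model: shifts the field one cell and damps it."""
    return (1.0 - 0.1 * dt) * np.roll(state, shift=1, axis=-1)


# A 4x8 field with a unit-valued column at the left edge.
field0 = np.zeros((4, 8))
field0[:, 0] = 1.0

trajectory = rollout(toy_step, field0, num_steps=3, dt=1.0)
print(trajectory.shape)  # (4, 4, 8): initial state plus three predicted states
```

Because every step consumes the previous prediction, errors can accumulate over the rollout horizon, which is why rollout quality is usually assessed over many steps rather than one.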
Data Generation
To validate the model, we generated multimodal datasets from in-house CFD simulations involving argon jet injection into a nitrogen environment. These datasets were created under various grid resolutions, turbulence models, and equations of state, enabling the model to learn generalized predictions across diverse scenarios.
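One way to picture how such a multimodal dataset is organized is as the Cartesian product of the simulation settings, with each combination treated as one modality. The sketch below is illustrative only: the option labels are placeholders rather than the paper's actual grids, turbulence models, or equations of state, and the `modality_id` field is a hypothetical bookkeeping device.

```python
from itertools import product

# All values below are hypothetical placeholders, not the paper's settings.
grid_resolutions = ["coarse", "fine"]
turbulence_models = ["turbulence_model_A", "turbulence_model_B"]
equations_of_state = ["eos_A", "eos_B"]

# Each combination of settings is assigned its own integer modality id.
modalities = [
    {"modality_id": i, "grid": g, "turbulence": t, "eos": e}
    for i, (g, t, e) in enumerate(
        product(grid_resolutions, turbulence_models, equations_of_state)
    )
]

for m in modalities:
    print(m)
```

An integer id of this kind is the sort of label that a modality token could encode, although the source does not state how modalities are indexed.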
Results and Discussion
The results indicate that the transformer-based models generalize across different resolutions and modalities. The framework forecasts flow evolution and accurately reconstructs missing flow-field information, demonstrating its effectiveness for complex fluid flows.
Conclusion
This work illustrates the potential of large vision transformer-based models in advancing predictive modeling of complex fluid flows. By leveraging multimodal datasets and a hierarchical architecture, we can reduce the computational burden associated with traditional CFD simulations while maintaining accuracy and reliability in predictions. Future research will focus on further refining model capabilities and exploring additional applications within the field of energy systems.
