A Proxy Consistency Loss for Grounded Fusion of Earth Observation and Location Encoders
Summary: arXiv:2604.18881v1
Announce Type: cross
Abstract
Supervised learning with Earth observation inputs is often limited by the sparsity of high-quality labeled or in-situ measured data to use as training labels. With the abundance of geographic data products, there are often variables that are correlated with, but different from, the variable of interest and that can be leveraged. We integrate such proxy variables within a geographic prior via a trainable location encoder and introduce a proxy consistency loss (PCL) formulation that imbues the location encoder with information from the proxy data.
Key Insights
The first key insight behind our approach is to use the location encoder as an agile and flexible way to learn from abundantly available proxy data, which can be sampled at any location, independently of where training labels are available. Our second key insight is that the location encoder must be regularized appropriately to achieve good performance and robustness when labeled data are limited.
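Read together, the two insights suggest a training objective of the following shape. This is a hedged sketch, not the paper's stated formula: the symbols are illustrative assumptions. Here $f_\omega$ is an observation encoder applied to Earth observation input $o_i$, $g_\theta$ is the trainable location encoder, $h_\phi$ is a fusion/prediction head, $p_\psi$ is a head predicting proxy values, $(o_i, x_i, y_i)_{i=1}^{N}$ are the few labeled samples, $(s_j, z_j)_{j=1}^{M}$ are proxy values $z_j$ at locations $s_j$, $\ell$ is a supervised loss, and $\lambda, \mu$ are weights. The squared-error consistency term and the L2 penalty are assumed choices.

```latex
\mathcal{L}(\omega,\theta,\phi,\psi)
  = \frac{1}{N}\sum_{i=1}^{N} \ell\big(h_\phi(f_\omega(o_i),\, g_\theta(x_i)),\; y_i\big)
  + \lambda\,\frac{1}{M}\sum_{j=1}^{M} \big\lVert p_\psi(g_\theta(s_j)) - z_j \big\rVert_2^2
  + \mu\,\lVert \theta \rVert_2^2
```

Because the middle term depends only on locations and proxy values, $M$ can be much larger than $N$. The last term is one simple instance of the regularization that the second insight calls for.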
Methodology
We propose a methodology built on three components:
- Integration of Proxy Variables: Using available geographic data products as proxy variables within a geographic prior, represented by a trainable location encoder.
- Proxy Consistency Loss (PCL): A new loss formulation that imbues the location encoder with information from the proxy data, which can be sampled independently of where training labels exist.
- Location Encoder Regularization: Regularizing the location encoder so that it remains robust when labeled data are limited.
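The three components can be sketched in plain Python. This is a hypothetical illustration, not the authors' implementation. It assumes a fixed multi-frequency sinusoidal feature map over (lat, lon), a linear location encoder `W`, linear label and proxy heads, squared-error losses for both the supervised term and the proxy consistency term, and an L2 penalty as the regularizer. The synthetic `proxy_field` and the labels derived from it are stand-ins, and the observation encoder is omitted for brevity.

```python
import math
import random

# Hypothetical sketch (not the authors' code): a trainable location encoder W,
# shared by a label head `a` and a proxy head `b`. The objective combines a
# supervised loss on a few labels, a proxy consistency term on many proxy
# samples, and an L2 penalty on the encoder. The observation encoder is omitted.

FREQS = (1, 2, 4)  # assumed frequencies of the fixed sinusoidal feature map


def features(lat, lon):
    """Fixed multi-frequency sinusoidal features of (lat, lon) given in degrees."""
    la, lo = math.radians(lat), math.radians(lon)
    out = []
    for f in FREQS:
        out += [math.sin(f * la), math.cos(f * la), math.sin(f * lo), math.cos(f * lo)]
    return out


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


class LocationModel:
    def __init__(self, emb_dim=4, seed=0):
        rng = random.Random(seed)
        n_feat = 4 * len(FREQS)
        self.W = [[rng.gauss(0, 0.1) for _ in range(n_feat)] for _ in range(emb_dim)]
        self.a = [rng.gauss(0, 0.1) for _ in range(emb_dim)]  # label head
        self.b = [rng.gauss(0, 0.1) for _ in range(emb_dim)]  # proxy head

    def embed(self, phi):
        return [dot(row, phi) for row in self.W]

    def mse(self, data, head):
        return sum((dot(head, self.embed(features(*x))) - t) ** 2 for x, t in data) / len(data)

    def objective(self, labeled, proxies, lam, mu):
        reg = sum(w * w for row in self.W for w in row)
        return self.mse(labeled, self.a) + lam * self.mse(proxies, self.b) + mu * reg

    def step(self, labeled, proxies, lam, mu, lr):
        # Gradients of the objective above, accumulated before any update.
        gW = [[2 * mu * w for w in row] for row in self.W]
        ga, gb = [0.0] * len(self.a), [0.0] * len(self.b)
        for data, head, g_head, weight in ((labeled, self.a, ga, 1.0), (proxies, self.b, gb, lam)):
            scale = 2 * weight / len(data)
            for x, target in data:
                phi = features(*x)
                e = self.embed(phi)
                r = dot(head, e) - target
                for k in range(len(e)):
                    g_head[k] += scale * r * e[k]
                    for j in range(len(phi)):
                        gW[k][j] += scale * r * head[k] * phi[j]
        for k in range(len(self.W)):
            self.a[k] -= lr * ga[k]
            self.b[k] -= lr * gb[k]
            for j in range(len(self.W[k])):
                self.W[k][j] -= lr * gW[k][j]


def proxy_field(lat, lon):
    # Synthetic stand-in for a gridded proxy product (illustrative only).
    return math.sin(math.radians(lat)) + 0.5 * math.cos(math.radians(lon))


rng = random.Random(1)
locs = [(rng.uniform(-60, 60), rng.uniform(-180, 180)) for _ in range(205)]
proxies = [(x, proxy_field(*x)) for x in locs[:200]]  # dense proxy samples
labeled = [(x, 2.0 * proxy_field(*x) + 0.1) for x in locs[200:]]  # sparse labels

model = LocationModel()
start = model.objective(labeled, proxies, lam=1.0, mu=1e-3)
for _ in range(200):
    model.step(labeled, proxies, lam=1.0, mu=1e-3, lr=0.02)
end = model.objective(labeled, proxies, lam=1.0, mu=1e-3)
print(f"objective: {start:.3f} -> {end:.3f}")
```

The design choice worth noting is that `W` receives gradient from both heads: the 200 proxy samples shape the shared embedding even though only 5 locations carry labels, while `mu` keeps the encoder weights small.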
Experimental Results
Our experiments on air quality prediction and poverty mapping demonstrate significant improvements in predictive performance. The results indicate that:
- Integrating proxy data implicitly through the location encoder yields better results than feeding both the proxy variables and the Earth observation data as inputs to an observation encoder.
- Fusion strategies that rely on frozen, pretrained location embeddings do not perform as well as our proposed method.
- In-sample prediction accuracy improves, showing that the PCL can incorporate rich information from the proxies.
- Out-of-sample predictions indicate that the learned latent embeddings facilitate generalization to regions where no training labels are available.
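The frozen-versus-trainable distinction in the second result comes down to whether gradients from the supervised loss reach the location encoder. A minimal sketch, assuming a linear location encoder over fixed location features and a linear head on the concatenated observation and location embeddings; `FusionModel` and its parameters are hypothetical, the random initialization stands in for pretrained weights, and in the proposed method the PCL term would send additional gradient into the location encoder.

```python
import random


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


class FusionModel:
    """Hypothetical late fusion: a linear head over [observation emb ; location emb]."""

    def __init__(self, obs_dim, loc_feat_dim, loc_dim, train_location, seed=0):
        rng = random.Random(seed)
        self.train_location = train_location
        # Location encoder: one linear layer over fixed location features (assumed).
        self.L = [[rng.gauss(0, 0.1) for _ in range(loc_feat_dim)] for _ in range(loc_dim)]
        # Prediction head over the concatenated embeddings.
        self.h = [rng.gauss(0, 0.1) for _ in range(obs_dim + loc_dim)]

    def predict(self, obs_emb, loc_feat):
        loc_emb = [dot(row, loc_feat) for row in self.L]
        return dot(self.h, obs_emb + loc_emb), loc_emb

    def step(self, obs_emb, loc_feat, y, lr=0.1):
        y_hat, loc_emb = self.predict(obs_emb, loc_feat)
        r = y_hat - y  # derivative of 0.5 * (y_hat - y) ** 2 with respect to y_hat
        z = obs_emb + loc_emb
        n_obs = len(obs_emb)
        if self.train_location:
            # Backpropagate into the location encoder; skipped when it is frozen.
            for k, row in enumerate(self.L):
                g = r * self.h[n_obs + k]
                for j in range(len(row)):
                    row[j] -= lr * g * loc_feat[j]
        # The fusion head is trained in both settings (uses h before its update).
        for i in range(len(self.h)):
            self.h[i] -= lr * r * z[i]


frozen = FusionModel(obs_dim=2, loc_feat_dim=3, loc_dim=2, train_location=False)
trainable = FusionModel(obs_dim=2, loc_feat_dim=3, loc_dim=2, train_location=True)
before_frozen = [row[:] for row in frozen.L]
before_trainable = [row[:] for row in trainable.L]
before_head = frozen.h[:]
for m in (frozen, trainable):
    m.step([0.5, -0.2], [1.0, 0.3, -0.7], y=1.0)
```

After one step, the frozen model has updated only its fusion head, while the trainable model has also adapted its location embeddings to the task.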
Conclusion
The introduction of a proxy consistency loss provides a robust mechanism for leveraging abundant proxy data in Earth observation applications. By integrating such data into a trainable location encoder, we not only improve prediction accuracy but also enhance the model’s ability to generalize across different geographical areas. Our findings highlight the potential of proxy variables to enrich the training signal in supervised learning settings where labeled data are scarce.
Future work will focus on refining the PCL framework and exploring its applicability in other domains beyond Earth observation.
