Mean Masked Autoencoder with Flow-Mixing for Encrypted Traffic Classification
arXiv:2603.29537v1 (announce type: cross)
Abstract
Network traffic classification using self-supervised pre-training models based on Masked Autoencoders (MAE) has shown considerable promise. However, existing methods are limited to isolated byte-level reconstruction of individual flows and therefore fail to capture the multi-granularity contextual relationships in traffic. To address this limitation, we propose Mean MAE (MMAE), a teacher-student MAE paradigm with a flow-mixing strategy for building an encrypted traffic pre-training model.
Introduction
The growing complexity of network traffic, driven largely by encryption, poses significant challenges for traditional classification methods. Self-supervised learning, especially with Masked Autoencoders, has opened new avenues for traffic classification. MMAE builds on this line of work by extending MAE pre-training beyond byte-level reconstruction within a single flow.
Methodology
MMAE uses a self-distillation mechanism for teacher-student interaction: the teacher, which sees the unmasked flow, provides flow-level semantic supervision that moves the student beyond local byte reconstruction toward multi-granularity understanding. This shift lets the model capture the broader context of network traffic instead of focusing solely on isolated bytes.
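The self-distillation signal can be sketched as follows. This is a minimal illustration under assumptions the summary does not state: the teacher's weights are an exponential moving average (EMA) of the student's, as the name "Mean MAE" suggests by analogy with mean-teacher training; the alignment loss is 1 minus cosine similarity; and a toy linear encoder stands in for the real Transformer. The names `encode`, `distillation_loss`, and `ema_update` are hypothetical, and the byte-reconstruction term and the student's gradient step are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


# Hypothetical toy encoder standing in for the MMAE Transformer: each output
# dimension is a weighted mean over the visible (unmasked) token values.
def encode(weights: Matrix, tokens: List[float], mask: List[bool]) -> List[float]:
    visible = [i for i, masked in enumerate(mask) if not masked]
    n = max(len(visible), 1)  # avoid division by zero when everything is masked
    return [sum(row[i] * tokens[i] for i in visible) / n for row in weights]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def distillation_loss(student_w: Matrix, teacher_w: Matrix,
                      tokens: List[float], mask: List[bool]) -> float:
    # Teacher encodes the full, unmasked flow: the flow-level semantic target.
    target = encode(teacher_w, tokens, [False] * len(tokens))
    # Student encodes only the visible tokens of the masked flow.
    pred = encode(student_w, tokens, mask)
    # ASSUMPTION: alignment loss = 1 - cosine similarity (0 when aligned).
    return 1.0 - cosine(pred, target)


def ema_update(teacher_w: Matrix, student_w: Matrix, momentum: float = 0.99) -> Matrix:
    # ASSUMPTION: the teacher is an EMA of the student (mean-teacher style)
    # and receives no gradient updates of its own.
    return [[momentum * t + (1.0 - momentum) * s for t, s in zip(t_row, s_row)]
            for t_row, s_row in zip(teacher_w, student_w)]


if __name__ == "__main__":
    tokens = [0.2, 0.9, 0.4, 0.7]          # toy normalised byte values
    mask = [False, True, False, True]      # student sees positions 0 and 2
    student = [[0.1, 0.3, -0.2, 0.5], [0.4, -0.1, 0.2, 0.0]]
    teacher = [row[:] for row in student]  # teacher initialised from the student
    print("distillation loss:", distillation_loss(student, teacher, tokens, mask))
    teacher = ema_update(teacher, student)  # applied after the (omitted) student step
```

Keeping the teacher as a slowly moving average, rather than a second trained network, is one common way to give the student a stable target; whether MMAE does exactly this is not confirmed here.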
Flow Mixing Strategy
To break the information bottleneck of individual flows, we introduce a dynamic Flow Mixing (FlowMix) strategy that replaces the traditional random masking mechanism. FlowMix constructs challenging cross-flow mixed samples in which tokens from other flows act as interference, forcing the model to learn discriminative representations from distorted tokens. This is intended to improve the model's ability to generalize across diverse traffic patterns.
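A FlowMix-style sample can be sketched as below. This sketch assumes, hypothetically, that mixing means overwriting a randomly chosen fraction of a flow's token positions with the tokens at the same positions in another flow from the batch, and that "dynamic" means the mixing ratio is resampled for every sample. The helpers `flow_mix` and `flow_mix_batch` and the `ratio_range` default are illustrative, not the paper's implementation.

```python
import random
from typing import List, Sequence, Tuple


def flow_mix(target: Sequence[int], other: Sequence[int], ratio: float,
             rng: random.Random) -> Tuple[List[int], List[bool]]:
    """Overwrite a fraction `ratio` of target-flow positions with tokens from `other`.

    Returns the mixed sequence and per-position flags marking the
    cross-flow (interference) tokens.
    """
    if len(target) != len(other):
        raise ValueError("flows must be padded/truncated to the same length")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    n_mix = int(round(ratio * len(target)))
    positions = set(rng.sample(range(len(target)), n_mix))
    mixed = [other[i] if i in positions else target[i] for i in range(len(target))]
    flags = [i in positions for i in range(len(target))]
    return mixed, flags


def flow_mix_batch(batch: List[List[int]], rng: random.Random,
                   ratio_range: Tuple[float, float] = (0.1, 0.5)  # hypothetical default
                   ) -> List[Tuple[List[int], List[bool]]]:
    """Mix every flow with a random partner flow at a freshly sampled ratio."""
    if len(batch) < 2:
        raise ValueError("need at least two flows to mix")
    out = []
    for idx, flow in enumerate(batch):
        partner = rng.choice([j for j in range(len(batch)) if j != idx])
        ratio = rng.uniform(*ratio_range)  # "dynamic": a new ratio per sample
        out.append(flow_mix(flow, batch[partner], ratio, rng))
    return out


if __name__ == "__main__":
    # Toy byte-token flows (values are illustrative only).
    flows = [[0x16, 0x03, 0x01, 0x02, 0x00, 0x01],
             [0x17, 0x03, 0x03, 0x00, 0x45, 0x10],
             [0x47, 0x45, 0x54, 0x20, 0x2F, 0x20]]
    for mixed, flags in flow_mix_batch(flows, random.Random(0)):
        print(mixed, flags)
```

Returning the interference flags alongside the mixed tokens leaves room for an auxiliary objective that asks the model which tokens do not belong to the flow, though the summary does not say whether MMAE uses one.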
Packet-importance Aware Mask Predictor
Furthermore, we design a Packet-importance aware Mask Predictor (PMP) equipped with an attention bias mechanism. PMP uses packet-level side-channel statistics to dynamically mask tokens with high semantic density, so that pre-training concentrates on the most informative parts of the traffic.
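The PMP idea can be approximated with a single attention head whose logits receive an additive per-key bias derived from the statistics of the packet each token belongs to; the tokens that attract the most attention are then masked. Everything here is an assumption for illustration: packet length and inter-arrival time as the side-channel statistics, the importance heuristic, the bias scale `alpha`, and average attention received as the proxy for semantic density. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def minmax(xs: Sequence[float]) -> List[float]:
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]


def packet_importance(lengths: Sequence[float], iats: Sequence[float]) -> List[float]:
    # HYPOTHETICAL heuristic: larger packets and shorter inter-arrival times
    # count as more informative; both statistics are min-max normalised.
    sizes, gaps = minmax(lengths), minmax(iats)
    return [0.5 * s + 0.5 * (1.0 - g) for s, g in zip(sizes, gaps)]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(queries, keys, key_bias) -> List[List[float]]:
    """Single-head scaled dot-product attention with an additive per-key bias."""
    d = len(queries[0])
    rows = []
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) + b_j
                  for k, b_j in zip(keys, key_bias)]
        rows.append(softmax(logits))
    return rows


def predict_mask(queries, keys, token_packet: Sequence[int],
                 importance: Sequence[float], mask_ratio: float,
                 alpha: float = 2.0) -> List[bool]:
    """Mask the tokens that receive the most bias-boosted attention."""
    bias = [alpha * importance[p] for p in token_packet]  # each token gets its packet's score
    attn = biased_attention(queries, keys, bias)
    n = len(keys)
    # ASSUMPTION: average attention received stands in for semantic density.
    received = [sum(row[j] for row in attn) / len(attn) for j in range(n)]
    k = int(round(mask_ratio * n))
    ranked = sorted(range(n), key=lambda j: received[j], reverse=True)
    chosen = set(ranked[:k])
    return [j in chosen for j in range(n)]
```

For example, with identical query and key vectors the dot-product term is the same for every token, so the packet bias alone decides which tokens are masked. Deterministic top-k keeps the sketch reproducible; a learned predictor could instead sample masks from the biased scores.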
Results
Extensive experiments on datasets covering encrypted applications, malware, and attack traffic show that MMAE achieves state-of-the-art performance. The reported results show gains in classification accuracy and robustness against adversarial attacks, supporting the effectiveness of the proposed methods.
Conclusion
Mean MAE (MMAE) advances encrypted traffic classification by combining multi-granularity contextual awareness with a flow-mixing pre-training strategy, and it offers a strong baseline for future research and applications in network traffic analysis.
Code Availability
The code for MMAE is available in the MMAE GitHub repository.
