Edge AI for Livestock Monitoring Using SAM 3 & DINOv3

Lightweight Distillation of SAM 3 and DINOv3 for Edge-Deployable Individual-Level Livestock Monitoring and Longitudinal Visual Analytics

Foundation-model pipelines that combine open-vocabulary detection, promptable video segmentation, and self-supervised visual embeddings have advanced precision livestock farming (PLF) considerably. One major obstacle remains: the GPU memory these models require often exceeds what standard edge accelerators provide. A new study addresses this by distilling the 446M-parameter Perception Encoder (PE-ViT-L+) of the SAM 3 framework into a 40.66M-parameter student model suitable for edge deployment.

Distillation Mechanisms

The distillation relies on three mechanisms:

Feature Pyramid Network Student Encoder: Built on the TinyViT-21M-512 architecture, this encoder produces multi-scale features efficiently.
Four-Term Direction-Then-Scale Distillation Loss: A four-term loss that, as its name indicates, matches the direction of the student's features to the teacher's before matching their scale.
Backbone-Substitution Inference: The student encoder stands in for the teacher backbone at inference time, and sliding-window session pruning keeps GPU memory usage within the device's limits during deployment.
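
To make the direction-then-scale idea concrete, here is a minimal sketch in plain Python. It is not the study's four-term loss, whose individual terms are not listed here. It only illustrates one plausible reading of the name: a cosine term aligns feature direction first, and a norm term that matches feature magnitude is switched on after a warm-up period. The function names, the warm-up schedule, and the weights are all hypothetical.

```python
import math

def _norm(vec):
    """L2 norm of a feature vector."""
    return math.sqrt(sum(v * v for v in vec))

def direction_loss(student, teacher):
    """1 - cosine similarity: near zero when both vectors point the same way."""
    dot = sum(s * t for s, t in zip(student, teacher))
    return 1.0 - dot / (_norm(student) * _norm(teacher) + 1e-12)

def scale_loss(student, teacher):
    """Squared difference of L2 norms: zero when the magnitudes agree."""
    return (_norm(student) - _norm(teacher)) ** 2

def direction_then_scale(student, teacher, step,
                         warmup_steps=1000, scale_weight=1.0):
    """Hypothetical schedule: direction only during warm-up, then add scale."""
    weight = 0.0 if step < warmup_steps else scale_weight
    return direction_loss(student, teacher) + weight * scale_loss(student, teacher)
```

With a student feature of [2.0, 0.0] and a teacher feature of [1.0, 0.0], the direction term is essentially zero because the vectors are parallel, so the combined loss stays near zero during warm-up and only penalizes the magnitude mismatch afterwards.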

DINOv3 Integration

The research also draws on the DINOv3 model family, specifically the pre-distilled ViT-S/16 variant with 21.6M parameters, which was itself distilled from the much larger 6716M-parameter ViT-7B teacher. The ViT-S model serves as the embedder for individual animals, producing the per-animal visual features used to monitor them.
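
One common way to turn a ViT's per-patch tokens into a single embedding for one animal is masked average pooling: average only the patch tokens that fall under the animal's segmentation mask, then L2-normalize the result. The sketch below assumes this pooling scheme for illustration; the study's actual pooling is not described here, and the function name and the toy 2-dimensional tokens are hypothetical (ViT-S tokens are 384-dimensional).

```python
import math

def masked_mean_embedding(patch_tokens, mask):
    """Average the patch tokens selected by mask, then L2-normalize.

    patch_tokens: list of equal-length feature vectors, one per image patch.
    mask: list of truthy/falsy flags, one per patch (truthy = on the animal).
    """
    selected = [tok for tok, keep in zip(patch_tokens, mask) if keep]
    if not selected:
        raise ValueError("mask selects no patches")
    dim = len(selected[0])
    mean = [sum(tok[i] for tok in selected) / len(selected) for i in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean))
    # Leave an all-zero mean untouched to avoid dividing by zero.
    return [v / norm for v in mean] if norm > 0 else mean
```

Normalizing the pooled vector means later comparisons between animals can use a plain dot product as cosine similarity.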

Performance Metrics

On the Edinburgh Pig dataset, the compressed pipeline reported the following results:

MOTA: 92.29%, closely trailing the SAM 3 teacher.
IDF1: 96.15%, indicating that animal identities are preserved reliably across frames.
System-Level Parameter Reduction: A 7.77-fold reduction in parameters across the whole pipeline compared to the original.
Peak VRAM Usage: Reduced from 19.52 GB to 6.49 GB, roughly a threefold reduction.
Top-1 Accuracy: 97.34%, with a macro-F1 score of 91.67%, across nine classes of pig behavior.
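
For readers unfamiliar with these metrics, the sketch below shows how they are computed, assuming the standard definitions: MOTA is 1 - (FN + FP + IDSW) / GT, IDF1 is 2·IDTP / (2·IDTP + IDFP + IDFN), and macro-F1 is the unweighted mean of per-class F1 scores. The function names are illustrative, and any counts passed in are the caller's own; none of the study's raw counts are reproduced here.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy from error counts and GT box count."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt

def idf1(idtp, idfp, idfn):
    """Identity F1 from identity-level true/false positives and false negatives."""
    return 2 * idtp / (2 * idtp + idfp + idfn)

def macro_f1(per_class_counts):
    """Unweighted mean of per-class F1.

    per_class_counts: list of (tp, fp, fn) tuples, one per class.
    """
    scores = []
    for tp, fp, fn in per_class_counts:
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because macro-F1 weights every class equally, a rare behavior that is classified poorly pulls it down much more than it pulls down top-1 accuracy, which is one way a gap such as 97.34% versus 91.67% can arise.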

Edge Compatibility and Future Implications

The distilled pipeline fits within the memory envelope of an NVIDIA Jetson Orin NX 16GB system, leaving 4.9 GB of headroom. That headroom supports a proposed, though not yet empirically validated, on-device embedding-pool re-identification mechanism, which is projected to build a longitudinal visual record with a footprint of approximately 94 MB per animal per year. Such a record could support retrospective analyses of disease, lameness, reproductive issues, and growth outcomes.
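
An embedding-pool re-identification scheme could look roughly like the sketch below. This is an illustration only, since the mechanism is described as proposed and unvalidated: the class name, the cosine threshold, the per-animal pool cap, and the oldest-first eviction are all hypothetical. The helper at the end is a back-of-envelope calculation that assumes the yearly budget is spent entirely on raw embeddings, which the source does not state.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class EmbeddingPool:
    """Hypothetical per-animal store of reference embeddings."""

    def __init__(self, threshold=0.8, max_per_animal=64):
        self.pools = {}  # animal_id -> list of embeddings, oldest first
        self.threshold = threshold
        self.max_per_animal = max_per_animal

    def add(self, animal_id, embedding):
        """Store an embedding, evicting the oldest once the cap is exceeded."""
        pool = self.pools.setdefault(animal_id, [])
        pool.append(list(embedding))
        if len(pool) > self.max_per_animal:
            pool.pop(0)

    def identify(self, embedding):
        """Return the best-matching animal_id, or None if below threshold."""
        best_id, best_sim = None, -1.0
        for animal_id, pool in self.pools.items():
            sim = max(cosine(embedding, ref) for ref in pool)
            if sim > best_sim:
                best_id, best_sim = animal_id, sim
        return best_id if best_sim >= self.threshold else None

def embeddings_per_day(annual_bytes, dim, bytes_per_value):
    """How many raw embeddings per day fit into a yearly storage budget."""
    return annual_bytes / (dim * bytes_per_value) / 365
```

Calling `embeddings_per_day(94_000_000, 384, 4)` gives a feel for what a 94 MB yearly budget would allow if it held only 384-dimensional float32 embeddings; a real record may also contain masks, crops, or metadata.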

In summary, distilling SAM 3 and pairing it with a compact DINOv3 embedder makes individual-level livestock monitoring feasible on edge devices and opens new avenues for longitudinal visual analytics in precision agriculture. If the proposed re-identification mechanism holds up under empirical validation, the approach could change how livestock are managed in practice.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
