CPUBone: Efficient Vision Backbone Design for Devices with Low Parallelization Capabilities
Summary: arXiv:2603.26425v1
Recent work on vision backbone architectures has largely focused on efficiency for hardware with substantial parallel processing capabilities, a focus that increasingly applies to embedded systems such as mobile phones and embedded AI accelerator modules. CPUs, however, cannot parallelize operations to the same extent and therefore call for a different design philosophy. That philosophy balances a low number of operations (multiply-accumulate operations, or MACs) against hardware-efficient execution, aiming for a high number of MACs executed per second (MACpS).
Research Focus
We investigate two modifications of standard convolutions, both of which reduce computational cost:
- Grouping Convolutions: The input channels are divided into groups and each group is convolved independently, so every output channel depends only on the input channels of its own group and the number of MACs drops accordingly.
- Reducing Kernel Sizes: Smaller kernels require fewer parameters and fewer MACs per output value, which lowers resource usage without significantly compromising performance.
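The effect of both modifications on the MAC count can be sketched with the standard formula for a 2D convolution. This is a minimal illustration, not code from the CPUBone repository: the function name `conv_macs` and the layer dimensions are hypothetical, bias terms are ignored, and stride and padding are assumed to be already reflected in the output size.

```python
def conv_macs(h_out: int, w_out: int, c_in: int, c_out: int,
              kernel_size: int, groups: int = 1) -> int:
    """Count multiply-accumulate operations of one 2D convolution layer.

    Each output value sums over (c_in / groups) input channels and a
    kernel_size x kernel_size window. Bias is ignored.
    """
    if c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("c_in and c_out must both be divisible by groups")
    return h_out * w_out * c_out * (c_in // groups) * kernel_size * kernel_size


# Hypothetical layer: 56x56 output, 64 input and 64 output channels.
standard = conv_macs(56, 56, 64, 64, kernel_size=3)            # 115_605_504
grouped = conv_macs(56, 56, 64, 64, kernel_size=3, groups=4)   # 28_901_376
small_kernel = conv_macs(56, 56, 64, 64, kernel_size=1)        # 12_845_056

print(f"standard 3x3:       {standard:>12,} MACs")
print(f"grouped 3x3 (g=4):  {grouped:>12,} MACs")
print(f"1x1 kernel:         {small_kernel:>12,} MACs")
```

In this example, using 4 groups divides the MAC count by 4, and shrinking the kernel from 3x3 to 1x1 divides it by 9. The count alone says nothing about how efficiently a given CPU executes those operations.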
Findings
Both adaptations considerably decrease the total number of MACs needed for inference. Fewer MACs translate into low latency only if hardware efficiency is preserved, because latency is the number of MACs divided by the achieved MACpS. Our experimental evaluations across a variety of CPU devices show that both modifications retain high hardware efficiency on CPUs.
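The relationship between MAC count, MACpS, and latency can be made concrete with a small sketch. It assumes the simple model latency = MACs / MACpS, which follows from the definition of MACpS. The function names and all numbers are hypothetical and are not measurements from the paper.

```python
def macs_per_second(macs: int, latency_s: float) -> float:
    """Hardware efficiency: MACs executed per second of measured latency."""
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return macs / latency_s


def latency_seconds(macs: int, macps: float) -> float:
    """Latency implied by a MAC count and an achieved MACpS."""
    if macps <= 0:
        raise ValueError("macps must be positive")
    return macs / macps


# Hypothetical numbers, chosen only for illustration.
baseline = latency_seconds(macs=1_000_000_000, macps=50_000_000_000)

# 4x fewer MACs, but executed 4x less efficiently: no latency gain.
inefficient = latency_seconds(macs=250_000_000, macps=12_500_000_000)

# 4x fewer MACs at unchanged efficiency: 4x lower latency.
efficient = latency_seconds(macs=250_000_000, macps=50_000_000_000)

print(f"baseline:                   {baseline * 1e3:.1f} ms")
print(f"fewer MACs, lower MACpS:    {inefficient * 1e3:.1f} ms")
print(f"fewer MACs, same MACpS:     {efficient * 1e3:.1f} ms")
```

Under this model, a reduction in MACs only pays off in latency when the MACpS achieved on the target device does not fall by a similar factor.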
Introduction of CPUBone
Based on these findings, we introduce CPUBone, a new family of vision backbone models optimized for CPU-based inference. CPUBone achieves state-of-the-art Speed-Accuracy Trade-offs (SATs) across a wide range of CPU devices, and its efficiency carries over to downstream tasks, including:
- Object Detection
- Semantic Segmentation
Conclusion and Availability
CPUBone is designed around the execution characteristics of CPUs, offering an efficient solution for vision applications in environments where parallelization is limited. The models and the corresponding code are available at the CPUBone GitHub Repository.
The approach improves the speed-accuracy trade-off of vision tasks on CPU-based platforms and is relevant to further research on computer vision in resource-constrained environments.
