AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories
Summary: arXiv:2604.19606v1
Abstract: Systematic ablations are essential to attribute performance gains in AI Virtual Cells, yet they are rarely performed because biological repositories are under-standardized and tightly coupled to domain-specific data and formats. While recent coding agents can translate ideas into implementations, they typically stop at producing code and lack a verifier that can reproduce strong baselines and rigorously test which components truly matter.
We introduce AblateCell, a reproduce-then-ablate agent for virtual cell repositories that closes this verification gap. AblateCell first reproduces reported baselines end-to-end by auto-configuring environments, resolving dependency and data issues, and rerunning official evaluations while emitting verifiable artifacts. It then conducts closed-loop ablation by generating a graph of isolated repository mutations and adaptively selecting experiments under a reward that trades off performance impact and execution cost.
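One plausible form for a "verifiable artifact" is a manifest that stores the reported metrics next to a content hash of every file a run produced, so a later reader can check that nothing changed. The sketch below illustrates that idea only; it is an assumption, not AblateCell's actual artifact format, and `build_manifest`, `verify_manifest`, and the manifest fields are hypothetical names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(run_dir: Path, metrics: dict) -> dict:
    """Hypothetical manifest: metrics plus a hash for every file under run_dir."""
    files = {
        str(p.relative_to(run_dir)): sha256_of(p)
        for p in sorted(run_dir.rglob("*"))
        if p.is_file()
    }
    return {"metrics": metrics, "files": files}


def verify_manifest(run_dir: Path, manifest: dict) -> list:
    """Return the relative paths that are missing or whose hash has changed."""
    mismatched = []
    for rel, expected in manifest["files"].items():
        target = run_dir / rel
        if not target.is_file() or sha256_of(target) != expected:
            mismatched.append(rel)
    return mismatched
```

An empty list from `verify_manifest` means every recorded file is still present and byte-identical to what the run produced.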
Key Features of AblateCell
AblateCell combines five capabilities:
- End-to-End Baseline Reproduction: Reruns the official evaluations end-to-end to reproduce reported baselines before any ablation is attempted.
- Dependency Resolution: Auto-configures environments and resolves the dependency and data issues that would otherwise block a run.
- Verifiable Artifacts Emission: Emits artifacts from each run so that the reproduced results can be checked independently.
- Closed-Loop Ablation: Generates a graph of isolated repository mutations, so that each component's contribution to performance can be tested on its own.
- Adaptive Experiment Selection: Chooses the next experiment under a reward that trades off performance impact against execution cost.
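The last two features can be sketched as a small greedy loop over a mutation graph. Everything here is an assumption made for illustration: the summary does not give the reward formula, the meaning of graph edges, or the selection rule. This sketch uses `reward = impact - lam * cost`, treats edges as "finer-grained follow-up" links, and caps a child's estimated impact at its parent's observed impact. The `Mutation` class, the `run_ablations` function, and all mutation names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare by identity so list.remove() targets the exact node
class Mutation:
    """One isolated change to the repository (hypothetical structure)."""
    name: str
    est_impact: float  # prior guess of the metric change if this component is ablated
    cost: float        # estimated execution cost, e.g. GPU-hours
    children: list = field(default_factory=list)  # finer-grained follow-up mutations


def reward(impact: float, cost: float, lam: float) -> float:
    """Assumed reward: performance impact minus a linear cost penalty."""
    return impact - lam * cost


def run_ablations(roots, run_experiment, budget, lam=0.1):
    """Greedy closed loop: run the highest-reward affordable mutation, then
    add its children to the frontier with estimates updated from the result."""
    frontier = list(roots)
    spent, results = 0.0, {}
    while frontier:
        affordable = [m for m in frontier if spent + m.cost <= budget]
        if not affordable:
            break
        best = max(affordable, key=lambda m: reward(m.est_impact, m.cost, lam))
        frontier.remove(best)
        observed = run_experiment(best)  # measured metric change vs. the baseline
        spent += best.cost
        results[best.name] = observed
        for child in best.children:
            # Adaptive step (assumed heuristic): a sub-component is estimated
            # to matter no more than the component that contains it.
            child.est_impact = min(child.est_impact, observed)
            frontier.append(child)
    return results


# Usage with made-up measurements standing in for real training runs.
encoder = Mutation("drop_encoder", 0.5, 2.0,
                   children=[Mutation("drop_encoder_layernorm", 0.4, 1.0)])
augment = Mutation("drop_augmentation", 0.2, 1.0)
aux_loss = Mutation("drop_aux_loss", 0.3, 5.0)
fake_results = {"drop_encoder": 0.1, "drop_encoder_layernorm": 0.05,
                "drop_augmentation": 0.25, "drop_aux_loss": 0.3}
results = run_ablations([encoder, augment, aux_loss],
                        lambda m: fake_results[m.name], budget=4.0)
```

With a budget of 4.0, the loop never runs `drop_aux_loss` because its cost of 5.0 exceeds the budget. It also demotes `drop_encoder_layernorm` after the parent ablation turns out to matter less than expected.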
Performance Evaluation
AblateCell was evaluated on three single-cell perturbation prediction repositories: CPA, GEARS, and BioLORD. The reported results are:
- End-to-End Workflow Success: 88.9%, a reported gain of 29.9% over human experts.
- Accuracy in Recovering Critical Components: 93.3%, a reported gain of 53.3% over heuristic methods.
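The two gains can be read either as absolute percentage-point differences or as relative improvements, and the text does not say which. The short calculation below works out the comparison baseline implied by each reading. These baselines are derived under those two assumptions and are not figures reported in the summary; `implied_baseline` is a hypothetical helper.

```python
def implied_baseline(score: float, gain: float, relative: bool) -> float:
    """Back out the comparison baseline from a score and its gain (both in %)."""
    if relative:
        # Relative reading: score = baseline * (1 + gain / 100)
        return round(score / (1 + gain / 100), 1)
    # Absolute reading: score = baseline + gain (percentage points)
    return round(score - gain, 1)


for label, score, gain in [("workflow success", 88.9, 29.9),
                           ("component recovery", 93.3, 53.3)]:
    absolute = implied_baseline(score, gain, relative=False)
    rel = implied_baseline(score, gain, relative=True)
    print(f"{label}: baseline {absolute}% (absolute) or {rel}% (relative)")
```

Under the absolute reading the baselines are 59.0% and 40.0%. Under the relative reading they are roughly 68.4% and 60.9%.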
Conclusion
AblateCell pairs end-to-end baseline reproduction with closed-loop ablation testing, addressing the lack of standardized verification in biological repositories. Scaling repository-grounded verification and attribution directly on biological codebases could make it easier to establish which components of AI Virtual Cell models drive their reported performance, and in turn improve the reliability of AI in biological research.
