Automated discovery of physical laws from observational data is a grand challenge in AI. Current methods, relying on symbolic regression or LLMs, are limited to uni-modal data and overlook the rich, visual phenomenological representations of motion that are indispensable to physicists. This "sensory deprivation" severely weakens their ability to interpret the inherent spatio-temporal patterns within dynamic phenomena.
To address this gap, we propose VIPER-R1, a multimodal model for Visual Induction for Physics-based Equation Reasoning that discovers fundamental symbolic formulas.
The model is trained via a curriculum of Motion Structure Induction (MSI), using supervised fine-tuning to interpret kinematic phase portraits and construct hypotheses guided by a Causal Chain of Thought (C-CoT), followed by Reward-Guided Symbolic Calibration (RGSC) to purify the formula's structure with reinforcement learning. During inference, the trained VIPER-R1 acts as an agent: it first posits a high-confidence symbolic ansatz, then proactively invokes an external symbolic regression tool to perform Symbolic Residual Realignment (SR²). This final step, analogous to a physicist's perturbation analysis, reconciles the theoretical model with empirical data.
To support this research, we introduce PhysSymbol, a new 5,000-instance multimodal corpus. Experiments show that VIPER-R1 consistently outperforms state-of-the-art VLM baselines in accuracy and interpretability, enabling more precise discovery of physical laws.
Our framework consists of a two-stage training pipeline followed by an agentic inference procedure, designed to emulate the cognitive workflow of physicists discovering physical laws from visual observations.
Stage 1: Motion Structure Induction (MSI). The model first learns joint generation of a Causal Chain of Thought (C-CoT) and an initial symbolic ansatz from visual evidence and trajectory data. It then refines the symbolic formulation by conditioning on the ground-truth reasoning chain, focusing on the syntax and semantics of physical formalisms; a sketch of the two supervision targets follows.
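To make the two-step curriculum concrete, here is a minimal sketch of what the paired MSI supervision targets might look like. All field names and file paths are hypothetical; the paper's exact data schema is not reproduced here.

```python
# Hypothetical MSI training instances (field names are illustrative only).

# Step 1: from the phase-portrait image and raw trajectory, the model is
# supervised to emit a Causal Chain of Thought (C-CoT) followed by an
# initial symbolic ansatz.
msi_step1 = {
    "inputs": {
        "phase_portrait": "images/phase_00042.png",  # kinematic phase portrait
        "trajectory": "data/traj_00042.csv",         # sampled (t, x, v, a)
    },
    "target": (
        "C-CoT: the orbit spirals inward, so energy dissipates; the decay "
        "accelerates at large |v|, suggesting non-linear damping. "
        "Ansatz: a = -k*x - c*v**3"
    ),
}

# Step 2: conditioned on the ground-truth reasoning chain, the model is
# fine-tuned to emit only the formula, sharpening its command of the
# syntax and semantics of physical expressions.
msi_step2 = {
    "inputs": {
        "phase_portrait": "images/phase_00042.png",
        "trajectory": "data/traj_00042.csv",
        "c_cot": "ground-truth reasoning chain ...",
    },
    "target": "a = -k*x - c*v**3",
}
```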
Stage 2: Reward-Guided Symbolic Calibration (RGSC). This stage uses Group Relative Policy Optimization (GRPO) to refine symbolic hypotheses through a composite reward system (sketched after this list):
- Format reward: ensures adherence to the predefined template structure
- Structural reward: parameter-agnostic Jaccard similarity for topological correctness
- Accuracy reward: binary reward for exact symbolic matches
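The sketch below shows one way this composite reward could be assembled, assuming sympy for symbolic manipulation. The state-variable set {x, v, t}, the "a = ..." template, and the mixture weights are our assumptions, not published values.

```python
import re
import sympy as sp

VARS = {"x", "v", "t"}  # assumed state variables; other symbols are fitted parameters

def term_set(expr_str: str) -> frozenset:
    """Reduce a formula to its parameter-agnostic additive skeleton,
    e.g. 'a = -k*x - c*v**3' -> {x, v**3}."""
    rhs = sp.expand(sp.sympify(expr_str.split("=")[-1]))
    terms = rhs.args if rhs.is_Add else (rhs,)
    skeleton = set()
    for term in terms:
        ones = {s: 1 for s in term.free_symbols if s.name not in VARS}
        _, core = term.subs(ones).as_coeff_Mul()  # drop numeric coefficient and sign
        skeleton.add(core)
    return frozenset(skeleton)

def format_reward(text: str) -> float:
    """Adherence to the expected 'a = <expression>' output template."""
    return 1.0 if re.match(r"^\s*a\s*=", text) else 0.0

def structural_reward(pred: str, gt: str) -> float:
    """Parameter-agnostic Jaccard similarity between term skeletons."""
    p, g = term_set(pred), term_set(gt)
    return len(p & g) / len(p | g) if (p | g) else 0.0

def accuracy_reward(pred: str, gt: str) -> float:
    """Binary reward for an exact symbolic match."""
    diff = sp.simplify(sp.sympify(pred.split("=")[-1]) - sp.sympify(gt.split("=")[-1]))
    return 1.0 if diff == 0 else 0.0

def total_reward(pred: str, gt: str, w=(0.1, 0.6, 0.3)) -> float:
    # weights are illustrative; the paper's exact mixture is not reproduced here
    return (w[0] * format_reward(pred)
            + w[1] * structural_reward(pred, gt)
            + w[2] * accuracy_reward(pred, gt))

# GRPO then standardizes rewards within each group of K sampled candidates:
#   advantage_i = (r_i - mean(r_group)) / (std(r_group) + eps)
```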
Inference: Symbolic Residual Realignment (SR²). At inference time, VIPER-R1 acts as an agent, invoking an external symbolic regression tool to correct the residual errors its ansatz leaves unexplained.
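A minimal sketch of this step, assuming PySR as the external tool (the framework requires some symbolic-regression backend, not PySR specifically):

```python
import numpy as np
from pysr import PySRRegressor  # one possible external SR backend

def symbolic_residual_realignment(x, v, a_obs, ansatz):
    """Fit only the residual that the model's ansatz fails to explain,
    analogous to a physicist's perturbation analysis.

    `ansatz` is a callable a_hat(x, v) built from VIPER-R1's formula,
    with its parameters already fitted (e.g., by least squares).
    """
    residual = a_obs - ansatz(x, v)            # what the theory misses
    sr = PySRRegressor(
        niterations=40,
        binary_operators=["+", "-", "*"],
        unary_operators=["sin", "cos"],
    )
    sr.fit(np.column_stack([x, v]), residual)  # search for residual structure
    return sr  # the best residual term(s) are folded back into the ansatz
```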
VIPER-R1 significantly outperforms state-of-the-art VLMs on the PhysSymbol benchmark:
- 56.7% improvement over the best baseline (Claude-4-Sonnet)
- 45.4% improvement over the top zero-shot model
- 3× lower error than the best baseline (0.091)
Figure: performance comparison across SOTA VLMs.
Each component of our framework contributes significantly to the overall performance. The ablation below isolates the contribution of the MSI and RGSC stages:

| Configuration | Structural score | Accuracy |
| --- | --- | --- |
| Base model | 0.096 | 0.179 |
| + MSI | 0.554 | 0.399 |
| + MSI + RGSC | 0.812 | 0.487 |

MSI alone yields a +475% structural improvement over the base model; the full pipeline reaches +746%.
We examine a challenging system governed by a(t) = -kx - cv³ + η(t), combining a linear restoring force, non-linear damping, and stochastic noise. This case demonstrates VIPER-R1's ability to identify and integrate components with fundamentally different characteristics.
Figure: phase-space and trajectory analysis of the complex dynamical system.
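As an illustration of these dynamics, the following sketch simulates the system with a simple Euler-Maruyama integrator; all parameter values are illustrative rather than the paper's settings, and η(t) is modeled as Gaussian white noise of strength σ.

```python
import numpy as np

# Simulate a(t) = -k*x - c*v**3 + eta(t) with Euler-Maruyama.
k, c, sigma = 1.0, 0.1, 0.05   # illustrative parameters
dt, steps = 1e-3, 50_000

rng = np.random.default_rng(0)
x, v = 1.0, 0.0
xs, vs = [x], [v]
for _ in range(steps):
    noise = sigma * rng.standard_normal() / np.sqrt(dt)  # white-noise forcing
    a = -k * x - c * v**3 + noise
    v += a * dt
    x += v * dt
    xs.append(x)
    vs.append(v)
# Plotting xs vs vs reproduces the kind of phase portrait VIPER-R1 ingests.
```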
Remarkable accuracy: the model correctly identifies all three physical components with high quantitative precision. Additional evaluated scenarios include:
- a linear system with velocity damping
- periodic external forcing
- a cubic restoring force
- multiple time scales
We introduce PhysSymbol, a comprehensive benchmark containing 5,000 multimodal instances for physics formula discovery. Each instance comprises:
- phase-space trajectory analysis
- time-series trajectory visualization
- combined multimodal analysis
The dataset will be released upon paper acceptance; a hypothetical instance layout is sketched below.
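For orientation only, here is one plausible per-instance layout; the field names are our assumptions, and the released schema may differ.

```python
# Hypothetical PhysSymbol instance (field names are assumptions, not the
# released schema).
instance = {
    "id": "physsymbol_00042",
    "phase_portrait": "images/phase_00042.png",   # phase-space plot (x vs v)
    "time_series": "images/series_00042.png",     # x(t), v(t) visualization
    "trajectory": "data/traj_00042.csv",          # sampled (t, x, v, a)
    "c_cot": "step-by-step causal reasoning ...", # ground-truth C-CoT
    "formula": "a = -k*x - c*v**3 + eta(t)",      # ground-truth symbolic law
}
```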
@article{liu2025VIPERr1,
title={Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery},
author={Liu, Jiaqi and Lai, Songning and Li, Pengze and Yu, Di and Zhou, Wenjie and Zhou, Yiyang and Xia, Peng and Wang, Zijun and Chen, Xi and Tang, Shixiang and Bai, Lei and Ouyang, Wanli and Ding, Mingyu and Yao, Huaxiu and Wang, Aoran},
journal={arXiv preprint arXiv:2508.17380},
year={2025}
}