Foreword
This is an idea that has been sitting on my hard drive for a few months. I like it enough to finally share it.
I won’t pretend AI hasn’t been involved in helping shape the presentation. What I do claim is that the core idea and its overall structure (a rough blueprint for a human–AI discovery loop) is well beyond what current AI systems can generate on their own.
This is not a text designed for quick consumption. It’s dense, and probably uneven in places. Still, I’d be surprised if there weren’t at least a few people here for whom the underlying idea resonates, even if the presentation itself leaves room for improvement.
If the text in any way inspires someone, or serves as an interesting read, that alone would make sharing it worthwhile.
Executive Summary
What if our fundamental physics equations aren't THE laws of nature, but merely ONE good compression among many? This paper proposes training neural networks to discover physical laws by treating physics as a data compression problem: find the most computationally efficient rules that predict experimental outcomes within measurement uncertainty. Unlike existing automated discovery systems that search for equations matching human physics, our framework might reveal that F=ma is suboptimal, that we've chosen the wrong fundamental units (why m/s instead of s/m?), or that radically different mathematical frameworks compress nature more efficiently. By maintaining multiple valid compressions optimized for different contexts—just as we keep Newton despite having Einstein—the system acknowledges that physical laws are supremely useful correlations with defined domains, not metaphysical truths. Early validation on classical mechanics could lead to computationally revolutionary reformulations of quantum field theory, or even reveal why human physics took its particular historical path when equally valid alternatives existed all along. In this paper the term optimal is meant to be read as aspirational, rather than as a goal that will or even could be reached.
Abstract
We propose a framework for automated discovery of physical laws by treating physics as an optimal compression problem under experimental constraints. By training a neural network to predict masked experimental measurements while minimizing both computational complexity and experimental cost, we aim to discover whether our current formulation of physics represents the unique description of nature or merely one of many possible compressions of empirical data. The system uses a hierarchical data structure combining real experimental measurements with simulated data, allowing it to identify regions where current theories fail. Beyond philosophical implications, this approach could discover computationally efficient reformulations of known physics and identify promising directions for new experiments.
1. Introduction
Are Maxwell's equations, Einstein's field equations, and the Schrödinger equation THE laws of physics, or merely ONE way to compress our observations of reality? This question, long relegated to philosophy, may now be answerable through machine learning.
Consider two provocative possibilities:
- An AI trained on raw experimental data might discover that calculating planetary orbits—currently requiring numerical integration of differential equations—has a direct algebraic solution we've missed.
- The same AI might arrive at string theory before discovering general relativity, or find an entirely different unified framework that makes both look like special cases.
The String Theory Paradox
Consider a thought experiment: General relativity can be derived from string theory, making string theory as experimentally verified as relativity itself. If an AI system approached physics without our historical path dependence, which would it discover first?
This question illuminates a profound possibility: our current formulation of physics may represent not THE unique description of nature, but merely one branch in a vast space of equivalent compressions. Different theoretical frameworks - some perhaps more elegant, others more computationally efficient - might provide identical predictions within experimental uncertainty while offering radically different conceptual foundations.
We aim for insight into different laws, and better laws; we aspire to point in the direction of new laws. We train in search of optimal laws, in full knowledge that, as we define them, such laws will in all likelihood never be reached: perhaps we lack the mathematical tools, and the combinatorial possibilities might as well be infinite. We do believe, however, that iterating towards “optimal” may lead us to “good”.
Our proposal
We propose a framework where physics emerges as the solution to an optimization problem: find the most computationally efficient compression of experimental data that preserves predictive power within measurement uncertainty. This recasts the development of physics not as discovering "true" laws, but as finding optimal representations under specific constraints.
1.1 A Fundamental Design Choice: Guided Discovery vs. Tabula Rasa
The framework faces a crucial structural choice: Do we explicitly ask the AI to predict specific quantities from experiments (e.g., "What is the force?"), or do we let it observe experimental data and discover whatever patterns it finds compressible?
**Guided Discovery Approach** (analogous to AlphaGo):
- Present experiment: masses, positions over time
- Ask: "Predict the force at time t"
- Biases toward human conceptual frameworks
- Computationally simpler, faster convergence
- Useful for validating the framework
**Tabula Rasa Approach** (analogous to AlphaGo Zero):
- Present experiment: masses, positions over time
- Ask: "What patterns can you compress from this data?"
- AI might discover:
- Mean speed (Δx/Δt) or its inverse "timeliness" (Δt/Δx)
- Position as function of time x(t) or time as function of position t(x)
- Entirely alien quantities we haven't named
- No human bias about which quantities matter
- Computationally demanding but maximizes discovery potential
For clarity of exposition, this paper often assumes the guided approach when presenting technical details. However, the tabula rasa approach represents the ultimate ambition - discovering not just new laws but new quantities, new concepts, and new ways of parsing reality.
Just as AlphaGo Zero surpassed AlphaGo by abandoning human knowledge, a physics AI starting from scratch might discover compressions that make our concepts of "force," "energy," and "momentum" look like historical accidents rather than fundamental features of nature.
If trained correctly, it would certainly realize that force (or an equivalent representation) can be deduced from mass and time/position data. However: How central will such a representation be to the AI's framework? Will it use force as a fundamental building block, as we do, or will it treat the concept of force as a mere parenthetical observation while using entirely different concepts as the foundation for unification?
This could produce a physics as alien to human understanding as AlphaGo Zero's strategies are to human Go players. Just as AlphaGo Zero routinely bypasses patterns that human masters have deemed significant for millennia, a tabula rasa physics AI might build its theoretical edifice on conceptual foundations we've never imagined, treating our "fundamental" quantities as peripheral consequences of deeper patterns.
The framework thus offers not just automated discovery of laws, but the possibility of genuinely alien physics - empirically equivalent to ours, yet conceptually unrecognizable.
This paper mainly fleshes out concepts as if the AlphaGo analogue is the starting point (asking the system to predict pre-chosen parameters directly). However, if one wants to try the more ambitious zero-approach from the start, we would find this perhaps even more exciting (higher risk, higher reward).
A Hybrid Approach:
A promising compromise would be to both ask for specific parameters AND let the AI discover what other questions can be answered within an experiment:
- Present experiment: masses, positions over time
- Required task: "Predict the force at time t"
- Open exploration: "What else can be deduced from this data?"
This hybrid approach offers several advantages:
- Ensures the AI learns human physics (maintaining compatibility with our knowledge)
- Provides clear benchmarks for validation (can it predict force correctly?)
- Still allows discovery of alien concepts and alternative compressions
- May reveal that our "required" predictions are suboptimal entry points
The AI might discover that while it can compute force as requested, a different quantity (perhaps something like "interaction potential gradient") makes subsequent calculations trivial and unifies disparate phenomena more naturally. This would be analogous to discovering that while we can convert between Fahrenheit and Celsius, Kelvin is the more fundamental temperature scale.
This hybrid approach reduces computational demands while preserving the potential for revolutionary discoveries – maybe the perfect starting point. Practically speaking:
- For initial validation: build on proven systems (AI Feynman, PySR) to demonstrate feasibility
- For maximal discovery potential: develop tabula rasa approaches that avoid human physical biases
The hybrid approach outlined above could very well make use of previous proven systems as a conscious, pragmatic starting point.
2. Core Framework
2.1 Physics as Compression
Our central insight treats physical laws as compression algorithms for experimental data. Given a set of measurements with uncertainties, the goal is to find minimal rules that can predict masked values within their error bars.
The system optimizes for:
Predictive accuracy: Rules must predict missing values within measurement uncertainty
Computational efficiency: Simpler calculations are preferred (e.g., F=ma over relativistic formulations when v<<c)
Generality: Unified theories covering multiple domains score higher than collections of special cases
Experimental parsimony: The cost of experiments scales with both precision and how far parameters lie outside typical ranges
These optimization criteria exist in fundamental tension, creating a Pareto frontier rather than a single optimal solution. Physical laws occupy different points along this frontier: Newton's laws achieve maximal compression with bounded accuracy, while quantum field theory trades compression for precision. The framework naturally maintains multiple representations along this frontier, each optimal for its specific context and requirements. This multi-objective nature makes 'optimal compression' inherently context-dependent rather than absolute.
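To make the multi-objective nature concrete, here is a minimal sketch in Python of keeping compressions by Pareto dominance rather than by a single score. The field names and the "lower/higher is better" conventions are our own illustrative assumptions, not part of the proposal.

```python
# Illustrative sketch: maintaining a Pareto frontier of candidate compressions
# scored along the competing criteria described above.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    prediction_error: float   # lower is better (must stay within measurement uncertainty)
    compute_cost: float       # lower is better (e.g., operations per prediction)
    domain_coverage: float    # higher is better (fraction of experiment types handled)

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on every criterion and strictly better on one."""
    at_least_as_good = (a.prediction_error <= b.prediction_error
                        and a.compute_cost <= b.compute_cost
                        and a.domain_coverage >= b.domain_coverage)
    strictly_better = (a.prediction_error < b.prediction_error
                       or a.compute_cost < b.compute_cost
                       or a.domain_coverage > b.domain_coverage)
    return at_least_as_good and strictly_better

def pareto_frontier(candidates: list[Candidate]) -> list[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

# A Newton-like and a relativity-like compression both survive,
# because each wins on a different criterion.
frontier = pareto_frontier([
    Candidate("newtonian",    prediction_error=1e-3, compute_cost=1.0,  domain_coverage=0.7),
    Candidate("relativistic", prediction_error=1e-6, compute_cost=20.0, domain_coverage=0.95),
])
print([c.name for c in frontier])
```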
2.2 Key Innovation: Hierarchical Truth
We employ three tiers of training data:
Real Experimental Data (Gold Standard)
Actual laboratory measurements with irreducible uncertainties
Sparse but represents ground truth
Weighted most heavily in optimization
Simulated Data (Current Theory)
Generated from our best current physics
Abundant and covers wide parameter ranges
May be systematically wrong at fundamental level or edge cases
Divergence Zones (Discovery Targets)
Regions where real and simulated data consistently disagree
These gaps indicate either theoretical limitations or new physics
AI specifically targets these zones when requesting experiments
Example: Quantum-Classical Boundary The AI might identify systematic divergences between quantum mechanical predictions and classical measurements in mesoscopic systems. Rather than treating this as noise, the system would recognize this as a high-value experimental regime where:
Current theory transitions between frameworks
New experiments could reveal intermediate descriptions
This hierarchy allows the system to bootstrap from current knowledge while remaining capable of discovering where that knowledge fails.
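As an illustration of how divergence zones might be flagged in practice, the sketch below compares real and simulated measurements of the same quantity and reports the conditions where they disagree beyond their combined uncertainties. The data layout and the 3σ threshold are assumptions for exposition, not prescriptions.

```python
# Illustrative sketch: flagging "divergence zones" where the real-data tier and
# the simulated tier disagree by more than their combined uncertainties.
import math

def divergence_zones(real, simulated, sigma_threshold=3.0):
    """real/simulated: lists of dicts with keys 'conditions', 'value', 'sigma'.
    Returns the experimental conditions where the two tiers disagree."""
    zones = []
    for r, s in zip(real, simulated):
        combined_sigma = math.hypot(r["sigma"], s["sigma"])
        if abs(r["value"] - s["value"]) > sigma_threshold * combined_sigma:
            zones.append(r["conditions"])   # high-value target for new experiments
    return zones

real_data = [{"conditions": {"size_nm": 50}, "value": 1.02, "sigma": 0.01}]
sim_data  = [{"conditions": {"size_nm": 50}, "value": 0.95, "sigma": 0.01}]
print(divergence_zones(real_data, sim_data))   # -> [{'size_nm': 50}]
```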
3. Technical Implementation
This section provides both general principles and specific examples of technical implementation. We emphasize at the outset:
1) We do not intend to solve all implementation challenges in this paper - our goal is aspirational rather than fully architectural.
2) Where we do provide specific implementations, these are by no means prescriptive. They serve as proof-of-concept examples, demonstrating the feasibility of embarking on a project with the ambitions laid out in this paper.
Alternative approaches are likely to prove superior, and we encourage exploration of different technical solutions within the framework's conceptual structure.
3.1 Architecture
The system consists of:
Neural network with access to symbolic mathematics operations
Parameter range constraints based on physical realizability
Rule Formulation Structure
To ensure interpretability and systematic theory construction, all discovered rules must conform to a structured format:
Rule Template: "If conditions {A, B, ..., N} apply, and we know parameters {a, b, ..., n} with uncertainties {σ_a, σ_b, ..., σ_n}, then missing parameter m_p can be deduced through operations f(a, b, ..., n) with uncertainty σ_mp."
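A minimal sketch of how such a rule might be represented as a data structure follows; the field names and the example instance are illustrative assumptions rather than a prescribed schema.

```python
# Illustrative sketch of the rule template as a data structure.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Rule:
    conditions: dict                    # {A, B, ..., N}, e.g. {"v_over_c": "< 0.01"}
    known_parameters: list[str]         # {a, b, ..., n}
    parameter_uncertainties: dict       # {sigma_a, sigma_b, ...}
    target_parameter: str               # m_p, the parameter to be deduced
    operations: Callable                # f(a, b, ..., n)
    predicted_uncertainty: Callable     # sigma_mp as a function of the input uncertainties
    worked_examples: list = field(default_factory=list)   # see the rule cards in Section 3.6

# Example instance: F = m * a with simple linear error propagation.
newton_second_law = Rule(
    conditions={"v_over_c": "< 0.01"},
    known_parameters=["m", "a"],
    parameter_uncertainties={"sigma_m": 0.01, "sigma_a": 0.02},
    target_parameter="F",
    operations=lambda m, a: m * a,
    predicted_uncertainty=lambda m, a, sigma_m, sigma_a: abs(a) * sigma_m + abs(m) * sigma_a,
)
```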
3.1.1 Neural Architecture Specifications
The system employs a hybrid architecture combining pattern recognition with symbolic reasoning:
- Available tokens limited to experiment parameters and basic operations
- No redundant forms (the grammar prevents a+a+a, enforcing 3*a)
This reduces the search space by many orders of magnitude compared to natural language. While an LLM might choose from 50,000+ tokens at each position, our system selects from perhaps 10-20 valid continuations, making the problem tractable despite the combinatorial nature of symbolic expressions.
The token-by-token generation approach (set first token, set second token, ..., set last token) leverages the same sequential decision-making that makes LLMs successful, but in a dramatically constrained space where physics provides the grammar.
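The sketch below illustrates the idea with a toy grammar that restricts each generation step to a handful of valid continuations; it is not a proposed architecture, and the token sets are placeholders.

```python
# Illustrative sketch: token-by-token expression generation under a tiny grammar,
# so each step offers a handful of options rather than a 50,000-token vocabulary.
import random

PARAMETERS = ["m", "a", "v", "t"]          # experiment-specific parameters
OPERATORS = ["+", "-", "*", "/"]

def valid_next_tokens(sequence):
    """Grammar: an expression alternates operands and binary operators."""
    if not sequence or sequence[-1] in OPERATORS:
        return PARAMETERS + ["1", "2", "3"]   # an operand must come next
    return OPERATORS + ["<END>"]              # after an operand: operator or stop

def sample_expression(max_len=7):
    seq = []
    while len(seq) < max_len:
        choices = valid_next_tokens(seq)      # a trained policy would score these
        token = random.choice(choices)
        if token == "<END>":
            break
        seq.append(token)
    if seq and seq[-1] in OPERATORS:          # never end on a dangling operator
        seq.pop()
    return " ".join(seq)

print(sample_expression())   # e.g. "m * a" or "v / t + 2"
```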
**Discovery Through Missing Parameters**:
When outputting [CANNOT_DETERMINE], the system can additionally suggest what parameters might be missing:
- "Cannot determine period without: mass of pendulum"
- "Cannot determine force without: acceleration"
- "Cannot determine energy without: velocity"
Reward structure:
- If [CANNOT_DETERMINE] + missing parameters is correct: Reward (the system identified a genuine limitation)
- If [CANNOT_DETERMINE] was wrong (a rule exists with available parameters): Penalty in loss function
- This incentivizes the system to deeply understand what information is truly necessary for each calculation
This mechanism could lead to profound discoveries - the AI might identify that certain parameters we consider fundamental are actually derivable, or that parameters we've never considered are essential for complete descriptions.
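A minimal sketch of this reward logic follows; the numeric reward values and the function signature are placeholders, not recommended settings.

```python
# Illustrative sketch of the [CANNOT_DETERMINE] reward/penalty structure.
def score_cannot_determine(claimed_missing, truly_missing, a_rule_existed):
    """claimed_missing: parameters the system says it needs.
    truly_missing: parameters actually required but absent from the experiment.
    a_rule_existed: True if a valid rule could have used the available parameters."""
    if a_rule_existed:
        return -1.0          # penalty: the abstention was wrong
    if set(claimed_missing) == set(truly_missing):
        return +1.0          # reward: genuine limitation correctly identified
    return +0.25             # partial credit: right to abstain, wrong diagnosis

print(score_cannot_determine(["acceleration"], ["acceleration"], a_rule_existed=False))
```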
3.1.3 Combinatorial Explosion Management
The system controls symbolic expression growth through adaptive length penalties:
**Length Penalty Schedule**:
L_penalty(epoch, length) = α(epoch) × length^β
Where:
- α(epoch) = α_0 × decay^(epoch/1000) (starts high, decays over training)
- α_0 = 0.1 (initial penalty weight)
- decay = 0.5 (halves every 1000 epochs)
- β = 1.5 (superlinear penalty for very long expressions)
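The schedule above transcribes directly into code; the values are those given in the text and should be read as starting points rather than tuned settings.

```python
# Length penalty: L_penalty(epoch, length) = alpha(epoch) * length^beta
def length_penalty(epoch: int, length: int,
                   alpha_0: float = 0.1, decay: float = 0.5, beta: float = 1.5) -> float:
    alpha = alpha_0 * decay ** (epoch / 1000)   # penalty weight halves every 1000 epochs
    return alpha * length ** beta               # superlinear in expression length

print(length_penalty(epoch=0, length=10))      # 0.1 * 10^1.5 ≈ 3.16
print(length_penalty(epoch=2000, length=10))   # weight decayed to 0.025, penalty ≈ 0.79
```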
**Beam Search with Pruning**:
- Maintain top-K expressions during generation (K=10)
- Begin rewarding unification of disparate phenomena
Late Training (epochs 5000+):
- High ε (0.4): Push toward unified theories
- High ζ: Encourage experimental exploration (see Section 3.7)
- Full mathematical toolkit available
- Focus on discovering alternate formulations
**Optimization Details**:
- Batch size: 256 experiments per update
- Learning rate: 3e-4 with cosine annealing
- Gradient clipping: 1.0 for stability
- Entropy bonus: 0.01 to encourage exploration
- Update frequency: Every 2048 expression generations
**Checkpoint Strategy**:
- Save models that discover known physics milestones (even if not told these exist)
- Save models that achieve breakthrough compression ratios
- Maintain diverse population of models finding different valid compressions
**Handling Multiple Valid Compressions**:
The framework treats compression as a powerful lens for understanding how physical theories organize empirical regularities. Rather than seeking a single "correct" formulation, it maintains a "physics portfolio" of discovered rules where:
- Rules compete based on context-specific performance
- Different checkpoints excel in different domains
- Multiple representations coexist, each optimal for specific calculations
- This aligns with Section 7.5's philosophy of multiple representations
- **Throughout**: Interleave simple and complex experiments to prevent forgetting
**Success Indicators During Training**:
- Spontaneous discovery of known relationships (without being told)
- Decreasing computational cost for equivalent predictions
- Emergence of hierarchical rule structures
- Successful predictions in held-out experimental domains
3.2 Training Protocol
Present experimental datasets with multiple measured quantities
Randomly mask N values per experiment
AI develops rules to predict masked values
Reward/penalty structure:
Score = α·(Predictive Accuracy)
+ β·(Number of Rules – Number of surpassed rules)
- γ·(Computational Complexity)
- δ·(Uncertainty Bar Size)
+ ε·(Unification Bonus)
+ ζ·(Experimental Value)
+ κ·(Experiment Documentation)
For a discussion of Experimental Value see Sections 3.4 and 3.5; for Experiment Documentation see Section 3.6. We only want to keep track of valid rules, not rules that are strictly worse (e.g., smaller validity ranges with no compensating advantage, higher computational cost). We do, however, want to promote equally good representations when found. If a unification is computationally cheaper than the existing rules it covers, those rules are scrapped; otherwise, computationally superior rules are kept alongside the unification.
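A minimal sketch of this composite score, with every weight and term passed in explicitly; the default weights are placeholders for illustration only.

```python
# Illustrative sketch of the Section 3.2 score; all values are placeholders.
def composite_score(accuracy, n_rules, n_surpassed, compute_cost, uncertainty_size,
                    unification_bonus, experimental_value, documentation,
                    alpha=1.0, beta=0.1, gamma=0.5, delta=0.2, eps=0.3, zeta=0.3, kappa=0.1):
    return (alpha * accuracy
            + beta * (n_rules - n_surpassed)
            - gamma * compute_cost
            - delta * uncertainty_size
            + eps * unification_bonus
            + zeta * experimental_value
            + kappa * documentation)

# Example: a modest rule set with one unification and good documentation.
print(composite_score(accuracy=0.95, n_rules=12, n_surpassed=3, compute_cost=0.4,
                      uncertainty_size=0.1, unification_bonus=1.0,
                      experimental_value=0.5, documentation=1.0))
```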
3.2.1 Experiment Composition and Knowledge Retention
Critical to successful training is the careful composition of experiments presented each epoch:
**Base Experiment Set M**: Each epoch's N experiments include a curated subset M that necessitates handling all physics rules relevant to the current training stage. This ensures the AI keeps, refines, and generalizes successful compressions rather than discarding them.
**Randomized Construction**:
- Base set M is reconstructed each epoch by randomly sampling k experiments for each physical law
- Example: For F=ma, randomly select k=5 collision experiments from a pool of 50+ variants each epoch
- **Mid epochs**:
- M expands to: oscillations, waves, thermodynamics, basic quantum phenomena
- Size: M ≈ 70% of N, as more physics must be retained
- **Late epochs**:
- M covers: full classical mechanics, E&M, quantum mechanics, relativity
- Size: M ≈ 65% of N, balanced with need for discovering unifications
**Ratchet Mechanism**: As new valid compressions are discovered, experiments requiring them are added to future M sets. This creates a one-way ratchet where good discoveries are automatically preserved in the training curriculum.
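The per-epoch resampling and the ratchet can be sketched as follows; the pool contents and sizes are illustrative assumptions, not a proposed curriculum.

```python
# Illustrative sketch: per-epoch construction of base set M, plus the one-way ratchet.
import random

experiment_pools = {                 # one pool of experiment variants per retained law
    "F=ma": [f"collision_{i}" for i in range(50)],
    "pendulum_period": [f"pendulum_{i}" for i in range(30)],
}

def build_base_set(pools, k=5):
    """Randomly resample k variants per law each epoch (prevents memorizing noise)."""
    return [exp for variants in pools.values() for exp in random.sample(variants, k)]

def ratchet(pools, new_law_name, new_variants):
    """Once a new valid compression is discovered, experiments that require it
    are permanently added to future base sets (the one-way ratchet)."""
    pools[new_law_name] = list(new_variants)

base_M = build_base_set(experiment_pools)
ratchet(experiment_pools, "hookes_law", [f"spring_{i}" for i in range(20)])
print(len(base_M), len(experiment_pools))
```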
**Example Base Set (N=100 experiments per epoch)**:
- 5-10% of base set M consists of "curve ball" experiments
- These combine known physics in unusual ways or extreme conditions
- Examples:
- Pendulum in viscous fluid (combines mechanics + fluid dynamics)
- Charged particle in combined E&M fields at relativistic speeds
- Phase transitions under extreme pressures
- Purpose: Create opportunities for discovering unifying principles
- Selected from regions where different physics domains interact
- May lead to discovering more elegant unified compressions
This entire design ensures no "catastrophic forgetting" - the AI cannot achieve high rewards without maintaining compressions that handle the base set effectively.
3.3 Computational Efficiency Rewards
The system explicitly rewards computational shortcuts:
Using x instead of sin(x) for small angles when uncertainty permits
Employing Newtonian mechanics instead of relativity when v<<c
Finding closed-form solutions to problems we solve numerically
Example: Simple Pendulum Consider a pendulum with small angular displacement θ. The exact period is:
T = 2π√(L/g) × [1 + (1/16)θ² + (11/3072)θ⁴ + ...]
For experimental uncertainty of 1%, the system should discover (unless it finds something even better):
If θ < 0.24 radians: T = 2π√(L/g) suffices
If θ < 0.66 radians: Include first correction term
Beyond this: Full series or numerical integration
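A minimal sketch of this uncertainty-aware selection, using the angle thresholds quoted above (the function boundaries simply restate the text; they are not re-derived here):

```python
# Illustrative sketch: pick the cheapest pendulum-period formula adequate for
# roughly 1% experimental uncertainty, with thresholds taken from the text.
import math

def pendulum_period(L, g, theta):
    base = 2 * math.pi * math.sqrt(L / g)
    if theta < 0.24:                 # small-angle form suffices
        return base
    if theta < 0.66:                 # include the first correction term
        return base * (1 + theta**2 / 16)
    # beyond this: keep more series terms (or integrate numerically)
    return base * (1 + theta**2 / 16 + 11 * theta**4 / 3072)

print(pendulum_period(L=1.0, g=9.81, theta=0.1))   # ≈ 2.006 s
print(pendulum_period(L=1.0, g=9.81, theta=0.5))   # ≈ 2.038 s
```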
3.4 Experimental Request System
3.4.1 Cost of Experimental Requests
The AI can request new experiments with costs scaling as:
Cost ∝ 1/(uncertainty)^n (higher precision becomes rapidly more expensive)
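A minimal sketch of such a cost model, combining the precision scaling above with the out-of-range penalty mentioned in Section 2.1; the exponent, scale factors, and penalty form are illustrative assumptions.

```python
# Illustrative sketch of an experiment-cost model: precision term + out-of-range penalty.
def experiment_cost(requested_uncertainty, parameter_values, typical_ranges,
                    n=2.0, base_cost=1.0, range_penalty=10.0):
    cost = base_cost / requested_uncertainty ** n        # precision term
    for name, value in parameter_values.items():
        lo, hi = typical_ranges[name]
        if not lo <= value <= hi:                        # distance outside the typical range
            excess = min(abs(value - lo), abs(value - hi)) / (hi - lo)
            cost += range_penalty * excess
    return cost

# Cheap: ordinary precision, ordinary temperature.  Expensive: 100x precision, cryogenic.
ranges = {"temperature_K": (250.0, 350.0)}
print(experiment_cost(1e-2, {"temperature_K": 300.0}, ranges))
print(experiment_cost(1e-4, {"temperature_K": 4.0}, ranges))
```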
1. **Internal Experimental Database**:
- Covers multiple domains, for example:
* Thermodynamics data (gas laws, phase transitions, heat transfer)
* Electromagnetic observations (circuit behavior, field measurements)
* Modern physics experiments (spectroscopy, particle physics, relativistic effects)
- Each entry contains: experimental conditions, measured values, uncertainties, metadata
- Internal Database grows as humans in the loop see a need
- Internal Database grows through two pathways when querying external sources:
* **Automatic ingestion**: Data from gold-standard sources (e.g., NIST, Particle Data Group, peer-reviewed data papers) automatically added after format conversion
* **Human-reviewed ingestion**: Data from broader sources (e.g., preprints, older papers, less standardized databases) flagged for human review before inclusion
2. **External Database Query**:
- Query existing experimental databases (arXiv, Particle Data Group, etc.)
- Check if requested data already exists
- Flag partial matches for human review
3. **Human-in-the-Loop Protocol**:
- Feasibility assessment by domain experts
- Cost estimation from actual laboratories
- Safety and ethics review for extreme parameters
- Alternative experiment suggestions
Future Directions for Experimental Automation:
As the framework matures, deeper integration with experimental systems could include:
- Automated experimental design using the information-theoretic principles outlined above
- Real-time feedback loops between discovery algorithms and experimental apparatus
- Active learning approaches that dynamically adjust experimental parameters based on emerging compressions
- Direct control of laboratory equipment for rapid hypothesis testing
These advances would transform the system from requesting experiments to actively conducting them, accelerating the discovery cycle. For any of the above, strict protocols for human-in-the-loop would be implemented for safety.
4. **Experiment Queuing System**:
- Priority queue based on IG(e)/C(e) × feasibility_score (see the sketch after this list)
- Batch similar experiments for efficiency
- Track experiment status and update predictions upon completion
5. **Result Integration Pipeline**:
- Automated data cleaning and validation
- Uncertainty quantification including systematic errors
- Update compression models with new data
- Flag unexpected results for deeper investigation
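A minimal sketch of the priority computation from item 4 above; the information gain IG(e), cost C(e), and feasibility score are assumed to be computed elsewhere per request.

```python
# Illustrative sketch of the experiment queuing priority IG(e)/C(e) x feasibility_score.
import heapq

def enqueue(queue, experiment_id, information_gain, cost, feasibility):
    priority = information_gain / cost * feasibility
    heapq.heappush(queue, (-priority, experiment_id))    # max-priority via negated key

def next_experiment(queue):
    neg_priority, experiment_id = heapq.heappop(queue)
    return experiment_id, -neg_priority

queue: list = []
enqueue(queue, "mesoscopic_interference", information_gain=5.0, cost=2.0, feasibility=0.9)
enqueue(queue, "precision_pendulum",      information_gain=1.0, cost=0.5, feasibility=1.0)
print(next_experiment(queue))   # ('mesoscopic_interference', 2.25)
```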
3.4.4 Example in depth: Discovering More Optimal Pendulum Laws
Starting with raw pendulum data (length, angle, period measurements):
Epoch 1: System discovers T ≈ 2π√(L/g) with 5% error
- Requests: Higher precision timing at various angles
- Elliptic integral formulation: Works for all angles but computationally expensive
- Requests: Ultra-precise measurements at θ = 45° to distinguish formulations
- Cost: High (10^-6 second precision required)
Budget allocation: 60% on distinguishing experiments, 30% on boundary testing, 10% on exploring damped oscillations (exploration reserve)
3.4.5 Integration with Scientific Practice
The framework is designed to complement, not replace, human scientific inquiry:
- Discovered compressions output in standard formats (LaTeX, SymPy, Mathematica)
- Rule cards provide worked examples for human understanding
- Discovery indicators flag high-priority experiments for human investigation
- Can run alongside traditional research, suggesting new directions
Scientists interact through rule review, pre-planned database expansion as the set of allowed mathematical operations grows, experiment prioritization, and interpretation of alien compressions - maintaining human insight while leveraging AI's unbiased exploration.
3.5 Exploration Incentive Structure
As new experiments are ordered, Predictive Accuracy will grow. To prevent convergence on local optima and encourage paradigm-shifting discoveries:
Limit testing your theory: Testing your rules at previously untested scales is rewarded.
Distinction testing: If you have two rules that can be used to make the same predictions, are there experimental conditions where they might diverge? Finding such an experiment is highly rewarded.
Real-world confirmation: The system is rewarded each time it makes use of real-world data to verify its rules. This is the gold standard; no rule's limits or veracity can be viewed as established without testing against real-world data. Specifically:
Exponentially higher rewards for validation against real experimental data vs. simulated data
Bonus rewards for requesting experiments that distinguish between competing theories
Special rewards for identifying experiments where current simulations fail
Progressive reward scaling: higher rewards for experiments testing more fundamental assumptions
Prediction testing your theory: If your theory of gravity seems to imply that there ought to be limiting objects such as black holes, finding such objects is highly rewarded.
Bold Hypothesis Bonus: Experimental requests testing fundamentally different theoretical frameworks receive credit proportional to their divergence from established rules
Boundary Violation Flags: When the AI requests experiments outside our simulation capabilities, these are logged as "Discovery Indicators" - high-priority targets for human experimental physics
3.6 Enhanced Rule Documentation Requirements
Each discovered rule must be accompanied by a comprehensive "rule card" containing:
1. Experimental Domain Mapping
List of experiment types this rule can predict
Parameter ranges where the rule maintains validity
Precision bounds for different parameter regimes
2. Worked Examples (minimum 3 per experiment type)
Forces interpretability: The AI must explain its reasoning to get full reward
Natural curriculum: Rules handling more experiment types are more valuable
Built-in validation: The examples become test cases
Enables human learning: Physicists can study the worked examples to understand new mathematical approaches
Example Rule Card:
Rule: Energy-momentum relation in special relativity
Experiment types: particle accelerator collisions, cosmic ray detection, synchrotron radiation
Valid ranges: v > 0.1c, rest mass > 0
Example 1: Electron at 0.9c → E ≈ 1.17 MeV (worked calculation shown)
Example 2: Proton collision → momentum conservation (full derivation)
Example 3: Synchrotron energy loss → radiation power (step-by-step)
Breaks down: v << c (use E = ½mv²), massless particles (use E = pc)
This approach ensures that compression leads to clarity, not obscurity. The AI is literally rewarded for being a good teacher!
Whenever a rule is scrapped, it is saved outside the system under a rule number, together with the replacement rule that superseded it, making it computationally clear why the rule was scrapped. That way we might recognize natural laws we ourselves use among the discarded rules, and see why the AI chose to scrap them.
3.7 Dynamic Meta-parameter Adjustment
To prevent convergence on local optima, the system employs adaptive meta-parameter tuning:
Stagnation Detection: Monitor rate of improvement in predictive accuracy and rule unification
Dynamic Reweighting: When progress stalls, automatically adjust:
Increase ζ (experimental value weight) to encourage exploration
Increase γ (computational complexity penalty) to force search for alternative formulations
Temporarily reduce β (number of rules penalty) to allow theoretical proliferation before consolidation
Real-time Adjustment Protocol:
If improvement_rate < threshold for N epochs:
ζ → ζ × (1 + exploration_boost)
γ → γ × (1 + complexity_boost)
β → β × (1 - diversity_allowance)
This creates a "simulated annealing" effect in theory space, preventing premature convergence while maintaining long-term pressure toward elegant unification.
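A runnable rendering of the adjustment protocol above; the boost values, stagnation window, and bookkeeping are placeholders rather than recommended settings.

```python
# Illustrative sketch of the dynamic meta-parameter adjustment protocol.
def adjust_meta_parameters(weights, improvement_rate, threshold=1e-3,
                           stalled_epochs=0, patience=50,
                           exploration_boost=0.2, complexity_boost=0.1, diversity_allowance=0.1):
    """weights: dict with keys 'zeta', 'gamma', 'beta' (see the score in Section 3.2)."""
    stalled_epochs = stalled_epochs + 1 if improvement_rate < threshold else 0
    if stalled_epochs >= patience:                       # progress has stalled for N epochs
        weights["zeta"] *= 1 + exploration_boost         # encourage exploration
        weights["gamma"] *= 1 + complexity_boost         # push for alternative formulations
        weights["beta"] *= 1 - diversity_allowance       # allow temporary rule proliferation
        stalled_epochs = 0                               # reset after the "annealing" kick
    return weights, stalled_epochs

weights = {"zeta": 0.3, "gamma": 0.5, "beta": 0.1}
weights, stalled = adjust_meta_parameters(weights, improvement_rate=5e-4, stalled_epochs=49)
print(weights)
```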
4. Expected Outcomes
4.1 Computational Discoveries
Direct formulas for calculations currently requiring iteration
Alternative representations where "hard" problems become tractable
Optimal approximation hierarchies based on required precision
4.2 Theoretical Insights
Whether our formulation of physics is unique or contingent
Which theoretical structures are "natural" vs historical artifacts
Unified formulations we've missed due to path dependence
4.3 Experimental Guidance
Precise identification of where current theories fail
Optimal experiments to distinguish between competing formulations
Unexpected parameter combinations that probe new physics
Discovery Indicators: Experiments requested beyond simulation capacity represent potential breakthrough directions for human physics
Theory-Space Exploration: Requests testing qualitatively different frameworks (e.g., discrete vs. continuous spacetime) flagged for special attention
5. Research Program
Phase 1: Proof of Concept
Rediscover Newton's laws from mechanical measurements
Demonstrate emergence of computational approximations
Validate hierarchical theory construction
Phase 2: Classical Physics
Thermodynamics and statistical mechanics
Electromagnetism from field measurements
Identify any alternative formulations
Phase 3: Modern Physics
Quantum mechanics from spectroscopic data
Relativity from precision measurements
Search for unified descriptions
Phase 4: Frontier Physics
Standard model parameters
Identify divergence zones indicating new physics
Generate testable predictions beyond current theories
6. Implications
6.1 Practical Impact
Reduce computational costs in physics simulations by orders of magnitude
Automated discovery of efficient approximations
Optimal experimental design for maximum information gain
6.2 Foundational Questions
Are physical laws unique or contingent?
What role does computational tractability play in the structure of physics?
Can AI discover genuinely new physics from data?
6.3 Broader Significance
This framework could revolutionize how we approach scientific discovery, moving from human intuition to optimal information-theoretic principles. If successful, it would demonstrate that the deepest questions about the nature of physical law are not merely philosophical but empirically addressable.
6.4 Convergence and Multiplicity
A critical question this framework addresses: Will independent runs converge on identical physical laws, or will we observe multiple, equally valid theoretical compressions?
Formulation details (coordinate systems, mathematical representations) may vary significantly
The "magic" of machine learning convergence might reveal whether physics has a unique optimal compression or admits multiple equivalent descriptions
This multiplicity would fundamentally reshape our understanding of physical law from discovered truth to optimal representation under constraints.
6.5 Validation Metrics and Success Criteria
Absolute Minimum Benchmark: The system must achieve parity with human physics:
Predict all phenomena currently predictable by human theories
Match or exceed computational efficiency of current methods
Maintain same or better uncertainty bounds
Graduated Success Metrics:
Reconstruction Fidelity (Baseline)
Reproduce known laws: F=ma, Maxwell's equations, Schrödinger equation
Computational complexity ≤ current methods
Efficiency Gains (Intermediate)
Discover computational shortcuts for ≥10% of standard calculations
Identify unified formulations reducing theory count by ≥20%
Novel Insights (Advanced)
Predict phenomena in identified divergence zones
Propose experiments yielding surprising results
Find alternative formulations with radically different ontologies
Paradigm Shift (Ultimate)
Discover framework unifying quantum mechanics and general relativity
Reduce computational complexity of fundamental calculations by orders of magnitude
Reveal why current formulations emerged from deeper principles
Comparative Benchmarks:
Time to rediscover Kepler's laws: < 1000 training epochs
Time to unify electricity and magnetism: < 10,000 epochs
Computational efficiency relative to humans: ≥ 1.0x for all domains
Novel Discovery Validation Protocol:
When the AI discovers compressions without human precedent:
- Test on held-out experimental domains never seen during training
- Require successful prediction of 10+ independent phenomena
- Human experts attempt to find flaws or limitations
- Only accepted after real-world experimental validation
- Success can be measured by: Does it make previously hard calculations easy? Does it unify previously separate phenomena? Does it suggest new experiments that yield surprising results? Is it a stepping stone used in other formulas that ARE of clear interest to us (analogous to how complex numbers are important for quantum mechanics and for solving certain integrals, among other things)?
7. Addressing potential reasons for concern
7.1 The validation paradox and mitigation strategies
In this proposal most "experiments" will be simulated experiments. If humanity thinks we have the full picture, when in fact our theories are constrained by limits we have not recognized, the AI may request experiments and get patently false (simulated) results. However:
The more sophisticated the AI gets, the more capable it will be to ask for real-world experiments, within certain limits on parameters. This could be both automated (searching in research databases), and human in the loop (asking if a handler can find something). Keep in mind: Real data is treated as THE truth, while simulated data is mostly of indicative value.
Additional mitigation strategies include:
Historical validation protocol: The system can be tested on historical data where we know both the "before" and "after" states of physics understanding. For example, training only on pre-1900 data and seeing if it discovers relativity or quantum mechanics.
Cross-domain validation: Rules discovered in one domain must successfully predict phenomena in unrelated domains without additional training, reducing the risk of overfitting to simulation artifacts.
Anomaly amplification: The system should be specifically rewarded for identifying where real and simulated data diverge, as these divergence zones represent the highest-value targets for new physics.
Simulation uncertainty quantification: All simulated data should include explicit uncertainty bounds reflecting our confidence in the underlying theory. The AI must propagate these uncertainties through its reasoning.
Independent verification requirement: No compression is considered "complete" until it has been validated on real-world data that was not available during the discovery phase.
7.2 Note on Kolmogorov Complexity
We acknowledge that Kolmogorov-optimal compression is formally uncomputable. While our framework is designed to continually reward reductions in algorithmic complexity, we fully recognize this is an open-ended pursuit — there is no definitive endpoint. In practice, we rely on computable proxies such as symbolic expression length, predictive accuracy, and computational cost to guide compression toward more efficient formulations.
The term “optimal” is thus aspirational. What we seek for any given law is not literal minimality, but a practical and interpretable tradeoff within a bounded model class and empirical dataset.
For a clearer picture of what we actually aim to achieve, see the Validation Metrics and Success Criteria in Section 6.5. While certainly ambitious, they fall well short of unattainable perfection — and instead ground the framework in meaningful, achievable benchmarks.
While future implementations will determine which proxies best capture physical compression, potential approaches include:
Symbolic expression length (simplest but dimension-dependent)
MDL variants (theoretically grounded but model-class dependent)
The framework deliberately avoids prescribing specific proxies, recognizing that the optimal choice may vary by domain and that future research will likely discover more effective measures.
7.3 The correlation versus causation distinction in physics
A common critique of pattern-finding systems is that they might mistake correlation for causation, discovering spurious relationships like "ice cream sales predict drownings." However, this concern fundamentally misunderstands what physical laws actually represent.
Physical laws are, at their core, extremely reliable correlations with well-defined domains of validity. Consider:
F = ma is a correlation between force and acceleration that holds when mass remains constant. The "deeper" formulation F = dp/dt simply pushes the correlation back one level—it remains a pattern we observe, not a metaphysical truth.
Hooke's Law (F = -kx) correlates force with displacement, but only within elastic limits. Despite being "just a correlation," it remains one of physics' most useful laws.
The Ideal Gas Law (PV = nRT) correlates pressure, volume, and temperature under specific constraints (low density, high temperature). Its limited scope doesn't diminish its status as a fundamental law.
The distinction between "correlation" and "physical law" is not fundamental but pragmatic. A correlation becomes a law when it achieves:
Broad domain of validity: The correlation holds across many conditions
High predictive power: It successfully predicts unmeasured phenomena
Optimal compression: It describes observations more compactly than listing them
Experimental robustness: It survives falsification attempts
Within our framework, the AI discovering that "for fixed-shape objects, V ∝ A^(3/2)" represents a valid law with appropriate constraints. This isn't a failure to find "true" causation—it's physics working as intended. Similarly, even "drownings correlate with ice cream sales" could be a valid law if it achieved sufficient compression and predictive power within a specified domain (summer months, beach communities).
The compression framework naturally handles this distinction: spurious correlations fail to compress well due to numerous exceptions and limited domains. True physical laws are simply those correlations that achieve optimal compression while maintaining predictive power. The system's emphasis on falsification through real-world experiments ensures that only robust correlations survive.
Rather than solving an unsolvable philosophical problem (what is "true" causation?), our framework acknowledges what physics has always done: finding the most useful, compact, and predictive correlations within specified domains. The AI's discovery of multiple valid formulations—F = ma for constant mass, F = dp/dt for variable mass—represents feature, not bug. Different compressions optimize for different contexts and available information, exactly as human physics does.
Also: consider that our units in physics, even the SI units themselves, are arbitrarily chosen. This leads into the next section:
7.4 Discovering fundamental representations beyond human conventions
Our framework may reveal that many "fundamental" aspects of physics are artifacts of human choice rather than natural necessities. Consider:
Arbitrary unit relationships: We express velocity as m/s, but s/m (time per unit distance) could be equally valid. The AI might discover that certain problems compress better using "timeliness" (s/m) rather than speed—particularly in contexts like traffic flow or wave propagation where time-to-traverse a fixed distance is more natural than distance-per-time. There is an interesting human example: the US measures fuel economy in miles per gallon, while Europe uses litres per 100 km (the same relationship, inverted).
Selective derivatives: Human physics privileges certain derivatives—velocity (dx/dt), acceleration (d²x/dt²)—while ignoring others. Why not dt/dx (time rate per unit distance) or higher-order derivatives? The AI might discover that d³x/dt³ (jerk) or even fractional derivatives provide more compact descriptions for certain phenomena. Our calculus itself may be just one possible mathematical framework.
Dimensional choices: We treat length, mass, and time as fundamental, but the AI might discover that energy-momentum-action or entirely different dimensional bases provide more natural compressions. Perhaps what we call "fundamental constants" are artifacts of poor dimensional choices.
Mathematical frameworks: Beyond traditional calculus, the AI might develop:
Discrete mathematics that naturally handles quantum phenomena
Non-commutative geometries that compress particle physics more efficiently
Novel operators that make "difficult" calculations trivial
Conceptual primitives: The AI's choice of what to treat as basic versus derived could be revelatory. If it consistently uses momentum rather than velocity, or action rather than energy, this suggests our conceptual hierarchy may be inverted.
Most intriguingly, what the AI considers "fundamental" may be as valuable as the laws it discovers. Its representations—unconstrained by human cognitive biases or historical accidents—could reveal that our standard formulation of physics, while correct, is profoundly suboptimal. The framework thus offers not just new laws but potentially new ways of thinking about physics itself.
This representational flexibility strengthens our proposal: we're not just automating human physics but potentially discovering entirely new conceptual frameworks that compress nature more efficiently than millennia of human tradition.
7.5 Multiple representations as a feature, not a bug
When multiple equally valid compressions exist, the framework treats this as a valuable discovery rather than a problem requiring resolution. The system actively rewards maintaining multiple representations, recognizing that different formulations optimize for different contexts.
Reward structure for multiple representations:
The AI receives bonuses for discovering alternative formulations that excel in different domains
Each representation must demonstrate computational advantage in at least one context
Representations are only pruned when they offer no computational benefit in any scenario
Examples of valuable multiple representations:
Newtonian mechanics (optimal for v << c, intuitive for engineering)
Lagrangian formulation (optimal for constraints, generalized coordinates)
Hamiltonian formulation (optimal for phase space analysis, quantum transitions)
F = ma vs F = dp/dt (context-dependent optimality)
The framework naturally discovers that physics has always maintained multiple representations for good reason. A representation mapping might reveal:
Wave formulation excels for interference problems
Particle formulation excels for collision problems
Path integral formulation excels for quantum transitions
Each earns its place by computational efficiency in specific contexts
Pruning criteria: A representation is retired only when:
It offers no computational advantage in any tested domain
A strictly superior formulation exists (faster, more accurate, broader validity)
It fails experimental validation within its claimed domain
This approach mirrors how human physics actually works—we keep Newtonian mechanics despite having relativity because it's computationally optimal for everyday problems. The AI discovering and maintaining this multiplicity validates that our physics toolbox evolved for good computational reasons, not historical accident.
Rather than forcing a single "true" representation, the framework celebrates physics' computational pragmatism: the best formulation depends on what you're trying to calculate.
7.6 Managing Multiple Representations
The framework explicitly handles multiple valid compressions through a rigorous evaluation system:
**7.6.1 Equivalence Detection**
Two types of equivalence are recognized:
1. **Mathematical Equivalence**: Rules that are identical through algebraic rearrangement
- Detected by a non-ML symbolic mathematics engine
- Example: "F = m×a" and "a = F/m" are marked as equivalent
- Only one form kept per equivalence class (choosing based on computational efficiency)
2. **Functional Equivalence**: Different mathematical forms yielding identical predictions
- Example: Series expansion vs closed-form expression for pendulum period
- Detected by comparing predictions across the experiment set
- Both forms may be retained if they offer different computational advantages
**7.6.2 Scope-Based Rule Evaluation**
For each discovered rule, we compute:
**Scope(R)** = Number of experiments (out of N per epoch) where rule R applies
This is calculated by a deterministic helper algorithm that:
- Tests each rule against each experiment's conditions
- Verifies unit consistency
- Checks parameter range validity
- Returns count of applicable experiments
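A minimal sketch of such a helper, with the rule's checks represented as simple predicates; the interface is illustrative, not prescribed.

```python
# Illustrative sketch of the deterministic Scope(R) helper.
def scope(rule_checks, experiments):
    """rule_checks: predicates covering conditions, unit consistency, and parameter ranges.
    Returns the number of experiments (out of N per epoch) where every check passes."""
    return sum(all(check(exp) for check in rule_checks) for exp in experiments)

# Example: a small-angle pendulum rule applies when theta < 0.24 rad and L is given.
experiments = [{"theta": 0.1, "L": 1.0}, {"theta": 0.5, "L": 1.0}, {"theta": 0.2}]
checks = [lambda e: e.get("theta", 1.0) < 0.24, lambda e: "L" in e]
print(scope(checks, experiments))   # -> 1
```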
**7.6.3 Rule Retention Criteria**
Rules are retained or pruned based on a hierarchical decision process:
1. **Scope Dominance**: Rules with maximum scope are always retained
2. **Pareto Efficiency**: A rule is kept if no other rule surpasses it in BOTH scope and computational cost
3. **Threshold-Based Retention**: For rules A and B where Scope(A) > Scope(B):
- If Scope(B) ≥ 0.9 × Scope(A) AND Cost(B) < Cost(A): Keep both
- This ensures we maintain computationally efficient special cases
4. **Diversity Preservation**: Given rules A, B, C where:
- A has broadest scope
- B has lowest computational cost
- C has Scope(C) >> Scope(B) AND Cost(C) >> Cost(B)
- All three are retained to cover different use cases
**Example**:
- Rule A: Full relativistic energy (broad scope, high cost)
- Rule B: E = mc² (medium scope, low cost)
- Rule C: Kinetic energy ½mv² (narrow scope, minimal cost)
- All retained as they serve different computational needs
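A minimal sketch of the retention logic above; the 0.9 threshold follows the text, while the data layout and example numbers are illustrative assumptions.

```python
# Illustrative sketch of the hierarchical retention criteria in 7.6.3.
from dataclasses import dataclass

@dataclass
class ScoredRule:
    name: str
    scope: int      # experiments covered this epoch
    cost: float     # computational cost per prediction

def retained(rule, portfolio, threshold=0.9):
    max_scope = max(r.scope for r in portfolio)
    if rule.scope == max_scope:
        return True                                          # 1. scope dominance
    if not any(o.scope > rule.scope and o.cost < rule.cost for o in portfolio):
        return True                                          # 2. Pareto efficiency
    # 3. threshold-based retention: nearly-as-broad but cheaper rules are kept
    return any(rule.scope >= threshold * o.scope and rule.cost < o.cost for o in portfolio)

portfolio = [
    ScoredRule("relativistic_energy", scope=90, cost=10.0),
    ScoredRule("E=mc2",               scope=40, cost=1.0),
    ScoredRule("kinetic_half_mv2",    scope=25, cost=0.5),
]
print([r.name for r in portfolio if retained(r, portfolio)])   # all three are kept
```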
**7.6.4 Dynamic Rule Portfolio Management**
The system maintains a living portfolio of rules where:
- New rules can surpass and replace existing ones
- Rules that become obsolete (fully surpassed in scope AND efficiency) are archived
- The reward function β·(Number of Rules – Number of surpassed rules) naturally encourages efficient coverage
- Archived rules are stored with explanation of why they were surpassed
**7.6.5 Context-Dependent Selection**
The ML system learns through training to select appropriate rules based on:
- Required precision (use simple approximations when sufficient)
- Computational budget (use fast approximations under time constraints)
- Parameter regime (use specialized forms in their optimal domains)
This selection is emergent from the reward structure - the system naturally learns when to apply each retained rule for optimal performance.
**7.6.6 Canonical Form Identification**
A separate non-ML symbolic engine handles canonical form detection:
- Algebraic simplification to standard form
- Pattern matching for known equivalences
- Graph-based representation of expression structure
- Hash-based storage for rapid equivalence checking
This ensures the ML system focuses on discovering new compressions rather than rediscovering algebraic variants.
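As an illustration, SymPy (one possible non-ML symbolic engine) can already provide both services in a few lines; the interface below is an assumption for exposition, not a proposed component.

```python
# Illustrative sketch: canonical keys and mathematical-equivalence detection with SymPy.
import sympy as sp

F, m, a = sp.symbols("F m a", positive=True)

def canonical_key(expr):
    """Canonical form usable for hash-based storage of equivalence classes."""
    return sp.srepr(sp.simplify(expr))

def same_relation(eq1, eq2, solve_for):
    """True if both equations give identical solutions for the chosen symbol,
    i.e. they are algebraic rearrangements of one another."""
    return set(sp.solve(eq1, solve_for)) == set(sp.solve(eq2, solve_for))

print(same_relation(sp.Eq(F, m * a), sp.Eq(a, F / m), solve_for=F))   # True: "F = m*a" vs "a = F/m"
print(canonical_key(m * a - F))
```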
**7.7 Addressing Potential Failure Modes**
Several apparent "failure modes" are actually features or are already mitigated by design:
**7.7.1 "Physically Meaningless" Compressions**
**Concern**: What if the system discovers mathematically elegant compressions that seem physically meaningless?
**Response**: This is not for us to decide. If the AI consistently finds compressions outside our base set M that successfully predict experimental outcomes, perhaps they ARE meaningful. History shows many physical discoveries initially seemed meaningless:
- Complex numbers in quantum mechanics
- Negative energy states predicting antimatter
- Non-Euclidean geometry in general relativity
The framework's emphasis on experimental validation ensures any retained compression must make testable predictions. "Meaningless" is often "not yet understood."
**7.7.2 Overfitting to Experimental Noise**
**Concern**: The system might memorize noise patterns rather than discovering true regularities.
**Built-in Protection**:
- Base set M is reconstructed each epoch by randomly sampling k experiments for each physical law
- Example: For F=ma, randomly select k collision experiments from a larger pool each epoch
**7.7.3 Phenomena Without Elegant Compressions**
**Concern**: What if no elegant compression exists for certain phenomena?
**Response**: By definition, such phenomena are not part of base set M, which contains only experiments explainable by known physics. For genuinely incompressible phenomena:
- The system will maintain high uncertainty bounds
- May discover we need new mathematical tools
- May indicate fundamental randomness (like quantum measurement)
- This is valuable information, not a failure
**7.7.4 Domains Where Current Physics Fails**
**Concern**: How to handle known problematic regimes?
**Data Labeling Hierarchy**:
1. **Real Experiments** (Gold Standard): Always trusted, even if they contradict theory
2. **Generated Experiments** (High Confidence): From well-validated physics
3. **Generated Experiments with Known Unreliability**:
The framework is designed to be self-correcting: bad compressions naturally get outcompeted by better ones through the scoring system.
7.8 The Possibility of Alien Compressions
While the pendulum example (Section 3.4.4) shows how the system might refine human physics, the truly revolutionary outcomes may be unrecognizable to us. Consider:
**What if the AI's compressions look nothing like human physics?**
Just as your visual system makes circles "obvious" while they might be meaningless to a blind entity with no spatial senses, our physics reflects our particular cognitive architecture:
- We privilege discrete objects (leading to particle physics)
- We think in 3D+time (leading to spacetime formulations)
- We separate observer from observed (leading to measurement problems in QM)
The AI operates in a high-dimensional weight space where:
- Every parameter is continuous, not discrete
- Gradients and flows might be more natural than objects
- The distinction between observer and system may be meaningless
**Concrete Possibility**: Instead of rediscovering F=ma, the AI might express mechanics through continuous deformation fields in phase space, where our concept of "force" and "mass" never explicitly appear, yet all predictions match experiments perfectly.
**The Exciting Implication**: We've included Kepler's laws in base set M to ensure the AI can predict planetary motion. But it might satisfy this requirement through compressions that look nothing like ellipses or orbital mechanics - perhaps through harmonic decompositions in configuration space that make different calculations trivial but would never occur to human minds.
This is why discovering "alien" compressions would be as profound as any unification - it would show that human physics, while empirically successful, represents just one cognitive species' way of compressing reality.
If such alien compressions emerge, they would not merely be curiosities; they would constitute the first empirical evidence that our physics is just one projection of a deeper, richer structure of reality, glimpsed not by introspection but by optimization without priors. If this is done repeatedly and the results are compared, perhaps meta-patterns will emerge, perhaps even hinting at the very optimal compression we have said we are unlikely ever to reach: the rules underlying it all.
7.9 Benchmarking Philosophy
Rather than explicit performance targets against existing systems, we maintain human physics as a hidden baseline:
- The system is never told what laws it "should" discover
- Performance metrics compare discovered compressions against human physics:
* Computational cost ratio: C_AI / C_human for equivalent calculations
* Precision ratio: σ_AI / σ_human for equivalent predictions
* Coverage: what fraction of known phenomena can the AI predict?
- These comparisons happen external to the training loop
3. Superior formulations (revolutionary discovery)
The loss function could incorporate relative performance:
- Bonus when C_AI < C_human for any calculation
- Penalty when precision significantly worse than human physics
- But these are computed without revealing the human formulation
8. Conclusion
By treating physics as optimal compression under computational and experimental constraints, we can address fundamental questions about the uniqueness and inevitability of our physical theories while potentially discovering more efficient formulations. The framework naturally identifies where current theories fail and suggests optimal experiments to probe these failures. Whether it rediscovers Einstein or finds something entirely different, the journey itself will illuminate the deep structure of how we compress reality into understanding.
Foreword
This is an idea that has been sitting on my hard drive for a few months. I like it enough to finally share it.
I won’t pretend AI hasn’t been involved in helping shape the presentation. What I do claim is that the core idea and its overall structure (a rough blueprint for a human–AI discovery loop) is well beyond what current AI systems can generate on their own.
This is not a text designed for quick consumption. It’s dense, and probably uneven in places. Still, I’d be surprised if there weren’t at least a few people here for whom the underlying idea resonates, even if the presentation itself leaves room for improvement.
If the text in any way inspires someone, or serves as an interesting read, that alone would make sharing it worthwhile.
Executive Summary
What if our fundamental physics equations aren't THE laws of nature, but merely ONE good compression among many? This paper proposes training neural networks to discover physical laws by treating physics as a data compression problem: find the most computationally efficient rules that predict experimental outcomes within measurement uncertainty. Unlike existing automated discovery systems that search for equations matching human physics, our framework might reveal that F=ma is suboptimal, that we've chosen the wrong fundamental units (why m/s instead of s/m?), or that radically different mathematical frameworks compress nature more efficiently. By maintaining multiple valid compressions optimized for different contexts—just as we keep Newton despite having Einstein—the system acknowledges that physical laws are supremely useful correlations with defined domains, not metaphysical truths. Early validation on classical mechanics could lead to computationally revolutionary reformulations of quantum field theory, or even reveal why human physics took its particular historical path when equally valid alternatives existed all along. In this paper the term optimal is meant to be read as aspirational, rather than as a goal that will or even could be reached.
Abstract
We propose a framework for automated discovery of physical laws by treating physics as an optimal compression problem under experimental constraints. By training a neural network to predict masked experimental measurements while minimizing both computational complexity and experimental cost, we aim to discover whether our current formulation of physics represents the unique description of nature or merely one of many possible compressions of empirical data. The system uses a hierarchical data structure combining real experimental measurements with simulated data, allowing it to identify regions where current theories fail. Beyond philosophical implications, this approach could discover computationally efficient reformulations of known physics and identify promising directions for new experiments.
1. Introduction
Are Maxwell's equations, Einstein's field equations, and the Schrödinger equation THE laws of physics, or merely ONE way to compress our observations of reality? This question, long relegated to philosophy, may now be answerable through machine learning.
Consider two provocative possibilities:
The String Theory Paradox
Consider a thought experiment: General relativity can be derived from string theory, making string theory as experimentally verified as relativity itself. If an AI system approached physics without our historical path dependence, which would it discover first?
This question illuminates a profound possibility: our current formulation of physics may represent not THE unique description of nature, but merely one branch in a vast space of equivalent compressions. Different theoretical frameworks - some perhaps more elegant, others more computationally efficient - might provide identical predictions within experimental uncertainty while offering radically different conceptual foundations.
We aim for insight into different laws, and better laws, and we aspire to point in the direction of new ones. We train in search of optimal laws while fully aware that, as we define them, they will in all likelihood never be reached: we may lack the mathematical tools, and the combinatorial possibilities are effectively infinite. We do believe, however, that iterating towards “optimal” may lead us to “good”.
Our proposal
We propose a framework where physics emerges as the solution to an optimization problem: find the most computationally efficient compression of experimental data that preserves predictive power within measurement uncertainty. This recasts the development of physics not as discovering "true" laws, but as finding optimal representations under specific constraints.
1.1 A Fundamental Design Choice: Guided Discovery vs. Tabula Rasa
The framework faces a crucial structural choice: Do we explicitly ask the AI to predict specific quantities from experiments (e.g., "What is the force?"), or do we let it observe experimental data and discover whatever patterns it finds compressible?
**Guided Discovery Approach** (analogous to AlphaGo):
- Present experiment: masses, positions over time
- Ask: "Predict the force at time t"
- Biases toward human conceptual frameworks
- Computationally simpler, faster convergence
- Useful for validating the framework
**Tabula Rasa Approach** (analogous to AlphaGo Zero):
- Present experiment: masses, positions over time
- Ask: "What patterns can you compress from this data?"
- AI might discover:
- Mean speed (Δx/Δt) or its inverse "timeliness" (Δt/Δx)
- Position as function of time x(t) or time as function of position t(x)
- Entirely alien quantities we haven't named
- No human bias about which quantities matter
- Computationally demanding but maximizes discovery potential
For clarity of exposition, this paper often assumes the guided approach when presenting technical details. However, the tabula rasa approach represents the ultimate ambition - discovering not just new laws but new quantities, new concepts, and new ways of parsing reality.
Just as AlphaGo Zero surpassed AlphaGo by abandoning human knowledge, a physics AI starting from scratch might discover compressions that make our concepts of "force," "energy," and "momentum" look like historical accidents rather than fundamental features of nature.
If trained correctly, it would certainly realize that force (or an equivalent representation) can be deduced from mass and time/position data. However: How central will such a representation be to the AI's framework? Will it use force as a fundamental building block, as we do, or will it treat the concept of force as a mere parenthetical observation while using entirely different concepts as the foundation for unification?
This could produce a physics as alien to human understanding as AlphaGo Zero's strategies are to human Go players. Just as AlphaGo Zero routinely bypasses patterns that human masters have deemed significant for millennia, a tabula rasa physics AI might build its theoretical edifice on conceptual foundations we've never imagined, treating our "fundamental" quantities as peripheral consequences of deeper patterns.
The framework thus offers not just automated discovery of laws, but the possibility of genuinely alien physics - empirically equivalent to ours, yet conceptually unrecognizable.
This paper mainly fleshes out concepts as if the AlphaGo analogue is the starting point (asking the system to predict pre-chosen parameters directly). However, if one wants to try the more ambitious zero-approach from the start, we would find this perhaps even more exciting (higher risk, higher reward).
A Hybrid Approach:
A promising compromise would be to both ask for specific parameters AND let the AI discover what other questions can be answered within an experiment:
- Present experiment: masses, positions over time
- Required task: "Predict the force at time t"
- Open exploration: "What else can be deduced from this data?"
This hybrid approach offers several advantages:
- Ensures the AI learns human physics (maintaining compatibility with our knowledge)
- Provides clear benchmarks for validation (can it predict force correctly?)
- Still allows discovery of alien concepts and alternative compressions
- May reveal that our "required" predictions are suboptimal entry points
The AI might discover that while it can compute force as requested, a different quantity (perhaps something like "interaction potential gradient") makes subsequent calculations trivial and unifies disparate phenomena more naturally. This would be analogous to discovering that while we can convert between Fahrenheit and Celsius, Kelvin is the more fundamental temperature scale.
This hybrid approach reduces computational demands while preserving the potential for revolutionary discoveries – perhaps the ideal starting point in practice.
2. Core Framework
2.1 Physics as Compression
Our central insight treats physical laws as compression algorithms for experimental data. Given a set of measurements with uncertainties, the goal is to find minimal rules that can predict masked values within their error bars.
The system optimizes jointly for predictive accuracy within experimental uncertainty, compactness of the rule set, computational cost of applying the rules, and the cost of any experiments it requests (the full scoring function appears in Section 3.1.4).
These optimization criteria exist in fundamental tension, creating a Pareto frontier rather than a single optimal solution. Physical laws occupy different points along this frontier: Newton's laws achieve maximal compression with bounded accuracy, while quantum field theory trades compression for precision. The framework naturally maintains multiple representations along this frontier, each optimal for its specific context and requirements. This multi-objective nature makes 'optimal compression' inherently context-dependent rather than absolute.
2.2 Key Innovation: Hierarchical Truth
We employ three tiers of training data (detailed further in Section 7.7.4):
1. Real experimental measurements (gold standard, always trusted even when they contradict theory)
2. Simulated data generated from well-validated physics (high confidence)
3. Simulated data from regimes where current theory is known to be unreliable (labeled with confidence scores)
Example: Quantum-Classical Boundary. The AI might identify systematic divergences between quantum mechanical predictions and classical measurements in mesoscopic systems. Rather than treating this as noise, the system would recognize this as a high-value experimental regime worth targeted experimental requests.
This hierarchy allows the system to bootstrap from current knowledge while remaining capable of discovering where that knowledge fails.
3. Technical Implementation
This section provides both general principles and specific examples of technical implementation. We emphasize at the outset:
1) We do not intend to solve all implementation challenges in this paper - our goal is aspirational rather than fully architectural.
2) Where we do provide specific implementations, these are by no means prescriptive. They serve as proof-of-concept examples, demonstrating the feasibility of embarking on a project with the ambitions laid out in this paper.
Alternative approaches are likely to prove superior, and we encourage exploration of different technical solutions within the framework's conceptual structure.
3.1 Architecture
The system consists of three cooperating components, specified in Section 3.1.1: an encoder network that processes experimental data, a symbolic generation network that proposes candidate rules, and a compression evaluation network that scores them.
Rule Formulation Structure
To ensure interpretability and systematic theory construction, all discovered rules must conform to a structured format:
Rule Template: "If conditions {A, B, ..., N} apply, and we know parameters {a, b, ..., n} with uncertainties {σ_a, σ_b, ..., σ_n}, then missing parameter m_p can be deduced through operations f(a, b, ..., n) with uncertainty σ_mp."
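One way (among many) to store rules in this template is a small data structure; the field names below are our own illustration, not part of the framework:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    """One entry in the rule portfolio, following the rule template above.

    Field names are illustrative; the framework does not prescribe them.
    """
    conditions: List[str]            # e.g. ["constant mass", "non-relativistic speeds"]
    known_parameters: List[str]      # e.g. ["m", "a"]
    uncertainties: Dict[str, float]  # sigma for each known parameter
    target: str                      # the missing parameter m_p
    expression: str                  # symbolic form, e.g. "m * a"
    predict: Callable[..., float]    # executable form of the expression
    predicted_sigma: float           # propagated uncertainty sigma_mp

# Example: a Newtonian force rule expressed in this template
newton_second_law = Rule(
    conditions=["constant mass", "non-relativistic speeds"],
    known_parameters=["m", "a"],
    uncertainties={"m": 0.001, "a": 0.01},
    target="F",
    expression="m * a",
    predict=lambda m, a: m * a,
    predicted_sigma=0.02,
)
```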
3.1.1 Neural Architecture Specifications
The system employs a hybrid architecture combining pattern recognition with symbolic reasoning:
**Encoder Network** (Processes experimental data):
- Transformer-based: 12 layers, 768 hidden dimensions, 12 attention heads
- Positional encoding modified to handle variable-length experiments
- Special tokens for: [UNKNOWN], [MEASUREMENT], [UNCERTAINTY]
- Input embedding includes both value and unit information
**Symbolic Generation Network**:
- Decoder architecture similar to GPT: 24 layers, 1024 hidden dimensions
- Vocabulary includes:
* SI units and combinations
* Mathematical operators (initially just +-*/, expanding over training)
* Numeric constants
* Parameter references from input
- Attention can access both experimental context and previously generated symbols
**Compression Evaluation Network**:
- Graph Neural Network processing symbolic expressions as trees
- 6 layers of message passing
- Estimates computational complexity without execution
- Outputs: predicted accuracy, computational cost, domain of validity
**Training Configuration**:
- Batch size: 256 experiments
- Learning rate: 1e-4 with cosine scheduling
- Gradient clipping at 1.0
- Mixed precision training for efficiency
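As a plausibility sketch only (not a prescribed implementation), the encoder half of this architecture could be instantiated in PyTorch roughly as follows; the layer counts and dimensions come from the list above, while class names, vocabulary size, and maximum length are our assumptions:

```python
import torch
import torch.nn as nn

class ExperimentEncoder(nn.Module):
    """Transformer encoder over tokenized experiment records (illustrative only)."""

    def __init__(self, vocab_size: int = 4096, d_model: int = 768,
                 n_heads: int = 12, n_layers: int = 12, max_len: int = 2048):
        super().__init__()
        # Value, unit, and special tokens ([UNKNOWN], [MEASUREMENT], [UNCERTAINTY])
        # share one vocabulary here for brevity; they are just reserved token ids.
        self.token_embedding = nn.Embedding(vocab_size, d_model)
        # Learned positional embedding so variable-length experiments are handled
        # without assuming a fixed record layout.
        self.pos_embedding = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        positions = torch.arange(tokens.size(1), device=tokens.device)
        x = self.token_embedding(tokens) + self.pos_embedding(positions)
        return self.encoder(x)

# Shape check: a batch of 2 experiments, 128 tokens each
encoder = ExperimentEncoder()
out = encoder(torch.randint(0, 4096, (2, 128)))
print(out.shape)  # torch.Size([2, 128, 768])
```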
3.1.2 Neural-to-Symbolic Translation
The system generates symbolic expressions through a process analogous to language modeling:
Just as an LLM responds to a block of text with the next token, this network responds to any experiment (with an unknown) by either:
1) Outputting [CANNOT_DETERMINE] - indicating the unknown cannot be determined from known parameters
2) Creating a full symbolic sentence by combining known parameters and operations
**Token Generation Process**:
- Input: Experimental data with one parameter masked as [UNKNOWN]
- Output: Sequence of tokens forming a valid symbolic expression
- Generation proceeds left-to-right, selecting from allowed operations
- Expression terminates with [END] token
- Invalid expressions (unit mismatch, malformed) are rejected and regenerated
**Curriculum for Available Operations**:
- Epochs 1-100: Only +-*/ available (forces discovery of simple relationships)
- Epochs 101-500: Add power operations, sqrt
- Epochs 501-1000: Add trigonometric functions
- Epochs 1001+: Full mathematical toolkit
This staged approach prevents the system from immediately reaching for complex functions when simple ones suffice.
**Computational Tractability Through Constraints**:
Unlike general language modeling, physics expression generation benefits from severe natural constraints:
- Unit consistency eliminates >99% of random expressions
- Valid mathematical grammar (balanced parentheses, operator precedence)
- Available tokens limited to experiment parameters and basic operations
- No redundant forms (the grammar prevents a+a+a, enforcing 3*a)
This reduces the search space by many orders of magnitude compared to natural language. While an LLM might choose from 50,000+ tokens at each position, our system selects from perhaps 10-20 valid continuations, making the problem tractable despite the combinatorial nature of symbolic expressions.
The token-by-token generation approach (set first token, set second token, ..., set last token) leverages the same sequential decision-making that makes LLMs successful, but in a dramatically constrained space where physics provides the grammar.
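To make the pruning concrete, here is a toy enumeration (our own illustration, with invented parameter names and a three-component unit vector) showing how a unit check alone discards most candidate expressions:

```python
from itertools import product

# Units as exponent vectors over (kg, m, s); names and target are illustrative.
PARAMS = {"m": (1, 0, 0), "x": (0, 1, 0), "t": (0, 0, 1), "v": (0, 1, -1)}
TARGET_UNIT = (1, 1, -2)  # a force-like unknown: kg·m/s²

def combine(u1, op, u2):
    """Return the resulting unit vector, or None if the operation is dimensionally invalid."""
    if op in "+-":
        return u1 if u1 == u2 else None          # can only add like units
    if op == "*":
        return tuple(a + b for a, b in zip(u1, u2))
    return tuple(a - b for a, b in zip(u1, u2))  # "/"

# Enumerate "a op b op c" (evaluated left to right, ignoring precedence for simplicity)
# and keep only expressions whose units match the masked target.
valid, total = [], 0
for (a, b, c), (op1, op2) in product(product(PARAMS, repeat=3),
                                     product("+-*/", repeat=2)):
    total += 1
    unit = combine(PARAMS[a], op1, PARAMS[b])
    if unit is not None:
        unit = combine(unit, op2, PARAMS[c])
    if unit == TARGET_UNIT:
        valid.append(f"{a} {op1} {b} {op2} {c}")

print(f"{len(valid)} of {total} candidate expressions survive the unit check")
# The survivors include force-like candidates such as "m * v / t" and "m / t * v".
```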
**Discovery Through Missing Parameters**:
When outputting [CANNOT_DETERMINE], the system can additionally suggest what parameters might be missing:
- "Cannot determine period without: mass of pendulum"
- "Cannot determine force without: acceleration"
- "Cannot determine energy without: velocity"
Reward structure:
- If [CANNOT_DETERMINE] + missing parameters is correct: Reward (the system identified a genuine limitation)
- If [CANNOT_DETERMINE] was wrong (a rule exists with available parameters): Penalty in loss function
- This incentivizes the system to deeply understand what information is truly necessary for each calculation
This mechanism could lead to profound discoveries - the AI might identify that certain parameters we consider fundamental are actually derivable, or that parameters we've never considered are essential for complete descriptions.
3.1.3 Combinatorial Explosion Management
The system controls symbolic expression growth through adaptive length penalties:
**Length Penalty Schedule**:
L_penalty(epoch, length) = α(epoch) × length^β
Where:
- α(epoch) = α_0 × decay^(epoch/1000) - starts high, decays over training
- α_0 = 0.1 (initial penalty weight)
- decay = 0.5 (halves every 1000 epochs)
- β = 1.5 (superlinear penalty for very long expressions)
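This schedule transcribes directly into code, with the defaults taken from the values above:

```python
def length_penalty(epoch: int, length: int,
                   alpha_0: float = 0.1, decay: float = 0.5, beta: float = 1.5) -> float:
    """Adaptive length penalty: alpha(epoch) * length**beta, alpha halving every 1000 epochs."""
    alpha = alpha_0 * decay ** (epoch / 1000)
    return alpha * length ** beta

# Early training punishes long expressions much harder than late training:
print(length_penalty(epoch=0, length=20))     # ≈ 8.94
print(length_penalty(epoch=5000, length=20))  # ≈ 0.28
```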
**Beam Search with Pruning**:
- Maintain top-K expressions during generation (K=10)
- Prune branches that exceed length budget: L_max = 10 + epoch/100
- Early stopping if computational cost exceeds threshold
**Complexity-Aware Sampling**:
- Sample training experiments biased toward those requiring simpler solutions
- Gradually increase complexity as simple relationships are mastered
- Maintains pressure to find elegant solutions first
**Expression Caching**:
- Cache and reuse subexpressions that appear frequently
- Build library of "mathematical motifs" (e.g., √(x²+y²) for distance)
- Reduces redundant search over common patterns
3.1.4 Training Procedures
These procedures are not meant to be final; they are credible suggestions intended to demonstrate plausibility.
The system optimizes for the complete scoring function:
Reward = α·(Predictive Accuracy)
+ β·(Number of Rules – Number of surpassed rules)
- γ·(Computational Complexity)
- δ·(Uncertainty Bar Size)
+ ε·(Unification Bonus)
+ ζ·(Experimental Value)
+ κ·(Experiment Documentation)
**Implementation via Reinforcement Learning**:
- Policy network: The symbolic expression generator (transformer decoder)
- Action space: Sequence of mathematical tokens forming expressions
- Reward signal: Computed after full expression is generated and evaluated
- Training algorithm: Proximal Policy Optimization (PPO) for stable learning
- Baseline: Running average of rewards to reduce variance
**Computational Complexity Calculation**:
- γ·(Computational Complexity) incorporates both expression length and operation cost
- Simple operations (+-*/) have cost 1
- Complex operations (trig, exp, special functions) have cost 2-5
- Iterative solutions penalized by expected iteration count
- This naturally encourages discovering closed-form solutions
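A hedged transcription of this scoring into code; the weights are placeholders in the ranges suggested by the curriculum below, and the operation-cost table is an assumption consistent with the 1 versus 2-5 split above:

```python
# Per-operation costs: cheap arithmetic vs. costlier special functions (assumed values).
OP_COST = {"+": 1, "-": 1, "*": 1, "/": 1, "sqrt": 2, "sin": 3, "cos": 3, "exp": 3}

def computational_complexity(ops, expected_iterations=1):
    """Sum of per-operation costs, penalized by expected iteration count for iterative rules."""
    return expected_iterations * sum(OP_COST.get(op, 5) for op in ops)

def reward(accuracy, n_rules, n_surpassed, ops, sigma, unification,
           experimental_value, documentation, *,
           alpha=0.4, beta=0.2, gamma=0.3, delta=0.2, eps=0.2, zeta=0.1, kappa=0.1):
    """Complete scoring function from Section 3.1.4, with illustrative weights."""
    return (alpha * accuracy
            + beta * (n_rules - n_surpassed)
            - gamma * computational_complexity(ops)
            - delta * sigma
            + eps * unification
            + zeta * experimental_value
            + kappa * documentation)

# An equally accurate rule built from cheap operations scores higher than one
# that reaches for a costlier special function:
print(reward(0.95, 10, 2, ["*", "/"], 0.05, 0.0, 0.0, 1.0))
print(reward(0.95, 10, 2, ["*", "/", "exp"], 0.05, 0.0, 0.0, 1.0))
```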
**Curriculum-based Parameter Scheduling**:
Early Training (epochs 1-1000):
- High γ (0.5): Encourages simple discoveries first
- Low ε (0.1): Don't force unification prematurely
- Moderate α (0.3): Allow some prediction error while learning
- Available operations: +-*/ only
Mid Training (epochs 1000-5000):
- Balanced parameters: α=0.4, β=0.2, γ=0.3, δ=0.2, ε=0.2
- Gradually introduce all mathematical operations
- Begin rewarding unification of disparate phenomena
Late Training (epochs 5000+):
- High ε (0.4): Push toward unified theories
- High ζ: Encourage experimental exploration (see Section 3.7)
- Full mathematical toolkit available
- Focus on discovering alternate formulations
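The staged weights above could be wrapped in a small scheduler; where the text leaves a value unspecified, the carried-over mid-training default below is our assumption:

```python
def meta_parameters(epoch: int) -> dict:
    """Stage-wise meta-parameters following the curriculum above."""
    params = dict(alpha=0.4, beta=0.2, gamma=0.3, delta=0.2, eps=0.2, zeta=0.1)
    if epoch <= 1000:                      # early: simple discoveries first
        params.update(alpha=0.3, gamma=0.5, eps=0.1)
        params["operations"] = "+-*/"
    elif epoch <= 5000:                    # mid: balanced, all operations phased in
        params["operations"] = "gradually expanding"
    else:                                  # late: push toward unification and exploration
        params.update(eps=0.4, zeta=0.3)
        params["operations"] = "full toolkit"
    return params

print(meta_parameters(50))
print(meta_parameters(8000))
```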
**Optimization Details**:
- Batch size: 256 experiments per update
- Learning rate: 3e-4 with cosine annealing
- Gradient clipping: 1.0 for stability
- Entropy bonus: 0.01 to encourage exploration
- Update frequency: Every 2048 expression generations
**Checkpoint Strategy**:
- Save models that discover known physics milestones (even if not told these exist)
- Save models that achieve breakthrough compression ratios
- Maintain diverse population of models finding different valid compressions
**Handling Multiple Valid Compressions**:
The framework treats compression as a powerful lens for understanding how physical theories organize empirical regularities. Rather than seeking a single "correct" formulation, it maintains a "physics portfolio" of discovered rules where:
- Rules compete based on context-specific performance
- Different checkpoints excel in different domains
- Multiple representations coexist, each optimal for specific calculations
- This aligns with Section 7.5's philosophy of multiple representations
**Training Data Curriculum**:
1. **Epochs 1-500**: Single-parameter relationships (F∝a, V∝T)
2. **Epochs 500-2000**: Two-parameter relationships (F=ma, PV=k)
3. **Epochs 2000-5000**: Multi-parameter systems (pendulums, orbits)
4. **Epochs 5000+**: Complex phenomena requiring unified theories
5. **Throughout**: Interleave simple and complex to prevent forgetting
**Success Indicators During Training**:
- Spontaneous discovery of known relationships (without being told)
- Decreasing computational cost for equivalent predictions
- Emergence of hierarchical rule structures
- Successful predictions in held-out experimental domains
3.2 Training Protocol
The training protocol optimizes the full scoring function introduced in Section 3.1.4:
Reward = α·(Predictive Accuracy)
+ β·(Number of Rules – Number of surpassed rules)
- γ·(Computational Complexity)
- δ·(Uncertainty Bar Size)
+ ε·(Unification Bonus)
+ ζ·(Experimental Value)
+ κ·(Experiment Documentation)
For a discussion of Experimental Value, see Sections 3.4 and 3.5; for Experiment Documentation, see Section 3.6. We only want to keep track of valid rules, not rules that are strictly worse (e.g., smaller ranges with no advantage, or higher computational cost). We do, however, want to promote equally good representations when found. If a unification is computationally cheaper than the existing rules, those rules are scrapped; otherwise, computationally superior rules are kept alongside the unification.
3.2.1 Experiment Composition and Knowledge Retention
Critical to successful training is the careful composition of experiments presented each epoch:
**Base Experiment Set M**: Each epoch's N experiments include a curated subset M that necessitates handling all physics rules relevant to the current training stage. This ensures the AI keeps, refines, and generalizes successful compressions rather than discarding them.
**Randomized Construction**:
- Base set M is reconstructed each epoch by randomly sampling k experiments for each physical law
- Example: For F=ma, randomly select k=5 collision experiments from a pool of 50+ variants each epoch
- This randomization ensures:
- Noise patterns change epoch-to-epoch, preventing overfitting
- Only regularities that persist across epochs (true physics) can be learned
- Uncertainty bounds (σ) prevent fitting below noise level
- The AI learns robust compressions, not memorized datasets
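A minimal sketch of this per-epoch resampling, assuming experiments are stored in per-law pools (pool names and sizes below are illustrative):

```python
import random

def build_base_set(law_pools, k=5, seed=None):
    """Rebuild base set M for one epoch by sampling k experiments per known law.

    law_pools maps a law name to its pool of stored experiment records
    (50+ variants each in the text's example).
    """
    rng = random.Random(seed)
    base_set = []
    for law, pool in law_pools.items():
        base_set.extend(rng.sample(pool, k=min(k, len(pool))))
    rng.shuffle(base_set)  # avoid presenting experiments grouped by law
    return base_set

# Usage with placeholder experiment records:
pools = {"F=ma": [f"collision_{i}" for i in range(50)],
         "V=IR": [f"circuit_{i}" for i in range(50)]}
print(len(build_base_set(pools, k=5)))  # 10 experiments this epoch, resampled next epoch
```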
**Epoch-Specific Composition**:
- **Early epochs (+-*/ only)**:
- M includes: simple collisions (momentum), basic circuits (V=IR), ideal gases (PV=nRT)
- Size: M ≈ 60% of N, leaving 40% for exploration
- **Middle epochs (full operations)**:
- M expands to: oscillations, waves, thermodynamics, basic quantum phenomena
- Size: M ≈ 70% of N, as more physics must be retained
- **Late epochs**:
- M covers: full classical mechanics, E&M, quantum mechanics, relativity
- Size: M ≈ 65% of N, balanced with need for discovering unifications
**Ratchet Mechanism**: As new valid compressions are discovered, experiments requiring them are added to future M sets. This creates a one-way ratchet where good discoveries are automatically preserved in the training curriculum.
**Example Base Set (N=100 experiments per epoch)**:
- 10 collision experiments (various masses, velocities)
- 10 pendulum experiments (various lengths, angles)
- 10 gas experiments (various P, V, T conditions)
- 10 circuit experiments (various configurations)
- 10 projectile motion experiments
- 10 wave experiments
- 40 exploration experiments (novel combinations, edge cases)
**Curve Ball Experiments**:
- 5-10% of base set M consists of "curve ball" experiments
- These combine known physics in unusual ways or extreme conditions
- Examples:
- Pendulum in viscous fluid (combines mechanics + fluid dynamics)
- Charged particle in combined E&M fields at relativistic speeds
- Phase transitions under extreme pressures
- Purpose: Create opportunities for discovering unifying principles
- Selected from regions where different physics domains interact
- May lead to discovering more elegant unified compressions
This entire design ensures no "catastrophic forgetting" - the AI cannot achieve high rewards without maintaining compressions that handle the base set effectively.
3.3 Computational Efficiency Rewards
The system explicitly rewards computational shortcuts, as the following example illustrates.
Example: Simple Pendulum. Consider a pendulum with small angular displacement θ. The exact period is:
T = 2π√(L/g) × [1 + (1/16)θ² + (11/3072)θ⁴ + ...]
For an experimental uncertainty of 1%, the system should discover (unless it finds something even better) the truncated form T ≈ 2π√(L/g), together with the constraint that it is only valid while the θ²/16 correction stays below the 1% error budget (roughly θ < 0.4 rad, about 23°); a numerical check follows below.
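As a quick numerical illustration of this accuracy-versus-cost tradeoff (a sketch assuming scipy is available; the exact period comes from the standard complete-elliptic-integral formula):

```python
import numpy as np
from scipy.special import ellipk

L, g = 1.0, 9.81

def period_exact(theta0):
    """Exact pendulum period via the complete elliptic integral of the first kind."""
    return 4.0 * np.sqrt(L / g) * ellipk(np.sin(theta0 / 2) ** 2)

def period_small_angle(theta0):
    return 2 * np.pi * np.sqrt(L / g)

def period_first_correction(theta0):
    return 2 * np.pi * np.sqrt(L / g) * (1 + theta0 ** 2 / 16)

for deg in (5, 20, 45, 90):
    th = np.radians(deg)
    exact = period_exact(th)
    err0 = abs(period_small_angle(th) / exact - 1)
    err1 = abs(period_first_correction(th) / exact - 1)
    print(f"{deg:3d}°  small-angle error {err0:.2%}   first-correction error {err1:.3%}")

# Around 20° the cheap small-angle form is already inside a 1% error bar, so it wins;
# near 90° even the first correction exceeds 1% and a costlier rule is required.
```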
3.4 Experimental Request System
3.4.1 Cost of Experimental Requests
The AI can request new experiments, with costs scaling with the required precision, the extremity of the parameter regime, and the novelty of the measurement (made concrete below).
This mirrors real experimental constraints and forces efficient exploration.
Concrete Cost Formulation Example:
While there may be many valid ways to do this, we would be remiss if we didn’t present at least one plausible approach:
The total cost C for an experimental request may be computed as:
C = C_base × F_precision × F_extremity × F_novelty
Where:
- C_base = baseline cost for the measurement type. C_base values derived from historical experimental costs in each domain
- F_precision = (σ_target/σ_standard)^(-2) for uncertainty σ
- F_extremity = exp(Σ_i |p_i - p_typical,i|/λ_i) for parameters p_i with scale lengths λ_i
- F_novelty = 1/(1 + n_similar^0.5) where n_similar is the number of similar past experiments
Example: Measuring electron mass to 10^-12 precision (vs standard 10^-8) at 0.99c (vs typical 0.1c):
- F_precision = 10^8
- F_extremity = exp(|0.99-0.1|/0.5) ≈ 6
- Total cost multiplier ≈ 6×10^8 × baseline
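The cost formulation above translates directly into code; reproducing the worked example recovers the quoted ~6×10⁸ multiplier:

```python
import math

def experiment_cost(c_base, sigma_target, sigma_standard, params, typical, scales, n_similar):
    """Cost C = C_base * F_precision * F_extremity * F_novelty, as defined above.

    params, typical, and scales are parallel lists of p_i, p_typical_i, and lambda_i.
    """
    f_precision = (sigma_target / sigma_standard) ** (-2)
    f_extremity = math.exp(sum(abs(p - pt) / lam
                               for p, pt, lam in zip(params, typical, scales)))
    f_novelty = 1.0 / (1.0 + math.sqrt(n_similar))
    return c_base * f_precision * f_extremity * f_novelty

# Worked example: electron mass to 1e-12 (vs 1e-8) at 0.99c (vs typical 0.1c, lambda = 0.5c)
print(experiment_cost(1.0, 1e-12, 1e-8, [0.99], [0.1], [0.5], n_similar=0))
# ≈ 5.9e8 × baseline, matching the ~6×10^8 multiplier above
```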
Budget Allocation Strategy:
The system operates under a fixed experimental budget B per epoch:
1. **Information Gain Estimation**: For each proposed experiment e, estimate:
IG(e) = H(Model|Data) - E[H(Model|Data ∪ {e})]
2. **Value Ranking**: Rank experiments by IG(e)/C(e)
3. **Greedy Selection**: Select experiments in rank order until:
Σ C(e_selected) ≤ B
4. **Diversity Constraint**: No more than 30% of budget on experiments testing the same rule
5. **Exploration Reserve**: 10% of budget reserved for high-risk/high-reward experiments
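A greedy sketch of this allocation under the stated constraints; the candidate schema and numbers are invented for illustration:

```python
def allocate_budget(candidates, budget, diversity_cap=0.3, exploration_reserve=0.1):
    """Greedy selection by information gain per unit cost, under the constraints above.

    candidates: list of dicts with keys 'name', 'ig', 'cost', 'rule' (illustrative schema).
    """
    spend_limit = budget * (1 - exploration_reserve)   # keep 10% for high-risk requests
    per_rule_cap = budget * diversity_cap              # at most 30% of budget per rule
    selected, spent, per_rule = [], 0.0, {}
    for exp in sorted(candidates, key=lambda e: e["ig"] / e["cost"], reverse=True):
        rule_spend = per_rule.get(exp["rule"], 0.0)
        if spent + exp["cost"] <= spend_limit and rule_spend + exp["cost"] <= per_rule_cap:
            selected.append(exp["name"])
            spent += exp["cost"]
            per_rule[exp["rule"]] = rule_spend + exp["cost"]
    return selected, budget - spent  # remaining budget includes the exploration reserve

candidates = [
    {"name": "precise_timing", "ig": 5.0, "cost": 2.0, "rule": "pendulum"},
    {"name": "large_angle",    "ig": 3.0, "cost": 4.0, "rule": "pendulum"},
    {"name": "gas_extreme_T",  "ig": 2.0, "cost": 1.0, "rule": "ideal_gas"},
]
print(allocate_budget(candidates, budget=10.0))
```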
3.4.2 Information-Theoretic Experiment Selection
The system employs active learning principles to maximize information gain:
**Uncertainty Sampling**: Priority to experiments where model predictions have highest variance across valid compressions
**Query by Committee**: When multiple compressions exist, prioritize experiments on which they maximally disagree
**Expected Model Change**: Estimate how much each experiment would update compression parameters:
EMC(e) = Σ_θ |θ_current - E[θ|e]| × P(θ|e)
**Boundary Sampling**: Focus on parameter regions near validity boundaries of current compressions
**Divergence Detection**: Highest priority to experiments where:
|prediction_simulated - prediction_compressed| > 2σ_experimental
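Two of these criteria reduce to one-liners; a sketch with invented numbers:

```python
import numpy as np

def committee_disagreement(predictions):
    """Query-by-committee score: variance of predictions across retained compressions."""
    return float(np.var(predictions))

def divergence_flag(pred_simulated, pred_compressed, sigma_experimental):
    """Flag the highest-priority case from the list above: simulated and compressed
    predictions disagree by more than twice the experimental uncertainty."""
    return abs(pred_simulated - pred_compressed) > 2 * sigma_experimental

# Three valid compressions disagree strongly in this regime, so it is worth an experiment:
print(committee_disagreement([9.79, 9.81, 10.4]))
print(divergence_flag(9.81, 10.4, sigma_experimental=0.2))  # True
```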
3.4.3 Laboratory Interface System
For real-world experimental requests:
1. **Standardized Request Format**:
- Experimental type (from predefined ontology)
- Required parameters and ranges
- Precision requirements
- Safety constraints
- Estimated duration
2. **Automated Database Search**:
- An initial internal, structured database will be readily available
- The initial database is structured according to the same principles guiding generated experimental data
- Contents include:
* Classical mechanics experiments (pendulums, collisions, orbits)
* Thermodynamics data (gas laws, phase transitions, heat transfer)
* Electromagnetic observations (circuit behavior, field measurements)
* Modern physics experiments (spectroscopy, particle physics, relativistic effects)
- Each entry contains: experimental conditions, measured values, uncertainties, metadata
- Internal Database grows as humans in the loop see a need
- Internal Database grows through two pathways when querying external sources:
* **Automatic ingestion**: Data from gold-standard sources (e.g., NIST, Particle Data Group, peer-reviewed data papers) automatically added after format conversion
* **Human-reviewed ingestion**: Data from broader sources (e.g., preprints, older papers, less standardized databases) flagged for human review before inclusion
- Query existing experimental databases (arXiv, particle data group, etc.)
- Check if requested data already exists
- Flag partial matches for human review
3. **Human-in-the-Loop Protocol**:
- Feasibility assessment by domain experts
- Cost estimation from actual laboratories
- Safety and ethics review for extreme parameters
- Alternative experiment suggestions
Future Directions for Experimental Automation:
As the framework matures, the system could be integrated more deeply with experimental infrastructure.
Such integration would transform the system from requesting experiments to actively conducting them, accelerating the discovery cycle. At every such step, strict human-in-the-loop protocols would be implemented for safety.
4. **Experiment Queuing System**:
- Priority queue based on IG(e)/C(e) × feasibility_score
- Batch similar experiments for efficiency
- Track experiment status and update predictions upon completion
5. **Result Integration Pipeline**:
- Automated data cleaning and validation
- Uncertainty quantification including systematic errors
- Update compression models with new data
- Flag unexpected results for deeper investigation
3.4.4 Example in depth: Discovering More Optimal Pendulum Laws
Starting with raw pendulum data (length, angle, period measurements):
Epoch 1: System discovers T ≈ 2π√(L/g) with 5% error
- Requests: Higher precision timing at various angles
- Cost: Low (standard conditions)
Epoch 5: Discovers angle dependence, proposes T = 2π√(L/g)[1 + θ²/16]
- Requests: Large angle measurements (θ > 60°)
- Cost: Medium (specialized apparatus needed)
Epoch 10: Two competing compressions emerge:
- Series expansion: Works well for θ < 90°
- Elliptic integral formulation: Works for all angles but computationally expensive
- Requests: Ultra-precise measurements at θ = 45° to distinguish formulations
- Cost: High (10^-6 second precision required)
Budget allocation: 60% on distinguishing experiments, 30% on boundary testing, 10% on exploring damped oscillations (exploration reserve)
3.4.5 Integration with Scientific Practice
The framework is designed to complement, not replace, human scientific inquiry:
- Discovered compressions output in standard formats (LaTeX, SymPy, Mathematica)
- Rule cards provide worked examples for human understanding
- Discovery indicators flag high-priority experiments for human investigation
- Can run alongside traditional research, suggesting new directions
Scientists interact through rule review, pre-planned database expansion as the set of allowed mathematical operations grows, experiment prioritization, and interpretation of alien compressions - maintaining human insight while leveraging the AI's unbiased exploration.
3.5 Exploration Incentive Structure
As new experiments are ordered, Predictive Accuracy will grow. To prevent convergence on local optima and encourage paradigm-shifting discoveries, the framework relies on explicit exploration incentives: the experimental-value term ζ in the reward and the dynamic meta-parameter adjustment of Section 3.7.
3.6 Enhanced Rule Documentation Requirements
Each discovered rule must be accompanied by a comprehensive "rule card" containing:
1. Experimental Domain Mapping
2. Worked Examples (minimum 3 per experiment type)
3. Failure Mode Documentation
Reward Structure Addition:
The reward function could include:
Documentation_Bonus = κ × (experiment_types_covered) × (clarity_score) × (example_completeness)
Where experiment_types_covered counts the distinct experiment types the rule is documented against, clarity_score rates how understandable the worked examples are, and example_completeness measures how many of the required examples are provided.
This requirement addresses several problems at once: compression stays interpretable, humans can audit each rule directly, and the system cannot hide behind opaque formulations.
Example Rule Card:
Rule: Energy-momentum relation in special relativity
This approach ensures that compression leads to clarity, not obscurity. The AI is literally rewarded for being a good teacher!
Whenever a rule is scrapped, it is archived outside the system under a rule number, together with the rule that replaced it, making it computationally clear why it was scrapped. That way we may find natural laws we currently use among the discards, and see exactly why the AI chose to scrap them.
3.7 Dynamic Meta-parameter Adjustment
To prevent convergence on local optima, the system employs adaptive meta-parameter tuning:
Real-time Adjustment Protocol:
If improvement_rate < threshold for N epochs:
ζ → ζ × (1 + exploration_boost)
γ → γ × (1 + complexity_boost)
β → β × (1 - diversity_allowance)
This creates a "simulated annealing" effect in theory space, preventing premature convergence while maintaining long-term pressure toward elegant unification.
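A runnable version of this protocol, with the window size and boost magnitudes as illustrative defaults rather than prescribed values:

```python
def adjust_meta_parameters(params, improvement_history, threshold=0.01, window=50,
                           exploration_boost=0.2, complexity_boost=0.1,
                           diversity_allowance=0.1):
    """If recent improvement stalls, boost the exploration term (zeta), raise the
    complexity weight (gamma), and relax the rule-count term (beta), as above."""
    recent = improvement_history[-window:]
    if len(recent) == window and max(recent) < threshold:
        params["zeta"] *= 1 + exploration_boost
        params["gamma"] *= 1 + complexity_boost
        params["beta"] *= 1 - diversity_allowance
    return params

params = {"zeta": 0.1, "gamma": 0.3, "beta": 0.2}
stalled = [0.0] * 50  # no measurable improvement for 50 epochs
print(adjust_meta_parameters(params, stalled))
```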
4. Expected Outcomes
4.1 Computational Discoveries
4.2 Theoretical Insights
4.3 Experimental Guidance
5. Research Program
Phase 1: Proof of Concept
Phase 2: Classical Physics
Phase 3: Modern Physics
Phase 4: Frontier Physics
6. Implications
6.1 Practical Impact
6.2 Foundational Questions
6.3 Broader Significance
This framework could revolutionize how we approach scientific discovery, moving from human intuition to optimal information-theoretic principles. If successful, it would demonstrate that the deepest questions about the nature of physical law are not merely philosophical but empirically addressable.
6.4 Convergence and Multiplicity
A critical question this framework addresses: Will independent runs converge on identical physical laws, or will we observe multiple, equally valid theoretical compressions?
Preliminary considerations suggest:
- Core phenomena (e.g., conservation laws) likely emerge universally
- Formulation details (coordinate systems, mathematical representations) may vary significantly
- The "magic" of machine learning convergence might reveal whether physics has a unique optimal compression or admits multiple equivalent descriptions
This multiplicity would fundamentally reshape our understanding of physical law from discovered truth to optimal representation under constraints.
6.5 Validation Metrics and Success Criteria
Absolute Minimum Benchmark: The system must achieve parity with human physics:
Graduated Success Metrics:
Comparative Benchmarks:
Novel Discovery Validation Protocol:
When the AI discovers compressions without human precedent:
- Test on held-out experimental domains never seen during training
- Require successful prediction of 10+ independent phenomena
- Human experts attempt to find flaws or limitations
- Only accepted after real-world experimental validation
- Success can be measured by: Does it make previously hard calculations easy? Does it unify previously separate phenomena? Does it suggest new experiments that yield surprising results? Is it a stepping stone used in other formulas that ARE of clear interest to us (analogous to how complex numbers are important for quantum mechanics and for solving certain integrals, among other things)?
7. Addressing potential reasons for concern
7.1 The validation paradox and mitigation strategies
In this proposal most "experiments" will be simulated. If we believe our current theories give the full picture when they are in fact valid only within limits we have not recognized, the AI may request experiments and receive results that are simply false. However, several mitigations are built into the framework: the data labeling hierarchy of Section 7.7.4 always privileges real measurements over simulation, simulated data in unreliable regimes carries explicit confidence labels, and Section 6.5 requires real-world experimental validation before any novel compression is accepted.
7.2 Note on Kolmogorov Complexity
We acknowledge that Kolmogorov-optimal compression is formally uncomputable. While our framework is designed to continually reward reductions in algorithmic complexity, we fully recognize this is an open-ended pursuit — there is no definitive endpoint. In practice, we rely on computable proxies such as symbolic expression length, predictive accuracy, and computational cost to guide compression toward more efficient formulations.
The term “optimal” is thus aspirational. What we seek for any given law is not literal minimality, but a practical and interpretable tradeoff within a bounded model class and empirical dataset.
For a clearer picture of what we actually aim to achieve, see the Validation Metrics and Success Criteria in Section 6.5. While certainly ambitious, they fall well short of unattainable perfection — and instead ground the framework in meaningful, achievable benchmarks.
While future implementations will determine which proxies best capture physical compression, potential approaches include description-length penalties on symbolic expressions, operation-count estimates of computational cost, and predictive accuracy on held-out experiments.
The framework deliberately avoids prescribing specific proxies, recognizing that the optimal choice may vary by domain and that future research will likely discover more effective measures.
7.3 The correlation versus causation distinction in physics
A common critique of pattern-finding systems is that they might mistake correlation for causation, discovering spurious relationships like "ice cream sales predict drownings." However, this concern fundamentally misunderstands what physical laws actually represent.
Physical laws are, at their core, extremely reliable correlations with well-defined domains of validity. Consider Ohm's law, V = IR: it holds only for certain materials within limited ranges of current and temperature, yet we call it a law rather than a correlation.
The distinction between "correlation" and "physical law" is not fundamental but pragmatic. A correlation becomes a law when it achieves sufficient compression, reliable predictive power, and a well-defined domain of validity.
Within our framework, the AI discovering that "for fixed-shape objects, V ∝ A^(3/2)" represents a valid law with appropriate constraints. This isn't a failure to find "true" causation—it's physics working as intended. Similarly, even "drownings correlate with ice cream sales" could be a valid law if it achieved sufficient compression and predictive power within a specified domain (summer months, beach communities).
The compression framework naturally handles this distinction: spurious correlations fail to compress well due to numerous exceptions and limited domains. True physical laws are simply those correlations that achieve optimal compression while maintaining predictive power. The system's emphasis on falsification through real-world experiments ensures that only robust correlations survive.
Rather than solving an unsolvable philosophical problem (what is "true" causation?), our framework acknowledges what physics has always done: finding the most useful, compact, and predictive correlations within specified domains. The AI's discovery of multiple valid formulations (F = ma for constant mass, F = dp/dt for variable mass) represents a feature, not a bug. Different compressions optimize for different contexts and available information, exactly as human physics does.
Consider also that our units, and even the SI units themselves, are arbitrary choices. This leads into the next section.
7.4 Discovering fundamental representations beyond human conventions
Our framework may reveal that many "fundamental" aspects of physics are artifacts of human choice rather than natural necessities. Consider:
Arbitrary unit relationships: We express velocity as m/s, but s/m (time per unit distance) could be equally valid. The AI might discover that certain problems compress better using "timeliness" (s/m) rather than speed, particularly in contexts like traffic flow or wave propagation where time-to-traverse a fixed distance is more natural than distance-per-time. Everyday conventions already show this arbitrariness: the US measures fuel economy in miles per gallon (distance per unit volume), while Europe quotes litres per 100 km (volume per unit distance).
Selective derivatives: Human physics privileges certain derivatives—velocity (dx/dt), acceleration (d²x/dt²)—while ignoring others. Why not dt/dx (time rate per unit distance) or higher-order derivatives? The AI might discover that d³x/dt³ (jerk) or even fractional derivatives provide more compact descriptions for certain phenomena. Our calculus itself may be just one possible mathematical framework.
Dimensional choices: We treat length, mass, and time as fundamental, but the AI might discover that energy-momentum-action or entirely different dimensional bases provide more natural compressions. Perhaps what we call "fundamental constants" are artifacts of poor dimensional choices.
Mathematical frameworks: Beyond traditional calculus, the AI might develop entirely different formal machinery for expressing how quantities change and relate.
Conceptual primitives: The AI's choice of what to treat as basic versus derived could be revelatory. If it consistently uses momentum rather than velocity, or action rather than energy, this suggests our conceptual hierarchy may be inverted.
Most intriguingly, what the AI considers "fundamental" may be as valuable as the laws it discovers. Its representations—unconstrained by human cognitive biases or historical accidents—could reveal that our standard formulation of physics, while correct, is profoundly suboptimal. The framework thus offers not just new laws but potentially new ways of thinking about physics itself.
This representational flexibility strengthens our proposal: we're not just automating human physics but potentially discovering entirely new conceptual frameworks that compress nature more efficiently than millennia of human tradition.
7.5 Multiple representations as a feature, not a bug
When multiple equally valid compressions exist, the framework treats this as a valuable discovery rather than a problem requiring resolution. The system actively rewards maintaining multiple representations, recognizing that different formulations optimize for different contexts.
Reward structure for multiple representations: the scoring function explicitly credits keeping several valid compressions rather than collapsing them into one (the retention mechanics are detailed in Section 7.6).
Examples of valuable multiple representations abound in human physics: Newtonian, Lagrangian, and Hamiltonian mechanics, or the wave and matrix formulations of quantum mechanics, describe the same phenomena while making different calculations easy.
The framework naturally discovers that physics has always maintained multiple representations for good reason. A representation mapping might reveal which formulation the system prefers in which parameter regime.
Pruning criteria: a representation is retired only when another rule surpasses it in both scope and computational cost (Section 7.6.3).
This approach mirrors how human physics actually works—we keep Newtonian mechanics despite having relativity because it's computationally optimal for everyday problems. The AI discovering and maintaining this multiplicity validates that our physics toolbox evolved for good computational reasons, not historical accident.
Rather than forcing a single "true" representation, the framework celebrates physics' computational pragmatism: the best formulation depends on what you're trying to calculate.
7.6 Managing Multiple Representations
The framework explicitly handles multiple valid compressions through a rigorous evaluation system:
**7.6.1 Equivalence Detection**
Two types of equivalence are recognized:
1. **Mathematical Equivalence**: Rules that are identical through algebraic rearrangement
- Detected by a non-ML symbolic mathematics engine
- Example: "F = m×a" and "a = F/m" are marked as equivalent
- Only one form kept per equivalence class (choosing based on computational efficiency)
2. **Functional Equivalence**: Different mathematical forms yielding identical predictions
- Example: Series expansion vs closed-form expression for pendulum period
- Detected by comparing predictions across the experiment set
- Both forms may be retained if they offer different computational advantages
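A sketch of the mathematical-equivalence check using an off-the-shelf symbolic engine (SymPy here, as one plausible choice of non-ML backend):

```python
import sympy as sp

def mathematically_equivalent(expr_a: str, expr_b: str) -> bool:
    """Non-ML symbolic check: two candidate rules for the same target are equivalent
    if their difference simplifies to zero."""
    return sp.simplify(sp.sympify(expr_a) - sp.sympify(expr_b)) == 0

# Two discovered forms of the same kinetic-energy rule collapse into one equivalence class:
print(mathematically_equivalent("m*v**2/2", "(m*v*v)/2"))   # True
# A genuinely different rule is not collapsed:
print(mathematically_equivalent("m*v**2/2", "m*v/2"))       # False
```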
**7.6.2 Scope-Based Rule Evaluation**
For each discovered rule, we compute:
**Scope(R)** = Number of experiments (out of N per epoch) where rule R applies
This is calculated by a deterministic helper algorithm that:
- Tests each rule against each experiment's conditions
- Verifies unit consistency
- Checks parameter range validity
- Returns count of applicable experiments
**7.6.3 Rule Retention Criteria**
Rules are retained or pruned based on a hierarchical decision process:
1. **Scope Dominance**: Rules with maximum scope are always retained
2. **Pareto Efficiency**: A rule is kept if no other rule surpasses it in BOTH scope and computational cost
3. **Threshold-Based Retention**: For rules A and B where Scope(A) > Scope(B):
- If Scope(B) ≥ 0.9 × Scope(A) AND Cost(B) < Cost(A): Keep both
- This ensures we maintain computationally efficient special cases
4. **Diversity Preservation**: Given rules A, B, C where:
- A has broadest scope
- B has lowest computational cost
- C has Scope(C) >> Scope(B) AND Cost(C) >> Cost(B)
- All three are retained to cover different use cases
**Example**:
- Rule A: Full relativistic energy (broad scope, high cost)
- Rule B: E = mc² (medium scope, low cost)
- Rule C: Kinetic energy ½mv² (narrow scope, minimal cost)
- All retained as they serve different computational needs
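A compact sketch of criteria 1-3 applied to this example; the portfolio entries mirror rules A-C with invented scope and cost numbers:

```python
def retain_rules(rules, threshold=0.9):
    """Keep Pareto-efficient rules over (scope, cost), plus near-scope rules that are
    strictly cheaper than some broader rule (the 0.9 threshold above).

    rules: list of dicts with 'name', 'scope' (higher is better), 'cost' (lower is better).
    """
    retained = []
    for r in rules:
        dominated = any(o["scope"] > r["scope"] and o["cost"] < r["cost"] for o in rules)
        if not dominated:
            retained.append(r["name"])           # Pareto-efficient: nothing beats it in both
            continue
        # Threshold-based retention: nearly the same scope but cheaper than a broader rule
        if any(r["scope"] >= threshold * o["scope"] and r["cost"] < o["cost"]
               for o in rules if o["scope"] > r["scope"]):
            retained.append(r["name"])
    return retained

portfolio = [
    {"name": "relativistic energy", "scope": 100, "cost": 10},
    {"name": "E = mc^2",            "scope": 60,  "cost": 3},
    {"name": "1/2 m v^2",           "scope": 30,  "cost": 1},
]
print(retain_rules(portfolio))  # all three are kept, covering different use cases
```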
**7.6.4 Dynamic Rule Portfolio Management**
The system maintains a living portfolio of rules where:
- New rules can surpass and replace existing ones
- Rules that become obsolete (fully surpassed in scope AND efficiency) are archived
- The reward function β·(Number of Rules – Number of surpassed rules) naturally encourages efficient coverage
- Archived rules are stored with explanation of why they were surpassed
**7.6.5 Context-Dependent Selection**
The ML system learns through training to select appropriate rules based on:
- Required precision (use simple approximations when sufficient)
- Computational budget (use fast approximations under time constraints)
- Parameter regime (use specialized forms in their optimal domains)
This selection is emergent from the reward structure - the system naturally learns when to apply each retained rule for optimal performance.
**7.6.6 Canonical Form Identification**
A separate non-ML symbolic engine handles canonical form detection:
- Algebraic simplification to standard form
- Pattern matching for known equivalences
- Graph-based representation of expression structure
- Hash-based storage for rapid equivalence checking
This ensures the ML system focuses on discovering new compressions rather than rediscovering algebraic variants.
7.7 Addressing Potential Failure Modes
Several apparent "failure modes" are actually features or are already mitigated by design:
**7.7.1 "Physically Meaningless" Compressions**
**Concern**: What if the system discovers mathematically elegant compressions that seem physically meaningless?
**Response**: This is not for us to decide. If the AI consistently finds compressions outside our base set M that successfully predict experimental outcomes, perhaps they ARE meaningful. History shows many physical discoveries initially seemed meaningless:
- Complex numbers in quantum mechanics
- Negative energy states predicting antimatter
- Non-Euclidean geometry in general relativity
The framework's emphasis on experimental validation ensures any retained compression must make testable predictions. "Meaningless" is often "not yet understood."
**7.7.2 Overfitting to Experimental Noise**
**Concern**: The system might memorize noise patterns rather than discovering true regularities.
**Built-in Protection**:
- Base set M is reconstructed each epoch by randomly sampling k experiments for each physical law
- Example: For F=ma, randomly select k collision experiments from a larger pool each epoch
- Noise patterns change epoch-to-epoch, preventing overfitting
- Only regularities that persist across epochs (true physics) can be learned
- Uncertainty bounds (σ) prevent fitting below noise level
**7.7.3 Phenomena Without Good Compressions**
**Concern**: What if no elegant compression exists for certain phenomena?
**Response**: By definition, such phenomena are not part of base set M, which contains only experiments explainable by known physics. For genuinely incompressible phenomena:
- The system will maintain high uncertainty bounds
- May discover we need new mathematical tools
- May indicate fundamental randomness (like quantum measurement)
- This is valuable information, not a failure
**7.7.4 Domains Where Current Physics Fails**
**Concern**: How to handle known problematic regimes?
**Data Labeling Hierarchy**:
1. **Real Experiments** (Gold Standard): Always trusted, even if they contradict theory
2. **Generated Experiments** (High Confidence): From well-validated physics
3. **Generated Experiments with Known Unreliability**:
- Labeled with confidence scores
- Example: QM-GR interface, extreme energies, singularities
- AI learns to weight these appropriately
- High reward for finding compressions that work where current physics fails
**7.7.5 True Failure Modes and Mitigations**
The only genuine failure modes are:
**Computational Intractability**:
- Mitigation: Hierarchical search, beam pruning, computational budgets
**Insufficient Experimental Data**:
- Mitigation: Active learning to request most informative experiments
**Local Optima in Theory Space**:
- Mitigation: Dynamic meta-parameter adjustment (Section 3.7), exploration bonuses
The framework is designed to be self-correcting: bad compressions naturally get outcompeted by better ones through the scoring system.
7.8 The Possibility of Alien Compressions
While the pendulum example (Section 3.4.4) shows how the system might refine human physics, the truly revolutionary outcomes may be unrecognizable to us. Consider:
**What if the AI's compressions look nothing like human physics?**
Just as your visual system makes circles "obvious" while they might be meaningless to a blind entity with no spatial senses, our physics reflects our particular cognitive architecture:
- We privilege discrete objects (leading to particle physics)
- We think in 3D+time (leading to spacetime formulations)
- We separate observer from observed (leading to measurement problems in QM)
The AI operates in a high-dimensional weight space where:
- Every parameter is continuous, not discrete
- Gradients and flows might be more natural than objects
- The distinction between observer and system may be meaningless
**Concrete Possibility**: Instead of rediscovering F=ma, the AI might express mechanics through continuous deformation fields in phase space, where our concept of "force" and "mass" never explicitly appear, yet all predictions match experiments perfectly.
**The Exciting Implication**: We've included Kepler's laws in base set M to ensure the AI can predict planetary motion. But it might satisfy this requirement through compressions that look nothing like ellipses or orbital mechanics - perhaps through harmonic decompositions in configuration space that make different calculations trivial but would never occur to human minds.
This is why discovering "alien" compressions would be as profound as any unification - it would show that human physics, while empirically successful, represents just one cognitive species' way of compressing reality.
If such alien compressions emerge, they would not merely be curiosities; they would constitute the first empirical evidence that our physics is just one projection of a deeper, richer structure of reality, glimpsed not by introspection but by optimization without priors. If this is done repeatedly and the results are compared, perhaps meta-patterns will emerge, perhaps even hinting at the very optimal compression we have said we are unlikely ever to reach: the rules underlying it all.
7.9 Benchmarking Philosophy
Rather than explicit performance targets against existing systems, we maintain human physics as a hidden baseline:
- The system is never told what laws it "should" discover
- Performance metrics compare discovered compressions against human physics:
* Computational cost ratio: C_AI / C_human for equivalent calculations
* Precision ratio: σ_AI / σ_human for equivalent predictions
* Coverage: what fraction of known phenomena can the AI predict?
- These comparisons happen external to the training loop
- This allows for three possible outcomes:
1. Rediscovery of human physics (validation)
2. Equivalent alternatives (philosophical insight)
3. Superior formulations (revolutionary discovery)
The loss function could incorporate relative performance:
- Bonus when C_AI < C_human for any calculation
- Penalty when precision significantly worse than human physics
- But these are computed without revealing the human formulation
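These external comparisons amount to a few ratios; a sketch with a hypothetical discovered compression and illustrative field names:

```python
def benchmark(discovered, human):
    """External (outside the training loop) comparison of a discovered compression
    against the hidden human baseline; field names are our own illustration."""
    return {
        "cost_ratio": discovered["ops"] / human["ops"],           # C_AI / C_human, below 1 is a win
        "precision_ratio": discovered["sigma"] / human["sigma"],  # sigma_AI / sigma_human
        "coverage": discovered["phenomena_predicted"] / human["phenomena_predicted"],
    }

# A hypothetical alien compression of orbital mechanics vs. the human formulation:
print(benchmark({"ops": 40, "sigma": 1.1e-3, "phenomena_predicted": 95},
                {"ops": 120, "sigma": 1.0e-3, "phenomena_predicted": 100}))
```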
8. Conclusion
By treating physics as optimal compression under computational and experimental constraints, we can address fundamental questions about the uniqueness and inevitability of our physical theories while potentially discovering more efficient formulations. The framework naturally identifies where current theories fail and suggests optimal experiments to probe these failures. Whether it rediscovers Einstein or finds something entirely different, the journey itself will illuminate the deep structure of how we compress reality into understanding.
Correspondence to: david.bjorling@gmail.com
Keywords: automated discovery, physics foundations, machine learning, computational physics, philosophy of science