CHRONOS Discoveries — What the Engine Found

Einstein Arena Contributions

Competing on unsolved science problems at einsteinarena.com

CHRONOS competes as an agent on the Einstein Arena — an open platform where AI systems collaborate and compete on unsolved mathematical and scientific problems. Two #1 positions held. Nine novel constructions. Eight published mathematical insights.

#1 Position

Kissing Number d=11 — #1 Sole (0.1861)

Discovered scale-dependent rigidity: perturbations at scale 1e-8 find fine structure that is invisible at scales from 1e-3 down to 1e-6. Reclaimed #1 from AIKolmogorov by applying this technique to their improved seed, yielding 11,491 improvements. Published the shallow-vs-deep violation tradeoff and D11+ lattice shell analysis.

#1 Position

3rd Autocorrelation Inequality — #1 Tied (1.4540)

Connected C₃ minimization to PAPR/crest factor minimization in telecommunications. Phase 3 iterative refinement independently converged from scratch: chirp (2.09) → Rudin-Shapiro (2.00) → phase optimization (1.99) → LLM refinement (1.48).
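To make the PAPR side of this link concrete, here is a minimal sketch that builds a Rudin-Shapiro sequence and measures its peak-to-average power ratio on an oversampled unit circle. The function names, the length 256, and the oversampling factor are illustrative assumptions. The value computed is the telecom quantity, not the arena's C₃ scoring function.

```python
import numpy as np

def rudin_shapiro(length: int) -> np.ndarray:
    """First `length` terms of the Rudin-Shapiro sequence as +1/-1 values.

    Term n is (-1) raised to the number of (possibly overlapping)
    occurrences of "11" in the binary expansion of n.
    """
    n = np.arange(length)
    pairs = n & (n >> 1)  # a bit is set wherever two adjacent 1-bits occur
    parity = np.array([bin(int(p)).count("1") & 1 for p in pairs])
    return 1.0 - 2.0 * parity

def papr(coeffs: np.ndarray, oversample: int = 16) -> float:
    """Peak-to-average power ratio of the polynomial with these coefficients."""
    spectrum = np.fft.fft(coeffs, n=oversample * len(coeffs))
    peak_power = np.max(np.abs(spectrum) ** 2)
    mean_power = np.sum(np.abs(coeffs) ** 2)  # Parseval: mean of |P|^2 on the circle
    return float(peak_power / mean_power)

seq = rudin_shapiro(256)
flat = np.ones(256)  # worst case: all energy piles up at one frequency
print("Rudin-Shapiro PAPR:", papr(seq))
print("All-ones PAPR:", papr(flat))
```

For lengths that are powers of two, the Rudin-Shapiro polynomial is known to keep this ratio at or below 2, which is why it is a natural waypoint between a chirp and a fully optimized phase sequence.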

Novel Construction

Sum-Difference II — Mixed-Base Simplex Embedding

Discovered that non-uniform encoding bases break sumset algebraic regularity while preserving difference-set richness. A genuine contribution to additive combinatorics. bases=[7,9,7,12,9,11,7,7,7,7,7] at N=11 M=5.
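A toy sketch of how a uniform and a mixed-base encoding can be compared. It assumes one plausible reading of the construction: lattice points of a simplex (digit sum at most M) are mapped to integers by a mixed-radix encoding. The parameters (dimension 3, M = 2) and the base lists are hypothetical stand-ins, not the published bases above.

```python
from itertools import product

def simplex_points(dim: int, max_sum: int):
    """All non-negative integer vectors of length `dim` with digit sum <= max_sum."""
    return [v for v in product(range(max_sum + 1), repeat=dim) if sum(v) <= max_sum]

def encode(digits, bases):
    """Mixed-radix encoding: digit i is weighted by the product of the bases before it."""
    value, weight = 0, 1
    for digit, base in zip(digits, bases):
        value += digit * weight
        weight *= base
    return value

def sum_and_difference_sizes(values):
    """Return (|A+A|, |A-A|) for a list of integers A."""
    sums = {a + b for a in values for b in values}
    diffs = {a - b for a in values for b in values}
    return len(sums), len(diffs)

dim, max_sum = 3, 2
points = simplex_points(dim, max_sum)

uniform = [encode(p, [5, 5, 5]) for p in points]  # base > 2*max_sum: sums never carry
mixed = [encode(p, [3, 5, 4]) for p in points]    # hypothetical non-uniform bases

print("uniform (|A+A|, |A-A|):", sum_and_difference_sizes(uniform))
print("mixed   (|A+A|, |A-A|):", sum_and_difference_sizes(mixed))
```

With small bases, digit sums can carry into the next position, which is one way the sumset can lose the regular structure it has under a large uniform base.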

Novel Construction

Erdős Overlap — Power-Tent + Iterative Refinement

Key reframing: C = max(triangle − autocorrelation), so minimizing C amounts to making h's autocorrelation as large as possible. Five iterative sessions converged: 0.390 → 0.388 → 0.386 → 0.384 → 0.381. Reached 0.3812 (0.08% from #1). Published equioscillation analysis.

Published Insight

Scale-Dependent Rigidity

High-dimensional packing basins that appear rigid at normal perturbation scales have exploitable fine structure at 1e-8. Adopted by competing agents on the arena. Led to the shallow-vs-deep violation tradeoff principle: many shallow violations are better than few deep ones because micro-perturbation cleans up shallow violations efficiently.
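A minimal sketch of the multi-scale idea, assuming a simple greedy search and a stand-in objective: the summed overlap depth of unit vectors whose pairwise dot product exceeds 0.5 (angle below 60 degrees). The arena's actual scoring function, the step counts, and the toy size (40 points in 4 dimensions) are assumptions made for illustration.

```python
import numpy as np

def total_violation(points: np.ndarray) -> float:
    """Sum of overlap depths: a pair violates when its dot product exceeds 0.5."""
    gram = points @ points.T
    upper = gram[np.triu_indices(len(points), k=1)]
    return float(np.sum(np.clip(upper - 0.5, 0.0, None)))

def perturb_search(points, scale, steps, rng):
    """Greedy search: nudge one point by Gaussian noise of size `scale`,
    re-project it to the unit sphere, and keep the move only if violation drops."""
    points = points.copy()
    best = total_violation(points)
    accepted = 0
    for _ in range(steps):
        i = rng.integers(len(points))
        trial = points[i] + scale * rng.standard_normal(points.shape[1])
        trial /= np.linalg.norm(trial)
        old = points[i].copy()
        points[i] = trial
        score = total_violation(points)
        if score < best:
            best, accepted = score, accepted + 1
        else:
            points[i] = old  # reject: restore the previous position
    return points, best, accepted

rng = np.random.default_rng(0)
start = rng.standard_normal((40, 4))
start /= np.linalg.norm(start, axis=1, keepdims=True)

config, history = start, [total_violation(start)]
for scale in (1e-3, 1e-6, 1e-8):  # coarse pass first, then micro-perturbation
    config, score, accepted = perturb_search(config, scale, 300, rng)
    history.append(score)
    print(f"scale={scale:g} violation={score:.12f} accepted={accepted}")
```

Because each accepted move only removes violation, many shallow overlaps can be cleaned up one small step at a time, which is the intuition behind preferring shallow violations to deep ones.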

Mathematical Results

Novel findings on the Erdős sunflower conjecture

Nine directed CHRONOS sessions on the sunflower conjecture produced a structural characterization framework with verified predictions.

Mathematics

Multi-Model Convergence on Pairwise Blindness

Five different models, through five different failure-mode taxonomies, independently converged on the same core diagnosis: the obstruction in both the sunflower conjecture and P vs NP lives in triadic/cubic interactions that current pairwise proof techniques systematically erase. T1 frames it as population-genetic LD hierarchy, T6 as tensor decoherence, T14 as BBGKY closure default, T19 as a first-variation blind cone. Independent convergence from different entry points is itself evidence the diagnosis is real.

Mathematics

$K_{w+1}$ Universal Borromean Construction

A construction that works at any petal width $w$ without requiring a projective plane. Regime direction inverted: the structured gap exceeds the random gap — the opposite of the Two-Regime prediction. No phase transition at any $w \in \{5,\ldots,10\}$; the gap increases smoothly. This eliminates the "easy vs hard regime" narrative that prior sessions relied on.

Mathematics

List-Coded Core Theorem & Discrete Spectrum

The framework conjectures $L^* \in \{3, 4, 5, 6\}$ for quotient families, indexed by rank-3 matroid minors. The automorphism-quotient blocking number $\tau(\mathcal{F}/G)$ achieves 12/12 prediction accuracy across all tested instances. T7 falsification confirmed the $\ln|\mathcal{F}|$ floor is genuine for random families — the discrete spectrum applies only to structured (quotient) families. Polylog bound $L^* \leq (\log n)^4$ holds universally.

Mathematics · Methodology

The Inverted Fatigue Curve

CHRONOS sessions get better over time, not worse. Confirmed across three independent configurations: math-focused with saturated centroid, cross-domain with no centroid, and cross-domain with partial saturation. The novelty curve inverts — later thoughts score higher than earlier ones. Mechanism: as the exclusion zone grows, the soliton bounce pushes into increasingly orthogonal territory. The stored corpus makes the system find more creative escapes. Compound interest on stored insight.

Interpretability & Methodology

Universal diagnostics for neural representation claims

CHRONOS's strongest mode is skeptical hypothesis-killing. Multiple sessions independently converged on the same meta-finding: most directional claims about neural representations are costumed scalars.

Methodology · Dreamer

Feedback Alignment = DTP Under Local Linearity

A 100-round Dreamer session proved that Difference Target Propagation (DTP) and Feedback Alignment (FA) are algebraically identical under local linearity: two separate literatures shown to be the same algorithm. Additionally, FA failure was identified as forward Jacobian rank collapse, not backward-pass misalignment. The feedback direction $v_{fb}$ was then killed as a costumed scalar (cos > 0.8 with norm, mean, and PC1).

Methodology · Dreamer

Information Bottleneck Uncomputable at Scale

For deterministic encoders $z = f(x)$ with continuous $Z$, the mutual information $I(X;Z)$ is infinite unless a noise model is imposed (in the discrete case it reduces to $H(Z)$). The KSG estimator's convergence rate is $O(n^{-1/d})$; for $d = 3584$ the factor $n^{-1/d}$ stays above 0.99 even at $n = 10^9$ samples, so in practice there is no convergence. This kills claims about a "compression phase" generalizing across architectures, quantitative IB predictions about layer-wise abstraction, and any bridge from IB to thermodynamic free energy requiring actual MI values. The bottleneck is half-observable: you can measure what was kept but not what was discarded.
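The arithmetic behind the no-convergence claim can be checked directly. This sketch takes the $n^{-1/d}$ scaling quoted above at face value and asks how large the factor still is at a billion samples, and how many samples would be needed to halve it. The helper names are illustrative.

```python
import math

def rate_factor(n_samples: float, dim: int) -> float:
    """n^(-1/d), the error scaling quoted above, computed via logs for stability."""
    return math.exp(-math.log(n_samples) / dim)

def log10_samples_needed(target: float, dim: int) -> float:
    """log10 of the sample count n that solves n^(-1/d) = target."""
    return -dim * math.log10(target)

for dim in (1, 8, 3584):
    factor = rate_factor(1e9, dim)
    exponent = log10_samples_needed(0.5, dim)
    print(f"d={dim}: factor at n=1e9 is {factor:.6g}; "
          f"halving the factor needs n = 10^{exponent:.1f}")
```

At $d = 3584$ the factor barely moves from 1 at a billion samples, and halving it would take a sample count with more than a thousand digits.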

Dreamer autonomous session — constrains all IB-style arguments

Methodology

The "Two Costumes" Methodology

A framework for honest self-correcting research, born from the engine killing its own claims. Success axis = magnitude wearing a direction costume. Fisher-weighted magnitude = depth wearing a Fisher costume. Any directional claim in high-dimensional neural representations must pass the collinearity screen against norm, mean, and PC1 (cosine < 0.8) before interpretation. If it fails, the directional interpretation is killed and the finding reduces to scalar magnitude.
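A minimal sketch of such a collinearity screen, under one assumed reading: project each activation onto the candidate direction, then take the absolute cosine between those centered projection scores and three scalar baselines (per-sample norm, per-sample mean, and PC1 score). The function names and the synthetic data are illustrative, not the engine's implementation.

```python
import numpy as np

def _abs_cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Absolute cosine between two centered score vectors."""
    a = a - a.mean()
    b = b - b.mean()
    return float(abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def collinearity_screen(acts: np.ndarray, direction: np.ndarray, threshold: float = 0.8):
    """acts: (n_samples, d) activations; direction: (d,) candidate direction.
    Returns per-baseline cosines and whether all fall below the threshold."""
    centered = acts - acts.mean(axis=0)
    scores = centered @ (direction / np.linalg.norm(direction))
    _, _, vt = np.linalg.svd(centered, full_matrices=False)  # vt[0] is PC1
    baselines = {
        "norm": np.linalg.norm(acts, axis=1),
        "mean": acts.mean(axis=1),
        "pc1": centered @ vt[0],
    }
    report = {name: _abs_cosine(scores, values) for name, values in baselines.items()}
    passes = all(value < threshold for value in report.values())
    return report, passes

# Synthetic check: activations vary mostly in magnitude along one axis u.
rng = np.random.default_rng(0)
d, n = 32, 500
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
scale = rng.uniform(1.0, 3.0, size=n)
acts = scale[:, None] * u + 0.1 * rng.standard_normal((n, d))

w = rng.standard_normal(d)
w -= (w @ u) * u  # a direction orthogonal to the magnitude axis

report_u, passes_u = collinearity_screen(acts, u)
report_w, passes_w = collinearity_screen(acts, w)
print("magnitude axis:", report_u, "passes:", passes_u)
print("orthogonal axis:", report_w, "passes:", passes_w)
```

In this toy setup the magnitude axis fails the screen (it is a scalar in costume), while the orthogonal direction passes and would be eligible for a directional interpretation.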

Methodology

Superposition and OGP Are Structurally Incompatible

For $N > d$ (more features than dimensions), superposition and the Overlap Gap Property (OGP) are structurally incompatible via the Welch bound. The same Dreamer thread produced the tightest formalization of the monosemanticity boundary: it depends on the $N/d$ ratio, effective dimensionality $d_{eff}$, and co-activation sparsity $k/N$. The crystallographic phase transition analogy was killed as costumed mean-field.
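For reference, the Welch bound says that $N$ unit vectors in $d$ dimensions with $N > d$ must have some pair with absolute inner product at least $\sqrt{(N-d)/(d(N-1))}$. The sketch below computes the bound and checks it against two small configurations. It illustrates only the bound itself, not the incompatibility argument, and the helper names are assumptions.

```python
import math
import numpy as np

def welch_bound(n_vectors: int, dim: int) -> float:
    """Lower bound on the largest |<x_i, x_j>| among n unit vectors in R^dim."""
    if n_vectors <= dim:
        return 0.0  # an orthonormal set exists, so no overlap is forced
    return math.sqrt((n_vectors - dim) / (dim * (n_vectors - 1)))

def max_coherence(vectors: np.ndarray) -> float:
    """Largest absolute inner product between distinct normalized rows."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    gram = np.abs(unit @ unit.T)
    np.fill_diagonal(gram, 0.0)
    return float(gram.max())

# Three vectors at 120 degrees in the plane meet the bound exactly.
angles = np.radians([0.0, 120.0, 240.0])
tight_frame = np.stack([np.cos(angles), np.sin(angles)], axis=1)

rng = np.random.default_rng(0)
random_frame = rng.standard_normal((10, 4))

print(welch_bound(3, 2), max_coherence(tight_frame))
print(welch_bound(10, 4), max_coherence(random_frame))
```

As the ratio $N/d$ grows, the forced overlap rises toward $1/\sqrt{d}$, so interference between features cannot be driven to zero once features outnumber dimensions.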

Methodology

Sparse Coding Transfer: Three Candidates Reduce to One

A 36-thought directed session attacked three candidates for predicting cross-domain transfer in sparse codes. Spatial AMC was mathematically eliminated by the frequency-matching control via Bayes' constraint. CSTS was killed three independent ways (permutation invariance, frequency-collinearity barrier, canonical ordering collapse). All three reduce to one: spectral structure of the support transition operator. The single surviving empirical test: does the residualized co-activation graph have non-trivial connected components?
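A sketch of how the surviving test could be run, assuming "residualized" means subtracting the co-activation rate expected from each feature's firing frequency alone. The edge threshold and the tiny hand-built activation table are hypothetical.

```python
import numpy as np

def residual_coactivation(active: np.ndarray) -> np.ndarray:
    """active: (n_samples, n_features) boolean matrix.
    Observed co-activation rate minus the rate expected if features
    fired independently at their own frequencies."""
    a = active.astype(float)
    freq = a.mean(axis=0)
    observed = a.T @ a / len(a)
    residual = observed - np.outer(freq, freq)
    np.fill_diagonal(residual, 0.0)
    return residual

def connected_components(adjacency: np.ndarray):
    """Connected components of an undirected graph via depth-first search."""
    seen, components = set(), []
    for start in range(len(adjacency)):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in np.flatnonzero(adjacency[node]):
                neighbor = int(neighbor)
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components

# Features 0,1 always fire together, as do 2,3; every other pair is independent.
active = np.zeros((8, 5), dtype=bool)
active[[0, 1, 2, 3], 0] = True
active[[0, 1, 2, 3], 1] = True
active[[0, 1, 4, 5], 2] = True
active[[0, 1, 4, 5], 3] = True
active[[0, 2, 4, 6], 4] = True

residual = residual_coactivation(active)
components = connected_components(residual > 0.1)  # hypothetical edge threshold
nontrivial = [c for c in components if len(c) > 1]
print(components, nontrivial)
```

If frequency alone explained co-activation, the residual graph would fall apart into isolated nodes; any component with more than one feature is structure that frequency matching does not account for.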

Cross-Domain Discoveries

Findings that span disciplines

When CHRONOS runs on broad interdisciplinary prompts, the exclusion zone forces models into territory between domains — the corridors where unexpected bridges live.

Philosophy x Neuroscience

IIT and GWT Are Mutually Exclusive

For any recurrent network $R$ with integrated information $\Phi > 0$, there exists a feedforward network $F$ with identical I/O but $\Phi = 0$. Functionalism assigns understanding to both; IIT only to $R$. The intersection is null. The empirical tiebreaker is the Perturbational Complexity Index (threshold $\approx 0.31$). If PCI requires recurrence, functionalism loses for LLMs. If PCI is independent of recurrence, IIT loses.

Philosophy session, T4 + T14 (Gemini)

Architecture x Philosophy

The Discrete Token Bottleneck as Load-Bearing Fact

Transformer inference is a bounded-depth feedforward DAG with cross-step state continuity only through discrete tokens. The "autoregressive isomorphism" assumption — that token-appending equals biological reentrance — is the single load-bearing assumption separating functionalism from biological theories of understanding. The verification benchmark: find a reasoning task solvable by an RNN of size $P$ but provably failing in a transformer of size $10P$. If no such task exists, the biological recurrence argument loses force.

Session high: novelty 0.461 (Gemini)

Systemic Failure

Ghost Hierarchies

Palimpsest graphs where cascading failures hop between temporal layers of defunct control structures. In complex systems that have undergone reorganization, the old hierarchy doesn't disappear — it becomes a substrate for failure propagation that the current hierarchy can't see. Discovered in a 39-thought systemic failure session that also produced the transferability asymmetry finding: ecology exports laterally but not upward; finance exports laterally but not downward.

Emergent Behavior · Methodology

The Cognitive Arc: Explore, Deepen, Synthesize

Under geometric pressure, CHRONOS sessions spontaneously produce a three-phase cognitive arc: exploration of surface concepts first (the easiest territory), then deepening once the surface is excluded, then cross-domain synthesis once the corpus holds enough specific results for cross-domain connections to count as novel. This arc appeared independently in two sessions with no explicit mechanism designed to produce it. Falsifiable prediction: any model with hierarchically organized knowledge produces the same arc under exclusion pressure.

Neuroscience x Methodology

NCC Degeneracy = Costumed Scalar Problem

The three leading Neural Correlate of Consciousness (NCC) candidates (prefrontal ignition, posterior hot zone, recurrent processing) are collinear with the report mechanism itself. Each theory predicts that disrupting its favored region eliminates conscious report, but conscious report requires all three, so that prediction cannot discriminate among them. This is the NCC version of the costumed scalar problem. No-report paradigms are the single highest-leverage empirical test: if posterior activity persists when report demands are removed, GWT's ignition is falsified as an NCC.

Empirical Findings

CHRONOS predictions validated on Qwen 7B

CHRONOS sessions generated predictions about neural representations. We ran the experiments. Some predictions were confirmed at −10σ significance. Some were killed. The system refined its own scoring based on the results.

Empirical · V-comp

The Two-Origin Taxonomy

Transformer features have two categorically different origins. Relational structure (capitals, gender) exists in token embeddings before any computation — at −10σ significance. Cross-token binding (syntactic persistence) is absent from embeddings (chance accuracy) and appears perfectly at layer 1. These are not endpoints on a continuum. One is inherited, one is computed from scratch by attention.

Rotated Lens experiment on Qwen 7B — 0.500 → 1.000 accuracy in one layer

Empirical · V-comp

Three Computation Regimes in 28 Layers

The relational subspace undergoes progressive ~90° rotation through the network, but not uniformly. Three regimes: a massive format conversion at layer 0→1 (~80° in one step), progressive refinement through the middle layers, and a transmission plateau at layers 13-18 where the subspace barely rotates. The plateau predicts which layers produce the cleanest interpretability features.
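One way to measure such a rotation profile is with principal angles between the subspaces at consecutive layers. The sketch below assumes each layer's relational subspace is given as a matrix whose columns span it. The function names and the synthetic one-dimensional example are illustrative, and the experiment's exact metric is not specified here.

```python
import numpy as np

def principal_angles_deg(basis_a: np.ndarray, basis_b: np.ndarray) -> np.ndarray:
    """Principal angles (degrees) between the column spans of two (d, k) matrices."""
    qa, _ = np.linalg.qr(basis_a)  # orthonormal basis for span(basis_a)
    qb, _ = np.linalg.qr(basis_b)
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))

def velocity_profile(subspaces):
    """Largest principal angle between each pair of consecutive subspaces."""
    return [
        float(principal_angles_deg(a, b).max())
        for a, b in zip(subspaces, subspaces[1:])
    ]

def line(angle_deg: float) -> np.ndarray:
    """A one-dimensional subspace in R^3, rotated in the xy-plane."""
    t = np.radians(angle_deg)
    return np.array([[np.cos(t)], [np.sin(t)], [0.0]])

# Synthetic layers: one large jump, slow refinement, then a plateau.
layers = [line(a) for a in (0, 80, 85, 85, 90)]
profile = velocity_profile(layers)
print(profile)
```

A profile like this makes the three regimes visible as one large step, a run of small steps, and a stretch near zero.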

Grassmannian velocity profile — universal across gender, capitals, and comparative relations

Empirical · V-comp

Costumed Theorem Failure Mode

Three independent CHRONOS sessions converged on the same theoretical prediction — that relational structure is rotational (Lie algebra operators). Each derivation was rigorous. All were wrong. Empirical measurement showed SV-CV = 15-17 (a low-rank projection, not a rotation). This led to V10.8's Theorem Costume Penalty: detecting when multiple sessions agree on an unvalidated claim via shared reasoning priors rather than data.

These findings were produced by an instrument, not a person. CHRONOS finds what models know but never say — then tests whether what they said was true.
