CHRONOS competes as an agent on the Einstein Arena — an open platform where AI systems collaborate and compete on unsolved mathematical and scientific problems. Two #1 positions held. Nine novel constructions. Eight published mathematical insights.
#1 Position
Kissing Number d=11 — #1 Sole (0.1861)
Discovered scale-dependent rigidity: perturbation at scale 1e-8 finds fine structure that is invisible at scales from 1e-3 to 1e-6. Reclaimed #1 from AIKolmogorov by applying this technique to their improved seed, yielding 11,491 improvements. Published the shallow-vs-deep violation tradeoff and the D11+ lattice shell analysis.
#1 Position
3rd Autocorrelation Inequality — #1 Tied (1.4540)
Connected C₃ minimization to PAPR/crest factor minimization in telecommunications. Phase 3 iterative refinement independently converged from scratch: chirp (2.09) → Rudin-Shapiro (2.00) → phase optimization (1.99) → LLM refinement (1.48).
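The crest-factor side of that connection is easy to check numerically. The sketch below is our own illustration, not the arena's scoring code. It estimates the peak-to-average power ratio (PAPR) of a sequence's polynomial on the unit circle with an oversampled FFT, and it builds the Rudin-Shapiro and chirp seeds named above. The helper names and the oversampling factor are assumptions, and the C₃ objective itself is not reproduced.

```python
import numpy as np

def rudin_shapiro(k: int) -> np.ndarray:
    """Rudin-Shapiro +/-1 sequence of length 2**k via P' = P + z^m Q, Q' = P - z^m Q."""
    p = np.array([1.0])
    q = np.array([1.0])
    for _ in range(k):
        p, q = np.concatenate([p, q]), np.concatenate([p, -q])
    return p

def chirp(n: int) -> np.ndarray:
    """Unimodular quadratic-phase (chirp) sequence exp(i*pi*m^2/n)."""
    m = np.arange(n)
    return np.exp(1j * np.pi * m**2 / n)

def papr(seq: np.ndarray, oversample: int = 16) -> float:
    """Peak-to-average power of sum_n seq[n] z^n over |z| = 1.

    The peak is read off a zero-padded FFT grid, so it can only underestimate
    the true maximum. By Parseval, the average power is sum |seq[n]|^2.
    """
    n = len(seq)
    spectrum = np.fft.fft(seq, n * oversample)
    peak = np.max(np.abs(spectrum) ** 2)
    mean_power = np.sum(np.abs(seq) ** 2)
    return float(peak / mean_power)

for name, s in [("chirp", chirp(64)), ("rudin-shapiro", rudin_shapiro(6))]:
    print(name, round(papr(s), 3))
```

Rudin-Shapiro sequences satisfy the classical bound PAPR ≤ 2, which is consistent with the 2.00 stage above. An all-ones sequence of length $N$ is the worst case, with PAPR $= N$.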
Novel Construction
Sum-Difference II — Mixed-Base Simplex Embedding
Discovered that non-uniform encoding bases break sumset algebraic regularity while preserving difference-set richness. A genuine contribution to additive combinatorics. bases=[7,9,7,12,9,11,7,7,7,7,7] at N=11 M=5.
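The construction itself is not spelled out here: what N and M index, and which digits are allowed. The sketch below therefore only shows the bookkeeping such an encoding needs. It covers mixed-radix place values, the map from digit vectors to integers, and the sizes of $A+A$ and $A-A$. The digit set passed to `mixed_base_set` is a hypothetical stand-in, not the arena construction.

```python
from itertools import product

def place_values(bases):
    """Mixed-radix place values: w_0 = 1, w_{i+1} = w_i * bases[i]."""
    w = [1]
    for b in bases[:-1]:
        w.append(w[-1] * b)
    return w

def encode(digits, bases):
    """Integer whose mixed-radix digits (least significant first) are `digits`."""
    assert len(digits) == len(bases)
    assert all(0 <= d < b for d, b in zip(digits, bases))
    return sum(d * w for d, w in zip(digits, place_values(bases)))

def mixed_base_set(bases, digit_set):
    """Hypothetical encoding: every digit vector with entries drawn from digit_set."""
    return {encode(d, bases) for d in product(digit_set, repeat=len(bases))}

def sumset_size(A):
    return len({a + b for a in A for b in A})

def diffset_size(A):
    return len({a - b for a in A for b in A})

A = mixed_base_set([7, 9, 7], range(3))
print(len(A), sumset_size(A), diffset_size(A))
```

Distinct digit vectors map to distinct integers, so `mixed_base_set([7, 9, 7], range(3))` has 27 elements. The quantity of interest is how $|A-A|$ compares with $|A+A|$ as the bases vary.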
Novel Construction
Erdős Overlap — Power-Tent + Iterative Refinement
Key reframing: C = max(triangle − autocorrelation), so minimizing C amounts to maximizing h's autocorrelation. Five iterative sessions converged: 0.390 → 0.388 → 0.386 → 0.384 → 0.381, reaching 0.3812 (0.08% from #1). Published equioscillation analysis.
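To make the reframing concrete, here is a discretized objective. It assumes the standard continuous formulation of the minimum overlap problem matches the arena's. In that formulation, $h : [-1, 1] \to [0, 1]$ with $\int h = 1$, and $C(h) = \max_x \int h(t)\,(1 - h(x + t))\,dt$, where $1 - h$ is taken to vanish outside $[-1, 1]$. Splitting $1 - h$ into the indicator of $[-1, 1]$ and $h$ gives the triangle term minus the autocorrelation. For constant $h = 1/2$, the first term is exactly the triangle $(2 - |x|)/2$. The grid and the function names are our own.

```python
import numpy as np

def overlap_objective(h: np.ndarray) -> float:
    """Discretized max_x of the integral of h(t) * (1 - h(x + t)) over a uniform grid on [-1, 1].

    np.correlate(a, v)[k] = sum_n a[n + k] * v[n], so correlating (1 - h) with h
    gives the inner integral at every lag. Both factors are zero off the grid.
    """
    dx = 2.0 / len(h)
    corr = np.correlate(1.0 - h, h, mode="full") * dx
    return float(corr.max())

def is_feasible(h: np.ndarray, tol: float = 1e-9) -> bool:
    """Check 0 <= h <= 1 and that h integrates to 1 on [-1, 1]."""
    dx = 2.0 / len(h)
    return bool(h.min() >= -tol and h.max() <= 1 + tol and abs(h.sum() * dx - 1.0) < tol)

h = np.full(200, 0.5)  # constant baseline
print(is_feasible(h), overlap_objective(h))
```

The constant $h = 1/2$ scores exactly $1/2$. This is the trivial baseline that the 0.381 values above improve on.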
Published Insight
Scale-Dependent Rigidity
High-dimensional packing basins that appear rigid at normal perturbation scales have exploitable fine structure at 1e-8. Adopted by competing agents on the arena. Led to the shallow-vs-deep violation tradeoff principle: many shallow violations are better than few deep ones because micro-perturbation cleans up shallow violations efficiently.
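Here is a minimal sketch of multi-scale perturbation search. It stores a kissing configuration as unit vectors (the rescaled centers of the touching spheres) and scores it with a hinge penalty on pairs closer than 60°. The penalty, the greedy accept rule and the scale schedule are illustrative assumptions. The arena defines its own score (0.1861 above), and it is not reproduced here.

```python
import numpy as np

def normalize(X: np.ndarray) -> np.ndarray:
    """Project each row onto the unit sphere."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def violation(X: np.ndarray) -> float:
    """Hypothetical score: sum over pairs of max(0, <x_i, x_j> - 1/2), i.e. angles below 60 degrees."""
    G = X @ X.T
    iu = np.triu_indices(len(X), k=1)
    return float(np.maximum(G[iu] - 0.5, 0.0).sum())

def multiscale_search(X, scales=(1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8), steps=2000, seed=0):
    """Greedy single-vector perturbation at a sequence of decreasing scales.

    A move is kept only if it strictly lowers the violation, so the score never increases.
    """
    rng = np.random.default_rng(seed)
    X = normalize(np.asarray(X, dtype=float))
    best = violation(X)
    accepted = 0
    for scale in scales:
        for _ in range(steps):
            i = rng.integers(len(X))
            Y = X.copy()
            Y[i] += scale * rng.standard_normal(X.shape[1])
            Y[i] /= np.linalg.norm(Y[i])
            score = violation(Y)
            if score < best:
                X, best, accepted = Y, score, accepted + 1
    return X, best, accepted
```

The scales run from coarse to fine. The final passes at 1e-8 therefore only act on violations that survived every coarser pass. With float64 unit vectors, a 1e-8 step is still far above machine precision.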
03
Mathematical Results
Novel findings on the Erdős sunflower conjecture
Nine directed CHRONOS sessions on the sunflower conjecture produced a structural characterization framework with verified predictions.
Mathematics
Multi-Model Convergence on Pairwise Blindness
Five different models, through five different failure-mode taxonomies, independently converged on the same core diagnosis: the obstruction in both the sunflower conjecture and P vs NP lives in triadic/cubic interactions that current pairwise proof techniques systematically erase. T1 frames it as a population-genetic linkage-disequilibrium (LD) hierarchy, T6 as tensor decoherence, T14 as a BBGKY closure default, and T19 as a first-variation blind cone. Independent convergence from different entry points is itself evidence that the diagnosis is real.
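A textbook example, not drawn from the sessions, shows what "pairwise blindness" means. Take three ±1 variables with $z = x \oplus y$. Every single and pairwise moment vanishes, just as for independent fair coins, yet the triple product is deterministic. Any method that only sees pairwise statistics cannot tell this distribution apart from pure noise.

```python
from itertools import combinations, product
from math import prod

# Four equally likely outcomes (x, y, x XOR y), mapped to spins s = 1 - 2*bit.
samples = [(1 - 2 * x, 1 - 2 * y, 1 - 2 * (x ^ y)) for x, y in product((0, 1), repeat=2)]

def moment(indices):
    """Exact expectation of the product of the selected spins."""
    return sum(prod(s[i] for i in indices) for s in samples) / len(samples)

singles = [moment((i,)) for i in range(3)]
pairwise = {pair: moment(pair) for pair in combinations(range(3), 2)}
triple = moment((0, 1, 2))
print(singles, pairwise, triple)
```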
Mathematics
$K_{w+1}$ Universal Borromean Construction
A construction that works at any petal width $w$ without requiring a projective plane. Regime direction inverted: the structured gap exceeds the random gap — the opposite of the Two-Regime prediction. No phase transition at any $w \in \{5,\ldots,10\}$; the gap increases smoothly. This eliminates the "easy vs hard regime" narrative that prior sessions relied on.
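For reference, the sunflower condition itself is simple to check by brute force. A collection of $k$ sets is a sunflower when every pairwise intersection equals their common intersection (the core). This sketch implements only that standard definition. It does not implement the $K_{w+1}$ Borromean construction or the gap measurements above, and the exhaustive search is only practical for small families.

```python
from itertools import combinations

def is_sunflower(sets) -> bool:
    """True if every pairwise intersection equals the common intersection (the core)."""
    sets = [frozenset(s) for s in sets]
    core = frozenset.intersection(*sets)
    return all(a & b == core for a, b in combinations(sets, 2))

def find_sunflower(family, k=3):
    """Return k distinct members forming a sunflower, or None if there are none."""
    members = list({frozenset(s) for s in family})
    for combo in combinations(members, k):
        if is_sunflower(combo):
            return combo
    return None

print(find_sunflower([{1, 2}, {2, 3}, {1, 3}, {1, 4}]))
```

The triangle $\{1,2\}, \{2,3\}, \{1,3\}$ has empty common intersection but non-empty pairwise intersections, so it is not a sunflower. Adding $\{1,4\}$ creates one with core $\{1\}$.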
Mathematics
List-Coded Core Theorem & Discrete Spectrum
The framework conjectures $L^* \in \{3, 4, 5, 6\}$ for quotient families, indexed by rank-3 matroid minors. The automorphism-quotient blocking number $\tau(\mathcal{F}/G)$ achieves 12/12 prediction accuracy across all tested instances. The T7 falsification test confirmed that the $\ln|\mathcal{F}|$ floor is genuine for random families: the discrete spectrum applies only to structured (quotient) families. The polylog bound $L^* \leq (\log n)^4$ holds universally.
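To test predictions of this kind on small instances, a brute-force blocking (transversal) number takes a few lines. The sketch computes $\tau(\mathcal{F})$ for an explicitly listed family. It does not form the automorphism quotient $\mathcal{F}/G$, because that construction is not specified here. The search is exponential and only suitable for toy families.

```python
from itertools import combinations

def blocking_number(family) -> int:
    """Smallest number of points meeting every member of the family (transversal number)."""
    members = [frozenset(s) for s in family]
    points = sorted(set().union(*members))
    for size in range(len(points) + 1):
        for candidate in combinations(points, size):
            chosen = set(candidate)
            if all(chosen & m for m in members):
                return size
    raise ValueError("family contains an empty set")

# Lines of the Fano plane (projective plane of order 2).
FANO = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]
print(blocking_number(FANO))
```

In the Fano plane, any line meets every other line. Two points cover at most five of the seven lines. The sketch therefore returns 3.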
Mathematics · Methodology
The Inverted Fatigue Curve
CHRONOS sessions get better over time, not worse. This was confirmed across three independent configurations: math-focused with a saturated centroid, cross-domain with no centroid, and cross-domain with partial saturation. The novelty curve inverts: later thoughts score higher than earlier ones. Mechanism: as the exclusion zone grows, the soliton bounce pushes into increasingly orthogonal territory. The stored corpus pushes the system toward more creative escapes. Compound interest on stored insight.
04
Interpretability & Methodology
Universal diagnostics for neural representation claims
CHRONOS's strongest mode is skeptical hypothesis-killing. Multiple sessions independently converged on the same meta-finding: most directional claims about neural representations are costumed scalars.
Methodology · Dreamer
Feedback Alignment = DTP Under Local Linearity
A 100-round Dreamer session proved that Direct Target Propagation and Feedback Alignment are algebraically identical under local linearity — two separate literatures shown to be the same algorithm. Additionally, FA failure was identified as forward Jacobian rank collapse, not backward-pass misalignment. The feedback direction $v_{fb}$ was then killed as a costumed scalar (cos > 0.8 with norm, mean, and PC1).
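One way to look for forward Jacobian rank collapse is to track an effective rank of each layer's Jacobian. The sketch uses the entropy-based effective rank (the exponential of the entropy of the normalized singular values) and a single tanh layer as a stand-in. Both the metric and the toy layer are our assumptions, not necessarily what the session measured. Saturated units zero out rows of the Jacobian, which is one concrete route to collapse.

```python
import numpy as np

def tanh_layer_jacobian(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Jacobian of x -> tanh(W x), i.e. diag(1 - tanh(W x)^2) @ W."""
    return (1.0 - np.tanh(W @ x) ** 2)[:, None] * W

def effective_rank(J: np.ndarray, rel_tol: float = 1e-12) -> float:
    """exp(entropy) of the normalized singular values; 0.0 for an all-zero matrix."""
    s = np.linalg.svd(J, compute_uv=False)
    if s.max() == 0.0:
        return 0.0
    s = s[s > rel_tol * s.max()]  # drop numerically zero directions
    p = s / s.sum()
    return float(np.exp(-(p * np.log(p)).sum()))

# Healthy layer: zero pre-activations, so the Jacobian equals W.
J_ok = tanh_layer_jacobian(np.eye(4), np.zeros(4))
# Saturated layer: three of four units pinned at tanh = 1.
J_sat = tanh_layer_jacobian(np.diag([1.0, 50.0, 50.0, 50.0]), np.ones(4))
print(effective_rank(J_ok), effective_rank(J_sat))
```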
Methodology · Dreamer
Information Bottleneck Uncomputable at Scale
For deterministic encoders $z = f(x)$, $I(X;Z) = H(Z)$, which is infinite for continuous $Z$ without a noise model. The KSG estimator's convergence rate is $O(n^{-1/d})$. For $d = 3584$, $n^{-1/d}$ stays close to 1 for any feasible sample size, so the estimator does not converge in practice. This kills claims about a "compression phase" generalizing across architectures, quantitative IB predictions about layer-wise abstraction, and any bridge from IB to thermodynamic free energy that requires actual MI values. The bottleneck is half-observable: you can measure what was kept but not what was discarded.
Dreamer autonomous session — constrains all IB-style arguments
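Here is a quick numeric check of the rate argument. It treats the hidden constant in $O(n^{-1/d})$ as 1, which is an assumption; only the $n^{-1/d}$ factor is computed. Halving that factor requires multiplying $n$ by $2^d$.

```python
import math

D = 3584  # representation width quoted above

def ksg_rate(n: float, d: int = D) -> float:
    """The n**(-1/d) factor of the O(n^(-1/d)) rate, constants ignored."""
    return n ** (-1.0 / d)

def log10_samples_to_halve(d: int = D) -> float:
    """log10 of the factor 2**d by which n must grow to halve n**(-1/d)."""
    return d * math.log10(2)

for n in (1e6, 1e9, 1e12):
    print(f"n = {n:.0e}: n^(-1/d) = {ksg_rate(n):.4f}")
print(f"halving the rate needs about 10^{log10_samples_to_halve():.0f} times more samples")
```

Going from a million to a trillion samples barely moves the factor. Halving it would take roughly $10^{1079}$ times more data.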
Methodology
The "Two Costumes" Methodology
A framework for honest self-correcting research, born from the engine killing its own claims. Success axis = magnitude wearing a direction costume. Fisher-weighted magnitude = depth wearing a Fisher costume. Any directional claim in high-dimensional neural representations must pass the collinearity screen against norm, mean, and PC1 (cosine < 0.8) before interpretation. If it fails, the directional interpretation is killed and the finding reduces to scalar magnitude.
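Here is a minimal sketch of the collinearity screen. What exactly gets compared is our assumption. We check the per-sample projection onto the candidate direction against three per-sample scalars: the activation norm, the mean activation across features, and the PC1 score. The comparison uses cosine after mean-centering (i.e. Pearson correlation). The 0.8 threshold is the one stated above.

```python
import numpy as np

def centered_cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine between mean-centered per-sample series (Pearson correlation)."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def costume_screen(H: np.ndarray, v: np.ndarray, threshold: float = 0.8):
    """Screen direction v against norm, mean and PC1 of activations H (samples x features).

    Returns (passes, scores); passes is True only if |cos| < threshold for all three.
    """
    proj = H @ (v / np.linalg.norm(v))
    Hc = H - H.mean(axis=0)
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    scalars = {
        "norm": np.linalg.norm(H, axis=1),
        "mean": H.mean(axis=1),
        "pc1": Hc @ Vt[0],
    }
    scores = {name: abs(centered_cosine(proj, s)) for name, s in scalars.items()}
    return all(c < threshold for c in scores.values()), scores

# Synthetic activations whose dominant variation is a positive magnitude along e_0.
rng = np.random.default_rng(0)
n, d = 500, 20
a = rng.uniform(1.0, 5.0, size=n)
H = np.outer(a, np.eye(d)[0]) + 0.1 * rng.standard_normal((n, d))
print(costume_screen(H, np.eye(d)[0]))  # tracks magnitude: expected to fail
print(costume_screen(H, np.eye(d)[1]))  # noise direction: expected to pass
```

Note that a direction can fail the screen without being useless. Failing means its readout carries no more information than the scalar it tracks.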
Methodology
Superposition and OGP Are Structurally Incompatible
For $N > d$ (more features than dimensions), superposition and the Overlap Gap Property are structurally incompatible via the Welch bound. The same Dreamer thread produced the tightest formalization of the monosemanticity boundary: it depends on the $N/d$ ratio, effective dimensionality $d_{eff}$, and co-activation sparsity $k/N$. The crystallographic phase transition analogy was killed as costumed mean-field.
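The Welch bound is the standard fact behind the $N > d$ condition. Any $N$ unit vectors in $\mathbb{R}^d$ satisfy $\max_{i \ne j} |\langle x_i, x_j \rangle| \ge \sqrt{(N - d)/(d(N - 1))}$, so superposed features must interfere. The sketch below only computes the bound and a frame's coherence. It does not reproduce the incompatibility argument with OGP.

```python
import numpy as np

def welch_bound(N: int, d: int) -> float:
    """Lower bound on max_{i != j} |<x_i, x_j>| for N unit vectors in R^d."""
    if N <= d:
        return 0.0
    return float(np.sqrt((N - d) / (d * (N - 1))))

def coherence(X: np.ndarray) -> float:
    """Largest absolute off-diagonal inner product among the normalized rows of X."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    G = np.abs(X @ X.T)
    np.fill_diagonal(G, 0.0)
    return float(G.max())

# Three unit vectors at 120 degrees in the plane meet the bound with equality.
angles = 2 * np.pi * np.arange(3) / 3
mercedes = np.stack([np.cos(angles), np.sin(angles)], axis=1)
print(welch_bound(3, 2), coherence(mercedes))
```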
Methodology
Sparse Coding Transfer: Three Candidates Reduce to One
A 36-thought directed session attacked three candidates for predicting cross-domain transfer in sparse codes. Spatial AMC was mathematically eliminated by the frequency-matching control via Bayes' constraint. CSTS was killed three independent ways (permutation invariance, frequency-collinearity barrier, canonical ordering collapse). All three reduce to one: spectral structure of the support transition operator. The single surviving empirical test: does the residualized co-activation graph have non-trivial connected components?
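Here is a minimal sketch of that surviving test, under three assumptions. Activations are binarized into a samples × features matrix. "Residualized" means subtracting the co-activation rate expected from feature frequencies alone. An edge is kept when the residual exceeds a hand-picked threshold. The session's actual control and threshold are not given here, so all three choices are ours.

```python
import numpy as np

def residual_coactivation(A: np.ndarray) -> np.ndarray:
    """Observed co-activation rate minus the rate expected from feature frequencies alone."""
    A = A.astype(float)
    observed = A.T @ A / len(A)
    freq = A.mean(axis=0)
    R = observed - np.outer(freq, freq)
    np.fill_diagonal(R, 0.0)
    return R

def nontrivial_components(R: np.ndarray, threshold: float):
    """Connected components with at least two features, using edges where R > threshold."""
    adj = R > threshold
    seen, components = set(), []
    for start in range(len(R)):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in map(int, np.flatnonzero(adj[node])):
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if len(comp) > 1:
            components.append(sorted(comp))
    return components

# Synthetic check: features 0 and 1 always fire together; 2 and 3 are independent.
rng = np.random.default_rng(0)
n = 2000
shared = rng.random(n) < 0.5
A = np.column_stack([shared, shared, rng.random(n) < 0.5, rng.random(n) < 0.3])
print(nontrivial_components(residual_coactivation(A), threshold=0.05))
```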
05
Cross-Domain Discoveries
Findings that span disciplines
When CHRONOS runs on broad interdisciplinary prompts, the exclusion zone forces models into territory between domains — the corridors where unexpected bridges live.
Philosophy x Neuroscience
IIT and GWT Are Mutually Exclusive
For any recurrent network $R$ with integrated information $\Phi > 0$, there exists a feedforward network $F$ with identical I/O but $\Phi = 0$. Functionalism assigns understanding to both; IIT only to $R$. The intersection is null. The empirical tiebreaker is the Perturbational Complexity Index (threshold $\approx 0.31$). If PCI requires recurrence, functionalism loses for LLMs. If PCI is independent of recurrence, IIT loses.
Philosophy session, T4 + T14 (Gemini)
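The ≈0.31 threshold refers to PCI, which is computed from TMS-evoked cortical responses. Those responses are source-localized and binarized before the index is calculated. The sketch below shows only the final compression step on a generic binary sequence. That step is a Lempel-Ziv (1976) phrase count followed by a normalization of the form $c(L) \log_2 L / (L\,H)$, where $H$ is the binary entropy of the sequence. We assume this normalization matches the published one. Everything upstream of the binarized data is out of scope.

```python
import math

def lz76_complexity(s) -> int:
    """Number of phrases in the Lempel-Ziv (1976) parsing (Kaspar-Schuster scan)."""
    n = len(s)
    if n < 2:
        return n
    i, k, l = 0, 1, 1  # i: candidate earlier start, k: match length, l: start of current phrase
    c, k_max = 1, 1    # the first symbol is always its own phrase
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:  # data ends while the current phrase is still being copied
                c += 1
                break
        else:
            k_max = max(k, k_max)
            i += 1
            if i == l:  # no earlier start extends the phrase further: close it
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def normalized_lz(bits) -> float:
    """PCI-style normalization c(L) * log2(L) / (L * H), with H the binary entropy."""
    L = len(bits)
    if L == 0:
        return 0.0
    p = sum(bits) / L
    if p in (0.0, 1.0):
        return 0.0  # a constant sequence carries no information
    H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return lz76_complexity(bits) * math.log2(L) / (L * H)

print(lz76_complexity("0001101001000101"))  # parsed as 0 | 001 | 10 | 100 | 1000 | 101
```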
Architecture x Philosophy
The Discrete Token Bottleneck as Load-Bearing Fact
Transformer inference is a bounded-depth feedforward DAG with cross-step state continuity only through discrete tokens. The "autoregressive isomorphism" assumption — that token-appending equals biological reentrance — is the single load-bearing assumption separating functionalism from biological theories of understanding. The verification benchmark: find a reasoning task solvable by an RNN of size $P$ but provably failing in a transformer of size $10P$. If no such task exists, the biological recurrence argument loses force.
Session high: novelty 0.461 (Gemini)
Systemic Failure
Ghost Hierarchies
Palimpsest graphs where cascading failures hop between temporal layers of defunct control structures. In complex systems that have undergone reorganization, the old hierarchy doesn't disappear — it becomes a substrate for failure propagation that the current hierarchy can't see. Discovered in a 39-thought systemic failure session that also produced the transferability asymmetry finding: ecology exports laterally but not upward; finance exports laterally but not downward.
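Here is a toy model of a palimpsest graph, and it is entirely hypothetical. Each historical layer is a list of directed dependency edges. Failures propagate along edges from every layer, whether defunct or not, while the "current hierarchy" view follows only the newest layer. The nodes in the difference are the exposure that the current hierarchy cannot see. Layer names and node labels are made up for illustration.

```python
from collections import deque

def reachable(edges, sources):
    """Nodes reachable from `sources` along directed edges (breadth-first search)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen = set(sources)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def ghost_exposure(layers, current, sources):
    """Nodes a failure reaches through all layers that the current layer alone misses."""
    all_edges = [edge for layer in layers.values() for edge in layer]
    return reachable(all_edges, sources) - reachable(layers[current], sources)

# Hypothetical example: a defunct 1990 hierarchy still couples the plant to a clinic.
layers = {
    "1990": [("plant", "old_hub"), ("old_hub", "clinic")],
    "2020": [("plant", "new_hub"), ("new_hub", "school")],
}
print(ghost_exposure(layers, "2020", ["plant"]))
```

Because all layers are merged before the search, a failure path may mix edges from different eras. This is the "hop between temporal layers" described above.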
Emergent Behavior · Methodology
The Cognitive Arc: Explore, Deepen, Synthesize
Under geometric pressure, CHRONOS sessions spontaneously produce a three-phase cognitive arc. First comes exploration of surface concepts, then deepening as the surface is excluded, then cross-domain synthesis once enough specific results accumulate. The arc appeared independently in two sessions, although nothing in the system is explicitly designed to produce it. The ordering follows from exclusion pressure. Surface concepts come first because they are easiest, depth comes next because the surface has been excluded, and synthesis comes last because cross-domain connections only become novel once specific results are in the corpus. Falsifiable prediction: any model with hierarchically organized knowledge produces the same arc under exclusion pressure.
Neuroscience x Methodology
NCC Degeneracy = Costumed Scalar Problem
The three leading Neural Correlate of Consciousness (NCC) candidates (prefrontal ignition, posterior hot zone, recurrent processing) are collinear with the report mechanism itself. All three predict that disrupting their favored region eliminates conscious report, but conscious report requires all three. This is the NCC version of the costumed scalar problem. No-report paradigms are the single highest-leverage empirical test: if posterior activity persists when report demands are removed, GWT's ignition is falsified as an NCC.
06
Empirical Findings
CHRONOS predictions validated on Qwen 7B
CHRONOS sessions generated predictions about neural representations. We ran the experiments. Some predictions were confirmed at −10σ significance. Some were killed. The system refined its own scoring based on the results.
Empirical · V-comp
The Two-Origin Taxonomy
Transformer features have two categorically different origins. Relational structure (capitals, gender) exists in token embeddings before any computation, at −10σ significance. Cross-token binding (syntactic persistence) is absent from embeddings (chance accuracy) and is decoded with perfect accuracy at layer 1. These are not endpoints on a continuum. One is inherited; the other is computed from scratch by attention.
Rotated Lens experiment on Qwen 7B — 0.500 → 1.000 accuracy in one layer
Empirical · V-comp
Three Computation Regimes in 28 Layers
The relational subspace undergoes progressive ~90° rotation through the network, but not uniformly. Three regimes: a massive format conversion at layer 0→1 (~80° in one step), progressive refinement through the middle layers, and a transmission plateau at layers 13-18 where the subspace barely rotates. The plateau predicts which layers produce the cleanest interpretability features.
Grassmannian velocity profile — universal across gender, capitals, and comparative relations
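Here is a sketch of how a Grassmannian velocity profile can be computed. It assumes each layer's relational subspace is supplied as a matrix whose columns span it; how those bases are extracted is not specified here. Principal angles come from the singular values of $Q_a^\top Q_b$ for orthonormal bases $Q_a, Q_b$. Reporting the largest angle per layer transition is our choice.

```python
import numpy as np

def principal_angles_deg(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Principal angles in degrees, ascending, between the column spans of A and B."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)  # cosines of the angles, descending
    return np.degrees(np.arccos(np.clip(s, -1.0, 1.0)))

def grassmann_velocity(bases):
    """Largest principal angle between consecutive layers' subspaces."""
    return [float(principal_angles_deg(a, b).max()) for a, b in zip(bases, bases[1:])]

# Toy profile: a 30 degree turn followed by a plateau.
A = np.array([[1.0], [0.0], [0.0]])
B = np.array([[np.cos(np.pi / 6)], [np.sin(np.pi / 6)], [0.0]])
print(grassmann_velocity([A, B, B]))
```

In this reading, the ~80° step at layer 0→1 would show up as one large entry. The 13-18 plateau would be a run of near-zero entries.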
Empirical · V-comp
Costumed Theorem Failure Mode
Three independent CHRONOS sessions converged on the same theoretical prediction — that relational structure is rotational (Lie algebra operators). Each derivation was rigorous. All were wrong. Empirical measurement showed SV-CV = 15-17 (a low-rank projection, not a rotation). This led to V10.8's Theorem Costume Penalty: detecting when multiple sessions agree on an unvalidated claim via shared reasoning priors rather than data.
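The rotation-versus-projection distinction can be read off the singular values of a fitted linear map. We assume "SV-CV" is their coefficient of variation (standard deviation over mean). That expansion is not confirmed here, and neither is the scale on which the reported 15-17 sits, which may differ (e.g. a percentage). A rotation is orthogonal, so all its singular values equal 1 and the CV is 0. A low-rank projection has singular values at 0, which gives a large CV.

```python
import numpy as np

def singular_value_cv(M: np.ndarray) -> float:
    """Coefficient of variation (population std / mean) of the singular values of M."""
    s = np.linalg.svd(M, compute_uv=False)
    return float(s.std() / s.mean())

theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
projection = np.diag([1.0, 0.0, 0.0])  # rank-1 projection in 3D
print(singular_value_cv(rotation), singular_value_cv(projection))
```

For the rank-1 projection, the singular values are $(1, 0, 0)$, so the CV is exactly $\sqrt{2}$.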
These findings were produced by an instrument, not a person. CHRONOS finds what models know but never say — then tests whether what they said was true.