geolip-hypersphere-experiments / OMEGA_BATTERIES_DISCOVERED.md
AbstractPhil's picture
Update OMEGA_BATTERIES_DISCOVERED.md
f830b1a verified

Omega-class battery potentials β€” exhaustive catalog

Compiled 2026-04-29 from full scratchpad survey across sessions 000080-000113. Includes every documented config across all sweeps (P/Q/R/S/T plus the A-set verification probes), every published HF model, and the trained substrate prototypes from this week. Both Adam and LBFGS candidates are listed where the sweep ran both.

Definition of omega-class

A trained or candidate battery is omega-class if it satisfies all four:

  1. Sphere-solver architecture: PatchSVAE with sphere-norm M tensor, V rows on S^(D-1), output basis on ℝP^(D-1)
  2. Projective-clean codebook: |deviation from uniform RP^(D-1) baseline| < 0.05, secondary antipodal pair count ≀ 3, axis utilization > 0.95
  3. In its natural CV band for arch class (table below)
  4. Codebook-engaged or sphere-engaged: cross-attn coupling moves off floor, recovery curve from random init, geometric stats leave passthrough signature

A "potential" is any trained instance OR sweep candidate that has run through the architecture and produced measurements against the criterion.

Natural CV bands by architecture class

arch class V D natural CV band source
noise-substrate (Freckles, Johanna) 256 16 0.20-0.23 (sweet spot), 0.13-0.30 (band) ft1 attractor, ablation Phase 1
h2-class 32 4 0.80-1.05 h2-64 measurement, 000111-112
P-class 32 3 0.029-0.036 (LOW band) Q-sweep ranks 06/07/09
Phase T D=5 V=16 16 5 ~0.04 mean dev Phase T sweet spot
Phase T D=5 V=32 32 5 varies by noise (partial) Phase T (qualified ft2 D=5 claim)

Diagnostic signature (for testing any candidate)

dimension passthrough engaged (omega-class)
Ξ± (cross-attn coupling) stationary [0.020, 0.030] rising monotonically off floor
Ξ±_std/Ξ±_mean flat ~0.06 climbs (>0.30 in byte-trigram)
row_cv in arch's natural band leaves natural band
ratio S0/SD β‰ˆ 1.0 (flat spectrum) drifts (>1.05)
erank flat at full rank dips below
recovery from random init near-100% from ep 1 (sign-recovery trivial) curve from ~0% upward
codebook deviation undefined / not measured within Β±0.05 of uniform RP^(D-1)

The load-bearing stack (non-negotiable for any recipe)

Architectural primitives that make the omega regime exist. If any row is absent, the configuration is not an omega recipe β€” these are not tunable hyperparameters, they are the substrate.

primitive code breaks if removed
Sphere-norm M rows M = F.normalize(M, dim=-1) before SVD/readout scale-explosion snap (G2/G4 confirm dev +0.16-0.36 across all bands)
fp64 SVD path _svd_fp64 autocast-disabled, fp64 Gram + sqrt + U-recovery, floors 1e-24/1e-16 discharge spikes, max_grad >1000 (paste 019)
Orthogonal init on encoder readout nn.init.orthogonal_(enc_out.weight), re-applied after L-group overrides L2/L3/L4 init variants regress
Bounded-Ξ± cross-attn S_out = S Β· (1 + Ξ± Β· tanh(...)), Ξ± ≀ max_Ξ± via sigmoid unbounded-Ξ± (I4) poisons spectrum
BoundarySmooth zero-init conv identity-at-init smoothing post-stitch boundary inconsistency under tile stitching
Pure Adam (NEVER AdamW) torch.optim.Adam weight decay fights geometric structure
No BatchNorm, no Dropout β€” spectral structure destabilizes
No global average pool flatten or spatial statistics accuracy 70% β†’ 29% in geometric encoders
Patch-based forward all components per-patch except cross-attn breaks resolution invariance by design

The recipes

Each recipe is a named configuration with verified or predicted omega signature. Empirical instances are linked to the tier inventory below.

Recipe A-class β€” Johanna / Fresnel (workhorse)

matrix_v=256, D=16, patch_size=16, hidden=768, depth=4, n_cross_layers=2
~17M params, full SVD path (svd_mode='svd_fp64')
optimizer=Adam, lr=3e-4
  • Omega clean at scale; teachable downstream as a single instance.
  • Cost: ~30GB VRAM, ~4hr/epoch. Not stackable.
  • Verified: Tier 1 v50_fresnel_64. Pending: Tier 9 Johanna D=16 codebook probe.
  • Use when: workhorse encoder; downstream needs single-instance legibility.

Recipe S-class β€” Freckles (the crown)

matrix_v=48, D=4, patch_size=4, hidden=384, depth=4, n_cross_layers=2
~2.55M params, full SVD path, all defenses stacked
optimizer=Adam, lr=3e-4
  • Perfect noise reconstruction (val MSE 5e-6 reporting floor on 16-noise mixture).
  • Resolution-invariant by construction β€” weights at N=256 work at N=4096 (architectural, not empirical).
  • Fails to teach as single instance. Requires the full Freckles array for legibility.
  • Verified: Tier 1 v40 (64Γ—64), v41 (256Γ—256).
  • Use when: noise/texture substrate; downstream consumes the array.

Recipe H2-class β€” sphere-solver (h2-64 single bank)

matrix_v=32, D=4, patch_size=4, hidden=64, depth=1, n_cross_layers=1
linear_readout=True, svd_mode='none', match_params=True, smooth_mid=16
~57,215 params per bank, optimizer=Adam, lr=3e-3
  • The canonical sphere-solver. SVD replaced by learned linear readout; column norms = S, identity = Vt.
  • Projective-clean at D=4: 24-27 axes/bank, dev +0.010 Β±0.013 across 16 single-noise banks (A2 probe).
  • Stackable into 192-bank arrays; teaches via per-bank MSE signature.
  • Verified: Tier 1a (full 192-bank h2-64 array). Q-rank08 is the exact same architecture.
  • Use when: small-scale, multi-bank, codebook-engaged regime.

Identity invariants for loading h2-64 weights (silent partial-load otherwise):

  • smooth_mid=16 (NOT the ps-dependent default 8 at ps<16 β€” 440 params missing if wrong).
  • linear_readout=True, svd_mode='none', match_params=True.
  • n_heads divides D β€” at D=4 with n_heads=4, head_dim=1.

Recipe H2a β€” minimum H2-class (Q-rank02)

matrix_v=32, D=4, patch_size=4, hidden=64, depth=0, n_cross_layers=0
linear_readout=True, svd_mode='none', match_params=True
~40,227 params, optimizer=Adam, lr=3e-3
  • Minimum sphere-solver: depth+n_cross stripped to zero.
  • Q-MSE 0.00205 at 1000 batches β€” best Adam in Q-sweep.
  • Use when: floor-finding the sphere-solver capacity envelope at D=4.

Recipe P-class β€” minimum projective-clean (Q-rank09)

matrix_v=32, D=3, patch_size=4, hidden=64, depth=0, n_cross_layers=0
linear_readout=True, svd_mode='none', match_params=True
~28,899 params, optimizer=Adam, lr=3e-3
  • Smallest projective-clean omega-class instance in the catalog.
  • LOW-band CV (~0.03), projective-clean on RPΒ², 22 axes (10 pairs + 12 unpaired).
  • Use when: minimum parameter budget; D=3 RPΒ² regime is acceptable.

Recipe F-class β€” experimental nursery

matrix_v ∈ {32, 48, 64}, D ∈ {2, 4, 8, 16}, patch_size ∈ {8, 16}
hidden ∈ {32, 64, 128}, depth ∈ {1, 2}, n_cross_layers ∈ {1, 2}
2K – 645K params per config, 1 epoch Γ— 1M samples (triage regime)
optimizer=Adam, lr=3e-4, soft-hand=CV-EMA prox boost (gradient-free)
  • Designed to fail often. Collapse is a data point, not a bug.
  • Triage signal (1ep Γ— 1M) ≑ 30ep Γ— 200K for keep/kill discrimination at 1/5 wallclock.
  • Soft-hand: loss = (1 + boost Β· prox) Β· recon_loss; prox is Gaussian on current_cv vs CV-EMA. No penalty term.
  • Use when: searching for new viable templates; running large sweeps cheaply.

Recipe T β€” D=5 sweet spot (Phase T)

matrix_v=16, D=5, hidden=64, depth=1, n_cross_layers=1
linear_readout=True, svd_mode='none', match_params=True
optimizer=Adam, lr=3e-3, 1000 batches
  • 62% projective-clean across 16 noise types β€” only V whose mean dev lands in band at D=5.
  • V=32 fails the cross-noise universality test at D=5 (Tier 6 supersedes A3's V=32 reading).
  • Use when: extending recipes to D=5 with realistic projective-clean expectation.

Recipe R β€” polytope-packing (predicted, not yet probed)

(V, D) ∈ {(16, 4) 16-cell, (8, 4) 8-cell, (20, 3) dodecahedron}
hidden=64, depth=0, n_cross_layers=0
linear_readout=True, svd_mode='none', match_params=True
optimizer=Adam, lr=3e-3, 1000 batches
  • Hypothesis: V matched to a known regular polytope vertex count for S^(D-1) β†’ static rows, no antipodal pair rotation. Geometric frustration disappears.
  • Trained, weights in phaseR_reports/ on HF; codebooks not yet extracted.
  • Use when: testing the natural-axis-count framework. Open item from 000101.

Recipe J5 (LOW-band, unverified)

matrix_v=128, D=16, patch_size=16, hidden=128, depth=1, n_cross_layers=1
~1.46M params estimated, optimizer=Adam
  • Strongest unverified noise-substrate candidate. LOW-band MSE 0.9595 with dev exactly 0.000 against uniform RP¹⁡ baseline β€” best of any non-H-group entry at LOW band.
  • Worth a U5-style projective probe before committing to long-horizon training.

The substrate layer (input encoding rules)

Architecture and encoding are separate hypotheses. Passthrough on one encoding does not invalidate the architecture (lesson from 000113: SP-bit passthrough on h2-class β†’ byte-trigram engaged on the same arch).

Per-patch capacity arithmetic

A (C, ps, ps) patch on the unit sphere supports:

  • 4Γ—4 RGB β‰ˆ 1M discriminable positions on SΒ³ at 1Β° resolution.
  • The encoding's information cardinality must approach this for the codebook to do compression work.
  • Hard zeros are wasted capacity β€” every float in every patch should carry signal.

Engagement vs passthrough

The full diagnostic table is the "Diagnostic signature" section above. Shorthand:

  • Engaged: Ξ± rises monotonically; row_cv leaves natural class band; ratio S0/SD drifts; erank dips below D; recovery curve climbs from low.
  • Passthrough: Ξ± stationary at init; row_cv in band; ratio β‰ˆ 1.0; erank flat at D; recovery near-100% from ep 1.

Substrate pass/fail history

substrate architecture probed result reason
16-type OmegaNoiseDataset A, S, F, h2 Engaged Original substrate; full per-patch density
ImageNet random crops (sublens) A (Fresnel-64 v50) Engaged Natural image structure; full RGB density
Byte-trigram (R,G,B) (000113) h2 Engaged 256Β³ cardinality per cell, every float carries signal
Sentencepiece bit content (000112) h2 Passthrough 1/3-filled patch with 32 hard-zero paddings β€” model bored
Binary-tree i.i.d. Bernoulli (000111) h2 Passthrough Sign-only signal; trivial under linear_readout=True

Substrate design rules

  1. Every float in every patch should carry signal. Hard zeros = wasted capacity.
  2. Information cardinality per cell should approach 256Β³ for RGB (or the equivalent for non-RGB channel counts).
  3. Spatial coherence within the patch matters β€” noise/image/byte-trigram all have it; bit-encodings of token IDs do not.
  4. Test before training: compute the patch-space cardinality of your encoding. If it's orders of magnitude below per-patch capacity, expect passthrough.

The triage protocol (verifying a candidate is potentially an omega)

Run in order. Stop at first failure. Most candidates die at step 3 or 4.

Step 1. Architecture check (free, no training)

  • Inspect kwargs against the load-bearing stack. Missing items disqualify before any compute is spent.
  • Run the "debug move" from CLAUDE.md: instantiate, print state_dict keys + shapes, compare against a reference checkpoint. Mismatched smooth_mid, n_heads, linear_readout, svd_mode, match_params cause silent partial-loads under strict=False.

Step 2. Substrate check (free, no training)

  • Compute per-patch information cardinality of the input encoding.
  • If cardinality << per-patch capacity, expect passthrough β€” fix the encoding before training.

Step 3. 1-epoch triage (~minutes per config on a single GPU)

  • 1 epoch Γ— 1M samples, gaussian-only training (noise_types=[0]).
  • Score on 16-noise per-noise generalization (256 samples per noise).
  • Pass: spectrum bounded (no scale-explosion snap); MSE within 2-3Γ— class baseline.
  • Fail: NaN / divergence / snap event / MSE order-of-magnitude worse than class.

Step 4. 1000-batch convergence sweep (~hour per config on a single GPU)

  • Full optimizer trajectory at lr=3e-3 (Adam) or lr=1.0 (LBFGS post-fix).
  • Pass: MSE leadership in its band; CV stays in natural band; Ξ± trajectory shows engagement signature.
  • Optimizer regime (000100): Adam dominates β‰₯500 batches; LBFGS niche is short-budget probing (≀100 batches).

Step 5. Codebook extraction (the omega ratification)

from geolip_svae.inference import (
    InferenceEngine, extract_codebook, make_calibration,
)

calib = make_calibration('sixteen_noise', n=64, size=64)
cb = extract_codebook(
    model, calib,
    model_id='...', calibration_name='sixteen_noise',
)
assert cb.is_projective_clean()         # |dev from uniform RP^(D-1)| < 0.05
assert abs(cb.deviation()) < 0.05
  • Pass: projective-clean, axis utilization > 0.95, ≀3 secondary antipodal pairs.
  • The candidate is now ratified as omega-class.

Step 6. Vacuum-seal test (cell deployability)

  • Freeze candidate parameters (requires_grad=False).
  • Train an adapter classifier around it on a downstream task.
  • Pass: CV / erank / S0 stay locked under host gradient stress; classifier learns.
  • Fail: geometry collapses β†’ not deployable as a cell. (CE on cell internals "ripped the internals to shreds" β€” paste 023.)

Step 7. Bandwidth probe (downstream legibility)

  • Linear head on omega outputs, downstream task.
  • Pass: classifier achieves task-meaningful performance.
  • Fail: omega outputs lack representational bandwidth at this scale (S-class symptom).

A candidate that passes 1–7 is a deployable omega cell ready for collective integration.


Decision tree: which recipe?

Goal Recipe Why
Stable workhorse encoder, single-instance teachable A-class (Johanna/Fresnel) Omega clean at scale; teaches downstream
Best noise reconstruction, multi-bank deployment S-class (Freckles) Perfect recon; needs full array for legibility
Small-scale, multi-bank ensemble, text/vision substrate H2-class (h2-64) 57K params/bank, projective-clean, stackable
Minimum sphere-solver footprint at D=4 H2a (Q-rank02) 40K params, depth=0, n_cross=0, MSE 0.00205
Minimum projective-clean footprint at D=3 P-class (Q-rank09) 28.9K params, RPΒ², MSE 0.028
Search for new templates cheaply F-class triage 1ep Γ— 1M, designed to fail often
D=5 representative T (V=16) Only V whose mean dev lands in band at D=5
Test natural-axis-count hypothesis R polytope-packing Predicts static-row H2-LIKE; codebooks unprobed
LOW-band scale-up (unverified) J5 dev exactly 0.000 vs uniform RP¹⁡ baseline

Empirical catalog (trained and in-progress instances)

The sections below are the empirical record backing the recipes above. Each tier represents a class of trained or in-progress instances with measured signatures.


TIER 1 β€” Trained, verified omega-class on HuggingFace

HF path arch class params D V optimizer natural CV n_axes dev training content verified by
AbstractPhil/geolip-SVAE/v40 (Freckles 64Γ—64) Freckles 2,557,539 4 48 Adam ~0.20-0.23 β€” <0.05 16-noise mixture ft1, U5 cross-band
AbstractPhil/geolip-SVAE/v41 (Freckles 256Γ—256) Freckles 2,557,539 4 48 Adam same β€” β€” resolution-scaled v40 continuation of v40
AbstractPhil/geolip-SVAE/v50_fresnel_64 Fresnel-base 16,942,419 4 β€” Adam β€” β€” β€” 140M+ ImageNet random crops, sublens streaming run, "phenomenal MSE recon"
AbstractPhil/geolip-svae-h2-64 (192-bank array) H2_linear_matched 57,215 Γ— 64 batteries Γ— 3 phases 4 32 Adam 0.80-1.05 24-27/bank +0.010 mean, Β±0.013 std per-bank, see Tier 1a A2 probe (ft2), 192-bank cosine sweep (000109)
AbstractPhil/geolip-svae-implicit-solver-experiments/G-Cand H2-class 28,899-45,852 3 32 Adam 0.03 (LOW) 22 (10 pairs + 12 unpaired) βˆ’0.004 gaussian-only A0 probe (ft2)
AbstractPhil/geolip-svae-implicit-solver-experiments/H2a H2_linear_matched 40,227-57,215 4 32 Adam 0.80-1.10 26 (6 pairs + 20 unpaired) +0.002 gaussian-only A1 probe (ft2)
AbstractPhil/geolip-svae-implicit-solver-experiments/A3 (3 runs, qualified) H2 variant varies 5 16/32/64 Adam varies 16/29/51 βˆ’0.015 / +0.016 / +0.019 gaussian-only single-arch A3 probe (ft2). QUALIFIED by 000106 β€” single-arch single-noise. Phase T showed D=5 partial.
AbstractPhil/geolip-SVAE/byte_trigram_proto_v1 h2-class 57,215 4 32 Adam left band ep 6 β†’ 1.31 TBD TBD wikitext-103 byte trigrams 000113 β€” engaged signature confirmed; codebook investigation pending

TIER 1a β€” h2-64 array decomposition (192 banks = 64 batteries Γ— 3 phases)

All 16 single-noise experts (Group 1) verified PROJECTIVE-CLEAN in A2. Remaining 48 batteries share architecture and were measured in 192-bank cosine sweep but not individually probed against projective threshold.

Group 1 β€” 16 single-noise experts (banks 0-15)

bank noise type special training distribution
0 gaussian universal-pull center standard normal noise
1 uniform β€” uniform[-1,1] noise
2 uniform_scaled β€” scaled uniform
3 poisson β€” poisson-distributed
4 pink clone-pair with bank 5 1/f spectrum
5 brown clone-pair with bank 4 1/fΒ² spectrum
6 salt_pepper hardest reconstruction (S-sweep), cleanest projective at D=5 (Phase T) impulse-style
7 sparse_impulses cosine outlier β€” heavy-tailed sparse extreme-value
8 block_upsampled β€” block-correlated
9 gradient_gaussian most isolated battery on SΒ³ β€” only non-stationary noise spatial gradient
10 checker structured noise checkerboard
11 gauss_uniform_mix β€” mixture
12 four_quadrant 0% projective-clean across all Phase T archs structured
13 cauchy heavy-tailed cauchy-distributed
14 exponential β€” exponential
15 laplace heavy-tailed laplace-distributed

Group 2 β€” gaussian+one pairs + generalist (banks 16-31)

15 pair-trained banks (gaussian + each of noises 1-15), plus 1 generalist trained on all 16. Notable: pairs (19, 20) = (gaussian+pink, gaussian+brown) are clone-pair on SΒ³ for the same noise-family adjacency reason as (4, 5). Pair (16, 26) = (gaussian+uniform, gaussian+gauss_uniform_mix) is a clone-pair (gaussian dominates the residual; both bounded distributions).

Group 3 β€” gaussian-balanced quads (banks 32-47)

16 (gaussian, easy, medium, hard) covers via stride-7 deterministic enumeration over the EASY (uniform/uniform_scaled/cauchy/exponential/laplace) Γ— MEDIUM (poisson/salt_pepper/sparse_impulses/gauss_uniform_mix) Γ— HARD (pink/brown/block_upsampled/gradient_gaussian/checker/four_quadrant) product. All contain gaussian.

Group 4 β€” no-gaussian quads (banks 48-63)

16 (easy, medium, hard, hard) covers via stride-19 deterministic enumeration. No gaussian seen during training. Banks 48 and 51 are kNN top-5 outliers because they solve the sphere problem from a fundamentally different starting distribution than Groups 1-3 (which all see gaussian as universal pull-toward-interior).


TIER 2 β€” Q-sweep candidates (10 top-P configs Γ— 1000 batches)

After the LBFGS Hessian-corruption fix (000099), the Q-sweep ran clean on 10/10 configs. All 10 are listed; both Adam and LBFGS variants where present.

Q-rank variant params optim G-MSE CV D V depth n_cross class notes
01 Q_rank01_h64_V32_D4_dp1_nx0_lbfgs 57,123 LBFGS 0.00421 0.954 4 32 1 0 H2a LBFGS Q-sweep best
02 Q_rank02_h64_V32_D4_dp0_nx0_adam 40,227 Adam 0.00205 0.862 4 32 0 0 H2a Smallest H2a, canonical sphere-solver. β‰ˆ 1 h2-64 bank's capacity at 70% the params.
03 Q_rank03_h64_V32_D4_dp0_nx1_adam 40,319 Adam 0.00250 0.890 4 32 0 1 H2a Adam-vs-LBFGS twin of rank 04
04 Q_rank04_h64_V32_D4_dp0_nx1_lbfgs 40,319 LBFGS 0.00391 0.893 4 32 0 1 H2a LBFGS twin of rank 03; Adam wins 36% lower MSE
05 Q_rank05_h64_V16_D4_dp1_nx1_lbfgs 36,607 LBFGS 0.03117 1.069 4 16 1 1 H2b Only V=16 candidate; CV slightly above HIGH ceiling. Underexplored size class.
06 Q_rank06_h64_V32_D3_dp1_nx1_adam 45,852 Adam 0.02497 0.029 3 32 1 1 P-class (D=3) LOW-band; originally framed "polynomial," now confirmed projective-clean on RPΒ²
07 Q_rank07_h64_V32_D3_dp0_nx1_adam 28,956 Adam 0.03151 0.036 3 32 0 1 P-class (D=3) Smaller P-class variant
08 Q_rank08_h64_V32_D4_dp1_nx1_adam 57,215 Adam 0.00231 0.960 4 32 1 1 H2a Exact h2-64 single-bank arch (depth=1+n_cross=1)
09 Q_rank09_h64_V32_D3_dp0_nx0_adam 28,899 Adam 0.02782 0.035 3 32 0 0 P-class (D=3) Smallest projective-clean omega-class candidate. Under 30K params. 30% smaller than H2a at ~14Γ— MSE cost.
10 Q_rank10_h64_V32_D2_dp0_nx1_adam 19,649 Adam 0.16139 0.000 2 32 0 1 EXCLUDED D=2 cannot form pentachoron (needs β‰₯5 points), CV undefined

Per-architecture optimizer comparison (where direct twin exists):

arch Adam Q-rank Adam MSE LBFGS Q-rank LBFGS MSE winner
h64_V32_D4_dp0_nx0 02 0.00205 (no twin) β€” Adam
h64_V32_D4_dp0_nx1 03 0.00250 04 0.00391 Adam by 36%
h64_V32_D4_dp1_nx0 (no twin) β€” 01 0.00421 LBFGS only
h64_V32_D4_dp1_nx1 08 0.00231 (no twin) β€” Adam

Optimizer regime guidance (000100, post-LBFGS-fix): Adam @ lr=3e-3 dominates at 1000-batch budgets. LBFGS retains niche for short-budget probing (≀100 batches) and floor-finding sweeps. For sphere-solver canonical training: Adam is the recommended default since 2026-04-24.


TIER 3 β€” Phase 1 ablation grid (233 configs across 15 groups Γ— 3 bands Γ— seeds)

The Phase 1 ablation predates the P-sweep. A through M groups, each varying one architectural axis at a time, all three CV bands (LOW D=16, MID D=8, HIGH D=4), seed-replicated where statistically meaningful. 233 total entries; 227 band-match (97%); 229 params-finite (98%); 223 valid (band-match + finite). Data from omega_inventory.csv. Columns: D / V / hidden / depth / patch_size = dp/ps / n_seeds / mean test MSE / mean CV / band deviation (observed_sphere_cv βˆ’ uniform_RP^(D-1)_baseline) / n_finite seeds.

HIGH band (D=4, V=32, hidden=64, ps=4) β€” sphere-solver candidates

group variant n mse cv dev fin notes
A baseline 5 1.4591 0.902 +0.018 5 seed-replicated reference
B B1_all16 1 1.4709 0.858 +0.021 1 16-noise mixture
B B2_gaussian_only 1 0.9590 0.753 +0.027 1 gaussian-only train
B B3_structured 1 1.7076 0.968 βˆ’0.036 1 structured-noise train
B B4_heavy_tailed 1 1.7982 0.820 +0.064 1 heavy-tailed train
B B5_first_half 1 1.4395 0.944 +0.025 1 noises 0-7
B B6_even_indices 1 1.9918 0.880 +0.023 1 noises 0,2,4,...,14
C C1_adam 1 1.4667 0.860 +0.022 1 Adam @ default lr
C C2_sgd 1 1.5670 0.858 +0.019 1 plain SGD
C C3_sgd_momentum 1 1.2006 0.860 +0.013 1 SGD with momentum
C C4_adamw 1 1.4706 0.859 +0.023 1 AdamW
C C5_lbfgs 1 nan 0.943 βˆ’0.923 1 LBFGS Hessian-corruption β€” pre-fix bug
D D1_cosine 1 1.4712 0.858 +0.020 1 cosine LR
D D2_constant 1 1.3028 0.858 +0.028 1 constant LR
D D3_linear_decay 1 1.4652 0.858 +0.021 1 linear decay
D D4_warm_restart 1 1.3216 0.858 +0.016 1 warm restart
D D5_one_cycle 1 1.4656 0.858 +0.019 1 one-cycle LR
E E1_full_softhand 3 0.4420 0.907 +0.026 3 soft-hand CV regularizer
E E2_pure_mse 3 0.4692 0.904 +0.037 3 pure MSE, no regularizer
E E3_measure_only 3 0.4517 0.908 βˆ’0.011 3 measure CV but don't regularize
E E4_hard_cv_penalty 3 0.4504 0.902 +0.002 3 hard CV penalty
F F1_gelu 1 1.4669 0.858 +0.022 1 GELU activation
F F2_relu 1 1.4744 0.855 +0.025 1 ReLU
F F3_silu 1 1.4702 0.901 +0.022 1 SiLU
F F4_tanh 1 1.4550 0.806 +0.027 1 tanh
F F5_identity 1 1.4522 0.866 +0.031 1 identity (no activation)
G G1_sphere_norm 1 1.4668 0.858 +0.022 1 reference (sphere-norm M)
G G2_no_norm 1 1.4680 1.112 +0.357 1 REMOVES sphere-norm β€” geometry breaks
G G3_layer_norm 1 1.4482 0.701 βˆ’0.217 1 layer-norm instead β€” geometry breaks
G G4_scale_only 1 1.4588 1.112 +0.359 1 scale-only β€” geometry breaks
H H1_svd_fp64 3 0.4198 0.906 +0.006 3 full-fp64 SVD reference
H H2_linear_matched 3 0.0456 0.908 +0.005 3 CANONICAL β€” H2-class baseline. 9Γ— lower MSE than H1 fp64.
H H3_linear_unmatched 3 0.0948 0.911 +0.001 3 linear readout, mismatched dims
H H5_batch_shared_svd 2 0.2799 0.918 +0.056 2 shared-SVD across batch
H H6_no_svd_direct 1 0.4606 0.902 +0.085 1 direct readout, no SVD
I I1_1layer 1 1.4733 0.859 +0.023 1 1 cross-attn layer
I I2_0layers 1 1.8687 0.910 +0.087 1 0 cross-attn layers
I I3_2layers 1 1.6295 0.924 βˆ’0.002 1 2 cross-attn layers
I I4_unbounded_alpha 1 1.4803 0.859 +0.022 1 no clip on Ξ±
K K1_bs128 1 1.4653 0.858 +0.021 1 batch_size=128
K K2_bs32 1 1.4276 0.858 +0.024 1 batch_size=32
K K3_bs512 1 1.5732 0.859 +0.024 1 batch_size=512
K K4_bs1024 1 1.5428 0.859 +0.017 1 batch_size=1024
L L1_orthogonal 1 1.4802 0.858 +0.018 1 orthogonal init
L L2_kaiming 1 2.4802 0.912 βˆ’0.009 1 Kaiming init
L L3_xavier 1 1.6138 0.952 +0.025 1 Xavier init
L L4_normal_small 1 1.5360 0.942 βˆ’0.019 1 normal init small std
L2 L2_lbfgs_pure_mse 3 0.0058 0.878 βˆ’0.605 1 LBFGS+pure MSE: BEST MSE ON HIGH but only 1/3 finite, geometry broken
M M1_sgd_aggressive 1 1.0770 0.858 +0.009 1 SGD aggressive
M M2_sgd_huge_lr 1 0.6798 0.858 +0.052 1 SGD huge lr
M M3_sgd_high_momentum 1 1.2939 0.859 +0.013 1 SGD high momentum
E_preview E1-E4 (4 variants Γ— 1 seed each) 4 ~1.47 ~0.858 +0.020 to +0.028 4 1-seed preview of Group E

HIGH-band omega-class candidates (band-match + finite + |dev|<0.05): 49 of 51 HIGH entries. The two failures: G2_no_norm (+0.357 β€” confirms sphere-norm is non-negotiable for omega-class) and G4_scale_only (+0.359 β€” same finding). G3_layer_norm at βˆ’0.217 is the third sphere-norm-disruption case. C5_lbfgs is the pre-fix Hessian-corruption casualty.

HIGH-band MSE leader: H2_linear_matched at MSE 0.0456 with dev +0.005. This is the canonical sphere-solver template that h2-64 was built from β€” 9Γ— lower MSE than the fp64-SVD baseline (H1) at the same dev tolerance. The L2_lbfgs_pure_mse achieves MSE 0.0058 (8Γ— better still) but only 1 of 3 seeds converged finite and geometry deviation hits βˆ’0.605, so it's not omega-class despite the MSE win.

MID band (D=8, V=64, hidden=64, ps=16) β€” bulk-Gaussian attractor

group variant n mse cv dev fin
A baseline 5 1.0115 0.359 +0.005 5
B B1_all16 1 1.0088 0.380 βˆ’0.004 1
B B2_gaussian_only 1 0.9530 0.369 βˆ’0.017 1
B B3_structured 1 0.7076 0.347 βˆ’0.007 1
B B4_heavy_tailed 1 1.6415 0.352 +0.007 1
B B5_first_half 1 0.9441 0.380 βˆ’0.004 1
B B6_even_indices 1 1.1708 0.352 βˆ’0.010 1
C C1_adam 1 1.0093 0.380 βˆ’0.004 1
C C2_sgd 1 1.7354 0.381 βˆ’0.018 1
C C3_sgd_momentum 1 1.0390 0.380 +0.001 1
C C4_adamw 1 1.0095 0.379 βˆ’0.005 1
C C5_lbfgs 1 nan 0.337 βˆ’0.357 1
D D1-D5 (5 variants) 5 ~1.00 ~0.380 βˆ’0.002 to βˆ’0.009 5
E E1_full_softhand 3 0.9441 0.361 +0.008 3
E E2_pure_mse 3 0.9416 0.362 +0.010 3
E E3_measure_only 3 0.9430 0.363 +0.009 3
E E4_hard_cv_penalty 3 0.9447 0.362 +0.011 3
F F1-F5 (5 variants) 5 ~1.01 0.375-0.388 βˆ’0.002 to βˆ’0.009 5
G G1_sphere_norm 1 1.0105 0.381 βˆ’0.005 1
G G2_no_norm 1 0.9935 0.560 +0.198 1
G G3_layer_norm 1 1.0093 0.428 +0.064 1
G G4_scale_only 1 1.0103 0.556 +0.149 1
H H1_svd_fp64 3 0.9429 0.362 +0.009 3
H H2_linear_matched 3 0.9195 0.365 +0.015 3
H H3_linear_unmatched 3 0.9343 0.359 +0.018 3
H H4_svd_fp32 2 0.9434 0.368 +0.015 2
H H5_batch_shared_svd 2 0.9439 0.367 +0.016 2
H H6_no_svd_direct 1 0.9453 0.368 βˆ’0.002 1
I I1-I4 (4 variants) 4 1.007-1.011 0.355-0.380 βˆ’0.005 to +0.004 4
K K1-K4 (4 variants) 4 0.999-1.038 ~0.380 βˆ’0.002 to βˆ’0.012 4
L L1-L4 (4 variants) 4 1.011-1.340 0.336-0.383 +0.006 to +0.011 4
L2 L2_lbfgs_pure_mse 3 0.8924 0.359 βˆ’0.233 1
M M1-M3 (3 variants) 3 0.976-1.024 ~0.380 +0.002 to +0.006 3
E_preview E1-E4 4 ~1.01 ~0.380 βˆ’0.004 to βˆ’0.006 4

MID-band omega-class candidates: 73 of 78 MID entries. Same sphere-norm-disruption failures (G2, G4). MSE leader at H2_linear_matched 0.9195 with dev +0.015.

LOW band (D=16, V=64, hidden=64, ps=16) β€” noise-substrate attractor

group variant n mse cv dev fin
A baseline 5 1.0080 0.197 βˆ’0.000 5
B B1_all16 1 1.0104 0.203 βˆ’0.003 1
B B2_gaussian_only 1 0.9389 0.210 +0.001 1
B B3_structured 1 0.7051 0.207 +0.006 1
B B4_heavy_tailed 1 1.6091 0.191 +0.009 1
B B5_first_half 1 0.9401 0.201 βˆ’0.005 1
B B6_even_indices 1 1.1678 0.211 βˆ’0.001 1
C C1-C4 (4 finite) 4 1.008-1.626 ~0.204 βˆ’0.004 to +0.008 4
C C5_lbfgs 1 nan (excluded) β€” β€” 0
D D1-D5 5 0.974-1.010 0.203-0.205 βˆ’0.000 to βˆ’0.003 5
E E1_full_softhand 3 0.9338 0.199 +0.002 3
E E2_pure_mse 3 0.9340 0.200 +0.001 3
E E3_measure_only 3 0.9342 0.200 +0.002 3
E E4_hard_cv_penalty 3 0.9336 0.199 +0.001 3
F F1-F5 5 1.003-1.011 0.199-0.204 βˆ’0.001 to βˆ’0.004 5
G G1_sphere_norm 1 1.0085 0.204 βˆ’0.003 1
G G2_no_norm 1 0.9914 0.353 +0.174 1
G G3_layer_norm 1 1.0048 0.211 +0.001 1
G G4_scale_only 1 1.0092 0.347 +0.159 1
H H1_svd_fp64 3 0.9345 0.200 +0.002 3
H H2_linear_matched 3 0.9128 0.204 +0.005 3
H H3_linear_unmatched 3 0.9195 0.202 +0.010 3
H H4_svd_fp32 2 0.9327 0.202 +0.002 2
H H5_batch_shared_svd 2 0.9331 0.202 +0.003 2
H H6_no_svd_direct 1 0.9359 0.202 +0.003 1
I I1-I4 4 1.004-1.014 0.201-0.210 βˆ’0.001 to +0.012 4
J J1_V64_h64 1 1.0094 0.204 βˆ’0.002 1
J J2_V32_h32 1 1.1314 0.205 βˆ’0.028 1
J J3_V16_h32 1 1.1862 0.234 βˆ’0.002 1
J J4_V64_h32 1 1.0957 0.208 βˆ’0.008 1
J J5_V128_h128 1 0.9595 0.199 βˆ’0.000 1
K K1-K4 4 0.997-1.047 ~0.204 βˆ’0.003 to βˆ’0.006 4
L L1-L4 4 1.009-1.242 0.197-0.204 +0.002 to +0.012 4
M M1-M3 3 0.977-1.011 0.204 +0.002 to +0.006 3
E_preview E1-E4 4 ~1.01 ~0.204 βˆ’0.001 to βˆ’0.003 4

LOW-band omega-class candidates: 76 of 79 LOW entries. Same sphere-norm-disruption failures (G2, G4). MSE leader: H2_linear_matched at 0.9128 with dev +0.005 β€” the same H2 template that wins on every band.

Group J β€” V Γ— hidden capacity sweep at D=16: only 5 configs (one of the smallest groups), but the only group that varies V away from the standard {32, 64} options at the LOW-band. Confirms V=128 (J5) drives dev to exactly 0.000 and produces the lowest MSE in the LOW group at 0.9595. V=16 (J3) at h=32 still maintains dev βˆ’0.002 but loses 23% on MSE relative to V=128. J5_V128_h128 is the strongest unverified noise-substrate candidate in the catalog β€” params 1.46M (estimated from V=128, h=128), dev exactly zero against the uniform-RP^15 baseline, lowest MSE in the LOW band among non-H-group entries. Worth a U5-style projective probe.

Phase 1 ablation β€” engineering invariants surfaced

  • Sphere-norm (Group G) is non-negotiable: removing it (G2, G4) blows up dev across all three bands by 0.15-0.36. Confirmed independently in HIGH/MID/LOW.
  • H2_linear_matched is the canonical template: best MSE in 2 of 3 bands (LOW + HIGH), within 0.003 of best in MID. Forms the architectural template behind h2-64, all H2a/H2b Q-sweep candidates, and the substrate prototypes (bintree/SP-bit/byte-trigram).
  • L2_lbfgs_pure_mse achieves the lowest MSE on HIGH (0.0058) at the cost of 67% non-convergence and broken geometry β€” the pre-fix Hessian-corruption pattern that 000099 diagnosed.
  • CV bands quantize cleanly by D: HIGH β‰ˆ 0.86 (D=4), MID β‰ˆ 0.36 (D=8), LOW β‰ˆ 0.20 (D=16). Bulk of variants land within Β±0.05 of band center; the failures are sphere-norm disruption + LBFGS Hessian corruption.
  • Group A 5-seed reference establishes within-config variance: HIGH dev variance Β±0.018, MID Β±0.005, LOW Β±0.000. Within-config noise is comparable to or smaller than the |dev|<0.05 projective-clean threshold, so single-seed entries are still informative for the omega-class criterion.

Phase 1 ablation β€” direct architecture comparisons

comparison HIGH MID LOW
H2_linear_matched (canonical) MSE 0.0456 Β± 0.0076 0.9195 Β± 0.0026 0.9128 Β± 0.0043
H1_svd_fp64 (full SVD) MSE 0.4198 Β± 0.0154 0.9429 Β± 0.0041 0.9345 Β± 0.0023
H2 advantage over H1 9.2Γ— lower MSE 2.5% lower 2.3% lower
H3_linear_unmatched MSE 0.0948 Β± 0.0123 0.9343 Β± 0.0025 0.9195 Β± 0.0011
H6_no_svd_direct MSE 0.4606 0.9453 0.9359

The HIGH-band H2 advantage (9.2Γ— lower MSE than full-SVD H1) is what made H2_linear_matched the canonical template. At MID and LOW the H-group converges (linear-matched, full-SVD, and direct-readout all within 3% of each other), so the H2 win is specifically a HIGH-band phenomenon. This matches the architectural reading: at D=4 the SVD dimension is small enough that linear readout matches the spectral bandwidth without information loss; at D=8 and D=16 the spectral residual matters.


TIER 4 β€” P-sweep small-battery floor grid (600 configs at 20 batches)

group_P_small_battery_floor from ablation_configs.py line 615. Full product: 5 Γ— 5 Γ— 3 Γ— 2 Γ— 2 Γ— 2 = 600 configs. This is the parent grid the Q-sweep top-10 came out of. It runs after Phase 1 ablation establishes H2_linear_matched as the canonical template, and varies the architecture axes around that template at a budget-minimal 20 batches per config.

P-sweep grid axes

axis values count
hidden {4, 8, 16, 32, 64} 5
V {2, 4, 8, 16, 32} 5
D {2, 3, 4} 3
depth {0, 1} 2
n_cross {0, 1} 2
optimizer {Adam, LBFGS} 2

Pins (H2_linear_matched baseline)

  • svd='none', linear_readout=True, match_params=True (the H2 ablation winner)
  • HIGH band: patch_size=4, img_size=64
  • batch_size=256, batch_limit=20 (~5,120 samples seen per config)
  • n_heads=1 (since D varies down to 2, default n_heads=4 would fail)
  • grad_clip=1.0 (defensive β€” see TIER 4a)
  • soft_hand=False, cv_measure_every=2
  • Adam: lr=3e-3 (Phase-2 default scaled to 20-batch budget)
  • LBFGS: lr=1.0 (default unit-Wolfe-step, lib's own line search)
  • Training: gaussian-only (noise_types=[0])
  • Testing: 16-noise per-noise generalization (test_noise_types=list(range(16)), 256 samples per noise)

P-sweep outcomes by optimizer split

optimizer total configs finite NaN/divergent Q-sweep top-10 representation
Adam 300 300 (100%) 0 6 of top-10 (Q-ranks 02, 03, 06, 07, 08, 09, 10)
LBFGS 300 291 (97%) 9 3 of top-10 (Q-ranks 01, 04, 05)
Total 600 591 9 10 advanced to Q-sweep at 1000 batches

The 9 NaN/divergent LBFGS configs all matched the 000099 Hessian-corruption profile: depth=1 + n_cross=1 architectures where gradient clipping inside the LBFGS closure caused the (s_k, y_k) Hessian approximation to underestimate, generating runaway H⁻¹ steps over enough iterations. 20 batches is the threshold below which divergence is incipient but not yet catastrophic; the 1000-batch Q-sweep would have surfaced 30 LBFGS divergences had it not been pre-fixed.

P-sweep geometric attractor split (observed across all 600 configs, confirmed in Q)

D class typical CV (post-20-batch) attractor members
D=4 0.86 - 1.07 (HIGH band) sphere-solver (H2 family) 200 configs (5 hidden Γ— 5 V Γ— 2 depth Γ— 2 n_cross Γ— 2 opt)
D=3 ~0.03 (LOW band) projective-clean on RPΒ² (P-class, originally framed "polynomial") 200 configs
D=2 undefined (V<5 cannot form pentachoron) failed geometric validity 200 configs

200 D=4 configs are theoretically all H2 candidates. Survival of the 6 Adam + 3 LBFGS into Q-sweep top-10 depends on continued_training_potential, which combines convergence-rate-at-20-batches with extrapolated-MSE-at-1000-batches. Q-sweep's full table (Tier 2) gives the actual 1000-batch outcomes for those 10.

TIER 4a β€” P-sweep extrapolation rankings (the top-10 source data)

Each Q-sweep entry started as a P-sweep entry. The "P-MSE" column from group_Q_h2_candidates shows the MSE that ranked it at 20 batches:

Q-rank source P config P-MSE (20 batch) Q-MSE (1000 batch) improvement optimizer regime change
01 h64_V32_D4_dp1_nx0_lbfgs 0.053 0.00421 13Γ— LBFGS clean at 1000 (post-fix)
02 h64_V32_D4_dp0_nx0_adam 0.572 0.00205 279Γ— strongest extrapolation
03 h64_V32_D4_dp0_nx1_adam 0.584 0.00250 234Γ—
04 h64_V32_D4_dp0_nx1_lbfgs 0.041 0.00391 10Γ— LBFGS started low, gained little
05 h64_V16_D4_dp1_nx1_lbfgs 0.115 0.03117 4Γ— V=16 hits H2b ceiling
06 h64_V32_D3_dp1_nx1_adam 0.656 0.02497 26Γ— D=3 P-class
07 h64_V32_D3_dp0_nx1_adam 0.641 0.03151 20Γ— D=3 P-class
08 h64_V32_D4_dp1_nx1_adam 0.620 0.00231 268Γ— h2-64 single-bank arch
09 h64_V32_D3_dp0_nx0_adam 0.638 0.02782 23Γ— D=3 P-class smallest
10 h64_V32_D2_dp0_nx1_adam 0.736 0.16139 5Γ— D=2 β€” failed geometric validity

Reading: Adam configs started P-sweep with high MSE (0.57-0.74) and gained 20-280Γ— by 1000 batches. LBFGS configs started P-sweep with low MSE (0.04-0.12) and gained only 4-13Γ—. This confirms the optimizer regime shift that 000100 logged: LBFGS dominates at short budgets (≀20 batches), Adam dominates at long budgets (β‰₯500 batches). The architectural template (h64_V32_D4 with depth/n_cross combinations) is unchanged across optimizers; only the optimization trajectory differs.

P-sweep coverage gaps

  • 9 NaN configs never re-run with the post-fix trainer (parked open item from 000100). Could surface 9 additional Tier 2 candidates at the LBFGS+depth=1+n_cross=1 architecture frontier.
  • V<32 LBFGS coverage is thin β€” only Q-rank05 represented from 300 LBFGS configs across V ∈ {2, 4, 8, 16, 32}. The full P-sweep contains 60 LBFGS configs at each V; their ranking among each other is not surfaced into the catalog.
  • hidden < 64 LBFGS coverage is sparse β€” Q-sweep's top-10 are all hidden=64. A dedicated lower-hidden LBFGS sub-sweep was never run.

Test of the natural-axis-count hypothesis: V matched to known polytope vertex counts on S^(D-1) should produce static sphere-solver rows (no rotating antipodal frame).

variant V D polytope predicted params
R_h64_V16_D4_16cell_orthoplex_adam 16 4 16-cell (4-orthoplex) H2-LIKE static β€”
R_h64_V8_D4_8cell_or_16cell_subset_adam 8 4 8-cell (tesseract) H2-LIKE static β€”
R_h64_V20_D3_dodecahedron_adam 20 3 dodecahedron H2-LIKE static β€”

Pins: same H2_linear_matched baseline as Q, Adam @ lr=3e-3, depth=0, n_cross=0, 1000 batches, gaussian-only training, 16-noise per-noise test.

Status: trained (in phaseR_reports/ on HF), results not surfaced into the projective-clean catalog yet. Worth probing against the omega-class criterion since natural-axis-count framework predicts they should land cleanly.


TIER 4b β€” R-sweep polytope packing test (3 configs at 1000 batches)

group_R_packed_polytope_test from ablation_configs.py. Predicted-H2-LIKE configs where V is matched to known polytope vertex counts on S^(D-1). Same H2_linear_matched baseline as P/Q, Adam @ lr=3e-3, 1000 batches, gaussian-only training. Ran 2026-04-24 alongside Q-sweep.

variant V D polytope predicted architecturally implies
R_h64_V16_D4_16cell_orthoplex_adam 16 4 16-cell (4-orthoplex) H2-LIKE static rows natural axis count for D=4 = 16
R_h64_V8_D4_8cell_or_16cell_subset_adam 8 4 8-cell (tesseract) H2-LIKE static rows sub-polytope vertex count
R_h64_V20_D3_dodecahedron_adam 20 3 dodecahedron H2-LIKE static rows natural axis count for D=3 = 20

Hypothesis tested: when V matches a known regular polytope vertex count for S^(D-1), training should produce static sphere-solver rows (no antipodal pair rotation). Phil's framing (000100): the 32-row Γ— D=3 G-Class behavior emerged because 32 points cannot be uniformly arranged on SΒ² β€” geometric frustration. Match V to polytope, frustration disappears.

Status: trained, weights in phaseR_reports/ on HF, but codebooks were never extracted and probed against the projective-clean threshold. Worth running through extract_codebook to surface them into the verified-omega-class tier; the natural-axis-count framework (ft2 Β§9.1) predicts they should land cleanly. Open item from 000101.

TIER 5 β€” Phase S D=5 architecture floor map (1600 configs at 20 batches)

axis values count
hidden {4, 8, 16, 32, 64} 5
V {2, 4, 8, 16, 32} 5
D {5} 1
depth {0, 1} 2
n_cross {0, 1} 2
noise_type {0..15} 16
optimizer {Adam} 1 (LBFGS too slow for sweep)

Total: 5 Γ— 5 Γ— 1 Γ— 2 Γ— 2 Γ— 16 Γ— 1 = 1600 runs.

Headline finding (000105): cross-noise rank correlation +0.954. Architectures rank near-identically across all 16 noise types β€” what changes per noise is achievable floor MSE, not which model achieves it. Top-4 universal architectures all hidden=64, V=32.

Top-4 architectures from S analysis (mean rank across 16 noise types):

rank architecture mean rank
1 h64_V32_dp0_nx1_D5 1.1
2 h64_V32_dp1_nx0_D5 (close to 1.1)
3 h64_V32_dp1_nx1_D5 1.9
5 h64_V16_dp1_nx1_D5 (the V<32 entry)

Note: the 1391 individual config directories were lost to HF rate-limiting (87% of submitted commits failed). The 1600-config aggregate JSON survived; per-config artifacts mostly did not. Engineering invariant logged (000108): batch-sync uploads from this point forward.


TIER 6 β€” Phase T D=5 convergence sweep (64 configs at 1000 batches)

Top-4 S architectures Γ— 16 noise types, run at A3-reference budget. The D=5 walk-back (000106) lives here.

arch hidden V depth n_cross optimizer % projective-clean across 16 noises
h64_V16_dp1_nx1 64 16 1 1 Adam 62% (10/16) β€” D=5 sweet spot
h64_V32_dp0_nx1 64 32 0 1 Adam 50% (8/16)
h64_V8_dp1_nx0 64 8 1 0 Adam 25% (4/16)
h64_V32_dp1_nx1 64 32 1 1 Adam 19% (3/16)

Headline: 23/64 (~36%) configs converged within Β±0.05 of uniform RP⁴ baseline. V=16 was the geometric sweet spot at D=5, not V=32 β€” overturning the V=32 universality reading from A3's three runs.

Per-V deviation summary:

V mean dev p25-p75 in band?
8 +0.115 [0.07, 0.18] No
16 +0.040 [0.01, 0.07] Yes (only V whose mean lands inside)
32 +0.057 [0.04, 0.07] Just outside

Salt_pepper anomaly (000106): 100% projective-clean across all 4 archs in Phase T despite being the worst noise to reconstruct in S-sweep (best MSE 2.51, ~34Γ— worse than pink). Geometry decouples from MSE β€” a bank that fails to reconstruct can still produce a clean projective codebook. This matters for downstream cross-bank analysis: "the worst-fitting bank" might still produce the most useful projective representation.

Four_quadrant anomaly: 0% projective-clean across all 4 archs. Spatially structured noise where no architecture in T converged. Open in ft3 Β§10 as a deeper-probe candidate.


TIER 7 β€” A-set verification probes (the 19-model count from ft2)

These are the projective-codebook verification runs that produced the n=19 count cited in ft2's Section 5 table. All entries are individual probes with explicit deviation measurements.

probe model D V n_axes pairs mean projective angle uniform baseline dev result
A0 G-Cand 3 32 22 10 1.011 1.015 βˆ’0.004 PROJECTIVE-CLEAN
A1 H2a 4 32 26 6 1.116 1.114 +0.002 PROJECTIVE-CLEAN
A2 (Γ—16 banks) h2-64 single-noise (banks 0-15) 4 32 24-27 5-8 mean 1.115 1.105 +0.010 mean, Β±0.013 ALL 16 PROJECTIVE-CLEAN
A3 (Γ—3 runs) A3 D=5 (single-arch single-noise) 5 16/32/64 16/29/51 0/3/13 varies varies -0.015 / +0.016 / +0.019 PROJECTIVE-CLEAN at A3, QUALIFIED by Phase T β€” generalization fails

Probe count: 1 (A0) + 1 (A1) + 16 (A2) + 3 (A3) = 21 individual probe runs. ft2 cited "19 models" because the A3 three runs were treated as one architectural data point. Phase T (000106) re-classified A3 as a single-arch test, leaving 17 projective-clean instances at D=3/4 robustness level.


TIER 8 β€” Substrate prototype trained models (this week's runs)

HF path params D V content regime result
AbstractPhil/geolip-SVAE/bintree_proto_v1 57,215 4 32 depth-4 binary tree, i.i.d. Bernoulli Β±1, BFS-encoded PASSTHROUGH best test_mse 3.5e-5 ep 20, 100% bits/trees from ep 1, CV 0.80-1.00, erank 4.00, ratio 1.00
AbstractPhil/geolip-SVAE/sentencepiece_proto_v1 57,215 4 32 t5-base SP token IDs as 16-bit Β±1 floats PASSTHROUGH best test_mse 5.78e-6 ep 18, 100% bits/tokens from ep 1, Ξ±=0.023 throughout, CV 0.85-1.11, erank 4.00, ratio 0.99
AbstractPhil/geolip-SVAE/byte_trigram_proto_v1 57,215 4 32 UTF-8 byte trigrams as RGB pixels at 256Γ—256 ENGAGED best test_mse 1.7e-5 ep 19, 83.9% byte / 61.3% trigram from 0% floor, Ξ± 0.024β†’0.043, CV left band ep 6, ratio 1.07, erank 3.9955 dip

The bintree + SP-bit pair establishes the passthrough control. Byte-trigram is the first text-engaged omega-class candidate but its codebook hasn't been formally probed against the projective threshold yet. Pending follow-up: byte_trigram_proto_128 (img_size decision pending, 100M sample-view target).


TIER 9 β€” Architectural templates with no measured instance

Documented architectures the catalog does not yet cover. Each is omega-class eligible under the architecture criterion but lacks a verification probe.

template arch params D V status what's needed
Johanna D=16 PatchSVAE-F 8.7M (estimate) 16 256 not yet U5-tested run extract_codebook on Johanna checkpoint, compute deviation from uniform RP^15
Grandmaster omega tokens concept β€” β€” β€” paper-level reference needs trained instance + verification
geolip-svae-nosvd-ablation repo svd_mode='none' variants varies varies varies independent repo, omega verification not surfaced into main catalog inventory the trained checkpoints, run U5 across them
D=6, D=7, D=8 with V matched to natural axis count predicted by Phase T framework β€” 6/7/8 ~22/28/34 (predicted) not run sweep with natural-axis-count V matching

TIER 10 β€” Explicitly excluded from omega-class

exclusion reason source
D=2 configs Cannot form pentachoron (needs β‰₯5 points), CV undefined Q-rank 10
Q-rank 10 (h64_V32_D2_dp0_nx1_adam) D=2, MSE 0.16 = essentially failed reconstruction Q-sweep
9 P-sweep NaN configs LBFGS Hessian-corruption casualties (000099 bug profile) P-sweep, never re-run
bintree_proto_v1 Passthrough regime, codebook not engaged 000111
sentencepiece_proto_v1 Passthrough regime, cross-attn idle 000112
Phase T D=5 V=32 cells (most) V over-counted vs natural axis count ~16, fails projective-clean 000106
Phase T D=5 four_quadrant (all 4 archs) 0% projective-clean, spatial-structured noise 000106
ablation Group H, M, L SVD-removal variants Spectrum-degenerate; not sphere-solvers in proper sense Phase 1 ablation

Smallest-instance benchmarks across the catalog

For when minimal-parameter operation matters:

tier smallest config params D V MSE regime use case
absolute smallest projective-clean Q-rank09 (P-class) 28,899 3 32 0.02782 LOW-band projective-clean on RPΒ² minimum-parameter omega
smallest H2a (canonical sphere-solver) Q-rank02 40,227 4 32 0.00205 HIGH-band sphere-solver on RPΒ³ canonical sphere-solver baseline
Phase T D=5 V=16 sweet spot h64_V16_dp1_nx1_D5 ~36,607 5 16 varies D=5 partial projective-clean D=5 representative
h2-64 single bank (production) bank_idx 0..63 57,215 4 32 varies per-noise sphere-solver bank-level training composition

The 28,899-param P-class candidate (Q-rank09) is the absolute floor for projective-clean. The 40,227-param H2a (Q-rank02) is the floor for canonical sphere-solver behavior at D=4. Under 30K and under 41K respectively.


What's missing from this catalog

  1. byte_trigram_proto_v1 codebook investigation β€” engagement signature confirmed in trajectory, but extract_codebook against the trained checkpoint hasn't been run for the formal projective-clean verification. Tier 1 entry should move to "verified" once this is done.
  2. byte_trigram_proto_128 β€” pending img_size=64-vs-128 decision + 100M-sample-view run completion.
  3. 9 P-sweep NaN re-runs β€” never executed with the LBFGS-fixed trainer (parked open item from 000100). Could surface 9 additional Tier 2 candidates.
  4. R-sweep results probed against projective-clean criterion β€” the polytope-packing predictions (16-cell, 8-cell, dodecahedron) trained but their codebooks weren't surfaced into the catalog.
  5. Johanna D=16 verification β€” the only large-D representative of the noise-substrate line; not yet probed.
  6. Cross-substrate kNN graph β€” bintree, SP-bit, byte-trigram, h2-64-noise codebook-similarity matrix. The "what survives the universal-substrate-hope death" finding (000111) requires this measurement.
  7. Disproof candidates for Omega β€” the methodological pivot from 000108 demands negative-result candidates the catalog doesn't yet contain (non-spherical bottleneck variants, no-spatial-coherence content, byte-misaligned content).