Omega-class battery potentials β exhaustive catalog
Compiled 2026-04-29 from full scratchpad survey across sessions 000080-000113. Includes every documented config across all sweeps (P/Q/R/S/T plus the A-set verification probes), every published HF model, and the trained substrate prototypes from this week. Both Adam and LBFGS candidates are listed where the sweep ran both.
Definition of omega-class
A trained or candidate battery is omega-class if it satisfies all four:
- Sphere-solver architecture: PatchSVAE with sphere-norm M tensor, V rows on S^(D-1), output basis on βP^(D-1)
- Projective-clean codebook: |deviation from uniform RP^(D-1) baseline| < 0.05, secondary antipodal pair count β€ 3, axis utilization > 0.95
- In its natural CV band for arch class (table below)
- Codebook-engaged or sphere-engaged: cross-attn coupling moves off floor, recovery curve from random init, geometric stats leave passthrough signature
A "potential" is any trained instance OR sweep candidate that has run through the architecture and produced measurements against the criterion.
Natural CV bands by architecture class
| arch class | V | D | natural CV band | source |
|---|---|---|---|---|
| noise-substrate (Freckles, Johanna) | 256 | 16 | 0.20-0.23 (sweet spot), 0.13-0.30 (band) | ft1 attractor, ablation Phase 1 |
| h2-class | 32 | 4 | 0.80-1.05 | h2-64 measurement, 000111-112 |
| P-class | 32 | 3 | 0.029-0.036 (LOW band) | Q-sweep ranks 06/07/09 |
| Phase T D=5 V=16 | 16 | 5 | ~0.04 mean dev | Phase T sweet spot |
| Phase T D=5 V=32 | 32 | 5 | varies by noise (partial) | Phase T (qualified ft2 D=5 claim) |
Diagnostic signature (for testing any candidate)
| dimension | passthrough | engaged (omega-class) |
|---|---|---|
| Ξ± (cross-attn coupling) | stationary [0.020, 0.030] | rising monotonically off floor |
| Ξ±_std/Ξ±_mean | flat ~0.06 | climbs (>0.30 in byte-trigram) |
| row_cv | in arch's natural band | leaves natural band |
| ratio S0/SD | β 1.0 (flat spectrum) | drifts (>1.05) |
| erank | flat at full rank | dips below |
| recovery from random init | near-100% from ep 1 (sign-recovery trivial) | curve from ~0% upward |
| codebook deviation | undefined / not measured | within Β±0.05 of uniform RP^(D-1) |
The load-bearing stack (non-negotiable for any recipe)
Architectural primitives that make the omega regime exist. If any row is absent, the configuration is not an omega recipe β these are not tunable hyperparameters, they are the substrate.
| primitive | code | breaks if removed |
|---|---|---|
| Sphere-norm M rows | M = F.normalize(M, dim=-1) before SVD/readout |
scale-explosion snap (G2/G4 confirm dev +0.16-0.36 across all bands) |
| fp64 SVD path | _svd_fp64 autocast-disabled, fp64 Gram + sqrt + U-recovery, floors 1e-24/1e-16 |
discharge spikes, max_grad >1000 (paste 019) |
| Orthogonal init on encoder readout | nn.init.orthogonal_(enc_out.weight), re-applied after L-group overrides |
L2/L3/L4 init variants regress |
| Bounded-Ξ± cross-attn | S_out = S Β· (1 + Ξ± Β· tanh(...)), Ξ± β€ max_Ξ± via sigmoid |
unbounded-Ξ± (I4) poisons spectrum |
| BoundarySmooth zero-init conv | identity-at-init smoothing post-stitch | boundary inconsistency under tile stitching |
| Pure Adam (NEVER AdamW) | torch.optim.Adam |
weight decay fights geometric structure |
| No BatchNorm, no Dropout | β | spectral structure destabilizes |
| No global average pool | flatten or spatial statistics | accuracy 70% β 29% in geometric encoders |
| Patch-based forward | all components per-patch except cross-attn | breaks resolution invariance by design |
The recipes
Each recipe is a named configuration with verified or predicted omega signature. Empirical instances are linked to the tier inventory below.
Recipe A-class β Johanna / Fresnel (workhorse)
matrix_v=256, D=16, patch_size=16, hidden=768, depth=4, n_cross_layers=2
~17M params, full SVD path (svd_mode='svd_fp64')
optimizer=Adam, lr=3e-4
- Omega clean at scale; teachable downstream as a single instance.
- Cost: ~30GB VRAM, ~4hr/epoch. Not stackable.
- Verified: Tier 1
v50_fresnel_64. Pending: Tier 9 Johanna D=16 codebook probe. - Use when: workhorse encoder; downstream needs single-instance legibility.
Recipe S-class β Freckles (the crown)
matrix_v=48, D=4, patch_size=4, hidden=384, depth=4, n_cross_layers=2
~2.55M params, full SVD path, all defenses stacked
optimizer=Adam, lr=3e-4
- Perfect noise reconstruction (val MSE 5e-6 reporting floor on 16-noise mixture).
- Resolution-invariant by construction β weights at N=256 work at N=4096 (architectural, not empirical).
- Fails to teach as single instance. Requires the full Freckles array for legibility.
- Verified: Tier 1
v40(64Γ64),v41(256Γ256). - Use when: noise/texture substrate; downstream consumes the array.
Recipe H2-class β sphere-solver (h2-64 single bank)
matrix_v=32, D=4, patch_size=4, hidden=64, depth=1, n_cross_layers=1
linear_readout=True, svd_mode='none', match_params=True, smooth_mid=16
~57,215 params per bank, optimizer=Adam, lr=3e-3
- The canonical sphere-solver. SVD replaced by learned linear readout; column norms = S, identity = Vt.
- Projective-clean at D=4: 24-27 axes/bank, dev +0.010 Β±0.013 across 16 single-noise banks (A2 probe).
- Stackable into 192-bank arrays; teaches via per-bank MSE signature.
- Verified: Tier 1a (full 192-bank h2-64 array). Q-rank08 is the exact same architecture.
- Use when: small-scale, multi-bank, codebook-engaged regime.
Identity invariants for loading h2-64 weights (silent partial-load otherwise):
smooth_mid=16(NOT the ps-dependent default 8 at ps<16 β 440 params missing if wrong).linear_readout=True,svd_mode='none',match_params=True.n_headsdivides D β at D=4 withn_heads=4, head_dim=1.
Recipe H2a β minimum H2-class (Q-rank02)
matrix_v=32, D=4, patch_size=4, hidden=64, depth=0, n_cross_layers=0
linear_readout=True, svd_mode='none', match_params=True
~40,227 params, optimizer=Adam, lr=3e-3
- Minimum sphere-solver: depth+n_cross stripped to zero.
- Q-MSE 0.00205 at 1000 batches β best Adam in Q-sweep.
- Use when: floor-finding the sphere-solver capacity envelope at D=4.
Recipe P-class β minimum projective-clean (Q-rank09)
matrix_v=32, D=3, patch_size=4, hidden=64, depth=0, n_cross_layers=0
linear_readout=True, svd_mode='none', match_params=True
~28,899 params, optimizer=Adam, lr=3e-3
- Smallest projective-clean omega-class instance in the catalog.
- LOW-band CV (~0.03), projective-clean on RPΒ², 22 axes (10 pairs + 12 unpaired).
- Use when: minimum parameter budget; D=3 RPΒ² regime is acceptable.
Recipe F-class β experimental nursery
matrix_v β {32, 48, 64}, D β {2, 4, 8, 16}, patch_size β {8, 16}
hidden β {32, 64, 128}, depth β {1, 2}, n_cross_layers β {1, 2}
2K β 645K params per config, 1 epoch Γ 1M samples (triage regime)
optimizer=Adam, lr=3e-4, soft-hand=CV-EMA prox boost (gradient-free)
- Designed to fail often. Collapse is a data point, not a bug.
- Triage signal (1ep Γ 1M) β‘ 30ep Γ 200K for keep/kill discrimination at 1/5 wallclock.
- Soft-hand:
loss = (1 + boost Β· prox) Β· recon_loss; prox is Gaussian oncurrent_cvvs CV-EMA. No penalty term. - Use when: searching for new viable templates; running large sweeps cheaply.
Recipe T β D=5 sweet spot (Phase T)
matrix_v=16, D=5, hidden=64, depth=1, n_cross_layers=1
linear_readout=True, svd_mode='none', match_params=True
optimizer=Adam, lr=3e-3, 1000 batches
- 62% projective-clean across 16 noise types β only V whose mean dev lands in band at D=5.
- V=32 fails the cross-noise universality test at D=5 (Tier 6 supersedes A3's V=32 reading).
- Use when: extending recipes to D=5 with realistic projective-clean expectation.
Recipe R β polytope-packing (predicted, not yet probed)
(V, D) β {(16, 4) 16-cell, (8, 4) 8-cell, (20, 3) dodecahedron}
hidden=64, depth=0, n_cross_layers=0
linear_readout=True, svd_mode='none', match_params=True
optimizer=Adam, lr=3e-3, 1000 batches
- Hypothesis: V matched to a known regular polytope vertex count for S^(D-1) β static rows, no antipodal pair rotation. Geometric frustration disappears.
- Trained, weights in
phaseR_reports/on HF; codebooks not yet extracted. - Use when: testing the natural-axis-count framework. Open item from 000101.
Recipe J5 (LOW-band, unverified)
matrix_v=128, D=16, patch_size=16, hidden=128, depth=1, n_cross_layers=1
~1.46M params estimated, optimizer=Adam
- Strongest unverified noise-substrate candidate. LOW-band MSE 0.9595 with dev exactly 0.000 against uniform RPΒΉβ΅ baseline β best of any non-H-group entry at LOW band.
- Worth a U5-style projective probe before committing to long-horizon training.
The substrate layer (input encoding rules)
Architecture and encoding are separate hypotheses. Passthrough on one encoding does not invalidate the architecture (lesson from 000113: SP-bit passthrough on h2-class β byte-trigram engaged on the same arch).
Per-patch capacity arithmetic
A (C, ps, ps) patch on the unit sphere supports:
- 4Γ4 RGB β 1M discriminable positions on SΒ³ at 1Β° resolution.
- The encoding's information cardinality must approach this for the codebook to do compression work.
- Hard zeros are wasted capacity β every float in every patch should carry signal.
Engagement vs passthrough
The full diagnostic table is the "Diagnostic signature" section above. Shorthand:
- Engaged: Ξ± rises monotonically; row_cv leaves natural class band; ratio S0/SD drifts; erank dips below D; recovery curve climbs from low.
- Passthrough: Ξ± stationary at init; row_cv in band; ratio β 1.0; erank flat at D; recovery near-100% from ep 1.
Substrate pass/fail history
| substrate | architecture probed | result | reason |
|---|---|---|---|
16-type OmegaNoiseDataset |
A, S, F, h2 | Engaged | Original substrate; full per-patch density |
| ImageNet random crops (sublens) | A (Fresnel-64 v50) | Engaged | Natural image structure; full RGB density |
Byte-trigram (R,G,B) (000113) |
h2 | Engaged | 256Β³ cardinality per cell, every float carries signal |
| Sentencepiece bit content (000112) | h2 | Passthrough | 1/3-filled patch with 32 hard-zero paddings β model bored |
| Binary-tree i.i.d. Bernoulli (000111) | h2 | Passthrough | Sign-only signal; trivial under linear_readout=True |
Substrate design rules
- Every float in every patch should carry signal. Hard zeros = wasted capacity.
- Information cardinality per cell should approach 256Β³ for RGB (or the equivalent for non-RGB channel counts).
- Spatial coherence within the patch matters β noise/image/byte-trigram all have it; bit-encodings of token IDs do not.
- Test before training: compute the patch-space cardinality of your encoding. If it's orders of magnitude below per-patch capacity, expect passthrough.
The triage protocol (verifying a candidate is potentially an omega)
Run in order. Stop at first failure. Most candidates die at step 3 or 4.
Step 1. Architecture check (free, no training)
- Inspect kwargs against the load-bearing stack. Missing items disqualify before any compute is spent.
- Run the "debug move" from
CLAUDE.md: instantiate, printstate_dictkeys + shapes, compare against a reference checkpoint. Mismatchedsmooth_mid,n_heads,linear_readout,svd_mode,match_paramscause silent partial-loads understrict=False.
Step 2. Substrate check (free, no training)
- Compute per-patch information cardinality of the input encoding.
- If cardinality << per-patch capacity, expect passthrough β fix the encoding before training.
Step 3. 1-epoch triage (~minutes per config on a single GPU)
- 1 epoch Γ 1M samples, gaussian-only training (
noise_types=[0]). - Score on 16-noise per-noise generalization (256 samples per noise).
- Pass: spectrum bounded (no scale-explosion snap); MSE within 2-3Γ class baseline.
- Fail: NaN / divergence / snap event / MSE order-of-magnitude worse than class.
Step 4. 1000-batch convergence sweep (~hour per config on a single GPU)
- Full optimizer trajectory at lr=3e-3 (Adam) or lr=1.0 (LBFGS post-fix).
- Pass: MSE leadership in its band; CV stays in natural band; Ξ± trajectory shows engagement signature.
- Optimizer regime (000100): Adam dominates β₯500 batches; LBFGS niche is short-budget probing (β€100 batches).
Step 5. Codebook extraction (the omega ratification)
from geolip_svae.inference import (
InferenceEngine, extract_codebook, make_calibration,
)
calib = make_calibration('sixteen_noise', n=64, size=64)
cb = extract_codebook(
model, calib,
model_id='...', calibration_name='sixteen_noise',
)
assert cb.is_projective_clean() # |dev from uniform RP^(D-1)| < 0.05
assert abs(cb.deviation()) < 0.05
- Pass: projective-clean, axis utilization > 0.95, β€3 secondary antipodal pairs.
- The candidate is now ratified as omega-class.
Step 6. Vacuum-seal test (cell deployability)
- Freeze candidate parameters (
requires_grad=False). - Train an adapter classifier around it on a downstream task.
- Pass: CV / erank / S0 stay locked under host gradient stress; classifier learns.
- Fail: geometry collapses β not deployable as a cell. (CE on cell internals "ripped the internals to shreds" β paste 023.)
Step 7. Bandwidth probe (downstream legibility)
- Linear head on omega outputs, downstream task.
- Pass: classifier achieves task-meaningful performance.
- Fail: omega outputs lack representational bandwidth at this scale (S-class symptom).
A candidate that passes 1β7 is a deployable omega cell ready for collective integration.
Decision tree: which recipe?
| Goal | Recipe | Why |
|---|---|---|
| Stable workhorse encoder, single-instance teachable | A-class (Johanna/Fresnel) | Omega clean at scale; teaches downstream |
| Best noise reconstruction, multi-bank deployment | S-class (Freckles) | Perfect recon; needs full array for legibility |
| Small-scale, multi-bank ensemble, text/vision substrate | H2-class (h2-64) | 57K params/bank, projective-clean, stackable |
| Minimum sphere-solver footprint at D=4 | H2a (Q-rank02) | 40K params, depth=0, n_cross=0, MSE 0.00205 |
| Minimum projective-clean footprint at D=3 | P-class (Q-rank09) | 28.9K params, RPΒ², MSE 0.028 |
| Search for new templates cheaply | F-class triage | 1ep Γ 1M, designed to fail often |
| D=5 representative | T (V=16) | Only V whose mean dev lands in band at D=5 |
| Test natural-axis-count hypothesis | R polytope-packing | Predicts static-row H2-LIKE; codebooks unprobed |
| LOW-band scale-up (unverified) | J5 | dev exactly 0.000 vs uniform RPΒΉβ΅ baseline |
Empirical catalog (trained and in-progress instances)
The sections below are the empirical record backing the recipes above. Each tier represents a class of trained or in-progress instances with measured signatures.
TIER 1 β Trained, verified omega-class on HuggingFace
| HF path | arch class | params | D | V | optimizer | natural CV | n_axes | dev | training content | verified by |
|---|---|---|---|---|---|---|---|---|---|---|
AbstractPhil/geolip-SVAE/v40 (Freckles 64Γ64) |
Freckles | 2,557,539 | 4 | 48 | Adam | ~0.20-0.23 | β | <0.05 | 16-noise mixture | ft1, U5 cross-band |
AbstractPhil/geolip-SVAE/v41 (Freckles 256Γ256) |
Freckles | 2,557,539 | 4 | 48 | Adam | same | β | β | resolution-scaled v40 | continuation of v40 |
AbstractPhil/geolip-SVAE/v50_fresnel_64 |
Fresnel-base | 16,942,419 | 4 | β | Adam | β | β | β | 140M+ ImageNet random crops, sublens | streaming run, "phenomenal MSE recon" |
AbstractPhil/geolip-svae-h2-64 (192-bank array) |
H2_linear_matched | 57,215 Γ 64 batteries Γ 3 phases | 4 | 32 | Adam | 0.80-1.05 | 24-27/bank | +0.010 mean, Β±0.013 std | per-bank, see Tier 1a | A2 probe (ft2), 192-bank cosine sweep (000109) |
AbstractPhil/geolip-svae-implicit-solver-experiments/G-Cand |
H2-class | 28,899-45,852 | 3 | 32 | Adam | 0.03 (LOW) | 22 (10 pairs + 12 unpaired) | β0.004 | gaussian-only | A0 probe (ft2) |
AbstractPhil/geolip-svae-implicit-solver-experiments/H2a |
H2_linear_matched | 40,227-57,215 | 4 | 32 | Adam | 0.80-1.10 | 26 (6 pairs + 20 unpaired) | +0.002 | gaussian-only | A1 probe (ft2) |
AbstractPhil/geolip-svae-implicit-solver-experiments/A3 (3 runs, qualified) |
H2 variant | varies | 5 | 16/32/64 | Adam | varies | 16/29/51 | β0.015 / +0.016 / +0.019 | gaussian-only single-arch | A3 probe (ft2). QUALIFIED by 000106 β single-arch single-noise. Phase T showed D=5 partial. |
AbstractPhil/geolip-SVAE/byte_trigram_proto_v1 |
h2-class | 57,215 | 4 | 32 | Adam | left band ep 6 β 1.31 | TBD | TBD | wikitext-103 byte trigrams | 000113 β engaged signature confirmed; codebook investigation pending |
TIER 1a β h2-64 array decomposition (192 banks = 64 batteries Γ 3 phases)
All 16 single-noise experts (Group 1) verified PROJECTIVE-CLEAN in A2. Remaining 48 batteries share architecture and were measured in 192-bank cosine sweep but not individually probed against projective threshold.
Group 1 β 16 single-noise experts (banks 0-15)
| bank | noise type | special | training distribution |
|---|---|---|---|
| 0 | gaussian | universal-pull center | standard normal noise |
| 1 | uniform | β | uniform[-1,1] noise |
| 2 | uniform_scaled | β | scaled uniform |
| 3 | poisson | β | poisson-distributed |
| 4 | pink | clone-pair with bank 5 | 1/f spectrum |
| 5 | brown | clone-pair with bank 4 | 1/fΒ² spectrum |
| 6 | salt_pepper | hardest reconstruction (S-sweep), cleanest projective at D=5 (Phase T) | impulse-style |
| 7 | sparse_impulses | cosine outlier β heavy-tailed | sparse extreme-value |
| 8 | block_upsampled | β | block-correlated |
| 9 | gradient_gaussian | most isolated battery on SΒ³ β only non-stationary noise | spatial gradient |
| 10 | checker | structured noise | checkerboard |
| 11 | gauss_uniform_mix | β | mixture |
| 12 | four_quadrant | 0% projective-clean across all Phase T archs | structured |
| 13 | cauchy | heavy-tailed | cauchy-distributed |
| 14 | exponential | β | exponential |
| 15 | laplace | heavy-tailed | laplace-distributed |
Group 2 β gaussian+one pairs + generalist (banks 16-31)
15 pair-trained banks (gaussian + each of noises 1-15), plus 1 generalist trained on all 16. Notable: pairs (19, 20) = (gaussian+pink, gaussian+brown) are clone-pair on SΒ³ for the same noise-family adjacency reason as (4, 5). Pair (16, 26) = (gaussian+uniform, gaussian+gauss_uniform_mix) is a clone-pair (gaussian dominates the residual; both bounded distributions).
Group 3 β gaussian-balanced quads (banks 32-47)
16 (gaussian, easy, medium, hard) covers via stride-7 deterministic enumeration over the EASY (uniform/uniform_scaled/cauchy/exponential/laplace) Γ MEDIUM (poisson/salt_pepper/sparse_impulses/gauss_uniform_mix) Γ HARD (pink/brown/block_upsampled/gradient_gaussian/checker/four_quadrant) product. All contain gaussian.
Group 4 β no-gaussian quads (banks 48-63)
16 (easy, medium, hard, hard) covers via stride-19 deterministic enumeration. No gaussian seen during training. Banks 48 and 51 are kNN top-5 outliers because they solve the sphere problem from a fundamentally different starting distribution than Groups 1-3 (which all see gaussian as universal pull-toward-interior).
TIER 2 β Q-sweep candidates (10 top-P configs Γ 1000 batches)
After the LBFGS Hessian-corruption fix (000099), the Q-sweep ran clean on 10/10 configs. All 10 are listed; both Adam and LBFGS variants where present.
| Q-rank | variant | params | optim | G-MSE | CV | D | V | depth | n_cross | class | notes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 01 | Q_rank01_h64_V32_D4_dp1_nx0_lbfgs | 57,123 | LBFGS | 0.00421 | 0.954 | 4 | 32 | 1 | 0 | H2a | LBFGS Q-sweep best |
| 02 | Q_rank02_h64_V32_D4_dp0_nx0_adam | 40,227 | Adam | 0.00205 | 0.862 | 4 | 32 | 0 | 0 | H2a | Smallest H2a, canonical sphere-solver. β 1 h2-64 bank's capacity at 70% the params. |
| 03 | Q_rank03_h64_V32_D4_dp0_nx1_adam | 40,319 | Adam | 0.00250 | 0.890 | 4 | 32 | 0 | 1 | H2a | Adam-vs-LBFGS twin of rank 04 |
| 04 | Q_rank04_h64_V32_D4_dp0_nx1_lbfgs | 40,319 | LBFGS | 0.00391 | 0.893 | 4 | 32 | 0 | 1 | H2a | LBFGS twin of rank 03; Adam wins 36% lower MSE |
| 05 | Q_rank05_h64_V16_D4_dp1_nx1_lbfgs | 36,607 | LBFGS | 0.03117 | 1.069 | 4 | 16 | 1 | 1 | H2b | Only V=16 candidate; CV slightly above HIGH ceiling. Underexplored size class. |
| 06 | Q_rank06_h64_V32_D3_dp1_nx1_adam | 45,852 | Adam | 0.02497 | 0.029 | 3 | 32 | 1 | 1 | P-class (D=3) | LOW-band; originally framed "polynomial," now confirmed projective-clean on RPΒ² |
| 07 | Q_rank07_h64_V32_D3_dp0_nx1_adam | 28,956 | Adam | 0.03151 | 0.036 | 3 | 32 | 0 | 1 | P-class (D=3) | Smaller P-class variant |
| 08 | Q_rank08_h64_V32_D4_dp1_nx1_adam | 57,215 | Adam | 0.00231 | 0.960 | 4 | 32 | 1 | 1 | H2a | Exact h2-64 single-bank arch (depth=1+n_cross=1) |
| 09 | Q_rank09_h64_V32_D3_dp0_nx0_adam | 28,899 | Adam | 0.02782 | 0.035 | 3 | 32 | 0 | 0 | P-class (D=3) | Smallest projective-clean omega-class candidate. Under 30K params. 30% smaller than H2a at ~14Γ MSE cost. |
| 10 | Q_rank10_h64_V32_D2_dp0_nx1_adam | 19,649 | Adam | 0.16139 | 0.000 | 2 | 32 | 0 | 1 | EXCLUDED | D=2 cannot form pentachoron (needs β₯5 points), CV undefined |
Per-architecture optimizer comparison (where direct twin exists):
| arch | Adam Q-rank | Adam MSE | LBFGS Q-rank | LBFGS MSE | winner |
|---|---|---|---|---|---|
| h64_V32_D4_dp0_nx0 | 02 | 0.00205 | (no twin) | β | Adam |
| h64_V32_D4_dp0_nx1 | 03 | 0.00250 | 04 | 0.00391 | Adam by 36% |
| h64_V32_D4_dp1_nx0 | (no twin) | β | 01 | 0.00421 | LBFGS only |
| h64_V32_D4_dp1_nx1 | 08 | 0.00231 | (no twin) | β | Adam |
Optimizer regime guidance (000100, post-LBFGS-fix): Adam @ lr=3e-3 dominates at 1000-batch budgets. LBFGS retains niche for short-budget probing (β€100 batches) and floor-finding sweeps. For sphere-solver canonical training: Adam is the recommended default since 2026-04-24.
TIER 3 β Phase 1 ablation grid (233 configs across 15 groups Γ 3 bands Γ seeds)
The Phase 1 ablation predates the P-sweep. A through M groups, each varying one architectural axis at a time, all three CV bands (LOW D=16, MID D=8, HIGH D=4), seed-replicated where statistically meaningful. 233 total entries; 227 band-match (97%); 229 params-finite (98%); 223 valid (band-match + finite). Data from omega_inventory.csv. Columns: D / V / hidden / depth / patch_size = dp/ps / n_seeds / mean test MSE / mean CV / band deviation (observed_sphere_cv β uniform_RP^(D-1)_baseline) / n_finite seeds.
HIGH band (D=4, V=32, hidden=64, ps=4) β sphere-solver candidates
| group | variant | n | mse | cv | dev | fin | notes |
|---|---|---|---|---|---|---|---|
| A | baseline | 5 | 1.4591 | 0.902 | +0.018 | 5 | seed-replicated reference |
| B | B1_all16 | 1 | 1.4709 | 0.858 | +0.021 | 1 | 16-noise mixture |
| B | B2_gaussian_only | 1 | 0.9590 | 0.753 | +0.027 | 1 | gaussian-only train |
| B | B3_structured | 1 | 1.7076 | 0.968 | β0.036 | 1 | structured-noise train |
| B | B4_heavy_tailed | 1 | 1.7982 | 0.820 | +0.064 | 1 | heavy-tailed train |
| B | B5_first_half | 1 | 1.4395 | 0.944 | +0.025 | 1 | noises 0-7 |
| B | B6_even_indices | 1 | 1.9918 | 0.880 | +0.023 | 1 | noises 0,2,4,...,14 |
| C | C1_adam | 1 | 1.4667 | 0.860 | +0.022 | 1 | Adam @ default lr |
| C | C2_sgd | 1 | 1.5670 | 0.858 | +0.019 | 1 | plain SGD |
| C | C3_sgd_momentum | 1 | 1.2006 | 0.860 | +0.013 | 1 | SGD with momentum |
| C | C4_adamw | 1 | 1.4706 | 0.859 | +0.023 | 1 | AdamW |
| C | C5_lbfgs | 1 | nan | 0.943 | β0.923 | 1 | LBFGS Hessian-corruption β pre-fix bug |
| D | D1_cosine | 1 | 1.4712 | 0.858 | +0.020 | 1 | cosine LR |
| D | D2_constant | 1 | 1.3028 | 0.858 | +0.028 | 1 | constant LR |
| D | D3_linear_decay | 1 | 1.4652 | 0.858 | +0.021 | 1 | linear decay |
| D | D4_warm_restart | 1 | 1.3216 | 0.858 | +0.016 | 1 | warm restart |
| D | D5_one_cycle | 1 | 1.4656 | 0.858 | +0.019 | 1 | one-cycle LR |
| E | E1_full_softhand | 3 | 0.4420 | 0.907 | +0.026 | 3 | soft-hand CV regularizer |
| E | E2_pure_mse | 3 | 0.4692 | 0.904 | +0.037 | 3 | pure MSE, no regularizer |
| E | E3_measure_only | 3 | 0.4517 | 0.908 | β0.011 | 3 | measure CV but don't regularize |
| E | E4_hard_cv_penalty | 3 | 0.4504 | 0.902 | +0.002 | 3 | hard CV penalty |
| F | F1_gelu | 1 | 1.4669 | 0.858 | +0.022 | 1 | GELU activation |
| F | F2_relu | 1 | 1.4744 | 0.855 | +0.025 | 1 | ReLU |
| F | F3_silu | 1 | 1.4702 | 0.901 | +0.022 | 1 | SiLU |
| F | F4_tanh | 1 | 1.4550 | 0.806 | +0.027 | 1 | tanh |
| F | F5_identity | 1 | 1.4522 | 0.866 | +0.031 | 1 | identity (no activation) |
| G | G1_sphere_norm | 1 | 1.4668 | 0.858 | +0.022 | 1 | reference (sphere-norm M) |
| G | G2_no_norm | 1 | 1.4680 | 1.112 | +0.357 | 1 | REMOVES sphere-norm β geometry breaks |
| G | G3_layer_norm | 1 | 1.4482 | 0.701 | β0.217 | 1 | layer-norm instead β geometry breaks |
| G | G4_scale_only | 1 | 1.4588 | 1.112 | +0.359 | 1 | scale-only β geometry breaks |
| H | H1_svd_fp64 | 3 | 0.4198 | 0.906 | +0.006 | 3 | full-fp64 SVD reference |
| H | H2_linear_matched | 3 | 0.0456 | 0.908 | +0.005 | 3 | CANONICAL β H2-class baseline. 9Γ lower MSE than H1 fp64. |
| H | H3_linear_unmatched | 3 | 0.0948 | 0.911 | +0.001 | 3 | linear readout, mismatched dims |
| H | H5_batch_shared_svd | 2 | 0.2799 | 0.918 | +0.056 | 2 | shared-SVD across batch |
| H | H6_no_svd_direct | 1 | 0.4606 | 0.902 | +0.085 | 1 | direct readout, no SVD |
| I | I1_1layer | 1 | 1.4733 | 0.859 | +0.023 | 1 | 1 cross-attn layer |
| I | I2_0layers | 1 | 1.8687 | 0.910 | +0.087 | 1 | 0 cross-attn layers |
| I | I3_2layers | 1 | 1.6295 | 0.924 | β0.002 | 1 | 2 cross-attn layers |
| I | I4_unbounded_alpha | 1 | 1.4803 | 0.859 | +0.022 | 1 | no clip on Ξ± |
| K | K1_bs128 | 1 | 1.4653 | 0.858 | +0.021 | 1 | batch_size=128 |
| K | K2_bs32 | 1 | 1.4276 | 0.858 | +0.024 | 1 | batch_size=32 |
| K | K3_bs512 | 1 | 1.5732 | 0.859 | +0.024 | 1 | batch_size=512 |
| K | K4_bs1024 | 1 | 1.5428 | 0.859 | +0.017 | 1 | batch_size=1024 |
| L | L1_orthogonal | 1 | 1.4802 | 0.858 | +0.018 | 1 | orthogonal init |
| L | L2_kaiming | 1 | 2.4802 | 0.912 | β0.009 | 1 | Kaiming init |
| L | L3_xavier | 1 | 1.6138 | 0.952 | +0.025 | 1 | Xavier init |
| L | L4_normal_small | 1 | 1.5360 | 0.942 | β0.019 | 1 | normal init small std |
| L2 | L2_lbfgs_pure_mse | 3 | 0.0058 | 0.878 | β0.605 | 1 | LBFGS+pure MSE: BEST MSE ON HIGH but only 1/3 finite, geometry broken |
| M | M1_sgd_aggressive | 1 | 1.0770 | 0.858 | +0.009 | 1 | SGD aggressive |
| M | M2_sgd_huge_lr | 1 | 0.6798 | 0.858 | +0.052 | 1 | SGD huge lr |
| M | M3_sgd_high_momentum | 1 | 1.2939 | 0.859 | +0.013 | 1 | SGD high momentum |
| E_preview | E1-E4 (4 variants Γ 1 seed each) | 4 | ~1.47 | ~0.858 | +0.020 to +0.028 | 4 | 1-seed preview of Group E |
HIGH-band omega-class candidates (band-match + finite + |dev|<0.05): 49 of 51 HIGH entries. The two failures: G2_no_norm (+0.357 β confirms sphere-norm is non-negotiable for omega-class) and G4_scale_only (+0.359 β same finding). G3_layer_norm at β0.217 is the third sphere-norm-disruption case. C5_lbfgs is the pre-fix Hessian-corruption casualty.
HIGH-band MSE leader: H2_linear_matched at MSE 0.0456 with dev +0.005. This is the canonical sphere-solver template that h2-64 was built from β 9Γ lower MSE than the fp64-SVD baseline (H1) at the same dev tolerance. The L2_lbfgs_pure_mse achieves MSE 0.0058 (8Γ better still) but only 1 of 3 seeds converged finite and geometry deviation hits β0.605, so it's not omega-class despite the MSE win.
MID band (D=8, V=64, hidden=64, ps=16) β bulk-Gaussian attractor
| group | variant | n | mse | cv | dev | fin |
|---|---|---|---|---|---|---|
| A | baseline | 5 | 1.0115 | 0.359 | +0.005 | 5 |
| B | B1_all16 | 1 | 1.0088 | 0.380 | β0.004 | 1 |
| B | B2_gaussian_only | 1 | 0.9530 | 0.369 | β0.017 | 1 |
| B | B3_structured | 1 | 0.7076 | 0.347 | β0.007 | 1 |
| B | B4_heavy_tailed | 1 | 1.6415 | 0.352 | +0.007 | 1 |
| B | B5_first_half | 1 | 0.9441 | 0.380 | β0.004 | 1 |
| B | B6_even_indices | 1 | 1.1708 | 0.352 | β0.010 | 1 |
| C | C1_adam | 1 | 1.0093 | 0.380 | β0.004 | 1 |
| C | C2_sgd | 1 | 1.7354 | 0.381 | β0.018 | 1 |
| C | C3_sgd_momentum | 1 | 1.0390 | 0.380 | +0.001 | 1 |
| C | C4_adamw | 1 | 1.0095 | 0.379 | β0.005 | 1 |
| C | C5_lbfgs | 1 | nan | 0.337 | β0.357 | 1 |
| D | D1-D5 (5 variants) | 5 | ~1.00 | ~0.380 | β0.002 to β0.009 | 5 |
| E | E1_full_softhand | 3 | 0.9441 | 0.361 | +0.008 | 3 |
| E | E2_pure_mse | 3 | 0.9416 | 0.362 | +0.010 | 3 |
| E | E3_measure_only | 3 | 0.9430 | 0.363 | +0.009 | 3 |
| E | E4_hard_cv_penalty | 3 | 0.9447 | 0.362 | +0.011 | 3 |
| F | F1-F5 (5 variants) | 5 | ~1.01 | 0.375-0.388 | β0.002 to β0.009 | 5 |
| G | G1_sphere_norm | 1 | 1.0105 | 0.381 | β0.005 | 1 |
| G | G2_no_norm | 1 | 0.9935 | 0.560 | +0.198 | 1 |
| G | G3_layer_norm | 1 | 1.0093 | 0.428 | +0.064 | 1 |
| G | G4_scale_only | 1 | 1.0103 | 0.556 | +0.149 | 1 |
| H | H1_svd_fp64 | 3 | 0.9429 | 0.362 | +0.009 | 3 |
| H | H2_linear_matched | 3 | 0.9195 | 0.365 | +0.015 | 3 |
| H | H3_linear_unmatched | 3 | 0.9343 | 0.359 | +0.018 | 3 |
| H | H4_svd_fp32 | 2 | 0.9434 | 0.368 | +0.015 | 2 |
| H | H5_batch_shared_svd | 2 | 0.9439 | 0.367 | +0.016 | 2 |
| H | H6_no_svd_direct | 1 | 0.9453 | 0.368 | β0.002 | 1 |
| I | I1-I4 (4 variants) | 4 | 1.007-1.011 | 0.355-0.380 | β0.005 to +0.004 | 4 |
| K | K1-K4 (4 variants) | 4 | 0.999-1.038 | ~0.380 | β0.002 to β0.012 | 4 |
| L | L1-L4 (4 variants) | 4 | 1.011-1.340 | 0.336-0.383 | +0.006 to +0.011 | 4 |
| L2 | L2_lbfgs_pure_mse | 3 | 0.8924 | 0.359 | β0.233 | 1 |
| M | M1-M3 (3 variants) | 3 | 0.976-1.024 | ~0.380 | +0.002 to +0.006 | 3 |
| E_preview | E1-E4 | 4 | ~1.01 | ~0.380 | β0.004 to β0.006 | 4 |
MID-band omega-class candidates: 73 of 78 MID entries. Same sphere-norm-disruption failures (G2, G4). MSE leader at H2_linear_matched 0.9195 with dev +0.015.
LOW band (D=16, V=64, hidden=64, ps=16) β noise-substrate attractor
| group | variant | n | mse | cv | dev | fin |
|---|---|---|---|---|---|---|
| A | baseline | 5 | 1.0080 | 0.197 | β0.000 | 5 |
| B | B1_all16 | 1 | 1.0104 | 0.203 | β0.003 | 1 |
| B | B2_gaussian_only | 1 | 0.9389 | 0.210 | +0.001 | 1 |
| B | B3_structured | 1 | 0.7051 | 0.207 | +0.006 | 1 |
| B | B4_heavy_tailed | 1 | 1.6091 | 0.191 | +0.009 | 1 |
| B | B5_first_half | 1 | 0.9401 | 0.201 | β0.005 | 1 |
| B | B6_even_indices | 1 | 1.1678 | 0.211 | β0.001 | 1 |
| C | C1-C4 (4 finite) | 4 | 1.008-1.626 | ~0.204 | β0.004 to +0.008 | 4 |
| C | C5_lbfgs | 1 | nan (excluded) | β | β | 0 |
| D | D1-D5 | 5 | 0.974-1.010 | 0.203-0.205 | β0.000 to β0.003 | 5 |
| E | E1_full_softhand | 3 | 0.9338 | 0.199 | +0.002 | 3 |
| E | E2_pure_mse | 3 | 0.9340 | 0.200 | +0.001 | 3 |
| E | E3_measure_only | 3 | 0.9342 | 0.200 | +0.002 | 3 |
| E | E4_hard_cv_penalty | 3 | 0.9336 | 0.199 | +0.001 | 3 |
| F | F1-F5 | 5 | 1.003-1.011 | 0.199-0.204 | β0.001 to β0.004 | 5 |
| G | G1_sphere_norm | 1 | 1.0085 | 0.204 | β0.003 | 1 |
| G | G2_no_norm | 1 | 0.9914 | 0.353 | +0.174 | 1 |
| G | G3_layer_norm | 1 | 1.0048 | 0.211 | +0.001 | 1 |
| G | G4_scale_only | 1 | 1.0092 | 0.347 | +0.159 | 1 |
| H | H1_svd_fp64 | 3 | 0.9345 | 0.200 | +0.002 | 3 |
| H | H2_linear_matched | 3 | 0.9128 | 0.204 | +0.005 | 3 |
| H | H3_linear_unmatched | 3 | 0.9195 | 0.202 | +0.010 | 3 |
| H | H4_svd_fp32 | 2 | 0.9327 | 0.202 | +0.002 | 2 |
| H | H5_batch_shared_svd | 2 | 0.9331 | 0.202 | +0.003 | 2 |
| H | H6_no_svd_direct | 1 | 0.9359 | 0.202 | +0.003 | 1 |
| I | I1-I4 | 4 | 1.004-1.014 | 0.201-0.210 | β0.001 to +0.012 | 4 |
| J | J1_V64_h64 | 1 | 1.0094 | 0.204 | β0.002 | 1 |
| J | J2_V32_h32 | 1 | 1.1314 | 0.205 | β0.028 | 1 |
| J | J3_V16_h32 | 1 | 1.1862 | 0.234 | β0.002 | 1 |
| J | J4_V64_h32 | 1 | 1.0957 | 0.208 | β0.008 | 1 |
| J | J5_V128_h128 | 1 | 0.9595 | 0.199 | β0.000 | 1 |
| K | K1-K4 | 4 | 0.997-1.047 | ~0.204 | β0.003 to β0.006 | 4 |
| L | L1-L4 | 4 | 1.009-1.242 | 0.197-0.204 | +0.002 to +0.012 | 4 |
| M | M1-M3 | 3 | 0.977-1.011 | 0.204 | +0.002 to +0.006 | 3 |
| E_preview | E1-E4 | 4 | ~1.01 | ~0.204 | β0.001 to β0.003 | 4 |
LOW-band omega-class candidates: 76 of 79 LOW entries. Same sphere-norm-disruption failures (G2, G4). MSE leader: H2_linear_matched at 0.9128 with dev +0.005 β the same H2 template that wins on every band.
Group J β V Γ hidden capacity sweep at D=16: only 5 configs (one of the smallest groups), but the only group that varies V away from the standard {32, 64} options at the LOW-band. Confirms V=128 (J5) drives dev to exactly 0.000 and produces the lowest MSE in the LOW group at 0.9595. V=16 (J3) at h=32 still maintains dev β0.002 but loses 23% on MSE relative to V=128. J5_V128_h128 is the strongest unverified noise-substrate candidate in the catalog β params 1.46M (estimated from V=128, h=128), dev exactly zero against the uniform-RP^15 baseline, lowest MSE in the LOW band among non-H-group entries. Worth a U5-style projective probe.
Phase 1 ablation β engineering invariants surfaced
- Sphere-norm (Group G) is non-negotiable: removing it (G2, G4) blows up dev across all three bands by 0.15-0.36. Confirmed independently in HIGH/MID/LOW.
- H2_linear_matched is the canonical template: best MSE in 2 of 3 bands (LOW + HIGH), within 0.003 of best in MID. Forms the architectural template behind h2-64, all H2a/H2b Q-sweep candidates, and the substrate prototypes (bintree/SP-bit/byte-trigram).
- L2_lbfgs_pure_mse achieves the lowest MSE on HIGH (0.0058) at the cost of 67% non-convergence and broken geometry β the pre-fix Hessian-corruption pattern that 000099 diagnosed.
- CV bands quantize cleanly by D: HIGH β 0.86 (D=4), MID β 0.36 (D=8), LOW β 0.20 (D=16). Bulk of variants land within Β±0.05 of band center; the failures are sphere-norm disruption + LBFGS Hessian corruption.
- Group A 5-seed reference establishes within-config variance: HIGH dev variance Β±0.018, MID Β±0.005, LOW Β±0.000. Within-config noise is comparable to or smaller than the |dev|<0.05 projective-clean threshold, so single-seed entries are still informative for the omega-class criterion.
Phase 1 ablation β direct architecture comparisons
| comparison | HIGH | MID | LOW |
|---|---|---|---|
| H2_linear_matched (canonical) MSE | 0.0456 Β± 0.0076 | 0.9195 Β± 0.0026 | 0.9128 Β± 0.0043 |
| H1_svd_fp64 (full SVD) MSE | 0.4198 Β± 0.0154 | 0.9429 Β± 0.0041 | 0.9345 Β± 0.0023 |
| H2 advantage over H1 | 9.2Γ lower MSE | 2.5% lower | 2.3% lower |
| H3_linear_unmatched MSE | 0.0948 Β± 0.0123 | 0.9343 Β± 0.0025 | 0.9195 Β± 0.0011 |
| H6_no_svd_direct MSE | 0.4606 | 0.9453 | 0.9359 |
The HIGH-band H2 advantage (9.2Γ lower MSE than full-SVD H1) is what made H2_linear_matched the canonical template. At MID and LOW the H-group converges (linear-matched, full-SVD, and direct-readout all within 3% of each other), so the H2 win is specifically a HIGH-band phenomenon. This matches the architectural reading: at D=4 the SVD dimension is small enough that linear readout matches the spectral bandwidth without information loss; at D=8 and D=16 the spectral residual matters.
TIER 4 β P-sweep small-battery floor grid (600 configs at 20 batches)
group_P_small_battery_floor from ablation_configs.py line 615. Full product: 5 Γ 5 Γ 3 Γ 2 Γ 2 Γ 2 = 600 configs. This is the parent grid the Q-sweep top-10 came out of. It runs after Phase 1 ablation establishes H2_linear_matched as the canonical template, and varies the architecture axes around that template at a budget-minimal 20 batches per config.
P-sweep grid axes
| axis | values | count |
|---|---|---|
| hidden | {4, 8, 16, 32, 64} | 5 |
| V | {2, 4, 8, 16, 32} | 5 |
| D | {2, 3, 4} | 3 |
| depth | {0, 1} | 2 |
| n_cross | {0, 1} | 2 |
| optimizer | {Adam, LBFGS} | 2 |
Pins (H2_linear_matched baseline)
svd='none',linear_readout=True,match_params=True(the H2 ablation winner)- HIGH band:
patch_size=4,img_size=64 batch_size=256,batch_limit=20(~5,120 samples seen per config)n_heads=1(since D varies down to 2, default n_heads=4 would fail)grad_clip=1.0(defensive β see TIER 4a)soft_hand=False,cv_measure_every=2- Adam:
lr=3e-3(Phase-2 default scaled to 20-batch budget) - LBFGS:
lr=1.0(default unit-Wolfe-step, lib's own line search) - Training: gaussian-only (
noise_types=[0]) - Testing: 16-noise per-noise generalization (
test_noise_types=list(range(16)), 256 samples per noise)
P-sweep outcomes by optimizer split
| optimizer | total configs | finite | NaN/divergent | Q-sweep top-10 representation |
|---|---|---|---|---|
| Adam | 300 | 300 (100%) | 0 | 6 of top-10 (Q-ranks 02, 03, 06, 07, 08, 09, 10) |
| LBFGS | 300 | 291 (97%) | 9 | 3 of top-10 (Q-ranks 01, 04, 05) |
| Total | 600 | 591 | 9 | 10 advanced to Q-sweep at 1000 batches |
The 9 NaN/divergent LBFGS configs all matched the 000099 Hessian-corruption profile: depth=1 + n_cross=1 architectures where gradient clipping inside the LBFGS closure caused the (s_k, y_k) Hessian approximation to underestimate, generating runaway Hβ»ΒΉ steps over enough iterations. 20 batches is the threshold below which divergence is incipient but not yet catastrophic; the 1000-batch Q-sweep would have surfaced 30 LBFGS divergences had it not been pre-fixed.
P-sweep geometric attractor split (observed across all 600 configs, confirmed in Q)
| D class | typical CV (post-20-batch) | attractor | members |
|---|---|---|---|
| D=4 | 0.86 - 1.07 (HIGH band) | sphere-solver (H2 family) | 200 configs (5 hidden Γ 5 V Γ 2 depth Γ 2 n_cross Γ 2 opt) |
| D=3 | ~0.03 (LOW band) | projective-clean on RPΒ² (P-class, originally framed "polynomial") | 200 configs |
| D=2 | undefined (V<5 cannot form pentachoron) | failed geometric validity | 200 configs |
200 D=4 configs are theoretically all H2 candidates. Survival of the 6 Adam + 3 LBFGS into Q-sweep top-10 depends on continued_training_potential, which combines convergence-rate-at-20-batches with extrapolated-MSE-at-1000-batches. Q-sweep's full table (Tier 2) gives the actual 1000-batch outcomes for those 10.
TIER 4a β P-sweep extrapolation rankings (the top-10 source data)
Each Q-sweep entry started as a P-sweep entry. The "P-MSE" column from group_Q_h2_candidates shows the MSE that ranked it at 20 batches:
| Q-rank | source P config | P-MSE (20 batch) | Q-MSE (1000 batch) | improvement | optimizer regime change |
|---|---|---|---|---|---|
| 01 | h64_V32_D4_dp1_nx0_lbfgs | 0.053 | 0.00421 | 13Γ | LBFGS clean at 1000 (post-fix) |
| 02 | h64_V32_D4_dp0_nx0_adam | 0.572 | 0.00205 | 279Γ | strongest extrapolation |
| 03 | h64_V32_D4_dp0_nx1_adam | 0.584 | 0.00250 | 234Γ | |
| 04 | h64_V32_D4_dp0_nx1_lbfgs | 0.041 | 0.00391 | 10Γ | LBFGS started low, gained little |
| 05 | h64_V16_D4_dp1_nx1_lbfgs | 0.115 | 0.03117 | 4Γ | V=16 hits H2b ceiling |
| 06 | h64_V32_D3_dp1_nx1_adam | 0.656 | 0.02497 | 26Γ | D=3 P-class |
| 07 | h64_V32_D3_dp0_nx1_adam | 0.641 | 0.03151 | 20Γ | D=3 P-class |
| 08 | h64_V32_D4_dp1_nx1_adam | 0.620 | 0.00231 | 268Γ | h2-64 single-bank arch |
| 09 | h64_V32_D3_dp0_nx0_adam | 0.638 | 0.02782 | 23Γ | D=3 P-class smallest |
| 10 | h64_V32_D2_dp0_nx1_adam | 0.736 | 0.16139 | 5Γ | D=2 β failed geometric validity |
Reading: Adam configs started P-sweep with high MSE (0.57-0.74) and gained 20-280Γ by 1000 batches. LBFGS configs started P-sweep with low MSE (0.04-0.12) and gained only 4-13Γ. This confirms the optimizer regime shift that 000100 logged: LBFGS dominates at short budgets (β€20 batches), Adam dominates at long budgets (β₯500 batches). The architectural template (h64_V32_D4 with depth/n_cross combinations) is unchanged across optimizers; only the optimization trajectory differs.
P-sweep coverage gaps
- 9 NaN configs never re-run with the post-fix trainer (parked open item from 000100). Could surface 9 additional Tier 2 candidates at the LBFGS+depth=1+n_cross=1 architecture frontier.
- V<32 LBFGS coverage is thin β only Q-rank05 represented from 300 LBFGS configs across V β {2, 4, 8, 16, 32}. The full P-sweep contains 60 LBFGS configs at each V; their ranking among each other is not surfaced into the catalog.
- hidden < 64 LBFGS coverage is sparse β Q-sweep's top-10 are all hidden=64. A dedicated lower-hidden LBFGS sub-sweep was never run.
Test of the natural-axis-count hypothesis: V matched to known polytope vertex counts on S^(D-1) should produce static sphere-solver rows (no rotating antipodal frame).
| variant | V | D | polytope | predicted | params |
|---|---|---|---|---|---|
| R_h64_V16_D4_16cell_orthoplex_adam | 16 | 4 | 16-cell (4-orthoplex) | H2-LIKE static | β |
| R_h64_V8_D4_8cell_or_16cell_subset_adam | 8 | 4 | 8-cell (tesseract) | H2-LIKE static | β |
| R_h64_V20_D3_dodecahedron_adam | 20 | 3 | dodecahedron | H2-LIKE static | β |
Pins: same H2_linear_matched baseline as Q, Adam @ lr=3e-3, depth=0, n_cross=0, 1000 batches, gaussian-only training, 16-noise per-noise test.
Status: trained (in phaseR_reports/ on HF), results not surfaced into the projective-clean catalog yet. Worth probing against the omega-class criterion since natural-axis-count framework predicts they should land cleanly.
TIER 4b β R-sweep polytope packing test (3 configs at 1000 batches)
group_R_packed_polytope_test from ablation_configs.py. Predicted-H2-LIKE configs where V is matched to known polytope vertex counts on S^(D-1). Same H2_linear_matched baseline as P/Q, Adam @ lr=3e-3, 1000 batches, gaussian-only training. Ran 2026-04-24 alongside Q-sweep.
| variant | V | D | polytope | predicted | architecturally implies |
|---|---|---|---|---|---|
| R_h64_V16_D4_16cell_orthoplex_adam | 16 | 4 | 16-cell (4-orthoplex) | H2-LIKE static rows | natural axis count for D=4 = 16 |
| R_h64_V8_D4_8cell_or_16cell_subset_adam | 8 | 4 | 8-cell (tesseract) | H2-LIKE static rows | sub-polytope vertex count |
| R_h64_V20_D3_dodecahedron_adam | 20 | 3 | dodecahedron | H2-LIKE static rows | natural axis count for D=3 = 20 |
Hypothesis tested: when V matches a known regular polytope vertex count for S^(D-1), training should produce static sphere-solver rows (no antipodal pair rotation). Phil's framing (000100): the 32-row Γ D=3 G-Class behavior emerged because 32 points cannot be uniformly arranged on SΒ² β geometric frustration. Match V to polytope, frustration disappears.
Status: trained, weights in phaseR_reports/ on HF, but codebooks were never extracted and probed against the projective-clean threshold. Worth running through extract_codebook to surface them into the verified-omega-class tier; the natural-axis-count framework (ft2 Β§9.1) predicts they should land cleanly. Open item from 000101.
TIER 5 β Phase S D=5 architecture floor map (1600 configs at 20 batches)
| axis | values | count |
|---|---|---|
| hidden | {4, 8, 16, 32, 64} | 5 |
| V | {2, 4, 8, 16, 32} | 5 |
| D | {5} | 1 |
| depth | {0, 1} | 2 |
| n_cross | {0, 1} | 2 |
| noise_type | {0..15} | 16 |
| optimizer | {Adam} | 1 (LBFGS too slow for sweep) |
Total: 5 Γ 5 Γ 1 Γ 2 Γ 2 Γ 16 Γ 1 = 1600 runs.
Headline finding (000105): cross-noise rank correlation +0.954. Architectures rank near-identically across all 16 noise types β what changes per noise is achievable floor MSE, not which model achieves it. Top-4 universal architectures all hidden=64, V=32.
Top-4 architectures from S analysis (mean rank across 16 noise types):
| rank | architecture | mean rank |
|---|---|---|
| 1 | h64_V32_dp0_nx1_D5 | 1.1 |
| 2 | h64_V32_dp1_nx0_D5 | (close to 1.1) |
| 3 | h64_V32_dp1_nx1_D5 | 1.9 |
| 5 | h64_V16_dp1_nx1_D5 | (the V<32 entry) |
Note: the 1391 individual config directories were lost to HF rate-limiting (87% of submitted commits failed). The 1600-config aggregate JSON survived; per-config artifacts mostly did not. Engineering invariant logged (000108): batch-sync uploads from this point forward.
TIER 6 β Phase T D=5 convergence sweep (64 configs at 1000 batches)
Top-4 S architectures Γ 16 noise types, run at A3-reference budget. The D=5 walk-back (000106) lives here.
| arch | hidden | V | depth | n_cross | optimizer | % projective-clean across 16 noises |
|---|---|---|---|---|---|---|
| h64_V16_dp1_nx1 | 64 | 16 | 1 | 1 | Adam | 62% (10/16) β D=5 sweet spot |
| h64_V32_dp0_nx1 | 64 | 32 | 0 | 1 | Adam | 50% (8/16) |
| h64_V8_dp1_nx0 | 64 | 8 | 1 | 0 | Adam | 25% (4/16) |
| h64_V32_dp1_nx1 | 64 | 32 | 1 | 1 | Adam | 19% (3/16) |
Headline: 23/64 (~36%) configs converged within Β±0.05 of uniform RPβ΄ baseline. V=16 was the geometric sweet spot at D=5, not V=32 β overturning the V=32 universality reading from A3's three runs.
Per-V deviation summary:
| V | mean dev | p25-p75 | in band? |
|---|---|---|---|
| 8 | +0.115 | [0.07, 0.18] | No |
| 16 | +0.040 | [0.01, 0.07] | Yes (only V whose mean lands inside) |
| 32 | +0.057 | [0.04, 0.07] | Just outside |
Salt_pepper anomaly (000106): 100% projective-clean across all 4 archs in Phase T despite being the worst noise to reconstruct in S-sweep (best MSE 2.51, ~34Γ worse than pink). Geometry decouples from MSE β a bank that fails to reconstruct can still produce a clean projective codebook. This matters for downstream cross-bank analysis: "the worst-fitting bank" might still produce the most useful projective representation.
Four_quadrant anomaly: 0% projective-clean across all 4 archs. Spatially structured noise where no architecture in T converged. Open in ft3 Β§10 as a deeper-probe candidate.
TIER 7 β A-set verification probes (the 19-model count from ft2)
These are the projective-codebook verification runs that produced the n=19 count cited in ft2's Section 5 table. All entries are individual probes with explicit deviation measurements.
| probe | model | D | V | n_axes | pairs | mean projective angle | uniform baseline | dev | result |
|---|---|---|---|---|---|---|---|---|---|
| A0 | G-Cand | 3 | 32 | 22 | 10 | 1.011 | 1.015 | β0.004 | PROJECTIVE-CLEAN |
| A1 | H2a | 4 | 32 | 26 | 6 | 1.116 | 1.114 | +0.002 | PROJECTIVE-CLEAN |
| A2 (Γ16 banks) | h2-64 single-noise (banks 0-15) | 4 | 32 | 24-27 | 5-8 | mean 1.115 | 1.105 | +0.010 mean, Β±0.013 | ALL 16 PROJECTIVE-CLEAN |
| A3 (Γ3 runs) | A3 D=5 (single-arch single-noise) | 5 | 16/32/64 | 16/29/51 | 0/3/13 | varies | varies | -0.015 / +0.016 / +0.019 | PROJECTIVE-CLEAN at A3, QUALIFIED by Phase T β generalization fails |
Probe count: 1 (A0) + 1 (A1) + 16 (A2) + 3 (A3) = 21 individual probe runs. ft2 cited "19 models" because the A3 three runs were treated as one architectural data point. Phase T (000106) re-classified A3 as a single-arch test, leaving 17 projective-clean instances at D=3/4 robustness level.
TIER 8 β Substrate prototype trained models (this week's runs)
| HF path | params | D | V | content | regime | result |
|---|---|---|---|---|---|---|
AbstractPhil/geolip-SVAE/bintree_proto_v1 |
57,215 | 4 | 32 | depth-4 binary tree, i.i.d. Bernoulli Β±1, BFS-encoded | PASSTHROUGH | best test_mse 3.5e-5 ep 20, 100% bits/trees from ep 1, CV 0.80-1.00, erank 4.00, ratio 1.00 |
AbstractPhil/geolip-SVAE/sentencepiece_proto_v1 |
57,215 | 4 | 32 | t5-base SP token IDs as 16-bit Β±1 floats | PASSTHROUGH | best test_mse 5.78e-6 ep 18, 100% bits/tokens from ep 1, Ξ±=0.023 throughout, CV 0.85-1.11, erank 4.00, ratio 0.99 |
AbstractPhil/geolip-SVAE/byte_trigram_proto_v1 |
57,215 | 4 | 32 | UTF-8 byte trigrams as RGB pixels at 256Γ256 | ENGAGED | best test_mse 1.7e-5 ep 19, 83.9% byte / 61.3% trigram from 0% floor, Ξ± 0.024β0.043, CV left band ep 6, ratio 1.07, erank 3.9955 dip |
The bintree + SP-bit pair establishes the passthrough control. Byte-trigram is the first text-engaged omega-class candidate but its codebook hasn't been formally probed against the projective threshold yet. Pending follow-up: byte_trigram_proto_128 (img_size decision pending, 100M sample-view target).
TIER 9 β Architectural templates with no measured instance
Documented architectures the catalog does not yet cover. Each is omega-class eligible under the architecture criterion but lacks a verification probe.
| template | arch | params | D | V | status | what's needed |
|---|---|---|---|---|---|---|
| Johanna D=16 | PatchSVAE-F | 8.7M (estimate) | 16 | 256 | not yet U5-tested | run extract_codebook on Johanna checkpoint, compute deviation from uniform RP^15 |
| Grandmaster omega tokens | concept | β | β | β | paper-level reference | needs trained instance + verification |
geolip-svae-nosvd-ablation repo |
svd_mode='none' variants | varies | varies | varies | independent repo, omega verification not surfaced into main catalog | inventory the trained checkpoints, run U5 across them |
| D=6, D=7, D=8 with V matched to natural axis count | predicted by Phase T framework | β | 6/7/8 | ~22/28/34 (predicted) | not run | sweep with natural-axis-count V matching |
TIER 10 β Explicitly excluded from omega-class
| exclusion | reason | source |
|---|---|---|
| D=2 configs | Cannot form pentachoron (needs β₯5 points), CV undefined | Q-rank 10 |
| Q-rank 10 (h64_V32_D2_dp0_nx1_adam) | D=2, MSE 0.16 = essentially failed reconstruction | Q-sweep |
| 9 P-sweep NaN configs | LBFGS Hessian-corruption casualties (000099 bug profile) | P-sweep, never re-run |
| bintree_proto_v1 | Passthrough regime, codebook not engaged | 000111 |
| sentencepiece_proto_v1 | Passthrough regime, cross-attn idle | 000112 |
| Phase T D=5 V=32 cells (most) | V over-counted vs natural axis count ~16, fails projective-clean | 000106 |
| Phase T D=5 four_quadrant (all 4 archs) | 0% projective-clean, spatial-structured noise | 000106 |
| ablation Group H, M, L SVD-removal variants | Spectrum-degenerate; not sphere-solvers in proper sense | Phase 1 ablation |
Smallest-instance benchmarks across the catalog
For when minimal-parameter operation matters:
| tier | smallest config | params | D | V | MSE | regime | use case |
|---|---|---|---|---|---|---|---|
| absolute smallest projective-clean | Q-rank09 (P-class) | 28,899 | 3 | 32 | 0.02782 | LOW-band projective-clean on RPΒ² | minimum-parameter omega |
| smallest H2a (canonical sphere-solver) | Q-rank02 | 40,227 | 4 | 32 | 0.00205 | HIGH-band sphere-solver on RPΒ³ | canonical sphere-solver baseline |
| Phase T D=5 V=16 sweet spot | h64_V16_dp1_nx1_D5 | ~36,607 | 5 | 16 | varies | D=5 partial projective-clean | D=5 representative |
| h2-64 single bank (production) | bank_idx 0..63 | 57,215 | 4 | 32 | varies | per-noise sphere-solver | bank-level training composition |
The 28,899-param P-class candidate (Q-rank09) is the absolute floor for projective-clean. The 40,227-param H2a (Q-rank02) is the floor for canonical sphere-solver behavior at D=4. Under 30K and under 41K respectively.
What's missing from this catalog
- byte_trigram_proto_v1 codebook investigation β engagement signature confirmed in trajectory, but
extract_codebookagainst the trained checkpoint hasn't been run for the formal projective-clean verification. Tier 1 entry should move to "verified" once this is done. - byte_trigram_proto_128 β pending img_size=64-vs-128 decision + 100M-sample-view run completion.
- 9 P-sweep NaN re-runs β never executed with the LBFGS-fixed trainer (parked open item from 000100). Could surface 9 additional Tier 2 candidates.
- R-sweep results probed against projective-clean criterion β the polytope-packing predictions (16-cell, 8-cell, dodecahedron) trained but their codebooks weren't surfaced into the catalog.
- Johanna D=16 verification β the only large-D representative of the noise-substrate line; not yet probed.
- Cross-substrate kNN graph β bintree, SP-bit, byte-trigram, h2-64-noise codebook-similarity matrix. The "what survives the universal-substrate-hope death" finding (000111) requires this measurement.
- Disproof candidates for Omega β the methodological pivot from 000108 demands negative-result candidates the catalog doesn't yet contain (non-spherical bottleneck variants, no-spatial-coherence content, byte-misaligned content).