# Omega-class battery potentials — exhaustive catalog Compiled 2026-04-29 from full scratchpad survey across sessions 000080-000113. Includes every documented config across all sweeps (P/Q/R/S/T plus the A-set verification probes), every published HF model, and the trained substrate prototypes from this week. Both Adam and LBFGS candidates are listed where the sweep ran both. ## Definition of omega-class A trained or candidate battery is **omega-class** if it satisfies all four: 1. **Sphere-solver architecture**: PatchSVAE with sphere-norm M tensor, V rows on S^(D-1), output basis on ℝP^(D-1) 2. **Projective-clean codebook**: |deviation from uniform RP^(D-1) baseline| < 0.05, secondary antipodal pair count ≤ 3, axis utilization > 0.95 3. **In its natural CV band** for arch class (table below) 4. **Codebook-engaged or sphere-engaged**: cross-attn coupling moves off floor, recovery curve from random init, geometric stats leave passthrough signature A "potential" is any trained instance OR sweep candidate that has run through the architecture and produced measurements against the criterion. ## Natural CV bands by architecture class | arch class | V | D | natural CV band | source | | --- | --- | --- | --- | --- | | noise-substrate (Freckles, Johanna) | 256 | 16 | 0.20-0.23 (sweet spot), 0.13-0.30 (band) | ft1 attractor, ablation Phase 1 | | h2-class | 32 | 4 | 0.80-1.05 | h2-64 measurement, 000111-112 | | P-class | 32 | 3 | 0.029-0.036 (LOW band) | Q-sweep ranks 06/07/09 | | Phase T D=5 V=16 | 16 | 5 | ~0.04 mean dev | Phase T sweet spot | | Phase T D=5 V=32 | 32 | 5 | varies by noise (partial) | Phase T (qualified ft2 D=5 claim) | ## Diagnostic signature (for testing any candidate) | dimension | passthrough | engaged (omega-class) | | --- | --- | --- | | α (cross-attn coupling) | stationary [0.020, 0.030] | rising monotonically off floor | | α\_std/α\_mean | flat ~0.06 | climbs (>0.30 in byte-trigram) | | row_cv | in arch's natural band | leaves natural band | | ratio S0/SD | ≈ 1.0 (flat spectrum) | drifts (>1.05) | | erank | flat at full rank | dips below | | recovery from random init | near-100% from ep 1 (sign-recovery trivial) | curve from ~0% upward | | codebook deviation | undefined / not measured | within ±0.05 of uniform RP^(D-1) | --- ## The load-bearing stack (non-negotiable for any recipe) Architectural primitives that make the omega regime exist. If any row is absent, the configuration is **not** an omega recipe — these are not tunable hyperparameters, they are the substrate. | primitive | code | breaks if removed | | --- | --- | --- | | Sphere-norm M rows | `M = F.normalize(M, dim=-1)` before SVD/readout | scale-explosion snap (G2/G4 confirm dev +0.16-0.36 across all bands) | | fp64 SVD path | `_svd_fp64` autocast-disabled, fp64 Gram + sqrt + U-recovery, floors 1e-24/1e-16 | discharge spikes, max_grad >1000 (paste 019) | | Orthogonal init on encoder readout | `nn.init.orthogonal_(enc_out.weight)`, re-applied after L-group overrides | L2/L3/L4 init variants regress | | Bounded-α cross-attn | `S_out = S · (1 + α · tanh(...))`, α ≤ max_α via sigmoid | unbounded-α (I4) poisons spectrum | | BoundarySmooth zero-init conv | identity-at-init smoothing post-stitch | boundary inconsistency under tile stitching | | Pure Adam (NEVER AdamW) | `torch.optim.Adam` | weight decay fights geometric structure | | No BatchNorm, no Dropout | — | spectral structure destabilizes | | No global average pool | flatten or spatial statistics | accuracy 70% → 29% in geometric encoders | | Patch-based forward | all components per-patch except cross-attn | breaks resolution invariance by design | --- ## The recipes Each recipe is a **named configuration** with verified or predicted omega signature. Empirical instances are linked to the tier inventory below. ### Recipe A-class — Johanna / Fresnel (workhorse) ``` matrix_v=256, D=16, patch_size=16, hidden=768, depth=4, n_cross_layers=2 ~17M params, full SVD path (svd_mode='svd_fp64') optimizer=Adam, lr=3e-4 ``` - Omega clean at scale; teachable downstream as a single instance. - Cost: ~30GB VRAM, ~4hr/epoch. **Not stackable.** - Verified: Tier 1 `v50_fresnel_64`. Pending: Tier 9 Johanna D=16 codebook probe. - Use when: workhorse encoder; downstream needs single-instance legibility. ### Recipe S-class — Freckles (the crown) ``` matrix_v=48, D=4, patch_size=4, hidden=384, depth=4, n_cross_layers=2 ~2.55M params, full SVD path, all defenses stacked optimizer=Adam, lr=3e-4 ``` - Perfect noise reconstruction (val MSE 5e-6 reporting floor on 16-noise mixture). - Resolution-invariant by construction — weights at N=256 work at N=4096 (architectural, not empirical). - **Fails to teach as single instance.** Requires the full Freckles array for legibility. - Verified: Tier 1 `v40` (64×64), `v41` (256×256). - Use when: noise/texture substrate; downstream consumes the array. ### Recipe H2-class — sphere-solver (h2-64 single bank) ``` matrix_v=32, D=4, patch_size=4, hidden=64, depth=1, n_cross_layers=1 linear_readout=True, svd_mode='none', match_params=True, smooth_mid=16 ~57,215 params per bank, optimizer=Adam, lr=3e-3 ``` - The canonical sphere-solver. SVD replaced by learned linear readout; column norms = S, identity = Vt. - Projective-clean at D=4: 24-27 axes/bank, dev +0.010 ±0.013 across 16 single-noise banks (A2 probe). - Stackable into 192-bank arrays; teaches via per-bank MSE signature. - Verified: Tier 1a (full 192-bank h2-64 array). Q-rank08 is the exact same architecture. - Use when: small-scale, multi-bank, codebook-engaged regime. **Identity invariants for loading h2-64 weights** (silent partial-load otherwise): - `smooth_mid=16` (NOT the ps-dependent default 8 at ps<16 — 440 params missing if wrong). - `linear_readout=True`, `svd_mode='none'`, `match_params=True`. - `n_heads` divides D — at D=4 with `n_heads=4`, head_dim=1. ### Recipe H2a — minimum H2-class (Q-rank02) ``` matrix_v=32, D=4, patch_size=4, hidden=64, depth=0, n_cross_layers=0 linear_readout=True, svd_mode='none', match_params=True ~40,227 params, optimizer=Adam, lr=3e-3 ``` - Minimum sphere-solver: depth+n_cross stripped to zero. - Q-MSE 0.00205 at 1000 batches — best Adam in Q-sweep. - Use when: floor-finding the sphere-solver capacity envelope at D=4. ### Recipe P-class — minimum projective-clean (Q-rank09) ``` matrix_v=32, D=3, patch_size=4, hidden=64, depth=0, n_cross_layers=0 linear_readout=True, svd_mode='none', match_params=True ~28,899 params, optimizer=Adam, lr=3e-3 ``` - Smallest projective-clean omega-class instance in the catalog. - LOW-band CV (~0.03), projective-clean on RP², 22 axes (10 pairs + 12 unpaired). - Use when: minimum parameter budget; D=3 RP² regime is acceptable. ### Recipe F-class — experimental nursery ``` matrix_v ∈ {32, 48, 64}, D ∈ {2, 4, 8, 16}, patch_size ∈ {8, 16} hidden ∈ {32, 64, 128}, depth ∈ {1, 2}, n_cross_layers ∈ {1, 2} 2K – 645K params per config, 1 epoch × 1M samples (triage regime) optimizer=Adam, lr=3e-4, soft-hand=CV-EMA prox boost (gradient-free) ``` - Designed to fail often. Collapse is a data point, not a bug. - Triage signal (1ep × 1M) ≡ 30ep × 200K for keep/kill discrimination at 1/5 wallclock. - Soft-hand: `loss = (1 + boost · prox) · recon_loss`; prox is Gaussian on `current_cv` vs CV-EMA. **No penalty term.** - Use when: searching for new viable templates; running large sweeps cheaply. ### Recipe T — D=5 sweet spot (Phase T) ``` matrix_v=16, D=5, hidden=64, depth=1, n_cross_layers=1 linear_readout=True, svd_mode='none', match_params=True optimizer=Adam, lr=3e-3, 1000 batches ``` - 62% projective-clean across 16 noise types — only V whose mean dev lands in band at D=5. - V=32 fails the cross-noise universality test at D=5 (Tier 6 supersedes A3's V=32 reading). - Use when: extending recipes to D=5 with realistic projective-clean expectation. ### Recipe R — polytope-packing (predicted, not yet probed) ``` (V, D) ∈ {(16, 4) 16-cell, (8, 4) 8-cell, (20, 3) dodecahedron} hidden=64, depth=0, n_cross_layers=0 linear_readout=True, svd_mode='none', match_params=True optimizer=Adam, lr=3e-3, 1000 batches ``` - Hypothesis: V matched to a known regular polytope vertex count for S^(D-1) → static rows, no antipodal pair rotation. Geometric frustration disappears. - Trained, weights in `phaseR_reports/` on HF; **codebooks not yet extracted**. - Use when: testing the natural-axis-count framework. Open item from 000101. ### Recipe J5 (LOW-band, unverified) ``` matrix_v=128, D=16, patch_size=16, hidden=128, depth=1, n_cross_layers=1 ~1.46M params estimated, optimizer=Adam ``` - Strongest unverified noise-substrate candidate. LOW-band MSE 0.9595 with dev exactly 0.000 against uniform RP¹⁵ baseline — best of any non-H-group entry at LOW band. - Worth a U5-style projective probe before committing to long-horizon training. --- ## The substrate layer (input encoding rules) Architecture and encoding are **separate hypotheses**. Passthrough on one encoding does not invalidate the architecture (lesson from 000113: SP-bit passthrough on h2-class → byte-trigram engaged on the same arch). ### Per-patch capacity arithmetic A `(C, ps, ps)` patch on the unit sphere supports: - **4×4 RGB ≈ 1M discriminable positions on S³** at 1° resolution. - The encoding's information cardinality must approach this for the codebook to do compression work. - Hard zeros are wasted capacity — every float in every patch should carry signal. ### Engagement vs passthrough The full diagnostic table is the "Diagnostic signature" section above. Shorthand: - **Engaged**: α rises monotonically; row_cv leaves natural class band; ratio S0/SD drifts; erank dips below D; recovery curve climbs from low. - **Passthrough**: α stationary at init; row_cv in band; ratio ≈ 1.0; erank flat at D; recovery near-100% from ep 1. ### Substrate pass/fail history | substrate | architecture probed | result | reason | | --- | --- | --- | --- | | 16-type `OmegaNoiseDataset` | A, S, F, h2 | Engaged | Original substrate; full per-patch density | | ImageNet random crops (sublens) | A (Fresnel-64 v50) | Engaged | Natural image structure; full RGB density | | Byte-trigram `(R,G,B)` (000113) | h2 | **Engaged** | 256³ cardinality per cell, every float carries signal | | Sentencepiece bit content (000112) | h2 | **Passthrough** | 1/3-filled patch with 32 hard-zero paddings — model bored | | Binary-tree i.i.d. Bernoulli (000111) | h2 | **Passthrough** | Sign-only signal; trivial under linear_readout=True | ### Substrate design rules 1. Every float in every patch should carry signal. Hard zeros = wasted capacity. 2. Information cardinality per cell should approach 256³ for RGB (or the equivalent for non-RGB channel counts). 3. Spatial coherence within the patch matters — noise/image/byte-trigram all have it; bit-encodings of token IDs do not. 4. Test before training: compute the patch-space cardinality of your encoding. If it's orders of magnitude below per-patch capacity, expect passthrough. --- ## The triage protocol (verifying a candidate is potentially an omega) Run in order. Stop at first failure. Most candidates die at step 3 or 4. ### Step 1. Architecture check (free, no training) - Inspect kwargs against the load-bearing stack. Missing items disqualify before any compute is spent. - Run the "debug move" from `CLAUDE.md`: instantiate, print `state_dict` keys + shapes, compare against a reference checkpoint. Mismatched `smooth_mid`, `n_heads`, `linear_readout`, `svd_mode`, `match_params` cause silent partial-loads under `strict=False`. ### Step 2. Substrate check (free, no training) - Compute per-patch information cardinality of the input encoding. - If cardinality << per-patch capacity, expect passthrough — fix the encoding before training. ### Step 3. 1-epoch triage (~minutes per config on a single GPU) - 1 epoch × 1M samples, gaussian-only training (`noise_types=[0]`). - Score on 16-noise per-noise generalization (256 samples per noise). - **Pass**: spectrum bounded (no scale-explosion snap); MSE within 2-3× class baseline. - **Fail**: NaN / divergence / snap event / MSE order-of-magnitude worse than class. ### Step 4. 1000-batch convergence sweep (~hour per config on a single GPU) - Full optimizer trajectory at lr=3e-3 (Adam) or lr=1.0 (LBFGS post-fix). - **Pass**: MSE leadership in its band; CV stays in natural band; α trajectory shows engagement signature. - **Optimizer regime** (000100): Adam dominates ≥500 batches; LBFGS niche is short-budget probing (≤100 batches). ### Step 5. Codebook extraction (the omega ratification) ```python from geolip_svae.inference import ( InferenceEngine, extract_codebook, make_calibration, ) calib = make_calibration('sixteen_noise', n=64, size=64) cb = extract_codebook( model, calib, model_id='...', calibration_name='sixteen_noise', ) assert cb.is_projective_clean() # |dev from uniform RP^(D-1)| < 0.05 assert abs(cb.deviation()) < 0.05 ``` - **Pass**: projective-clean, axis utilization > 0.95, ≤3 secondary antipodal pairs. - The candidate is now ratified as omega-class. ### Step 6. Vacuum-seal test (cell deployability) - Freeze candidate parameters (`requires_grad=False`). - Train an adapter classifier around it on a downstream task. - **Pass**: CV / erank / S0 stay locked under host gradient stress; classifier learns. - **Fail**: geometry collapses → not deployable as a cell. (CE on cell internals "ripped the internals to shreds" — paste 023.) ### Step 7. Bandwidth probe (downstream legibility) - Linear head on omega outputs, downstream task. - **Pass**: classifier achieves task-meaningful performance. - **Fail**: omega outputs lack representational bandwidth at this scale (S-class symptom). A candidate that passes 1–7 is a deployable omega cell ready for collective integration. --- ## Decision tree: which recipe? | Goal | Recipe | Why | | --- | --- | --- | | Stable workhorse encoder, single-instance teachable | A-class (Johanna/Fresnel) | Omega clean at scale; teaches downstream | | Best noise reconstruction, multi-bank deployment | S-class (Freckles) | Perfect recon; needs full array for legibility | | Small-scale, multi-bank ensemble, text/vision substrate | H2-class (h2-64) | 57K params/bank, projective-clean, stackable | | Minimum sphere-solver footprint at D=4 | H2a (Q-rank02) | 40K params, depth=0, n_cross=0, MSE 0.00205 | | Minimum projective-clean footprint at D=3 | P-class (Q-rank09) | 28.9K params, RP², MSE 0.028 | | Search for new templates cheaply | F-class triage | 1ep × 1M, designed to fail often | | D=5 representative | T (V=16) | Only V whose mean dev lands in band at D=5 | | Test natural-axis-count hypothesis | R polytope-packing | Predicts static-row H2-LIKE; codebooks unprobed | | LOW-band scale-up (unverified) | J5 | dev exactly 0.000 vs uniform RP¹⁵ baseline | --- # Empirical catalog (trained and in-progress instances) The sections below are the empirical record backing the recipes above. Each tier represents a class of trained or in-progress instances with measured signatures. --- ## TIER 1 — Trained, verified omega-class on HuggingFace | HF path | arch class | params | D | V | optimizer | natural CV | n_axes | dev | training content | verified by | | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | | `AbstractPhil/geolip-SVAE/v40` (Freckles 64×64) | Freckles | 2,557,539 | 4 | 48 | Adam | ~0.20-0.23 | — | <0.05 | 16-noise mixture | ft1, U5 cross-band | | `AbstractPhil/geolip-SVAE/v41` (Freckles 256×256) | Freckles | 2,557,539 | 4 | 48 | Adam | same | — | — | resolution-scaled v40 | continuation of v40 | | `AbstractPhil/geolip-SVAE/v50_fresnel_64` | Fresnel-base | 16,942,419 | 4 | — | Adam | — | — | — | 140M+ ImageNet random crops, sublens | streaming run, "phenomenal MSE recon" | | `AbstractPhil/geolip-svae-h2-64` (192-bank array) | H2_linear_matched | 57,215 × 64 batteries × 3 phases | 4 | 32 | Adam | 0.80-1.05 | 24-27/bank | +0.010 mean, ±0.013 std | per-bank, see Tier 1a | A2 probe (ft2), 192-bank cosine sweep (000109) | | `AbstractPhil/geolip-svae-implicit-solver-experiments/G-Cand` | H2-class | 28,899-45,852 | 3 | 32 | Adam | 0.03 (LOW) | 22 (10 pairs + 12 unpaired) | −0.004 | gaussian-only | A0 probe (ft2) | | `AbstractPhil/geolip-svae-implicit-solver-experiments/H2a` | H2_linear_matched | 40,227-57,215 | 4 | 32 | Adam | 0.80-1.10 | 26 (6 pairs + 20 unpaired) | +0.002 | gaussian-only | A1 probe (ft2) | | `AbstractPhil/geolip-svae-implicit-solver-experiments/A3` (3 runs, qualified) | H2 variant | varies | 5 | 16/32/64 | Adam | varies | 16/29/51 | −0.015 / +0.016 / +0.019 | gaussian-only single-arch | A3 probe (ft2). **QUALIFIED by 000106** — single-arch single-noise. Phase T showed D=5 partial. | | `AbstractPhil/geolip-SVAE/byte_trigram_proto_v1` | h2-class | 57,215 | 4 | 32 | Adam | left band ep 6 → 1.31 | TBD | TBD | wikitext-103 byte trigrams | 000113 — engaged signature confirmed; codebook investigation pending | ## TIER 1a — h2-64 array decomposition (192 banks = 64 batteries × 3 phases) All 16 single-noise experts (Group 1) verified PROJECTIVE-CLEAN in A2. Remaining 48 batteries share architecture and were measured in 192-bank cosine sweep but not individually probed against projective threshold. ### Group 1 — 16 single-noise experts (banks 0-15) | bank | noise type | special | training distribution | | --- | --- | --- | --- | | 0 | gaussian | universal-pull center | standard normal noise | | 1 | uniform | — | uniform[-1,1] noise | | 2 | uniform_scaled | — | scaled uniform | | 3 | poisson | — | poisson-distributed | | 4 | pink | clone-pair with bank 5 | 1/f spectrum | | 5 | brown | clone-pair with bank 4 | 1/f² spectrum | | 6 | salt_pepper | hardest reconstruction (S-sweep), cleanest projective at D=5 (Phase T) | impulse-style | | 7 | sparse_impulses | cosine outlier — heavy-tailed | sparse extreme-value | | 8 | block_upsampled | — | block-correlated | | 9 | gradient_gaussian | **most isolated battery on S³** — only non-stationary noise | spatial gradient | | 10 | checker | structured noise | checkerboard | | 11 | gauss_uniform_mix | — | mixture | | 12 | four_quadrant | 0% projective-clean across all Phase T archs | structured | | 13 | cauchy | heavy-tailed | cauchy-distributed | | 14 | exponential | — | exponential | | 15 | laplace | heavy-tailed | laplace-distributed | ### Group 2 — gaussian+one pairs + generalist (banks 16-31) 15 pair-trained banks (gaussian + each of noises 1-15), plus 1 generalist trained on all 16. Notable: pairs (19, 20) = (gaussian+pink, gaussian+brown) are clone-pair on S³ for the same noise-family adjacency reason as (4, 5). Pair (16, 26) = (gaussian+uniform, gaussian+gauss_uniform_mix) is a clone-pair (gaussian dominates the residual; both bounded distributions). ### Group 3 — gaussian-balanced quads (banks 32-47) 16 (gaussian, easy, medium, hard) covers via stride-7 deterministic enumeration over the EASY (uniform/uniform_scaled/cauchy/exponential/laplace) × MEDIUM (poisson/salt_pepper/sparse_impulses/gauss_uniform_mix) × HARD (pink/brown/block_upsampled/gradient_gaussian/checker/four_quadrant) product. All contain gaussian. ### Group 4 — no-gaussian quads (banks 48-63) 16 (easy, medium, hard, hard) covers via stride-19 deterministic enumeration. **No gaussian seen during training.** Banks 48 and 51 are kNN top-5 outliers because they solve the sphere problem from a fundamentally different starting distribution than Groups 1-3 (which all see gaussian as universal pull-toward-interior). --- ## TIER 2 — Q-sweep candidates (10 top-P configs × 1000 batches) After the LBFGS Hessian-corruption fix (000099), the Q-sweep ran clean on 10/10 configs. **All 10 are listed; both Adam and LBFGS variants where present.** | Q-rank | variant | params | optim | G-MSE | CV | D | V | depth | n_cross | class | notes | | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | | 01 | Q_rank01_h64_V32_D4_dp1_nx0_lbfgs | 57,123 | LBFGS | 0.00421 | 0.954 | 4 | 32 | 1 | 0 | **H2a** | LBFGS Q-sweep best | | 02 | Q_rank02_h64_V32_D4_dp0_nx0_adam | **40,227** | Adam | **0.00205** | 0.862 | 4 | 32 | 0 | 0 | **H2a** | **Smallest H2a, canonical sphere-solver. ≈ 1 h2-64 bank's capacity at 70% the params.** | | 03 | Q_rank03_h64_V32_D4_dp0_nx1_adam | 40,319 | Adam | 0.00250 | 0.890 | 4 | 32 | 0 | 1 | **H2a** | Adam-vs-LBFGS twin of rank 04 | | 04 | Q_rank04_h64_V32_D4_dp0_nx1_lbfgs | 40,319 | LBFGS | 0.00391 | 0.893 | 4 | 32 | 0 | 1 | **H2a** | LBFGS twin of rank 03; Adam wins 36% lower MSE | | 05 | Q_rank05_h64_V16_D4_dp1_nx1_lbfgs | 36,607 | LBFGS | 0.03117 | 1.069 | 4 | 16 | 1 | 1 | **H2b** | Only V=16 candidate; CV slightly above HIGH ceiling. **Underexplored size class.** | | 06 | Q_rank06_h64_V32_D3_dp1_nx1_adam | 45,852 | Adam | 0.02497 | 0.029 | 3 | 32 | 1 | 1 | **P-class (D=3)** | LOW-band; originally framed "polynomial," now confirmed projective-clean on RP² | | 07 | Q_rank07_h64_V32_D3_dp0_nx1_adam | 28,956 | Adam | 0.03151 | 0.036 | 3 | 32 | 0 | 1 | **P-class (D=3)** | Smaller P-class variant | | 08 | Q_rank08_h64_V32_D4_dp1_nx1_adam | **57,215** | Adam | 0.00231 | 0.960 | 4 | 32 | 1 | 1 | **H2a** | **Exact h2-64 single-bank arch** (depth=1+n_cross=1) | | 09 | Q_rank09_h64_V32_D3_dp0_nx0_adam | **28,899** | Adam | 0.02782 | 0.035 | 3 | 32 | 0 | 0 | **P-class (D=3)** | **Smallest projective-clean omega-class candidate. Under 30K params.** 30% smaller than H2a at ~14× MSE cost. | | 10 | Q_rank10_h64_V32_D2_dp0_nx1_adam | 19,649 | Adam | 0.16139 | 0.000 | 2 | 32 | 0 | 1 | **EXCLUDED** | D=2 cannot form pentachoron (needs ≥5 points), CV undefined | **Per-architecture optimizer comparison** (where direct twin exists): | arch | Adam Q-rank | Adam MSE | LBFGS Q-rank | LBFGS MSE | winner | | --- | --- | --- | --- | --- | --- | | h64_V32_D4_dp0_nx0 | 02 | 0.00205 | (no twin) | — | Adam | | h64_V32_D4_dp0_nx1 | 03 | 0.00250 | 04 | 0.00391 | **Adam by 36%** | | h64_V32_D4_dp1_nx0 | (no twin) | — | 01 | 0.00421 | LBFGS only | | h64_V32_D4_dp1_nx1 | 08 | 0.00231 | (no twin) | — | Adam | **Optimizer regime guidance** (000100, post-LBFGS-fix): Adam @ lr=3e-3 dominates at 1000-batch budgets. LBFGS retains niche for short-budget probing (≤100 batches) and floor-finding sweeps. For sphere-solver canonical training: **Adam is the recommended default** since 2026-04-24. --- ## TIER 3 — Phase 1 ablation grid (233 configs across 15 groups × 3 bands × seeds) The Phase 1 ablation predates the P-sweep. **A through M groups, each varying one architectural axis at a time, all three CV bands (LOW D=16, MID D=8, HIGH D=4), seed-replicated where statistically meaningful.** 233 total entries; 227 band-match (97%); 229 params-finite (98%); 223 valid (band-match + finite). Data from `omega_inventory.csv`. Columns: D / V / hidden / depth / patch_size = `dp`/`ps` / n_seeds / mean test MSE / mean CV / band deviation (observed_sphere_cv − uniform_RP^(D-1)_baseline) / n_finite seeds. ### HIGH band (D=4, V=32, hidden=64, ps=4) — sphere-solver candidates | group | variant | n | mse | cv | dev | fin | notes | | --- | --- | --- | --- | --- | --- | --- | --- | | A | baseline | 5 | 1.4591 | 0.902 | +0.018 | 5 | seed-replicated reference | | B | B1_all16 | 1 | 1.4709 | 0.858 | +0.021 | 1 | 16-noise mixture | | B | B2_gaussian_only | 1 | 0.9590 | 0.753 | +0.027 | 1 | gaussian-only train | | B | B3_structured | 1 | 1.7076 | 0.968 | −0.036 | 1 | structured-noise train | | B | B4_heavy_tailed | 1 | 1.7982 | 0.820 | +0.064 | 1 | heavy-tailed train | | B | B5_first_half | 1 | 1.4395 | 0.944 | +0.025 | 1 | noises 0-7 | | B | B6_even_indices | 1 | 1.9918 | 0.880 | +0.023 | 1 | noises 0,2,4,...,14 | | C | C1_adam | 1 | 1.4667 | 0.860 | +0.022 | 1 | Adam @ default lr | | C | C2_sgd | 1 | 1.5670 | 0.858 | +0.019 | 1 | plain SGD | | C | C3_sgd_momentum | 1 | 1.2006 | 0.860 | +0.013 | 1 | SGD with momentum | | C | C4_adamw | 1 | 1.4706 | 0.859 | +0.023 | 1 | AdamW | | C | C5_lbfgs | 1 | nan | 0.943 | −0.923 | 1 | **LBFGS Hessian-corruption — pre-fix bug** | | D | D1_cosine | 1 | 1.4712 | 0.858 | +0.020 | 1 | cosine LR | | D | D2_constant | 1 | 1.3028 | 0.858 | +0.028 | 1 | constant LR | | D | D3_linear_decay | 1 | 1.4652 | 0.858 | +0.021 | 1 | linear decay | | D | D4_warm_restart | 1 | 1.3216 | 0.858 | +0.016 | 1 | warm restart | | D | D5_one_cycle | 1 | 1.4656 | 0.858 | +0.019 | 1 | one-cycle LR | | E | E1_full_softhand | 3 | 0.4420 | 0.907 | +0.026 | 3 | soft-hand CV regularizer | | E | E2_pure_mse | 3 | 0.4692 | 0.904 | +0.037 | 3 | pure MSE, no regularizer | | E | E3_measure_only | 3 | 0.4517 | 0.908 | −0.011 | 3 | measure CV but don't regularize | | E | E4_hard_cv_penalty | 3 | 0.4504 | 0.902 | +0.002 | 3 | hard CV penalty | | F | F1_gelu | 1 | 1.4669 | 0.858 | +0.022 | 1 | GELU activation | | F | F2_relu | 1 | 1.4744 | 0.855 | +0.025 | 1 | ReLU | | F | F3_silu | 1 | 1.4702 | 0.901 | +0.022 | 1 | SiLU | | F | F4_tanh | 1 | 1.4550 | 0.806 | +0.027 | 1 | tanh | | F | F5_identity | 1 | 1.4522 | 0.866 | +0.031 | 1 | identity (no activation) | | G | G1_sphere_norm | 1 | 1.4668 | 0.858 | +0.022 | 1 | reference (sphere-norm M) | | G | G2_no_norm | 1 | 1.4680 | 1.112 | **+0.357** | 1 | **REMOVES sphere-norm — geometry breaks** | | G | G3_layer_norm | 1 | 1.4482 | 0.701 | **−0.217** | 1 | **layer-norm instead — geometry breaks** | | G | G4_scale_only | 1 | 1.4588 | 1.112 | **+0.359** | 1 | **scale-only — geometry breaks** | | H | H1_svd_fp64 | 3 | 0.4198 | 0.906 | +0.006 | 3 | full-fp64 SVD reference | | **H** | **H2_linear_matched** | **3** | **0.0456** | **0.908** | **+0.005** | **3** | **CANONICAL — H2-class baseline.** 9× lower MSE than H1 fp64. | | H | H3_linear_unmatched | 3 | 0.0948 | 0.911 | +0.001 | 3 | linear readout, mismatched dims | | H | H5_batch_shared_svd | 2 | 0.2799 | 0.918 | +0.056 | 2 | shared-SVD across batch | | H | H6_no_svd_direct | 1 | 0.4606 | 0.902 | +0.085 | 1 | direct readout, no SVD | | I | I1_1layer | 1 | 1.4733 | 0.859 | +0.023 | 1 | 1 cross-attn layer | | I | I2_0layers | 1 | 1.8687 | 0.910 | +0.087 | 1 | 0 cross-attn layers | | I | I3_2layers | 1 | 1.6295 | 0.924 | −0.002 | 1 | 2 cross-attn layers | | I | I4_unbounded_alpha | 1 | 1.4803 | 0.859 | +0.022 | 1 | no clip on α | | K | K1_bs128 | 1 | 1.4653 | 0.858 | +0.021 | 1 | batch_size=128 | | K | K2_bs32 | 1 | 1.4276 | 0.858 | +0.024 | 1 | batch_size=32 | | K | K3_bs512 | 1 | 1.5732 | 0.859 | +0.024 | 1 | batch_size=512 | | K | K4_bs1024 | 1 | 1.5428 | 0.859 | +0.017 | 1 | batch_size=1024 | | L | L1_orthogonal | 1 | 1.4802 | 0.858 | +0.018 | 1 | orthogonal init | | L | L2_kaiming | 1 | 2.4802 | 0.912 | −0.009 | 1 | Kaiming init | | L | L3_xavier | 1 | 1.6138 | 0.952 | +0.025 | 1 | Xavier init | | L | L4_normal_small | 1 | 1.5360 | 0.942 | −0.019 | 1 | normal init small std | | L2 | L2_lbfgs_pure_mse | 3 | **0.0058** | 0.878 | **−0.605** | 1 | **LBFGS+pure MSE: BEST MSE ON HIGH but only 1/3 finite, geometry broken** | | M | M1_sgd_aggressive | 1 | 1.0770 | 0.858 | +0.009 | 1 | SGD aggressive | | M | M2_sgd_huge_lr | 1 | 0.6798 | 0.858 | +0.052 | 1 | SGD huge lr | | M | M3_sgd_high_momentum | 1 | 1.2939 | 0.859 | +0.013 | 1 | SGD high momentum | | E_preview | E1-E4 (4 variants × 1 seed each) | 4 | ~1.47 | ~0.858 | +0.020 to +0.028 | 4 | 1-seed preview of Group E | **HIGH-band omega-class candidates (band-match + finite + |dev|<0.05)**: 49 of 51 HIGH entries. The two failures: G2_no_norm (+0.357 — confirms sphere-norm is non-negotiable for omega-class) and G4_scale_only (+0.359 — same finding). G3_layer_norm at −0.217 is the third sphere-norm-disruption case. C5_lbfgs is the pre-fix Hessian-corruption casualty. **HIGH-band MSE leader**: **H2_linear_matched at MSE 0.0456 with dev +0.005**. This is the **canonical sphere-solver template** that h2-64 was built from — 9× lower MSE than the fp64-SVD baseline (H1) at the same dev tolerance. The L2_lbfgs_pure_mse achieves MSE 0.0058 (8× better still) but only 1 of 3 seeds converged finite and geometry deviation hits −0.605, so it's not omega-class despite the MSE win. ### MID band (D=8, V=64, hidden=64, ps=16) — bulk-Gaussian attractor | group | variant | n | mse | cv | dev | fin | | --- | --- | --- | --- | --- | --- | --- | | A | baseline | 5 | 1.0115 | 0.359 | +0.005 | 5 | | B | B1_all16 | 1 | 1.0088 | 0.380 | −0.004 | 1 | | B | B2_gaussian_only | 1 | 0.9530 | 0.369 | −0.017 | 1 | | B | B3_structured | 1 | 0.7076 | 0.347 | −0.007 | 1 | | B | B4_heavy_tailed | 1 | 1.6415 | 0.352 | +0.007 | 1 | | B | B5_first_half | 1 | 0.9441 | 0.380 | −0.004 | 1 | | B | B6_even_indices | 1 | 1.1708 | 0.352 | −0.010 | 1 | | C | C1_adam | 1 | 1.0093 | 0.380 | −0.004 | 1 | | C | C2_sgd | 1 | 1.7354 | 0.381 | −0.018 | 1 | | C | C3_sgd_momentum | 1 | 1.0390 | 0.380 | +0.001 | 1 | | C | C4_adamw | 1 | 1.0095 | 0.379 | −0.005 | 1 | | C | C5_lbfgs | 1 | nan | 0.337 | −0.357 | 1 | LBFGS pre-fix | | D | D1-D5 (5 variants) | 5 | ~1.00 | ~0.380 | −0.002 to −0.009 | 5 | | E | E1_full_softhand | 3 | 0.9441 | 0.361 | +0.008 | 3 | | E | E2_pure_mse | 3 | 0.9416 | 0.362 | +0.010 | 3 | | E | E3_measure_only | 3 | 0.9430 | 0.363 | +0.009 | 3 | | E | E4_hard_cv_penalty | 3 | 0.9447 | 0.362 | +0.011 | 3 | | F | F1-F5 (5 variants) | 5 | ~1.01 | 0.375-0.388 | −0.002 to −0.009 | 5 | | G | G1_sphere_norm | 1 | 1.0105 | 0.381 | −0.005 | 1 | reference | | G | G2_no_norm | 1 | 0.9935 | 0.560 | **+0.198** | 1 | **sphere-norm removed — geometry breaks** | | G | G3_layer_norm | 1 | 1.0093 | 0.428 | +0.064 | 1 | | G | G4_scale_only | 1 | 1.0103 | 0.556 | **+0.149** | 1 | **scale-only — geometry breaks** | | H | H1_svd_fp64 | 3 | 0.9429 | 0.362 | +0.009 | 3 | | H | H2_linear_matched | 3 | **0.9195** | 0.365 | +0.015 | 3 | best MID-band MSE among finite | | H | H3_linear_unmatched | 3 | 0.9343 | 0.359 | +0.018 | 3 | | H | H4_svd_fp32 | 2 | 0.9434 | 0.368 | +0.015 | 2 | | H | H5_batch_shared_svd | 2 | 0.9439 | 0.367 | +0.016 | 2 | | H | H6_no_svd_direct | 1 | 0.9453 | 0.368 | −0.002 | 1 | | I | I1-I4 (4 variants) | 4 | 1.007-1.011 | 0.355-0.380 | −0.005 to +0.004 | 4 | | K | K1-K4 (4 variants) | 4 | 0.999-1.038 | ~0.380 | −0.002 to −0.012 | 4 | | L | L1-L4 (4 variants) | 4 | 1.011-1.340 | 0.336-0.383 | +0.006 to +0.011 | 4 | | L2 | L2_lbfgs_pure_mse | 3 | **0.8924** | 0.359 | **−0.233** | 1 | LBFGS pre-fix | | M | M1-M3 (3 variants) | 3 | 0.976-1.024 | ~0.380 | +0.002 to +0.006 | 3 | | E_preview | E1-E4 | 4 | ~1.01 | ~0.380 | −0.004 to −0.006 | 4 | **MID-band omega-class candidates**: 73 of 78 MID entries. Same sphere-norm-disruption failures (G2, G4). MSE leader at H2_linear_matched 0.9195 with dev +0.015. ### LOW band (D=16, V=64, hidden=64, ps=16) — noise-substrate attractor | group | variant | n | mse | cv | dev | fin | | --- | --- | --- | --- | --- | --- | --- | | A | baseline | 5 | 1.0080 | 0.197 | −0.000 | 5 | reference (5-seed) | | B | B1_all16 | 1 | 1.0104 | 0.203 | −0.003 | 1 | | B | B2_gaussian_only | 1 | 0.9389 | 0.210 | +0.001 | 1 | | B | B3_structured | 1 | 0.7051 | 0.207 | +0.006 | 1 | | B | B4_heavy_tailed | 1 | 1.6091 | 0.191 | +0.009 | 1 | | B | B5_first_half | 1 | 0.9401 | 0.201 | −0.005 | 1 | | B | B6_even_indices | 1 | 1.1678 | 0.211 | −0.001 | 1 | | C | C1-C4 (4 finite) | 4 | 1.008-1.626 | ~0.204 | −0.004 to +0.008 | 4 | | C | C5_lbfgs | 1 | nan (excluded) | — | — | 0 | | D | D1-D5 | 5 | 0.974-1.010 | 0.203-0.205 | −0.000 to −0.003 | 5 | | E | E1_full_softhand | 3 | 0.9338 | 0.199 | +0.002 | 3 | | E | E2_pure_mse | 3 | 0.9340 | 0.200 | +0.001 | 3 | | E | E3_measure_only | 3 | 0.9342 | 0.200 | +0.002 | 3 | | E | E4_hard_cv_penalty | 3 | 0.9336 | 0.199 | +0.001 | 3 | | F | F1-F5 | 5 | 1.003-1.011 | 0.199-0.204 | −0.001 to −0.004 | 5 | | G | G1_sphere_norm | 1 | 1.0085 | 0.204 | −0.003 | 1 | | G | G2_no_norm | 1 | 0.9914 | 0.353 | **+0.174** | 1 | | G | G3_layer_norm | 1 | 1.0048 | 0.211 | +0.001 | 1 | | G | G4_scale_only | 1 | 1.0092 | 0.347 | **+0.159** | 1 | | H | H1_svd_fp64 | 3 | 0.9345 | 0.200 | +0.002 | 3 | | H | H2_linear_matched | 3 | **0.9128** | 0.204 | +0.005 | 3 | | H | H3_linear_unmatched | 3 | 0.9195 | 0.202 | +0.010 | 3 | | H | H4_svd_fp32 | 2 | 0.9327 | 0.202 | +0.002 | 2 | | H | H5_batch_shared_svd | 2 | 0.9331 | 0.202 | +0.003 | 2 | | H | H6_no_svd_direct | 1 | 0.9359 | 0.202 | +0.003 | 1 | | I | I1-I4 | 4 | 1.004-1.014 | 0.201-0.210 | −0.001 to +0.012 | 4 | | **J** | **J1_V64_h64** | **1** | **1.0094** | **0.204** | **−0.002** | **1** | **V=64 h=64 — Group J is V-sweep on D=16** | | **J** | **J2_V32_h32** | **1** | **1.1314** | **0.205** | **−0.028** | **1** | V=32 h=32 — capacity floor probe | | **J** | **J3_V16_h32** | **1** | **1.1862** | **0.234** | **−0.002** | **1** | V=16 h=32 — sub-capacity test | | **J** | **J4_V64_h32** | **1** | **1.0957** | **0.208** | **−0.008** | **1** | V=64 h=32 — hidden-floor test | | **J** | **J5_V128_h128** | **1** | **0.9595** | **0.199** | **−0.000** | **1** | V=128 h=128 — over-capacity, dev exactly zero | | K | K1-K4 | 4 | 0.997-1.047 | ~0.204 | −0.003 to −0.006 | 4 | | L | L1-L4 | 4 | 1.009-1.242 | 0.197-0.204 | +0.002 to +0.012 | 4 | | M | M1-M3 | 3 | 0.977-1.011 | 0.204 | +0.002 to +0.006 | 3 | | E_preview | E1-E4 | 4 | ~1.01 | ~0.204 | −0.001 to −0.003 | 4 | **LOW-band omega-class candidates**: 76 of 79 LOW entries. Same sphere-norm-disruption failures (G2, G4). MSE leader: H2_linear_matched at 0.9128 with dev +0.005 — the same H2 template that wins on every band. **Group J — V × hidden capacity sweep at D=16**: only 5 configs (one of the smallest groups), but the only group that varies V away from the standard {32, 64} options at the LOW-band. Confirms V=128 (J5) drives dev to exactly 0.000 and produces the lowest MSE in the LOW group at 0.9595. V=16 (J3) at h=32 still maintains dev −0.002 but loses 23% on MSE relative to V=128. **J5_V128_h128 is the strongest unverified noise-substrate candidate** in the catalog — params 1.46M (estimated from V=128, h=128), dev exactly zero against the uniform-RP^15 baseline, lowest MSE in the LOW band among non-H-group entries. Worth a U5-style projective probe. ### Phase 1 ablation — engineering invariants surfaced * **Sphere-norm (Group G) is non-negotiable**: removing it (G2, G4) blows up dev across all three bands by 0.15-0.36. Confirmed independently in HIGH/MID/LOW. * **H2_linear_matched is the canonical template**: best MSE in 2 of 3 bands (LOW + HIGH), within 0.003 of best in MID. Forms the architectural template behind h2-64, all H2a/H2b Q-sweep candidates, and the substrate prototypes (bintree/SP-bit/byte-trigram). * **L2_lbfgs_pure_mse achieves the lowest MSE on HIGH** (0.0058) at the cost of 67% non-convergence and broken geometry — the pre-fix Hessian-corruption pattern that 000099 diagnosed. * **CV bands quantize cleanly by D**: HIGH ≈ 0.86 (D=4), MID ≈ 0.36 (D=8), LOW ≈ 0.20 (D=16). Bulk of variants land within ±0.05 of band center; the failures are sphere-norm disruption + LBFGS Hessian corruption. * **Group A 5-seed reference** establishes within-config variance: HIGH dev variance ±0.018, MID ±0.005, LOW ±0.000. Within-config noise is comparable to or smaller than the |dev|<0.05 projective-clean threshold, so single-seed entries are still informative for the omega-class criterion. ### Phase 1 ablation — direct architecture comparisons | comparison | HIGH | MID | LOW | | --- | --- | --- | --- | | H2_linear_matched (canonical) MSE | 0.0456 ± 0.0076 | 0.9195 ± 0.0026 | 0.9128 ± 0.0043 | | H1_svd_fp64 (full SVD) MSE | 0.4198 ± 0.0154 | 0.9429 ± 0.0041 | 0.9345 ± 0.0023 | | H2 advantage over H1 | **9.2× lower MSE** | 2.5% lower | 2.3% lower | | H3_linear_unmatched MSE | 0.0948 ± 0.0123 | 0.9343 ± 0.0025 | 0.9195 ± 0.0011 | | H6_no_svd_direct MSE | 0.4606 | 0.9453 | 0.9359 | The HIGH-band H2 advantage (9.2× lower MSE than full-SVD H1) is what made H2_linear_matched the canonical template. At MID and LOW the H-group converges (linear-matched, full-SVD, and direct-readout all within 3% of each other), so the H2 win is specifically a HIGH-band phenomenon. This matches the architectural reading: at D=4 the SVD dimension is small enough that linear readout matches the spectral bandwidth without information loss; at D=8 and D=16 the spectral residual matters. --- ## TIER 4 — P-sweep small-battery floor grid (600 configs at 20 batches) `group_P_small_battery_floor` from `ablation_configs.py` line 615. **Full product: 5 × 5 × 3 × 2 × 2 × 2 = 600 configs.** This is the parent grid the Q-sweep top-10 came out of. It runs after Phase 1 ablation establishes H2_linear_matched as the canonical template, and varies the architecture axes around that template at a budget-minimal 20 batches per config. ### P-sweep grid axes | axis | values | count | | --- | --- | --- | | hidden | {4, 8, 16, 32, 64} | 5 | | V | {2, 4, 8, 16, 32} | 5 | | D | {2, 3, 4} | 3 | | depth | {0, 1} | 2 | | n_cross | {0, 1} | 2 | | optimizer | {Adam, LBFGS} | 2 | ### Pins (H2_linear_matched baseline) * `svd='none'`, `linear_readout=True`, `match_params=True` (the H2 ablation winner) * HIGH band: `patch_size=4`, `img_size=64` * `batch_size=256`, `batch_limit=20` (~5,120 samples seen per config) * `n_heads=1` (since D varies down to 2, default n_heads=4 would fail) * `grad_clip=1.0` (defensive — see TIER 4a) * `soft_hand=False`, `cv_measure_every=2` * Adam: `lr=3e-3` (Phase-2 default scaled to 20-batch budget) * LBFGS: `lr=1.0` (default unit-Wolfe-step, lib's own line search) * Training: gaussian-only (`noise_types=[0]`) * Testing: 16-noise per-noise generalization (`test_noise_types=list(range(16))`, 256 samples per noise) ### P-sweep outcomes by optimizer split | optimizer | total configs | finite | NaN/divergent | Q-sweep top-10 representation | | --- | --- | --- | --- | --- | | Adam | 300 | 300 (100%) | 0 | 6 of top-10 (Q-ranks 02, 03, 06, 07, 08, 09, 10) | | LBFGS | 300 | 291 (97%) | 9 | 3 of top-10 (Q-ranks 01, 04, 05) | | **Total** | **600** | **591** | **9** | **10 advanced to Q-sweep at 1000 batches** | The 9 NaN/divergent LBFGS configs all matched the **000099 Hessian-corruption profile**: depth=1 + n_cross=1 architectures where gradient clipping inside the LBFGS closure caused the (s_k, y_k) Hessian approximation to underestimate, generating runaway H⁻¹ steps over enough iterations. 20 batches is the threshold below which divergence is incipient but not yet catastrophic; the 1000-batch Q-sweep would have surfaced 30 LBFGS divergences had it not been pre-fixed. ### P-sweep geometric attractor split (observed across all 600 configs, confirmed in Q) | D class | typical CV (post-20-batch) | attractor | members | | --- | --- | --- | --- | | **D=4** | 0.86 - 1.07 (HIGH band) | **sphere-solver** (H2 family) | 200 configs (5 hidden × 5 V × 2 depth × 2 n_cross × 2 opt) | | **D=3** | ~0.03 (LOW band) | **projective-clean on RP²** (P-class, originally framed "polynomial") | 200 configs | | **D=2** | undefined (V<5 cannot form pentachoron) | failed geometric validity | 200 configs | 200 D=4 configs are theoretically all H2 candidates. Survival of the 6 Adam + 3 LBFGS into Q-sweep top-10 depends on `continued_training_potential`, which combines convergence-rate-at-20-batches with extrapolated-MSE-at-1000-batches. Q-sweep's full table (Tier 2) gives the actual 1000-batch outcomes for those 10. ### TIER 4a — P-sweep extrapolation rankings (the top-10 source data) Each Q-sweep entry started as a P-sweep entry. The "P-MSE" column from `group_Q_h2_candidates` shows the MSE that ranked it at 20 batches: | Q-rank | source P config | P-MSE (20 batch) | Q-MSE (1000 batch) | improvement | optimizer regime change | | --- | --- | --- | --- | --- | --- | | 01 | h64_V32_D4_dp1_nx0_lbfgs | 0.053 | 0.00421 | 13× | LBFGS clean at 1000 (post-fix) | | 02 | h64_V32_D4_dp0_nx0_adam | 0.572 | **0.00205** | 279× | **strongest extrapolation** | | 03 | h64_V32_D4_dp0_nx1_adam | 0.584 | 0.00250 | 234× | | | 04 | h64_V32_D4_dp0_nx1_lbfgs | 0.041 | 0.00391 | 10× | LBFGS started low, gained little | | 05 | h64_V16_D4_dp1_nx1_lbfgs | 0.115 | 0.03117 | 4× | V=16 hits H2b ceiling | | 06 | h64_V32_D3_dp1_nx1_adam | 0.656 | 0.02497 | 26× | D=3 P-class | | 07 | h64_V32_D3_dp0_nx1_adam | 0.641 | 0.03151 | 20× | D=3 P-class | | 08 | h64_V32_D4_dp1_nx1_adam | 0.620 | 0.00231 | 268× | h2-64 single-bank arch | | 09 | h64_V32_D3_dp0_nx0_adam | 0.638 | 0.02782 | 23× | D=3 P-class smallest | | 10 | h64_V32_D2_dp0_nx1_adam | 0.736 | 0.16139 | 5× | D=2 — failed geometric validity | **Reading**: Adam configs started P-sweep with high MSE (0.57-0.74) and gained 20-280× by 1000 batches. LBFGS configs started P-sweep with low MSE (0.04-0.12) and gained only 4-13×. This confirms the optimizer regime shift that 000100 logged: **LBFGS dominates at short budgets (≤20 batches), Adam dominates at long budgets (≥500 batches)**. The architectural template (h64_V32_D4 with depth/n_cross combinations) is unchanged across optimizers; only the optimization trajectory differs. ### P-sweep coverage gaps * **9 NaN configs never re-run** with the post-fix trainer (parked open item from 000100). Could surface 9 additional Tier 2 candidates at the LBFGS+depth=1+n_cross=1 architecture frontier. * **V<32 LBFGS coverage** is thin — only Q-rank05 represented from 300 LBFGS configs across V ∈ {2, 4, 8, 16, 32}. The full P-sweep contains 60 LBFGS configs at each V; their ranking among each other is not surfaced into the catalog. * **hidden < 64 LBFGS** coverage is sparse — Q-sweep's top-10 are all hidden=64. A dedicated lower-hidden LBFGS sub-sweep was never run. Test of the natural-axis-count hypothesis: V matched to known polytope vertex counts on S^(D-1) should produce **static** sphere-solver rows (no rotating antipodal frame). | variant | V | D | polytope | predicted | params | | --- | --- | --- | --- | --- | --- | | R_h64_V16_D4_16cell_orthoplex_adam | 16 | 4 | 16-cell (4-orthoplex) | H2-LIKE static | — | | R_h64_V8_D4_8cell_or_16cell_subset_adam | 8 | 4 | 8-cell (tesseract) | H2-LIKE static | — | | R_h64_V20_D3_dodecahedron_adam | 20 | 3 | dodecahedron | H2-LIKE static | — | **Pins**: same H2_linear_matched baseline as Q, Adam @ lr=3e-3, depth=0, n_cross=0, 1000 batches, gaussian-only training, 16-noise per-noise test. **Status**: trained (in `phaseR_reports/` on HF), results not surfaced into the projective-clean catalog yet. Worth probing against the omega-class criterion since natural-axis-count framework predicts they should land cleanly. --- ## TIER 4b — R-sweep polytope packing test (3 configs at 1000 batches) `group_R_packed_polytope_test` from `ablation_configs.py`. Predicted-H2-LIKE configs where V is matched to known polytope vertex counts on S^(D-1). Same H2_linear_matched baseline as P/Q, Adam @ lr=3e-3, 1000 batches, gaussian-only training. Ran 2026-04-24 alongside Q-sweep. | variant | V | D | polytope | predicted | architecturally implies | | --- | --- | --- | --- | --- | --- | | R_h64_V16_D4_16cell_orthoplex_adam | 16 | 4 | 16-cell (4-orthoplex) | H2-LIKE static rows | natural axis count for D=4 = 16 | | R_h64_V8_D4_8cell_or_16cell_subset_adam | 8 | 4 | 8-cell (tesseract) | H2-LIKE static rows | sub-polytope vertex count | | R_h64_V20_D3_dodecahedron_adam | 20 | 3 | dodecahedron | H2-LIKE static rows | natural axis count for D=3 = 20 | **Hypothesis tested**: when V matches a known regular polytope vertex count for S^(D-1), training should produce **static** sphere-solver rows (no antipodal pair rotation). Phil's framing (000100): the 32-row × D=3 G-Class behavior emerged because 32 points cannot be uniformly arranged on S² — geometric frustration. Match V to polytope, frustration disappears. **Status**: trained, weights in `phaseR_reports/` on HF, but **codebooks were never extracted and probed against the projective-clean threshold**. Worth running through `extract_codebook` to surface them into the verified-omega-class tier; the natural-axis-count framework (ft2 §9.1) predicts they should land cleanly. Open item from 000101. ## TIER 5 — Phase S D=5 architecture floor map (1600 configs at 20 batches) | axis | values | count | | --- | --- | --- | | hidden | {4, 8, 16, 32, 64} | 5 | | V | {2, 4, 8, 16, 32} | 5 | | D | {5} | 1 | | depth | {0, 1} | 2 | | n_cross | {0, 1} | 2 | | noise_type | {0..15} | 16 | | optimizer | {Adam} | 1 (LBFGS too slow for sweep) | **Total**: 5 × 5 × 1 × 2 × 2 × 16 × 1 = 1600 runs. **Headline finding (000105)**: cross-noise rank correlation +0.954. Architectures rank near-identically across all 16 noise types — what changes per noise is achievable floor MSE, not which model achieves it. Top-4 universal architectures all hidden=64, V=32. **Top-4 architectures from S analysis** (mean rank across 16 noise types): | rank | architecture | mean rank | | --- | --- | --- | | 1 | h64_V32_dp0_nx1_D5 | 1.1 | | 2 | h64_V32_dp1_nx0_D5 | (close to 1.1) | | 3 | h64_V32_dp1_nx1_D5 | 1.9 | | 5 | h64_V16_dp1_nx1_D5 | (the V<32 entry) | **Note**: the 1391 individual config directories were lost to HF rate-limiting (87% of submitted commits failed). The 1600-config aggregate JSON survived; per-config artifacts mostly did not. Engineering invariant logged (000108): batch-sync uploads from this point forward. --- ## TIER 6 — Phase T D=5 convergence sweep (64 configs at 1000 batches) Top-4 S architectures × 16 noise types, run at A3-reference budget. **The D=5 walk-back (000106) lives here.** | arch | hidden | V | depth | n_cross | optimizer | % projective-clean across 16 noises | | --- | --- | --- | --- | --- | --- | --- | | h64_V16_dp1_nx1 | 64 | 16 | 1 | 1 | Adam | **62% (10/16)** — D=5 sweet spot | | h64_V32_dp0_nx1 | 64 | 32 | 0 | 1 | Adam | 50% (8/16) | | h64_V8_dp1_nx0 | 64 | 8 | 1 | 0 | Adam | 25% (4/16) | | h64_V32_dp1_nx1 | 64 | 32 | 1 | 1 | Adam | 19% (3/16) | **Headline**: 23/64 (~36%) configs converged within ±0.05 of uniform RP⁴ baseline. **V=16 was the geometric sweet spot at D=5, not V=32** — overturning the V=32 universality reading from A3's three runs. **Per-V deviation summary**: | V | mean dev | p25-p75 | in band? | | --- | --- | --- | --- | | 8 | +0.115 | [0.07, 0.18] | No | | **16** | **+0.040** | **[0.01, 0.07]** | **Yes** (only V whose mean lands inside) | | 32 | +0.057 | [0.04, 0.07] | Just outside | **Salt_pepper anomaly** (000106): 100% projective-clean across all 4 archs in Phase T despite being the *worst* noise to reconstruct in S-sweep (best MSE 2.51, ~34× worse than pink). **Geometry decouples from MSE** — a bank that fails to reconstruct can still produce a clean projective codebook. This matters for downstream cross-bank analysis: "the worst-fitting bank" might still produce the most useful projective representation. **Four_quadrant anomaly**: 0% projective-clean across all 4 archs. Spatially structured noise where no architecture in T converged. Open in ft3 §10 as a deeper-probe candidate. --- ## TIER 7 — A-set verification probes (the 19-model count from ft2) These are the projective-codebook verification runs that produced the n=19 count cited in ft2's Section 5 table. **All entries are individual probes** with explicit deviation measurements. | probe | model | D | V | n_axes | pairs | mean projective angle | uniform baseline | dev | result | | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | | A0 | G-Cand | 3 | 32 | 22 | 10 | 1.011 | 1.015 | **−0.004** | PROJECTIVE-CLEAN | | A1 | H2a | 4 | 32 | 26 | 6 | 1.116 | 1.114 | **+0.002** | PROJECTIVE-CLEAN | | A2 (×16 banks) | h2-64 single-noise (banks 0-15) | 4 | 32 | 24-27 | 5-8 | mean 1.115 | 1.105 | **+0.010 mean, ±0.013** | ALL 16 PROJECTIVE-CLEAN | | A3 (×3 runs) | A3 D=5 (single-arch single-noise) | 5 | 16/32/64 | 16/29/51 | 0/3/13 | varies | varies | -0.015 / +0.016 / +0.019 | PROJECTIVE-CLEAN at A3, **QUALIFIED by Phase T** — generalization fails | **Probe count: 1 (A0) + 1 (A1) + 16 (A2) + 3 (A3) = 21 individual probe runs.** ft2 cited "19 models" because the A3 three runs were treated as one architectural data point. Phase T (000106) re-classified A3 as a single-arch test, leaving 17 projective-clean instances at D=3/4 robustness level. --- ## TIER 8 — Substrate prototype trained models (this week's runs) | HF path | params | D | V | content | regime | result | | --- | --- | --- | --- | --- | --- | --- | | `AbstractPhil/geolip-SVAE/bintree_proto_v1` | 57,215 | 4 | 32 | depth-4 binary tree, i.i.d. Bernoulli ±1, BFS-encoded | **PASSTHROUGH** | best test_mse 3.5e-5 ep 20, 100% bits/trees from ep 1, CV 0.80-1.00, erank 4.00, ratio 1.00 | | `AbstractPhil/geolip-SVAE/sentencepiece_proto_v1` | 57,215 | 4 | 32 | t5-base SP token IDs as 16-bit ±1 floats | **PASSTHROUGH** | best test_mse 5.78e-6 ep 18, 100% bits/tokens from ep 1, α=0.023 throughout, CV 0.85-1.11, erank 4.00, ratio 0.99 | | `AbstractPhil/geolip-SVAE/byte_trigram_proto_v1` | 57,215 | 4 | 32 | UTF-8 byte trigrams as RGB pixels at 256×256 | **ENGAGED** | best test_mse 1.7e-5 ep 19, 83.9% byte / 61.3% trigram from 0% floor, α 0.024→0.043, CV left band ep 6, ratio 1.07, erank 3.9955 dip | The bintree + SP-bit pair establishes the **passthrough control**. Byte-trigram is the **first text-engaged omega-class candidate** but its codebook hasn't been formally probed against the projective threshold yet. Pending follow-up: byte_trigram_proto_128 (img_size decision pending, 100M sample-view target). --- ## TIER 9 — Architectural templates with no measured instance Documented architectures the catalog does not yet cover. Each is omega-class *eligible* under the architecture criterion but lacks a verification probe. | template | arch | params | D | V | status | what's needed | | --- | --- | --- | --- | --- | --- | --- | | Johanna D=16 | PatchSVAE-F | 8.7M (estimate) | 16 | 256 | not yet U5-tested | run extract_codebook on Johanna checkpoint, compute deviation from uniform RP^15 | | Grandmaster omega tokens | concept | — | — | — | paper-level reference | needs trained instance + verification | | `geolip-svae-nosvd-ablation` repo | svd_mode='none' variants | varies | varies | varies | independent repo, omega verification not surfaced into main catalog | inventory the trained checkpoints, run U5 across them | | D=6, D=7, D=8 with V matched to natural axis count | predicted by Phase T framework | — | 6/7/8 | ~22/28/34 (predicted) | not run | sweep with natural-axis-count V matching | --- ## TIER 10 — Explicitly excluded from omega-class | exclusion | reason | source | | --- | --- | --- | | D=2 configs | Cannot form pentachoron (needs ≥5 points), CV undefined | Q-rank 10 | | Q-rank 10 (h64_V32_D2_dp0_nx1_adam) | D=2, MSE 0.16 = essentially failed reconstruction | Q-sweep | | 9 P-sweep NaN configs | LBFGS Hessian-corruption casualties (000099 bug profile) | P-sweep, never re-run | | bintree_proto_v1 | Passthrough regime, codebook not engaged | 000111 | | sentencepiece_proto_v1 | Passthrough regime, cross-attn idle | 000112 | | Phase T D=5 V=32 cells (most) | V over-counted vs natural axis count ~16, fails projective-clean | 000106 | | Phase T D=5 four_quadrant (all 4 archs) | 0% projective-clean, spatial-structured noise | 000106 | | ablation Group H, M, L SVD-removal variants | Spectrum-degenerate; not sphere-solvers in proper sense | Phase 1 ablation | --- ## Smallest-instance benchmarks across the catalog For when minimal-parameter operation matters: | tier | smallest config | params | D | V | MSE | regime | use case | | --- | --- | --- | --- | --- | --- | --- | --- | | absolute smallest projective-clean | **Q-rank09 (P-class)** | **28,899** | 3 | 32 | 0.02782 | LOW-band projective-clean on RP² | minimum-parameter omega | | smallest H2a (canonical sphere-solver) | **Q-rank02** | **40,227** | 4 | 32 | 0.00205 | HIGH-band sphere-solver on RP³ | canonical sphere-solver baseline | | Phase T D=5 V=16 sweet spot | h64_V16_dp1_nx1_D5 | ~36,607 | 5 | 16 | varies | D=5 partial projective-clean | D=5 representative | | h2-64 single bank (production) | bank_idx 0..63 | 57,215 | 4 | 32 | varies | per-noise sphere-solver | bank-level training composition | The 28,899-param P-class candidate (Q-rank09) is the absolute floor for projective-clean. The 40,227-param H2a (Q-rank02) is the floor for canonical sphere-solver behavior at D=4. Under 30K and under 41K respectively. --- ## What's missing from this catalog 1. **byte_trigram_proto_v1 codebook investigation** — engagement signature confirmed in trajectory, but `extract_codebook` against the trained checkpoint hasn't been run for the formal projective-clean verification. Tier 1 entry should move to "verified" once this is done. 2. **byte_trigram_proto_128** — pending img_size=64-vs-128 decision + 100M-sample-view run completion. 3. **9 P-sweep NaN re-runs** — never executed with the LBFGS-fixed trainer (parked open item from 000100). Could surface 9 additional Tier 2 candidates. 4. **R-sweep results probed against projective-clean criterion** — the polytope-packing predictions (16-cell, 8-cell, dodecahedron) trained but their codebooks weren't surfaced into the catalog. 5. **Johanna D=16 verification** — the only large-D representative of the noise-substrate line; not yet probed. 6. **Cross-substrate kNN graph** — bintree, SP-bit, byte-trigram, h2-64-noise codebook-similarity matrix. The "what survives the universal-substrate-hope death" finding (000111) requires this measurement. 7. **Disproof candidates for Omega** — the methodological pivot from 000108 demands negative-result candidates the catalog doesn't yet contain (non-spherical bottleneck variants, no-spatial-coherence content, byte-misaligned content).