k-tuplet-search

CUDA CLI Reference

This is the long-form reference for src/cuda/kt_filter_v8, the production GPU engine. The built-in --help output is intentionally compact; this file explains the operational meaning of each public option.

Build first:

cd src/cuda
make kt_filter_v8

Then run commands from src/cuda/ unless the example says otherwise.

Discovery Commands

Option	Meaning
`--help`, `-h`	Print compact CLI help and exit.
`--version`, `-V`	Print binary name, build SHA, and compile date.
`--list-patterns`	Print the compiled admissible-pattern catalog and exit.
`--test`	Run the embedded CUDA self-test suite.
`--smoke`	Run a short one-batch sanity check and exit when counters prove the kernel ran.

`--list-patterns`

Use this to discover legal --pattern names:

./kt_filter_v8 --list-patterns

Output columns are:

Column	Meaning
pattern name	Name passed to `--pattern`, for example `KT19_P0`.
`k`	Tuple length.
diameter	Last offset in the pattern.
offsets	Additive shifts `{b_i}`; a hit proves all `n + b_i` prime.
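As a minimal illustration of how the `k`, diameter, and offsets columns relate, the sketch below checks a hit by brute force. The prime-triplet offsets `(0, 2, 6)` and the naive trial-division test are assumptions chosen for readability; they are not a catalog entry copied from the engine, and this is not how the GPU cascade tests candidates.

```python
def is_prime(n: int) -> bool:
    """Naive trial division; fine for tiny illustrative inputs only."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def is_hit(n: int, offsets: tuple) -> bool:
    """A base n is a hit when every n + b_i is prime."""
    return all(is_prime(n + b) for b in offsets)


# Stand-in pattern (assumption): the prime-triplet shape {0, 2, 6}.
OFFSETS = (0, 2, 6)
K = len(OFFSETS)        # tuple length, the `k` column
DIAMETER = OFFSETS[-1]  # last offset, the `diameter` column

print(K, DIAMETER)         # 3 6
print(is_hit(5, OFFSETS))  # True: 5, 7, 11 are all prime
print(is_hit(7, OFFSETS))  # False: 7 + 2 = 9 is composite
```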

The list is sorted by pattern name and currently contains 97 CUDA-engine patterns across k=3..28. The catalog is generated from tools/patterns/catalog/*.json. Before making external claims about a pattern, check it against Norman Luhn’s reference page at https://pzktupel.de/ktpatt_hl.php.

Search Identity

Option	Meaning
`--pattern NAME`	Select an exact compiled pattern, e.g. `KT19_P0`. Prefer this for real searches.
`--k N`, `--target N`	Select tuple length. If no pattern is given, the engine chooses the first matching pattern for that `k`; this is mainly a compatibility path.
`--bits N`	Search bases in the bit band `[2^(N-1), 2^N)`. Required for normal search.
`--primorial N`	Select the wheel by upper-prime index in the built-in prime table. Default is `11`, meaning `37#`.
`--wheel-expr X#[/Y...]`	Select a structural wheel expression. Examples: `47#`, `47#/31`, `47#/17/31`. Overrides `--primorial`.
`--threads N`	Parsed for CPU-CLI compatibility. The current GPU survivor-prove path is effectively serial; do not use this as a performance knob.

Useful primorial values:

Flag	Wheel	Typical use
`--primorial 11`	`37#`	Default; fast wheel build, useful for replay and lower-memory checks.
`--primorial 12`	`41#`	Intermediate wheel for experiments.
`--primorial 13`	`43#`	Recommended for `KT19_P0` and related `k=19/21` examples.
`--primorial 14`	`47#`	Recommended for `k=20`, `k=21_P0`, and exploratory `k>=22` examples.

Values below the implementation floor or above 14 are clamped to the nearest supported index, and the engine prints a warning. Larger wheels reduce candidate density but take more startup time and memory.
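The index-to-wheel mapping in the table can be reproduced with a few lines. This is a sketch only: the 0-based indexing is inferred from the table (index `11` maps to `37#`), `PRIME_TABLE` and `wheel_for_index` are hypothetical names, and clamping is left out because the implementation floor is not stated here.

```python
from math import prod

# Primes up to 47; index 0 is the prime 2 (0-based indexing is inferred
# from the table, where index 11 corresponds to 37#).
PRIME_TABLE = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]


def wheel_for_index(index: int) -> tuple:
    """Return (label, modulus) for a --primorial index, e.g. 11 -> ('37#', ...)."""
    if not 0 <= index < len(PRIME_TABLE):
        raise ValueError(f"index {index} is outside the prime table")
    top = PRIME_TABLE[index]
    return f"{top}#", prod(PRIME_TABLE[: index + 1])


for idx in (11, 12, 13, 14):
    label, modulus = wheel_for_index(idx)
    print(idx, label, modulus, modulus.bit_length())
```

The `47#` modulus is 614889782588491410, which still fits in an unsigned 64-bit integer.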

Wheel expressions use the built-in prime catalog up to 47. X# means all primes up to X; /Y drops a smaller prime from that wheel. For example, 47#/17/31 builds the Stage-0 wheel from all primes up to 47, except 17 and 31. To get the next lower plain wheel, write 43#; 47#/47 is rejected because X itself cannot be dropped.

Dropping a prime does not weaken correctness; it only removes that prime from Stage-0. A dropped prime that also appears in the L2 sieve is still caught there; with the current 47# ceiling that means 41 or 43. A dropped prime <=37 bypasses the GPU sieve stages and is rejected only by Fermat-2 and/or host GMP/BPSW proving, which is correct but more expensive. Drops are useful for non-plain-primorial campaign cells and future wheel-pool runners, but dropping a small prime can cause a throughput cliff.

Grammar and validation:

Form	Meaning
`X#`	Plain primorial wheel through prime `X`; equivalent to the matching `--primorial` index.
`X#/Y`	Wheel through `X`, with prime `Y` omitted from Stage-0.
`X#/Y/Z/...`	Wheel through `X`, with multiple distinct primes omitted from Stage-0.

X and every dropped prime must be present in the built-in wheel catalog through 47. Drops must be distinct and strictly smaller than X; drops outside X# and dropping X itself are rejected. Expressions that would leave an empty wheel or overflow the implementation modulus are rejected before search starts. The parser normalizes drop order in logs and checkpoints; for example 47#/31/17 is stored as 47#/17/31, so resume works with either input order.
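The grammar and validation rules above can be sketched as a small parser. The function names are hypothetical and this is not the engine's parser; the empty-wheel and modulus-overflow checks are omitted because their limits are not given here.

```python
CATALOG = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]


def parse_wheel_expr(expr: str) -> tuple:
    """Parse 'X#[/Y...]' into (X, sorted drops); raise ValueError if invalid."""
    head, *tail = expr.split("/")
    if not head.endswith("#") or not head[:-1].isdigit():
        raise ValueError(f"expected X# at the start of {expr!r}")
    x = int(head[:-1])
    if x not in CATALOG:
        raise ValueError(f"{x} is not in the wheel catalog")
    drops = []
    for part in tail:
        if not part.isdigit():
            raise ValueError(f"bad drop {part!r}")
        y = int(part)
        if y not in CATALOG or y >= x:
            raise ValueError(f"drop {y} must be a catalog prime smaller than {x}")
        if y in drops:
            raise ValueError(f"drop {y} given twice")
        drops.append(y)
    return x, tuple(sorted(drops))


def normalize(expr: str) -> str:
    """Canonical spelling with drops in ascending order."""
    x, drops = parse_wheel_expr(expr)
    return "/".join([f"{x}#", *map(str, drops)])


def stage0_primes(expr: str) -> list:
    """Primes that remain in the Stage-0 wheel."""
    x, drops = parse_wheel_expr(expr)
    return [p for p in CATALOG if p <= x and p not in drops]


print(normalize("47#/31/17"))        # 47#/17/31
print(len(stage0_primes("47#/31")))  # 14
```

Sorting the drops before storing them is what lets either input order resume the same checkpoint.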

Only one wheel expression is active in a process. To cover a pool such as 47#/31, 47#/29, and 47#/23, run separate processes, typically pinned to separate GPUs with --gpu-device.
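A launcher for such a pool might build one argument list per wheel expression. The sketch below only constructs and prints the commands; it assumes one GPU per expression with device indices starting at 0, and it reuses `KT19_P0` and `--bits 99` from the smoke example purely as placeholders.

```python
import shlex

# Pool from the paragraph above.
POOL = ["47#/31", "47#/29", "47#/23"]


def build_commands(pattern: str, bits: int, pool: list) -> list:
    """One argv per wheel expression, each pinned to its own GPU index."""
    return [
        ["./kt_filter_v8", "--pattern", pattern, "--bits", str(bits),
         "--wheel-expr", expr, "--gpu-device", str(gpu)]
        for gpu, expr in enumerate(pool)
    ]


for argv in build_commands("KT19_P0", 99, POOL):
    # shlex.join quotes the '#' so the line is safe to paste into a shell.
    print(shlex.join(argv))
```

Each list can be passed unchanged to `subprocess.Popen` to start the separate processes.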

Memory and startup time depend on the pattern-specific admissible residue count. For a new expression, run a bounded smoke first:

./kt_filter_v8 --pattern KT19_P0 --bits 99 --wheel-expr '47#/31' \
    --smoke --max-batches 1 --gpu-device 0

The startup log prints the predicted peak allocation and final Stage-0 count:

[wheel] KT19_P0 @ 47#/31 streaming: predicted peak alloc = 2220.539 MiB
Stage 0: 47#/31 wheel admissibility, n_primes=14, n_admissible=281014272
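A wrapper script could pull these numbers out of the startup log before committing to a long run. The regular expressions below are written against the two sample lines above and nothing else, so treat the exact format as an assumption that may not hold for every build.

```python
import re

SAMPLE = """\
[wheel] KT19_P0 @ 47#/31 streaming: predicted peak alloc = 2220.539 MiB
Stage 0: 47#/31 wheel admissibility, n_primes=14, n_admissible=281014272
"""

PEAK_RE = re.compile(r"predicted peak alloc = ([0-9.]+) MiB")
STAGE0_RE = re.compile(
    r"^Stage 0: (\S+) wheel admissibility, n_primes=(\d+), n_admissible=(\d+)",
    re.MULTILINE,
)


def parse_startup(log: str) -> dict:
    """Extract the predicted peak allocation and Stage-0 counts from a log."""
    peak = PEAK_RE.search(log)
    stage0 = STAGE0_RE.search(log)
    if peak is None or stage0 is None:
        raise ValueError("startup lines not found")
    return {
        "peak_mib": float(peak.group(1)),
        "wheel_expr": stage0.group(1),
        "n_primes": int(stage0.group(2)),
        "n_admissible": int(stage0.group(3)),
    }


print(parse_startup(SAMPLE))
```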

Search Modes

Option	Meaning
`--sequential`	Compatibility flag for the default sequential search mode.
`--random`	Alias for `--prefix-mode random`; randomizes chunk anchors for broad sampling.
`--prefix 0bXXX`	Restrict search to bases whose high bits match the binary prefix.
`--prefix-mode sequential`	Walk the selected range deterministically.
`--prefix-mode random`	Draw chunk anchors at random positions inside the selected range; the sequence is reproducible when a seed is set explicitly.
`--random-seed HEX`, `--seed HEX`	Set the random-mode seed explicitly. Use this for reproducible runs.
`--chunk-tiles N`	Number of tiles per random anchor interval/chunk. Default is `500` in random mode.
`--prefix-lanes N`	Split one prefix/range into `N` non-overlapping lanes for multi-GPU coverage.
`--prefix-lane-id ID`	Select this process’s lane, from `0` to `N-1`.
`--exhaustive`	Sequentially walk the selected range and print a `PREFIX EXHAUSTED` banner when the lane completes.

Use --random for open-ended sampling. Use --exhaustive only when the range is intentionally small enough to finish or when you are running a controlled prefix-lane coverage job.
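To make the range arithmetic concrete, the sketch below maps a binary prefix to a half-open range and splits it into lanes. Two assumptions are baked in and may not match the engine: the prefix is read as the leading bits of an N-bit base, including the top set bit, and lanes are contiguous slices of near-equal size. The function names are hypothetical.

```python
def prefix_range(bits: int, prefix: str) -> tuple:
    """Half-open range [lo, hi) of `bits`-bit integers whose top bits equal prefix."""
    digits = prefix[2:] if prefix.startswith("0b") else prefix
    if not digits or set(digits) - {"0", "1"} or len(digits) > bits:
        raise ValueError(f"bad prefix {prefix!r}")
    if digits[0] != "1":
        # Assumption of this sketch: the prefix includes the top set bit.
        raise ValueError("prefix must start with 1")
    free = bits - len(digits)  # low bits left unconstrained
    lo = int(digits, 2) << free
    return lo, lo + (1 << free)


def lane_range(lo: int, hi: int, lanes: int, lane_id: int) -> tuple:
    """Contiguous slice number lane_id of [lo, hi), split into `lanes` parts."""
    if not 0 <= lane_id < lanes:
        raise ValueError("lane id must be in 0..lanes-1")
    span = hi - lo
    return lo + span * lane_id // lanes, lo + span * (lane_id + 1) // lanes


lo, hi = prefix_range(8, "0b101")
print(lo, hi)  # 160 192
print([lane_range(lo, hi, 4, i) for i in range(4)])
```

Under these assumptions, the prefix `0b1` with `--bits 99` covers the whole band `[2^98, 2^99)`.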

Run Limits and Resume

Option	Meaning
`--max-time SEC`	Stop after this wall-clock budget.
`--max-batches N`	Stop after this many GPU batches. Useful for smoke tests.
`--checkpoint FILE`	Periodically write an atomic checkpoint with cursor, counters, identity, and seed.
`--resume [FILE]`	Resume from a checkpoint. If no file follows `--resume`, the engine uses `--checkpoint FILE`.
`--ckpt-interval N`	Checkpoint interval in seconds. Default is `60`; values below `1` are clamped to `1`.

A checkpoint is resumed only if its identity matches the current command’s search identity: pattern, bits, prefix, lane count, lane id, normalized wheel expression, and a compatible mode/seed. On a mismatch the engine prints a message and starts fresh rather than silently resuming the wrong search.
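The two ideas here, atomic writes and identity checks on resume, can be sketched as follows. The JSON layout, the key names in `IDENTITY_KEYS`, and both function names are hypothetical; the engine's real checkpoint format is not described in this reference, and the mode/seed compatibility rule is left out.

```python
import json
import os
import tempfile
from typing import Optional

# Hypothetical key names for the identity fields listed above.
IDENTITY_KEYS = ("pattern", "bits", "prefix", "lanes", "lane_id", "wheel_expr")


def write_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)  # atomic rename on the same filesystem


def load_for_resume(path: str, current: dict) -> Optional[dict]:
    """Return the saved state if its identity matches, else None (start fresh)."""
    try:
        with open(path) as fh:
            saved = json.load(fh)
    except FileNotFoundError:
        return None
    mismatched = [k for k in IDENTITY_KEYS if saved.get(k) != current.get(k)]
    if mismatched:
        print(f"checkpoint identity mismatch on {mismatched}; starting fresh")
        return None
    return saved
```

Writing to a temporary file and renaming it means a reader sees either the old checkpoint or the new one, never a partial write.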

Output and Reporting

Option	Meaning
`--output FILE`	Append human-readable found-record lines to `FILE`.
`--log-file FILE`	Alias for `--output`.
`--quiet`	Reduce nonessential output.
`--full-quiet`	Suppress progress reporter output; useful for scripts.
`--report N`	Set progress report interval in seconds.
`--report-interval-sec N`	Same as `--report`; `0` disables periodic reports.
`--bench-jsonl FILE`	In `--validate-known`, emit one JSON row per replay record.

Confirmed finds are always printed to stdout and persisted as novel_records_gpu<N>.jsonl in the working directory, where N is the CUDA device index. Novel-record JSONL includes wheel_expr and primorial_n so a find can be attributed to the exact wheel campaign without cross-referencing stdout. Files under bench/ are generated runtime artifacts and are ignored by Git.
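Because each row carries `wheel_expr`, finds can be attributed per campaign with a short script. In this sketch only the `wheel_expr` and `primorial_n` field names come from the text above; the sample rows are made up, and real records carry more fields.

```python
import json
from collections import Counter


def finds_per_wheel(lines) -> Counter:
    """Count records per wheel_expr from an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        counts[record.get("wheel_expr", "<missing>")] += 1
    return counts


# Made-up rows for illustration only.
sample = [
    '{"wheel_expr": "47#/31", "primorial_n": 14}',
    '{"wheel_expr": "47#/31", "primorial_n": 14}',
    '{"wheel_expr": "43#", "primorial_n": 13}',
]
print(finds_per_wheel(sample))
```

For a real run, pass an open file instead of the list, for example `finds_per_wheel(open("novel_records_gpu0.jsonl"))`.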

Known-Record Replay and KPI Modes

Option	Meaning
`--validate-known [k]`	Replay known records from `known/records.json` / `tools/records_manifest.tsv`. With no `k`, defaults to the fast replay scope.
`--validate-known --k N`	Equivalent way to select the replay tuple length.
`--validate-known-require-coverage`	Fail if records are skipped because the selected wheel cannot cover them under current limits. Use this when validating a production primorial.
`--validate-per-record-budget SEC`	Override the default 60-second per-record replay budget. Also available through `KT_VALIDATE_PER_RECORD_BUDGET_SEC`.
`--kpi-target-base DEC`	KPI harness mode: stop when the specified base is emitted. Mutually exclusive with `--validate-known`.

For correctness checks, prefer --validate-known first. For production-path validation after engine changes, use longevity_gpu/scripts/validate_records_external.sh; that script exercises both sequential and random prefix modes against known records.

GPU Controls

Option	Meaning
`--gpu-device N`	Select CUDA device. Default is `0`.
`--gpu-batch-size N`	Requested threads per kernel launch. Default is `524288`; RTX 5090 runs generally use larger values such as `2097152`.
`--gpu-streams N`	Concurrent CUDA streams. Default is `3`; maximum is `8`.
`--gpu-arch sm_NN`	Informational label only. Actual architecture is chosen at compile time by `NVCC_ARCH`.

Stage Gates

These are mostly for debugging and A/B checks. Normal searches should leave all stages enabled.

Option	Meaning
`--no-stage-l2`	Disable the first post-wheel forbidden-residue stage.
`--no-stage-ext-l2`	Disable the extended L2 forbidden-residue stage.
`--no-stage-line`	Disable the line-sieve stage.
`--no-stage-fermat`	Disable the Fermat-2 prefilter before host proving.

The normal cascade is:

wheel -> L2 -> ext-L2 -> line-sieve -> Fermat-2 -> host GMP/BPSW prove
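The toy model below shows why gating off a prefilter stage changes cost rather than results: every survivor still has to pass the final prove. The stage names, the small primes each stage checks, and the trial-division prove are all assumptions for illustration and only loosely mirror the cascade above.

```python
def is_prime(n: int) -> bool:
    """Stand-in for the final host prove; naive trial division."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def no_small_factor(primes):
    """Build a cheap prefilter that rejects multiples of the given primes."""
    return lambda n: all(n % p for p in primes)


# Toy prefilter stages (assumption), applied in order.
STAGES = {
    "wheel": no_small_factor((2, 3, 5)),
    "l2": no_small_factor((7, 11)),
    "line": no_small_factor((13, 17, 19)),
}


def run_cascade(candidates, disabled=()):
    """Return (confirmed, number of candidates that reached the final prove)."""
    survivors = list(candidates)
    for name, keep in STAGES.items():
        if name not in disabled:
            survivors = [n for n in survivors if keep(n)]
    return [n for n in survivors if is_prime(n)], len(survivors)


full, proved_full = run_cascade(range(100, 1000))
gated, proved_gated = run_cascade(range(100, 1000), disabled=("l2", "line"))
print(full == gated, proved_full, proved_gated)
```

The confirmed lists are identical, but the gated run sends more candidates to the expensive final stage.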

Observability and Experimental Flags

These flags are default-off instrumentation or host-side validators. They are useful for diagnostics but should not be treated as performance improvements.

Option	Meaning
`--enable-f18-yield`	Write per `(line_prime, residue)` yield counters to `bench/yield_counters_<bin>_<pid>.jsonl`.
`--enable-f22-reservoir`	Write a 1000-slot post-Fermat survivor reservoir to `bench/reservoir_<bin>_<pid>.jsonl`.
`--enable-f23-texture`	Write a 128-slot survivor texture reservoir to `bench/texture_<bin>_<pid>.jsonl`.
`--enable-f24-cascade`	Enumerate and score k-2 sub-tuplets of the active pattern at startup; host-side diagnostic.
`--enable-f1-validator`	Validate baked KT19_P0/37# masks against runtime-generated masks, then exit. This is a correctness check, not a hot-kernel optimization.
`--enable-f2-rcu`	Enable dual-bank constant-memory mirror infrastructure for RCU-style filter-table updates; currently infrastructure/diagnostic scope.

Compatibility and Debug Flags

The following CPU flags are accepted so wrapper scripts can share arguments between CPU and GPU binaries. They are echoed or stored but do not change the GPU hot path:

--no-line-sieve
--no-bitvec
--bitvec
--opt-fermat
--no-opt-fermat
--opt-mont-fermat
--no-opt-mont-fermat
--opt-prefetch
--no-opt-prefetch
--opt-bitscan
--no-opt-bitscan
--opt-line-cap N
--pin
--pin-base N
--sieve-only

Debug-only options:

Option	Meaning
`--verbose-rotation`	Print high-volume random-anchor rotation diagnostics.
`--inject-cursor-offset HEX`	Misalignment-injection regression-test hook. Do not use for normal searches.

Unknown options are rejected with a non-zero exit code.
