Qwen3.6 AEON RYS 15/20 + SignalLatch

What this is Claim boundaries Runtime profile Evidence Known tradeoffs Files

What this is

This is a Qwen3.6-27B AEON-derived model line built around a specific RYS branch: duplicate the source layer window 15..19 and insert that copy after source layer 19. The resulting logical stack has 69 layers instead of the original 64.

The non-finetuned release is the practical base: an AEON RYS 15/20 model exported to a custom ik-llama-compatible IQ4_NL GGUF. SignalLatch is the behavior-finetuned release: checkpoint-386 LoRA merged into that RYS base at strength 0.10, then exported as the same practical GGUF deployment target.

The main project claim is narrow: in our tested custom runtime path, this RYS branch made the Q4_NL deployment much more resistant to reasoning/coding degradation than earlier non-RYS quant attempts, and SignalLatch improved practical coding-agent behavior on top of that base.

Technical deployment path from AEON base to RYS, SignalLatch merge, GGUF export, and ik-llama serving. — The release path is a chain: source model, RYS checkpoint transformation, quantized GGUF, custom runtime, then behavior fine-tune for SignalLatch.

What is claimed, and what is not

Claimed

The released RYS IQ4_NL artifact stayed close to the RYS BF16 source on the mixed probe snapshot: 0.7299 to 0.7244, or about -0.75% relative.

Claimed

SignalLatch is trained around a Review -> Align -> Latch -> Repair -> Confirm loop for coding agents: scoped context review, goal alignment, waiting for concrete tool signals, targeted repair, and focused validation.

Not claimed

This is not a stock llama.cpp drop, not a general leaderboard claim, and not proof that every RYS window helps every model.

Not claimed

This is not memory-optimal RYS. The current public GGUF is materialized RYS, not a runtime-aliased representation.

Required runtime

Use the custom AEON ik-llama fork. The GGUFs in this release line are intended for that runtime path, especially for graph split, Qwen3.6/Qwen3.5 hybrid handling, Jinja chat formatting, and DeepSeek-style reasoning extraction.

github.com/noonr48/qwen36-aeon-ik-llama

./build/bin/llama-server \
  -m /path/to/Qwen3.6-27B-AEON-RYS-SignalLatch-ckpt386-s010-IQ4_NL.gguf \
  -c 65536 \
  -ngl 999 \
  -np 1 \
  -fa on \
  -sm graph \
  --temp 0.7 \
  --jinja \
  --reasoning-format deepseek \
  --reasoning-budget 0 \
  -cram 0 \
  --ctx-checkpoints 0

For long-context use, increase context as your system allows. The default KV type is f16; FP16/default KV has been tested to about 160k context without showing the earlier failure pattern.

# Long-context shape
-c 131072

# Conservative validation shape used for the canvas comparison
-c 131072 -ctk f32 -ctv f32

Practical single-GPU deployment: the SignalLatch Q4_NL release is small enough for practical use on a single RTX 3090 / 24 GB-class card. In an observed reference profile, roughly 160k context with default/FP16 KV fit at about 20.3 GiB total VRAM. Treat this as a deployment reference point, not a guaranteed memory benchmark.

Why `--ctx-checkpoints 0`?

This flag keeps recurrent context checkpointing disabled explicitly. The current fork also guards against unstable recurrent checkpoints under graph split, but the public command keeps the tested no-checkpoint runtime profile visible.

Why FP32 KV in the canvas run?

It was a conservative validation setting to isolate model/task behavior from KV precision. It should not be read as “FP16 KV is bad.” For normal practical use, start with default/FP16 KV and adjust based on your memory and quality needs.

Evidence summary

The model cards carry the compact tables. This page collects the readout in one place so the release does not depend on ad hoc author replies.

Base RYS compression result

The following table describes the non-finetuned AEON RYS 15/20 base artifact only. It is not a SignalLatch fine-tune score.

Probe	RYS BF16	Released RYS IQ4_NL
mixed 4-probe mean	0.7299	0.7244
math_16	0.8421	0.7897
eq_16	0.7123	0.7111
math_4	0.4851	0.5170
gsm8k_5	0.8800	0.8800

Snapshot change: -0.0055 absolute, about -0.75% relative. This is not a broad benchmark suite; it is a release sanity snapshot for the tested deployment path.

SignalLatch fine-tune result

The fine-tune claim comes from the merged ckpt386 Q4_NL practical coding-agent matrix, not from the base BF16-to-Q4_NL compression table above.

Comparison	Strict pass	Mean	Timeout-like	Scope
Previous AEON RYS Q4_NL	1/5	0.550	4	first deploy-format run
SignalLatch ckpt386 `s0.10` IQ4_NL	4/5	0.950	0	first deploy-format run
SignalLatch ckpt386 `s0.10` repeat stability	9/15	0.842	n/a	three runs
SignalLatch ckpt386 `s0.10` crash-adjusted	9/14	0.884	n/a	excludes one invalid server-crash row

For the complete fine-tune method, strength sweep, task rows, and caveats, read the SignalLatch fine-tune record.

Practical canvas-agent comparison

This canvas comparison is not evidence that SignalLatch beats Unsloth IQ4_NL. SignalLatch IQ4_NL and Unsloth IQ4_NL both completed cleanly with verifier 1.0 and rc=0; the useful SignalLatch read is that it completed cleanly where the non-finetuned AEON RYS base needed a retry.

Run	Result	Read
SignalLatch IQ4_NL	`rc=0`, verifier `1.0`	Clean completion on the isolated canvas app task.
AEON RYS IQ4_NL attempt 1	`rc=1`, verifier `0.0417`	Invalid tool/diff failure before usable files.
AEON RYS IQ4_NL retry 1	`rc=0`, verifier `1.0`	Clean retry, showing the base can complete the task but was less reliable in the first attempt.
unsloth/Qwen3.6-27B-GGUF IQ4_NL	`rc=0`, verifier `1.0`	Clean external comparison pass.
unsloth/Qwen3.6-27B-GGUF Q8_0	`rc=124`, verifier `1.0`	Files passed verification but the agent timed out during final wrap-up.

Exploratory frontend-design observation

A separate Open Design-style workstation prompt produced a dense SLOANE CLI Agent interface concept across desktop, artifact-open desktop, and mobile views. These were one-shot UI generations: the model was instructed to inspect the existing code and intended function, understand the workflow, and produce a frontend shell. There was no manual design iteration beyond that instruction.

This was not run as a scored benchmark, but it is a useful qualitative note: the output preserved a clear command layer, attached-context rail, evidence table, inspector panel, artifact dock, and responsive mobile hierarchy without turning the interface into a generic marketing layout.

Sanitized first-shot SignalLatch workstation UI showing attached context, command lifecycle, evidence, artifact dock, and inspector panels. — Selected first-shot UI example from the artifact-open desktop output. The local footer was cropped before publication.

Read this as an early design-oriented observation only. The stronger SignalLatch claim remains coding-agent discipline and completion behavior; frontend/UI taste should get its own repeatable test suite before it becomes a headline claim.

Scoreboard comparing base Q4_NL and SignalLatch merge strengths. — SignalLatch strength selection favored s0.10 because it was less flashy than stronger first runs but more deployable across repeats.

Testing ladder showing direct adapter tests, merged GGUF tests, strength sweeps, and practical validation. — The testing process moved from adapter probes to merged GGUF runs, then practical coding-agent tasks.

Known tradeoffs

Materialized RYS weights

The current release uses materialized RYS. The copied 15,20 window is exported as ordinary GGUF tensors, so output layers 20..24 have their own copied tensors from source layers 15..19.

This keeps the HF checkpoint, GGUF conversion, quantization, and downstream fine-tune workflow explicit and stable. It also means the copied weights are loaded like normal weights.

Known optimization: a procedural or runtime-aliased RYS implementation could reuse source-layer weight buffers and reduce duplicate model-weight memory. That is future runtime work, not the current tested release.

Practical impact in the released Q4_NL

The copied window is about 0.994 GiB, around 6.45% of the GGUF tensor bytes.

Runtime aliasing would not remove the extra compute or KV-cache cost from the five additional logical layers. At 131072 context, those five layers add about 2.5 GiB FP16 KV or 5.0 GiB FP32 KV.

The current artifact remains materialized because that is the version tested and released.

How the release was selected

RYS branch selection

Several candidate windows were tested. The public release centered on 15,20 because it held up as the practical IQ4_NL release target.

Custom runtime path

The fork added the Qwen3.6/Qwen3.5 hybrid handling, graph-split serving fixes, Jinja and DeepSeek reasoning-format support, and runtime guardrails needed for this line.

Quantized deployment target

The project target is the compact IQ4_NL GGUF, not the largest possible BF16 artifact. The fine-tuned BF16 GGUF exists as a source-quality artifact for inspection, conversion, and downstream work.

SignalLatch behavior merge

The selected fine-tune is checkpoint 386 at merge strength 0.10, chosen because repeated practical checks favored its stability over stronger but less reproducible strengths.

Which file should you use?

SignalLatch practical default

Qwen3.6-27B-AEON-RYS-SignalLatch-ckpt386-s010-IQ4_NL.gguf

Use this when you want the behavior-finetuned coding-agent profile.

Non-finetuned RYS default

Qwen3.6-27B-AEON-RYS-MaxThinkCoder-IQ4_NL-ik-llama-custom-mixed.gguf

Use this when you want the base RYS Q4_NL deployment without SignalLatch behavior tuning.

BF16 source-quality artifact

Qwen3.6-27B-AEON-RYS-SignalLatch-ckpt386-s010-BF16.gguf

Use this source-quality fine-tuned GGUF for exploration, inspection, conversion, training, and downstream work. It is not the normal inference target.

FAQ

Can I run this in stock llama.cpp?

No. This release line requires the custom AEON ik-llama fork built for this RYS model. Stock llama.cpp and default upstream ik-llama are not the supported runtime path. The custom fork includes the model-specific RYS/Qwen3.6 handling, graph-split work, Jinja/DeepSeek formatting support, and speed refinements used for the tested release.

Does SignalLatch make the model solved?

No. It improved practical coding-agent reliability in the tested sweep, but variance remains. The claim is practical and bounded.

Where did the RYS idea come from?

For broader RYS research context, see David Noel Ng's article LLM Neuroanatomy II. This page documents this release line specifically.

Qwen3.6 AEON RYS 15/20 and SignalLatch

Contents

What this is

What is claimed, and what is not

Claimed

Claimed

Not claimed

Not claimed

Required runtime

Why `--ctx-checkpoints 0`?

Why FP32 KV in the canvas run?

Evidence summary

Base RYS compression result

SignalLatch fine-tune result

Practical canvas-agent comparison

Exploratory frontend-design observation

Known tradeoffs

Materialized RYS weights

Practical impact in the released Q4_NL

How the release was selected

RYS branch selection

Custom runtime path

Quantized deployment target

SignalLatch behavior merge

Which file should you use?

SignalLatch practical default

Non-finetuned RYS default

BF16 source-quality artifact

FAQ

Can I run this in stock llama.cpp?

Does SignalLatch make the model solved?

Where did the RYS idea come from?

Qwen3.6 AEON RYS 15/20 and SignalLatch

Contents

What this is

What is claimed, and what is not

Claimed

Claimed

Not claimed

Not claimed

Required runtime

Why --ctx-checkpoints 0?

Why FP32 KV in the canvas run?

Evidence summary

Base RYS compression result

SignalLatch fine-tune result

Practical canvas-agent comparison

Exploratory frontend-design observation

Known tradeoffs

Materialized RYS weights

Practical impact in the released Q4_NL

How the release was selected

RYS branch selection

Custom runtime path

Quantized deployment target

SignalLatch behavior merge

Which file should you use?

SignalLatch practical default

Non-finetuned RYS default

BF16 source-quality artifact

FAQ

Can I run this in stock llama.cpp?

Does SignalLatch make the model solved?

Where did the RYS idea come from?

Why `--ctx-checkpoints 0`?