NeurIPS 2026 Submission arXiv Preprint ↗

Solver-Aware Decompositions
for Programming by Example

When Dividing Requires Knowing How to Conquer - aligning decomposition structure with a bounded synthesizer's tractable distribution, rather than imitating annotator factorisation choices.

Decomposition Quality Is Solver-Relative

Decomposition-based PBE scales synthesis by splitting tasks into subtasks - a decomposer predicts intermediate subgoals, and a synthesizer generates programs conditioned on them. Standard training optimises the decomposer to imitate ground-truth (GT) subgoals, treating decomposition quality as intrinsic to the task structure.

We argue this framing is incomplete. A bounded synthesizer develops its own search dynamics and inductive biases - it does not become a faithful replica of the annotator's decomposition choices. Subgoals that are structurally valid can fall outside the synthesizer's tractable distribution, causing synthesis failures that imitation-only training cannot correct.

Core Insight

Decomposition quality is solver-relative, not intrinsic to the task. The same intermediate representation that is tractable for one solver may be search-intractable for another. Training must account for the specific synthesizer that will act on the subgoals.

We propose Solver-Aware Decomposition (SAD), which retains supervised training on GT subgoals as a structural scaffold while additionally optimising the decomposer via direct feedback from a frozen synthesizer. Subgoals are rewarded based on the synthesizer's loss on the target program - a dense signal of subtask difficulty that steers the decomposer toward the solver's tractable region.

+22%
relative gain on Deepcoder length generalisation
SAD vs. solver-blind baseline
2-3×
higher GT alignment from solver-blind training
yet consistently lower synthesis accuracy

Solver-Aware Decomposition (SAD)

SAD treats decomposition as a solver-dependent decision problem: intermediate representations should be optimized for downstream usability under a fixed solver, rather than for similarity to GT structure.

LEARNABLE
Decomposer πθ
SAMPLED
Osampled ~ πθ
FROZEN
Synthesizer
REWARD
–CE(Synth, p*)
Greedy decomposition Ogreedy serves as the self-critical baseline for SCST
Advantage A = R(Osampled) − R(Ogreedy) captures relative tractability
CE loss against GT program provides dense signal — partial correctness still rewarded

The training objective combines a supervised scaffold with the RL signal and an entropy bonus to prevent collapse:

L = Lsup + LRL − λ H(πθ),   λ = 0.01
LRL = −EO ~ πθ[ A · log πθ(O | x) ]
R(O) = −CE( Synth(I, O), p* )   A = R(Osampled) − R(Ogreedy)
Lsup

Structural Scaffold

Supervised imitation of GT subgoals keeps the decomposer within a meaningful decomposition space. Without it, the policy collapses to degenerate outputs and task accuracy falls below 20%.

LRL

Solver Alignment

SCST feedback from the frozen synthesizer steers the decomposer toward subgoals in the solver's tractable distribution - including subgoals the annotator never explicitly specified.

H

Entropy Bonus

Prevents premature convergence to deterministic decomposition patterns, preserving the diversity needed to discover solver-tractable alternative paths.

Results

SAD consistently outperforms the solver-blind baseline across both domains and evaluation settings. All SAD vs. solver-blind differences are statistically significant (paired t-test, p < 0.05). SAD also improves over ExeDec in both domains.

Task Accuracy - Deepcoder
SAD (in-dist.)
74%
Solver-blind (in-dist.)
67%
SAD (length gen.)
41%
Solver-blind (length gen.)
33%
Task Accuracy - Lambdabeam
SAD (in-dist.)
65%
Solver-blind (in-dist.)
59%
SAD (length gen.)
32%
Solver-blind (length gen.)
28%

The Accuracy Paradox

A key finding is that higher GT decomposition alignment does not translate into synthesis success. SAD achieves consistently higher synthesis accuracy despite matching GT subgoals 2–3× less often than the solver-blind baseline - confirming that GT alignment is a poor proxy for solver utility.

Domain Method Decomp. Acc. (in-dist.) Decomp. Acc. (len. gen.) Synth. Acc. (in-dist.) Synth. Acc. (len. gen.)
Deepcoder SAD 31.7 ± 0.7 21.5 ± 0.5 87.2 ± 1.3 79.8 ± 1.1
Solver-blind 79.2 ± 0.4 63.3 ± 0.6 73.0 ± 1.2 64.3 ± 0.8
Lambdabeam SAD 33.7 ± 0.3 21.4 ± 0.3 61.9 ± 1.0 49.0 ± 1.2
Solver-blind 69.9 ± 0.7 48.5 ± 0.7 54.2 ± 0.5 41.3 ± 0.5

Compounding Gains

The performance gap grows with program length despite SAD being trained on single-step tasks only. Solver alignment compounds across decomposition steps.

Coverage, Not Ranking

Beam oracle analysis confirms the gap is generative - the solver-blind model fails to propose tractable decompositions at all, not merely rank existing ones poorly.

Robustfill Null Result

Where decomposition order is structurally fixed and ambiguity is absent, SAD matches the solver-blind baseline exactly - a designed falsifiability check that confirms the mechanism.

Solver-Relative Diversity

SAD solves a consistent subset of tasks that the GT decomposition oracle cannot - using genuinely alternative programs (fewer than 2% overlap with GT solutions, differing systematically in length and DSL primitive usage). This confirms that decomposition quality is solver-relative: the synthesizer's inductive bias, not the annotator's choices, should determine what constitutes a good intermediate subgoal.

Evaluation Domains

Three standard PBE domains, chosen to vary the key structural property: the degree of valid decomposition ambiguity available for solver-aware training to exploit.

List Domain

Deepcoder

Integer list transformations via a DSL of first- and higher-order functions. Intermediate outputs accumulate as named variables - multiple valid programs and decompositions exist per task, making solver preferences over this ambiguity non-trivial.

Extended List

Lambdabeam

Extends Deepcoder with dynamically constructed lambda functions and a conditional If-operation. A larger, less constrained search space amplifies the cost of solver misalignment at every step.

Falsifiability Check

Robustfill

String transformations with a residual state update that structurally fixes decomposition order. Any deviation from the correct sequential prefix propagates failure irrecoverably - designed to produce a null result that validates the causal claim.

Setup

All models share the same Transformer architecture (3 layers, embedding dim. 512, hidden dim. 1024), beam size 10, 5 random seeds, 1 000 test tasks per seed. Evaluated under in-distribution and length generalisation (trained on lengths 1–4, tested on length 5) settings. Training on NVIDIA RTX A6000; SAD adds ~30% wall-clock time over solver-blind training due to two additional frozen synthesizer forward passes per update step.

What We Contribute

1

Solver-Aware Decomposition (SAD)

A training framework combining supervised GT subgoal imitation as a structural scaffold with SCST using direct feedback from a frozen synthesizer. The reward - synthesizer cross-entropy against the GT program - provides a dense signal that steers decompositions toward the solver's tractable distribution.

2

The Accuracy Paradox

Empirical demonstration that higher GT decomposition alignment does not translate into synthesis success. SAD achieves higher synthesis and task accuracy in every setting despite matching GT subgoals 2–3× less often - and solves tasks that a GT decomposition oracle cannot.

3

Decomposition as a Solver-Dependent Decision Problem

A conceptual reframing with implications beyond PBE: any system that learns intermediate representations independently of the bounded component that executes them faces the same structural misalignment. Closing the feedback loop is a general strategy for this class of problem.

Limitations

SAD optimises the decomposer for a specific frozen synthesizer - retraining is required when the synthesizer changes.
The single-step training constraint fixes decomposition granularity; extending SAD to multi-step reward settings remains important future work.
SAD requires GT programs at training - standard in PBE but not universally available. Execution-based rewards would be needed for specification-only settings.

Citation

This paper is an anonymous submission under review. A preprint is available on arXiv.

@article{sad2025, title = {Solver-Aware Decompositions for Programming-by-Example: When Dividing Requires Knowing how to Conquer}, author = {Janis Zenkner, Tobias Sesterhenn, Tim Grams, Christian Bartelt}, journal = {arXiv preprint}, year = {2025}, url = {https://arxiv.org/abs/XXXX.XXXXX} }