SAD - Solver-Aware Decompositions for PBE

Overview

Decomposition Quality Is Solver-Relative

Decomposition-based PBE scales synthesis by splitting tasks into subtasks - a decomposer predicts intermediate subgoals, and a synthesizer generates programs conditioned on them. Standard training optimises the decomposer to imitate ground-truth (GT) subgoals, treating decomposition quality as intrinsic to the task structure.

We argue this framing is incomplete. A bounded synthesizer develops its own search dynamics and inductive biases - it does not become a faithful replica of the annotator's decomposition choices. Subgoals that are structurally valid can fall outside the synthesizer's tractable distribution, causing synthesis failures that imitation-only training cannot correct.

Core Insight

Decomposition quality is solver-relative, not intrinsic to the task. The same intermediate representation that is tractable for one solver may be search-intractable for another. Training must account for the specific synthesizer that will act on the subgoals.

We propose Solver-Aware Decomposition (SAD), which retains supervised training on GT subgoals as a structural scaffold while additionally optimising the decomposer via direct feedback from a frozen synthesizer. Subgoals are rewarded based on the synthesizer's loss on the target program - a dense signal of subtask difficulty that steers the decomposer toward the solver's tractable region.

+22%

relative gain on Deepcoder length generalisation

SAD vs. solver-blind baseline

2-3×

higher GT alignment from solver-blind training

yet consistently lower synthesis accuracy

APPROACH

Solver-Aware Decomposition (SAD)

SAD treats decomposition as a solver-dependent decision problem: intermediate representations should be optimized for downstream usability under a fixed solver, rather than for similarity to GT structure.

LEARNABLE

Decomposer π_θ

→

SAMPLED

O_sampled ~ π_θ

→

FROZEN

Synthesizer

→

REWARD

–CE(Synth, p*)

Greedy decomposition O_greedy serves as the self-critical baseline for SCST

Advantage A = R(O_sampled) − R(O_greedy) captures relative tractability

CE loss against GT program provides dense signal — partial correctness still rewarded

The training objective combines a supervised scaffold with the RL signal and an entropy bonus to prevent collapse:

L = L_sup + L_RL − λ H(π_θ), λ = 0.01
L_RL = −E_{O ~ π_θ}[ A · log π_θ(O | x) ]
R(O) = −CE( Synth(I, O), p* ) A = R(O_sampled) − R(O_greedy)

L_sup

Structural Scaffold

Supervised imitation of GT subgoals keeps the decomposer within a meaningful decomposition space. Without it, the policy collapses to degenerate outputs and task accuracy falls below 20%.

L_RL

Solver Alignment

SCST feedback from the frozen synthesizer steers the decomposer toward subgoals in the solver's tractable distribution - including subgoals the annotator never explicitly specified.

Entropy Bonus

Prevents premature convergence to deterministic decomposition patterns, preserving the diversity needed to discover solver-tractable alternative paths.

Evaluation

Results

SAD consistently outperforms the solver-blind baseline across both domains and evaluation settings. All SAD vs. solver-blind differences are statistically significant (paired t-test, p < 0.05). SAD also improves over ExeDec in both domains.

Task Accuracy - Deepcoder

SAD (in-dist.)

74%

Solver-blind (in-dist.)

SAD (length gen.)

41%

Solver-blind (length gen.)

Task Accuracy - Lambdabeam

SAD (in-dist.)

65%

Solver-blind (in-dist.)

SAD (length gen.)

32%

Solver-blind (length gen.)

The Accuracy Paradox

A key finding is that higher GT decomposition alignment does not translate into synthesis success. SAD achieves consistently higher synthesis accuracy despite matching GT subgoals 2–3× less often than the solver-blind baseline - confirming that GT alignment is a poor proxy for solver utility.

Domain	Method	Decomp. Acc. (in-dist.)	Decomp. Acc. (len. gen.)	Synth. Acc. (in-dist.)	Synth. Acc. (len. gen.)
Deepcoder	SAD	31.7 ± 0.7	21.5 ± 0.5	87.2 ± 1.3	79.8 ± 1.1
Deepcoder	Solver-blind	79.2 ± 0.4	63.3 ± 0.6	73.0 ± 1.2	64.3 ± 0.8
Lambdabeam	SAD	33.7 ± 0.3	21.4 ± 0.3	61.9 ± 1.0	49.0 ± 1.2
Lambdabeam	Solver-blind	69.9 ± 0.7	48.5 ± 0.7	54.2 ± 0.5	41.3 ± 0.5

↑

Compounding Gains

The performance gap grows with program length despite SAD being trained on single-step tasks only. Solver alignment compounds across decomposition steps.

⊕

Coverage, Not Ranking

Beam oracle analysis confirms the gap is generative - the solver-blind model fails to propose tractable decompositions at all, not merely rank existing ones poorly.

∅

Robustfill Null Result

Where decomposition order is structurally fixed and ambiguity is absent, SAD matches the solver-blind baseline exactly - a designed falsifiability check that confirms the mechanism.

Solver-Relative Diversity

SAD solves a consistent subset of tasks that the GT decomposition oracle cannot - using genuinely alternative programs (fewer than 2% overlap with GT solutions, differing systematically in length and DSL primitive usage). This confirms that decomposition quality is solver-relative: the synthesizer's inductive bias, not the annotator's choices, should determine what constitutes a good intermediate subgoal.

Experimental Setup

Evaluation Domains

Three standard PBE domains, chosen to vary the key structural property: the degree of valid decomposition ambiguity available for solver-aware training to exploit.

List Domain

Deepcoder

Integer list transformations via a DSL of first- and higher-order functions. Intermediate outputs accumulate as named variables - multiple valid programs and decompositions exist per task, making solver preferences over this ambiguity non-trivial.

Extended List

Lambdabeam

Extends Deepcoder with dynamically constructed lambda functions and a conditional If-operation. A larger, less constrained search space amplifies the cost of solver misalignment at every step.

Falsifiability Check

Robustfill

String transformations with a residual state update that structurally fixes decomposition order. Any deviation from the correct sequential prefix propagates failure irrecoverably - designed to produce a null result that validates the causal claim.

Setup

All models share the same Transformer architecture (3 layers, embedding dim. 512, hidden dim. 1024), beam size 10, 5 random seeds, 1 000 test tasks per seed. Evaluated under in-distribution and length generalisation (trained on lengths 1–4, tested on length 5) settings. Training on NVIDIA RTX A6000; SAD adds ~30% wall-clock time over solver-blind training due to two additional frozen synthesizer forward passes per update step.

Contributions

What We Contribute

Solver-Aware Decomposition (SAD)

A training framework combining supervised GT subgoal imitation as a structural scaffold with SCST using direct feedback from a frozen synthesizer. The reward - synthesizer cross-entropy against the GT program - provides a dense signal that steers decompositions toward the solver's tractable distribution.

The Accuracy Paradox

Empirical demonstration that higher GT decomposition alignment does not translate into synthesis success. SAD achieves higher synthesis and task accuracy in every setting despite matching GT subgoals 2–3× less often - and solves tasks that a GT decomposition oracle cannot.

Decomposition as a Solver-Dependent Decision Problem

A conceptual reframing with implications beyond PBE: any system that learns intermediate representations independently of the bounded component that executes them faces the same structural misalignment. Closing the feedback loop is a general strategy for this class of problem.

Limitations

◦ SAD optimises the decomposer for a specific frozen synthesizer - retraining is required when the synthesizer changes.

◦ The single-step training constraint fixes decomposition granularity; extending SAD to multi-step reward settings remains important future work.

◦ SAD requires GT programs at training - standard in PBE but not universally available. Execution-based rewards would be needed for specification-only settings.

Solver-Aware Decompositions
for Programming by Example

Decomposition Quality Is Solver-Relative

Solver-Aware Decomposition (SAD)

Structural Scaffold

Solver Alignment

Entropy Bonus

Results

The Accuracy Paradox

Compounding Gains

Coverage, Not Ranking

Robustfill Null Result

Evaluation Domains

Deepcoder

Lambdabeam

Robustfill

What We Contribute

Solver-Aware Decomposition (SAD)

The Accuracy Paradox

Decomposition as a Solver-Dependent Decision Problem

Limitations

Citation

Solver-Aware Decompositionsfor Programming by Example

Decomposition Quality Is Solver-Relative

Solver-Aware Decomposition (SAD)

Structural Scaffold

Solver Alignment

Entropy Bonus

Results

The Accuracy Paradox

Compounding Gains

Coverage, Not Ranking

Robustfill Null Result

Evaluation Domains

Deepcoder

Lambdabeam

Robustfill

What We Contribute

Solver-Aware Decomposition (SAD)

The Accuracy Paradox

Decomposition as a Solver-Dependent Decision Problem

Limitations

Citation

Solver-Aware Decompositions
for Programming by Example