When Dividing Requires Knowing How to Conquer - aligning decomposition structure with a bounded synthesizer's tractable distribution, rather than imitating annotator factorisation choices.
Decomposition-based PBE scales synthesis by splitting tasks into subtasks - a decomposer predicts intermediate subgoals, and a synthesizer generates programs conditioned on them. Standard training optimises the decomposer to imitate ground-truth (GT) subgoals, treating decomposition quality as intrinsic to the task structure.
We argue this framing is incomplete. A bounded synthesizer develops its own search dynamics and inductive biases - it does not become a faithful replica of the annotator's decomposition choices. Subgoals that are structurally valid can fall outside the synthesizer's tractable distribution, causing synthesis failures that imitation-only training cannot correct.
Decomposition quality is solver-relative, not intrinsic to the task. The same intermediate representation that is tractable for one solver may be search-intractable for another. Training must account for the specific synthesizer that will act on the subgoals.
We propose Solver-Aware Decomposition (SAD), which retains supervised training on GT subgoals as a structural scaffold while additionally optimising the decomposer via direct feedback from a frozen synthesizer. Subgoals are rewarded based on the synthesizer's loss on the target program - a dense signal of subtask difficulty that steers the decomposer toward the solver's tractable region.
SAD treats decomposition as a solver-dependent decision problem: intermediate representations should be optimized for downstream usability under a fixed solver, rather than for similarity to GT structure.
The training objective combines a supervised scaffold with the RL signal and an entropy bonus to prevent collapse:
Supervised imitation of GT subgoals keeps the decomposer within a meaningful decomposition space. Without it, the policy collapses to degenerate outputs and task accuracy falls below 20%.
SCST feedback from the frozen synthesizer steers the decomposer toward subgoals in the solver's tractable distribution - including subgoals the annotator never explicitly specified.
Prevents premature convergence to deterministic decomposition patterns, preserving the diversity needed to discover solver-tractable alternative paths.
SAD consistently outperforms the solver-blind baseline across both domains and evaluation settings. All SAD vs. solver-blind differences are statistically significant (paired t-test, p < 0.05). SAD also improves over ExeDec in both domains.
A key finding is that higher GT decomposition alignment does not translate into synthesis success. SAD achieves consistently higher synthesis accuracy despite matching GT subgoals 2–3× less often than the solver-blind baseline - confirming that GT alignment is a poor proxy for solver utility.
| Domain | Method | Decomp. Acc. (in-dist.) | Decomp. Acc. (len. gen.) | Synth. Acc. (in-dist.) | Synth. Acc. (len. gen.) |
|---|---|---|---|---|---|
| Deepcoder | SAD | 31.7 ± 0.7 | 21.5 ± 0.5 | 87.2 ± 1.3 | 79.8 ± 1.1 |
| Solver-blind | 79.2 ± 0.4 | 63.3 ± 0.6 | 73.0 ± 1.2 | 64.3 ± 0.8 | |
| Lambdabeam | SAD | 33.7 ± 0.3 | 21.4 ± 0.3 | 61.9 ± 1.0 | 49.0 ± 1.2 |
| Solver-blind | 69.9 ± 0.7 | 48.5 ± 0.7 | 54.2 ± 0.5 | 41.3 ± 0.5 |
The performance gap grows with program length despite SAD being trained on single-step tasks only. Solver alignment compounds across decomposition steps.
Beam oracle analysis confirms the gap is generative - the solver-blind model fails to propose tractable decompositions at all, not merely rank existing ones poorly.
Where decomposition order is structurally fixed and ambiguity is absent, SAD matches the solver-blind baseline exactly - a designed falsifiability check that confirms the mechanism.
SAD solves a consistent subset of tasks that the GT decomposition oracle cannot - using genuinely alternative programs (fewer than 2% overlap with GT solutions, differing systematically in length and DSL primitive usage). This confirms that decomposition quality is solver-relative: the synthesizer's inductive bias, not the annotator's choices, should determine what constitutes a good intermediate subgoal.
Three standard PBE domains, chosen to vary the key structural property: the degree of valid decomposition ambiguity available for solver-aware training to exploit.
Integer list transformations via a DSL of first- and higher-order functions. Intermediate outputs accumulate as named variables - multiple valid programs and decompositions exist per task, making solver preferences over this ambiguity non-trivial.
Extends Deepcoder with dynamically constructed lambda functions and a conditional If-operation. A larger, less constrained search space amplifies the cost of solver misalignment at every step.
String transformations with a residual state update that structurally fixes decomposition order. Any deviation from the correct sequential prefix propagates failure irrecoverably - designed to produce a null result that validates the causal claim.
All models share the same Transformer architecture (3 layers, embedding dim. 512, hidden dim. 1024), beam size 10, 5 random seeds, 1 000 test tasks per seed. Evaluated under in-distribution and length generalisation (trained on lengths 1–4, tested on length 5) settings. Training on NVIDIA RTX A6000; SAD adds ~30% wall-clock time over solver-blind training due to two additional frozen synthesizer forward passes per update step.
A training framework combining supervised GT subgoal imitation as a structural scaffold with SCST using direct feedback from a frozen synthesizer. The reward - synthesizer cross-entropy against the GT program - provides a dense signal that steers decompositions toward the solver's tractable distribution.
Empirical demonstration that higher GT decomposition alignment does not translate into synthesis success. SAD achieves higher synthesis and task accuracy in every setting despite matching GT subgoals 2–3× less often - and solves tasks that a GT decomposition oracle cannot.
A conceptual reframing with implications beyond PBE: any system that learns intermediate representations independently of the bounded component that executes them faces the same structural misalignment. Closing the feedback loop is a general strategy for this class of problem.
This paper is an anonymous submission under review. A preprint is available on arXiv.