The Win Ratio's Methodology Bill Comes Due
A simultaneous wave of papers formalizes the win ratio's pathologies — including a 2×2 reversal in which treatment beats control on every component and still loses overall — while cardiovascular sponsors keep pre-specifying it as primary.
- Survival & time-to-event methods
- Regulatory
- Methodology Frontier
Start with the cleanest result. In a new Statistics in Medicine paper, the authors prove that with two binary endpoints arranged hierarchically, treatment can have higher marginal success probabilities on both components and still produce a win ratio below 1 and a negative net treatment benefit. No distributional assumptions, no censoring tricks, no simulated edge case — a 2×2 table. The mechanism is structural: the secondary endpoint is consulted only inside primary-tie strata, and that conditioning reweights the comparison toward strata where treatment underperforms. If you have been carrying a quiet intuition that “better on each component” implies “better overall” for hierarchical composites, that intuition is now formally wrong.
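The reversal is easy to check numerically. The sketch below uses a pair of joint distributions constructed for this illustration (they are not taken from the paper) in which treatment has the higher marginal success probability on both endpoints, yet the hierarchical pairwise comparison yields a win ratio below 1. The function names and every cell probability are assumptions.

```python
from itertools import product

# Joint distributions over (primary, secondary); 1 = success.
# Illustrative numbers chosen for this sketch, not taken from the paper.
TREATMENT = {(1, 1): 0.30, (1, 0): 0.00, (0, 1): 0.14, (0, 0): 0.56}
CONTROL = {(1, 1): 0.00, (1, 0): 0.20, (0, 1): 0.40, (0, 0): 0.40}


def marginal(dist, component):
    """Marginal success probability on one component (0 = primary, 1 = secondary)."""
    return sum(p for outcome, p in dist.items() if outcome[component] == 1)


def win_loss(trt, ctl):
    """Probability that a random treatment patient wins / loses against a random control."""
    win = loss = 0.0
    for (t, pt), (c, pc) in product(trt.items(), ctl.items()):
        weight = pt * pc
        # The primary decides unless it is tied; the secondary is
        # consulted only inside primary-tie strata.
        level = 0 if t[0] != c[0] else 1
        if t[level] > c[level]:
            win += weight
        elif t[level] < c[level]:
            loss += weight
    return win, loss


win, loss = win_loss(TREATMENT, CONTROL)
win_ratio = win / loss
net_benefit = win - loss

print(f"primary:   treatment {marginal(TREATMENT, 0):.2f} vs control {marginal(CONTROL, 0):.2f}")
print(f"secondary: treatment {marginal(TREATMENT, 1):.2f} vs control {marginal(CONTROL, 1):.2f}")
print(f"P(win) = {win:.3f}, P(loss) = {loss:.3f}")
print(f"win ratio = {win_ratio:.3f}, net benefit = {net_benefit:+.3f}")
```

With these inputs treatment leads 0.30 to 0.20 on the primary and 0.44 to 0.40 on the secondary, but P(win) is 0.356 against P(loss) of 0.364. Treatment's secondary successes sit almost entirely among its primary successes, where few ties occur, while the large primary-failure tie stratum favors control on the secondary.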
That paper does not arrive alone. At least eight methodology papers in the current publication cycle, across Statistics in Medicine, the Journal of Biopharmaceutical Statistics (JBS), and Statistics in Biopharmaceutical Research (SBR), are pressure-testing the win ratio / win odds / net benefit family from different directions, at exactly the moment a Journal of Clinical Epidemiology meta-epidemiological review documents its rapid uptake as a primary endpoint in cardiovascular and heart-failure trials. The gap between the adoption curve and the methodological characterization is the story.
What the new papers actually establish
The pathologies being formalized are not a single complaint:
- A JBS paper on noncollapsibility extends the standard odds-ratio/hazard-ratio critique to the entire win-statistics family: the unadjusted win ratio is not generally a weighted average of stratum-specific win ratios, even under randomization. The authors propose standardization to recover marginal estimates.
- A Statistics in Medicine comparison under non-proportional hazards (NPH) shows the win ratio has the power advantage when treatment effects emerge early, while RMST is more sensitive to late-onset effects. Crucially, the unstratified win ratio fails to control type I error under confounding, while the stratified version preserves it. The authors recommend dual reporting of win ratio and RMST difference, not either alone.
- A proportional win-fractions (PW) regression paper demonstrates that violations of the PW assumption meaningfully bias predictions, and that a constant win ratio can entirely mask a diminishing-returns pattern across a biomarker gradient. The illustration uses HF-ACTION data and the WR R package.
- A covariate-adjustment paper links the win odds to the marginal probabilistic index to derive adjusted estimators, but explicitly flags small-sample type I error inflation, which matters for cardiovascular outcomes trials with modest event counts.
- A direct JBS challenge titled, with admirable economy, “Is the win ratio approach always the best?” benchmarks it against weighted log-rank, cause-specific hazards, and time-to-first-event analyses. Companion JBS papers tackle censoring and missing data, distance-based weights, and cluster-randomized inference.
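The noncollapsibility point can be seen in the simplest case. For a single binary endpoint the win ratio equals the odds ratio, so the classic odds-ratio example carries over directly. The sketch below assumes two equally sized strata with 1:1 randomization in each; the success probabilities are illustrative and not drawn from the JBS paper.

```python
def win_ratio_binary(p_trt, p_ctl):
    """Win ratio for one binary endpoint.

    P(treatment success, control failure) / P(treatment failure, control success),
    which is the odds ratio.
    """
    return (p_trt * (1 - p_ctl)) / ((1 - p_trt) * p_ctl)


# (stratum share, treatment success prob, control success prob); illustrative values.
STRATA = [(0.5, 0.5, 0.2), (0.5, 0.8, 0.5)]

# Win ratio computed separately inside each stratum.
stratum_wr = [win_ratio_binary(pt, pc) for _, pt, pc in STRATA]

# Pooling all patients: marginal success probabilities are share-weighted averages.
p_trt = sum(share * pt for share, pt, _ in STRATA)
p_ctl = sum(share * pc for share, _, pc in STRATA)
marginal_wr = win_ratio_binary(p_trt, p_ctl)

print("stratum-specific win ratios:", [round(wr, 3) for wr in stratum_wr])
print("marginal win ratio:", round(marginal_wr, 3))
```

Both strata have a win ratio of 4, yet the pooled win ratio is about 3.45, below both. No weighted average of the stratum values can produce it, even though the arms are perfectly balanced within each stratum.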
Read together, the picture is not that the win ratio is wrong. It is that the win ratio is a less settled estimator than its SAP frequency implies. Noncollapsibility, non-monotonicity across components, tie-stratum reweighting, power profiles that flip with the timing of the treatment effect, and adjustment theory that only just arrived — these are properties one would normally finish characterizing before mass pre-specification, not during.
What biometrics teams should actually do
The operational implication is not “stop using the win ratio.” Regulators are receiving it and, in cardiovascular outcomes, expecting it. The implication is that SAPs pre-specifying a win ratio in 2026 need to do more work than SAPs pre-specifying a hazard ratio did in 2016. At minimum: pre-specify component-level marginal summaries and tie-stratum decompositions as supplementary outputs (the “Better is Worse” paper supplies the diagnostic template); pre-specify stratification when randomization is stratified, given the type I error result; consider RMST difference as a co-primary or sensitivity analysis when the treatment-effect timing is uncertain; and, if covariate-adjusted win odds are on the table, document the sample-size regime where the small-sample inflation is acceptable.
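One possible shape for those supplementary outputs is a tally of pairwise wins and losses by the hierarchy level that decided each pair, reported next to the component-level marginal rates. The sketch below is a minimal version assuming complete data (no censoring) and outcomes coded so that larger is better. The function names and toy data are hypothetical, and this is not the diagnostic template from the paper.

```python
from collections import Counter

LABELS = ("primary", "secondary")


def decompose(trt, ctl, labels=LABELS):
    """Tally pairwise wins, losses and ties by the level that decided each pair.

    trt, ctl: lists of outcome tuples, one per patient, ordered by priority.
    Assumes complete data; censoring would need an explicit comparability rule.
    """
    tally = Counter()
    for t in trt:
        for c in ctl:
            for label, t_k, c_k in zip(labels, t, c):
                if t_k != c_k:
                    # First non-tied component decides the pair.
                    tally[(label, "win" if t_k > c_k else "loss")] += 1
                    break
            else:
                # Tied on every component.
                tally[("all", "tie")] += 1
    return tally


def marginal_rates(arm, labels=LABELS):
    """Component-level success rate in one arm, ignoring the hierarchy."""
    return {label: sum(p[k] for p in arm) / len(arm) for k, label in enumerate(labels)}


# Toy data: (primary, secondary) per patient, 1 = success.
trt = [(1, 1), (1, 1), (0, 0), (0, 0), (0, 1)]
ctl = [(1, 0), (0, 1), (0, 1), (0, 0), (0, 0)]

tally = decompose(trt, ctl)
wins = sum(n for (_, result), n in tally.items() if result == "win")
losses = sum(n for (_, result), n in tally.items() if result == "loss")

for key in sorted(tally):
    print(key, tally[key])
print("marginal rates, treatment:", marginal_rates(trt))
print("marginal rates, control:  ", marginal_rates(ctl))
print("win ratio:", wins / losses)
```

Reporting the per-level counts alongside the marginal rates makes a reversal visible: a component on which treatment leads marginally but loses within the tie stratum shows up as a loss-heavy row.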
The parallel non-proportional hazards subplot — informative event rate, event size re-estimation, and group-sequential design with weighted log-rank and MaxCombo — is the same story in a different dialect: the Schoenfeld-era machinery is being retrofitted for a world where the proportional hazards assumption is, in oncology at least, more aspiration than premise.
Protocol read: The win ratio is not unsafe, but it is under-characterized relative to its uptake; SAPs that pre-specify it as primary without component-level diagnostics, stratification logic, and a sensitivity estimand on a different timing profile are now visibly behind the methodology literature.
What to do now:
- Audit any active or upcoming SAP that pre-specifies win ratio or net benefit as primary; add component-level marginal summaries and tie-stratum decomposition as pre-specified supplementary outputs.
- Pre-specify a stratified rather than an unstratified win ratio when randomization is stratified, citing the type I error result under confounding.
- Add RMST difference (or a weighted log-rank with a late-emphasis weight) as a sensitivity analysis when the treatment-effect timing profile is uncertain.
- Defer covariate-adjusted win odds in small-sample cardiovascular trials until the type I error inflation regime is quantified for your event count.
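For the stratification item, a minimal sketch of the difference between the two estimators follows. Pairs are formed only within strata, and wins and losses are divided by stratum size before summing, a Mantel-Haenszel-type weighting. That weighting is one choice among several and is an assumption here, as are the function names and toy data. The sketch assumes complete data, no censoring, and at least one loss.

```python
def count_wins_losses(trt, ctl):
    """Pairwise wins and losses for prioritized outcome tuples (larger is better).

    Python compares tuples lexicographically, which matches the hierarchy:
    the secondary only matters when the primary is tied.
    """
    wins = sum(t > c for t in trt for c in ctl)
    losses = sum(t < c for t in trt for c in ctl)
    return wins, losses


def stratified_win_ratio(strata):
    """strata: dict mapping stratum label -> (treatment outcomes, control outcomes)."""
    num = den = 0.0
    for trt, ctl in strata.values():
        wins, losses = count_wins_losses(trt, ctl)
        size = len(trt) + len(ctl)
        # Weight each stratum's wins and losses by 1 / stratum size.
        num += wins / size
        den += losses / size
    return num / den


def unstratified_win_ratio(strata):
    """Pool all patients and compare every treatment patient with every control."""
    trt = [t for arm, _ in strata.values() for t in arm]
    ctl = [c for _, arm in strata.values() for c in arm]
    wins, losses = count_wins_losses(trt, ctl)
    return wins / losses


# Toy data: (primary, secondary) per patient, 1 = success.
strata = {
    "low_risk": ([(1, 1), (1, 0), (0, 1)], [(1, 0), (0, 1), (0, 0)]),
    "high_risk": ([(0, 1), (0, 0)], [(1, 0), (0, 0)]),
}

print("stratified win ratio:  ", round(stratified_win_ratio(strata), 3))
print("unstratified win ratio:", round(unstratified_win_ratio(strata), 3))
```

On this toy data the two estimators disagree (1.875 stratified against roughly 1.714 unstratified), which is the reason to fix the choice in the SAP rather than leave it to the analysis stage.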