ACL 2026 · Main Conference

HARPO: Hierarchical Agentic Reasoning for
User-Aligned Conversational Recommendation

An agentic framework that combines deliberative reasoning with hierarchical preference optimization for high-quality, user-aligned recommendations.

Subham Raj¹, Aman Vaibhav Jha¹, Mayank Anand², Sriparna Saha¹
¹IIT Patna  ·  ²IIIT Allahabad

The Problem & Solution

Conversational recommender systems are often trained and evaluated on proxy metrics (Recall@K, BLEU) that only weakly reflect user-aligned recommendation quality. HARPO reframes conversational recommendation as a structured decision-making problem and explicitly optimizes for user satisfaction, relevance, diversity, and engagement.

Abstract: Conversational recommender systems (CRSs) operate under incremental preference revelation, requiring systems to make recommendation decisions under uncertainty. While recent approaches, particularly those built on large language models, achieve strong performance on standard proxy metrics such as Recall@K and BLEU, they often fail to deliver high-quality, user-aligned recommendations in practice. This gap arises because existing methods primarily optimize for intermediate objectives like retrieval accuracy, fluent generation, or tool invocation, rather than recommendation quality itself.

HARPO integrates: (i) CHARM — hierarchical preference learning that decomposes recommendation quality into interpretable dimensions (relevance, diversity, predicted user satisfaction, and engagement) and learns context-dependent weights; (ii) STAR — deliberative tree-search reasoning guided by a learned value network; (iii) BRIDGE — domain-agnostic reasoning abstractions enabling cross-domain transfer; and (iv) MAVEN — multi-agent refinement through collaborative critique.

Figure 2: Overall architecture of the HARPO framework. The model integrates four components: STAR for structured agentic reasoning, CHARM for hierarchical preference optimization, BRIDGE for cross-domain transfer, and MAVEN for multi-agent refinement, all built on a shared language model backbone.

Framework: 4 Components

HARPO integrates four tightly coupled modules built on a shared language model backbone (DeepSeek-R1-Distill-Qwen-7B).

🌳
STAR

Structured Tree-of-Thought Agentic Reasoning
Beam search over structured reasoning states guided by a learned value network that predicts multi-dimensional recommendation quality rather than task completion.
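
The search pattern described for STAR can be illustrated with a generic value-guided beam search. This is a minimal sketch, not the authors' implementation: the state representation, the `toy_expand` and `toy_value` functions, and the action names are all hypothetical, and the value function here returns a single scalar, whereas the page describes a learned network that predicts multi-dimensional quality.

```python
from typing import Callable, List, Sequence, Tuple

# A reasoning state is modeled as the tuple of steps taken so far (an assumption).
State = Tuple[str, ...]


def beam_search(
    root: State,
    expand: Callable[[State], Sequence[State]],
    value: Callable[[State], float],
    beam_width: int = 2,
    depth: int = 3,
) -> State:
    """Keep the `beam_width` highest-value states per level; return the best state seen."""
    beam: List[State] = [root]
    best = root
    for _ in range(depth):
        # Expand every state in the current beam into its successors.
        candidates = [child for state in beam for child in expand(state)]
        if not candidates:
            break
        # Rank successors by estimated value and prune to the beam width.
        candidates.sort(key=value, reverse=True)
        beam = candidates[:beam_width]
        if value(beam[0]) > value(best):
            best = beam[0]
    return best


# Hypothetical toy problem: three reasoning actions with fixed scores.
ACTIONS = ("clarify", "retrieve", "recommend")
TOY_SCORES = {"clarify": 0.1, "retrieve": 0.3, "recommend": 0.5}


def toy_expand(state: State) -> List[State]:
    # Every action is available at every step.
    return [state + (action,) for action in ACTIONS]


def toy_value(state: State) -> float:
    # Stand-in for a learned value network: reward each distinct action once.
    return sum(TOY_SCORES[action] for action in sorted(set(state)))


best_path = beam_search((), toy_expand, toy_value, beam_width=2, depth=3)
```

In this toy setup the search ends on a three-step path that uses each action once, since repeating an action adds no value.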

🎯
CHARM

Contrastive Hierarchical Alignment with Reward Marginalization
Decomposes recommendation quality into four reward dimensions with context-dependent meta-learned weights and margin-based preference optimization.
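
As a rough illustration of the two ideas named here, the sketch below combines four per-dimension rewards with softmax-normalized weights and applies a hinge-style margin loss to a preferred/rejected pair. Both the softmax weighting and the hinge form are assumptions made for illustration; the page does not give CHARM's exact objective, and in the real system the weight logits would presumably come from a context encoder rather than being set by hand.

```python
import math
from typing import Dict, List, Sequence

# The four reward dimensions listed on this page.
DIMENSIONS = ("relevance", "diversity", "satisfaction", "engagement")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_reward(rewards: Dict[str, float], weight_logits: Sequence[float]) -> float:
    """Weighted sum of per-dimension rewards; weights depend on the supplied logits."""
    weights = softmax(weight_logits)
    return sum(w * rewards[dim] for w, dim in zip(weights, DIMENSIONS))


def margin_preference_loss(r_chosen: float, r_rejected: float, margin: float = 0.1) -> float:
    """Hinge loss (assumed form): zero once the chosen reward leads by at least `margin`."""
    return max(0.0, margin - (r_chosen - r_rejected))


# Hypothetical rewards for a preferred and a rejected response.
chosen = {"relevance": 0.8, "diversity": 0.6, "satisfaction": 0.7, "engagement": 0.5}
rejected = {"relevance": 0.4, "diversity": 0.6, "satisfaction": 0.3, "engagement": 0.5}

uniform_logits = [0.0, 0.0, 0.0, 0.0]  # equal weights, for the example only
r_chosen = aggregate_reward(chosen, uniform_logits)
r_rejected = aggregate_reward(rejected, uniform_logits)
loss = margin_preference_loss(r_chosen, r_rejected, margin=0.1)
```

With equal weights the chosen response already leads by more than the margin, so the loss is zero; a larger margin would yield a positive loss.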

🌉
BRIDGE

Cross-Domain Transfer
Adversarial domain adaptation with learnable domain gates — preserves domain-invariant reasoning patterns while retaining domain-specific information.
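
One common way to realize a learnable gate is an element-wise sigmoid that mixes a shared (domain-invariant) feature vector with a domain-specific one. The sketch below shows only that mixing step and assumes this gate form; the function name `domain_gate` is hypothetical, and the adversarial training component is not shown.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def domain_gate(
    shared: Sequence[float],
    specific: Sequence[float],
    gate_logits: Sequence[float],
) -> List[float]:
    """Mix shared and domain-specific features element-wise.

    A gate near 1 keeps the shared feature; a gate near 0 keeps the
    domain-specific one. In a trained model the logits would be learned
    parameters; here they are supplied by hand.
    """
    if not (len(shared) == len(specific) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    gates = [sigmoid(z) for z in gate_logits]
    return [g * s + (1.0 - g) * d for g, s, d in zip(gates, shared, specific)]


# Hypothetical two-dimensional features.
mixed = domain_gate(shared=[1.0, 0.0], specific=[0.0, 1.0], gate_logits=[0.0, 0.0])
```

A gate logit of zero gives an even blend of the two feature vectors; strongly positive or negative logits select one source almost exclusively.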

🤝
MAVEN

Multi-Agent Refinement
Three specialized agents (Recommender, Critic, Explainer) collaborate through shared representations with an agreement loss promoting coherent consensus.
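
An agreement loss of this kind could, for instance, penalize the mean pairwise squared distance between the agents' representation vectors. The following sketch assumes that form; the page does not specify the actual loss, and the vectors below are made up for illustration.

```python
from itertools import combinations
from typing import Dict, Sequence


def agreement_loss(representations: Dict[str, Sequence[float]]) -> float:
    """Mean squared Euclidean distance over all pairs of agent representations."""
    pairs = list(combinations(representations.values(), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return total / len(pairs)


# Hypothetical two-dimensional representations for the three agents.
agents = {
    "recommender": [1.0, 0.0],
    "critic": [0.0, 0.0],
    "explainer": [0.0, 1.0],
}
disagreement = agreement_loss(agents)
```

The loss is zero when all three agents produce identical representations and grows as they diverge, which is the behavior a consensus-promoting term needs.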

Evaluation Metrics: User-Aligned

HARPO introduces a quality-centric evaluation perspective separating user-aligned measures from standard proxy metrics.

Primary · User-Aligned
User Satisfaction
CHARM reward score for predicted user satisfaction, validated via Pearson correlation with human judgments (r=0.73).
Primary · User-Aligned
Engagement
Predicts follow-up rate and continued interaction. Pearson r=0.64 with human follow-up behavior.
Primary · User-Aligned
Diversity-adj. Relevance
Combined relevance (CHARM r=0.71) and diversity (r=0.68) reward, measuring breadth alongside precision.
Primary · Human
Human Preference
Expert annotator Overall score (1–5 Likert, 200 samples/dataset, Fleiss' κ > 0.72), averaged across Rec.Q and Exp.Q.
Secondary · Proxy
Recall@K
Standard retrieval metric over 100 candidates (99 negatives). Reported for K ∈ {1, 10, 50}.
Secondary · Proxy
NDCG / MRR
Ranking quality metrics at K=10. Reported alongside Recall as complementary proxy signals.
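
For the proxy metrics, a minimal sketch of the standard definitions is given below, assuming one ground-truth item per conversation ranked against 99 negatives (which the 100-candidate setup suggests). The helper names and the toy scores are hypothetical, and ties are resolved in the target's favor, which may differ from the official evaluation.

```python
import math
from typing import Dict


def rank_of_target(scores: Dict[str, float], target: str) -> int:
    """1-based rank of `target` when candidates are sorted by descending score.

    Only strictly higher-scoring candidates count against the target (ties favor it).
    """
    target_score = scores[target]
    return 1 + sum(1 for item, s in scores.items() if item != target and s > target_score)


def recall_at_k(rank: int, k: int) -> float:
    """1 if the single relevant item appears in the top k, else 0."""
    return 1.0 if rank <= k else 0.0


def mrr_at_k(rank: int, k: int) -> float:
    """Reciprocal rank, truncated to zero beyond position k."""
    return 1.0 / rank if rank <= k else 0.0


def ndcg_at_k(rank: int, k: int) -> float:
    """NDCG with one relevant item: the ideal DCG is 1, so NDCG = 1 / log2(rank + 1)."""
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0


# Toy candidate pool: 100 items, item_0 scored highest; the target is item_4.
scores = {f"item_{i}": 100.0 - i for i in range(100)}
rank = rank_of_target(scores, "item_4")  # four candidates score higher
recalls = {k: recall_at_k(rank, k) for k in (1, 10, 50)}
mrr10 = mrr_at_k(rank, 10)
ndcg10 = ndcg_at_k(rank, 10)
```

Averaging these per-conversation values over the test set gives the dataset-level Recall@K, MRR@10, and NDCG@10.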

Leaderboard

HARPO Benchmark
ReDial (Movies)
INSPIRED (Movies)
MUSE (Fashion)
Rankings across three datasets on user-aligned metrics. Higher is better for all metrics.

#   Model   Satisfaction   Engagement   Div.-Adj. Relevance   Human Pref.   Overall Score

† Text-only adaptation. ‡ Fine-tuned per Wang et al. 2025. Scores normalized [0,1]. Human Pref. = Table 8 Overall score (1–5, normalized).

Submit Your System

Paste your results JSON below. Submissions are evaluated against the official HARPO benchmark API at github.com/harpo-bench/harpo.

All fields required. Results are verified server-side using the HARPO evaluation API.

Full JSON schema:
{
  "method_name": string,
  "team": string,
  "dataset": "redial"|"inspired"|"muse",
  "predictions": [{ "conv_id": string, "recommended_items": number[] }],
  "paper_url": string | null,
  "code_url": string | null,
  "description": string  // ≤ 200 chars
}
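
Before submitting, it may help to check a payload locally against the schema above. The sketch below is an unofficial, client-side check written only from the field list shown here; `validate_submission` is a hypothetical helper, not part of the HARPO evaluation API, and the server may apply additional rules.

```python
import json
from typing import List

DATASETS = {"redial", "inspired", "muse"}
REQUIRED = ("method_name", "team", "dataset", "predictions",
            "paper_url", "code_url", "description")


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_submission(payload: str) -> List[str]:
    """Return a list of problems found in the payload; an empty list means it looks valid."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    errors = [f"missing field: {key}" for key in REQUIRED if key not in data]
    if errors:
        return errors
    for key in ("method_name", "team", "description"):
        if not isinstance(data[key], str):
            errors.append(f"{key} must be a string")
    if isinstance(data["description"], str) and len(data["description"]) > 200:
        errors.append("description must be at most 200 characters")
    if not isinstance(data["dataset"], str) or data["dataset"] not in DATASETS:
        errors.append("dataset must be one of: " + ", ".join(sorted(DATASETS)))
    for key in ("paper_url", "code_url"):
        if data[key] is not None and not isinstance(data[key], str):
            errors.append(f"{key} must be a string or null")
    predictions = data["predictions"]
    if not isinstance(predictions, list):
        errors.append("predictions must be an array")
    else:
        for i, pred in enumerate(predictions):
            ok = (
                isinstance(pred, dict)
                and isinstance(pred.get("conv_id"), str)
                and isinstance(pred.get("recommended_items"), list)
                and all(_is_number(x) for x in pred["recommended_items"])
            )
            if not ok:
                errors.append(f"predictions[{i}] is malformed")
    return errors


# Made-up example payload; all values are placeholders.
example = {
    "method_name": "my-crs",
    "team": "example-team",
    "dataset": "redial",
    "predictions": [{"conv_id": "c1", "recommended_items": [12, 7, 3]}],
    "paper_url": None,
    "code_url": None,
    "description": "Placeholder description.",
}
problems = validate_submission(json.dumps(example))
```

An empty `problems` list means the payload matches the field names and types shown in the schema; it says nothing about whether the item IDs or conversation IDs are valid for the chosen dataset.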

Benchmark Results

HARPO shows consistent improvements across three conversational recommendation benchmarks (ReDial, INSPIRED, MUSE), with particularly strong gains on user-aligned metrics. All improvements are statistically significant at p < 0.01 (paired t-test with Bonferroni correction).

ReDial
INSPIRED
MUSE (Multimodal)
Method        Type         R@1      R@10      R@50      MRR@10    NDCG@10   User Sat.  Engage.
KBRD          Open-source  2.9±0.2  16.7±0.4  36.2±0.7  7.4±0.2   10.2±0.3  0.42±0.02  0.38±0.02
KGSF          Open-source  3.8±0.2  18.1±0.5  37.4±0.7  8.4±0.3   11.6±0.4  0.45±0.02  0.41±0.02
BARCOR        Open-source  3.0±0.2  16.8±0.4  36.8±0.6  7.8±0.2   10.8±0.3  0.44±0.02  0.40±0.02
LLaMA-2-7B    Open-source  2.2±0.3  13.6±0.6  33.4±0.9  6.2±0.3   8.6±0.4   0.38±0.02  0.34±0.02
LLaMA-2-13B   Open-source  2.8±0.3  15.4±0.6  35.6±1.0  7.2±0.4   9.9±0.5   0.43±0.02  0.39±0.02
UniCRS        Open-source  4.8±0.3  21.2±0.5  40.8±0.8  10.1±0.3  13.8±0.4  0.51±0.02  0.47±0.02
DCRS          Agent        7.5±0.3  25.1±0.6  43.6±0.9  12.2±0.4  15.2±0.5  0.56±0.02  0.52±0.02
ChatGPT       GPT          3.3±0.4  17.0±0.7  37.8±1.1  8.0±0.4   11.0±0.5  0.49±0.03  0.45±0.03
GPT-4         GPT          4.5±0.4  19.4±0.8  40.2±1.2  9.6±0.5   13.2±0.6  0.55±0.03  0.51±0.03
RecMind       Agent        5.8±0.3  22.6±0.6  42.2±0.9  11.2±0.4  15.3±0.5  0.54±0.02  0.50±0.02
InteRecAgent  Agent        5.2±0.3  21.4±0.6  41.0±0.8  10.4±0.4  14.3±0.5  0.52±0.02  0.48±0.02
HARPO         Ours         9.1±0.3  29.8±0.7  50.2±1.0  15.6±0.5  21.2±0.6  0.68±0.02  0.64±0.02

† Text-only adaptation. ‡ Fine-tuned following Wang et al. (2025).

Cite This Work

If you use HARPO or the HARPO evaluation suite, please cite our ACL 2026 paper:

@inproceedings{raj2026harpo,
  title={HARPO: Hierarchical Agentic Reasoning for User-Aligned Conversational Recommendation},
  author={Raj, Subham and Jha, Aman Vaibhav and Anand, Mayank and Saha, Sriparna},
  booktitle={Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={35580--35599},
  year={2026}
}

HARPO reframes conversational recommendation as a quality-centric decision-making problem — optimizing for what users actually care about, not just what metrics can measure.

Accepted at ACL 2026  ·  © 2026 HARPO Authors  ·  IIT Patna · IIIT Allahabad