Looking for vulnerable-customer support datasets for building a multi-turn chatbot

There are too many candidates, so let’s first ask the AI to list them.


There is no single “vulnerable-customer support” dataset. You get the best results by combining (1) customer-support dialogs, (2) empathy/mental-health counseling dialogs, (3) document-grounded support dialogs, and (4) safety/PII tooling and preference data. Sources below.

1) Customer-support and task-oriented dialog corpora

  • Customer Support on Twitter (brand↔user threads). Good for realistic tone and handoffs. Kaggle. (Kaggle)

  • MSDialog (multi-turn tech support with utterance-level intent labels). (arXiv, GitHub)

  • Schema-Guided Dialogue (SGD) (18K multi-domain goal-oriented dialogs, incl. banking). (Hugging Face)

  • Taskmaster-1/2 (Google, realistic multi-turn scripts; several customer-service domains). (arXiv, ResearchGate)

  • TicketTalk (transactional support in movie-ticketing). (ResearchGate)

  • AIRDialogue (flight-booking, long multi-turn). (Hugging Face)

  • MultiWOZ 2.4 (large multi-domain task-oriented dialog corpus; use as a proxy for slot filling, escalation, and dialog state tracking (DST)). (ACL archive)
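The corpora above ship in different formats, so the first practical step is normalizing them into one turn schema. A minimal sketch, assuming a hypothetical intermediate format (the `is_brand` field and `from_twitter_thread` helper are illustrative, not the Kaggle CSV's real columns):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Turn:
    speaker: str  # "agent" or "customer"
    text: str

@dataclass
class Dialog:
    dialog_id: str
    turns: List[Turn] = field(default_factory=list)

def from_twitter_thread(dialog_id, thread):
    # `thread`: list of {"is_brand": bool, "text": str} dicts, a hypothetical
    # shape you would produce after parsing the raw export; real columns differ.
    return Dialog(
        dialog_id=dialog_id,
        turns=[Turn("agent" if m["is_brand"] else "customer", m["text"])
               for m in thread],
    )

d = from_twitter_thread("t1", [
    {"is_brand": False, "text": "My account is locked and I'm worried."},
    {"is_brand": True, "text": "Sorry to hear that, let's get it sorted."},
])
```

One converter per corpus into this schema keeps the downstream SFT and evaluation code identical across datasets.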

2) Empathy / counseling datasets for “vulnerability-aware” behavior

  • EmpatheticDialogues and ESConv (emotional-support conversations; many papers build on ESConv). (PMC, ACL archive)

  • PESConv (persona-augmented ESConv; evaluates empathy, relevance, emotion intensity). (ACL archive)

  • CounselChat / mental-health counseling Q&A (therapist answers; several scrapes). Check licensing. (GitHub, Kaggle)

  • MedDialog-EN and MediTOD (doctor-patient; for health-risk triage patterns). Note the scraping terms on sources; review before use. (Hugging Face, par.nsf.gov, ACL archive)

  • Motivational Interviewing corpora: AnnoMI, MI-TAGS, EMMI (empathy-rich, annotated therapist behaviors). (GitHub, Kaggle, ACL archive, arXiv)

3) Document-grounded “support from policy/KB” dialogs

  • Doc2Dial and MultiDoc2Dial (agent responses grounded in public-service docs; matches enterprise KB+policy flow). (doc2dial.github.io, ACL archive, GitHub)

  • DialDoc shared tasks for open-book, multi-doc response generation baselines and metrics. (ACL archive)
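The grounded-response pattern these corpora teach can be sketched with a toy retriever; token overlap with crude plural stripping stands in for the BM25 or dense retrievers used in real DialDoc-style baselines (everything here is illustrative):

```python
import re
from collections import Counter

def tokenize(text):
    # lowercase word tokens with crude plural stripping ("refunds" -> "refund")
    return [t.rstrip("s") for t in re.findall(r"[a-z']+", text.lower())]

def best_passage(query, passages):
    """Pick the KB passage with the highest token overlap with the query;
    a toy stand-in for the retriever in a Doc2Dial-style grounded pipeline."""
    q = Counter(tokenize(query))
    def overlap(p):
        return sum(min(q[t], c) for t, c in Counter(tokenize(p)).items())
    return max(passages, key=overlap)

kb = [
    "Refunds are issued within fourteen days of a cancelled order.",
    "To reset your password, use the 'forgot password' link on the sign-in page.",
]
hit = best_passage("How long until I get my refund?", kb)
```

The retrieved passage is then fed to the generator as grounding context; faithfulness to it is exactly what the Doc2Dial metrics measure.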

4) Safety, guardrails, and preference data

  • HH-RLHF preference + red-teaming data for helpfulness/harmlessness; useful for refusal and de-escalation behavior. (GitHub, Hugging Face, arXiv)

  • Plain-language and readability resources for low-literacy users: textstat and government plain-language guides. (GitHub, GOV.UK, Service manual)

  • PII/PHI de-identification for training data: Microsoft Presidio (+NVIDIA NeMo Guardrails integration) or Phileas. (GitHub, Microsoft GitHub, NVIDIA Docs)

  • Regulatory framing for “vulnerable customers” (clear definitions and actions): UK FCA FG21/1. Use it to design escalation and “tell-us-once” patterns. (FCA)
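Before any training run, redact PII from the collected dialogs. Presidio is the production path; for intuition, a stdlib-only regex sketch covering only the most obvious patterns (these toy regexes miss names, addresses, and most real-world PII):

```python
import re

# Toy patterns only. Production redaction should use an NER-based tool such as
# Microsoft Presidio; regexes alone miss names, addresses, account numbers, etc.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "CARD":  re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact(text):
    # Replace each match with a typed placeholder so dialog structure survives.
    for label, pat in PATTERNS.items():
        text = pat.sub(f"<{label}>", text)
    return text

out = redact("Reach me at jane.doe@example.com or +44 20 7946 0958.")
```

Typed placeholders (`<EMAIL>`, `<PHONE>`) are deliberately kept in the training text so the model learns that such slots exist without memorizing real values.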

5) Text simplification for accessibility

  • Datasets and ready models to simplify bot outputs: GEM/wiki_auto + ASSET/TURK; mrm8488/t5-small-finetuned-text-simplification; agentlans/flan-t5-small-simplifier. (Hugging Face)

6) Training recipes that work in practice

7) Evaluation you’ll actually need

  • Task success on support corpora (DST/slot accuracy, resolution rate) + grounding faithfulness on Doc2Dial/MultiDoc2Dial. (ACL archive)

  • Human eval for empathy and helpfulness: ACUTE-Eval; note known evaluator-group variance. (arXiv, ar5iv, ACL archive)

  • Operational empathy metrics exist in CCaaS tooling; use as inspiration, not ground truth. (Genesys Cloud Resource Center)
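The automatic half of this evaluation reduces to two simple counts. A sketch, assuming hypothetical data shapes (slot dicts per turn, and a boolean `resolved` flag that real corpora encode differently):

```python
def slot_accuracy(pred_slots, gold_slots):
    """Per-slot accuracy for one turn, comparing predicted vs. gold state
    dicts; a simplified stand-in for full DST joint-goal accuracy."""
    keys = set(gold_slots) | set(pred_slots)
    if not keys:
        return 1.0
    correct = sum(pred_slots.get(k) == gold_slots.get(k) for k in keys)
    return correct / len(keys)

def resolution_rate(dialogs):
    # `dialogs`: list of dicts with a boolean "resolved" flag, a hypothetical
    # annotation; map each corpus's own success signal onto it first.
    return sum(d["resolved"] for d in dialogs) / len(dialogs)
```

Grounding faithfulness (does the reply stay within the retrieved passage?) still needs the Doc2Dial task metrics on top of these.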

8) System blueprint (minimum viable path)

  1. Collect & clean: start with Customer Support on Twitter + MSDialog; redact PII; label “risk cues” (bereavement, disability, financial hardship). (Kaggle, arXiv, GitHub)

  2. Grounding: load internal policies/KB; pretrain the bot on Doc2Dial/MultiDoc2Dial patterns; then RAG over your docs. (doc2dial.github.io)

  3. Empathy: SFT on ESConv/PESConv; add a text-simplification post-processor for low-literacy users. (ACL archive, Hugging Face)

  4. Safety: add HH-RLHF-style preferences and refusal exemplars; implement crisis keywords→human handoff. (GitHub)

  5. Evaluate: run ACUTE-Eval for empathy/helpfulness; run grounding checks on Doc2Dial tasks; pilot with FCA-style vulnerable-customer journeys. (arXiv, ACL archive, FCA)
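Step 4's crisis-keyword handoff can be sketched as a simple router. The cue lists below are illustrative only; production routing needs clinically reviewed lexicons plus a trained classifier, not keywords alone:

```python
import re

# Illustrative cue patterns only -- NOT a clinically reviewed lexicon.
CRISIS = [r"\bsuicid\w*", r"\bself[- ]harm\b", r"\bend my life\b"]
RISK = [r"\bbereave\w*", r"\bpassed away\b", r"\bcan'?t afford\b",
        r"\bdisabilit\w*"]

def route(message):
    """Route a customer message: crisis cues -> immediate human handoff;
    vulnerability cues -> slower, simplified, extra-check flow."""
    text = message.lower()
    if any(re.search(p, text) for p in CRISIS):
        return "human_handoff"
    if any(re.search(p, text) for p in RISK):
        return "vulnerable_flow"
    return "standard_flow"
```

Check crisis cues first so a message containing both kinds of cue always escalates to a human.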

Caveats

  • Some health counseling datasets have restrictive terms or scraping concerns. Always verify license/ToS before training. (Hugging Face)