Datasets & Tools

Datasets

Expert-Generated Privacy Q&A Dataset for Conversational AI and User Study Insights

A dataset of 42 privacy-related questions and answers, developed iteratively with input from legal professionals and conversational designers.

Get the dataset here.

Link to the paper.

Tools

SITool - Speech Intelligibility Toolkit for Subjective Evaluation

SITool is a toolkit for evaluating speech intelligibility through subjective testing. It provides a Flask-based web application for conducting intelligibility tests such as the Diagnostic Rhyme Test (DRT) and the Modified Rhyme Test (MRT), as well as a Python script for analyzing the results. In the web application, participants listen to audio samples, select answers, and submit their responses for analysis.

Link to the tool here. Link to example files here.
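
To give a sense of what analyzing rhyme-test results can involve, here is a minimal sketch of the commonly used chance-corrected score for closed-set tests. It is not taken from SITool's analysis script: the function name and its parameters are hypothetical, and it assumes two answer alternatives per DRT item and six per MRT item.

```python
def chance_corrected_score(correct: int, incorrect: int, n_alternatives: int) -> float:
    """Percentage of correct responses, adjusted for guessing.

    Hypothetical helper for illustration; not part of SITool.
    With n alternatives per item, each wrong answer is assumed to offset
    1 / (n - 1) of a right answer that could have been a lucky guess.
    """
    total = correct + incorrect
    if total == 0:
        raise ValueError("no responses to score")
    if n_alternatives < 2:
        raise ValueError("a closed-set test needs at least two alternatives")
    adjusted = correct - incorrect / (n_alternatives - 1)
    return 100.0 * adjusted / total


# DRT item: two alternatives, so the score reduces to 100 * (R - W) / T.
drt_score = chance_corrected_score(correct=45, incorrect=5, n_alternatives=2)

# MRT item: six alternatives (assumed), so each error costs one fifth of a point.
mrt_score = chance_corrected_score(correct=40, incorrect=10, n_alternatives=6)

print(f"DRT: {drt_score:.1f}%  MRT: {mrt_score:.1f}%")
```

A listener who guesses at random scores about 0% under this correction, which makes DRT and MRT results easier to compare despite their different numbers of alternatives.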

bt4vt - Bias Tests for Voice Tech

bt4vt is a Python library for diagnosing performance discrepancies (i.e. bias) in speaker verification models. The library provides evaluation measures and visualisations to interrogate model performance and can be integrated into development pipelines to test for bias.

Link to the tool here. Link to the documentation here.

The development of this framework is part of the Fair EVA open-source project.
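
As a rough illustration of what a performance discrepancy in speaker verification can look like, the sketch below computes false negative and false positive rates per speaker subgroup at a fixed decision threshold. It does not use the bt4vt API: the function names, the trial format, the group labels, and the scores are all made up for this example.

```python
from collections import defaultdict


def error_rates(trials, threshold):
    """Return (false negative rate, false positive rate).

    trials: iterable of (score, is_target) pairs, where is_target is True
    when both utterances in the trial come from the same speaker.
    """
    fn = fp = targets = nontargets = 0
    for score, is_target in trials:
        accepted = score >= threshold
        if is_target:
            targets += 1
            fn += not accepted   # same speaker, wrongly rejected
        else:
            nontargets += 1
            fp += accepted       # different speaker, wrongly accepted
    fnr = fn / targets if targets else float("nan")
    fpr = fp / nontargets if nontargets else float("nan")
    return fnr, fpr


def rates_by_group(trials, threshold):
    """trials: iterable of (group, score, is_target) triples."""
    grouped = defaultdict(list)
    for group, score, is_target in trials:
        grouped[group].append((score, is_target))
    return {g: error_rates(t, threshold) for g, t in grouped.items()}


# Made-up trials for two hypothetical subgroups, "A" and "B".
trials = [
    ("A", 0.90, True), ("A", 0.40, True), ("A", 0.20, False), ("A", 0.70, False),
    ("B", 0.80, True), ("B", 0.85, True), ("B", 0.10, False), ("B", 0.30, False),
]

per_group = rates_by_group(trials, threshold=0.5)
for group, (fnr, fpr) in sorted(per_group.items()):
    print(f"group {group}: FNR={fnr:.2f} FPR={fpr:.2f}")
```

If one subgroup shows clearly higher error rates than another at the same threshold, as group A does in this toy data, the model performs unevenly across those groups.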