What is subjective audio evaluation?


Subjective audio evaluation is the assessment, by human listeners, of audio or speech that has been generated (e.g., by generative AI models) or processed (noise reduction, compression, echo cancellation, and so on). Human evaluators play a crucial role in determining the effectiveness and quality of the audio output.

The main goals of subjective audio evaluation include:

  1. Naturalness: How closely does the AI-generated voice resemble a human voice?

  2. Quality: What is the overall quality level of the audio or speech? How much noise do you hear?

  3. Similarity: How similar is the AI-generated speech to the intended target or original speech? For example, how similar is the voice of a French-speaking Elon Musk to his original English-speaking voice?

  4. Preferences: What are the listeners' preferences among different versions of AI-generated audio?

The ultimate goal is to gain insights into the usability of the output, and to find ways to further improve generative AI models, speech enhancement techniques, noise reduction algorithms, and other related technologies.

However, is this process simple? Unfortunately, it is not. Before running such evaluations, you need to address numerous preliminary questions:

  • What is the goal of the evaluation?

  • Who will participate in the evaluation session?

  • How will you find and recruit the human evaluators?

  • How will you qualify or disqualify the human evaluators before and after the evaluation?

  • What acoustic environment is relevant or acceptable?

  • Which evaluation type and scale should you use?

  • How will you compensate the evaluators from all over the world?

  • How will you analyze the data collected?

  • And many more logistical and methodological considerations.
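To make one of these questions concrete, here is a minimal sketch of a post-evaluation screening rule. It is only an illustration under stated assumptions: the ratings, the evaluator and file names, the `flag_outlier_evaluators` helper, and the `MAX_MEAN_ABS_DEVIATION` threshold are all hypothetical, and real studies often combine several checks rather than relying on a single rule like this one.

```python
import statistics

# Hypothetical ratings: evaluator id -> {file name: score on a 1-5 scale}.
ratings = {
    "eval_a": {"f1": 4, "f2": 2, "f3": 5, "f4": 3},
    "eval_b": {"f1": 5, "f2": 2, "f3": 4, "f4": 3},
    "eval_c": {"f1": 4, "f2": 1, "f3": 5, "f4": 4},
    "eval_d": {"f1": 1, "f2": 5, "f3": 1, "f4": 5},  # rates against the others
}

# Hypothetical threshold; tune it for your own study.
MAX_MEAN_ABS_DEVIATION = 1.5


def flag_outlier_evaluators(ratings, threshold=MAX_MEAN_ABS_DEVIATION):
    """Flag evaluators whose scores sit far from the other evaluators' median."""
    flagged = []
    for evaluator, own_scores in ratings.items():
        deviations = []
        for file_name, score in own_scores.items():
            # Consensus for this file from everyone except the evaluator under review.
            others = [
                scores[file_name]
                for other, scores in ratings.items()
                if other != evaluator and file_name in scores
            ]
            if others:
                deviations.append(abs(score - statistics.median(others)))
        if deviations and statistics.mean(deviations) > threshold:
            flagged.append(evaluator)
    return flagged


flagged = flag_outlier_evaluators(ratings)
print(flagged)
```

A flagged evaluator is a candidate for manual review, not automatic removal: a listener can legitimately disagree with the majority.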

Assuming that you have answers to most of these questions, let’s delve deeper into the evaluation types and scales. There are established subjective evaluation standards recommended by the International Telecommunication Union (ITU), such as ITU-T P.835 and ITU-R BS.1534 (MUSHRA), as well as de facto standards used within the industry.

Assume you want to evaluate how natural an AI-generated human voice is. One widely used evaluation method for this type of assessment is the Mean Opinion Score (MOS). In this method, evaluators rate the naturalness of the audio on a five-point scale: excellent, good, fair, poor, or bad. For statistically meaningful results, you must:

  • Ask multiple human evaluators to listen to each audio file.

  • Compute summary statistics, including the mean, median, standard deviation, and confidence intervals.

Typically, you would evaluate multiple audio files containing generated speech. By compiling all the data, you can compute overall naturalness statistics and summarize them in a table.
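As a rough sketch of that computation, the snippet below builds such a summary from made-up ratings. The file names and scores are hypothetical and exist only to show the mechanics, and the mapping of excellent through bad to 5 through 1 is the usual five-point convention. The confidence interval uses a normal approximation, which is an assumption on our part; with only a handful of evaluators per file, a t-distribution would give a wider and more appropriate interval.

```python
import statistics
from statistics import NormalDist

# Hypothetical ratings: file name -> scores from several evaluators
# (5 = excellent, 4 = good, 3 = fair, 2 = poor, 1 = bad).
ratings = {
    "sample_01.wav": [5, 4, 4, 5, 3, 4],
    "sample_02.wav": [3, 2, 4, 3, 3, 2],
    "sample_03.wav": [4, 4, 5, 4, 3, 5],
}


def summarize(scores, confidence=0.95):
    """Return count, mean (MOS), median, sample std, and CI half-width."""
    n = len(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return {
        "n": n,
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "std": std,
        "ci": z * std / n ** 0.5,  # normal-approximation half-width
    }


# Per-file rows, plus one row that pools every rating.
rows = {name: summarize(scores) for name, scores in ratings.items()}
all_scores = [s for scores in ratings.values() for s in scores]
rows["overall"] = summarize(all_scores)

print(f"{'file':<16}{'n':>4}{'MOS':>7}{'median':>8}{'std':>7}{'95% CI':>9}")
for name, row in rows.items():
    ci_text = "±" + format(row["ci"], ".2f")
    print(
        f"{name:<16}{row['n']:>4}{row['mean']:>7.2f}"
        f"{row['median']:>8.1f}{row['std']:>7.2f}{ci_text:>9}"
    )
```

The "overall" row pools all ratings into one list, which weights every rating equally; averaging the per-file means is an alternative when files have different numbers of ratings.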

Voilà! Now you can draw a first insight into how natural the output speech is.

This example, however, glosses over many details; the actual evaluation process is much more complex and cumbersome. Evaluations must consider the diversity of listeners, potential biases, the context in which the audio will be used, and the specific attributes of speech quality and naturalness that are most relevant to the application at hand.

In conclusion, subjective audio evaluation is a critical process in the development and refinement of AI-generated audio technologies. By thoroughly planning and executing these evaluations, we can gather valuable insights that drive the improvement of AI models and audio processing techniques, ultimately leading to more natural and high-quality audio experiences.

