Speech Buddy // Designing with AI 2024/25

What is it?

Speech Buddy is a real-time speech analysis tool designed to evaluate and improve spoken communication. Using machine learning and signal processing, it analyzes audio input to extract insights such as speech rate, pauses, filler words, emotional tone, and engagement level. Its components assess audio dynamics and turn these measurements into actionable feedback for improving communication skills.

How does it work?

The system captures audio in real time and processes it in chunks to evaluate various speech attributes. Key components include:

Pause Detection: Identifies and categorizes pauses in speech based on energy levels and duration. It distinguishes between acceptable and excessive pauses, offering insights into pacing and flow.
Speech Pattern Analysis: Measures the speaking rate, detects patterns such as repetitions and hesitations, and checks whether the pace falls within an optimal range.
Emotion Detection: Infers the speaker’s emotional tone using a placeholder model, providing feedback on emotional resonance and variability.
Engagement and Attention Metrics: Calculates volume (RMS, the root-mean-square amplitude) and an engagement metric based on volume variance. These are combined into attention scores that help users gauge how involved their audience is likely to be.
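
The pause-detection idea above (energy level plus duration) can be illustrated with a minimal sketch. This is not the project's actual implementation: the 20 ms frame size, the RMS silence threshold of 0.01, and the 0.3 s and 1.5 s cut-offs separating ordinary word gaps, acceptable pauses, and excessive pauses are all assumed values, and `detect_pauses` is a hypothetical name. Audio is assumed to arrive as a list of float samples between -1 and 1.

```python
import math
from dataclasses import dataclass


@dataclass
class Pause:
    start_s: float     # when the pause begins, in seconds
    duration_s: float  # how long the pause lasts, in seconds
    excessive: bool    # True if longer than the (assumed) acceptable limit


def frame_rms(samples, frame_len):
    """Split samples into non-overlapping frames and return each frame's RMS energy."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_pauses(samples, sample_rate, frame_ms=20, silence_rms=0.01,
                  min_pause_s=0.3, max_ok_pause_s=1.5):
    """Find low-energy runs and classify them as acceptable or excessive pauses.

    All thresholds are assumed defaults, not values taken from Speech Buddy.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    frame_s = frame_len / sample_rate
    pauses, run_start = [], None
    # A trailing infinite-energy sentinel closes any silent run at the end.
    for idx, energy in enumerate(frame_rms(samples, frame_len) + [math.inf]):
        if energy < silence_rms:
            if run_start is None:
                run_start = idx  # a silent run begins
        elif run_start is not None:
            duration = (idx - run_start) * frame_s
            if duration >= min_pause_s:  # shorter gaps are ordinary word breaks
                pauses.append(Pause(run_start * frame_s, duration,
                                    duration > max_ok_pause_s))
            run_start = None
    return pauses
```

With a 1 kHz test signal containing a 0.4 s gap and a 2 s gap between bursts of sound, this sketch reports two pauses and flags only the 2 s one as excessive.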
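
Speech-pattern analysis can be sketched in a similar way once a transcript is available. This sketch assumes a speech-to-text step (not shown) has already produced the words spoken in a chunk of known duration. The 120 to 160 words-per-minute "optimal range", the filler-word list, and the rule that an immediately repeated word counts as a repetition are illustrative assumptions, not figures from the project; `analyze_pattern` is a hypothetical name.

```python
from dataclasses import dataclass

# Hypothetical filler-word list; a real system would tune this per language.
FILLERS = {"um", "uh", "er", "ah", "hmm"}


@dataclass
class PatternReport:
    words_per_minute: float
    pace: str          # "too slow", "optimal" or "too fast"
    filler_count: int
    repetitions: int   # immediately repeated words, e.g. "I I think"


def analyze_pattern(transcript, duration_s, optimal_wpm=(120, 160)):
    """Score pace, fillers and repetitions for one transcribed chunk.

    The optimal_wpm range is an assumed default, not a figure from the project.
    """
    words = [w.strip(".,!?;:\"'").lower() for w in transcript.split()]
    words = [w for w in words if w]  # drop tokens that were pure punctuation
    wpm = len(words) / duration_s * 60 if duration_s > 0 else 0.0
    low, high = optimal_wpm
    if wpm < low:
        pace = "too slow"
    elif wpm > high:
        pace = "too fast"
    else:
        pace = "optimal"
    fillers = sum(1 for w in words if w in FILLERS)
    repeats = sum(1 for a, b in zip(words, words[1:]) if a == b)
    return PatternReport(wpm, pace, fillers, repeats)
```

For example, "So um I I think, uh, this works." spoken over 4 seconds is 8 words, or 120 words per minute, with two fillers and one repetition.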
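
The engagement and attention metrics can be sketched in the same spirit: compute the RMS volume of each chunk, treat the variance of those volumes across chunks as an engagement signal (a flat, monotone delivery has near-zero variance), and blend the two into an attention score. The target volume, target variance, and equal 0.5/0.5 weighting are hypothetical tuning values chosen for illustration, and `engagement_metrics` is a hypothetical name; the project's own formula is not documented here.

```python
import math
from statistics import pvariance


def rms(chunk):
    """Root-mean-square amplitude of one non-empty audio chunk (a volume estimate)."""
    return math.sqrt(sum(x * x for x in chunk) / len(chunk))


def engagement_metrics(chunks, target_rms=0.1, target_variance=0.0025,
                       w_volume=0.5, w_engagement=0.5):
    """Return (mean volume, engagement, attention) for a list of audio chunks.

    target_rms, target_variance and the weights are hypothetical tuning values.
    """
    if not chunks:
        raise ValueError("need at least one audio chunk")
    levels = [rms(c) for c in chunks]
    volume = sum(levels) / len(levels)
    # Engagement: how much the volume varies between chunks, capped at 1.
    engagement = min(pvariance(levels) / target_variance, 1.0)
    # Volume score: reward speaking at least as loud as the target, capped at 1.
    volume_score = min(volume / target_rms, 1.0)
    attention = w_volume * volume_score + w_engagement * engagement
    return volume, engagement, attention
```

A perfectly steady volume gives an engagement of 0, so the attention score then rests on volume alone; this mirrors the idea that a monotone delivery reads as less engaging than a varied one.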

Who is it for?

The tool is designed for public speakers, educators, professionals, and anyone looking to refine their communication skills. It is particularly useful for those in fields requiring high levels of audience engagement, such as teaching, broadcasting, and leadership roles.

Why did the research lead to this concept?

The idea emerged from the need to bridge the gap between how effective speakers believe their communication is and how audiences actually receive it. Research highlighted common problems in spoken communication, such as inconsistent pacing, overuse of filler words, and a disengaging monotone delivery. The tool addresses these problems with precise, real-time feedback that helps users self-correct and improve.

What is the value?

The system's main value lies in building better communication habits. Users gain insight into their speaking patterns, which can improve confidence, clarity, and audience engagement. The tool is also an affordable and accessible alternative to in-person coaching or feedback sessions.

What were the learnings?

Developing this project underscored the importance of balancing technical precision with user-friendly design. The team learned that real-time feedback must be intuitive and actionable so that users can apply it immediately. Incorporating metrics beyond pacing, such as emotion and engagement, also added depth to the analysis and made the tool more holistic.

Sahil Islam

Dishank Gandhi

Your Speech, Improved
