Improve Speech-to-Text Accuracy

1 hour · 5 Credits · Intermediate

GSP1344

Overview

Automated Speech Recognition (ASR), also known as machine transcription or Speech-to-Text, uses machine learning to turn audio containing speech into text.

Machine learning systems rarely achieve 100% accuracy. The Google Speech Adaptation API, a feature within Speech-to-Text, is a method to enhance accuracy by providing context and hints. This is useful for recognizing specific words or phrases that might otherwise be misrecognized or have a lower probability of correct identification.

In this lab, you further improve the results of using Speech-to-Text on a provided dataset.

Prerequisites

Before starting this lab, it's recommended that you complete Measuring Speech-to-Text Accuracy or are familiar with the concepts it covers: speech quality and measuring accuracy.

Objectives

In this lab, you learn about the following:

  • Use the Speech Adaptation API and understand how it works.
  • Apply speech adaptation and language biasing on your own data.

Setup and requirements

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources are made available to you.

This hands-on lab lets you do the lab activities in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
Note: Use an Incognito (recommended) or private browser window to run this lab. This prevents conflicts between your personal account and the student account, which may cause extra charges to be incurred on your personal account.
  • Time to complete the lab—remember, once you start, you cannot pause a lab.
Note: Use only the student account for this lab. If you use a different Google Cloud account, you may incur charges to that account.

How to start your lab and sign in to the Google Cloud console

  1. Click the Start Lab button. If you need to pay for the lab, a dialog opens for you to select your payment method. On the left is the Lab Details pane with the following:

    • The Open Google Cloud console button
    • Time remaining
    • The temporary credentials that you must use for this lab
    • Other information, if needed, to step through this lab
  2. Click Open Google Cloud console (or right-click and select Open Link in Incognito Window if you are running the Chrome browser).

    The lab spins up resources, and then opens another tab that shows the Sign in page.

    Tip: Arrange the tabs in separate windows, side-by-side.

    Note: If you see the Choose an account dialog, click Use Another Account.
  3. If necessary, copy the Username below and paste it into the Sign in dialog.

    {{{user_0.username | "Username"}}}

    You can also find the Username in the Lab Details pane.

  4. Click Next.

  5. Copy the Password below and paste it into the Welcome dialog.

    {{{user_0.password | "Password"}}}

    You can also find the Password in the Lab Details pane.

  6. Click Next.

    Important: You must use the credentials the lab provides you. Do not use your Google Cloud account credentials. Note: Using your own Google Cloud account for this lab may incur extra charges.
  7. Click through the subsequent pages:

    • Accept the terms and conditions.
    • Do not add recovery options or two-factor authentication (because this is a temporary account).
    • Do not sign up for free trials.

After a few moments, the Google Cloud console opens in this tab.

Note: To access Google Cloud products and services, click the Navigation menu or type the service or product name in the Search field.

About improving speech accuracy

Speech-to-Text lets you transcribe audio input as well as measure the accuracy of the resulting transcription.

There are many ways to improve the accuracy and lower the word error rate (WER). In this lab you consider the following three:

  • Customize the model to your domain by providing contextual information.

    Example: You are creating a bot that allows people to order pizza. You might increase the probability that words like pepperoni, olives, and mozzarella are recognized.

  • Tweak weights to address specific word / phrase issues.

    Example: Proper nouns, rare words, or made-up words are unlikely to be transcribed correctly at first. Adding bias toward them can fix individual terms.

  • Use context to bias towards specific types of information or words.

    Example: You have an Interactive Voice Response (IVR) telephone system and have just asked someone for their order number. You can bias specifically towards an alphanumeric entry.

When evaluating quality, look at where the system makes errors. If you think any of the above three types of context could help improve accuracy, you can use the Speech Adaptation API, available in the Cloud Speech-to-Text API.
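The word error rate mentioned earlier is a word-level edit distance between a reference transcript and a hypothesis. As a rough illustration of the metric (the lab's notebook uses its own simple_wer_v2.py module; this standalone sketch is not that module):

```python
# Illustrative word error rate (WER) via word-level Levenshtein distance.
# Not the lab's simple_wer_v2.py implementation; a minimal sketch only.

def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution plus one insertion over a 4-word reference -> WER 0.5:
print(wer("order one pepperoni pizza", "order one pepper only pizza"))
```

Biasing toward terms like "pepperoni" is exactly what lowers counts of substitutions on the words you care about.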

About speech adaptation

Speech-to-Text has tools that let you provide contextual information to increase transcription accuracy. The Speech Adaptation API lets you pass phrases and associated weights directly to the Speech-to-Text API.

Phrases can be changed with every request, which allows for quick iteration as well as on the fly adaptation. All you do is include the terms in the request itself as part of the recognition config:

"speech_contexts": [{ "phrases": ["foo", “bar”], "boost": 10.0 },{ "phrases": ["foo bar", “bar foo”], "boost": 5.0 }] }

This type of biasing is better than other methods, such as custom language models or complex grammars. It's easier to set up, doesn’t require special training or deployment, and is included free when you use the Cloud Speech-to-Text API.
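To make the request shape concrete, a hypothetical helper could assemble the config dictionary programmatically. Field names mirror the snippet above; an actual request would be sent through the Cloud Speech-to-Text client library rather than built by hand:

```python
import json

# Sketch of assembling a recognition config with adaptation phrases.
# build_config is a hypothetical helper for illustration only.

def build_config(phrase_groups):
    """phrase_groups: list of (phrases, boost) pairs."""
    return {
        "language_code": "en-US",
        "speech_contexts": [
            {"phrases": list(phrases), "boost": boost}
            for phrases, boost in phrase_groups
        ],
    }

config = build_config([
    (["pepperoni", "mozzarella"], 10.0),  # important domain terms
    (["extra olives"], 5.0),              # weaker hint
])
print(json.dumps(config, indent=2))
```

Because the phrases live in the request itself, each iteration of your tuning loop can simply rebuild this dictionary with different phrases or boosts.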

Determine speech adaptation terms

Although Speech-to-Text makes it easy to bias the system, you still have to figure out the right terms to send to the API. Consider the following when deciding what terms to include with biasing.

  • What am I doing with this transcript? Is there a downstream system that's sensitive to particular words or phrases?

    These words or phrases should be biased towards since getting them correct is very important.

  • Are there rare words or proper nouns?

    These words or phrases may not be predicted correctly since they occur infrequently and should be biased towards.

  • What contextual info can I use? Do you know what words somebody might say or what they said in the past?

    These can be biased towards to help increase accuracy even on commonly occurring words if you are sure they will be present.

  • Do you have a “strong” or “weak” context?

    You can bias heavily with “strong” context if you are sure the user is about to mention some specific words.

    You should bias less if you have “weak” context, meaning you know what words will occur but not exactly where or when.
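The strong/weak distinction above can be encoded directly in the boost you choose. A toy sketch follows; the boost values here are assumptions to tune against measured WER on your own data, not recommendations from the lab:

```python
# Illustrative mapping from context strength to boost, and assembly of
# speech_contexts entries. The numeric boosts are placeholder assumptions.

BOOSTS = {"strong": 20.0, "weak": 5.0}

def contexts_from(terms_by_strength):
    """terms_by_strength: dict like {'strong': [...], 'weak': [...]}."""
    return [
        {"phrases": phrases, "boost": BOOSTS[strength]}
        for strength, phrases in terms_by_strength.items()
        if phrases
    ]

contexts = contexts_from({
    "strong": ["order number"],           # the user was just asked for it
    "weak": ["pepperoni", "mozzarella"],  # likely somewhere in the call
})
print(contexts)
```

If heavier boosting starts introducing false recognitions of the boosted phrases, that is a signal the context was weaker than assumed.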

Task 1. Create a Vertex AI Workbench instance

This lab provides a focused dataset curated from public domain books and audio from the LibriSpeech project. All the code you need to measure and improve the accuracy of using Speech-to-Text on this dataset is provided. In this task, you learn how to set up and use this code.

  1. In the Google Cloud console, from the Navigation menu, select Vertex AI > Dashboard.

  2. Click Enable All Recommended APIs.

  3. On the left-hand side, click Workbench.

  4. At the top of the Workbench page, ensure you are in the Instances view.

  5. Click Create New.

  6. Configure the Instance:

    • Name: lab-workbench
    • Region: Set the region to
    • Zone: Set the zone to
    • Advanced Options (Optional): If needed, click "Advanced Options" for further customization (e.g., machine type, disk size)

Create a Vertex AI Workbench instance

  7. Click Create.
Note: The instance will take a few minutes to create. A green checkmark will appear next to its name when it's ready.
  8. Click Open JupyterLab next to the instance name to launch the JupyterLab interface. This will open a new tab in your browser.

Workbench Instance Deployed

  9. Click the Terminal icon to open a terminal window.

Open the Jupyter Notebook

Your terminal window will open in a new tab. You can now run commands in the terminal to interact with your Workbench instance.

terminal window opened

Load the notebook

  1. Under Other, click Terminal.
Note: If your terminal window is blank, change your terminal theme to Light by navigating to Settings > Terminal Theme > Light.
  2. In the terminal, run the following commands to copy the notebooks:

gsutil cp gs://spls/gsp758/notebook/speech_adaptation.ipynb .
gsutil cp gs://spls/gsp758/notebook/simple_wer_v2.py .

Perform the following steps to play audio files in an incognito window:

  1. In Chrome, click the three-dot menu > Settings.
  2. In Search settings, type Incognito.
  3. In the results, click Third-party cookies.
  4. Go to Allowed to use third-party cookies.
  5. Click Add.
  6. Copy the JupyterLab domain; do not include https://.

It should be something like:

[YOUR_NOTEBOOK_ID].notebooks.googleusercontent.com
  7. Check Current incognito session only, then click Add.

Now continue to the notebook.

  1. Open the speech_adaptation.ipynb notebook and follow the instructions inside to compute the WER on the provided dataset.

Click Check my progress to verify the objective. Create the Vertex AI Workbench Notebook instance

In the following sections, you run through the notebook cells to see how to measure the quality and accuracy of Automated Speech Recognition (ASR) systems.

Task 2. Gather audio files

  • In this task, you gather audio files and define the recognize_all method. You use this method to iterate and tune each test and observe the effect on recognition quality. Run the notebook cells (up to the Getting Started section) to gather audio files and define the recognize_all method.
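The recognize_all method itself is defined in the notebook; its exact implementation lives there, but the iteration pattern described above might look roughly like this sketch, where the recognizer is injected as a stub instead of a real Speech-to-Text call:

```python
# Minimal sketch of a recognize_all-style helper. The notebook's actual
# version calls the Cloud Speech-to-Text API; here the recognizer is
# injected so the iteration logic is easy to follow and test.

def recognize_all(audio_files, recognize, speech_contexts=None):
    """Run one recognizer over every file, returning {file: transcript}."""
    results = {}
    for path in audio_files:
        results[path] = recognize(path, speech_contexts or [])
    return results

# Stub recognizer standing in for a real Speech-to-Text request:
def fake_recognize(path, contexts):
    return f"transcript of {path} with {len(contexts)} context(s)"

print(recognize_all(["a.wav", "b.wav"], fake_recognize))
```

Passing speech_contexts through the helper is what makes the later tuning tasks quick: each run re-sends the updated phrases with every request.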

Click Check my progress to verify the objective. Gather Audio Files

Task 3. Perform the baseline test

  • In this task, you perform a baseline test to measure the performance of Speech API. To perform the baseline test, run the Getting Started section of the notebook.

Click Check my progress to verify the objective. Perform baseline test

Task 4. Get started with adaptation

  • In this task, you pick out what words and phrases from the baseline test are most important for transcript accuracy and pass them to the recognize_all method. Run the Getting Started With Adaptation section of the notebook.

Click Check my progress to verify the objective. Getting started with adaptation

Task 5. Tune adaptation results

  • In this task, you tune the adaptation results. Run the Tuning Adaptation Results section of the notebook.

Click Check my progress to verify the objective. Tune Adaptation Results

Task 6. Evaluate key phrases with both clean and noisy audio

  • In this task, you evaluate key phrases with both clean and noisy audio. Run the Spoilers Ahead section of the notebook.

Click Check my progress to verify the objective. Evaluate key phrases with both clean and noisy audio

Congratulations!

You successfully measured and improved the accuracy of Speech-to-Text on a real dataset. You learned how to talk about and compare accuracy metrics and how to approach measuring accuracy. You successfully set up Python tools for performing automated speech recognition and measuring accuracy.

You are now ready to try these tools on your own data and put what you have learned into practice.

Manual Last Updated September 3, 2025

Manual Last Tested September 10, 2025

Copyright 2025 Google LLC. All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.
