
It Speaks! Create Synthetic Speech Using Text-to-Speech

Lab, 1 hour, 1 credit, Introductory

This lab may provide AI tools to support your learning.


GSP1040


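Overview

The Text-to-Speech API creates audio files of machine-generated, synthetic human speech. You provide the content as text or Speech Synthesis Markup Language (SSML), specify a voice (a "speaker" of a language with a distinctive tone and accent), and configure the output. The Text-to-Speech API converts the content you send into speech and returns it as audio data spoken by the voice you specified.

In this lab, you use the Text-to-Speech API to create a series of audio files, then listen to them and compare the differences.

What you'll learn

In this lab, you use the Text-to-Speech API to do the following:

- Create a series of audio files.
- Listen to and compare the audio files.
- Configure the audio output.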

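Setup and requirements

Before you click the Start Lab button

Read these instructions. Labs are timed and cannot be paused. The timer starts when you click Start Lab and shows how long Google Cloud resources remain available to you.

This hands-on lab runs in a real cloud environment, not in a simulation or demo environment. To make that possible, you receive new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

- Access to a standard internet browser (Chrome browser recommended).

Note: Use an Incognito (recommended) or private browser window to run this lab. This prevents conflicts between your personal account and the student account, which could otherwise cause extra charges on your personal account.

- Time to complete the lab. Remember, once you start, you cannot pause a lab.

Note: Use only the student account provided for this lab. If you use a different Google Cloud account, you may incur charges to that account.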

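How to start your lab and sign in to the Google Cloud console

Click the Start Lab button. If you need to pay for the lab, a dialog opens for you to select your payment method. On the left is the Lab Details pane with the following:
- The Open Google Cloud console button
- Time remaining
- The temporary credentials that you must use for this lab
- Other information, if needed, to step through this lab
Click Open Google Cloud console (or right-click and select Open Link in Incognito Window if you are running the Chrome browser).

The lab spins up resources, and then opens another tab that shows the Sign in page.

Tip: Arrange the tabs in separate windows, side by side.
Note: If you see the Choose an account dialog, click Use Another Account.
If necessary, copy the Username below and paste it into the Sign in dialog.
{{{user_0.username | "Username"}}}
You can also find the Username in the Lab Details pane.
Click Next.
Copy the Password below and paste it into the Welcome dialog.
{{{user_0.password | "Password"}}}
You can also find the Password in the Lab Details pane.
Click Next.
Important: You must use the credentials the lab provides, not your own Google Cloud account credentials. Note: Using your own Google Cloud account for this lab may incur extra charges.
Click through the subsequent pages:
- Accept the terms and conditions.
- Do not add recovery options or two-factor authentication (because this is a temporary account).
- Do not sign up for free trials.

After a few moments, the Google Cloud console opens in this tab.

Note: To access Google Cloud products and services, click the Navigation menu or type the service or product name in the Search field.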

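Activate Cloud Shell

Cloud Shell is a virtual machine loaded with development tools. It offers a persistent 5 GB home directory and runs on Google Cloud. Cloud Shell provides command-line access to your Google Cloud resources.

Click the Activate Cloud Shell icon at the top of the Google Cloud console.
When the windows appear, do the following:
- Continue through the Cloud Shell information window.
- Authorize Cloud Shell to use your credentials to make Google Cloud API calls.

When you are connected, you are already authenticated, and the project is set to your Project_ID. The output contains a line that declares the Project_ID for this session:

Your Cloud Platform project in this session is set to {{{project_0.project_id | "PROJECT_ID"}}}

gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab completion.

(Optional) You can list the active account name with this command:

gcloud auth list

Click Authorize.

Output:

ACTIVE: *
ACCOUNT: {{{user_0.username | "ACCOUNT"}}}

To set the active account, run:
    $ gcloud config set account `ACCOUNT`

(Optional) You can list the project ID with this command:

gcloud config list project

Output:

[core]
project = {{{project_0.project_id | "PROJECT_ID"}}}

Note: For full documentation of gcloud, refer to the gcloud CLI overview guide in Google Cloud.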

Set the region for your project

In Cloud Shell, enter the following command to set the region in which the project runs for this lab:

gcloud config set compute/region {{{project_0.default_region | Region}}}

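Task 1. Enable the Text-to-Speech API

The Cloud Text-to-Speech API page shows details, metrics, and more.

To enable the API:

In the Navigation menu, click APIs & Services > Library.
In the Search for APIs & Services box, type text-to-speech, and then click Cloud Text-to-Speech API in the search results.
Click Enable to enable the Cloud Text-to-Speech API.

Wait a few moments for the API to be enabled in the project.

Click Check my progress to verify the objective: Enable the Text-to-Speech API.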

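Task 2. Create a virtual environment

Python virtual environments isolate package installations from the system.

Install the virtualenv environment:

sudo apt-get install -y virtualenv

Build the virtual environment:

python3 -m venv venv

Activate the virtual environment: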

source venv/bin/activate
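
To confirm from Python that the activated environment is the one in use, a small check like the following can help. It is only a sketch: the function name in_virtualenv is illustrative and not part of the lab, and it relies on the documented behavior that sys.prefix and sys.base_prefix differ inside an environment created with python3 -m venv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```

Running this with the python command after source venv/bin/activate should report that the environment is active; running it in a fresh shell without activation should not.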

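Task 3. Create a service account

You use a service account to authenticate your calls to the Text-to-Speech API.

To create a service account:

Run the following command in Cloud Shell:

gcloud iam service-accounts create tts-qwiklab

Generate a key to use that service account:

gcloud iam service-accounts keys create tts-qwiklab.json --iam-account tts-qwiklab@{{{project_0.project_id | Project ID}}}.iam.gserviceaccount.com

Finally, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the location of your key file: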

export GOOGLE_APPLICATION_CREDENTIALS=tts-qwiklab.json
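
A quick sanity check of the environment variable can catch a wrong path before any API call fails. The sketch below is an illustration, not part of the lab: the helper names are hypothetical, and it assumes the key file is the JSON format that gcloud writes, which carries a "type" field set to "service_account" and a "client_email" field.

```python
import json
import os


def load_service_account_key(path):
    """Parse a key file and check that it looks like a service account key."""
    with open(path, encoding="utf-8") as key_file:
        key = json.load(key_file)
    if key.get("type") != "service_account":
        raise ValueError(f"{path} is not a service account key file")
    return key


def describe_credentials(environ=os.environ):
    """Summarize what GOOGLE_APPLICATION_CREDENTIALS currently points at."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return f"key file not found: {path}"
    key = load_service_account_key(path)
    return f"using service account {key.get('client_email', '(unknown)')}"
```

Calling print(describe_credentials()) in the directory that contains tts-qwiklab.json should name the tts-qwiklab service account. Because the variable holds a relative path here, the check only succeeds when run from that directory.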

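Click Check my progress to verify the objective: Create a service account.

Task 4. Get a list of available voices

The Text-to-Speech API provides many different voices and languages that you can use to create audio files. You can use any of the available voices as the speaker for your content.

Note: The Text-to-Speech API includes several premium voices, known as WaveNet voices, that generate more natural-sounding synthetic speech. These voices are also somewhat more expensive than the other available voices. See the Cloud Text-to-Speech pricing page for details.

In Cloud Shell, use the following curl command to list the voices that are available when you create synthetic speech with the Text-to-Speech API: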

curl -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
  -H "Content-Type: application/json; charset=utf-8" \
  "https://texttospeech.googleapis.com/v1/voices"

The Text-to-Speech API returns a JSON-formatted result that looks similar to the following:

{
  "voices": [
    {
      "languageCodes": [ "es-ES" ],
      "name": "es-ES-Standard-A",
      "ssmlGender": "FEMALE",
      "naturalSampleRateHertz": 24000
    },
    {
      "languageCodes": [ "ja-JP" ],
      "name": "ja-JP-Standard-A",
      "ssmlGender": "FEMALE",
      "naturalSampleRateHertz": 22050
    },
    {
      "languageCodes": [ "pt-BR" ],
      "name": "pt-BR-Standard-A",
      "ssmlGender": "FEMALE",
      "naturalSampleRateHertz": 24000
    },
    ...
  ]
}

In the results of the curl command, notice that each voice has four fields:

- name: the ID of the voice that you provide when you request that voice.
- ssmlGender: the gender of the voice that speaks the text, as defined in the SSML W3 Recommendation.
- naturalSampleRateHertz: the sampling rate of the voice.
- languageCodes: the list of language codes associated with that voice.

Notice that some languages offer several voices to choose from.
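
To see at a glance which languages offer more than one voice, the response can be grouped by language code. The following is a minimal sketch that works on an abbreviated copy of the sample response shown above; the function name voices_by_language is illustrative, and a real response contains many more entries.

```python
import json
from collections import defaultdict

# Abbreviated sample in the shape of the voices response shown above.
SAMPLE_RESPONSE = """
{
  "voices": [
    {"languageCodes": ["es-ES"], "name": "es-ES-Standard-A",
     "ssmlGender": "FEMALE", "naturalSampleRateHertz": 24000},
    {"languageCodes": ["ja-JP"], "name": "ja-JP-Standard-A",
     "ssmlGender": "FEMALE", "naturalSampleRateHertz": 22050},
    {"languageCodes": ["pt-BR"], "name": "pt-BR-Standard-A",
     "ssmlGender": "FEMALE", "naturalSampleRateHertz": 24000}
  ]
}
"""


def voices_by_language(response_text):
    """Group voice names under every language code they are listed for."""
    grouped = defaultdict(list)
    for voice in json.loads(response_text).get("voices", []):
        for code in voice["languageCodes"]:
            grouped[code].append(voice["name"])
    return dict(grouped)


for code, names in sorted(voices_by_language(SAMPLE_RESPONSE).items()):
    print(code, names)
```

With the full output of the curl command saved to a file, passing the file contents to voices_by_language would list every voice per language.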

To scope the results returned by the API to a single language code, run:

curl -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
  -H "Content-Type: application/json; charset=utf-8" \
  "https://texttospeech.googleapis.com/v1/voices?language_code=en"
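
If you build this URL in a script rather than typing it, letting the standard library encode the query string avoids quoting mistakes. This sketch only constructs the URL used by the curl command above and does not call the API; the helper name voices_url is hypothetical.

```python
from urllib.parse import urlencode

VOICES_URL = "https://texttospeech.googleapis.com/v1/voices"


def voices_url(language_code=None):
    """Return the voices endpoint, optionally scoped to one language code."""
    if language_code is None:
        return VOICES_URL
    # urlencode percent-escapes any characters that are unsafe in a query string.
    return f"{VOICES_URL}?{urlencode({'language_code': language_code})}"


print(voices_url("en"))
```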

Task 5. Create synthetic speech from text

Now that you know how to get the names of the voices that can speak your text, it's time to create some synthetic speech.

Build your request

In this step you use the Cloud Shell Editor to build your request to the Text-to-Speech API in a text file named synthesize-text.json.

In Cloud Shell, click Open Editor to open the Cloud Shell Editor.
In the action bar, click File > New File.
In the Select File Type or Enter File Name field, type synthesize-text.json.
Click OK.
Add the following code to synthesize-text.json:

{
  "input": {
    "text": "Cloud Text-to-Speech API allows developers to include natural-sounding, synthetic human speech as playable audio in their applications. The Text-to-Speech API converts text or Speech Synthesis Markup Language (SSML) input into audio data like MP3 or LINEAR16 (the encoding used in WAV files)."
  },
  "voice": {
    "languageCode": "en-gb",
    "name": "en-GB-Standard-A",
    "ssmlGender": "FEMALE"
  },
  "audioConfig": {
    "audioEncoding": "MP3"
  }
}

Click File > Save to save the file.

The JSON-formatted request provides three objects:

- input: provides the text to convert into synthetic speech.
- voice: specifies the voice used to speak the synthetic speech.
- audioConfig: tells the Text-to-Speech API which audio encoding to return.
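
The same three-object request can also be generated from a script, which guarantees strictly valid JSON (double-quoted keys and properly escaped text). The sketch below is one possible way to do that; build_text_request is a hypothetical helper, and its defaults simply mirror the values used in this lab.

```python
import json


def build_text_request(text, language_code="en-gb",
                       voice_name="en-GB-Standard-A",
                       gender="FEMALE", encoding="MP3"):
    """Return a request body with the input, voice, and audioConfig objects."""
    return {
        "input": {"text": text},
        "voice": {
            "languageCode": language_code,
            "name": voice_name,
            "ssmlGender": gender,
        },
        "audioConfig": {"audioEncoding": encoding},
    }


request_body = build_text_request("Hello from the Text-to-Speech API.")
print(json.dumps(request_body, indent=2))
```

Redirecting the printed output to a file produces a request file that can be sent in the same way as synthesize-text.json.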

Click Open Terminal to return to the terminal.
Call the Text-to-Speech API:

curl -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @synthesize-text.json "https://texttospeech.googleapis.com/v1/text:synthesize" \
  > synthesize-text.txt

The output of this call is saved to a file named synthesize-text.txt.
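
For readers who prefer Python to curl, the same POST request can be assembled with the standard library. This is a sketch under assumptions: the function name is illustrative, "PLACEHOLDER_TOKEN" stands in for the value printed by gcloud auth application-default print-access-token, and the request is built but deliberately not sent.

```python
import json
import urllib.request

SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(access_token, body):
    """Build (but do not send) the POST request that the curl command makes."""
    return urllib.request.Request(
        SYNTHESIZE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# PLACEHOLDER_TOKEN is not a real credential; substitute a real access token.
request = build_synthesize_request(
    "PLACEHOLDER_TOKEN",
    {
        "input": {"text": "Hello"},
        "voice": {"languageCode": "en-gb", "name": "en-GB-Standard-A"},
        "audioConfig": {"audioEncoding": "MP3"},
    },
)
print(request.get_method(), request.full_url)

# Sending requires a valid token and network access, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```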

Click Open Editor and open the synthesize-text.txt file. Notice that the Text-to-Speech API returns the audio output as base64-encoded text assigned to the audioContent field, similar to the following:

{ "audioContent": "//NExAASGoHwABhGudEACdzqFXfRE4EY3AACkD/zX4ADf/6J/[...]" }
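
To get a feel for what the base64 text holds, you can decode just the first few characters of the sample above. The sketch below decodes the first eight characters and checks for the MPEG audio frame sync, a run of 11 set bits that begins each MPEG audio frame. It illustrates the encoding only; the variable names are arbitrary, and it does not validate a whole file.

```python
import base64

# First eight characters of the audioContent sample shown above.
prefix = "//NExAAS"
header = base64.b64decode(prefix)
print(header.hex())  # fff344c40012

# An MPEG audio frame begins with 11 set bits: all of the first byte
# plus the top three bits of the second byte.
has_frame_sync = header[0] == 0xFF and (header[1] & 0xE0) == 0xE0
print(has_frame_sync)  # True
```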

Translate the response

To convert the response into audio, extract the audio data from the audioContent field in synthesize-text.txt and decode it into an audio file (an MP3 file in this lab). There are many ways to do this; this lab provides some simple Python code.

In the action bar, click File > New File.
In the Select File Type or Enter File Name field, type tts_decode.py and press Enter.
Click OK.
Enter or copy the following into tts_decode.py:

"""
Usage:
    python tts_decode.py --input "synthesize-text.txt" \
        --output "synthesize-text-audio.mp3"
"""
import argparse
from base64 import decodebytes
import json


def decode_tts_output(input_file, output_file):
    """Decode output from Cloud Text-to-Speech.

    input_file: the response from Cloud Text-to-Speech
    output_file: the name of the audio file to create
    """
    # Read the JSON response and pull out the base64-encoded audio.
    with open(input_file) as response_file:
        response = json.load(response_file)
    audio_data = response['audioContent']

    # Decode the base64 text and write the raw audio bytes.
    with open(output_file, "wb") as new_file:
        new_file.write(decodebytes(audio_data.encode('utf-8')))


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="Decode output from Cloud Text-to-Speech",
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument('--input',
                        help='The response from the Text-to-Speech API.',
                        required=True)
    parser.add_argument('--output',
                        help='The name of the audio file to create',
                        required=True)

    args = parser.parse_args()
    decode_tts_output(args.input, args.output)

Save tts_decode.py.

Create and play the audio file

To create an audio file from the response you received from the Text-to-Speech API:

Click Open Terminal to return to Cloud Shell.
Create the audio file synthesize-text-audio.mp3:

python tts_decode.py --input "synthesize-text.txt" --output "synthesize-text-audio.mp3"

Click Open Editor to open the Cloud Shell Editor.
Click synthesize-text-audio.mp3 in the left pane to open the file.
Play the audio to listen to the file. You hear synthetic speech speaking the text that you provided.

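Task 6. Create synthetic speech from SSML

In addition to text, you can provide input to the Text-to-Speech API as Speech Synthesis Markup Language (SSML). SSML defines an XML format for representing synthetic speech. With SSML input, you can control the synthetic speech output more precisely, including pauses, emphasis, pronunciation, pitch, speed, and other qualities.

Build your request

First, build your request to the Text-to-Speech API in a text file named synthesize-ssml.json.

In the Cloud Shell Editor action bar, click File > New File.
In the Select File Type or Enter File Name field, type synthesize-ssml.json and press Enter.
Click OK.
Enter or paste the following into synthesize-ssml.json:

{
  "input": {
    "ssml": "<speak><s> <emphasis level=\"moderate\">Cloud Text-to-Speech API</emphasis> allows developers to include natural-sounding <break strength=\"x-weak\"/> synthetic human speech as playable audio in their applications.</s> <s>The Text-to-Speech API converts text or <prosody rate=\"slow\">Speech Synthesis Markup Language</prosody> <say-as interpret-as=\"characters\">SSML</say-as> input into audio data like <say-as interpret-as=\"characters\">MP3</say-as> or LINEAR16 <break strength=\"weak\"/> (the encoding used in WAV files).</s></speak>"
  },
  "voice": {
    "languageCode": "en-gb",
    "name": "en-GB-Standard-A",
    "ssmlGender": "FEMALE"
  },
  "audioConfig": {
    "audioEncoding": "MP3"
  }
}

Notice that the input object of the JSON payload is different this time:

- Instead of a text field, the input object has an ssml field.
- The ssml field contains XML-formatted content with the <speak> element as its root.

Each element in this XML representation of the input affects the output of the synthetic speech.

Specifically, the elements in this sample have the following effects:

- <s> contains a sentence.
- <emphasis> adds stress to the enclosed word or phrase.
- <break> inserts a pause in the speech.
- <prosody> customizes the pitch, speaking rate, or volume of the enclosed text, as specified by the rate, pitch, or volume attributes.
- <say-as> provides more guidance about how to interpret and then say the enclosed text, for example, whether to speak a sequence of numbers as ordinals or cardinals.
- <sub> specifies a substitute value to speak in place of the enclosed text.

Note: For the full list of SSML elements that Cloud Text-to-Speech supports, see the SSML reference.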
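
Hand-escaping XML inside a JSON string is easy to get wrong. One alternative, sketched below, is to build the SSML with an XML library and let json.dumps handle the escaping. The sentence is a shortened, made-up example rather than the lab's exact text, and only the standard library is assumed.

```python
import json
import xml.etree.ElementTree as ET

# Build <speak><s>...</s></speak> element by element.
speak = ET.Element("speak")
sentence = ET.SubElement(speak, "s")

emphasis = ET.SubElement(sentence, "emphasis", level="moderate")
emphasis.text = "Cloud Text-to-Speech API"
emphasis.tail = " converts text into audio such as "

say_as = ET.SubElement(sentence, "say-as", {"interpret-as": "characters"})
say_as.text = "MP3"
say_as.tail = "."

# Serialize to a string; the result is well-formed XML by construction.
ssml = ET.tostring(speak, encoding="unicode")

request_body = {
    "input": {"ssml": ssml},
    "voice": {"languageCode": "en-gb", "name": "en-GB-Standard-A",
              "ssmlGender": "FEMALE"},
    "audioConfig": {"audioEncoding": "MP3"},
}
# json.dumps escapes the double quotes inside the SSML attributes.
print(json.dumps(request_body, indent=2))
```

The printed JSON has the same shape as synthesize-ssml.json, with the quotes inside the markup escaped automatically.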

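Click Open Terminal to open the Cloud Shell terminal.
Call the Text-to-Speech API and save the output to a file named synthesize-ssml.txt:

curl -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @synthesize-ssml.json "https://texttospeech.googleapis.com/v1/text:synthesize" \
  > synthesize-ssml.txt

Create and play the audio file

Again, you need to decode the output from the Text-to-Speech API before you can hear the audio.

Click Open Terminal. Run the following command to generate an audio file named synthesize-ssml-audio.mp3 using the tts_decode.py utility that you created earlier:

python tts_decode.py --input "synthesize-ssml.txt" --output "synthesize-ssml-audio.mp3"

Click Open Editor.
Click synthesize-ssml-audio.mp3 in the left pane, then play the embedded audio file.

Notice the differences between the SSML output and the plain-text output: although both audio files say the same words, the SSML output speaks them slightly differently, adding pauses and pronouncing the abbreviations differently.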

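Task 7. Configure audio output and device profiles

Going beyond SSML, you can further customize the synthetic speech output created by the Text-to-Speech API. You can specify other audio encodings, change the pitch of the audio output, and even request that the output be optimized for a specific type of hardware.

Build your request

Build your request to the Text-to-Speech API in a text file named synthesize-with-settings.json:

In the action bar, click File > New File.
In the Select File Type or Enter File Name field, type synthesize-with-settings.json and press Enter.
Click OK.
Enter or paste the following into synthesize-with-settings.json:

{
  "input": {
    "text": "The Text-to-Speech API is ideal for any application that plays audio of human speech to users. It allows you to convert arbitrary strings, words, and sentences into the sound of a person speaking the same things."
  },
  "voice": {
    "languageCode": "en-gb",
    "name": "en-GB-Standard-A",
    "ssmlGender": "FEMALE"
  },
  "audioConfig": {
    "speakingRate": 1.15,
    "pitch": -2,
    "audioEncoding": "OGG_OPUS",
    "effectsProfileId": ["headphone-class-device"]
  }
}

Save the file.

Notice that the audioConfig object now contains additional fields:

- The speakingRate field specifies the speed at which the speaker says the text. A value of 1.0 is the normal speed, 0.5 is half that speed, and 2.0 is twice as fast.
- The pitch field specifies a difference in tone for the spoken words. The value is the number of semitones by which to raise (positive) or lower (negative) the voice.
- The audioEncoding field specifies the audio encoding to use for the data. Accepted values for this field include LINEAR16, MP3, and OGG_OPUS.
- The effectsProfileId field requests that the Text-to-Speech API optimize the audio output for a specific playback device. The API applies a predefined audio profile to the output to enhance audio quality on the specified class of devices.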
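
To put the numbers in this request into perspective, the following sketch converts them into more familiar quantities. It rests on two assumptions that the lab does not state: that a semitone follows the equal-temperament ratio of 2^(1/12), and that playback time scales roughly as 1 / speakingRate. Both function names are illustrative.

```python
def pitch_ratio(semitones):
    """Frequency ratio for a shift of the given number of semitones."""
    # Equal temperament: 12 semitones double the frequency.
    return 2 ** (semitones / 12)


def approximate_duration(base_seconds, speaking_rate):
    """Rough playback time if speech normally takes base_seconds at rate 1.0."""
    return base_seconds / speaking_rate


print(round(pitch_ratio(-2), 4))                   # 0.8909
print(round(approximate_duration(10.0, 1.15), 2))  # 8.7
```

Under these assumptions, a pitch of -2 lowers the voice's frequencies by about 11 percent, and a speakingRate of 1.15 shortens a 10-second passage to roughly 8.7 seconds.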

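Note: The audio profiles feature is currently in Beta. See the guide to learn how to use it in your applications. All other settings covered in this lab are generally available and can be used in your applications as usual.

Click Open Terminal.
Call the Text-to-Speech API:

curl -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @synthesize-with-settings.json "https://texttospeech.googleapis.com/v1beta1/text:synthesize" \
  > synthesize-with-settings.txt

The output of this call is saved to a file named synthesize-with-settings.txt.

Create and play the audio file

Run the following command to generate an audio file named synthesize-with-settings-audio.ogg from the output received from the Text-to-Speech API: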

python tts_decode.py --input "synthesize-with-settings.txt" --output "synthesize-with-settings-audio.ogg"
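
Because this request asks for OGG_OPUS, the decoded file takes an .ogg extension rather than .mp3. A small helper such as the one below can keep the output name in step with the requested encoding. It is a hypothetical convenience, not part of the lab; the extension mapping is a common convention, and .wav for LINEAR16 follows the lab's remark that LINEAR16 is the encoding used in WAV files.

```python
# Conventional file extensions for the encodings named in this lab.
EXTENSIONS = {"MP3": ".mp3", "OGG_OPUS": ".ogg", "LINEAR16": ".wav"}


def output_filename(response_file, audio_encoding):
    """Derive the audio file name from the response file and encoding."""
    stem = response_file.rsplit(".", 1)[0]
    return f"{stem}-audio{EXTENSIONS[audio_encoding]}"


print(output_filename("synthesize-with-settings.txt", "OGG_OPUS"))
print(output_filename("synthesize-text.txt", "MP3"))
```

The two printed names match the output files used in this lab: synthesize-with-settings-audio.ogg and synthesize-text-audio.mp3.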

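Click Open Editor to open the Cloud Shell Editor.
Click synthesize-with-settings-audio.ogg in the left pane to open the file, then play the audio.
Play synthesize-with-settings-audio.ogg along with the other audio files that you created in this lab. Notice that this audio is spoken slightly faster and at a lower pitch than the others.

Congratulations!

You have learned how to create synthetic speech using the Cloud Text-to-Speech API. In this lab, you learned about:

- Listing all of the synthetic voices available through the Text-to-Speech API
- Creating a Text-to-Speech API request and calling the API with curl, providing both text and SSML
- Configuring the audio output, including specifying a device profile for audio playback

Next steps / Learn more

See the detailed documentation for the Text-to-Speech API on cloud.google.com.
Learn how to create synthetic speech using the client libraries for the Text-to-Speech API.

Google Cloud training and certification

Google Cloud training helps you make the most of Google Cloud technologies. The classes cover technical skills and best practices to help you get up to speed quickly and continue your learning journey. Training is offered from fundamental to advanced levels, with on-demand, live, and virtual options to suit your schedule. Certifications help you validate and prove your skills and expertise in Google Cloud technologies.

Manual last updated: September 5, 2025

Lab last tested: November 4, 2024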
