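Gemini is a family of generative AI models developed by Google and designed for multimodal use cases. The Gemini API offers several models, each optimized for particular use cases. Gemini Pro is Google's most capable model, with the highest response accuracy, while Gemini Flash is a multimodal model that combines low latency with strong performance. Gemini Flash-Lite models are optimized for cost efficiency and low latency. For details on these model variants, see the documentation.
On Google Cloud, the Vertex AI Gemini API provides a unified interface for interacting with Gemini models. The API accepts multimodal prompts (audio, images, video, and text) as input and can generate text, image, and audio output. Imagen models generate images from text.
Vertex AI is a machine learning (ML) platform for training and deploying ML models and AI applications, and for customizing large language models (LLMs) for use in AI-powered applications. With Vertex AI, you can customize Gemini with full data control and take advantage of additional Google Cloud features for enterprise security, privacy, and data governance and compliance. To learn more about Vertex AI, see the links in the Next steps section at the end of this lab.
In this lab, you use the Vertex AI SDK for Python to call the Vertex AI Gemini API.
Objectives
In this lab, you learn how to perform the following tasks:
Develop a Python app using the Streamlit framework.
Install the Vertex AI SDK for Python.
Use the Vertex AI Gemini API to develop code that interacts with the Gemini model (gemini-2.0-flash).
Containerize the app, then deploy and test it on Cloud Run.
Setup
For each lab, you get a new Google Cloud project and a set of resources at no cost, available for a fixed amount of time.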
cat > ~/gemini-app/app_tab1.py <<EOF
import streamlit as st
from vertexai.preview.generative_models import GenerativeModel
from response_utils import *
import logging

# Create the model prompt based on user input.
def generate_prompt():
    # Story character input
    character_name = st.text_input("Enter character name: \n\n",key="character_name",value="Mittens")
    character_type = st.text_input("What type of character is it? \n\n",key="character_type",value="Cat")
    character_persona = st.text_input("What personality does the character have? \n\n",
                                      key="character_persona",value="Mittens is a very friendly cat.")
    character_location = st.text_input("Where does the character live? \n\n",key="character_location",value="Andromeda Galaxy")

    # Story length and premise
    length_of_story = st.radio("Select the length of the story: \n\n",["Short","Long"],key="length_of_story",horizontal=True)
    story_premise = st.multiselect("What is the story premise? (can select multiple) \n\n",["Love","Adventure","Mystery","Horror","Comedy","Sci-Fi","Fantasy","Thriller"],key="story_premise",default=["Love","Adventure"])
    creative_control = st.radio("Select the creativity level: \n\n",["Low","High"],key="creative_control",horizontal=True)
    if creative_control == "Low":
        temperature = 0.30
    else:
        temperature = 0.95

    prompt = f"""Write a {length_of_story} story based on the following premise: \n
character_name: {character_name} \n
character_type: {character_type} \n
character_persona: {character_persona} \n
character_location: {character_location} \n
story_premise: {",".join(story_premise)} \n
If the story is "short", then make sure to have 5 chapters or else if it is "long" then 10 chapters.
Important point is that each chapter should be generated based on the premise given above.
First start by giving the book introduction, chapter introductions and then each chapter. It should also have a proper ending.
The book should have a prologue and an epilogue.
"""

    return temperature, prompt

# Function to render the Story tab, call the model, and display the model prompt and response.
def render_story_tab (text_model: GenerativeModel):
    st.subheader("Generate a story")

    temperature, prompt = generate_prompt()

    config = {
        "temperature": temperature,
        "max_output_tokens": 2048,
    }

    generate_t2t = st.button("Generate my story", key="generate_t2t")
    if generate_t2t and prompt:
        # st.write(prompt)
        with st.spinner("Generating your story using Gemini..."):
            first_tab1, first_tab2 = st.tabs(["Story response", "Prompt"])
            with first_tab1:
                response = get_gemini_text_response(text_model, prompt, generation_config=config)
                if response:
                    st.write("Your story:")
                    st.write(response)
                    logging.info(response)
            with first_tab2:
                st.text(prompt)
EOF
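Keeping the prompt assembly separate from the Streamlit widgets makes it easier to check in isolation. The sketch below is not part of the lab code: `temperature_for` and `build_story_prompt` are hypothetical helper names, and the prompt is shortened to the character fields that `generate_prompt()` collects.

```python
# Hypothetical helpers (not part of the lab) that mirror generate_prompt()
# in app_tab1.py without any Streamlit calls.

def temperature_for(creative_control: str) -> float:
    """Map the creativity radio value to the temperature used in the lab."""
    return 0.30 if creative_control == "Low" else 0.95


def build_story_prompt(length, name, kind, persona, location, premises):
    """Assemble a shortened version of the story prompt from plain values."""
    return (
        f"Write a {length} story based on the following premise:\n"
        f"character_name: {name}\n"
        f"character_type: {kind}\n"
        f"character_persona: {persona}\n"
        f"character_location: {location}\n"
        f"story_premise: {','.join(premises)}\n"
    )


if __name__ == "__main__":
    print(temperature_for("High"))
    print(build_story_prompt("Short", "Mittens", "Cat",
                             "Mittens is a very friendly cat.",
                             "Andromeda Galaxy", ["Love", "Adventure"]))
```

With functions like these, the widget code would only gather values and pass them in, so the prompt text can be inspected without starting the app.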
cat > ~/gemini-app/app_tab2.py <<EOF
import streamlit as st
from vertexai.preview.generative_models import GenerativeModel
from response_utils import *
import logging

# Create the model prompt based on user input.
def generate_prompt():
    product_name = st.text_input("What is the name of the product? \n\n",key="product_name",value="ZomZoo")
    product_category = st.radio("Select your product category: \n\n",["Clothing","Electronics","Food","Health & Beauty","Home & Garden"],key="product_category",horizontal=True)

    st.write("Select your target audience: ")
    target_audience_age = st.radio("Target age: \n\n",["18-24","25-34","35-44","45-54","55-64","65+"],key="target_audience_age",horizontal=True)
    # target_audience_gender = st.radio("Target gender: \n\n",["male","female","trans","non-binary","others"],key="target_audience_gender",horizontal=True)
    target_audience_location = st.radio("Target location: \n\n",["Urban", "Suburban","Rural"],key="target_audience_location",horizontal=True)

    st.write("Select your marketing campaign goal: ")
    campaign_goal = st.multiselect("Select your marketing campaign goal: \n\n",["Increase brand awareness","Generate leads","Drive sales","Improve brand sentiment"],key="campaign_goal",default=["Increase brand awareness","Generate leads"])
    # st.multiselect returns an empty list (not None) when nothing is selected.
    if not campaign_goal:
        campaign_goal = ["Increase brand awareness","Generate leads"]
    brand_voice = st.radio("Select your brand voice: \n\n",["Formal","Informal","Serious","Humorous"],key="brand_voice",horizontal=True)
    estimated_budget = st.radio("Select your estimated budget ($): \n\n",["1,000-5,000","5,000-10,000","10,000-20,000","20,000+"],key="estimated_budget",horizontal=True)

    prompt = f"""Generate a marketing campaign for {product_name}, a {product_category} designed for the age group: {target_audience_age}.
The target location is this: {target_audience_location}.
Aim to primarily achieve {campaign_goal}.
Emphasize the product's unique selling proposition while using a {brand_voice} tone of voice.
Allocate the total budget of {estimated_budget}.
With these inputs, make sure to follow following guidelines and generate the marketing campaign with proper headlines: \n
- Briefly describe the company, its values, mission, and target audience.
- Highlight any relevant brand guidelines or messaging frameworks.
- Provide a concise overview of the campaign's objectives and goals.
- Briefly explain the product or service being promoted.
- Define your ideal customer with clear demographics, psychographics, and behavioral insights.
- Understand their needs, wants, motivations, and pain points.
- Clearly articulate the desired outcomes for the campaign.
- Use SMART goals (Specific, Measurable, Achievable, Relevant, and Time-bound) for clarity.
- Define key performance indicators (KPIs) to track progress and success.
- Specify the primary and secondary goals of the campaign.
- Examples include brand awareness, lead generation, sales growth, or website traffic.
- Clearly define what differentiates your product or service from competitors.
- Emphasize the value proposition and unique benefits offered to the target audience.
- Define the desired tone and personality of the campaign messaging.
- Identify the specific channels you will use to reach your target audience.
- Clearly state the desired action you want the audience to take.
- Make it specific, compelling, and easy to understand.
- Identify and analyze your key competitors in the market.
- Understand their strengths and weaknesses, target audience, and marketing strategies.
- Develop a differentiation strategy to stand out from the competition.
- Define how you will track the success of the campaign.
- Use relevant KPIs to measure performance and return on investment (ROI).
Provide bullet points and headlines for the marketing campaign. Do not produce any empty lines. Be very succinct and to the point.
"""

    return prompt

# Function to render the Marketing campaign tab, call the model, and display the model prompt and response.
def render_mktg_campaign_tab (text_model: GenerativeModel):
    st.subheader("Generate a marketing campaign")

    prompt = generate_prompt()

    config = {
        "temperature": 0.8,
        "max_output_tokens": 2048,
    }

    generate_t2m = st.button("Generate campaign", key="generate_t2m")
    if generate_t2m and prompt:
        # st.write(prompt)
        with st.spinner("Generating a marketing campaign using Gemini..."):
            first_tab1, first_tab2 = st.tabs(["Campaign response", "Prompt"])
            with first_tab1:
                response = get_gemini_text_response(text_model, prompt, generation_config=config)
                if response:
                    st.write("Marketing campaign:")
                    st.write(response)
                    logging.info(response)
            with first_tab2:
                st.text(prompt)
EOF
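Because `st.multiselect` returns a Python list, interpolating `{campaign_goal}` directly places the list's repr, brackets and quotes included, into the prompt. One possible alternative, sketched here with a hypothetical `format_goals` helper that is not part of the lab, joins the selections into plain text and falls back to the lab's two default goals when nothing is selected.

```python
# Hypothetical helper (not part of the lab) for turning the selected goals
# into plain text before they are placed in the prompt.
DEFAULT_GOALS = ["Increase brand awareness", "Generate leads"]


def format_goals(selected):
    """Join the selected goals; use the defaults when the selection is empty."""
    goals = list(selected) if selected else DEFAULT_GOALS
    return ", ".join(goals)


# Interpolating the list directly keeps the Python list syntax in the text.
raw = f"Aim to primarily achieve {DEFAULT_GOALS}."
# Joining first produces an ordinary sentence.
clean = f"Aim to primarily achieve {format_goals(DEFAULT_GOALS)}."

print(raw)
print(clean)
```

The model will likely cope with either form, so this is a readability choice for the prompt shown in the Prompt tab rather than a required fix.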
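The Gemini 2.0 Flash model can also describe the contents of a video. In this subtask, you implement the code for the Video Playground tab and the code that interacts with the model to generate a video description.
To write the code that renders the Video Playground tab in the app UI, run the following command: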
cat > ~/gemini-app/app_tab3.py <<EOF
import streamlit as st
from vertexai.preview.generative_models import GenerativeModel, Part
from response_utils import *
import logging

# render the Video Playground tab with multiple child tabs
def render_video_playground_tab(multimodal_model: GenerativeModel):

    video_desc, video_tags, video_highlights, video_geoloc = st.tabs(["Video description", "Video tags", "Video highlights", "Video geolocation"])

    with video_desc:
        video_desc_uri = "gs://cloud-training/OCBL447/gemini-app/videos/mediterraneansea.mp4"
        video_desc_url = "https://storage.googleapis.com/"+video_desc_uri.split("gs://")[1]

        video_desc_vid = Part.from_uri(video_desc_uri, mime_type="video/mp4")
        st.video(video_desc_url)
        st.write("Generate a description of the video.")

        prompt = """Describe what is happening in the video and answer the following questions: \n
- What am I looking at?
- Where should I go to see it?
- What are other top 5 places in the world that look like this?
"""

        tab1, tab2 = st.tabs(["Response", "Prompt"])
        video_desc_description = st.button("Generate video description", key="video_desc_description")
        with tab1:
            if video_desc_description and prompt:
                with st.spinner("Generating video description"):
                    response = get_gemini_vision_response(multimodal_model, [prompt, video_desc_vid])
                    st.markdown(response)
                    logging.info(response)
        with tab2:
            st.write("Prompt used:")
            st.write(prompt,"\n","{video_data}")
EOF
To add the tab3 - Video Playground UI content to the app, run the following command:
cat >> ~/gemini-app/app.py <<EOF

from app_tab3 import render_video_playground_tab

with tab3:
    render_video_playground_tab(multimodal_model)

EOF
Test the app - Video Playground tab
To reload and view the updated app UI, refresh the browser tab or window that displays the app.
Click Video Playground, and then click Video description.
The tab displays a video of a place. Click to play the video.
Wait for the video to finish playing before you continue to the next step.
Click Generate video description.
If you see the following error, wait a few minutes and then click Generate video description again: google.api_core.exceptions.FailedPrecondition: 400 Service agents are being provisioned (https://cloud.google.com/vertex-ai/docs/general/access-control#service-agents). Service agents are needed to read the Cloud Storage file provided. So please try again in a few minutes.
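Because this error clears on its own once the service agents are provisioned, the wait-and-retry step could also be automated. The following is a minimal sketch, not part of the lab: `call_with_retry` is a hypothetical helper, and the exception type is passed in as a parameter so the example runs without the Google Cloud libraries installed. In the app, the type to pass would presumably be `google.api_core.exceptions.FailedPrecondition`, with a much longer `base_delay`.

```python
import time


def call_with_retry(fn, retry_on, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on retry_on, wait base_delay * 2**attempt seconds and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)


class NotReadyYet(Exception):
    """Stand-in for FailedPrecondition in this self-contained example."""


calls = {"n": 0}


def flaky():
    # Fails twice, then succeeds, to imitate provisioning that finishes later.
    calls["n"] += 1
    if calls["n"] < 3:
        raise NotReadyYet("service agents are being provisioned")
    return "video description"


delays = []
# Record the requested delays instead of actually sleeping.
result = call_with_retry(flaky, NotReadyYet, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter is only a convenience for trying the helper out quickly; the default `time.sleep` would be used in practice.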
To implement the code for the Video tags tab in the Video Playground tab of the app UI, run the following command:
cat >> ~/gemini-app/app_tab3.py <<EOF

    with video_tags:
        video_tags_uri = "gs://cloud-training/OCBL447/gemini-app/videos/photography.mp4"
        video_tags_url = "https://storage.googleapis.com/"+video_tags_uri.split("gs://")[1]

        video_tags_vid = Part.from_uri(video_tags_uri, mime_type="video/mp4")
        st.video(video_tags_url)
        st.write("Generate tags for the video.")

        prompt = """Answer the following questions using the video only:
1. What is in the video?
2. What objects are in the video?
3. What is the action in the video?
4. Provide 5 best tags for this video?
Write the answer in table format with the questions and answers in columns.
"""

        tab1, tab2 = st.tabs(["Response", "Prompt"])
        video_tags_desc = st.button("Generate video tags", key="video_tags_desc")
        with tab1:
            if video_tags_desc and prompt:
                with st.spinner("Generating video tags"):
                    response = get_gemini_vision_response(multimodal_model, [prompt, video_tags_vid])
                    st.markdown(response)
                    logging.info(response)
        with tab2:
            st.write("Prompt used:")
            st.write(prompt,"\n","{video_data}")
EOF
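The preceding code builds the UI for the Video tags tab. A video is combined with text to prompt the model to generate tags and answer questions about the scenes in the video.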
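Each tab derives a browser-viewable URL from the `gs://` URI by splitting the string. A slightly more defensive version of that conversion might look like the sketch below. `gcs_uri_to_public_url` is a hypothetical name, not something the lab defines, and the resulting URL only loads in a browser when the object is publicly readable, as the lab's sample files appear to be.

```python
# Hypothetical helper (not part of the lab) that converts a Cloud Storage URI
# into the https://storage.googleapis.com/ form used by st.video and st.image.
GCS_PREFIX = "gs://"
PUBLIC_BASE = "https://storage.googleapis.com/"


def gcs_uri_to_public_url(uri: str) -> str:
    """Return the public HTTPS URL for a gs://bucket/object URI."""
    if not uri.startswith(GCS_PREFIX):
        # Fail loudly instead of raising an IndexError from split()[1].
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    return PUBLIC_BASE + uri[len(GCS_PREFIX):]


video_tags_uri = "gs://cloud-training/OCBL447/gemini-app/videos/photography.mp4"
print(gcs_uri_to_public_url(video_tags_uri))
```

A single helper like this would replace the repeated `"https://storage.googleapis.com/"+uri.split("gs://")[1]` expression in every tab.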
Test the app - Video Playground - Video tags tab
Update the Video Playground tab - Video highlights
To implement the code for the Video highlights tab in the Video Playground tab of the app UI, run the following command:
cat >> ~/gemini-app/app_tab3.py <<EOF

    with video_highlights:
        video_highlights_uri = "gs://cloud-training/OCBL447/gemini-app/videos/pixel8.mp4"
        video_highlights_url = "https://storage.googleapis.com/"+video_highlights_uri.split("gs://")[1]

        video_highlights_vid = Part.from_uri(video_highlights_uri, mime_type="video/mp4")
        st.video(video_highlights_url)
        st.write("Generate highlights for the video.")

        prompt = """Answer the following questions using the video only:
What is the profession of the girl in this video?
Which features of the phone are highlighted here?
Summarize the video in one paragraph.
"""

        tab1, tab2 = st.tabs(["Response", "Prompt"])
        video_highlights_description = st.button("Generate video highlights", key="video_highlights_description")
        with tab1:
            if video_highlights_description and prompt:
                with st.spinner("Generating video highlights"):
                    response = get_gemini_vision_response(multimodal_model, [prompt, video_highlights_vid])
                    st.markdown(response)
                    logging.info(response)
        with tab2:
            st.write("Prompt used:")
            st.write(prompt,"\n","{video_data}")
EOF
The preceding code builds the UI for the Video highlights tab. A video is combined with text to prompt the model to generate highlights from the video.
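The tabs call `get_gemini_vision_response` from `response_utils`, a module this section does not show. The sketch below is only a guess at what a helper of that kind might do: it requests a streamed response through the SDK's `generate_content(..., stream=True)` method and joins the text of the chunks. `collect_streamed_text`, `StubChunk`, and `StubModel` are hypothetical names; the stub stands in for `GenerativeModel` so the example runs without credentials.

```python
# Hypothetical sketch of a response helper; not the lab's response_utils code.

def collect_streamed_text(model, contents, generation_config=None):
    """Request a streamed response and join the text of every chunk."""
    responses = model.generate_content(
        contents, generation_config=generation_config, stream=True
    )
    pieces = []
    for chunk in responses:
        try:
            pieces.append(chunk.text)
        except (ValueError, IndexError):
            # Assumption: a chunk without usable text (for example, a blocked
            # one) raises when .text is read; this sketch simply skips it.
            continue
    return "".join(pieces)


class StubChunk:
    """Minimal stand-in for one streamed response chunk."""

    def __init__(self, text):
        self.text = text


class StubModel:
    """Minimal stand-in for GenerativeModel, returning two fixed chunks."""

    def generate_content(self, contents, generation_config=None, stream=False):
        return iter([StubChunk("The video "), StubChunk("shows a phone.")])


summary = collect_streamed_text(StubModel(), ["Summarize the video.", "video-part"])
print(summary)
```

Collecting chunks into one string matches how the tabs use the result, since they pass a single value to `st.markdown`.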
Test the app - Video Playground - Video highlights tab
To reload and view the updated app UI, refresh the browser tab or window that displays the app.
Click Video Playground, and then click Video highlights.
Update the Video Playground tab - Video geolocation
To implement the code for the Video geolocation tab in the Video Playground tab of the app UI, run the following command:
cat >> ~/gemini-app/app_tab3.py <<EOF

    with video_geoloc:
        video_geolocation_uri = "gs://cloud-training/OCBL447/gemini-app/videos/bus.mp4"
        video_geolocation_url = "https://storage.googleapis.com/"+video_geolocation_uri.split("gs://")[1]

        video_geolocation_vid = Part.from_uri(video_geolocation_uri, mime_type="video/mp4")
        st.video(video_geolocation_url)
        st.markdown("""Answer the following questions from the video:
- What is this video about?
- How do you know which city it is?
- What street is this?
- What is the nearest intersection?
""")

        prompt = """Answer the following questions using the video only:
What is this video about?
How do you know which city it is?
What street is this?
What is the nearest intersection?
"""

        tab1, tab2 = st.tabs(["Response", "Prompt"])
        video_geolocation_description = st.button("Generate", key="video_geolocation_description")
        with tab1:
            if video_geolocation_description and prompt:
                with st.spinner("Generating location information"):
                    response = get_gemini_vision_response(multimodal_model, [prompt, video_geolocation_vid])
                    st.markdown(response)
                    logging.info(response)
        with tab2:
            st.write("Prompt used:")
            st.write(prompt,"\n","{video_data}")
EOF
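The preceding code builds the UI for the Video geolocation tab. A video is combined with text to prompt the model to answer questions about the video, including location information about the entities shown in it.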
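The geolocation tab spells out the same four questions twice, once for display and once for the prompt. As a small sketch, the questions could be defined once and both strings derived from that list. The names `QUESTIONS`, `as_markdown`, and `as_prompt` are hypothetical and not part of the lab.

```python
# Hypothetical refactoring sketch: one list of questions, two renderings.
QUESTIONS = [
    "What is this video about?",
    "How do you know which city it is?",
    "What street is this?",
    "What is the nearest intersection?",
]


def as_markdown(questions):
    """Render the questions as a Markdown bullet list for display."""
    bullets = "\n".join(f"- {q}" for q in questions)
    return f"Answer the following questions from the video:\n{bullets}\n"


def as_prompt(questions):
    """Render the questions as the plain-text prompt sent to the model."""
    body = "\n".join(questions)
    return f"Answer the following questions using the video only:\n{body}\n"


print(as_markdown(QUESTIONS))
print(as_prompt(QUESTIONS))
```

This keeps the displayed text and the prompt from drifting apart when a question is edited.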
Test the app - Video Playground - Video geolocation tab
To reload and view the updated app UI, refresh the browser tab or window that displays the app.
Click Video Playground, and then click Video geolocation.
cat > ~/gemini-app/app_tab4.py <<EOF
import streamlit as st
from vertexai.preview.generative_models import GenerativeModel, Part
from response_utils import *
import logging

# render the Image Playground tab with multiple child tabs
def render_image_playground_tab(multimodal_model: GenerativeModel):

    recommendations, screens, diagrams, equations = st.tabs(["Furniture recommendation", "Oven instructions", "ER diagrams", "Math reasoning"])

    with recommendations:
        room_image_uri = "gs://cloud-training/OCBL447/gemini-app/images/living_room.jpeg"
        chair_1_image_uri = "gs://cloud-training/OCBL447/gemini-app/images/chair1.jpeg"
        chair_2_image_uri = "gs://cloud-training/OCBL447/gemini-app/images/chair2.jpeg"
        chair_3_image_uri = "gs://cloud-training/OCBL447/gemini-app/images/chair3.jpeg"
        chair_4_image_uri = "gs://cloud-training/OCBL447/gemini-app/images/chair4.jpeg"

        room_image_url = "https://storage.googleapis.com/"+room_image_uri.split("gs://")[1]
        chair_1_image_url = "https://storage.googleapis.com/"+chair_1_image_uri.split("gs://")[1]
        chair_2_image_url = "https://storage.googleapis.com/"+chair_2_image_uri.split("gs://")[1]
        chair_3_image_url = "https://storage.googleapis.com/"+chair_3_image_uri.split("gs://")[1]
        chair_4_image_url = "https://storage.googleapis.com/"+chair_4_image_uri.split("gs://")[1]

        room_image = Part.from_uri(room_image_uri, mime_type="image/jpeg")
        chair_1_image = Part.from_uri(chair_1_image_uri,mime_type="image/jpeg")
        chair_2_image = Part.from_uri(chair_2_image_uri,mime_type="image/jpeg")
        chair_3_image = Part.from_uri(chair_3_image_uri,mime_type="image/jpeg")
        chair_4_image = Part.from_uri(chair_4_image_uri,mime_type="image/jpeg")

        st.image(room_image_url,width=350, caption="Image of a living room")
        st.image([chair_1_image_url,chair_2_image_url,chair_3_image_url,chair_4_image_url],width=200, caption=["Chair 1","Chair 2","Chair 3","Chair 4"])

        st.write("Our expectation: Recommend a chair that would complement the given image of a living room.")
        # The comma after "\n" keeps it a separate list item instead of merging
        # it with the next string through implicit literal concatenation.
        prompt_list = ["Consider the following chairs:",
                       "chair 1:", chair_1_image,
                       "chair 2:", chair_2_image,
                       "chair 3:", chair_3_image, "and",
                       "chair 4:", chair_4_image, "\n",
                       "For each chair, explain why it would be suitable or not suitable for the following room:",
                       room_image,
                       "Only recommend for the room provided and not other rooms. Provide your recommendation in a table format with chair name and reason as columns.",
                      ]

        tab1, tab2 = st.tabs(["Response", "Prompt"])
        generate_image_description = st.button("Generate recommendation", key="generate_image_description")
        with tab1:
            if generate_image_description and prompt_list:
                with st.spinner("Generating recommendation using Gemini..."):
                    response = get_gemini_vision_response(multimodal_model, prompt_list)
                    st.markdown(response)
                    logging.info(response)
        with tab2:
            st.write("Prompt used:")
            st.text(prompt_list)
EOF
If you see the following error, wait a few minutes and then click Generate recommendation again: google.api_core.exceptions.FailedPrecondition: 400 Service agents are being provisioned (https://cloud.google.com/vertex-ai/docs/general/access-control#service-agents). Service agents are needed to read the Cloud Storage file provided. So please try again in a few minutes.
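The furniture prompt interleaves text labels with image parts in a single list. Python joins adjacent string literals, so a comma left out between two strings in such a list silently merges them into one item. The sketch below builds the same kind of list programmatically, which avoids that pitfall. `build_chair_prompt` is a hypothetical helper, not part of the lab, and plain strings stand in for the `Part` objects.

```python
# Hypothetical helper (not part of the lab) that interleaves labels and images.

def build_chair_prompt(chairs, room):
    """Return a contents list: intro, (label, image) pairs, then the task."""
    contents = ["Consider the following chairs:"]
    for index, chair in enumerate(chairs, start=1):
        contents += [f"chair {index}:", chair]
    contents += [
        "For each chair, explain why it would be suitable or not suitable for the following room:",
        room,
        "Only recommend for the room provided and not other rooms. "
        "Provide your recommendation in a table format with chair name and reason as columns.",
    ]
    return contents


# Adjacent string literals are concatenated: one item, not two.
merged = ["a" "b"]
separate = ["a", "b"]

prompt_list = build_chair_prompt(["img1", "img2", "img3", "img4"], "room-img")
print(len(prompt_list), merged, separate)
```

The final instruction above deliberately uses implicit concatenation inside one list item, which shows the same language feature being used on purpose to wrap a long string.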
cat >> ~/gemini-app/app_tab4.py <<EOF

    with screens:
        oven_screen_uri = "gs://cloud-training/OCBL447/gemini-app/images/oven.jpg"
        oven_screen_url = "https://storage.googleapis.com/"+oven_screen_uri.split("gs://")[1]

        oven_screen_img = Part.from_uri(oven_screen_uri, mime_type="image/jpeg")
        st.image(oven_screen_url, width=350, caption="Image of an oven control panel")
        st.write("Provide instructions for resetting the clock on this appliance in English")

        prompt = """How can I reset the clock on this appliance? Provide the instructions in English.
If instructions include buttons, also explain where those buttons are physically located.
"""

        tab1, tab2 = st.tabs(["Response", "Prompt"])
        generate_instructions_description = st.button("Generate instructions", key="generate_instructions_description")
        with tab1:
            if generate_instructions_description and prompt:
                with st.spinner("Generating instructions using Gemini..."):
                    response = get_gemini_vision_response(multimodal_model, [oven_screen_img, prompt])
                    st.markdown(response)
                    logging.info(response)
        with tab2:
            st.write("Prompt used:")
            st.text(prompt+"\n"+"input_image")
EOF
To implement the code for the ER diagrams tab in the Image Playground tab of the app UI, run the following command:
cat >> ~/gemini-app/app_tab4.py <<EOF

    with diagrams:
        er_diag_uri = "gs://cloud-training/OCBL447/gemini-app/images/er.png"
        er_diag_url = "https://storage.googleapis.com/"+er_diag_uri.split("gs://")[1]

        er_diag_img = Part.from_uri(er_diag_uri,mime_type="image/png")
        st.image(er_diag_url, width=350, caption="Image of an ER diagram")
        st.write("Document the entities and relationships in this ER diagram.")

        prompt = """Document the entities and relationships in this ER diagram."""

        tab1, tab2 = st.tabs(["Response", "Prompt"])
        er_diag_img_description = st.button("Generate documentation", key="er_diag_img_description")
        with tab1:
            if er_diag_img_description and prompt:
                with st.spinner("Generating..."):
                    response = get_gemini_vision_response(multimodal_model,[er_diag_img,prompt])
                    st.markdown(response)
                    logging.info(response)
        with tab2:
            st.write("Prompt used:")
            st.text(prompt+"\n"+"input_image")
EOF
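The preceding code builds the UI for the ER diagrams tab. An image of an ER diagram is combined with text to prompt the model to document the entities and relationships in the diagram.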
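Every `Part.from_uri` call in these tabs hard-codes a MIME type that matches the file extension. As a sketch, the standard library's `mimetypes` module can infer that value from the file name instead. `guess_mime_type` is a hypothetical helper, not part of the lab, and it raises for an unknown extension rather than guessing.

```python
import mimetypes


def guess_mime_type(uri: str) -> str:
    """Infer the MIME type from the file name at the end of a URI."""
    file_name = uri.rsplit("/", 1)[-1]
    mime_type, _encoding = mimetypes.guess_type(file_name)
    if mime_type is None:
        raise ValueError(f"cannot infer a MIME type for {uri!r}")
    return mime_type


for sample in (
    "gs://cloud-training/OCBL447/gemini-app/images/er.png",
    "gs://cloud-training/OCBL447/gemini-app/images/oven.jpg",
    "gs://cloud-training/OCBL447/gemini-app/videos/bus.mp4",
):
    print(sample, guess_mime_type(sample))
```

The returned string could then be passed as the `mime_type` argument, for example `Part.from_uri(uri, mime_type=guess_mime_type(uri))`.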
Test the app - Image Playground - ER diagrams tab
To reload and view the updated app UI, refresh the browser tab or window that displays the app.
Click Image Playground, and then click ER diagrams.
Update the Image Playground tab - Math reasoning
In this task, you use the Gemini 2.0 Flash model to extract and interpret a math formula from an image.
To implement the code for the Math reasoning tab in the Image Playground tab of the app UI, run the following command:
cat >> ~/gemini-app/app_tab4.py <<EOF

    with equations:
        math_image_uri = "gs://cloud-training/OCBL447/gemini-app/images/math_eqn.jpg"
        math_image_url = "https://storage.googleapis.com/"+math_image_uri.split("gs://")[1]

        math_image_img = Part.from_uri(math_image_uri,mime_type="image/jpeg")
        st.image(math_image_url,width=350, caption="Image of a math equation")
        st.markdown(f"""
Ask questions about the math equation as follows:
- Extract the formula.
- What is the symbol right before Pi? What does it mean?
- Is this a famous formula? Does it have a name?
""")

        prompt = """Follow the instructions. Surround math expressions with $. Use a table with a row for each instruction and its result.
INSTRUCTIONS:
- Extract the formula.
- What is the symbol right before Pi? What does it mean?
- Is this a famous formula? Does it have a name?
"""

        tab1, tab2 = st.tabs(["Response", "Prompt"])
        math_image_description = st.button("Generate answers", key="math_image_description")
        with tab1:
            if math_image_description and prompt:
                with st.spinner("Generating answers for formula using Gemini..."):
                    response = get_gemini_vision_response(multimodal_model, [math_image_img, prompt])
                    st.markdown(response)
                    logging.info(response)
        with tab2:
            st.write("Prompt used:")
            st.text(prompt)
EOF
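The preceding code builds the UI for the Math reasoning tab. An image of a math equation is combined with text to prompt the model to generate answers about the equation and describe its other characteristics.
Test the app - Image Playground - Math reasoning tab
To reload and view the updated app UI, refresh the browser tab or window that displays the app.
Click Image Playground, and then click Math reasoning.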
SERVICE_NAME='gemini-app-playground' # Name of your Cloud Run service.
AR_REPO='gemini-app-repo' # Name of your repository in Artifact Registry that stores your application container image.
echo "SERVICE_NAME=${SERVICE_NAME}"
echo "AR_REPO=${AR_REPO}"