from google.cloud import bigquery
from google.cloud import aiplatform
import bigframes.pandas as bpd
import pandas as pd
from vertexai.language_models import TextGenerationModel
from bigframes.ml.cluster import KMeans
from bigframes.ml.model_selection import train_test_split
%%bigquery
CREATE OR REPLACE TABLE ecommerce.customer_stats AS
SELECT
  user_id,
  DATE_DIFF(CURRENT_DATE(), CAST(MAX(order_created_date) AS DATE), DAY) AS days_since_last_order, -- RECENCY
  COUNT(order_id) AS count_orders, -- FREQUENCY
  AVG(sale_price) AS average_spend -- MONETARY
FROM (
  SELECT
    user_id,
    order_id,
    sale_price,
    created_at AS order_created_date
  FROM `bigquery-public-data.thelook_ecommerce.order_items`
  WHERE created_at BETWEEN '2022-01-01' AND '2023-01-01'
)
GROUP BY user_id;
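In the lab, the resulting customer_stats table is then loaded into the BigQuery DataFrame (`bq_df`) used below, typically with something like `bq_df = bpd.read_gbq("ecommerce.customer_stats")`. The same RFM (recency, frequency, monetary) aggregation can be sketched locally with plain pandas; the toy order_items frame and the fixed reference date below are made up for illustration:

```python
import pandas as pd

# Toy stand-in for bigquery-public-data.thelook_ecommerce.order_items
order_items = pd.DataFrame({
    "user_id": [1, 1, 2],
    "order_id": [10, 11, 20],
    "sale_price": [20.0, 40.0, 15.0],
    "created_at": pd.to_datetime(["2022-03-01", "2022-06-01", "2022-02-15"]),
})

# Fixed "today" so the example is reproducible (the SQL uses CURRENT_DATE())
today = pd.Timestamp("2023-01-01")

customer_stats = (
    order_items.groupby("user_id")
    .agg(
        days_since_last_order=("created_at", lambda s: (today - s.max()).days),  # recency
        count_orders=("order_id", "count"),                                      # frequency
        average_spend=("sale_price", "mean"),                                    # monetary
    )
    .reset_index()
)
print(customer_stats)
```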
1. Split df (using random state and test size 0.2) into test and training data for a K-means clustering algorithm; store these as df_test and df_train. 2. Create a K-means cluster model using bigframes.ml.cluster KMeans with 5 clusters. 3. Save the model using the to_gbq method, where the model name is project_id.dataset_name.model_name.
Click Generate. You should see output similar to the following:
# prompt: 1. Split df (using random state and test size 0.2) into test and training data for a K-means clustering algorithm; store these as df_test and df_train. 2. Create a K-means cluster model using bigframes.ml.cluster KMeans with 5 clusters. 3. Save the model using the to_gbq method, where the model name is project_id.dataset_name.model_name.
df_train, df_test = train_test_split(bq_df, test_size=0.2, random_state=42)
kmeans = KMeans(n_clusters=5)
kmeans.fit(df_train)
kmeans.to_gbq(f"{project_id}.{dataset_name}.{model_name}")
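bigframes.ml intentionally mirrors the scikit-learn API, so the same split/fit/predict steps can be sketched locally with scikit-learn on synthetic data. The feature matrix here is a made-up stand-in for the three RFM columns, and `to_gbq` has no local counterpart:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))  # stand-in for days_since_last_order, count_orders, average_spend

# Same split parameters as the bigframes version
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

# Same estimator surface: construct, fit, predict
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
kmeans.fit(X_train)
labels = kmeans.predict(X_test)
print(labels[:10])
```

The one-to-one mapping between the two APIs is what lets the code assistant translate a scikit-learn-style prompt directly into bigframes calls.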
1. Call the K-means prediction model on the df dataframe, and store the results as predictions_df and show the first 10 records.
Click Generate. You should see output similar to the following:
# prompt: 1. Call the K-means prediction model on the df dataframe, and store the results as predictions_df and show the first 10 records.
predictions_df = kmeans.predict(df_test)
predictions_df.head(10)
1. Using predictions_df, and matplotlib, generate a scatterplot. 2. On the x-axis of the scatterplot, display days_since_last_order and on the y-axis, display average_spend from predictions_df. 3. Color by cluster. 4. The chart should be titled "Attribute grouped by K-means cluster."
Click Generate.
You should see output similar to the following:
#prompt: 1. Using predictions_df, and matplotlib, generate a scatterplot. 2. On the x-axis of the scatterplot, display days_since_last_order and on the y-axis, display average_spend from predictions_df. 3. Color by cluster. 4. The chart should be titled "Attribute grouped by K-means cluster."
import matplotlib.pyplot as plt
# Create the scatter plot
plt.figure(figsize=(10, 6)) # Adjust figure size as needed
plt.scatter(predictions_df['days_since_last_order'], predictions_df['average_spend'], c=predictions_df['cluster'], cmap='viridis')
# Customize the plot
plt.title('Attribute grouped by K-means cluster')
plt.xlabel('Days Since Last Order')
plt.ylabel('Average Spend')
plt.colorbar(label='Cluster ID')
# Display the plot
plt.show()
In the c=predictions_df[...] argument only, replace 'cluster' (or 'cluster_id') with 'CENTROID_ID', which is the cluster-label column that the K-means prediction actually writes to predictions_df.
query = """
SELECT
CONCAT('cluster ', CAST(centroid_id as STRING)) as centroid,
average_spend,
count_orders,
days_since_last_order
FROM (
SELECT centroid_id, feature, ROUND(numerical_value, 2) as value
FROM ML.CENTROIDS(MODEL `{0}.{1}`)
)
PIVOT (
SUM(value)
FOR feature IN ('average_spend', 'count_orders', 'days_since_last_order')
)
ORDER BY centroid_id
""".format(dataset_name, model_name)
df_centroid = client.query(query).to_dataframe()
df_centroid.head()
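The PIVOT clause in this query turns the long-form ML.CENTROIDS output (one row per centroid_id/feature pair) into one row per centroid with a column per feature. The equivalent reshape can be sketched in pandas with pivot_table, using illustrative centroid values:

```python
import pandas as pd

# Long-form ML.CENTROIDS-style output (values are illustrative only)
long_df = pd.DataFrame({
    "centroid_id": [1, 1, 1, 2, 2, 2],
    "feature": ["average_spend", "count_orders", "days_since_last_order"] * 2,
    "value": [48.32, 1.36, 384.37, 202.34, 1.30, 482.62],
})

# Pivot: one row per centroid, one column per feature (SUM over a single
# value per cell, matching the SQL PIVOT's SUM(value) aggregation)
wide_df = (
    long_df.pivot_table(index="centroid_id", columns="feature",
                        values="value", aggfunc="sum")
    .reset_index()
)
wide_df["centroid"] = "cluster " + wide_df["centroid_id"].astype(str)
print(wide_df)
```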
Run the cell.
You should see the clusters summarized in a table. From the table you can see that some clusters have a higher average spend, while others place more orders.
Next, convert the DataFrame to a string so that you can pass it to your large language model call.
Add another code cell at the end of the notebook.
Paste the following code snippet into the cell:
df_query = client.query(query).to_dataframe()
df_query.to_string(header=False, index=False)
cluster_info = []
for i, row in df_query.iterrows():
    cluster_info.append(
        "{0}, average spend ${1}, count of orders per person {2}, days since last order {3}".format(
            row["centroid"], row["average_spend"], row["count_orders"], row["days_since_last_order"]
        )
    )
cluster_info = "\n".join(cluster_info)
print(cluster_info)
Run the cell.
The output should look similar to the following:
cluster 1, average spend $48.32, count of orders per person 1.36, days since last order 384.37
cluster 2, average spend $202.34, count of orders per person 1.3, days since last order 482.62
cluster 3, average spend $45.68, count of orders per person 1.36, days since last order 585.4
cluster 4, average spend $44.71, count of orders per person 1.36, days since last order 466.26
cluster 5, average spend $58.08, count of orders per person 3.92, days since last order 427.36
Define a prompt for the marketing campaign
In this step, you define a prompt so that your large language model (based on text-bison) understands what you are asking for.
Add another code cell at the end of the notebook.
Paste the following code snippet into the cell:
prompt = f"""
You're a creative brand strategist, given the following clusters, come up with \
creative brand persona, a catchy title, and next marketing action, \
explained step by step.
Clusters:
{cluster_info}
For each Cluster:
* Title:
* Persona:
* Next marketing step:
"""
Use the Vertex AI language_models API to call the PaLM2 text-bison model and generate a marketing campaign using the variable prompt. Use the following model settings: max_output_tokens=1024, temperature=0.4
Click Generate.
You should see output similar to the following:
#prompt: Use the Vertex AI language_models API to call the PaLM2 text-bison model and generate a marketing campaign using the variable prompt. Use the following model settings: max_output_tokens=1024, temperature=0.4
model = TextGenerationModel.from_pretrained("text-bison@001")
response = model.predict(prompt, max_output_tokens=1024, temperature=0.4)
print(response.text)
Run the cell.
You should see results similar to the following:
Cluster 1:
Title: "The Occasional Shoppers"
Persona: These customers are likely to be sporadic shoppers who make infrequent purchases. They may be attracted to discounts or promotions, and they may be more likely to purchase items that are on sale.
Next marketing step: Offer discounts or promotions to entice these customers to make more frequent purchases.
Cluster 2:
Title: "The Loyal Customers"
Persona: These customers are likely to be loyal to your brand and make repeat purchases. They may be more likely to spend more money on each purchase and be less likely to be swayed by competitors' offerings.
Next marketing step: Reward these customers for their loyalty with a loyalty program or exclusive discounts.
Cluster 3:
Title: "The Lapsed Customers"
Persona: These customers have not made a purchase in a long time. They may have been lost to a competitor or simply lost interest in your brand.
Next marketing step: Reach out to these customers with a special offer or promotion to win them back.
Cluster 4:
Title: "The Bargain Hunters"
Persona: These customers are likely to be motivated by price and are more likely to purchase items that are on sale. They may be less loyal to your brand and more likely to switch to a competitor if they find a better deal.
Next marketing step: Offer discounts or promotions to entice these customers to make more frequent purchases.
Cluster 5:
Title: "The Power Buyers"
Persona: These customers are likely to be your most valuable customers. They spend the most money and make the most frequent purchases. They may be more likely to be brand advocates and refer your brand to others.
Next marketing step: Reward these customers for their loyalty with a loyalty program or exclusive discounts. Additionally, ask them to refer your brand to their friends and family.
You can now easily share each cluster's profile, along with the suggested next marketing action for it, with your marketing team.
Gathering custom insights into customer preferences, spend, and purchase frequency would normally require significant manual work from a team. By combining generative AI with the data in BigQuery, a data scientist can now accomplish the same in minutes.
To verify that you have completed the following objective, click Check my progress:
Generate a marketing campaign with the text-bison model.
Task 6. Clean up project resources (optional)
In this lab, you created resources in the Google Cloud console. In a production environment, you would remove these resources from your account once the insights have been collected from the model, because they are no longer needed. To remove these resources from your account and avoid further charges for their use, you have two options:
Remove the project (see the caution below)
Remove the individual resources
Clean up resources by removing the project
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, you can delete the Google Cloud project that you created for this tutorial.