
Before you begin
- Labs create a Google Cloud project and resources for a fixed time
- Labs have a time limit and no pause feature. If you end the lab, you'll have to restart from the beginning.
- On the top left of your screen, click Start lab to begin
This lab was developed with our partner, MongoDB. Your personal information may be shared with MongoDB, the lab sponsor, if you have opted in to receive product updates, announcements, and offers in your Account Profile.
MongoDB Atlas is a fully managed multi-cloud database service built by MongoDB. It allows you to deploy, scale, and monitor your MongoDB database in the cloud. Additionally, Atlas comes with built-in services for workload isolation, analytics, search, and more.
MongoDB Atlas Vector Search allows you to search vector data stored in your MongoDB database. By creating an Atlas Vector Search index on your collection, you can perform vector search queries on the indexed fields. This integration enables you to store vector data alongside your other MongoDB data within the same database or even the same collection, eliminating the need to manage separate storage systems for your vector and operational data.
This hands-on lab guides you through the process of creating a chat assistant using Gemini 2.0 Flash, Langchain, Node.js, and Angular. You'll explore the limitations of out-of-context prompts and how to overcome them by implementing Retrieval Augmented Generation (RAG) with MongoDB Atlas Vector Search.
In this lab, you perform the following tasks:
Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.
This Qwiklabs hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.
To complete this lab, you need:
Note: If you already have your own personal Google Cloud account or project, do not use it for this lab.
Note: If you are using a Pixelbook, open an Incognito window to run this lab.
Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is a panel populated with the temporary credentials that you must use for this lab.
Copy the username, and then click Open Google Console. The lab spins up resources, and then opens another tab that shows the Sign in page.
Tip: Open the tabs in separate windows, side-by-side.
In the Sign in page, paste the username that you copied from the Connection Details panel. Then copy and paste the password.
Important: You must use the credentials from the Connection Details panel. Do not use your Qwiklabs credentials. If you have your own Google Cloud account, do not use it for this lab to avoid incurring charges to your account.
Click through the subsequent pages:
After a few moments, the Cloud Console opens in this tab.
Cloud Shell is a virtual machine that is loaded with development tools. It offers a persistent 5GB home directory and runs on Google Cloud. Cloud Shell provides command-line access to your Google Cloud resources.
In the Cloud Console, in the top right toolbar, click the Activate Cloud Shell button.
Click Continue.
It takes a few moments to provision and connect to the environment. When you are connected, you are already authenticated, and the project is set to your PROJECT_ID. For example:
gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.
You can list the active account name with this command:
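```bash
gcloud auth list
```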
You can list the project ID with this command:
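```bash
gcloud config list project
```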
In the Google Cloud console, type Cloud Shell Editor in the search bar on the top and open the service from the search results. Wait for the Editor to load — it may take a minute.
Open the terminal in the Cloud Shell Editor with Ctrl + Shift + ` (backtick). Alternatively, you can click on the Open Terminal button.
Clone the GitHub repository for the lab by running the following command:
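The repository URL below is a placeholder; substitute the actual URL for this lab:

```bash
git clone https://github.com/<repository-owner>/rag-chatbot.git
cd rag-chatbot
```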
Start the application by running the following command:
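The exact command depends on how the repository's package.json is set up; a typical form would be:

```bash
npm start
```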
Wait for the application to start. The command installs the necessary Node.js and Angular packages (dependencies), builds both the server and the client applications, and runs them in development mode. This means that when you make changes, the applications will rebuild and restart automatically. Once the client is running, you'll see the following message:
Open the Angular client in the browser. Click on the web preview button on the top right in Cloud Shell. Select Change port, type 4200 in the input field, and click Change and preview.
The chat assistant interface will open in a new tab. Type a message in the chat box and press Enter to send the message. You should get the following response from the chat assistant:
This is because the chat assistant is not yet connected to Langchain or the Gemini 2.0 Flash API. Don't close the chatbot interface. You will use it throughout the lab.
Go to the previous tab with the Cloud Shell Editor. Click on Open Editor if you don't see the code.
From the file tree on the left, navigate to the rag-chatbot/server/src/ directory and open the server.ts file.
If you inspect the code, you will notice that the /messages endpoint is not implemented yet. You will implement this endpoint to send chat prompt requests to Gemini 2.0 Flash using Langchain.
Implement the /messages endpoint

First, you need to initialize the conversational AI model. To do this, you'll use the ChatVertexAI class imported from the @langchain/google-vertexai package. Along with this, you'll also set the parameters for the model, such as the model name, maximum number of tokens to generate in the response, temperature, topP, and topK values. These parameters control the randomness and diversity of the output generated by the model.
After initializing the model, you'll define a history variable with a system message describing the chatbot role and ground rules.
To do this, add the following code to the server.ts file, right above the router.post("/messages", ...) line:
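A minimal sketch of what this code could look like; the parameter values and the system message wording are illustrative assumptions rather than the lab's exact code:

```typescript
import { ChatVertexAI } from "@langchain/google-vertexai";
import { BaseMessage, SystemMessage } from "@langchain/core/messages";

// Initialize Gemini 2.0 Flash through Langchain's Vertex AI integration.
// The generation parameters below are illustrative; tune them as needed.
const model = new ChatVertexAI({
  model: "gemini-2.0-flash",
  maxOutputTokens: 1024,
  temperature: 0.2,
  topP: 0.9,
  topK: 20,
});

// Conversation history, seeded with a system message that describes the
// chatbot's role and ground rules.
const history: BaseMessage[] = [
  new SystemMessage(
    "You are an assistant for an insurance company. Answer questions about insurance policies, coverage, and claims, and politely decline unrelated requests."
  ),
];
```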
Next, replace the /messages endpoint with the following implementation:
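A hedged sketch of that implementation; the text field name in the request and response bodies is an assumption about the client's API contract:

```typescript
// (Message classes are typically imported at the top of server.ts.)
import { HumanMessage, AIMessage } from "@langchain/core/messages";

router.post("/messages", async (req, res) => {
  try {
    // The "text" field name is an assumption about the request body shape.
    let prompt = req.body?.text;
    if (!prompt) {
      return res.status(400).send({ error: "A message is required." });
    }

    // Add the user's message to the conversation history.
    history.push(new HumanMessage(prompt));

    // Ask Gemini 2.0 Flash for a response via Langchain.
    const response = await model.invoke(history);
    const answer = response.content.toString();

    // Keep the answer in the history so follow-up questions have context.
    history.push(new AIMessage(answer));

    res.send({ text: answer });
  } catch (error) {
    console.error(error);
    res.status(500).send({ error: "Failed to generate a response." });
  }
});
```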
Navigate to the chat assistant interface tab and type a message in the chat box, for example "What are insurance policies?". Press Enter to send the message.
Go back to the Cloud Shell Editor tab and check the logs. You might be prompted to allow the Cloud Shell Editor to make authorized requests. Click Allow to proceed.
If you see an error message, you have exceeded the time limit for allowing the Cloud Shell Editor to make authorized requests. In this case, try sending another message in the chat assistant interface.
The chat assistant will respond with a message similar to the following:
Let's imagine you're a user asking about the specific coverage of their insurance policy. Type a message in the chat assistant interface, for example "What does my insurance policy cover?". Press Enter to send the message.
You are likely to get a generic response or a response asking for more details. This is because the chat assistant does not have context about your insurance policy. However, if you are already a user of the insurance company, the chat assistant should be able to provide a more accurate response based on the company's records.
Currently, the chat assistant does not have access to the user's insurance policy details and can only provide generic responses. You will address this limitation by implementing Retrieval Augmented Generation (RAG) with MongoDB Atlas Vector Search.
RAG is a grounding technique that improves the responses coming from an LLM by augmenting the prompt with additional context. The additional context is retrieved from an external source, such as a vector database containing proprietary information. In our system, the additional context will be extracted from the user's insurance policy.
RAG consists of three main phases:
- Data ingestion: In this case, the proprietary data consists of PDF documents containing insurance policies. Ingestion involves three steps:
  - Chunking: The PDFs will be cleaned and split into smaller, overlapping text chunks. This step is essential for retrieving relevant information accurately.
  - Embedding: Each chunk will be converted to a vector embedding using Vertex AI's Text embeddings API.
  - Storing and indexing: The resulting vector embeddings will be stored and indexed in a MongoDB Atlas database.
- Information retrieval: When a user submits a query, it will also be converted into a vector embedding. Then, vector search will be performed to find the most semantically similar document chunks in the database. Additional pre-filtering may be applied to speed up the vector search execution.
- Generation: The retrieved information will be added to the original user query to create an augmented prompt. This prompt will then be sent to the LLM, which will generate the final response. The main goal of this component is to generate a context-aware response to the user's query.
In the following tasks, you will chunk and convert the PDF documents with the insurance policy data to vector embeddings, store and index them in MongoDB Atlas, and implement RAG to enable context-aware chatbot responses.
The insurance company has provided you with a set of PDF documents containing user insurance policy data. The documents have information about different types of insurance policies, coverage details, policy terms, and claim procedures. Based on the user query, the application will retrieve the most relevant context by performing vector search in MongoDB Atlas. Then, this context will be appended to the chat prompt requests to Gemini 2.0 Flash, enabling the chat assistant to generate context-aware responses.
In this task, you will:
- create a MongoDB Atlas database deployment,
- split PDF documents into text chunks,
- convert the chunks to vector embeddings using Langchain and the Vertex AI Text embeddings API,
- store the vector embeddings in a MongoDB Atlas database,
- create a vector search index on the ingested embeddings.
Log in to your MongoDB Atlas account.
You can deploy only one free tier cluster per project. If you already have a free cluster, you will need to create a new project to deploy an additional free cluster. To do this, open the dropdown menu located in the top left corner, just below the Atlas logo, and click New Project. Then, enter a project name, click Next and then Create Project.
Click Create in the Create a cluster section.
Name your cluster GeminiRAG.
You will be prompted to complete the security setup of your deployment.
Open the tab with the Cloud Shell Editor and navigate to the rag-chatbot/server/ directory from the file tree on the left. Create a new file called .env with the following content:
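The variable name below is an assumption; use whatever name the server code expects:

```
ATLAS_URI=<your_mongodb_uri>
```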
Replace <your_mongodb_uri> with the connection string you copied from the MongoDB Atlas UI. Make sure to replace the <password> placeholder with the password you set for the database user.
Let's create the script that chunks the PDF documents and converts them to vector embeddings.
Navigate to the rag-chatbot/server/src/ directory from the file tree on the left and open the embed-documents.ts file. You will see that the file is empty.
You should start by loading all PDFs from the pdf_documents directory. Langchain provides a helper PDFLoader class that you can use to load PDFs from a directory.
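One way this could look, pairing PDFLoader with Langchain's DirectoryLoader; the pdf_documents path relative to the working directory is an assumption:

```typescript
import { DirectoryLoader } from "langchain/document_loaders/fs/directory";
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";

// Load every PDF in the pdf_documents directory into Langchain documents.
const loader = new DirectoryLoader("pdf_documents", {
  ".pdf": (path: string) => new PDFLoader(path),
});
const docs = await loader.load();
console.log(`Loaded ${docs.length} PDF pages.`);
```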
Next, you will use the RecursiveCharacterTextSplitter helper from Langchain to split the text content of the PDFs into chunks of text.
By splitting text in a way that preserves the context within each chunk, the splitter ensures that each piece of text remains meaningful on its own. This is essential for retrieval tasks where maintaining the integrity of information is crucial. Additionally, smaller, well-defined chunks can be indexed more efficiently. When a query is made, the system can quickly match it with the most relevant chunks rather than processing an entire document.
Add the following code right after the loading of the PDFs:
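A sketch of the splitting step; the chunk size and overlap are illustrative values, and the import path can vary between Langchain versions:

```typescript
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// Split the loaded documents into smaller, overlapping chunks so that each
// piece stays meaningful on its own and can be retrieved independently.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 150,
});
const chunks = await splitter.splitDocuments(docs);
console.log(`Split the PDFs into ${chunks.length} text chunks.`);
```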
Finally, you'll instantiate a MongoDB Atlas Vector database instance. When you create the instance, you'll specify the embedding model that you want to use. Behind the scenes, Langchain will invoke the embeddings API to convert the text chunks into vector embeddings, which will then be stored in the MongoDB Atlas database. As you can see, Langchain abstracts the complexity of the embeddings API and the MongoDB Atlas Vector Search API, allowing you to focus on the core functionality of your application.
Add the following code right after splitting the PDF documents into chunks:
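A sketch of this step, assuming the connection string is read from an ATLAS_URI variable in .env and that the embedding model is Vertex AI's text-embedding-004 (which produces 768-dimensional vectors); the chat-rag database and context collection names match what you'll see later in the Atlas UI:

```typescript
import { MongoClient } from "mongodb";
import { MongoDBAtlasVectorSearch } from "@langchain/mongodb";
import { VertexAIEmbeddings } from "@langchain/google-vertexai";

// Connect to the Atlas cluster using the connection string from .env.
const client = new MongoClient(process.env.ATLAS_URI!);
const collection = client.db("chat-rag").collection("context");

// Embed each chunk with the Vertex AI Text embeddings API and store the
// text, the vector embedding, and metadata in the context collection.
await MongoDBAtlasVectorSearch.fromDocuments(
  chunks,
  new VertexAIEmbeddings({ model: "text-embedding-004" }),
  {
    collection,
    indexName: "vector_index",
    textKey: "text",
    embeddingKey: "embedding",
  }
);

console.log("Imported the text chunks into the MongoDB Atlas vector store.");
await client.close();
```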
Stop the running process by pressing Ctrl + C or Command + C in the terminal emulator tab.
Run the following command to execute the script that converts PDF documents to vector embeddings:
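The exact command depends on the scripts defined in the server's package.json; it is likely something along these lines:

```bash
npm run embed-documents
```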
You should see the following output in the terminal:
This output indicates that the PDF documents were successfully loaded, split into text chunks, and imported into the MongoDB Atlas vector store.
In this task, you will create an Atlas Vector Search index in MongoDB Atlas. The index will be used to perform vector search queries on the embedded chunks that you just stored in the Atlas database.
Open the MongoDB Atlas UI and refresh the page to see the newly imported data.
Click Browse collections under the GeminiRAG deployment.
Explore the documents in the context collection. You'll see the text chunks and their corresponding vector embeddings, along with other metadata.
Click Search Indexes or Atlas Search to be taken to the Atlas Search page. Once you're there, click Create Search Index.
Scroll up to the Search Type section and select Vector Search.
You should see the Index Name change to vector_index.
Then, select the collection that you want to index. Expand the collections in the chat-rag database and then select the context collection.
In the Configuration Method section, select JSON editor.
Click Next.
Paste the following configuration in the JSON editor:
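```json
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 768,
      "similarity": "euclidean"
    }
  ]
}
```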
- The index is defined on the embedding field of the context collection.
- The numDimensions parameter sets the length of the vector embeddings, which varies depending on the embedding model used. Since you're using Vertex AI Text Embeddings, which generates vectors with 768 dimensions, this value is specified in the configuration.
- The similarity parameter determines the vector similarity function for vector search queries. In this case, it is the Euclidean distance similarity function, which measures the distance between the ends of vectors.
- The type of the index is set to vector.

Click Next and then Create Vector Search Index.
Wait for the status to change to READY. Once the index is ready, you can start performing vector search queries on the indexed field.
Next, you will implement the retriever component of your RAG system. As mentioned earlier, the retriever will use the MongoDB Atlas vector store as its source. It will:
- Convert the user's question into a vector embedding.
- Perform a vector search against the indexed data to find the most relevant text chunks.
The retrieved chunks will serve as context for the chatbot.
Navigate to the rag-chatbot/server/src/ directory in the Cloud Shell Editor and open the server.ts file.
First, you need to initialize the MongoDB Atlas vector store. Add the following code right above the /messages endpoint:
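A sketch mirroring the setup in embed-documents.ts, with the same assumptions about the ATLAS_URI variable and the embedding model:

```typescript
import { MongoClient } from "mongodb";
import { MongoDBAtlasVectorSearch } from "@langchain/mongodb";
import { VertexAIEmbeddings } from "@langchain/google-vertexai";

// Connect to the collection that holds the embedded policy chunks.
const client = new MongoClient(process.env.ATLAS_URI!);
const collection = client.db("chat-rag").collection("context");

// Wrap the existing collection in a Langchain vector store and expose it
// as a retriever that returns the most relevant chunks for a query.
const vectorStore = new MongoDBAtlasVectorSearch(
  new VertexAIEmbeddings({ model: "text-embedding-004" }),
  {
    collection,
    indexName: "vector_index",
    textKey: "text",
    embeddingKey: "embedding",
  }
);
const retriever = vectorStore.asRetriever();
```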
The code is similar to the one you used to create the MongoDB Atlas vector store in the embed-documents.ts script. The only difference is that you're using the asRetriever method to create a retriever wrapper around the vector store. The retriever wrapper provides a convenient interface for performing vector search queries on the indexed fields.
Finally, you will implement the generation component of your RAG system. The generation component will prompt the LLM (Gemini 2.0 Flash) to generate context-aware responses based on the user's question and the retrieved context.
The prompt will be constructed by combining the original user query and the retrieved context chunks. Then, the system will prompt Gemini 2.0 Flash to generate the final response and return it to the user.
Let's change the generation logic we implemented earlier in the /messages endpoint. Right under the let prompt = ... line, add the following code:
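A sketch of that addition; the exact wording of the augmented prompt is an assumption:

```typescript
// If the client enabled the RAG toggle, retrieve relevant context from the
// Atlas vector store and prepend it to the user's question.
if (req.body?.rag === true) {
  const contextDocs = await retriever.invoke(prompt);
  const context = contextDocs.map((doc) => doc.pageContent).join("\n\n");

  prompt = `Answer the question based only on the following context:\n${context}\n\nQuestion: ${prompt}`;
}
```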
This code checks if the rag property is set to true in the request body. If it is, the code retrieves context from the MongoDB Atlas vector store using the invoke method of the retriever. The context is then appended to the prompt message.
The rest of the implementation remains the same.
When constructing a prompt, you need to consider the LLM’s context window—the amount of information, including the question and additional context, that the model can process at once.
An AI model’s context window is measured in tokens, which are the fundamental units used for processing information. Tokens can represent entire words, parts of words, images, videos, audio, or code. The larger the context window, the more data the model can analyze in a single prompt, leading to more consistent, relevant, and useful responses.
Gemini 2.0 Flash has an exceptionally large context window of up to 1 million tokens. This allows it to process vast amounts of information in one go, such as an hour of video, 11 hours of audio, codebases with over 30,000 lines, or more than 700,000 words.
Another consideration is how many chunks to retrieve as context through vector search and include in your prompt. While more chunks can provide extra context, too many may include irrelevant information and lower the response accuracy. There’s no set number—it depends on your data. You can experiment with different context sizes and see how it affects the relevance of the responses.
Start the application again by running:
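As before, the exact command depends on the repository's package.json; typically:

```bash
npm start
```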
Wait for the application to start. Once it's running, click on the web preview button on the top right in Cloud Shell to open the web app again. Select Preview on port 4200.
In the chat assistant interface, switch on the RAG toggle next to the message input. This will instruct the app to perform vector search queries on the MongoDB Atlas vector store and include the retrieved context in the chat prompt requests to Gemini 2.0 Flash.
Type a message such as "Does my car insurance cover mechanical failure?" in the chat box and press Enter to send the message.
You should get a response similar to the following:
Let's try another question such as "What does my insurance policy cover?".
Notice that the answer is more specific, takes into account the chat history, and provides context-aware responses based on the user's questions.
Great job! You have successfully implemented Retrieval Augmented Generation (RAG) with MongoDB Atlas Vector Search to create a context-aware chat assistant.
You learned how to create a chat assistant using Gemini 2.0 Flash, Langchain, Node.js, and Angular. You explored the limitations of out-of-context prompts and managed to overcome them by implementing Retrieval Augmented Generation (RAG) with MongoDB Atlas Vector Search.
Validate your expertise in Retrieval Augmented Generation by earning the official MongoDB Skill Badge!
This digital credential validates your knowledge in building Retrieval-Augmented Generation (RAG) applications with MongoDB. It recognizes your understanding of integrating vector search, optimizing retrieval workflows, and enhancing LLM-powered apps.
To keep learning MongoDB try these labs:
Be sure to check out MongoDB on the Google Cloud Marketplace!
Get free $500 credits for MongoDB on Google Cloud Marketplace - Applicable only for new customers.
Google Cloud training and certification helps you make the most of Google Cloud technologies. Our classes include technical skills and best practices to help you get up to speed quickly and continue your learning journey. We offer fundamental to advanced level training, with on-demand, live, and virtual options to suit your busy schedule. Certifications help you validate and prove your skill and expertise in Google Cloud technologies.
Manual Last Updated: April 07, 2025
Lab Last Tested: April 07, 2025
Copyright 2024 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.