Getting Started with Splunk Cloud GDI on Google Cloud

1 hour 30 minutes 5 Credits

This lab was developed with our partner, Splunk. Your personal information may be shared with Splunk, the lab sponsor, if you have opted in to receive product updates, announcements, and offers in your Account Profile.

Note: This lab requires a partner trial account. Please follow the lab instructions to create your trial account before starting the lab.

GSP947

Overview

In this hands-on lab you'll learn how to configure Google Cloud to send logging and other infrastructure data to Splunk Cloud via Dataflow, the Splunk Add-on for Google Cloud Platform, and Splunk Connect for Kubernetes (SC4K).

Although you can copy and paste commands from the lab, typing them yourself will reinforce your understanding of the core concepts.

Objectives

In this lab, you will:

  • Create a Splunk Cloud trial

  • Install the Splunk Add-on for Google Cloud Platform (GCP-TA)

  • Create Splunk indexes

  • Create Splunk HTTP Event Collectors (HECs)

  • Create log sinks

  • Create Cloud Storage buckets

  • Create Pub/Sub topics and subscriptions

  • Launch a Dataflow template deployment

  • Configure GCP-TA inputs

  • Perform sample Splunk searches across ingested data

  • Monitor and troubleshoot Dataflow pipelines

  • Deploy a demo "Online Boutique" microservice in GKE (optional)

  • Install Splunk Connect for Kubernetes (SC4K) (optional)

Prerequisites

  • Familiarity with Splunk is beneficial

Architecture you'll configure

Architecture diagram

Set up a Splunk trial account

Prior to starting the timer on this lab, please create and configure a Splunk Cloud trial.

  1. Visit splunk.com to sign up for a free trial account.

  2. Once you have logged into splunk.com, click on "Free Splunk."

  3. Then click "Access Free 14-day Trial".

  4. Then click "Start trial".

Credentials for your trial Splunk Cloud environment will be sent to your email. Please check your spam folder if you do not see the email in your inbox.

Once you’ve logged into your Splunk Cloud instance, you will be asked to change the password and accept the terms of service.

Task 1. Install the Splunk Add-on for Google Cloud

Next, install the Splunk Add-on for Google Cloud.

  1. On the left pane click on "+ Find More Apps"

  2. Search for "Google Cloud" in the search box.

  3. To narrow search results, ensure "IT Operations" and "Business Analytics" are checked under Category.

  4. Choose "Add-on" under App Type, "Splunk" under Support Type, "Inputs" under App Content, and "Yes" under Fedramp.

  5. Look for the "Splunk Add-on for Google Cloud Platform" search result under Best Match.

  6. Click on the Install button.

You will be required to enter your credentials. This should be the same account you used to initiate the Splunk Cloud trial.

  1. Ensure you have placed a check in the box indicating you have reviewed the Splunk software terms and conditions.

  2. Click the "Login and Install" button to proceed.

Once you have logged in successfully, you will see a screen indicating that the add-on is downloading and installing.

If you encounter an error with your username or password, ensure that you have verified your email address (a confirmation message is sent to that address).

Finally, you will see a screen indicating the installation is complete.

  1. Click "Done."

You may see a warning in the "Messages" dropdown section of the top navigation bar indicating that Splunk must be restarted. For the purposes of this lab, you can safely ignore this message.

Create indexes

An index is a repository for Splunk data. Splunk Cloud transforms incoming data into events which it then persists to an index.

You will need to create the following event indexes for this lab:

  • gcp_data - For data from Splunk Dataflow template
  • gcp_ta - For data from the Splunk Add-on for Google Cloud
  • gcp_connect - For data from Splunk Connect for Kubernetes (SC4K)

You will need to create the following metric indexes for this lab:

  • gcp_metrics - For metric data from Splunk Connect for Kubernetes (SC4K)

  1. To create an index, start by selecting the "Search and Reporting App" in the top left navigation bar.

  2. Next, click on "Settings."

  3. Once this menu is expanded, click "Indexes."

  4. Click "New Index."

  5. Use an "Index name" of "gcp_data" and leave "Index Data Type" set as "Events."

  6. Set "Max raw data size" to "0" and "Searchable retention (days)" to "15."

  7. Click "Save" to create the index.

  8. Following the same steps you used for gcp_data, create the gcp_ta and gcp_connect indexes. Use the same index data type, max raw data size, and searchable retention.

  9. You will also need to create a metrics index for gcp_metrics. The steps are the same as previous indexes, with the exception of selecting a "Metrics" index rather than an "Events" index during creation.

Task 2. Create HECs

The HTTP Event Collector (HEC) is a fast and efficient way to send data over HTTP (or HTTPS) to Splunk Cloud from a logging source such as Splunk Connect for Kubernetes (SC4K) or the Splunk Dataflow template. In this section, you will create HEC endpoints along with corresponding authentication tokens.

SC4K HEC

  1. Click "Settings > Data Inputs" in the Splunk Cloud top navigation.

  2. Click on "Add New" next to HTTP Event Collector.

  3. Name the HEC "gcp-sc4k" and leave other fields as default.

  4. Click Next.

  5. In the "Selected Allow Indexes" chooser, select the gcp_connect and gcp_metrics indexes created in the previous step.

  6. Select the gcp_connect index as the default index.

  7. Review and submit the HEC configuration.

  8. Copy the token value to a temporary scratch file. You will be using this token later in the lab.

Dataflow HEC

  1. Click "Settings > Data Inputs" in the Splunk Cloud top navigation.

  2. Click on "Add New" next to HTTP Event Collector.

  3. Name the HEC "gcp-dataflow" and leave other fields as default.

  4. Click Next.

  5. In the "Source type" section, click "Select" and specify google:gcp:pubsub:message as the source type.

  6. In the "Selected Allow Indexes" chooser, select the gcp_data index.

  7. Select the gcp_data index as the default index.

  8. Review and submit the HEC configuration.

  9. Copy the token value to a temporary scratch file. You will be using this token later in the lab.
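Before moving on, it can help to confirm that a token is accepted by HEC. The following is a minimal sketch, not part of the lab: hec_event_url and hec_send_test_event are hypothetical helper names, and the sketch assumes HEC listens on port 8088 (the port this lab uses for the HEC URL) at the standard /services/collector/event endpoint. The -k flag skips certificate validation, which mirrors the lab's choice to disable certificate validation for the trial instance.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the lab) for smoke-testing a HEC token.

# Build the HEC event endpoint URL from a Splunk Cloud hostname.
hec_event_url() {
  local host="$1"
  echo "https://${host}:8088/services/collector/event"
}

# Send one test event to the given index.
# With DRY_RUN=1 the curl command is printed instead of executed.
hec_send_test_event() {
  local host="$1" token="$2" index="$3"
  local url payload
  url="$(hec_event_url "${host}")"
  payload="{\"event\": \"hec smoke test\", \"index\": \"${index}\"}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "curl -k -s ${url} -H 'Authorization: Splunk ${token}' -d '${payload}'"
  else
    curl -k -s "${url}" -H "Authorization: Splunk ${token}" -d "${payload}"
  fi
}
```

For example, hec_send_test_event <your-splunk-hostname> <your-dataflow-hec-token> gcp_data sends a single event to the gcp_data index, which you can then look for with a search on index="gcp_data". The index must be one of the indexes allowed for that token.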

Open Cloud Shell

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.

This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
Note: Use an Incognito or private browser window to run this lab. This prevents conflicts between your personal account and the Student account, which could otherwise result in extra charges to your personal account.
  • Time to complete the lab. Remember, once you start, you cannot pause a lab.
Note: If you already have your own personal Google Cloud account or project, do not use it for this lab to avoid extra charges to your account.

How to start your lab and sign in to the Google Cloud Console

  1. Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is the Lab Details panel with the following:

    • The Open Google Console button
    • Time remaining
    • The temporary credentials that you must use for this lab
    • Other information, if needed, to step through this lab
  2. Click Open Google Console. The lab spins up resources, and then opens another tab that shows the Sign in page.

    Tip: Arrange the tabs in separate windows, side-by-side.

    Note: If you see the Choose an account dialog, click Use Another Account.
  3. If necessary, copy the Username from the Lab Details panel and paste it into the Sign in dialog. Click Next.

  4. Copy the Password from the Lab Details panel and paste it into the Welcome dialog. Click Next.

    Important: You must use the credentials from the left panel. Do not use your Google Cloud Skills Boost credentials.

    Note: Using your own Google Cloud account for this lab may incur extra charges.
  5. Click through the subsequent pages:

    • Accept the terms and conditions.
    • Do not add recovery options or two-factor authentication (because this is a temporary account).
    • Do not sign up for free trials.

After a few moments, the Cloud Console opens in this tab.

Note: You can view the menu with a list of Google Cloud Products and Services by clicking the Navigation menu at the top-left.

Activate Cloud Shell

Cloud Shell is a virtual machine that is loaded with development tools. It offers a persistent 5GB home directory and runs on Google Cloud. Cloud Shell provides command-line access to your Google Cloud resources.

  1. Click the Activate Cloud Shell icon at the top of the Google Cloud console.

  2. Click Continue.

It takes a few moments to provision and connect to the environment. When you are connected, you are already authenticated, and the project is set to your PROJECT_ID. The output contains a line that declares the PROJECT_ID for this session:

Your Cloud Platform project in this session is set to YOUR_PROJECT_ID

gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.

  3. (Optional) You can list the active account name with this command:

gcloud auth list

Output:

ACTIVE: *
ACCOUNT: student-01-xxxxxxxxxxxx@qwiklabs.net

To set the active account, run:
    $ gcloud config set account `ACCOUNT`

  4. (Optional) You can list the project ID with this command:

gcloud config list project

Output:

[core]
project = <project_ID>

Example output:

[core]
project = qwiklabs-gcp-44776a13dea667a6

Note: For full documentation of gcloud, in Google Cloud, refer to the gcloud CLI overview guide.

Task 3. Setting environment variables

CLI

This section is only required if you're performing lab steps via the CLI.

  1. Launch Cloud Shell.

  2. Ensure the Project ID variable is set:

echo ${GOOGLE_CLOUD_PROJECT}

If the environment variable is not set, please follow the steps above under "Activate Cloud Shell".

  3. Set the environment variables.

First, assign several Splunk Cloud-specific environment variables. You will need to supply the hostname of your Splunk Cloud instance along with the HEC tokens you created in the previous HTTP Event Collector setup steps:

export SPLUNK_HOSTNAME=<INSERT SPLUNK CLOUD HOSTNAME>
export SC4K_HEC_TOKEN=<INSERT HEC TOKEN>
export DATAFLOW_HEC_TOKEN=<INSERT HEC TOKEN>

For example:

export SPLUNK_HOSTNAME=prd-p-saen7.splunkcloud.com
export SC4K_HEC_TOKEN=bc77efcf-fc60-494f-b80c-52701d7901d4
export DATAFLOW_HEC_TOKEN=bf4bae6f-f9c8-4f5c-b349-3cf77c9baa16

Additionally, please set the following environment variables:

# Common
export SINK_NAME=splunk-dataflow-sink-cli
export SINK_TOPIC=splunk-dataflow-sink
export DISABLE_CERT_VALIDATION=true

# Dataflow
export DEADLETTER_TOPIC=splunk-dataflow-deadletter
export DATAFLOW_SUB=dataflow-sub
export DEADLETTER_SUB=deadletter-sub
export MAX_WORKERS=4
export MACHINE=n1-standard-1
export HEC_URL=https://${SPLUNK_HOSTNAME}:8088
export BATCH_COUNT=10
export PARALLELISM=4
export DATAFLOW_FORMAT_LIKE_PUBSUB=true

# GCP-TA
export SPLUNK_SERVICE_ACCOUNT=splunk-ta
export TA_SUBSCRIPTION=ta-subscription
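A missing export tends to surface only later, as a confusing gcloud error. The sketch below checks the variables up front. It is not part of the lab: check_lab_env is a hypothetical helper name, and the list of names is a subset of the variables exported above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the lab): report which required
# variables are unset or empty. Prints one "missing: NAME" line per
# missing variable and returns non-zero if any are missing.
check_lab_env() {
  local name missing=0
  for name in GOOGLE_CLOUD_PROJECT SPLUNK_HOSTNAME SC4K_HEC_TOKEN \
              DATAFLOW_HEC_TOKEN SINK_NAME SINK_TOPIC HEC_URL; do
    # ${!name:-} is bash indirect expansion: the value of the variable
    # whose name is stored in "name", or empty if it is unset.
    if [ -z "${!name:-}" ]; then
      echo "missing: ${name}"
      missing=1
    fi
  done
  return "${missing}"
}
```

Running check_lab_env after the exports prints nothing when everything is set, so it can be used as a guard such as check_lab_env || echo "fix the variables above first".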

Task 4. Creating a log sink

The first step to getting data from Operations Logging (Stackdriver) to Splunk is to create a log sink. All logging data for Google Cloud is sent to Operations Logging; the sink exports that data in real time to another location (Pub/Sub, BigQuery, or Cloud Storage). You will forward the logs to Pub/Sub for processing.

Take care not to create an infinite logging loop. Without exclusions on the log sink, forwarding an event generates a new log event, which the sink then tries to forward, which generates yet another log event, and so on.

This process also creates the destination Pub/Sub topic (automatically in the UI, manually via the CLI).

Cloud Console

  1. In the Cloud Console go to Analytics > Pub/Sub > Topics.

  2. Click on Create Topic.

  3. Type splunk-dataflow-sink for the Topic ID, leave the default values, and click Create Topic.

  4. Next, in the Cloud Console go to Operations > Logging > Logs Router.

  5. Click on Create Sink.

  6. Name the sink splunk-dataflow-sink and click Next.

Sink details page displaying the populated Sink name and Sink description fields.

  7. Specify Splunk as the sink service and select the topic that you created above, then click Next.

Sink destination page displaying the populated Select sink service and Select Cloud Pub/Sub topic fields.

  8. Leave the inclusion filter blank in order to send all logs to Splunk unless excluded, then click Next.

Build inclusion filter field on the Choose logs to include in sink page.

  9. Click Add Exclusion to specify an exclusion filter that omits Dataflow logs (for example resource.type="dataflow_step", the inverse of the filter used in the CLI steps).

Choose logs to filter out of sink (optional) page displaying the populated Exclusion filter name field.

  10. Once complete, click on Create Sink.

CLI

Note: You don't need to do this section if you've already performed the previous steps via the Console UI.
  1. In Cloud Shell (with the environment variables set), create the Pub/Sub topic:

gcloud pubsub topics create ${SINK_TOPIC}

  2. Create the log sink:

gcloud logging sinks create ${SINK_NAME} \
  pubsub.googleapis.com/projects/${GOOGLE_CLOUD_PROJECT}/topics/${SINK_TOPIC} \
  --log-filter="resource.type!=\"dataflow_step\""

  3. Set the environment variable for the correct service account:

export SERVICE_ACCOUNT=`gcloud logging sinks describe ${SINK_NAME} --format="value(writerIdentity)"`

  4. Modify the IAM permissions of the Pub/Sub topic to allow the log sink to publish:

gcloud pubsub topics add-iam-policy-binding ${SINK_TOPIC} \
  --member="${SERVICE_ACCOUNT}" --role="roles/pubsub.publisher"

Click Check my progress to verify the objective. Creating a log sink
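If the progress check fails, two read-only gcloud commands usually show why: one prints the filter stored on the sink, the other prints who is allowed to publish to the topic. The sketch below wraps them in hypothetical helpers (run_cmd and verify_log_sink are illustrative names, not part of the lab). Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the lab) for inspecting the log sink.

# Run a command, or just print it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

verify_log_sink() {
  local sink="$1" topic="$2"
  # Print the filter stored on the sink; for the CLI path in this lab it
  # should exclude the dataflow_step resource type.
  run_cmd gcloud logging sinks describe "${sink}" "--format=value(filter)"
  # Print the topic's IAM policy; the sink's writer identity should hold
  # the roles/pubsub.publisher role.
  run_cmd gcloud pubsub topics get-iam-policy "${topic}"
}
```

With the lab's variables set, verify_log_sink "${SINK_NAME}" "${SINK_TOPIC}" runs both checks.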

Task 5. Deploying the Pub/Sub to Splunk Dataflow Template

The next part of getting the logging sent to Splunk is the deployment of the Pub/Sub to Splunk Dataflow Template. This deploys a Dataflow pipeline that streams the events from a Pub/Sub subscription, batches them up, and sends them to Splunk HEC. Optionally (although not done in the lab), you can add an inline user-defined function (UDF) that manipulates the log messages, for example to remove sensitive information or to augment the message with additional data from another source.

The Pub/Sub to Splunk Dataflow template is just one way of sending data to Splunk. In a later section you will also explore using the Splunk Add-on for Google Cloud.

Enable Dataflow Service Account

  1. Go to Navigation menu > IAM & Admin > IAM.

  2. Click the pencil icon on the Compute Engine default service account (its name ends in compute@developer.gserviceaccount.com).

  3. Select the Dataflow Admin role (in the Role dropdown menu) and click Save.

Cloud Console

Now you will create a bucket for the Dataflow template used during deployment.

  1. Go to Navigation menu > Cloud Storage > Browser.

  2. Click on Create Bucket.

  • Give your bucket a globally unique name (<project-id>-dataflow would be unique).
  • Select Region for Location type and choose us-central1 for the Location.
  • Click on Create (leaving the rest of the options as default).

Now you will create a Pub/Sub Topic for the dead letter queue.

Note: The dead letter queue is used to store events that are not processed successfully by the Dataflow pipeline. This will allow us to reprocess these events at a later time and will also be useful for troubleshooting the pipeline if any issues are encountered.
  1. Go to Navigation Menu > Analytics > Pub/Sub > Topics.

You'll see the one topic created for the logs router.

  1. Click Create Topic to create another.

  • Type splunk-dataflow-deadletter for Topic ID.
  • Leave the default values and click Create Topic.

You'll be forwarded to the topic page for the dead letter queue.

  • Scroll down to subscriptions.

  • Click on Create Subscription > Create Subscription to create a subscription to store items forwarded to the dead letter queue. If a topic doesn't have a subscription anything sent to it is discarded.

  • Type deadletter in the Subscription ID field and leave all else as default.

  • Click Create.

    Note: The "enable dead lettering" option on the subscription creation page is used when the subscription (usually a push) fails to send the message to its target.

    This is not to be confused with the dead letter Pub/Sub topic you created for Dataflow, which is used in the scenario where the Dataflow template is unable to send the data to Splunk HEC.

    Therefore the "enable dead lettering" option here can be ignored. The Dataflow pipeline to Splunk has built-in capability to handle message retries.
  1. Create the Pub/Sub subscription for Dataflow to process logs. The log sink step created the topic that receives the logs, but you still have to create a subscription to that topic so that its messages are delivered to Dataflow.

  • Back under Topics, click on the initial Pub/Sub topic (splunk-dataflow-sink) that you created for the Logs Router.

  • Scroll down on the topic page.

  • Click on Create Subscription as you did for the dead letter topic.

  • Type dataflow for Subscription ID and create the subscription for Dataflow (leave all other defaults).

  • Click Create.

  • Click on the newly created subscription and note down the subscription name, as you will need it in a later step (the format should be projects/<your-project-id>/subscriptions/dataflow).

  1. Next, deploy the Dataflow Template:
  • Go to Navigation menu > Analytics > Dataflow > Jobs.
  • Click on Create Job from Template.
  • Type splunk-dataflow in the Job name field and select the Pub/Sub to Splunk template in the Dataflow template dropdown menu.
  • Enter the main required parameters:

    • Input Cloud Pub/Sub subscription: projects/<your-project-id>/subscriptions/dataflow
    • HEC URL: https://<splunk-cloud-host-name>:8088
    • Output deadletter Pub/Sub topic: projects/<your-project-id>/topics/splunk-dataflow-deadletter
    • Temporary location: <your-project-id>-dataflow/tmp

  • Click on Show Optional Parameters.

  • Set the parameters as shown:

    • HEC Authentication token: <your-dataflow-hec-token>
    • Batch size: 10
    • Maximum number of parallel requests: 4
    • Disable SSL certification validation: true
    • Include full Pub/Sub message in the payload: true
    • Max workers: 2

  1. Click on Run Job.

The job should take a few minutes to deploy. Once deployed, you can monitor the job throughput and other metrics using the Job Graph and Job Metrics tabs. For example, you can monitor the throughput metric to track the number of events processed over time.

Note: If you get a failure for insufficient quota, ensure that you set the max-workers and machine-type to the values above. The default values are too large for the smaller lab environment.

CLI

Note: You don't need to do this section if you've already performed the previous steps via the Console UI.
  1. Enable the Dataflow API:

gcloud services enable dataflow.googleapis.com

  2. Create a bucket for the Dataflow template to use during deployment:

gsutil mb -l us-central1 gs://${GOOGLE_CLOUD_PROJECT}-dataflow

  3. Create a Pub/Sub topic for the dead letter queue and a subscription:

gcloud pubsub topics create ${DEADLETTER_TOPIC}
gcloud pubsub subscriptions create ${DEADLETTER_SUB} \
  --topic ${DEADLETTER_TOPIC}

  4. Create the Pub/Sub subscription for Dataflow to process logs:

gcloud pubsub subscriptions create ${DATAFLOW_SUB} \
  --topic ${SINK_TOPIC}

  5. Deploy the Dataflow Template:

gcloud dataflow jobs run splunk-dataflow-`date +%s` \
  --region us-central1 \
  --gcs-location=gs://dataflow-templates/latest/Cloud_PubSub_to_Splunk \
  --staging-location=gs://${GOOGLE_CLOUD_PROJECT}-dataflow/tmp \
  --max-workers=${MAX_WORKERS} \
  --worker-machine-type=${MACHINE} \
  --parameters="\
inputSubscription=projects/${GOOGLE_CLOUD_PROJECT}/subscriptions/${DATAFLOW_SUB},\
token=${DATAFLOW_HEC_TOKEN},\
url=${HEC_URL},\
outputDeadletterTopic=projects/${GOOGLE_CLOUD_PROJECT}/topics/${DEADLETTER_TOPIC},\
batchCount=${BATCH_COUNT},\
parallelism=${PARALLELISM},\
includePubsubMessage=${DATAFLOW_FORMAT_LIKE_PUBSUB},\
disableCertificateValidation=${DISABLE_CERT_VALIDATION}"

Click Check my progress to verify the objective. Deploying the Pub/Sub to Splunk Dataflow Template

Task 6. Configure the Splunk TA for Google Cloud

Create a Service Account for Splunk TA

In order to connect the Splunk Add-on for Google Cloud (TA) to Google Cloud to pull data, create a service account with appropriate permissions. In this lab you are providing the service account an exhaustive list of permissions as you are connecting all input methods. If you're using the TA only for a few of the inputs, a reduced list of permissions can be set.

Cloud Console

  1. Go to Navigation menu > IAM & Admin > Service Accounts.
  2. Click on Create Service Account.
  3. Type splunk-ta in the Service account name field.

  4. Click Create & Continue.

  5. Add the permissions needed (you can search to make it easier to find them):

  • Compute Admin

  • Logs Configuration Writer

  • Logs Viewer

  • Monitoring Viewer

  • Storage Admin

  • Storage Object Viewer

  • Viewer

  • Pub/Sub Viewer

  6. Click Continue.

  7. Click Done.

  8. Click on your newly provisioned service account.

  9. Click the Keys tab on the top bar.

  10. Click Add Key > Create new key.

  11. On the Create private key for "splunk-ta" page, leave the default JSON key type selected and click Create. This will download the JSON key file to your system. Store it for later.

CLI

Note: You don't need to do this section if you've already performed the previous steps via the Console UI.
  1. Create the service account:

gcloud iam service-accounts create ${SPLUNK_SERVICE_ACCOUNT} --description "Splunk account for TA"

  2. Give the correct permissions:

gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
  --member=serviceAccount:${SPLUNK_SERVICE_ACCOUNT}@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com --role=roles/compute.admin
gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
  --member=serviceAccount:${SPLUNK_SERVICE_ACCOUNT}@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com --role=roles/storage.admin
gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
  --member=serviceAccount:${SPLUNK_SERVICE_ACCOUNT}@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com --role=roles/storage.objectViewer
gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
  --member=serviceAccount:${SPLUNK_SERVICE_ACCOUNT}@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com --role=roles/logging.configWriter
gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
  --member=serviceAccount:${SPLUNK_SERVICE_ACCOUNT}@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com --role=roles/monitoring.viewer
gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
  --member=serviceAccount:${SPLUNK_SERVICE_ACCOUNT}@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com --role=roles/logging.viewer
gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
  --member=serviceAccount:${SPLUNK_SERVICE_ACCOUNT}@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com --role=roles/viewer
gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
  --member=serviceAccount:${SPLUNK_SERVICE_ACCOUNT}@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com --role=roles/pubsub.viewer

  3. Create and download the service key JSON:

gcloud iam service-accounts keys create \
  ${SPLUNK_SERVICE_ACCOUNT}.json \
  --iam-account=${SPLUNK_SERVICE_ACCOUNT}@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com
cat ${SPLUNK_SERVICE_ACCOUNT}.json

Click Check my progress to verify the objective. Create a Service Account for Splunk TA
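The eight role bindings in the CLI steps differ only in the role name, so they can also be written as a loop. This is a sketch of that idea rather than the lab's official method: grant_ta_roles is a hypothetical helper name, and the role list is copied from the CLI commands in this task. With DRY_RUN=1 the commands are printed instead of executed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the lab): grant the lab's eight roles
# to the TA service account, one add-iam-policy-binding call per role.
grant_ta_roles() {
  local project="$1" account="$2"
  local member="serviceAccount:${account}@${project}.iam.gserviceaccount.com"
  local role
  for role in compute.admin storage.admin storage.objectViewer \
              logging.configWriter monitoring.viewer logging.viewer \
              viewer pubsub.viewer; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Dry run: show the command that would be executed.
      echo "gcloud projects add-iam-policy-binding ${project} --member=${member} --role=roles/${role}"
    else
      gcloud projects add-iam-policy-binding "${project}" \
        --member="${member}" --role="roles/${role}"
    fi
  done
}
```

With the lab's variables set, grant_ta_roles "${GOOGLE_CLOUD_PROJECT}" "${SPLUNK_SERVICE_ACCOUNT}" has the same effect as the eight separate commands. Keeping the roles in one list also makes it easier to trim them if you only use a few of the TA inputs.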

Add JSON credential to GCP-TA

  1. Open the Splunk Cloud console in a new browser tab.
  2. Under Apps, select Splunk Add-on for Google Cloud Platform.
  3. Click on Configuration.
  4. Under the Google Credentials tab, click on Add.
  5. On the Add Google Credentials page, type gcp_creds in the Name field. In the Google Service Account Credentials field, paste the contents of the JSON key file that was downloaded when you created the service account.
  6. Then click Add.

If you encounter an invalid JSON error, verify that you haven't pasted any additional line breaks into the private key. Copying the text from a terminal can introduce extra lines. The private_key value should be a single line that wraps in the UI.
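One way to rule out a damaged key before pasting it is to parse the file locally. The sketch below is an illustration, not part of the lab: check_key_json is a hypothetical helper name, and it assumes python3 is available on the PATH (as it is in Cloud Shell). It relies on Python's standard json.tool module, which rejects a raw line break inside a string value.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the lab): check that a file parses as JSON.
# Assumes python3 is installed; json.tool is part of its standard library.
check_key_json() {
  local file="$1"
  if python3 -m json.tool "${file}" > /dev/null 2>&1; then
    echo "ok"
  else
    echo "invalid"
    return 1
  fi
}
```

For the key created via the CLI, check_key_json "${SPLUNK_SERVICE_ACCOUNT}.json" should print ok. If you saved text copied from a terminal into a file and it prints invalid, copy the contents of the original downloaded key file instead.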

Explore GCP-TA Inputs

  1. Click on Inputs.

  2. Expand the Create New Input dropdown menu.

You should see the different options, including Cloud Pub/Sub, Cloud Monitoring, Google Cloud BigQuery Billing, Cloud Storage Bucket, and Resource Metadata.

The table below details their purposes.

Input: Purpose

  • Cloud Pub/Sub: Logging events and other Pub/Sub generated events.
  • Cloud Monitoring: Metrics such as CPU and disk usage of instances.
  • Google Cloud BigQuery Billing: Pulls billing information from a Cloud Storage bucket. Note: this does not currently work for everyone due to the decommissioning of File Export for Billing (only billing accounts with a previous CSV configuration will work).
  • Cloud Storage Bucket: Pulls data from a Cloud Storage bucket, such as application logs, but could be any CSV, JSON, or raw text.
  • Resource Metadata: Information on the resources in an organization/project.

The Splunk Add-on for Google Cloud supports ingesting a variety of data sources from Google Cloud. You've already set up Dataflow to send Google Cloud logs to Splunk, so next you'll configure the Splunk TA to pull:

  • Resource Metadata

  • Cloud Monitoring

Resource Metadata

The Resource Metadata input can be configured to pull metadata from various Compute Engine resources and enable Splunk users to monitor and set up analytics for their Compute Engine deployments.

  1. Click Create New Input > Resource Metadata.

  2. Configure a Resource Metadata input with the following (the project ID will be specific to your lab):

  • Name: resources
  • Credentials: select the gcp_creds credentials that you created earlier
  • Project: <your-project-id>
  • Zones: us-central1-a, b, c, f
  • APIs: leave all checked
  • Index: gcp_ta
  • Sourcetype: keep default

Your configuration should resemble the following:

Add Resource Metadata Input page displaying the input taken from the table

  1. Click Add.

Cloud Monitoring

Cloud Monitoring collects metrics from a wide range of services on Google Cloud, as well as a variety of third-party software. A complete list of all predefined metrics can be found in the Metrics list reference. If you need something that isn't already defined, you can create your own custom metrics.

  1. Click Create New Input > Cloud Monitoring.

  2. Configure Cloud Monitoring Input with the following parameters:

  • Name: monitoring
  • Credentials: select the gcp_creds credentials that you created earlier
  • Project: <your-project-id>
  • Cloud Monitor Metrics: compute.googleapis.com/instance/cpu/usage_time
  • Interval: keep default
  • Start Date Time: keep default
  • Index: gcp_ta

Your configuration should resemble the following:

Add Cloud Monitoring Input page displaying the input taken from the table

  1. Click Add.

Bonus - Pub/Sub Input

This section requires usage of the CLI. Ensure the environment variables from Task 3 are present in your shell.

You can also ingest logs via the Pub/Sub input using the TA.

Note: See "Comparison of methods" in Task 7 for a detailed comparison of ingestion methods.
  1. To do this, run the following command in Cloud Shell to create a second subscription to the original Pub/Sub topic previously created. In this case, it's called ta-subscription (the value of TA_SUBSCRIPTION):

gcloud pubsub subscriptions create ${TA_SUBSCRIPTION} \
  --topic ${SINK_TOPIC}

  2. Then give the TA service account explicit subscriber access to the subscription you created:

gcloud pubsub subscriptions add-iam-policy-binding ${TA_SUBSCRIPTION} \
  --member=serviceAccount:${SPLUNK_SERVICE_ACCOUNT}@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com --role=roles/pubsub.subscriber

Click Check my progress to verify the objective. Bonus - Pub/Sub Input

  1. Finally, navigate back to the Splunk Cloud console.

  2. Click Create New Input > Cloud Pub/Sub.

  3. Fill out the configuration as follows so that the Cloud Pub/Sub input points to the subscription you created:

  • Name: logging
  • Credentials: select the gcp_creds credentials that you created earlier
  • Project: <your-project-id>
  • Pub/Sub Subscriptions: ta-subscription
  • Index: gcp_ta

Your configuration should resemble the following:

Add Cloud Pub/Sub Input page displaying the input taken from the table

Task 7. Sample searches

Now that you have various types of Google Cloud data hooked up to Splunk, take a look at some common Splunk searches you can use to get value out of this data.

  • First, from the top navigation bar navigate to Apps > Search & Reporting.
Note: Search queries in Splunk are composed using Search Processing Language, more commonly referred to as SPL. SPL is a very powerful feature. The details aren't covered in this lab, but for a quick primer on SPL, refer to the Search command primer documentation.

Who is exporting JSON credential keys?

  • Copy/paste the following in the search box:

index="gcp_data" data.resource.type="service_account" data.protoPayload.methodName="google.iam.admin.v1.CreateServiceAccountKey"
| rename data.protoPayload.authenticationInfo.principalEmail as "Principal Email"
| rename data.protoPayload.requestMetadata.callerIp as "Source IP"
| rename data.protoPayload.requestMetadata.callerSuppliedUserAgent as "User Agent"
| rename data.protoPayload.response.name as "Key Name"
| rename data.protoPayload.response.valid_after_time.seconds as "Valid After"
| rename data.protoPayload.response.valid_before_time.seconds as "Valid Before"
| eval "Valid After"=strftime('Valid After', "%F %T")
| eval "Valid Before"=strftime('Valid Before', "%F %T")
| eval "Private Key Type" = case('data.protoPayload.request.private_key_type' == 0, "Unspecified", 'data.protoPayload.request.private_key_type' == 1, "PKCS12", 'data.protoPayload.request.private_key_type' == 2, "Google JSON credential file")
| table _time, "Principal Email", "Source IP", "User Agent", "Key Name", "Private Key Type", "Valid After", "Valid Before"

What service accounts have been created and by whom?

This SPL will generate a table of those events.

  • Copy/paste the following in the search box:

index="gcp_data" data.resource.type="service_account" data.protoPayload.methodName="google.iam.admin.v1.CreateServiceAccount"
| rename data.protoPayload.authenticationInfo.principalEmail as "Principal Email"
| rename data.protoPayload.requestMetadata.callerIp as "Source IP"
| rename data.protoPayload.requestMetadata.callerSuppliedUserAgent as "User Agent"
| rename data.protoPayload.response.email as "Service Account Email"
| rename data.protoPayload.response.project_id as Project
| table _time, "Principal Email", "Source IP", "User Agent", Project, "Service Account Email"

Display instances in the project

  • Copy/paste the following in the search box:

index="gcp_ta" sourcetype="google:gcp:resource:metadata" | search(kind="compute#instance")

Task 8. Monitor and troubleshoot pipelines

While Splunk can be used for most troubleshooting scenarios, there is one situation where that cannot be done: when Dataflow has failed to send data to Splunk. To troubleshoot and monitor in these situations you'll have to rely on the built-in monitoring and logging in the Google Cloud console.

Monitoring Dataflow via Operations Monitor Dashboard

This section requires usage of the CLI. Ensure the environment variables provided in the Setup section of this lab are present in your shell.

  1. Navigate to Navigation menu > Monitoring > Dashboards in the Cloud Console.

The first time the Monitoring console is opened, it may prompt you to either create a new workspace (selected by default) or add the project to an existing workspace.

  2. Accept the default selection and click Add.

  3. Once the workspace has been prepared, open Cloud Shell and run the following to deploy a custom dashboard. You're free to edit and tweak the dashboard:

gsutil cp \
  gs://${GOOGLE_CLOUD_PROJECT}-dashboard/SplunkExportDashboard.json .
gcloud alpha monitoring dashboards create \
  --config-from-file=SplunkExportDashboard.json \
  --project=${GOOGLE_CLOUD_PROJECT}

  4. Go back to Dashboards. You should see a dashboard in the list called Splunk Dataflow Export Monitor. If you don't, refresh the webpage.

  5. After some time you should see the metrics for the Dataflow and Pub/Sub jobs.

Note: Some graphs may display an error or no data if the dashboard is opened too early, or if the pipeline has not yet produced any errors to chart.

Click Check my progress to verify the objective.

Finding errors in Dataflow jobs

  1. To see errors in Dataflow you can open up the Dataflow job by going to Navigation menu > Analytics > Dataflow > Jobs and clicking on the running job.

  2. At the bottom of the page you can click on either Job Logs or Worker Logs to see logging relating to either the deployment and running of the stream itself (Job Logs) or logging relating to the function of the workers individually (Worker Logs).

Job Logs tabbed page displaying several log messages
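The same error logs can also be pulled from Cloud Shell. The following is a minimal sketch, not part of the original lab: it assumes a hypothetical DATAFLOW_JOB_NAME variable holding the name of your running job, and it builds a Cloud Logging filter for the dataflow_step resource type before passing it to gcloud logging read. The helper function name is illustrative only.

```shell
# Build a Cloud Logging filter that matches ERROR (and higher) entries
# for one Dataflow job. The job name is supplied by the caller.
build_dataflow_error_filter() {
  printf 'resource.type="dataflow_step" AND resource.labels.job_name="%s" AND severity>=ERROR' "$1"
}

# DATAFLOW_JOB_NAME is a hypothetical variable; set it to your job's name.
# The query only runs when gcloud is installed and the variable is set.
if command -v gcloud >/dev/null 2>&1 && [ -n "${DATAFLOW_JOB_NAME:-}" ]; then
  gcloud logging read "$(build_dataflow_error_filter "${DATAFLOW_JOB_NAME}")" \
    --project="${GOOGLE_CLOUD_PROJECT}" \
    --freshness=1h \
    --limit=20
fi
```

Adjust --freshness and --limit to widen or narrow the window; the values here are arbitrary starting points.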

Task 9. Comparison of methods

In this lab you've seen two different ways to ingest logging data into Splunk.

The following table compares each method:

Method: Splunk Add-on for Google Cloud (TA)

Pro:

  • No additional infrastructure needed in Google Cloud
  • Google Cloud cost minimized
  • Supported by Splunk
  • Supports all data types (assets, logging, metrics, etc.)

Con:

  • TA infrastructure must be scaled to handle ingestion volume
  • Data is not pushed from the sources but rather periodically pulled

Method: Dataflow

Pro:

  • Supported by Google
  • Supports batching of messages to ease impact on HEC
  • Exponential backoff support to ease load on HEC
  • Data is pushed (fresher data in Splunk)

Con:

  • Only supports logging and asset data (today)
  • Operational management of Dataflow

Task 10. (Optional) Online Boutique Demo

This section requires usage of the CLI. Ensure the environment variables provided in the Setup section of this lab are present in your shell.
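A quick way to confirm the variables are present is a small check like the one below. This is a sketch rather than part of the lab: the require_env helper is hypothetical, and the variable names are the ones referenced by the commands later in this task.

```shell
# Report every named variable that is unset or empty.
# Returns 0 when all are set and 1 otherwise.
require_env() {
  local missing=0 name value
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${${name}:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}

require_env SPLUNK_HOSTNAME SC4K_HEC_TOKEN DISABLE_CERT_VALIDATION \
  || echo "Set the variables listed above before continuing." >&2
```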

The Online Boutique Demo deploys numerous microservices and a simulated workload that will generate realistic log entries that you'll be able to inspect in Splunk. Refer to the GitHub repo for full details as to what is getting deployed.

Build GKE cluster

  1. Enable required services:

gcloud services enable container.googleapis.com
gcloud services enable containerregistry.googleapis.com
gcloud services enable cloudbuild.googleapis.com

  2. Create a GKE cluster and verify node creation. This process can take about 2 minutes:

gcloud container clusters create demo \
  --enable-autoupgrade --enable-autoscaling \
  --min-nodes=2 --max-nodes=4 --num-nodes=3 \
  --machine-type=n1-standard-4 --zone=us-central1-a
kubectl get nodes

  3. Configure gcloud for docker auth:

gcloud auth configure-docker -q

Install boutique

  1. Clone the Online Boutique shop demo repository:

git clone https://github.com/GoogleCloudPlatform/microservices-demo
cd microservices-demo

  2. Deploy using pre-built container images:

kubectl apply -f ./release/kubernetes-manifests.yaml

  3. Get the external IP of the Online Boutique once deployed:

kubectl get service/frontend-external

The deployment can take a few minutes to fully spin up and show the external IP.
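Rather than re-running the command by hand, you could poll until the IP appears. The sketch below is an illustration and not part of the lab: wait_for_value is a hypothetical helper that retries any command until it prints a non-empty value, and the attempt count and delay in the usage comment are arbitrary.

```shell
# Retry a command until it prints a non-empty value.
# Arguments: <max attempts> <delay in seconds> <command...>
wait_for_value() {
  local attempts=$1 delay=$2 i=0 value
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    value=$("$@" 2>/dev/null)
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example usage for this lab (up to 30 attempts, 10 seconds apart):
# wait_for_value 30 10 kubectl get service frontend-external \
#   -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```

The jsonpath expression reads the load balancer IP from the service status, which stays empty until the external IP has been assigned.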

Install Splunk Connect for Kubernetes (SC4K)

In the next few steps you will configure Splunk Connect for Kubernetes to send data to HEC. While much of this logging information is available via the standard Google Cloud logging, SC4K can provide deeper insight and allow Kubernetes visibility outside of Google Cloud.

  1. First, create a namespace for the SC4K pods:

kubectl create namespace splunk

  2. Add the SC4K repo to Helm:

helm repo add splunk https://splunk.github.io/splunk-connect-for-kubernetes/

  3. Create a YAML file for the SC4K configuration:

cat << EOF > values.yaml
global:
  splunk:
    hec:
      host: ${SPLUNK_HOSTNAME}
      port: 8088
      token: ${SC4K_HEC_TOKEN}
      protocol: https
      indexName: gcp_connect
      insecureSSL: ${DISABLE_CERT_VALIDATION}
  kubernetes:
    clusterName: "demo"
  prometheus_enabled: true
splunk-kubernetes-logging:
  containers:
    logFormatType: cri
    logFormat: "%Y-%m-%dT%H:%M:%S.%NZ"
splunk-kubernetes-metrics:
  splunk:
    hec:
      indexName: gcp_metrics
EOF

  4. Install Splunk Connect for Kubernetes via the Helm Chart:

helm install splunk-connect \
  --namespace splunk \
  -f values.yaml \
  splunk/splunk-connect-for-kubernetes
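If no data shows up in Splunk, it can help to confirm that the HEC endpoint accepts the token independently of SC4K. The following is a hedged sketch, not a lab step: it posts one test event to the standard HEC event endpoint using the same host, port, token, and index as values.yaml. The hec_event_url helper and the event text are illustrative, and certificate validation is skipped only when DISABLE_CERT_VALIDATION is set to true.

```shell
# Build the HEC event endpoint URL for a given Splunk host (port 8088,
# matching the port used in values.yaml).
hec_event_url() {
  printf 'https://%s:8088/services/collector/event' "$1"
}

# Mirror the insecureSSL setting: skip certificate checks only if requested.
insecure_flag=""
if [ "${DISABLE_CERT_VALIDATION:-false}" = "true" ]; then
  insecure_flag="--insecure"
fi

# Send a single test event; runs only when curl and both variables exist.
if command -v curl >/dev/null 2>&1 \
  && [ -n "${SPLUNK_HOSTNAME:-}" ] && [ -n "${SC4K_HEC_TOKEN:-}" ]; then
  curl --silent --show-error $insecure_flag \
    "$(hec_event_url "${SPLUNK_HOSTNAME}")" \
    -H "Authorization: Splunk ${SC4K_HEC_TOKEN}" \
    -d '{"event": "SC4K connectivity test", "index": "gcp_connect"}'
fi
```

A JSON response reporting success suggests the token and index are usable; an authorization or index error points at the HEC configuration rather than at the Kubernetes side.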

Explore boutique logs

Data should now be streaming into Splunk. Here are a couple of sample searches:

  1. Average response time by request path:

index="gcp_connect" http.req.method="GET" earliest=-15m | search http.resp.status=200 | timechart avg(http.resp.took_ms) by http.req.path

  2. See the type of metrics that are being reported:

| mcatalog values(_dims) WHERE "index"="gcp_metrics" GROUPBY metric_name index | rename values(_dims) AS dimensions | table metric_name dimensions

Congratulations!

In this lab you installed the Splunk Add-on for Google Cloud Platform and created Splunk indexes, HTTP Event Collectors (HECs), log sinks, Cloud Storage buckets, and Pub/Sub topics and subscriptions. You then launched a Dataflow template deployment, configured GCP-TA inputs, performed sample Splunk searches across ingested data, and monitored and troubleshot Dataflow pipelines.

Next steps / learn more

Check out the following for more information on Splunk with Google Cloud:

Manual Last Updated September 29, 2022

Lab Last Tested September 29, 2022

Creator Content available herein, is owned by Splunk Inc. and is provided "AS IS" without warranty of any kind.

Splunk, Splunk>, Turn Data Into Doing, Data-to-Everything and D2E are trademarks or registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2021 Splunk Inc. All rights reserved.

Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.