
Google Professional-Cloud-DevOps-Engineer - Google Cloud Certified - Professional Cloud DevOps Engineer Exam

You are part of an organization that follows SRE practices and principles. You are taking over the management of a new service from the Development Team, and you conduct a Production Readiness Review (PRR). After the PRR analysis phase, you determine that the service cannot currently meet its Service Level Objectives (SLOs). You want to ensure that the service can meet its SLOs in production. What should you do next?

A.

Adjust the SLO targets to be achievable by the service so you can bring it into production.

B.

Notify the development team that they will have to provide production support for the service.

C.

Identify recommended reliability improvements to the service to be completed before handover.

D.

Bring the service into production with no SLOs and build them when you have collected operational data.
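
For context on what "meet its SLOs" means quantitatively: an SLO implies a concrete error budget. A minimal sketch of the arithmetic, assuming an illustrative 99.9% availability target over a 30-day window (neither value appears in the question):

```python
# Error-budget arithmetic for a hypothetical 99.9% availability SLO.
SLO_TARGET = 0.999   # assumed availability target
WINDOW_DAYS = 30     # assumed rolling window

window_minutes = WINDOW_DAYS * 24 * 60
error_budget_minutes = (1 - SLO_TARGET) * window_minutes

print(f"Allowed downtime per {WINDOW_DAYS} days: {error_budget_minutes:.1f} minutes")
# -> Allowed downtime per 30 days: 43.2 minutes
```

A service that a PRR shows cannot stay inside such a budget will exhaust it continuously once in production.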

You are responsible for the reliability of a high-volume enterprise application. A large number of users report that an important subset of the application’s functionality – a data-intensive reporting feature – is consistently failing with an HTTP 500 error. When you investigate your application’s dashboards, you notice a strong correlation between the failures and a metric that represents the size of an internal queue used for generating reports. You trace the failures to a reporting backend that is experiencing high I/O wait times. You quickly fix the issue by resizing the backend’s persistent disk (PD). Now you need to create an availability Service Level Indicator (SLI) for the report generation feature. How would you define it?

A.

As the I/O wait times aggregated across all report generation backends

B.

As the proportion of report generation requests that result in a successful response

C.

As the application’s report generation queue size compared to a known-good threshold

D.

As the reporting backend PD throughput capacity compared to a known-good threshold
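
For context on the "proportion of successful requests" formulation in option B: Google's SRE guidance defines an availability SLI as the ratio of good events to total events. A minimal sketch of that computation; the request counts below are made up for illustration:

```python
# Availability SLI = proportion of report-generation requests that succeed.
# The counts below are hypothetical; in practice they would come from your
# monitoring system (e.g., counters keyed on HTTP response codes).
successful_requests = 99_412   # non-5xx responses to the report endpoint
total_requests = 100_000       # all requests to the report endpoint

availability_sli = successful_requests / total_requests
print(f"Availability SLI: {availability_sli:.4%}")  # -> Availability SLI: 99.4120%
```

Metrics such as queue size or I/O wait time are useful diagnostics, but they measure causes rather than the user-visible outcome an SLI is meant to capture.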

Your application images are built and pushed to Google Container Registry (GCR). You want to build an automated pipeline that deploys the application when the image is updated while minimizing the development effort. What should you do?

A.

Use Cloud Build to trigger a Spinnaker pipeline.

B.

Use Cloud Pub/Sub to trigger a Spinnaker pipeline.

C.

Use a custom builder in Cloud Build to trigger a Jenkins pipeline.

D.

Use Cloud Pub/Sub to trigger a custom deployment service running in Google Kubernetes Engine (GKE).
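
For context on the Pub/Sub options: Container Registry publishes a message to a Pub/Sub topic named gcr whenever an image is pushed, so a subscriber can react to image updates without polling. A minimal subscriber sketch, assuming a hypothetical project my-project and a pre-created subscription gcr-deploy-sub attached to that topic:

```python
import concurrent.futures
import json

from google.cloud import pubsub_v1

# Hypothetical project and subscription names; the subscription must be
# attached to the "gcr" topic that Container Registry publishes to.
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "gcr-deploy-sub")

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    event = json.loads(message.data)
    # GCR notifications include an "action" field; "INSERT" means a new
    # image or tag was pushed.
    if event.get("action") == "INSERT":
        print(f"Image updated: {event.get('tag') or event.get('digest')}")
        # Here you would kick off the deployment, e.g., by calling your
        # pipeline's webhook endpoint.
    message.ack()

streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
try:
    streaming_pull.result(timeout=60)  # listen briefly for demonstration
except concurrent.futures.TimeoutError:
    streaming_pull.cancel()
    streaming_pull.result()  # wait for the shutdown to complete
```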

You are running an application in a virtual machine (VM) using a custom Debian image. The image has the Stackdriver Logging agent installed. The VM has the cloud-platform scope. The application is logging information via syslog. You want to use Stackdriver Logging in the Google Cloud Platform Console to visualize the logs. You notice that syslog is not showing up in the "All logs" dropdown list of the Logs Viewer. What is the first thing you should do?

A.

Look for the agent's test log entry in the Logs Viewer.

B.

Install the most recent version of the Stackdriver agent.

C.

Verify the VM service account access scope includes the monitoring.write scope.

D.

SSH to the VM and execute the following command: ps ax | grep fluentd
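
For context on option D: the Logging agent is a fluentd-based daemon (google-fluentd on Debian), so confirming that the process is running is a quick first diagnostic. A minimal sketch of the same check from Python:

```python
import subprocess

# List processes and keep the lines mentioning fluentd; on a healthy VM with
# the Logging agent installed you would expect a google-fluentd process.
ps = subprocess.run(["ps", "ax"], capture_output=True, text=True, check=True)
fluentd_lines = [line for line in ps.stdout.splitlines() if "fluentd" in line]

if fluentd_lines:
    print("\n".join(fluentd_lines))
else:
    print("No fluentd process found; the Logging agent may not be running.")
```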

Your team deploys applications to three Google Kubernetes Engine (GKE) environments: development, staging, and production. You use GitHub repositories as your source of truth. You need to ensure that the three environments are consistent, and you want to follow Google-recommended practices to enforce and install network policies and a logging DaemonSet on all the GKE clusters in those environments. What should you do?

A.

Use Google Cloud Deploy to deploy the network policies and the DaemonSet. Use Cloud Monitoring to trigger an alert if the network policies and DaemonSet drift from your source in the repository.

B.

Use Google Cloud Deploy to deploy the DaemonSet, and use Policy Controller to configure the network policies. Use Cloud Monitoring to detect drifts from the source in the repository and Cloud Functions to correct the drifts.

C.

Use Cloud Build to render and deploy the network policies and the DaemonSet. Set up Config Sync to sync the configurations for the three environments.

D.

Use Cloud Build to render and deploy the network policies and the DaemonSet. Set up Policy Controller to enforce the configurations for the three environments.
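
As an illustration of the kind of configuration being kept consistent here: the sketch below creates a default-deny ingress NetworkPolicy with the official Kubernetes Python client. The policy name and namespace are hypothetical, and under a GitOps approach such as Config Sync the equivalent YAML manifest would live in the Git repository instead of being applied imperatively:

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (e.g., populated earlier by
# `gcloud container clusters get-credentials`).
config.load_kube_config()

# A default-deny ingress policy: the empty pod selector matches every pod in
# the namespace, and listing "Ingress" with no rules denies all inbound traffic.
policy = client.V1NetworkPolicy(
    metadata=client.V1ObjectMeta(name="default-deny-ingress"),  # hypothetical name
    spec=client.V1NetworkPolicySpec(
        pod_selector=client.V1LabelSelector(),
        policy_types=["Ingress"],
    ),
)

client.NetworkingV1Api().create_namespaced_network_policy(
    namespace="default", body=policy
)
```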

You support a web application that runs on App Engine and uses Cloud SQL and Cloud Storage for data storage. After a short spike in website traffic, you notice a large increase in latency for all user requests, in CPU use, and in the number of processes running the application. Initial troubleshooting reveals:

After the initial spike in traffic, load levels returned to normal but users still experience high latency.

Requests for content from the Cloud SQL database and images from Cloud Storage show the same high latency.

No changes were made to the website around the time the latency increased.

There is no increase in the number of errors returned to users.

You expect another spike in website traffic in the coming days and want to make sure users don’t experience high latency. What should you do?

A.

Upgrade the GCS buckets to Multi-Regional.

B.

Enable high availability on the Cloud SQL instances.

C.

Move the application from App Engine to Compute Engine.

D.

Modify the App Engine configuration to have additional idle instances.
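
For context on option D: in App Engine standard, the automatic_scaling section of app.yaml can keep a floor of idle instances warm, so a traffic spike lands on instances that are already running instead of waiting for cold starts. A minimal sketch that renders such a stanza; the runtime and the instance count are illustrative assumptions:

```python
import yaml

# The automatic_scaling block of an App Engine standard app.yaml; keeping a
# floor of idle instances lets a traffic spike land on warm instances.
app_yaml = {
    "runtime": "python312",          # hypothetical runtime
    "automatic_scaling": {
        "min_idle_instances": 5,     # illustrative value; tune to your spike size
    },
}

print(yaml.safe_dump(app_yaml, sort_keys=False))
```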

You support a service that recently had an outage. The outage was caused by a new release that exhausted the service memory resources. You rolled back the release successfully to mitigate the impact on users. You are now in charge of the post-mortem for the outage. You want to follow Site Reliability Engineering practices when developing the post-mortem. What should you do?

A.

Focus on developing new features rather than on preventing the outage from recurring.

B.

Focus on identifying the contributing causes of the incident rather than the individual responsible for the cause.

C.

Plan individual meetings with all the engineers involved. Determine who approved and pushed the new release to production.

D.

Use the Git history to find the related code commit. Prevent the engineer who made that commit from working on production services.

You recently created a Cloud Build pipeline for deploying Terraform code stored in a GitHub repository. You make Terraform code changes in short-lived branches and sometimes use tags during development. You tag releases with a semantic version when they are ready for deployment. You require your pipeline to apply the Terraform code whenever there is a new release, and you need to minimize operational overhead. What should you do?

A.

Create a build trigger with the * branch pattern.

B.

Create a build trigger with the \d+\.\d+\.\d* tag pattern.

C.

Create a build trigger with the .* tag pattern.

D.

Create a build trigger with the \d*\.\d+\.\d* branch pattern.
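
To see how the tag patterns in options B and C differ, the sketch below tests a few made-up tags against both regexes. re.fullmatch is used for clarity; Cloud Build itself evaluates trigger patterns as RE2 regular expressions, so an unanchored pattern can also fire on tags that merely contain a version string:

```python
import re

# Option B's pattern matches semantic-version tags; option C's matches any tag.
semver_pattern = r"\d+\.\d+\.\d*"
any_pattern = r".*"

sample_tags = ["1.2.3", "2.0.", "v1.2.3", "experiment", "1.2.3-rc1"]

for tag in sample_tags:
    semver_hit = re.fullmatch(semver_pattern, tag) is not None
    any_hit = re.fullmatch(any_pattern, tag) is not None
    print(f"{tag!r:14} semver: {semver_hit!s:5} any: {any_hit}")
# Only "1.2.3" and "2.0." satisfy the semver pattern; ".*" fires on every tag,
# including development tags that should not trigger a deployment.
```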

You are managing an MLOps platform on a Google Kubernetes Engine (GKE) cluster that serves two teams:

The Research team runs hundreds of small, daily experiments that can run as capacity allows and are highly cost-sensitive.

The Production team must retrain the company's main language model every Friday at 10:00 PM. This job is business-critical, cannot be interrupted, and must start on time.

You need to configure the modes for Dynamic Workload Scheduler to meet each team's requirements. What should you do?

A.

Use Flex Start mode for both the Research and Production teams.

B.

Use Flex Start mode for the Research team and Calendar mode for the Production team.

C.

Use Calendar mode for both the Research and Production teams.

D.

Use Calendar mode for the Research team and Flex Start mode for the Production team.

You are running an experiment to see whether your users like a new feature of a web application. Shortly after deploying the feature as a canary release, you receive a spike in the number of 500 errors sent to users, and your monitoring reports show increased latency. You want to quickly minimize the negative impact on users. What should you do first?

A.

Roll back the experimental canary release.

B.

Start monitoring latency, traffic, errors, and saturation.

C.

Record data for the postmortem document of the incident.

D.

Trace the origin of 500 errors and the root cause of increased latency.