Google Professional-Cloud-DevOps-Engineer - Google Cloud Certified - Professional Cloud DevOps Engineer Exam

Total 194 questions

Your company uses a CI/CD pipeline with Cloud Build and Artifact Registry to deploy container images to Google Kubernetes Engine (GKE). Images are tagged with the latest commit hash and promoted to production after successful testing in the development and pre-production environments. A recent production deployment caused the application to fail due to untested integration functionality, requiring a disruptive manual rollback. During the rollback, you noticed many old and unused container images accumulating in Artifact Registry. You need to improve rollout and rollback management and clean up the old container images. What should you do?

Adopt Cloud Deploy for managing deployments, and schedule a Cloud Build job for container image cleanup.

Deploy Cloud Service Mesh across the GKE clusters, and manually clean up Artifact Registry images.

Adopt Cloud Deploy for managing deployments, and implement an Artifact Registry cleanup policy.

Set up a rollback pipeline in Cloud Build, and implement an Artifact Registry cleanup policy.

Question # 12

You are working with a government agency that requires you to archive application logs for seven years. You need to configure Stackdriver to export and store the logs while minimizing costs of storage. What should you do?

Create a Cloud Storage bucket and develop your application to send logs directly to the bucket.

Develop an App Engine application that pulls the logs from Stackdriver and saves them in BigQuery.

Create an export in Stackdriver and configure Cloud Pub/Sub to store logs in permanent storage for seven years.

Create a sink in Stackdriver, name it, create a bucket on Cloud Storage for storing archived logs, and then select the bucket as the log export destination.

Question # 13

Your application's performance in Google Cloud has degraded since the last release. You suspect that downstream dependencies might be causing some requests to take longer to complete. You need to investigate the issue with your application to determine the cause. What should you do?

Configure Cloud Trace in your application.

Configure Error Reporting in your application.

Configure Cloud Profiler in your application.

Configure Google Cloud Managed Service for Prometheus in your application.

Question # 14

You encountered a major service outage that affected all users of the service for multiple hours. After several hours of incident management, the service returned to normal, and user access was restored. You need to provide an incident summary to relevant stakeholders following the Site Reliability Engineering recommended practices. What should you do first?

Call individual stakeholders to explain what happened.

Develop a post-mortem to be distributed to stakeholders.

Send the Incident State Document to all the stakeholders.

Require the engineer responsible to write an apology email to all stakeholders.

Question # 15

Your company allows teams to self-manage Google Cloud projects, including project-level Identity and Access Management (IAM). You are concerned that the team responsible for the Shared VPC project might accidentally delete the project, so a lien has been placed on the project. You need to design a solution to restrict Shared VPC project deletion to those with the resourcemanager.projects.updateLiens permission at the organization level. What should you do?

Enable VPC Service Controls for the container.googleapis.com API service.

Revoke the resourcemanager.projects.updateLiens permission from all users associated with the project.

Enable the compute.restrictXpnProjectLienRemoval organization policy constraint.

Instruct teams to only perform IAM permission management as code with Terraform.

Explanation:

Comprehensive and Detailed Explanation From General Google Cloud IAM and Organization Policy Knowledge:

The core requirement is to prevent accidental deletion of a Shared VPC host project, even by project owners, by ensuring that only users with a specific permission at the organization level can remove the lien that protects the project.

A lien that restricts the resourcemanager.projects.delete operation has already been placed on the project, which prevents its deletion. The challenge is to prevent project-level administrators from removing this lien.

The permission needed to remove a lien is resourcemanager.projects.updateLiens, as stated in the question.

Option A (Enable VPC Service Controls for the container.googleapis.com API service): VPC Service Controls are for data exfiltration prevention by creating service perimeters. They do not directly control IAM permissions for lien management or project deletion.

Option B (Revoke the resourcemanager.projects.updateLiens permission from all users associated with the project): While this would prevent project-level users from removing the lien, it doesn't enforce the requirement that only users with this permission at the organization level can remove it. A project owner could potentially re-grant themselves this permission at the project level if not otherwise restricted. The goal is a stronger, centrally enforced restriction.

Option C (Enable the compute.restrictXpnProjectLienRemoval organization policy constraint): This is specifically designed for the scenario described. Organization Policies allow centralized control over resource configurations across the organization.

The compute.restrictXpnProjectLienRemoval constraint, when enforced (set to True), restricts the removal of liens on Shared VPC host projects. Only users who have the resourcemanager.projects.updateLiens permission granted at the organization level can then remove such liens. This prevents project owners or other project-level principals from removing the lien unless they also hold this permission at the organization level.

Option D (Instruct teams to only perform IAM permission management as code with Terraform): While Infrastructure as Code (IaC) is a good practice for managing IAM, it's an operational guideline and doesn't technically enforce the restriction on lien removal. A user with sufficient project-level IAM permissions could still manually remove the lien via the console or gcloud if not prevented by an organization policy.

Therefore, enabling the compute.restrictXpnProjectLienRemoval organization policy is the direct and most effective way to meet the requirement.
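As a rough illustration of the mechanism described above, the following Python sketch builds the body of an organization policy that enforces the constraint, and models the resulting check in a deliberately simplified way. The organization ID, the function names, and the set-based permission model are assumptions made for illustration; they are not part of any Google Cloud client library.

```python
import json

CONSTRAINT = "compute.restrictXpnProjectLienRemoval"
LIEN_PERMISSION = "resourcemanager.projects.updateLiens"


def build_policy(org_id: str) -> dict:
    """Build an organization policy body that enforces the boolean constraint.

    org_id is a hypothetical organization ID supplied by the caller.
    """
    return {
        "name": f"organizations/{org_id}/policies/{CONSTRAINT}",
        "spec": {"rules": [{"enforce": True}]},
    }


def can_remove_host_project_lien(
    org_permissions: set, project_permissions: set, enforced: bool
) -> bool:
    """Simplified model of the check (an assumption, not real IAM evaluation).

    With the constraint enforced, only an organization-level grant counts.
    Without it, a project-level grant is also sufficient.
    """
    if enforced:
        return LIEN_PERMISSION in org_permissions
    return (
        LIEN_PERMISSION in org_permissions
        or LIEN_PERMISSION in project_permissions
    )


if __name__ == "__main__":
    # Print the policy body so it can be saved to a file and reviewed.
    print(json.dumps(build_policy("123456789012"), indent=2))
```

In this model, a project owner who holds the permission only at the project level can remove the lien while the constraint is not enforced, and loses that ability once it is.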

Reference (Based on Google Cloud Organization Policy and Shared VPC documentation):

Google Cloud documentation on Resource Manager Liens: https://cloud.google.com/resource-manager/docs/project-liens

Google Cloud documentation on Organization Policy Constraints: https://cloud.google.com/resource-manager/docs/organization-policy/org-policy-constraints

Specifically, when the compute.restrictXpnProjectLienRemoval constraint is enforced, a lien on a Shared VPC host project can be removed only by principals that hold the lien update permission at the organization level. This ensures that the protection afforded by the lien on a critical Shared VPC host project cannot be easily circumvented at the project level.

Question # 16

You are running a web application deployed to a Compute Engine managed instance group. Ops Agent is installed on all instances. You recently noticed suspicious activity from a specific IP address. You need to configure Cloud Monitoring to view the number of requests from that specific IP address with minimal operational overhead. What should you do?

Configure the Ops Agent with a logging receiver. Create a logs-based metric.

Create a script to scrape the web server log. Export the IP address request metrics to the Cloud Monitoring API.

Update the application to export the IP address request metrics to the Cloud Monitoring API.

Configure the Ops Agent with a metrics receiver.

Question # 17

You are configuring connectivity across Google Kubernetes Engine (GKE) clusters in different VPCs. You notice that the nodes in Cluster A are unable to access the nodes in Cluster B. You suspect that the workload access issue is due to the network configuration. You need to troubleshoot the issue but do not have execute access to workloads and nodes. You want to identify the layer at which the network connectivity is broken. What should you do?

Install a toolbox container on the node in Cluster A. Confirm that the routes to Cluster B are configured appropriately.

Use Network Connectivity Center to perform a Connectivity Test from Cluster A to Cluster B.

Use a debug container to run the traceroute command from Cluster A to Cluster B and from Cluster B to Cluster A. Identify the common failure point.

Enable VPC Flow Logs in both VPCs and monitor packet drops.

Question # 18

Your company is developing applications that are deployed on Google Kubernetes Engine (GKE). Each team manages a different application. You need to create the development and production environments for each team, while minimizing costs. Different teams should not be able to access other teams' environments. What should you do?

Create one GCP Project per team. In each project, create a cluster for Development and one for Production. Grant the teams IAM access to their respective clusters.

Create one GCP Project per team. In each project, create a cluster with a Kubernetes namespace for Development and one for Production. Grant the teams IAM access to their respective clusters.

Create a Development and a Production GKE cluster in separate projects. In each cluster, create a Kubernetes namespace per team, and then configure Identity Aware Proxy so that each team can only access its own namespace.

Create a Development and a Production GKE cluster in separate projects. In each cluster, create a Kubernetes namespace per team, and then configure Kubernetes Role-based access control (RBAC) so that each team can only access its own namespace.
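For reference, the namespace-scoped RBAC mentioned in the last option can be sketched as a pair of Kubernetes manifests, expressed here as Python dictionaries. This is a minimal sketch under stated assumptions: the team name, namespace, group address, and the list of resources and verbs are all hypothetical, and binding to a Google Group assumes the cluster has been set up to recognize groups.

```python
import json

RBAC_API_GROUP = "rbac.authorization.k8s.io"


def namespace_rbac(team: str, namespace: str, group: str) -> list:
    """Return a Role and RoleBinding that confine a team to one namespace.

    team, namespace, and group are hypothetical values chosen by the caller.
    """
    role_name = f"{team}-edit"
    role = {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [
            {
                # "" is the core API group (pods, services, configmaps).
                "apiGroups": ["", "apps"],
                "resources": ["pods", "services", "configmaps", "deployments"],
                "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"],
            }
        ],
    }
    binding = {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
        "subjects": [
            {"kind": "Group", "name": group, "apiGroup": RBAC_API_GROUP}
        ],
        "roleRef": {"kind": "Role", "name": role_name, "apiGroup": RBAC_API_GROUP},
    }
    return [role, binding]


if __name__ == "__main__":
    for manifest in namespace_rbac("team-a", "team-a", "team-a@example.com"):
        print(json.dumps(manifest, indent=2))
```

Because both objects are namespaced, the permissions they grant do not extend to any other team's namespace in the same cluster.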

Question # 19

You are on-call for an infrastructure service that has a large number of dependent systems. You receive an alert indicating that the service is failing to serve most of its requests and all of its dependent systems with hundreds of thousands of users are affected. As part of your Site Reliability Engineering (SRE) incident management protocol, you declare yourself Incident Commander (IC) and pull in two experienced people from your team as Operations Lead (OL) and Communications Lead (CL). What should you do next?

Look for ways to mitigate user impact and deploy the mitigations to production.

Contact the affected service owners and update them on the status of the incident.

Establish a communication channel where incident responders and leads can communicate with each other.

Start a postmortem, add incident information, circulate the draft internally, and ask internal stakeholders for input.

Question # 20

Your company is migrating its production systems to Google Cloud. You need to implement site reliability engineering (SRE) practices during the migration to minimize customer impact from potential future incidents. Which two SRE practices should you implement?

Choose 2 answers

Ensure that full autonomy and permissions are only granted to the on-call team.

Automate common tasks to analyze key impact information and intelligently suggest mitigating actions for the on-call team.

Ensure that all teams can modify the production environment to resolve issues.

Create an alerting mechanism for your SRE team based on your system's internal behavior.

Create up-to-date playbooks with instructions for debugging and mitigating issues.

Explanation:

Comprehensive and Detailed Explanation From General SRE Principles and Google Cloud Knowledge:

Site Reliability Engineering (SRE) emphasizes reliability, automation, and a data-driven approach to operations. The goal is to minimize the "time to detect" (TTD) and "time to resolve" (TTR) for incidents.

Option A (Ensure that full autonomy and permissions are only granted to the on-call team): While the on-call team needs appropriate permissions to act decisively during an incident, granting full autonomy exclusively to that team can create a bottleneck, and it conflicts with the principle of least privilege if the access is not carefully scoped. Broader teams might need specific, controlled access for their responsibilities. SRE encourages empowering teams, but within a structured framework.

Option B (Automate common tasks to analyze key impact information and intelligently suggest mitigating actions for the on-call team): This is a core SRE practice. Automation reduces toil, speeds up response, and ensures consistency. Analyzing impact and suggesting mitigations helps the on-call team resolve issues faster and more effectively.

Option C (Ensure that all teams can modify the production environment to resolve issues): This is generally a bad practice and against SRE principles of controlled changes and reducing the blast radius of errors. Production changes should be managed, audited, and ideally automated, not open to modification by all teams, as this increases the risk of unintended incidents.

Option D (Create an alerting mechanism for your SRE team based on your system's internal behavior): While alerting is crucial, SRE emphasizes alerting on symptoms that affect users (Service Level Objectives - SLOs) rather than just internal behavior or causes. Alerting solely on internal behavior can lead to alert fatigue and may not correlate directly with user impact. Good alerting focuses on user-facing impact first.

Option E (Create up-to-date playbooks with instructions for debugging and mitigating issues): Playbooks (or runbooks) are essential in SRE. They document known issues, troubleshooting steps, and mitigation procedures. Keeping them up-to-date ensures that on-call engineers can respond to incidents quickly and consistently, even for less common issues, thereby minimizing customer impact.

Therefore, automating incident response tasks (B) and maintaining clear, actionable playbooks (E) are two key SRE practices to implement for minimizing customer impact.
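The combination of options B and E can be pictured as a small lookup that maps an alert to the matching playbook and returns its suggested mitigations. The sketch below is only an illustration under stated assumptions: the `Playbook` type, the symptom names, and the listed steps are hypothetical and do not come from any SRE tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Playbook:
    """A hypothetical playbook entry for one user-facing symptom."""
    symptom: str
    debug_steps: list = field(default_factory=list)
    mitigations: list = field(default_factory=list)


# Hypothetical playbook catalog keyed by symptom name.
PLAYBOOKS = {
    "high_error_rate": Playbook(
        symptom="high_error_rate",
        debug_steps=["Check the most recent release", "Inspect error logs"],
        mitigations=["Roll back the most recent release"],
    ),
    "high_latency": Playbook(
        symptom="high_latency",
        debug_steps=["Inspect request traces for slow dependencies"],
        mitigations=["Shift traffic away from the slow dependency"],
    ),
}

NO_PLAYBOOK = "No playbook found: escalate, then add a playbook after the incident"


def suggest_mitigations(alert: dict) -> list:
    """Return the mitigations from the playbook that matches the alert's symptom."""
    playbook = PLAYBOOKS.get(alert.get("symptom"))
    if playbook is None:
        return [NO_PLAYBOOK]
    return list(playbook.mitigations)


if __name__ == "__main__":
    print(suggest_mitigations({"symptom": "high_error_rate"}))
```

The fallback branch reflects the point about keeping playbooks current: an alert with no matching entry is a signal that a playbook needs to be written or updated.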

Reference (Based on SRE principles):

The SRE books by Google (e.g., "Site Reliability Engineering: How Google Runs Production Systems") heavily emphasize automation to reduce toil and the importance of playbooks for incident management.

Google Cloud SRE solutions: https://cloud.google.com/sre

Specifically, regarding playbooks and automation: "Playbooks should be living documents, updated regularly as systems change and new incidents provide new lessons."

"SREs aim to automate repetitive tasks (toil) to free up time for engineering projects that improve reliability."
