
Google Professional-Data-Engineer - Google Professional Data Engineer Exam


You have important legal hold documents in a Cloud Storage bucket. You need to ensure that these documents are not deleted or modified. What should you do?

A.

Set a retention policy. Lock the retention policy.

B.

Set a retention policy. Set the default storage class to Archive for long-term digital preservation.

C.

Enable the Object Versioning feature. Add a lifecycle rule.

D.

Enable the Object Versioning feature. Create a copy in a bucket in a different region.
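For reference, the retention-policy mechanism named in option A can be applied with the google-cloud-storage Python client. This is a minimal sketch; the bucket name and retention period are placeholders.

```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("legal-hold-docs")  # placeholder bucket name

# Set a retention period (in seconds): objects cannot be deleted or
# overwritten until they reach this age.
bucket.retention_period = 10 * 365 * 24 * 60 * 60  # ~10 years, placeholder
bucket.patch()

# Locking the policy is irreversible: the retention period can no longer
# be reduced or removed once the policy is locked.
bucket.lock_retention_policy()
```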

You are designing a data warehouse in BigQuery to analyze sales data for a telecommunications service provider. You need to create a data model for customers, products, and subscriptions. All customers, products, and subscriptions can be updated monthly, but you must maintain a historical record of all data. You plan to use the visualization layer for current and historical reporting. You need to ensure that the data model is simple, easy to use, and cost-effective. What should you do?

A.

Create a normalized model with tables for each entity. Use snapshots before updates to track historical data

B.

Create a normalized model with tables for each entity. Keep all input files in a Cloud Storage bucket to track historical data

C.

Create a denormalized model with nested and repeated fields. Update the table and use snapshots to track historical data

D.

Create a denormalized, append-only model with nested and repeated fields. Use the ingestion timestamp to track historical data.
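To illustrate the append-only pattern referenced in option D, a query like the following returns the current view of an append-only table by keeping only the latest ingested row per customer. Table and column names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical append-only table: every monthly update is appended with an
# ingestion timestamp, so history is preserved and the latest row wins.
query = """
SELECT * EXCEPT (row_num)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY ingestion_ts DESC
    ) AS row_num
  FROM `my-project.sales.customer_subscriptions`
)
WHERE row_num = 1
"""
for row in client.query(query).result():
    print(row)
```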

You want to schedule a number of sequential load and transformation jobs. Data files will be added to a Cloud Storage bucket by an upstream process. There is no fixed schedule for when the new data arrives. Next, a Dataproc job is triggered to perform some transformations and write the data to BigQuery. You then need to run additional transformation jobs in BigQuery. The transformation jobs are different for every table. These jobs might take hours to complete. You need to determine the most efficient and maintainable workflow to process hundreds of tables and provide the freshest data to your end users. What should you do?

A.

1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Cloud Storage, Dataproc, and BigQuery operators. 2. Use a single shared DAG for all tables that need to go through the pipeline. 3. Schedule the DAG to run hourly.

B.

1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Dataproc and BigQuery operators. 2. Create a separate DAG for each table that needs to go through the pipeline. 3. Use a Cloud Storage object trigger to launch a Cloud Function that triggers the DAG.

C.

1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Cloud Storage, Dataproc, and BigQuery operators. 2. Create a separate DAG for each table that needs to go through the pipeline. 3. Schedule the DAGs to run hourly.

D.

1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Dataproc and BigQuery operators. 2. Use a single shared DAG for all tables that need to go through the pipeline. 3. Use a Cloud Storage object trigger to launch a Cloud Function that triggers the DAG.
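As a rough sketch of the Composer pattern these options describe, a per-table DAG with sequential Dataproc and BigQuery tasks might look like the following. The project, region, cluster, file paths, and SQL are placeholders, and the operator imports assume the Google provider package available in Cloud Composer.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

# One DAG per table; schedule_interval=None means it only runs when triggered
# externally (for example, by a Cloud Function reacting to a new object).
with DAG(
    dag_id="load_transform_orders",  # hypothetical table-specific DAG
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    transform_raw = DataprocSubmitJobOperator(
        task_id="dataproc_transform",
        project_id="my-project",   # placeholder
        region="us-central1",      # placeholder
        job={
            "reference": {"project_id": "my-project"},
            "placement": {"cluster_name": "etl-cluster"},
            "pyspark_job": {"main_python_file_uri": "gs://my-bucket/transform_orders.py"},
        },
    )

    finalize = BigQueryInsertJobOperator(
        task_id="bigquery_transform",
        configuration={
            "query": {
                "query": "CALL `my-project.sales.transform_orders`()",  # placeholder
                "useLegacySql": False,
            }
        },
    )

    transform_raw >> finalize
```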

Your startup has a web application that currently serves customers out of a single region in Asia. You are targeting funding that will allow your startup to serve customers globally. Your current goal is to optimize for cost, and your post-funding goal is to optimize for global presence and performance. You must use a native JDBC driver. What should you do?

A.

Use Cloud Spanner to configure a single-region instance initially, and then configure multi-region Cloud Spanner instances after securing funding.

B.

Use a Cloud SQL for PostgreSQL highly available instance first, and Bigtable with US, Europe, and Asia replication after securing funding.

C.

Use a Cloud SQL for PostgreSQL zonal instance first, and Bigtable with US, Europe, and Asia after securing funding.

D.

Use a Cloud SQL for PostgreSQL zonal instance first, and Cloud SQL for PostgreSQL with a highly available configuration after securing funding.

You have a data pipeline with a Cloud Dataflow job that aggregates and writes time series metrics to Cloud Bigtable. This data feeds a dashboard used by thousands of users across the organization. You need to support additional concurrent users and reduce the amount of time required to write the data. Which two actions should you take? (Choose two.)

A.

Configure your Cloud Dataflow pipeline to use local execution

B.

Increase the maximum number of Cloud Dataflow workers by setting maxNumWorkers in PipelineOptions

C.

Increase the number of nodes in the Cloud Bigtable cluster

D.

Modify your Cloud Dataflow pipeline to use the Flatten transform before writing to Cloud Bigtable

E.

Modify your Cloud Dataflow pipeline to use the CoGroupByKey transform before writing to Cloud Bigtable
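For reference, option B names the maxNumWorkers pipeline option; in the Beam Python SDK the equivalent flag is max_num_workers, set roughly as in this sketch. Project, region, bucket, and the worker cap are placeholders, and the aggregation and Bigtable write steps are elided. The Bigtable node count in option C is changed on the cluster itself (console or Admin API), not in the pipeline.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# max_num_workers caps autoscaling on the Dataflow runner; raising it lets
# the job add workers to keep up with write throughput.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",              # placeholder
    region="us-central1",              # placeholder
    temp_location="gs://my-bucket/tmp",  # placeholder
    max_num_workers=50,                # placeholder cap
    streaming=True,
)

with beam.Pipeline(options=options) as pipeline:
    ...  # existing aggregation and Bigtable write steps go here
```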

You used Cloud Dataprep to create a recipe on a sample of data in a BigQuery table. You want to reuse this recipe on a daily upload of data with the same schema, after the load job with variable execution time completes. What should you do?

A.

Create a cron schedule in Cloud Dataprep.

B.

Create an App Engine cron job to schedule the execution of the Cloud Dataprep job.

C.

Export the recipe as a Cloud Dataprep template, and create a job in Cloud Scheduler.

D.

Export the Cloud Dataprep job as a Cloud Dataflow template, and incorporate it into a Cloud Composer job.
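To illustrate the mechanism in option D, a Cloud Composer task can launch an exported Dataflow template with the templated-job operator, roughly as follows. The template path, project, location, and parameters are placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataflow import DataflowTemplatedJobStartOperator

with DAG(
    dag_id="run_dataprep_recipe",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,  # run after the upstream load job completes
    catchup=False,
) as dag:
    run_recipe = DataflowTemplatedJobStartOperator(
        task_id="run_exported_recipe",
        template="gs://my-bucket/templates/dataprep_recipe",  # placeholder
        project_id="my-project",                              # placeholder
        location="us-central1",                               # placeholder
        parameters={"inputTable": "my-project:sales.daily_upload"},  # placeholder
    )
```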

You need to choose a database for a new project that has the following requirements:

Fully managed

Able to automatically scale up

Transactionally consistent

Able to scale up to 6 TB

Able to be queried using SQL

Which database do you choose?

A.

Cloud SQL

B.

Cloud Bigtable

C.

Cloud Spanner

D.

Cloud Datastore
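For context on the SQL requirement, Cloud Spanner (option C) exposes a SQL interface with transactional consistency through its Python client, roughly as in this sketch. The instance, database, and table names are placeholders.

```python
from google.cloud import spanner

client = spanner.Client()
instance = client.instance("my-instance")    # placeholder
database = instance.database("my-database")  # placeholder

# Read-only snapshot query using standard SQL.
with database.snapshot() as snapshot:
    results = snapshot.execute_sql("SELECT id, name FROM Customers LIMIT 10")
    for row in results:
        print(row)
```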

A web server sends click events to a Pub/Sub topic as messages. The web server includes an eventTimestamp attribute in the messages, which is the time when the click occurred. You have a Dataflow streaming job that reads from this Pub/Sub topic through a subscription, applies some transformations, and writes the result to another Pub/Sub topic for use by the advertising department. The advertising department needs to receive each message within 30 seconds of the corresponding click occurrence, but they report receiving the messages late. Your Dataflow job's system lag is about 5 seconds, and the data freshness is about 40 seconds. Inspecting a few messages shows no more than 1 second of lag between their eventTimestamp and publishTime. What is the problem and what should you do?

A.

The advertising department is causing delays when consuming the messages. Work with the advertising department to fix this.

B.

Messages in your Dataflow job are processed in less than 30 seconds, but your job cannot keep up with the backlog in the Pub/Sub subscription. Optimize your job or increase the number of workers to fix this.

C.

The web server is not pushing messages fast enough to Pub/Sub. Work with the web server team to fix this.

D.

Messages in your Dataflow job are taking more than 30 seconds to process. Optimize your job or increase the number of workers to fix this.
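For context, a Beam streaming pipeline of the kind described can read the custom eventTimestamp attribute so that element time reflects the click time rather than the publish time, as in this sketch. The subscription and topic paths are placeholders, and the transformations are elided.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    clicks = (
        pipeline
        | "ReadClicks" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/clicks-sub",  # placeholder
            timestamp_attribute="eventTimestamp",  # element time = click time, not publish time
        )
        # ... transformations elided ...
        | "Publish" >> beam.io.WriteToPubSub(
            topic="projects/my-project/topics/ads-out"  # placeholder
        )
    )
```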

You are migrating your data warehouse to Google Cloud and decommissioning your on-premises data center. Because this is a priority for your company, you know that bandwidth will be made available for the initial data load to the cloud. The files being transferred are not large in number, but each file is 90 GB. Additionally, you want your transactional systems to continually update the warehouse on Google Cloud in real time. What tools should you use to migrate the data and ensure that it continues to write to your warehouse?

A.

Storage Transfer Service for the migration, Pub/Sub and Cloud Data Fusion for the real-time updates

B.

BigQuery Data Transfer Service for the migration, Pub/Sub and Dataproc for the real-time updates

C.

gsutil for the migration; Pub/Sub and Dataflow for the real-time updates

D.

gsutil for both the migration and the real-time updates
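To illustrate the streaming half of the Pub/Sub and Dataflow combination named in the options, a minimal Beam sketch that keeps the warehouse updated from a Pub/Sub subscription might look like this. The subscription, table, and message format are placeholders, and the target table is assumed to already exist.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadUpdates" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/warehouse-updates"  # placeholder
        )
        | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToWarehouse" >> beam.io.WriteToBigQuery(
            table="my-project:warehouse.transactions",  # placeholder
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```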

Different teams in your organization store customer and performance data in BigQuery. Each team needs to keep full control of their collected data, be able to query data within their projects, and be able to exchange their data with other teams. You need to implement an organization-wide solution, while minimizing operational tasks and costs. What should you do?

A.

Create a BigQuery scheduled query to replicate all customer data into team projects.

B.

Enable each team to create materialized views of the data they need to access in their projects.

C.

Ask each team to publish their data in Analytics Hub. Direct the other teams to subscribe to them.

D.

Ask each team to create authorized views of their data. Grant the bigquery.jobUser role to each team.
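For reference, the authorized-view mechanism named in option D is configured by adding the view to the source dataset's access entries with the BigQuery Python client, roughly as in this sketch. The project, dataset, and view names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical names: a team exposes a view in a shared dataset and authorizes
# it to read the underlying source dataset without granting table access.
source_dataset = client.get_dataset("team-project.raw_data")
view = client.get_table("team-project.shared.customer_metrics_view")

entries = list(source_dataset.access_entries)
entries.append(bigquery.AccessEntry(None, "view", view.reference.to_api_repr()))
source_dataset.access_entries = entries

client.update_dataset(source_dataset, ["access_entries"])
```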