
Google Professional-Data-Engineer - Google Professional Data Engineer Exam


Your startup has a web application that currently serves customers out of a single region in Asia. You are targeting funding that will allow your startup to serve customers globally. Your current goal is to optimize for cost, and your post-funding goal is to optimize for global presence and performance. You must use a native JDBC driver. What should you do?

A.

Use Cloud Spanner to configure a single-region instance initially, and then configure multi-region Cloud Spanner instances after securing funding.

B.

Use a Cloud SQL for PostgreSQL highly available instance first, and Bigtable with US, Europe, and Asia replication after securing funding.

C.

Use a Cloud SQL for PostgreSQL zonal instance first, and Bigtable with US, Europe, and Asia replication after securing funding.

D.

Use a Cloud SQL for PostgreSQL zonal instance first, and Cloud SQL for PostgreSQL with a highly available configuration after securing funding.

You work for a farming company. You have one BigQuery table named sensors, which is about 500 MB and contains the list of your 5,000 sensors, with columns for id, name, and location. This table is updated every hour. Each sensor generates one metric every 30 seconds along with a timestamp, which you want to store in BigQuery. You want to run an analytical query on the data once a week for monitoring purposes. You also want to minimize costs. What data model should you use?

A.

1. Create a metrics column in the sensors table.

2. Set RECORD type and REPEATED mode for the metrics column.

3. Use an UPDATE statement every 30 seconds to add new metrics.

B.

1. Create a metrics column in the sensors table.

2. Set RECORD type and REPEATED mode for the metrics column.

3. Use an INSERT statement every 30 seconds to add new metrics.

C.

1. Create a metrics table partitioned by timestamp.

2. Create a sensorId column in the metrics table that points to the id column in the sensors table.

3. Use an INSERT statement every 30 seconds to append new metrics to the metrics table.

4. Join the two tables, if needed, when running the analytical query.

D.

1. Create a metrics table partitioned by timestamp.

2. Create a sensorId column in the metrics table that points to the id column in the sensors table.

3. Use an UPDATE statement every 30 seconds to append new metrics to the metrics table.

4. Join the two tables, if needed, when running the analytical query.
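
For context, the partitioned-metrics design described in options C and D can be sketched with the google-cloud-bigquery Python client roughly as follows; the project, dataset, column names, and sample row are assumptions, not part of the question:

    # Minimal sketch: a timestamp-partitioned metrics table that references the
    # sensors table through a sensorId column, plus an append of a new metric.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses application default credentials
    table_id = "my-project.farm_dataset.metrics"  # hypothetical identifiers

    schema = [
        bigquery.SchemaField("sensorId", "INTEGER"),   # points to sensors.id
        bigquery.SchemaField("value", "FLOAT"),
        bigquery.SchemaField("timestamp", "TIMESTAMP"),
    ]

    table = bigquery.Table(table_id, schema=schema)
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="timestamp",  # partition on the event timestamp
    )
    client.create_table(table)

    # Appending a metric is an insert (streaming here; batch loads also work),
    # not an UPDATE of existing rows.
    client.insert_rows_json(table_id, [
        {"sensorId": 42, "value": 21.7, "timestamp": "2024-01-01T00:00:00Z"},
    ])

The weekly analytical query would then join metrics to sensors on sensorId = id, scanning only the partitions it needs.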

You are designing a Dataflow pipeline for a batch processing job. You want to mitigate multiple zonal failures at job submission time. What should you do?

A.

Specify a worker region by using the --region flag.

B.

Set the pipeline staging location as a regional Cloud Storage bucket.

C.

Submit duplicate pipelines in two different zones by using the --zone flag.

D.

Create an Eventarc trigger to resubmit the job in case of zonal failure when submitting the job.
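
For reference, a worker region (rather than a pinned zone) is a regular Dataflow pipeline option. A minimal Apache Beam Python sketch might look like this; the project ID, bucket, and file paths are hypothetical:

    # Minimal sketch: submit a batch Dataflow job with a worker region so
    # Dataflow can place workers in any available zone within that region.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",                # hypothetical project ID
        region="us-central1",                # worker region, no zone pinning
        temp_location="gs://my-bucket/tmp",  # hypothetical staging bucket
    )

    with beam.Pipeline(options=options) as p:
        (p
         | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.csv")
         | "Write" >> beam.io.WriteToText("gs://my-bucket/output/results"))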

You are designing a messaging system by using Pub/Sub to process clickstream data with an event-driven consumer app that relies on a push subscription. You need to configure the messaging system to be reliable enough to handle temporary downtime of the consumer app. You also need the messaging system to store the input messages that cannot be consumed by the subscriber. The system needs to retry failed messages gradually, avoiding overloading the consumer app, and store the failed messages in a topic after a maximum of 10 retries. How should you configure the Pub/Sub subscription?

A.

Increase the acknowledgement deadline to 10 minutes.

B.

Use immediate redelivery as the subscription retry policy, and configure dead lettering to a different topic with maximum delivery attempts set to 10.

C.

Use exponential backoff as the subscription retry policy, and configure dead lettering to the same source topic with maximum delivery attempts set to 10.

D.

Use exponential backoff as the subscription retry policy, and configure dead lettering to a different topic with maximum delivery attempts set to 10.
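
As background, both the retry policy and dead-letter behavior mentioned in the options are subscription-level settings. A minimal google-cloud-pubsub sketch, with hypothetical project, topic, and endpoint names, could look like this:

    # Minimal sketch: a push subscription with exponential-backoff redelivery
    # and a dead-letter topic capped at 10 delivery attempts.
    from google.cloud import pubsub_v1
    from google.protobuf import duration_pb2

    project_id = "my-project"  # hypothetical identifiers throughout
    subscriber = pubsub_v1.SubscriberClient()
    topic_path = subscriber.topic_path(project_id, "clickstream")
    dead_letter_path = subscriber.topic_path(project_id, "clickstream-dead-letter")
    subscription_path = subscriber.subscription_path(project_id, "clickstream-push-sub")

    subscriber.create_subscription(
        request={
            "name": subscription_path,
            "topic": topic_path,
            # Push delivery to the consumer app (hypothetical endpoint).
            "push_config": pubsub_v1.types.PushConfig(
                push_endpoint="https://consumer.example.com/push"
            ),
            # Redeliver gradually with exponential backoff, not immediately.
            "retry_policy": pubsub_v1.types.RetryPolicy(
                minimum_backoff=duration_pb2.Duration(seconds=10),
                maximum_backoff=duration_pb2.Duration(seconds=600),
            ),
            # After 10 failed attempts, park the message in a separate topic.
            "dead_letter_policy": pubsub_v1.types.DeadLetterPolicy(
                dead_letter_topic=dead_letter_path,
                max_delivery_attempts=10,
            ),
        }
    )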

You want to rebuild your batch pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but your pipelines are taking over twelve hours to run. To expedite development and pipeline run time, you want to use a serverless tool and SQL syntax. You have already moved your raw data into Cloud Storage. How should you build the pipeline on Google Cloud while meeting speed and processing requirements?

A.

Convert your PySpark commands into SparkSQL queries to transform the data, and then run your pipeline on Dataproc to write the data into BigQuery.

B.

Ingest your data into Cloud SQL, convert your PySpark commands into SparkSQL queries to transform the data, and then use federated queries from BigQuery for machine learning.

C.

Ingest your data into BigQuery from Cloud Storage, convert your PySpark commands into BigQuery SQL queries to transform the data, and then write the transformations to a new table.

D.

Use the Apache Beam Python SDK to build the transformation pipelines, and write the data into BigQuery.
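
For context, the load-then-transform-in-SQL pattern referenced in option C can be sketched with the google-cloud-bigquery Python client; the bucket, table names, and the sample query below are assumptions:

    # Minimal sketch: load raw files from Cloud Storage into BigQuery, then
    # express the transformation as SQL written to a new table.
    from google.cloud import bigquery

    client = bigquery.Client()

    # 1. Load raw structured data from Cloud Storage into a staging table.
    load_job = client.load_table_from_uri(
        "gs://my-bucket/raw/*.csv",           # hypothetical bucket
        "my-project.analytics.raw_events",    # hypothetical staging table
        job_config=bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            autodetect=True,
            skip_leading_rows=1,
        ),
    )
    load_job.result()  # wait for the load to finish

    # 2. Rewrite the former PySpark transformations as SQL and materialize
    #    the result in a new table.
    query_job = client.query(
        """
        SELECT user_id, DATE(event_ts) AS event_date, SUM(amount) AS total
        FROM `my-project.analytics.raw_events`
        GROUP BY user_id, event_date
        """,
        job_config=bigquery.QueryJobConfig(
            destination="my-project.analytics.daily_totals",
            write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
        ),
    )
    query_job.result()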

You want to analyze hundreds of thousands of social media posts daily at the lowest cost and with the fewest steps.

You have the following requirements:

    You will batch-load the posts once per day and run them through the Cloud Natural Language API.

    You will extract topics and sentiment from the posts.

    You must store the raw posts for archiving and reprocessing.

    You will create dashboards to be shared with people both inside and outside your organization.

You need to store both the data extracted from the API to perform analysis as well as the raw social media posts for historical archiving. What should you do?

A.

Store the social media posts and the data extracted from the API in BigQuery.

B.

Store the social media posts and the data extracted from the API in Cloud SQL.

C.

Store the raw social media posts in Cloud Storage, and write the data extracted from the API into BigQuery.

D.

Feed the social media posts into the API directly from the source, and write the extracted data from the API into BigQuery.
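
As an illustration of the split-storage pattern some options describe (raw posts kept in Cloud Storage, extracted results written to BigQuery), a rough Python sketch with hypothetical bucket and table names might be:

    # Minimal sketch: read the day's raw posts from Cloud Storage, score them
    # with the Natural Language API, and write the results to BigQuery while
    # the raw posts stay archived in the bucket.
    from google.cloud import bigquery, language_v1, storage

    storage_client = storage.Client()
    language_client = language_v1.LanguageServiceClient()
    bq_client = bigquery.Client()

    bucket = storage_client.bucket("my-raw-posts")          # hypothetical bucket
    results_table = "my-project.social.post_sentiment"      # hypothetical table

    rows = []
    for blob in bucket.list_blobs(prefix="posts/2024-01-01/"):  # daily batch
        text = blob.download_as_text()
        document = language_v1.Document(
            content=text, type_=language_v1.Document.Type.PLAIN_TEXT
        )
        sentiment = language_client.analyze_sentiment(
            request={"document": document}
        ).document_sentiment
        rows.append({
            "gcs_uri": f"gs://{bucket.name}/{blob.name}",  # link to the raw post
            "score": sentiment.score,
            "magnitude": sentiment.magnitude,
        })

    bq_client.insert_rows_json(results_table, rows)  # table assumed to exist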

You need to create a new transaction table in Cloud Spanner that stores product sales data. You are deciding what to use as a primary key. From a performance perspective, which strategy should you choose?

A.

The current epoch time

B.

A concatenation of the product name and the current epoch time

C.

A random universally unique identifier number (version 4 UUID)

D.

The original order identification number from the sales system, which is a monotonically increasing integer
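
For background, a random version 4 UUID key can be generated client-side at insert time; a minimal google-cloud-spanner sketch, with hypothetical instance, database, and table names, might look like this:

    # Minimal sketch: insert sales rows keyed by a random version 4 UUID, which
    # spreads writes across the key space instead of concentrating them at the
    # end of the table as a monotonically increasing key would.
    import uuid
    from google.cloud import spanner

    client = spanner.Client()
    database = client.instance("my-instance").database("sales-db")  # hypothetical

    with database.batch() as batch:
        batch.insert(
            table="transactions",  # hypothetical table
            columns=("transaction_id", "product_name", "amount"),
            values=[(str(uuid.uuid4()), "widget", 19.99)],
        )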

You need to migrate a 2 TB relational database to Google Cloud Platform. You do not have the resources to significantly refactor the application that uses this database, and cost to operate is of primary concern.

Which service do you select for storing and serving your data?

A.

Cloud Spanner

B.

Cloud Bigtable

C.

Cloud Firestore

D.

Cloud SQL

You work for a manufacturing company that sources up to 750 different components, each from a different supplier. You've collected a labeled dataset that has, on average, 1,000 examples for each unique component. Your team wants to implement an app to help warehouse workers recognize incoming components based on a photo of the component. You want to implement the first working version of this app (as a proof of concept) within a few working days. What should you do?

A.

Use Cloud Vision AutoML with the existing dataset.

B.

Use Cloud Vision AutoML, but reduce your dataset twice.

C.

Use Cloud Vision API by providing custom labels as recognition hints.

D.

Train your own image recognition model leveraging transfer learning techniques.
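
For context, calling the pretrained Vision API for label detection involves no training on your component dataset and only returns generic labels, which is the trade-off the options weigh against AutoML or a custom transfer-learning model. The sketch below uses a hypothetical local image file:

    # Minimal sketch: label detection with the pretrained Vision API.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()

    with open("component_photo.jpg", "rb") as f:  # hypothetical local photo
        image = vision.Image(content=f.read())

    response = client.label_detection(image=image)
    for label in response.label_annotations:
        print(label.description, label.score)  # generic labels with confidence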

Does Dataflow process batch data pipelines or streaming data pipelines?

A.

Only Batch Data Pipelines

B.

Both Batch and Streaming Data Pipelines

C.

Only Streaming Data Pipelines

D.

None of the above