New Year Sale Special Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: xmas50

Google Professional-Data-Engineer - Google Professional Data Engineer Exam

Page: 5 / 8
Total 387 questions

Your company handles data processing for a number of different clients. Each client prefers to use their own suite of analytics tools, with some allowing direct query access via Google BigQuery. You need to secure the data so that clients cannot see each other’s data. You want to ensure appropriate access to the data. Which three steps should you take? (Choose three.)

A.

Load data into different partitions.

B.

Load data into a different dataset for each client.

C.

Put each client’s BigQuery dataset into a different table.

D.

Restrict a client’s dataset to approved users.

E.

Only allow a service account to access the datasets.

F.

Use the appropriate identity and access management (IAM) roles for each client’s users.

You are building new real-time data warehouse for your company and will use Google BigQuery streaming inserts. There is no guarantee that data will only be sent in once but you do have a unique ID for each row of data and an event timestamp. You want to ensure that duplicates are not included while interactively querying data. Which query type should you use?

A.

Include ORDER BY DESK on timestamp column and LIMIT to 1.

B.

Use GROUP BY on the unique ID column and timestamp column and SUM on the values.

C.

Use the LAG window function with PARTITION by unique ID along with WHERE LAG IS NOT NULL.

D.

Use the ROW_NUMBER window function with PARTITION by unique ID along with WHERE row equals 1.

Your company is in a highly regulated industry. One of your requirements is to ensure individual users have access only to the minimum amount of information required to do their jobs. You want to enforce this requirement with Google BigQuery. Which three approaches can you take? (Choose three.)

A.

Disable writes to certain tables.

B.

Restrict access to tables by role.

C.

Ensure that the data is encrypted at all times.

D.

Restrict BigQuery API access to approved users.

E.

Segregate data across multiple tables or databases.

F.

Use Google Stackdriver Audit Logging to determine policy violations.

Your company is streaming real-time sensor data from their factory floor into Bigtable and they have noticed extremely poor performance. How should the row key be redesigned to improve Bigtable performance on queries that populate real-time dashboards?

A.

Use a row key of the form .

B.

Use a row key of the form .

C.

Use a row key of the form #.

D.

Use a row key of the form >##.

Your weather app queries a database every 15 minutes to get the current temperature. The frontend is powered by Google App Engine and server millions of users. How should you design the frontend to respond to a database failure?

A.

Issue a command to restart the database servers.

B.

Retry the query with exponential backoff, up to a cap of 15 minutes.

C.

Retry the query every second until it comes back online to minimize staleness of data.

D.

Reduce the query frequency to once every hour until the database comes back online.

You have Google Cloud Dataflow streaming pipeline running with a Google Cloud Pub/Sub subscription as the source. You need to make an update to the code that will make the new Cloud Dataflow pipeline incompatible with the current version. You do not want to lose any data when making this update. What should you do?

A.

Update the current pipeline and use the drain flag.

B.

Update the current pipeline and provide the transform mapping JSON object.

C.

Create a new pipeline that has the same Cloud Pub/Sub subscription and cancel the old pipeline.

D.

Create a new pipeline that has a new Cloud Pub/Sub subscription and cancel the old pipeline.

Your company is performing data preprocessing for a learning algorithm in Google Cloud Dataflow. Numerous data logs are being are being generated during this step, and the team wants to analyze them. Due to the dynamic nature of the campaign, the data is growing exponentially every hour.

The data scientists have written the following code to read the data for a new key features in the logs.

BigQueryIO.Read

.named(“ReadLogData”)

.from(“clouddataflow-readonly:samples.log_data”)

You want to improve the performance of this data read. What should you do?

A.

Specify the TableReference object in the code.

B.

Use .fromQuery operation to read specific fields from the table.

C.

Use of both the Google BigQuery TableSchema and TableFieldSchema classes.

D.

Call a transform that returns TableRow objects, where each element in the PCollexction represents a single row in the table.

You create an important report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. You notice that visualizations are not showing data that is less than 1 hour old. What should you do?

A.

Disable caching by editing the report settings.

B.

Disable caching in BigQuery by editing table details.

C.

Refresh your browser tab showing the visualizations.

D.

Clear your browser history for the past hour then reload the tab showing the virtualizations.

Your company’s on-premises Apache Hadoop servers are approaching end-of-life, and IT has decided to migrate the cluster to Google Cloud Dataproc. A like-for-like migration of the cluster would require 50 TB of Google Persistent Disk per node. The CIO is concerned about the cost of using that much block storage. You want to minimize the storage cost of the migration. What should you do?

A.

Put the data into Google Cloud Storage.

B.

Use preemptible virtual machines (VMs) for the Cloud Dataproc cluster.

C.

Tune the Cloud Dataproc cluster so that there is just enough disk for all data.

D.

Migrate some of the cold data into Google Cloud Storage, and keep only the hot data in Persistent Disk.

Business owners at your company have given you a database of bank transactions. Each row contains the user ID, transaction type, transaction location, and transaction amount. They ask you to investigate what type of machine learning can be applied to the data. Which three machine learning applications can you use? (Choose three.)

A.

Supervised learning to determine which transactions are most likely to be fraudulent.

B.

Unsupervised learning to determine which transactions are most likely to be fraudulent.

C.

Clustering to divide the transactions into N categories based on feature similarity.

D.

Supervised learning to predict the location of a transaction.

E.

Reinforcement learning to predict the location of a transaction.

F.

Unsupervised learning to predict the location of a transaction.