
Google Associate-Data-Practitioner - Google Cloud Associate Data Practitioner (ADP Exam)

Your data science team needs to collaboratively analyze a 25 TB BigQuery dataset to support the development of a machine learning model. You want to use Colab Enterprise notebooks while ensuring efficient data access and minimizing cost. What should you do?

A.

Export the BigQuery dataset to Google Drive. Load the dataset into the Colab Enterprise notebook using Pandas.

B.

Use BigQuery magic commands within a Colab Enterprise notebook to query and analyze the data.

C.

Create a Dataproc cluster connected to a Colab Enterprise notebook, and use Spark to process the data in BigQuery.

D.

Copy the BigQuery dataset to the local storage of the Colab Enterprise runtime, and analyze the data using Pandas.
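Note: as background for option B, the two notebook cells sketched below show how the BigQuery cell magic (bundled with the google-cloud-bigquery library) runs the query inside BigQuery and returns only a small aggregated result to the notebook; the project, dataset, and column names are hypothetical placeholders, not part of the question.

    %load_ext google.cloud.bigquery

    %%bigquery purchase_summary --project my-analytics-project
    SELECT customer_id, COUNT(*) AS purchase_count
    FROM `my-analytics-project.sales.transactions`
    GROUP BY customer_id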

You are working on a data pipeline that will validate and clean incoming data before loading it into BigQuery for real-time analysis. You want to ensure that the data validation and cleaning is performed efficiently and can handle high volumes of data. What should you do?

A.

Write custom scripts in Python to validate and clean the data outside of Google Cloud. Load the cleaned data into BigQuery.

B.

Use Cloud Run functions to trigger data validation and cleaning routines when new data arrives in Cloud Storage.

C.

Use Dataflow to create a streaming pipeline that includes validation and transformation steps.

D.

Load the raw data into BigQuery using Cloud Storage as a staging area, and use SQL queries in BigQuery to validate and clean the data.
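Note: to make the streaming option concrete, here is a minimal Apache Beam sketch of a pipeline that cleans and validates records on the way into BigQuery; the subscription, table, and field names are hypothetical, and the validation rule is purely illustrative.

    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def parse_and_clean(message_bytes):
        # Decode the Pub/Sub payload and normalize a field before loading.
        record = json.loads(message_bytes.decode("utf-8"))
        record["email"] = record.get("email", "").strip().lower()
        return record

    def is_valid(record):
        # Keep only records that carry the fields downstream analysis needs.
        return bool(record.get("user_id")) and "@" in record.get("email", "")

    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                subscription="projects/my-project/subscriptions/raw-events")
            | "ParseAndClean" >> beam.Map(parse_and_clean)
            | "DropInvalid" >> beam.Filter(is_valid)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "my-project:analytics.clean_events",  # assumes the table exists
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
        )

The same pipeline code can be run on Dataflow by selecting the DataflowRunner in the pipeline options.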

Your company’s ecommerce website collects product reviews from customers. The reviews are loaded as CSV files daily to a Cloud Storage bucket. The reviews are in multiple languages and need to be translated to Spanish. You need to configure a pipeline that is serverless, efficient, and requires minimal maintenance. What should you do?

A.

Load the data into BigQuery using Dataproc. Use Apache Spark to translate the reviews by invoking the Cloud Translation API. Set BigQuery as the sink.

B.

Use a Dataflow template pipeline to translate the reviews using the Cloud Translation API. Set BigQuery as the sink.

C.

Load the data into BigQuery using a Cloud Run function. Use the BigQuery ML CREATE MODEL statement to train a translation model. Use the model to translate the product reviews within BigQuery.

D.

Load the data into BigQuery using a Cloud Run function. Create a BigQuery remote function that invokes the Cloud Translation API. Use a scheduled query to translate new reviews.
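Note: several of the options above hinge on calling the Cloud Translation API; a minimal sketch with the basic (v2) Python client follows, where the sample review text is a placeholder.

    from google.cloud import translate_v2 as translate

    client = translate.Client()

    review_text = "Produit excellent, livraison rapide."  # placeholder review text
    result = client.translate(review_text, target_language="es")

    # The API detects the source language and returns the Spanish translation.
    print(result["detectedSourceLanguage"], "->", result["translatedText"])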

Your retail organization stores sensitive application usage data in Cloud Storage. You need to encrypt the data without the operational overhead of managing encryption keys. What should you do?

A.

Use Google-managed encryption keys (GMEK).

B.

Use customer-managed encryption keys (CMEK).

C.

Use customer-supplied encryption keys (CSEK).

D.

Use customer-supplied encryption keys (CSEK) for the sensitive data and customer-managed encryption keys (CMEK) for the less sensitive data.

You are constructing a data pipeline to process sensitive customer data stored in a Cloud Storage bucket. You need to ensure that this data remains accessible, even in the event of a single-zone outage. What should you do?

A.

Set up a Cloud CDN in front of the bucket.

B.

Enable Object Versioning on the bucket.

C.

Store the data in a multi-region bucket.

D.

Store the data in Nearline storage.
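Note: for context on the multi-region option, a bucket's redundancy is fixed by the location chosen at creation time; the sketch below creates a multi-region bucket with the Cloud Storage Python client, using a hypothetical bucket name.

    from google.cloud import storage

    client = storage.Client()

    # "US" is a multi-region location: objects are stored redundantly across
    # geographically separated regions, so a single-zone outage does not make
    # the data unavailable. The bucket name is a hypothetical placeholder.
    bucket = client.create_bucket("sensitive-customer-data-bucket", location="US")
    print(bucket.name, bucket.location)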

You have a Dataproc cluster that performs batch processing on data stored in Cloud Storage. You need to schedule a daily Spark job to generate a report that will be emailed to stakeholders. You need a fully-managed solution that is easy to implement and minimizes complexity. What should you do?

A.

Use Cloud Composer to orchestrate the Spark job and email the report.

B.

Use Dataproc workflow templates to define and schedule the Spark job, and to email the report.

C.

Use Cloud Run functions to trigger the Spark job and email the report.

D.

Use Cloud Scheduler to trigger the Spark job, and use Cloud Run functions to email the report.
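Note: as an illustration of the orchestration style named in option A, the sketch below is a minimal Cloud Composer (Airflow) DAG that submits a Spark job to an existing Dataproc cluster and then emails a report link; every identifier (project, region, cluster, bucket, email address, main class) is a hypothetical placeholder.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.email import EmailOperator
    from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

    SPARK_JOB = {
        "reference": {"project_id": "my-project"},           # hypothetical
        "placement": {"cluster_name": "reporting-cluster"},  # hypothetical
        "spark_job": {
            "main_class": "com.example.DailyReport",         # hypothetical
            "jar_file_uris": ["gs://my-bucket/jars/daily-report.jar"],
        },
    }

    with DAG(
        dag_id="daily_spark_report",
        schedule_interval="@daily",
        start_date=datetime(2024, 1, 1),
        catchup=False,
    ) as dag:
        run_report = DataprocSubmitJobOperator(
            task_id="run_spark_report",
            job=SPARK_JOB,
            region="us-central1",
            project_id="my-project",
        )
        send_report = EmailOperator(
            task_id="email_report",
            to="stakeholders@example.com",
            subject="Daily Spark report",
            html_content="The daily report is available at gs://my-bucket/reports/latest/",
        )
        run_report >> send_report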

Your company wants to implement a data transformation (ETL) pipeline for their BigQuery data warehouse. You need to identify a managed transformation solution that allows users to develop with SQL and JavaScript, has version control, allows for modular code, and has data quality checks. What should you do?

A.

Create a Cloud Composer environment, and orchestrate the transformations by using the BigQueryInsertJob operator.

B.

Create BigQuery scheduled queries to define the transformations in SQL.

C.

Use Dataform to define the transformations in SQLX.

D.

Use Dataproc to create an Apache Spark cluster and implement the transformations by using PySpark SQL.

Your retail company wants to predict customer churn using historical purchase data stored in BigQuery. The dataset includes customer demographics, purchase history, and a label indicating whether the customer churned or not. You want to build a machine learning model to identify customers at risk of churning. You need to create and train a logistic regression model for predicting customer churn, using the customer_data table with the churned column as the target label. Which BigQuery ML query should you use?

[The four answer options are shown as images of BigQuery ML queries and are not reproduced here.]

A.

Option A

B.

Option B

C.

Option C

D.

Option D
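Note: since the option images are not reproduced, the general shape of a BigQuery ML logistic regression model for this scenario is sketched below, submitted through the Python client; customer_data and churned come from the question, while the mydataset qualifier and model name are hypothetical placeholders.

    from google.cloud import bigquery

    client = bigquery.Client()

    # model_type='logistic_reg' trains a logistic regression model, and
    # input_label_cols names the target column; the remaining columns of
    # customer_data are used as features. `mydataset` is a placeholder.
    create_model_sql = """
        CREATE OR REPLACE MODEL `mydataset.churn_model`
        OPTIONS (
          model_type = 'logistic_reg',
          input_label_cols = ['churned']
        ) AS
        SELECT *
        FROM `mydataset.customer_data`
    """
    client.query(create_model_sql).result()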

Your organization sends IoT event data to a Pub/Sub topic. Subscriber applications read and perform transformations on the messages before storing them in the data warehouse. During particularly busy times when more data is being written to the topic, you notice that the subscriber applications are not acknowledging messages within the deadline. You need to modify your pipeline to handle these activity spikes and continue to process the messages. What should you do?

A.

Retry messages until they are acknowledged.

B.

Implement flow control on the subscribers.

C.

Forward unacknowledged messages to a dead-letter topic.

D.

Seek back to the last acknowledged message.
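Note: as background for the flow-control option, the Pub/Sub Python client lets a streaming-pull subscriber cap how many messages it holds at once; the project, subscription, and callback logic below are hypothetical placeholders.

    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path("my-project", "iot-events-sub")

    def callback(message):
        # Validate/transform the event here, then ack so it is not redelivered.
        print("processing", message.data)
        message.ack()

    # Flow control caps how many messages this subscriber holds at once, so
    # traffic spikes queue up in Pub/Sub instead of pushing the subscriber
    # past its acknowledgement deadline.
    flow_control = pubsub_v1.types.FlowControl(max_messages=500)
    streaming_pull = subscriber.subscribe(
        subscription_path, callback=callback, flow_control=flow_control
    )
    # streaming_pull.result() would block here to keep the subscriber running.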

You have a Cloud SQL for PostgreSQL database that stores sensitive historical financial data. You need to ensure that the data is uncorrupted and recoverable in the event that the primary region is destroyed. The data is valuable, so you need to prioritize recovery point objective (RPO) over recovery time objective (RTO). You want to recommend a solution that minimizes latency for primary read and write operations. What should you do?

A.

Configure the Cloud SQL for PostgreSQL instance for multi-region backup locations.

B.

Configure the Cloud SQL for PostgreSQL instance for regional availability (HA). Back up the Cloud SQL for PostgreSQL database hourly to a Cloud Storage bucket in a different region.

C.

Configure the Cloud SQL for PostgreSQL instance for regional availability (HA) with synchronous replication to a secondary instance in a different zone.

D.

Configure the Cloud SQL for PostgreSQL instance for regional availability (HA) with asynchronous replication to a secondary instance in a different region.