Google Professional-Machine-Learning-Engineer - Google Professional Machine Learning Engineer

Total 285 questions

You work for a retail company. You have created a Vertex AI forecast model that produces monthly item sales predictions. You want to quickly create a report that will help to explain how the model calculates the predictions. You have one month of recent actual sales data that was not included in the training dataset. How should you generate data for your report?

Create a batch prediction job by using the actual sales data. Compare the predictions to the actuals in the report.

Create a batch prediction job by using the actual sales data, and configure the job settings to generate feature attributions. Compare the results in the report.

Generate counterfactual examples by using the actual sales data. Create a batch prediction job by using the actual sales data and the counterfactual examples. Compare the results in the report.

Train another model by using the same training dataset as the original, and exclude some columns. Using the actual sales data, create one batch prediction job by using the new model and another one with the original model. Compare the two sets of predictions in the report.

Question # 42

You developed a custom model by using Vertex AI to predict your application's user churn rate. You are using Vertex AI Model Monitoring for skew detection. The training data stored in BigQuery contains two sets of features: demographic and behavioral. You later discover that two separate models trained on each set perform better than the original model.

You need to configure a new model monitoring pipeline that splits traffic among the two models. You want to use the same prediction-sampling-rate and monitoring-frequency for each model. You also want to minimize management effort. What should you do?

Keep the training dataset as is. Deploy the models to two separate endpoints, and submit two Vertex AI Model Monitoring jobs with appropriately selected feature-thresholds parameters.

Keep the training dataset as is. Deploy both models to the same endpoint, and submit a Vertex AI Model Monitoring job with a monitoring-config-from parameter that accounts for the model IDs and feature selections.

Separate the training dataset into two tables based on demographic and behavioral features. Deploy the models to two separate endpoints, and submit two Vertex AI Model Monitoring jobs.

Separate the training dataset into two tables based on demographic and behavioral features. Deploy both models to the same endpoint, and submit a Vertex AI Model Monitoring job with a monitoring-config-from parameter that accounts for the model IDs and training datasets.

Question # 43

Your data science team is training a PyTorch model for image classification based on a pre-trained ResNet model. You need to perform hyperparameter tuning to optimize for several parameters. What should you do?

Convert the model to a Keras model, and run a Keras Tuner job.

Run a hyperparameter tuning job on AI Platform using custom containers.

Create a Kubeflow Pipelines instance, and run a hyperparameter tuning job on Katib.

Convert the model to a TensorFlow model, and run a hyperparameter tuning job on AI Platform.

Question # 44

You developed a BigQuery ML linear regressor model by using a training dataset stored in a BigQuery table. New data is added to the table every minute. You are using Cloud Scheduler and Vertex AI Pipelines to automate hourly model training, and you use the model for direct inference. The feature preprocessing logic includes quantile bucketization and MinMax scaling on data received in the last hour. You want to minimize storage and computational overhead. What should you do?

Create a component in the Vertex AI Pipelines directed acyclic graph (DAG) to calculate the required statistics, and pass the statistics on to subsequent components.

Preprocess and stage the data in BigQuery prior to feeding it to the model during training and inference.

Create SQL queries to calculate and store the required statistics in separate BigQuery tables that are referenced in the CREATE MODEL statement.

Use the TRANSFORM clause in the CREATE MODEL statement in the SQL query to calculate the required statistics.
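For readers unfamiliar with the TRANSFORM clause mentioned in the last option, the following is a minimal sketch of what such a statement could look like. It only builds the SQL text in Python and does not submit it. The model name, table name, and column names (price, quantity, label, event_time) are hypothetical and are not taken from the question.

```python
def build_create_model_sql(model: str, table: str) -> str:
    """Return a BigQuery ML CREATE MODEL statement that keeps preprocessing in TRANSFORM.

    All identifiers (columns price, quantity, label, event_time) are
    illustrative assumptions, not names from the question.
    """
    return f"""
CREATE OR REPLACE MODEL `{model}`
TRANSFORM (
  ML.QUANTILE_BUCKETIZE(price, 10) OVER () AS price_bucket,
  ML.MIN_MAX_SCALER(quantity) OVER () AS quantity_scaled,
  label
)
OPTIONS (model_type = 'LINEAR_REG', input_label_cols = ['label'])
AS
SELECT price, quantity, label
FROM `{table}`
WHERE event_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
""".strip()


if __name__ == "__main__":
    # Hypothetical fully qualified names, used only to render the statement.
    print(build_create_model_sql("myproject.mydataset.sales_model",
                                 "myproject.mydataset.sales"))
```

In this form, the statistics needed for bucketization and scaling are computed during training and stored with the model, so no separate statistics table or staged copy of the data is required for the sketch to work.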

Question # 45

You need to execute a batch prediction on 100 million records in a BigQuery table with a custom TensorFlow DNN regressor model, and then store the predicted results in a BigQuery table. You want to minimize the effort required to build this inference pipeline. What should you do?

Import the TensorFlow model with BigQuery ML, and run the ml.predict function.

Use the TensorFlow BigQuery reader to load the data, and use the BigQuery API to write the results to BigQuery.

Create a Dataflow pipeline to convert the data in BigQuery to TFRecords. Run a batch inference on Vertex AI Prediction, and write the results to BigQuery.

Load the TensorFlow SavedModel in a Dataflow pipeline. Use the BigQuery I/O connector with a custom function to perform the inference within the pipeline, and write the results to BigQuery.

Explanation:

Option A is correct because importing the TensorFlow model with BigQuery ML and running the ml.predict function is the easiest way to execute a batch prediction on a large BigQuery table with a custom TensorFlow model and store the predicted results in another BigQuery table. BigQuery ML allows you to import TensorFlow models that are stored in Cloud Storage and use them for prediction with SQL queries [1]. The ml.predict function returns a table with the predicted values, which can be saved to another BigQuery table [2].

Option B is incorrect because using the TensorFlow BigQuery reader to load the data, and using the BigQuery API to write the results to BigQuery, requires more effort to build the inference pipeline than option A. The TensorFlow BigQuery reader is a way to read data from BigQuery into TensorFlow datasets, which can be used for training or prediction [3]. However, this option also requires writing code to load the TensorFlow model, run the prediction, and use the BigQuery API to write the results back to BigQuery [4].

Option C is incorrect because creating a Dataflow pipeline to convert the data in BigQuery to TFRecords, running a batch inference on Vertex AI Prediction, and writing the results to BigQuery requires more effort to build the inference pipeline than option A. Dataflow is a service for creating and running data processing pipelines, such as ETL (extract, transform, load) or batch processing [5]. Vertex AI Prediction is a service for deploying and serving ML models for online or batch prediction. However, this option also requires writing code to create the Dataflow pipeline, convert the data to TFRecords, run the batch inference, and write the results to BigQuery.

Option D is incorrect because loading the TensorFlow SavedModel in a Dataflow pipeline, using the BigQuery I/O connector with a custom function to perform the inference within the pipeline, and writing the results to BigQuery requires more effort to build the inference pipeline than option A. The BigQuery I/O connector is a way to read and write data from BigQuery within a Dataflow pipeline. However, this option also requires writing code to load the TensorFlow SavedModel, create the custom function for inference, and write the results to BigQuery.
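The two SQL statements behind option A can be sketched as follows. This is a minimal illustration that only builds the statement text in Python; the model, table, and Cloud Storage names are hypothetical, and the statements would still need to be run in BigQuery.

```python
def build_import_sql(model: str, saved_model_uri: str) -> str:
    """Statement that imports a TensorFlow SavedModel from Cloud Storage into BigQuery ML."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'TENSORFLOW', model_path = '{saved_model_uri}')"
    )


def build_predict_sql(model: str, source_table: str, dest_table: str) -> str:
    """Statement that scores every row of source_table and stores the result in dest_table."""
    return (
        f"CREATE OR REPLACE TABLE `{dest_table}` AS\n"
        f"SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`, TABLE `{source_table}`)"
    )


if __name__ == "__main__":
    # All names below are placeholders chosen for illustration only.
    print(build_import_sql("myproject.mydataset.dnn_regressor",
                           "gs://my-bucket/saved_model/*"))
    print(build_predict_sql("myproject.mydataset.dnn_regressor",
                            "myproject.mydataset.records",
                            "myproject.mydataset.predictions"))
```

Both steps stay inside BigQuery, which is why this option needs no separate pipeline code for reading, scoring, or writing.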

References:

1. Importing models into BigQuery ML
2. Using imported models for prediction
3. TensorFlow BigQuery reader
4. BigQuery API
5. Dataflow overview
6. Vertex AI Prediction overview
7. Batch prediction with Dataflow
8. BigQuery I/O connector
9. Using TensorFlow models in Dataflow

Question # 46

You work for a company that is developing a new video streaming platform. You have been asked to create a recommendation system that will suggest the next video for a user to watch. After a review by an AI Ethics team, you are approved to start development. Each video asset in your company's catalog has useful metadata (e.g., content type, release date, country), but you do not have any historical user event data. How should you build the recommendation system for the first version of the product?

Launch the product without machine learning. Present videos to users alphabetically, and start collecting user event data so you can develop a recommender model in the future.

Launch the product without machine learning. Use simple heuristics based on content metadata to recommend similar videos to users, and start collecting user event data so you can develop a recommender model in the future.

Launch the product with machine learning. Use a publicly available dataset such as MovieLens to train a model using the Recommendations AI, and then apply this trained model to your data.

Launch the product with machine learning. Generate embeddings for each video by training an autoencoder on the content metadata using TensorFlow. Cluster content based on the similarity of these embeddings, and then recommend videos from the same cluster.

Question # 47

You recently trained an XGBoost model that you plan to deploy to production for online inference. Before sending a predict request to your model's binary, you need to perform a simple data preprocessing step. This step exposes a REST API that accepts requests in your internal VPC Service Controls and returns predictions. You want to configure this preprocessing step while minimizing cost and effort. What should you do?

Store a pickled model in Cloud Storage. Build a Flask-based app, package the app in a custom container image, and deploy the model to Vertex AI Endpoints.

Build a Flask-based app, package the app and a pickled model in a custom container image, and deploy the model to Vertex AI Endpoints.

Build a custom predictor class based on XGBoost Predictor from the Vertex AI SDK, package it and a pickled model in a custom container image based on a Vertex built-in image, and deploy the model to Vertex AI Endpoints.

Build a custom predictor class based on XGBoost Predictor from the Vertex AI SDK, and package the handler in a custom container image based on a Vertex built-in container image. Store a pickled model in Cloud Storage, and deploy the model to Vertex AI Endpoints.

Question # 48

You are experimenting with a built-in distributed XGBoost model in Vertex AI Workbench user-managed notebooks. You use BigQuery to split your data into training and validation sets using the following queries:

CREATE OR REPLACE TABLE `myproject.mydataset.training` AS

(SELECT * FROM `myproject.mydataset.mytable` WHERE RAND() <= 0.8);

CREATE OR REPLACE TABLE `myproject.mydataset.validation` AS

(SELECT * FROM `myproject.mydataset.mytable` WHERE RAND() <= 0.2);

After training the model, you achieve an area under the receiver operating characteristic curve (AUC ROC) value of 0.8, but after deploying the model to production, you notice that your model performance has dropped to an AUC ROC value of 0.65. What problem is most likely occurring?

There is training-serving skew in your production environment.

There is not a sufficient amount of training data.

The tables that you created to hold your training and validation records share some records, and you may not be using all the data in your initial table.

The RAND() function generated a number that is less than 0.2 in both instances, so every record in the validation table will also be in the training table.
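The behavior of the two queries can be explored with a small simulation. The sketch below is plain Python, not BigQuery, and assumes that each query draws a fresh, independent random number for every row, as RAND() does. The row count and seed are arbitrary choices for illustration.

```python
import random


def simulate_split(n_rows: int, seed: int = 0) -> dict:
    """Mimic two independent `WHERE RAND() <= p` filters over the same table."""
    rng = random.Random(seed)
    # First query: each row gets its own random draw and is kept with probability 0.8.
    training = {i for i in range(n_rows) if rng.random() <= 0.8}
    # Second query: a new, unrelated draw per row, kept with probability 0.2.
    validation = {i for i in range(n_rows) if rng.random() <= 0.2}
    return {
        "training": len(training),
        "validation": len(validation),
        "in_both": len(training & validation),
        "in_neither": n_rows - len(training | validation),
    }


if __name__ == "__main__":
    print(simulate_split(100_000))
```

Because the two filters are independent, roughly 0.8 × 0.2 = 16% of rows are expected to land in both tables, and roughly 0.2 × 0.8 = 16% in neither. A split that assigns each row exactly once, for example by hashing a stable key, avoids both effects.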

Question # 49

You have built a model that is trained on data stored in Parquet files. You access the data through a Hive table hosted on Google Cloud. You preprocessed this data with PySpark and exported it as a CSV file into Cloud Storage. After preprocessing, you execute additional steps to train and evaluate your model. You want to parametrize this model training in Kubeflow Pipelines. What should you do?

Remove the data transformation step from your pipeline.

Containerize the PySpark transformation step, and add it to your pipeline.

Add a ContainerOp to your pipeline that spins a Dataproc cluster, runs a transformation, and then saves the transformed data in Cloud Storage.

Deploy Apache Spark at a separate node pool in a Google Kubernetes Engine cluster. Add a ContainerOp to your pipeline that invokes a corresponding transformation job for this Spark instance.

Explanation:

The best option for parametrizing the model training in Kubeflow Pipelines is to add a ContainerOp to the pipeline that spins a Dataproc cluster, runs a transformation, and then saves the transformed data in Cloud Storage. This option has the following advantages:

It allows the data transformation to be performed as part of the Kubeflow Pipeline, which can ensure the consistency and reproducibility of the data processing and the model training. By adding a ContainerOp to the pipeline, you can define the parameters and the logic of the data transformation step, and integrate it with the other steps of the pipeline, such as the model training and evaluation.

It leverages the scalability and performance of Dataproc, which is a fully managed service that runs Apache Spark and Apache Hadoop clusters on Google Cloud. By spinning a Dataproc cluster, you can run the PySpark transformation on the Parquet files stored in the Hive table, and take advantage of the parallelism and speed of Spark. Dataproc also supports various features and integrations, such as autoscaling, preemptible VMs, and connectors to other Google Cloud services, that can optimize the data processing and reduce the cost.

It simplifies the data storage and access, as the transformed data is saved in Cloud Storage, which is a scalable, durable, and secure object storage service. By saving the transformed data in Cloud Storage, you can avoid the overhead and complexity of managing the data in the Hive table or the Parquet files. Moreover, you can easily access the transformed data from Cloud Storage, using various tools and frameworks, such as TensorFlow, BigQuery, or Vertex AI.

The other options are less optimal for the following reasons:

Option A: Removing the data transformation step from the pipeline means the transformation cannot be parametrized together with the model training, because the data processing and the model training become decoupled. This option requires running the PySpark transformation separately from the Kubeflow Pipeline, which can introduce inconsistency and irreproducibility in the data processing and the model training. Moreover, this option requires managing the data in the Hive table or the Parquet files, which can be cumbersome and inefficient.

Option B: Containerizing the PySpark transformation step, and adding it to the pipeline introduces additional complexity and overhead. This option requires creating and maintaining a Docker image that can run the PySpark transformation, which can be challenging and time-consuming. Moreover, this option requires running the PySpark transformation on a single container, which can be slow and inefficient, as it does not leverage the parallelism and performance of Spark.

Option D: Deploying Apache Spark at a separate node pool in a Google Kubernetes Engine cluster, and adding a ContainerOp to the pipeline that invokes a corresponding transformation job for this Spark instance introduces additional complexity and cost. This option requires creating and managing a separate node pool in a Google Kubernetes Engine cluster, which is a fully managed service that runs Kubernetes clusters on Google Cloud. Moreover, this option requires deploying and running Apache Spark on the node pool, which can be tedious and costly, as it requires configuring and maintaining the Spark cluster, and paying for the node pool usage.

Question # 50

You are developing ML models with AI Platform for image segmentation on CT scans. You frequently update your model architectures based on the newest available research papers, and have to rerun training on the same dataset to benchmark their performance. You want to minimize computation costs and manual intervention while having version control for your code. What should you do?

Use Cloud Functions to identify changes to your code in Cloud Storage and trigger a retraining job.

Use the gcloud command-line tool to submit training jobs on AI Platform when you update your code.

Use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is pushed to the repository.

Create an automated workflow in Cloud Composer that runs daily and looks for changes in code in Cloud Storage using a sensor.

Explanation:

Developing ML models with AI Platform for image segmentation on CT scans requires a lot of computation and experimentation, as image segmentation is a complex and challenging task that involves assigning a label to each pixel in an image. Image segmentation can be used for various medical applications, such as tumor detection, organ segmentation, or lesion localization [1].

To minimize the computation costs and manual intervention while having version control for the code, use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is pushed to the repository. Cloud Build is a service that executes your builds on Google Cloud Platform infrastructure. Cloud Build can import source code from Cloud Source Repositories, Cloud Storage, GitHub, or Bitbucket, execute a build to your specifications, and produce artifacts such as Docker containers or Java archives [2].

Cloud Build allows you to set up automated triggers that start a build when changes are pushed to a source code repository. You can configure triggers to filter the changes based on the branch, tag, or file path [3].

Cloud Source Repositories is a service that provides fully managed private Git repositories on Google Cloud Platform. Cloud Source Repositories allows you to store, manage, and track your code using the Git version control system. You can also use Cloud Source Repositories to connect to other Google Cloud services, such as Cloud Build, Cloud Functions, or Cloud Run [4].

To use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is pushed to the repository, follow these steps:

Create a Cloud Source Repository for your code, and push your code to the repository. You can use the Cloud SDK, Cloud Console, or Cloud Source Repositories API to create and manage your repository [5].

Create a Cloud Build trigger for your repository, and specify the build configuration and the trigger settings. You can use the Cloud SDK, Cloud Console, or Cloud Build API to create and manage your trigger.

Specify the steps of the build in a YAML or JSON file, such as installing the dependencies, running the tests, building the container image, and submitting the training job to AI Platform. You can also use the Cloud Build predefined or custom build steps to simplify your build configuration.

Push your new code to the repository, and the trigger will start the build automatically. You can monitor the status and logs of the build using the Cloud SDK, Cloud Console, or Cloud Build API.
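The build configuration described in the steps above might look roughly like the sketch below, which assembles a JSON build config in Python. The image name, region, and job-name pattern are assumptions made for illustration; $PROJECT_ID and $SHORT_SHA are Cloud Build substitutions that are filled in when a trigger runs the build.

```python
import json


def build_config(image: str, region: str) -> dict:
    """Assemble a Cloud Build config: build the trainer image, push it, submit training."""
    return {
        "steps": [
            # Build the training container from the repository's Dockerfile.
            {"name": "gcr.io/cloud-builders/docker",
             "args": ["build", "-t", image, "."]},
            # Push the image so the training service can pull it.
            {"name": "gcr.io/cloud-builders/docker",
             "args": ["push", image]},
            # Submit an AI Platform training job that runs the pushed image.
            {"name": "gcr.io/cloud-builders/gcloud",
             "args": ["ai-platform", "jobs", "submit", "training",
                      "train_$SHORT_SHA",
                      "--region", region,
                      "--master-image-uri", image]},
        ],
    }


if __name__ == "__main__":
    # Hypothetical image path and region; write the output to the config file
    # that the trigger is set up to use.
    config = build_config("gcr.io/$PROJECT_ID/trainer:$SHORT_SHA", "us-central1")
    print(json.dumps(config, indent=2))
```

Tagging the image and the job name with the commit hash ties each benchmark run to the exact code version that produced it.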

The other options are not as easy or feasible. Using Cloud Functions to identify changes to your code in Cloud Storage and trigger a retraining job is not ideal, as Cloud Functions has limitations on the memory, CPU, and execution time, and does not provide a user interface for managing and tracking your builds. Using the gcloud command-line tool to submit training jobs on AI Platform when you update your code is not optimal, as it requires manual intervention and does not leverage the benefits of Cloud Build and its integration with Cloud Source Repositories. Creating an automated workflow in Cloud Composer that runs daily and looks for changes in code in Cloud Storage using a sensor is not relevant, as Cloud Composer is mainly designed for orchestrating complex workflows across multiple systems, and does not provide a version control system for your code.

References:

1. Image segmentation
2. Cloud Build overview
3. Creating and managing build triggers
4. Cloud Source Repositories overview
5. Quickstart: Create a repository
6. Quickstart: Create a build trigger
7. Configuring builds
8. Viewing build results
