Databricks Databricks-Machine-Learning-Associate - Databricks Certified Machine Learning Associate Exam

Total 74 questions

Question # 1

A data scientist has developed a machine learning pipeline with a static input data set using Spark ML, but the pipeline is taking too long to process. They increase the number of workers in the cluster to get the pipeline to run more efficiently. They notice that the number of rows in the training set after reconfiguring the cluster is different from the number of rows in the training set prior to reconfiguring the cluster.

Which of the following approaches will guarantee a reproducible training and test set for each model?

Manually configure the cluster

Write out the split data sets to persistent storage

Set a seed in the data splitting operation

Manually partition the input data
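
As a plain-Python illustration of the persistence option (not Spark code), the sketch below splits a list of rows once and writes both parts to disk, so that later runs reload the same rows instead of splitting again. The helper names `split_rows`, `persist_split`, and `load_split` and the JSON file layout are assumptions made for this example; in Spark, the analogous step would be writing the two split DataFrames to storage and reading them back.

```python
import json
import random
import tempfile
from pathlib import Path


def split_rows(rows, train_fraction, seed):
    """Shuffle a copy of rows with a seeded RNG and cut it into train/test."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def persist_split(train, test, directory):
    """Write both parts of the split to JSON files (hypothetical layout)."""
    directory = Path(directory)
    (directory / "train.json").write_text(json.dumps(train))
    (directory / "test.json").write_text(json.dumps(test))


def load_split(directory):
    """Reload the stored split without re-running the split logic."""
    directory = Path(directory)
    train = json.loads((directory / "train.json").read_text())
    test = json.loads((directory / "test.json").read_text())
    return train, test


rows = list(range(100))
train, test = split_rows(rows, train_fraction=0.8, seed=42)

with tempfile.TemporaryDirectory() as tmp:
    persist_split(train, test, tmp)
    reloaded_train, reloaded_test = load_split(tmp)
```

Once the split is stored, every model reads identical rows regardless of how the compute environment is configured afterwards.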

Question # 2

A data scientist wants to use Spark ML to one-hot encode the categorical features in their PySpark DataFrame features_df. A list of the names of the string columns is assigned to the input_columns variable.

They have developed this code block to accomplish this task:

The code block is returning an error.

Which of the following adjustments does the data scientist need to make to accomplish this task?

They need to specify the method parameter to the OneHotEncoder.

They need to remove the line with the fit operation.

They need to use StringIndexer prior to one-hot encoding the features.

They need to use VectorAssembler prior to one-hot encoding the features.
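
Spark ML's OneHotEncoder operates on numeric category indices rather than raw strings, which is the background for the StringIndexer option. The two-step idea can be sketched in plain Python as below. This is not Spark ML's implementation: the function names `fit_string_index` and `one_hot` are hypothetical, and the alphabetical index order is a choice made only for this sketch.

```python
def fit_string_index(values):
    """Assign each distinct string an integer index (alphabetical order)."""
    return {label: idx for idx, label in enumerate(sorted(set(values)))}


def one_hot(index, size):
    """Return a list of length `size` with a 1 at position `index`."""
    return [1 if i == index else 0 for i in range(size)]


colors = ["red", "green", "blue", "green"]

# Step 1: map strings to integer indices.
label_to_index = fit_string_index(colors)
indexed = [label_to_index[c] for c in colors]

# Step 2: one-hot encode the integer indices.
encoded = [one_hot(i, len(label_to_index)) for i in indexed]
```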

Question # 3

Which statement describes a Spark ML transformer?

A transformer is an algorithm which can transform one DataFrame into another DataFrame

A transformer is a hyperparameter grid that can be used to train a model

A transformer chains multiple algorithms together to transform an ML workflow

A transformer is a learning algorithm that can use a DataFrame to train a model

Question # 4

Which of the following tools can be used to distribute large-scale feature engineering without the use of a UDF or pandas Function API for machine learning pipelines?

Keras

pandas

PyTorch

Spark ML

Scikit-learn

Question # 5

A data scientist wants to use Spark ML to impute missing values in their PySpark DataFrame features_df. They want to replace missing values in all numeric columns in features_df with each respective numeric column's median value.

They have developed the following code block to accomplish this task:

The code block is not accomplishing the task.

Which reason describes why the code block is not accomplishing the imputation task?

It does not impute both the training and test data sets.

The inputCols and outputCols need to be exactly the same.

The fit method needs to be called instead of transform.

It does not fit the imputer on the data to create an ImputerModel.
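
The options above turn on the difference between fitting an imputer and transforming data with the fitted result. A minimal plain-Python sketch of that fit-then-transform pattern follows. The classes `MedianImputer` and `MedianImputerModel` are hypothetical stand-ins written for this example, not Spark ML's `Imputer` and `ImputerModel`, and rows are represented as dictionaries with `None` for missing values.

```python
from statistics import median


class MedianImputerModel:
    """Fitted object: holds the learned medians and fills missing values."""

    def __init__(self, medians):
        self.medians = medians

    def transform(self, rows):
        filled = []
        for row in rows:
            new_row = dict(row)  # leave the input rows untouched
            for col, value in self.medians.items():
                if new_row[col] is None:
                    new_row[col] = value
            filled.append(new_row)
        return filled


class MedianImputer:
    """Unfitted object: knows which columns to impute, but no medians yet."""

    def __init__(self, input_cols):
        self.input_cols = input_cols

    def fit(self, rows):
        medians = {}
        for col in self.input_cols:
            observed = [row[col] for row in rows if row[col] is not None]
            medians[col] = median(observed)
        return MedianImputerModel(medians)


rows = [
    {"age": 30, "income": 50.0},
    {"age": None, "income": 70.0},
    {"age": 50, "income": None},
    {"age": 40, "income": 90.0},
]

model = MedianImputer(["age", "income"]).fit(rows)  # learn the medians
imputed = model.transform(rows)                     # apply them
```

In this sketch the unfitted object has no `transform` method at all, which makes the two-step order explicit: the medians must be learned before any value can be replaced.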

Question # 6

A data scientist has written a data cleaning notebook that utilizes the pandas library, but their colleague has suggested that they refactor their notebook to scale with big data.

Which of the following approaches can the data scientist take to spend the least amount of time refactoring their notebook to scale with big data?

They can refactor their notebook to process the data in parallel.

They can refactor their notebook to use the PySpark DataFrame API.

They can refactor their notebook to use the Scala Dataset API.

They can refactor their notebook to use Spark SQL.

They can refactor their notebook to utilize the pandas API on Spark.

Question # 7

A data scientist has written a feature engineering notebook that utilizes the pandas library. As the size of the data processed by the notebook increases, the notebook's runtime increases drastically and processing becomes slow.

Which of the following tools can the data scientist use to spend the least amount of time refactoring their notebook to scale with big data?

PySpark DataFrame API

pandas API on Spark

Spark SQL

Feature Store

Question # 8

Which of the following hyperparameter optimization methods automatically makes informed selections of hyperparameter values based on previous trials for each iterative model evaluation?

Random Search

Halving Random Search

Tree of Parzen Estimators

Grid Search

Question # 9

A data scientist has produced two models for a single machine learning problem. One of the models performs well when one of the features has a value of less than 5, and the other model performs well when the value of that feature is greater than or equal to 5. The data scientist decides to combine the two models into a single machine learning solution.

Which of the following terms is used to describe this combination of models?

Bootstrap aggregation

Support vector machines

Bucketing

Ensemble learning

Stacking
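
The combination described in this question can be sketched as a router that sends each input to whichever model performs well in its range. The two placeholder models and their coefficients below are invented for illustration; only the threshold of 5 comes from the question text.

```python
def model_low(x):
    """Placeholder for the model that performs well when the feature is < 5."""
    return 2.0 * x


def model_high(x):
    """Placeholder for the model that performs well when the feature is >= 5."""
    return 10.0 + 0.5 * x


def combined_predict(x, threshold=5):
    """Route each input to the model suited to its feature range."""
    if x < threshold:
        return model_low(x)
    return model_high(x)


predictions = [combined_predict(x) for x in (3, 5, 8)]
```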

Question # 10

A data scientist has replaced missing values in their feature set with each respective feature variable's median value. A colleague suggests that the data scientist is throwing away valuable information by doing this.

Which of the following approaches can they take to include as much information as possible in the feature set?

Impute the missing values using each respective feature variable's mean value instead of the median value

Refrain from imputing the missing values in favor of letting the machine learning algorithm determine how to handle them

Remove all feature variables that originally contained missing values from the feature set

Create a binary feature variable for each feature that contained missing values indicating whether each row's value has been imputed

Create a constant feature variable for each feature that contained missing values indicating the percentage of rows from the feature that was originally missing
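
The binary-indicator option can be sketched in plain Python as follows: alongside the median-imputed values, a second column records which rows were originally missing. The function name `impute_with_indicator` and the sample values are assumptions made for this example.

```python
from statistics import median


def impute_with_indicator(values):
    """Fill missing values with the median and flag which rows were filled."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    was_missing = [1 if v is None else 0 for v in values]
    return imputed, was_missing


values = [3.0, None, 7.0, 5.0, None]
imputed, was_missing = impute_with_indicator(values)
```

The indicator column keeps the fact that a value was absent, which the imputed column alone no longer shows.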
