Databricks Databricks-Machine-Learning-Associate - Databricks Certified Machine Learning Associate Exam


Total 74 questions

Question # 11

Which of the following tools can be used to distribute large-scale feature engineering without the use of a UDF or pandas Function API for machine learning pipelines?

Keras

Scikit-learn

PyTorch

Spark ML

Question # 12

A machine learning engineer has grown tired of needing to install the MLflow Python library on each of their clusters. They ask a senior machine learning engineer how their notebooks can load the MLflow library without installing it each time. The senior machine learning engineer suggests that they use Databricks Runtime for Machine Learning.

Which of the following approaches describes how the machine learning engineer can begin using Databricks Runtime for Machine Learning?

They can add a line enabling Databricks Runtime ML in their init script when creating their clusters.

They can check the Databricks Runtime ML box when creating their clusters.

They can select a Databricks Runtime ML version from the Databricks Runtime Version dropdown when creating their clusters.

They can set the runtime-version variable in their Spark session to "ml".

Question # 13

What is the name of the method that transforms categorical features into a series of binary indicator feature variables?

Leave-one-out encoding

Target encoding

One-hot encoding

Categorical

String indexing
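A minimal pure-Python sketch of one-hot encoding, assuming a hypothetical list of color values: each distinct category becomes its own 0/1 indicator column, and exactly one indicator is set per row. This is only an illustration of the idea; in Spark ML the usual route is `StringIndexer` followed by `OneHotEncoder`, which this sketch does not use.

```python
def one_hot_encode(values):
    """Map each categorical value to a list of 0/1 indicator features.

    Categories are sorted so the indicator columns have a stable order.
    """
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one indicator is "hot" per row
        encoded.append(row)
    return categories, encoded


# Hypothetical example data
categories, rows = one_hot_encode(["red", "green", "red", "blue"])
print(categories)  # ['blue', 'green', 'red']
print(rows)        # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Note that Spark ML's `OneHotEncoder` drops the last category by default (`dropLast=True`) and emits sparse vectors, so its output differs in shape from this full-width sketch.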

Question # 14

A health organization is developing a classification model to determine whether or not a patient currently has a specific type of infection. The organization's leaders want to maximize the number of positive cases identified by the model.

Which of the following classification metrics should be used to evaluate the model?

RMSE

Precision

Area under the residual operating curve

Accuracy

Recall
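A short sketch contrasting precision and recall on hypothetical labels (1 = infected, 0 = not infected). Recall is TP / (TP + FN), the fraction of actual positive cases the model finds, so it falls whenever infected patients are missed (false negatives). Precision is TP / (TP + FP) and instead penalizes false alarms.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels: 3 true positives, 1 false negative, 2 false positives
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)  # 0.6 0.75
```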

Question # 15

Which of the following machine learning algorithms typically uses bagging?

Gradient boosted trees

K-means

Random forest

Linear regression

Decision tree
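A minimal sketch of bagging (bootstrap aggregating), the technique random forests build on: each base learner is trained on a bootstrap sample drawn with replacement from the training rows, and predictions are combined by majority vote. The one-feature threshold "stump" base learner and the toy data are hypothetical; a real random forest uses full decision trees and additionally samples a random subset of features at each split.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-feature threshold classifier: predict 1 when x >= threshold."""
    best_threshold, best_errors = None, None
    for threshold in sorted(set(xs)):
        errors = sum((x >= threshold) != bool(y) for x, y in zip(xs, ys))
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_fit(xs, ys, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample and return their thresholds."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    for _ in range(n_estimators):
        # Bootstrap sample: n row indices drawn *with replacement*
        idx = [rng.randrange(n) for _ in range(n)]
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds


def bagging_predict(thresholds, x):
    """Aggregate the stumps' 0/1 predictions by majority vote."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: class 1 when x is 5 or more
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = bagging_fit(xs, ys)
print(bagging_predict(model, 1), bagging_predict(model, 8))  # 0 1
```

Boosting (as in gradient boosted trees) differs in that learners are trained sequentially on the previous ensemble's errors rather than independently on bootstrap samples.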

Question # 16

A data scientist is working with a feature set with the following schema:

The customer_id column is the primary key in the feature set. Each of the columns in the feature set has missing values. They want to replace the missing values by imputing a common value for each feature.

Which of the following lists all of the columns in the feature set that need to be imputed using the most common value of the column?

customer_id, loyalty_tier

loyalty_tier

units

spend

customer_id
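The schema itself is not reproduced in this copy, so the sketch below assumes `loyalty_tier` is a categorical column with hypothetical tier names. Most-common-value (mode) imputation replaces each missing entry with the most frequent observed value; numeric columns such as `units` or `spend` are more often filled with a mean or median, and a primary key like `customer_id` is an identifier rather than a feature to impute.

```python
from collections import Counter


def impute_mode(values):
    """Replace None entries with the most common non-missing value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


# Hypothetical loyalty_tier column with missing values
loyalty_tier = ["gold", None, "silver", "gold", None, "bronze"]
print(impute_mode(loyalty_tier))
# ['gold', 'gold', 'silver', 'gold', 'gold', 'bronze']
```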

Question # 17

A machine learning engineer has identified the best run from an MLflow Experiment. They have stored the run ID in the run_id variable and identified the logged model name as "model". They now want to register that model in the MLflow Model Registry with the name "best_model".

Which lines of code can they use to register the model associated with run_id to the MLflow Model Registry?

mlflow.register_model(run_id, "best_model")

mlflow.register_model(f"runs:/{run_id}/model", "best_model")

mlflow.register_model(f"runs:/{run_id}/model")

mlflow.register_model(f"runs:/{run_id}/best_model", "model")
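`mlflow.register_model` takes a model URI followed by the name to register it under. A `runs:/` URI has the form `runs:/<run_id>/<artifact_path>`, where the artifact path is the name the model was logged with ("model" in this question). The helper names below are hypothetical; `mlflow` is imported inside the function so the URI helper can be exercised without MLflow installed.

```python
def runs_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build a runs:/ URI pointing at a model logged in an MLflow run."""
    return f"runs:/{run_id}/{artifact_path}"


def register_logged_model(run_id: str, registered_name: str = "best_model"):
    """Register the model logged under run_id in the MLflow Model Registry."""
    import mlflow  # imported lazily; requires MLflow and a tracking server/workspace

    # register_model(model_uri, name) returns a ModelVersion for the new version
    return mlflow.register_model(runs_model_uri(run_id), registered_name)


print(runs_model_uri("abc123"))  # runs:/abc123/model
```

Swapping the two string arguments, or putting the registry name into the artifact path, points MLflow at an artifact that was never logged.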

Question # 18

A data scientist has a Spark DataFrame spark_df. They want to create a new Spark DataFrame that contains only the rows from spark_df where the value in column discount is less than or equal to 0.

Which of the following code blocks will accomplish this task?

spark_df.loc[:,spark_df["discount"] <= 0]

spark_df[spark_df["discount"] <= 0]

spark_df.filter(col("discount") <= 0)

spark_df.loc[spark_df["discount"] <= 0, :]

Question # 19

A data scientist uses 3-fold cross-validation when optimizing model hyperparameters for a regression problem. The following root-mean-squared-error values are calculated on each of the validation folds:

• 10.0

• 12.0

• 17.0

Which of the following values represents the overall cross-validation root-mean-squared error?

13.0

17.0

12.0

39.0

10.0
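The overall cross-validation metric is conventionally the average of the per-fold validation metrics (Spark ML's `CrossValidatorModel.avgMetrics`, for example, reports this average for each hyperparameter combination). A one-line check with the fold values from the question:

```python
from statistics import mean

# Validation RMSE from each of the 3 folds (values from the question)
fold_rmse = [10.0, 12.0, 17.0]
overall_rmse = mean(fold_rmse)  # (10 + 12 + 17) / 3
print(overall_rmse)  # 13.0
```

This is the mean of the fold-level RMSEs, which is not in general identical to an RMSE recomputed over the pooled residuals of all folds.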

Question # 20

Which of the following describes the relationship between native Spark DataFrames and pandas API on Spark DataFrames?

pandas API on Spark DataFrames are single-node versions of Spark DataFrames with additional metadata

pandas API on Spark DataFrames are more performant than Spark DataFrames

pandas API on Spark DataFrames are made up of Spark DataFrames and additional metadata

pandas API on Spark DataFrames are less mutable versions of Spark DataFrames

pandas API on Spark DataFrames are unrelated to Spark DataFrames

