Databricks Databricks-Certified-Professional-Data-Scientist - Databricks Certified Professional Data Scientist Exam

You have data on 10,000 people who make purchases at a specific grocery store. The data also includes their income details. You have created 5 clusters using this data, but only 30 people fall into one of the clusters, as below: 30, 2400, 2600, 2700, 2270, etc.

What would you do in this case?

A.

You will increase the number of clusters.

B.

You will be decreasing the number of clusters.

C.

You will remove those 30 people from the dataset.

D.

You will multiply the standard deviation by 100.

Refer to Exhibit

In the exhibit, the x-axis represents the derived probability of a borrower defaulting on a loan. Also in the exhibit, the pink represents borrowers that are known to have not defaulted on their loan, and the blue represents borrowers that are known to have defaulted on their loan. Which analytical method could produce the probabilities needed to build this exhibit?

A.

Linear Regression

B.

Logistic Regression

C.

Discriminant Analysis

D.

Association Rules
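As a rough illustration of how a model can turn borrower attributes into a default probability between 0 and 1, the sketch below applies the logistic (sigmoid) function to a linear score. The feature names (`debt_ratio`, `late_payments`) and all coefficient values are hypothetical and are not taken from the exhibit; a real model would estimate them from data.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real score to a value between 0 and 1."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients, chosen only for illustration (not fitted to data).
BIAS = -2.0
W_DEBT_RATIO = 3.0
W_LATE_PAYMENTS = 0.8


def default_probability(debt_ratio, late_payments):
    """Linear score passed through the sigmoid, as in logistic regression."""
    score = BIAS + W_DEBT_RATIO * debt_ratio + W_LATE_PAYMENTS * late_payments
    return sigmoid(score)


low_risk = default_probability(debt_ratio=0.1, late_payments=0)
high_risk = default_probability(debt_ratio=0.9, late_payments=4)
print(f"low-risk borrower:  {low_risk:.3f}")
print(f"high-risk borrower: {high_risk:.3f}")
```

Plotting such probabilities separately for borrowers who did and did not default would give a chart of the kind the question describes.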

In which of the following scenarios can regression be used to predict values?

A.

Samsung can use it for mobile sales forecast

B.

Mobile companies can use it to forecast manufacturing defects

C.

Probability of a celebrity divorce

D.

Only A and B

E.

All of A, B and C

In which phase of the data analytics lifecycle do Data Scientists spend the most time in a project?

A.

Discovery

B.

Data Preparation

C.

Model Building

D.

Communicate Results

Which of the following is a correct example of the target variable in regression (supervised learning)?

A.

Nominal values like true, false

B.

Reptile, fish, mammal, amphibian, plant, fungi

C.

Infinite number of numeric values, such as 0.100, 42.001, 1000.743, ...

D.

All of the above

In statistics, maximum-likelihood estimation (MLE) is a method of estimating the parameters of a statistical model. When applied to a data set and given a statistical model, maximum-likelihood estimation provides estimates for the model's parameters. The normalizing constant is usually ignored in MLE because

A.

The normalizing constant is always very close to 1

B.

The normalizing constant only has a small impact on the maximum likelihood

C.

The normalizing constant is often zero and can cause division by zero

D.

The normalizing constant doesn't impact the maximizing value
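The effect of a constant factor on the maximizing parameter value can be checked numerically. The minimal sketch below, using made-up counts (7 successes in 20 trials), maximizes a binomial likelihood over a grid of candidate probabilities twice: once with the binomial coefficient, which does not depend on the parameter, and once without it. The grid search and the helper names are assumptions chosen for illustration only.

```python
import math


def likelihood_kernel(p, successes, trials):
    """Part of the binomial likelihood that depends on the parameter p."""
    return p ** successes * (1.0 - p) ** (trials - successes)


def full_likelihood(p, successes, trials):
    """Kernel multiplied by the binomial coefficient (constant in p)."""
    return math.comb(trials, successes) * likelihood_kernel(p, successes, trials)


def argmax_on_grid(fn, grid):
    """Return the grid point at which fn is largest."""
    return max(grid, key=fn)


# Illustrative data: 7 successes out of 20 trials.
successes, trials = 7, 20
grid = [i / 1000 for i in range(1, 1000)]  # candidate values 0.001 .. 0.999

p_with_constant = argmax_on_grid(
    lambda p: full_likelihood(p, successes, trials), grid
)
p_without_constant = argmax_on_grid(
    lambda p: likelihood_kernel(p, successes, trials), grid
)

print(p_with_constant, p_without_constant)
```

Both searches select the same grid point, because multiplying every likelihood value by the same positive constant does not change which parameter value gives the largest one.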

Suppose that we are interested in the factors that influence whether a political candidate wins an election. The outcome (response) variable is binary (0/1); win or lose. The predictor variables of interest are the amount of money spent on the campaign, the amount of time spent campaigning negatively and whether or not the candidate is an incumbent.

The above is an example of

A.

Linear Regression

B.

Logistic Regression

C.

Recommendation system

D.

Maximum likelihood estimation

E.

Hierarchical linear models

A data scientist is asked to implement an article recommendation feature for an on-line magazine.

The magazine does not want to use client tracking technologies such as cookies or reading history. Therefore, only the style and subject matter of the current article are available for making recommendations. All of the magazine's articles are stored in a database in a format suitable for analytics.

Which method should the data scientist try first?

A.

K Means Clustering

B.

Naive Bayesian

C.

Logistic Regression

D.

Association Rules

RMSE is a useful metric for evaluating which types of models?

A.

Logistic regression

B.

Naive Bayes classifier

C.

Linear regression

D.

All of the above
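For reference, RMSE (root mean squared error) is the square root of the average squared difference between actual and predicted numeric values. A minimal sketch follows; the sample numbers are invented for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length numeric sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented example values.
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]
print(f"RMSE = {rmse(actual, predicted):.4f}")
```

The result is expressed in the same units as the target variable, which is why the metric presupposes a numeric target.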

What type of output is generated by linear regression?

A.

Continuous variable

B.

Discrete Variable

C.

Either a continuous or a discrete variable

D.

Values between 0 and 1
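To make the nature of a linear regression prediction concrete, the sketch below fits a one-variable least-squares line in plain Python and evaluates it at a new input. The data points are invented, and `fit_line` and `predict` are illustrative helper names, not part of any library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """The fitted line evaluated at x; the result is an unbounded real number."""
    return intercept + slope * x


# Invented example data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"prediction at x=10: {predict(10.0, slope, intercept):.2f}")
```

The prediction is not confined to a fixed set of labels or to the interval from 0 to 1; it can take any real value along the fitted line.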