
Databricks Databricks-Certified-Professional-Data-Scientist - Databricks Certified Professional Data Scientist Exam

Which of the following steps will you use in the discovery phase?

A.

What are the data sources for the project?

B.

Analyze the raw data, its format, and its structure.

C.

What tools are required for the project?

D.

What network capacity is required?

E.

What Unix server capacity is required?

In which of the following scenarios can we use the naive Bayes theorem for classification?

A.

Classify whether a given person is a male or a female based on the measured features. The features include height, weight and foot size.

B.

To classify whether an email is spam or not spam

C.

To identify whether a fruit is an orange or not based on features like diameter, color and shape

You are working as a data science consultant for a gaming company. Your team has three members, and all other stakeholders, such as the project managers, the project sponsor, and the data team, are from the company itself. During a discussion, the project manager asks when you will be able to say that the model you are using is robust enough. After which step can you answer this question?

A.

Data Preparation

B.

Discovery

C.

Operationalize

D.

Model planning

E.

Model building

Refer to the exhibit.

You are building a decision tree. In this exhibit, four variables are listed with their respective values of info-gain.

Based on this information, on which attribute would you expect the next split to be in the decision tree?

A.

Credit Score

B.

Age

C.

Income

D.

Gender
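The exhibit is not reproduced here, so its actual info-gain values are unknown. As a rough illustration of the underlying idea only, the sketch below computes information gain on a small hypothetical dataset (the rows, the attribute names, and the "approved" target are all made up) and picks the attribute with the largest gain as the next split.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def info_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical toy data, NOT the values from the exhibit.
rows = [
    {"income": "high", "gender": "f", "approved": "yes"},
    {"income": "high", "gender": "m", "approved": "yes"},
    {"income": "low", "gender": "f", "approved": "no"},
    {"income": "low", "gender": "m", "approved": "no"},
]

gains = {a: info_gain(rows, a, "approved") for a in ("income", "gender")}
best = max(gains, key=gains.get)  # attribute with the highest information gain
print(gains, best)
```

In this toy data, "income" separates the classes perfectly while "gender" carries no information, so a greedy tree builder would split on "income" first.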

What is the probability that the total of two dice will be greater than 8, given that the first die is a 6?

A.

1/3

B.

2/3

C.

1/6

D.

2/6
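One way to check a conditional probability like this is to enumerate all outcomes. The sketch below assumes two fair six-sided dice; the helper name conditional_probability is illustrative and not part of the question.

```python
from fractions import Fraction
from itertools import product

def conditional_probability(event, given):
    """P(event | given) over all equally likely rolls of two fair dice."""
    outcomes = list(product(range(1, 7), repeat=2))
    given_outcomes = [o for o in outcomes if given(o)]
    both = [o for o in given_outcomes if event(o)]
    return Fraction(len(both), len(given_outcomes))

p = conditional_probability(
    event=lambda roll: sum(roll) > 8,   # total greater than 8
    given=lambda roll: roll[0] == 6,    # first die shows 6
)
print(p)  # 2/3
```

With the first die fixed at 6, the second die must show 3, 4, 5, or 6, which is 4 of the 6 equally likely values.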

What is one modeling or descriptive statistical function in MADlib that is typically not provided in a standard relational database?

A.

Expected value

B.

Variance

C.

Linear regression

D.

Quantiles

A user opens a website 3 times. The probability that the user clicks the advertisement 2 times is best calculated using which distribution?

A.

Binomial

B.

Poisson

C.

Normal

D.

Any of the above
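A fixed number of independent trials with a yes/no outcome is the setting the binomial distribution describes. The sketch below evaluates the binomial probability mass function for 2 clicks in 3 visits; the per-visit click probability of 0.1 is a made-up value, since the question does not give one.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

click_probability = 0.1  # hypothetical; not stated in the question
prob_two_clicks = binomial_pmf(k=2, n=3, p=click_probability)
print(round(prob_two_clicks, 6))
```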

The method based on principal component analysis (PCA) evaluates the features according to

A.

The projection of the largest eigenvector of the correlation matrix on the initial dimensions

B.

According to the magnitude of the components of the discriminant vector

C.

The projection of the smallest eigenvector of the correlation matrix on the initial dimensions

D.

None of the above

Projecting a multi-dimensional dataset onto which vector yields the greatest variance?

A.

first principal component

B.

first eigenvector

C.

not enough information given to answer

D.

second eigenvector

E.

second principal component
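To make the idea of projected variance concrete, the sketch below works through a two-dimensional case using only the standard library. The data points are hypothetical. For a 2x2 covariance matrix, the direction of maximum variance has a closed-form angle, so no linear algebra library is needed here; that direction is the eigenvector with the largest eigenvalue, commonly called the first principal component.

```python
from math import atan2, cos, sin
from statistics import mean, pvariance

# Hypothetical 2-D data, roughly spread along a diagonal.
points = [(2.0, 1.9), (0.5, 0.7), (3.1, 3.0), (1.0, 1.2), (4.0, 3.8)]

# Center the data.
mx = mean(x for x, _ in points)
my = mean(y for _, y in points)
centered = [(x - mx, y - my) for x, y in points]

# Entries of the (population) covariance matrix.
n = len(centered)
sxx = sum(x * x for x, _ in centered) / n
syy = sum(y * y for _, y in centered) / n
sxy = sum(x * y for x, y in centered) / n

# Angle of the axis that maximizes projected variance for a 2x2 symmetric matrix.
theta = 0.5 * atan2(2 * sxy, sxx - syy)
pc1 = (cos(theta), sin(theta))    # direction of greatest variance
pc2 = (-sin(theta), cos(theta))   # orthogonal direction

def projected_variance(direction):
    """Variance of the centered data projected onto a unit vector."""
    return pvariance([x * direction[0] + y * direction[1] for x, y in centered])

var1 = projected_variance(pc1)
var2 = projected_variance(pc2)
print(var1, var2)
```

The two projected variances add up to the total variance of the data, and the first is never smaller than the variance along any other unit direction.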

Refer to the Exhibit.

In the Exhibit, the table shows the values for the input Boolean attributes "A", "B", and "C". It also shows the values for the output attribute "class". Which decision tree is valid for the data?

A.

Tree A

B.

Tree B

C.

Tree C

D.

Tree D