
Databricks Databricks-Certified-Professional-Data-Scientist - Databricks Certified Professional Data Scientist Exam

What are the advantages of the mutual information over the Pearson correlation for text classification problems?

A.

The mutual information has a meaningful test for statistical significance.

B.

The mutual information can signal non-linear relationships between the dependent and independent variables.

C.

The mutual information is easier to parallelize.

D.

The mutual information doesn't assume that the variables are normally distributed.
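To make the contrast between the two measures concrete, here is a minimal sketch using only the Python standard library. The toy data (y = x squared on five symmetric points) and the helper names `pearson` and `mutual_information` are assumptions made for illustration; they are not part of the exam material, and real text features would be word counts or indicators rather than these integers.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mutual_information(xs, ys):
    """Mutual information (in bits) of two discrete sequences,
    estimated from empirical frequencies."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# Hypothetical toy data: y depends on x, but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

r = pearson(xs, ys)              # 0.0: no linear relationship detected
mi = mutual_information(xs, ys)  # about 1.52 bits: the dependence is visible
print(r, mi)
```

The correlation is zero even though y is fully determined by x, while the mutual information is clearly positive. Note also that the mutual information estimate only needs frequency counts, with no distributional assumption on the variables.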

Consider the following confusion matrix for a data set with 600 out of 11,100 instances positive:

In this case, Precision = 50%, Recall = 83%, Specificity = 95%, and Accuracy = 95%.

Select the correct statement

A.

Precision is low, which means the classifier is predicting positives best

B.

Precision is low, which means the classifier is predicting positives poorly

C.

The problem domain has a major impact on the measures that should be used to evaluate a classifier within it

D.

A and C

E.

B and C
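The confusion matrix itself is not reproduced in the question, so the sketch below uses assumed cell counts, chosen only because they are consistent with the stated totals (600 positives out of 11,100 instances) and with the quoted percentages. Treat them as a plausible reconstruction, not as the original table.

```python
# Assumed cell counts (hypothetical reconstruction of the missing matrix).
tp, fn = 500, 100       # the 600 actual positives
fp, tn = 500, 10_000    # the 10,500 actual negatives

precision = tp / (tp + fp)                  # 0.5
recall = tp / (tp + fn)                     # about 0.833
specificity = tn / (tn + fp)                # about 0.952
accuracy = (tp + tn) / (tp + fn + fp + tn)  # about 0.946

for name, value in [("precision", precision), ("recall", recall),
                    ("specificity", specificity), ("accuracy", accuracy)]:
    print(f"{name}: {value:.1%}")
```

With classes this imbalanced, accuracy and specificity stay high because true negatives dominate, while precision shows that half of the predicted positives are wrong. Which of these numbers matters most depends on the problem domain.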

A fruit may be considered to be an apple if it is red, round, and about 3" in diameter. A naive Bayes classifier considers each of these features to contribute independently to the probability that this fruit is an apple, regardless of the

A.

Presence of the other features.

B.

Absence of the other features.

C.

Presence or absence of the other features

D.

None of the above
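The independence assumption can be sketched in a few lines: the score for each class is the prior multiplied by one likelihood per feature, with no term that couples the features. All probabilities below are made-up illustrative values, and the names `priors`, `likelihoods`, and `posterior` are hypothetical.

```python
# Hypothetical class priors and per-feature likelihoods P(feature | class).
priors = {"apple": 0.3, "other": 0.7}
likelihoods = {
    "apple": {"red": 0.8, "round": 0.9, "3in": 0.7},
    "other": {"red": 0.2, "round": 0.4, "3in": 0.1},
}

def posterior(features):
    """Naive Bayes posterior: each feature contributes its own factor,
    independently of which other features are present."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for feature in features:
            score *= likelihoods[label][feature]
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

post = posterior(["red", "round", "3in"])
print(post)  # "apple" gets roughly 0.964 with these assumed numbers
```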

A problem statement is given below:

Hospital records show that of patients suffering from a certain disease, 75% die of it. What is the probability that of 6 randomly selected patients, 4 will recover?

Which of the following models would you use to solve it?

A.

Binomial

B.

Poisson

C.

Normal

D.

Any of the above
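As a worked check of the hospital problem, the sketch below treats each patient as an independent trial with recovery probability 1 - 0.75 = 0.25 and evaluates the binomial probability of exactly 4 recoveries in 6 patients. The helper name `binomial_pmf` is an assumption for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p_recover = 1 - 0.75                  # 75% die, so 25% recover
prob = binomial_pmf(4, 6, p_recover)  # C(6,4) * 0.25^4 * 0.75^2
print(round(prob, 5))                 # 0.03296
```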

Which analytical method is considered unsupervised?


A.

Naive Bayesian classifier

B.

Decision tree

C.

Linear regression

D.

K-means clustering
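To illustrate what "unsupervised" means in practice, here is a minimal one-dimensional k-means sketch: it groups points using only their values, with no labels supplied. The data, the starting centers, and the function name `kmeans_1d` are assumptions for illustration; production code would normally use a library implementation.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on 1-D data: assign each point to its nearest
    center, then move each center to the mean of its points."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]   # hypothetical unlabeled data
centers = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)  # one center near 1.0, the other near 9.5
```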

Suppose A, B, and C are events. The probability of A given B, relative to P(·|C), is the same as the probability of A given B and C (relative to P). That is,

A.

P(A,B|C) / P(B|C) = P(A|B,C)

B.

P(A,B|C) / P(B|C) = P(B|A,C)

C.

P(A,B|C) / P(B|C) = P(C|B,C)

D.

P(A,B|C) / P(B|C) = P(A|C,B)
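The identity P(A,B|C) / P(B|C) = P(A|B,C) can be checked numerically on any joint distribution in which P(B,C) > 0. The sketch below builds an arbitrary, made-up joint distribution over three binary events and compares both sides using exact fractions; the weights and the helper `prob` are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution over (a, b, c) in {0, 1}^3:
# weights 1..8, normalized so they sum to 1.
weights = {outcome: i + 1
           for i, outcome in enumerate(product([0, 1], repeat=3))}
total = sum(weights.values())
joint = {outcome: Fraction(w, total) for outcome, w in weights.items()}

def prob(predicate):
    """Probability of the event described by predicate(a, b, c)."""
    return sum(p for outcome, p in joint.items() if predicate(*outcome))

p_c = prob(lambda a, b, c: c == 1)
p_bc = prob(lambda a, b, c: b == 1 and c == 1)
p_abc = prob(lambda a, b, c: a == 1 and b == 1 and c == 1)

lhs = (p_abc / p_c) / (p_bc / p_c)  # P(A,B|C) / P(B|C)
rhs = p_abc / p_bc                  # P(A|B,C)
print(lhs, rhs)                     # both are 2/3 for these weights
```

The P(C) terms cancel on the left-hand side, which is exactly why conditioning on B within P(·|C) gives the same result as conditioning on B and C together.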

In machine learning, feature hashing, also known as the hashing trick (by analogy to the kernel trick), is a fast and space-efficient way of vectorizing features (such as the words in a language), i.e., turning arbitrary features into indices in a vector or matrix. It works by applying a hash function to the features and using their hash values modulo the number of features as indices directly, rather than looking the indices up in an associative array. So what is the primary reason for using the hashing trick when building classifiers?

A.

It creates smaller models

B.

It requires less memory to store the coefficients for the model

C.

It removes non-significant features, e.g. punctuation

D.

Noisy features are removed
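The mechanism described in the question can be sketched as follows: each token is hashed and the hash value modulo the vector size is used directly as an index, so no vocabulary dictionary is ever stored. The function name `hashed_features`, the bucket count, and the choice of MD5 as the hash are assumptions for illustration; `hashlib` is used instead of the built-in `hash()` because the latter is salted per process for strings.

```python
import hashlib

def hashed_features(tokens, n_buckets=16):
    """Map tokens to a fixed-length count vector using the hashing trick."""
    vec = [0] * n_buckets
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % n_buckets
        vec[index] += 1  # distinct tokens may collide in one bucket
    return vec

tokens = ["spark", "delta", "spark", "mlflow"]  # hypothetical input
vec = hashed_features(tokens, n_buckets=8)
print(vec)
```

The vector length, and therefore the number of model coefficients, is fixed at `n_buckets` no matter how many distinct tokens appear. The price is that unrelated tokens can share an index.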

Which of the following skills does a data scientist require?

A.

Web design skills, to present the results of an algorithm with the best visuals.

B.

Should be creative

C.

Should possess good programming skills

D.

Should be very good at mathematics and statistics

E.

Should possess database administration skills.

Which of the following is not a correct application of classification?

A.

credit scoring

B.

tumor detection

C.

image recognition

D.

drug discovery

Google AdWords studies the number of men and women who click an advertisement on the search engine during a one-hour window at midnight each day. Google finds that the number of men who click can be modeled as a random variable with distribution Poisson(X), and likewise the number of women who click as Poisson(Y).

What is likely to be the best model of the total number of advertisement clicks during that hour?

A.

Binomial(X+Y,X+Y)

B.

Poisson(X/Y)

C.

Normal(X+Y, (X+Y)^(1/2))

D.

Poisson(X+Y)
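The behavior of the total can be checked numerically. Assuming the two click counts are independent (the question does not state this explicitly), the distribution of their sum is the convolution of the two Poisson distributions, and the sketch below compares that convolution with a single Poisson whose rate is the sum of the two rates. The rates 3.0 and 5.0 and the helper names are made-up values for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam)."""
    return exp(-lam) * lam ** k / factorial(k)

def sum_pmf(k, lam_x, lam_y):
    """P(M + W = k) for independent M ~ Poisson(lam_x), W ~ Poisson(lam_y),
    computed by direct convolution."""
    return sum(poisson_pmf(i, lam_x) * poisson_pmf(k - i, lam_y)
               for i in range(k + 1))

lam_men, lam_women = 3.0, 5.0  # hypothetical rates
max_gap = max(abs(sum_pmf(k, lam_men, lam_women)
                  - poisson_pmf(k, lam_men + lam_women))
              for k in range(20))
print(max_gap)  # differs from zero only by floating-point rounding
```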