CompTIA DY0-001 - CompTIA DataX Exam


Total 85 questions

A data scientist is attempting to identify sentences that are conceptually similar to each other within a set of text files. Which of the following is the best way to prepare the data set to accomplish this task after data ingestion?

Embeddings

Extrapolation

Sampling

One-hot encoding
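
As a rough illustration of how embeddings support similarity comparisons, the sketch below scores pairs of vectors with cosine similarity. The three-dimensional vectors are made up for illustration; a real pipeline would obtain them from an embedding model, which is not shown here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical sentence embeddings (invented values, not model output).
emb_cat = [0.9, 0.1, 0.0]
emb_kitten = [0.8, 0.2, 0.1]
emb_invoice = [0.0, 0.1, 0.9]

sim_related = cosine_similarity(emb_cat, emb_kitten)
sim_unrelated = cosine_similarity(emb_cat, emb_invoice)

# Vectors pointing in similar directions score closer to 1.
print(sim_related > sim_unrelated)
```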

Question # 12

A data scientist is performing a linear regression and wants to construct a model that explains the most variation in the data. Which of the following should the data scientist maximize when evaluating the regression performance metrics?

Accuracy

R²

p value

AUC
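
For reference, R² is the share of variance in the target that the model explains: 1 minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch with made-up values (the data and the helper name `r_squared` are assumptions for illustration):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

# Invented observations and predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

score = r_squared(y_true, y_pred)
print(round(score, 2))  # 0.98
```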

Question # 13

A data scientist uses a large data set to build multiple linear regression models to predict the likely market value of a real estate property. The selected new model has an RMSE of 995 on the holdout set and an adjusted R² of 0.75. The benchmark model has an RMSE of 1,000 on the holdout set. Which of the following is the best business statement regarding the new model?

The model should be deployed because it has a lower RMSE.

The model's adjusted R² is exceptionally strong for such a complex relationship.

The model fails to improve meaningfully on the benchmark model.

The model's adjusted R² is too low for the real estate industry.
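
The size of the gap between the two holdout RMSE values in the question can be put in relative terms with a quick calculation. This is only arithmetic on the figures given above; the variable names are illustrative.

```python
benchmark_rmse = 1000.0  # benchmark model, holdout set (from the question)
new_rmse = 995.0         # new model, holdout set (from the question)

# Fraction by which the new model reduces RMSE relative to the benchmark.
relative_improvement = (benchmark_rmse - new_rmse) / benchmark_rmse
print(f"{relative_improvement:.1%}")  # 0.5%
```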

Question # 14

A data analyst wants to save a newly analyzed data set to a local storage option. The data set must meet the following requirements:

Be minimal in size

Have the ability to be ingested quickly

Have the associated schema, including data types, stored with it

Which of the following file types is the best to use?

JSON

Parquet

XML

CSV
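
The schema requirement can be seen by round-tripping typed values through a plain-text format. The sketch below uses only Python's standard `csv` module and invented rows; it shows that numeric values come back as strings because CSV stores no type information. It does not demonstrate any of the other formats listed.

```python
import csv
import io

# Invented rows with an integer and a float column.
rows = [{"id": 1, "price": 19.99}, {"id": 2, "price": 5.0}]

# Write the rows to an in-memory CSV file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "price"])
writer.writeheader()
writer.writerows(rows)

# Read them back: every value is now a str, so types must be re-declared.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
print(type(loaded[0]["id"]).__name__)  # str
```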

Question # 15

A data analyst wants to generate the most data using tables from a database. Which of the following is the best way to accomplish this objective?

INNER JOIN

LEFT OUTER JOIN

RIGHT OUTER JOIN

FULL OUTER JOIN
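
To compare how many rows each join type returns, the sketch below models two hypothetical tables as dictionaries keyed on the join column and counts the keys each join would keep. It assumes the join key is unique within each table; with duplicate keys the counts would differ.

```python
# Hypothetical tables: join key -> row payload (keys unique per table).
table_a = {1: "a", 2: "b", 3: "c"}
table_b = {2: "x", 3: "y", 4: "z"}

counts = {
    "INNER JOIN": len(table_a.keys() & table_b.keys()),       # keys in both
    "LEFT OUTER JOIN": len(table_a),                          # all keys of A
    "RIGHT OUTER JOIN": len(table_b),                         # all keys of B
    "FULL OUTER JOIN": len(table_a.keys() | table_b.keys()),  # keys in either
}

for join_type, row_count in counts.items():
    print(join_type, row_count)
```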

Question # 16

Which of the following best describes the minimization of the residual term in a LASSO linear regression?

|e|

e²
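
For context, the standard LASSO objective combines a sum of squared residuals with an L1 penalty on the coefficients. The sketch below evaluates that objective for given coefficients; the function name, the toy data, and the unscaled form of the loss are assumptions for illustration (libraries often divide the squared-error term by a constant such as 2n).

```python
def lasso_objective(X, y, beta, lam):
    """Sum of squared residuals plus lam times the L1 norm of beta."""
    residuals = [
        yi - sum(b * x for b, x in zip(beta, xi))
        for xi, yi in zip(X, y)
    ]
    rss = sum(e ** 2 for e in residuals)         # squared residual term
    l1_penalty = sum(abs(b) for b in beta)       # absolute-value penalty on coefficients
    return rss + lam * l1_penalty

# Invented data: two observations, two features.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]

value = lasso_objective(X, y, beta=[1.0, 1.0], lam=0.5)
print(value)  # 2.0
```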

Question # 17

A data scientist is building a forecasting model for the price of copper. The only input in this model is the daily price of copper for the last ten years. Which of the following forecasting techniques is the most appropriate for the data scientist to use?

Autoregressive

Moving average

Dynamic time warping

Relative strength
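
An autoregressive model predicts the next value of a series from its own past values. As a minimal sketch, the code below fits a first-order model x_t = c + phi * x_{t-1} by ordinary least squares on a made-up, noiseless series. The helper name `fit_ar1` and the data are assumptions; real price data would be noisy and might call for more lags.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi)."""
    x_prev = series[:-1]
    x_next = series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi

# Invented series generated exactly by x_t = 2 + 0.9 * x_{t-1}.
prices = [100.0]
for _ in range(20):
    prices.append(2.0 + 0.9 * prices[-1])

c, phi = fit_ar1(prices)
next_forecast = c + phi * prices[-1]  # one-step-ahead forecast
```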

Question # 18

Which of the following layer sets includes the minimum three layers required to constitute an artificial neural network?

An input layer, a pooling layer, and an output layer

An input layer, a convolutional layer, and a hidden layer

An input layer, a hidden layer, and an output layer

An input layer, a dropout layer, and a hidden layer

Question # 19

A data scientist is developing a model to predict the outcome of a vote for a national mascot. The choice is between tigers and lions. The full data set represents feedback from individuals representing 17 professions and 12 different locations. The following rank aggregation represents 80% of the data set:

(Screenshot shows survey rankings for just two professions and a few locations, all voting for "Tigers")

Which of the following is the most likely concern about the model's ability to predict the outcome of the vote?

Interpolated data

Extrapolated data

In-sample data

Out-of-sample data

Question # 20

Which of the following distribution methods or models can most effectively represent the actual arrival times of a bus that runs on an hourly schedule?

Binomial

Exponential

Normal

Poisson
