CompTIA DY0-001 - CompTIA DataX Exam

Total 85 questions

A data scientist is attempting to identify sentences that are conceptually similar to each other within a set of text files. Which of the following is the best way to prepare the data set to accomplish this task after data ingestion?

A. Embeddings
B. Extrapolation
C. Sampling
D. One-hot encoding
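
To illustrate what "conceptually similar" can mean in practice, here is a minimal sketch that compares sentences through vector representations and cosine similarity. The three-dimensional vectors are made-up values chosen for illustration, not the output of any real embedding model, and the helper name `cosine_similarity` is an assumption of this sketch.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical sentence vectors (invented values, for illustration only).
toy_embeddings = {
    "The cat sat on the mat.": [0.9, 0.1, 0.0],
    "A kitten rested on the rug.": [0.8, 0.2, 0.1],
    "Interest rates rose sharply.": [0.0, 0.1, 0.9],
}

sentences = list(toy_embeddings)
sim_related = cosine_similarity(
    toy_embeddings[sentences[0]], toy_embeddings[sentences[1]]
)
sim_unrelated = cosine_similarity(
    toy_embeddings[sentences[0]], toy_embeddings[sentences[2]]
)

# Sentences with related meaning get the higher score in this toy setup.
print(sim_related > sim_unrelated)
```

In this toy setup the two sentences about cats point in nearly the same direction, while the sentence about interest rates does not, even though none of them share many words.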

A data scientist is performing a linear regression and wants to construct a model that explains the most variation in the data. Which of the following should the data scientist maximize when evaluating the regression performance metrics?

A. Accuracy
B. R²
C. p value
D. AUC
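
As a reminder of how "variation explained" is usually quantified, the sketch below computes the coefficient of determination as 1 minus the ratio of residual to total sum of squares. The function name `r_squared` and the sample values are assumptions made for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

# Invented observations, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]

# A model that reproduces every observation explains all of the variation.
perfect = r_squared(y_true, y_true)

# A model that always predicts the mean explains none of it.
mean_only = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])
```

Here `perfect` evaluates to 1.0 and `mean_only` to 0.0, the two reference points between which a typical fitted regression falls.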

A data scientist uses a large data set to build multiple linear regression models to predict the likely market value of a real estate property. The selected new model has an RMSE of 995 on the holdout set and an adjusted R² of 0.75. The benchmark model has an RMSE of 1,000 on the holdout set. Which of the following is the best business statement regarding the new model?

A. The model should be deployed because it has a lower RMSE.
B. The model's adjusted R² is exceptionally strong for such a complex relationship.
C. The model fails to improve meaningfully on the benchmark model.
D. The model's adjusted R² is too low for the real estate industry.
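
The size of the gap between the two holdout RMSE values can be checked with simple arithmetic. The sketch below uses only the figures given in the question; the variable names are chosen for illustration.

```python
# Holdout RMSE values taken from the question text.
benchmark_rmse = 1000.0
new_rmse = 995.0

# Relative reduction in RMSE compared with the benchmark model.
relative_improvement = (benchmark_rmse - new_rmse) / benchmark_rmse

print(f"{relative_improvement:.1%}")  # prints "0.5%"
```

Whether a reduction of this size matters is a business judgment that depends on the cost of replacing the benchmark model.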

A data analyst wants to save a newly analyzed data set to a local storage option. The data set must meet the following requirements:

    Be minimal in size

    Have the ability to be ingested quickly

    Have the associated schema, including data types, stored with it

Which of the following file types is the best to use?

A. JSON
B. Parquet
C. XML
D. CSV

A data analyst wants to generate the most data using tables from a database. Which of the following is the best way to accomplish this objective?

A. INNER JOIN
B. LEFT OUTER JOIN
C. RIGHT OUTER JOIN
D. FULL OUTER JOIN
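
To compare how many rows each join type can return, the sketch below simulates the four joins on two tiny tables in plain Python. The tables, their keys, and the assumption that each key appears at most once per table are all made up for illustration; with duplicate keys the counts would differ.

```python
# Two hypothetical tables keyed by a unique id (invented data).
left = {1: "a", 2: "b", 3: "c"}
right = {2: "x", 3: "y", 4: "z"}

# Keys that survive each join type when keys are unique per table.
inner_keys = left.keys() & right.keys()   # only keys present in both tables
left_keys = set(left)                     # every key from the left table
right_keys = set(right)                   # every key from the right table
full_keys = left.keys() | right.keys()    # every key from either table

row_counts = {
    "INNER JOIN": len(inner_keys),
    "LEFT OUTER JOIN": len(left_keys),
    "RIGHT OUTER JOIN": len(right_keys),
    "FULL OUTER JOIN": len(full_keys),
}
print(row_counts)
```

For this data the counts are 2, 3, 3, and 4 respectively, since unmatched rows from both sides are kept only in the last case.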

Which of the following best describes the minimization of the residual term in a LASSO linear regression?

A. |e|
B. e
C. 0
D. e²
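
For reference, the standard LASSO objective can be written as follows. The symbols $y_i$, $x_i$, $\beta$, and $\lambda$ are the usual textbook notation and do not appear in the question itself; $e_i$ denotes the residual of observation $i$.

```latex
\hat{\beta} = \arg\min_{\beta}
  \sum_{i=1}^{n} e_i^{2} + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert,
\qquad e_i = y_i - x_i^{\top}\beta
```

In this formulation the absolute value applies to the coefficients in the penalty term, while the residuals enter through their squares, as in ordinary least squares.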

A data scientist is building a forecasting model for the price of copper. The only input in this model is the daily price of copper for the last ten years. Which of the following forecasting techniques is the most appropriate for the data scientist to use?

A. Autoregressive
B. Moving average
C. Dynamic time warping
D. Relative strength
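
When a series is forecast from its own past values alone, one simple form is a first-order autoregression, x_t = c + phi * x_(t-1). The sketch below fits that form by least squares on a made-up series; the helper `fit_ar1` and the numbers are assumptions for illustration and are not real copper prices.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_(t-1)."""
    x = series[:-1]   # lagged values
    y = series[1:]    # current values
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    phi = cov / var
    c = mean_y - phi * mean_x
    return c, phi

# Invented series generated exactly by x_t = 2 + 0.5 * x_(t-1).
prices = [10.0]
for _ in range(6):
    prices.append(2.0 + 0.5 * prices[-1])

c, phi = fit_ar1(prices)

# One-step-ahead forecast from the most recent value.
next_price = c + phi * prices[-1]
```

Because the toy series follows the recurrence exactly, the fit recovers c close to 2 and phi close to 0.5; real price data would be noisy and would normally call for a dedicated time series library.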

Which of the following layer sets includes the minimum three layers required to constitute an artificial neural network?

A. An input layer, a pooling layer, and an output layer
B. An input layer, a convolutional layer, and a hidden layer
C. An input layer, a hidden layer, and an output layer
D. An input layer, a dropout layer, and a hidden layer

A data scientist is developing a model to predict the outcome of a vote for a national mascot. The choice is between tigers and lions. The full data set represents feedback from individuals representing 17 professions and 12 different locations. The following rank aggregation represents 80% of the data set:

(Screenshot shows survey rankings for just two professions and a few locations, all voting for "Tigers")

Which of the following is the most likely concern about the model's ability to predict the outcome of the vote?

A. Interpolated data
B. Extrapolated data
C. In-sample data
D. Out-of-sample data

Which of the following distribution methods or models can most effectively represent the actual arrival times of a bus that runs on an hourly schedule?

A. Binomial
B. Exponential
C. Normal
D. Poisson