CompTIA DY0-001 - CompTIA DataX Exam

Total 85 questions

A data scientist is building a proof of concept for a commercialized machine-learning model. Which of the following is the best starting point?

A. Literature review
B. Model performance evaluation
C. Hyperparameter tuning
D. Model selection

A data scientist built several models that perform about the same but vary in the number of features. Which of the following models should the data scientist recommend for production according to Occam's razor?

A. The model with the fewest features and highest performance
B. The model with the fewest features and the lowest performance
C. The model with the most features and the lowest performance
D. The model with the most features and the highest performance

A data scientist is presenting the recommendations from a months-long modeling and experiment process to the company’s Chief Executive Officer. Which of the following is the best set of artifacts to include in the presentation?

A. Methods, data overview, results, recommendations, and charts
B. Results, recommendations, justifications, and clear charts
C. Recommendation, charts, justifications, code reviews, and results
D. Methodology, code snippets, findings, data tables, and p-values

A data scientist receives an update on a business case about a machine that has thousands of error codes. The data scientist creates the following summary statistics profile while reviewing the logs for each machine:

| Number of machines observed | 3,000,000 |
| Number of unique error codes observed | 19,000 |
| Median number of unique codes per machine | 7 |
| Median number of error transactions | 45 |

Which of the following is the most likely concern with respect to data design for model ingestion?

A. Sparse matrix
B. Granularity misalignment
C. Insufficient features
D. Multivariate outliers
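
For context, the figures in the summary profile can be turned into a rough picture of what a model-ready table would look like. The sketch below is only an illustration under stated assumptions: it assumes one row per machine and one indicator column per error code (a layout the question does not specify), and it treats the median machine as typical.

```python
# Figures taken from the summary profile above.
n_machines = 3_000_000
n_codes = 19_000
median_codes_per_machine = 7

# Assumption: one row per machine, one indicator column per error code,
# with the median machine standing in for a "typical" row.
total_cells = n_machines * n_codes
density = median_codes_per_machine / n_codes

print(f"cells in a dense machine-by-code matrix: {total_cells:,}")
print(f"share of nonzero cells in a typical row: {density:.4%}")
```

Under these assumptions the dense layout has 57 billion cells, and a typical row fills roughly 0.04% of its columns.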

A company created a very popular collectible card set. Collectors attempt to collect the entire set, but the availability of each card varies, because some cards have higher production volumes than others. The set contains a total of 12 cards. The attributes of the cards are shown.

The data scientist is tasked with designing an initial model iteration to predict whether the animal on the card lives in the sea or on land, given the card's features: Wrapper color, Wrapper shape, and Animal.

Which of the following is the best way to accomplish this task?

A. ARIMA
B. Linear regression
C. Association rules
D. Decision trees

Which of the following modeling tools is appropriate for solving a scheduling problem?

A. One-armed bandit
B. Constrained optimization
C. Decision tree
D. Gradient descent
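
As background, a scheduling problem can be written as choosing an assignment that minimizes a cost while respecting hard rules. The sketch below is a toy illustration only: the cost table, the forbidden job/slot pair, and all names are hypothetical, and exhaustive search is used purely because the instance is tiny.

```python
from itertools import permutations

# Hypothetical data: cost[j][s] is the cost of running job j in slot s.
cost = [
    [4, 2, 8],
    [4, 3, 7],
    [3, 1, 6],
]
# Hypothetical hard constraint: job 0 may not run in slot 1.
forbidden = {(0, 1)}

best_cost, best_plan = None, None
# plan[j] is the slot given to job j; a permutation uses every slot exactly once.
for plan in permutations(range(3)):
    if any((job, slot) in forbidden for job, slot in enumerate(plan)):
        continue  # skip plans that break the hard constraint
    total = sum(cost[job][slot] for job, slot in enumerate(plan))
    if best_cost is None or total < best_cost:
        best_cost, best_plan = total, plan

print(best_plan, best_cost)
```

Real schedules are usually too large for exhaustive search and are handed to a dedicated solver; the structure (objective plus constraints) stays the same.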

An analyst is examining data from an array of temperature sensors and sees that one sensor consistently returns values that are much higher than the values from the other sensors. Which of the following terms best describes this type of error?

A. Synthetic
B. Systematic
C. Heteroskedastic
D. Idiosyncratic
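
The situation in this question, one sensor reading high on every measurement, can be checked by comparing each sensor's average against the rest of the array. The following is a minimal sketch with made-up readings; the sensor names, the values, and the 2-degree threshold are all hypothetical.

```python
from statistics import mean, median

# Hypothetical readings in degrees Celsius; sensor "s3" is made to read high.
readings = {
    "s1": [20.1, 20.4, 19.9, 20.2],
    "s2": [20.0, 20.3, 20.1, 19.8],
    "s3": [25.2, 25.4, 25.0, 25.3],
    "s4": [19.9, 20.2, 20.0, 20.1],
}

def sensor_offsets(data):
    """Return each sensor's mean minus the median of all sensors' means."""
    means = {name: mean(values) for name, values in data.items()}
    reference = median(means.values())
    return {name: m - reference for name, m in means.items()}

offsets = sensor_offsets(readings)
# Flag sensors whose average sits more than 2 degrees from the reference
# (the threshold is an arbitrary illustration value).
flagged = [name for name, off in offsets.items() if abs(off) > 2.0]
print(flagged)
```

The median is used as the reference so that the one high sensor does not pull the baseline toward itself.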

A data scientist has constructed a model that meets the minimum performance requirements specified in the proposal for a prediction project. The data scientist thinks the model's accuracy should be improved, but the proposed deadline is approaching. Which of the following actions should the data scientist take first?

A. Continue collecting data.
B. Request additional funding.
C. Consult the key project stakeholder.
D. Test additional model specifications.

A data analyst is examining the correlation matrix of a new data set to identify issues that could adversely impact model performance. Which of the following is the analyst most likely checking for?

A. Undersampling
B. Multicollinearity
C. Oversampling
D. Overfitting
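
To make the idea of examining a correlation matrix concrete, the sketch below computes pairwise Pearson correlations between features and lists the pairs that move almost in lockstep. The feature names, the values, and the 0.9 cutoff are hypothetical and chosen only for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical features; the two size columns measure nearly the same quantity.
features = {
    "size_sqft": [800, 950, 1100, 1400, 1800],
    "size_sqm": [74, 88, 102, 130, 167],
    "age_years": [30, 5, 18, 42, 11],
}

names = list(features)
high_pairs = []
for i, first in enumerate(names):
    for second in names[i + 1:]:
        r = pearson(features[first], features[second])
        if abs(r) > 0.9:  # arbitrary cutoff for "strongly correlated"
            high_pairs.append((first, second))

print(high_pairs)
```

Here only the two size columns are flagged, since they carry essentially the same information in different units.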

Which of the following is a classic example of a constrained optimization problem?

A. The cold start problem
B. The traveling salesman
C. Calculating local maximum
D. Calculating gradient descent
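
As background on one of the options, the traveling salesman problem asks for the shortest round trip that visits every city exactly once, so it has both an objective (tour length) and a constraint (each city once). The sketch below brute-forces a four-city instance; the distance matrix is hypothetical, and exhaustive search is only workable at this toy size.

```python
from itertools import permutations

# Hypothetical symmetric distance matrix for four cities, numbered 0 to 3.
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

def shortest_tour(d):
    """Try every ordering of cities 1..n-1, starting and ending at city 0."""
    n = len(d)
    best_len, best_tour = None, None
    for rest in permutations(range(1, n)):
        tour = (0, *rest, 0)  # each city appears once, then return home
        length = sum(d[a][b] for a, b in zip(tour, tour[1:]))
        if best_len is None or length < best_len:
            best_len, best_tour = length, tour
    return best_tour, best_len

tour, tour_len = shortest_tour(dist)
print(tour, tour_len)
```

The number of orderings grows factorially with the number of cities, which is why practical instances rely on heuristics or specialized solvers.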