The Machine Learning With Big Data course is offered by UC San Diego on the Coursera platform.

Quiz 8 Answers - Model Evaluation
1. A model that generalizes well means that
- The model is overfitting.
- The model does a good job of fitting to the noise in the data.
- The model performs well on data not used in training. [X]
- The model performs well on data used to adjust its parameters.
2. What indicates that the model is overfitting?
- High training error and low generalization error
- Low training error and high generalization error [X]
- High training error and high generalization error
- Low training error and low generalization error
3. Which method is used to avoid overfitting in decision trees?
- Post-pruning
- None of these
- Pre-pruning
- Pre-pruning and post-pruning [X]
4. Which of the following best describes a way to create and use a validation set to avoid overfitting?
- leave-one-out cross-validation
- random sub-sampling
- k-fold cross-validation
- All of these [X]
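The cross-validation idea above can be sketched in plain Python. This is a minimal illustration (no library assumed) of how k-fold splitting produces k train/validation pairs, with each sample landing in the validation set exactly once; leave-one-out is the special case where k equals the number of samples.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation.

    Each of the k folds serves as the validation set exactly once;
    the remaining k - 1 folds form the training set for that round.
    """
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for i in range(k):
        start = i * fold_size
        # The last fold absorbs any remainder so every sample is used.
        end = (i + 1) * fold_size if i < k - 1 else n_samples
        validation = indices[start:end]
        train = indices[:start] + indices[end:]
        yield train, validation

# With 6 samples and k=3, each round holds out 2 samples for validation.
folds = list(k_fold_splits(6, 3))
```

Random sub-sampling differs only in that the held-out set is drawn randomly each round, so a sample may appear in several validation sets.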
5. Which of the following statements is NOT correct?
- The test set is used to evaluate model performance on new data.
- The validation set is used to determine when to stop training the model.
- The training set is used to adjust the parameters of the model.
- The test set is used for model selection to avoid overfitting. [X]
6. How is the accuracy rate calculated?
- Add the number of true positives and the number of false negatives.
- Divide the number of true positives by the number of true negatives.
- Divide the number of correct predictions by the total number of predictions. [X]
- Subtract the number of correct predictions from the total number of predictions.
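The accuracy calculation described above is a one-liner; this sketch makes the "correct predictions over total predictions" ratio explicit (the sample labels are made up for illustration):

```python
def accuracy(y_true, y_pred):
    """Accuracy = number of correct predictions / total number of predictions."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# 3 of the 4 predictions match the true labels, so accuracy is 0.75.
score = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```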
7. Which evaluation metrics are commonly used for evaluating the performance of a classification model when there is a class imbalance problem?
- precision and recall [X]
- precision and accuracy
- accuracy and error
- precision and error
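Precision and recall matter under class imbalance because accuracy can look high while the minority class is ignored entirely. A minimal sketch, with invented labels: a classifier that predicts the majority class for every sample scores 90% accuracy here, yet its recall on the positive class is zero.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# 9 negatives, 1 positive; the model predicts all-negative.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
p, r = precision_recall(y_true, y_pred)  # recall is 0.0: the positive was missed
```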
8. How do you determine the classifier accuracy from the confusion matrix?
- Divide the sum of the diagonal values in the confusion matrix by the sum of the off-diagonal values.
- Divide the sum of all the values in the confusion matrix by the total number of samples.
- Divide the sum of the diagonal values in the confusion matrix by the total number of samples. [X]
- Divide the sum of the off-diagonal values in the confusion matrix by the total number of samples.
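The diagonal of a confusion matrix holds the correctly classified samples, and the sum of all entries equals the total number of samples, so the rule above can be checked directly (the 2x2 matrix values are borrowed from the Spark question later in this post):

```python
def accuracy_from_confusion(matrix):
    """Sum of the diagonal (correct predictions) divided by the sum of all entries."""
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return diagonal / total

cm = [[87, 14],
      [26, 83]]
acc = accuracy_from_confusion(cm)  # (87 + 83) / 210
```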
Quiz 9 Answers - Model Evaluation in KNIME and Spark
1. KNIME: In the confusion matrix as viewed in the Scorer node, low_humidity_day is:
- the target class label [X]
- the predicted class label
- the only input variable that is categorical
2. KNIME: In the confusion matrix, what is the difference between low_humidity_day and Prediction(low_humidity_day)?
- low_humidity_day is the target class label, and Prediction(low_humidity_day) is the predicted class label [X]
- low_humidity_day is the predicted class label, and Prediction(low_humidity_day) is the target class label
- There is no difference. The two are the same
3. KNIME: In the Table View of the Interactive Table, each row is color-coded. Blue specifies:
- that the target class label for the sample is humidity_not_low
- that the target class label for the sample is humidity_low
- that the predicted class label for the sample is humidity_not_low
- that the predicted class label for the sample is humidity_low
4. KNIME: To change the colors used to color-code each sample in the Table View of the Interactive Table node:
- change the color settings in the Color Manager node [X]
- change the color settings in the Interactive Table dialog
- It is not possible to change these colors
5. KNIME: In the Table View of the Interactive Table, the values in RowID are not consecutive because:
- the RowID values are from the original dataset, and only the test samples are displayed here [X]
- the samples are randomly ordered in the table
- only a few samples from the test set are randomly selected and displayed here
6. Spark: To get the error rate for the decision tree model, use the following code:
- `print("Error = %g " % (1.0 - accuracy))` [X]
- `evaluator = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="error"); error = evaluator.evaluate(1 - predictions)`
7. Spark: To print out the accuracy as a percentage, use the following code:
- `print("Accuracy = %.2g" % (accuracy * 100))` [X]
- `print("Accuracy = %100g" % (accuracy))`
- `print("Accuracy = %100.2g" % (accuracy))`
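The difference between these format strings is easy to verify in plain Python (the accuracy value 0.7945 below is hypothetical, not from the course notebook). `%.2g` keeps two significant digits, so multiplying by 100 first gives a readable percentage, while a number before the dot (as in `%100g`) only sets a minimum field width and pads with spaces:

```python
accuracy = 0.7945  # hypothetical value for illustration

# Two significant digits of the scaled value: a percentage.
as_percent = "Accuracy = %.2g" % (accuracy * 100)

# 100 here is a minimum field WIDTH, not a scale factor:
# the value stays a fraction, right-justified in 100 characters.
padded = "Accuracy = %100g" % accuracy
```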
8. Spark: In the last line of code in Step 4, the confusion matrix is printed out. If the `transpose()` is removed, the confusion matrix will be displayed as:
- `array([[87., 14.], [26., 83.]])` [X]
- `array([[83., 26.], [14., 87.]])`
- `array([[83., 87.], [14., 26.]])`
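What `transpose()` does is swap rows and columns, so removing it simply flips the matrix across its diagonal. A minimal sketch using plain Python lists (the values are the ones from the marked option above):

```python
def transpose(matrix):
    """Swap rows and columns: entry [i][j] moves to [j][i]."""
    return [list(row) for row in zip(*matrix)]

# The matrix as printed WITHOUT transpose() in the quiz's marked option:
raw = [[87, 14],
       [26, 83]]
flipped = transpose(raw)  # rows become columns: [[87, 26], [14, 83]]
```

Note that for a symmetric confusion matrix the transpose would change nothing; here the off-diagonal counts (14 vs. 26) differ, which is why the removal is visible.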