The complete Machine Learning with Big Data course is currently offered by UC San Diego through the Coursera platform.

This course is part of the Big Data Specialization.

About this Course

This course provides an overview of machine learning techniques to explore, analyze, and leverage data. You will be introduced to tools and algorithms you can use to create machine learning models that learn from data, and to scale those models up to big data problems.

At the end of the course, you will be able to:

Design an approach to leverage data using the steps in the machine learning process.
Apply machine learning techniques to explore and prepare data for modeling.
Identify the type of machine learning problem in order to apply the appropriate set of techniques.
Construct models that learn from data using widely available open source tools.
Analyze big data problems using scalable machine learning algorithms on Spark.

SKILLS YOU WILL GAIN

- Machine Learning Concepts
- KNIME
- Machine Learning
- Apache Spark


Machine Learning with Big Data Week 1 Quiz Answers!

Quiz 8 Answers - Model Evaluation

1. A model that generalizes well means that

  • The model is overfitting.
  • The model does a good job of fitting to the noise in the data.
  • The model performs well on data not used in training.
  • The model performs well on data used to adjust its parameters.

2. What indicates that the model is overfitting?

  • High training error and low generalization error
  • Low training error and high generalization error
  • High training error and high generalization error
  • Low training error and low generalization error
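A quick way to see the difference between training error and generalization error: train an unpruned decision tree and compare its accuracy on the training data with its accuracy on held-out data. The sketch below (an illustration, not part of the course materials) uses scikit-learn with label noise added, so a tree that memorizes the training set generalizes poorly.

# Minimal sketch: low training error + high test error = overfitting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
X, y = make_classification(n_samples=500, n_informative=5, flip_y=0.2,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("Training accuracy:", tree.score(X_train, y_train))  # ~1.0 (memorized)
print("Test accuracy:    ", tree.score(X_test, y_test))    # noticeably lower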

3. Which method is used to avoid overfitting in decision trees?

  • Post-pruning
  • None of these
  • Pre-pruning
  • Pre-pruning and post-pruning
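Pre-pruning stops tree growth early (for example, by capping depth or leaf size), while post-pruning grows the full tree and then cuts back branches that do not pay for themselves. In scikit-learn terms (an illustration, not the course's own code), the two look like this:

from sklearn.tree import DecisionTreeClassifier
# Pre-pruning: stop growing early via depth and leaf-size limits.
pre_pruned = DecisionTreeClassifier(max_depth=5, min_samples_leaf=10)
# Post-pruning: grow the full tree, then apply cost-complexity pruning.
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01)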

4. Which of the following best describes a way to create and use a validation set to avoid overfitting?

  • leave-one-out cross-validation
  • random sub-sampling
  • k-fold cross-validation
  • All of these
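All three techniques set aside part of the data to estimate generalization error. As a concrete illustration (not from the course), here is k-fold cross-validation in scikit-learn; leave-one-out is the special case where k equals the number of samples:

from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier
X, y = load_iris(return_X_y=True)
# 5-fold CV: each sample is used for validation exactly once.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())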

5. Which of the following statements is NOT correct?

  • The test set is used to evaluate model performance on new data.
  • The validation set is used to determine when to stop training the model.
  • The training set is used to adjust the parameters of the model.
  • The test set is used for model selection to avoid overfitting.
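To make the three roles concrete, here is one common way (an illustration, not the course's code) to carve a dataset into 60% training, 20% validation, and 20% test:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
X, y = load_iris(return_X_y=True)
# Hold out 20% as the test set; it is touched only for the final evaluation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2,
                                                  random_state=0)
# Split the rest into training (60%) and validation (20%); the validation
# set guides choices such as when to stop training.
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest,
                                                  test_size=0.25,
                                                  random_state=0)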

6. How is the accuracy rate calculated?

  • Add the number of true positives and the number of false negatives.
  • Divide the number of true positives by the number of true negatives.
  • Divide the number of correct predictions by the total number of predictions.
  • Subtract the number of correct predictions from the total number of predictions.
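Worked out with the confusion-matrix counts that appear in Quiz 9 below (87 + 83 correct predictions out of 210 total):

correct = 87 + 83          # true positives + true negatives
total = 87 + 14 + 26 + 83  # every prediction made on the test set
print("Accuracy = %.3f" % (correct / total))  # Accuracy = 0.810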

7. Which evaluation metrics are commonly used for evaluating the performance of a classification model when there is a class imbalance problem?

  • precision and recall
  • precision and accuracy
  • accuracy and error
  • precision and error
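Accuracy is misleading under class imbalance: a classifier that always predicts the majority class can score highly while never finding the rare class. A small illustration (not from the course) with scikit-learn:

from sklearn.metrics import accuracy_score, precision_score, recall_score
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]   # 9 negatives, 1 rare positive
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # always predict the majority class
print("accuracy: ", accuracy_score(y_true, y_pred))                    # 0.9
print("precision:", precision_score(y_true, y_pred, zero_division=0))  # 0.0
print("recall:   ", recall_score(y_true, y_pred, zero_division=0))     # 0.0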

8. How do you determine the classifier accuracy from the confusion matrix?

  • Divide the sum of the diagonal values in the confusion matrix by the sum of the off-diagonal values.
  • Divide the sum of all the values in the confusion matrix by the total number of samples.
  • Divide the sum of the diagonal values in the confusion matrix by the total number of samples.
  • Divide the sum of the off-diagonal values in the confusion matrix by the total number of samples.
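Since the correct predictions sit on the diagonal, the calculation is one line with NumPy (using the matrix from Quiz 9 below):

import numpy as np
cm = np.array([[87., 14.],    # rows: actual class
               [26., 83.]])   # columns: predicted class
accuracy = np.trace(cm) / cm.sum()   # (87 + 83) / 210
print("Accuracy = %.3f" % accuracy)  # Accuracy = 0.810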

Quiz 9 Answers - Model Evaluation in KNIME and Spark

1. KNIME: In the confusion matrix as viewed in the Scorer node, low_humidity_day is:

  • the target class label
  • the predicted class label
  • the only input variable that is categorical

2. KNIME: In the confusion matrix, what is the difference between low_humidity_day and Prediction(low_humidity_day)?

  • low_humidity_day is the target class label, and Prediction(low_humidity_day) is the predicted class label
  • low_humidity_day is the predicted class label, and Prediction(low_humidity_day) is the target class label
  • There is no difference. The two are the same

3. KNIME: In the Table View of the Interactive Table, each row is color-coded. Blue specifies:

  • that the target class label for the sample is humidity_not_low
  • that the target class label for the sample is humidity_low
  • that the predicted class label for the sample is humidity_not_low
  • that the predicted class label for the sample is humidity_low

4. KNIME: To change the colors used to color-code each sample in the Table View of the Interactive Table node:

  • change the color settings in the Color Manager node
  • change the color settings in the Interactive Table dialog
  • It is not possible to change these colors

5. KNIME: In the Table View of the Interactive Table, the values in RowID are not consecutive because:

  • the RowID values are from the original dataset, and only the test samples are displayed here
  • the samples are randomly ordered in the table
  • only a few samples from the test set are randomly selected and displayed here

6. Spark: To get the error rate for the decision tree model, use the following code:

print ("Error = %g " % (1.0 - accuracy)) [X]

evaluator = MuticlassClassificationEvaluator(

    labelCol="label",

    predictionCol="prediction",

    metricName="error")

error = evaluator.evaluate(1 - predictions)
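The evaluator option above is a distractor: "error" is not a metric that MulticlassClassificationEvaluator supports, and 1 - predictions is not valid on a DataFrame. For reference, a working sketch (assuming predictions is a DataFrame with "label" and "prediction" columns, as produced earlier in the hands-on exercise) computes the accuracy first and derives the error from it:

from pyspark.ml.evaluation import MulticlassClassificationEvaluator
# Assumes `predictions` holds the decision tree model's test-set output.
evaluator = MulticlassClassificationEvaluator(
    labelCol="label",
    predictionCol="prediction",
    metricName="accuracy")   # "error" is not a supported metricName
accuracy = evaluator.evaluate(predictions)
print("Error = %g " % (1.0 - accuracy))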

7. Spark: To print out the accuracy as a percentage, use the following code:

print ("Accuracy = %.2g" % (accuracy * 100)) [X]

print ("Accuracy = %100g" % (accuracy))

print ("Accuracy = %100.2g" % (accuracy))

8. Spark: In the last line of code in Step 4, the confusion matrix is printed out. If the “transpose()” is removed, the confusion matrix will be displayed as:

array([[87., 14.],  [X]
       [26., 83.]])

array([[83., 26.],
       [14., 87.]])

array([[83., 87.],
       [14., 26.]])
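In NumPy, transpose() flips a matrix across its diagonal, so removing it swaps the two off-diagonal counts (14 and 26) while the diagonal entries stay in place:

import numpy as np
cm = np.array([[87., 14.],   # displayed form with .transpose() removed
               [26., 83.]])
print(cm.transpose())
# [[87. 26.]
#  [14. 83.]]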
