The complete *Machine Learning with Big Data* course is currently offered by UC San Diego through the Coursera platform.

**Big Data Specialization.**

*About this Course*


*At the end of the course, you will be able to:*

**SKILLS YOU WILL GAIN**


*Quiz 6 Answers - Classification*

**1. Which of the following is a TRUE statement about classification?**

- **Classification is a supervised task.**
- Classification is an unsupervised task.
- In a classification problem, the target variable has only two possible outcomes.

**2. In which phase are model parameters adjusted?**

- Testing phase
- **Training phase**
- Data preparation phase
- Model parameters are constant throughout the modeling process.

**3. Which classification algorithm uses a probabilistic approach?**

- **naive bayes**
- none of the above
- decision tree
- k-nearest-neighbors

**4. What does the 'k' stand for in k-nearest-neighbors?**

- the number of samples in the dataset
- **the number of nearest neighbors to consider in classifying a sample**
- the distance between neighbors: All neighboring samples that are 'k' distance apart from the sample are considered in classifying that sample.
- the number of training datasets
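To make the role of 'k' concrete, here is a minimal pure-Python sketch of k-nearest-neighbors classification. The function name `knn_predict`, the toy samples, and the use of Euclidean distance are assumptions made for illustration; they are not taken from the course material.

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training samples.

    `train` is a list of (features, label) pairs; features are numeric tuples.
    """
    # Sort training samples by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Keep only the labels of the k closest samples.
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among those k neighbors is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data, made up for illustration: two numeric features per sample.
train = [
    ((1.0, 1.0), "humidity_low"),
    ((1.2, 0.8), "humidity_low"),
    ((0.9, 1.1), "humidity_low"),
    ((3.0, 3.2), "humidity_not_low"),
    ((3.1, 2.9), "humidity_not_low"),
]

print(knn_predict(train, (3.0, 3.0), k=1))  # only the single closest sample votes
print(knn_predict(train, (3.0, 3.0), k=5))  # all five samples vote
```

With `k=1` the query takes the label of its single closest neighbor; with `k=5` every sample in this toy set votes, so the majority class wins regardless of distance. The value of k is a count of neighbors, not a distance.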

**5. During construction of a decision tree, there are several criteria that can be used to determine when a node should no longer be split into subsets. Which one of the following is NOT applicable?**

- The tree depth reaches a maximum threshold.
- The number of samples in the node reaches a minimum threshold.
- All (or X% of) samples have the same class label.
- **The value of the Gini index reaches a maximum threshold.**
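The three applicable criteria can be sketched as a single check. This is only an illustration: the function name `should_stop` and the default thresholds are hypothetical. An impurity measure such as the Gini index is 0 for a pure node and grows as classes mix, so a high value signals that a node still needs splitting, which is why a maximum Gini threshold does not appear below.

```python
def should_stop(labels, depth, max_depth=5, min_samples=2, purity=1.0):
    """Return True if a node holding `labels` at `depth` should become a leaf.

    All threshold defaults are hypothetical values chosen for illustration.
    """
    # Criterion 1: the tree depth reaches a maximum threshold.
    if depth >= max_depth:
        return True
    # Criterion 2: the number of samples in the node reaches a minimum threshold.
    if len(labels) <= min_samples:
        return True
    # Criterion 3: all (or a given fraction of) samples share one class label.
    most_common = max(labels.count(label) for label in set(labels))
    if most_common / len(labels) >= purity:
        return True
    return False

print(should_stop(["low"] * 10, depth=1))           # pure node
print(should_stop(["low", "not_low"] * 5, depth=1)) # mixed node, shallow
print(should_stop(["low", "not_low"] * 5, depth=5)) # mixed node at max depth
```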

**6. Which statement is true of tree induction?**

- You want to split the data in a node into subsets that are as homogeneous as possible.
- **All of these statements are true of tree induction.**
- An impurity measure is used to determine the best split for a node.
- For each node, splits on all variables are tested to determine the best split for the node.
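The statements above can be sketched together in a few lines. This is a simplified illustration, assuming numeric features, binary splits of the form `value <= threshold`, and Gini impurity as the impurity measure; the names `gini` and `best_split` and the toy data are made up for this example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure node, larger as classes mix."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Test splits on every feature and return the one whose two subsets
    have the lowest weighted impurity, as (impurity, feature_index, threshold)."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must produce two non-empty subsets
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or weighted < best[0]:
                best = (weighted, f, threshold)
    return best

# Toy data: feature 0 separates the classes cleanly, feature 1 does not.
rows = [(20.0, 5.0), (22.0, 1.0), (30.0, 4.0), (35.0, 2.0)]
labels = ["low", "low", "not_low", "not_low"]

print(best_split(rows, labels))
```

Here the split on feature 0 at threshold 22.0 yields two perfectly homogeneous subsets, so it is chosen over every candidate split on feature 1.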

**7. What does 'naive' mean in Naive Bayes?**

- The full Bayes' Theorem is not used. The 'naive' in naive bayes specifies that a simplified version of Bayes' Theorem is used.
- Bayes' Theorem makes estimating the probabilities easier. The 'naïve' in the name of the classifier comes from this ease of probability calculation.
- **The model assumes that the input features are statistically independent of one another. The 'naïve' in the name of the classifier comes from this naïve assumption.**

**8. The feature independence assumption in Naive Bayes simplifies the classification problem by**

- assuming that the prior probabilities of all classes are independent of one another.
- assuming that classes are independent of the input features.
- ignoring the prior probabilities altogether.
- **allowing the probability of each feature given the class to be estimated individually.**
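A small sketch shows what estimating each feature individually looks like in practice. It assumes categorical features, simple count-based estimates, and no smoothing; the feature names (`wind`, `sky`), the toy samples, and the function names are hypothetical. The score for a class is the prior multiplied by one per-feature conditional probability at a time.

```python
from collections import Counter, defaultdict

def train_nb(samples):
    """Count classes and, separately for each feature, values per class."""
    class_counts = Counter(label for _, label in samples)
    # feature_counts[label][feature_name][value] -> number of occurrences
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in samples:
        for name, value in features.items():
            feature_counts[label][name][value] += 1
    return class_counts, feature_counts

def score(class_counts, feature_counts, features, label):
    """Unnormalized P(label) * product over features of P(value | label)."""
    result = class_counts[label] / sum(class_counts.values())  # prior
    for name, value in features.items():
        # Each feature contributes its own independently estimated factor.
        result *= feature_counts[label][name][value] / class_counts[label]
    return result

def predict(class_counts, feature_counts, features):
    return max(class_counts,
               key=lambda label: score(class_counts, feature_counts, features, label))

# Toy samples, made up for illustration.
samples = [
    ({"wind": "calm", "sky": "clear"}, "humidity_low"),
    ({"wind": "calm", "sky": "clear"}, "humidity_low"),
    ({"wind": "windy", "sky": "clear"}, "humidity_low"),
    ({"wind": "windy", "sky": "cloudy"}, "humidity_not_low"),
    ({"wind": "calm", "sky": "cloudy"}, "humidity_not_low"),
]
class_counts, feature_counts = train_nb(samples)

query = {"wind": "calm", "sky": "clear"}
print(predict(class_counts, feature_counts, query))
```

Without the independence assumption, the model would need an estimate for every combination of feature values per class; with it, one small table per feature is enough. A practical implementation would usually add smoothing so that an unseen value does not force the whole product to zero.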

**1. KNIME: In configuring the Numeric Binner node, what would happen if the definition for the humidity_low bin is changed from**

] -infinity ... 25.0 [

to

] -infinity ... 25.0 ]

(i.e., the last bracket is changed from [ to ])?

- **The definition for the humidity_low bin would change from excluding 25.0 to including 25.0**
- The definition for the humidity_low bin would change from having 25.0 as the endpoint to having 25.1 as the endpoint
- Nothing would change
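The bracket notation can be read as an ordinary comparison: a bracket facing away from the value (`25.0 [`) excludes the endpoint, and a bracket facing toward it (`25.0 ]`) includes it. The sketch below is plain Python written for illustration, not KNIME's API; the function name `in_humidity_low` and its parameters are hypothetical.

```python
def in_humidity_low(value, upper=25.0, upper_inclusive=False):
    """Membership test for a bin running from -infinity up to `upper`.

    upper_inclusive=False models `] -infinity ... 25.0 [` (25.0 excluded).
    upper_inclusive=True  models `] -infinity ... 25.0 ]` (25.0 included).
    """
    return value <= upper if upper_inclusive else value < upper

print(in_humidity_low(25.0, upper_inclusive=False))  # endpoint excluded
print(in_humidity_low(25.0, upper_inclusive=True))   # endpoint included
print(in_humidity_low(24.9, upper_inclusive=False))  # below the endpoint either way
```

Only a sample whose value is exactly 25.0 is affected by the change; every other value lands in the same bin as before.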

**2. KNIME: Considering the Numeric Binner node again, what would happen if the “Append new column” box is not checked?**

- **The relative_humidity_3pm variable will become a categorical variable**
- The relative_humidity_3pm variable will remain unchanged, and a new unnamed categorical variable will be created
- The relative_humidity_3pm variable will become undefined, and an error will occur

**3. KNIME: How many samples had a missing value for air_temp_9am before missing values were addressed?**

- **5**
- 3
- 0

**4. KNIME: How many samples were placed in the test set after the dataset was partitioned into training and test sets?**

- **213**
- 851
- 20

**5. KNIME: What are the target and predicted class labels for the first sample in the test set?**

- **Both are humidity_not_low**
- Target class label is humidity_not_low, and predicted class label is humidity_low
- Target class label is humidity_low, and predicted class label is humidity_not_low

**6. Spark: What values are in the number column?**

- **Integer values starting at 0**
- Time and date values
- Random integer values

**7. Spark: With the original dataset split into 80% for training and 20% for test, how many of the first 20 samples from the test set were correctly classified?**

- **19**
- 10
- 1

**8. Spark: If we split the data using 70% for training data and 30% for test data, how many samples would the training set have (using seed 13234)?**

- **730**
- 334
- 70
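The seed matters because a random split is reproducible only when the random generator starts from the same state. The sketch below is plain Python, not Spark code, and the 1000-row list is made up rather than the course dataset, so it will not reproduce the count in the quiz. It assumes a split in the style of Spark's `randomSplit`, where the weights act as per-row probabilities rather than exact quotas, so the resulting sizes are close to 70/30 but not guaranteed to match exactly.

```python
import random

def random_split(rows, train_fraction, seed):
    """Assign each row to train or test independently, reproducibly by seed."""
    rng = random.Random(seed)  # same seed, same sequence of random draws
    train, test = [], []
    for row in rows:
        # Each row lands in the training set with probability train_fraction.
        (train if rng.random() < train_fraction else test).append(row)
    return train, test

rows = list(range(1000))  # stand-in data, not the course dataset
train, test = random_split(rows, 0.7, seed=13234)

print(len(train), len(test))
# Repeating the split with the same seed gives the identical partition.
print(random_split(rows, 0.7, seed=13234) == (train, test))
```

Changing the seed, the fractions, or the order of the input rows generally changes which rows end up in each set, which is why the quiz question pins down the seed.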
