CS440 WA5
Q1
| Statement | Reason |
|---|---|
| $\Pr[h \mid D] = \frac{\Pr[D \mid h] \Pr[h]}{\sum_{h' \in H} \Pr[D \mid h']\Pr[h']}$ | Given (Bayes' theorem) |
| the training data $D$ contains no noise | No Noise Assumption |
| $\Pr[D \mid h] = 1$ | If $h$ is in the version space ($h \in VS$), then $h$ is perfectly consistent with the data. Therefore, |
| $\Pr[D \mid h] = 0$ | If $h$ is not in the version space ($h \notin VS$), then $h$ makes at least one error on the training set, so |
| $\Pr[h \mid D] = \frac{0 \cdot \Pr[h]}{\sum_{h'\in H} \Pr[D \mid h']\Pr[h']} = 0$ | For $h \notin VS$ |
| $\Pr[h \mid D] = \frac{1 \cdot \Pr[h]}{\sum_{h'\in VS} 1\cdot \Pr[h']} = \frac{\Pr[h]}{\sum_{h'\in VS} \Pr[h']}$ | For $h \in VS$: every $h' \notin VS$ contributes $0$ to the denominator, so the sum runs over $VS$ only |
What does this say about choosing the "best" target function? Is training data alone enough? (Hint: What is $\Pr[D \mid h]$ if $h$ is in our version space? What is it if $h$ is not in our version space?)
This result says that training data alone is not enough to choose the "best" target function: every hypothesis in $VS$ fits the data perfectly ($\Pr[D \mid h] = 1$), so the data cannot distinguish among them. The posterior $\Pr[h \mid D]$ then depends only on the prior $\Pr[h]$, so choosing the "best" hypothesis requires additional inductive bias beyond the training data.
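
To make this concrete, here is a minimal sketch (my own toy example, not from the assignment) that enumerates a small hypothesis space of 1-D threshold classifiers and computes the posterior under a uniform prior; every hypothesis in the version space ends up with the same posterior.

```python
# Toy example: hypothesis h_t predicts positive iff x >= t.
def predict(t, x):
    return x >= t

# Noise-free training data: (x, label) pairs.
data = [(0, False), (3, True)]

thresholds = range(0, 5)                        # hypothesis space H
prior = {t: 1 / len(thresholds) for t in thresholds}

# Pr[D | h] is 1 if h is consistent with every example, else 0.
likelihood = {
    t: float(all(predict(t, x) == y for x, y in data))
    for t in thresholds
}

evidence = sum(likelihood[t] * prior[t] for t in thresholds)
posterior = {t: likelihood[t] * prior[t] / evidence for t in thresholds}

# Hypotheses t = 1, 2, 3 form the version space; each gets posterior 1/3.
print(posterior)
```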
Q2
| Statement | Reason |
|---|---|
| $(n-1)$ positives and $n$ negatives remain | if we leave out a positive example |
| the perfectly balanced training set is no longer balanced | then |
| negative | the majority classifier's prediction |
| positive | the label we want to predict |
| 0% accuracy on that fold | thus |
| $n$ positives and $(n-1)$ negatives remain | if we leave out a negative example |
| positive | the majority classifier's prediction |
| negative | the label we want to predict |
| 0% leave-one-out accuracy overall, since every fold falls into one of these two cases | thus |
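
A minimal sketch verifying this (my own toy check; the function names are mine): leave-one-out cross-validation of a majority-vote classifier on a perfectly balanced data set always predicts the opposite of the held-out label.

```python
from collections import Counter

def majority_label(labels):
    # Most common label in the training fold; ties cannot occur here
    # because removing one example always leaves an odd count.
    return Counter(labels).most_common(1)[0][0]

n = 5
labels = ["+"] * n + ["-"] * n   # balanced: n positives, n negatives

correct = 0
for i in range(len(labels)):
    held_out = labels[i]
    train = labels[:i] + labels[i + 1:]
    # The held-out class is now the minority, so the prediction is
    # always the opposite class.
    if majority_label(train) == held_out:
        correct += 1

print(f"LOO accuracy: {correct / len(labels):.0%}")  # prints 0%
```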
Q3
| Statement | Reason |
|---|---|
| $\Pr(\text{exactly } j \text{ errors}) = \binom{K}{j}\epsilon^j (1-\epsilon)^{K-j}$ | If each of the $K$ classifiers errs independently with probability $\epsilon$, the probability that exactly $j$ classifiers make an error is |
| the ensemble is wrong | Assumption |
| at least $\lceil (K+1)/2 \rceil$ of the $K$ votes are erroneous | then, since a majority vote is wrong only when more than half of the classifiers err |
| $E(K, \epsilon) = \sum_{j=\lceil (K+1)/2 \rceil}^{K} \binom{K}{j}\epsilon^j (1-\epsilon)^{K-j}$ | Thus, the error of the ensemble is |
| for odd $K$, $E(K, \epsilon) = \sum_{j=(K+1)/2}^{K} \binom{K}{j}\epsilon^j (1-\epsilon)^{K-j}$ | simplify: when $K$ is odd, $\lceil (K+1)/2 \rceil = (K+1)/2$ |
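
As a quick sanity check (my own addition, assuming independent classifier errors as above), the sketch below computes $E(K, \epsilon)$ exactly and compares it against a Monte Carlo simulation of the majority vote.

```python
import math
import random

def ensemble_error(K, eps):
    """Exact E(K, eps): probability that at least ceil((K+1)/2) of K
    independent classifiers, each wrong with probability eps, err."""
    m = math.ceil((K + 1) / 2)
    return sum(
        math.comb(K, j) * eps**j * (1 - eps) ** (K - j)
        for j in range(m, K + 1)
    )

def simulate(K, eps, trials=100_000):
    wrong = 0
    for _ in range(trials):
        errors = sum(random.random() < eps for _ in range(K))
        if errors >= (K + 1) / 2:   # majority of the votes are erroneous
            wrong += 1
    return wrong / trials

K, eps = 11, 0.3
print(f"exact:     {ensemble_error(K, eps):.4f}")   # ~0.0782: far below eps
print(f"simulated: {simulate(K, eps):.4f}")
```

For $K = 11$ and $\epsilon = 0.3$ the ensemble error is about $0.078$, well below the individual error rate, which is the point of majority voting.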