CS440 WA5

Q1

| Statement | Reason |
| --- | --- |
| $\Pr[h \mid D] = \frac{\Pr[D \mid h] \Pr[h]}{\sum_{h' \in H} \Pr[D \mid h']\Pr[h']}$ | Given |
| No noise | Assumption |
| $\Pr[D \mid h] = 1$ | If $h$ is in the version space ($h \in VS$), it is perfectly consistent with the data |
| $\Pr[D \mid h] = 0$ | If $h \notin VS$, then $h$ makes at least one error on the training set |
| $\Pr[h \mid D] = \frac{0 \cdot \Pr[h]}{\sum_{h'\in H} \Pr[D \mid h']\Pr[h']} = 0$ | For $h \notin VS$ |
| $\Pr[h \mid D] = \frac{1 \cdot \Pr[h]}{\sum_{h'\in VS} 1\cdot \Pr[h']} = \frac{\Pr[h]}{\sum_{h'\in VS} \Pr[h']}$ | For $h \in VS$, since the terms with $\Pr[D \mid h'] = 0$ drop out of the denominator |
What does this say about choosing the "best" target function? Is training data alone enough? (Hint: What is $\Pr[D \mid h]$ if $h$ is in our version space? What is it if $h$ is not?)

This result says that training data alone is not enough to choose the "best" target function: every hypothesis in $VS$ has $\Pr[D \mid h] = 1$, so all of them fit the data equally well, and the posterior $\Pr[h \mid D]$ is just the prior $\Pr[h]$ renormalized over the version space. The data cannot distinguish among hypotheses within $VS$; only the prior can.
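
As a sanity check, here is a minimal Python sketch of this computation. The hypothesis names, prior values, and version-space membership below are made-up assumptions for illustration; the point is that the no-noise posterior is the prior renormalized over $VS$:

```python
# Posterior over a toy hypothesis space under the no-noise likelihood,
# where Pr[D|h] is 1 for consistent hypotheses and 0 otherwise.
prior = {"h1": 0.5, "h2": 0.3, "h3": 0.2}                  # assumed prior Pr[h]
in_version_space = {"h1": True, "h2": True, "h3": False}   # assumed consistency with D

likelihood = {h: 1.0 if in_version_space[h] else 0.0 for h in prior}
evidence = sum(likelihood[h] * prior[h] for h in prior)    # Pr[D]
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}

print(posterior)  # {'h1': 0.625, 'h2': 0.375, 'h3': 0.0}
```

Note that the posterior ratio between h1 and h2 (0.625 : 0.375) equals their prior ratio (0.5 : 0.3), confirming that the data only rules hypotheses out; it never reweights the survivors.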

Q2

| Statement | Reason |
| --- | --- |
| $(n-1)$ positives and $n$ negatives remain | If we leave out a positive example |
| The perfectly balanced training set is no longer balanced | Then |
| Negative | The majority classifier's prediction, since $n > n-1$ |
| Positive | The label of the held-out example we want to predict |
| 0% accuracy on this fold | Thus |
| $n$ positives and $(n-1)$ negatives remain | If we leave out a negative example |
| Positive | The majority classifier's prediction, since $n > n-1$ |
| Negative | The label of the held-out example we want to predict |
| 0% accuracy overall, since every fold in both cases is mispredicted | Thus |
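
A short simulation confirms this. The value of $n$ below is an arbitrary assumption; any $n \geq 1$ gives the same result, since the held-out example always belongs to the minority class of the remaining $2n-1$ examples:

```python
# Leave-one-out cross-validation of a majority-vote classifier
# on a perfectly balanced label set: n positives, n negatives.
n = 5
labels = [1] * n + [0] * n

correct = 0
for i, held_out in enumerate(labels):
    train = labels[:i] + labels[i + 1:]               # 2n - 1 remaining labels, never tied
    majority = 1 if sum(train) > len(train) / 2 else 0
    correct += (majority == held_out)

print(correct / len(labels))  # 0.0 -- every fold's majority is the wrong class
```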

Q3

| Statement | Reason |
| --- | --- |
| $\Pr(\text{exactly } j \text{ errors}) = \binom{K}{j}\epsilon^j (1-\epsilon)^{K-j}$ | If each classifier errs independently with probability $\epsilon$, the probability that exactly $j$ of the $K$ classifiers make an error is binomial |
| The ensemble (majority vote) is wrong | Assumption |
| At least $\lceil (K+1)/2 \rceil$ erroneous votes are cast | Then a majority of the $K$ votes must be wrong |
| $E(K, \epsilon) = \sum_{j=\lceil (K+1)/2 \rceil}^{K} \binom{K}{j}\epsilon^j (1-\epsilon)^{K-j}$ | Thus, the error of the ensemble is |
| $E(K, \epsilon) = \sum_{j=(K+1)/2}^{K} \binom{K}{j}\epsilon^j (1-\epsilon)^{K-j}$ | Simplify: for odd $K$, $\lceil (K+1)/2 \rceil = (K+1)/2$ |
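
A small Python sketch computes $E(K, \epsilon)$ exactly and checks it against a Monte Carlo estimate. The function name and the sample values of $K$ and $\epsilon$ are assumptions for illustration, not part of the assignment:

```python
# Exact ensemble error via the binomial sum, plus a Monte Carlo check.
import math
import random

def ensemble_error(K, eps):
    """Probability that a majority of K independent classifiers,
    each wrong with probability eps, makes an error."""
    m = math.ceil((K + 1) / 2)            # erroneous votes needed for a wrong majority
    return sum(math.comb(K, j) * eps**j * (1 - eps)**(K - j)
               for j in range(m, K + 1))

K, eps = 11, 0.3
print(ensemble_error(K, eps))             # ~0.0782: well below the individual error 0.3

# Monte Carlo check: simulate K votes per trial and count wrong majorities.
trials = 100_000
wrong = sum(sum(random.random() < eps for _ in range(K)) >= (K + 1) / 2
            for _ in range(trials))
print(wrong / trials)                     # should be close to the exact value
```

This also illustrates why ensembles help: for $\epsilon < 0.5$ and independent errors, $E(K, \epsilon)$ shrinks as $K$ grows.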