CS440 WA5

Q1

| Statement | Reason |
| --- | --- |
| $\Pr[h \mid D] = \frac{\Pr[D \mid h] \Pr[h]}{\sum_{h' \in H} \Pr[D \mid h']\Pr[h']}$ | Given |
| No noise | Assumption |
| $\Pr[D \mid h] = 1$ | If $h$ is in the version space ($h \in VS$), it is perfectly consistent with the data |
| $\Pr[D \mid h] = 0$ | If $h \notin VS$, then $h$ makes at least one error on the training set |
| $\Pr[h \mid D] = \frac{0 \cdot \Pr[h]}{\sum_{h'\in H} \Pr[D \mid h']\Pr[h']} = 0$ | For $h \notin VS$ |
| $\Pr[h \mid D] = \frac{1 \cdot \Pr[h]}{\sum_{h'\in VS} 1\cdot \Pr[h']} = \frac{\Pr[h]}{\sum_{h'\in VS} \Pr[h']}$ | For $h \in VS$, since the terms with $\Pr[D \mid h'] = 0$ drop out of the denominator |
What does this say about choosing the "best" target function? Is training data alone enough? (Hint: What is $\Pr[D \mid h]$ if $h$ is in our version space? What is it if $h$ is not?)

This result says that training data alone is not enough to choose the "best" target function: every hypothesis in $VS$ has $\Pr[D \mid h] = 1$, so all of them fit the data equally well, and the posterior $\Pr[h \mid D]$ is just the prior $\Pr[h]$ renormalized over the version space. The data cannot distinguish among hypotheses within $VS$; only the prior can.
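
As a sanity check, here is a minimal Python sketch of this computation. The hypothesis names, prior values, and version-space membership below are made-up assumptions for illustration; the point is that the no-noise posterior is the prior renormalized over $VS$:

```python
# Posterior over a toy hypothesis space under the no-noise likelihood,
# where Pr[D|h] is 1 for consistent hypotheses and 0 otherwise.
prior = {"h1": 0.5, "h2": 0.3, "h3": 0.2}                  # assumed prior Pr[h]
in_version_space = {"h1": True, "h2": True, "h3": False}   # assumed consistency with D

likelihood = {h: 1.0 if in_version_space[h] else 0.0 for h in prior}
evidence = sum(likelihood[h] * prior[h] for h in prior)    # Pr[D]
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}

print(posterior)  # {'h1': 0.625, 'h2': 0.375, 'h3': 0.0}
```

Note that the posterior ratio between h1 and h2 (0.625 : 0.375) equals their prior ratio (0.5 : 0.3), confirming that the data only rules hypotheses out; it never reweights the survivors.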

Q2

| Statement | Reason |
| --- | --- |
| $(n-1)$ positives and $n$ negatives remain | If we leave out a positive example |
| The perfectly balanced training set is no longer balanced | Then |
| Negative | The majority classifier's prediction, since $n > n-1$ |
| Positive | The label of the held-out example we want to predict |
| 0% accuracy on this fold | Thus |
| $n$ positives and $(n-1)$ negatives remain | If we leave out a negative example |
| Positive | The majority classifier's prediction, since $n > n-1$ |
| Negative | The label of the held-out example we want to predict |
| 0% accuracy overall, since every fold in both cases is mispredicted | Thus |
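
A short simulation confirms this. The value of $n$ below is an arbitrary assumption; any $n \geq 1$ gives the same result, since the held-out example always belongs to the minority class of the remaining $2n-1$ examples:

```python
# Leave-one-out cross-validation of a majority-vote classifier
# on a perfectly balanced label set: n positives, n negatives.
n = 5
labels = [1] * n + [0] * n

correct = 0
for i, held_out in enumerate(labels):
    train = labels[:i] + labels[i + 1:]               # 2n - 1 remaining labels, never tied
    majority = 1 if sum(train) > len(train) / 2 else 0
    correct += (majority == held_out)

print(correct / len(labels))  # 0.0 -- every fold's majority is the wrong class
```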

Q3

| Statement | Reason |
| --- | --- |
| $\Pr(\text{exactly } j \text{ errors}) = \binom{K}{j}\epsilon^j (1-\epsilon)^{K-j}$ | If each classifier errs independently with probability $\epsilon$, the probability that exactly $j$ of the $K$ classifiers make an error is binomial |
| The ensemble (majority vote) is wrong | Assumption |
| At least $\lceil (K+1)/2 \rceil$ erroneous votes are cast | Then a majority of the $K$ votes must be wrong |
| $E(K, \epsilon) = \sum_{j=\lceil (K+1)/2 \rceil}^{K} \binom{K}{j}\epsilon^j (1-\epsilon)^{K-j}$ | Thus, the error of the ensemble is |
| $E(K, \epsilon) = \sum_{j=(K+1)/2}^{K} \binom{K}{j}\epsilon^j (1-\epsilon)^{K-j}$ | Simplify: for odd $K$, $\lceil (K+1)/2 \rceil = (K+1)/2$ |
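
A small Python sketch computes $E(K, \epsilon)$ exactly and checks it against a Monte Carlo estimate. The function name and the sample values of $K$ and $\epsilon$ are assumptions for illustration, not part of the assignment:

```python
# Exact ensemble error via the binomial sum, plus a Monte Carlo check.
import math
import random

def ensemble_error(K, eps):
    """Probability that a majority of K independent classifiers,
    each wrong with probability eps, makes an error."""
    m = math.ceil((K + 1) / 2)            # erroneous votes needed for a wrong majority
    return sum(math.comb(K, j) * eps**j * (1 - eps)**(K - j)
               for j in range(m, K + 1))

K, eps = 11, 0.3
print(ensemble_error(K, eps))             # ~0.0782: well below the individual error 0.3

# Monte Carlo check: simulate K votes per trial and count wrong majorities.
trials = 100_000
wrong = sum(sum(random.random() < eps for _ in range(K)) >= (K + 1) / 2
            for _ in range(trials))
print(wrong / trials)                     # should be close to the exact value
```

This also illustrates why ensembles help: for $\epsilon < 0.5$ and independent errors, $E(K, \epsilon)$ shrinks as $K$ grows.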