Scrap Page
Gradient Descent With Multiple Variables
$$\begin{align*} \text{repeat}&\text{ until convergence:} ; \lbrace \newline;
& w_j := w_j - \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{1} ; & \text{for j = 0..n-1}\newline
&b\ \ := b - \alpha \frac{\partial J(\mathbf{w},b)}{\partial b} \newline \rbrace
\end{align*}$$
where, n is the number of features, parameters $w_j$, $b$, are updated simultaneously and where
$$
\begin{align}
\frac{\partial J(\mathbf{w},b)}{\partial w_j} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \tag{2} \\\frac{\partial J(\mathbf{w},b)}{\partial b} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{3}
\end{align}
$$
m is the number of training examples in the data set
$f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value
Normalizing Features
def zscore_normalize_features(X):
"""
computes X, zcore normalized by column
Args:
X (ndarray (m,n)) : input data, m examples, n features
Returns:
X_norm (ndarray (m,n)): input normalized by column
mu (ndarray (n,)) : mean of each feature
sigma (ndarray (n,)) : standard deviation of each feature
"""
# find the mean of each column/feature
mu = np.mean(X, axis=0) # mu will have shape (n,)
# find the standard deviation of each column/feature
sigma = np.std(X, axis=0) # sigma will have shape (n,)
# element-wise, subtract mu for that column from each example, divide by std for that column
X_norm = (X - mu) / sigma
return (X_norm, mu, sigma)
#check our work
#from sklearn.preprocessing import scale
#scale(X_orig, axis=0, with_mean=True, with_std=True, copy=True)
X_norm, X_mu, X_sigma = zscore_normalize_features(X_train)
Compute the Cost for Logistic Regression
Use the logistic loss instead of the squared difference, cuz its convex
def compute_cost_logistic(X, y, w, b):
"""
Computes cost
Args:
X (ndarray (m,n)): Data, m examples with n features
y (ndarray (m,)) : target values
w (ndarray (n,)) : model parameters
b (scalar) : model parameter
Returns:
cost (scalar): cost
"""
m = X.shape[0]
cost = 0.0
for i in range(m):
z_i = np.dot(X[i],w) + b
f_wb_i = sigmoid(z_i)
cost += -y[i]*np.log(f_wb_i) - (1-y[i])*np.log(1-f_wb_i)
cost = cost / m
return cost
Compute Gradient for Logistic Regression
def compute_gradient_logistic(X, y, w, b):
"""
Computes the gradient for logistic regression
Args:
X (ndarray (m,n): Data, m examples with n features
y (ndarray (m,)): target values
w (ndarray (n,)): model parameters
b (scalar) : model parameter
Returns
dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w.
dj_db (scalar) : The gradient of the cost w.r.t. the parameter b.
"""
m,n = X.shape
dj_dw = np.zeros((n,)) #(n,)
dj_db = 0.
for i in range(m):
f_wb_i = sigmoid(np.dot(X[i],w) + b) #(n,)(n,)=scalar
err_i = f_wb_i - y[i] #scalar
for j in range(n):
dj_dw[j] = dj_dw[j] + err_i * X[i,j] #scalar
dj_db = dj_db + err_i
dj_dw = dj_dw/m #(n,)
dj_db = dj_db/m #scalar
return dj_db, dj_dw
Solution Explanation
We are given a deterministic twoâplayer zeroâsum turnâbased game where:
- One player is MAX (us).
- The other player is MIN (the opponent).
We also have an oracle (O(s)) that, given any state (s) in which MIN is to move, returns the move that MIN would choose if they were playing optimally for themselves (i.e., the minimizing move).
We want to reduce this twoâplayer game to a singleâplayer search problem that correctly computes our (MAXâs) optimal policy. The key idea is:
When it is our turn (MAXâs turn) in state (s):
- We branch on each possible action (a) we can take.
- Once we pick an action (a), the state transitions to (s' = \text{Result}(s, a)).
- Now it should be MINâs turn in (s').
- Instead of branching on all possible moves of MIN, we use the oracle (O(s')) to get the single best (for MIN) move (a_{\text{opp}}).
- We then transition immediately to (s'' = \text{Result}(s', a_{\text{opp}})).
- Hence from MAXâs perspective, the next state after our move (a) is deterministic (because the oracle effectively âcollapsesâ MINâs possible moves to the single best one for MIN).
Hence, the new âsingleâplayerâ transition is:
[
s
\xrightarrow[\text{our action}]{a}
s' = \text{Result}(s,a)
\xrightarrow[\text{oracle for MIN}]{O(s')}
s'' = \text{Result}(s',,O(s')).
]
We treat this combined step as a single âtransitionâ in a singleâplayer game: from (s) to (s'') via our choice (a).We can then do a standard singleâagent search (e.g., depthâfirst with memoization, or any other exhaustive search technique) over this newly defined (deterministic) state space. At each node (s) (where âit is our turnâ), we branch only over our possible actions, and the opponentâs reply is fixed by the oracle.
Correctness follows from the fact that in the original twoâplayer game, MIN would indeed choose that same move if playing optimally. Thus analyzing only that single MIN move in each state is equivalent to standard minimax, but with the branching factor for MIN collapsed to 1 (the known optimal MIN move).
1. Pseudocode
Below is a highâlevel pseudocode for the OracleâAdvised SingleâPlayer Search that yields the same result as a full minimax search but uses the oracle to avoid branching over MINâs actions.
FUNCTION OracleAdvisedSearch(state s):
# If s is a terminal state, return its utility for MAX.
if IsTerminal(s):
return Utility(s)
best_value â -â
# For each action we (MAX) can take in state s:
for each action a in Actions(s):
s' â Result(s, a) # State after our action a
a_opp â O(s') # Oracle gives MIN's best response
s'' â Result(s', a_opp) # State after MIN's best response
# Recursively evaluate the resulting position
value_a â OracleAdvisedSearch(s'')
# Update best value
if value_a > best_value:
best_value â value_a
return best_value
IsTerminal(s)checks if (s) is a gameâending state.Utility(s)returns the terminal payoff (for MAX) if (s) is terminal.Actions(s)returns the actions available to MAX in state (s).Result(s, a)returns the next state if action (a) is played in state (s).O(s')returns the opponentâs (MINâs) optimal move in state (s'), per the oracle.
In many implementations, one might also keep track of the best action (not just the best value) so we can return the policy. But above we show only the value computation for simplicity.
2. Proof of Correctness (TwoâColumn Proof)
Use an inductive argument on the depth of the game tree to show that OracleAdvisedSearch returns the same value that would be computed by the standard Minimax value for MAX.
Statement of What We Prove
For every state (s) in which it is MAXâs turn, the function
OracleAdvisedSearch(s)returns the same value asMinimaxValue(s)(the minimax value under optimal play for both sides).
Below is a typical twoâcolumn proof outline.
| Left (Statements) | Right (Reasons) |
|---|---|
1. We define MinimaxValue(s) in the usual way: | Definition of minimax. |
| Â Â ( \displaystyle \text{MinimaxValue}(s) = \max_{a ,\in, \text{Actions}(s)} \min_{b ,\in, \text{Actions}(\text{Result}(s,a))} \text{MinimaxValue}(\text{Result}(\text{Result}(s,a), b)) ) if (s) is nonâterminal on MAXâs turn, and similarly for MINâs turn. | By the standard twoâplayer game definition, if itâs MINâs turn, we take a (\min). |
2. We define OracleAdvisedSearch(s) as in our pseudocode: from (s), we try each action (a), then apply the oracle move for MIN, leading to a single next state, and recurse. | Definition of the OracleAdvisedSearch. |
3. Base Case: If (s) is terminal, OracleAdvisedSearch(s) = Utility(s) and MinimaxValue(s) = Utility(s). They match immediately. | Directly from the pseudocode and definition of minimax, both agree on terminal nodes. |
4. Inductive Step: Assume for all states (t) at depth (d) or less (from the bottom), OracleAdvisedSearch(t) = MinimaxValue(t). We must show it holds at depth (d+1). | Principle of strong (or simple) induction on the gameâtree depth. |
| 5. Let (s) be a state at depth (d+1) where it is MAXâs turn. By definition, | We are focusing on states where MAX moves. |
| Â Â ( \text{MinimaxValue}(s) = \max_{a \in \text{Actions}(s)} \min_{b \in \text{Actions}(\text{Result}(s,a))} \text{MinimaxValue}(\text{Result}(\text{Result}(s,a), b)) ). ) | Standard minimax rule. |
| 6. By assumption of the oracle, for any state (s') (with MIN to move) we have ( O(s') = b^* ) where ( b^* ) satisfies: | The oracle returns MINâs best move. |
| Â Â ( b^* \in \arg\min_{b \in \text{Actions}(s')} \text{MinimaxValue}(\text{Result}(s', b)). ) | Definition of âoracle is always correct.â |
| 7. Hence, in the singleâplayer transformation, from state (s), if MAX picks (a), we move to (s' = \text{Result}(s,a)), then the oracleâs move is ( b^* = O(s') ). So the resulting state is ( s'' = \text{Result}(s', b^*) ). | This matches the pseudocode transitions. |
| 8. Because (b^*) is MINâs optimal move from (s'), we have: | By definition of (b^*). |
| Â Â ( \min_{b \in \text{Actions}(s')} \text{MinimaxValue}(\text{Result}(s', b)) = \text{MinimaxValue}(\text{Result}(s', b^*)). ) | (b^*) attains the min. |
| 9. Therefore, | Combining steps 5 and 8. |
| Â Â ( \text{MinimaxValue}(s) = \max_{a \in \text{Actions}(s)} \text{MinimaxValue}(s''), \text{ where } s'' = \text{Result}(\text{Result}(s,a), O(\text{Result}(s,a))). ) | Rewriting the minimax expression to factor in the oracle. |
10. By the inductive hypothesis, for each âchildâ (s'') at depth (d) (one level closer to terminal), OracleAdvisedSearch(s'') = MinimaxValue(s''). | We apply the induction assumption to states closer to the bottom. |
11. Thus, in computing OracleAdvisedSearch(s), we take the same maximum over exactly the same values (\text{MinimaxValue}(s'')). | From the pseudocodeâs recursion. |
12. Hence OracleAdvisedSearch(s) = MinimaxValue(s). | They coincide at depth (d+1). |
| 13. By mathematical induction on the depth, these two functions coincide for all states (s). | This completes the proof. |
Conclusion: For every state (s), OracleAdvisedSearch(s) returns the same value as the standard minimax search would. Therefore, using the oracle to collapse MINâs branching to a single move is a correct way to turn a twoâplayer game (under optimal play by MIN) into a singleâplayer search problem from MAXâs perspective in a deterministic, zeroâsum setting.Solution Explanation
We are given a deterministic twoâplayer zeroâsum turnâbased game where:
- One player is MAX (us).
- The other player is MIN (the opponent).
We also have an oracle (O(s)) that, given any state (s) in which MIN is to move, returns the move that MIN would choose if they were playing optimally for themselves (i.e., the minimizing move).
We want to reduce this twoâplayer game to a singleâplayer search problem that correctly computes our (MAXâs) optimal policy. The key idea is:
When it is our turn (MAXâs turn) in state (s):
- We branch on each possible action (a) we can take.
- Once we pick an action (a), the state transitions to (s' = \text{Result}(s, a)).
- Now it should be MINâs turn in (s').
- Instead of branching on all possible moves of MIN, we use the oracle (O(s')) to get the single best (for MIN) move (a_{\text{opp}}).
- We then transition immediately to (s'' = \text{Result}(s', a_{\text{opp}})).
- Hence from MAXâs perspective, the next state after our move (a) is deterministic (because the oracle effectively âcollapsesâ MINâs possible moves to the single best one for MIN).
Hence, the new âsingleâplayerâ transition is:
[
s
\xrightarrow[\text{our action}]{a}
s' = \text{Result}(s,a)
\xrightarrow[\text{oracle for MIN}]{O(s')}
s'' = \text{Result}(s',,O(s')).
]
We treat this combined step as a single âtransitionâ in a singleâplayer game: from (s) to (s'') via our choice (a).We can then do a standard singleâagent search (e.g., depthâfirst with memoization, or any other exhaustive search technique) over this newly defined (deterministic) state space. At each node (s) (where âit is our turnâ), we branch only over our possible actions, and the opponentâs reply is fixed by the oracle.
Correctness follows from the fact that in the original twoâplayer game, MIN would indeed choose that same move if playing optimally. Thus analyzing only that single MIN move in each state is equivalent to standard minimax, but with the branching factor for MIN collapsed to 1 (the known optimal MIN move).
1. Pseudocode
Below is a highâlevel pseudocode for the OracleâAdvised SingleâPlayer Search that yields the same result as a full minimax search but uses the oracle to avoid branching over MINâs actions.
FUNCTION OracleAdvisedSearch(state s):
# If s is a terminal state, return its utility for MAX.
if IsTerminal(s):
return Utility(s)
best_value â -â
# For each action we (MAX) can take in state s:
for each action a in Actions(s):
s' â Result(s, a) # State after our action a
a_opp â O(s') # Oracle gives MIN's best response
s'' â Result(s', a_opp) # State after MIN's best response
# Recursively evaluate the resulting position
value_a â OracleAdvisedSearch(s'')
# Update best value
if value_a > best_value:
best_value â value_a
return best_value
IsTerminal(s)checks if (s) is a gameâending state.Utility(s)returns the terminal payoff (for MAX) if (s) is terminal.Actions(s)returns the actions available to MAX in state (s).Result(s, a)returns the next state if action (a) is played in state (s).O(s')returns the opponentâs (MINâs) optimal move in state (s'), per the oracle.
In many implementations, one might also keep track of the best action (not just the best value) so we can return the policy. But above we show only the value computation for simplicity.
2. Proof of Correctness (TwoâColumn Proof)
Use an inductive argument on the depth of the game tree to show that OracleAdvisedSearch returns the same value that would be computed by the standard Minimax value for MAX.
Statement of What We Prove
For every state (s) in which it is MAXâs turn, the function
OracleAdvisedSearch(s)returns the same value asMinimaxValue(s)(the minimax value under optimal play for both sides).
Below is a typical twoâcolumn proof outline.
$$
$$
| Left (Statements) | Right (Reasons) |
|---|---|
1. We define MinimaxValue(s) in the usual way: | Definition of minimax. |
| Â Â ( \displaystyle \text{MinimaxValue}(s) = \max_{a ,\in, \text{Actions}(s)} \min_{b ,\in, \text{Actions}(\text{Result}(s,a))} \text{MinimaxValue}(\text{Result}(\text{Result}(s,a), b)) ) if (s) is nonâterminal on MAXâs turn, and similarly for MINâs turn. | By the standard twoâplayer game definition, if itâs MINâs turn, we take a (\min). |
2. We define OracleAdvisedSearch(s) as in our pseudocode: from (s), we try each action (a), then apply the oracle move for MIN, leading to a single next state, and recurse. | Definition of the OracleAdvisedSearch. |
3. Base Case: If (s) is terminal, OracleAdvisedSearch(s) = Utility(s) and MinimaxValue(s) = Utility(s). They match immediately. | Directly from the pseudocode and definition of minimax, both agree on terminal nodes. |
4. Inductive Step: Assume for all states (t) at depth (d) or less (from the bottom), OracleAdvisedSearch(t) = MinimaxValue(t). We must show it holds at depth (d+1). | Principle of strong (or simple) induction on the gameâtree depth. |
| 5. Let (s) be a state at depth (d+1) where it is MAXâs turn. By definition, | We are focusing on states where MAX moves. |
| Â Â ( \text{MinimaxValue}(s) = \max_{a \in \text{Actions}(s)} \min_{b \in \text{Actions}(\text{Result}(s,a))} \text{MinimaxValue}(\text{Result}(\text{Result}(s,a), b)) ). ) | Standard minimax rule. |
| 6. By assumption of the oracle, for any state (s') (with MIN to move) we have ( O(s') = b^* ) where ( b^* ) satisfies: | The oracle returns MINâs best move. |
| Â Â ( b^* \in \arg\min_{b \in \text{Actions}(s')} \text{MinimaxValue}(\text{Result}(s', b)). ) | Definition of âoracle is always correct.â |
| 7. Hence, in the singleâplayer transformation, from state (s), if MAX picks (a), we move to (s' = \text{Result}(s,a)), then the oracleâs move is ( b^* = O(s') ). So the resulting state is ( s'' = \text{Result}(s', b^*) ). | This matches the pseudocode transitions. |
| 8. Because (b^*) is MINâs optimal move from (s'), we have: | By definition of (b^*). |
| Â Â ( \min_{b \in \text{Actions}(s')} \text{MinimaxValue}(\text{Result}(s', b)) = \text{MinimaxValue}(\text{Result}(s', b^*)). ) | (b^*) attains the min. |
| 9. Therefore, | Combining steps 5 and 8. |
| Â Â ( \text{MinimaxValue}(s) = \max_{a \in \text{Actions}(s)} \text{MinimaxValue}(s''), \text{ where } s'' = \text{Result}(\text{Result}(s,a), O(\text{Result}(s,a))). ) | Rewriting the minimax expression to factor in the oracle. |
10. By the inductive hypothesis, for each âchildâ (s'') at depth (d) (one level closer to terminal), OracleAdvisedSearch(s'') = MinimaxValue(s''). | We apply the induction assumption to states closer to the bottom. |
11. Thus, in computing OracleAdvisedSearch(s), we take the same maximum over exactly the same values (\text{MinimaxValue}(s'')). | From the pseudocodeâs recursion. |
12. Hence OracleAdvisedSearch(s) = MinimaxValue(s). | They coincide at depth (d+1). |
| 13. By mathematical induction on the depth, these two functions coincide for all states (s). | This completes the proof. |
Conclusion: For every state (s), OracleAdvisedSearch(s) returns the same value as the standard minimax search would. Therefore, using the oracle to collapse MINâs branching to a single move is a correct way to turn a twoâplayer game (under optimal play by MIN) into a singleâplayer search problem from MAXâs perspective in a deterministic, zeroâsum setting.
 Â
MinimaxValue
(
đ
)
max
âĄ
đ
â
â
â
Actions
(
đ
)
min
âĄ
đ
â
â
â
Actions
(
Result
(
đ
,
đ
)
)
MinimaxValue
(
Result
(
Result
(
đ
,
đ
)
,
đ
)
)
MinimaxValue(s)=
aâActions(s)
max
â
bâActions(Result(s,a))
min
â
MinimaxValue(Result(Result(s,a),b)) if
đ
s is nonâterminal on MAXâs turn, and similarly for MINâs turn.