What will this function print?

```python
import torch

def diff_round(x, decimals=1):
    scale_factor = (10 ** decimals)
    x = x * scale_factor
    diff = (1 + 1e-2) * x - torch.floor(x)
    x = x - diff + (torch.floor(x) + torch.where(diff >= 0.5, 1, 0))
    x = x / scale_factor
    return x

g = torch.tensor([.99, 1.54, 1.9], dtype=torch.float32, requires_grad=True)
loss = torch.sum(diff_round(g, 1))
loss.backward()
print(g.grad.sum())
```
Okay, let's trace the execution step by step, paying close attention to the forward pass and how gradients are computed in the backward pass.

1. **Initialization:**
   * `g = torch.tensor([0.99, 1.54, 1.9], dtype=torch.float32, requires_grad=True)`

2. **Forward Pass (`diff_round(g, 1)`):**
   * `x` initially is `g`: `[0.99, 1.54, 1.9]`
   * `decimals = 1`
   * `scale_factor = 10 ** 1 = 10`
   * `x = x * scale_factor`: `x` becomes `[9.9, 15.4, 19.0]`
   * `torch.floor(x)`: `[9., 15., 19.]`
   * `diff = (1 + 1e-2) * x - torch.floor(x)`:
     * `1 + 1e-2 = 1.01`
     * `1.01 * x = 1.01 * [9.9, 15.4, 19.0] = [9.999, 15.554, 19.19]`
     * `diff = [9.999, 15.554, 19.19] - [9., 15., 19.] = [0.999, 0.554, 0.19]`
   * `torch.where(diff >= 0.5, 1, 0)`:
     * `diff >= 0.5`: `[True, True, False]`
     * Result: `[1, 1, 0]`
   * `rounded_term = torch.floor(x) + torch.where(diff >= 0.5, 1, 0)`:
     * `rounded_term = [9., 15., 19.] + [1, 1, 0] = [10., 16., 19.]`
   * `x = x - diff + rounded_term`:
     * `x = [9.9, 15.4, 19.0] - [0.999, 0.554, 0.19] + [10., 16., 19.]`
     * `x = [8.901, 14.846, 18.81] + [10., 16., 19.]`
     * `x = [18.901, 30.846, 37.81]`
   * `x = x / scale_factor`:
     * `x = [18.901, 30.846, 37.81] / 10 = [1.8901, 3.0846, 3.781]`
   * The function returns `y = torch.tensor([1.8901, 3.0846, 3.781])`.

3. **Loss Calculation:**
   * `loss = torch.sum(y)`
   * `loss = 1.8901 + 3.0846 + 3.781 = 8.7557` (the exact value isn't crucial; the summation is).

4. **Backward Pass (`loss.backward()`):**
   * We need to compute `d(loss)/dg`. Since `loss = sum(y_i)`, we have `d(loss)/dy_i = 1` for all `i`, so the question reduces to `dy/dg`.
   * Rewrite `diff_round` in terms of the input `g`:
     * `g_scaled = g * scale_factor`
     * `g_floor = torch.floor(g_scaled)`
     * `diff = (1 + 1e-2) * g_scaled - g_floor`
     * `round_decision = torch.where(diff >= 0.5, 1, 0)`
     * `rounded_term = g_floor + round_decision`
     * `y_scaled = g_scaled - diff + rounded_term`
     * `y = y_scaled / scale_factor`
   * Substituting `diff` into the expression for `y_scaled`:
     * `y_scaled = g_scaled - ((1 + 1e-2) * g_scaled - g_floor) + rounded_term`
     * `y_scaled = g_scaled - 1.01 * g_scaled + g_floor + rounded_term`
     * `y_scaled = -0.01 * g_scaled + g_floor + rounded_term`
   * By the chain rule, `dy/dg = (dy/dy_scaled) * (dy_scaled/dg_scaled) * (dg_scaled/dg)`:
     * `dy/dy_scaled = 1 / scale_factor`
     * `dg_scaled/dg = scale_factor`
     * `dy_scaled/dg_scaled` is the crucial part. Differentiate `-0.01 * g_scaled + g_floor + rounded_term` with respect to `g_scaled`:
       * The gradient of `-0.01 * g_scaled` w.r.t. `g_scaled` is `-0.01`.
       * `torch.floor` is a step function; its gradient is 0 almost everywhere, and autograd treats it as 0.
       * `rounded_term = g_floor + round_decision`: `g_floor` again contributes 0, and `round_decision` comes from `torch.where` on a boolean comparison (`>=`), whose result is not differentiable and whose branch values (`1`, `0`) are constants, so it also contributes 0.
     * Therefore `dy_scaled/dg_scaled = -0.01`.
   * Putting it together: `dy/dg = (1 / scale_factor) * (-0.01) * scale_factor = -0.01`.
   * This means the gradient of each output element `y_i` with respect to the corresponding input element `g_i` is `-0.01`. Since the operations are element-wise, `dy_j/dg_i = 0` for `i != j`.
   * So `d(loss)/dg_i = (d(loss)/dy_i) * (dy_i/dg_i) = 1 * -0.01 = -0.01`, and `g.grad` will be `torch.tensor([-0.01, -0.01, -0.01])`.

5. **Final Print:**
   * `print(g.grad.sum())`
   * `sum([-0.01, -0.01, -0.01]) = -0.03`

The function will print:
```
tensor(-0.0300)
```
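To double-check the derivation, here is a minimal verification sketch (not part of the original exchange; it re-declares `diff_round` and uses PyTorch's standard `torch.autograd.functional.jacobian` helper). Because the op is element-wise, the Jacobian should be diagonal, with every diagonal entry equal to the `-0.01` slope derived above (up to float32 precision):

```python
import torch

def diff_round(x, decimals=1):
    scale_factor = (10 ** decimals)
    x = x * scale_factor
    diff = (1 + 1e-2) * x - torch.floor(x)
    x = x - diff + (torch.floor(x) + torch.where(diff >= 0.5, 1, 0))
    x = x / scale_factor
    return x

g = torch.tensor([.99, 1.54, 1.9], dtype=torch.float32)
# Full Jacobian of the element-wise op: the only differentiable path
# is the -0.01 * x_scaled term, so expect -0.01 on the diagonal.
jac = torch.autograd.functional.jacobian(lambda t: diff_round(t, 1), g)
print(jac)
```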
Testing if the previous output contains the string -0.03: True
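For contrast, the more common straight-through rounding pattern detaches the non-differentiable correction term, so the backward pass sees an exact identity instead of the `-0.01` slope produced by the `1.01` factor above. A minimal sketch (the name `ste_round` is illustrative, not from the transcript; note that `torch.round` uses round-half-to-even rather than the `>= 0.5` rule in `diff_round`):

```python
import torch

def ste_round(x, decimals=1):
    # Straight-through estimator: the forward pass computes the true
    # rounded value, but the rounding correction is detached from the
    # graph, so the backward pass reduces to the identity.
    scale = 10 ** decimals
    x = x * scale
    x = x + (torch.round(x) - x).detach()
    return x / scale

g = torch.tensor([.99, 1.54, 1.9], dtype=torch.float32, requires_grad=True)
torch.sum(ste_round(g, 1)).backward()
print(g.grad)  # tensor([1., 1., 1.]) -- gradient of exactly 1 everywhere
```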