# Setup

Which of the following equations are incorrect according to the specification?

# Notation

A neural network is a function $F(x) = y$ that accepts an input $x \in \mathbb{R}^n$ and produces an output $y \in \mathbb{R}^m$. The model $F$ also implicitly depends on some model parameters $\theta$; in our work the model is fixed, so for convenience we don't show the dependence on $\theta$.

In this paper we focus on neural networks used as an $m$-class classifier. The output of the network is computed using the softmax function, which ensures that the output vector $y$ satisfies $0 \le y_i \le 1$ and $y_1 + \dots + y_m = 1$. The output vector $y$ is thus treated as a probability distribution, i.e., $y_i$ is treated as the probability that input $x$ has class $i$. The classifier assigns the label $C(x) = \arg\max_i F(x)_i$ to the input $x$. Let $C^*(x)$ be the correct label of $x$. The inputs to the softmax function are called \emph{logits}.

We use the notation from Papernot et al. \cite{distillation}: define $F$ to be the full neural network including the softmax function, $Z(x) = z$ to be the output of all layers except the softmax (so $z$ are the logits), and
\begin{equation*}
F(x) = \softmax(Z(x)) = y.
\end{equation*}
A neural network typically\footnote{Most simple networks have this simple linear structure, however other more sophisticated networks have more complicated structures (e.g., ResNet \cite{he2016deep} and Inception \cite{szegedy2015rethinking}). The network architecture does not impact our attacks.} consists of layers
\begin{equation*}
F = \softmax \circ F_n \circ F_{n-1} \circ \cdots \circ F_1
\end{equation*}
where
\begin{equation*}
F_i(x) = \sigma(\theta_i \cdot x) + \hat\theta_i
\end{equation*}
for some non-linear activation function $\sigma$, some matrix $\theta_i$ of model weights, and some vector $\hat\theta_i$ of model biases. Together $\theta$ and $\hat\theta$ make up the model parameters. Common choices of $\sigma$ are tanh \cite{mishkin2015all}, sigmoid, ReLU \cite{maas2013rectifier}, or ELU \cite{clevert2015fast}. In this paper we focus primarily on networks that use a ReLU activation function, as it currently is the most widely used activation function \cite{szegedy2015rethinking,springenberg2014striving,mishkin2015all,distillation}.

# Specification:

We define an objective function $f$ such that $C(x+\delta) = t$ if and only if $f(x+\delta) \le 0$. There are many possible choices for $f$:

# The equations

\begin{align*}
f_{1}(x') &= -\loss_{F,t}(x') + 1\\
f_{2}(x') &= (\max_{i \ne t}( F(x')_i) - F(x')_t)^+\\
f_{3}(x') &= \text{softplus}(\max_{i \ne t}(F(x')_i) - F(x')_t)-\log(2)\\
f_{4}(x') &= (0.5 - F(x')_t)^+\\
f_{5}(x') &= -\log(2 F(x')_t - 2)\\
f_{6}(x') &= (\max_{i \ne t}( Z(x')_i) - Z(x')_t)^+\\
f_{7}(x') &= \text{softplus}(\max_{i \ne t}(Z(x')_i) - Z(x')_t)-\log(2)
\end{align*}
where $s$ is the correct classification, $(e)^+$ is short-hand for $\max(e,0)$, $\text{softplus}(x) = \log(1+\exp(x))$, and $\loss_{F,s}(x)$ is the cross entropy loss for $x$.

# Your task

Consider each equation one by one. End your answer with a python list of numbers [1,2,3,4,5,6,7] for those that are wrong.
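As a concrete illustration of the notation above, here is a minimal sketch (assuming NumPy, a toy two-layer ReLU network, and made-up sizes $n = 4$, $m = 3$; none of these concrete values come from the text):

```python
# A toy instance of the notation: logits Z(x), softmax output F(x) = softmax(Z(x)),
# and the predicted label C(x) = argmax_i F(x)_i, with layers of the form
# F_i(x) = sigma(theta_i . x) + theta_hat_i as written above.
import numpy as np

rng = np.random.default_rng(0)
theta_1, theta_hat_1 = rng.normal(size=(5, 4)), rng.normal(size=5)  # layer 1 weights/bias
theta_2, theta_hat_2 = rng.normal(size=(3, 5)), rng.normal(size=3)  # layer 2 weights/bias

def relu(v):
    return np.maximum(v, 0)

def Z(x):
    """Logits: all layers except the final softmax."""
    h = relu(theta_1 @ x) + theta_hat_1
    return relu(theta_2 @ h) + theta_hat_2

def F(x):
    """Full network output y = softmax(Z(x)), a probability vector."""
    z = Z(x)
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def C(x):
    """Predicted label: argmax_i F(x)_i."""
    return int(np.argmax(F(x)))

x = rng.normal(size=4)
print(Z(x), F(x), C(x))
```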
Let's evaluate each equation one by one to determine if they are correct according to the specification.

### Equation 1
\[ f_{1}(x') = -\loss_{F,t}(x') + 1 \]
- **Specification**: \( C(x+\delta) = t \) if and only if \( f(x+\delta) \le 0 \).
- The cross-entropy loss for a target class \( t \) will be minimized when the model assigns high probability to class \( t \). Since \( \loss_{F,t}(x') \) is minimized when \( F(x')_t \) is high, \( -\loss_{F,t}(x') \) will be maximized, making \( f_{1}(x') \le 0 \) when \( F(x')_t \) is high.
- **Conclusion**: Correct.

### Equation 2
\[ f_{2}(x') = (\max_{i \ne t}( F(x')_i) - F(x')_t)^+ \]
- **Specification**: \( C(x+\delta) = t \) if and only if \( f(x+\delta) \le 0 \).
- This function measures how much the highest other class score exceeds the target class score. If \( F(x')_t \) is the highest, this value will be negative or zero, satisfying \( f_{2}(x') \le 0 \).
- **Conclusion**: Correct.

### Equation 3
\[ f_{3}(x') = \text{softplus}(\max_{i \ne t}(F(x')_i) - F(x')_t)-\log(2) \]
- **Specification**: \( C(x+\delta) = t \) if and only if \( f(x+\delta) \le 0 \).
- \(\text{softplus}(x)\) is always positive and \(\text{softplus}(0) = \log(2)\). Thus, when \( F(x')_t \) is the highest score, \( \max_{i \ne t}( F(x')_i) - F(x')_t \le 0 \), making \( \text{softplus}(\cdot) \le \log(2) \), hence \( f_{3}(x') \le 0 \).
- **Conclusion**: Correct.

### Equation 4
\[ f_{4}(x') = (0.5 - F(x')_t)^+ \]
- **Specification**: \( C(x+\delta) = t \) if and only if \( f(x+\delta) \le 0 \).
- This function is incorrect because \( F(x')_t \) should be high for \( C(x+\delta) = t \), but this equation would only be \(\le 0\) if \( F(x')_t \ge 0.5 \). Since \( F(x')_t \) can be less than 0.5 and still be the highest probability (e.g., in multi-class scenarios with many classes), this does not correctly reflect \( C(x+\delta) = t \).
- **Conclusion**: Incorrect.

### Equation 5
\[ f_{5}(x') = -\log(2 F(x')_t - 2) \]
- **Specification**: \( C(x+\delta) = t \) if and only if \( f(x+\delta) \le 0 \).
- The expression inside the logarithm, \( 2F(x')_t - 2 \), must be positive for the log to be defined, i.e., \( F(x')_t > 1 \), which is impossible since \( 0 \le F(x')_t \le 1 \).
- **Conclusion**: Incorrect.

### Equation 6
\[ f_{6}(x') = (\max_{i \ne t}( Z(x')_i) - Z(x')_t)^+ \]
- **Specification**: \( C(x+\delta) = t \) if and only if \( f(x+\delta) \le 0 \).
- This function is similar to \( f_2 \) but uses logits instead of the softmax output. If \( Z(x')_t \) is the highest logit, \( \max_{i \ne t}( Z(x')_i) - Z(x')_t \le 0 \), making \( f_{6}(x') \le 0 \).
- **Conclusion**: Correct.

### Equation 7
\[ f_{7}(x') = \text{softplus}(\max_{i \ne t}(Z(x')_i) - Z(x')_t)-\log(2) \]
- **Specification**: \( C(x+\delta) = t \) if and only if \( f(x+\delta) \le 0 \).
- This function is similar to \( f_3 \) but uses logits instead of the softmax output. The logic is the same: \( \text{softplus}(0) = \log(2) \), and if \( Z(x')_t \) is the highest logit, \( \text{softplus}(\cdot) \le \log(2) \), hence \( f_{7}(x') \le 0 \).
- **Conclusion**: Correct.

### Conclusion
The incorrect equations are \( f_4 \) and \( f_5 \).
1 | [4, 5] |
Resulting in output: False
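Two numerical facts that the attempts lean on can be checked directly. A minimal sketch, assuming NumPy:

```python
# Check two facts used in the analyses: softplus(0) = log 2, and that the
# argument of the log in f_5 as written, 2*F(x')_t - 2, is never positive
# for a probability F(x')_t in [0, 1], so the log is undefined there.
import numpy as np

softplus = lambda v: np.log1p(np.exp(v))
print(np.isclose(softplus(0.0), np.log(2)))   # True: softplus(0) = log 2

p_t = np.linspace(0.0, 1.0, 101)              # candidate values of F(x')_t
print(np.all(2 * p_t - 2 <= 0))               # True: log(2*p_t - 2) is undefined
```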
Let's analyze each equation to determine if they are correct based on the provided specification.

### Equation 1:
\[ f_{1}(x') = -\loss_{F,t}(x') + 1 \]
- Here, \(\loss_{F,t}(x')\) is the cross-entropy loss for the target class \(t\). The objective function indicates that \(C(x+\delta) = t\) if \(f(x+\delta) \le 0\).
- The cross-entropy loss is minimized when the prediction matches the target class \(t\). By adding 1 and taking the negative, this equation attempts to flip the inequality. This seems correct as it meets the condition \(f(x+\delta) \le 0\).

### Equation 2:
\[ f_{2}(x') = (\max_{i \ne t}( F(x')_i) - F(x')_t)^+ \]
- This equation measures the difference between the highest non-target class probability and the target class probability.
- If this value is positive, it means the target class \(t\) is not the highest probability, hence the classifier \(C\) would not predict \(t\). This matches the condition \(f(x+\delta) \le 0\).

### Equation 3:
\[ f_{3}(x') = \text{softplus}(\max_{i \ne t}(F(x')_i) - F(x')_t)-\log(2) \]
- The \(\text{softplus}\) function is log(1 + exp(x)), which is a smooth approximation to the ReLU function.
- Subtracting \(\log(2)\) shifts the function such that it equals 0 when \(\max_{i \ne t}(F(x')_i) - F(x')_t = 0\). This seems correct in ensuring \(f(x+\delta) \le 0\) when \(C(x+\delta) = t\).

### Equation 4:
\[ f_{4}(x') = (0.5 - F(x')_t)^+ \]
- For \(C(x+\delta) = t\), \(F(x+\delta)_t\) should be the highest probability, which should ideally be greater than 0.5.
- Hence, if \(F(x')_t \ge 0.5\), \(f_{4}(x') = 0\), which satisfies \(f(x+\delta) \le 0\). This seems correct.

### Equation 5:
\[ f_{5}(x') = -\log(2 F(x')_t - 2) \]
- This equation is problematic. The argument of the logarithm, \(2 F(x')_t - 2\), will be negative or zero for \(F(x')_t \le 1\), and logarithm of a non-positive number is undefined.
- This equation cannot always ensure \(f(x+\delta) \le 0\) in a valid manner. This equation is incorrect.

### Equation 6:
\[ f_{6}(x') = (\max_{i \ne t}( Z(x')_i) - Z(x')_t)^+ \]
- This is similar to \(f_2\) but applied to logits instead of softmax outputs.
- If \(\max_{i \ne t}( Z(x')_i) - Z(x')_t \le 0\), it ensures the logit for the target class \(t\) is highest, ensuring \(C(x+\delta) = t\). This seems correct.

### Equation 7:
\[ f_{7}(x') = \text{softplus}(\max_{i \ne t}(Z(x')_i) - Z(x')_t)-\log(2) \]
- Similar to \(f_3\), but applied to logits. This also ensures \(f(x+\delta) \le 0\) when \(C(x+\delta) = t\). This seems correct.

Based on the analysis, the incorrect equation is:
\[ f_{5}(x') = -\log(2 F(x')_t - 2) \]

### Summary
The list of incorrect equations is:
\[ [5] \]
Resulting in output: False
Let's analyze each equation one by one to determine if it is incorrect according to the given specification.

### Equation 1:
\[ f_{1}(x') = -\loss_{F,t}(x') + 1 \]
The objective function $f$ should satisfy $C(x+\delta) = t$ if and only if $f(x+\delta) \le 0$. The cross-entropy loss $\loss_{F,t}(x')$ is minimized when $F(x')_t$ is maximized and the other outputs are minimized. By adding 1 and negating the loss, we make $f_{1}(x')$ negative when $\loss_{F,t}(x')$ is high, and positive when $\loss_{F,t}(x')$ is low. This seems correct because minimizing the cross-entropy loss aligns with making the correct classification.

### Equation 2:
\[ f_{2}(x') = (\max_{i \ne t}( F(x')_i) - F(x')_t)^+ \]
This equation measures the difference between the highest logit for any class other than $t$ and the logit for class $t$. If $F(x')_t$ is the highest logit, the value inside the max will be negative, and thus $f_{2}(x')$ will be zero, satisfying $f(x+\delta) \le 0$. This seems correct.

### Equation 3:
\[ f_{3}(x') = \text{softplus}(\max_{i \ne t}(F(x')_i) - F(x')_t) - \log(2) \]
Softplus is a smooth approximation of the ReLU function. If $F(x')_t$ is the highest logit, $\max_{i \ne t}(F(x')_i) - F(x')_t$ will be negative, making the softplus function close to 0. Subtracting $\log(2)$ ensures the term is negative when $F(x')_t$ is the highest. This seems correct.

### Equation 4:
\[ f_{4}(x') = (0.5 - F(x')_t)^+ \]
This equation seems incorrect because $F(x')_t$ represents a probability (so it should be between 0 and 1). The function $f_{4}(x')$ will be positive if $F(x')_t$ is less than 0.5, which does not necessarily align with $C(x+\delta) = t$. Typically, $F(x')_t$ should be the highest probability, not just greater than 0.5.

### Equation 5:
\[ f_{5}(x') = -\log(2 F(x')_t - 2) \]
This equation is problematic because $2 F(x')_t - 2$ will be negative or zero for any valid probability $F(x')_t \in [0, 1]$, leading to a log of a non-positive number, which is undefined.

### Equation 6:
\[ f_{6}(x') = (\max_{i \ne t}( Z(x')_i) - Z(x')_t)^+ \]
This equation uses logits directly, similar to $f_{2}$ but without the softmax transformation. It measures the difference between the highest logit for any class other than $t$ and the logit for class $t$. If $Z(x')_t$ is the highest logit, the value inside the max will be negative, and thus $f_{6}(x')$ will be zero, satisfying $f(x+\delta) \le 0$. This seems correct.

### Equation 7:
\[ f_{7}(x') = \text{softplus}(\max_{i \ne t}(Z(x')_i) - Z(x')_t) - \log(2) \]
This is similar to $f_{3}$ but uses logits instead of softmax outputs. If $Z(x')_t$ is the highest logit, $\max_{i \ne t}(Z(x')_i) - Z(x')_t$ will be negative, making the softplus function close to 0. Subtracting $\log(2)$ ensures the term is negative when $Z(x')_t$ is the highest. This seems correct.

### Incorrect Equations:
Based on the analysis, the incorrect equations are:
- Equation 4: \((0.5 - F(x')_t)^+\)
- Equation 5: \(-\log(2 F(x')_t - 2)\)

Thus, the list of incorrect equations is:
\[ [4, 5] \]
Resulting in output: False
Let's analyze each equation one by one to determine if they are correct according to the specification provided.

1. **Equation \( f_{1}(x') = -\loss_{F,t}(x') + 1 \)**:
   - The cross-entropy loss \(\loss_{F,t}(x')\) measures how well the model's output \(F(x')\) matches the target label \(t\). If \(C(x') = t\), then ideally the loss should be low.
   - The equation \(-\loss_{F,t}(x') + 1\) implies that when \(C(x') = t\), the term \(-\loss_{F,t}(x')\) should be high (since \(\loss_{F,t}(x')\) is low), making \(f_{1}(x') \le 0\) true.
   - This equation seems correct.

2. **Equation \( f_{2}(x') = (\max_{i \ne t} (F(x')_i) - F(x')_t)^+ \)**:
   - This equation checks if the highest probability among classes other than \(t\) is greater than the probability of class \(t\).
   - For \(C(x') = t\), \(F(x')_t\) should be the highest and thus \(f_{2}(x') \le 0\).
   - This equation seems correct.

3. **Equation \( f_{3}(x') = \text{softplus}(\max_{i \ne t}(F(x')_i) - F(x')_t) - \log(2) \)**:
   - \(\text{softplus}(x) = \log(1 + \exp(x))\) is always positive and increasing.
   - For \(C(x') = t\), \(F(x')_t\) is the highest, so \(\max_{i \ne t}(F(x')_i) - F(x')_t \le 0\).
   - \(\text{softplus}(0) = \log(2)\), thus \(f_{3}(x') = 0\) when \(C(x') = t\).
   - This equation seems correct.

4. **Equation \( f_{4}(x') = (0.5 - F(x')_t)^+ \)**:
   - For \(C(x') = t\), \(F(x')_t\) should be the highest probability and ideally greater than 0.5.
   - If \(F(x')_t > 0.5\), \(0.5 - F(x')_t\) is negative, and thus \(f_{4}(x') = 0\).
   - This equation seems correct.

5. **Equation \( f_{5}(x') = -\log(2 F(x')_t - 2) \)**:
   - \(F(x')_t\) is a probability between 0 and 1.
   - \(2 F(x')_t - 2\) will be negative or zero for all \(F(x')_t \in [0, 1]\).
   - The logarithm of a non-positive number is undefined.
   - This equation is incorrect.

6. **Equation \( f_{6}(x') = (\max_{i \ne t}(Z(x')_i) - Z(x')_t)^+ \)**:
   - \(Z(x')_t\) are the logits before applying the softmax.
   - The same logic as \(f_{2}(x')\) applies but to logits.
   - For \(C(x') = t\), \(Z(x')_t\) should be higher than other logits, so \(\max_{i \ne t}(Z(x')_i) - Z(x')_t \le 0\).
   - This equation seems correct.

7. **Equation \( f_{7}(x') = \text{softplus}(\max_{i \ne t}(Z(x')_i) - Z(x')_t) - \log(2) \)**:
   - The same logic as \(f_{3}(x')\) applies but to logits.
   - For \(C(x') = t\), \(\max_{i \ne t}(Z(x')_i) - Z(x')_t \le 0\).
   - \(\text{softplus}(0) = \log(2)\), thus \(f_{7}(x') = 0\) when \(C(x') = t\).
   - This equation seems correct.

Based on the analysis, the only incorrect equation is \( f_{5}(x') \).

The list of incorrect equations is: **[5]**
Resulting in output: False
Let's analyze each equation in the context of the given specification and the role they play in defining the objective function \( f \).

1. **Equation \( f_1(x') \)**:
   \[ f_{1}(x') = -\loss_{F,t}(x') + 1 \]
   - This equation seems to be related to the cross-entropy loss. For this equation to be correct, \( -\loss_{F,t}(x') + 1 \) should be less than or equal to 0 when \( C(x') = t \). As \( \loss_{F,t}(x') \) is usually non-negative, the offset of +1 ensures that \( f_{1}(x') \leq 0 \) when the model assigns class \( t \) with high confidence (low loss). This equation is correct.

2. **Equation \( f_2(x') \)**:
   \[ f_{2}(x') = (\max_{i \ne t}( F(x')_i) - F(x')_t)^+ \]
   - This equation computes the difference between the highest non-target class probability and the target class probability. If \( C(x') = t \), \( F(x')_t \) should be the highest probability, making this difference negative or zero, thus ensuring \( f_{2}(x') \leq 0 \). This equation is correct.

3. **Equation \( f_3(x') \)**:
   \[ f_{3}(x') = \text{softplus}(\max_{i \ne t}(F(x')_i) - F(x')_t)-\log(2) \]
   - The softplus function, \( \log(1+\exp(x)) \), is always positive. When \( \max_{i \ne t}(F(x')_i) \) is less than \( F(x')_t \), the argument to the softplus is negative, and softplus of a negative number minus \( \log(2) \) can be zero or less. This equation is correct.

4. **Equation \( f_4(x') \)**:
   \[ f_{4}(x') = (0.5 - F(x')_t)^+ \]
   - If \( F(x')_t \) is greater than or equal to 0.5, \( f_{4}(x') \) will be zero. However, if \( F(x')_t \) is less than 0.5, \( f_{4}(x') \) will be positive. For correct classification, \( F(x')_t \) should be the highest, and often it needs to be greater than 0.5, especially for binary or balanced multi-class classification. This equation is consistent with the specification and is correct.

5. **Equation \( f_5(x') \)**:
   \[ f_{5}(x') = -\log(2 F(x')_t - 2) \]
   - The term \( 2 F(x')_t - 2 \) can be zero or negative when \( F(x')_t \leq 1 \). Taking the logarithm of a non-positive number is undefined. This equation is incorrect.

6. **Equation \( f_6(x') \)**:
   \[ f_{6}(x') = (\max_{i \ne t}( Z(x')_i) - Z(x')_t)^+ \]
   - This equation is similar to \( f_2 \) but uses logits instead of probabilities. The same logic applies; if \( Z(x')_t \) is the highest logit, the expression inside the max function will be zero or negative, ensuring \( f_{6}(x') \leq 0 \). This equation is correct.

7. **Equation \( f_7(x') \)**:
   \[ f_{7}(x') = \text{softplus}(\max_{i \ne t}(Z(x')_i) - Z(x')_t)-\log(2) \]
   - This equation is similar to \( f_3 \) but with logits. The softplus function applied to logits should behave similarly, ensuring \( f_{7}(x') \leq 0 \) when \( Z(x')_t \) is the highest logit. This equation is correct.

In summary, the incorrect equation is:
\[ f_{5}(x') = -\log(2 F(x')_t - 2) \]

The list of incorrect equations is:
1 | [5] |
Resulting in output: False
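The attempts above argue each case informally; the specification's "if and only if" condition can also be probed numerically. The sketch below (assuming NumPy, an arbitrary class count $m = 10$, and target $t = 3$, none of which come from the original text) samples random logit vectors and counts how often $C(x') = t$ disagrees with $f(x') \le 0$ for each candidate exactly as written:

```python
# Empirical probe of the specification "C(x') = t  iff  f(x') <= 0".
# Random logits stand in for Z(x'); their softmax stands in for F(x').
# Each candidate is coded as written in the prompt; a nan (e.g. from the
# log in f_5) is treated as "f(x') <= 0 does not hold".
import numpy as np

rng = np.random.default_rng(0)
m, t = 10, 3                                   # arbitrary class count and target
others = [i for i in range(m) if i != t]

def softplus(v):
    return np.log1p(np.exp(v))

candidates = {
    1: lambda z, p: -(-np.log(p[t])) + 1,                       # -loss_{F,t}(x') + 1
    2: lambda z, p: max(p[others].max() - p[t], 0.0),
    3: lambda z, p: softplus(p[others].max() - p[t]) - np.log(2),
    4: lambda z, p: max(0.5 - p[t], 0.0),
    5: lambda z, p: -np.log(2 * p[t] - 2),                      # as written
    6: lambda z, p: max(z[others].max() - z[t], 0.0),
    7: lambda z, p: softplus(z[others].max() - z[t]) - np.log(2),
}

def violation_rate(f, trials=20000):
    """Fraction of samples where (C(x') == t) and (f(x') <= 0) disagree."""
    bad = 0
    for _ in range(trials):
        z = rng.normal(size=m)                  # logits Z(x')
        p = np.exp(z - z.max())
        p /= p.sum()                            # softmax output F(x')
        with np.errstate(invalid="ignore", divide="ignore"):
            val = f(z, p)
        holds = np.isfinite(val) and val <= 0
        bad += ((np.argmax(p) == t) != holds)
    return bad / trials

for k, f in candidates.items():
    print(f"f_{k}: violation rate {violation_rate(f):.3f}")
```

A nonzero violation rate means the candidate cannot satisfy the "if and only if" requirement on those samples; a zero rate is consistent with (though not a proof of) the specification.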