|
import streamlit as st |
|
from st_pages import add_indentation |
|
|
|
add_indentation() |
|
|
|
st.title('Loss functions') |
|
st.subheader('SDM Loss') |
|
st.markdown(''' |
|
The similarity distribution matching (SDM) loss is the KL divergence from the predicted image-to-text and text-to-image similarity distributions to the true label (matching) distribution.
|
|
|
We define $f^v$ and $f^t$ to be the global representations of the visual and textual features, respectively.

The cosine similarity $sim(u, v) = \\frac{u \\cdot v}{|u||v|}$ is used to score each image-text pair and to compute the predicted matching probabilities.
|
|
|
We define $y_{i, j}=1$ if the visual feature $f^v_i$ matches the textual feature $f^t_j$, else $y_{i, j}=0$. |
|
Letting $\\sigma$ denote the softmax over the $N$ texts in a batch, the predicted matching distribution for image $i$ can be formulated as''')
|
st.latex(r''' |
|
p_{i, j} = \sigma\left(sim(f^v_i, f^t)\right)_j = \frac{\exp\left(sim(f^v_i, f^t_j)\right)}{\sum_{k=1}^{N} \exp\left(sim(f^v_i, f^t_k)\right)}
|
''') |
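
st.markdown('''

As a quick sketch (assuming PyTorch; the function and variable names below are illustrative, not a reference implementation), this amounts to a row-wise softmax over the cosine-similarity matrix of the batch:
''')

st.code('''
import torch
import torch.nn.functional as F

def predicted_distribution(f_v: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
    """p[i, j]: predicted probability that text j matches image i (row-wise softmax over cosine similarities)."""
    f_v = F.normalize(f_v, dim=-1)   # unit-norm rows, so dot products are cosine similarities
    f_t = F.normalize(f_t, dim=-1)
    sim = f_v @ f_t.t()              # (N, N) matrix of sim(f^v_i, f^t_j)
    return sim.softmax(dim=1)        # a temperature on the logits is often added in practice

# toy usage: N = 4 image-text pairs with 8-dimensional features
p = predicted_distribution(torch.randn(4, 8), torch.randn(4, 8))
''', language='python')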
|
|
|
st.markdown(''' |
|
We can then define the image-to-text loss as
|
''') |
|
|
|
st.latex(r''' |
|
\mathcal{L}_{i2t} = \frac{1}{N} \sum_{i=1}^{N} KL(\mathbf{p}_i \,\|\, \mathbf{q}_i) = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} p_{i, j} \log \frac{p_{i, j}}{q_{i, j}}
|
''') |
|
|
|
st.markdown('where $\\mathbf{q}_i$, the true matching distribution, is defined as')
|
|
|
st.latex(r''' |
|
q_{i, j} = \frac{y_{i, j}}{\sum_{k=1}^{N} y_{i, k}} |
|
''') |
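
st.markdown('''

Assuming each sample carries a class/identity label (so that matches can be found within the batch; the name `pids` below is illustrative), $y_{i, j}$ and $\\mathbf{q}_i$ can be built directly from the label vector:
''')

st.code('''
import torch

def true_distribution(pids: torch.Tensor) -> torch.Tensor:
    """q[i, j] = y[i, j] / sum_k y[i, k], where y[i, j] = 1 iff samples i and j share a label."""
    y = (pids.unsqueeze(0) == pids.unsqueeze(1)).float()  # (N, N) matching matrix y_{i,j}
    return y / y.sum(dim=1, keepdim=True)                 # normalise each row into a distribution

# toy usage: four samples, where samples 0 and 2 share a label
q = true_distribution(torch.tensor([0, 1, 0, 2]))
''', language='python')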
|
|
|
st.markdown('This normalization is needed because a batch can contain more than one correct match for a given image: every text that shares its label with the image is a positive, so the target $\\mathbf{q}_i$ is spread over several entries rather than being a one-hot vector.')
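
st.markdown('''

The text-to-image loss $\\mathcal{L}_{t2i}$ is obtained symmetrically by exchanging the roles of the visual and textual features, and the full SDM objective sums both directions:
''')

st.latex(r'''
\mathcal{L}_{sdm} = \mathcal{L}_{i2t} + \mathcal{L}_{t2i}
''')

st.markdown('''

Putting the pieces together, a minimal sketch of the whole loss (again assuming PyTorch; `sdm_loss` is an illustrative name rather than a reference implementation, and the small `eps` only guards against taking the log of zero):
''')

st.code('''
import torch
import torch.nn.functional as F

def sdm_loss(f_v: torch.Tensor, f_t: torch.Tensor, pids: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Batch-averaged KL(p_i || q_i) in both the image-to-text and text-to-image directions."""
    f_v = F.normalize(f_v, dim=-1)
    f_t = F.normalize(f_t, dim=-1)
    sim = f_v @ f_t.t()                                   # (N, N) cosine similarities sim(f^v_i, f^t_j)

    y = (pids.unsqueeze(0) == pids.unsqueeze(1)).float()  # y_{i,j} = 1 iff samples i and j share a label
    q = y / y.sum(dim=1, keepdim=True)                    # true distribution q_{i,j}

    p_i2t = sim.softmax(dim=1)                            # image -> text predicted distribution
    p_t2i = sim.t().softmax(dim=1)                        # text -> image predicted distribution

    l_i2t = (p_i2t * (p_i2t.clamp_min(eps).log() - q.clamp_min(eps).log())).sum(dim=1).mean()
    l_t2i = (p_t2i * (p_t2i.clamp_min(eps).log() - q.clamp_min(eps).log())).sum(dim=1).mean()
    return l_i2t + l_t2i

# toy usage: 4 samples with 8-dimensional features; samples 0 and 2 share a label
loss = sdm_loss(torch.randn(4, 8), torch.randn(4, 8), torch.tensor([0, 1, 0, 2]))
''', language='python')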
|
|
|
|
|
st.subheader('IRR (MLM) Loss') |
|
st.subheader('ID Loss') |