Questions tagged [diffusion]

12 questions
2
votes
1 answer

Diffusion Models: Conditioning on Time vs. Noise Level

I am new to SE-Data Science, so I hope this is the right place to ask this rather theoretical question. In diffusion models we usually have a time variable that determines the noise schedule (e.g. $T \in [0,…4000]$). For training we sample a…
2
votes
2 answers

Edit friendly DDPM noise space

I was reading the paper "An Edit Friendly DDPM Noise Space: Inversion and Manipulations". On page 4, the authors mention that in DDPM the noise maps of consecutive steps are highly correlated, while their edit-friendly noise maps of consecutive…
shivani
1
vote
0 answers

Should I interleave sine and cosine in sinusoidal positional encoding?

I'm trying to implement a sinusoidal positional encoding. I found two solutions that give different encodings, and I am wondering whether one of them is wrong or both are correct. I show figures of the resulting encodings for both options. Thank…
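For readers comparing the two layouts, here is a minimal pure-Python sketch. It is not the asker's code: the function names and the `base` default are assumptions chosen for illustration. It builds both the interleaved and the concatenated variant and shows that, under these assumptions, they contain the same values in a different fixed order.

```python
import math

def sinusoidal_encoding(position, dim, interleave=True, base=10000.0):
    """Return a length-`dim` sinusoidal encoding of a scalar position.

    interleave=True  -> [sin(w0*p), cos(w0*p), sin(w1*p), cos(w1*p), ...]
    interleave=False -> [sin(w0*p), sin(w1*p), ..., cos(w0*p), cos(w1*p), ...]
    (Hypothetical helper; names and defaults are illustrative.)
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    # Geometric frequency ladder w_i = base^(-2i/dim).
    freqs = [base ** (-2.0 * i / dim) for i in range(half)]
    sines = [math.sin(position * w) for w in freqs]
    cosines = [math.cos(position * w) for w in freqs]
    if interleave:
        out = []
        for s, c in zip(sines, cosines):
            out.extend([s, c])
        return out
    return sines + cosines

def interleaved_to_concatenated(vec):
    """Reorder an interleaved encoding into the concatenated layout."""
    return vec[0::2] + vec[1::2]

a = sinusoidal_encoding(3, 8, interleave=True)
b = sinusoidal_encoding(3, 8, interleave=False)
# The two layouts differ only by a fixed permutation of the feature axis.
print(interleaved_to_concatenated(a) == b)
```

Because the two variants differ only by a fixed permutation of the feature dimension, a learned linear layer applied afterwards can absorb the difference. The layout may still matter if downstream code assumes specific index pairs.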
1
vote
1 answer

How does ChatGPT-4o work on text + image data?

What known state-of-the-art techniques might ChatGPT-4o, Claude 3, or other similar systems be using to understand both text and image data? I noticed that ChatGPT-4o can recognize text in an image well. Might it be using an external OCR tool or has it…
user163246
1
vote
0 answers

Diffusion model: can't overfit on a single batch

I am training the diffusion model from Diffusion Policy, specifically their vision notebook, on a custom dataset. As always, I sanity-check the pipeline by overfitting on a single batch. I would expect the loss to go to 0 or nearly…
Felix Hegg
1
vote
0 answers

How to derive the expectation equation given in the paper "Video Diffusion Models"?

In the paper Video Diffusion Models, Section 3.1 mentions the following equation: $$ E_q[x^b \mid z_t, x^a] = E_q[x^b \mid z_t] + \frac{\sigma_t^2}{\alpha_t} \nabla_{z_t^b} \log q(x^a \mid z_t), $$ where $x^a, x^b$ are two video samples, $q$ is forward…
p1p13
1
vote
0 answers

Common-sense fixes to a buggy diffusion model that won’t overfit one sample?

I hope this question is in the right place. I’m working with a toy diffusion model to generate points, e.g., learning a Swiss roll, which seems to me a basic use case to start with. My model is generally sensible and I’ve implemented both a…
0
votes
0 answers

Rate-distortion plots in denoising diffusion model evaluation

In the Denoising Diffusion Probabilistic Models paper (https://arxiv.org/abs/2006.11239), the rate-distortion plot is computed assuming access to a protocol that can transmit samples $(x_T, \ldots, x_0)$. This is then used to construct Algorithm 3 and…
abora
0
votes
0 answers

Connection between noise and score in diffusion

In the article "Score-Based Generative Modeling through Stochastic Differential Equations" (Song et al.), it is explained that we need to solve the reverse-time SDE to obtain samples from the image distribution $p_{0}$: $$ \mathrm{d}\mathbf{x} =…
0
votes
0 answers

Accurate score function estimation using score-based diffusion models

My question is mainly related to the seminal paper by Song et al.: "Score-Based Generative Modeling through Stochastic Differential Equations". I would like to leverage their framework in order to build a strong prior that can accurately estimate…
cosec
0
votes
0 answers

Diffusion Model consistency term derivation question

The consistency term of the diffusion model is written as: $$\mathbb{E}_{q_\phi(x_{1:T}|x_0)} \left[\log\prod_{t=2}^T \frac{p(x_{t-1} | x_t)}{q_\phi(x_{t-1}|x_t, x_0)}\right]$$ $$= \sum_{t=2}^T \mathbb{E}_{q_\phi(x_t,x_{t-1}|x_0)}…
0
votes
0 answers

Using a DDPM (denoising diffusion probabilistic model) to generate low-resolution satellite images

I am currently training a DDPM on low-resolution satellite images. My problem lies with the dataset. The dataset consists of 10 band images of the same patch of land, but with variation in tree sizes (±20%) and variation in some physical…