ImageGenerationDiffusionModels

Documentation for ImageGenerationDiffusionModels.

ImageGenerationDiffusionModels.apply_noise (Method)
apply_noise(img; num_noise_steps = 500, beta_min = 0.0001, beta_max = 0.02)

Applies forward noising to an image. This function adds Gaussian noise to an image over multiple steps, which corresponds to the forward process in diffusion models.

Arguments

  • img: The input image
  • num_noise_steps: Number of steps over which noise is added to the image (500 by default)
  • beta_min: Minimum beta value (0.0001 by default)
  • beta_max: Maximum beta value (0.02 by default)

Returns

  • An image with noise
source
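As an illustration of the forward process, here is a minimal, self-contained sketch assuming a linear beta schedule from beta_min to beta_max and the standard DDPM closed form x_t = sqrt(alphabar_t) * x_0 + sqrt(1 - alphabar_t) * eps; function and variable names here are hypothetical, not the package's internals:

```julia
# Hypothetical sketch of forward noising with a linear beta schedule.
# Uses the closed-form DDPM identity at the final step; not the package's code.
function apply_noise_sketch(img; num_noise_steps = 500,
                            beta_min = 1f-4, beta_max = 0.02f0)
    betas = range(beta_min, beta_max; length = num_noise_steps)
    alphabar = cumprod(1 .- betas)[end]   # cumulative product of (1 - beta)
    eps = randn(Float32, size(img))       # standard Gaussian noise
    return sqrt(alphabar) .* img .+ sqrt(1 - alphabar) .* eps
end

noisy = apply_noise_sketch(zeros(Float32, 28, 28))
```

With the cumulative alpha close to zero after 500 steps, the output is close to pure Gaussian noise.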
ImageGenerationDiffusionModels.build_unet (Function)
build_unet(in_ch::Int=1, out_ch::Int=1, time_dim::Int=256)

Builds a time-conditioned U-Net model for image denoising and generation in diffusion models

Arguments

  • in_ch::Int=1: Number of input channels (1 for grayscale)
  • out_ch::Int=1: Number of output channels (1 for grayscale)
  • time_dim::Int=256: Dimensionality of the time embedding vector used for conditioning

Returns

  • A callable function (x, t_vec) -> output, where:
    • x: Input image
    • t_vec: time step vector
    • output: Tensor with out_ch channels and the same spatial dimensions as x
source
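To make the (x, t_vec) -> output contract concrete, here is a toy, self-contained sketch of the encoder/decoder wiring such a callable implements; the real model uses learned convolutions and conditions on t_vec, whereas this stand-in only mirrors the shapes and skip connections:

```julia
# Toy stand-ins: strided slicing as "downsampling", nearest-neighbor repeat
# as "upsampling". Illustrative only; the package's blocks are learned layers.
toy_down(x) = (x[1:2:end, 1:2:end, :, :], x)              # (downsampled, skip)
toy_up(x, skip) = repeat(x, inner = (2, 2, 1, 1)) .+ skip

function toy_unet(x, t_vec)           # t_vec is ignored by this toy model
    d1, s1 = toy_down(x)
    d2, s2 = toy_down(d1)
    u1 = toy_up(d2, s2)
    return toy_up(u1, s1)             # same spatial size as the input
end

y = toy_unet(randn(Float32, 28, 28, 1, 4), ones(Float32, 4))
```

The output has the same spatial size and batch size as the input, matching the contract documented above.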
ImageGenerationDiffusionModels.denoise_image (Method)
denoise_image(noisy_img)

Denoises a noisy image using the trained neural network model. Given a single input noisy_img::Matrix{<:Real}, this function produces a denoised version of that image.

Arguments

  • noisy_img::Matrix{<:Real}: noisy image

Returns

  • A denoised version of the image
source
ImageGenerationDiffusionModels.down_block (Method)
down_block(in_ch, out_ch, time_dim)

Creates a downsampling block for the U-Net

Arguments

  • in_ch::Int: Number of input channels
  • out_ch::Int: Number of output channels
  • time_dim::Int: Dimensionality of the time embedding vector used for conditioning

Returns

  • A callable function (x, t_emb) -> (down, skip), where:
    • x: Input feature map
    • t_emb: Time embedding vector for the current step
    • down: Downsampled feature map for the next layer
    • skip: Intermediate feature map passed to the matching up-block through the skip connection
source
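The (x, t_emb) -> (down, skip) contract can be illustrated with a self-contained stand-in that uses 2×2 mean pooling for downsampling and returns the unpooled map as the skip. The real block is a learned, time-conditioned convolution; names here are illustrative:

```julia
using Statistics

# Stand-in for a down block: 2x2 mean pooling halves the spatial dims;
# the unpooled input is returned as the skip connection. t_emb is unused here.
function toy_down_block(x, t_emb)
    h, w, C, N = size(x)
    down = [mean(x[i:i+1, j:j+1, c, n])
            for i in 1:2:h-1, j in 1:2:w-1, c in 1:C, n in 1:N]
    return down, x
end

down, skip = toy_down_block(ones(Float32, 8, 8, 1, 2), zeros(Float32, 16))
```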
ImageGenerationDiffusionModels.pad_or_crop (Method)
pad_or_crop(x, ref)

Pads or crops the input tensor x so that its dimensions match those of ref

Arguments

  • x: A 4D tensor, typically shaped (C, H, W, N)
  • ref: A reference tensor whose spatial size (H, W) x should match

Returns

  • A tensor with the same number of channels and batch size as x, but with height and width adjusted to match ref
source
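Assuming the (C, H, W, N) layout stated above, zero padding, and cropping from the top-left corner, the behavior can be sketched as follows (a hypothetical reimplementation, not the package's exact code):

```julia
# Pad with zeros or crop from the top-left so dims 2 (H) and 3 (W) match ref.
function pad_or_crop_sketch(x, ref)
    C, H, W, N = size(x)
    Hr, Wr = size(ref, 2), size(ref, 3)
    out = zeros(eltype(x), C, Hr, Wr, N)     # zero-padded canvas at ref's size
    h, w = min(H, Hr), min(W, Wr)
    out[:, 1:h, 1:w, :] = x[:, 1:h, 1:w, :]  # copy the overlapping region
    return out
end

out = pad_or_crop_sketch(ones(Float32, 2, 5, 5, 3), zeros(Float32, 2, 7, 4, 3))
```

Here the height is padded from 5 to 7 and the width cropped from 5 to 4, while channels and batch size follow x.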
ImageGenerationDiffusionModels.sinusoidal_embedding (Method)
sinusoidal_embedding(t::Vector{Float32}, dim::Int)

Generates sinusoidal positional embeddings from a vector of scalar inputs, typically used to encode time steps or sequence positions

Arguments

  • t::Vector{Float32}: A vector of time or position values
  • dim::Int: The desired embedding dimensionality

Returns

  • A matrix of shape (length(t), dim) where each row is the embedding of one time step
source
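A standard transformer-style construction matching the documented (length(t), dim) shape looks like the sketch below; the 10 000 frequency base and the sines-then-cosines ordering are assumptions, as the package's exact scaling may differ (dim is assumed even):

```julia
# Sinusoidal embedding sketch: geometric frequency ladder, sines then cosines.
function sinusoidal_embedding_sketch(t::Vector{Float32}, dim::Int)
    half = dim ÷ 2
    freqs = exp.(-(0:half-1) .* (log(10_000f0) / half))  # 1 down to 1/10_000
    args = t * freqs'                    # (length(t), dim ÷ 2)
    return hcat(sin.(args), cos.(args))  # (length(t), dim)
end

E = sinusoidal_embedding_sketch(Float32[0, 1, 2], 8)
```

Each row is the embedding of one time step; nearby time steps get similar rows, which is what lets the U-Net condition smoothly on t.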
ImageGenerationDiffusionModels.up_block (Method)
up_block(in_ch, out_ch, time_dim)

Creates an upsampling block used in the U-Net

Arguments

  • in_ch::Int: Number of input channels to the block
  • out_ch::Int: Number of output channels after the convolutions
  • time_dim::Int: Dimensionality of the time embedding vector

Returns

  • A callable function (x, skip, t_emb) -> output, where:
    • x: The upsampled feature map from the previous layer
    • skip: The skip connection feature map from the encoder
    • t_emb: The time embedding vector for the current step
    • output: Feature map with out_ch channels
source
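The (x, skip, t_emb) -> output contract can be illustrated with a self-contained stand-in that upsamples by nearest-neighbor repetition and concatenates the skip map along the channel dimension; the real block uses learned layers, and all names here are illustrative:

```julia
# Stand-in for an up block: repeat doubles the spatial dims, then the skip
# map is concatenated along the channel dimension (dim 3). t_emb is unused here.
function toy_up_block(x, skip, t_emb)
    upsampled = repeat(x, inner = (2, 2, 1, 1))
    return cat(upsampled, skip; dims = 3)
end

out = toy_up_block(ones(Float32, 4, 4, 2, 1), zeros(Float32, 8, 8, 3, 1),
                   zeros(Float32, 16))
```

Concatenation (rather than addition) is the usual U-Net choice: it preserves both the decoder features and the encoder detail, letting the following convolution learn how to mix them.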