Project 5: Fun With Diffusion Models!

Part 0: Setup

The random seed I used is 17.

Various inference steps

The output images correspond to their prompts. However, the style differs between images: the man wearing a hat looks realistic, while the other two images look cartoonish, despite the fact that the prompt for the snowy mountain village image includes 'oil painting'.

After reducing the number of inference steps to 5, the pictures start to look noisy. This is expected, since the model does not have enough steps to finish denoising.

After increasing the number of inference steps to 50, the images take longer to generate, but their quality and apparent resolution seem to increase. However, the snowy mountain village is still cartoonish, which suggests that the style is a problem of the prompt rather than of the number of steps.
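The effect of the step count can be made concrete with a minimal sketch of a strided timestep schedule. It assumes a model trained with 1000 timesteps and evenly spaced sampling steps; the function name `strided_timesteps` is hypothetical, and the scheduler actually used by the pipeline may space its steps differently.

```python
def strided_timesteps(num_train_steps: int, num_inference_steps: int) -> list[int]:
    """Pick evenly spaced timesteps, from the noisiest one down toward 0."""
    if not 1 <= num_inference_steps <= num_train_steps:
        raise ValueError("num_inference_steps must be in [1, num_train_steps]")
    stride = num_train_steps // num_inference_steps
    return list(range(num_train_steps - 1, -1, -stride))[:num_inference_steps]


# Assumed training length of 1000 timesteps (not stated in the write-up).
for steps in (5, 50):
    ts = strided_timesteps(1000, steps)
    jump = ts[0] - ts[1]
    print(f"{steps:>2} steps: each update skips {jump} timesteps, starting {ts[:3]}")
```

With 5 steps each update has to remove the noise of 200 timesteps at once, while with 50 steps it only covers 20, which is consistent with the noisier results at 5 steps.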

Part 1: Sampling Loops

1.3 One-Step Denoising

D_x — One-Step Denoised Campanile at t=500

D_x — One-Step Denoised Campanile at t=750

1.4 Iterative Denoising

Iterative denoising performs much better than the other two methods, one-step denoising and classical denoising.
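As a rough illustration of why the iterative approach helps, here is a sketch of the two updates using the standard DDPM formulas. It works on plain lists of floats instead of image tensors, omits the added variance term, and takes the noise estimate as an input because no model is loaded here. The function names and the `abar` arguments (cumulative products of alphas) are illustrative assumptions, not the project's actual code.

```python
import math


def estimate_clean(x_t, eps_hat, abar_t):
    """One-step estimate of the clean signal from noisy x_t and a noise estimate."""
    return [(x - math.sqrt(1 - abar_t) * e) / math.sqrt(abar_t)
            for x, e in zip(x_t, eps_hat)]


def iterative_step_mean(x_t, x0_hat, abar_t, abar_prev):
    """Mean of the less noisy sample at t' < t, blending x0_hat with x_t."""
    alpha = abar_t / abar_prev  # per-step alpha between t' and t
    beta = 1 - alpha
    w_clean = math.sqrt(abar_prev) * beta / (1 - abar_t)
    w_noisy = math.sqrt(alpha) * (1 - abar_prev) / (1 - abar_t)
    return [w_clean * c + w_noisy * x for c, x in zip(x0_hat, x_t)]


# Toy check: with the true noise, the one-step estimate recovers x0.
x0 = [0.5, -1.0]
eps = [0.3, -0.2]
abar_t = 0.25
x_t = [math.sqrt(abar_t) * a + math.sqrt(1 - abar_t) * e for a, e in zip(x0, eps)]
print(estimate_clean(x_t, eps, abar_t))
```

One-step denoising jumps straight to the clean estimate, so any error in the noise prediction lands in the output. The iterative update only moves part of the way toward the estimate and lets the model correct itself at the next, less noisy timestep.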

1.7 Image-to-image Translation

1.7.3 Text-Conditioned Image-to-image Translation

rocket to campanile picture

pencil to car picture

Barista to fish picture

Part 1: Single-Step Denoising UNet

Noising Process
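A minimal sketch of a noising process of this kind, assuming the common form z = x + sigma * eps with eps drawn from a standard normal distribution. The pixel values, the sigma values, and the seed below are made up for illustration and are not taken from the experiments.

```python
import random


def add_noise(x, sigma, rng):
    """Return z = x + sigma * eps, with eps ~ N(0, 1) drawn per pixel."""
    return [p + sigma * rng.gauss(0.0, 1.0) for p in x]


# Hypothetical flattened "image" and noise levels, for illustration only.
pixels = [0.0, 0.25, 0.5, 0.75, 1.0]
rng = random.Random(0)
for sigma in (0.0, 0.5, 1.0):
    noisy = add_noise(pixels, sigma, rng)
    print(sigma, [round(v, 2) for v in noisy])
```

At sigma = 0 the image is unchanged, and larger sigma values progressively bury the signal, which is the setting the out-of-distribution test probes.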

Training Loss Curve

Sample results on the test set after the first epoch

Sample results on the test set after the fifth epoch

Sample results on the test set with out-of-distribution noise levels after the model is trained

Part 2: Training a Diffusion Model

Time Conditioning UNet

Training Loss Curve

Sample results on the test set after 1 epoch

Sample results on the test set after 5 epochs

Sample results on the test set after 20 epochs

Class Conditioning UNet

Training Loss Curve

Sample results on the test set after 1 epoch

Sample results on the test set after 5 epochs

Sample results on the test set after 20 epochs

Conclusion

This was a very fun project that helped me use and get familiar with the different functionalities of diffusion models. The visual anagrams part was especially fun: the algorithm weaves layers of information into a single picture, much like master painters used to do.

Part 0: Setup

Various inference steps

Part 1: Sampling Loops

1.1 Implementing the Forward Process

1.2 Classical Denoising

1.3 One-Step Denoising

1.4 Iterative Denoising

1.5 Diffusion Model Sampling

1.6 Classifier-Free Guidance (CFG)
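For reference, the classifier-free guidance combination can be sketched in a few lines. It assumes the usual formulation, where the final noise estimate extrapolates from the unconditional estimate toward the conditional one. The function name `cfg_noise` and the example values are hypothetical.

```python
def cfg_noise(eps_uncond, eps_cond, scale):
    """Combine noise estimates: eps = eps_u + scale * (eps_c - eps_u)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Made-up noise estimates, for illustration only.
eps_u = [1.0, -1.0]
eps_c = [2.0, 0.0]
print(cfg_noise(eps_u, eps_c, 0.0))  # [1.0, -1.0], purely unconditional
print(cfg_noise(eps_u, eps_c, 1.0))  # [2.0, 0.0], purely conditional
print(cfg_noise(eps_u, eps_c, 7.0))  # [8.0, 6.0], pushed past the conditional estimate
```

A scale above 1 exaggerates the direction the prompt pulls in, which typically trades diversity for images that follow the prompt more closely.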

1.7 Image-to-image Translation

1.7.1 Editing Hand-Drawn and Web Images

1.7.2 Inpainting

1.7.3 Text-Conditioned Image-to-image Translation

rocket to campanile picture

pencil to car picture

Barista to fish picture

1.8 Visual Anagrams

Campfire & Old man

Man & Dog

Man & Barista
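The idea behind anagrams like the ones listed above can be sketched as follows, assuming the common recipe: estimate the noise for the upright image with one prompt, estimate it for the flipped image with the other prompt, flip that second estimate back, and average the two. The estimators are passed in as plain callables, since no model is loaded here, and images are lists of rows. All names are illustrative assumptions.

```python
def flip_vertical(img):
    """Flip an image (a list of rows) upside down."""
    return img[::-1]


def anagram_noise(x, eps_fn_a, eps_fn_b):
    """Average prompt A's estimate on x with prompt B's estimate on flipped x."""
    eps_a = eps_fn_a(x)
    eps_b = flip_vertical(eps_fn_b(flip_vertical(x)))
    return [[(a + b) / 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(eps_a, eps_b)]


# Stand-in estimators; a real setup would call the diffusion model with a prompt.
identity = lambda img: [row[:] for row in img]
row_index = lambda img: [[float(i)] * len(row) for i, row in enumerate(img)]

x = [[1.0, 2.0], [3.0, 4.0]]
print(anagram_noise(x, identity, row_index))  # [[1.0, 1.5], [1.5, 2.0]]
```

Because every denoising step follows the averaged estimate, the result is pushed toward an image that matches one prompt upright and the other prompt upside down.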

1.9 Hybrid Images

Skull & Waterfall

Rocket & Pencil

Lion & Donut
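A hybrid image combines the low frequencies of one noise estimate with the high frequencies of another. The sketch below shows that combination on a 1D signal, using a simple box filter as the low-pass step. This is a stand-in chosen to keep the example dependency-free; an image pipeline would more typically use a 2D Gaussian blur. Function names and the `radius` parameter are hypothetical.

```python
def low_pass(signal, radius=1):
    """Box-filter smoothing with edge values repeated at the borders."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def high_pass(signal, radius=1):
    """Whatever the low-pass filter removes."""
    return [s - l for s, l in zip(signal, low_pass(signal, radius))]


def hybrid_noise(eps_low, eps_high, radius=1):
    """Low frequencies from eps_low plus high frequencies from eps_high."""
    return [l + h for l, h in
            zip(low_pass(eps_low, radius), high_pass(eps_high, radius))]


# A flat signal has no high frequencies, so only eps_low survives here.
print(hybrid_noise([2.0] * 5, [7.0] * 5))  # [2.0, 2.0, 2.0, 2.0, 2.0]
```

Seen from far away, the eye mostly picks up the low-frequency prompt, while the high-frequency prompt dominates up close.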

Part 1: Single-Step Denoising UNet

Noising Process

Training Loss Curve

Sample results on the test set after the first epoch

Sample results on the test set after the fifth epoch

Sample results on the test set with out-of-distribution noise levels after the model is trained

Part 2: Training a Diffusion Model

Time Conditioning UNet

Training Loss Curve

Sample results on the test set after 1 epoch

Sample results on the test set after 5 epochs

Sample results on the test set after 20 epochs

Class Conditioning UNet

Training Loss Curve

Sample results on the test set after 1 epoch

Sample results on the test set after 5 epochs

Sample results on the test set after 20 epochs

Conclusion