Draw Things App

- Runs #Stable Diffusion on device. By @Liu Liu SF Dude
- Blog - https://liuliu.me/eyes/stretch-iphone-to-its-limit-a-2gib-model-that-can-draw-everything-in-your-pocket/
  collapsed:: true
- Once every few years, programs come along that are barely usable even on the best of our computing devices, yet the newly enabled scenarios are so compelling that people are willing to suffer through them. The last time this happened was deep neural networks, and the time before that, 3D graphics. I believe this is the 3rd time.
- It turns out that running Stable Diffusion on an iPhone is easier than I thought, and I probably still left 50% of the performance on the table.
- The main challenge is running the app on iPhones with 6GiB of RAM. 6GiB sounds like a lot, but iOS will start to kill your app if it uses more than 2.8GiB on a 6GiB device, or more than 2GiB on a 4GiB device.
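- A minimal sketch of the per-device budget described above (the ~2.8GiB and ~2GiB thresholds are the rough figures quoted here, not an official iOS API, and the helper name is hypothetical):

```python
GIB = 1 << 30  # bytes in one GiB

def app_memory_budget(device_ram_bytes: int) -> int:
    """Rough bytes an app can allocate before iOS's OOM killer steps in.

    Hypothetical helper using the figures quoted above:
    ~2.8GiB on a 6GiB device, ~2GiB on a 4GiB device.
    """
    if device_ram_bytes >= 6 * GIB:
        return int(2.8 * GIB)
    return 2 * GIB

# Less than half of physical RAM is actually usable by the app.
for ram_gib in (4, 6):
    budget = app_memory_budget(ram_gib * GIB)
    print(f"{ram_gib}GiB device -> budget {budget / GIB:.2f}GiB")
```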
- The model has 4 parts: a text encoder that generates text feature vectors to guide the image generation; an optional image encoder that encodes an image into latent space (for image-to-image generation); a denoiser model that slowly denoises a latent representation of an image out of noise; and an image decoder that decodes the image from that latent representation. The 1st, 2nd, and 4th models each need to run only once per inference and are relatively cheap (around 1GiB max). The denoiser model’s weights occupy 3.2GiB (in full floating-point) of the original 4.2GiB of model weights, and it also needs to run multiple times per execution, so we want to keep it in RAM longer.
  - Why, then, does the original Stable Diffusion model require close to 10GiB to run a single image inference?
  - Between the single input (2x4x64x64) and the single output (2x4x64x64), there are many intermediate layer outputs. Not all of them can be freed immediately after use; some, due to the network structure (residual connections), have to be kept around for later.
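- A toy liveness calculation makes the residual point concrete (a sketch of the general idea, not the app's actual allocator): a skip connection keeps an early output alive across intermediate layers, raising peak memory.

```python
def peak_live_bytes(layers, sizes):
    """Peak bytes live at once, given layers as (consumes, produces) pairs.

    A tensor stays live from when it first appears until its last
    consumer runs -- which is exactly why residual/skip inputs pile up.
    """
    last_use = {}
    for i, (consumes, _) in enumerate(layers):
        for t in consumes:
            last_use[t] = i
    live, peak = set(), 0
    for i, (consumes, produces) in enumerate(layers):
        live |= set(consumes) | set(produces)
        peak = max(peak, sum(sizes[t] for t in live))
        # free tensors whose last consumer just ran
        live -= {t for t in consumes if last_use[t] == i and t not in produces}
    return peak

sizes = {t: 1 for t in "abcd"}
chain = [(["a"], ["b"]), (["b"], ["c"]), (["c"], ["d"])]
skip = [(["a"], ["b"]), (["b"], ["c"]), (["c", "a"], ["d"])]  # d = f(c) + a
print(peak_live_bytes(chain, sizes))  # 2: each buffer is freed right away
print(peak_live_bytes(skip, sizes))   # 3: "a" must survive until the add
```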
- The 3.2GiB, or 1.6GiB in half floating-point, is our starting point. That leaves around 500MiB to work with if we don’t want to get anywhere near where Apple’s OOM killer might kill us.
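- The arithmetic behind that starting point, as a quick sketch (all figures are the ones quoted above):

```python
GIB = 1 << 30

fp32_weights = 3.2 * GIB         # denoiser weights in full floating-point
fp16_weights = fp32_weights / 2  # half floating-point halves the bytes
ios_limit = 2.8 * GIB            # rough kill threshold on a 6GiB device

headroom = ios_limit - fp16_weights
# On paper ~1.2GiB remains; the note above budgets only ~500MiB of it,
# keeping a wide margin from the OOM killer for app code, the other
# model parts, and transient allocations.
print(round(fp16_weights / GIB, 1), round(headroom / GIB, 1))
```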
- On Apple hardware, a popular choice for implementing a neural-network backend is the MPSGraph framework.
- Further optimizations could probably reduce runtime by 30% and memory usage by about 15%. Well, that’s for another day.