Diffusing Images on a Laptop with no GPU

This weekend I learned that it is possible to diffuse images without a GPU. I didn't think this would work, but it's not only possible, it's easy and actually pretty fast! (Disclaimer: you need a good amount of RAM; I have 20GB.)

To keep this simple and portable, I used Docker to run fastsdcpu. Here's the Dockerfile:

FROM ubuntu:24.04 AS base

# Install Python tooling, ffmpeg, and utilities needed by the fastsdcpu installer
RUN apt-get update \
    && apt-get install -y python3 python3-venv python3-pip python3-wheel ffmpeg git wget nano \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && pip install uv --break-system-packages

FROM base AS fastsd

ARG FASTSDCPU_VERSION=v1.0.0-beta.200

# Pin the fastsdcpu release and fetch the prebuilt stable-diffusion.cpp library used for GGUF (Flux) models
RUN git clone https://github.com/rupeshs/fastsdcpu /app \
    && cd /app \
    && git checkout "$FASTSDCPU_VERSION" \
    && wget "https://huggingface.co/rupeshs/FastSD-Flux-GGUF/resolve/main/libstable-diffusion.so?download=true" -O libstable-diffusion.so

WORKDIR /app

SHELL [ "/bin/bash", "-c" ]

RUN echo y | bash -x ./install.sh --disable-gui

VOLUME /app/models/gguf/
VOLUME /app/lora_models/
VOLUME /app/controlnet_models/
VOLUME /root/.cache/huggingface/hub/

ENV GRADIO_SERVER_NAME=0.0.0.0
EXPOSE 7860

CMD [ "/app/start-webui.sh" ]

Start the container with Docker Compose, mapping the volumes to directories on your host system so the models are stored outside the container:

services:
  fastsdcpu:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "7860:7860"
    volumes:
      - gguf:/app/models/gguf/
      - lora:/app/lora_models/
      - ctrl:/app/controlnet_models/
      - cache:/root/.cache/huggingface/hub/
    deploy:
      resources:
        limits:
          memory: 20g
    stdin_open: true
    tty: true
    environment:
      - GRADIO_SERVER_NAME=0.0.0.0

volumes:
  gguf:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: ./models/gguf
  lora:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: ./models/lora
  cache:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: ./models/cache
  ctrl:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: ./models/ctrl
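
These named volumes bind-mount host directories, so create them before the first run (depending on your Compose version, the device entries may also need absolute paths such as ${PWD}/models/gguf). Assuming the compose file lives in the current directory:

$ mkdir -p models/gguf models/lora models/cache models/ctrl

With the directories in place, build and start the service: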

$ sudo docker compose up --build

Once the web service has started, you can access it at http://localhost:7860.
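
If you want to confirm the server is actually listening before opening a browser, a quick curl check from the host works (this just prints the HTTP status code):

$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:7860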

Fast SD Model Settings

This app auto-downloads the selected model the first time you try to generate an image. You'll have to experiment with what works best for your use case. The default model, LCM -> stabilityai/sd-turbo, works well for objects and scenery but does not do so well with realistic images of people. LCM-Lora -> Lykon/dreamshaper-8 is much better at people and surprisingly fast. Even on my modest Intel Core i5-7200U @ 2.50GHz with no dedicated GPU, I can generate crisp, consistent images in about 30 seconds.
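
Since the model caches are bind-mounted to the host, it's easy to see what has been downloaded and how much disk space it takes, using the directory layout from the compose file above:

$ du -sh models/*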

Sample output

Of course, your generation settings will affect this: higher resolutions and more inference steps take longer. I found the best settings for dreamshaper to be 4-5 steps with a guidance scale of 1. I quickly generate 256x256 images to test prompts, and once I get roughly what I want, I increase the resolution and other settings gradually until I get exactly the image I'm looking for. Enabling the tiny auto encoder for SD (TAESD) makes a significant difference in speed.

I tried LCM-OpenVINO -> rupeshs/sd-turbo-openvino, which is meant specifically for Intel hardware, but it took longer and bogged my system down. If you have a newer Intel Arc-based system, it will probably work better for you.

Sadly, I was not able to get Flux1 working; I think it requires CPU instructions that my system does not have. If you have an i7 or higher, Flux1 would be the ideal model to choose for highly creative images, especially in a fantasy setting. Flux1 can also generate coherent text, which Stable Diffusion notoriously fails at.
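
If you want to check which vector instruction sets your CPU advertises before trying Flux1 (I'm assuming the GGUF backend benefits from newer AVX variants; the project docs don't spell this out), /proc/cpuinfo will tell you on Linux:

$ grep -o 'avx[0-9a-z_]*' /proc/cpuinfo | sort -u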