More SDXL

📢 Please note: Generative AI may have been used to produce or edit text, images, and/or audio on this site. However, all opinions and intellectual property rights remain my own.

Experiments with other LoRAs

In addition to making custom LoRAs of myself, I also made one of the same unsuspecting friend who judged images from version 6. Since his Instagram is public, he was the perfect subject for my next tests. In principle, all I had to do was scrape the images from his profile to get my training data. In practice, two additional steps were needed.

  • I had to upscale all of the images using a tool like Upscayl. AI upscalers of this kind tend to get a little messy around human subjects, which is likely the cause of the degraded face quality in the final results.
  • I removed other humans from the photos using Photoshop’s generative fill. For photos where the subject was closely embracing someone else (which was a lot of them), I opted to replace the humans with dogs since they can easily be filtered out using keywords in the captions.
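The dog trick depends on captions: if each training caption names what else is in the frame, those pixels can be attributed to the extra tag instead of the subject keyword. Here is a minimal sketch of how such captions could be assembled. It assumes one sidecar `.txt` caption per image, which is a common convention in LoRA trainers but is not confirmed for this setup. The helper names `build_caption` and `write_captions`, and the tag mapping passed in, are hypothetical.

```python
from pathlib import Path


def build_caption(trigger: str, extra_tags: list[str]) -> str:
    """Join the subject keyword and extra tags into one comma-separated caption."""
    kept: list[str] = []
    for tag in [trigger, *extra_tags]:
        tag = tag.strip()
        # Skip empty tags and case-insensitive duplicates.
        if tag and tag.lower() not in {t.lower() for t in kept}:
            kept.append(tag)
    return ", ".join(kept)


def write_captions(
    image_dir: Path, trigger: str, tags_by_image: dict[str, list[str]]
) -> list[Path]:
    """Write a .txt caption next to each image (assumed sidecar layout).

    Images without an entry in tags_by_image get the trigger keyword only.
    """
    written = []
    for image in sorted(image_dir.iterdir()):
        if image.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
            continue
        caption_path = image.with_suffix(".txt")
        caption = build_caption(trigger, tags_by_image.get(image.name, []))
        caption_path.write_text(caption, encoding="utf-8")
        written.append(caption_path)
    return written
```

With the dog named in the caption, leaving that tag out of the prompt (or putting it in the negative prompt) should, in theory, keep dogs out of the generated images.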

After those two additional steps, I trained the model as usual (skipping the regularization images to save on training time). I used the keyword “Zac Efron” as the base for the character LoRA. Due to the quality of the training data, faces tend to get messed up in most generations. I tried a face-restoring LoRA on top of it, but that only seemed to make things worse. To make the images usable, I decided to pivot away from generations that mimic photos and go for a more artistic route instead. I used the styles built into Fooocus as well as the SDXL Fae Style LoRA.

The following checkpoints were also used:

  • SDXL Unstable Diffusers ☛ YamerMIX
  • DynaVision XL

[Four example images]

Did I get a little carried away? Potentially. Should I have asked for permission before generating images of my longtime friend as a sexy rabbit cyborg? Never.

The next victim (I mean subject) of my finetuning is my very own mother. She was an ideal candidate because of her consistent makeup look and her access to plenty of curated training data. I asked her for 10-50 self-chosen images and received 20 back, 17 of which made it into the final model.

The model keyword I used was “Chrissy Teigen”, and the whole process took around three hours with 3400 training steps. Training was again conducted on my RTX 3090.
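For a rough sense of scale, those figures can be turned into per-image and per-step numbers. This is back-of-the-envelope arithmetic only. It assumes a batch size of 1, treats "around three hours" as exactly three, and assumes all of that time went into training, so the per-step time is an upper bound. The actual repeat and epoch settings are not stated.

```python
images = 17          # images that made it into the final model
total_steps = 3400   # training steps reported for this run
hours = 3.0          # "around three hours", treated as exact (assumption)

# At batch size 1 (assumption), each step processes one image.
steps_per_image = total_steps / images

# Average wall-clock time per step if the full three hours were training.
seconds_per_step = hours * 3600 / total_steps

print(f"{steps_per_image:.0f} steps per image")   # 200 steps per image
print(f"{seconds_per_step:.2f} s per step")       # 3.18 s per step
```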

[Three example images]

The images of Maria are the most lifelike I’ve created so far. One image in particular was rated especially well by human judges; its only dead giveaway is the deformed television in the background. The subject is also known to take photos exclusively in portrait orientation, so the landscape orientation of the generated images could be another giveaway.