Progression over Time
Background
First Efforts
Quick history:
MidJourney… very quickly created characters such as Charlotte. It was so satisfying that I paid for a subscription. However, the characters all began to look very similar.
LeonardoAI… Subsequently, this produced a significantly broader variety of individual faces. The key character ‘Victoria’ evolved in Leonardo.
Discovered Stable Diffusion and Automatic1111, which opened the door to limitless image/character creation (including NSFW). For this, however, I needed a more powerful graphics card.
Purchased the NVidia 3070 (8gig VRAM) card. Note: this card had only 8gig of VRAM, where the 3060 had 12, which would have made a better choice.
Downloaded and (via Anaconda) installed Automatic1111 web-UI.
In A1111
The main objectives:
1-to create as realistic (photo-real) characters as possible
2-create consistent characters (back to MidJourney for this)
3-have multiple characters in a scene
4-have those multiple characters be unique (different from each other)
5-have those multiple characters be reproduceable
Was able to achieve 1, 2, 3 and 4. Installed but did not invoke roop - did all Faceswapping in MidJourney.
Discovered ComfyUI, and migrated all activities to ComfyUI.
Node-based Interface ComfyUI
Objective Breakdown
I installed ComfyUI in the recommended Python3.10.6 environment which eliminated many obscure Python error messages no one seemed to have an answer to. I reinstalled A1111 the same way: bog-standard Python3.10.6 venv. All models are currently saved in the A1111 folder. As of September, 2023, have cancelled all subscriptions to Leonardo and MidJourney: roop is very straightforward and easy to accomplish in ComfyUI.
The main objectives, current as of 7th Oct, 2023, are:
1-to create as realistic (photo-real) characters as possible [DONE]
2-create persistent characters [DONE using roop]
3-have multiple characters in a scene [DONE]
4-have those multiple characters be unique and reproduceable [DONE dual roop]
5-characters will be anatomically correct (minimal deformities)
6-have those multiple characters interact (more than just staring blankly)
7-clothe the characters differently
8-develop poses / LoRA / LyCORIS etc
2 - Persistent Characters
The first characters (Charlotte, Victoria, Mabel, etc) were created for an odd little novella I was writing, one that has sort-of stalled as what I started writing seemed, with time, a bit trite and uninspired. They were ‘created’ in MidJourney and later LeonardoAI, satisfying the ‘photoreal’ requirement easily. I’ve since gone to using This Person Does Not Exist for reference faces as the features are a lot less airbrushed and produce far more realistic-looking individuals using roop. There is one carefully GIMPed real person reference image for a persona in some of my images.
3 - Multiple Characters
The length and detail of the prompt very much affects how consistently I can get two characters to appear. Too much emphasis on a detail results in one person to appear instead of two, regardless of whether I enclose the phrase ‘2 women’ in parentheses. Removing detail appears to resolve that problem.
4 - Unique, reproduceable characters
Generally speaking, the same conditions that result in only one figure appearing where two were asked for account for roop working consistently. I’ve had Comfy assigned my first face to a tertiary individual in the image (very much a background figure) and so the first character receives the second face and none is assigned to the second charcter at all:
In this image, the first face was given to the obscure person on the couch on the left of the picture, the second figure - in the far-too-short skirt - was given the second face by roop, and the third figure was whatever Comfy / SD had in mind. Note: this brings up another interesting challenge… appropriate attire. Unfortunately, prompting has little effect on the length of skirts or the decollette of the neckline.
So, whilst I do feel this objective has been accomplished, getting consistent results very much depends on shorter, concise prompts. Details may need to come from LoRAs or - newer tech for artists this October - IP Adapters.
5 - Minimal, or no Deformities
While for the most part, facial distortion appears to have been resolved - with the exception of when the faces are in close proximity to each other - hand deformities unfortunately, frustratingly persist:
Note the hands (rendered 06.Oct.23) - this is currently my main objective: to consistently render non-deformed limbs, or at least be able to fix deformities via inpainting or something like that.
And yet, in the preceding image, the hands are fine.
Update: 23.10.08
The original developer of ComfyUI’s Manager, ltdrdata, has released a video proposing an automatic fix to hand deformations. The key node-set to manage all this is contained in his Impact Pack.
6 - Character Interaction
So far (Oct 2023) no real solution has presented itself. Indeed, getting one character to show one expression and the other, something else eludes me. Even getting the characters to look at anything specific is “too-hard basket” stuff at the moment. I can get ‘selfie’-style shots galore, and, a la rigeur, two individuals ‘discussing’ while each sort of looks off into the distance, ostensibly contemplative, but even achieving that is inconsistent and really, how much need is there for that sort of pose? Getting consistent interaction remains a thorny problem. I currently can have characters holding hands and ‘embrace’ but (at least in SD1.5), a more intimate display of affection - i.e., a kiss - tends to distort the faces near the contact point. (Observed 04.Oct.23: appears to be a roop thing).
Node Sets - October Workflow
Nodes affect Objectives
23.10.08
I am discovering that, almost as much if not more, prompts - and specifically the injudicious, casual use of keywords in a prompt, will lead to undesireable outcomes in the final image. For example: I had the phrase “lace-up boots” in a prompt. this resulted in short skirts and lace-topped stockings. Removing this brought the skirt hemline down to more desireable levels and a far better image.
In a good prompt, less is more. Too many words in the negative prompt does absolutely nothing to improve the image and can adversely affect the outcome.
Challenges I have yet to resolve:
Camera distance from my subject
What the subject is looking at is consistent to the prompt
HiRes Fix
23.10.07
From Automatic1111, I really missed ‘HiRez-Fix’. Well, got that working, sort-of. I think. Open to new ideas, particularly if they work:
23.10.09
Thanks to EndangeredAI and his excellent videos (linked below), I’ve had cause to revise the Sampling section of my workflow. Mind you: what was suggested wasn’t for HiResFix per se, and it was SDXL-based, so I might look at it again
Nodal Noodles
The Efficiency Pack clears up a lot of noodle mess. It includes one LoRA and an empty latent image.
IP Adapter
23.10.13
At this juncture, I’m able to render two characters - in this case, women - with consistent faces. Despite the hand fix “module”, perhaps because of the proximity of the characters hands can still turn out a bit messy. I’m cycling through the checkpoints to see if one of them does a better job at this. At the moment (1000h) the Swizz8-BakedVAE-FP16-Pruned checkpoint appeared initially to produce the best results. However, with subsequent renders limbs were missing and the figures put themselves in impossible poses.
Animation - October Workflow
23.10.24 Anim-Begins
Animating my figures wasn’t even on the radar, for some reason I thought I’d have a go. It involved installing the AnimateDiff-Evolved node sets and a whole bunch of models. One can create GIFs or MP4s. The frame count limit is 32 unless you add the Uniform Context Options node.
23.10.27 Rooping -FlowFrames
For the video to actually be smooth, I bring the MP4s into a programme - only available for Windows, unfortunately - called “FlowFrames”. It does a decent job interpolating those missing frames, creating very viewable video and even a decent gif, by converting that GIF:
to this:
The video was 4x frame-interpolated.
The figures seem to have come to life. So far, of course, I’ve only featured one figure at a time, which may have an impact of figure accuracy. Interestingly, all anatomy issues that were produced in still image ‘renders’ have magically resolved: no extra - or missing - fingers or limbs or anything of the sort. Again, perhaps that’s due to it only being one figure.
23.12.09 AnimateDiff
I am getting out-of-memory errors all the time, now. So, AUD$749 for a 4060ti with exactly twice the memory I have now. Rendering video with purposeful movement should finally be within the scope of the possible.
ComfyUI Tips and Tricks
The “Unofficial” Sub-Reddit ComfyUI has proven an amazing resource, more than anywhere else, to be honest. Hope to collect some of the “Tips and Tricks” I pick up here.
Inserting Node-sets
To insert a node-set into a workflow:
save the node set to insert by highlighing the nodes, and [RtClick] -> ‘Save Selected as Template’
In the target workflow, [RtClick] -> ‘Node Templates’ (select the node set)
Caveat: if enclosed in a grouping box, the box is not saved into the template.
Note: templates are stored in the browser’s cache. Until recently, there was no GUI-based approach to removing unwanted templates: that has changed.