How AI preserves identity in artistic portraits: Flux Kontext Max explained
A technical yet accessible explanation of how modern generative models (Flux Kontext Max by Black Forest Labs) turn a real photograph into an artistic portrait without losing identity. Applied to digital memorials.
Until late 2024, AI-generated portraits had a serious problem for emotional use cases: they deformed the face. A son would upload a photo of his mother asking for an oil-style portrait, and the AI would return an aesthetically pleasing image with features that were no longer hers. For a memorial, that is unacceptable. In 2025, a new generation of "identity-preserving" models appeared — among them Flux Kontext Max, from Black Forest Labs — that changed the equation. This article explains how they work, where they still fail, and why they are now the backbone of serious digital memorials.
The technical problem: why was identity so hard to preserve?
Generative image models (Stable Diffusion, Midjourney, DALL·E) learn patterns of the visual world from millions of "image + caption" pairs. When a user asks for "oil portrait of a 70-year-old woman," the model generates a plausible image of a 70-year-old woman — not necessarily that woman. Without explicit "identity conditioning" mechanisms, the model treats the face as an average of faces it has seen, not as a unique face.
The first workarounds (Textual Inversion, DreamBooth, LoRA) required training the model with 20–30 photos of the specific subject and waiting hours. That was impossible for a flow where a family uploads a single photo and wants the result in minutes.
What Flux Kontext Max solved
Flux Kontext Max is part of the Flux family published by Black Forest Labs (founded by former Stability AI researchers) in 2024–2025. Key contributions for identity use cases:
- Expanded visual context: the model accepts not only text but also a reference image as input, and uses its visual embeddings (the vectors representing the face) as a strong guide.
- Flow-matching architecture: replaces classical diffusion with a process that better preserves fine structure (facial features) during the style transformation.
- Face-emphasized training data: the training dataset has curated over-representation of faces — humans and pets — across poses, styles, and ethnicities.
- Fast inference: a portrait takes 6–15 seconds on modern hardware (A100/H100), not hours. This makes interactive flows possible where the user requests 3 styles and sees them in under a minute.
How we measure "identity preservation"
It is not subjective. The industry uses three metrics:
- Face embedding cosine similarity: compute the facial vector (using models like ArcFace or FaceNet) for the original photo and the generated image. A 0.65+ score means "same person recognizable"; 0.75+ means "reliable recognition by close humans."
- Human evaluation: close family members view the result and rate "is this person recognizable?" on a 1–5 scale. Target: average ≥ 4.
- Landmark deviation: compare key facial landmarks (eyes, nose tip, mouth corners) between source and output. Average deviation < 3% of face width.
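As a sketch, the first and third metrics can be computed in a few lines of pure Python. The embedding vectors would in practice come from a face-recognition model such as ArcFace or FaceNet run on aligned face crops; the short vectors below are purely illustrative, not real embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def landmark_deviation(src: list[tuple[float, float]],
                       out: list[tuple[float, float]],
                       face_width: float) -> float:
    """Mean distance between matched landmarks (eyes, nose tip,
    mouth corners), expressed as a fraction of face width."""
    dists = [math.dist(a, b) for a, b in zip(src, out)]
    return (sum(dists) / len(dists)) / face_width

# Illustrative 4-dim vectors; real face embeddings are typically 512-dim.
emb_source = [0.1, 0.3, 0.9, 0.2]
emb_generated = [0.12, 0.28, 0.88, 0.25]
score = cosine_similarity(emb_source, emb_generated)
# A score of 0.65+ would count as "same person recognizable"
# under the thresholds described above.
```

A result would then pass the landmark criterion when `landmark_deviation(...)` is below 0.03, i.e. less than 3% of face width.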
In internal testing, Flux Kontext Max reaches an average cosine similarity of 0.78 on human portraits and 0.71 on pet portraits (a little harder due to breed variability). Average human evaluation: 4.3/5 for humans, 4.1/5 for pets.
Which artistic styles work best
Not all styles preserve identity equally. Those that work best for memorials:
| Style | Identity preservation | Emotional tone |
|---|---|---|
| Classical oil | Very high (0.80+) | Solemn, timeless |
| Soft watercolor | High (0.75) | Luminous, tender |
| Editorial gold | High (0.74) | Ceremonial |
| Pastel illustration | Medium (0.68) | Nostalgic |
| Cartoon / comic | Low (0.55) | Childlike — inappropriate for most memorials |
| Abstract geometric | Very low (0.42) | Not recommended for memorials |
Historias Infinitas defaults to classical oil, soft watercolor, and editorial gold because they strike the best balance between fidelity and dignity. The user picks a single style, or requests all three and chooses a favorite.
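A service built on this table might encode each offered style as a prompt fragment plus an identity floor below which a result is regenerated rather than shown. This is a hypothetical sketch: the style names and scores follow the table above, but the prompt wording and the quality-gate mechanism are illustrative assumptions, not the production configuration:

```python
# Hypothetical style registry: prompt fragment + minimum acceptable
# face-embedding cosine similarity before a result is shown to the user.
STYLES = {
    "classical_oil":   {"prompt": "classical oil painting, solemn, timeless",
                        "min_similarity": 0.80},
    "soft_watercolor": {"prompt": "soft watercolor, luminous, tender",
                        "min_similarity": 0.75},
    "editorial_gold":  {"prompt": "editorial gold portrait, ceremonial",
                        "min_similarity": 0.74},
}

def passes_quality_gate(style: str, similarity: float) -> bool:
    """Accept a generated portrait only if it meets the style's identity floor."""
    return similarity >= STYLES[style]["min_similarity"]
```

Cartoon and abstract styles are simply absent from the registry, which is one way to enforce "not recommended for memorials" at the code level.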
Infrastructure: how it is orchestrated in production
The technical flow when creating an AI portrait on a memorial:
- The client uploads one or more photos to Supabase Storage (TLS in transit + Row-Level Security).
- A Next.js backend validates the content (no explicit imagery, minimum 768×768 px).
- The system calls Replicate — a platform that hosts open models — with a structured prompt: "preserve identity of subject, render as [style], warm cinematic light, dignified composition."
- Replicate runs Flux Kontext Max on an H100 GPU and returns the image in ~8 seconds.
- The output is stored in Supabase Storage with a unique hash and shown to the user in under a minute.
- The user picks a favorite and downloads the high-resolution file (2048×2048) without watermark.
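The validation, generation, and storage steps above can be sketched as follows. The real service runs this logic in a Next.js backend; this sketch uses Python with the official `replicate` client for readability. The Replicate model identifier, the storage key scheme, and the PNG-only validation are assumptions for illustration:

```python
import hashlib
import struct

MIN_SIDE = 768  # minimum accepted dimension, per the validation step above

def png_dimensions(data: bytes) -> tuple[int, int]:
    """Read width/height from a PNG's IHDR chunk (bytes 16-24)."""
    if data[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height

def validate_upload(data: bytes) -> None:
    """Reject images below the minimum resolution."""
    width, height = png_dimensions(data)
    if width < MIN_SIDE or height < MIN_SIDE:
        raise ValueError(f"image must be at least {MIN_SIDE}x{MIN_SIDE} px")

def output_key(image_bytes: bytes) -> str:
    """Storage key for a generated portrait, unique via its content hash."""
    return f"portraits/{hashlib.sha256(image_bytes).hexdigest()}.png"

def generate_portrait(photo_url: str, style: str) -> str:
    """Run Flux Kontext Max via Replicate (model slug is an assumption)."""
    import replicate  # lazy import: only needed when actually generating
    prompt = (f"preserve identity of subject, render as {style}, "
              "warm cinematic light, dignified composition")
    output = replicate.run(
        "black-forest-labs/flux-kontext-max",  # assumed model identifier
        input={"prompt": prompt, "input_image": photo_url},
    )
    return str(output)
```

Deriving the storage key from a SHA-256 of the image bytes makes uploads idempotent: regenerating an identical portrait maps to the same object rather than a duplicate.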
Honest current limits
- Very small or blurry photos (less than 512 px on the face) produce less faithful results. We always ask for the highest-resolution photo available.
- Photos with dark sunglasses or scarves covering features reduce preservation. A photo where the eyes are fully visible works best.
- Unusual pet breeds (e.g. Mexican hairless dogs, exotic birds) may need 2–3 attempts to capture distinctive traits.
- "Enhanced photorealistic" style is deliberately not offered — it drifts into deep-fake territory, which is not what a memorial should deliver.
Ethics: the pact we sign
Using generative AI on people's faces carries real ethical responsibilities. Our public commitments:
- We never train our model on customer-uploaded photos. Flux Kontext Max is the base model, with no per-customer fine-tuning.
- Original photos never travel beyond our storage infrastructure and Replicate, both of which encrypt data in transit and at rest.
- The generated portrait belongs to the customer. We can only showcase it with explicit consent.
- We do not generate portraits of living people without authorization; the flow supports only memorials, with authorization coming from the content holder, typically the family.
- If a future model offers more faithful and safer results, we evaluate it transparently.
What's next in 2026–2027
Next-generation models (Flux Pro Ultra, Google Imagen 4, OpenAI's next gpt-image) are working on three fronts: even higher identity preservation (cosine similarity > 0.90), animating static portraits ("photo that blinks and smiles" — useful for AR memorials), and voice synthesis from a short audio sample (a more delicate ethical use case). We monitor every release but only integrate what meets the identity, privacy, and dignity criteria outlined here.