Episode 4 — Generative AI Engineering / 4.18 — Building a Simple Multi Agent Workflow

4.18.c — ImageKit Direction: SEO Pipeline

In one sentence: This section builds a complete multi-agent image-processing workflow in which three specialized agents — Image Metadata Extractor, SEO Optimizer, and Tag Categorizer — work in sequence to transform raw image descriptions into SEO-optimized titles, descriptions, alt text, and categorized tags, validated with Zod schemas at every step.

Navigation: ← 4.18.b — Hinge Direction: Profile Pipeline · 4.18.d — Validation and Error Handling →


1. The ImageKit SEO Pipeline — Overview

Imagine you are building a feature for an image management platform (like ImageKit) that automatically generates SEO metadata for uploaded images. A photographer uploads thousands of images — each one needs titles, descriptions, alt text, and categorized tags optimized for search engines.

A single agent doing all of this produces inconsistent results. Instead, we decompose:

┌─────────────────────────────────────────────────────────────────────────────────┐
│                              IMAGEKIT SEO PIPELINE                              │
│                                                                                 │
│  ┌────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐  │
│  │  Image     │────►│  AGENT 1     │────►│  AGENT 2     │────►│  AGENT 3     │  │
│  │  Metadata  │     │  Metadata    │     │  SEO         │     │  Tag         │  │
│  │  (raw)     │     │  Extractor   │     │  Optimizer   │     │  Categorizer │  │
│  └────────────┘     └──────┬───────┘     └──────┬───────┘     └──────┬───────┘  │
│                            │                    │                    │          │
│                      Zod validates        Zod validates        Zod validates    │
│                      extraction           SEO content          categorized tags │
│                            │                    │                    │          │
│                            ▼                    ▼                    ▼          │
│                      Subject, scene,      Title, description,  Technical tags,  │
│                      colors, mood,        alt text, keywords,  subject tags,    │
│                      technical details    meta description     mood tags, etc.  │
│                                                                                 │
│  FINAL OUTPUT: Structured JSON with metadata + SEO content + categorized tags   │
└─────────────────────────────────────────────────────────────────────────────────┘

Agent Responsibilities

Agent 1: Metadata Extractor
  Responsibility: Analyze image description/metadata to identify subjects, scene, colors, mood, technical details
  Input:          Raw image data (description, filename, dimensions, EXIF data)
  Output:         Extracted metadata: subjects[], scene description, colors[], mood, technical details

Agent 2: SEO Optimizer
  Responsibility: Generate SEO-optimized titles, descriptions, alt text, and keywords
  Input:          Extracted metadata from Agent 1
  Output:         SEO title, meta description, alt text, keywords[], SEO score

Agent 3: Tag Categorizer
  Responsibility: Generate and categorize tags into structured groups
  Input:          Extracted metadata + SEO keywords
  Output:         Categorized tags: technical[], subject[], mood[], color[], style[], use-case[]

2. Step 1 — Define the Input Schema

The input represents raw image data — what we know about the image before any AI processing:

import { z } from "zod";

// Input: Raw image information
const ImageInputSchema = z.object({
  filename: z.string().min(1),
  description: z.string().min(5),
  dimensions: z.object({
    width: z.number().positive(),
    height: z.number().positive(),
  }),
  fileSize: z.string().optional(),
  format: z.enum(["jpg", "jpeg", "png", "webp", "gif", "svg", "tiff", "raw"]).optional(),
  exifData: z.object({
    camera: z.string().optional(),
    lens: z.string().optional(),
    focalLength: z.string().optional(),
    aperture: z.string().optional(),
    shutterSpeed: z.string().optional(),
    iso: z.number().optional(),
    dateTaken: z.string().optional(),
    location: z.string().optional(),
  }).optional(),
  uploaderNotes: z.string().optional(),
  existingTags: z.array(z.string()).optional(),
});

Example Input

const sampleImage = {
  filename: "golden-gate-sunset-2024.jpg",
  description: "A wide-angle photograph of the Golden Gate Bridge during sunset. The sky is painted in vibrant shades of orange, pink, and purple. The bridge is silhouetted against the colorful sky. In the foreground, there are rocky shores with gentle waves. A few sailboats are visible in the bay. The overall mood is serene and majestic.",
  dimensions: { width: 6000, height: 4000 },
  fileSize: "18.2 MB",
  format: "jpg",
  exifData: {
    camera: "Sony A7R IV",
    lens: "Sony 16-35mm f/2.8 GM",
    focalLength: "24mm",
    aperture: "f/11",
    shutterSpeed: "1/125s",
    iso: 100,
    dateTaken: "2024-09-15T19:32:00Z",
    location: "San Francisco, CA, USA",
  },
  uploaderNotes: "Shot from Battery Spencer viewpoint. Best seller in my landscape collection.",
  existingTags: ["bridge", "sunset"],
};

3. Step 2 — Agent 1: Image Metadata Extractor

Output Schema

const ExtractedMetadataSchema = z.object({
  subjects: z.array(z.object({
    name: z.string(),
    prominence: z.enum(["primary", "secondary", "background"]),
    description: z.string().min(5),
  })).min(1),
  scene: z.object({
    type: z.enum([
      "landscape", "portrait", "urban", "nature", "architecture",
      "food", "product", "abstract", "event", "macro", "aerial", "underwater"
    ]),
    setting: z.string().min(5),
    timeOfDay: z.enum(["dawn", "morning", "midday", "afternoon", "golden-hour", "sunset", "twilight", "night"]),
    season: z.enum(["spring", "summer", "autumn", "winter", "unknown"]).optional(),
    weather: z.string().optional(),
  }),
  colors: z.object({
    dominant: z.array(z.string()).min(1).max(5),
    palette: z.enum(["warm", "cool", "neutral", "vibrant", "muted", "monochrome", "pastel"]),
    contrast: z.enum(["low", "medium", "high"]),
  }),
  mood: z.object({
    primary: z.string(),
    secondary: z.string().optional(),
    emotionalImpact: z.enum(["calming", "energizing", "inspiring", "dramatic", "nostalgic", "joyful", "mysterious", "romantic"]),
  }),
  technicalAnalysis: z.object({
    composition: z.string(),
    lighting: z.string(),
    depthOfField: z.enum(["shallow", "medium", "deep"]),
    quality: z.enum(["amateur", "semi-pro", "professional", "editorial"]),
  }),
  contentSummary: z.string().min(20),
});

System Prompt

const METADATA_EXTRACTOR_PROMPT = `You are an expert image analyst and metadata extractor. Your job is to analyze an image's description and technical data to extract structured metadata about what the image contains.

RULES:
1. Identify ALL subjects in the image (primary, secondary, background).
2. Determine the scene type, setting, time of day, and season.
3. Analyze the color palette — dominant colors, overall palette feel, contrast level.
4. Assess the mood and emotional impact.
5. Evaluate technical aspects: composition technique, lighting, depth of field, quality level.
6. Write a comprehensive content summary.
7. Use EXIF data when available to inform technical analysis.
8. Be specific and detailed — generic descriptions are useless for SEO.

Respond with ONLY valid JSON:
{
  "subjects": [{ "name": "...", "prominence": "primary|secondary|background", "description": "..." }],
  "scene": { "type": "landscape|portrait|urban|nature|...", "setting": "...", "timeOfDay": "dawn|morning|...|night", "season": "..." },
  "colors": { "dominant": ["..."], "palette": "warm|cool|...", "contrast": "low|medium|high" },
  "mood": { "primary": "...", "secondary": "...", "emotionalImpact": "calming|energizing|..." },
  "technicalAnalysis": { "composition": "...", "lighting": "...", "depthOfField": "shallow|medium|deep", "quality": "amateur|...|editorial" },
  "contentSummary": "..."
}`;

Agent Runner (shared implementation)

All three agents reuse the same generic runner; only the prompt, input, schema, and temperature change:

import OpenAI from "openai";

const client = new OpenAI();

async function callAgent(name, systemPrompt, input, schema, temperature = 0.7) {
  console.log(`\n--- ${name} ---`);

  const response = await client.chat.completions.create({
    model: "gpt-4o",
    temperature,
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: JSON.stringify(input) },
    ],
  });

  const raw = response.choices[0].message.content;
  if (!raw) throw new Error(`${name} returned empty response`);

  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    const match = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
    if (match) {
      parsed = JSON.parse(match[1].trim());
    } else {
      throw new Error(`${name} returned invalid JSON: ${raw.substring(0, 300)}`);
    }
  }

  const validated = schema.parse(parsed);
  console.log(`${name}: validated successfully`);
  return validated;
}
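The fallback regex in `callAgent` is worth seeing in action: models sometimes wrap their output in a Markdown fence despite the "ONLY valid JSON" instruction. A minimal sketch with an invented model reply (the `FENCE` constant just spells out three backticks so the example stays readable):

```javascript
// Sketch of callAgent's JSON-recovery path, using an invented reply string.
const FENCE = "\u0060\u0060\u0060"; // three backticks, escaped for readability

const raw = `Here is the extracted metadata:\n${FENCE}json\n{ "mood": "serene", "contrast": "high" }\n${FENCE}`;

let parsed;
try {
  parsed = JSON.parse(raw); // throws: the reply is prose + fence, not bare JSON
} catch {
  const match = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  if (!match) throw new Error("returned invalid JSON");
  parsed = JSON.parse(match[1].trim()); // parses the fenced body instead
}

console.log(parsed); // → { mood: 'serene', contrast: 'high' }
```

The `\s*` after the opening fence absorbs the newline before the JSON body, and the lazy `[\s\S]*?` stops at the first closing fence, so prose before the fence is discarded.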

4. Step 3 — Agent 2: SEO Optimizer

Output Schema

const SEOContentSchema = z.object({
  title: z.object({
    primary: z.string().min(10).max(70),
    alternatives: z.array(z.string().min(10).max(70)).min(2).max(5),
    strategy: z.string(),
  }),
  description: z.object({
    meta: z.string().min(50).max(160),
    long: z.string().min(100).max(500),
    strategy: z.string(),
  }),
  altText: z.object({
    primary: z.string().min(20).max(125),
    decorative: z.string().min(20).max(125),
    informative: z.string().min(20).max(200),
  }),
  keywords: z.object({
    primary: z.array(z.string()).min(3).max(5),
    secondary: z.array(z.string()).min(3).max(10),
    longTail: z.array(z.string()).min(2).max(5),
  }),
  seoScore: z.object({
    overall: z.number().min(1).max(100),
    titleOptimization: z.number().min(1).max(100),
    descriptionOptimization: z.number().min(1).max(100),
    keywordRelevance: z.number().min(1).max(100),
    recommendations: z.array(z.string()),
  }),
  searchIntent: z.array(z.enum([
    "informational", "commercial", "navigational",
    "transactional", "inspirational", "educational"
  ])).min(1),
});
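The min/max bounds above encode display limits (titles truncate around 70 characters in search results, meta descriptions around 160). Zod's `.parse` stops reporting per-field details once it throws; if you want a plain list of all length problems at once, a small pre-check can run first. `checkSeoLengths` below is an invented helper, not part of the pipeline:

```javascript
// Hypothetical pre-check mirroring the length bounds in SEOContentSchema.
// Returns a list of violations instead of throwing, so callers can log all at once.
function checkSeoLengths(seo) {
  const bounds = {
    "title.primary": [10, 70],
    "description.meta": [50, 160],
    "altText.primary": [20, 125],
  };
  const violations = [];
  for (const [path, [min, max]] of Object.entries(bounds)) {
    const value = path.split(".").reduce((obj, key) => obj[key], seo);
    if (value.length < min || value.length > max) {
      violations.push(`${path}: ${value.length} chars (expected ${min}-${max})`);
    }
  }
  return violations;
}

const sample = {
  title: { primary: "Golden Gate Bridge Sunset" },           // 25 chars: ok
  description: { meta: "x".repeat(40) },                     // 40 chars: too short
  altText: { primary: "Golden Gate Bridge at vivid sunset" },// 34 chars: ok
};
console.log(checkSeoLengths(sample)); // flags only the short meta description
```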

System Prompt

const SEO_OPTIMIZER_PROMPT = `You are an expert SEO specialist for image content. Your job is to generate SEO-optimized metadata for images based on extracted metadata.

RULES:
1. TITLE: Write compelling, keyword-rich titles. Primary title under 70 characters. Include the main subject and differentiator.
2. DESCRIPTION: Meta description 50-160 chars (for search results). Long description 100-500 chars (for image pages).
3. ALT TEXT: Write three versions:
   - Primary: Concise, descriptive (under 125 chars)
   - Decorative: For when image is supplementary
   - Informative: Detailed for accessibility (under 200 chars)
4. KEYWORDS: Identify primary (3-5 high-volume), secondary (3-10 related), and long-tail keywords (2-5 specific phrases).
5. SEO SCORE: Rate the optimization quality across dimensions.
6. SEARCH INTENT: What would someone searching for this image be looking for?
7. Use natural language — no keyword stuffing.
8. Consider what stock photo buyers, bloggers, and designers would search for.

Respond with ONLY valid JSON:
{
  "title": { "primary": "...", "alternatives": ["..."], "strategy": "..." },
  "description": { "meta": "...", "long": "...", "strategy": "..." },
  "altText": { "primary": "...", "decorative": "...", "informative": "..." },
  "keywords": { "primary": ["..."], "secondary": ["..."], "longTail": ["..."] },
  "seoScore": { "overall": 1-100, "titleOptimization": 1-100, "descriptionOptimization": 1-100, "keywordRelevance": 1-100, "recommendations": ["..."] },
  "searchIntent": ["informational|commercial|navigational|transactional|inspirational|educational"]
}`;

5. Step 4 — Agent 3: Tag Categorizer

Output Schema

const CategorizedTagsSchema = z.object({
  tags: z.object({
    technical: z.array(z.object({
      tag: z.string(),
      confidence: z.number().min(0).max(1),
    })).min(1),
    subject: z.array(z.object({
      tag: z.string(),
      confidence: z.number().min(0).max(1),
    })).min(1),
    mood: z.array(z.object({
      tag: z.string(),
      confidence: z.number().min(0).max(1),
    })).min(1),
    color: z.array(z.object({
      tag: z.string(),
      confidence: z.number().min(0).max(1),
    })).min(1),
    style: z.array(z.object({
      tag: z.string(),
      confidence: z.number().min(0).max(1),
    })).min(1),
    useCase: z.array(z.object({
      tag: z.string(),
      confidence: z.number().min(0).max(1),
    })).min(1),
  }),
  flatTags: z.array(z.string()).min(5),
  tagCount: z.number(),
  primaryCategory: z.string(),
  suggestedCollections: z.array(z.object({
    name: z.string(),
    description: z.string(),
    matchScore: z.number().min(0).max(1),
  })).min(1).max(5),
  commercialViability: z.object({
    score: z.number().min(1).max(10),
    targetMarkets: z.array(z.string()),
    licensingRecommendation: z.enum(["editorial", "commercial", "both"]),
  }),
});
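One thing this schema cannot express on its own is cross-field consistency: `tagCount` should equal `flatTags.length`, and every categorized tag should appear in `flatTags`. A sketch of such a check in plain JS (the helper name is invented; in practice you could attach the same logic to the schema with Zod's `.refine`):

```javascript
// Cross-field invariants that the object schema alone can't enforce.
function checkTagConsistency(result) {
  const issues = [];
  if (result.tagCount !== result.flatTags.length) {
    issues.push(`tagCount is ${result.tagCount} but flatTags has ${result.flatTags.length}`);
  }
  const flat = new Set(result.flatTags);
  for (const [category, entries] of Object.entries(result.tags)) {
    for (const { tag } of entries) {
      if (!flat.has(tag)) issues.push(`${category} tag "${tag}" missing from flatTags`);
    }
  }
  return issues;
}

const sample = {
  tags: { subject: [{ tag: "golden gate bridge", confidence: 0.99 }] },
  flatTags: ["golden gate bridge", "sunset"],
  tagCount: 3, // wrong: flatTags has 2 entries
};
console.log(checkTagConsistency(sample)); // one issue: the tagCount mismatch
```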

System Prompt

const TAG_CATEGORIZER_PROMPT = `You are an expert image tag curator and categorizer for a stock photo platform. Your job is to generate comprehensive, categorized tags for images.

RULES:
1. Generate tags in SIX categories:
   - technical: camera, lens, technique, composition tags
   - subject: what is in the image
   - mood: emotional, atmospheric tags
   - color: color-related tags
   - style: artistic style, photography genre
   - useCase: what this image could be used for
2. Each tag has a confidence score (0-1).
3. Include a flat list of ALL tags for simple tag fields.
4. Identify the primary category for the image.
5. Suggest collections/albums this image would fit in.
6. Assess commercial viability: score, target markets, licensing type.
7. Tags should be lowercase, no special characters.
8. Include both specific and broad tags for maximum discoverability.

Respond with ONLY valid JSON:
{
  "tags": {
    "technical": [{ "tag": "...", "confidence": 0-1 }],
    "subject": [{ "tag": "...", "confidence": 0-1 }],
    "mood": [{ "tag": "...", "confidence": 0-1 }],
    "color": [{ "tag": "...", "confidence": 0-1 }],
    "style": [{ "tag": "...", "confidence": 0-1 }],
    "useCase": [{ "tag": "...", "confidence": 0-1 }]
  },
  "flatTags": ["tag1", "tag2", ...],
  "tagCount": number,
  "primaryCategory": "...",
  "suggestedCollections": [{ "name": "...", "description": "...", "matchScore": 0-1 }],
  "commercialViability": { "score": 1-10, "targetMarkets": ["..."], "licensingRecommendation": "editorial|commercial|both" }
}`;

6. Step 5 — The Complete Pipeline

Final Output Schema

const ImageKitPipelineOutputSchema = z.object({
  originalImage: ImageInputSchema,
  extractedMetadata: ExtractedMetadataSchema,
  seoContent: SEOContentSchema,
  categorizedTags: CategorizedTagsSchema,
  pipelineMetadata: z.object({
    totalDuration: z.number(),
    agentCount: z.number(),
    timestamp: z.string(),
    imageId: z.string(),
  }),
});

End-to-End Pipeline

async function runImageKitSEOPipeline(rawImage) {
  console.log("\n╔══════════════════════════════════════════════════╗");
  console.log("║   IMAGEKIT SEO PIPELINE — START                 ║");
  console.log("╚══════════════════════════════════════════════════╝");

  const start = Date.now();

  // Validate input
  const image = ImageInputSchema.parse(rawImage);
  console.log(`\nInput validated: ${image.filename} (${image.dimensions.width}x${image.dimensions.height})`);

  // Agent 1: Extract metadata
  const metadata = await callAgent(
    "Image Metadata Extractor",
    METADATA_EXTRACTOR_PROMPT,
    image,
    ExtractedMetadataSchema,
    0.5  // Low temperature for analytical extraction
  );

  // Agent 2: Generate SEO content (needs extracted metadata)
  const seoContent = await callAgent(
    "SEO Optimizer",
    SEO_OPTIMIZER_PROMPT,
    {
      extractedMetadata: metadata,
      filename: image.filename,
      dimensions: image.dimensions,
      format: image.format,
    },
    SEOContentSchema,
    0.7  // Medium temperature for creative but accurate SEO
  );

  // Agent 3: Categorize tags (needs metadata + SEO keywords)
  const categorizedTags = await callAgent(
    "Tag Categorizer",
    TAG_CATEGORIZER_PROMPT,
    {
      extractedMetadata: metadata,
      seoKeywords: seoContent.keywords,
      subjects: metadata.subjects,
      scene: metadata.scene,
      mood: metadata.mood,
      colors: metadata.colors,
      existingTags: image.existingTags,
    },
    CategorizedTagsSchema,
    0.6  // Moderate temperature for comprehensive but consistent tags
  );

  const totalDuration = Date.now() - start;

  // Generate a simple image ID from filename
  const imageId = image.filename.replace(/\.[^.]+$/, "").replace(/[^a-z0-9]/gi, "-").toLowerCase();

  const finalOutput = ImageKitPipelineOutputSchema.parse({
    originalImage: image,
    extractedMetadata: metadata,
    seoContent,
    categorizedTags,
    pipelineMetadata: {
      totalDuration,
      agentCount: 3,
      timestamp: new Date().toISOString(),
      imageId,
    },
  });

  console.log("\n╔══════════════════════════════════════════════════╗");
  console.log("║   IMAGEKIT SEO PIPELINE — COMPLETE              ║");
  console.log(`║   Total time: ${totalDuration}ms                          ║`);
  console.log("╚══════════════════════════════════════════════════╝");

  return finalOutput;
}
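The `imageId` derivation near the end of the pipeline is worth tracing once. With an invented, messier filename you can see both replaces at work (note the doubled and trailing hyphens; a production slugifier would usually collapse those):

```javascript
// Tracing the imageId derivation from the pipeline on a made-up filename.
const filename = "Golden Gate Sunset (final).JPG";

const noExt = filename.replace(/\.[^.]+$/, "");   // strip the extension
const imageId = noExt
  .replace(/[^a-z0-9]/gi, "-")                    // non-alphanumerics become hyphens
  .toLowerCase();

console.log(noExt);   // → Golden Gate Sunset (final)
console.log(imageId); // → golden-gate-sunset--final-
```

The `i` flag keeps uppercase letters out of the "non-alphanumeric" class so they survive until `toLowerCase()`, and `\.[^.]+$` anchors to the last dot, so dotted filenames lose only their real extension.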

7. Understanding the Data Flow

INPUT (ImageInput)
  │
  │  filename: "golden-gate-sunset-2024.jpg"
  │  description: "A wide-angle photograph of the Golden Gate Bridge..."
  │  dimensions: { width: 6000, height: 4000 }
  │  exifData: { camera: "Sony A7R IV", aperture: "f/11", ... }
  │
  ▼
AGENT 1: Image Metadata Extractor
  │
  │  OUTPUT (ExtractedMetadata):
  │  {
  │    subjects: [
  │      { name: "Golden Gate Bridge", prominence: "primary", description: "..." },
  │      { name: "sailboats", prominence: "secondary", description: "..." },
  │      { name: "rocky shore", prominence: "background", description: "..." }
  │    ],
  │    scene: { type: "landscape", setting: "Coastal bay with iconic bridge",
  │             timeOfDay: "sunset", season: "autumn" },
  │    colors: { dominant: ["orange", "pink", "purple", "dark red"],
  │              palette: "warm", contrast: "high" },
  │    mood: { primary: "serene", secondary: "majestic",
  │            emotionalImpact: "inspiring" },
  │    technicalAnalysis: { composition: "Rule of thirds with bridge",
  │                         lighting: "Golden hour backlit",
  │                         depthOfField: "deep", quality: "professional" },
  │    contentSummary: "Professional landscape photograph of the Golden Gate..."
  │  }
  │
  ▼
AGENT 2: SEO Optimizer
  │
  │  RECEIVES: { extractedMetadata, filename, dimensions, format }
  │
  │  OUTPUT (SEOContent):
  │  {
  │    title: { primary: "Golden Gate Bridge Sunset — Vibrant Landscape Photo",
  │             alternatives: [...], strategy: "..." },
  │    description: { meta: "Stunning sunset over the Golden Gate Bridge...",
  │                   long: "Professional wide-angle photograph capturing...",
  │                   strategy: "..." },
  │    altText: { primary: "Golden Gate Bridge silhouetted against orange sunset sky",
  │              decorative: "Sunset over San Francisco Bay",
  │              informative: "Wide-angle photograph of the Golden Gate Bridge..." },
  │    keywords: {
  │      primary: ["golden gate bridge", "sunset photography", "san francisco"],
  │      secondary: ["landscape", "bay area", "california", "bridge photography"],
  │      longTail: ["golden gate bridge sunset photo", "san francisco landscape"]
  │    },
  │    seoScore: { overall: 87, ... },
  │    searchIntent: ["inspirational", "commercial"]
  │  }
  │
  ▼
AGENT 3: Tag Categorizer
  │
  │  RECEIVES: { extractedMetadata, seoKeywords, subjects, scene, mood, colors }
  │
  │  OUTPUT (CategorizedTags):
  │  {
  │    tags: {
  │      technical: [{ tag: "wide-angle", confidence: 0.95 }, ...],
  │      subject: [{ tag: "golden gate bridge", confidence: 0.99 }, ...],
  │      mood: [{ tag: "serene", confidence: 0.92 }, ...],
  │      color: [{ tag: "orange sunset", confidence: 0.95 }, ...],
  │      style: [{ tag: "landscape photography", confidence: 0.97 }, ...],
  │      useCase: [{ tag: "wall art", confidence: 0.88 }, ...]
  │    },
  │    flatTags: ["golden gate bridge", "sunset", "landscape", ...],
  │    tagCount: 35,
  │    primaryCategory: "Landscape Photography",
  │    suggestedCollections: [
  │      { name: "San Francisco Icons", matchScore: 0.95 },
  │      { name: "Sunset Collection", matchScore: 0.90 }
  │    ],
  │    commercialViability: { score: 9, targetMarkets: ["travel", "print"],
  │                           licensingRecommendation: "both" }
  │  }
  │
  ▼
FINAL OUTPUT (ImageKitPipelineOutput)
  {
    originalImage: { ... },
    extractedMetadata: { ... },
    seoContent: { ... },
    categorizedTags: { ... },
    pipelineMetadata: { totalDuration: 7823, agentCount: 3, ... }
  }

8. Temperature Strategy for Image Processing

Different agents need different levels of creativity vs. consistency:

┌──────────────────────────────────────────────────────────────────────┐
│                TEMPERATURE STRATEGY                                   │
│                                                                       │
│  Agent 1 — Metadata Extractor:  temperature = 0.5                     │
│  ┌─────────────────────────────────────────────────────────────────┐ │
│  │  WHY LOW: Extraction is analytical. The bridge is a bridge,     │ │
│  │  the sky is orange. We want factual consistency, not creative    │ │
│  │  interpretation. Two runs should produce similar results.        │ │
│  └─────────────────────────────────────────────────────────────────┘ │
│                                                                       │
│  Agent 2 — SEO Optimizer:  temperature = 0.7                          │
│  ┌─────────────────────────────────────────────────────────────────┐ │
│  │  WHY MEDIUM: SEO titles and descriptions need to be creative     │ │
│  │  AND accurate. "Golden Gate Bridge Sunset" is factual but we     │ │
│  │  want engaging variations. Some creativity, but grounded in data.│ │
│  └─────────────────────────────────────────────────────────────────┘ │
│                                                                       │
│  Agent 3 — Tag Categorizer:  temperature = 0.6                        │
│  ┌─────────────────────────────────────────────────────────────────┐ │
│  │  WHY MODERATE: Tags need to be comprehensive (some creativity    │ │
│  │  to think of related tags) but consistent (same image should     │ │
│  │  get similar tags on repeat runs). Balance between coverage      │ │
│  │  and reliability.                                                │ │
│  └─────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘

9. Processing Multiple Images (Batch Pipeline)

In a real image platform, you process hundreds or thousands of images. Here's how to batch the pipeline:

async function batchProcessImages(images, concurrency = 3) {
  console.log(`\nBatch processing ${images.length} images (concurrency: ${concurrency})`);

  const results = [];
  const errors = [];

  // Process in batches of `concurrency` size
  for (let i = 0; i < images.length; i += concurrency) {
    const batch = images.slice(i, i + concurrency);
    console.log(`\nBatch ${Math.floor(i / concurrency) + 1}: processing ${batch.length} images`);

    const batchResults = await Promise.allSettled(
      batch.map(image => runImageKitSEOPipeline(image))
    );

    batchResults.forEach((result, index) => {
      const imageIndex = i + index;
      if (result.status === "fulfilled") {
        results.push(result.value);
        console.log(`  Image ${imageIndex + 1}: SUCCESS`);
      } else {
        errors.push({
          imageIndex,
          filename: batch[index].filename,
          error: result.reason?.message ?? String(result.reason),
        });
        console.log(`  Image ${imageIndex + 1}: FAILED — ${result.reason?.message ?? result.reason}`);
      }
    });
  }

  return {
    successful: results,
    failed: errors,
    summary: {
      total: images.length,
      succeeded: results.length,
      failed: errors.length,
    },
  };
}

// Usage: anotherImage and yetAnotherImage are placeholders for additional ImageInput objects
const images = [sampleImage, anotherImage, yetAnotherImage];
const batchResult = await batchProcessImages(images, 2);
console.log(`Processed: ${batchResult.summary.succeeded}/${batchResult.summary.total}`);
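The batching arithmetic is easy to test in isolation. `toBatches` below is an invented extraction of the loop's slicing logic; each window is then run through `Promise.allSettled`, so peak concurrency never exceeds the window size and one failed image never aborts its batch:

```javascript
// The batching arithmetic from batchProcessImages, isolated:
// slice the work list into windows of `concurrency` items.
function toBatches(items, concurrency) {
  const batches = [];
  for (let i = 0; i < items.length; i += concurrency) {
    batches.push(items.slice(i, i + concurrency));
  }
  return batches;
}

console.log(toBatches([1, 2, 3, 4, 5, 6, 7], 3)); // three windows: [1,2,3], [4,5,6], [7]
```

`Array.prototype.slice` clamps past the end of the array, so the final window simply holds whatever remains.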

10. Complete Code — Copy and Run

// imagekit-seo-pipeline.js
// Complete multi-agent pipeline for image SEO optimization
// Requires: npm install openai zod

import { z } from "zod";
import OpenAI from "openai";

const client = new OpenAI();

// ═══════════════════════════════════════════════════════
// SCHEMAS
// ═══════════════════════════════════════════════════════

const ImageInputSchema = z.object({
  filename: z.string().min(1),
  description: z.string().min(5),
  dimensions: z.object({
    width: z.number().positive(),
    height: z.number().positive(),
  }),
  fileSize: z.string().optional(),
  format: z.enum(["jpg", "jpeg", "png", "webp", "gif", "svg", "tiff", "raw"]).optional(),
  exifData: z.object({
    camera: z.string().optional(),
    lens: z.string().optional(),
    focalLength: z.string().optional(),
    aperture: z.string().optional(),
    shutterSpeed: z.string().optional(),
    iso: z.number().optional(),
    dateTaken: z.string().optional(),
    location: z.string().optional(),
  }).optional(),
  uploaderNotes: z.string().optional(),
  existingTags: z.array(z.string()).optional(),
});

const ExtractedMetadataSchema = z.object({
  subjects: z.array(z.object({
    name: z.string(),
    prominence: z.enum(["primary", "secondary", "background"]),
    description: z.string().min(5),
  })).min(1),
  scene: z.object({
    type: z.enum([
      "landscape", "portrait", "urban", "nature", "architecture",
      "food", "product", "abstract", "event", "macro", "aerial", "underwater"
    ]),
    setting: z.string().min(5),
    timeOfDay: z.enum(["dawn", "morning", "midday", "afternoon", "golden-hour", "sunset", "twilight", "night"]),
    season: z.enum(["spring", "summer", "autumn", "winter", "unknown"]).optional(),
    weather: z.string().optional(),
  }),
  colors: z.object({
    dominant: z.array(z.string()).min(1).max(5),
    palette: z.enum(["warm", "cool", "neutral", "vibrant", "muted", "monochrome", "pastel"]),
    contrast: z.enum(["low", "medium", "high"]),
  }),
  mood: z.object({
    primary: z.string(),
    secondary: z.string().optional(),
    emotionalImpact: z.enum(["calming", "energizing", "inspiring", "dramatic", "nostalgic", "joyful", "mysterious", "romantic"]),
  }),
  technicalAnalysis: z.object({
    composition: z.string(),
    lighting: z.string(),
    depthOfField: z.enum(["shallow", "medium", "deep"]),
    quality: z.enum(["amateur", "semi-pro", "professional", "editorial"]),
  }),
  contentSummary: z.string().min(20),
});

const SEOContentSchema = z.object({
  title: z.object({
    primary: z.string().min(10).max(70),
    alternatives: z.array(z.string().min(10).max(70)).min(2).max(5),
    strategy: z.string(),
  }),
  description: z.object({
    meta: z.string().min(50).max(160),
    long: z.string().min(100).max(500),
    strategy: z.string(),
  }),
  altText: z.object({
    primary: z.string().min(20).max(125),
    decorative: z.string().min(20).max(125),
    informative: z.string().min(20).max(200),
  }),
  keywords: z.object({
    primary: z.array(z.string()).min(3).max(5),
    secondary: z.array(z.string()).min(3).max(10),
    longTail: z.array(z.string()).min(2).max(5),
  }),
  seoScore: z.object({
    overall: z.number().min(1).max(100),
    titleOptimization: z.number().min(1).max(100),
    descriptionOptimization: z.number().min(1).max(100),
    keywordRelevance: z.number().min(1).max(100),
    recommendations: z.array(z.string()),
  }),
  searchIntent: z.array(z.enum([
    "informational", "commercial", "navigational",
    "transactional", "inspirational", "educational"
  ])).min(1),
});

const CategorizedTagsSchema = z.object({
  tags: z.object({
    technical: z.array(z.object({
      tag: z.string(),
      confidence: z.number().min(0).max(1),
    })).min(1),
    subject: z.array(z.object({
      tag: z.string(),
      confidence: z.number().min(0).max(1),
    })).min(1),
    mood: z.array(z.object({
      tag: z.string(),
      confidence: z.number().min(0).max(1),
    })).min(1),
    color: z.array(z.object({
      tag: z.string(),
      confidence: z.number().min(0).max(1),
    })).min(1),
    style: z.array(z.object({
      tag: z.string(),
      confidence: z.number().min(0).max(1),
    })).min(1),
    useCase: z.array(z.object({
      tag: z.string(),
      confidence: z.number().min(0).max(1),
    })).min(1),
  }),
  flatTags: z.array(z.string()).min(5),
  tagCount: z.number(),
  primaryCategory: z.string(),
  suggestedCollections: z.array(z.object({
    name: z.string(),
    description: z.string(),
    matchScore: z.number().min(0).max(1),
  })).min(1).max(5),
  commercialViability: z.object({
    score: z.number().min(1).max(10),
    targetMarkets: z.array(z.string()),
    licensingRecommendation: z.enum(["editorial", "commercial", "both"]),
  }),
});

const ImageKitPipelineOutputSchema = z.object({
  originalImage: ImageInputSchema,
  extractedMetadata: ExtractedMetadataSchema,
  seoContent: SEOContentSchema,
  categorizedTags: CategorizedTagsSchema,
  pipelineMetadata: z.object({
    totalDuration: z.number(),
    agentCount: z.number(),
    timestamp: z.string(),
    imageId: z.string(),
  }),
});

// ═══════════════════════════════════════════════════════
// SYSTEM PROMPTS
// ═══════════════════════════════════════════════════════

const METADATA_EXTRACTOR_PROMPT = `You are an expert image analyst. Analyze the image description and EXIF data to extract structured metadata.

RULES:
1. Identify ALL subjects (primary, secondary, background).
2. Determine scene type, setting, time of day.
3. Analyze color palette, contrast.
4. Assess mood and emotional impact.
5. Evaluate composition, lighting, depth of field, quality.
6. Write a detailed content summary.

Respond with ONLY valid JSON:
{"subjects":[{"name":"...","prominence":"primary|secondary|background","description":"..."}],"scene":{"type":"landscape|portrait|urban|nature|architecture|food|product|abstract|event|macro|aerial|underwater","setting":"...","timeOfDay":"dawn|morning|midday|afternoon|golden-hour|sunset|twilight|night","season":"spring|summer|autumn|winter|unknown"},"colors":{"dominant":["..."],"palette":"warm|cool|neutral|vibrant|muted|monochrome|pastel","contrast":"low|medium|high"},"mood":{"primary":"...","secondary":"...","emotionalImpact":"calming|energizing|inspiring|dramatic|nostalgic|joyful|mysterious|romantic"},"technicalAnalysis":{"composition":"...","lighting":"...","depthOfField":"shallow|medium|deep","quality":"amateur|semi-pro|professional|editorial"},"contentSummary":"..."}`;

const SEO_OPTIMIZER_PROMPT = `You are an expert SEO specialist for image content. Generate SEO-optimized metadata.

RULES:
1. Title: Compelling, keyword-rich, under 70 chars. Provide alternatives.
2. Meta description: 50-160 chars. Long description: 100-500 chars.
3. Alt text: Primary (under 125 chars), decorative, informative (under 200 chars).
4. Keywords: primary (3-5), secondary (3-10), long-tail (2-5).
5. Score optimization quality.
6. Natural language, no keyword stuffing.

Respond with ONLY valid JSON:
{"title":{"primary":"...","alternatives":["..."],"strategy":"..."},"description":{"meta":"...","long":"...","strategy":"..."},"altText":{"primary":"...","decorative":"...","informative":"..."},"keywords":{"primary":["..."],"secondary":["..."],"longTail":["..."]},"seoScore":{"overall":1,"titleOptimization":1,"descriptionOptimization":1,"keywordRelevance":1,"recommendations":["..."]},"searchIntent":["informational|commercial|navigational|transactional|inspirational|educational"]}`;

const TAG_CATEGORIZER_PROMPT = `You are an expert image tag curator for a stock photo platform. Generate comprehensive categorized tags.

RULES:
1. Six categories: technical, subject, mood, color, style, useCase.
2. Each tag has a confidence score 0-1.
3. Include flat tag list for simple fields.
4. Identify primary category.
5. Suggest collections this image fits.
6. Assess commercial viability.
7. Tags lowercase, no special characters.

Respond with ONLY valid JSON:
{"tags":{"technical":[{"tag":"...","confidence":0.9}],"subject":[{"tag":"...","confidence":0.9}],"mood":[{"tag":"...","confidence":0.9}],"color":[{"tag":"...","confidence":0.9}],"style":[{"tag":"...","confidence":0.9}],"useCase":[{"tag":"...","confidence":0.9}]},"flatTags":["..."],"tagCount":0,"primaryCategory":"...","suggestedCollections":[{"name":"...","description":"...","matchScore":0.9}],"commercialViability":{"score":1,"targetMarkets":["..."],"licensingRecommendation":"editorial|commercial|both"}}`;

// ═══════════════════════════════════════════════════════
// AGENT RUNNER
// ═══════════════════════════════════════════════════════

async function callAgent(name, systemPrompt, input, schema, temperature = 0.7) {
  console.log(`\n--- ${name} ---`);

  const response = await client.chat.completions.create({
    model: "gpt-4o",
    temperature,
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: JSON.stringify(input) },
    ],
  });

  const raw = response.choices[0].message.content;
  if (!raw) throw new Error(`${name} returned empty response`);

  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    // Some models wrap their JSON in a ```json fence — extract the body before giving up.
    const match = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
    if (match) {
      parsed = JSON.parse(match[1].trim());
    } else {
      throw new Error(`${name} returned invalid JSON: ${raw.substring(0, 300)}`);
    }
  }

  // schema.parse throws a ZodError if the output shape is wrong,
  // failing the pipeline at the step that produced the bad data.
  const validated = schema.parse(parsed);
  console.log(`${name}: validated successfully`);
  return validated;
}
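The fence-stripping fallback inside callAgent can be exercised on its own. Here it is extracted into a standalone helper (parseModelJson is an illustrative name, not part of the pipeline code):

```javascript
// Mirrors the JSON-recovery logic in callAgent: try a plain parse first;
// if the model wrapped its answer in a ```json code fence, extract the
// fence body and parse that instead.
function parseModelJson(raw) {
  try {
    return JSON.parse(raw);
  } catch {
    const match = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
    if (match) return JSON.parse(match[1].trim());
    throw new Error(`Invalid JSON: ${raw.substring(0, 300)}`);
  }
}

// A fenced response is recovered transparently:
const fenced = '```json\n{"tag":"sunset","confidence":0.9}\n```';
console.log(parseModelJson(fenced).tag); // → "sunset"
```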

// ═══════════════════════════════════════════════════════
// PIPELINE
// ═══════════════════════════════════════════════════════

async function runImageKitSEOPipeline(rawImage) {
  const start = Date.now();
  const image = ImageInputSchema.parse(rawImage);

  // Agent 1: Extract metadata
  const metadata = await callAgent(
    "Image Metadata Extractor", METADATA_EXTRACTOR_PROMPT,
    image, ExtractedMetadataSchema, 0.5
  );

  // Agent 2: SEO optimization
  const seoContent = await callAgent(
    "SEO Optimizer", SEO_OPTIMIZER_PROMPT,
    { extractedMetadata: metadata, filename: image.filename,
      dimensions: image.dimensions, format: image.format },
    SEOContentSchema, 0.7
  );

  // Agent 3: Tag categorization
  const categorizedTags = await callAgent(
    "Tag Categorizer", TAG_CATEGORIZER_PROMPT,
    { extractedMetadata: metadata, seoKeywords: seoContent.keywords,
      subjects: metadata.subjects, scene: metadata.scene,
      mood: metadata.mood, colors: metadata.colors,
      existingTags: image.existingTags },
    CategorizedTagsSchema, 0.6
  );

  const totalDuration = Date.now() - start;
  const imageId = image.filename.replace(/\.[^.]+$/, "").replace(/[^a-z0-9]/gi, "-").toLowerCase();

  return ImageKitPipelineOutputSchema.parse({
    originalImage: image,
    extractedMetadata: metadata,
    seoContent,
    categorizedTags,
    pipelineMetadata: { totalDuration, agentCount: 3,
      timestamp: new Date().toISOString(), imageId },
  });
}
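The imageId slug is derived purely from the filename: strip the extension, collapse non-alphanumerics to hyphens, lowercase. Pulled out into a helper (toImageId is an illustrative name) the transformation looks like this:

```javascript
// Mirrors the imageId derivation in runImageKitSEOPipeline.
function toImageId(filename) {
  return filename
    .replace(/\.[^.]+$/, "")      // drop the final ".jpg" / ".PNG" etc.
    .replace(/[^a-z0-9]/gi, "-")  // every other character becomes "-"
    .toLowerCase();
}

console.log(toImageId("golden-gate-sunset-2024.jpg")); // → "golden-gate-sunset-2024"
console.log(toImageId("IMG 0042 (edited).PNG"));       // → "img-0042--edited-"
```

Note the second example: consecutive punctuation produces doubled or trailing hyphens, which a production slugifier would probably want to collapse.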

// ═══════════════════════════════════════════════════════
// MAIN
// ═══════════════════════════════════════════════════════

const sampleImage = {
  filename: "golden-gate-sunset-2024.jpg",
  description: "A wide-angle photograph of the Golden Gate Bridge during sunset. The sky is painted in vibrant shades of orange, pink, and purple. The bridge is silhouetted against the colorful sky. In the foreground, there are rocky shores with gentle waves. A few sailboats are visible in the bay. The overall mood is serene and majestic.",
  dimensions: { width: 6000, height: 4000 },
  fileSize: "18.2 MB",
  format: "jpg",
  exifData: {
    camera: "Sony A7R IV",
    lens: "Sony 16-35mm f/2.8 GM",
    focalLength: "24mm",
    aperture: "f/11",
    shutterSpeed: "1/125s",
    iso: 100,
    dateTaken: "2024-09-15T19:32:00Z",
    location: "San Francisco, CA, USA",
  },
  uploaderNotes: "Shot from Battery Spencer viewpoint. Best seller in my landscape collection.",
  existingTags: ["bridge", "sunset"],
};

runImageKitSEOPipeline(sampleImage)
  .then(result => {
    console.log("\n=== FINAL OUTPUT ===");
    console.log(`\nSEO Title: ${result.seoContent.title.primary}`);
    console.log(`Alt Text: ${result.seoContent.altText.primary}`);
    console.log(`Meta: ${result.seoContent.description.meta}`);
    console.log(`Keywords: ${result.seoContent.keywords.primary.join(", ")}`);
    console.log(`Tags (${result.categorizedTags.tagCount}): ${result.categorizedTags.flatTags.slice(0, 10).join(", ")}...`);
    console.log(`SEO Score: ${result.seoContent.seoScore.overall}/100`);
    console.log(`Commercial Score: ${result.categorizedTags.commercialViability.score}/10`);
    console.log(`\nFull output:\n${JSON.stringify(result, null, 2)}`);
  })
  .catch(error => {
    console.error("Pipeline failed:", error.message);
  });

11. Comparing the Two Pipeline Directions

| Aspect | Hinge Pipeline (4.18.b) | ImageKit Pipeline (4.18.c) |
|---|---|---|
| Domain | Dating app profiles | Image asset management |
| Agent 1 | Profile Analyzer (analyze strengths/weaknesses) | Metadata Extractor (identify subjects, colors, mood) |
| Agent 2 | Bio Improver (rewrite based on analysis) | SEO Optimizer (generate titles, descriptions, alt text) |
| Agent 3 | Conversation Starter Generator (create openers) | Tag Categorizer (organize tags by category) |
| Temperature pattern | 0.7 → 0.8 → 0.9 (increasing creativity) | 0.5 → 0.7 → 0.6 (analytical first, then moderate) |
| Data flow | Selective context (each agent gets specific fields) | Selective context (each agent gets specific fields) |
| Validation | Zod at every step | Zod at every step |
| Output | Analysis + improved bio + openers | Metadata + SEO content + categorized tags |
| Shared pattern | Input → Analyze → Transform → Generate | Input → Extract → Optimize → Categorize |

The key insight: despite operating in completely different domains, both pipelines follow the same architectural pattern. The pattern is domain-agnostic.


12. Key Takeaways

  1. The three-agent pattern generalizes across domains — whether you are processing dating profiles or image assets, the Analyze → Transform → Generate structure works.
  2. SEO optimization benefits from multi-agent decomposition because extraction, optimization, and categorization are genuinely different skills.
  3. Zod schema validation at each step is especially critical for SEO pipelines where downstream systems (search engines, CDNs) depend on correct metadata.
  4. Temperature strategy differs by domain — image analysis needs lower temperatures (factual), while dating profile work needs higher temperatures (creative).
  5. Batch processing with Promise.allSettled lets you process many items while isolating failures.
  6. Selective context keeps each agent focused — the Tag Categorizer does not need the original EXIF data if the Metadata Extractor already extracted what matters.

Explain-It Challenge

  1. The ImageKit pipeline uses temperatures of 0.5, 0.7, and 0.6 (not monotonically increasing like the Hinge pipeline). Explain why the Tag Categorizer (Agent 3) uses a LOWER temperature than the SEO Optimizer (Agent 2).
  2. A product manager asks: "Can we run Agent 2 (SEO Optimizer) and Agent 3 (Tag Categorizer) in parallel instead of sequentially?" Analyze whether this is possible, what would change, and what the tradeoffs are.
  3. Design a fourth agent for this pipeline: an "Image Format Recommender" that suggests optimal image formats and compression settings based on the use-case tags. Define its schema and explain where it fits.

Navigation: ← 4.18.b — Hinge Direction: Profile Pipeline · 4.18.d — Validation and Error Handling →