Header Image AI 绘画,创意无限

文生图能力

学习目标

  • 理解文本到图像生成的基本原理
  • 掌握DALL-E 3的使用方法
  • 学习提示词工程在图像生成中的应用
  • 了解图像生成的最佳实践

先修知识

  • OpenAI API的基本使用
  • 基本的图像处理概念
  • Python编程基础

1. 文生图概述

1.1 什么是文生图

文生图(Text-to-Image)是利用人工智能将文本描述转换为视觉图像的技术。它能够:

  • 理解自然语言描述
  • 生成符合描述的图像
  • 保持风格一致性
  • 处理复杂的视觉元素

1.2 应用场景

  1. 创意设计和艺术创作
  2. 产品原型展示
  3. 教育资料制作
  4. 营销内容生成
  5. 游戏资产创建

2. DALL-E 3实现

2.1 基础图像生成器

class ImageGenerator:
    """图像生成器基类"""
    
    def __init__(self, model="dall-e-3"):
        """初始化图像生成器
        
        Args:
            model (str): 使用的模型名称
        """
        self.model = model
        self.style_presets = {
            "realistic": "照片级真实风格",
            "artistic": "艺术创意风格",
            "cartoon": "卡通动漫风格",
            "sketch": "素描手绘风格"
        }
    
    def generate_image(self, prompt: str, style: str = "realistic", size: str = "1024x1024") -> str:
        """根据提示词生成图像
        
        Args:
            prompt (str): 图像描述提示词
            style (str): 图像风格,可选值:realistic, artistic, cartoon, sketch
            size (str): 图像尺寸,可选值:1024x1024, 1024x1792, 1792x1024
            
        Returns:
            str: 生成的图像URL
        """
        # 添加风格描述
        style_desc = self.style_presets.get(style, "")
        enhanced_prompt = f"{prompt}\nStyle: {style_desc}" if style_desc else prompt
        
        try:
            response = openai.Image.create(
                model=self.model,
                prompt=enhanced_prompt,
                size=size,
                quality="hd",
                n=1
            )
            return response.data[0].url
        except Exception as e:
            print(f"图像生成失败: {str(e)}")
            return None

    def generate_variations(self, image_path: str, n: int = 3) -> list:
        """生成图像变体
        
        Args:
            image_path (str): 原始图像路径
            n (int): 生成变体的数量
            
        Returns:
            list: 生成的变体图像URL列表
        """
        try:
            with open(image_path, "rb") as image_file:
                response = openai.Image.create_variation(
                    image=image_file,
                    n=n,
                    size="1024x1024"
                )
                return [data.url for data in response.data]
        except Exception as e:
            print(f"变体生成失败: {str(e)}")
            return []

    def edit_image(self, image_path: str, mask_path: str, prompt: str) -> str:
        """编辑图像
        
        Args:
            image_path (str): 原始图像路径
            mask_path (str): 蒙版图像路径
            prompt (str): 编辑描述
            
        Returns:
            str: 生成的图像URL
        """
        try:
            with open(image_path, "rb") as image_file, open(mask_path, "rb") as mask_file:
                response = openai.Image.create_edit(
                    image=image_file,
                    mask=mask_file,
                    prompt=prompt,
                    n=1,
                    size="1024x1024"
                )
                return response.data[0].url
        except Exception as e:
            print(f"图像编辑失败: {str(e)}")
            return None

2.2 使用示例

# 初始化图像生成器
generator = ImageGenerator()

# 生成图像
prompt = """
A serene landscape with a mountain lake at sunset, 
featuring snow-capped peaks reflected in crystal clear water. 
The sky is painted in vibrant oranges and purples, 
with a few wispy clouds catching the last rays of sunlight.
"""

# 生成不同风格的图像
styles = ["realistic", "artistic", "cartoon", "sketch"]
for style in styles:
    image_url = generator.generate_image(prompt, style=style)
    print(f"{style.capitalize()} style image: {image_url}")

# 生成图像变体
original_image = "landscape.png"
variations = generator.generate_variations(original_image)
for i, url in enumerate(variations):
    print(f"Variation {i+1}: {url}")

# 编辑图像
edit_prompt = "Add a wooden cabin by the lake"
edited_url = generator.edit_image("landscape.png", "mask.png", edit_prompt)
print(f"Edited image: {edited_url}")

图像生成示例

1. 现代工作空间(扁平设计)

提示词

Digital illustration in Flat Design style of a modern workspace with a desk, laptop, and plants, 
using clean lines and bold colors. The scene is well-lit with natural lighting from a large window. 
Minimalist aesthetic with a color palette of soft blues and warm grays.

现代工作空间

2. 智能城市(等距风格)

提示词

Isometric illustration of a smart city concept with modern buildings, flying vehicles, 
and green energy solutions. Clean geometric shapes, consistent 120-degree angles, 
bold color palette with tech-inspired blues and whites.

智能城市

3. 日本花园(水彩风格)

提示词

Watercolor illustration of a serene Japanese garden in spring, 
with blooming cherry blossoms, a small wooden bridge over a koi pond, 
and traditional stone lanterns. Soft, flowing colors with visible brush strokes.

日本花园

4. 数据安全仪表盘

提示词

Here is the flat illustration of a professional discussing data security on a futuristic dashboard, 
featuring a gradient color scheme of deep blue, soft purple, and white highlights.

数据安全仪表盘

5. 圣诞树(迪士尼风格)

提示词

A cute 3D-rendered Christmas tree with bright, vibrant colors in a minimalistic Disney-inspired style. 
Featuring simple, rounded shapes, soft lighting, and a cheerful festive atmosphere on a clean background.

圣诞树

6. 礼物盒(迪士尼风格)

提示词

A cute 3D-rendered gift box with bright, playful colors in a simple, Disney-inspired style. 
Designed with smooth, rounded edges, soft lighting, and a festive, minimalistic appearance, 
set against a clean background.

礼物盒

提示词编写技巧

从上面的示例中,我们可以总结出一些编写高质量提示词的技巧:

  1. 明确风格:在提示词中明确指定想要的艺术风格,如扁平设计、等距风格、水彩风格等。

  2. 详细描述视觉元素

    • 形状:如"clean lines"、"geometric shapes"、"rounded edges"
    • 颜色:如"soft blues"、"warm grays"、"deep blue, soft purple"
    • 光照:如"well-lit"、"soft lighting"
    • 氛围:如"minimalist aesthetic"、"cheerful festive atmosphere"
  3. 指定技术细节

    • 渲染方式:如"3D-rendered"
    • 特定角度:如"120-degree angles"
    • 背景要求:如"clean background"
  4. 参考知名风格:如"Disney-inspired style"可以快速传达特定的视觉风格。

  5. 空间布局:明确说明主要元素的位置和关系,如"in the bottom right corner"、"set against"等。

代码实现

完整的代码实现请参考 generate_images.py 文件。主要步骤包括:

  1. 初始化 OpenAI 客户端
  2. 准备详细的提示词
  3. 调用 DALL-E 3 API 生成图像
  4. 保存生成的图像到本地

关键代码片段:

def generate_image(prompt: str, size: str = "1024x1024") -> str:
    """生成图像并返回URL"""
    try:
        response = requests.post(
            f"{self.api_base}/images/generations",
            headers=self.headers,
            json={
                "model": "dall-e-3",
                "prompt": prompt,
                "n": 1,
                "size": size,
                "quality": "standard"
            }
        )
        
        if response.status_code != 200:
            print(f"API错误: {response.text}")
            return None
            
        return response.json()['data'][0]['url']
        
    except Exception as e:
        print(f"生成图像时出错: {str(e)}")
        return None

注意事项

  1. 提示词要尽可能详细和具体
  2. 指定清晰的视觉风格和技术要求
  3. 注意图像的分辨率和质量设置
  4. 处理好 API 调用的错误情况
  5. 及时保存生成的图像,因为图像 URL 是临时的

3. 提示词工程

3.1 提示词结构

一个好的图像生成提示词应包含:

  1. 主体描述(Subject):要生成的主要对象
  2. 场景环境(Environment):背景和环境描述
  3. 风格指定(Style):艺术风格和视觉效果
  4. 技术参数(Technical):相机角度、光照等
  5. 细节补充(Details):额外的细节要求

3.2 常用提示词模板

3.2.1 场景类

  1. 现代工作空间
Digital illustration in Flat Design style of a modern workspace with a desk, laptop, and plants, 
using clean lines and bold colors. The scene is well-lit with natural lighting from a large window. 
Minimalist aesthetic with a color palette of soft blues and warm grays.
  1. 自然风景
A breathtaking landscape photograph of a mountain range at golden hour, captured with a wide-angle lens. Dramatic clouds catch the warm sunlight, creating a stunning array of orange and purple hues. Sharp details in the foreground rocks and a silky smooth lake reflection. Shot at f/11, ISO 100.
  1. 建筑设计
Architectural visualization of a contemporary minimalist house, featuring clean geometric shapes and large glass windows. The structure seamlessly integrates with its natural surroundings. Rendered in a photorealistic style with careful attention to material textures and environmental lighting.
  1. 产品展示
Professional product photography of a sleek smartphone on a white background. Top-down view with soft, even lighting and subtle shadows. The device displays a vibrant app interface. Captured with a macro lens to highlight the premium materials and craftsmanship.

3.2.2 风格类

  1. 扁平设计
Flat design illustration with:
- Clean, minimalist shapes
- Bold, solid colors
- Simple geometric patterns
- No gradients or shadows
- 2D perspective
Style: Modern and minimal
  1. 等距风格
Isometric design featuring:
- 3D objects at 120-degree angles
- Precise geometric shapes
- Consistent perspective
- Bold color palette
- Clear outlines
Style: Technical and architectural
  1. 水彩风格
Watercolor illustration with:
- Soft, flowing colors
- Visible brush strokes
- Color bleeding effects
- Natural paper texture
- Organic shapes
Style: Artistic and expressive

3.3 提示词增强技巧

  1. 风格强化
Add style-specific keywords:
- Photorealistic: "hyperrealistic, 8k, detailed textures"
- Artistic: "impressionist style, vibrant brushstrokes"
- Digital: "vector art, clean lines, gradient colors"
  1. 光照描述
Enhance lighting details:
- Natural: "golden hour sunlight, soft shadows"
- Studio: "three-point lighting setup, rim light"
- Dramatic: "high contrast, dramatic shadows"
  1. 构图要素
Specify composition:
- Rule of thirds
- Leading lines
- Symmetrical balance
- Dynamic perspective
- Depth of field

4. 高级技巧

4.1 图像质量优化

  1. 分辨率和细节
  • 使用"high resolution"、"8K"等关键词
  • 指定"detailed"、"intricate"等描述
  • 添加"sharp focus"、"crisp details"
  1. 光照和氛围
  • 描述光源类型和方向
  • 指定阴影的软硬程度
  • 添加环境光效果
  1. 材质和纹理
  • 详细描述表面特性
  • 指定反射和透明度
  • 添加微观细节

4.2 常见问题解决

  1. 图像模糊
添加以下关键词:
- "sharp focus"
- "crystal clear"
- "4K resolution"
- "detailed"
  1. 构图不佳
指定构图要素:
- "centered composition"
- "rule of thirds"
- "balanced layout"
- "dynamic angle"
  1. 风格不一致
统一风格描述:
- 明确艺术风格
- 保持一致的渲染方式
- 指定参考作品

4.3 进阶技巧

  1. 负面提示词
  • 指定不想要的元素
  • 避免特定的风格
  • 排除不需要的效果
  1. 权重控制
  • 使用括号增加权重 (keyword)
  • 使用多重括号提高优先级 ((keyword))
  • 使用数字设置具体权重 (keyword:1.5)
  1. 组合提示词
def combine_prompts(*prompts, weights=None):
    """组合多个提示词
    
    Args:
        prompts: 提示词列表
        weights: 对应的权重列表
    
    Returns:
        str: 组合后的提示词
    """
    if weights is None:
        weights = [1.0] * len(prompts)
    
    weighted_prompts = [
        f"({prompt}:{weight})"
        for prompt, weight in zip(prompts, weights)
    ]
    
    return ", ".join(weighted_prompts)

4.4 工作流程优化

  1. 迭代生成
  • 从简单提示词开始
  • 逐步添加细节
  • 保存成功的提示词
  • 记录失败的尝试
  1. 批量生成
  • 准备提示词变体
  • 使用不同参数
  • 比较结果
  • 选择最佳输出
  1. 质量控制
  • 建立评估标准
  • 使用检查清单
  • 收集用户反馈
  • 持续优化流程

练习与作业

  1. 基础练习:生成不同风格的风景图
  2. 进阶练习:使用图像编辑功能创作
  3. 挑战练习:开发一个主题图像生成系统

常见问题

Q1: 如何提高生成图像的质量? A1: 提供详细的描述,指定具体的风格和参数,使用高质量设置。

Q2: 为什么有些图像生成失败? A2: 可能是提示词不当、内容限制或API限制导致。

扩展阅读