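AI Painting, Unlimited Creativity
Text-to-Image Capabilities
Learning Objectives
- Understand the basic principles of text-to-image generation
- Learn how to use DALL-E 3
- Apply prompt engineering to image generation
- Learn best practices for image generation
Prerequisites
- Basic use of the OpenAI API
- Basic image processing concepts
- Python programming fundamentals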
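1. Text-to-Image Overview
1.1 What Is Text-to-Image
Text-to-image is a technique that uses AI to turn a text description into a picture. A text-to-image model can:
- Interpret natural-language descriptions
- Generate images that match the description
- Keep a consistent style
- Handle complex visual elements
1.2 Use Cases
- Creative design and art
- Product prototype presentation
- Educational materials
- Marketing content
- Game asset creation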
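2. DALL-E 3 Implementation
2.1 Basic Image Generator
    from typing import List, Optional

    from openai import OpenAI  # requires the openai package, version 1.0 or later


    class ImageGenerator:
        """Base image generator."""

        def __init__(self, model: str = "dall-e-3"):
            """Initialize the image generator.

            Args:
                model (str): Name of the model to use.
            """
            self.model = model
            # OpenAI() reads the API key from the OPENAI_API_KEY environment variable.
            self.client = OpenAI()
            self.style_presets = {
                "realistic": "photorealistic style",
                "artistic": "creative artistic style",
                "cartoon": "cartoon and anime style",
                "sketch": "hand-drawn sketch style",
            }

        def generate_image(self, prompt: str, style: str = "realistic",
                           size: str = "1024x1024") -> Optional[str]:
            """Generate an image from a prompt.

            Args:
                prompt (str): Text description of the image.
                style (str): One of realistic, artistic, cartoon, sketch.
                size (str): One of 1024x1024, 1024x1792, 1792x1024.

            Returns:
                Optional[str]: URL of the generated image, or None on failure.
            """
            # Append the style description; an unknown style leaves the prompt unchanged.
            style_desc = self.style_presets.get(style, "")
            enhanced_prompt = f"{prompt}\nStyle: {style_desc}" if style_desc else prompt
            try:
                response = self.client.images.generate(
                    model=self.model,
                    prompt=enhanced_prompt,
                    size=size,
                    quality="hd",
                    n=1,  # DALL-E 3 generates one image per request
                )
                return response.data[0].url
            except Exception as e:
                print(f"Image generation failed: {e}")
                return None

        def generate_variations(self, image_path: str, n: int = 3) -> List[str]:
            """Generate variations of an existing image.

            Args:
                image_path (str): Path to the source image (a square PNG).
                n (int): Number of variations to generate.

            Returns:
                List[str]: URLs of the variations, or an empty list on failure.
            """
            try:
                with open(image_path, "rb") as image_file:
                    # DALL-E 3 does not support variations, so self.model is
                    # not passed and the endpoint's default model is used.
                    response = self.client.images.create_variation(
                        image=image_file,
                        n=n,
                        size="1024x1024",
                    )
                return [data.url for data in response.data]
            except Exception as e:
                print(f"Variation generation failed: {e}")
                return []

        def edit_image(self, image_path: str, mask_path: str, prompt: str) -> Optional[str]:
            """Edit an image inside the region marked by a mask.

            Args:
                image_path (str): Path to the source image.
                mask_path (str): Path to the mask image.
                prompt (str): Description of the edit.

            Returns:
                Optional[str]: URL of the edited image, or None on failure.
            """
            try:
                with open(image_path, "rb") as image_file, open(mask_path, "rb") as mask_file:
                    # DALL-E 3 does not support edits either; the endpoint's
                    # default model is used here as well.
                    response = self.client.images.edit(
                        image=image_file,
                        mask=mask_file,
                        prompt=prompt,
                        n=1,
                        size="1024x1024",
                    )
                return response.data[0].url
            except Exception as e:
                print(f"Image editing failed: {e}")
                return None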
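The size values listed in the generate_image docstring can be checked locally before a request is sent, so a typo does not cost a network round trip. The following is a minimal sketch, assuming the DALL-E 3 limits used in this lesson (three sizes, quality "standard" or "hd", one image per request). validate_request is a hypothetical helper name, not part of the OpenAI SDK.

```python
# Hypothetical pre-flight check; the limits are the DALL-E 3 values used in this lesson.
DALLE3_SIZES = {"1024x1024", "1024x1792", "1792x1024"}
DALLE3_QUALITIES = {"standard", "hd"}


def validate_request(size: str = "1024x1024", quality: str = "standard", n: int = 1) -> None:
    """Raise ValueError if the arguments fall outside the DALL-E 3 limits."""
    if size not in DALLE3_SIZES:
        raise ValueError(f"unsupported size {size!r}; expected one of {sorted(DALLE3_SIZES)}")
    if quality not in DALLE3_QUALITIES:
        raise ValueError(f"unsupported quality {quality!r}; expected 'standard' or 'hd'")
    if n != 1:
        raise ValueError("DALL-E 3 generates one image per request (n must be 1)")


# A valid combination passes silently.
validate_request(size="1792x1024", quality="hd")
```

Calling it at the top of generate_image would turn a bad size into an immediate, readable error instead of an API failure.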
2.2 Usage Example
    # Initialize the image generator
    generator = ImageGenerator()

    # Prompt for the base image
    prompt = """
    A serene landscape with a mountain lake at sunset,
    featuring snow-capped peaks reflected in crystal clear water.
    The sky is painted in vibrant oranges and purples,
    with a few wispy clouds catching the last rays of sunlight.
    """

    # Generate the image in each style
    styles = ["realistic", "artistic", "cartoon", "sketch"]
    for style in styles:
        image_url = generator.generate_image(prompt, style=style)
        print(f"{style.capitalize()} style image: {image_url}")

    # Generate variations (the input must be a square PNG)
    original_image = "landscape.png"
    variations = generator.generate_variations(original_image)
    for i, url in enumerate(variations):
        print(f"Variation {i+1}: {url}")

    # Edit the image; transparent areas of mask.png mark the region to change
    edit_prompt = "Add a wooden cabin by the lake"
    edited_url = generator.edit_image("landscape.png", "mask.png", edit_prompt)
    print(f"Edited image: {edited_url}")
Image Generation Examples
1. Modern workspace (flat design)
Prompt:
Digital illustration in Flat Design style of a modern workspace with a desk, laptop, and plants,
using clean lines and bold colors. The scene is well-lit with natural lighting from a large window.
Minimalist aesthetic with a color palette of soft blues and warm grays.
2. Smart city (isometric style)
Prompt:
Isometric illustration of a smart city concept with modern buildings, flying vehicles,
and green energy solutions. Clean geometric shapes, consistent 120-degree angles,
bold color palette with tech-inspired blues and whites.
3. Japanese garden (watercolor style)
Prompt:
Watercolor illustration of a serene Japanese garden in spring,
with blooming cherry blossoms, a small wooden bridge over a koi pond,
and traditional stone lanterns. Soft, flowing colors with visible brush strokes.
4. Data security dashboard
Prompt:
Flat illustration of a professional discussing data security on a futuristic dashboard,
featuring a gradient color scheme of deep blue, soft purple, and white highlights.
5. Christmas tree (Disney style)
Prompt:
A cute 3D-rendered Christmas tree with bright, vibrant colors in a minimalistic Disney-inspired style.
Featuring simple, rounded shapes, soft lighting, and a cheerful festive atmosphere on a clean background.
6. Gift box (Disney style)
Prompt:
A cute 3D-rendered gift box with bright, playful colors in a simple, Disney-inspired style.
Designed with smooth, rounded edges, soft lighting, and a festive, minimalistic appearance,
set against a clean background.
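Prompt Writing Tips
The examples above suggest several techniques for writing high-quality prompts:
Name the style: state the art style you want explicitly, such as flat design, isometric, or watercolor.
Describe the visual elements in detail:
- Shape: e.g. "clean lines", "geometric shapes", "rounded edges"
- Color: e.g. "soft blues", "warm grays", "deep blue, soft purple"
- Lighting: e.g. "well-lit", "soft lighting"
- Mood: e.g. "minimalist aesthetic", "cheerful festive atmosphere"
Specify technical details:
- Rendering: e.g. "3D-rendered"
- Angles: e.g. "120-degree angles"
- Background: e.g. "clean background"
Reference well-known styles: a phrase such as "Disney-inspired style" conveys a specific visual style quickly.
Spatial layout: state where the main elements sit and how they relate, e.g. "in the bottom right corner" or "set against".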
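The tips above can double as a quick checklist. The sketch below reports which categories a prompt does not yet mention. The keyword lists are a small illustrative sample drawn from the examples in this lesson, and whole-word matching is an assumption that will miss synonyms, so treat the result as a reminder rather than a verdict. CHECKLIST and missing_categories are hypothetical names.

```python
import re

# Hypothetical checklist; the keyword lists are illustrative, not exhaustive.
CHECKLIST = {
    "style": ["flat design", "isometric", "watercolor", "3d-rendered", "inspired style"],
    "color": ["color", "colors", "palette", "blues", "grays", "purple"],
    "lighting": ["lit", "lighting", "sunlight"],
    "background": ["background", "set against"],
}


def missing_categories(prompt: str) -> list:
    """Return the checklist categories that have no keyword in the prompt."""
    text = prompt.lower()
    missing = []
    for category, keywords in CHECKLIST.items():
        # Whole-word match, so "color" does not match inside "watercolor".
        found = any(
            re.search(r"\b" + re.escape(keyword) + r"\b", text)
            for keyword in keywords
        )
        if not found:
            missing.append(category)
    return missing


print(missing_categories("Watercolor illustration of a Japanese garden"))
# prints ['color', 'lighting', 'background']
```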
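Code Implementation
For the complete implementation, see generate_images.py. The main steps are:
- Initialize the OpenAI client
- Prepare a detailed prompt
- Call the DALL-E 3 API to generate the image
- Save the generated image locally
Key code snippet:
    from typing import Optional

    import requests

    # A method of the client class in generate_images.py. self.api_base and
    # self.headers are assumed to be set in that class's __init__.
    def generate_image(self, prompt: str, size: str = "1024x1024") -> Optional[str]:
        """Generate an image and return its URL, or None on failure."""
        try:
            response = requests.post(
                f"{self.api_base}/images/generations",
                headers=self.headers,
                json={
                    "model": "dall-e-3",
                    "prompt": prompt,
                    "n": 1,
                    "size": size,
                    "quality": "standard",
                },
                timeout=120,  # requests has no default timeout
            )
            if response.status_code != 200:
                print(f"API error: {response.text}")
                return None
            return response.json()["data"][0]["url"]
        except Exception as e:
            print(f"Error while generating image: {e}")
            return None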
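Notes
- Make prompts as detailed and specific as possible
- Specify a clear visual style and technical requirements
- Choose the size and quality settings deliberately
- Handle API call errors
- Save generated images promptly, because the image URLs are temporary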
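Because the returned URL is temporary, the last note is worth automating. The sketch below downloads an image URL to a local file using only the standard library. save_image is a hypothetical helper name, and the destination path in the usage comment is an assumption.

```python
import urllib.request
from pathlib import Path


def save_image(url: str, dest: str, timeout: float = 30.0) -> Path:
    """Download the image at `url` and write it to `dest`, creating folders as needed."""
    path = Path(dest)
    path.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        path.write_bytes(response.read())
    return path


# Usage, with image_url being the value returned by a generation call:
# save_image(image_url, "output/landscape.png")
```

Saving right after each successful call keeps the local copy independent of how long the URL stays valid.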
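3. Prompt Engineering
3.1 Prompt Structure
A good image generation prompt should contain:
- Subject: the main object to generate
- Environment: the background and setting
- Style: the art style and visual effect
- Technical: camera angle, lighting, and similar parameters
- Details: any additional requirements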
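The five components listed above map naturally onto a small data structure. The following is a minimal sketch; PromptSpec and its render method are hypothetical names, and joining the parts with commas is one reasonable choice rather than a required format.

```python
from dataclasses import dataclass


@dataclass
class PromptSpec:
    """The five prompt components from this section; only subject is required."""
    subject: str
    environment: str = ""
    style: str = ""
    technical: str = ""
    details: str = ""

    def render(self) -> str:
        """Join the non-empty components into one prompt, in the order listed above."""
        parts = [self.subject, self.environment, self.style, self.technical, self.details]
        return ", ".join(part.strip() for part in parts if part.strip())


spec = PromptSpec(
    subject="A small wooden bridge over a koi pond",
    environment="in a serene Japanese garden in spring",
    style="watercolor illustration",
    technical="eye-level view, soft morning light",
    details="visible brush strokes",
)
print(spec.render())
```

Keeping the components separate makes it easy to change one of them, for example the style, while holding the rest of the prompt fixed.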
3.2 Common Prompt Templates
3.2.1 Scenes
- Modern workspace
Digital illustration in Flat Design style of a modern workspace with a desk, laptop, and plants,
using clean lines and bold colors. The scene is well-lit with natural lighting from a large window.
Minimalist aesthetic with a color palette of soft blues and warm grays.
- Natural landscape
A breathtaking landscape photograph of a mountain range at golden hour, captured with a wide-angle lens. Dramatic clouds catch the warm sunlight, creating a stunning array of orange and purple hues. Sharp details in the foreground rocks and a silky smooth lake reflection. Shot at f/11, ISO 100.
- Architectural design
Architectural visualization of a contemporary minimalist house, featuring clean geometric shapes and large glass windows. The structure seamlessly integrates with its natural surroundings. Rendered in a photorealistic style with careful attention to material textures and environmental lighting.
- Product showcase
Professional product photography of a sleek smartphone on a white background. Top-down view with soft, even lighting and subtle shadows. The device displays a vibrant app interface. Captured with a macro lens to highlight the premium materials and craftsmanship.
3.2.2 Styles
- Flat design
Flat design illustration with:
- Clean, minimalist shapes
- Bold, solid colors
- Simple geometric patterns
- No gradients or shadows
- 2D perspective
Style: Modern and minimal
- Isometric style
Isometric design featuring:
- 3D objects at 120-degree angles
- Precise geometric shapes
- Consistent perspective
- Bold color palette
- Clear outlines
Style: Technical and architectural
- Watercolor style
Watercolor illustration with:
- Soft, flowing colors
- Visible brush strokes
- Color bleeding effects
- Natural paper texture
- Organic shapes
Style: Artistic and expressive
3.3 Prompt Enhancement Techniques
- Style reinforcement
Add style-specific keywords:
- Photorealistic: "hyperrealistic, 8k, detailed textures"
- Artistic: "impressionist style, vibrant brushstrokes"
- Digital: "vector art, clean lines, gradient colors"
- Lighting description
Enhance lighting details:
- Natural: "golden hour sunlight, soft shadows"
- Studio: "three-point lighting setup, rim light"
- Dramatic: "high contrast, dramatic shadows"
- Composition elements
Specify composition:
- Rule of thirds
- Leading lines
- Symmetrical balance
- Dynamic perspective
- Depth of field
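The keyword groups in this section can be kept as lookup tables and appended to a base prompt on demand. The sketch below uses the style and lighting keywords listed above verbatim. enhance_prompt and the table names are hypothetical, and the composition argument is passed through as free text.

```python
# Keyword tables copied from section 3.3; the names are hypothetical.
STYLE_KEYWORDS = {
    "photorealistic": "hyperrealistic, 8k, detailed textures",
    "artistic": "impressionist style, vibrant brushstrokes",
    "digital": "vector art, clean lines, gradient colors",
}
LIGHTING_KEYWORDS = {
    "natural": "golden hour sunlight, soft shadows",
    "studio": "three-point lighting setup, rim light",
    "dramatic": "high contrast, dramatic shadows",
}


def enhance_prompt(prompt: str, style: str = None, lighting: str = None,
                   composition: str = None) -> str:
    """Append the selected keyword groups to the prompt.

    Raises KeyError if style or lighting is not a key of its table.
    """
    parts = [prompt.strip().rstrip(".")]
    if style:
        parts.append(STYLE_KEYWORDS[style])
    if lighting:
        parts.append(LIGHTING_KEYWORDS[lighting])
    if composition:
        parts.append(composition)
    return ", ".join(parts)


print(enhance_prompt("A mountain lake.", style="photorealistic",
                     lighting="natural", composition="rule of thirds"))
```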
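4. Advanced Techniques
4.1 Image Quality Optimization
- Resolution and detail
  - Use keywords such as "high resolution" or "8K" (these steer the rendering toward finer detail; the actual pixel dimensions are set by the size parameter)
  - Add descriptors such as "detailed" or "intricate"
  - Add "sharp focus" or "crisp details"
- Lighting and atmosphere
  - Describe the type and direction of the light source
  - Specify how soft or hard the shadows are
  - Add ambient light effects
- Materials and textures
  - Describe surface properties in detail
  - Specify reflectivity and transparency
  - Add fine, small-scale details
4.2 Troubleshooting Common Problems
- Blurry images
  Add keywords such as:
  - "sharp focus"
  - "crystal clear"
  - "4K resolution"
  - "detailed"
- Poor composition
  Specify composition elements:
  - "centered composition"
  - "rule of thirds"
  - "balanced layout"
  - "dynamic angle"
- Inconsistent style
  Unify the style description:
  - State the art style explicitly
  - Keep the rendering approach consistent
  - Name reference works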
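4.3 Further Techniques
- Negative prompts
  - Specify elements you do not want
  - Avoid particular styles
  - Exclude unwanted effects
- Weight control
  - Use parentheses to increase weight: (keyword)
  - Use nested parentheses for higher priority: ((keyword))
  - Use a number to set a specific weight: (keyword:1.5)
  Note: the parenthesis weighting syntax and separate negative-prompt fields come from Stable Diffusion tools. The DALL-E 3 API takes a single natural-language prompt and has no negative-prompt or weight parameter, so with DALL-E 3 exclusions and emphasis have to be phrased in plain words.
- Combining prompts
    def combine_prompts(*prompts, weights=None):
        """Combine several prompts into one weighted prompt string.

        Args:
            prompts: Prompt fragments to combine.
            weights: Weight for each fragment; defaults to 1.0 for all.

        Returns:
            str: The combined prompt.
        """
        if weights is None:
            weights = [1.0] * len(prompts)
        if len(weights) != len(prompts):
            # zip() would otherwise silently drop the unmatched items.
            raise ValueError("weights must have the same length as prompts")
        # Emit the (text:weight) form described under "Weight control".
        weighted_prompts = [
            f"({prompt}:{weight})"
            for prompt, weight in zip(prompts, weights)
        ]
        return ", ".join(weighted_prompts)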
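4.4 Workflow Optimization
- Iterative generation
  - Start with a simple prompt
  - Add detail step by step
  - Save the prompts that worked
  - Record the attempts that failed
- Batch generation
  - Prepare prompt variants
  - Try different parameters
  - Compare the results
  - Pick the best output
- Quality control
  - Define evaluation criteria
  - Use a checklist
  - Collect user feedback
  - Keep refining the process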
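Two of the workflow steps above lend themselves to small helpers: preparing prompt variants for a batch, and recording every attempt, successful or not. The following is a sketch under stated assumptions: the "style of subject, lighting" sentence pattern, the JSON Lines log format, and the names prompt_variants and log_attempt are all illustrative choices.

```python
import itertools
import json
import time


def prompt_variants(subject: str, styles: list, lightings: list) -> list:
    """Return one prompt per (style, lighting) combination."""
    return [
        f"{style} of {subject}, {lighting}"
        for style, lighting in itertools.product(styles, lightings)
    ]


def log_attempt(log_path: str, prompt: str, params: dict, success: bool, note: str = "") -> None:
    """Append one attempt to a JSON Lines file, so failures are kept alongside successes."""
    record = {
        "time": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "prompt": prompt,
        "params": params,
        "success": success,
        "note": note,
    }
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record, ensure_ascii=False) + "\n")


variants = prompt_variants(
    "a mountain lake at sunset",
    styles=["Watercolor illustration", "Flat design illustration"],
    lightings=["soft lighting", "dramatic shadows"],
)
for variant in variants:
    print(variant)
```

Each generation call can then be followed by log_attempt(...), which gives the comparison and review steps a written record to work from.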
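Exercises and Assignments
- Basic: generate landscape images in several different styles
- Intermediate: create a piece using the image editing feature
- Challenge: build a themed image generation system
FAQ
Q1: How can I improve the quality of generated images? A1: Give a detailed description, specify a concrete style and parameters, and use the high-quality setting.
Q2: Why do some image generations fail? A2: Likely causes are a poorly formed prompt, content policy restrictions, or API limits such as rate limits.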
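Of the causes in A2, only API limits are worth retrying automatically; a prompt rejected for its content will fail again unchanged. The sketch below retries a call with exponential backoff. with_retries is a hypothetical helper, and the default of retrying on every exception is an assumption for brevity. In practice, pass retry_on the specific transient error types you expect, such as a rate-limit or timeout error.

```python
import time


def with_retries(call, attempts: int = 3, base_delay: float = 1.0,
                 retry_on: tuple = (Exception,), sleep=time.sleep):
    """Run call() until it succeeds or `attempts` tries have been made.

    Waits base_delay, then 2 * base_delay, then 4 * base_delay, and so on
    between tries. The last failure is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Usage, with generator being an ImageGenerator-like object (hypothetical):
# url = with_retries(lambda: generator.generate_image("A mountain lake"), attempts=3)
```

The sleep parameter exists so the waiting can be replaced in tests; normal callers can leave it at its default.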