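Preface

A while ago OpenAI unveiled Sora along with a series of generated demo videos, and the results were stunning. Many of you are probably itching to try it and see whether text-to-video can really match what the demos show. Unfortunately, OpenAI's Sora is still not open for public testing. I recently stumbled on Open-Sora, which has open-sourced its first version of model weights on GitHub. The weights can be downloaded and tried directly, so I gave it a go to see how well it works.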

First impressions of Open-Sora

Official project page

https://github.com/hpcaitech/Open-Sora

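Hardware requirements

According to the README, generating a 2 s video at 512x512 requires 24 GB of GPU memory. My understanding is that any card with that much VRAM can run it, and different GPU models will differ only in generation speed. I don't own a machine that meets the requirement, so I rented a GPU instance on https://featurize.cn/ . Resources on that platform are currently very tight: GPU instances are almost always fully booked and it is hard to get one that qualifies, so you may want to look at other platforms as well.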
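
To check up front whether a rented instance clears that bar, a small sketch like the following may help. It assumes nvidia-smi is on the PATH. The vram_ok helper and the 24000 MiB threshold are illustrative assumptions (24 GB rounded down a little, since some 24 GB cards report slightly less than 24576 MiB), not something taken from the README.

```shell
# Rough check: does each GPU have at least the VRAM the README asks for?
# REQUIRED_MIB is an assumption: "24 GB" taken as 24000 MiB to leave some slack.
REQUIRED_MIB=24000

# vram_ok TOTAL_MIB [REQUIRED_MIB] -> status 0 if TOTAL_MIB is large enough
vram_ok() {
    local total="$1" required="${2:-$REQUIRED_MIB}"
    # Reject empty or non-numeric input instead of letting the comparison error out.
    case "$total" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$total" -ge "$required" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Prints one line per GPU in the form "<name>, <total memory in MiB>"
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits |
    while IFS=, read -r name total; do
        total="$(printf '%s' "$total" | tr -d ' ')"   # drop the space after the comma
        if vram_ok "$total"; then
            echo "OK:        $name (${total} MiB)"
        else
            echo "TOO SMALL: $name (${total} MiB, need ${REQUIRED_MIB} MiB)"
        fi
    done
else
    echo "nvidia-smi not found; run this on the GPU instance"
fi
```

Running it right after the instance boots saves the time of setting up the whole environment on a card that turns out to be too small.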

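Environment setup

Just follow the README in the official GitHub repository step by step. The steps below mostly repeat the README, with a few notes on places where my environment versions differed from what the README assumes.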

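Installing conda

Featurize instances come with conda preinstalled, so this step can be skipped there. If your instance does not have conda, install it by following the official conda documentation. On my own machines I usually install Miniconda; see:
https://docs.anaconda.com/free/miniconda/miniconda-install/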

Python environment setup

# Create a virtual environment named opensora with Python 3.10
conda create -n opensora python=3.10
# Activate the opensora environment
conda activate opensora

# Alternative on featurize: the work directory offers some free persistent storage.
# As a returning user I have 50 GB, which is barely enough for the environment plus the models;
# new users should get 10 GB. I therefore installed the conda environment into persistent storage instead:
conda create --prefix /home/featurize/work/CondaEnv/opensora python=3.10
conda activate /home/featurize/work/CondaEnv/opensora

# Install torch
# The command below targets CUDA 12.1; for other CUDA versions see
# https://pytorch.org/get-started/locally/
pip install torch torchvision
# The featurize instance has CUDA 11.8 (check with nvcc -V),
# so I used this command instead:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install flash attention (optional)
# The README marks this step as optional; I installed it anyway
pip install packaging ninja
pip install flash-attn --no-build-isolation

# Install apex (optional)
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" git+https://github.com/NVIDIA/apex.git

# Install xformers; run only the line that matches your CUDA version
# CUDA 12.1:
pip install -U xformers --index-url https://download.pytorch.org/whl/cu121
# CUDA 11.8:
pip install -U xformers --index-url https://download.pytorch.org/whl/cu118
# If you hit xformers-related errors at runtime, try pip uninstall xformers, then a plain pip install with no extra arguments
pip install xformers

# Clone the project and install it
git clone https://github.com/hpcaitech/Open-Sora
cd Open-Sora
pip install -v .
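
Since the torch and xformers commands depend on the CUDA version, a tiny helper can pick the index URL from the nvcc -V output. This is only a sketch: the function names nvcc_version and torch_index_url are made up for illustration, and it covers just the two CUDA versions mentioned above (11.8 and 12.1). It prints the commands rather than running them.

```shell
# nvcc_version: read `nvcc -V` output on stdin, print the "11.8"-style version
# found after the word "release".
nvcc_version() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# torch_index_url VERSION -> print the wheel index URL used in this post.
# Only the two versions mentioned above are mapped; anything else fails.
torch_index_url() {
    case "$1" in
        11.8) echo "https://download.pytorch.org/whl/cu118" ;;
        12.1) echo "https://download.pytorch.org/whl/cu121" ;;
        *)    return 1 ;;
    esac
}

if command -v nvcc >/dev/null 2>&1; then
    ver="$(nvcc -V | nvcc_version)"
    if url="$(torch_index_url "$ver")"; then
        # Print the commands instead of running them, so they can be reviewed first.
        echo "pip3 install torch torchvision torchaudio --index-url $url"
        echo "pip install -U xformers --index-url $url"
    else
        echo "CUDA '$ver' is not covered here; see https://pytorch.org/get-started/locally/"
    fi
else
    echo "nvcc not found; check the CUDA toolkit installation"
fi
```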

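Downloading the models

1. Download the T5 model (text encoder)

Download the T5 weights into pretrained_models/t5_ckpts/t5-v1_1-xxl.

2. Download the generative model

The GitHub page has three download links, one for each training stage. I downloaded the last-stage weights (16x512x512 HQ), which in theory should look better than the first two. The file can go anywhere, since the inference script takes the checkpoint path as an argument. I stored mine at /home/featurize/work/Models/OpenSora/OpenSora-v1-HQ-16x512x512.pth.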
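
Before launching inference it can be worth confirming that both sets of weights landed where expected. The sketch below uses the two paths from this post; the helper names dir_has_files and file_nonempty are hypothetical. It assumes the T5 path is relative to the Open-Sora checkout, so run it from there. It only checks that something is present, not that the download is complete or uncorrupted.

```shell
# Paths as used in this post; adjust them to your own layout.
T5_DIR="pretrained_models/t5_ckpts/t5-v1_1-xxl"
CKPT="/home/featurize/work/Models/OpenSora/OpenSora-v1-HQ-16x512x512.pth"

# dir_has_files DIR -> status 0 if DIR exists and contains at least one entry
dir_has_files() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# file_nonempty FILE -> status 0 if FILE exists and is larger than zero bytes
file_nonempty() {
    [ -s "$1" ]
}

if dir_has_files "$T5_DIR"; then
    echo "T5 weights: found"
else
    echo "T5 weights: missing ($T5_DIR)"
fi

if file_nonempty "$CKPT"; then
    echo "checkpoint: found"
else
    echo "checkpoint: missing ($CKPT)"
fi
```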

Generating videos

# Activate the conda environment
conda activate /home/featurize/work/CondaEnv/opensora
# Generate videos. --prompt-path points to the prompt file; edit the prompts in that file, one line per video.
# The config passed to inference.py selects the output size (16x256x256 here).
cd ~/work/Open-Sora
/home/featurize/work/CondaEnv/opensora/bin/torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path /home/featurize/work/Models/OpenSora/OpenSora-v1-HQ-16x512x512.pth --prompt-path /home/featurize/work/Prompts/t2v_samples.txt
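
Because the prompt file is read one line per video, a long prompt pasted with line breaks would be split into several videos. A small helper can flatten each prompt to a single line before appending it. This is a sketch under that one-line-per-video assumption; add_prompt is a hypothetical name, and the demo writes to a temporary file rather than the real t2v_samples.txt.

```shell
# add_prompt FILE TEXT... -> append TEXT to FILE as exactly one line.
# Newlines and tabs become spaces, repeated spaces are squeezed,
# and blank prompts are rejected with a non-zero status.
add_prompt() {
    local file="$1"; shift
    local text
    text="$(printf '%s' "$*" | tr '\n\t' '  ' | tr -s ' ')"
    text="${text# }"   # strip one leading space, if any
    text="${text% }"   # strip one trailing space, if any
    [ -n "$text" ] || return 1
    printf '%s\n' "$text" >> "$file"
}

# Demo with two prompts used later in this post.
prompts="$(mktemp)"
add_prompt "$prompts" "A dark dragon fly on blue sky."
add_prompt "$prompts" "New York City submerged like Atlantis. Fish, whales,
sea turtles and sharks swim through the streets of New York."
echo "$(grep -c . "$prompts") line(s) in $prompts, so that many videos would be generated"
```

The resulting file can then be passed to --prompt-path in the command above.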

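Generating a 16x256x256 video takes roughly 22 GB of VRAM.

Generating a 16x512x512 video takes roughly 24 GB of VRAM.

First, the default prompt from one of the samples shown on the GitHub page: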

A serene night scene in a forested area. The first frame shows a tranquil lake reflecting the star-filled sky above. The second frame reveals a beautiful sunset, casting a warm glow over the landscape. The third frame showcases the night sky, filled with stars and a vibrant Milky Way galaxy. The video is a time-lapse, capturing the transition from day to night, with the lake and forest serving as a constant backdrop. The style of the video is naturalistic, emphasizing the beauty of the night sky and the peacefulness of the forest.

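16x256x256 (poor quality; you can't tell what it is)

16x512x512 (clearly much better, close to what the GitHub page shows)

64x512x512 reportedly needs 40-60 GB of VRAM. My rented instance didn't have enough, so I couldn't run it.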
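
The VRAM figures above can be folded into a small lookup that picks the largest config likely to fit a given card. Treat this as a rough sketch: the thresholds are just the approximate numbers reported in this post (22, 24 and 40 GB), pick_config is a hypothetical helper, and the 16x512x512.py and 64x512x512.py file names are assumed to follow the same pattern as the 16x256x256.py config used earlier.

```shell
# pick_config VRAM_GB -> print the largest inference config expected to fit.
# Thresholds come from the rough figures in this post, not from measurements.
pick_config() {
    local gb="$1"
    # Reject empty or non-numeric input.
    case "$gb" in
        ''|*[!0-9]*) return 1 ;;
    esac
    if   [ "$gb" -ge 40 ]; then echo "configs/opensora/inference/64x512x512.py"
    elif [ "$gb" -ge 24 ]; then echo "configs/opensora/inference/16x512x512.py"
    elif [ "$gb" -ge 22 ]; then echo "configs/opensora/inference/16x256x256.py"
    else return 1   # below the smallest figure reported here
    fi
}

pick_config 24   # prints configs/opensora/inference/16x512x512.py
```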

I was too lazy to think up a prompt, so I wrote a very simple one and expected poor results:

A dark dragon fly on blue sky.

As expected.

Polishing the prompt with ChatGPT:

A wide shot of a vibrant blue summer sky at midday. A dark dragon is flying with powerful wings darts and weaves across the screen, leaving a shimmering trail behind it.

It looks somewhat like a dragon now, but there is no sense of flight at all.

Trying a landscape prompt:

Scene: A breathtaking panoramic view of the Li River winding its way through the dramatic karst landscape of Guilin, China, bathed in the golden light of sunset. Foreground: Lush, emerald green rice paddies border the banks of the Li River. Traditional Chinese farmers, clad in straw hats and simple clothing, work diligently with hand tools, tending to the crops. Midground: The calm waters of the Li River shimmer a turquoise blue, reflecting the vibrant colors of the sunset sky. Traditional bamboo rafts, carrying tourists who admire the scenery, gently navigate the river. Background: Towering karst mountains dominate the skyline, their sheer cliffs shrouded in a veil of wispy mist. Lush green vegetation clings to some of the peaks, while others rise barren and dramatic against the fiery sky. Sky: A breathtaking sunset paints the sky with a magnificent display of color. Fiery oranges, deep pinks, and vibrant purples blend seamlessly, creating a breathtaking spectacle. Wispy clouds streak across the canvas, adding to the dynamism of the scene. Sound: Capture the peaceful sounds of nature: the gentle flow of the Li River, the chirping of birds in the lush greenery, and the occasional call of unseen animals in the mountains. Camera Movement: Begin with a wide panoramic shot showcasing the vastness of the landscape. Slowly pan across the scene, highlighting the details - the farmers working in the rice paddies, the tranquil Li River, and the majestic karst mountains. Zoom in for a closer look at the bamboo rafts and the awe on the tourists' faces.

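The picture looks decent. Much of what the prompt describes is missing, but a 2 s video realistically can't cover that much.

Finally, a look at what Open-Sora produces from the prompts of OpenAI Sora's public demos: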

A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.

Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field.

A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.

Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. The art style is 3D and realistic, with a focus on lighting and texture. The mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with wide eyes and open mouth. Its pose and expression convey a sense of innocence and playfulness, as if it is exploring the world around it for the first time. The use of warm colors and dramatic lighting further enhances the cozy atmosphere of the image.

3D animation of a small, round, fluffy creature with big, expressive eyes explores a vibrant, enchanted forest. The creature, a whimsical blend of a rabbit and a squirrel, has soft blue fur and a bushy, striped tail. It hops along a sparkling stream, its eyes wide with wonder. The forest is alive with magical elements: flowers that glow and change colors, trees with leaves in shades of purple and silver, and small floating lights that resemble fireflies. The creature stops to interact playfully with a group of tiny, fairy-like beings dancing around a mushroom ring. The creature looks up in awe at a large, glowing tree that seems to be the heart of the forest.

New York City submerged like Atlantis. Fish, whales, sea turtles and sharks swim through the streets of New York.

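Summary

  • Overall, videos of non-human subjects come out reasonably well. Human subjects are still very poor, which the official page also acknowledges;
  • 2 s is admittedly short, but this is one of the few open-source text-to-video models you can actually get your hands on right now, and you can even download the weights and run it locally, so what more could you ask for;
  • According to the official page, the model has been trained for a relatively short time so far, so better quality and longer clips after further training are worth looking forward to.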