A Study of the PAG Tool for Transparent-Blend Effects

软件开发大郭
2022-10-18

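0. Preface

Not long ago, Tencent open-sourced PAG, its flagship shared-platform art tool. PAG comes with its own After Effects (AE) plugin and file format and supports effect preview and performance monitoring, giving designers the what-you-see-is-what-you-get workflow they love.

Transparent-blend effects are commonly used for gifting in live-streaming products. They let an MP4 carry transparency and embed text, images, or live video into the MP4. (The original post shows a sample video here.)

To learn this powerful tool, I tried to reproduce the transparent effect with PAG and explore how it is implemented. What follows are my notes and understanding from reading part of PAG's iOS source code.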

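I. PAGFile

PAGFile is the data structure that holds the layers and the basic rendering information. [Figure: overall structure of PAGFile]

[Figure: class inheritance diagram]

As the structure shows, we need the video and image data before they can be converted into data for rendering, so we read from the bottom up. For the video, we need the frame data, transparency, width, height, and so on; for the mask, we need the transform that applies at each frame.

All we have at this point is the path to the PAG file. We open it as a stream, then decode the stream and extract information from it: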

void ReadTags(DecodeStream* stream, T parameter, void (*reader)(DecodeStream*, TagCode, T)) {
  auto header = ReadTagHeader(stream);
  if (stream->context->hasException()) {
    return;
  }
  while (header.code != TagCode::End) {
    auto tagBytes = stream->readBytes(header.length);
    reader(&tagBytes, header.code, parameter);
    if (stream->context->hasException()) {
      return;
    }
    header = ReadTagHeader(stream);
    if (stream->context->hasException()) {
      return;
    }
  }
}
static void ReadTag_VectorCompositionBlock(DecodeStream* stream, CodecContext* context) {
  auto composition = ReadVectorComposition(stream);
  context->compositions.push_back(composition);
}

static void ReadTag_VideoCompositionBlock(DecodeStream* stream, CodecContext* context) {
  auto composition = ReadVideoComposition(stream);
  context->compositions.push_back(composition);
}
VectorComposition* ReadVectorComposition(DecodeStream* stream) {
  auto composition = new VectorComposition();
  composition->id = stream->readEncodedUint32();
  ReadTags(stream, composition, ReadTagsOfVectorComposition);
  Codec::InstallReferences(composition->layers);
  return composition;
}
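
The loop in ReadTags is a type-length-value walk: read a header, slice off `length` bytes, hand them to a reader, repeat until the End tag. Below is a minimal sketch of that pattern. It assumes a simplified header of one code byte followed by one length byte, which is almost certainly not PAG's real header encoding; `ByteReader`, `ParseTags`, and `kEndTag` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical, simplified stand-in for DecodeStream: a bounds-checked cursor.
struct ByteReader {
  explicit ByteReader(const std::vector<uint8_t>& bytes) : data(bytes) {}

  uint8_t readU8() {
    if (pos >= data.size()) {
      failed = true;  // plays the role of context->hasException()
      return 0;
    }
    return data[pos++];
  }

  const std::vector<uint8_t>& data;
  size_t pos = 0;
  bool failed = false;
};

constexpr uint8_t kEndTag = 0;  // assumed terminator, mirroring TagCode::End

using TagHandler = std::function<void(uint8_t, const std::vector<uint8_t>&)>;

// Reads (code, length, payload) records until the end tag and passes each
// payload to `handler`. Returns false if the input is truncated.
bool ParseTags(const std::vector<uint8_t>& bytes, const TagHandler& handler) {
  ByteReader reader(bytes);
  while (true) {
    uint8_t code = reader.readU8();
    if (reader.failed) return false;
    if (code == kEndTag) return true;
    uint8_t length = reader.readU8();
    if (reader.failed || reader.pos + length > bytes.size()) return false;
    std::vector<uint8_t> payload(bytes.begin() + reader.pos,
                                 bytes.begin() + reader.pos + length);
    reader.pos += length;
    handler(code, payload);  // a handler may simply ignore codes it does not know
  }
}
```

Because every record carries its own length, a reader can skip tags it does not understand, which is presumably what lets the format grow without breaking older parsers.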

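After obtaining the corresponding stream, the code reads the width, height, transparency, video frames, and so on according to a predefined layout (the writer and the reader agree on which fields are stored where).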

VideoSequence* ReadVideoSequence(DecodeStream* stream, bool hasAlpha) {
  auto sequence = new VideoSequence();
  sequence->width = stream->readEncodedInt32();
  sequence->height = stream->readEncodedInt32();
  sequence->frameRate = stream->readFloat();

  if (hasAlpha) {
    sequence->alphaStartX = stream->readEncodedInt32();
    sequence->alphaStartY = stream->readEncodedInt32();
  }

  auto sps = ReadByteDataWithStartCode(stream);
  auto pps = ReadByteDataWithStartCode(stream);
  sequence->headers.push_back(sps.release());
  sequence->headers.push_back(pps.release());

  auto count = stream->readEncodedUint32();
  for (uint32_t i = 0; i < count; i++) {
    auto videoFrame = new VideoFrame();
    sequence->frames.push_back(videoFrame);
    videoFrame->isKeyframe = stream->readBitBoolean();
  }
  for (uint32_t i = 0; i < count; i++) {
    auto videoFrame = sequence->frames[i];
    videoFrame->frame = ReadTime(stream);
    videoFrame->fileBytes = ReadByteDataWithStartCode(stream).release();
  }

  if (stream->bytesAvailable() > 0) {
    count = stream->readEncodedUint32();
    for (uint32_t i = 0; i < count; i++) {
      TimeRange staticTimeRange = {};
      staticTimeRange.start = ReadTime(stream);
      staticTimeRange.end = ReadTime(stream);
      sequence->staticTimeRanges.push_back(staticTimeRange);
    }
  }

  return sequence;
}
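
Names such as readEncodedUint32 suggest that integers are stored in a variable-length form. The sketch below shows one common scheme, a LEB128-style varint with 7 payload bits per byte and the high bit as a continuation flag. Whether PAG uses exactly this layout is an assumption on my part; `WriteVarUint32` and `ReadVarUint32` are hypothetical helpers, not libpag functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends `value` as a little-endian base-128 varint: 7 payload bits per byte,
// with the high bit set on every byte except the last.
void WriteVarUint32(std::vector<uint8_t>& out, uint32_t value) {
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
}

// Decodes one varint starting at `pos` and advances `pos` past it.
// Returns false on truncated input or when the value needs more than 5 bytes.
bool ReadVarUint32(const std::vector<uint8_t>& in, size_t& pos, uint32_t& value) {
  value = 0;
  for (int shift = 0; shift < 35; shift += 7) {
    if (pos >= in.size()) return false;
    uint8_t byte = in[pos++];
    value |= static_cast<uint32_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;  // last byte of this integer
  }
  return false;
}
```

With this scheme small values such as a frame count or a width usually take one or two bytes instead of four, which matters for a format that stores many small integers.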

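Once the video and image information has been read, it is wrapped in the corresponding Composition: VideoComposition for the video and VectorComposition for the mask image.

Debugging shows that the composition at index 0 is the transparent video,

the composition at index 1 is the mask image,

and the Layers of the last composition correspond to the layers generated from the first two compositions.

When File is constructed, it takes the last composition as mainComposition, which contains the video frame and mask image information; the number of layers read, numLayers, is 2.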

File::File(std::vector<Composition*> compositionList, std::vector<pag::ImageBytes*> imageList)
    : images(std::move(imageList)), compositions(std::move(compositionList)) {
  mainComposition = compositions.back();
  scaledTimeRange.start = 0;
  scaledTimeRange.end = mainComposition->duration;
  rootLayer = PreComposeLayer::Wrap(mainComposition).release();
  updateEditables(mainComposition);
  for (auto composition : compositions) {
    if (composition->type() != CompositionType::Vector) {
      _numLayers++;
      continue;
    }
    for (auto layer : static_cast<VectorComposition*>(composition)->layers) {
      if (layer->type() == LayerType::PreCompose) {
        continue;
      }
      _numLayers++;
    }
  }
}
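
To see why numLayers comes out as 2 for this file, here is a toy model of the counting loop. It assumes the three compositions described above (a video, a vector composition holding one image layer, and a main composition holding two PreCompose layers); `ToyComposition` and `CountLayers` are hypothetical and keep only the fields the loop inspects.

```cpp
#include <cassert>
#include <vector>

enum class CompositionType { Vector, Bitmap, Video };
enum class LayerType { Image, PreCompose };

// Hypothetical, stripped-down composition: only what the counting loop reads.
struct ToyComposition {
  CompositionType type;
  std::vector<LayerType> layers;  // only meaningful for Vector compositions
};

// Mirrors the rule in File::File: every non-vector (sequence) composition
// counts as one layer; inside vector compositions, PreCompose layers are
// skipped because the composition they wrap is counted on its own.
int CountLayers(const std::vector<ToyComposition>& compositions) {
  int numLayers = 0;
  for (const auto& composition : compositions) {
    if (composition.type != CompositionType::Vector) {
      numLayers++;
      continue;
    }
    for (auto layerType : composition.layers) {
      if (layerType != LayerType::PreCompose) {
        numLayers++;
      }
    }
  }
  return numLayers;
}
```

Feeding it a Video composition, a Vector composition with one Image layer, and a Vector composition with two PreCompose layers gives 1 + 1 + 0 = 2, matching the value seen in the debugger.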

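mainComposition is then used to construct the layers. When a layer's type is PreCompose (note that PreCompose is a LayerType, not a CompositionType), the layer was pre-generated by the AE plugin; when a composition's type is Vector, its layers can be edited in code.

Here the mask image has CompositionType Vector and LayerType Image, and it carries information such as when it appears, how long it lasts, and the effect applied at a given frame.

The alpha-channel video, by contrast, is held by a layer of LayerType PreCompose whose composition is a VideoComposition, and its per-frame information is recorded in a VideoSequence.

From these two Compositions, the corresponding layers (of type PAGComposition) are built, and the root layer rootLayer (of type PAGFile) holds them in its layers field. Together they form the PAGFile that is handed to the business layer.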

std::shared_ptr<PAGLayer> PAGFile::BuildPAGLayer(std::shared_ptr<File> file, Layer* layer) {
  PAGLayer* pagLayer;
  switch (layer->type()) {
...
    case LayerType::Image: {
      pagLayer = new PAGImageLayer(file, static_cast<ImageLayer*>(layer));
      pagLayer->_editableIndex = file->getEditableIndex(static_cast<ImageLayer*>(layer));
    } break;
    case LayerType::PreCompose: {
      if (layer == file->getRootLayer()) {
        pagLayer = new PAGFile(file, static_cast<PreComposeLayer*>(layer));
      } else {
        pagLayer = new PAGComposition(file, static_cast<PreComposeLayer*>(layer));
      }

      auto composition = static_cast<PreComposeLayer*>(layer)->composition;
      if (composition->type() == CompositionType::Vector) {
        auto& layers = static_cast<VectorComposition*>(composition)->layers;
        // The index order of PAGLayers is different from Layers in File.
        for (int i = static_cast<int>(layers.size()) - 1; i >= 0; i--) {
          auto childLayer = layers[i];
          auto childPAGLayer = BuildPAGLayer(file, childLayer);
          static_cast<PAGComposition*>(pagLayer)->layers.push_back(childPAGLayer);
          childPAGLayer->_parent = static_cast<PAGComposition*>(pagLayer);
          if (childLayer->trackMatteLayer) {
            childPAGLayer->_trackMatteLayer = BuildPAGLayer(file, childLayer->trackMatteLayer);
            childPAGLayer->_trackMatteLayer->trackMatteOwner = childPAGLayer.get();
          }
        }
      }
    } break;
    default:
      pagLayer = new PAGLayer(file, layer);
      break;
  }
  auto shared = std::shared_ptr<PAGLayer>(pagLayer);
  pagLayer->weakThis = shared;
  return shared;
}
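
The part of BuildPAGLayer that is easy to miss is that children are built back to front and each child gets a non-owning pointer to its parent. A minimal sketch of that shape follows, assuming C++17 and using hypothetical `ToyLayer` and `ToyNode` types in place of Layer and PAGLayer; track mattes and editable indices are left out.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical file-side layer: children are stored in the file's order.
struct ToyLayer {
  std::string name;
  std::vector<ToyLayer> children;
};

// Hypothetical runtime node, analogous to PAGLayer / PAGComposition.
struct ToyNode {
  std::string name;
  ToyNode* parent = nullptr;                       // non-owning back pointer
  std::vector<std::shared_ptr<ToyNode>> children;  // owning, reversed order
};

std::shared_ptr<ToyNode> BuildNode(const ToyLayer& layer) {
  auto node = std::make_shared<ToyNode>();
  node->name = layer.name;
  // Walk the file's children from last to first, as BuildPAGLayer does.
  for (auto it = layer.children.rbegin(); it != layer.children.rend(); ++it) {
    auto child = BuildNode(*it);
    child->parent = node.get();
    node->children.push_back(child);
  }
  return node;
}
```

The parent owns its children through shared_ptr while the child only points back with a raw pointer, which avoids a reference cycle between parent and child.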

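At this point we have a fully assembled PAGFile with two parts: the video information and the mask image information. At render time the PAGFile is unpacked and converted into the corresponding render information.

II. PAGView

PAGView renders the effect and the mask mainly through the PAGPlayer class, reading the video frames, images, positions, transforms, and so on from the PAGFile assembled earlier.

The basic rendering principle: given the Layout information and the Texture information, call the GL draw operations to render.

So PAGView's main jobs are:

Unpack the PAGFile assembled earlier to get the Sequence information for the video and the imageBytes for the image, then load them as textures;

Read the Layout to position the video and the mask image, and finally call GL to render.

1. Associating PAGView with PAGFile

PAGStage inherits from PAGComposition, which makes it the root node of all layers. It is held by PAGPlayer, which in turn is held by PAGView.

The method below is doAddLayer, which PAGStage calls on the PAGFile so that every layer under the PAGFile is attached to, and held by, the PAGStage.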

bool PAGComposition::doAddLayer(std::shared_ptr<PAGLayer> pagLayer, int index) {
...
  pagLayer->attachToTree(rootLocker, stage);
  if (rootFile && file == pagLayer->file) {
    pagLayer->onAddToRootFile(rootFile);
  }
  this->layers.insert(this->layers.begin() + index, pagLayer);
  pagLayer->_parent = this;
...
  return true;
}
void PAGComposition::onAddToStage(PAGStage* pagStage) {
  PAGLayer::onAddToStage(pagStage);
  for (auto& layer : layers) {
    layer->onAddToStage(pagStage);
  }
}

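Finally, PAGStage binds layers, effects, and other content to specific IDs so they can be looked up during rendering. From here on, PAGStage knows every layer, sequence frame, image, transform, position, and so on used in the whole rendering process.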

void PAGStage::addReference(PAGLayer* pagLayer) {
  addToReferenceMap(pagLayer->uniqueID(), pagLayer);
  addToReferenceMap(pagLayer->layer->uniqueID, pagLayer);
  if (pagLayer->layerType() == LayerType::PreCompose) {
    auto composition = static_cast<PreComposeLayer*>(pagLayer->layer)->composition;
    addToReferenceMap(composition->uniqueID, pagLayer);
  } else if (pagLayer->layerType() == LayerType::Image) {
    auto imageBytes = static_cast<ImageLayer*>(pagLayer->layer)->imageBytes;
    addToReferenceMap(imageBytes->uniqueID, pagLayer);
    auto pagImage = static_cast<PAGImageLayer*>(pagLayer)->getPAGImage();
    if (pagImage != nullptr) {
      addReference(pagImage.get(), pagLayer);
    }
  }
  auto targetLayer = pagLayer->layer;
  for (auto& style : targetLayer->layerStyles) {
    addToReferenceMap(style->uniqueID, pagLayer);
  }
  for (auto& effect : targetLayer->effects) {
    addToReferenceMap(effect->uniqueID, pagLayer);
  }
  invalidateCacheScale(pagLayer);
}
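
addReference registers the same layer under several IDs (its own, its composition's or image's, and those of its styles and effects). A minimal sketch of such a registry is shown below. The container layout is my assumption, not something the excerpt confirms, and `ReferenceMap` and `ToyLayerRef` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using ID = uint32_t;

struct ToyLayerRef {
  ID uniqueID;
};

// Hypothetical registry: many asset IDs can point at the same runtime layer,
// and one asset (for example a shared image) can be used by several layers.
class ReferenceMap {
 public:
  void add(ID assetID, ToyLayerRef* layer) {
    auto& list = map_[assetID];
    for (auto* existing : list) {
      if (existing == layer) return;  // already registered under this ID
    }
    list.push_back(layer);
  }

  const std::vector<ToyLayerRef*>& find(ID assetID) const {
    static const std::vector<ToyLayerRef*> empty;
    auto it = map_.find(assetID);
    return it == map_.end() ? empty : it->second;
  }

 private:
  std::unordered_map<ID, std::vector<ToyLayerRef*>> map_;
};
```

With a map like this, a change to one asset (say, replacing an image) can be traced back to every layer that needs to be invalidated.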

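2. Per-frame render callback in PAGView

PAGView's per-frame rendering is driven by CADisplayLink. Each frame triggers an updateView callback, which makes PAGPlayer load the corresponding video and image information.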

+ (void)StartDisplayLink {
  caDisplayLink = [CADisplayLink displayLinkWithTarget:[ValueAnimator class]
                                              selector:@selector(HandleDisplayLink:)];
  // This originally used the default mode, which cannot render while the UI is being dragged, so it was changed to common modes...
  [caDisplayLink addToRunLoop:[NSRunLoop currentRunLoop] forMode:NSRunLoopCommonModes];
}
- (void)actualUpdateView {
  [pagPlayer setProgress:self.animatorProgress];
  [self flush];
}

- (BOOL)flush {
  if (self.isInBackground) {
    return false;
  }
  auto result = [pagPlayer flush];
  if (self.bufferPrepared) {
    [PAGView RegisterFlushQueueDestoryMethod];
  }
  return result;
}

- (void)updateViewAsync {
  if (self.isAsyncFlushing) {
    return;
  }
  self.isAsyncFlushing = TRUE;
  NSOperationQueue* flushQueue = [PAGView FlushQueue];
  [self retain];
  NSBlockOperation* operation = [NSBlockOperation blockOperationWithBlock:^{
    [self actualUpdateView];
    self.isAsyncFlushing = FALSE;
    dispatch_async(dispatch_get_main_queue(), ^{
      [self release];
    });
  }];
  [flushQueue addOperation:operation];
}

The flush operation of PAGPlayer is shown below. The key steps are stage->draw, lastGraphic->prepare(renderCache), and pagSurface->draw(renderCache, lastGraphic, signalSemaphore, _autoClear):

bool PAGPlayer::flushInternal(BackendSemaphore* signalSemaphore) {
...
  if (contentVersion !=  stage->getContentVersion()) {
    contentVersion = stage->getContentVersion();
    Recorder recorder = {};
    stage->draw(&recorder);
    lastGraphic = recorder.makeGraphic();
  }
  auto presentingStart = GetTimer();
  if (lastGraphic) {
    lastGraphic->prepare(renderCache);
  }
  if (!pagSurface->draw(renderCache, lastGraphic, signalSemaphore, _autoClear)) {
    return false;
  }
...
  return true;
}
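
The first branch of flushInternal is a version check: the draw commands are recorded again only when the stage's content version has changed, otherwise the previous lastGraphic is reused. A minimal sketch of that idea, with a string standing in for the recorded graphic and a hypothetical `ToyPlayer` class:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical player that re-records its draw commands only when the
// scene's content version differs from the one it recorded last.
class ToyPlayer {
 public:
  int recordCount = 0;  // how many times the recording step actually ran

  // Returns the graphic for the given stage version.
  const std::string& flush(uint32_t stageVersion) {
    if (!hasGraphic_ || contentVersion_ != stageVersion) {
      contentVersion_ = stageVersion;
      lastGraphic_ = "graphic@v" + std::to_string(stageVersion);
      hasGraphic_ = true;
      recordCount++;
    }
    return lastGraphic_;
  }

 private:
  uint32_t contentVersion_ = 0;
  bool hasGraphic_ = false;
  std::string lastGraphic_;
};
```

Calling flush twice with the same version records once; only a new version triggers another recording pass.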

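2.1 Extracting and wrapping layer information

stage->draw unpacks the PAGFile and extracts the information contained in every layer. The stage acts as the root node of the layers; it inherits from PAGComposition and calls its draw method directly: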

void PAGComposition::draw(Recorder* recorder) {
...
  auto count = static_cast<int>(layers.size());
  for (int i = 0; i < count; i++) {
    auto& childLayer = layers[i];
    if (!childLayer->layerVisible) {
      continue;
    }
    DrawChildLayer(recorder, childLayer.get());
  }
...
}

As shown earlier, the stage contains two child layers, a video layer and a mask image layer, and each of them calls its own draw method.

void PAGComposition::draw(Recorder* recorder) {
  if (!contentModified() && layerCache->contentStatic()) {
    // No child has been modified and the content is static, so the cache can be used to skip drawing all children.
    getContent()->draw(recorder);
    return;
  }
  auto preComposeLayer = static_cast<PreComposeLayer*>(layer);
  auto composition = preComposeLayer->composition;
  if (composition->type() == CompositionType::Bitmap ||
      composition->type() == CompositionType::Video) {
    auto layerFrame = layer->startTime + contentFrame;
    auto compositionFrame = preComposeLayer->getCompositionFrame(layerFrame);
    auto graphic = stage->getSequenceGraphic(composition, compositionFrame);
    recorder->drawGraphic(graphic);
  }
...
}

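Here the stage uses the layer's composition to find the corresponding sequence-frame information, SequenceGraphic. It caches by the composition's id and uniqueID, looks up the matching sequence frame, and wraps it in a Graphic.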

std::shared_ptr<Graphic> PAGStage::getSequenceGraphic(Composition* composition,
                                                      Frame compositionFrame) {
  auto result = sequenceCache.find(composition->id);
  if (result != sequenceCache.end()) {
    if (result->second.compositionFrame == compositionFrame) {
      return result->second.graphic;
    }
    sequenceCache.erase(result);
  }
  SequenceCache cache = {};
  cache.graphic = RenderSequenceComposition(composition, compositionFrame);
  cache.compositionFrame = compositionFrame;
  sequenceCache[composition->uniqueID] = cache;
  return cache.graphic;
}
std::shared_ptr<Graphic> RenderSequenceComposition(Composition* composition,
                                                   Frame compositionFrame) {
  auto sequence = Sequence::Get(composition);
  if (sequence == nullptr) {
    return nullptr;
  }
  auto sequenceFrame = sequence->toSequenceFrame(compositionFrame);
  std::shared_ptr<Graphic> graphic = nullptr;
  if (composition->type() == CompositionType::Video) {
    graphic = MakeVideoSequenceGraphic(static_cast<VideoSequence*>(sequence), sequenceFrame);
  } else {
    auto proxy = new SequenceProxy(sequence, sequenceFrame, sequence->width, sequence->height);
    graphic =
        Picture::MakeFrom(sequence->composition->uniqueID, std::unique_ptr<SequenceProxy>(proxy));
  }
  auto scaleX = static_cast<float>(composition->width) / static_cast<float>(sequence->width);
  auto scaleY = static_cast<float>(composition->height) / static_cast<float>(sequence->height);
  return Graphic::MakeCompose(graphic, Matrix::MakeScale(scaleX, scaleY));
}
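
getSequenceGraphic keeps at most one rendered frame per composition: a hit requires both the same composition and the same frame, and a miss replaces the stale entry. A minimal sketch of that cache follows. The quoted excerpt looks up by `composition->id` and stores by `composition->uniqueID`; this sketch uses a single key for both, purely for simplicity. `SequenceFrameCache` and `ToyGraphic` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>

using Frame = int64_t;
using ID = uint32_t;

struct ToyGraphic {
  ID compositionID;
  Frame frame;
};

// Hypothetical cache holding at most one rendered frame per composition.
class SequenceFrameCache {
 public:
  int renderCount = 0;  // how many times a frame had to be rendered

  std::shared_ptr<ToyGraphic> get(ID compositionID, Frame frame) {
    auto it = cache_.find(compositionID);
    if (it != cache_.end() && it->second->frame == frame) {
      return it->second;  // same frame requested again: reuse it
    }
    auto graphic = std::make_shared<ToyGraphic>(ToyGraphic{compositionID, frame});
    renderCount++;
    cache_[compositionID] = graphic;  // replaces any stale frame for this key
    return graphic;
  }

 private:
  std::unordered_map<ID, std::shared_ptr<ToyGraphic>> cache_;
};
```

This fits video playback well: within one display refresh the same frame may be requested more than once, but older frames are never needed again.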

The video sequence frame information is finally wrapped as an RGBAAAPicture:

static std::shared_ptr<Graphic> MakeVideoSequenceGraphic(VideoSequence* sequence,
                                                         Frame contentFrame) {
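  // The exported video sequence does not record the exact total frame size, so it has to be
  // computed from width and alphaStartX. The export plugin adds one to odd dimensions; this matches that rule.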
  auto videoWidth = sequence->alphaStartX + sequence->width;
  if (videoWidth % 2 == 1) {
    videoWidth++;
  }
  auto videoHeight = sequence->alphaStartY + sequence->height;
  if (videoHeight % 2 == 1) {
    videoHeight++;
  }
  auto proxy = new SequenceProxy(sequence, contentFrame, videoWidth, videoHeight);
  RGBAAALayout layout = {sequence->width, sequence->height, sequence->alphaStartX,
                         sequence->alphaStartY};
  return Picture::MakeFrom(sequence->composition->uniqueID, std::unique_ptr<SequenceProxy>(proxy),
                         layout);
}
std::shared_ptr<Graphic> Picture::MakeFrom(ID assetID, std::unique_ptr<TextureProxy> proxy,
                                           const RGBAAALayout& layout) {
  if (layout.alphaStartX == 0 && layout.alphaStartY == 0) {
    return Picture::MakeFrom(assetID, std::move(proxy));
  }
  if (proxy == nullptr || layout.alphaStartX + layout.width > proxy->width() ||
      layout.alphaStartY + layout.height > proxy->height()) {
    return nullptr;
  }
  return std::shared_ptr<RGBAAAPicture>(new RGBAAAPicture(assetID, proxy.release(), layout));
}
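
The RGBAAALayout describes a decoded frame in which the color pixels and the alpha values sit in two separate regions of the same picture. The sketch below shows the even-size rule from MakeVideoSequenceGraphic and how one output pixel could be assembled on the CPU. It assumes a packed 3-bytes-per-pixel RGB frame and that alpha is read from the red channel of the alpha region; both are my assumptions, and the real work happens in a GPU shader. `PackedFrameSize` and `SamplePixel` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct RGBAAALayout {
  int width;        // width of the visible (color) region
  int height;       // height of the visible (color) region
  int alphaStartX;  // left edge of the alpha region in the decoded frame
  int alphaStartY;  // top edge of the alpha region in the decoded frame
};

struct FrameSize {
  int width;
  int height;
};

struct RGBA {
  uint8_t r, g, b, a;
};

// Full decoded frame size: color region plus alpha offset, rounded up to even.
FrameSize PackedFrameSize(const RGBAAALayout& layout) {
  int w = layout.alphaStartX + layout.width;
  int h = layout.alphaStartY + layout.height;
  return {w + (w % 2), h + (h % 2)};
}

// Color comes from (x, y); alpha comes from the red channel of the pixel at
// (x + alphaStartX, y + alphaStartY) in the same frame.
RGBA SamplePixel(const std::vector<uint8_t>& rgb, const RGBAAALayout& layout, int x, int y) {
  assert(x >= 0 && x < layout.width && y >= 0 && y < layout.height);
  auto stride = static_cast<size_t>(PackedFrameSize(layout).width) * 3;
  auto offsetOf = [stride](int px, int py) {
    return static_cast<size_t>(py) * stride + static_cast<size_t>(px) * 3;
  };
  size_t color = offsetOf(x, y);
  size_t alpha = offsetOf(x + layout.alphaStartX, y + layout.alphaStartY);
  return {rgb[color], rgb[color + 1], rgb[color + 2], rgb[alpha]};
}
```

For a 2x1 effect with alphaStartX = 2, the decoded frame is 4x2: the left half of the first row holds color, the right half holds alpha, and the second row is padding added by the even-size rule. When both alpha offsets are 0 there is no alpha region at all, which is the case Picture::MakeFrom hands to the plain overload.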

Similarly, the mask image can also be wrapped as a Graphic:

std::shared_ptr<Graphic> Picture::MakeFrom(ID assetID, const Bitmap& bitmap) {
  if (bitmap.isEmpty()) {
    return nullptr;
  }
  auto proxy = new BitmapTextureProxy(bitmap);
  return std::shared_ptr<Graphic>(
      new TextureProxyPicture(assetID, proxy, bitmap.isHardwareBacked()));
}

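2.2 Pre-rendering: loading the Reader

lastGraphic->prepare(renderCache) decodes the structures wrapped earlier ahead of rendering. Only video frames do real work here: a reader is created and stored in the renderCache, and this happens only once, before playback: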

VideoSequenceReader::VideoSequenceReader(std::shared_ptr<File> file, VideoSequence* sequence,
                                         DecodingPolicy policy)
    : SequenceReader(std::move(file), sequence) {
  VideoConfig config = {};
  auto demuxer = std::make_unique<VideoSequenceDemuxer>(sequence);
  config.hasAlpha = sequence->alphaStartX + sequence->alphaStartY > 0;
  config.width = sequence->alphaStartX + sequence->width;
  if (config.width % 2 == 1) {
    config.width++;
  }
  config.height = sequence->alphaStartY + sequence->height;
  if (config.height % 2 == 1) {
    config.height++;
  }
  for (auto& header : sequence->headers) {
    auto bytes = ByteData::MakeWithoutCopy(header->data(), header->length());
    config.headers.push_back(std::move(bytes));
  }
  config.mimeType = "video/avc";
  config.colorSpace = YUVColorSpace::Rec601;
  config.frameRate = sequence->frameRate;
  reader = std::make_unique<VideoReader>(config, std::move(demuxer), policy);
}
bool RenderCache::prepareSequenceReader(Sequence* sequence, Frame targetFrame,
                                        DecodingPolicy policy) {
  auto composition = sequence->composition;
  if (!_videoEnabled && composition->type() == CompositionType::Video) {
    return false;
  }
  usedAssets.insert(composition->uniqueID);
  auto staticComposition = composition->staticContent();
  if (sequenceCaches.count(composition->uniqueID) != 0) {
#ifdef PAG_BUILD_FOR_WEB
    sequenceCaches[composition->uniqueID]->prepareAsync(targetFrame);
#endif
    return false;
  }
  if (staticComposition && hasSnapshot(composition->uniqueID)) {
    // Static sequence frames use the bitmap caching logic; if the upper layer has already cached a Snapshot, no prediction is needed.
    return false;
  }
  auto file = stage->getSequenceFile(sequence);
  auto reader = MakeSequenceReader(file, sequence, policy);
  sequenceCaches[composition->uniqueID] = reader;
  reader->prepareAsync(targetFrame);
  return true;
}

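With the reader created, rendering can use it to read the video data and obtain the corresponding textures.

III. Rendering

With everything prepared, rendering begins with a call to pagSurface->draw(renderCache, lastGraphic, signalSemaphore, _autoClear). pagSurface sits above the canvas: it holds the canvas and handles some of the render scheduling.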

bool PAGSurface::draw(RenderCache* cache, std::shared_ptr<Graphic> graphic,
                      BackendSemaphore* signalSemaphore, bool autoClear) {
  if (device == nullptr) {
    device = drawable->getDevice();
  }
  auto context = lockContext();
  if (!context) {
    return false;
  }
  if (surface != nullptr && autoClear && contentVersion == cache->getContentVersion()) {
    unlockContext();
    return false;
  }
  if (surface == nullptr) {
    surface = drawable->createSurface(context);
  }
  if (surface == nullptr) {
    unlockContext();
    return false;
  }
  contentVersion = cache->getContentVersion();
  cache->attachToContext(context);
  auto canvas = surface->getCanvas();
  if (autoClear) {
    canvas->clear();
  }
  if (graphic) {
    // FBO work: fetching textures and running the vertex and fragment shaders
    graphic->draw(canvas, cache);
  }
  surface->flush(signalSemaphore);
  cache->detachFromContext();
  drawable->setTimeStamp(pagPlayer->getTimeStampInternal());
    
  // EAGL RBO rendering
  drawable->present(context);
  unlockContext();
  return true;
}

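Rendering mainly involves FBO and RBO operations, which correspond to graphic->draw(canvas, cache); and drawable->present(context); respectively.

In the FBO stage, the Recorder loads each of the layers wrapped earlier one by one and, based on the preset matrix, blendMode, and so on, builds a render chain that produces the texture and vertex-coordinate information, then calls the low-level GL interfaces to draw. The flow is long; the rendering operations can be found in the corresponding source files, so the code is not reproduced here.

Once the FBO content is ready, a series of RBO operations follows, and the result is handed to the EAGLContext for presentation.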

void EAGLWindow::onPresent(Context* context, int64_t) {
  auto gl = GLContext::Unwrap(context);
  if (layer) {
    gl->bindRenderbuffer(GL::RENDERBUFFER, colorBuffer);
    auto eaglContext = static_cast<EAGLDevice*>(context->getDevice())->eaglContext();
    [eaglContext presentRenderbuffer:GL::RENDERBUFFER];
    gl->bindRenderbuffer(GL::RENDERBUFFER, 0);
  } else {
    gl->flush();
  }
}

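IV. Summary and Analysis

1. PAG's workflow

For transparent-blend effects, the PAG workflow consists of the following steps:

The designer creates the effect with the AE plugin, which exports a packaged binary file in .pag format.
The client parses the .pag file to obtain the layer information.
On each frame callback, the rendering interface is called to extract and wrap the layer information and to decode the video information.
The low-level rendering interface is called to draw the result to the screen.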

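2. PAG versus MP4 for rendering transparent-blend effects

For the transparent-blend effect feature:

PAG's approach: everything the designer wants, namely which layers to render and how at each frame, is condensed into the .pag file.

The MP4 + JSON approach: the MP4 holds the raw effect content, and the JSON tells the player at which frame the mask should be blended into the MP4.

Overall, PAG is a platform product built by a strong team at a large company, and transparent-blend effects are only a small part of what it can do. PAG is feature-rich and extensible, and its OpenGL-based foundation lets a single codebase be reused across platforms.

PAG's code is well encapsulated at every level, with each component playing its own role in the pipeline from unpacking to rendering, and the design is worth learning from. However, its architecture differs considerably from our product's existing implementation, so we can only borrow its ideas at an abstract level.

In terms of rendering performance and integration cost, PAG uses relatively more CPU than rendering the effect directly from MP4, possibly because unpacking the custom file format takes some CPU.

In addition, libpag.framework is 32.4 MB. If only one of PAG's features is needed, it feels like overkill and the integration cost is relatively high. It would make more sense to integrate PAG later, if we need to use its asset library at scale.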
