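When encoding adaptive bitrate ladders, it’s usually necessary to compare videos with different resolutions, which raises multiple issues. For example, when measuring peak signal-to-noise ratio (PSNR) or Video Multimethod Assessment Fusion (VMAF) to compare a 640x360 video with an 854x480 video, what resolution do you compare them at? How do you interpret PSNR or VMAF scores, and which metric is best? In this column, I’ll tackle all of these issues.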

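Regarding the first issue, there’s a theoretically correct answer, and then there’s how it’s generally done, and they don’t always correspond. The theoretically correct answer is to compare at the resolution at which the video will be watched. For example, if you knew for sure that the video would be watched in a 480p window, you should scale both the source and output files to 480p as necessary and run your comparisons there. However, few publishers have that degree of certainty, so most scale the encoded file up to the resolution of the source video and compare there. This certainly makes sense for OTT providers whose videos are almost always watched at full screen, and is a nice compromise position for other publishers.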

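Some programs handle this scaling behind the scenes; for most others, you have to scale in FFmpeg, which is a pain from both a time and disk space perspective. One trick I’ve learned is to convert the encoded files to the Y4M container format, rather than YUV, because the Y4M header contains the resolution, frame rate, and format information, which simplifies comparisons in your quality control tool. If you use the YUV container format, you’ll have to insert the resolution, frame rate, and format data into the command line or input it into the program itself, which can be time-consuming.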
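As a rough illustration of why Y4M saves effort, the sketch below builds an FFmpeg command that scales an encoded file to the source resolution and writes Y4M, then shows the extra input arguments a headerless YUV file would need. It also parses a Y4M stream header to show what the format carries. The file names, the bicubic scaler, and the yuv420p pixel format are assumptions chosen for the example, not settings from the column.

```python
def y4m_scale_command(encoded_path, out_path, width, height):
    """Build an FFmpeg command that scales a file and writes Y4M.

    Assumes 8-bit 4:2:0 output and bicubic scaling; adjust both to
    match your own workflow.
    """
    return [
        "ffmpeg", "-i", encoded_path,
        "-vf", f"scale={width}:{height}:flags=bicubic",
        "-pix_fmt", "yuv420p",
        out_path,  # the .y4m extension selects the Y4M muxer
    ]


def raw_yuv_input_args(path, width, height, fps):
    """Input arguments FFmpeg needs for a raw YUV file, which has no header."""
    return [
        "-f", "rawvideo",
        "-pixel_format", "yuv420p",
        "-video_size", f"{width}x{height}",
        "-framerate", str(fps),
        "-i", path,
    ]


def parse_y4m_header(header_line):
    """Extract width, height, frame rate, and chroma format from a Y4M header."""
    tokens = header_line.strip().split(" ")
    if tokens[0] != "YUV4MPEG2":
        raise ValueError("not a Y4M stream header")
    info = {}
    for token in tokens[1:]:
        if not token:
            continue
        tag, value = token[0], token[1:]
        if tag == "W":
            info["width"] = int(value)
        elif tag == "H":
            info["height"] = int(value)
        elif tag == "F":  # frame rate is stored as numerator:denominator
            num, den = value.split(":")
            info["fps"] = int(num) / int(den)
        elif tag == "C":
            info["chroma"] = value
    return info


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(y4m_scale_command("rung_360p.mp4", "rung_360p_1080.y4m", 1920, 1080)))
    print(parse_y4m_header("YUV4MPEG2 W1920 H1080 F30000:1001 Ip A1:1 C420jpeg"))
```

Because the header travels with the file, a tool that reads Y4M needs only the file name, while the raw YUV path needs all three parameters supplied correctly every time.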

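The second issue is how to interpret the scores once you have them. If you’re comparing cross-resolution files to the source, understand that scores will drop at lower resolutions because the smaller files contain more scaling artifacts and detail loss. This means that files encoded at the resolution of the source will score the highest, with lower resolutions scoring increasingly lower.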

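For example, in an article I wrote on per-title encoding, I compared encoding ladder technologies with ladders ranging from 1080p to 180p. Typical PSNR scores for the 1080p rung were 45-50 dB, and dropped to around 30 dB for the lowest rung. That’s not a lot of range. The rule of thumb for PSNR is that quality differences above 45 dB are typically imperceptible to the viewer, while scores below 35 dB often foretell visible artifacts. But that’s only for the 1080p rung; the 180p rung will never get close to 45 dB, although the files might look good at 32 dB. So you can’t predict how a human would perceive a 360p file with a PSNR value of 38 dB, although when you’re comparing cross-resolution results, higher is always better.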
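For readers who want to see where the dB values come from, here is a minimal sketch of the standard PSNR calculation for 8-bit samples, followed by a helper that applies the 1080p rule of thumb above. The function names are hypothetical, and the sample lists stand in for decoded pixel values; quality tools compute this per frame and per plane and then average, which this sketch does not attempt.

```python
import math


def psnr_db(reference, distorted, peak=255):
    """PSNR in dB between two equal-length sequences of sample values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10 * math.log10(peak ** 2 / mse)


def describe_1080p_psnr(score_db):
    """Apply the rule of thumb, which holds for the 1080p rung only."""
    if score_db > 45:
        return "differences typically imperceptible"
    if score_db < 35:
        return "visible artifacts likely"
    return "in between; the rule of thumb gives no verdict"


if __name__ == "__main__":
    ref = [100, 100, 100, 100]
    dist = [101, 99, 101, 99]  # every sample off by one, so MSE is 1
    score = psnr_db(ref, dist)
    print(round(score, 2), describe_1080p_psnr(score))
```

The helper deliberately refuses to label the middle band, and it should not be applied to lower rungs, since those can look acceptable at scores well under 35 dB.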

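What’s great about VMAF is that it was designed for this type of cross-resolution analysis. Specifically, a score of 100 is mapped to a 1080p file encoded at a constant rate factor (CRF) of 22, while a score of 20 is mapped to a file encoded at 240p with a CRF value of 28. In the same per-title analysis, typical 1080p scores were in the mid- to upper 90s, while the 180p files often scored in the single digits.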
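One way to get a cross-resolution VMAF score without writing an intermediate file is FFmpeg's libvmaf filter, which takes the distorted video as its first input and the reference as its second. The sketch below only builds the command; it assumes an FFmpeg build with libvmaf enabled, bicubic upscaling, and hypothetical file names, and it leaves all filter options at their defaults.

```python
def vmaf_command(encoded_path, source_path, source_width, source_height):
    """Build an FFmpeg command that upscales the encode and scores it with libvmaf.

    Input 0 is the encoded (distorted) file and input 1 is the source
    (reference). The scale step brings the encode up to the source resolution.
    """
    graph = (
        f"[0:v]scale={source_width}:{source_height}:flags=bicubic[dist];"
        "[dist][1:v]libvmaf"
    )
    return [
        "ffmpeg",
        "-i", encoded_path,
        "-i", source_path,
        "-lavfi", graph,
        "-f", "null", "-",  # discard the video output; the score goes to the log
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(vmaf_command("rung_360p.mp4", "source_1080p.mp4", 1920, 1080)))
```

Both inputs need matching frame rates and frame counts for the scores to be meaningful, so check that before trusting the result.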

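This range makes VMAF scores easier to interpret than PSNR, but you still can’t predict how a viewer would perceive the quality of a clip in the middle, say a 480p clip with a VMAF score of 42. However, you do know that 6 VMAF points equals a just noticeable difference (JND). Technically, this means that 75 percent of viewers would notice a 6-point swing, while closer to 90 percent would notice a 12-point, two-JND swing.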
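The 6-points-per-JND figure lends itself to a quick calculation. The sketch below converts a VMAF gap into JNDs and flags adjacent ladder rungs that sit less than one JND apart. Using one JND as the cutoff is an assumption made for the example, and the ladder scores are invented for illustration.

```python
VMAF_POINTS_PER_JND = 6.0  # figure cited in the column


def jnd_between(vmaf_a, vmaf_b):
    """Number of just noticeable differences separating two VMAF scores."""
    return abs(vmaf_a - vmaf_b) / VMAF_POINTS_PER_JND


def rungs_too_close(ladder, min_jnd=1.0):
    """Return label pairs for adjacent rungs separated by less than min_jnd.

    `ladder` is a list of (label, vmaf_score) tuples ordered top to bottom.
    """
    close = []
    for (label_a, score_a), (label_b, score_b) in zip(ladder, ladder[1:]):
        if jnd_between(score_a, score_b) < min_jnd:
            close.append((label_a, label_b))
    return close


if __name__ == "__main__":
    # Hypothetical scores, not measurements.
    ladder = [("1080p", 96.0), ("720p", 93.0), ("480p", 80.0)]
    print(jnd_between(96.0, 90.0))
    print(rungs_too_close(ladder))
```

A pair flagged here differs by less than the gap that 75 percent of viewers would notice, which may suggest that one of the two rungs is doing little work.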

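The ability to identify a JND is useful for a range of encoding decisions, from configuring an encoding ladder to choosing an encoder or codec. If you haven’t started working with VMAF, it’s time to give it a try.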

[This article appears in the October 2017 issue of Streaming Media Magazine as "Quality Metrics Up and Down the Encoding Ladder."]

