osu! mapping blog

Video encoding

Introduction

In this article, I'd like to address the issue of video encoding. I believe that the vast majority of anime maps have a video set as their background, which is pretty cool - the video enhances the map's visuals to some degree. It is still not as professional as a good storyboard, but it is definitely a nice thing. However, most mappers don't really realise how they should encode the video and why.

I don't want to only give you a guide on how to do this properly; I also want to give you an idea of what you are actually doing. Most of you would probably go for HandBrake or VirtualDub and get it done as quickly as possible. While I don't intend to say that either of these programs is bad, GUI tools tend to give people the impression that they don't need any knowledge of video encoding. In reality, getting to know how things work can make a big difference and result in surprisingly good video quality. I will go through 4 topics: why this matters, a visual comparison to get an idea of how each video looks, a decoding speed comparison to get an idea of how load-heavy each one is, and how it can all be accomplished. I will also talk a little bit about VP9 and x265.

Why does it matter?

There are 3 factors to consider - quality, file size and decoding speed.

Quality is certainly something you want to consider: do you want a relatively clean and smooth video, or a video with Minecraft textures? I guess most of us, both mappers and players, would enjoy a video that is as clean as possible. If something is there as a visualisation, you'd expect it to look nice, not just to be there for the sake of it.

File size also matters, but how much? You are limited to about 30 MB for the .osz file (please note that the .osz is usually smaller than the folder's content; try exporting the beatmap to check the real size). Using the full 30 MB is not a bad thing - the Ranking Criteria allow it, and it serves the quality of the video (or of the mapset in general). Many people say that using the whole limit is not good because people refuse to download huge beatmaps, especially on a slow connection. However, everyone has the option to download the map without the video, and if you are concerned about your internet speed, I'd guess that is what you will choose. Now, I'm not saying that you need to use the 30 MB limit every single time; I just want to make you aware that you can if you want to enhance the video quality. Using a smaller file size is still perfectly acceptable. In that case, you should decide whether using the full 30 MB would help you (or whether it would matter to you), and you should still care about the way you encode the video, because even at the same file size there can be a noticeable difference in quality.

You might not know what "decoding speed" is - in short, it's how quickly the video can be processed during playback. If decoding is extremely fast, it's very likely that you sacrificed a lot of quality; if it's too slow, you might be using algorithms that are too difficult to decode. Most people don't even realise that there are options that can make the video lag on low-end PCs without making a big difference in quality.
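To make the file size budget more concrete, here is a minimal Python sketch of the arithmetic that turns a size budget into an average video bitrate. Only the roughly 30 MB limit comes from the text above; the function name, the 8 MB reserved for audio and other files and the 90-second video length are made-up values for illustration. It also assumes that 1 MB is 1,000,000 bytes and ignores container overhead, so treat the result as an upper bound and leave some margin.

```python
def video_bitrate_kbps(budget_mb, other_files_mb, duration_s):
    """Return the average video bitrate (kbit/s) that fits the size budget.

    Assumes 1 MB = 1,000,000 bytes and 1 kbit = 1,000 bits.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    video_bytes = (budget_mb - other_files_mb) * 1_000_000
    if video_bytes <= 0:
        raise ValueError("no room left for the video")
    # bytes -> bits, spread over the duration, then bits/s -> kbit/s
    return video_bytes * 8 / duration_s / 1000


# Hypothetical mapset: 30 MB budget, 8 MB of audio/images/hitsounds, 90 s video.
print(round(video_bitrate_kbps(30, 8, 90)))  # prints 1956
```

The result is in kbit/s, so it can be compared directly with the bitrate value you give to the encoder.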

Visual comparison

The biggest differences are the visual ones, so I've taken a few screenshots of several encodes to give an idea of how much the quality is affected. x265 seems to export the snapshots 1 frame later for some reason. It's still not difficult to compare, so I will keep that as-is; the x264 comparisons are more important here anyway.

Compared encodes: x264 (Veryfast, 4.0, Main) | x264 (Veryfast, 5.2, Main) | x264 (Veryfast, 5.2, High) | x264 (Veryfast, 5.2, High, fastdecode) | x264 (Placebo, 5.2, High, fastdecode) | x264 (Placebo, 5.2, High, fastdecode, 2pass) | x264 (Placebo, 5.2, High, no-cabac, 2pass) | x264 (Placebo, 5.2, High, no-cabac, animation, 2pass) | x264 (Placebo, 5.2, High, no-cabac, animation) | Xvid | VP9 | x265 (Placebo, 5.2, 2pass) | x265 (Placebo, 5.2, fast-decode, 2pass)

Decoding comparison

Obviously, a visual comparison will not tell you how load-heavy a video is. I measured how long it takes (in ms) to decode each of these videos. The measurement is CPU-based only, because a GPU measurement would vary heavily depending on the GPU I own; the CPU is more objective in this regard. A higher number means a longer decoding time. The problem is that most beatmaps with a video use Level 4.1 or 4.0, the Main profile and a high encoding speed, and don't disable some useless things that increase the load. That also harms the quality, as you can see in the visual comparison.
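If you'd like to take a similar measurement on your own machine, one rough way is to let ffmpeg decode the file, throw the frames away and time how long that takes. The sketch below is only an illustration and not necessarily the exact setup behind the numbers in this article: it assumes ffmpeg is on your PATH, the helper names are made up, and it measures wall-clock time on a single decoding thread.

```python
import subprocess
import time


def decode_command(video_path, ffmpeg="ffmpeg"):
    """Build an ffmpeg call that decodes the video and discards the frames."""
    # -threads 1 keeps decoding single-threaded so runs are easier to compare,
    # -an skips the audio, and "-f null -" decodes without writing a file.
    return [ffmpeg, "-v", "error", "-threads", "1", "-i", video_path,
            "-an", "-f", "null", "-"]


def time_decode_ms(video_path, runs=3):
    """Return the fastest wall-clock decode time in ms over several runs."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(decode_command(video_path), check=True)
        best = min(best, (time.perf_counter() - start) * 1000)
    return best


# Example (needs ffmpeg installed; "output.avi" is a placeholder name):
#     print(f"{time_decode_ms('output.avi'):.0f} ms")
```

Taking the fastest of a few runs reduces the influence of disk caching and background programs, but the absolute numbers will still differ from one CPU to another.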

How to get the best result?

It depends on what you really need. There were approximately 3 different decode times - 2s, 4s and 6s. The 2s encodes had somewhat okay-ish quality, the 6s ones had bad quality and the 4s ones had very good quality, so choosing the 6s option isn't reasonable at all. But there's an important thing to think about: all the files are the same size, yet the 2s solution definitely looks worse than the 4s solution. If you are satisfied with the quality of the 2s solution, using a slightly lower bitrate with the 4s settings would bring the load closer to the 2s one and produce a smaller file, while still matching the quality of the 2s solution. You'd have to experiment with it, though. My personal choice is the 4s solution, but feel free to go for the 2s solution; there's nothing extremely wrong with it.

Now, how does it work? You might have an encoder GUI such as HandBrake or VirtualDub, but there is a high chance that its version of the x264 encoder is outdated. For that reason, I like to use the CLI x264. Go for the build without 10b (10-bit) in its name; I might cover this in another article. Picking the newest build by date is generally the best option, but avoid builds whose file size is suspiciously low (e.g. 03-Dec-2016). They are most likely missing the libraries needed to decode non-raw video. Another tool I use is ffmpeg (go for the Static build; you only need ffmpeg.exe from the bin/binary or similar folder), because it allows me to convert the .h264 file to .avi. You could encode from the source to an x264 .avi with ffmpeg alone, but again, the version might be outdated and ffmpeg handles several things a little differently. You'll most likely return to HandBrake or another GUI alternative, but you'll then understand what to set there and know that you should use the additional options.

Once you have these two tools, you should get the best available source of the video. That might be a Blu-ray version, the official website, a bought game etc. (You can download most of these things legally too; cloud storage is a good option.) Now, you need to know several settings that affect how your video is encoded.

--profile

There are several profiles, for example baseline, main or high. They have an impact on the quality and you'd most likely want to use main or high. Using high takes longer to encode, but results in better quality. The profile sets the boundaries of the encoder: the simplest profiles disable some algorithms that can enhance the image and the bitrate management. Recommended: high

--preset

The preset mainly determines the speed of the encoder. There are several options such as ultrafast, veryfast, medium, slow, veryslow or placebo. Some of these options have a bigger impact on decoding speed, some don't make any difference. Fast presets force the encoder to skip certain steps, resulting in worse image quality, while slower presets try to use the whole file size as efficiently as possible. Although I used placebo in this test, I don't recommend it, as veryslow has an almost negligible difference in quality and seems to have a slightly lower load than placebo. I might extend the graph at some point, but I currently don't have the data for veryslow. Recommended: veryslow

--tune

The tune affects the algorithms that process the video, such as the deblocking filter, CABAC or weighted prediction for B or P frames. There are several tunes, but my favourites are film, animation and fastdecode. To get the fastest decoding, use fastdecode. It disables most of these algorithms, so you might lose a bit of quality, but you can re-enable some of the settings later. If the video is animated, using animation is usually a very good idea. It might improve the quality of the picture, but sometimes it won't really make a difference. Recommended: fastdecode (alternatively animation or film)

--level

The level sets the limits of the decoder. You don't really need to know what it does, but it essentially sets how quickly your decoder can work and how much data it can store in memory. Storing too little data in memory might result in random lags while waiting for the next frame. Most people would have a problem with Double Time: in that situation the video is sped up along with the music, so more frames have to be loaded per second, and your memory might not be fast enough to free the old frames and load the next ones. Recommended: 5.0, 5.1 or 5.2

--vf resize

This is just my addition and not a really important setting, but if your source has a very high resolution, you can resize it to the rankable resolution. Example: --vf resize:1280,720
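If your source is not 16:9, forcing 1280,720 would stretch the picture. As a small sketch, the following Python helper (a hypothetical name, not part of x264) picks the largest size that fits inside 1280x720 while keeping the aspect ratio. It assumes square pixels, never upscales, and rounds down to even numbers, which the usual 4:2:0 video needs.

```python
def fit_resize(src_w, src_h, max_w=1280, max_h=720):
    """Fit (src_w, src_h) inside max_w x max_h, keeping the aspect ratio."""
    if src_w <= max_w and src_h <= max_h:
        w, h = src_w, src_h  # already small enough, do not upscale
    elif src_w * max_h >= src_h * max_w:
        # Source is at least as wide as the target shape: width is the limit.
        w, h = max_w, src_h * max_w // src_w
    else:
        # Source is taller than the target shape: height is the limit.
        w, h = src_w * max_h // src_h, max_h
    # Round down to even numbers, with a minimum of 2 pixels per side.
    return max(2, w - w % 2), max(2, h - h % 2)


w, h = fit_resize(1920, 1080)
print(f"--vf resize:{w},{h}")  # prints --vf resize:1280,720
```

For a 4:3 source such as 1440x1080 it returns 960x720 instead of a stretched 1280x720.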

--no-cabac

CABAC is an algorithm that helps to store the video more efficiently. It makes better use of the bitrate, but decoding becomes much slower. --no-cabac disables this algorithm, which is why I recommend using it.

Encoding

If you understand each point and are able to decide which settings to choose, here's a little example of how it works. All the commands must be executed in a command line or a .bat file. This example corresponds to the x264 (Placebo, 5.2, High, no-cabac, animation) encode.

x264 --bitrate 1700 --profile high --preset placebo --tune animation --no-cabac --level 5.2 --vf resize:1280,720 -o D:\output.x264 D:\original_input.avi & ffmpeg -r 24 -i D:\output.x264 -r 24 -c copy D:\output.avi

The same encode done in 2 passes, which results in slightly better quality, could look like this. This is for when you are very nitpicky and can wait.

x264 --pass 1 --bitrate 1700 --profile high --preset placebo --tune animation --no-cabac --level 5.2 --vf resize:1280,720 -o NUL D:\original_input.avi & x264 --pass 2 --bitrate 1700 --profile high --preset placebo --tune animation --no-cabac --level 5.2 --vf resize:1280,720 -o D:\output.x264 D:\original_input.avi & ffmpeg -r 24 -i D:\output.x264 -r 24 -c copy D:\output.avi
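Typing the same long list of options twice invites copy-paste mistakes. As a sketch, both x264 calls of a 2-pass encode can be generated from a single set of settings. The helper below only reuses options that appear in the commands above; its name and default values are my own choice for illustration (veryslow is used as the default because it is the recommended preset), and it prints the commands instead of running them.

```python
def x264_two_pass(src, dst, bitrate=1700, preset="veryslow", tune="animation",
                  level="5.2", size=(1280, 720), null_output="NUL"):
    """Return the argument lists for pass 1 and pass 2 of an x264 encode."""
    common = [
        "--bitrate", str(bitrate),
        "--profile", "high",
        "--preset", preset,
        "--tune", tune,
        "--no-cabac",
        "--level", level,
        "--vf", f"resize:{size[0]},{size[1]}",
    ]
    # Pass 1 only gathers statistics, so its output is discarded (NUL on Windows).
    first = ["x264", "--pass", "1", *common, "-o", null_output, src]
    # Pass 2 uses those statistics to write the real file.
    second = ["x264", "--pass", "2", *common, "-o", dst, src]
    return first, second


for cmd in x264_two_pass(r"D:\original_input.avi", r"D:\output.x264"):
    print(" ".join(cmd))
```

Because both passes are built from the same list, changing the bitrate or the preset in one place keeps them in sync.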

Other codecs

x264 is pretty much the standard in osu! (and basically everywhere else). Some people tend to use Xvid, but it is very inefficient - if you compare its load with that of x264 (Veryfast, 5.2, High, fastdecode) and then compare the quality, you'll realise that using Xvid is nonsense.

We don't have support for x265 or VP9, so using them is impossible. Nevertheless, I added them to the graph to show that while x265 provides outstanding quality, its load might be a little bit on the edge (although keep in mind that we could reduce the file size). VP9, however (the codec that YouTube, for example, uses for its videos), is not that different in load and provides very pleasant quality as well. I personally think that supporting VP9 would be a very clever thing and could be implemented into the game relatively easily. We'll see if that ever happens!

What I said might seem quite contradictory, because I claimed that decoding speed is an important thing. File size, quality and decoding speed must support each other - the tested example had very low quality and a high decoding time, when it's clearly possible to achieve higher quality with the same file size and a lower decoding time. If VP9 were supported, its decoding cost would be justified, because the quality is really outstanding for the file size. It might still lag on low-end PCs with Double Time, but that can happen even with very fast-decoding videos on a low-end PC; it's a fair trade-off, unlike badly used x264.