Video streaming
Towards a Cross-layer Video Streaming Optimization
Video streaming is omnipresent and, due to recent global events, the number of people at home watching streamed video has increased even further. The main issue is that the network conditions under which users stream video are not always ideal. This lowers visual quality because "adaptive bitrate" (ABR) algorithms try to select a video quality whose bitrate, and thus amount of data, is small enough to be streamed under the current network conditions. As these algorithms are not perfect, the visual quality can degrade unnecessarily. In the worst case, the ABR misjudges the network condition, the video is not downloaded in time, and playback stalls. The viewer then typically sees a spinning indicator until enough new video has been downloaded to continue. While there is a vast literature on optimizing video streaming, virtually all prior work follows a piecemeal approach: either "tweaking" the transport layer or making the client "smarter."
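To make the ABR step concrete, here is a minimal sketch of a classic throughput-based (rate-based) ABR rule. It is not VOXEL's ABR. The bitrate ladder, the five-sample window, and the 0.8 safety factor are illustrative assumptions.

```python
from statistics import harmonic_mean

# Hypothetical bitrate ladder in kbit/s (illustrative, not from VOXEL).
BITRATE_LADDER_KBPS = [300, 750, 1200, 2400, 4800]


def estimate_throughput(samples_kbps, window=5):
    """Harmonic mean of the most recent throughput samples (kbit/s).

    The harmonic mean is dominated by small values, so a single slow
    download pulls the estimate down quickly.
    """
    return harmonic_mean(samples_kbps[-window:])


def select_bitrate(samples_kbps, safety=0.8):
    """Highest ladder bitrate that fits under a safety-scaled estimate."""
    budget = safety * estimate_throughput(samples_kbps)
    feasible = [b for b in BITRATE_LADDER_KBPS if b <= budget]
    # Fall back to the lowest rung if even that exceeds the budget.
    return max(feasible) if feasible else BITRATE_LADDER_KBPS[0]


print(select_bitrate([3000, 3200, 2900, 3100, 3000]))  # 2400
print(select_bitrate([3000, 3000, 3000, 3000, 400]))   # 750
```

In the second call, a single slow sample pulls the estimate down sharply and the selection falls to 750 kbit/s. A rule like this can therefore pick a lower quality than necessary. If it overestimates throughput, it can instead pick a quality that cannot be downloaded in time.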
With our system, which we call VOXEL, we follow a more holistic approach. First, we recognize that some video frames are less important than others, i.e., dropping certain frames does not degrade visual quality and thus does not affect the end users' quality of experience (QoE). To exploit this, we start at the transport layer and avoid TCP's need to deliver every single byte, which it enforces even when this results in head-of-line blocking.
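Whether a frame can be dropped at all depends on the codec's prediction structure. If a frame is missing, every frame predicted from it, directly or indirectly, is affected too. The minimal sketch below works on a hypothetical five-frame group of pictures. It computes those transitive dependents and lists the non-reference frames whose loss affects only themselves. The frame names and reference lists are assumptions, not output of a real encoder.

```python
from collections import defaultdict

# Hypothetical group of pictures: each frame lists the frames it is predicted from.
GOP_REFS = {
    "I0": [],
    "P4": ["I0"],
    "P8": ["P4"],
    "B2": ["I0", "P4"],
    "B6": ["P4", "P8"],
}


def affected_by_drop(refs):
    """Map each frame to all frames that directly or indirectly reference it.

    If frame f is missing, the decoder has to conceal its absence in f and
    in every frame of affected_by_drop(refs)[f], so errors propagate.
    """
    children = defaultdict(set)
    for frame, parents in refs.items():
        for parent in parents:
            children[parent].add(frame)

    affected = {}
    for frame in refs:
        seen, stack = set(), [frame]
        while stack:
            for child in children[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        affected[frame] = seen
    return affected


def non_reference_frames(refs):
    """Frames no other frame depends on; dropping one affects only itself."""
    return sorted(f for f, deps in affected_by_drop(refs).items() if not deps)


print(non_reference_frames(GOP_REFS))             # ['B2', 'B6']
print(sorted(affected_by_drop(GOP_REFS)["P4"]))   # ['B2', 'B6', 'P8']
```

Frame types and references give only a coarse notion of importance. Ranking frames by their measured impact on visual quality is considerably finer-grained.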
We go further, however, than merely distinguishing video frames by type. We analyze the entire video to rank each individual frame by its actual influence on the overall visual quality of the video. With this fine-grained information, we no longer have to blindly reduce the video bitrate and hope that the visual impact will not degrade the QoE. Instead, we can reduce the amount of data precisely to what the network conditions allow while knowing exactly what the impact on the QoE will be. We therefore created a new kind of ABR that aims to maximize visual quality rather than bitrate. This synergy of a video-streaming-tailored transport, a one-time in-depth video analysis, and a visual-quality-aware ABR enables VOXEL to reduce rebuffering by at least 25% and up to 90%, even in challenging network conditions. At the same time, it provides visual quality that is at least on par with state-of-the-art streaming solutions.
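The core idea of trading a known quality loss for a smaller download can be illustrated with a greedy selection. This is a minimal sketch, not VOXEL's ABR. It assumes that each frame already carries a hypothetical per-frame quality-loss score from an offline analysis (e.g., the drop in SSIM when the frame is skipped). It also assumes a flag that says whether the frame may be dropped at all.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    name: str
    size_bytes: int
    quality_loss: float  # hypothetical quality drop if this frame is skipped
    droppable: bool      # False for frames that must always be delivered


def fit_segment_to_budget(frames, budget_bytes):
    """Greedily skip frames with the least quality loss per byte saved.

    Returns (kept_names, dropped_names, total_quality_loss), or None if the
    segment cannot fit even after skipping every droppable frame.
    """
    total = sum(f.size_bytes for f in frames)
    candidates = sorted(
        (f for f in frames if f.droppable),
        key=lambda f: f.quality_loss / f.size_bytes,
    )
    dropped, loss = [], 0.0
    for f in candidates:
        if total <= budget_bytes:
            break
        dropped.append(f.name)
        total -= f.size_bytes
        loss += f.quality_loss
    if total > budget_bytes:
        return None
    kept = [f.name for f in frames if f.name not in dropped]
    return kept, dropped, loss


segment = [
    Frame("I0", 50_000, 0.0, False),
    Frame("B1", 8_000, 0.002, True),
    Frame("P2", 20_000, 0.0, False),
    Frame("B3", 10_000, 0.010, True),
    Frame("B4", 6_000, 0.001, True),
]
# With an 85 kB budget, B4 and B1 are skipped; I0, P2 and B3 are kept.
print(fit_segment_to_budget(segment, 85_000))
```

Ranking by loss per byte rather than by loss alone favors dropping large frames whose absence is barely visible. A real system would additionally have to account for artifacts that propagate to dependent frames.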
In addition to assessing the QoE with objective quality metrics such as SSIM, VMAF, and PSNR, we conducted a user study with 54 participants recruited from different universities. We asked them to watch short video clips recorded from streaming experiments with VOXEL and with the state of the art under identical, challenging network conditions. Of the participants, 84% preferred the version streamed with VOXEL. We also asked whether they would keep watching a stream that behaves like the clips shown. For the state-of-the-art stream, 74% of participants would have abandoned the video; in contrast, only 36% would have stopped watching a VOXEL clip.
One reason for this preference, as the participants confirmed, is the vastly reduced rebuffering. Because dropping frames in VOXEL can introduce visual artifacts, the Mean Opinion Scores (MOS) for "glitches" and "clarity" were slightly lower for VOXEL. The MOS for the overall watching experience, however, was much higher. Lastly, to ease adoption, VOXEL is fully backward compatible with existing streaming solutions, and each of its components can be deployed incrementally.
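Of the objective metrics mentioned above, PSNR is the simplest. It compares a decoded frame with the reference pixel by pixel via the mean squared error (MSE): PSNR = 10 log10(MAX^2 / MSE). The sketch below assumes 8-bit luma samples given as flat lists. SSIM and VMAF are considerably more involved.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio (dB) between two equally sized frames.

    Frames are flat sequences of 8-bit luma samples; higher is better.
    """
    if len(reference) != len(distorted):
        raise ValueError("frames must contain the same number of samples")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)


print(round(psnr([100, 100, 100, 100], [101, 101, 101, 101]), 2))  # 48.13
```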
Application of VOXEL to 360-degree video
Coordinator: Mirko Palmer
We want to apply VOXEL to 360-degree video, commonly referred to as VR video. Avoiding rebuffering is a primary goal here, so as not to confuse or discomfort users when the video suddenly stops to rebuffer. The main challenge is that a 360-degree video, unlike regular video, is not a single flat video stream. It is a spherical projection of several so-called tiles, which are videos in their own right, arranged in a grid and stitched together to form the 360-degree sphere. This vastly increases the complexity of selecting the quality of each individual tile. In the case of VOXEL, it likewise complicates deciding which frames to drop in which tile in order to avoid rebuffering.
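The growth of the decision space can be quantified by simple counting. Assume, purely for illustration, a 6×4 tile grid, five bitrate levels per tile, and, for a VOXEL-style decision, four hypothetical frame-dropping levels per tile. Under these assumptions, the number of joint choices per segment explodes, which rules out exhaustive search.

```python
def tile_configurations(rows, cols, quality_levels, drop_levels=1):
    """Count the joint per-tile decisions for one segment.

    Each of the rows * cols tiles independently gets one of quality_levels
    bitrates and one of drop_levels frame-dropping settings.
    """
    return (quality_levels * drop_levels) ** (rows * cols)


print(tile_configurations(1, 1, 5))     # 5: a single flat video
print(tile_configurations(4, 6, 5))     # 59604644775390625 (about 6e16)
print(tile_configurations(4, 6, 5, 4))  # 20**24, about 1.7e31
```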
Another difference from regular video is that viewers can freely rotate their heads and thus focus on different parts of the 360-degree scene. On the one hand, this eases streaming, since video data that is, so to speak, behind the user’s head does not need to be transferred in the highest quality. On the other hand, if the user suddenly turns, they expect the quality to be as high as possible.
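A first building block for viewport-dependent streaming is mapping the viewer's head orientation to the set of tiles currently in view. Those tiles deserve the highest quality, while the rest can be fetched at lower quality. The sketch makes several assumptions: an equirectangular tile grid of 6 columns by 4 rows, a 100° by 90° field of view, and a rectangular approximation of the viewport. These values are illustrative, not VOXEL parameters.

```python
def visible_tiles(yaw, pitch, rows=4, cols=6, fov_h=100.0, fov_v=90.0):
    """Return the (row, col) tiles that overlap an approximate viewport.

    Assumes an equirectangular layout: columns cover yaw [0, 360) degrees
    and row 0 starts at pitch +90 (top). The viewport is approximated as a
    yaw/pitch rectangle, ignoring distortion near the poles.
    """
    tile_w, tile_h = 360.0 / cols, 180.0 / rows
    top = min(90.0, pitch + fov_v / 2)
    bottom = max(-90.0, pitch - fov_v / 2)
    rows_hit = [
        r for r in range(rows)
        if 90.0 - r * tile_h > bottom and 90.0 - (r + 1) * tile_h < top
    ]

    yaw %= 360.0
    left, right = yaw - fov_h / 2, yaw + fov_h / 2
    # Shift each column by -360/0/+360 degrees to handle wrap-around at yaw 0.
    cols_hit = [
        c for c in range(cols)
        if any(c * tile_w + s < right and (c + 1) * tile_w + s > left
               for s in (-360.0, 0.0, 360.0))
    ]
    return {(r, c) for r in rows_hit for c in cols_hit}


print(sorted(visible_tiles(0, 0)))  # [(1, 0), (1, 5), (2, 0), (2, 5)]
```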
As a result, to avoid rebuffering, we have to anticipate where the user will look next. We then have to maximize the quality of each tile given the current network situation, i.e., the available network transfer budget. Within this budget, we must select tile qualities and, again, decide which fraction of frames not to transfer at all, because their absence would not negatively affect the user’s quality of experience (QoE).
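One simple way to spend a transfer budget across tiles, given predicted viewing probabilities, is a greedy heuristic. It repeatedly buys the upgrade with the best expected quality gain per byte. This generic knapsack-style heuristic is offered for illustration and is not the approach taken in this project. The viewing probabilities, tile sizes, and quality scores are hypothetical inputs.

```python
import heapq


def allocate_tiles(view_prob, sizes, quality, budget):
    """Greedy per-tile quality selection under a byte budget.

    view_prob[t]: predicted probability that tile t will be viewed
    sizes[l]:     bytes of one tile segment at quality level l (ascending)
    quality[l]:   visual-quality score of level l (ascending)
    All tiles start at level 0; the upgrade with the largest expected
    quality gain per extra byte is applied while it fits the budget.
    """
    levels = [0] * len(view_prob)
    spent = len(view_prob) * sizes[0]
    if spent > budget:
        raise ValueError("budget too small even for the lowest quality")

    def gain_per_byte(t):
        lvl = levels[t]
        extra = sizes[lvl + 1] - sizes[lvl]
        return view_prob[t] * (quality[lvl + 1] - quality[lvl]) / extra

    # Max-heap via negated keys; only meaningful if a higher level exists.
    heap = [(-gain_per_byte(t), t) for t in range(len(view_prob)) if len(sizes) > 1]
    heapq.heapify(heap)
    while heap:
        _, t = heapq.heappop(heap)
        extra = sizes[levels[t] + 1] - sizes[levels[t]]
        if spent + extra > budget:
            continue  # cannot afford it now, and spending only grows
        levels[t] += 1
        spent += extra
        if levels[t] + 1 < len(sizes):
            heapq.heappush(heap, (-gain_per_byte(t), t))
    return levels, spent


print(allocate_tiles([0.7, 0.2, 0.1, 0.0], [100, 300, 700], [0.80, 0.90, 0.95], 1000))
# ([1, 1, 1, 0], 1000)
```

Because the heuristic only looks one upgrade ahead, it is not guaranteed to be optimal. A VOXEL-style variant could also treat per-tile frame-dropping levels as additional upgrade steps.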
Video Streaming with Cross-layer Information Sharing
Today, content consumption on the Internet is omnipresent. Since the global pandemic and the move towards working from home, the amount of content consumed, specifically via video streaming, has increased substantially. However, network conditions are not always good enough to meet the high throughput requirements of content consumption. The state-of-the-art solution for coping with insufficient throughput in video streaming is to employ some form of adaptive bitrate (ABR) algorithm. An ABR algorithm selects a video quality whose throughput requirement is lower than the available throughput. It repeats this selection every few seconds to account for changes in the available throughput. These algorithms, however, are not perfect, because they can misjudge the network conditions. They may download a lower quality than necessary, impairing the user's quality of experience (QoE). Alternatively, they may select a quality that requires more data than the current network conditions allow, resulting in stalls because the video is not delivered in time. The latter degrades the user's QoE significantly.
Virtually all prior work follows a piecemeal approach: either “tweaking” the fully reliable transport layer or making the client “smarter.” Departing from prior work, we follow a holistic approach and design a cross-layer video-streaming solution called VOXEL [1]. We use VOXEL to demonstrate how to combine application-provided “insights” with a partially reliable transport protocol to optimize video streaming. First, we recognize that some video frames are less important than others, i.e., intelligently dropping specific frames does not degrade visual quality and thus does not affect end users' QoE. We rank the individual frames constituting each video segment by their impact on the overall quality of the video. We then use this ranking to determine when (and where) reliable delivery is required.
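The cost of a misjudgment can be made concrete with a simple buffer model. While a segment downloads, the playback buffer drains, and if it empties before the download completes, playback stalls. The sketch assumes 4-second segments, back-to-back downloads, and an unbounded buffer, and it treats the first download as start-up delay. The bitrates and throughputs are made-up numbers.

```python
def simulate_stalls(bitrates_kbps, throughputs_kbps, segment_s=4.0):
    """Total stall time in seconds for back-to-back segment downloads.

    bitrates_kbps[i]:    bitrate the ABR chose for segment i
    throughputs_kbps[i]: throughput actually achieved while fetching it
    The first download counts as start-up delay; the buffer is unbounded.
    """
    buffer_s, stall_s = 0.0, 0.0
    for i, (rate, tput) in enumerate(zip(bitrates_kbps, throughputs_kbps)):
        download_s = rate * segment_s / tput
        if i > 0:
            # Playback drains the buffer while the segment downloads.
            if download_s > buffer_s:
                stall_s += download_s - buffer_s
                buffer_s = 0.0
            else:
                buffer_s -= download_s
        buffer_s += segment_s
    return stall_s


throughput = [3000, 3000, 1000, 1000]  # kbit/s: the link degrades mid-stream
print(simulate_stalls([2400, 2400, 2400, 2400], throughput))  # about 10.4 s of stalls
print(simulate_stalls([2400, 2400, 750, 750], throughput))    # 0.0
```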
To this end, we present a novel ABR algorithm that explicitly trades off losses to improve end users' video-watching experience. This synergy of a video-streaming-tailored transport, a one-time in-depth video analysis, and a visual-quality-aware ABR enables VOXEL to reduce 90th-percentile rebuffering by up to 97%, even in challenging network conditions. At the same time, it provides visual quality that is at least on par with state-of-the-art streaming solutions. We evaluated VOXEL's rebuffering reduction extensively in a full end-to-end system. The experiments ranged from emulating a diverse set of network conditions in the lab to streaming video over the Internet from a datacenter in France. The results of all experiments show that VOXEL performs at least on par with, and in most cases outperforms, the state of the art. To evaluate the objective visual impact of dropping frames, we used SSIM because it offers a practical trade-off between robustness and computational complexity.
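For reference, SSIM combines luminance, contrast, and structure comparisons into a single score, where 1 means identical. The sketch below evaluates the SSIM formula once over the whole frame with the usual constants C1 = (0.01·L)² and C2 = (0.03·L)², where L = 255 for 8-bit samples. The standard metric instead averages this formula over small local windows. Treat this as a simplified illustration rather than a substitute for a library implementation.

```python
from statistics import fmean


def global_ssim(x, y, max_value=255):
    """SSIM evaluated once over two equally sized frames (flat sample lists).

    The standard metric applies this formula in small local windows and
    averages the results; a single global window is a simplification.
    """
    c1 = (0.01 * max_value) ** 2  # stabilizes the luminance term
    c2 = (0.03 * max_value) ** 2  # stabilizes the contrast/structure term
    mx, my = fmean(x), fmean(y)
    var_x = fmean((a - mx) ** 2 for a in x)
    var_y = fmean((b - my) ** 2 for b in y)
    cov = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (var_x + var_y + c2)
    )


print(round(global_ssim([50, 100, 150, 200], [52, 98, 155, 195]), 4))  # 0.9976
```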
Investigators: Mirko Palmer, Qi Guo, and Anja Feldmann, in cooperation with Balakrishnan Chandrasekaran (Vrije Universiteit Amsterdam), Ramesh K. Sitaraman and Kevin Spiteri (UMass Amherst, USA), and Malte Appel (Internet Initiative Japan)
References
• [1] M. Palmer, M. Appel, K. Spiteri, B. Chandrasekaran, A. Feldmann, and R. K. Sitaraman. VOXEL: Cross-layer optimization for video streaming with imperfect transmission. In CoNEXT ’21, 17th International Conference on Emerging Networking Experiments And Technologies, Virtual Event, Germany, 2021, pp. 359–374. ACM.