What is video-conferencing?
The goal of video conferencing is to let two or more people communicate in real time. This communication may include video, audio, chat, file sharing and screen sharing, and it normally runs over IP networks. Building a basic video-conferencing application is rather simple: the implementation connects the audio/video device interfaces of two clients over a network.
Although it is often assumed that all video-conferencing providers use the same technology, it has to be highlighted that this is certainly not the case.
eyeson is based on the principles of the WebRTC (Web Real-Time Communication) standards. WebRTC is a free, open-source project which provides real-time communication in web browsers and mobile applications via simple APIs (application programming interfaces). The most commonly used browsers such as Google Chrome, Mozilla Firefox and even Apple's Safari support WebRTC. In contrast, Skype and Zoom do NOT use WebRTC, which means native programs have to be installed before video calls can be started.
When it comes to video conferencing, scalability is a decisive factor: it determines whether excellent video and audio quality can be maintained for the connected clients while the number of participants rises. The only technology which guarantees this is the MCU (multipoint control unit) topology. However, it is the least used technology because of its complexity and the expertise required in this field of development. Its operational costs are also quite high.
Three technologies currently dominate the video-conferencing market:
- Peer-To-Peer (Mesh)
- Selective Forwarding Unit (SFU)
- Multipoint Control Unit (MCU)
Mesh is the lowest-cost solution for video conferencing. It does not need any intermediate infrastructure (except for call signaling) such as a media server or cloud, because each client sends its stream to and receives streams from all the other participants. It is easy to implement, but it reveals major scalability issues as soon as more than 2 people enter the call. Skype and FaceTime always use this technology, while Google Hangouts uses it only when exactly 2 people are in the call.
However, what happens when the number of video call participants increases in a mesh-based topology?
This is where the problems start. The required bandwidth rises constantly, since the number of video and audio streams grows with every participant. The processing capacity of the CPU and the limited network are overwhelmed, and as a result the video-conference quality suffers tremendously.
Choppy sound, frozen pictures and dropped video calls are the result, and they get even worse with every additional participant. These issues can be resolved with either the SFU or the MCU technology.
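The scaling problem can be illustrated with a small stream-count model (an illustrative sketch, not eyeson measurements). In a full mesh, every client uploads its own stream to each peer and downloads one stream from each peer, so the per-client load grows linearly with the number of participants:

```python
def mesh_streams(n):
    """Streams each client handles in a full mesh of n participants:
    it uploads its stream to every peer and downloads one from every peer."""
    uploads = n - 1
    downloads = n - 1
    return uploads, downloads

# With 2 participants each client handles 1 upload and 1 download,
# but with 10 participants it is already 9 of each.
for n in (2, 4, 10):
    up, down = mesh_streams(n)
    print(f"{n} participants: {up} uploads, {down} downloads per client")
```

This linear growth per client (and quadratic growth in total streams across the call) is exactly why mesh quality degrades so quickly beyond 2 participants.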
Selective Forwarding Unit (SFU)
This technology is based on a central unit which receives all streams from the clients. Each client then decides which streams it wants to receive from the other participants. The higher the number of participants, the more streams have to be downloaded, which increases the required bandwidth. The SFU therefore does selective forwarding, but no audio/video processing. Note: the prerequisite is that each client maintains a full connection to the SFU media server.
So what are the advantages of the SFU compared to P2P?
The streams of all participants are received by one central unit which then selectively forwards them to the participants. The latency is minimal, but the additional encoding and decoding steps slow down the process when a high number of participants take part in the group video call.
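In the same simple model (again an illustrative sketch), the SFU fixes the upload side but not the download side: each client sends a single stream to the server, yet still downloads one forwarded stream per other participant:

```python
def sfu_streams(n):
    """Per-client streams with an SFU for n participants:
    one upload to the server, one download per other participant
    (the server forwards streams but does not mix them)."""
    uploads = 1
    downloads = n - 1
    return uploads, downloads

for n in (2, 10):
    print(f"{n} participants: {sfu_streams(n)} (uploads, downloads) per client")
```

Compared to mesh, the upload load stays constant, which is a big win on typical asymmetric home connections, while the download load still grows with the call size.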
Nevertheless, it is the most popular technology in the WebRTC community. Google Hangouts uses it when more than 2 people enter the call. eyeson, in contrast, uses the SFU technology only while 2 people are in the call. As soon as a third participant enters, it switches automatically to the MCU.
Is there a solution for group video calls with a high number of participants? Indeed, there is!
Multipoint Control Unit (MCU)
The MCU is responsible for mixing the streams of all participants. It decodes all incoming streams, mixes them into a single stream and encodes and sends that one stream to all participants. Each client therefore handles only one common stream, which significantly reduces the bandwidth usage.
An unlimited number of people can join the group video call without reducing the decisive audio and video quality for the connected clients. Moreover, it improves the overall call quality, since additional central processing such as noise filtering, echo reduction and image processing can easily be added as well.
Further, it reduces the bandwidth consumption and keeps it stable and low. eyeson uses this technology whenever more than 2 people are in the video call: its customers get the SFU topology for up to 2 people, and it switches automatically to the MCU topology as soon as additional participants join the call.
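Putting the three topologies side by side in the same stream-count model (an illustrative sketch, not eyeson measurements) makes the MCU's constant per-client load obvious:

```python
def streams_per_client(topology, n):
    """(uploads, downloads) per client for n participants."""
    if topology == "mesh":
        return n - 1, n - 1   # one stream to/from every peer
    if topology == "sfu":
        return 1, n - 1       # one upload, server forwards the rest
    if topology == "mcu":
        return 1, 1           # one upload, one mixed stream back
    raise ValueError(f"unknown topology: {topology}")

# At 10 participants, only the MCU keeps the per-client load constant.
for topo in ("mesh", "sfu", "mcu"):
    print(f"{topo}: {streams_per_client(topo, 10)}")
```

With the MCU, adding a participant changes nothing for the existing clients; the extra work of decoding, mixing and re-encoding moves entirely to the server, which is exactly the trade-off the next paragraph addresses.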
But why do Skype, FaceTime, Hangouts and Zoom not use the MCU technology as well?
Although scalability is the most important aspect of video conferencing, the MCU topology requires a lot of expertise and computing resources on the server, and it is considerably complex. However, it clearly beats the mesh and SFU technologies whenever more than two people are in a video call.