Why do Messenger and WhatsApp video calls have limits on the number of participants?

Ever wondered why giant social networks like WhatsApp, Duo, etc. put a limit on the maximum number of people who can join a single video/audio call? In this article, I am going to explain it in a very simplified manner.
More or less, most social media platforms use the WebRTC library provided by Google to ease development and reduce cost when it comes to providing a seamless video/audio chat experience to their clients. Don’t get scared by the term WebRTC now 😬. While working on a similar product (check it out at http://sync.myorg360.com/), I came to know about the challenges you need to tackle while developing a platform like Zoom, Cisco Webex, Microsoft Teams, etc., which are capable of handling around 1000 participants in a single session. How do they do it, while others like WhatsApp and Duo don’t? Well, without wasting much time, let’s dive into the details straightaway.
Let me give you a pictorial representation of the different types of architecture one can adopt while developing video conferencing software. Typically, there are three broadly used architectures, namely p2p (peer-to-peer), SFU (Selective Forwarding Unit), and MCU (Multipoint Control Unit). Each type has its own advantages and limitations. Before jumping into the detailed description, let’s see visual representations of these first.

Each colorful bubbly figure represents a client, i.e. a phone, a browser, etc. In the above architecture, there are 5 clients, each connected to every other client in a peer-to-peer fashion. The incoming/outgoing arrows represent the channels/connections that carry the media streams (audio, video, etc.). As you can see, in a p2p architecture with n participants, the total number of channels (both in and out) is equal to n*(n-1), which is proportional to n². The bandwidth consumption is also higher (in simple terms, your data will be consumed very fast, say 1 Mbps 😛), because each client has to send the same media (say our voice and video) separately to each of the other n-1 clients.
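To make the n*(n-1) count concrete, here is a minimal Python sketch. The function name `p2p_mesh_load` and the default of 1 Mbps per stream are my own assumptions for illustration (the 1 Mbps figure is just the rough number used above, not a measured value).

```python
def p2p_mesh_load(n: int, stream_mbps: float = 1.0) -> dict:
    """Count channels and per-client bandwidth in a full p2p mesh of n clients.

    stream_mbps is an assumed bitrate for one audio+video stream.
    """
    if n < 1:
        raise ValueError("need at least one participant")
    peers = n - 1  # every client is connected to every other client
    return {
        "total_channels": n * peers,       # directed channels, in and out
        "uploads_per_client": peers,       # same media sent once per peer
        "downloads_per_client": peers,
        "upload_mbps_per_client": peers * stream_mbps,
    }


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, p2p_mesh_load(n))
```

With the 5 clients from the figure, this gives 20 channels in total and 4 uploads per client. Going from 5 to 10 participants takes the total from 20 to 90 channels, which is the n² growth in action.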

Well, this figure represents the SFU architecture, where the black box is simply a server (a media server, to be very specific) whose task is to take the incoming streams from all clients and simply pass them on to the others via a separate outgoing channel for each client. In this architecture too, the total number of connections is proportional to n², but just have a look at the bubbly shaped colorful clients. Here each client needs to manage only n channels (1 outgoing to the server and n-1 incoming from it), while in p2p it manages 2*(n-1), which is more as soon as there are more than two participants. Most importantly, each client uploads its media only once, so the upload bandwidth consumption will definitely be less in this case. But nothing comes for free, right 🌝? You need to manage media servers (those black boxes) and believe me, such servers will cost you lakhs and lakhs per month 🌚. But when it comes to supporting a larger number of participants in a video call, one will definitely prefer SFU over p2p.
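Here is a small Python sketch of the per-client difference between p2p and SFU. The helper names are hypothetical, and the counts assume every client sends exactly one stream and receives every other participant's stream.

```python
def sfu_load(n: int) -> dict:
    """Channel counts when n clients are connected through one SFU."""
    peers = n - 1
    return {
        "uploads_per_client": 1,             # one stream up to the server
        "downloads_per_client": peers,       # one stream down per other client
        "channels_per_client": 1 + peers,    # equals n
        "total_channels": n * (1 + peers),   # n uplinks + n*(n-1) downlinks
    }


def p2p_channels_per_client(n: int) -> int:
    """In a p2p mesh a client sends to and receives from every other client."""
    return 2 * (n - 1)


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(n, sfu_load(n)["channels_per_client"], p2p_channels_per_client(n))
```

For 10 participants, a client handles 10 channels with an SFU versus 18 in a mesh, and more importantly it uploads 1 stream instead of 9.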

This figure represents an MCU architecture, in which the same black box server sits in the center. The difference between MCU and SFU is that in MCU the server (black box) receives the streams from the clients, blends them together into a static layout containing all the streams, and sends a copy to each client. So in this architecture, you don’t have the flexibility of a dynamic layout during a call, of opting for a specific stream resolution, of dynamically turning off particular streams, etc. But yes, it consumes very little client bandwidth compared to SFU and p2p, since each client sends one stream and receives one. In SFU, on the other hand, you get the flexibility of changing the layout dynamically (I am talking about the screen and frame size you see during a video call), opting for specific streams, turning off particular incoming audio and video, etc. Also, e2e encryption is easier to support in SFU, because the server does not need to decode or modify the streams, unlike MCU, and acts more like a simple pipe.
Okay. So much for theory. You might be thinking, “Where’s the answer to the question?” Well, Messenger, WhatsApp, etc. use the p2p architecture, because WebRTC was developed by Google keeping p2p communication in mind and IT IS OPEN SOURCE. Yes, completely free. So no additional cost for those black boxes. Now comes the answer. Due to the limitations of p2p connections discussed above (if you jumped directly to the answer, then please read the above paragraphs first :D), they are bound to put a limit on the maximum number of participants in a video call. Otherwise the end-user experience will obviously be very bad, as data consumption, RAM consumption, and app crashes rise sharply with every added participant (remember, the total number of channels grows as n²).
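The MCU trade-off can be sketched the same way in Python. The function name `mcu_load` is hypothetical, and the model assumes the simplest case described above: one static layout mixed once and copied to everyone.

```python
def mcu_load(n: int) -> dict:
    """Rough per-call work for an MCU that mixes n streams into one layout."""
    if n < 1:
        raise ValueError("need at least one participant")
    return {
        "uploads_per_client": 1,     # each client sends its own stream
        "downloads_per_client": 1,   # each client receives the mixed stream
        "server_decodes": n,         # every incoming stream must be decoded
        "server_encodes": 1,         # one blended layout is encoded
        "server_sends": n,           # a copy of the layout goes to each client
    }


if __name__ == "__main__":
    print(mcu_load(5))
```

The client side stays at one stream up and one stream down no matter how large n gets, while the decoding and mixing work lands on the server. That is why the bandwidth is so low for clients and also why the server can no longer act as a simple pipe.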
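As a back-of-the-envelope check, here is a Python sketch of how an upload budget turns into a participant limit in a p2p mesh. Both the function name and the numbers are assumptions for illustration; real apps pick their limits based on many more factors (CPU, battery, network quality and so on).

```python
import math


def max_p2p_participants(uplink_mbps: float, stream_mbps: float) -> int:
    """Largest mesh size a client can serve with a given upload budget.

    A client in a mesh of n participants uploads n-1 copies of its stream,
    so we need (n - 1) * stream_mbps <= uplink_mbps.
    """
    if uplink_mbps < 0 or stream_mbps <= 0:
        raise ValueError("uplink must be >= 0 and stream bitrate must be > 0")
    return math.floor(uplink_mbps / stream_mbps) + 1


if __name__ == "__main__":
    # Assumed values: 5 Mbps of usable upload, 1 Mbps per outgoing stream.
    print(max_p2p_participants(5, 1))
```

Under these assumed numbers the mesh tops out at 6 participants, and the participant with the weakest connection effectively sets the limit for the whole call.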
But if you are planning to develop a bigger platform like Zoom, Google Meet, Teams, etc., you will obviously need either an SFU or an MCU, depending on your requirements and other factors.
Well, that’s it for now. If you liked this article, then don’t forget to give claps (You can give 50 claps too 🙈) and share it with others. If you have any comments, points or new concepts, let me know in the comments :).
