How We Moved from SMS to Voice Messages and What’s Next

In the late 1990s, online communication looked like this: you sat down at your computer, listened to the annoying sound of the modem dialing up, opened ICQ, and typed “Hi! How’s it going?” to someone you’d never met in person. It was a miracle. Text ruled the world.

SMS added mobility. 160 characters, 10 cents per message, and you were no longer tied to a computer. People texted under their desks in class, on the subway, and in lines. Text became ubiquitous.

WhatsApp, launched in 2009, began pushing SMS aside. It was free and fast, with group chats and delivery receipts. The world of communication split into “before” and “after”. But text remained text. Without a voice. Without a face. Without emotions, except for those you put into words and the occasional emoji.

Speed vs. Politeness in Voice Messages

Voice was the first to crack the text monopoly. Not calls (they’ve always been there), but specifically voice messages. The feature, which appeared in WhatsApp around 2013, initially sparked a wave of hatred.

“Why send a 30-second voice message when you can write two lines?” “I don’t want to listen, I want to read”, “This is a waste of my time” — comments like these flooded the forums.

But voice messages survived. Why? Because they convey what text cannot: intonation, pauses, sighs, a smile in the voice. You can write “I’m fine” — and that could be true or a lie. But if you say “I’m fine” in a tired voice — the other person will understand everything without words.

Voice messages have won out in situations where speed matters: you’re walking, carrying bags, or holding a child’s hand. Typing is hard; speaking is easy. The trade-off is that they’re inconvenient to listen to during meetings or on public transportation. The conflict between these formats remains unresolved to this day.

Emojis, stickers, and a new visual language

The next step is visual. Emojis have evolved from a Japanese curiosity into a universal language. A smiley face with hearts is understood in Tokyo, Moscow, and Buenos Aires without translation. By 2015, there were over a thousand emojis in Unicode. People began to form sentences and even short stories out of them.

Stickers took it a step further. These are no longer standard icons but original images with characters, in huge custom sets created by artists. Many millennials and Gen Zers now reach for a sticker as readily as for text. It’s faster, more fun, and more emotional.

Psychologists have noticed an interesting effect: emojis and stickers have brought nonverbal communication back into written speech. When you use a crying emoji, you’re not just stating a fact (“I’m sad”), you’re showing an emotion. It’s a step back toward pre-text communication — gestures and facial expressions. But in a digital package.

Chatroulettes as the fourth wave of communication

The next logical step is video. Not pre-recorded clips (there were already plenty of those on YouTube), but live video. The person you’re talking to is right here, right now. You see their face, their room, their morning hairstyle. And they see you.

This is the fourth wave. The first was text. The second was voice messages. The third was emojis and stickers. The fourth is video chats. They combine the best of the previous ones: the speed of text (you can type in the chat next to the video), the emotionality of stickers, and the intonation of the voice. Plus, they add facial expressions and eye contact.

There are more and more video services. One example is the website Crush Roulette. You log in, turn on your camera, and the system finds you a conversation partner in a second. No profiles. Just a live face across from you.

What’s important: these services don’t try to replace real-life communication. They create a parallel environment where you can practice conversation without fear of judgment, or simply pass the evening chatting with someone from another country. Built-in translation and filters by age and country are no longer just gimmicks but industry standards.

Why have video chats become the fourth wave, rather than just an add-on? Because they solve a problem that other formats don’t: they provide a sense of presence without any obligations. You’re here and now, but if something goes wrong, you click “next” in a second. It doesn’t work that way in real life. In video chat, it does.

What comes after video

If video is the fourth wave, what will be the fifth? There are several candidates.

  1. AR (augmented reality). You’re interacting with a person, surrounded by virtual objects: you’re giving a presentation right in the air, drawing something on a table, changing the background in real time. This already exists in some apps, but it hasn’t gone mainstream yet.
  2. Avatars. You don’t show your face, but your movements are transferred to a 3D model. Just like in the movie “Avatar”, only in real time. This solves the problem of “not wanting to show your face” while preserving body language.
  3. Silent interfaces. Communication without words and even without a face. Conveying mood, energy level, and interest through biometrics. It sounds fantastical, but the first experiments are already underway.

Which of these scenarios will prevail? Most likely, none in its pure form. The evolution of communication isn’t linear. We aren’t abandoning old formats — we’re adding new ones. Text isn’t going anywhere. Voice messages aren’t dead. Emojis haven’t gone out of style. Video chats won’t replace everything else. We just have more tools now.

Three habits that will disappear in 5 years

Which communication formats will die out? Certainly not all of them, but some things will change.

  • Long text exchanges without emojis or voice. The younger generation doesn’t understand why you’d write “I’m so happy for you, that’s great news” when you can send a sticker of a dancing cat or a short voice message saying “Awesome!” Efficiency trumps verbosity.
  • Sending voice messages longer than a minute. People are losing patience. The trend is short voice messages of 10–20 seconds. Anything longer is better recorded as a note and sent as a file so the person can listen to it on fast forward. Or just call.
  • Expecting an instant reply in messaging apps. The paradox: the more communication channels there are, the more tolerant people are of delays. “He read it and didn’t reply” is no longer a disaster. If they reply in an hour, they’re busy. In a day, they forgot. In a week — well, it happens. Speed is no longer the top priority. Sincerity and appropriateness have taken the lead.

In five years, we’ll communicate differently, not because technology will solve everything for us, but because we ourselves will learn to choose the right tool for the right moment. Text for facts. Voice for emotions. Stickers for quick reactions. Video for presence. And every format will have its place.

Barsha Bhattacharya is a senior content writing executive. A marketing enthusiast and professional for the past 4 years, Barsha is new to writing, and she is loving every bit of it. Her niches are marketing, lifestyle, wellness, travel and entertainment. Apart from writing, Barsha loves to travel, binge-watch, research conspiracy theories, Instagram and overthink.
