Analysis of Foreign Language Usage in Twitter: Tunisia

It’s common knowledge that Twitter now plays a key role in communication and organising, not just for marketing but for civil society movements such as the Arab Spring and Occupy. Most of our research is for multinational corporations and governments on public policy and diplomacy, and Twitter is one of many channels we look at. This means we have to work in multiple languages, and we’d hazard a guess that very few firms deal with languages in social media the way we do. With that context set, we’ve found some interesting aspects of how languages are used on Twitter.

It is often assumed that citizens of a given country will tweet mostly, if not always, in the native language. It turns out this isn’t the case. Up to three languages may be mixed in a single tweet, along with multiple hashtags. In research into the use of Twitter in Tunisia, Haiti, Sudan and Afghanistan, we found that these languages comprise: 1) the native tongue (Arabic, Kreyol, tribal languages), 2) English and 3) the former colonial power’s language, such as Spanish or French. In the case of Tunisia (and other former French colonies) we also found that French words or numbers may be used because they “sound” like an Arabic word. Not even Google Translate can manage this level of complexity, especially when you add numerical characters and an ever-changing lexicon.

Here’s an example of a tweet from Tunisia around the recent election: “j’ai vote ta7ya tounes #TyElec #vote”. This roughly means “I voted in the Tunisia election”, and “ta7ya tounes” is Arabic for “long live Tunisia.” The spelling “ta7ya” combines French (Latin) letters with a number to make up the sound of an Arabic word. Our study looked at 16,700 distinct users, all located in Tunisia. We found that 67% of all tweets contained at least one English word, with 14% of tweets being fully English. 92% of the tweets we looked at (over a one-month period from February to March 2012) contained a hashtag. Overall, French and English dominated the tweets, which is interesting given that Tunisia is a predominantly Arabic-speaking country. We had anticipated a high level of French, with Tunisia being a former French colony.
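As a rough illustration of the kind of processing involved, here is a minimal Python sketch that pulls hashtags out of a tweet and flags tokens that mix Latin letters with digits, as “ta7ya” does. It is not the tooling used in our study. The function names, the regular expressions and the small digit-to-letter table (a commonly seen chat-alphabet convention, not something established by our data) are all assumptions made for the example.

```python
import re

# Assumption: a hashtag is "#" followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")

# Assumption: a "number-as-sound" token is a run of Latin letters and the
# digits 2-9 that contains at least one letter and at least one digit.
MIXED_TOKEN_RE = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*[2-9])[A-Za-z2-9]+\b")

# Illustrative only: digits often used to stand in for Arabic letters.
# Real usage varies by country and by writer.
DIGIT_TO_ARABIC = {"2": "ء", "3": "ع", "5": "خ", "7": "ح"}


def extract_hashtags(text):
    """Return the hashtags in a tweet, lower-cased and without the '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def find_mixed_tokens(text):
    """Return tokens that mix Latin letters and digits, ignoring hashtags."""
    without_tags = HASHTAG_RE.sub(" ", text)
    return MIXED_TOKEN_RE.findall(without_tags)


def share_with_hashtag(tweets):
    """Return the fraction of tweets that contain at least one hashtag."""
    if not tweets:
        return 0.0
    tagged = sum(1 for tweet in tweets if extract_hashtags(tweet))
    return tagged / len(tweets)


tweet = "j’ai vote ta7ya tounes #TyElec #vote"
print(extract_hashtags(tweet))
print(find_mixed_tokens(tweet))
```

A pattern match like this only finds candidates. Deciding which language each remaining word belongs to is the hard part, and that is where off-the-shelf translation tools struggle.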

Hashtags used on Twitter for civil society issues serve a complex set of communication goals. They identify: 1) event, 2) location, 3) timing and 4) opinion or view. They may also be used to establish political standing or tribal or community standing, to mark a subset of events or issues and, at times, to add another layer of context. An example of an added layer of context is the use of #Syria together with #tugov (meaning Tunisian government) when the Syrian ambassador to Tunisia was asked to leave. This event took place during the election period and was a subset issue of the national election.

In the case of Haiti we found little use of Kreyol (only 12% of all tweets examined) and a much higher use of English (68% of the tweets) than of French, which we had expected to see used more. The tweets we looked at originated in Haiti; we excluded the Haitian diaspora. This is an interesting finding for a country that until the 2010 earthquake used very little English in social media. In Sudan we found that people often use hashtags first to identify their tribe or region of Sudan. Arabic and English dominated, with very little use of tribal languages.

All of this goes to show that Twitter has become a key communication tool for people around the world. For those who don’t understand Twitter and only see the “silly” tweets, the findings we are releasing here and our other research show that Twitter plays a key role in the voice of civil society today, and we suspect that role will only increase. Building analytics with natural language processing will be a daunting task, always limited by the rapid changes in hashtags, the short life-spans of some hashtags and their evolving nature in Twitter communications. We also posit that the dominant use of English and French reflects nationals and civil society groups intending to reach an international audience, including news media.
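To make the point about hashtag life-spans concrete, here is a small Python sketch that measures how long each hashtag stays in use within a set of timestamped tweets. It is an illustration under stated assumptions, not our analysis pipeline: the function names are hypothetical, the input is assumed to be (timestamp, text) pairs, and the sample tweets and dates are invented for the example.

```python
import re
from datetime import datetime, timedelta

# Assumption: a hashtag is "#" followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_lifespans(timestamped_tweets):
    """Map each hashtag to the time between its first and last appearance."""
    first_seen = {}
    last_seen = {}
    for when, text in timestamped_tweets:
        for tag in {t.lower() for t in HASHTAG_RE.findall(text)}:
            if tag not in first_seen or when < first_seen[tag]:
                first_seen[tag] = when
            if tag not in last_seen or when > last_seen[tag]:
                last_seen[tag] = when
    return {tag: last_seen[tag] - first_seen[tag] for tag in first_seen}


def short_lived(lifespans, max_age=timedelta(days=1)):
    """Return the hashtags whose observed life-span is at most max_age."""
    return sorted(tag for tag, age in lifespans.items() if age <= max_age)


# Invented sample data, for illustration only.
sample = [
    (datetime(2012, 2, 20, 9, 0), "j’ai vote #TyElec #vote"),
    (datetime(2012, 2, 20, 18, 0), "#Syria #tugov"),
    (datetime(2012, 2, 22, 9, 0), "resultats #TyElec"),
]
lifespans = hashtag_lifespans(sample)
print(lifespans["tyelec"])
print(short_lived(lifespans))
```

A tag seen only once gets a life-span of zero, so a measure like this depends heavily on how long the collection window is and how completely it was sampled.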

This issue also adds a layer of complexity for foreign governments with digital diplomacy and public diplomacy programs that use social media. They will need to develop an understanding of the meanings and context of hashtags as they evolve and to understand how words may be played with and what the use of former colonial languages may be signalling, if anything.