What Can Hashtags Tell Us?

Hashtags (#) have become a staple of communication on Twitter, and the convention has carried over to Google+ as well. That the hashtag is now a ubiquitous part of text communication style in social media is perhaps best shown by its common use in advertising: instead of placing a web address in a TV or print ad, a major brand may simply use a hashtag. But more than #Like or #Fail on a brand, what can hashtags really tell us? Turns out, quite a lot.

So we know hashtags are popular to define a location, emotion or subject matter for discussion. But they can provide a lot of “contextual information” beyond basic emotions or brands. As we conduct research across social media channels daily, we’ve gathered a library for the Natural Language Processing element of our software on an international scale. With over 45,000 hashtags, we’ve been able to dig a little deeper and gain some insights. We’ll share some key ones.

The Obvious Hashtags:

Location: As in #Halifax or #LAX or #YYC (Airport codes are quite commonly referenced by Twitter users)

Event: As in #SXSW11 for the South by Southwest event. An event can be “at the moment” or ongoing/leading up to a specific time, such as #Election2012 for the US Election (great guide here to those hashtags), or #TyElec for the recent election in Tunisia.

Brand: Most brands have hashtags associated with them. If the sentiment is negative or positive, the brand hashtag may be accompanied by #Fail or #Win.

Emotion: A very common one is #Fail and then the likes of #Happy or #Love or #Smile – these vary across the spectrum.

The Deeper Elements & Building Context

But hashtags can give us a far deeper set of insights, including, in some instances, that a “tweet” or message may carry multiple meanings and have several different target audiences. For example, messages in French, Arabic and English from countries experiencing civil unrest are often aimed at foreign governments and news agencies as much as locals.

As an example, during the recent Tunisian election, it was common to include the hashtag “#zaba”, a reference to the recently ousted president. It served as a reminder to anyone viewing the tweet of why the election was being held, and as a badge of support. This is an element of context and leads to the next challenge: analysing hashtags. A single tweet or a group of tweets can start to add a deeper sense of “place” and context by indicating timing, immediate and surrounding locations and how an event is unfolding, or by providing an indicator of what is about to happen.

The Challenge of Analysing Hashtags

The obvious ones are easy such as #Fail or #Love, but hashtags can evolve and we often see “groupings”. Again to reference our research on Tunisia, the hashtag #Gonhim had 39 variations such as #Ghonim or #Ghanim…each referencing the same extremist preacher. With the Icelandic volcano a couple of years ago, the clever #ashtag was popular, but so was simply #volcano. Hashtags also tend to evolve very rapidly, adding or deleting characters or becoming entirely new ones.
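To make the grouping problem concrete, here is a minimal Python sketch of one way spelling variants of a hashtag could be clustered. It is an illustration only, not a description of the software used in this research. The similarity measure (the standard library's difflib), the 0.75 threshold and the function names are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def normalize(tag: str) -> str:
    """Lowercase a hashtag and drop the leading '#'."""
    return tag.lstrip("#").lower()


def similar(a: str, b: str, threshold: float = 0.75) -> bool:
    """True when two normalized tags are close enough to count as variants.

    The threshold is an illustrative assumption, not a tuned value.
    """
    return SequenceMatcher(None, a, b).ratio() >= threshold


def group_variants(tags, threshold: float = 0.75):
    """Greedy grouping: a tag joins the first group containing a similar tag."""
    groups = []
    for tag in tags:
        key = normalize(tag)
        for group in groups:
            if any(similar(key, normalize(member), threshold) for member in group):
                group.append(tag)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append([tag])
    return groups


tags = ["#Gonhim", "#Ghonim", "#Ghanim", "#volcano", "#ashtag", "#Fail"]
groups = group_variants(tags)
for group in groups:
    print(group)
```

With these six tags the three name variants end up in one group and the others stay separate. Note that purely string-based matching would never link #ashtag to #volcano; that connection is contextual and still needs a human or a curated library.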

For researchers this presents a number of challenges. For marketers it is fairly easy around brands. For those monitoring or researching civil society issues such as elections, civil actions (e.g. #Occupy or #OWS) or social commentary, it is vastly more complex. Multiple languages may be used, and the speed of change and intensity add another dynamic. Yet hashtags contain a rich source of material and can lead to social media channels where a thread can be followed and deeper insight gained.

Analysis of Foreign Language Usage in Twitter: Tunisia

It’s common knowledge that Twitter today can play a key role in communications and organising for not just marketing, but civil society issues, such as the Arab Spring and Occupy. Most of our research is for multi-national corporations and governments around public policy and diplomacy. As part of this, Twitter is one of many channels that we look at. This means we have to work in multiple languages. And we’d hazard a guess that very few firms deal in languages in social media like we do. With that context set, we’ve found some interesting aspects to the use of Twitter around languages.

It is often assumed that citizens of the country in question will use the native language mostly, if not always. Turns out this isn’t the case. Languages will be mixed, with up to three languages in a single “tweet” along with multiple hashtags. In research into the use of Twitter in Tunisia, Haiti, Sudan and Afghanistan we found that these languages will comprise: 1) the native tongue (Arabic, Kreyol, tribal languages), 2) English and 3) the former colonial power’s language, such as Spanish or French. In the case of Tunisia (and other former French colonies) we also found that French words or numbers may be used that “sound” like an Arabic word. Not even Google Translate can manage this level of complexity, especially when you add numerical characters and an ever-changing lexicon of words.

Here’s an example of a tweet in Tunisia around the recent election: “j’ai vote ta7ya tounes #TyElec #vote”, which basically means “I voted in the Tunisia election”, while “ta7ya tounes” is Arabic for “long live Tunisia.” The word “ta7ya” combines Latin letters, as used in French, with a number to make up the sound of an Arabic word. Our study looked at 16,700 distinct users, all located in Tunisia. We found that 67% of all tweets contained at least one English word, with 14% of tweets being fully English. 92% of the tweets we looked at (over a one-month period from February to March 2012) contained a hashtag. Overall, French and English dominated the tweets, which is interesting given that Tunisia is a predominantly Arabic-speaking country. We anticipated a high level of French, with Tunisia being a former French colony.
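As a rough illustration of the kind of check involved, the Python sketch below extracts hashtags from a tweet, flags tokens that mix letters with digits in the way “ta7ya” does, and computes the share of tweets carrying at least one hashtag. It is a simplified example, not the method used in the study. The set of digits treated as sound substitutes and all function names are assumptions made for illustration.

```python
import re

HASHTAG = re.compile(r"#(\w+)")

# Assumption: digits commonly substituted for Arabic sounds in Latin-script text.
SOUND_DIGITS = set("23579")


def hashtags(tweet: str) -> list[str]:
    """Return the hashtags in a tweet, without the leading '#'."""
    return HASHTAG.findall(tweet)


def mixed_tokens(tweet: str) -> list[str]:
    """Return words that combine letters with a sound-substitute digit.

    Hashtags and @mentions are skipped so that tags like #Election2012
    are not mistaken for transliterated words.
    """
    found = []
    for token in tweet.split():
        if token.startswith(("#", "@")):
            continue
        word = token.strip(".,!?;:\"'")
        has_letter = any(ch.isalpha() for ch in word)
        has_digit = any(ch in SOUND_DIGITS for ch in word)
        if has_letter and has_digit:
            found.append(word)
    return found


def hashtag_share(tweets: list[str]) -> float:
    """Fraction of tweets that contain at least one hashtag."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if hashtags(t)) / len(tweets)


tweet = "j'ai vote ta7ya tounes #TyElec #vote"
print(hashtags(tweet))      # ['TyElec', 'vote']
print(mixed_tokens(tweet))  # ['ta7ya']
```

A check like this only finds candidate words. Deciding which language each word belongs to, and what the mix signals, still depends on a lexicon and on human review.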

Hashtags used in Twitter for civil society issues serve a complex set of communication goals. They are used to identify: 1) Event, 2) Location, 3) Timing, 4) Opinion or view. They may also be used to establish political standing, tribal or community standing and a sub-set of events or issues, and at times to add another layer of context. An example of an added layer of context is the use of #Syria and #tugov (meaning Tunisian government) when the Syrian ambassador to Tunisia was asked to leave. This event took place during the election period and was a subset issue of the national election.

In the case of Haiti we found little use of Kreyol (only 12% of all tweets examined) and a much higher use of English (68% of the tweets) than of French, which we had expected to be widely used. The Haitian tweets we looked at were from Haiti and we excluded the Haitian diaspora. This is an interesting finding for a country that until the 2010 earthquake used very little English in social media. In Sudan we found that hashtags are often used by people to first identify their tribe or region of Sudan. Arabic and English dominated, with very little use of tribal languages.

Conclusion

All of this goes to show that Twitter has become a key communication tool for people around the world. For those who don’t understand Twitter and only see the “silly” tweets, this finding and our other research show that Twitter plays a key role in the voice of civil society today, and we suspect that role will only increase. Building any analytics with natural language processing will be a daunting task, always limited by the rapid changes in hashtags, the short life-spans of some hashtags and their evolving nature in Twitter communications. We also posit that English and French dominate because nationals and civil society groups intend to reach an international audience, including news media.

This issue also adds a layer of complexity for foreign governments with digital diplomacy and public diplomacy programs that use social media. They will need to develop an understanding of the meanings and context of hashtags as they evolve and to understand how words may be played with and what the use of former colonial languages may be signalling, if anything.

Dangerous Assumptions About Social Media in Developing Nations

An assumption we’ve seen made by some large international aid organisations and Western governments about citizen use of social media in developing nations is that their citizens don’t use it. The reasoning is that because official literacy rates are low and broadband Internet access is assumed to only reach the elites, the general population isn’t engaged. This is a very dangerous assumption and one we’ve shown to be wrong on a number of occasions. Here are our findings:

1. Underground Internet: In projects with partner research firms and from other independent findings we often uncover a larger than expected “underground Internet” population. These are people who access the Internet from Internet cafes and pirated connections. In countries like Haiti, Iraq and Afghanistan, it is not uncommon for someone in an apartment complex or neighbourhood to buy a DSL or high-bandwidth line and then rent access to others around them, either wirelessly or through a hard-wired router. Assuming that non-elites can’t buy or aren’t buying PCs is again a misconception. There is access to these machines through black markets and retailers. Granted, they may be running Windows 98 and IE4, but they can still access online forums and basic services, which is enough for people to engage in online dialogue.

2. Assumed Illiteracy: On-the-ground surveys and “official” reporting by a government may point to high illiteracy rates. However, this is not always the case. Measurement of literacy is questionable. People learn character recognition and acquire literacy by many different means. Often, programs run by NGOs, especially religious NGOs, are not counted, yet they are increasing the literacy rate far faster than a foreign government might assume. In our research projects we found computer literacy rates to sometimes be more than 20% higher than what the government officially reported. A developing nation’s government that reports low literacy rates also helps ensure more aid funding for its education programs.

3. The Mobile & Wireless Connection: In many cases, developing nations completely bypass landline infrastructure and go straight to wireless mobile infrastructure. The systems that get installed range from EDGE to 3G networks, with many being 3G. Smartphones are affordable, as are data packages, more so than in western nations where data packages are often more costly. Coupled with real literacy rates, the accessibility of mobile devices in developing nations translates to quick uptake of social media apps like Twitter, Facebook and others.

4. The Facebook Delusion: We see this quite often: an assumption by a Western NGO or government that, because the Facebook users in a developing nation are primarily elites, non-elites are not connected to social networks. The reality is that outside of Western and developed nations, Facebook is often not the primary social network. In fact, Facebook will often be far down the list. Those in developing nations and other parts of the world will likely use a social network more integrated with their culture, like Latin Americans using Orkut ahead of Facebook or Haitians preferring forums over Facebook.

5. False Frame of Reference Assumptions: In developed nations, we primarily use Facebook, LinkedIn, Twitter, Flickr and Blogger or WordPress as our top social media apps. It is often assumed that these are the only social media channels available. The truth is that there are literally thousands of other tools out there for blogs, images, videos and microblogs, as well as social networks. When an NGO or government agency doesn’t see activity in a quick Facebook search (and by the way, search in Facebook is terrible), they assume there is little to no engagement. It is natural for people to make assumptions based on their known frames of reference.

6. The Unconsidered Digital Diaspora: Almost every developing nation has diaspora; sometimes first generation refugees, and often second, third or fourth generation. Regardless, there are always diaspora connected to their country of origin. These communities often collect information from families and friends living in the home nation and then communicate events and issues via social media platforms. This can be a rich source of information often untapped and unrealised by their host nation.

As a result of these assumptions, larger NGOs and governments may miss several key opportunities that could help them a) improve aid delivery, b) engage in deeper digital diplomacy and c) better understand the situation on the ground, both politically and in aid terms. This gap in understanding can’t be laid solely at the feet of government centres like Ottawa, Washington or London. Such assumptions may also be held by staff in the central cities where field headquarters are located, who may not be as connected to the ground as is sometimes assumed.

We see this as a transitional phase in truly understanding the impact of the social web in the developing and developed world. As many people in government do not use social media tools for more than entertainment and family communications, it is easy to assume that is how others use these tools. These are complex times and the communications dynamic is shifting daily and weekly. A lot has to be learned and just blindly jumping in can also be dangerous.

The Web is a Fractured Place Making Analytics Harder

Worried about snooping marketers watching your every move online? Fearing total loss of privacy through your online activities and the apps you use? Well, the truth is that you have less to fear than you think, and marketers and researchers are not able to snoop as much as they would like. There are several reasons why businesses and governments don’t know as much about you as you might think. Conspiracy theories and fears on this are largely irrational and here’s why.

1. The Internet is Fractured: Although it is easy to think of the Web as a single, unified and harmonious system, it’s actually not. In fact, it’s far from it. There are sites you cannot access because of country restrictions (e.g. Hulu in the USA only). Governments may restrict certain sites (e.g. the China firewall), forms of content or access to certain apps. Some areas of the Internet, known as the Deep Web, require passwords and manual access, so no search bots or social media analysis tools go there. Then there is the increasing number of apps accessed through smartphones and tablet devices. While these apps use the Internet as their backbone, they may be self-contained, and the data they feed to third parties is limited or in “bulk”, so personalization is limited. These new “silos” are growing ever more popular for games or productivity tools.

2. Social Media Analytics and Monitoring Software is Terrible: Some tools, like Radian6 or Visible Technologies, are reasonably good. But they are the first generation of these monitoring tools. They face a number of challenges, not least of which are limits on the number of social media apps they can follow and on how much “data” they can suck out of those channels (e.g. the Twitter firehose). They often have limited filters and are restricted to keyword searches, and their sentiment analysis is based on keywords, not contextual understanding of phrases and sentences. They deliver some nice dashboards, but that is just more information, not intelligence. Most have now implemented “engagement tools” in their software, downplaying the analytics and focusing on engagement with people. But they are limited.

3. Web Analytics Hasn’t Evolved Much: There are many more Web analytics tools than social media monitoring (or reputation management) apps out there. Best known of course is Google Analytics, followed by WebTrends. They are fairly good, but really only tell the story in aggregate. When ISPs use bulk-assigned IP addresses, for instance, the data doesn’t tell you much about individuals. You’ll never know exactly “who” is on your website and “why”. These tools fail on “intent” and “context”. They help, but not as much as some might hope or consumers fear. They can add good value to SEO and advertising, but they aren’t a silver bullet. Here’s a good blog post on the WAO/Factor about analytics we tend to agree with.

4. Market Confusion: Then there’s the human factor of a capitalist free market. There are many competitors and conferences, and it’s hard to know which tool really is the best. As a result, marketers, analysts, researchers and so on are left finding a tool or service that generally fits their business. And there are still many traditional industries and sectors that pay scant attention to the available information online, because there really is no clear route. Competition is great and it drives innovation, but it can also lead to greater confusion.
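The limitation of keyword-based sentiment analysis noted in point 2 above can be shown with a small Python sketch. This is a deliberately naive scorer written for illustration, not the algorithm of any named product. The word lists and the function name are assumptions.

```python
# Hypothetical keyword lists, for illustration only.
POSITIVE = {"love", "win", "happy", "great"}
NEGATIVE = {"fail", "hate", "terrible", "bad"}


def keyword_sentiment(text: str) -> int:
    """Score text as positive keyword count minus negative keyword count.

    Word order, negation and sarcasm are ignored entirely, which is
    exactly the weakness being illustrated.
    """
    words = [w.strip(".,!?#").lower() for w in text.split()]
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


print(keyword_sentiment("I love this brand"))            # 1
print(keyword_sentiment("I do not love this brand"))     # 1, negation is missed
print(keyword_sentiment("Great, another outage #fail"))  # 0, sarcasm cancels out
```

The second and third sentences are clearly negative to a human reader, yet the scorer rates one as positive and the other as neutral. This is the gap between keyword matching and contextual understanding.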

So then, if the Web is so fractured and analytics tools not so good, what are the options? We’ve found the answer is to take a realistic view of what analytics tools can actually accomplish and to combine that with humans. It’s why we’ve always employed the hybrid approach and always will. We have our own software, which helps, but the human element is still needed. Humans understand humans far better than machines probably ever will. Right now, and likely for a long time yet, we’re adding more digital content than anyone or any software can possibly analyse effectively.

Language Use in Digital Diplomacy Via Social Media

In our research projects on the use of social media in Public Diplomacy and Digital Diplomacy, we’ve noted some interesting aspects around what language people will use in their primary communications. This is important, as the language being used in a social media channel can be a prime indicator of “who” a message or communication is aimed at. For example, with the ongoing Syria crisis and in the Egypt crisis of 2011, we would see abrupt changes in the primary language used, especially in video content, between Arabic and English. In 87% of the videos analysed on video channels such as YouTube and Vimeo relating to the Syria crisis, English was used, especially in narrated videos. If the intent of the authors was to reach an Arabic audience, they would use Arabic, but instead they used English. A similar pattern evolved with the Egypt crisis of 2011. We’ve noted similar patterns in Sudan and Haiti.

English is the primary language used online and certainly the main language an organisation would use to gain the attention of western news media and governments. Tags on videos, blog posts or images and hashtags on Twitter are predominantly English. It is also worth keeping in mind that the top social media channels such as Twitter, blogging platforms or YouTube are Western tools delivered mostly in English.

Digital diplomacy is not just the domain of governments. It is a powerful soft power tool used by well organised non-state actors and ad-hoc groups to gain attention not just from western governments and news media, but from the general population and perhaps diaspora communities where the original native tongue is not spoken as much, such as third generation diaspora. Understanding language usage can be an important element of defining primary and secondary messages to various audiences. As more governments and state/non-state actors engage in these back-channel public diplomacy tactics, new subtleties and dynamics will begin to emerge in the world of digital diplomacy.