Topic and Sentiment Analysis on OSNs: a Case Study of Advertising Strategies on Twitter

Shana Dacres, Hamed Haddadi, Matthew Purver
Cognitive Science Research Group
School of Electronic Engineering and Computer Science
Queen Mary University of London, UK
firstname.lastname@eecs.qmul.ac.uk

arXiv:1312.6635v1 [cs.SI], 23 Dec 2013

ABSTRACT

Social media have substantially altered the way brands and businesses advertise: Online Social Networks provide brands with more versatile and dynamic channels for advertisement than traditional media (e.g., TV and radio). Levels of engagement in such media are usually measured in terms of content adoption (e.g., likes and retweets) and sentiment around a given topic. However, sentiment analysis and topic identification are both non-trivial tasks.

In this paper, using data collected from Twitter as a case study, we analyze how engagement and sentiment in promoted content spread over a 10-day period. We find that promoted tweets lead to higher positive sentiment than promoted trends, although promoted trends pay off in response volume. We observe that levels of engagement for the brand and promoted content are highest on the first day of the campaign, and fall considerably thereafter. However, we show that these insights depend on the use of robust machine learning and natural language processing techniques to gather focused, relevant datasets and to accurately gauge sentiment, rather than relying on the simple keyword- or frequency-based metrics sometimes used in social media research.

Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing - Linguistic processing; I.2.7 [Artificial Intelligence]: Natural Language Processing - Text analysis

General Terms
Algorithms, Measurement

Keywords
Sentiment analysis, Topic analysis, Social Media, Online Advertising, User Engagement

1. INTRODUCTION

Online Social Networks (OSNs) such as Facebook, Twitter, and YouTube have emerged as highly engaging marketing and influence tools, increasingly used by advertisers to promote brand awareness and catalyze word-of-mouth marketing. Researchers have also long recognised the effectiveness of OSNs as a rich source for understanding the spread of information about the real world [20]. For example, Asur et al. [1] analyzed Twitter messages (tweets) to predict box-office ratings for newly released movies. Their findings show that OSNs can be used to make quantitative predictions that outperform those of market forecasts, by focusing on the sentiment expressed in the tweets. Brands also now recognise the potential of OSNs for gathering market intelligence and insight. In 2012, Twitter announced that 79% of people follow brands to get exclusive content (http://advertising.twitter.com/2012/05/twitter4brands-event-in-nyc.html). This provides the opportunity for brands to participate in real-time conversations to listen to and engage users, respond to complaints and feedback, drive consumer action and broadcast content. Understanding the real engagement of end users with brands and their OSN presence has given rise to a number of data analytics, sentiment analysis and social media optimisation startups and academic research projects. However, the techniques required pose a number of challenges and pitfalls often ignored by researchers and analysts, and adopting a particular method naively can lead to problems. Significant progress in Natural Language Processing (NLP) and Machine Learning (ML) has produced models for topic modelling designed for social media [23], and high accuracies in sentiment detection (e.g. [26]), even with the possibility of detecting sarcasm [11]; but care still needs to be taken when using and relying on the relevant tools and techniques straight out of the box [8].
In this study we present a focused case study, examining the content and volume of users' brand engagement on OSNs to determine the effect of the choice of promotion channel on a brand's influence. We do this by analysing the engagement level of Twitter users, their adoption of brand hashtags, and the sentiment they express, to determine the similarities and differences between two separate advertising strategies on this network: promoted tweets, and promoted trends. (In this work, engagement is defined as adoption of the content by, e.g., replying to a tweet, mentioning the brand name, or including the hashtag in a tweet; we are not able to measure external engagement such as sharing content on other OSNs, or clicking on the links in the tweet.) We pose a number of questions regarding brands and advertising on OSNs: How does the sentiment for a promotion strategy spread over time? What are the engagement levels for each day of promotion? What is the engagement level (e.g. retweets and mentions) for promoted brands, and how do these affect the sentiments expressed towards a brand?

In order to answer these questions, we use Twitter's Streaming API service to collect engaged users' profiles and tweets in regards to promoted influences (tweets and trends) over a busy 10-day shopping period for a selection of brands across different industries. (Data availability limits us to effects within OSNs; we cannot determine effects on actual clicks or sales.) We observe the need to accurately filter the resulting tweets for topical relevance, and compare simple keyword-based methods with a discriminative Machine Learning (ML) approach. We then classify the tweets by sentiment (positive, negative or neutral), and again compare a range of existing methods and tools. We then use this data to establish the driving factors behind the success of promoted influences and the differences between advertising strategies. For both tasks, the choice of classification method makes a significant difference, highlighting the care that must be taken when choosing techniques for this kind of analysis.

The rest of the paper is organized as follows: In Section 2 we present some recent related studies. In Section 3 we describe our case study, dataset and its characteristics. In Section 4 we briefly discuss our sentiment analysis and text classification methodology and the challenges which only become apparent upon thorough manual inspection of the data. Section 5 presents our results and the insights gained from our analysis. We conclude the paper and present potential future directions in sentiment and content analysis in Section 6.

2. RELATED WORK

Influence on OSNs

Our primary interest in this work is in understanding the factors which govern the effectiveness and influence of campaigns on OSNs. Several recent studies have examined individuals' influence on OSNs [3], and the effectiveness of online advertising [4, 2], but little attention has been paid to identifying the driving factors behind a brand's influence on its social audience (although it has been noted that brand names are more important online for some categories [6]). Cheung et al. [5] examined the way information spreads differently within social networks as opposed to word-of-mouth (WOM) broadcasting, by focusing on electronic word-of-mouth (eWOM), showing comprehensiveness and relevance to be the key influences on information adoption. The closest work to ours in understanding brands on Twitter is the study by Jansen et al. [10], who found that 20% of tweets that mentioned a brand expressed a sentiment or opinion concerning that company, product or service. Here, we examine and compare such mentions and sentiments across different promotion strategies available to brands on Twitter, thus specifically investigating advertising effectiveness (see Section 3).

In a study on the spread of hashtags within Twitter, Romero et al. [24] used over 3 billion tweets from 2009-2010 to analyze sources of variation in how the most widely used hashtags spread within its user population. Their results suggested that the mechanism that controls the spread of hashtags related to sports or politics tends to be more persistent than average; repeated exposure to users who use these hashtags affects the probability that a person will eventually use the hashtag more positively than average. However, they only examined hashtags that succeeded in reaching a large number of users.
In regards to the focus of promoted influences within Twitter, this raises the question: what distinguishes a promoted item that spreads widely, possibly with positive sentiment, from one that fails to attract attention or is associated with mainly negative sentiment? Our study aims to answer this by examining the sentiment and spread of tweets in relation to brands' promoted items.

Analysis Methods

Sentiment analysis has been approached across many domains, including products, movie reviews and newspaper articles as well as social media (see e.g. [18] for a comprehensive overview). Typically, the methods employed depend either on existing language resources (e.g. sentiment dictionaries or ontologies) or on machine learning from annotated datasets. The former can provide deep insight, but are somewhat inflexible in the face of the non-standard and rapidly changing language used on OSNs, for which few suitable linguistic resources currently exist. The latter are more scalable and can be trained on relevant data (e.g. [14]), but generally depend on large amounts of manual annotation (expensive and often problematic in terms of accuracy) and, in some cases, on the existence of grammatical resources for the language and text domain in question (e.g. [26]). However, some approaches leverage the existence of implicit labelling in the datasets available (distant supervision) to avoid the necessity for manual annotation: for example, user ratings provided with movie or product reviews [19, 4], or author conventions such as emoticons and hashtags on OSNs [7, 17, 21].
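To make the distant-supervision idea concrete, the following sketch (our own illustration, not code from the cited studies) shows how emoticons can act as noisy labels: a tweet carrying a positive or negative emoticon becomes a labelled training example, with the cue removed, so no manual annotation is required.

# Minimal distant-supervision sketch: emoticons as noisy sentiment labels.
POSITIVE = {":)", ":-)", ":D"}
NEGATIVE = {":(", ":-(", "D:"}

def distant_label(tweet_text):
    """Return (cleaned_text, label), or None when the tweet carries no emoticon cue."""
    tokens = tweet_text.split()
    if any(t in POSITIVE for t in tokens):
        label = "positive"
    elif any(t in NEGATIVE for t in tokens):
        label = "negative"
    else:
        return None
    # Strip the cue so a classifier trained on these examples cannot simply memorise it.
    cleaned = " ".join(t for t in tokens if t not in POSITIVE | NEGATIVE)
    return cleaned, label

# Example: distant_label("loved the new advert :)") -> ("loved the new advert", "positive")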
Hybrid approaches also exist, e.g. the use of predefined sentiment dictionaries with weights learned from data (e.g. [27]).

Identifying the topic of text has also received much attention in NLP research, with methods ranging from the use of existing topic resources or ontologies (e.g. [12]) to unsupervised models for the discovery of topics (e.g. [23]). The use of machine learning to detect the relevance (or otherwise) of text to a known topic also has a long history, perhaps most well known in the form of Naive Bayes filtering for spam [25].

However, research into OSN behaviour or influence sometimes ignores the spread of sophisticated methods available. Sentiment analysis is often performed based on predefined dictionaries (e.g. [28]), and topic identification is often ignored, with datasets filtered purely on keywords or simple Boolean queries. Recently, Goncalves et al. [8] examined the difference in performance across various sentiment analysis approaches on online text, finding significant variations. The effect of these variations on a specific analysis problem is less clear, though: how much does the variation in sophistication (and accuracy) of these methods actually matter? Quercia et al. [22] compared statistical and lexicon-based methods and found significant differences at the level of individual messages, although the methods correlated at the level of their intended analysis (user profiles). Here, we investigate the effect when considering individual advertising campaigns (promoted items). For text relevance, we compare the use of keywords to Naive Bayes classification via Weka [9]. For sentiment analysis, we examine three existing and freely available tools: the widely-used Data Science Toolkit's text2sentiment (http://www.datasciencetoolkit.org/), based on a sentiment lexicon [16]; the lexicon-based but data-driven hybrid SentiStrength [27]; and a statistical machine-learning-based approach, Chatterbox's Sentimental (http://sentimental.co/; see [21]).

3. DATA COLLECTION

We set up a crawler to use the Twitter Streaming API (https://dev.twitter.com/docs/streaming-apis) to collect the tweets of interest and all associated metadata (e.g., ID, username, user's social graph), with the details stored in a MySQL database. In this section we briefly describe our dataset and data collection strategy.

Identifying promoted brands

Twitter distinguishes promoted tweets and trends by the use of a "Promoted" tag. We collected tweets from 11 brands with an active advertising campaign during our study period, across different industry domains, ranging from entertainment to health-care. For each promoted item, the brand name was used to crawl Twitter for tweet data posted in English over a 10-day period. If the promoted item also included a hashtag, the hashtag was also included in the parameters of the crawl's GET function. This included all tweets that contained keywords such as #BrandName, @BrandName, BrandName, #PromotedHashtag and other brand-related terms. These parameter values were selected to keep the dataset both relevant to brand-related tweets and manageable for searching purposes. Followers and following information was also tracked on a daily basis for each brand.

Details of the selected brands and their promotion type are provided in Table 1. Given that we were interested in promoted items for branding purposes, a range of different brands from different industries were selected. The aim was to include both major and small brands when selecting promoted items. In addition, a major brand and a small brand enable a comparison of sentiment while weakly controlling for follower count.

Industry        Promotion type   Brand
Electronics     Promoted tweet   International CES
                Promoted tweet   SONY
                Promoted trend   Nintendo UK
Travel          Promoted tweet   Marriott
Entertainment   Promoted tweet   BBC One
Automobile      Promoted trend   Vauxhall
Health Care     Promoted tweet   Patients Like Me
Retail          Promoted trend   ASOS
                Promoted trend   PepsiMax
                Promoted tweet   JRebel
Telecomms       Promoted trend   O2 Network

Table 1: Industry sectors and sample brands.
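As an illustration of the collection parameters just described, the sketch below (ours; the brand entries and field names are examples, not the authors' actual crawler configuration) builds the keyword list tracked for a promoted item and tests whether a collected tweet matches it.

# Illustrative keyword parameters for the crawl; values taken from Table 1 / Section 4.
BRANDS = {
    "O2 Network": {"handle": "@O2", "hashtags": ["#O2WhatWouldYouDo"], "names": ["O2"]},
    "ASOS": {"handle": "@ASOS", "hashtags": ["#AsosSale"], "names": ["ASOS"]},
}

def track_terms(brand):
    """Keyword parameters for the crawl: #BrandName, @BrandName, BrandName, #PromotedHashtag."""
    spec = BRANDS[brand]
    return [spec["handle"]] + spec["hashtags"] + spec["names"] + ["#" + n for n in spec["names"]]

def matches_brand(text, brand):
    """Naive substring match; Section 4.1 shows why this alone is too broad and needs ML filtering."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in track_terms(brand))

# Example: matches_brand("Get 3G where I live... #O2WhatWouldYouDo", "O2 Network") -> True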
Dataset

We identified different industries' promoted items for 10-day periods between 17th December 2012 and 7th January 2013. We used non-parallel crawling periods in order to avoid the query limits set by the Twitter API. In total, around 180,000 individual tweets were collected by crawling Twitter continuously, excluding December 21st 2012, when there was a 6-hour outage in the crawler API. The crawler collected tweets from around 120,000 different Twitter users engaged in spreading the promoted tweets and trends. Tweets across all topics and with no geographical limits were gathered, as long as they featured the brand's name/hashtag. When a brand had more than one directly relevant hashtag, e.g., #Coke and #CocaCola, we included all the relevant hashtags.

Twitter users do often repeat their tweets to benefit from repeated exposure. However, in order to remove noise and bias in analysis caused by spam tweets, we removed users who had posted the exact same tweet more than 20 times during our measurement periods, along with their tweets. Twitter users, tweets and tweet timestamps were also cross-analysed to check for spamming accounts. In one case a single user was removed for adding over 8,000 spam tweets to the database. After manual inspection of many tweets and accounts, we are confident that nearly all spam has been removed from our dataset.
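A minimal sketch of this de-duplication rule is shown below. It assumes tweets are held as dicts with "user" and "text" fields (our naming, not the paper's schema), and drops every tweet from any account that repeated an identical message more than 20 times.

from collections import Counter

def remove_spammers(tweets, max_repeats=20):
    """Drop all tweets from users who posted an identical message more than max_repeats times."""
    repeats = Counter((t["user"], t["text"]) for t in tweets)
    spammers = {user for (user, text), n in repeats.items() if n > max_repeats}
    return [t for t in tweets if t["user"] not in spammers]

# Example: an account repeating one message 8,000 times is removed entirely, tweets and all.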
4. TEXT PROCESSING & CLASSIFICATION

In this section we present the details of our tweet classification (using ML) and sentiment analysis (using existing NLP tools).

4.1 Topic Classification

One of the major challenges while cleaning the dataset and removing spam was ensuring topic relevance. Our expectation was that this would not be an issue: as in much previous work, our study looks at all sentiment expressed towards the brands, as long as the tweet matched the parameters of the tweet selection explained in Section 3. However, whilst sampling tweets for spammers, a general problem surfaced. We found that a keyword-based approach tends to be too broad to accurately identify tweets referring to a particular brand, O2 (a UK mobile telecommunications provider and network). Our parameters for collecting tweets for this brand were to match tweets containing #O2WhatWouldYouDo and O2 (the hashtag being promoted was #O2WhatWouldYouDo, and @O2 is the official brand Twitter handle). Over the 10-day period, 90,000 tweets were collected that matched these keywords. However, examining a random sample of 200 tweets from this dataset showed that over 70% were not referring to the O2 Network brand; many were referring to the "O2 Academy" (a chain of concert venues), the "O2 Arena" (a dome-shaped entertainment venue in London), or other senses of 'O2' such as oxygen. We also noticed that Twitter users have recently established a new way of using the letter sequence 'O2' as a replacement for the letters 'to': e.g. "CokeWave Thang What Picture You Want Me O2 Put As My BackGround", "what im goin o2 do o2day". Experiments with Boolean combinations of O2 with other keywords were not successful. A major challenge therefore becomes to filter out non-brand-related tweets automatically: the problem is not trivial, given the variability and unpredictability of language, vocabulary and spelling on Twitter, and the short length of tweets (up to 140 characters); and manual removal of approximately 70% of a large dataset is prohibitively labour-intensive.

We therefore approached this as a text classification problem and investigated various supervised machine learning approaches using the Weka toolkit [9]. First, we performed a pilot study over a 200-tweet development set to determine a suitable feature representation and classification method; the data was manually labelled as O2-related or otherwise to give a binary decision problem. We tested a variety of classifiers including Naive Bayes, Naive Bayes Multinomial, ID3, IBk and J48 decision trees; features were based on the tweet text using a standard bag-of-words representation (see e.g. [13]) with various scaling methods (using Weka's StringToWordVector filter for text feature extraction and scaling), with the addition of user ID and date of tweet. Given the small size of the dataset, we restricted the feature space to the most common 100 words. We also tested using a simple manual keyword-based filter to remove some common negative instances (using keywords "arena", "academy", etc.) before training (see "manually filtered" results in the figures). Tests were performed using ten-fold cross-validation in order to simulate performance on unseen data. Best performance (overall accuracy) was obtained using only bag-of-words text features, with stopwords removed and a TF-IDF weighting, after manual filtering. The best performing classifiers in cross-validation were J48 and Naive Bayes (NB), with 71% and 91% accuracy respectively. We then compared their performance on a held-out test set: the NB model outperformed the J48 model, with 84% accuracy compared to 71% for J48, and with training and prediction also noticeably faster for NB (the tree structure of the J48 model made it very slow with larger training sets).

To determine a suitable training set size, we then varied the training set while testing performance on a held-out test dataset of 30 manually labelled tweets. Increasing training set size improved performance (see Figure 1): we tested up to a 2,000-tweet training set; while the curve suggests performance may improve beyond this point, the accuracy on the held-out test set is approaching that on the training set, so large improvements are unlikely. The NB classifier trained on 2,000 tweets was therefore used for the experiments below. Figure 2 shows results when tested on a larger, unseen, randomly selected test set of 100 tweets; the version with manual filtering achieves 78% accuracy, 77% recall and 66% precision. Figure 3 gives details of the per-class predictions: without manual filtering, false positives are more common than false negatives (i.e. too much irrelevant data is slipping through); levels are much closer with filtering.

Figure 1: NB accuracy with increasing training data.
Figure 2: Classification results using Naive Bayes.
Figure 3: Classification details per class using Naive Bayes.
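The experiments above were run in Weka; purely as an illustration of the same pipeline shape (bag-of-words limited to the 100 most common terms, stopword removal, TF-IDF weighting, Naive Bayes, ten-fold cross-validation), a rough scikit-learn equivalent might look like the sketch below. This is our own analogue for readability, not the authors' implementation.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

def build_topic_classifier():
    """Top-100 bag-of-words features, stopwords removed, TF-IDF weights, Naive Bayes."""
    return make_pipeline(
        TfidfVectorizer(stop_words="english", max_features=100),
        MultinomialNB(),
    )

def cv_accuracy(texts, labels):
    """Ten-fold cross-validated accuracy for the binary O2-related / off-topic decision."""
    return cross_val_score(build_topic_classifier(), texts, labels, cv=10, scoring="accuracy").mean()

# texts: list of tweet strings; labels: 1 = brand-related, 0 = off-topic (manually annotated).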
4.2 Sentiment Analysis

Having identified tweets with relevant content, we now required a method for sentiment analysis, determining the positive or negative stance of the writer. As discussed in Section 2 above, many methods for sentiment detection exist, with the major distinction being between lexicon-based and machine learning-based approaches. We examined existing tools for Twitter sentiment analysis using both of these approaches in order to determine the most suitable for our data.

As a baseline lexicon-based tool we used the freely available Data Science Toolkit (http://www.datasciencetoolkit.org/). Its sentiment analyser is based on a sentiment lexicon [16]; we therefore anticipate its coverage to be low, but take it to be representative of commonly-used lexicon-based approaches.

For a more robust tool for comparison, we examined two alternatives. As a hybrid lexicon/machine-learning tool we chose SentiStrength [27]. This method uses a predetermined list of words commonly associated with negative or positive sentiment, which are given an empirically determined weight (learned from data); new texts are classified by summing the weights of the words they contain. Thelwall et al. [27] report accuracy on Twitter data of 63.7% for positive sentiment and 67.8% for negative when predicting ratings on a 1-5 scale, and accuracies near 95% when predicting a simple binary positive/negative label. However, even though their word lists and weightings are determined for OSN data (including Twitter), this approach may suffer when faced with social text with new words, unexpected spellings and context-dependent language and meaning (see [15]).
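As a toy illustration of this weighted-lexicon idea (with invented weights, not SentiStrength's actual word list), the sketch below sums signed word weights and labels the text by the sign of the total. Anything outside the lexicon contributes nothing, which is exactly the coverage weakness just discussed.

# Toy weighted-lexicon scorer; LEXICON weights are made up for illustration only.
LEXICON = {"love": 3, "great": 2, "good": 1, "bad": -1, "awful": -3, "shambles": -2}

def lexicon_sentiment(text):
    score = sum(LEXICON.get(tok.strip("#@!,.").lower(), 0) for tok in text.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_sentiment("great service, love it"))  # -> positive (2 + 3)
print(lexicon_sentiment("total #shambles"))         # -> negative here, because '#' is stripped before lookup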
For a purely ML-based option we used Chatterbox's Sentimental API (http://mashape.com/sentimental/sentiment-analysis-for-social-media), based on statistical machine learning over large, distantly labelled datasets [21]. This data-driven approach means it might be expected to handle slang, errorful or abbreviated text better. Purver & Battersby [21] report accuracies approaching 80% using a similar technique on smaller datasets; Chatterbox report 83.4% accuracy in an independent study (see http://content.chatterbox.co/Sentiment%20Analysis%20Case%20Study%20-%20Chatterbox%20and%20IDL.pdf).

Before applying the sentiment analysis tools, and in order to compare the approaches, a few hundred random tweets were selected from the database, read and manually labelled for positive or negative sentiment, and the tools were tested on the resulting set. Results showed accuracy below 50% for the lexicon-based Data Science Toolkit, 63% for the hybrid SentiStrength approach, and 84% for the ML-based Chatterbox approach. Error analysis showed one significant source of the latter difference to be sentiment expressed in hashtags (e.g. the negative #shambles), which were detected better by the ML-based approach, presumably due to their absence from SentiStrength's predetermined lexicon. We therefore use Chatterbox as the "robust" tool in our experiments below, and compare to the Data Science Toolkit as a purely lexicon-based baseline.

5. RESULTS

Response volume over time

To examine the spread of engagement for each promoted item over the 10-day period, we analysed the volume of unique tweets each day in response to each promoted item, then averaged the results across all brands. Figures 4 and 5 display the distribution of this volume in response to promoted tweets (Figure 4) and promoted trends (Figure 5) per brand.

Figure 4: Distribution of promoted tweet volumes over time.
Figure 5: Distribution of promoted trend volumes over time.

On average, promoted trends led to much higher response volumes. However, the highest percentages of mentions within responses were from promoted tweets, where an average of 18% of tweets each day included an '@' mention of the brand; promoted trends had an average of only 15% mentions per day. This indicates that, for a brand aiming to engage users with the content of the promoted item, a promoted tweet is better suited. For example, out of the O2 Network's 30,000 tweets, 7,965 included an '@' mention of the brand (25%).

Results confirmed that the greatest percentage of engagement for a brand's promoted item takes place on the first day of promotion. On average, 24% of engagements around the promoted item take place on the first day. The effect is most pronounced for promoted trends, with 34% of engagement on average on the first day of promotion, after which engagement falls dramatically, by an average of 25%, to 9% by day two, and continues to fall thereafter, even if the item is promoted for several days. For promoted tweets, the effect is less pronounced: 19% of the engagement takes place on the first day of promotion, with engagement decreasing by 8% by the second day. However, it does not continue on a steady decline thereafter, but rises and falls over the next 8 days, although never again reaching the peak of the first day of promotion. This could be due to the fact that a promoted tweet is usually promoted for several days on Twitter, where it occasionally appears at the top of different users' timelines and users are repeatedly exposed to the item. This finding can be said to conform to Romero et al.'s theory of repeated exposure [24] (also see http://advertising.twitter.com/2013/03/Nielsen-Brand-Effect-for-Twitter-How-Promoted-Tweets-impact-brand-metrics.html). They found that repeated exposure to a hashtag within Twitter had a significant marginal effect on the probability of adoption of that hashtag.

In general, though, these results show that adoption of a promoted item is not a slow gradual shift over several days (as might be assumed) but rather an immediate incline while exposure to the item is new to users.
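The per-day numbers above come from simple aggregation over the collected tweets. A sketch of that computation is given below for the daily volume, the '@'-mention rate, and the share of engagement falling on day one; the field names ("day", "text") are our assumption about the stored schema, not the paper's.

from collections import defaultdict

def daily_engagement(tweets, handle="@O2"):
    """Per-day tweet volume and '@'-mention rate for one brand's promoted item."""
    volume, mentions = defaultdict(int), defaultdict(int)
    for t in tweets:
        day = t["day"]                       # campaign day, 1..10 (illustrative field)
        volume[day] += 1
        if handle.lower() in t["text"].lower():
            mentions[day] += 1
    return {d: {"tweets": volume[d], "mention_rate": mentions[d] / volume[d]}
            for d in sorted(volume)}

def first_day_share(tweets):
    """Fraction of all engagement that falls on day one (cf. the 24% average above)."""
    per_day = daily_engagement(tweets)
    total = sum(v["tweets"] for v in per_day.values())
    return per_day.get(1, {"tweets": 0})["tweets"] / total if total else 0.0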
Effects on user sentiment

The sentiment breakdown for each promoted brand item can be observed in Figures 6 and 7, with Figure 6 showing the results obtained using our chosen machine learning method and Figure 7 those obtained using a keyword-based method (see Section 4 above). We observe that in most cases, the percentage of positive sentiment was higher than that of negative and neutral sentiment for promoted items. Notable exceptions are the results for two brands, NiveaUK and O2, where neutral and/or negative levels outweigh positive; the ASOS brand also shows little difference between negative and positive levels. However, comparison with the figures that would have been obtained using a keyword-based approach (Figure 7) shows misleading results in precisely these interesting cases: apparent positive levels are higher than negative in all cases. Neutral cases also appear much more common; this is due to the low coverage of the keyword lexicon causing large numbers of results with apparently zero sentiment. Use of the more accurate tool (as objectively assessed in Section 4) therefore does appear crucial.

Figure 6: Sentiment analysis by brand - machine learning.
Figure 7: Sentiment analysis by brand - keywords.

On average, across all brands (promoted tweets and trends), the percentage of tweets and retweets which contained a positive sentiment is 50%, that which contained a negative sentiment is 12%, and 38% of tweets had a neutral tone. (We assume that retweeting users share the same sentiment as the original tweet.)

Figures 8 and 9 then show the distribution of positive and negative sentiments in this response traffic over time. On average, positive sentiment outweighs negative sentiment; on the first day, 49% of the tweets were positive. In general, promoted tweets lead to more positive sentiment and less negative sentiment than promoted trends.

Figure 8: Positive sentiment distribution over time.
Figure 9: Negative sentiment distribution over time.

In total, 47% of tweets relating to a promoted tweet are positive in sentiment. Day one received the highest percentage of positive sentiment tweets (58%); positive sentiment then continues to dominate over the 10-day period, never falling below 36% of the tweets. Examining promoted trends, we found that, on average, only 37% of tweets relating to a promoted trend contained a positive sentiment. On the first day of promotion, 26% of tweets expressed a negative sentiment, 32% expressed a positive sentiment and 42% expressed no sentiment at all. This shows that Twitter users do not tweet as positively about a promoted trend as they would about a promoted tweet. Instead, a large proportion of tweets relating to a promoted trend contained no emotional words, or if they did, the positive and negative sentiments balanced each other out. They generally contained just the promoted hashtag or had an objective, matter-of-fact tone (e.g., "Get 3G where I live... #O2WhatWouldYouDo").
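The sentiment-over-time breakdown in Figures 8 and 9 can be reproduced by a similar aggregation. The sketch below (again with illustrative field names) computes the positive/negative/neutral share per campaign day, applying the assumption above that a retweet inherits the sentiment of the original tweet.

from collections import Counter, defaultdict

def sentiment_by_day(tweets):
    """Share of positive/negative/neutral tweets per campaign day (cf. Figures 8 and 9)."""
    by_id = {t["id"]: t for t in tweets}
    counts = defaultdict(Counter)
    for t in tweets:
        label = t["sentiment"]               # "positive" / "negative" / "neutral"
        original = by_id.get(t.get("retweet_of"))
        if original is not None:             # retweets inherit the original tweet's sentiment
            label = original["sentiment"]
        counts[t["day"]][label] += 1
    return {day: {lab: n / sum(c.values()) for lab, n in c.items()}
            for day, c in sorted(counts.items())}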
Taken together with the analysis of engagement volume, these results show that when an item is promoted, the brand and the item are adopted immediately and regarded quite positively by the engaged users. Twitter users welcome the promoted item, which has a positive effect on the tweets expressed. The engagement level reduces to an average of 10% of the total tweets on day two, when the item is no longer being promoted, or is no longer seen as "new and interesting". However, on average, the positive sentiment expressed still outweighs negative and neutral sentiment each day.

Effect of hashtags on engagement

We then performed two example case studies, using the ASOS and Vauxhall brands, to examine the use of hashtags within promoted items. Figures 10 and 11 show the results. ASOS promoted a trend, #AsosSale, on the 19th and 20th of December to highlight their Boxing Day sale on the 26th of December (day 8 of data collection). Although the promoted hashtag was virtually discarded by day two of data collection, we found that user engagement (use of hashtag, mentions and tweets) for the forthcoming sale continued. This trend is also apparent in Vauxhall's tweet volumes for their sale, which started on the 27th of December (day one of promotion) and ended the day after our 10-day data collection period. The engagement for Vauxhall remained at a consistent level throughout the event (see Figures 5 and 11), despite the rapid drop-off in use of the promoted hashtag.

Figure 10: Hashtag-related engagements for ASOS.
Figure 11: Hashtag-related engagements for Vauxhall.

6. CONCLUSIONS & FUTURE DIRECTIONS

In this paper we present a measurement-driven study of the effects of promoted tweets and trends on Twitter on the engagement level of users, using a number of ML and NLP techniques in order to detect relevant tweets and their sentiments. Our results indicate that the use of accurate methods for sentiment analysis, and robust filtering for topical content, is crucial. Given this, we then see that promoted tweets and trends differ considerably in the form of engagement they produce and the overall sentiment associated with them. We found that promoted trends lead to higher engagement volumes than promoted tweets. However, although promoted tweets obtain less engagement than promoted trends, their engagement forms are often more brand inclusive (more direct mentions); and while engagement volumes drop for both forms of promoted items after the first day, this effect is less pronounced for promoted tweets. We also found that although the volume of tweets is highest for promoted trends, they do not lead to the same level of positive sentiment that promoted tweets do. Hence advertisers should carefully assess the trade-offs between high levels of engagement, drop-off rate, direct mentions, and positive user sentiment.

In the next stage of this study we will investigate the effect of individuals' influence on the take-up of promoted tweets and trends within their social graph. We will investigate new data at finer granularity (hourly) for events that are time-sensitive, such as major concert ticket sales. This is our first attempt at understanding this space. The advertising campaigns have very different structures and we need to understand these in detail. Promoted trends typically stay on the trends list for a day, and promoted tweets are selectively shown to a subset of users for a period of time selected by the advertiser.
Without accounting for such nuances, broad statements on the impact of the two forms of advertising are not conclusive. However, in this paper we focussed on insights into using sentiment analysis methods and accurate data labelling. We believe our findings could provide new insight for social network marketing and advertising strategies, in addition to comparing different methods of classifying and filtering relevant content.

Acknowledgments

The authors thank Chatterbox (http://chatterbox.co) for providing unlimited access to the Sentimental API.

7. REFERENCES

[1] S. Asur and B. A. Huberman. Predicting the future with social media. CoRR, abs/1003.5699, 2010.
[2] T. Blake, C. Nosko, and S. Tadelis. Consumer heterogeneity and paid search effectiveness: A large scale field experiment. 2013.
[3] M. Cha, H. Haddadi, F. Benevenuto, and K. P. Gummadi. Measuring user influence in Twitter: The million follower fallacy. In ICWSM '10: Proceedings of the International AAAI Conference on Weblogs and Social Media, 2010.
[4] T. Y. Chan, C. Wu, and Y. Xie. Measuring the lifetime value of customers acquired from Google search advertising. Marketing Science, 30(5):837-850, Sept. 2011.
[5] C. M. Cheung and D. R. Thadani. The effectiveness of electronic word-of-mouth communication: A literature analysis. In Proceedings of the 23rd Bled eConference eTrust: Implications for the Individual, Enterprises and Society, 2010.
[6] A. M. Degeratu, A. Rangaswamy, and J. Wu. Consumer choice behavior in online and traditional supermarkets: The effects of brand name, price, and other search attributes. International Journal of Research in Marketing, 17(1):55-78, 2000.
[7] A. Go, R. Bhayani, and L. Huang. Twitter sentiment classification using distant supervision. Master's thesis, Stanford University, 2009.
[8] P. Goncalves, M. Araujo, F. Benevenuto, and M. Cha. Comparing and combining sentiment analysis methods. In Proceedings of the 1st ACM International Conference on Online Social Networks (COSN'13), 2013.
[9] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA data mining software: An update. SIGKDD Explorations, 11(1):10-18, 2009.
[10] B. J. Jansen, M. Zhang, K. Sobel, and A. Chowdury. Twitter power: Tweets as electronic word of mouth. Journal of the American Society for Information Science and Technology, 60(11):2169-2188, 2009.
[11] C. Liebrecht, F. Kunneman, and A. Van den Bosch. The perfect solution for detecting sarcasm in tweets #not. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 29-37, Atlanta, Georgia, June 2013. Association for Computational Linguistics.
[12] P. Malo, P. Siitari, O. Ahlgren, J. Wallenius, and P. Korhonen. Semantic content filtering with Wikipedia and ontologies. In IEEE International Conference on Data Mining Workshops (ICDMW), pages 518-526. IEEE, 2010.
[13] C. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
[14] Y. Mejova and P. Srinivasan. Crossing media streams with sentiment: Domain adaptation in blogs, reviews and Twitter. In Proc. ICWSM, 2012.
[15] A. Naradhipa and A. Purwarianti. Sentiment classification for Indonesian messages in social media. In Cloud Computing and Social Networking (ICCCSN), 2012 International Conference on, pages 1-5, 2012.
[16] F. Å. Nielsen. A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. In Proceedings of the ESWC2011 Workshop 'Making Sense of Microposts': Big things come in small packages, pages 93-98, May 2011.
[17] A. Pak and P. Paroubek. Twitter as a corpus for sentiment analysis and opinion mining. In Proceedings of the 7th Conference on International Language Resources and Evaluation, 2010.
[18] B. Pang and L. Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1-135, 2008.
[19] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10, EMNLP '02, pages 79-86, Stroudsburg, PA, USA, 2002. Association for Computational Linguistics.
[20] M. J. Paul and M. Dredze. You are what you tweet: Analyzing Twitter for public health. In Conference on Weblogs and Social Media, ICWSM, 2011.
[21] M. Purver and S. Battersby. Experimenting with distant supervision for emotion classification. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 482-491, Avignon, France, Apr. 2012. Association for Computational Linguistics.
[22] D. Quercia, J. Ellis, L. Capra, and J. Crowcroft. Tracking "gross community happiness" from tweets. Technical Report RN/11/20, University College London, Nov. 2011.
[23] A. Ritter, C. Cherry, and B. Dolan. Unsupervised modeling of Twitter conversations. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 172-180, Los Angeles, California, June 2010. Association for Computational Linguistics.
[24] D. M. Romero, B. Meeder, and J. Kleinberg. Differences in the mechanics of information diffusion across topics: Idioms, political hashtags, and complex contagion on Twitter. In Proceedings of the 20th International Conference on World Wide Web, WWW '11, pages 695-704, New York, NY, USA, 2011. ACM.
[25] M. Sahami, S. Dumais, D. Heckerman, and E. Horvitz. A Bayesian approach to filtering junk email. In AAAI Workshop on Learning for Text Categorization, Madison, WI, July 1998. AAAI Technical Report WS-98-05.
[26] R. Socher, A. Perelygin, J. Y. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and C. Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2013. To appear.
[27] M. Thelwall, K. Buckley, and G. Paltoglou. Sentiment strength detection for the social web. Journal of the American Society for Information Science and Technology, 63(1):163-173, Dec. 2012.
[28] A. Tumasjan, T. O. Sprenger, P. G. Sandner, and I. M. Welpe. Predicting elections with Twitter: What 140 characters reveal about political sentiment. In Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, 2010.