Literature Review on Social Media in Financial Markets


We previously posted a brief introduction to sentiment analysis (see Introduction to Sentiment Analysis) and reviewed several papers on UGC/WOM (see Literature Review on UGC/WOM). Today, we read some papers that make good use of sentiment analysis and social media to analyze financial markets.

In the paper AI and Opinion Mining, Chen and Zimbra (2010)[1] show how online opinion mining works by investigating the stock performance of a large US corporation, Wal-Mart. They find that:

message volume in the forum holds a significant negative relationship with stock return, with high volume indicating subsequent negative returns. Disagreement and subjectivity also held significant relationships with volatility, where less disagreement and high levels of subjectivity predicted periods of high stock volatility.

This study illustrates the potential of sentiment analysis as a tool for explaining the behavior of financial markets.

Role of social media and sentiment analysis in financial markets

Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web

Das and Chen (2007)[2] develop a methodology for extracting small-investor sentiment from stock message boards.


  • Messages are classified into one of three types: bullish (optimistic), bearish (pessimistic), and neutral (either spam or messages that are neither bullish nor bearish);
  • Five classifiers: naive classifier, vector distance classifier, discriminant classifier, adjective/adverb phrase classifier, and Bayesian classifier;
  • Voting: the final label is a simple majority vote across the five classifiers.
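The voting step can be sketched in a few lines of Python. This is a minimal illustration, not Das and Chen's code: the function name `majority_vote` is hypothetical, and it assumes that a label needs a strict majority (at least three of the five votes) and that messages without one fall back to neutral. The post does not say how the paper handles ties.

```python
from collections import Counter

LABELS = ("bullish", "bearish", "neutral")

def majority_vote(votes):
    """Combine the per-classifier labels for one message into a single label.

    Assumption: a label wins only with a strict majority of the votes;
    otherwise the message is treated as neutral.
    """
    if any(v not in LABELS for v in votes):
        raise ValueError(f"unexpected label in {votes!r}")
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else "neutral"

# Five hypothetical classifier outputs for one message.
print(majority_vote(["bullish", "bullish", "neutral", "bearish", "bullish"]))
print(majority_vote(["bullish", "bullish", "bearish", "bearish", "neutral"]))
```

The first call returns `bullish` (three of five votes). The second returns `neutral` because no label reaches a majority.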

[Figure: Schematic of the algorithms and system design used for sentiment extraction]


The data comprise 24 tech-sector stocks in the Morgan Stanley High-Tech Index (MSH). These stocks were chosen to focus on the tech sector and because their message boards showed a wide range of activity. Every message posted to these boards during July and August 2001 was downloaded through a lengthy Web-scraping process, yielding a total of 145,110 messages.


The authors examine the statistical relationship between the stock index series (MSH) and the sentiment series (SENTY). They regress the aggregate sentiment level on lagged values of the stock index and of sentiment, and they regress the stock index level on the same lagged values.

The regressions in levels

The regressions in levels show that the tech index is strongly related to its own value on the previous day. It is only weakly related to the previous day's sentiment index, significant at the 10% level.

The regressions in changes

The regressions in changes show that the sentiment index on a given day is significantly related to its prior day’s value, but not to that of the stock index.
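The lagged regressions in levels and in changes can be sketched with ordinary least squares in NumPy. The series below are simulated placeholders for MSH and SENTY, and the helper `lagged_ols` is hypothetical. The sketch uses a single lag with classical (homoskedastic) t-statistics; the paper's exact lag structure and standard errors may differ.

```python
import numpy as np

def lagged_ols(y, x, lags=1):
    """OLS of y_t on a constant, lags of y, and lags of x (same length as y).

    Returns (coefficients, t-statistics) ordered as
    [constant, y_{t-1}, ..., y_{t-lags}, x_{t-1}, ..., x_{t-lags}].
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)
    X = np.column_stack(
        [np.ones(n - lags)]
        + [y[lags - k:n - k] for k in range(1, lags + 1)]
        + [x[lags - k:n - k] for k in range(1, lags + 1)]
    )
    target = y[lags:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, coef / se

# Hypothetical simulated data standing in for MSH (index) and SENTY (sentiment).
rng = np.random.default_rng(0)
T = 500
msh = 100 + np.cumsum(rng.normal(0, 1, T))   # random-walk index level
senty = np.zeros(T)
for t in range(1, T):
    senty[t] = 0.8 * senty[t - 1] + rng.normal()  # persistent sentiment

# Regressions in levels.
coef_s, t_s = lagged_ols(senty, msh)  # SENTY_t on SENTY_{t-1}, MSH_{t-1}
coef_m, t_m = lagged_ols(msh, senty)  # MSH_t on MSH_{t-1}, SENTY_{t-1}

# Regressions in changes (first differences of both series).
dcoef_s, dt_s = lagged_ols(np.diff(senty), np.diff(msh))
dcoef_m, dt_m = lagged_ols(np.diff(msh), np.diff(senty))
print(coef_m, t_m)
```

The t-statistics on the cross-lag terms are what one would inspect to judge significance at the 10% level, as in the results above.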


Whereas the sentiment index has the expected contemporaneous relationships with various market variables, the disagreement measure the authors construct shows very little correlation with other variables. Overall, the evidence suggests that market activity is related to small-investor sentiment and message board activity.
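The post does not spell out how the daily sentiment and disagreement series are built. The following minimal sketch makes two assumptions. First, daily sentiment is taken to be the net count of bullish minus bearish messages. Second, disagreement is taken to have the form $|1 - |(B-S)/(B+S)||$ for $B$ bullish and $S$ bearish messages, giving 0 for full consensus and 1 for an even split. Treat both definitions, and the function names, as assumptions rather than the paper's exact construction.

```python
from collections import Counter

# Assumed scoring: bullish = +1, bearish = -1, neutral = 0.
SCORE = {"bullish": 1, "bearish": -1, "neutral": 0}

def daily_sentiment(labels):
    """Net sentiment for one day: #bullish minus #bearish (assumed aggregation)."""
    return sum(SCORE[label] for label in labels)

def disagreement(labels):
    """Assumed form |1 - |(B - S) / (B + S)||: 0 = consensus, 1 = even split."""
    counts = Counter(labels)
    b, s = counts["bullish"], counts["bearish"]
    if b + s == 0:
        return 0.0  # assumption: a day with no opinionated messages shows no disagreement
    return abs(1 - abs((b - s) / (b + s)))

day = ["bullish", "bullish", "bearish", "neutral"]
print(daily_sentiment(day))  # 1
print(disagreement(day))     # 1 - 1/3, about 0.667
```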


In summary, the authors develop a methodology for extracting small-investor sentiment from stock message boards. Five distinct classifier algorithms, combined through a voting scheme, are evaluated using a range of metrics.

Customers as advisors: The role of social media in financial markets

Chen et al. (2013)[3] investigate the extent to which peer-based advice transmitted through social media affects the stock market.

Theories and Models

The frequency of negative words in an article is used to capture the tone of the article.
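To make the tone measure concrete, the sketch below computes the negative-word fraction for one article and averages it across all of a firm's SA articles on a given day, which corresponds to the ${NegSA}_{i,t}$ variable defined below. The small word set is a hypothetical stand-in for a full financial negative-word dictionary. The simple regex tokenizer is also an assumption.

```python
import re

# Hypothetical stand-in for a full financial negative-word dictionary.
NEGATIVE_WORDS = {"loss", "decline", "weak", "risk", "downgrade", "lawsuit"}

def negative_fraction(text):
    """Share of an article's tokens that appear in NEGATIVE_WORDS."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in NEGATIVE_WORDS for token in tokens) / len(tokens)

def neg_sa(articles):
    """NegSA_{i,t}: mean negative fraction over all SA articles on firm i, day t."""
    if not articles:
        raise ValueError("NegSA is only defined on days with at least one article")
    return sum(negative_fraction(a) for a in articles) / len(articles)

articles = [
    "Weak guidance and a lawsuit raise risk",  # 3 of 7 tokens negative
    "Strong quarter, margins expand",          # 0 of 4 tokens negative
]
print(neg_sa(articles))  # (3/7 + 0/4) / 2 = 3/14
```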


The sample period spans 2006 to 2010 and is determined by the availability of Seeking Alpha (SA) data:

  • Text data: Seeking Alpha (SA) and Wall Street Journal (WSJ) articles;
  • Financial-analyst data: the Institutional Brokers’ Estimate System (IBES) file;
  • Financial-statement and financial-market data: COMPUSTAT and the Center for Research in Security Prices (CRSP);
  • Institutional-holdings data: Thomson Financial.


Observations are at the firm/trading-day level. Depending on the regression specification, there are between 30,212 and 30,255 observations.

Dependent variables

Measure of abnormal returns:

  • one-day holding-period returns for day t and day t+1 (${ARet}_{i,t}$ and ${ARet}_{i,t+1}$, respectively)
  • two-day holding-period returns from day t to t+1 and day t+1 to t+2 (${ARet}_{i,t,t+1}$ and ${ARet}_{i,t+1,t+2}$, respectively)

where t is the day on which the article appears on the SA website.
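The following sketch computes the four abnormal-return windows. It assumes that an abnormal holding-period return is the stock's compounded return minus a benchmark's compounded return over the same days. The post does not name the benchmark, so the benchmark series and all numbers below are hypothetical.

```python
def holding_period_return(daily_returns):
    """Compound simple daily returns into one holding-period return."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0

def abnormal_hpr(stock, bench, start, end):
    """Abnormal holding-period return over days start..end (inclusive).

    Assumption: stock holding-period return minus benchmark holding-period return.
    """
    window = slice(start, end + 1)
    return holding_period_return(stock[window]) - holding_period_return(bench[window])

# Hypothetical daily returns; index 0 is day t, the day the SA article appears.
stock = [0.02, -0.01, 0.005]
bench = [0.01, 0.00, 0.002]

aret_t = abnormal_hpr(stock, bench, 0, 0)       # ARet_{i,t}
aret_t1 = abnormal_hpr(stock, bench, 1, 1)      # ARet_{i,t+1}
aret_t_t1 = abnormal_hpr(stock, bench, 0, 1)    # ARet_{i,t,t+1}
aret_t1_t2 = abnormal_hpr(stock, bench, 1, 2)   # ARet_{i,t+1,t+2}
print(aret_t, aret_t1, aret_t_t1, aret_t1_t2)
```

Compounding matters only for the two-day windows. Summing daily abnormal returns instead is a common alternative.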

Independent variables

  • ${NegSA}_{i,t}$, which is the average fraction of negative words across all articles published on SA about company i on day t;
  • ${NegWSJ}_{i,t}$, which equals the average fraction of negative words across all WSJ articles about company i on day t (if there are any such articles and zero otherwise);
  • ${DummyWSJ}_{i,t}$, which is an indicator variable equal to one if no WSJ article is written about company i on day t;
  • ${Upgrade}_{i,t}$ and ${Downgrade}_{i,t}$, which are the number of analyst upgrades (downgrades) on company i on day t;
  • ${PosES}_{i,t}$ and ${NegES}_{i,t}$, which are indicator variables denoting whether there is a positive (negative) earnings surprise for company i on day t;
  • $ln({Turnover}_{i,t})$, which is the natural logarithm of the average share turnover from thirty days to three days prior to day t;
  • ${Volatility}_{i,t}$, which is the sum of the squared raw daily returns from thirty days to three days prior to day t;
  • ${PastReturn}_{i,t}$, which is the cumulative abnormal return from thirty days to three days prior to day t.
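The three window-based controls can be sketched as follows. The sketch assumes that the window from day t-30 to day t-3 is inclusive on both ends. It also assumes that the cumulative abnormal return is a simple sum of daily abnormal returns. The function names and data are hypothetical.

```python
import math

def window(series, t, start=30, end=3):
    """Observations from day t-start through day t-end, both ends inclusive."""
    if t - start < 0:
        raise ValueError("not enough history before day t")
    return series[t - start:t - end + 1]

def controls(raw_returns, abnormal_returns, turnover, t):
    """Return (ln(Turnover), Volatility, PastReturn) for day t."""
    turn = window(turnover, t)
    ln_turnover = math.log(sum(turn) / len(turn))            # log of average turnover
    volatility = sum(r * r for r in window(raw_returns, t))  # sum of squared raw returns
    past_return = sum(window(abnormal_returns, t))           # assumed: CAR as a simple sum
    return ln_turnover, volatility, past_return

# Hypothetical 40-day history with constant values for easy checking.
raw = [0.01] * 40
abn = [0.001] * 40
turnover = [0.02] * 40
print(controls(raw, abn, turnover, t=35))  # 28-day window: days 5..32
```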


  • The opinions revealed on SA are strongly associated with the corresponding companies' stock returns, even after controlling for the effect of traditional advice sources;
  • The effect of peer-based advice on stock returns is stronger for articles that receive more attention and for companies held mostly by retail investors.

Social media and firm equity value

Luo et al. (2013)[4] examine the predictive relationships between social media and firm equity value, the relative effects of social media metrics compared with conventional online behavioral metrics, and the dynamics of these relationships.

Theories and Models

  1. Social Media as a Leading Indicator of Firm Equity Value

  2. Social Media as a Stronger Indicator in Predicting Firm Equity Value Compared with Conventional Online Consumer Behavioral Metrics:

    • Social media metrics tend to be more socially “contagious” than Web traffic and Internet searches;
    • Social media is more visible than conventional online media;
    • Social media metrics can denote a higher degree of customer engagement with the firm than do traffic and search metrics;
  3. Dynamics of the Predictive Value of Social Media:

    • Information is transmitted and diffused through the wide reach of social media at unparalleled speed;
    • Social media content can be voted on, linked, reproduced, broadcast, and spread more quickly, creating information richness and diffusion speed unmatched by conventional online behavioral metrics.


Daily data on publicly traded firms in the computer hardware and software industries were collected from multiple sources (Alexa, CNET, LexisNexis, Google search, CRSP, COMPUSTAT, and Yahoo Finance) for the period from August 1, 2007 to July 31, 2009.


They employ a time-series technique, namely a vector autoregressive model with exogenous variables (VARX), with the following notation:

  • $t$ = time
  • ${\alpha}_{i}$ (i = 1,2,…,10) = constant
  • ${\delta}_{i}$, ${\phi}_{ij}^{k}$, ${\tau}_{i,l}$ (i, j = 1,2,…,10; l = 1,2,…,11) = coefficients; $K$ = lag length
  • $x_l$ (l = 1,2,…,11) = an exogenous variable
  • ${\epsilon}_{i}$ (i = 1,2,…,10) = white-noise residual
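The model equation itself is not reproduced in this post. Based on the symbol definitions above, with the exogenous variables indexed by $l$, a plausible form for each of the ten endogenous series $y_{i,t}$ (RTN, RSK, and the eight metrics listed below) is the following. This is a reconstruction, so details such as whether the exogenous variables enter contemporaneously may differ from the paper.

$$
y_{i,t} = \alpha_i + \delta_i t + \sum_{k=1}^{K} \sum_{j=1}^{10} \phi_{ij}^{k} \, y_{j,t-k} + \sum_{l=1}^{11} \tau_{i,l} \, x_{l,t} + \epsilon_{i,t}, \qquad i = 1, \dots, 10
$$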

The lag order of the VARX is selected using Schwarz's Bayesian information criterion (SIC) and the final prediction error (FPE).
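Lag-order selection of this kind can be sketched with NumPy. For each candidate lag $p$, the model is fit by equation-wise OLS on a common sample, and the $p$ that minimizes SIC and FPE is chosen. The sketch assumes an intercept, a linear trend, and optional exogenous regressors. It uses the common textbook forms $\text{SIC}(p) = \ln\det\hat\Sigma_p + \frac{\ln T}{T} K m$ and $\text{FPE}(p) = \left(\frac{T+m}{T-m}\right)^K \det\hat\Sigma_p$, where $K$ is the number of endogenous series, $m$ the number of regressors per equation, and $T$ the effective sample size. Counting the deterministic terms in $m$ shifts the SIC penalty by a constant and does not change the chosen lag. The function names and simulated data are hypothetical, and the paper's software may use slightly different penalties.

```python
import numpy as np

def var_design(Y, p, pmax, X_exog=None):
    """Regressors for a VAR(p) fit on rows pmax..n-1 (a common sample for every p)."""
    n, _ = Y.shape
    cols = [np.ones(n - pmax), np.arange(pmax, n, dtype=float)]  # intercept, linear trend
    for k in range(1, p + 1):
        cols.append(Y[pmax - k:n - k])                            # k-th lag of all series
    if X_exog is not None:
        cols.append(np.asarray(X_exog, dtype=float)[pmax:])      # exogenous variables
    return np.column_stack(cols), Y[pmax:]

def select_lag(Y, pmax, X_exog=None):
    """Return (lag chosen by SIC, lag chosen by FPE, {p: (sic, fpe)})."""
    Y = np.asarray(Y, dtype=float)
    n, K = Y.shape
    T = n - pmax
    scores = {}
    for p in range(1, pmax + 1):
        Z, target = var_design(Y, p, pmax, X_exog)
        B, *_ = np.linalg.lstsq(Z, target, rcond=None)
        U = target - Z @ B
        _, logdet = np.linalg.slogdet(U.T @ U / T)  # log det of residual covariance
        m = Z.shape[1]                              # regressors per equation
        sic = logdet + np.log(T) / T * K * m
        fpe = ((T + m) / (T - m)) ** K * np.exp(logdet)
        scores[p] = (sic, fpe)
    best_sic = min(scores, key=lambda p: scores[p][0])
    best_fpe = min(scores, key=lambda p: scores[p][1])
    return best_sic, best_fpe, scores

# Hypothetical two-variable VAR(1) to check that the procedure recovers p = 1.
rng = np.random.default_rng(1)
A = np.array([[0.5, 0.2], [0.0, 0.6]])
Y = np.zeros((400, 2))
for t in range(1, 400):
    Y[t] = A @ Y[t - 1] + rng.normal(size=2)

best_sic, best_fpe, scores = select_lag(Y, pmax=4)
print(best_sic, best_fpe)
```

SIC penalizes extra lags more heavily than FPE, so when the two disagree, SIC tends to pick the shorter lag.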

Dependent variables

  • $RTN$ = firm return
  • $RSK$ = risk

Independent variables

  • $AVR$ = rating level
  • $NUR$ = rating volume
  • $POS$ = number of positive blog posts
  • $NEG$ = number of negative blog posts
  • $PGV$ = page views per user
  • $REC$ = reach
  • $GSI$ = Google search intensity
  • $GSV$ = Google search instability


  • Social media-based metrics (Web blogs and consumer ratings) are significant leading indicators of firm equity value.
  • Conventional online behavioral metrics (Google searches and Web traffic) are found to have a significant yet substantially weaker predictive relationship with firm equity value than social media metrics.
  • Social media has a faster predictive value, i.e., shorter “wear-in” time, than conventional online media.


  • Prior studies (e.g., Moe and Fader 2004, Chevalier and Mayzlin 2006, Dellarocas et al. 2007, Dhar and Chang 2009, Ghose and Yang 2009) examine the relationship between digital user metrics and product sales.
  • In contrast, this study investigates multiple sources of digital user metrics and their relative effects;
  • This study also examines the enduring effects of social media.

1. Chen, Hsinchun, and Zimbra, David (2010). AI and opinion mining. IEEE Intelligent Systems, 25(3), 74–80.
2. Das, Sanjiv R., and Chen, Mike Y. (2007). Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Science, 53(9), 1375–1388.
3. Chen, Hailiang, De, Prabuddha, Hu, Yu Jeffrey, and Hwang, Byoung-Hyoun (2013). Customers as advisors: The role of social media in financial markets.
4. Luo, Xueming, Zhang, Jie, and Duan, Wenjing (2013). Social media and firm equity value. Information Systems Research, 24(1), 146–163.
BaoDuGe_飽蠹閣 wechat