eFORCE

Monday, November 23, 2009

What are users twittering on twitter?

Or, what types of conversations are happening on Twitter? We decided to run some statistics to find the answer. We divided Twitter messages into three categories: (i) messages posted in response to other messages, (ii) messages that are forwards or re-tweets of other tweets, and (iii) messages that contain URLs, which could be pass-along links, news stories, links to blogs, or commercial content such as deals and coupons. We considered adding more categories, especially sorting links into buckets such as news and promotion, but decided against it. On a social media platform like Twitter, there is a thin (and subjective) line between personal and professional content and usage. Is a tweet with a link to my blog a news item for my friends, or self-marketing?

We collected data for a period of one month (Oct 09) on a range of companies, brands and topics. All in all, over 1 million tweets were classified into the 3 buckets. Here are the results:

  • 15.8% of tweets were part of a conversation, i.e. replies to other tweets.
  • 5.9% of tweets were forwards or re-tweets of other tweets.
  • 40.7% of tweets contained at least one URL.
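
For readers who want to try a similar classification on their own data, here is a minimal sketch in Python. The matching rules are our assumptions for illustration (a reply starts with @username, a re-tweet contains "RT @username", a link starts with http(s):// or www.); they are not the exact rules used to produce the numbers above, and the function names are hypothetical.

```python
import re

# Heuristic patterns (assumptions for illustration, not the rules used in the study).
REPLY_RE = re.compile(r"^\s*@\w+")                          # tweet opens with @username
RETWEET_RE = re.compile(r"(^|\s)RT\s*@\w+", re.IGNORECASE)  # "RT @username" anywhere
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def classify(tweet):
    """Return the set of buckets a tweet falls into (a tweet can carry a URL and be a reply)."""
    buckets = set()
    if RETWEET_RE.search(tweet):
        buckets.add("retweet")
    elif REPLY_RE.match(tweet):
        buckets.add("reply")
    if URL_RE.search(tweet):
        buckets.add("url")
    return buckets


def bucket_percentages(tweets):
    """Percentage of tweets in each bucket, relative to all tweets seen."""
    counts = {"reply": 0, "retweet": 0, "url": 0}
    for tweet in tweets:
        for bucket in classify(tweet):
            counts[bucket] += 1
    total = len(tweets)
    return {b: (100.0 * n / total if total else 0.0) for b, n in counts.items()}
```

Because a tweet can land in the URL bucket and also be a reply or re-tweet, the three percentages need not add up to 100%.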

I must say, the % of tweets with URL(s) is staggeringly high. It also differs vastly from this study by Pear Analytics that gained wide attention a few months ago. A couple of differences between the Pear Analytics study and ours are worth pointing out: (1) while the Pear Analytics study examined roughly 2,000 tweets, ours examined over 1 million, and (2) more importantly, I think, the Pear Analytics study examined random tweets whereas we targeted specific content (for example, content related to Best Buy, Wal Mart, healthcare reform, Halloween, etc.), which in turn could have helped filter out a lot of the “pointless babble” that the Pear Analytics study mentions.


Why is this important? Most of the content with links consists of references to news items that users think their friends/followers will find useful, self-promotion, or commercial content like deals and coupons. When Twitter started out, it was based on a simple idea: know what your friends are doing. However, as the service grew, its users found other uses for it besides posting “I’m having big burrito guilty feeling”.

Percentage of tweets with links for the month of November so far? 42.6%, while the other numbers (for re-tweets and conversations) are roughly the same as the Oct 09 figures.



For more information on our ContextMine product and how it can help, visit our product site at http://www.context.com/contextmine. If you are interested in a trial version, send us an email at: dshah at context.com

posted @ Monday, November 23, 2009 7:19 PM | Feedback (0)

Tuesday, November 17, 2009

Contd.. Who is influencing conversation about your brand on social media sites?

In my previous post here, I highlighted stats for YouTube and Twitter. Recently, I was asked about stats for a few more social media sites. So here they are:

Note: As I mentioned before, data were collected for a period of one month (Oct 09) for various brands across different verticals.


FriendFeed:
  • Average number of messages per user: 7.21
  • Average number of messages by top 0.1% of users: 2163
  • Average number of messages by next 1% of users: 152
  • Average number of messages by next 10% of users: 19
  • % of users who had posted exactly 1 message during that time frame: 52%
  • % of users who had 5 messages or less during that time frame: 85%

Digg:
  • Average number of messages per user: 1.85
  • Average number of messages by next 1% of users: 22
  • Average number of messages by next 10% of users: 5
  • % of users who had posted exactly 1 message during that time frame: 75%
  • % of users who had 5 messages or less during that time frame: 95%

I will try to provide stats for a few more sites like Metacafe, DailyMotion, Reddit, etc. in another post.

One quick note: FriendFeed lags behind Twitter (by a wide margin) in terms of total number of unique users, but it does have a pretty high user participation ratio.




posted @ Tuesday, November 17, 2009 3:44 PM | Feedback (0)

Friday, October 30, 2009

Who is influencing conversation about your brand on social media sites?

Social media sites are proving to be useful channels for advertisers and sales people to connect with their target audience. You can not only listen to their conversations but also take action by guiding the conversation, providing correct information, promoting your brands, and more. But how do you know which audience to connect with? How do you know which users have a greater impact than others? How often are people talking about your brand on social media sites?

Below are results from some of our reports. Data were collected for several brands/companies (across different verticals like auto, appliances, etc.) over a period of one month from some of the popular social media sites like Twitter, YouTube, FriendFeed, Digg, etc.

Note: The stats only take into consideration user messages pertaining to the brands/companies for which we generated stats.

Twitter:
  • Average number of messages per user: 1.83
  • Average number of messages by top 0.1% of users: 167
  • Average number of messages by next 1% of users: 22
  • Average number of messages by next 10% of users: 4
  • Over 75% of users had posted exactly 1 message during that time frame.
  • Over 95% of users had 5 messages or less during that time frame.

YouTube:
  • Average number of messages per user: 3.23
  • Average number of messages by top 0.1% of users: 483
  • Average number of messages by next 1% of users: 91
  • Average number of messages by next 10% of users: 9
  • Over 74% of users had posted exactly 1 message during that time frame.
  • Over 93% of users had 5 messages or less during that time frame.
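
As a rough illustration of how stats like these can be derived, here is a minimal Python sketch. It assumes the input is simply one author name per message, and it assumes "top 0.1%", "next 1%" and "next 10%" mean consecutive slices of users ranked by message count; both the function name and that reading of the tiers are our assumptions for this example.

```python
from collections import Counter


def participation_stats(authors):
    """authors: one entry per message, naming the user who posted it."""
    per_user = sorted(Counter(authors).values(), reverse=True)  # busiest users first
    n_users = len(per_user)
    if n_users == 0:
        return {}

    def tier_average(start_permille, end_permille):
        # Average messages per user for users ranked in [start, end), in tenths of a percent.
        start = n_users * start_permille // 1000
        end = max(start + 1, n_users * end_permille // 1000)  # at least one user per tier
        tier = per_user[start:end]
        return sum(tier) / len(tier)

    return {
        "avg_per_user": sum(per_user) / n_users,
        "avg_top_0.1pct": tier_average(0, 1),
        "avg_next_1pct": tier_average(1, 11),
        "avg_next_10pct": tier_average(11, 111),
        "pct_exactly_1": 100.0 * sum(1 for c in per_user if c == 1) / n_users,
        "pct_5_or_less": 100.0 * sum(1 for c in per_user if c <= 5) / n_users,
    }
```

With only a handful of users the tiers overlap (each falls back to a single user), so the tier averages are only meaningful for user counts in the thousands or more.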


On social media sites, there are users who are participating, and there are users who are talking about your brand, driving and influencing the conversation, and are passionate. Do you know who they are?



posted @ Friday, October 30, 2009 5:47 PM | Feedback (0)

Thursday, October 29, 2009

Are you in control?

In today’s social-media-driven web world, more and more users are using the web not only to find and search for information but also to become active participants, adding content via channels like forums, blogs, audio/video, ratings, reviews and feedback, and micro-blogs. Increasingly, customers and prospective clients are turning to the web to find relevant information about your company, reviews and feedback, and in general to read what others have to say about your products and services. While these channels provide an opportunity to gauge consumers’ interest in your brands and products, negative messages and opinions on these channels can also hurt your company’s reputation.


Do you still wonder about the usefulness of some of these channels, and whether your brand’s online reputation has an impact on the company’s bottom line? Consider some of these findings:
  • As the holiday season approaches, shoppers are busy researching and scouting prior to purchase. A recent study by OTX/Google states that the internet is ranked as the most useful source of information, with 79% of consumers finding it very or extremely helpful.
  • You may not be selling your products online, but check out this finding from a Comscore research study: online consumer-generated reviews have a significant impact on offline purchase behavior as well, with consumers willing to pay at least 20 percent more for services receiving an “Excellent”, or 5-star, rating than for the same service receiving a “Good”, or 4-star, rating.
  • Social media web sites have become some of the most frequented sites per web statistics. A recently released DEI study states that consumers who visit social media sites are more likely to take action, and that ‘directly engaging with consumers online using brand representatives will motivate purchase Intent and increase pass-along’.

There is a conversation happening about your brand on social media sites. The question is, who is shaping that conversation? Are you in control of your company’s online message?





posted @ Thursday, October 29, 2009 12:34 PM | Feedback (0)

Tuesday, May 05, 2009

Some random thoughts on using Hadoop for database driven applications

‘Can Hadoop help shorten the duration of my 48 hr batch process?’, ‘Can a Hadoop cluster help replace my cluster of high-end servers?’, so you ask. We have heard a number of these questions (or variations of them) from our clients in the past several weeks. Some of our clients are in the finance or publishing industries; they house a large set of databases ranging from MySQL to Oracle to maintain their array of data and run various data warehousing, cleansing and mining jobs. As data sizes have grown exponentially, so have database cluster and server configurations, as well as the time it takes to run daily/weekly batch processing jobs and reports.

MapReduce has gained a lot of popularity (not to mention a lot of press coverage) over the last few months. Its inventor, Google, uses it for its own search technology. Hadoop, an open source implementation of it, has been heavily supported by Yahoo and used in its various projects. So the technology has been proven to efficiently handle datasets that range from terabytes to petabytes. The question is whether it can be applied to the RDBMS/data warehousing domain. Can it solve the problem of long-running ETL jobs? Can it be used to cut the time it takes to run some data mapping jobs? Can it help DBAs and IT developers get out of the constant loop of performance tuning, indexing, tuning and indexing?

All good questions.

But alas, trying to utilize Hadoop for a database-driven application is not as straightforward as it may sound, and it may turn out to be not as efficient either. For starters, the Hadoop platform is built on top of a regular file system. Its architecture has two main components: the Hadoop Distributed File System (HDFS), which distributes and stores data over several machines, and the MapReduce programming framework, which can be used to divide long-running jobs into several smaller (map) tasks and then combine the results during the reduce phase. Hadoop usually serializes and stores its objects on a regular file system. Since Hadoop itself is built on top of a regular file system and not, say, a DBMS, it is difficult to use its platform to scale your traditional RDBMS applications. Companies like AsterData and GreenPlum have databases built on top of their own MapReduce frameworks.
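
To make the map and reduce phases concrete, here is a minimal single-process sketch in Python. It is not the Hadoop API and it does not distribute anything; it only illustrates how a job is split into per-record map tasks whose output is grouped by key and then combined by reduce tasks. The word-count job and all function names are our choices for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Map task: emit (key, value) pairs for one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce task: combine all values emitted for one key."""
    return key, sum(values)


def run_job(records):
    """Run map over every record, group by key, then reduce each group."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster, each map task would read its own block of a file from HDFS and the grouped output would be shipped across machines to the reducers; the data flow, however, has the same shape.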

 Having said that, there are several options for using Hadoop for your database jobs:
  • Check out the Hadoop DBInputFormat API (http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/lib/db/package-summary.html), which is part of v0.19 and later. It allows users to make JDBC calls from within the Hadoop framework.
  • Frameworks built on top of the Hadoop platform: Hive, Pig (and Cascading, which is open source but not part of the ASF umbrella). These frameworks are designed to support structured data analysis. In addition, Hive uses SQL-based syntax. These frameworks can be used to load data from your databases into Hadoop, perform analysis and generate results, or even export the results into flat files and load them back into your database.

One must also give careful consideration to which jobs can be transferred to Hadoop. One rule of thumb we try to follow is to analyze whether a particular job spends more time executing SQL statements or in other computation activities like data analysis, transformation or aggregation. In most cases, the latter type of job will benefit more from being executed in a Hadoop cluster environment. Otherwise, firing SQL statements from your 100s-1000s of mapper tasks would be a sure way to get your DBA’s attention!
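
One simple way to apply this rule of thumb is to time the SQL and non-SQL parts of a job separately. The Python sketch below is only an illustration under stated assumptions: the JobTimer class, the sample job and the "trades" table are hypothetical, and an in-memory SQLite database stands in for a real MySQL or Oracle server.

```python
import sqlite3
import time


class JobTimer:
    """Accumulates wall-clock time per labelled phase of a job (hypothetical helper)."""

    def __init__(self):
        self.totals = {}

    def timed(self, label, func, *args):
        start = time.perf_counter()
        result = func(*args)
        self.totals[label] = self.totals.get(label, 0.0) + time.perf_counter() - start
        return result

    def sql_fraction(self):
        """Share of the measured time that was spent in phases labelled 'sql'."""
        total = sum(self.totals.values())
        return self.totals.get("sql", 0.0) / total if total else 0.0


def aggregate(rows):
    """Non-SQL computation: total amount per symbol."""
    totals = {}
    for symbol, amount in rows:
        totals[symbol] = totals.get(symbol, 0.0) + amount
    return totals


def run_sample_job(timer):
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real database
    conn.execute("CREATE TABLE trades (symbol TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO trades VALUES (?, ?)",
        [("ABC", 10.0), ("XYZ", 5.0), ("ABC", 2.5)],
    )
    query = "SELECT symbol, amount FROM trades"
    rows = timer.timed("sql", lambda: conn.execute(query).fetchall())
    result = timer.timed("compute", aggregate, rows)
    conn.close()
    return result
```

A job whose sql_fraction() stays low across representative runs spends most of its time in computation, which by the rule of thumb above makes it a better candidate for a Hadoop cluster.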
 
We will try to share some more insights into this topic over the next couple of blog posts. Stay tuned!

posted @ Tuesday, May 05, 2009 2:53 PM | Feedback (1)
