Text Mining and Social Media
As Ted Underwood mentioned in the reading, some of the biggest obstacles in text mining are not only finding the data needed, but also having the skills to collect the correct data.
This applies to our project because our topic revolves around social media, which can be traced back not just to MySpace but to early social networking services such as email, chat services, and other early internet social structures. Also, for recent history, text mining should become easier, since we will have more resources as sites, blogs, and social applications become more accessible and popular.
Once our group uncovers more secondary documents, we can feed them into a Wordle-like application to see common themes such as being undecided, voting, and the different kinds of feelings that stem from being a first-time voter. These similarities can show us which aspects of the sources deserve our attention and can help us specify our final historical question.
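To make the Wordle-like step concrete, here is a minimal sketch in Python of the word counting that sits behind a word cloud. It is only an illustration: the sample posts are invented, the stop-word list is a tiny hypothetical one, and the function name `top_terms` is my own, not part of any tool we have chosen.

```python
import re
from collections import Counter

# Hypothetical, very short stop-word list; a real project would need a fuller one.
STOP_WORDS = {
    "the", "a", "an", "and", "i", "am", "is", "to", "of", "in",
    "for", "my", "this", "about", "who", "still", "on", "but", "me",
}

def top_terms(documents, n=5):
    """Count non-stop-word tokens across all documents; return the n most common."""
    counts = Counter()
    for text in documents:
        # Lowercase the text and pull out runs of letters (and apostrophes).
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)

# Invented sample posts, for illustration only (not real data).
sample_posts = [
    "I am still undecided about voting in this election.",
    "Voting for the first time is exciting and nerve-wracking.",
    "Undecided on who to support, but voting matters to me.",
]

print(top_terms(sample_posts, n=2))
```

On these three invented posts, the two most frequent terms are "voting" and "undecided", which is the kind of pattern a word cloud would show as the largest words.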
In the case of secondary sources, my group may find itself in the same predicament that Underwood found himself in during his own research.
However, many of our social media sources can serve as primary sources: interviews, blogs to mine, and various social networks to comb through by means of Twitter hashtags, trending topics, and blog categories.
Thanks for your post, Caroline. It is good to see you raise the importance of getting “clean” data to work with. With born-digital social media sources I think this is definitely a surmountable hurdle. However, the challenge will remain of how you make meaning out of the data and out of the analyses that you run on it. Text mining will, as you correctly point out, allow you to recognize predominant themes. But what metrics will you use to assess how those themes are changing over time? Will it be possible to determine why they are changing?