Text mining will be an essential component in visualizing the answers to the questions Contra will present. Since presidential debate transcripts are so widely available, we can use data mining to find out exactly how many times the War on Drugs was mentioned by the candidates. This can give us a statistical basis for comparison with the funding that was pumped into the war on drugs in the subsequent presidential term. In the current election we see this domestic war scarcely being mentioned, if at all, by either candidate. I hypothesize that a correlation could emerge showing that funding for the war continues to increase even as it becomes less of a platform for campaigns to run on. Highlighting this information will open up further questions, such as: why? We could also illustrate whether or not states support an increase or a decrease in the budget for this war. This can be done by seeking out secondary sources in which different representatives give their personal opinions on the campaign.
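As a rough illustration of the counting step described above, here is a minimal Python sketch. It assumes the debate transcripts have already been saved as plain text; the function name `count_mentions`, the dictionary keys, and the sample sentences are all invented placeholders, not real debate quotes or a method the group has settled on.

```python
import re

def count_mentions(text, phrase):
    """Count case-insensitive, whole-phrase occurrences of `phrase` in `text`."""
    # Allow any run of whitespace (including line breaks) between the words.
    words = [re.escape(word) for word in phrase.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# Invented placeholder text standing in for real transcripts (assumption).
transcripts = {
    "earlier_debate": "We will win the war on drugs. The War on Drugs needs funding.",
    "recent_debate": "We talked about jobs, taxes, and health care.",
}

# One count per transcript, ready to set beside funding figures later.
mentions = {name: count_mentions(text, "war on drugs")
            for name, text in transcripts.items()}
print(mentions)  # {'earlier_debate': 2, 'recent_debate': 0}
```

The counts per debate could then be placed next to the budget figures for the following term; whether any relationship shows up is exactly the open question.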
Being in the midst of the debates, as we are today, it would be hard to read through any newspaper or watch the nightly news without hearing of the Presidential and Vice-Presidential debates. Pundits and analysts are quick to jump on every word, interruption, and perceived mistake in order to determine who “won.” With this in mind, our group, The Instigators, aims to analyze all available information in order to see if the debates truly make a difference. Many believe that at this point in the campaign cycle, most voters have already decided who they plan on voting for; however, for those still on the fence, could the debates truly sway them in either direction?
Given these circumstances, data mining will serve as an invaluable tool for examining the vast amount of information released in response to the debates. Finding correlations among real-time reactions from online sources such as Twitter, Facebook, and various RSS feeds will shed light on the question of voter impact. It is imperative to approach the question of the debates' importance from many different angles in order to provide a more nuanced response to a complex question. While many individual sources will claim to provide their own idea of the “winner” of each debate, the general data that our group receives may in fact not be as simple as yes/no, Democrat/Republican.
As a sub-focus, it may also be important to mine data regarding third party candidates and their lack of inclusion in all of the debates.
One of the main questions that my group is considering basing our project on is: how often do presidents and opposing candidates break or keep their promises while in office or running for office? The idea of text mining is to read thoroughly through different articles or other forms of written research and to find patterns in them. These patterns consist of repeated statements or words that a person (in this case a president or opposing candidate) would say. This is a very beneficial way of doing our research because when candidates run for president or any other office, they usually use specific phrases to get the public’s attention, and they constantly repeat these phrases whenever they are in public.
As my group member Tatsiana stated in her post, we will attempt to compare what the president did during his term with what he said when he was campaigning. Unfortunately the class only runs until December, so we will not be able to follow the entire four-year term of the president chosen in this current election, but hopefully we will get an idea of what he will do up until the due date of this project. Along with this current election, we will also look at past presidencies and make the same comparison. It is important that I say this because both professors use this current election as a topic for postings and for discussion in class, so it seems only appropriate to integrate this election into our project.
Throughout the process of gathering information and data to support our argument concerning the effect of debates on voter behavior, text mining will be pertinent. Going through large amounts of numbers and words to grab and portray what is important and essential for the project is the basis of text mining. Many broadcast companies now have a system in which they give a room full of undecided voters a device that captures their emotions and reactions to what the candidates are saying. Word by word, topic by topic, we can understand which ideas affect this group of people. The response is either positive, negative, or absent, such as when the moderator is speaking. Digging through these numbers could help us convey our argument. Text mining could help us connect what a candidate says to how people react, and then to the numbers at the polls. It would be important to understand how many people are actually studied with these devices and how well they represent the general public or undecided voters as a whole. I believe these studies contain large amounts of numbers and information that can help us draw direct correlations to election day results. I am a little unsure as to how we will actually obtain these numbers and what programs are used to get the pertinent data.
As Ted Underwood mentioned in the reading, some of the biggest obstacles in text mining are not only finding the data needed, but also finding the skills to collect the correct data.
One reason is that our topic revolves around social media, which can be traced back not just to MySpace but to early social networking services such as email, chat services, and other early internet social structures. Also, because this is recent history, text mining should be easier: we will have more resources as sites, blogs, and social applications become more accessible and popular.
After our group uncovers more secondary documents, we can feed them into a Wordle-like application to see common themes such as undecided, voting, and the different kinds of feelings that stem from being a first-time voter. These similarities can help us decide which aspects of the sources deserve our attention, and can help us specify our final historical question.
In the case of secondary sources, my group may find itself in the same predicament that Underwood found himself in during his own research.
However, many of our social media sources can be primary sources, with interviews and blogs to mine through, and various social networks to comb through by means of Twitter hashtags, trending topics, and blogging categories.
One of the potential questions our group is considering researching is “How common is it for a president to break his promises made during the presidential campaign?” In order to answer this question and draw parallels between current and historical elections, we would have to process quite a large amount of text and find just the information that we need to answer our historical question. Text mining will be the essential tool in our analyses.
We will use lexical analyses based on searching for key words in candidates’ speeches to find out the major promises that they made during their presidential campaigns. We will also look at the frequency of their promises, that is, how often they repeat these promises in their debates, interviews, speeches, and other public appearances. Sometimes we may identify certain patterns in their speeches that are related to their promises.
Then, we are going to compare the information that we gathered about campaign promises with the real actions these candidates made once they are elected to the presidential office. By doing this, our goal is to find out if it is common in politics for presidential candidates to make false promises to the voters, and if the voters can trust these candidates.
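The keyword-frequency step described above could look something like the following Python sketch. It is only an illustration under stated assumptions: the speeches are taken to be plain-text strings, the keyword list is chosen by hand, and the function name `promise_frequency` and the sample sentences are hypothetical, not real campaign quotes.

```python
from collections import Counter
import re

def promise_frequency(speeches, keywords):
    """For each keyword, return (total mentions, number of speeches mentioning it)."""
    totals = Counter()
    speeches_with = Counter()
    for speech in speeches:
        lowered = speech.lower()
        for keyword in keywords:
            # Whole-word match so "jobs" does not match inside another word.
            pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
            n = len(re.findall(pattern, lowered))
            totals[keyword] += n
            if n:
                speeches_with[keyword] += 1
    return {kw: (totals[kw], speeches_with[kw]) for kw in keywords}

# Invented placeholder speeches (assumption), one string per public appearance.
speeches = [
    "I will cut taxes. Cut taxes now!",
    "We will create jobs and cut taxes.",
    "Jobs, jobs, jobs.",
]
result = promise_frequency(speeches, ["cut taxes", "jobs"])
print(result)  # {'cut taxes': (3, 2), 'jobs': (4, 2)}
```

Reporting both numbers separates a promise repeated many times in one speech from one repeated across many appearances. Comparing promises with actions in office would still need to be done by reading, since a keyword count cannot tell whether a promise was kept.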
Text mining involves a program analyzing large volumes of unstructured data for the purpose of extraction of specific words and key phrases.
Since both of our historical questions proposed so far involve social media, we will need to use as many social media websites as we can because larger amounts of data will be better for comparison and analysis.
Unlike Ted Underwood, who needed literary works for his project, we can obtain the necessary information straight from the social media websites.
As far as the necessity of learning how to program, I am not sure whether it will be necessary for our project or not. The public toolsets for text mining, given as examples on professor Underwood’s website, seem sufficient enough for the job.
Text mining will help us divide and categorize information, thereby revealing patterns.
In our case, text mining will be used to determine how the names of presidential candidates, “Presidential Election 2012,” and popular political issues are being used by young/first-time voters. This election is arguably the first to be so immersed in social media, which makes it perfect for this project.
I am not sure if my response is adequate enough for the posted question. Perhaps if I came to class last Wednesday, it would have been better. Unfortunately, the train tracks between my house and Baruch were broken at Prospect Park station.
Our group can use text mining to answer the historical question that the group has proposed about if and how the outcome of presidential debates determined who won the election. Text mining would allow us to see if there were key words or phrases used by candidates during the debates that proved to have a positive or negative effect on voters and, as a result, attracted or deterred them. Another way text mining will be beneficial in our project is in determining whether aspects other than analytics played a role in deciding the outcome of elections based on a candidate’s performance during the debates. During debates, candidates present various types of data to make their case to voters: statistical data, such as their track record while serving in their current governmental post; and conditional data, such as what they expect to accomplish if they are chosen as president. Because debating deals not only with the factual data presented by the candidates but also with the manner in which the candidates convey it, such as their behavioral disposition, body language, tone of voice, eye contact, etc., data mining will help capture the effects of these different factors and what role they played in steering the outcome of the election. However, we keep in mind that our analysis rests on the premise that the election process is very complex, and trying to keep all variables stable poses multifaceted challenges.
By Monday, Oct. 15, at 8:00am:
- Complete Reading:
- Richard White, “What is Spatial History?” Spatial History Lab: Working paper; Submitted February 1, 2010.
- Explore Hypercities.com. Come with a question about historical maps for our guest speaker.
- Blog Post(s):
- Each member of your group
- In 200-300 words answer the following questions: How could your group use text mining to answer the historical question(s) you’ve proposed thus far?
- One member of the group:
- post 3-5 secondary sources your group will be reading to provide background information.
- For secondary sources, you might look at JSTOR, search the library catalog, or consult a librarian. Comment on this post if you have any questions that you think we can help you with.
- Each member of your group
Announcements
- Blog commenting: “The Magnificent Seven ds106 Comment Challenge”
Reading
- Big Data
- “And because we [historians] look for stories—for ways of synthesizing diverse strands into narrative themes—we usually look for interactions among variables that to other eyes might not seem related.”
- Importance of collaboration: e.g., joining “the historian’s facility with sifting and contextualizing information to the computer scientist’s (or marketing professional’s) ability to generate and process data.”
Ted Underwood, “Where to start with text mining,” The Stone and the Shell, August 14, 2012
- “Quantitative analysis starts to make things easier only when we start working on a scale where it’s impossible for a human reader to hold everything in memory.”
- quantitative v. qualitative?
- Close reading v. distant reading
- OCR challenges with primary sources
- Wordle
- Tools? Some programming needed.
- “you can build complex arguments on a very simple foundation”
- What can we do?
- Categorize documents
- Contrast the vocabulary of different corpora
- Trace the history of particular features (words or phrases) over time (e.g. ngram viewer, Bookworm)
- Cluster features that tend to be associated in a given corpus of documents (aka topic modeling)
- Entity extraction
- Visualization (e.g. geographically, network graph)
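One item in the list above, contrasting the vocabulary of different corpora, can be sketched in a few lines of Python. This is a simplified illustration, not the method from the reading: it ranks words by the difference in relative frequency between two texts, and the function names `relative_freqs` and `distinctive_words` and the sample strings are hypothetical.

```python
from collections import Counter
import re

def relative_freqs(text):
    """Map each word to its share of all words in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def distinctive_words(corpus_a, corpus_b, top_n=3):
    """Words used relatively more often in corpus_a than in corpus_b."""
    freq_a, freq_b = relative_freqs(corpus_a), relative_freqs(corpus_b)
    vocab = set(freq_a) | set(freq_b)
    diffs = {w: freq_a.get(w, 0.0) - freq_b.get(w, 0.0) for w in vocab}
    # Largest positive difference first; ties broken alphabetically.
    return sorted(diffs, key=lambda w: (-diffs[w], w))[:top_n]

# Invented placeholder corpora (assumption), e.g. one per candidate.
corpus_a = "taxes taxes taxes jobs"
corpus_b = "jobs jobs jobs taxes"
print(distinctive_words(corpus_a, corpus_b, top_n=1))  # ['taxes']
```

With real corpora, very common words ("the", "and") would need to be filtered out first, and a raw frequency difference is only one of several ways to measure distinctiveness.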
Group Projects
Group 1
Caroline, Anton, Eli, Cameron, Leanardo
Group 2
Estevan, Tatsiana, Phillip, Jordan Burgos
Group 3 – Instigator
Felipe, Jordan Smith, Robert, Pablo
Group 4 – Contra
Guang, Cary, William, Stephen, Shaif
History 3460: Digital History
Group Name: Instigator
Group members: Robert Sorenson, Jordan Smith, Felipe Francois
Archiving History Digitally
There are a few questions about the current presidential election that our group, Instigators, seeks to answer. How much do the debates between candidates change the polls as the election approaches? How much do the debates affect who actually wins the election? Have any previous elections been decided solely on debate performance?
Although we have considered the actual debates as a guide for our research questions, we don’t yet know how broad the scope of our answers should be. Some of the outlets we are considering for our debate feedback are CNN (they display a meter for a cohort of undecided voters during the live debates to depict their feelings toward what each candidate is saying), Gallup, Fox, and the New York Times, as well as other online sources.
Some obvious challenges we anticipate concern collecting data, such as getting access to recordings, transcripts, and poll data for the debates. Choosing a single method of collaboration could also be a tough decision because it will be the basis for understanding what we each gather. Another problem is how to put the information together so that it makes sense and reflects the actual debate. Lastly, we will need to keep our own biases at bay so that we represent the facts as they appear.

The group name we decided on is Contra. This will coincide with the theme for our project, The War on Drugs. For those unfamiliar with the specifics, this is a 40+ billion dollar a year mission that is funded by US tax dollars, and has been seen as ineffective at best. Many facets of our government’s spending are coming into question, so why has such an expensive and largely inefficient program fallen to the back burner of many discussions?
TWoD is extremely relevant to our society on a social and international level. Most people in prison are there on drug charges. Drugs affect every aspect of our lives, from the legal to the illegal. TWoD spans almost a hundred years and touches on conspiracies, economics, racism, and foreign policy. But it likely won’t even be mentioned during the presidential debates, unlike in the Reagan era, when it was a campaign platform.
Some difficulty may arise in our current method of communication through email. We have created a Google Doc that will hopefully give us the ability to collaboratively grow an idea from the same digital workspace. We’re undecided on what digital format we’re going to use for the project, but the recent creation of our Google Doc should help us reach a decision shortly.
Group 1
Caroline, Anton, Eli, Cameron, Leanardo
A name for your group
NET POLITICS.
2-3 historical questions you are considering answering in your project
1.) How does social media (twitter, Facebook, tumblr, reddit, etc) affect young/first time voters?
2.) How have social media and the internet affected the 2012 Presidential Election? Does social media influence first time voters to vote?
A brief description of the expected scope of your project:
We hope to focus on 2012, while using other elections for comparison, and on how young voters (high school seniors/college freshmen) are affected by these sites. We feel that we can talk about all the memes, gifs, and Twitter accounts that have popped up during the election, including the Big Bird 2012 campaign that started during/after the first presidential debate.
A list of challenges and potential problems that you are having now, or anticipate will arise as you work on the project
Currently we find a challenge to be focusing our generally broad ideas into one big topic – the internet is a huge place and memes fall in-and-out of popularity regularly, and it’s also difficult to gauge how something as abstract as the internet is affecting something as concrete as voting for a presidential candidate.
Group Members: Phillip Bleustein, Estevan Roman, Tatsiana Vashkevich
Name of Group: Group 2 ( Will change in future)
2-3 Historical Questions We are Considering:
- How does a candidate’s appearance affect the outcome of the presidential election?
- What role does the candidate’s religious affiliation play during the campaign?
- What are the boundaries of exaggerated/unachievable promises during a presidential campaign?
A Brief Description of the Expected Scope of Our Project:
- We are hoping that we can draw parallels between common issues and facts that play an important role in the 2012 presidential election and in other historical presidential elections. This historical perspective may help us to uncover the deep issues that truly matter and continue to be a driving force that carries the winning candidate to the top.
A List of Challenges and Potential Problems That You Are Having Now, or Anticipate as You Work on the Project:
- Not being together for a majority of the project, and having to collaborate our work on our google doc.
- There are always external factors that come into play unexpectedly, and having to embrace those factors and work through them could be a problem at times.
Optional: discuss technologies, formats, and work-flow that you may employ:
- Embedding pics, video, and audio, likely throughout, when applicable.
- Google Docs
- Youtube and Blogger
By Monday, Oct. 8, at 8:00am:
- One person from each group should make a post on behalf of the entire group. For the list of members in the groups, see this post.
- The post should include:
- A name for your group
- 2-3 historical questions you are considering answering in your project
- A brief description of the expected scope of your project
- A list of challenges and potential problems that you are having now, or anticipate will arise as you work on the project
- Optional: discuss technologies, formats, and work-flow that you may employ
- You are free to establish your own collaborative process. We highly recommend Google Docs
By Wednesday, October 10, at 8:00am:
Leave one comment on each of three different posts (other than your own group). In each comment, raise at least one question about the proposed plan. You are encouraged to say something positive, but remember to also challenge their thinking (remember, history is contested).
By Wednesday, October 10, at 5:50pm:
- Complete Reading:
- James Grossman, “‘Big Data’: An Opportunity for Historians?” March 2012.
- Ted Underwood, “Where to start with text mining,” The Stone and the Shell, August 14, 2012.
Announcements
- No class Monday.
Reading
- Thiemer, Brier and Brown, “A Practical Guide to Collaborative Documentation in the Digital Age”
- Compare the processes: http://911digitalarchive.org/ and http://braceroarchive.org/
- Key concepts
- An archive or a collection?
- “archivist-historians”
- born-digital vs. digitized acquisitions
- inequality of access to digital media
- review different methods of inputting information: text and image scans, emails, websites, listservs, text via form on site, images and video via upload, call-in system, collaborations with other collectors (e.g. Sonic Memorial Project and Here is New York: A Democracy of Photographs), digital and analog interviews and sound recordings (including collaborations with Middle East and Middle Eastern American Center, and the Chinatown Documentation Project)
- Ensuring a range of perspectives
- Challenges: more standardized open-source database and web publishing platform, more complete metadata, redesigned web site, permanent archival home (expected to turn over to LOC in 2013), 508(c).
- Quotes from the Bracero Historical Archive that are useful for planning your group project
- “First, decide what kind of collaboration you wish to have, since that decision informs the rest of the process, from technical to communication considerations. If your partners will merely be commenting on each others’ work, you can afford to think more about ways to share files and accommodate the comment process.”
- “If your partners will each be contributing work to the project, or if there are task- sharing aspects to your project, you must also ensure that partners have the ability to contribute efficiently and that you can hold each other accountable for your contributions.”
- “Make sure each partner understands exactly what their contributions are, and when those contributions are due. You will use meetings or other communications to manage those deliverables, but it is crucial that all partners are agreeing to the same thing.”
- “Flexibility is key. No project is able to anticipate all problems or challenges before they occur, but simply acknowledging that challenges may arise, and allowing time and budget for those challenges is helpful. For example, deciding as a partnership that in the event of an unanticipated technical problem, Partner A will take the lead in resolving it, means that you will not lose valuable time assigning that responsibility at a critical moment.”
Group Project Breakout Discussions
Group 1
Caroline, Anton, Eli, Cameron, Leanardo
Group 2
Estevan, Tatsiana, Phillip, Jordan Burgos
Group 3
Felipe, Jordan Smith, Robert, Pablo
Group 4
Guang, Cary, William, Stephen, Shaif
Who created the artifact?
– Zubeida Mustafa
When was the artifact created?
– First Quarter, 1969
Where was it created?
– In the Pakistan Institute of International Affairs / Pakistan Horizon
Why was the document created?
– To express the author’s take on the Presidential Election of 1968 and the cause for its outcome.
Why is the document a primary source?
– It is the firsthand account of the 1968 Presidential Election.
How trustworthy is the source?
– Very. It is a “non-official, non-party, and non-profit making body”.
What other questions might you ask the source in order to better understand what it reveals about the events of 1968?
Did Nixon win the 1968 election solely because of the public’s views of the Democrats and their policies on the Vietnam War?
I would look for other artifacts that would possibly contain surveys done on the American public to see if they actually felt strongly against the Democratic party during the 1968 election. Also I would wonder if this is the only reason why the Republicans won the election, or did the people who voted for the Republicans strongly believe that they had a better campaign for that voting year. In order to find out more, I would conduct more research and see if there were voting polls taken during that time period, and if so, what the general public thought.
I would also try to find out if the Republicans won because of the Democratic candidate, Hubert Humphrey, or they would have won regardless of who the Democratic candidate was because either (a) Nixon had a stronger campaign, (b) the general public was upset at President Johnson’s actions in the Vietnam War, or (c) Hubert Humphrey was not a very likable candidate and/or his platform was weak.
By October 3, class time:
- Read “A Practical Guide to Collaborative Documentation in the Digital Age,” The Bracero Archive.
- Come to class with some ideas for a historical argument related to the 2012 presidential election project to talk through with your groupmates.