Text Mining – THATCamp CHNM 2013
The Humanities and Technology Camp, http://chnm2013.thatcamp.org

JSTOR Data for Research workshop
http://chnm2013.thatcamp.org/05/29/jstor-data-for-research-workshop/
Wed, 29 May 2013

In this workshop we will provide both a general overview of the JSTOR Data for Research (DfR) service and a “how-to” for text mining large datasets with Hadoop and cloud computing. For the big-data portion of the workshop we will use the JSTOR Early Journal Content (EJC) collection; a bundle of metadata and full text for its approximately 460,000 articles can be downloaded from the DfR site. For this tutorial we have preloaded the EJC content into Amazon Web Services (AWS) data storage and will explain how to use the AWS Elastic MapReduce (EMR) service to mine this dataset efficiently. We’ll show how to create an AWS account, develop and submit MapReduce jobs written in Python, and retrieve the results. The examples will include generating n-grams from the full text and identifying the top words in each article by calculating TF*IDF scores.
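The workshop's own job code is not included in this post. As a minimal sketch, assuming a Hadoop Streaming-style setup in which the mapper emits key/count pairs and Hadoop sorts them by key before the reducer sums them, the two example tasks (n-gram counting and TF*IDF top words) might look like this in Python. All function names here are hypothetical, `run_local` only simulates the map, sort, and reduce phases on a single machine, and the TF*IDF weighting shown (raw term count times log(N/df)) is one common variant, not necessarily the one used in the workshop.

```python
import math
import re
from collections import Counter
from itertools import groupby


def tokenize(text):
    """Hypothetical tokenizer: lowercase, keep runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def mapper(lines, n=2):
    """Map step: emit (ngram, 1) for every n-gram in every input line."""
    for line in lines:
        for gram in ngrams(tokenize(line), n):
            yield gram, 1


def reducer(sorted_pairs):
    """Reduce step: pairs arrive sorted by key; sum the counts per key."""
    for key, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run_local(lines, n=2):
    """Simulate map -> shuffle/sort -> reduce in memory."""
    return dict(reducer(sorted(mapper(lines, n))))


def top_tfidf_words(docs, k=3):
    """For each {doc_id: text}, return the k words with the highest TF*IDF.

    TF is the raw count in the document; IDF is log(N / df), where df is
    the number of documents containing the word. Words present in every
    document score 0. Ties are broken alphabetically.
    """
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df = Counter()
    for c in counts.values():
        df.update(c.keys())  # each document counts once per word
    n_docs = len(docs)
    result = {}
    for doc_id, c in counts.items():
        scores = {w: tf * math.log(n_docs / df[w]) for w, tf in c.items()}
        ranked = sorted(scores.items(), key=lambda ws: (-ws[1], ws[0]))
        result[doc_id] = [w for w, _ in ranked[:k]]
    return result


if __name__ == "__main__":
    print(run_local(["to be or not to be"]))
    docs = {
        "d1": "whales and ships at sea",
        "d2": "ships in the harbor",
        "d3": "whales in the sea",
    }
    print(top_tfidf_words(docs, k=2))
```

On EMR, a streaming mapper and reducer would normally be separate scripts that read lines from standard input and print tab-separated key/count lines; the in-memory version above is only meant to make the logic easy to follow and test.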

R for humanists
http://chnm2013.thatcamp.org/05/14/r-for-humanists/
Tue, 14 May 2013

Text mining (TM) has been one of the most frequently discussed methodologies in the humanities over the last year, and many tools can help with both basic and more advanced TM methods. Although it may seem like overkill, learning to use the statistical software package R for TM is a great way to understand some of the fundamental processes and to gain more control over your own TM explorations.

This introductory workshop will demonstrate how to:

install R
use the R console (like the command line)
create a set of text files to explore
explore the basic TM features
create a visualization of document similarity
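The workshop carries out these steps in R. As a language-neutral illustration of what a document-similarity measure computes, the Python sketch below turns each text into word-count vectors and compares every pair with cosine similarity. The choice of cosine similarity, the tokenizer, and the sample file names are assumptions for illustration, not necessarily what the workshop uses; the resulting matrix is the kind of input a heat map or clustering plot would visualize.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words counts for one document (lowercased, letters/digits only)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


def similarity_matrix(docs):
    """Pairwise cosine similarities for a {name: text} mapping, rows sorted by name."""
    vecs = {name: term_vector(text) for name, text in docs.items()}
    names = sorted(vecs)
    return names, [[cosine(vecs[a], vecs[b]) for b in names] for a in names]


if __name__ == "__main__":
    # Hypothetical stand-ins for the workshop's set of text files.
    docs = {
        "letter1.txt": "the harvest was poor this year",
        "letter2.txt": "the harvest was good this year",
        "sermon.txt": "blessed are the meek",
    }
    names, matrix = similarity_matrix(docs)
    for name, row in zip(names, matrix):
        print(f"{name:12s}", " ".join(f"{s:.2f}" for s in row))
```

Cosine similarity ignores document length, so a short letter and a long one on the same topic can still score as close; other measures (for example, Euclidean distance on normalized frequencies) are also common choices.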
