OpenRefine


In class this week, we tackled the question of data. Data and big data have come up in class discussions throughout many of my courses here at Creighton. I think the topic is relevant because, in the digital age we live in, we are constantly viewing and creating data. On top of data, we also have metadata (data about data). The stream of information seems endless, but for historians it can be quite the opposite. Sometimes data is lacking for certain events or people, and we have to work hard to piece together enough of it to analyze. Another challenge unique to historical data is the uncertainty or incompleteness of many sources. Handwriting on written sources can be difficult to read and interpret. Additionally, if artifacts are not dated, we have to rely on dating techniques that are not always foolproof. Historical data is also typically created by humans and is therefore subject to human error. Miriam Posner described in a blog post how immersive a historian's experience with his or her evidence is. Historians don't think of their evidence as data in a spreadsheet, but as the product of people and a society that can be experienced and studied. However, she rightly points out that historical data can be organized to fit the tabular idea of data. Archiving historical photographs into a dataset, for example, is a great way to compile them for analysis with a digital tool. Because historical data is so difficult to find, digital programs can also assist in mining the data before the organization stage.

OpenRefine window view.

We can overcome these difficulties with historical data by being deliberate in how we organize and analyze it. In class, we looked at several programs that are useful for data organization (OpenRefine among them). History is about creating narratives, and the fluidity of that goal can be difficult to translate for computers that think in terms of 1s and 0s. However, by putting historical data into a format that computers can read, historians can create interesting visuals and analyses using digital tools. OpenRefine is a perfect example of how useful digital tools can be for data organization. I am considering using a data organization system like OpenRefine to help me organize the data I collect from Isabella Bird's book on her descriptions of the different areas of Colorado. I also plan to do some textual analysis, so data organization tools will be necessary to properly analyze the results of that process. OpenRefine hasn't changed much since we last learned the software in the 395 Digital History course. I liked the way you can clean up data using the Text Facet option, which made it easy to consolidate a large dataset into the correct categories. I think this tool would be particularly useful for data compiled by a larger team, where inconsistencies in data entry are more likely.
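To make the Text Facet idea a little more concrete, here is a minimal Python sketch (not OpenRefine's own code). A text facet essentially counts how many times each distinct value appears in a column, and the Cluster button on a text facet then groups spellings that probably mean the same thing. The `fingerprint` function below approximates the "fingerprint" key-collision method that OpenRefine's clustering documentation describes (trim, lowercase, strip punctuation, fold accents, sort the unique words). The function names and the sample place names are hypothetical, chosen to resemble entries one might transcribe from Isabella Bird's book.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value):
    """Approximate an OpenRefine-style fingerprint key (a sketch, not the real implementation)."""
    s = value.strip().lower()
    # Fold accented characters to plain ASCII, e.g. "é" -> "e"
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    # Drop punctuation so "Estes Park." and "Estes Park" match
    s = re.sub(r"[^\w\s]", "", s)
    # Sort the unique words so "Park, Estes" matches "Estes Park"
    return " ".join(sorted(set(s.split())))


def text_facet(values):
    """Count each distinct cell value, like a text facet's value list."""
    return Counter(values)


def cluster(values):
    """Group distinct spellings that share a fingerprint key; keep only real clusters."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return {key: sorted(members) for key, members in groups.items() if len(members) > 1}


def merge_clusters(values):
    """Replace every member of a cluster with that cluster's most frequent spelling."""
    counts = Counter(values)
    canonical = {}
    for members in cluster(values).values():
        best = max(members, key=lambda m: counts[m])
        for m in members:
            canonical[m] = best
    return [canonical.get(v, v) for v in values]


# Hypothetical, inconsistently entered place names
entries = [
    "Estes Park", "estes park", "Estes Park.", "Park, Estes", "Estes Park",
    "Denver", "Denver ", "Denver", "Colorado Springs",
]

print(text_facet(entries))                  # seven distinct spellings
print(cluster(entries))                     # spellings grouped by fingerprint
print(text_facet(merge_clusters(entries)))  # consolidated categories
```

Picking the most frequent spelling as the merged value is just one simple rule; in OpenRefine itself you review each proposed cluster and decide on the merged value before applying it.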

OpenRefine Text Facet Function

One of the challenges I found with this tool is that, for larger datasets, it's hard to view everything together in a way that allows analysis of the whole dataset or of a group within it. Also, the data replace feature was a little confusing for me because I learned a slightly different version of that code in Python. Overall, I think OpenRefine is a great starting tool for data analysis, but I was left asking: what next? Does OpenRefine assist with data visualization, and if so, how? Once the data is compiled, what kinds of analysis tools does it offer?
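On the replace question: in OpenRefine, this kind of edit is usually written as a GREL expression under Edit cells > Transform, for example `value.replace("Colo.", "Colorado")`, and GREL's `replace` also accepts a regular expression written between slashes. The sketch below shows rough Python equivalents, assuming a plain list of strings stands in for an OpenRefine column: `str.replace` for a literal substring and `re.sub` for a pattern. The helper names and sample rows are hypothetical.

```python
import re


def replace_literal(value, find, repl):
    """Literal replacement, roughly like GREL value.replace("Colo.", "Colorado").

    Python's str.replace swaps every occurrence unless a count is given.
    """
    return value.replace(find, repl)


def collapse_whitespace(value):
    """Pattern replacement, roughly like GREL value.replace(/\\s+/, " ")."""
    return re.sub(r"\s+", " ", value)


# Hypothetical column of cells; applying the functions to each item
# mimics OpenRefine running a transform on every cell in a column.
rows = ["Denver, Colo.", "Estes   Park,  Colo.", "Colorado Springs"]
cleaned = [collapse_whitespace(replace_literal(r, "Colo.", "Colorado")) for r in rows]
print(cleaned)
```

The main difference is where the code runs: in OpenRefine the expression is applied to each cell for you, while in Python you loop over the values yourself.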

