Wednesday, October 24, 2012

So, What Does the Future Hold?


One of the great beauties of Omeka, which is no design accident, is the built-in web of metadata that the user is bound to discover as she encounters the architecture of the site and starts to furnish it with digital treasures. The choice to use the metadata is hers, but the opportunities are abundant and the payoff will be significant once the site goes live. Tried-and-tested Dublin Core is the first expression, creating context around each “item” and opening access via stipulated information elements. There is then the opportunity to add metadata about the nature of the item, including its original format and any textual data that is known. Further opportunity for discovery comes through tagging, which attaches a string of keywords or concepts to the item and so matches it up with the searcher. This perhaps more tedious part of online exhibit planning requires compliance with rules for the best results, but as William J. Turkel made clear in “Digital History Hacks (2005-08),” it is worth taking significant pains when adding metadata and tags to websites. In his case, some clever use of digital tools for mining a database, such as the search log AOL briefly put out for the universe to play with, pulled up helpful information for those interested in tagging. Understanding how people search, and what people search for, helps those doing the tagging to include related words and concepts so that searches will find your site.
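To make Turkel's point concrete, here is a minimal sketch, in Python, of that kind of query-log mining. The file name queries.tsv, its tab-separated layout, and the assumption that the query text sits in the second column are illustrative stand-ins, not a faithful rendering of the AOL release.

```python
# Count the most common terms in a search-query log to learn which
# words searchers actually use, then choose item tags accordingly.
# Assumes a hypothetical tab-separated file with the query text in
# the second column (an illustrative layout, not the real AOL format).
import csv
from collections import Counter

def top_query_terms(log_path, n=20):
    counts = Counter()
    with open(log_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) > 1:
                for term in row[1].lower().split():
                    counts[term] += 1
    return counts.most_common(n)

if __name__ == "__main__":
    for term, freq in top_query_terms("queries.tsv"):
        print(term, freq)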

So it goes with organized archives, libraries and curated collections. At the other extreme of the digital universe, Google works with a different model, one that depends on big data, algorithms and statistical analysis to make the best match between seeker and sought. Peter Norvig, Google's Director of Research, uses Sherlock Holmes as his comparison: Holmes' success is not “flash in the pan” brilliance, but calculated observation and manipulation of the data. This is the kind of task best handled in the digital world, where information can be extracted using computational methods that find patterns, categorize, and determine what is relevant. With such methods, the bigger the data the better, and now that the Internet is more than 100 million times bigger than it was in the beginning, the Google search engine is more likely than ever to be accurate and relevant.
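As a toy illustration of that statistical approach (emphatically not Google's actual algorithm), the following Python sketch ranks a few documents against a query using TF-IDF weighting; the relevance of each document falls out of the term statistics rather than any hand-crafted rule.

```python
# Rank documents against a query with TF-IDF: terms that are rare
# across the collection count for more, so relevance emerges from
# the statistics of the data itself.
import math
from collections import Counter

docs = {
    "holmes": "holmes observes the data and deduces the pattern",
    "watson": "watson narrates the case and admires holmes",
    "trains": "the train timetable lists departures and arrivals",
}

def rank(query, documents):
    tokenized = {name: text.split() for name, text in documents.items()}
    n = len(tokenized)
    df = Counter()                      # document frequency per term
    for words in tokenized.values():
        for term in set(words):
            df[term] += 1
    scores = {
        name: sum(Counter(words)[t] * math.log(n / df[t])
                  for t in query.split() if df.get(t))
        for name, words in tokenized.items()
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(rank("holmes data", docs))  # the "holmes" document scores highest
```

And of course the bigger the collection, the more reliable those document-frequency statistics become, which is exactly the bigger-is-better point above.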

Although the site had to be taken down when Google withdrew the API that sustained it, Daniel Cohen's Syllabus Finder harnessed Google's search engine, along with its own purpose-built search tools, to identify roughly 1.4 million documents on the web as syllabi and gather them into a huge database. They were collected over the seven-year period from 2002 to 2009, and since last year the collection has been available as a dataset for data mining and analysis. Cohen built his search engine by first generating a “dictionary” of terms that commonly crop up in such documents. Users could put their own search terms into the Syllabus Finder, and it would send optimized queries to Google to extract the best matches. It also had the refinement of pulling useful information out of the relevant documents, such as where a syllabus was from. Cohen points out that this method of data mining, using specialized filters or sorting programs and harnessing a mammoth search engine such as Google's or Yahoo's, has great potential for pulling relevant information from huge amounts of unstructured data. In fact, he even goes so far as to suggest that, rather than seeking high-quality digitization and thorough text markup, it may be more cost-effective and more worthwhile to digitize a larger amount at a lower standard and rely on APIs to provide the path for mining data and synthesizing knowledge. The greater the data, even if that data is not of the highest quality, the greater the likelihood of accuracy.
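A rough sketch of the two moves Cohen describes might look like the following: score a page against a dictionary of terms that commonly crop up in syllabi, then pull out a useful detail such as the host of the page's URL as a crude proxy for where the syllabus is from. The term list, threshold, and example URL here are illustrative guesses, not Cohen's actual dictionary or code.

```python
# Step 1: a crude classifier that counts hits from a term dictionary.
# Step 2: extract one useful fact, the host of the page's URL.
# Terms and threshold are illustrative guesses, not Cohen's dictionary.
from urllib.parse import urlparse

SYLLABUS_TERMS = {"syllabus", "readings", "assignments", "office hours",
                  "midterm", "required texts", "course schedule"}

def looks_like_syllabus(text, threshold=3):
    text = text.lower()
    return sum(term in text for term in SYLLABUS_TERMS) >= threshold

def source_of(url):
    return urlparse(url).netloc  # e.g. "history.example.edu"

page = ("HIST 301 Syllabus. Required texts listed below. "
        "Office hours Tue 2-4. Assignments due weekly.")
print(looks_like_syllabus(page))                             # True
print(source_of("http://history.example.edu/hist301.html"))  # history.example.edu
```

Even a filter this blunt, run over millions of pages pulled back through a search API, begins to show why quantity can substitute for hand-curated quality.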

One historian who has been maximizing the value of the web and its seemingly infinite connectedness is Patrick Leary, whose website, the Victoria Research Web, provides historians of the nineteenth century with a smorgasbord of resources and the tools to make the most of them. Here, rather than being shunned, the lesser texts are brought to light and subjected to the historical lens. Leary himself is acutely aware of the great significance of the Internet and its accompanying computer tools, which he sees as having revolutionized the role of the researcher: what one person could once do by travelling to research libraries and archives and taking note of significant information has been expanded exponentially. A comparison might be travel itself. When people were limited to their own bodies for transport, the going was tough and long; today we can cross the Atlantic in six or seven hours.

Considering the great changes we have witnessed in new technologies over the last ten or fifteen years, I wonder what the future holds. Let's hope the electricity stays on!
