One of the great beauties of Omeka, which is no design accident, is the built-in web of metadata that the user is bound to discover as she encounters the architecture of the site and starts to furnish it with digital treasures. The choice to use the metadata is hers, but the opportunities are abundant, and the payback will be significant once the site goes live.
Tried-and-tested Dublin Core is the first expression, creating context around each “item” and opening up access via stipulated information elements. There is then the opportunity to add metadata about the nature of the item, including its original format and any textual data that is known. Further opportunity for discovery is offered by tagging, which attaches a string of keywords or concepts to the item and so matches it up with the searcher.
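To make this concrete, here is a minimal sketch in Python of the kind of Dublin Core record and tag list an item might carry. The element names are standard Dublin Core; the item, its values, and the little matching helper are invented for illustration, not anything Omeka itself runs.

item = {
    "Title": "Letter from Ada Lovelace to Charles Babbage",  # invented example values
    "Creator": "Ada Lovelace",
    "Date": "1843-07-02",
    "Format": "Handwritten letter, 4 pages",                 # the item's original format
    "Description": "Discusses the notes on the Analytical Engine.",
    "Subject": "History of computing",
}
tags = ["Lovelace", "Babbage", "Analytical Engine", "correspondence"]

def matches(query, item, tags):
    """Return True if the query appears in any metadata value or tag."""
    q = query.lower()
    return any(q in v.lower() for v in item.values()) or any(q in t.lower() for t in tags)

print(matches("babbage", item, tags))  # True: both a tag and the title match

Every element filled in is one more road a searcher can travel to reach the item.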
This perhaps more tedious part of online exhibit planning requires compliance with rules for the best results, but as William J. Turkel made clear in his blog Digital History Hacks (2005-08), it is worth taking significant care when adding meta tags to websites. In his case, some clever use of digital tools for mining a database, such as the search data AOL briefly put out for the universe to play with, pulled up helpful information for those interested in meta tags. Understanding how people search, and what they search for, helps those doing the tagging to include related words and concepts so that searches will actually find your site.
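As a toy illustration of that kind of log mining (not Turkel's actual code), the sketch below assumes a tab-separated query log in the style of the released AOL data, with the query text in the second column; the file name and column layout are assumptions. It simply counts the most frequent query words, vocabulary worth considering as tags.

from collections import Counter

counts = Counter()
with open("user-ct-test-collection-01.txt", encoding="utf-8") as f:
    next(f)  # skip the header row, if the file has one
    for line in f:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        query = fields[1]  # assumed: second column holds the query text
        counts.update(query.lower().split())

# The most common words suggest how real searchers phrase their queries.
for word, n in counts.most_common(20):
    print(f"{word}\t{n}")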
So it goes with organized archives, libraries and curated collections. At the other extreme of the digital universe, Google works with a different model, one that depends on big data, algorithms and statistical analysis to make the best match between seeker and sought. Peter Norvig, Google's Director of Research, uses Sherlock Holmes as his comparison: Holmes' success is not “flash in the pan” brilliance but calculated observation and manipulation of the data. This is the kind of task best handled in the digital world, where information can be extracted using computational methods that find patterns, categorize, and determine what is relevant. With such methods, the bigger the data the better, and now that the Internet is more than 100 million times bigger than it was in the beginning, the Google search engine is more likely than ever to return accurate and relevant results.
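Google's actual machinery is of course far more elaborate, but a classic statistical measure such as tf-idf gives the flavour of how relevance can be computed from the data alone. The three toy documents below are invented for the example.

import math

docs = {
    "d1": "sherlock holmes observes the data and the evidence",
    "d2": "big data methods find patterns in large collections",
    "d3": "holmes deduces relevance from careful observation",
}

def tf_idf(term, doc_id):
    """Term frequency weighted by how rare the term is across all documents."""
    words = docs[doc_id].split()
    tf = words.count(term) / len(words)
    df = sum(1 for text in docs.values() if term in text.split())
    idf = math.log(len(docs) / df) if df else 0.0
    return tf * idf

# Rank documents for a one-word query; a higher score means more relevant.
query = "holmes"
for doc_id in sorted(docs, key=lambda d: tf_idf(query, d), reverse=True):
    print(doc_id, round(tf_idf(query, doc_id), 3))

The more documents the statistics draw on, the more reliable such rankings become, which is exactly the big-data point.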
Although the site had to be taken down when Google withdrew the API that sustained it, Daniel Cohen's Syllabus Finder harnessed Google's search engine, along with its own purpose-built search tools, to aggregate about 1.4 million documents from the web that were identified as syllabi and gather them into a huge database. They were collected over the seven-year period from 2002 to 2009, and since last year the collection has been available as a huge database for data mining and analysis. Cohen built his search engine by first generating a “dictionary of notions,” terms which commonly crop up in such documents. Users could put their own search terms into the Syllabus Finder and it would send optimized queries to Google to extract the best matches. It was also refined enough to pull useful information out of the relevant data, such as where a syllabus was from.
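In the spirit of that dictionary approach, a crude filter might flag a page as a likely syllabus when enough characteristic terms appear. The word list and threshold below are invented stand-ins, not Cohen's actual dictionary.

import string

# Words that commonly crop up in syllabi; an illustrative list only.
SYLLABUS_TERMS = {"syllabus", "required", "readings", "midterm", "grading",
                  "office", "hours", "assignments", "prerequisites", "week"}

def looks_like_syllabus(text, threshold=4):
    """Count distinct dictionary terms in the page; flag it above a threshold."""
    clean = text.translate(str.maketrans("", "", string.punctuation))
    words = set(clean.lower().split())
    hits = SYLLABUS_TERMS & words
    return len(hits) >= threshold, sorted(hits)

page = "Week 1 readings are required; grading is based on assignments and a midterm."
flag, hits = looks_like_syllabus(page)
print(flag, hits)  # True, with the matched dictionary terms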
Cohen points out that this method of data mining, using specialized filters or sorting programs and harnessing a mammoth search engine such as Google's or Yahoo's, has great potential for pulling relevant information from huge amounts of unstructured data. In fact, he even goes so far as to suggest that rather than seeking high-quality digitization and thorough text markup, it may be more cost-effective and more worthwhile to digitize a larger amount at a lower standard and rely on APIs to provide the path for mining data and synthesizing knowledge. The more data, even data that is not of the highest quality, the greater the likelihood of accurate results.
One historian who has been maximizing the value of the web and its infinite connectedness is Patrick Leary, whose website, the Victoria Research Web, provides historians of the 19th century with a smorgasbord of resources and the tools to make the most of them. Here, rather than being shunned, the lesser texts are brought to light and subjected to the historical lens. Leary himself is acutely aware of the great significance of the Internet and its accompanying computer tools, which he sees as having revolutionized the role of the researcher. What one person could once accomplish by travelling to research libraries and archives and taking note of significant information has been expanded exponentially.
Travel itself offers a comparison: when people were limited to their own bodies for transport, the going was tough and long; today we can cross the Atlantic in six or seven hours.
Considering the great changes we have witnessed in new technologies over the last ten or fifteen years, what, I wonder, does the future hold? Let's hope the electricity stays on!