Current developments and future trends for the OAI Protocol for Metadata Harvesting
Mission of OAI – “to develop and promote interoperability standards that aim to facilitate the efficient dissemination of content.”
- Developed the Protocol for Metadata Harvesting – a tool that “facilitates interoperability between disparate and diverse collections of metadata through a relatively simple protocol based on common standards.”
OAI world = data providers (repositories that expose their metadata through the protocol) and service providers that harvest it.
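At the wire level, OAI-PMH harvesting is just HTTP GET requests with a `verb` parameter (e.g. `ListRecords`, `Identify`, `GetRecord`). A minimal sketch of building such a request, using only the standard library; the endpoint URL below is a placeholder, not a real repository:

```python
from urllib.parse import urlencode

def build_listrecords_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (a plain HTTP GET)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional selective harvesting by set.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)

# Hypothetical endpoint for illustration:
url = build_listrecords_url("http://example.org/oai")
print(url)
# http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

The repository answers with an XML document of metadata records (unqualified Dublin Core when `metadataPrefix=oai_dc`), which the harvester then parses and aggregates.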
Mission of Open Language Archives Community (OLAC) is to create “a worldwide virtual library of language resources through development of community-based standards for archiving and interoperability and a network of interoperable repositories.”
- Uses OAI Protocol to provide access to metadata harvested from 27 data providers
Sheet Music Consortium – group of 4 academic libraries (UCLA, Johns Hopkins, Indiana, Duke) that are building a freely available collection of digitized sheet music.
National Science Digital Library (NSDL) provides access to collections of science-based learning objects.
- Again, OAI protocol is primary means of aggregating the metadata describing this content.
A comprehensive, searchable registry of OAI repositories is being developed by UIUC.
ERRoLs = Extensible Repository Resource Locators – the ERRoL Resolution service automatically extends features to any OAI repository in the UIUC registry, instead of having to add them repository by repository.
Web Search Engines: Part 1
Hundreds of thousands of servers are needed for the larger search engines.
I never really understood how spamming worked until I read this. Really interesting that spammers create invisible content.
Web Search Engines: Part 2
Inverted file = a concatenation of the postings lists for each distinct term. Creation has two phases – scanning and inversion.
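The two phases can be sketched in a few lines: scanning emits (term, doc_id) pairs in document order, and inversion sorts those pairs by term so each term's postings list comes out as one contiguous run. A toy illustration (real indexers work on disk in batches, and postings carry positions and frequencies, not just doc ids):

```python
from collections import defaultdict

def build_inverted_file(docs):
    """Two-phase inverted-file construction: scan, then invert."""
    # Phase 1: scanning -- emit (term, doc_id) pairs in document order.
    pairs = []
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            pairs.append((term, doc_id))
    # Phase 2: inversion -- sort by term, then concatenate each term's
    # occurrences into a postings list (skipping consecutive duplicates).
    index = defaultdict(list)
    for term, doc_id in sorted(pairs):
        if not index[term] or index[term][-1] != doc_id:
            index[term].append(doc_id)
    return dict(index)

docs = ["the quick fox", "the lazy dog", "quick quick dog"]
index = build_inverted_file(docs)
print(index["quick"])  # [0, 2]
print(index["the"])    # [0, 1]
```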
Search engines assign a popularity score to pages based on frequency of clicks and other factors.
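Actual ranking formulas are proprietary, but the idea of blending click frequency with other signals can be shown with a toy score. Everything here is illustrative: the click-through-rate calculation is standard, while the 0.7/0.3 weights and the `link_weight` signal are assumptions, not anything a real engine documents:

```python
def popularity_score(clicks, impressions, link_weight=0.0):
    """Toy popularity score: click-through rate blended with an
    optional link-based signal. Weights are illustrative only."""
    ctr = clicks / impressions if impressions else 0.0
    return 0.7 * ctr + 0.3 * link_weight

# A page shown 1000 times with 50 clicks and a moderate link signal:
print(round(popularity_score(50, 1000, link_weight=0.5), 3))  # 0.185
```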
Avg query length 2.3 words
BrightPlanet is the only search technology that can make dozens of direct queries simultaneously – i.e., it can organize and retrieve both “deep” and “surface” web content.
Deep web sites tend to be narrower with deeper content than surface sites.
95% of deep web is publicly accessible.
Search engines w/largest # of websites indexed (Google, Northern Light, etc) index no more than 16% of the surface web!
Deep web is 500x larger than surface web
“These observations suggest a splitting within the Internet information search market: search directories that offer hand-picked information chosen from the surface Web to meet popular search needs; search engines for more robust surface-level searches; and server-side content-aggregation vertical "infohubs" for deep Web information to provide answers where comprehensiveness and quality are imperative.”