Current developments and future trends for the OAI Protocol for Metadata Harvesting
Mission of OAI – “to develop and promote interoperability standards that aim to facilitate the efficient dissemination of content.”
- Developed the Protocol for Metadata Harvesting – a tool that “facilitates interoperability between disparate and diverse collections of metadata through a relatively simple protocol based on common standards.”
OAI world = data providers (repositories that expose their metadata through the protocol) and service providers that harvest it.
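At the wire level, OAI-PMH harvesting is just HTTP GET requests with a `verb` parameter (e.g. `ListRecords`, `Identify`, `GetRecord`). A minimal sketch of building such a request, using only the standard library; the endpoint URL below is a placeholder, not a real repository:

```python
from urllib.parse import urlencode

def build_listrecords_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (a plain HTTP GET)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional selective harvesting by set.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)

# Hypothetical endpoint for illustration:
url = build_listrecords_url("http://example.org/oai")
print(url)
# http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

The repository answers with an XML document of metadata records (unqualified Dublin Core when `metadataPrefix=oai_dc`), which the harvester then parses and aggregates.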
Mission of Open Language Archives Community (OLAC) is to create “a worldwide virtual library of language resources through development of community-based standards for archiving and interoperability and a network of interoperable repositories.”
- Uses OAI Protocol to provide access to metadata harvested from 27 data providers
Sheet Music Consortium – group of 4 academic libraries (UCLA, Johns Hopkins, Indiana, Duke) that are building a freely available collection of digitized sheet music.
National Science Digital Library (NSDL) provides access to collections of science-based learning objects.
- Again, OAI protocol is primary means of aggregating the metadata describing this content.
A comprehensive, searchable registry of OAI repositories is being developed by UIUC.
ERRoLs = Extensible Repository Resource Locators – the ERRoL Resolution service automatically extends features to any OAI repository in the UIUC registry, instead of having to add them repository by repository.
Web Search Engines: Part 1
Hundreds of thousands of servers are needed for the larger search engines.
I never really understood how spamming worked until I read this. Really interesting that spammers create invisible content.
Web Search Engines: Part 2
Inverted file = a concatenation of the postings lists for each distinct term. Creation has two phases – scanning and inversion.
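The two phases can be sketched in a few lines: scanning emits (term, doc_id) pairs in document order, and inversion sorts those pairs by term so each term's postings list comes out as one contiguous run. A toy illustration (real indexers work on disk in batches, and postings carry positions and frequencies, not just doc ids):

```python
from collections import defaultdict

def build_inverted_file(docs):
    """Two-phase inverted-file construction: scan, then invert."""
    # Phase 1: scanning -- emit (term, doc_id) pairs in document order.
    pairs = []
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            pairs.append((term, doc_id))
    # Phase 2: inversion -- sort by term, then concatenate each term's
    # occurrences into a postings list (skipping consecutive duplicates).
    index = defaultdict(list)
    for term, doc_id in sorted(pairs):
        if not index[term] or index[term][-1] != doc_id:
            index[term].append(doc_id)
    return dict(index)

docs = ["the quick fox", "the lazy dog", "quick quick dog"]
index = build_inverted_file(docs)
print(index["quick"])  # [0, 2]
print(index["the"])    # [0, 1]
```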
Search engines assign a popularity score to pages based on frequency of clicks and other factors.
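Actual ranking formulas are proprietary, but the idea of blending click frequency with other signals can be shown with a toy score. Everything here is illustrative: the click-through-rate calculation is standard, while the 0.7/0.3 weights and the `link_weight` signal are assumptions, not anything a real engine documents:

```python
def popularity_score(clicks, impressions, link_weight=0.0):
    """Toy popularity score: click-through rate blended with an
    optional link-based signal. Weights are illustrative only."""
    ctr = clicks / impressions if impressions else 0.0
    return 0.7 * ctr + 0.3 * link_weight

# A page shown 1000 times with 50 clicks and a moderate link signal:
print(round(popularity_score(50, 1000, link_weight=0.5), 3))  # 0.185
```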
Avg query length 2.3 words
BrightPlanet is the only search technology that can make dozens of direct queries simultaneously – i.e., it can organize and retrieve both “deep” and “surface” web content.
Deep web sites tend to be narrower with deeper content than surface sites.
95% of deep web is publicly accessible.
Search engines w/largest # of websites indexed (Google, Northern Light, etc) index no more than 16% of the surface web!
Deep web is 500x larger than surface web
“These observations suggest a splitting within the Internet information search market: search directories that offer hand-picked information chosen from the surface Web to meet popular search needs; search engines for more robust surface-level searches; and server-side content-aggregation vertical "infohubs" for deep Web information to provide answers where comprehensiveness and quality are imperative.”