Friday, March 30, 2012

Week 12 Reading Notes

Current developments and future trends for the OAI Protocol for Metadata Harvesting
Source: Library Trends 53.4 (Spring 2005): p. 576.

Mission of OAI – “to develop and promote interoperability standards that aim to facilitate the efficient dissemination of content.”

-       Developed the Protocol for Metadata Harvesting – a tool that “facilitates interoperability between disparate and diverse collections of metadata through a relatively simple protocol based on common standards.”

OAI world = data providers (repositories), which expose their metadata through the protocol, and service providers, which harvest that metadata and build services on it.
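As a sketch of how a harvester actually talks to a data provider: OAI-PMH requests are ordinary HTTP GETs whose parameters (a verb, a metadata format, an optional date range) are defined by the protocol. The base URL below is hypothetical, not a real repository.

```python
from urllib.parse import urlencode

# Build an OAI-PMH harvesting request. A harvester issues simple HTTP GET
# requests; the verbs and parameters come from the protocol, but the base
# URL here is a made-up example.
base_url = "http://example.org/oai"
params = {
    "verb": "ListRecords",       # ask the data provider for metadata records
    "metadataPrefix": "oai_dc",  # unqualified Dublin Core, the format every repository must support
    "from": "2005-01-01",        # optional selective harvesting by date
}
request_url = base_url + "?" + urlencode(params)
print(request_url)
# http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2005-01-01
```

The “relatively simple protocol based on common standards” idea is visible here: there is nothing more exotic than a URL and an XML response to parse.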

Mission of Open Language Archives Community (OLAC) is to create “a worldwide virtual library of language resources through development of community-based standards for archiving and interoperability and a network of interoperable repositories.”

-       Uses OAI Protocol to provide access to metadata harvested from 27 data providers

Sheet Music Consortium – group of 4 academic libraries (UCLA, Johns Hopkins, Indiana, Duke) that are building a freely available collection of digitized sheet music.

National Science Digital Library (NSDL) provides access to collections of science-based learning objects.

-       Again, OAI protocol is primary means of aggregating the metadata describing this content.

A comprehensive, searchable registry of OAI repositories is being developed by UIUC.

ERRoLs = Extensible Repository Resource Locators – the ERRoL Resolution service automatically extends features to any OAI repository in the UIUC registry, instead of having to add them repository by repository.


Web Search Engines: Part 1

Hundreds of thousands of servers are needed for the larger search engines.

I never really understood how spamming worked until I read this. Really interesting that spammers create invisible content.


Web Search Engines: Part 2

Inverted file = a concatenation of the postings lists for each distinct term. Two phases to creation – scanning and inversion.
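The two phases can be sketched in a few lines of Python – first scan each document and emit (term, doc_id) postings, then invert (sort and group) them into one postings list per distinct term. The toy documents are made up for illustration.

```python
from collections import defaultdict

# Toy document collection (doc_id -> text).
docs = {
    1: "metadata harvesting protocol",
    2: "metadata standards for digital libraries",
    3: "harvesting digital sheet music",
}

# Phase 1: scanning - emit a posting for every term occurrence.
postings = []
for doc_id, text in docs.items():
    for term in text.split():
        postings.append((term, doc_id))

# Phase 2: inversion - sort by term and group into per-term postings lists.
index = defaultdict(list)
for term, doc_id in sorted(postings):
    if doc_id not in index[term]:
        index[term].append(doc_id)

print(index["metadata"])    # [1, 2]
print(index["harvesting"])  # [1, 3]
```

The concatenation of all these per-term lists is exactly the “inverted file” the article describes; real engines differ mainly in scale (compression, disk-based sorting), not in the basic shape.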

Search engines assign a popularity score to pages based on frequency of clicks and other factors.

Average query length: 2.3 words.



BrightPlanet is the only search technology that can make dozens of direct queries simultaneously – i.e., it can organize and retrieve both “deep” and “surface” web content.

Deep web sites tend to be narrower with deeper content than surface sites.

95% of deep web is publicly accessible.

Search engines w/ the largest # of websites indexed (Google, Northern Light, etc.) index no more than 16% of the surface web!

Deep web is 500x larger than surface web

These observations suggest a splitting within the Internet information search market: search directories that offer hand-picked information chosen from the surface Web to meet popular search needs; search engines for more robust surface-level searches; and server-side content-aggregation vertical "infohubs" for deep Web information to provide answers where comprehensiveness and quality are imperative.

Wednesday, March 28, 2012

Week 11 Lab

For Web of Knowledge, I used the following query:

"digital library" in topic AND virtual reference in topic AND 2008-2012 in year published, or Topic=("digital library") AND Topic=(virtual reference) AND Year Published=(2008-2012).



For Google Scholar, I used the following query:

virtual OR reference "digital library" between 2008 and 2012. 



Tuesday, March 27, 2012

Week 11 Reading Notes


 “there is a huge difference between providing access to discrete sets of digital collections and providing digital library services.”

I find it interesting how Google fits into all this – I personally think Google, for general search on the web, lacks a real competitor (I would never use anything else). I find Google scholar to be quite cumbersome, however. I wonder if this has to do with something in this article which I didn’t understand: “metadata searching vs. full-text searching.”

-          Does this simply mean the searching of metadata vs. the searching of metadata + the rest of the information source in full?

-          Does Google Scholar perform a metadata search or full-text?



In 1994, the National Science Foundation launched the Digital Libraries Initiative.

Brought together librarians and computer scientists

While information accession now rests on a highly technical infrastructure, the core function of librarianship remains. The information must be organized, collated, and presented.

Interesting that Computer Scientists now seem to count on librarians for publishing and organizing of their scholarly material, and that this may be the road for other disciplines.



“The development of free, publicly accessible journal article collections in disciplines such as high-energy physics has demonstrated ways in which the network can change scholarly communication by altering dissemination and access patterns; separately, the development of a series of extraordinary digital works had at least suggested the potential of creative authorship specifically for the digital medium to transform the presentation and transmission of scholarship.”

DSpace is a model institutional repository system

This is exciting to me as a student and library professional in general, but I wonder how it applies to public libraries…

Saturday, March 17, 2012

Week 10 Reading Notes


XML = Extensible Markup Language; markup language you can use to create your own tags

XML simplifies data interchange, enables smart code, and enables smart searches.

XML Elements can’t overlap

Every element must have an end tag, and unlike HTML, XML IS case sensitive.

Based on SGML

XML Schemas are the successors of DTDs: they support data types, use XML syntax, and help secure data communication.
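Two of these rules – case sensitivity and the requirement that every element be properly closed (so elements can’t overlap) – can be seen with Python’s standard XML parser. The little <note> document here is just an invented example.

```python
import xml.etree.ElementTree as ET

# A well-formed XML fragment: every start tag has a matching,
# identically-cased end tag, and nothing overlaps.
good = "<note><to>Library</to><body>Hello</body></note>"
root = ET.fromstring(good)
print(root.tag)               # note
print(root.find("to").text)   # Library

# A case mismatch between start and end tag makes the document
# ill-formed, and the parser rejects it outright.
bad = "<Note>Hello</note>"
try:
    ET.fromstring(bad)
except ET.ParseError as err:
    print("not well-formed:", err)
```

This is the practical difference from forgiving HTML browsers: an XML parser refuses ill-formed input instead of guessing what was meant.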

I, personally, will be thrilled when we move on to something other than webpage design. I find it not particularly applicable to the library profession, and I’d be grateful to anyone who wouldn’t mind enlightening me as to why we are bothering to learn it.

All of this stuff makes perfect sense if I am following the tutorials step-by-step, but I can’t imagine memorizing it all, especially considering how infrequently I’ll use it. 

Week 9 Lab

http://www.pitt.edu/~bat37/lab9.html

Thursday, March 1, 2012

Week 9 Reading Notes



HTML5 is…

-          the fifth revision of HTML. It includes all the features of HTML4, XHTML 1.0, and DOM Level 2 HTML – “the new standard”
-          meant to also run on low-powered devices like smart phones and tablets.
-          Still a work in progress
-          A cooperation between W3C and WHATWG
-          Meant to reduce the need for external plug-ins

New features include <canvas> element for 2D drawing, <video> and <audio>, and content-specific elements like <article>, <footer>, etc…

I am most excited about potentially not needing a million different plug-ins for each browser that I have to update constantly and keep up with. It is definitely more of a problem on networked computers like at my work than at home, but both will be made more convenient.

I don’t like the geolocation feature – is this why Google Maps is always trying to ask my current location? I don’t want that kind of info out there – it’s too far. Really freaky stuff… especially for the computer illiterate, whose privacy may be compromised without them even being aware of it.

Application caches sound interesting, but I am not sure I fully understand what they do. How can you browse a web page without being connected to the internet?



"XHTML is a stricter and cleaner version of HTML"

I don’t understand why it is bad to have errors in HTML code if the browser still displays them properly…
I also don’t understand why we should bother to learn it if HTML5 covers it and more.