danah boyd and Michael Gorman slug it out

I was supposed to blog the danah boyd keynote [note: she has now posted it here], but it’s now difficult to view it outside the context of the subsequent Michael Gorman luncheon speech. When I dutifully met with the other members of the LITA Forum 2005 committee on Sunday morning, they remarked that attendees had found boyd "provocative", and at least anecdotally it sounds like you all enjoyed the juxtaposition of the two speakers. We think of them as on opposite sides of the spectrum. Are they?

Both addressed the failure of tools like Google to capture and use important metadata, as follows:

Gorman: Google lacks both precision and recall: it retrieves too many results, and they are not the best results for the question at hand. [I would argue here that Google lacks access to the best results (buried as they are in proprietary and/or password-protected OPAC and vendor databases), so we and our vendors have set Google up to fail in recall. And if you consider that for ridiculously broad queries it retrieves only 30,000 hits among its seventy-bazillion indexed pages, then perhaps it’s not doing too badly on precision.]
boyd: Google doesn’t weigh relevance by date, which can be misleading, particularly with blogs: "I still get daily comments on postings from 1997." Nor does it provide any indication of context or community: these comments come from people who randomly found her blog and of course aren’t interested in her or her research, but have wandered in by accident. If everything is indexed, "How do we create safe space?" [I would argue here that until Google gets around to becoming massively more complex, this might be why it’s still necessary for librarians to hand-index the Internet, crazy as it sounds, to point people not only towards their topics of interest (websites generally) but perhaps also towards their communities of interest (blogs and forums).]
Gorman: We need better metadata hand-encoded in pages. But maybe that’s not enough.
boyd: We need better metadata automatically derived by search engines. But maybe that’s not enough.

Gorman, in his luncheon speech and elsewhere, lately seems to exhort librarians to band together with publishers of reliable information, against the tide of bloggers and full-text web search engines, to create a searchable corpus of the retrievable and reliable.

boyd in her keynote told us that librarians have in the past been accused of being intellectual property pirates themselves, and exhorted us to band together against the publishing establishment that now scorns bloggers and sues Google: "Librarians are some of the best defenders of civil liberties. Put on your eyepatch and say arrr!"

Okay, here are a few notes from boyd’s actual talk:

Like many digital media, blogging is a form of orality. It’s primarily communication, not publishing.

Blogs are persistent (they archive and their archives are indexed) but their content tends to focus on the moment. Blogs are worth looking at for the same reason libraries keep letters from the 18th century: they’re about people performing their lives, the modern equivalent of an archive of old letters.

Blogging [in the diary sense, I gather] is used as a way to feel out what is appropriate, to model bloggers’ own lives and then see the result. It’s a form of talking and performing in public. But because blogs persist, it’s different from talking to the public nearby. Blogs acquire a public dispersed not only in space across the web, but in time. [Imagine Anaïs Nin not realizing her diaries would be seen by everyone, and still seen now?]

Blogging contains a lot of remix: pulling in pieces of others’ communications. But then blogs serve as redistribution of intellectual property. What is fair use? When a remix becomes popular, is it still fair use? It wasn’t too long ago that librarians were seen as pirates.

(Cf. Roy’s "End of the World As We Know It" in the opening keynote … darn good thing he didn’t include the song in his posting of his presentations!) [Note: shortly after this keynote I went to the Breaking out of the Box presentation, in which Raymond Yee called scholarship itself a form of remix. What do you think of that?]

From the Q&A after the presentation:

One thing for librarians to know: for bloggers, if it doesn’t exist online (can’t be linked to), it doesn’t exist.

People are getting into niches that are no longer about geography: There’s a huge decline in suicide rates among gay, lesbian, bi, transgender teens in the current generation because they are getting online young and learning they’re not alone. On the other hand, there are pro-bulimia and pro-cutting sites (although mental health professionals say it’s better that these sites are out there than for the bulimics and self-cutters to hide silently). Blogs (or the web at large) don’t really promote cross-cutting communities; people for the most part seek others with whom they have something in common.

Q: Libraries try to distribute information with a known level of reliability. How do we separate the wheat from the chaff among blogs?
A: Trying to separate it now is premature; what seems like chaff could, in retrospect, be critical documentation of a period in time. Storage prices now allow us to save everything. It’s searching it that is still a problem. Metadata and context are lacking in the search tools available to us; there’s a need to learn to retrieve by quality indicators. Search engines don’t have this down yet.