w/c 27 August – Discovery News Roundup

August 31, 2012

Earlier this month OCLC announced their recommendation that member institutions use the Open Data Commons Attribution (ODC-BY) License when releasing WorldCat-derived library catalogue data. You can read David Kay’s response to that announcement here on the Discovery blog. And last week there was news that OCLC and Europeana are collaborating on a ‘semantic similarity’ project that aims to improve the experience of searching aggregated metadata by identifying items that are near duplicates of, or related to, each other. The wider significance of this project is that it will feed into the Europeana Data Model and will provide “opportunities to develop new data services for third parties.”

It’s long been asserted that content is king, and more recently that context is queen, but now Associated Press are investing in marrying the two in order to speed up the distribution of content. The Associated Press business model is clearly based on syndication rather than aggregation, but Melody K. Smith’s assertion that “[f]indability only works when a proper taxonomy is in place” is worth considering for its relevance to our sectors.

TechCrunch’s article pitching Mendeley’s open API against Elsevier’s closed API is flawed, but it’s worth reading for the comments it provoked, particularly from Elsevier’s Director of Platform Integration, Ale De Vries. You can read more about the growth of Mendeley’s API service on the Guardian Technology blog. It’s interesting to note that Mendeley’s future plans involve developing the API into a multi-directional dataflow that will allow applications built on it to talk to each other and to upload data to Mendeley.

Seb Chan’s candid blogpost reflecting on the Cooper-Hewitt Design Center’s experience of openly releasing their collection metadata is a useful and timely reminder that a) issues around the quality of released metadata need to be addressed if we want anyone to use the data we’re releasing and b) “collection metadata [has value as a tool for discovery] but it is not the collection itself.” Seb’s point that museum collections are no match for the comprehensiveness of libraries and archives highlights the importance both of open metadata, which enables cross-institutional aggregation, and of work such as the OCLC and Europeana ‘semantic similarity’ project I mentioned above. In an ideal world, open metadata will also enable the public permeability that Seb touches on, connecting our collections with the boundless ‘amateur web’ corpus.

News from Discovery projects


Remember the licence!

August 24, 2012

Go for open, no banana skins

Following what might be regarded as the game-changing Harvard release of open bibliographic metadata with a CC0 licence in April 2012, OCLC has taken considerable steps to recognise the importance of open metadata to library services and wider resource discovery practice.

On 6th August, the Library Journal headlined the OCLC recommendation that member institutions that would like to release their catalogue data on the Web should do so with the Open Data Commons Attribution License (ODC-BY). For more details, see: http://bit.ly/MP63Dc.

However, the Discovery programme has consistently emphasised that attribution is a big banana skin, both in terms of practical implementation and on account of the associated Fear, Uncertainty and Doubt (the FUD factor), whilst ironically carrying little likelihood of enforcement under the law. This position is at the heart of the Discovery principles and is very well articulated in a subsequent Creative Commons blog post – see http://creativecommons.org/weblog/entry/33768

So we propose that open metadata is increasingly mission critical as libraries reach out to new services and that public domain licensing is the best (perhaps only?) way to engender widespread community confidence in this journey.

Don’t forget the licence

On the downside, Cloud of Data’s Paul Miller recently posted his analysis of the use of open licences associated with data releases registered at the OKFN Data Hub. Paul’s headline findings were that:

  • Half of the 4,000 registered open data sets have no licence at all
  • Only 12% of licensed data sets use either CC0 or ODC-PDDL

These stats do not reflect badly on libraries, archives and museums, as the Data Hub has attracted open data releases from a wide variety of sources. However, it would be good to see more public references to the UK institutions and Discovery projects that have released open metadata explicitly linked to a public domain licence, i.e. CC0 or ODC-PDDL.

So why not consider the following options:

The Data Hub

The Data Hub runs on the CKAN data portal software and was the source of information for Paul Miller’s blogpost. There is a simple slideshow tutorial about registering releases (whether uploads or links) at http://docs.ckan.org/en/latest/publishing-datasets.html

The web upload form is at http://thedatahub.org/dataset/new. As well as being linked to the submitter’s details, it is limited to just

  • title
  • license
  • free text description

It would be good to see UK open metadata releases registered there, with a clear link to CC0, ODC-PDDL or whatever other licence has been selected. Given the limited data entry form, why not include a reference to the Discovery principles and/or your project in the free text description?

The Creative Commons CC0 exemplars webpage

http://wiki.creativecommons.org/CC0_use_for_data

This clearly applies only to those of you who have opted for the CC0 licence. As you can see, you’ll be in good company. My assumption is that you should simply email info@creativecommons.org (perhaps marked FAO Timothy Vollmer) with your request to be on the page, providing a simple statement in line with the style of the page plus a logo.

Postscript – On recommending choice

Without doubt, Attribution has its place in the scheme of things digital – but not ideally in relation to the assertion of uncertain ‘rights’ amidst the mosaic of public domain information and distinct intellectual endeavour that constitutes the world’s bibliographic records.

Perhaps there are lessons to be learned from elsewhere about offering choice to contributors, for example from Flickr, which presents contributors with choices including the various Attribution variants – see http://www.flickr.com/creativecommons/.

Similarly, the University of California at Santa Cruz recommends CC attribution options for public contributions to its Grateful Dead Online Archive (http://www.gdao.org/contribution). This seeks to encourage the contribution of digital objects by guaranteeing credit to members of the public, which seems appropriate for the particular GDAO community context. Their options are set out on that contribution page.

PS – I wonder whether the description of the GDAO target community as one of ‘shared inspiration and adaptation’ has some equivalence to the global community of cataloguers, bibliographers, archivists and curators who have built up our scholarly metadata.