Remember the licence!

August 24, 2012

Go for open, no banana skins

Following what might be regarded as the game-changing Harvard release of open bibliographic metadata with a CC0 licence in April 2012, OCLC has taken considerable steps to recognise the importance of open metadata to library services and wider resource discovery practice.

On 6th August, the Library Journal headlined OCLC’s recommendation that member institutions wishing to release their catalogue data on the Web should do so under the Open Data Commons Attribution License (ODC-BY). For more details, see: http://bit.ly/MP63Dc.

However, the Discovery programme has consistently emphasised that attribution is a big banana skin: it is awkward to implement in practice, it generates Fear, Uncertainty and Doubt (the FUD factor), and yet, ironically, it carries little likelihood of practical enforcement under the law. This position is at the heart of the Discovery principles and is very well articulated in a subsequent Creative Commons blog post – see http://creativecommons.org/weblog/entry/33768.

So we propose that open metadata is increasingly mission critical as libraries reach out to new services and that public domain licensing is the best (perhaps only?) way to engender widespread community confidence in this journey.

Don’t forget the licence

On the downside, Cloud of Data’s Paul Miller recently posted his analysis of the use of open licenses associated with data releases registered at the OKFN Data Hub. Paul’s headline findings were that:

  • Half of the 4,000 registered open data sets have no license at all
  • Only 12% of licensed data sets use either CC0 or ODC-PDDL

These stats do not reflect badly on libraries, archives and museums, as the Data Hub has attracted open data releases from a wide variety of sources. However, it would be good to see more public references to the UK institutions and Discovery projects that have released open metadata explicitly linked to a public domain licence – i.e. CC0 or ODC-PDDL.

So why not consider the following options:

The Data Hub

The Data Hub is run by the Open Knowledge Foundation on its CKAN platform and was the source of information for Paul Miller’s blogpost. There is a simple slideshow tutorial about registering releases (whether uploads or links) at http://docs.ckan.org/en/latest/publishing-datasets.html

The web upload form is at http://thedatahub.org/dataset/new. As well as being linked to the submitter’s details, it is limited to just

  • title
  • license
  • free text description

It would be good to see UK open metadata releases registered there, with a clear link to CC0, ODC-PDDL or whatever other licence has been selected. Given the limited data entry form, why not include a reference to the Discovery principles and / or your project in the free text description?
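
For those comfortable with a little scripting, the same registration can be done against the CKAN API that sits behind the web form. The sketch below is illustrative only: it assumes the Data Hub exposes the standard CKAN Action API, that you have an API key for your account, and that “cc-zero” / “odc-pddl” are the licence identifiers the instance uses for CC0 and ODC-PDDL; the dataset name, title and description are made up.

```python
# Minimal sketch: registering a dataset on the Data Hub via the CKAN Action
# API rather than the web form. Endpoint, API key and licence identifiers
# are assumptions; the dataset details are hypothetical.
import json
import urllib.request

API_URL = "http://thedatahub.org/api/3/action/package_create"
API_KEY = "your-api-key-here"  # hypothetical placeholder

dataset = {
    "name": "example-open-bib-release",   # URL slug, hypothetical
    "title": "Example Library Open Bibliographic Records",
    "license_id": "cc-zero",               # or "odc-pddl"
    "notes": (
        "Catalogue records released to the public domain under CC0, "
        "in line with the Discovery open metadata principles."
    ),
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(dataset).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": API_KEY},
)

with urllib.request.urlopen(request) as response:
    # CKAN wraps the created dataset in a {"success": ..., "result": ...} envelope.
    print(json.loads(response.read())["result"]["name"])
```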

The Creative Commons CC0 exemplars webpage

http://wiki.creativecommons.org/CC0_use_for_data

Clearly this applies only to those of you who have opted for the CC0 licence. As you can see, you’ll be in good company. My assumption is that you should simply email info@creativecommons.org (perhaps marked FAO Timothy Volmer) with your request to be on the page, providing a simple statement in line with the style of the page plus a logo.

Postscript – On recommending choice

Without doubt, Attribution has its place in the scheme of things digital – but not ideally in relation to the assertion of uncertain ‘rights’ amidst the mosaic of public domain information and distinct intellectual endeavour that constitutes the world’s bibliographic records.

Perhaps there are lessons to be learned from elsewhere about offering choice to contributors – for example from Flickr, which presents contributors with choices including the various Attribution variants – see http://www.flickr.com/creativecommons/.

Similarly, the University of California at Santa Cruz recommends CC attribution options for public contributions to its Grateful Dead Online Archive (http://www.gdao.org/contribution). This seeks to encourage contribution of digital objects by guaranteeing credit to members of the public, which seems appropriate for the particular GDAO community context. Their options are set out below.

PS – I wonder if the description of the GDAO target community as one of ‘shared inspiration and adaptation’ has some equivalence to the global community of cataloguers, bibliographers, archivists and curators that have built up our scholarly metadata.


w/c 16 July 2012 – Discovery News Roundup

July 20, 2012

The past few weeks have seen some fairly significant announcements within the UK, in Europe and beyond, regarding linked data, APIs and global discovery services. Below are some of the highlights:

  • The European Library’s new union search portal was launched at the LIBER Conference and opens up online access “to more than 200 million records from the collections of national and research libraries across Europe”. UK contributors to the initiative are: the British Library, the Wellcome Library, the Bodleian Libraries, University College London and the National Library of Wales. The European Library has made an OpenSearch API available for developers on a non-commercial use basis, but unfortunately it is only available to member libraries.

“One other thing that came up is that the situation of libraries differs greatly from other cultural institutions. This [is] because they are often not the owners of their metadata, but buy this from a commercial company. This means that open data is often not discussed in the library world because they argue that it is not their choice to make. As a result the librarians remain invisible in the discussion about how [to] provide service in a digital age.”


Views from the SCONUL Fringe

June 16, 2012


It’s a year and a few days since the JISC-funded Discovery initiative launched the Discovery principles. Since then Discovery has been working closely with membership organisations, such as M25, RLUK and SCONUL in the academic library space, to understand how those principles fit with the realities of service management and delivery.

Through two recent workshops involving library leaders from the SCONUL community, we have been able to capture real voices on where we are with Discovery services in academic libraries and the perceived direction of travel. We’ve tried to draw these opinions and insights together in a short paper that is faithful to the opinions of the participants, ‘Making resources more discoverable – a business imperative?’, which was published for the SCONUL conference in Liverpool (June 2012).

This post highlights headlines from that paper that were discussed at the SCONUL Fringe event.

Our thanks go to the library leaders from 31 universities listed here who attended the workshops and to others who we met at the SCONUL Fringe: Anglia Ruskin, Birkbeck, Bournemouth, Bradford, Brunel, Buckingham, City, East Anglia, East London, Edinburgh, Hertfordshire, Kent, Kings College London, Leeds Met, Leicester, Lincoln, LSE, Manchester, Open University, Portsmouth, Royal Holloway, SOAS, Southampton, Stirling, Sussex, Swansea, University of the Creative Arts, West of England, Warwick, Westminster, and Wolverhampton.

The Discovery Problem Space

  • The student and researcher experience is a crucial business focus.
  • Academic libraries have therefore invested recently in ‘Discovery Layer’ products to lift resource discovery beyond the bounds of the local OPAC and to cohere access to resources within and beyond the institution.
  • The Discovery initiative looks ahead of that point. It highlights the imperative for highly flexible services based on open data and cost-effective aggregation, not limited by traditional boundaries between libraries, archives, museums and repositories and potentially extending to domains such as teaching and learning resources and research data.
  • Discovery thinking is very much aligned with a significant “dot gov” response to the data and service challenge at the National Digital conference (May 2012): “ Do Less – Government should only do what government can do. If someone else is doing it – link to it. If we can provide resources (like APIs) that will help other people build things – do that. We should concentrate on the irreducible core.”

So … what is the ‘irreducible core’, not just for the academic library as currently defined but for the institution and the breadth of its scholarly resources?

Discovery is core business, underscored by student facing imperatives

  • User satisfaction is the major driver, so tracking and analysis will be crucial
  • Students and researchers display different discovery patterns and entry points (e.g. Google, reading lists, VLE for students) and what they expect to ‘discover’ varies across levels and subjects, raising the challenge of personalisation
  • Being in sync with global search engines is crucial; Google brings in most traffic for those who have exposed their records (see the sitemap sketch after this list)
  • However, there is a tension between comprehensiveness and the specificity required to enable learning and research, and therefore between one-stop-shop and niche interfaces
  • Our strategy should therefore be that there is “no wrong door”, ensuring that resources are discoverable through the prevalent range of channels; e.g. the local discovery layer, the VLE, Google search and relevant aggregations
  • Consequently, current ‘first generation’ discovery layer services need to be placed in perspective – as stepping stones or holding points as we seek more complete solutions, genuinely independent of the vendor LMS and perhaps involving approaches such as full-text mining
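
One concrete, low-cost way of “exposing records” to global search engines is to publish a sitemap listing a persistent URL for every catalogue record so that crawlers can find them. The sketch below is illustrative only; the base URL and record identifiers are hypothetical.

```python
# Minimal sketch: generating a sitemap.xml that lists a persistent URL per
# catalogue record, so search engine crawlers can discover them.
import xml.etree.ElementTree as ET

BASE_URL = "https://library.example.ac.uk/record/"   # hypothetical base URL
record_ids = ["b1234567", "b1234568", "b1234569"]    # hypothetical record IDs

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for record_id in record_ids:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = BASE_URL + record_id

# Write the sitemap with an XML declaration, ready to be served at the site root.
ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```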

Possible Game Changers … Surf’s up or Tsunami warning?

  • The large scale take-on of e-books and associated acquisition models present a key opportunity to review discovery services and how they support user workflows
  • The bigger scholarly picture covering repositories and research data management is institutionally challenging and cannot be ignored by libraries
  • Increasing pressures to support cross-institutional collaboration and inter-disciplinary requirements have a bearing on resource discovery and delivery
  • Open access and user generated content have their parts to play – but where does co-creation sit with a reputational and quality focus?
  • The potential of aggregations (perhaps revisiting ideas such as the National Union Catalogue and the National Digital Library) and other forms of collaboration should be reviewed in the context of emerging discovery models
  • Analytics is expected to grow as an institutional requirement driven by both Effectiveness and Economy – and analysis limited to ‘bought’ resources may fail
  • Linked data or some such tidal wave could change everything, but not so quickly as to negate the former priorities

Strategic Institutional Challenges … Opportunity or Threat?

There are without doubt serious questions to be answered about the curation, discoverability and preservation of a wide range of content of interest to teaching, learning and research, which is often subject to special systems and requirements; for example:

  • VLE content – resources are often hidden inside proprietary VLEs
  • Lecture recordings – need to be discoverable, notwithstanding strict controls in some cases
  • Research Data – whether metadata or the data itself, this is a hot topic
  • External resources – it may be time for new technological approaches to such challenges

However, whilst we should, like Google, be aspiring to a boundless service, there are significant cultural, professional and operational challenges:

  • Who should define the scope of a library? A question facing any move to redraw service boundaries (e.g. to encompass learning and research content)
  • Do you value your life? The big institutional resource discovery picture is potentially a minefield, apples & oranges, raising both territorial and user perception issues
  • Where should you dedicate available effort to get most return? We operate in a time of austerity
  • Where do you develop and locate the skills to operate in a wider landscape, involving areas such as research data? Libraries have unique expertise but new skills are crucial – do they fit in the library or elsewhere, such as IT services?
  • How can we benchmark plans and services in this volatile space? There is a dynamic and iterative requirement to develop use cases and associated metrics
  • What about reputation? New boundaries raise the uncontrollable nature of innovation with its potential to tarnish the institutional or library brand

Postscript – There was general agreement that these issues are too ‘big’ to resolve institution by institution … and therefore the community needs to work together (almost certainly with vendors), to identify critical use cases and associated services and skills, to define impact measures and promote analysis methods. Cue RLUK, SCONUL, the JISC Library programme and Discovery partnership …


w/c 11 June 2012 – Discovery News Roundup

June 12, 2012

Although I’ve been away over recent weeks, activity within the world of open metadata has continued unabated – here is my digest of activity from within the Discovery programme and from further afield.

Joy Palmer’s talk at the Joint Content and Discovery Programme contained a wealth of information about the current open metadata landscape, including links to a still relevant 2010 Economist article on the ‘data deluge’ (see also their report on the potential, and conversely the problem, of ‘superabundant data’). I’d argue that the increased quantity of data isn’t necessarily creating lots of new information management issues but it certainly makes those issues more visible and more pressing as soon as we move from passively collecting data to wanting to actively exploit the potential of that data.

Last month OCLC released the Virtual International Authority File (VIAF) dataset under an open data licence, together with their guidance on attribution. OCLC has also recently launched their WorldShare Management Service which provides libraries with “a new approach to managing library services cooperatively, including integrated acquisitions, cataloging, circulation, resource sharing, license management and patron administration, as well as a next-gen discovery tool for library users.” (emphasis is mine).

America’s National Institutes of Health (NIH) presented a showcase of the National Library of Medicine APIs that are available to developers. A recording of the live webcast is available to view online. The NIH has clearly decided to move beyond the more commonly found ‘build it and they will come’ approach and are actively engaging the developer community to help them understand what APIs are available. More recently they ran a two day Health Datapalooza event which brought together NLM data experts and developers. The event was livestreamed and you can view the archived video online.
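
For readers who want a feel for what working with an NLM API looks like in practice, here is a minimal sketch against the publicly documented E-utilities ‘esearch’ service, one of the NLM/NCBI APIs of the kind covered in the showcase; the query term and result count are arbitrary examples, not taken from the event.

```python
# Minimal sketch of calling the NLM/NCBI E-utilities 'esearch' service to
# find PubMed records matching a query. Illustrative only; the search term
# and retmax value are arbitrary.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
params = {"db": "pubmed", "term": "open access publishing", "retmax": "5"}

with urllib.request.urlopen(BASE + "?" + urllib.parse.urlencode(params)) as resp:
    tree = ET.parse(resp)

# Print the PubMed identifiers (PMIDs) returned for the query.
for id_element in tree.getroot().iter("Id"):
    print(id_element.text)
```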

Closer to home, discussion of data in The Guardian has made it out of their Data Store pages and into the pages of their Culture Professionals Network blog. Patrick Hussey has written a three-part, wide-ranging exploration of data within the arts and culture sector which argues that it is time to open up performance paradata and look at ways of making their shared data count. Patrick’s main focus is on open data rather than open metadata, but the series is very thought provoking, and in his second article he points to the work of The National Archives in creating an open legislation API and database in the shape of http://www.legislation.gov.uk/

The BBC Connected Studio project is an open collaboration initiative that kicked off in May and is initially focused on developing new approaches to personalisation, using DevCSI-style hackspace gatherings to bring together digital talent from outside the BBC. Later this year the focus shifts to “connected platforms and big data”, which could mean some interesting developments for the MLA sectors, and opportunities for MLA developers to get involved by responding to the Connected Studio call for participants.

The BBC Online team have managed to communicate their search and discovery strategy very clearly in the second of the videos included within this Connected Studio blogpost.


The Imperial War Museum is heading an international partnership of organisations in the run-up to a four-year programme of activities to commemorate the First World War Centenary: “Through the partnership, colleagues from a variety of sectors [including museums, archives, libraries, universities and colleges, special interest groups and broadcasters] have the opportunity to communicate with each other, share and combine resources, cooperate and co-develop products and services that complement each other [...]”. It will be interesting to see whether any developments similar to the Will’s World Discovery aggregation project emerge as a result of such a broad collaborative partnership.


Discovery Licensing Clinic

May 23, 2012


photo credit: Ed Bremner

The first Discovery Licensing Clinic brought together representatives from a number of different libraries, archives and museums to spend a day considering practical responses to the Discovery open licensing principles and getting practical guidance from the assembled experts. It was an opportunity to identify issues and discuss the range of tactics that institutions might adopt in scoping metadata releases and making the associated licensing decisions.

Our panel of experts on the day consisted of Francis Davey (Barrister), Naomi Korn (Copyright Consultant), Paul Miller (Cloud of Data) and Chris Banks (University Librarian & Director, Library, Special Collections & Museums, University of Aberdeen)

Chris Banks has written a blogpost reflecting on the day and her presentation slides can be viewed below:

The issues around licensing open metadata do represent a significant hurdle for institutions but none of those issues are insurmountable. Our hope is that licensing clinics such as this one, and the ones we plan to run in the future, will give managers and decision makers the knowledge they need to progress the open metadata agenda within their organisation.


Highlights from the Content and Discovery Joint Programme event

May 22, 2012

On the 23rd April colleagues from projects across the Discovery, JISC Content, JISC OER and Emerging Opportunities programmes gathered in Birmingham to share knowledge and identify shared challenges and key agendas that need to be progressed going forward. As is often the way with these types of events the discussions that took place over a day and a half were as useful to those running the event as they were for the delegates attending. The notes below represent just a handful of my highlights.

Joy Palmer presented on behalf of the Discovery Programme and gave a compelling overview of the challenges and aspirations we share around the discovery of content. She highlighted how, as the RDTF work was translated into the Discovery initiative, it became clear that we needed to talk in terms of an ecosystem as opposed to an ‘infrastructure’ because the latter suggested that the initiative was aiming to impose an overarching infrastructure model over the entire museums, libraries and archives (and JISC) discovery space.

“To a large degree, what today is about is determining to what degree we can operate as a healthy and thriving ecosystem, where components of our content or applications interact as a system, linked together by the flow of data and transactions.”

But as Joy stated, this is not to oversimplify matters. Her talk touched on the many apparently competing theories about how to enable discovery in the dataspace, highlighting the complexity we’re all confronting as we make decisions about the discovery and use of our data: Big Data and The Cloud, Paradata, Linked Data, Microdata, and the ‘return’ of Structured Data.

But in terms of our shared goals to have our content discoverable or useable via the web, she explained it is the tactic of opening up data that is relevant to us all, even if our challenges in achieving ‘openness’ differ.

The slides from Joy’s presentation are available to view on Slideshare:

Discovery: Towards a (meta)data ecology for education and research


In the afternoon I facilitated Andy McGregor and David Kay’s session on business cases where the participants obligingly contributed to David’s mapping exercises.

There were some interesting discussions around the participants’ experience of writing business cases, including useful suggestions for getting the most out of building a business case:

  • Predicting and measuring benefit are key challenges to overcome but we can do that by using the data at our disposal to create a convincing narrative. However it’s not about manipulating that data and making up stories retrospectively, we need to put energy into building robust analytics that help communicate our story clearly and convincingly.
  • Filling out a business case template shouldn’t be an activity that happens only in order to secure funding or other resources – it can be very useful to repeat the exercise throughout the project in order to track changes as they occur.

The following links may be useful if you are interested in building robust business cases:

In the plenary session on day two the conversations centred around a number of discussion points:

  • Terms such as ‘microdata’ (machine-readable semantic tagging of webpage content) and ‘paradata’ (usage analytics or contextual information about data/metadata) were new to some of the participants and this prompted a discussion around the seemingly unavoidable challenge of jargon that we face within the Discovery arena. One suggestion was that instead of working to define a stronger vocabulary that is understood by all, perhaps we should be identifying stronger metaphors which everyone can relate to; metaphors that communicate the vision of what we are working towards and help everyone understand how they can get involved with delivering that vision within their own context.
  • We should be stepping outside of the sector to see the potential for emerging areas of activity (e.g. paradata). Looking to those sectors who are ahead of the game saves the library, museum and archives sectors having to try and work from a blank page. We also need to identify where our sectors are ahead and recognise how those advantages leave us well positioned to make significant progress.
  • Projects would benefit from a system of ‘evaluation buddies’ from within their programme to help uncover evidence of project impact and then share this evidence, together with highlighting any awards and recognition won by projects. This will help institutions build their internal business cases for bidding to run and then embed JISC projects in the future. There was also the suggestion that JISC could usefully build a collection of the major use cases (in a similar way to the Open Bibliographic Data Guide) together with short case studies that demonstrate the institutional impact.
  • Across the two days there were mentions of ‘microdata’ (machine-readable semantic tagging of webpage content), ‘big data’ (i.e. high volume) and ‘heavy data’ (data which ‘stretches current infrastructure or tools due to its size or bulk’), but the argument was made that the primary objective should be to produce ‘simple data’ (data that is both simple to produce and simple to consume).
  • There was recognition that aggregation is an art not a science and that current data standards are a) opinion, not fact and b) open to interpretation. High quality data is key to producing usable datasets but there was a question about how that quality can be defined. One suggestion was that data clean-up is a highly specialist service that should be decoupled, as per the government’s view with regard to open data.

Some key takeaway points for the Discovery programme:

  • Information about the Discovery programme, its projects and the underlying principles should be in a format that is ‘reframeable’, making it easy for interested parties to access information on their terms and cascade that information to their own audience or stakeholders.
  • Identifying and highlighting the tangible benefits of the Discovery Principles enables supporters of those principles to embark on fruitful conversations with colleagues in their institutions.
  • There is huge benefit in sharing the learning and challenges from within, and without, the Discovery programme.  An ongoing process of synthesis, re-synthesis and distillation will extract maximum value from the activity taking place across the Discovery initiative.
  • The quality of metadata is key to the success of Discovery initiatives – we need to explore how high quality metadata is defined and ensured.

Radically Open Cultural Heritage Data at SXSW Interactive 2012

April 11, 2012


Posted by Adrian Stevenson

I had the privilege of attending the annual South by Southwest Interactive, Film and Music conference (SXSW) a few weeks ago in Austin, Texas. I was there as part of the ‘Radically Open Cultural Heritage Data on the Web’ Interactive panel session, along with Jon Voss from Historypin, Julie Allinson from the University of York digital library, and Rachel Frick from the Council on Library and Information Resources (CLIR). We were delighted to see that Mashable.com voted it one of the ’22 SXSW Panels You Can’t Miss This Year’.

All of our panelists covered themes and issues addressed by the Discovery initiative, including the importance of open licenses, and the need for machine readable data via APIs to facilitate the easy transfer, aggregation and link-up of library, archives and museum content.

Jon gave some background on the ‘Linked Open Data in Libraries, Archives and Museums’ (LOD-LAM) efforts around the world, talking about how the first International LODLAM Summit held in San Francisco last year helped galvanise the LODLAM community. Jon also covered some recent work Historypin are doing to allow users to dig into archival records.

Julie then covered some of the technical aspects of publishing Linked Data through the lens of the OpenArt Discovery project, which recently released the ‘London Art World 1660-1735’ data. She mentioned some of the benefits of the Linked Data approach, and explained how they’ve been linking to VIAF for names and Geonames for location.
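
To make that linking pattern concrete, the sketch below shows how a dataset can assert owl:sameAs links to VIAF and Geonames, and how a consumer might pick those links out with Python’s rdflib. The record, names and identifiers are invented for illustration and are not drawn from the OpenArt data itself.

```python
# Illustrative sketch (not the OpenArt project's actual data or model):
# a tiny RDF record asserting owl:sameAs links to VIAF and Geonames, and a
# consumer that lists those links with rdflib.
from rdflib import Graph
from rdflib.namespace import OWL

EXAMPLE_TURTLE = """
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/openart/> .

ex:ExamplePerson a foaf:Person ;
    foaf:name "Example Engraver" ;
    owl:sameAs <http://viaf.org/viaf/000000000> .     # placeholder VIAF ID

ex:ExamplePlace owl:sameAs <http://sws.geonames.org/0000000/> .  # placeholder Geonames ID
"""

graph = Graph()
graph.parse(data=EXAMPLE_TURTLE, format="turtle")

# List every outbound sameAs link that points at VIAF or Geonames.
for subject, _, target in graph.triples((None, OWL.sameAs, None)):
    if "viaf.org" in str(target) or "geonames.org" in str(target):
        print(f"{subject} -> {target}")
```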

I gave a quick overview of the LOCAH and Linking Lives projects, before giving a heads-up to the World War One Discovery project. LOCAH has been making archival records from the Archives Hub national service available as Linked Data, and Linking Lives is a continuation project that’s using Linked Data from a variety of sources to create an interface based around the names of people in the Archives Hub. After attempting to crystallise what I see as the key benefits of Linked Data, I finished up by focusing on particular challenges we’ve met on our projects.

Rachel considered how open data might affect policies, procedures and the organisational structure of the library world. She talked about the Digital Public Library of America, a growing initiative started in Oct 2010. The DPLA vision is to have an “open distributed network of comprehensive online resources that draw on the nation’s living history from libraries, universities, archives and museums to educate, inform, and empower everyone in current and future generations”. After outlining how the DPLA is aiming to achieve this vision, she explained how interested parties can get involved.

There’s an audio recording of the panel on our session page, as well as recordings of all sessions mentioned below on their respective SXSW pages. I’ve also included the slides for our session at the bottom of this post.

Not surprisingly, there were plenty of other great sessions at SXSW. I’ve picked a few highlights that I thought would be of interest to readers of this blog.

Probably of most relevance to Discovery was the lightning-fast ‘Open APIs: What’s Hot and What’s Not’ session from John Musser, founder of Programmableweb.com, who gave us what he sees as the eight hottest API trends. He mentioned that the REST style of software architecture is rapidly growing in popularity, being regarded as easier to use than other API technologies such as SOAP (see image below). JSON is very popular, with 60% of APIs now supporting it. It was also noted that one in five APIs don’t support XML.


The rise of REST – ‘Hot API Protocols and Styles’ from John Musser of Programmableweb.com at SXSW 2012

Musser suggested that APIs need to be supported, with hackathons and funded prizes being a good way to get people interested. He noted that the hottest trend right now is that VCs are providing significant funding to incentivise people to use their APIs, Twilio being one of the first to do this. He also mentioned that your API documentation needs to be live if you’re to gain interest and maintain use. Invisible mashups are also hot, with operating systems such as Apple’s OS cited as examples. Musser suggests the overall meta-trend is that APIs are now ubiquitous. John has now made his slides available on Slideshare.

The many users of laptops amongst us will have been interested to hear about the ‘Future of Wireless Power’.  The session didn’t go into great detail, but the message was very much “it’s not a new technology, and it’ll be here very soon”. Expect wireless power functionality in mobile devices in the next few years, using the Qi standard.

Some very interesting folks from MIT gave the thought-provoking ‘MIT Media Lab: Making Connections’ session. Joi Ito, Director of the MIT Media Lab, explained how it’s all about the importance of connecting people, stating that “we’re now beyond the cognitive limits of individuals, and are in an era where we rely on networks to make progress”. He suggested that traditional roadmaps are outmoded, and that we should throw them away and embrace serendipity if we’re to make real progress in technology. Ito mentioned that MIT has put significant funding into undirected research and an ‘anti-disciplinary’ approach. He said that we now have much agility in hardware as well as software, and that the agile software mentality is being applied to hardware development. He pointed to a number of projects that are embracing these ideas – idcubed, affectiva, sourcemap and formlabs.

Josh Greenberg talked about ‘macroscopy’ in the ‘Data Visualization and the Future of Research’ session, which is essentially about how research is starting to be done at large scale. Josh suggested that ‘big data’ and computation are now very important for doing science, with macroscopy being the application of big data approaches to research. He referred to the ‘Fourth Paradigm’ book, which presents the idea that research is now about data-intensive discovery. Lee Dirks from Microsoft gave us a look at some new open source tools they’ve been developing for data visualisation, including Layerscape, which allows users to explore and discover data, and Chronozoom, which looked useful for navigating through historical big data. Lee mentioned Chronozoom was good for rich data sources such as archive & museum data, demoing it using resources relating to the Industrial Revolution.

So that was about it for the sessions I was able to get to as part of the SXSW Interactive conference. It was a really amazing event, and I’d highly recommend it to anyone as a great way to meet some of the top people in the technology sector, and of course, hear some great sessions.

The slides from our session:

