w/c 24 December – Discovery news roundup

December 31, 2012

UK and JISC Discovery Project news:

Europeana news:

Other news from Europe and beyond:

Event reports:

  • The University of Leicester’s live debate, ‘Museums in the information age: Evolution or extinction?’, which took place at the Science Museum, is available to listen to online. A Guardian article covering the debate notes the importance of digital discovery: “[…] some digital resources produced by museums quickly become disposable if not easily discoverable by potential users.”
  • The Culture Hack event, which took place at the Google Campus, centred on envisaging new ways for London’s schoolchildren to interact with and be inspired by the city’s cultural heritage. Martin Belam’s blogpost provides a good overview of the day and highlights a thought-provoking comment from the British Library’s Nora McGregor: “It’s about teaching metadata to children.”

Call for contributions:


w/c 12 November – Discovery news roundup

November 15, 2012

One particularly interesting thing I noticed this past month was that tweets about open data, linked data and metadata were starting to come thick and fast from people within my network who sit well outside the library and cultural data domains. In particular, attendees of Lasa’s Charity Digital Summit and the ‘Nesta in Manchester’ event on innovation produced a rich vein of tweets about all things open. Perhaps an indicator that open data’s tipping point is approaching?

Some highlights from the world of resource discovery and open data in recent weeks:

Updates from a couple of large-scale projects in Europe and the US:

News from Discovery partner organisations:

Please note that places are still available for the second of our free Discovery Licensing Clinics on 30 November in London. It is an opportunity for managers and decision makers from libraries, archives and museums to get practical advice on open data licensing from our assembled team of experts.


Some preliminary highlights from the Discovery programme

November 15, 2012

The Discovery programme is nearing its completion date of December 2012. Most of the projects have finished or are wrapping up. Our efforts are now directed towards gathering together all that we have learned and produced in the programme.

The programme has covered a lot of ground, so pulling everything together will take us some time. While that happens I thought it might be worth listing a selection of preliminary highlights from the programme. This post is based on a talk I gave at the RLUK conference, so the focus is on libraries and archives rather than museums.

Future approaches to Discovery

It is not clear what the future holds for resource discovery, and it is unlikely that there will be just one approach for libraries, museums and archives: the future is likely to be plural. While Discovery has not arrived at firm answers on what that future will be, we have experimented with a range of approaches and identified those that are promising.

These approaches are recorded in the Discovery case studies and guidance site. They can be used to inform future plans in libraries, museums and archives; where an approach seems promising enough, it can be emulated directly or the tools that have been developed can be reused. We are planning to produce a toolkit so that all of these tools are in one place.

What is clear is that we are not alone in experimenting with these kinds of approaches. This is a global movement, with many and diverse institutions exploring similar approaches. The case studies and guidance recognise this by including explorations of the approaches of the Wellcome Trust, the Rijksmuseum and the Victoria and Albert Museum.

Innovative cataloguing

Resource discovery starts with cataloguing. The focus of the programme was not on cataloguing, but a couple of interesting and innovative approaches have emerged from the projects.

The Institute of Education decided to explore new ways of cataloguing their collection. The workflow involves creating basic records in Drupal, enriching those records with professional cataloguer input, and then exporting them into the LMS. Written out like that it may sound a roundabout way of doing things, but it proved 3.5 times quicker, and therefore cheaper, than their current approach, because it allows the cataloguer to concentrate on enriching the record by adding index terms. Full figures are available on their blog. They also developed lightweight ways to catalogue previously uncatalogued material, which offers a significant saving in researcher time when using the material; there is more detail on this on their blog.
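As a very rough illustration of the shape of that workflow (not the Institute of Education’s actual code – the record fields, the enrichment step and the export labels below are all assumptions), a staged pipeline in Python might look something like this:

```python
# Hypothetical sketch of a two-stage cataloguing workflow:
# quick basic records first, cataloguer enrichment later, then export to the LMS.

def create_basic_record(title, author, isbn=None):
    """Stage 1: capture the minimum needed to make the item findable."""
    return {"title": title, "author": author, "isbn": isbn, "index_terms": []}

def enrich_record(record, index_terms):
    """Stage 2: a professional cataloguer adds the value - subject index terms."""
    record["index_terms"].extend(index_terms)
    return record

def export_for_lms(records):
    """Stage 3: export enriched records in a flat form the LMS can import."""
    return [
        {
            "245a": r["title"],           # illustrative MARC-like field labels only
            "100a": r["author"],
            "020a": r["isbn"] or "",
            "650a": "; ".join(r["index_terms"]),
        }
        for r in records
    ]

if __name__ == "__main__":
    rec = create_basic_record("Teaching in the digital age", "A. N. Author")
    rec = enrich_record(rec, ["Education -- Data processing", "Digital literacy"])
    print(export_for_lms([rec]))
```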

The second exploration of catalogues focused on the collection as a whole. The Copac collections management project used the Copac data to create a tool that allows librarians to analyse their collections and decide which items can be removed and which are rare and need to be retained. The tool has been trialled by a number of libraries; during their trial, the University of Manchester found that it was 86% more effective than manual checking of the collection. Details on how this figure was arrived at can be found in the case study.
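The underlying idea – comparing local holdings against holdings counts in an aggregate catalogue – can be sketched in a few lines. To be clear, this is a toy illustration with made-up data structures and an assumed rarity threshold, not how the Copac tool actually works:

```python
# Toy illustration of collection management checks against aggregated holdings.
# A title held by very few libraries nationally is a candidate for retention;
# a widely held title is a candidate for withdrawal.

RARITY_THRESHOLD = 5  # assumed cut-off for this sketch

def classify_holdings(local_items, national_holding_counts):
    """Split a local list of items into 'retain' and 'withdrawal candidate' piles."""
    retain, withdraw = [], []
    for item_id in local_items:
        copies_elsewhere = national_holding_counts.get(item_id, 0)
        if copies_elsewhere < RARITY_THRESHOLD:
            retain.append(item_id)      # rare nationally: keep it
        else:
            withdraw.append(item_id)    # widely held: safe to consider withdrawing
    return retain, withdraw

# Example with made-up identifiers and counts
local = ["b1234", "b5678", "b9012"]
national = {"b1234": 2, "b5678": 48, "b9012": 7}
print(classify_holdings(local, national))
# -> (['b1234'], ['b5678', 'b9012'])
```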

Greater impact through linking

Linking items in collections with relevant items in other collections offers the possibility of enabling richer resource discovery services and supports new and emerging research interests. Linked data is an intriguing option for enabling this. I don’t think the discovery programme has come up with a definitive answer on whether linked data is the future for libraries, museums and archives. But I think that the evidence is fairly strong that it will be a part of the future.

The programme included a number of projects experimenting with linked data for libraries and archives and there is work to be done to gather all of these together. However there are some headlines that we can report now:

  • The use case in archives seems to be strong as linking resources by place and person is something that should be useful to researchers and students
  • The step change project worked with Axiell to update CALM so that archives can create linked data records from within CALM. This functionality will be included in the next update and has the potential to benefit the large number of archives that use CALM. This linked data creation functionality is also available as a stand alone tool called Alicat.
  • Cambridge were able to create linked data records for 2.3 million books at a project cost of just under £40,000 – under 2p per record. (A minimal sketch of what such a record might look like follows this list.)
  • The ArchivesHub project Linking Lives has worked to use people as hooks to explore archive collections. This uses linked data and the model they have developed is being reused internationally. 
  • The Pelagios project has created a way to use linked data to identify ancient places in archive collections and there is a vibrant community growing around their approach.
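To make the idea of a ‘linked data record’ concrete, here is a minimal sketch of a single book description built with the rdflib Python library. The identifiers, namespaces and property choices are illustrative assumptions rather than the modelling actually used by the Cambridge, Step Change or Linking Lives projects:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

# Hypothetical namespace for a library's own record URIs
EX = Namespace("http://example.org/catalogue/")

g = Graph()
g.bind("dcterms", DCTERMS)

book = URIRef(EX["book/12345"])
g.add((book, RDF.type, DCTERMS.BibliographicResource))
g.add((book, DCTERMS.title, Literal("A History of the County of Cambridge")))
# Linking by person and place: point at shared authority URIs rather than strings
g.add((book, DCTERMS.creator, URIRef("http://viaf.org/viaf/000000000")))    # placeholder VIAF URI
g.add((book, DCTERMS.spatial, URIRef("http://sws.geonames.org/0000000/")))  # placeholder GeoNames URI

print(g.serialize(format="turtle"))
```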

Of course the Discovery programme is not alone in investigating linked data. The Library of Congress, OCLC, The British Library, Europeana and the DPLA are all using or investigating some form of linked data technology in pursuing their aims.

Linked data is not the only option for bringing different collections together and allowing people to use them in new ways. This can also be done with APIs and there are two discovery exemplar projects doing just this for Shakespeare and for WW1. Work on these is still underway but both are looking promising and offer some very interesting lessons for how to aggregate collections to enable new forms of resource discovery and research.
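As a sketch of what the API route involves – fetching records from several services and normalising them into one pool – consider the following. The endpoints, parameters and response shapes are invented for illustration; they are not the Shakespeare or WW1 exemplar APIs:

```python
import requests

# Hypothetical collection APIs - the URLs and JSON shapes are assumptions
SOURCES = {
    "museum": "https://museum.example.org/api/search",
    "archive": "https://archive.example.org/api/search",
}

def fetch_records(query):
    """Query each source API and normalise results into a common record shape."""
    combined = []
    for name, url in SOURCES.items():
        resp = requests.get(url, params={"q": query, "format": "json"}, timeout=10)
        resp.raise_for_status()
        for item in resp.json().get("results", []):
            combined.append({
                "source": name,
                "id": item.get("id"),
                "title": item.get("title"),
                "date": item.get("date"),
            })
    return combined

# e.g. fetch_records("Somme 1916") would return one merged list across both services
```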

Enhanced shared services

We already have many shared services that help people discover these resources. Throughout the programme we have worked with those services to enhance them and so help realise the Resource Discovery Taskforce vision. The ways in which the services have been developed deserve a separate post, so for now I will just list the services that have been developed in the programme:

Business case

These are challenging economic times, so it was important to address the business case for libraries, museums and archives to invest effort in improving resource discovery. The results of this work can be seen in the business case section of the Discovery guidance. We worked with senior managers from libraries, museums and archives throughout the programme to ensure that what we were doing addressed their needs. As part of this work we produced a series of videos in which a selection of senior managers talk about their needs, challenges and predictions; they make for interesting viewing.

What’s next?

We are in the process of reviewing the Discovery programme and the Resource Discovery Taskforce vision that kicked it all off. The review will produce a set of recommendations on what we should do next, which will be available in January. We will be looking to pull all of the outputs from the programme into a form that makes it easy for people to learn from the programme and to use what has been produced. We are also putting together an event for 2013 that brings together people from around the world who are working on resource discovery challenges, to see what we can learn from each other. More information on all of these things will follow soon.


Update on the activities of the Phase 2 Discovery projects

October 19, 2012

Latest news from the Phase 2 Discovery projects:

Last month all of the Phase 2 Discovery projects met to share information as they approach the end of their projects. Several of the projects have published their final blogposts which share their key lessons learnt and project outputs:


w/c 8 October – Discovery News Roundup

October 12, 2012

In recent weeks the Discovery Team have been finalising and releasing a whole suite of online materials which reflect our continued focus on the business case for Discovery. In September we released a collection of eight videos containing the reflections of UK academic library directors on topics such as the key issues and challenges for resource discovery, the value of making special collections visible and the potential of collaboration.

The videos were launched in our latest Discovery newsletter, along with our Case Study collection and Guidance Materials which aim to highlight and support current real-world practices relating to the Discovery Open Metadata Principles and Technical Principles within museums, libraries and archives.

The work of the Discovery programme has informed the latest animation from the OER IPR Support Project: ‘Open Data Licensing’ which you can view below. A key aspect of the Discovery programme’s approach is “establishing clarity of understanding around licensing and open data” so it’s good to see such a complex issue described in an accessible way – it doesn’t remove any of the inherent complexity but it breaks down and clarifies that complexity, which is an important initial step towards enabling action.

Some highlights from the wider world of resource discovery and open data:

  • In September, Europeana continued to set the pace in cultural data aggregation by opening up metadata for more than 20 million cultural objects for free use under the Creative Commons CC0 Public Domain Dedication licence. Their release represents the largest one-time dedication of cultural data to the public domain using the CC0 waiver and opens up the possibility of innovative apps, games, web services and portals being developed. The move also ‘holds the potential to bring together data [...] from other sectors, such as tourism and broadcasting’. As Jill Cousins, Executive Director of Europeana, said: “This move is a significant step forward for open data and an important cultural shift for the network of museums, libraries and galleries who have created Europeana”. EC Vice President Neelie Kroes referred to the Europeana release as a ‘treasure trove of cultural heritage’.
  • At the Healthcare Efficiency Through Technology Expo this week, Garry Coleman talked about the NHS Information Centre’s plans for a large-scale open data release, involving millions of rows of data being made available under an Open Government Licence. This release reflects the wider importance of transparency as a motivator for open data, particularly within governmental and publicly funded organisations. It could also be a watershed moment for the release of anonymised sensitive data, which could further open up the way for the, arguably much less contentious, sharing of open metadata that our sector is working towards.
  • I mentioned Cooper-Hewitt Labs’ Director, Seb Chan, in my digest last month and his latest blogpost about being ‘of the web’ rather than ‘on the web’ is another interesting read. They are embracing the porosity of the internet and working with websites such as Behance to surface their collections and associated information out in the wild. In doing so they are finding creative ways to tackle potential showstoppers such as control over branding and retaining attribution. Their approach enables them to keep their expertise focussed on activities that are within their own domain and offers up an interesting blueprint for externally located engagement and visibility.
  • Rewired State are running an ‘Open Science’ hack day event in partnership with the Wellcome Trust in December.
  • The Open Data Network have launched the Open Data Showroom website which looks like it will become a very useful ‘at a glance’ resource for finding interesting sources of, and uses for, open data.
  • Leigh Dodds’ blogpost identifies a simple model for exploring the sustainability of open data curation projects such as legislation.gov.uk.
  • A significant release of legislative open data was announced this week on the Open Knowledge Foundation website, which reported on the release of US Congress legislative data going back to 1973.
  • The latest Arts Council Digital R&D podcast focuses on how organisations can use digital technology to open up archives, collections and data. It includes news from the V&A and the British Museum and considers the impact of projects such as Google’s Art Project.
  • And staying with Google, this week saw the launch of the Google Cultural Institute which aims to “preserve and promote culture online”. The Cultural Institute website presents curated cultural artefacts in online galleries, together with search and browse facilities. The individual artefacts retain their attribution to the holding organisation and, in some cases, the associated metadata can also be viewed. It’s not immediately obvious how open the underlying data is but it appears to be a walled garden at the moment.

Twelve Themes and a few more

September 28, 2012

All 17 projects funded by JISC in Phase 2 of the Discovery programme met in Birmingham today to share updates and ideas as they wind down their efforts. It was a very stimulating meeting, not least because the shared Discovery dialogue seems to have developed significantly during 2012. The Phase 1 projects undertook some very useful experiments, but the Phase 2 projects have taken things up a notch.

Here, in very raw form, are the recurrent themes that I recorded as takeaways from the session:

A – Data and access points

  • Time and Place are priority access points
  • URIs offer an effective base level linking strategy
  • Collection level descriptions have potential as finding aids across domains
  • User generated content, such as annotations, has a place at the table

B – People

  • Community is a vital driver – open communities maintain momentum; specialist enthusiasms and ways of working provide strong use cases
  • For embedding new metadata practice, start where the workers are – add-ins to Calm and MODS demonstrate that
  • More IT experience / skills are required on the ground

C - The way the web works

  • Aggregators crawl rather than query … OAI-PMH, robots, etc. – a minimal harvesting sketch follows this list
  • Google’s strength shouts ‘Do it my way’ – and we should take heed (but we do need both/and)
  • Currency of data is important – there may be a tension with time lags associated with crawling
  • Aggregators need to know what is where to build or add value  so … we don’t need a registry?
  • No man is an island – It’s a collaborative world with requirements to interact with complementary services such as Dbpedia, Europeana, Google Historypin, Pleiades, UKAT, VIAF
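To illustrate the ‘crawl, don’t query’ point, the sketch below harvests one page of records from an OAI-PMH endpoint. The base URL is a placeholder, but the ListRecords verb, the oai_dc metadataPrefix and the Dublin Core namespace are standard parts of the protocol:

```python
import requests
import xml.etree.ElementTree as ET

# Placeholder endpoint - substitute a real repository's OAI-PMH base URL
BASE_URL = "https://repository.example.org/oai"
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest_titles():
    """Fetch one page of ListRecords and pull out the Dublin Core titles."""
    resp = requests.get(
        BASE_URL,
        params={"verb": "ListRecords", "metadataPrefix": "oai_dc"},
        timeout=30,
    )
    resp.raise_for_status()
    root = ET.fromstring(resp.content)
    for record in root.iter(OAI + "record"):
        title = record.find(".//" + DC + "title")
        if title is not None:
            yield title.text

for t in harvest_titles():
    print(t)
```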

D - Tools and technology

  • There is opportunity / obligation to leverage expert authority data and vocabularies – examples as above and more, such as Victoria County History, …
  • Commonly used software tools include Drupal, Solr/Lucene, Elasticsearch, JavaScript and Twitter Bootstrap
  • JSON and RDF are strong format choices amongst the developers
  • Beware SPARQL endpoints and triple stores, especially in terms of performance – see the query sketch after this list
  • APIs are essential – but of little use without both documentation and example code
  • OSS tools have been built by several projects … but how do we leverage them (e.g. BibSoup, Alicat)?
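On the SPARQL caveat, here is a minimal query against the public DBpedia endpoint using the SPARQLWrapper library. The endpoint and library are real; treat the query and the short timeout as an illustrative sketch of why performance needs watching rather than a recommended pattern:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Public endpoint - responsiveness varies, which is exactly the caveat above
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setTimeout(15)  # fail fast rather than hang the discovery interface
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?museum ?label WHERE {
        ?museum a dbo:Museum ;
                rdfs:label ?label .
        FILTER (lang(?label) = "en")
    }
    LIMIT 10
""")

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["label"]["value"])
```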

w/c 27 August – Discovery News Roundup

August 31, 2012

Earlier this month OCLC announced their recommendation that member institutions use the Open Data Commons Attribution (ODC-BY) License when releasing WorldCat-derived library catalogue data. You can read David Kay’s response to that announcement here on the Discovery blog. And last week there was news that OCLC and Europeana are collaborating on a project developing ‘semantic similarity’ that will improve the experience of searching aggregated metadata by identifying items that are near duplicates or related to each other. The wider significance of this project is that it will feed into the Europeana Data Model and will provide “opportunities to develop new data services for third parties.”
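As a toy illustration of what ‘near duplicates’ means in aggregated metadata, one common ingredient is comparing normalised token sets of record titles. This is emphatically not the OCLC/Europeana project’s method, just a way of making the idea concrete:

```python
import re

def tokens(title):
    """Lower-case a title and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))

def jaccard(a, b):
    """Similarity between two token sets: 1.0 means identical, 0.0 means disjoint."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

print(jaccard("The Night Watch", "Night watch, The"))                  # 1.0 - near duplicates
print(jaccard("The Night Watch", "A study of militia portraiture"))    # 0.0 - unrelated
```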

It’s long been asserted that content is king, and more recently that context is queen but now Associated Press are investing in marrying the two in order to speed up the distribution of content. Clearly the Associated Press business model is based on syndication rather than aggregation but Melody K. Smith’s assertion that “[f]indability only works when a proper taxonomy is in place.” seems worth some thought with regard to its relevance for our sectors.

TechCrunch’s article pitching Mendeley’s open API against Elsevier’s closed API is flawed but it’s worth reading for the comments it provoked, particularly from Elsevier’s Director of Platform Integration, Ale De Vries. You can read more about the growth of Mendeley’s API service on the Guardian Technology blog – it’s interesting to note that their future plans involve developing their API service into a multi-directional dataflow that will allow applications built on their API to talk to each other and to upload data to Mendeley.

Seb Chan’s candid blogpost reflecting on the Cooper-Hewitt Design Center’s experience of openly releasing their collection metadata is a useful and timely reminder that a) issues around the quality of released metadata need to be addressed if we want anyone to use the data we’re releasing, and b) “collection metadata [has value as a tool for discovery] but it is not the collection itself.” Seb’s point about museum collections being no match for the comprehensiveness of libraries and archives highlights both the importance of open metadata in enabling cross-institutional aggregation and the relevance of the OCLC and Europeana ‘semantic similarity’ project mentioned above. In an ideal world it will also enable the public permeability that Seb touches on, by connecting our collections with the boundless ‘amateur web’ corpus.

News from Discovery projects


Remember the licence!

August 24, 2012

Go for open, no banana skins

Following what might be regarded as the game-changing Harvard release of open bibliographic metadata with a CC0 licence in April 2012, OCLC has taken considerable steps to recognise the importance of open metadata to library services and wider resource discovery practice.

On 6th August, the Library Journal headlined the OCLC recommendation that member institutions that would like to release their catalogue data on the Web should do so with the Open Data Commons Attribution License (ODC-BY). For more details, see: http://bit.ly/MP63Dc.

However, the Discovery programme has consistently emphasised that attribution is a big banana skin in terms of practical implementation and on account of the associated Fear, Uncertainty and Doubt (the FUD factor), whilst ironically carrying little likelihood of practical enforcement under the law. This position is at the heart of the Discovery principles and is very well articulated in a subsequent Creative Commons blog post – see http://creativecommons.org/weblog/entry/33768

So we propose that open metadata is increasingly mission critical as libraries reach out to new services and that public domain licensing is the best (perhaps only?) way to engender widespread community confidence in this journey.

Don’t forget the licence

On the downside, Cloud of Data’s Paul Miller recently posted his analysis of the use of open licenses associated with data releases registered at the OKFN Data Hub. Paul’s headline findings were that:

  • Half of the 4,000 registered open data sets have no license at all
  • Only 12% of licensed data sets use either CC0 or ODC-PDDL

These stats do not reflect badly on libraries, archives and museums as the Data Hub has attracted open data releases from a wide variety of sources. However, it would be good to see more public references to the UK institutions and Discovery projects that have released open metadata explicitly linked to a public domain licence – i.e. CC0 or ODC-PDDL

So why not consider the following options:

The Data Hub

The Data Hub is maintained by CKAN and was the source of information for Paul Miller’s blogpost. There is a simple slideshow tutorial about registering releases (whether uploads or links) at http://docs.ckan.org/en/latest/publishing-datasets.html

The web upload form is at http://thedatahub.org/dataset/new. As well as being linked to the submitter’s details, it is limited to just

  • title
  • license
  • free text description

It would be good to see UK open metadata releases registered there, with a clear link to CC0, ODC-PDDL or whatever other licence has been selected. Given the limited data entry form, why not include a reference to the Discovery principles and/or your project in the free text description?
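If you would rather script the registration than use the web form, CKAN-based sites such as the Data Hub also expose an action API. The sketch below is my guess at how that would look for an open metadata release; it assumes you have an API key for the site and that the standard package_create action and the ‘cc-zero’ licence identifier are available there:

```python
import requests

DATAHUB_API = "http://thedatahub.org/api/3/action/package_create"
API_KEY = "your-api-key-here"  # issued with your Data Hub account

dataset = {
    "name": "example-library-open-metadata",            # illustrative slug
    "title": "Example Library open bibliographic metadata",
    "license_id": "cc-zero",                            # assumed CKAN id for CC0
    "notes": (
        "Catalogue records released under CC0 in line with the "
        "Discovery open metadata principles."
    ),
    "url": "http://example.ac.uk/opendata/records.nt",  # hypothetical download link
}

resp = requests.post(
    DATAHUB_API,
    json=dataset,
    headers={"Authorization": API_KEY},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["result"]["id"])
```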

The Creative Commons CC0 exemplars webpage

http://wiki.creativecommons.org/CC0_use_for_data

Clearly this applies only to those of you that have opted for the CC0 licence. As you can see, you’ll be in good company. My assumption is that you should simply email info@creativecommons.org (perhaps marked FAO Timothy Volmer) with your request to be on the page, providing a simple statement in line with the style of the page plus a logo.

Postscript – On recommending choice

Without doubt, Attribution has its place in the scheme of things digital – but not ideally in relation to the assertion of uncertain ‘rights’ amidst the mosaic of public domain information and distinct intellectual endeavour that constitutes the world’s bibliographic records.

Perhaps there are lessons to be learned from elsewhere about offering choice to contributors – for example from Flickr, which presents contributors with choices including the various Attribution licence variants – see http://www.flickr.com/creativecommons/.

Similarly, the University of California at Santa Cruz recommends CC attribution options for public contributions to its Grateful Dead Online Archive (http://www.gdao.org/contribution). This seeks to encourage contribution of digital objects by guaranteeing credit to members of the public, which seems appropriate for the particular GDAO community context. Their options are set out below.

PS – I wonder if the description of the GDAO target community as one of ‘shared inspiration and adaptation’ has some equivalence to the global community of cataloguers, bibliographers, archivists and curators that have built up our scholarly metadata.


w/c 16 July 2012 – Discovery News Roundup

July 20, 2012

The past few weeks have seen some fairly significant announcements within the UK, in Europe and beyond, regarding linked data, APIs and global discovery services. Below are some of the highlights:

  • The European Library’s new union search portal was launched at the LIBER Conference and opens up online access “to more than 200 million records from the collections of national and research libraries across Europe”. UK contributors to the initiative are: the British Library, the Wellcome Library, the Bodleian Libraries, University College London and the National Library of Wales. The European Library has made an OpenSearch API available to developers on a non-commercial-use basis, but unfortunately it is only available to member libraries.

“One other thing that came up is that the situation of libraries differs greatly from other cultural institutions. This [is] because they are often not the owners of their metadata, but buy this from a commercial company. This means that open data is often not discussed in the library world because they argue that it is not their choice to make. As a result the librarians remain invisible in the discussion about how [to] provide service in a digital age.”


w/c 11 June 2012 – Discovery News Roundup

June 12, 2012

Although I’ve been away over recent weeks, activity within the world of open metadata has continued unabated – here is my digest of activity from within the Discovery programme and from further afield.

Joy Palmer’s talk at the Joint Content and Discovery Programme contained a wealth of information about the current open metadata landscape, including links to a still relevant 2010 Economist article on the ‘data deluge’ (see also their report on the potential, and conversely the problem, of ‘superabundant data’). I’d argue that the increased quantity of data isn’t necessarily creating lots of new information management issues but it certainly makes those issues more visible and more pressing as soon as we move from passively collecting data to wanting to actively exploit the potential of that data.

Last month OCLC released the Virtual International Authority File (VIAF) dataset under an open data licence, together with their guidance on attribution. OCLC has also recently launched their WorldShare Management Service which provides libraries with “a new approach to managing library services cooperatively, including integrated acquisitions, cataloging, circulation, resource sharing, license management and patron administration, as well as a next-gen discovery tool for library users.” (emphasis is mine).

America’s National Institutes of Health (NIH) presented a showcase of the National Library of Medicine APIs that are available to developers. A recording of the live webcast is available to view online. The NIH has clearly decided to move beyond the more commonly found ‘build it and they will come’ approach and are actively engaging the developer community to help them understand what APIs are available. More recently they ran a two day Health Datapalooza event which brought together NLM data experts and developers. The event was livestreamed and you can view the archived video online.

Closer to home, discussion of data in The Guardian has made it out of their Data Store pages and into the pages of their Culture Professionals Network blog. Patrick Hussey has written a three-part, wide-ranging exploration of data within the arts and culture sector, which argues that it is time to open up performance paradata and look at ways of making shared data count. Patrick’s main focus is on open data rather than open metadata, but the series is very thought-provoking, and in his second article he points to the work of The National Archives in creating an open, API-accessible database of legislation in the shape of http://www.legislation.gov.uk/

The BBC Connected Studio project is an open collaboration initiative that kicked off in May and is initially focused on developing new approaches to personalisation, using DevCSI-style hackspace gatherings to bring together digital talent from outside the BBC. Later this year the focus shifts to “connected platforms and big data”, which could mean some interesting developments that the MLA sectors might benefit from, as well as opportunities for MLA developers to get involved by responding to the Connected Studio call for participants.

The BBC Online team have managed to communicate their search and discovery strategy very clearly in the second of the videos included within this Connected Studio blogpost.


The Imperial War Museum is heading an international partnership of organisations in the run-up to the beginning of a four-year programme of activities to commemorate the First World War Centenary: “Through the partnership, colleagues from a variety of sectors [including museums, archives, libraries, universities and colleges, special interest groups and broadcasters] have the opportunity to communicate with each other, share and combine resources, cooperate and co-develop products and services that complement each other [...]”. It will be interesting to see whether any developments similar to the Will’s World Discovery aggregation project emerge as a result of such a broad collaborative partnership.

