w/c 24 December – Discovery news roundup

December 31, 2012

UK and JISC Discovery Project news:

Europeana news:

Other news from Europe and beyond:

Event reports:

  • The University of Leicester’s live debate, ‘Museums in the information age: Evolution or extinction?’, which took place at the Science Museum, is available to listen to online. A Guardian article covering the debate notes the importance of digital discovery: “[…] some digital resources produced by museums quickly become disposable if not easily discoverable by potential users.”
  • The Culture Hack event, which took place at the Google Campus, centred on envisaging new ways for London’s schoolchildren to interact with and be inspired by the city’s cultural heritage. Martin Belam’s blogpost provides a good overview of the day and highlights a thought-provoking comment from the British Library’s Nora McGregor: “It’s about teaching metadata to children.”

Call for contributions:


w/c 12 November – Discovery news roundup

November 15, 2012

One particularly interesting thing I noticed this past month was that tweets about open data, linked data and metadata were starting to come thick and fast from people within my network who sit well outside the library and cultural data domains. In particular, attendees of Lasa’s Charity Digital Summit and the ‘Nesta in Manchester’ event about innovation seemed to produce a rich vein of tweets about all things open. Perhaps an indicator that open data’s tipping point is approaching?

Some highlights from the world of resource discovery and open data in recent weeks:

Updates from a couple of large-scale projects in Europe and the US:

News from Discovery partner organisations:

Please note that places are still available for the second of our free Discovery Licensing Clinics on 30 November in London. It is an opportunity for managers and decision makers from libraries, archives and museums to get practical advice on open data licensing from our assembled team of experts.


w/c 8 October – Discovery News Roundup

October 12, 2012

In recent weeks the Discovery Team have been finalising and releasing a whole suite of online materials which reflect our continued focus on the business case for Discovery. In September we released a collection of eight videos containing the reflections of UK academic library directors on topics such as the key issues and challenges for resource discovery, the value of making special collections visible and the potential of collaboration.

The videos were launched in our latest Discovery newsletter, along with our Case Study collection and Guidance Materials which aim to highlight and support current real-world practices relating to the Discovery Open Metadata Principles and Technical Principles within museums, libraries and archives.

The work of the Discovery programme has informed the latest animation from the OER IPR Support Project: ‘Open Data Licensing’ which you can view below. A key aspect of the Discovery programme’s approach is “establishing clarity of understanding around licensing and open data” so it’s good to see such a complex issue described in an accessible way – it doesn’t remove any of the inherent complexity but it breaks down and clarifies that complexity, which is an important initial step towards enabling action.

Some highlights from the wider world of resource discovery and open data:

  • In September, Europeana continued to set the pace in cultural data aggregation by opening up metadata for more than 20 million cultural objects for free use under the Creative Commons CC0 Public Domain Dedication licence. Their release represents the largest one-time dedication of cultural data to the public domain using the CC0 waiver and opens up the possibility of innovative apps, games, web services and portals being developed. The move also ‘holds the potential to bring together data […] from other sectors, such as tourism and broadcasting’. As Jill Cousins, Executive Director of Europeana, said: “This move is a significant step forward for open data and an important cultural shift for the network of museums, libraries and galleries who have created Europeana”. EC Vice President Neelie Kroes referred to the Europeana release as a ‘treasure trove of cultural heritage’.
  • At the Healthcare Efficiency Through Technology Expo this week, Garry Coleman talked about the NHS Information Centre’s plans for a large-scale open data release, involving millions of rows of data being made available under an Open Government Licence. This release reflects the wider importance of transparency as a motivator for open data, particularly within governmental/publicly funded organisations. It could also be a watershed moment for the release of anonymised sensitive data, which could further open up the way for the, arguably much less contentious, sharing of open metadata that our sector is working towards.
  • I mentioned Cooper-Hewitt Labs’ Director, Seb Chan, in my digest last month and his latest blogpost about being ‘of the web’ rather than ‘on the web’ is another interesting read. They are embracing the porosity of the internet and working with websites such as Behance to surface their collections and associated information out in the wild. In doing so they are finding creative ways to tackle potential showstoppers such as control over branding and retaining attribution. Their approach enables them to keep their expertise focussed on activities that are within their own domain and offers up an interesting blueprint for externally located engagement and visibility.
  • Rewired State are running an ‘Open Science’ hack day event in partnership with the Wellcome Trust in December.
  • The Open Data Network have launched the Open Data Showroom website which looks like it will become a very useful ‘at a glance’ resource for finding interesting sources of, and uses for, open data.
  • Leigh Dodds’ blogpost identifies a simple model for exploring the sustainability of open data curation projects such as legislation.gov.uk.
  • A significant release of legislative open data was announced this week on the Open Knowledge Foundation website, which reported on the release of US Congress legislative data going back to 1973.
  • The latest Arts Council Digital R&D podcast focuses on how organisations can use digital technology to open up archives, collections and data. It includes news from the V&A and the British Museum and considers the impact of projects such as Google’s Art Project.
  • And staying with Google, this week saw the launch of the Google Cultural Institute which aims to “preserve and promote culture online”. The Cultural Institute website presents curated cultural artefacts in online galleries, together with search and browse facilities. The individual artefacts retain their attribution to the holding organisation and, in some cases, the associated metadata can also be viewed. It’s not immediately obvious how open the underlying data is but it appears to be a walled garden at the moment.

w/c 27 August – Discovery News Roundup

August 31, 2012

Earlier this month OCLC announced their recommendation that member institutions use the Open Data Commons Attribution (ODC-BY) License when releasing WorldCat-derived library catalogue data. You can read David Kay’s response to that announcement here on the Discovery blog. And last week there was news that OCLC and Europeana are collaborating on a project developing ‘semantic similarity’ that will improve the experience of searching aggregated metadata by identifying items that are near duplicates or related to each other. The wider significance of this project is that it will feed into the Europeana Data Model and will provide “opportunities to develop new data services for third parties.”

It’s long been asserted that content is king, and more recently that context is queen but now Associated Press are investing in marrying the two in order to speed up the distribution of content. Clearly the Associated Press business model is based on syndication rather than aggregation but Melody K. Smith’s assertion that “[f]indability only works when a proper taxonomy is in place.” seems worth some thought with regard to its relevance for our sectors.

TechCrunch’s article pitching Mendeley’s open API against Elsevier’s closed API is flawed but it’s worth reading for the comments it provoked, particularly from Elsevier’s Director of Platform Integration, Ale De Vries. You can read more about the growth of Mendeley’s API service on the Guardian Technology blog – it’s interesting to note that their future plans involve developing their API service into a multi-directional dataflow that will allow applications built on their API to talk to each other and to upload data to Mendeley.

Seb Chan’s candid blogpost reflecting on the Cooper-Hewitt Design Center’s experience of openly releasing their collection metadata is a useful and timely reminder that a) issues around the quality of released metadata need to be addressed if we want anyone to use the data we’re releasing and b) “collection metadata [has value as a tool for discovery] but it is not the collection itself.” Seb’s point about museum collections being no match for the comprehensiveness of libraries and archives highlights the importance of open metadata, which enables cross-institutional aggregation, and of the OCLC and Europeana ‘semantic similarity’ project I mentioned above. In an ideal world it will also enable the public permeability that Seb touches on by connecting our collections with the boundless ‘amateur web’ corpus.

News from Discovery projects


w/c 16 July 2012 – Discovery News Roundup

July 20, 2012

The past few weeks have seen some fairly significant announcements within the UK, in Europe and beyond, regarding linked data, APIs and global discovery services. Below are some of the highlights:

  • The European Library’s new union search portal was launched at the LIBER Conference and opens up online access “to more than 200 million records from the collections of national and research libraries across Europe” – UK contributors to the initiative are: British Library, Wellcome Library, Bodleian Libraries, University College London and the National Library of Wales. The European Library has made an OpenSearch API available for developers on a non-commercial use basis but unfortunately it is only available to member libraries.

“One other thing that came up is that the situation of libraries differs greatly from other cultural institutions. This [is] because they are often not the owners of their metadata, but buy this from a commercial company. This means that open data is often not discussed in the library world because they argue that it is not their choice to make. As a result the librarians remain invisible in the discussion about how [to] provide service in a digital age.”


w/c 11 June 2012 – Discovery News Roundup

June 12, 2012

Although I’ve been away over recent weeks, activity within the world of open metadata has continued unabated – here is my digest of activity from within the Discovery programme and from further afield.

Joy Palmer’s talk at the Joint Content and Discovery Programme contained a wealth of information about the current open metadata landscape, including links to a still relevant 2010 Economist article on the ‘data deluge’ (see also their report on the potential, and conversely the problem, of ‘superabundant data’). I’d argue that the increased quantity of data isn’t necessarily creating lots of new information management issues but it certainly makes those issues more visible and more pressing as soon as we move from passively collecting data to wanting to actively exploit the potential of that data.

Last month OCLC released the Virtual International Authority File (VIAF) dataset under an open data licence, together with their guidance on attribution. OCLC has also recently launched their WorldShare Management Service which provides libraries with “a new approach to managing library services cooperatively, including integrated acquisitions, cataloging, circulation, resource sharing, license management and patron administration, as well as a next-gen discovery tool for library users.” (emphasis is mine).

America’s National Institutes of Health (NIH) presented a showcase of the National Library of Medicine APIs that are available to developers. A recording of the live webcast is available to view online. The NIH has clearly decided to move beyond the more commonly found ‘build it and they will come’ approach and is actively engaging the developer community to help them understand what APIs are available. More recently they ran a two-day Health Datapalooza event which brought together NLM data experts and developers. The event was livestreamed and you can view the archived video online.

Closer to home, discussion of data in The Guardian has made it out of their Data Store pages and into the pages of their Culture Professionals Network blog. Patrick Hussey has written a three-part, wide-ranging exploration of data within the arts and culture sector which argues that it is time to open up performance paradata and look at ways of making their shared data count. Patrick’s main focus is on open data rather than open metadata but the series is very thought-provoking and in his second article he points to the work of The National Archives in creating an open API database of legislation in the shape of: http://www.legislation.gov.uk/

The BBC Connected Studio project is an open collaboration initiative that kicked off in May and is initially focused on developing new approaches to personalisation using DevCSI-style hackspace gatherings to bring together digital talent from outside the BBC. Later this year the focus shifts to “connected platforms and big data”, which could mean some interesting developments that the MLA sectors might benefit from and opportunities for MLA developers to get involved by responding to the Connected Studio call for participants.

The BBC Online team have managed to communicate their search and discovery strategy very clearly in the second of the videos included within this Connected Studio blogpost.


The Imperial War Museum is heading an international partnership of organisations in the run-up to the beginning of a four-year programme of activities to commemorate the First World War Centenary: “Through the partnership, colleagues from a variety of sectors [including museums, archives, libraries, universities and colleges, special interest groups and broadcasters] have the opportunity to communicate with each other, share and combine resources, cooperate and co-develop products and services that complement each other […]”. It will be interesting to see whether any developments similar to the Will’s World Discovery aggregation project emerge as a result of such a broad collaborative partnership.


w/c 12 Mar 2012 – Discovery News Roundup

March 16, 2012

Here’s my round up of news from the world of Discovery and beyond over the past couple of weeks. As with my previous posts, many of the items were gleaned from the #ukdiscovery twitter hashtag which you can dip into whenever you like by opening up this FiveFilters ‘newspaper’ pdf.

First of all, some news from the Discovery initiative – There is an opportunity to attend the free Licensing Clinic that the Discovery project is running on Wednesday 9th May in Birmingham. This practical roundtable event is aimed at managers and decision makers in libraries, archives and museums and there will be the following experts on hand to help guide you through your institution’s particular open metadata licensing challenges: Francis Davey (Barrister), Naomi Korn (Copyright Consultant), Paul Miller (Cloud of Data). Please note that places at this event are strictly limited to 15 delegates so you’re advised to book sooner rather than later and you can do that by signing up via the Eventbrite registration page.

In recent weeks I’ve seen a few articles relating to the need for skills development in the area of ‘data wrangling’/‘data management’:

Those articles left me wondering whether there are specific skills needed for dealing with and managing open metadata which we should be identifying and highlighting? On a related note, I saw a short conversation regarding Linked Data on Twitter that I think a lot of people will relate to and which could be equally applied to any of the areas touched on by the Discovery initiative – To summarise, the main point of the conversation was that [people] have no trouble understanding what terms such as Linked Data mean while they are being explained to them but that knowledge is hard to retain and quickly loses definition when you walk away and/or try to explain it to anyone else.

Resources such as the Open Metadata Handbook are undoubtedly a useful touchstone people can keep returning to when they need a refresher but what else needs to be in place to ensure that knowledge about open metadata is discovered, shared and becomes embedded within staff skillsets?

One of the aims of the Discovery initiative is to raise awareness of open metadata and if you’d like to help us do that then you can either:

Some other links of interest from the wider world of data:

Lastly, I’ve started exploring how I can use Delicious to share other items of interest that I pick up during my travels across the webosphere – To that end I’ve started using Packrati.us to auto-bookmark my Twitter favourites and shared hyperlinks in Delicious and have also created a #UKDiscovery ‘stack’ where I’ve started sharing any of my bookmarks that seem particularly pertinent to the Discovery initiative.


w/c 27 Feb 2012 – Discovery News Roundup

March 4, 2012

Here’s my round up of news from the world of Discovery and beyond over the past few weeks. As with previous posts, many of the items were gleaned from the #ukdiscovery twitter hashtag which you can dip into whenever you like by opening up this FiveFilters ‘newspaper’ pdf [update: URL fixed].

Last week the Discovery team published Issue 6 of the Discovery Newsletter which included the following articles among others:

  • an article on how the Copac Collections Management Tool project is aiming to help collections managers.
  • an introduction to ‘Will’s World’ – one of the JISC-funded large-scale exemplar projects.
  • an invitation for supply chain organisations such as system vendors and publishers to engage with the Discovery initiative.

If you’d like to receive future newsletters by email you simply need to drop us a line at rdtf-discovery@sero.co.uk and you’ll be added to the distribution list.

It was interesting to read Harvard’s announcement of the changes they will be undergoing in order to unify their 73 (!) libraries. Much of the announcement concentrated on structural changes but this sentence caught my eye and it seems to suggest that some game-changing LIS developments could be in the offing: “The changes will position the Library to lead in scholarly communication and open access, to design next generation search and discovery services, and to accelerate digitization and digital preservation.”

Of course Harvard’s Library Lab team are already involved in designing next generation search and discovery services as part of the Digital Public Library of America (DPLA) Beta Sprint initiative – the scale of the data they’re dealing with is pretty impressive but it was the live demo of their “pre-alpha” ShelfLife/LibraryCloud system that took my breath away and got me thinking about new possibilities for discovery interfaces.

When I first read this short blogpost from the Louie B. Nunn Center for Oral History, University of Kentucky I initially dismissed it as not quite newsworthy enough to include in this digest … but I kept thinking about the story after I had clicked away from it.  It seems to me that the ‘Oral History Metadata Synchronizer’ (OHMS) tool that they’ve developed with their digital library division has huge potential for improving the visibility of audio collections and connecting them to other relevant resources. The story of how the Nunn Center have used OHMS to preserve and share interviews with survivors of the Haiti earthquake is a moving reminder that metadata is (at the risk of getting poetic and misty eyed) more than sterile information, and the discovery it enables is human as much as it is digital.

Staying on the subject of audio collections, the Music Library Association is currently working on a final version of their Music Discovery Requirements document and they are currently inviting thoughts and suggestions. This presentation by Nara Newcomer provides useful background on the aim of the Music Discovery Requirements document.

The Discovery programme is particularly focused on the business case for adopting open metadata so it was interesting to read this white paper from Nielsen which reports on the effect of supplying (or not supplying) metadata within the book industry. One of the key conclusions reads: “Overall we see clear indications that supplying a set of full enhanced metadata for product records helps to maximise sales, and that this relationship between enhanced metadata and sales is even stronger for the online retail sector.” Of course UK university libraries are not in the business of book retail and this report could simply serve to make publishers more commercially protective over the metadata they create but all the same it is good to have some high profile research published in this area. It’s a pity that they don’t separate out enhanced metadata from the provision of cover images in their analysis – from research I’ve been involved in previously I suspect there might be some interesting findings that remain hidden by the approach they’ve taken.

Europeana have published data for 2.4 million items under an open metadata licence as part of their Linked Open Data pilot. The data is provided by eight national libraries and a number of cultural heritage organisations (including some from the UK) and there’s also a convincing animation on the ‘what and why’ of linked data which, pleasingly, keeps the end user at the forefront of the discussion. Europeana also launched the ‘European Library Standards Handbook’ which is their guide for libraries who are providing content to data aggregators – it includes a legal overview as well as a technical guide. If you are interested in linked open data then you might want to follow the University of Bristol’s ‘Bricolage’ project which is JISC-funded and will be publishing catalogue metadata from their Penguin Archive and Geology Museum collections.

Earlier this week I found myself having one of those ‘am I the only person not at this event?’ moments as my Twitterstream gradually filled up with all manner of interesting and diverting tweets from the OCLC EMEA Regional Council Annual Meeting.  Owen Stephens captured some of the knowledge that was shared around the topic of APIs in his blogposts written on the day. One of the sessions that seemed to be particularly well received was Alison Cullingford’s presentation on recent survey findings from the RLUK Unique and Distinct Collections project so it will be interesting to read the report when it is published. The meeting also brought news that an open data commons licence is being considered for WorldCat:

WorldCat: open data commons licence is being considered and will be discussed with OCLC membership through Global Council #EMEARC

— Simon Bains (@simonjbains) February 29, 2012

I will not pretend to be an expert but these guides that the Archives Hub have added to their website look very useful for anyone who is interested in accessing Archives Hub data using SRU and OAI-PMH interfaces.
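For anyone curious what talking to an OAI-PMH interface looks like in practice, here is a minimal sketch in Python. It relies only on the OAI-PMH 2.0 conventions (the `ListRecords` verb, the `oai_dc` metadata prefix and the standard XML namespaces). The base URL in the usage note is a placeholder rather than the Archives Hub’s real endpoint, the sample response is invented for illustration, and the function names are hypothetical, so check the Archives Hub guides for the actual details.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Standard namespaces used in OAI-PMH 2.0 responses carrying Dublin Core.
NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL (function name is illustrative)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(xml_text):
    """Return every Dublin Core title found in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [title.text for title in root.iterfind(".//dc:title", NAMESPACES)]


# Invented sample response, trimmed to the parts this sketch reads.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Papers of an example collector</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

if __name__ == "__main__":
    # Placeholder base URL: substitute the endpoint given in the guides.
    print(build_list_records_url("https://example.org/oai"))
    print(extract_titles(SAMPLE_RESPONSE))
```

The sketch deliberately parses a canned response instead of making a network call, so it runs offline; in real use you would fetch the URL and, for large result sets, follow the resumption tokens that the protocol defines.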

I’ll finish up by sharing some interesting news in the wider world of open data and metadata:

  • The JISC Managing Research Data Programme is doing some heavy lifting in terms of building a registry of metadata standards  (for UK university research datasets) – I’m sure they would be pleased to hear from you if you have any insights you’d like to share with them.
  • The Government’s call for input to their consultation on “open standards for software interoperability, data and document formats” is ongoing and it doesn’t close until 3 May so there’s plenty of time left to think about what the direct and indirect supply chain ripples might be.
  • In my last news digest I mentioned that ‘big data’ suddenly seemed to be everywhere – This week Nick Edouard’s reflective post over on the BuzzData blog struck a chord with me, particularly his point that “Open-data initiatives are good for many reasons, not least because they can radically improve internal data-sharing.” Often the discussion around open data tends towards a leap of faith/altruistic model but keeping focused on the ‘what’s in it for us?’ question seems a surer way of securing the internal resources needed to release data in the first place.

In closing, a couple of blogposts I’ve read recently have got me thinking about the importance of identifying a vision that other people can quickly understand and get behind:

I think that the Discovery vision packs a similar punch but perhaps it could be more emotive?: “[Our vision] is about making resources more discoverable both by people and machines.” Is that a vision which speaks to you? Have you found the words to succinctly describe your institution’s vision for resource discovery? Please do share your thoughts in the comments below.


w/c 6 Feb 2012 – Discovery News Round-up

February 9, 2012

Here’s my round up of news from the world of Discovery and beyond over the past few weeks. Many of the items were gleaned from the #ukdiscovery twitter hashtag which you can dip into whenever you like by opening up this FiveFilters ‘newspaper’ pdf that I generated.

Last week Joy Palmer shared plans for the next phase of guidance materials and workshops here on the Discovery blog and is looking for your feedback on the outlined approach so please do wade in and let us know what you think. And bonus points for anyone who can suggest a better title for the event than “‘Un’developer hands-on development event”. The best I can come up with is ‘Can’t Code, Won’t Code’ so the field is wide open.

The National Information Standards Organization (NISO) are currently inviting public comment on the working group recommendations that have come out of the joint NISO and NFAIS (the National Federation of Advanced Information Services) project to develop Recommended Practice on Online Supplemental Journal Article Materials. The main aim of the project is to improve the ‘discoverability and findability’ of journal supplemental materials for librarians and would-be readers by establishing and maintaining links to the related article. The comment period runs until 29th February and, although the recommendations are aimed mainly at publishers, they are also interested in feedback from the wider scholarly community. [via @simonhodson99]

One of the key NISO/NFAIS recommendations is around consistency and, interestingly, this was also one of the key discussion points raised during recent focus groups run by the JISC/AHRC-funded Open Access e-Books research project (OAPEN-UK). So far the project have heard from humanities and social sciences (HSS) monograph publishers, authors/readers and institutional representatives and next week they are running focus groups for research funders, e-book aggregators and learned societies. Incidentally, if you are interested in taking part in one of those focus groups then further details can be found on their Events page. [via @publishersrcly]

A couple of weeks ago it seemed to be ‘Big Data’ week on my twitter stream – all and sundry were tweeting about it and it wasn’t just the data geeks any more. It certainly seemed to suggest, as reported in this Museum Geek post, that “the era of Big Data has begun” but it struck me that the conversation around big data seems to be moving on from mostly logistical or functional discussions about gathering, storing, sharing and making use of data to a realisation that generating and circulating more data doesn’t solve anything on its own (see GigaOm’s article which likens it to virtual landfill via @paulmiller). In the world of building websites there’s a saying that ‘content is king’ but in the world of data it would appear that ‘content + context = king and queen’. Which had me pondering whether the Discovery initiative could usefully consider establishing Open Paradata Guidelines to sit alongside our Open Metadata Principles. And coming from a humanities background myself I found Michael Kramer’s assertion that “data is always already meta-data” an interesting point to mull over.

The Data Catalogs website, which was launched last summer, aims to be “the most comprehensive list of open data catalogs in the world”. I’m sure it’s relatively early days yet but there are already 212 catalogues listed and the list of experts involved in the website is impressive. It looks like it will grow into a useful centralised resource, particularly if a more advanced search is added, but I noticed that not all of the entries state what their metadata licence is – it seems to me that there’s an opportunity to improve consistency and clarity by making that a mandatory field. What did impress/surprise me though is that any visitor to the website can improve a record simply by clicking on the ‘Please help improve this page by adding more information’ link at the bottom of the record and editing the fields that appear [via @rufuspollock]. If you are interested in the issues around licensing open data then Naomi Korn and Professor Charles Oppenheim’s practical guide is worth a read.

And finally, a few items of interest from the wider world of Discovery:

  • This article about book mashups on the Programmable Web ‘API News’ blog got me thinking about countless possibilities for making library and museum and gallery collections more visible and connected in new ways. Then this morning someone tweeted about the strangely hypnotic Flight Radar website and I wondered if one day I might find myself gazing at a map that shows books flying overhead as they wend their way from place to place as inter-library loans.
  • March is looking set to be Culture Hack Month, with events taking place on both sides of the Pennines. Hack for Culture takes place on the 3rd and 4th March in Liverpool and is bringing interested parties together “to explore the possibilities offered by joint experimentation with a wide variety of hidden cultural data sets”. The 24-hour CultureCode Hack takes place towards the end of March in Newcastle and will give cultural and arts organisations with open data the opportunity to work with developers and designers to create something new. You can take a peek at the hacks that were developed at the Culture Hack North event in Leeds last year to get an idea of what can be produced in such a short amount of time.

w/c 16 Jan 2012 – Discovery News Round-up

January 16, 2012

This is the first of my regular round-ups of what’s happening in the world of resource discovery. Twice a month I’ll be sharing what I’ve found during my internet travels and also highlighting things that have caught my eye under the #UKDiscovery Twitter hashtag. You can also see the latest tweets from that hashtag compiled into an eye-pleasing PDF format (created via the FiveFilters PDF newspaper maker).

Firstly, I want to share the output of the JISC Activity Data Synthesis project which I was involved with last year. The project website was published in November 2011, which already seems like a lifetime ago, but hopefully the collective wisdom gathered together there will be useful for some time to come.


The JISC Activity Data programme was a collection of nine projects which, although not directly part of the Discovery initiative, covered some relevant terrain – particularly around issues of licensing, metadata and open data. Other strong themes that emerged during the course of the programme were ‘big data’ (particularly so for the Exposing VLE Activity Data project), data storage and data visualisation. If you’re interested in getting to grips with data visualisation then the online talk that Tony Hirst kindly did for us as part of our virtual exchange sessions is well worth a watch. Five of the projects were focused on library activity data so they are worth exploring if that’s the domain you’re involved with: AEIOU, LIDP, RISE, SALT and OpenURL.

Now onto the highlights of things I’ve come across over the past few weeks: