Discovery Licensing Clinic

May 23, 2012

Image

photo credit: Ed Bremner

The first Discovery Licensing Clinic brought together representatives from a number of different libraries, archives and museums to spend a day considering practical responses to the Discovery open licensing principles and getting practical guidance from the assembled experts. It was an opportunity to identify issues and discuss the range of tactics that institutions might adopt in scoping metadata releases and making the associated licensing decisions.

Our panel of experts on the day consisted of Francis Davey (Barrister), Naomi Korn (Copyright Consultant), Paul Miller (Cloud of Data) and Chris Banks (University Librarian & Director, Library, Special Collections & Museums, University of Aberdeen).

Chris Banks has written a blogpost reflecting on the day and her presentation slides can be viewed below:

The issues around licensing open metadata do represent a significant hurdle for institutions but none of those issues are insurmountable. Our hope is that licensing clinics such as this one, and the ones we plan to run in the future, will give managers and decision makers the knowledge they need to progress the open metadata agenda within their organisation.


Highlights from the Content and Discovery Joint Programme event

May 22, 2012

On the 23rd April colleagues from projects across the Discovery, JISC Content, JISC OER and Emerging Opportunities programmes gathered in Birmingham to share knowledge and identify shared challenges and key agendas that need to be progressed going forward. As is often the way with these types of events the discussions that took place over a day and a half were as useful to those running the event as they were for the delegates attending. The notes below represent just a handful of my highlights.

Joy Palmer presented on behalf of the Discovery Programme and gave a compelling overview of the challenges and aspirations we share around the discovery of content. She highlighted how, as the RDTF work was translated into the Discovery initiative, it became clear that we needed to talk in terms of an ecosystem as opposed to an ‘infrastructure’ because the latter suggested that the initiative was aiming to impose an overarching infrastructure model over the entire museums, libraries and archives (and JISC) discovery space.

“To a large degree, what today is about is determining to what degree we can operate as a healthy and thriving ecosystem, where components of our content or applications interact as a system, linked together by the flow of data and transactions.”

But as Joy stated, this is not to oversimplify matters. Her talk touched on the many apparently competing theories about how to enable discovery in the dataspace, highlighting the complexity we’re all confronting as we make decisions about the discovery and use of our data: Big Data and The Cloud, Paradata, Linked Data, Microdata, and the ‘return’ of Structured Data.

But in terms of our shared goals to have our content discoverable or useable via the web, she explained it is the tactic of opening up data that is relevant to us all, even if our challenges in achieving ‘openness’ differ.

The slides from Joy’s presentation are available to view on Slideshare:

Discovery: Towards a (meta)data ecology for education and research

In the afternoon I facilitated Andy McGregor and David Kay’s session on business cases where the participants obligingly contributed to David’s mapping exercises.

There were some interesting discussions around the participants’ experience of writing business cases, including useful suggestions for getting the most out of building a business case:

  • Predicting and measuring benefit are key challenges to overcome but we can do that by using the data at our disposal to create a convincing narrative. However, it’s not about manipulating that data and making up stories retrospectively; we need to put energy into building robust analytics that help communicate our story clearly and convincingly.
  • Filling out a business case template shouldn’t be an activity that only happens in order to secure funding or other resources – it can be very useful to repeat the process throughout the project in order to track any changes along the way.

The following links may be useful if you are interested in building robust business cases:

In the plenary session on day two the conversations centred around a number of discussion points:

  • Terms such as ‘microdata’ (machine-readable semantic tagging of webpage content) and ‘paradata’ (usage analytics or contextual information about data/metadata) were new to some of the participants and this prompted a discussion around the seemingly unavoidable challenge of jargon that we face within the Discovery arena. One suggestion was that instead of working to define a stronger vocabulary that is understood by all, perhaps we should be identifying stronger metaphors which everyone can relate to; metaphors that communicate the vision of what we are working towards and help everyone understand how they can get involved with delivering that vision within their own context.
  • We should be stepping outside of the sector to see the potential for emerging areas of activity (e.g. paradata). Looking to those sectors who are ahead of the game saves the library, museum and archives sectors having to try and work from a blank page. We also need to identify where our sectors are ahead and recognise how those advantages leave us well positioned to make significant progress.
  • Projects would benefit from a system of ‘evaluation buddies’ from within their programme to help uncover evidence of project impact and then share this evidence, together with highlighting any awards and recognition won by projects. This will help institutions build their internal business cases for bidding to run and then embed JISC projects in the future. There was also the suggestion that JISC could usefully build a collection of the major use cases (in a similar way to the Open Bibliographic Data Guide) together with short case studies that demonstrate the institutional impact.
  • Across the two days there were mentions of ‘microdata’ (machine-readable semantic tagging of webpage content), ‘big data’ (i.e. high volume) and ‘heavy data’ (data which ‘stretches current infrastructure or tools due to its size or bulk’), but the argument was made that the primary objective should be to produce ‘simple data’ (data that is both simple to produce and simple to consume).
  • There was recognition that aggregation is an art not a science and that current data standards are a) opinion, not fact and b) open to interpretation. High quality data is key to producing usable datasets but there was a question about how that quality can be defined. One suggestion was that data clean-up is a highly specialist service that should be decoupled, as per the government’s view with regard to open data.
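
For anyone who found ‘microdata’ a new term, the definition given in the plenary notes above (machine-readable semantic tagging of webpage content) can be illustrated with a small sketch. The HTML fragment, the title and the author below are made up for illustration, and the parser only handles a single flat item; it is a minimal sketch using Python’s standard library, not a full implementation of the microdata specification.

```python
from html.parser import HTMLParser

# Hypothetical catalogue page fragment marked up with schema.org microdata.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Book">
  <span itemprop="name">A Hypothetical Title</span>
  by <span itemprop="author">A. N. Author</span>
</div>
"""


class MicrodataParser(HTMLParser):
    """Collect itemprop name/text pairs from one flat microdata item."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current_prop = None

    def handle_starttag(self, tag, attrs):
        # Remember which property (if any) the next text node belongs to.
        self._current_prop = dict(attrs).get("itemprop")

    def handle_data(self, data):
        # Only keep text that sits directly inside an itemprop element.
        if self._current_prop and data.strip():
            self.properties[self._current_prop] = data.strip()

    def handle_endtag(self, tag):
        self._current_prop = None


def extract_microdata(html):
    """Return a dict of itemprop names to their text values."""
    parser = MicrodataParser()
    parser.feed(html)
    return parser.properties


print(extract_microdata(SAMPLE_HTML))
```

The point of the sketch is that the same page a person reads also yields name and author fields that a program can pick out, which is arguably what makes this kind of tagging a form of ‘simple data’.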

Some key takeaway points for the Discovery programme:

  • Information about the Discovery programme, its projects and the underlying principles should be in a format that is ‘reframeable’, making it easy for interested parties to access information on their terms and cascade that information to their own audience or stakeholders.
  • Identifying and highlighting the tangible benefits of the Discovery Principles enables supporters of those principles to embark on fruitful conversations with colleagues in their institutions.
  • There is huge benefit in sharing the learning and challenges from within, and without, the Discovery programme.  An ongoing process of synthesis, re-synthesis and distillation will extract maximum value from the activity taking place across the Discovery initiative.
  • The quality of metadata is key to the success of Discovery initiatives – we need to explore how high quality metadata is defined and ensured.

Radically Open Cultural Heritage Data at SXSW Interactive 2012

April 11, 2012

Posted by Adrian Stevenson

I had the privilege of attending the annual South by Southwest Interactive, Film and Music conference (SXSW) a few weeks ago in Austin, Texas. I was there as part of the ‘Radically Open Cultural Heritage Data on the Web’ Interactive panel session, along with Jon Voss from Historypin, Julie Allinson from the University of York digital library, and Rachel Frick from the Council on Library and Information Resources (CLIR). We were delighted to see that Mashable.com voted it as one of ‘22 SXSW Panels You Can’t Miss This Year’.

All of our panelists covered themes and issues addressed by the Discovery initiative, including the importance of open licenses, and the need for machine readable data via APIs to facilitate the easy transfer, aggregation and link-up of library, archives and museum content.

Jon gave some background on the ‘Linked Open Data in Libraries, Archives and Museums’ (LOD-LAM) efforts around the world, talking about how the first International LODLAM Summit held in San Francisco last year helped galvanise the LODLAM community. Jon also covered some recent work Historypin are doing to allow users to dig into archival records.

Julie then covered some of the technical aspects of publishing Linked Data through the lens of the OpenArt Discovery project, which recently released the ‘London Art World 1660-1735’ data. She mentioned some of the benefits of the Linked Data approach, and explained how they’ve been linking to VIAF for names and Geonames for location.
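
To give a flavour of what linking to VIAF for names and Geonames for location might look like in practice, here is a minimal sketch that writes two links as N-Triples. The local person URI and both external identifiers are placeholders I have made up, and the choice of `owl:sameAs` and `foaf:based_near` as linking properties is my own assumption for illustration; this is not the OpenArt project’s actual data model.

```python
# Well-known vocabulary terms used as linking properties.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_BASED_NEAR = "http://xmlns.com/foaf/0.1/based_near"

# Hypothetical local identifier and placeholder external identifiers.
LOCAL_PERSON = "http://example.org/person/1"
VIAF_PERSON = "http://viaf.org/viaf/PLACEHOLDER"
GEONAMES_PLACE = "http://sws.geonames.org/PLACEHOLDER/"

# Each statement is a (subject, predicate, object) triple of URIs.
triples = [
    (LOCAL_PERSON, OWL_SAME_AS, VIAF_PERSON),
    (LOCAL_PERSON, FOAF_BASED_NEAR, GEONAMES_PLACE),
]


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(triples))
```

Because the objects are shared identifiers rather than text strings, any other dataset that points at the same VIAF or Geonames URI is, in principle, connected to this record without further matching work.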

I gave a quick overview of the LOCAH and Linking Lives projects, before giving a heads up to the World War One Discovery project. LOCAH has been making archival records from the Archives Hub national service available as Linked Data, and Linking Lives is a continuation project that’s using Linked Data from a variety of sources to create an interface based around the names of people in the Archives Hub. After attempting to crystallise what I see as the key benefits of Linked Data, I finished up by focusing on particular challenges we’ve met on our projects.

Rachel considered how open data might affect policies, procedures and the organisational structure of the library world. She talked about the Digital Public Library of America, a growing initiative started in October 2010. The DPLA vision is to have an “open distributed network of comprehensive online resources that draw on the nation’s living history from libraries, universities, archives and museums to educate, inform, and empower everyone in current and future generations”. After outlining how the DPLA is aiming to achieve this vision, she explained how interested parties can get involved.

There’s an audio recording of the panel on our session page, as well as recordings of all sessions mentioned below on their respective SXSW pages. I’ve also included the slides for our session at the bottom of this post.

Not surprisingly, there were plenty of other great sessions at SXSW. I’ve picked a few highlights that I thought would be of interest to readers of this blog.

Probably of most relevance to Discovery was the lightning-fast ‘Open APIs: What’s Hot and What’s Not’ session from John Musser, founder of Programmableweb.com, who gave us what he sees as the eight hottest API trends. He mentioned that the REST style of software architecture is rapidly growing in popularity, being regarded as easier to use than other API technologies such as SOAP (see image below). JSON is very popular with 60% of APIs now supporting it. It was also noted that one in five APIs don’t support XML.

The rise of REST - 'Hot API Protocols and Styles' from John Musser of Programmableweb.com at SXSW 2012

Musser suggested that APIs need to be supported, with Hackathons and funded prizes being a good way to get people interested. He noted that the hottest trend right now is that VCs are providing significant funding to incentivise people to use their APIs, Twilio being one of the first to do this. He also mentioned that your API documentation needs to be live if you’re to get interest and maintain use. Invisible mashups are also hot, with operating systems such as Apple’s OS cited as being examples of such. Musser suggests the overall meta-trend is that APIs are now ubiquitous.

The many users of laptops amongst us will have been interested to hear about the ‘Future of Wireless Power’.  The session didn’t go into great detail, but the message was very much “it’s not a new technology, and it’ll be here very soon”. Expect wireless power functionality in mobile devices in the next few years, using the Qi standard.

Some very interesting folks from MIT gave the thought-provoking ‘MIT Media Lab: Making Connections’ session. Joi Ito, Director of the MIT Media Lab, explained how it’s all about the importance of connecting people, stating that “we’re now beyond the cognitive limits of individuals, and are in an era where we rely on networks to make progress”. He suggested that traditional roadmaps are outmoded, and that we should throw them away and embrace serendipity if we’re to make real progress in technology. Ito mentioned that MIT has put significant funding into undirected research and an ‘anti-disciplinary’ approach. He said that we now have much agility in hardware as well as software, and that the agile software mentality is being applied to hardware development. He pointed to a number of projects that are embracing these ideas – idcubed, affectiva, sourcemap and formlabs.

Josh Greenberg talked about ‘macroscopy’ in the ‘Data Visualization and the Future of Research’ session, which is essentially about how research is starting to be done at large scale. Josh suggested that ‘big data’ and computation are now very important for doing science, with macroscopy being the application of big data to research. He referred to the ‘Fourth Paradigm’ book which presents the idea that research is now about data intensive discovery. Lee Dirks from Microsoft gave us a look at some new open source tools they’ve been developing for data visualisation, including Layerscape, which allows users to explore and discover data, and Chronozoom, which looked useful for navigating through historical big data. Lee mentioned Chronozoom was good for rich data sources such as archive & museum data, demoing it using resources relating to the Industrial Revolution.

So that was about it for the sessions I was able to get to as part of the SXSW Interactive conference. It was a really amazing event, and I’d highly recommend it to anyone as a great way to meet some of the top people in the technology sector, and of course, hear some great sessions.

The slides from our session:


Warwick workshop prioritises resource discovery

March 29, 2012

In January 2012, JISC and SCONUL convened a workshop for Library Directors and Senior Managers to review the evolving requirements for institutional Library Management Systems (LMS), referenced as Domain 3 in the 2009 SCONUL report to HEFCE.  Entitled ‘The Squeezed Middle’, the workshop focused on the key service developments impacting the LMS footprint, given evolving approaches in Resource Discovery (Domain 2) and shared service developments in the management of subscription resources (Domain 1).

After considering a business modeling framework presented by Lorcan Dempsey and a number of future scenarios set in the year 2020, the workshop reviewed a catalogue of over 60 potential library service and institutional knowledge management objectives. The group evaluated them in terms of desirability, feasibility and their potential to act as drivers of mission critical change.

It was striking that the Discovery agenda represented a very high proportion of the items ranked as high priority looking to 2020. It was also noted that above campus initiatives (such as shared cataloguing and records improvement) and services (such as resource discovery aggregations) can act as catalysts for reviewing workflows (both user and librarian) and reappraising library team skills.

The highest ranked Discovery related targets were as follows:

  • 31 – Provide 1-stop search across all asset types
  • 32 – Publish open linked catalogue metadata
  • 33 – Expose the collection to other search mechanisms
  • 34 – Emphasise exposure of special collections
  • 35 – Integrate LMS & VLE resources, including reading lists
  • 43 – Curate local learning resources, including OERs
  • 44 – Drive the value of reading lists

Medium priority Discovery related targets were:

  • 36 – Provide recommender and associated ‘social’ services
  • 45 – Curate institutional research data
  • 46 – Expose the institutional repository
  • 47 – Expose the university archives

The headline priorities included:

  • Provide 1-stop search across the range of Teaching, Learning and Research asset types that are authored and collected within institutions
  • Integrate reading lists effectively with the discovery of and access to library, VLE and repository resources
  • Establish sustainable curation, workflow management and exposure for all digital scholarly assets – including local learning resources, OERs and research data
  • Although it was not on the original list, delegates added the potential for a persistent personal interface to assets, typically through bookmarking; the metaphor of a personal e-shelf was regarded as attractive.

Other challenges such as re-thinking the user access points for resource discovery or collaboration on adoption of widely used authorities and vocabularies were regarded as less critical, though not unimportant. The abandonment of the traditional LMS OPAC received a low vote on the basis that this will be an outcome of success in these broader ambitions. Whilst enhancing the discoverability of university museum assets received a low average vote, it was highly scored by those institutions with their own museum collection.

So Discovery featured highly for library management both as an end in itself and as a catalyst for changing processes and practice, relationships and responsibilities. However, we must also reflect on whether this professional and user-centred aspiration relates to a destination at which we will one day arrive or perhaps may be better viewed as an essential element in the continuous evolution of the academy.


w/c 12 Mar 2012 – Discovery News Roundup

March 16, 2012

Here’s my round up of news from the world of Discovery and beyond over the past couple of weeks. As with my previous posts, many of the items were gleaned from the #ukdiscovery twitter hashtag which you can dip into whenever you like by opening up this FiveFilters ‘newspaper’ pdf.

First of all, some news from the Discovery initiative – There is an opportunity to attend the free Licensing Clinic that the Discovery project is running on Wednesday 9th May in Birmingham. This practical roundtable event is aimed at managers and decision makers in libraries, archives and museums and there will be the following experts on hand to help guide you through your institution’s particular open metadata licensing challenges: Francis Davey (Barrister), Naomi Korn (Copyright Consultant), Paul Miller (Cloud of Data). Please note that places at this event are strictly limited to 15 delegates so you’re advised to book sooner rather than later and you can do that by signing up via the Eventbrite registration page.

In recent weeks I’ve seen a few articles relating to the need for skills development in the area of ‘data wrangling’/‘data management’:

Those articles left me wondering whether there are specific skills needed for dealing with and managing open metadata which we should be identifying and highlighting? On a related note, I saw a short conversation regarding Linked Data on Twitter that I think a lot of people will relate to and which could be equally applied to any of the areas touched on by the Discovery initiative – To summarise, the main point of the conversation was that [people] have no trouble understanding what terms such as Linked Data mean while they are being explained to them but that knowledge is hard to retain and quickly loses definition when you walk away and/or try to explain it to anyone else.

Resources such as the Open Metadata Handbook are undoubtedly a useful touchstone people can keep returning to when they need a refresher but what else needs to be in place to ensure that knowledge about open metadata is discovered, shared and becomes embedded within staff skillsets?

One of the aims of the Discovery initiative is to raise awareness of open metadata and if you’d like to help us do that then you can either:

Some other links of interest from the wider world of data:

Lastly, I’ve started exploring how I can use Delicious to share other items of interest that I pick up during my travels across the webosphere – To that end I’ve started using Packrati.us to auto-bookmark my Twitter favourites and shared hyperlinks in Delicious and have also created a #UKDiscovery ‘stack’ where I’ve started sharing any of my bookmarks that seem particularly pertinent to the Discovery initiative.


w/c 27 Feb 2012 – Discovery News Roundup

March 4, 2012

Here’s my round up of news from the world of Discovery and beyond over the past few weeks. As with previous posts, many of the items were gleaned from the #ukdiscovery twitter hashtag which you can dip into whenever you like by opening up this FiveFilters ‘newspaper’ pdf [update: URL fixed].

Last week the Discovery team published Issue 6 of the Discovery Newsletter which included the following articles among others:

  • an article on how the Copac Collections Management Tool project is aiming to help collections managers.
  • an introduction to ‘Will’s World’ – one of the JISC-funded large-scale exemplar projects.
  • an invitation for supply chain organisations such as system vendors and publishers to engage with the Discovery initiative.

If you’d like to receive future newsletters by email you simply need to drop us a line at rdtf-discovery@sero.co.uk and you’ll be added to the distribution list.

It was interesting to read Harvard’s announcement of the changes they will be undergoing in order to unify their 73 (!) libraries. Much of the announcement concentrated on structural changes but this sentence caught my eye and it seems to suggest that some game-changing LIS developments could be in the offing: “The changes will position the Library to lead in scholarly communication and open access, to design next generation search and discovery services, and to accelerate digitization and digital preservation.”

Of course Harvard’s Library Lab team are already involved in designing next generation search and discovery services as part of the Digital Public Library of America (DPLA) Beta Sprint initiative – the scale of the data they’re dealing with is pretty impressive but it was the live demo of their “pre-alpha” ShelfLife/LibraryCloud system that took my breath away and got me thinking about new possibilities for discovery interfaces.

When I first read this short blogpost from the Louie B. Nunn Center for Oral History, University of Kentucky I initially dismissed it as not quite newsworthy enough to include in this digest … but I kept thinking about the story after I had clicked away from it.  It seems to me that the ‘Oral History Metadata Synchronizer’ (OHMS) tool that they’ve developed with their digital library division has huge potential for improving the visibility of audio collections and connecting them to other relevant resources. The story of how the Nunn Center have used OHMS to preserve and share interviews with survivors of the Haiti earthquake is a moving reminder that metadata is (at the risk of getting poetic and misty eyed) more than sterile information, and the discovery it enables is human as much as it is digital.

Staying on the subject of audio collections, the Music Library Association is working on a final version of their Music Discovery Requirements document and is currently inviting thoughts and suggestions. This presentation by Nara Newcomer provides useful background on the aim of the Music Discovery Requirements document.

The Discovery programme is particularly focused on the business case for adopting open metadata so it was interesting to read this white paper from Nielsen which reports on the effect of supplying (or not supplying) metadata within the book industry. One of the key conclusions reads: “Overall we see clear indications that supplying a set of full enhanced metadata for product records helps to maximise sales, and that this relationship between enhanced metadata and sales is even stronger for the online retail sector.” Of course UK university libraries are not in the business of book retail and this report could simply serve to make publishers more commercially protective over the metadata they create but all the same it is good to have some high profile research published in this area. It’s a pity that they don’t separate out enhanced metadata from the provision of cover images in their analysis – from research I’ve been involved in previously I suspect there might be some interesting findings that remain hidden by the approach they’ve taken.

Europeana have published data for 2.4 million items under an open metadata licence as part of their Linked Open Data pilot. The data is provided by eight national libraries and a number of cultural heritage organisations (including some from the UK) and there’s also a convincing animation on the ‘what and why’ of linked data which, pleasingly, keeps the end user at the forefront of the discussion. Europeana also launched the ‘European Library Standards Handbook’ which is their guide for libraries who are providing content to data aggregators – it includes a legal overview as well as a technical guide. If you are interested in linked open data then you might want to follow the University of Bristol’s ‘Bricolage’ project which is JISC-funded and will be publishing catalogue metadata from their Penguin Archive and Geology Museum collections.

Earlier this week I found myself having one of those ‘am I the only person not at this event?’ moments as my Twitterstream gradually filled up with all manner of interesting and diverting tweets from the OCLC EMEA Regional Council Annual Meeting.  Owen Stephens captured some of the knowledge that was shared around the topic of APIs in his blogposts written on the day. One of the sessions that seemed to be particularly well received was Alison Cullingford’s presentation on recent survey findings from the RLUK Unique and Distinct Collections project so it will be interesting to read the report when it is published. The meeting also brought news that an open data commons licence is being considered for WorldCat:

WorldCat: open data commons licence is being considered and will be discussed with OCLC membership through Global Council #EMEARC

— Simon Bains (@simonjbains) February 29, 2012

I will not pretend to be an expert but these guides that the Archives Hub have added to their website look very useful for anyone who is interested in accessing Archives Hub data using SRU and OAI-PMH interfaces.
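
For the curious, here is a rough idea of what requests to SRU and OAI-PMH interfaces look like. The parameter names (`operation`, `version`, `query` and `maximumRecords` for SRU; `verb` and `metadataPrefix` for OAI-PMH) come from the published protocols, but the base URLs below are placeholders rather than the Archives Hub’s real endpoints, and the SRU version a given service supports is an assumption, so do check the guides themselves before trying this against a live service.

```python
from urllib.parse import urlencode

# Hypothetical endpoints; substitute the base URLs given in the service's own guides.
SRU_BASE = "http://example.org/sru"
OAI_BASE = "http://example.org/oai"


def sru_search_url(cql_query, maximum_records=10):
    """Build an SRU searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",  # assumed; services may support other versions
        "query": cql_query,
        "maximumRecords": maximum_records,
    }
    return f"{SRU_BASE}?{urlencode(params)}"


def oai_list_records_url(metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL (oai_dc is simple Dublin Core)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{OAI_BASE}?{urlencode(params)}"


print(sru_search_url("suffragette", maximum_records=5))
print(oai_list_records_url())
```

Roughly speaking, SRU is the one to reach for when you want to search and get back matching records, whereas OAI-PMH is designed for harvesting records in bulk; both respond with XML.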

I’ll finish up by sharing some interesting news in the wider world of open data and metadata:

  • The JISC Managing Research Data Programme is doing some heavy lifting in terms of building a registry of metadata standards  (for UK university research datasets) – I’m sure they would be pleased to hear from you if you have any insights you’d like to share with them.
  • The Government’s call for input to their consultation on “open standards for software interoperability, data and document formats” is ongoing and it doesn’t close until 3 May so there’s plenty of time left to think about what the direct and indirect supply chain ripples might be.
  • In my last news digest I mentioned that ‘big data’ suddenly seemed to be everywhere – This week Nick Edouard’s reflective post over on the BuzzData blog struck a chord with me, particularly his point that “Open-data initiatives are good for many reasons, not least because they can radically improve internal data-sharing.” Often the discussion around open data tends towards a leap of faith/altruistic model but keeping focused on the ‘what’s in it for us?’ question seems a surer way of securing the internal resources needed to release data in the first place.

In closing, a couple of blogposts I’ve read recently have got me thinking about the importance of identifying a vision that other people can quickly understand and get behind:

I think that the Discovery vision packs a similar punch but perhaps it could be more emotive?: “[Our vision] is about making resources more discoverable both by people and machines.” Is that a vision which speaks to you? Have you found the words to succinctly describe your institution’s vision for resource discovery? Please do share your thoughts in the comments below.


w/c 6 Feb 2012 – Discovery News Round-up

February 9, 2012

Here’s my round up of news from the world of Discovery and beyond over the past few weeks. Many of the items were gleaned from the #ukdiscovery twitter hashtag which you can dip into whenever you like by opening up this FiveFilters ‘newspaper’ pdf that I generated.

Last week Joy Palmer shared plans for the next phase of guidance materials and workshops here on the Discovery blog and is looking for your feedback on the outlined approach so please do wade in and let us know what you think. And bonus points for anyone who can suggest a better title for the event than ‘“Un”developer hands-on development event’. The best I can come up with is ‘Can’t Code, Won’t Code’ so the field is wide open.

The National Information Standards Organization (NISO) are currently inviting public comment on the working group recommendations that have come out of the joint NISO and NFAIS (the National Federation of Advanced Information Services) project to develop Recommended Practice on Online Supplemental Journal Article Materials. The main aim of the project is to improve the ‘discoverability and findability’ of journal supplemental materials for librarians and would-be readers by establishing and maintaining links to the related article. The comment period runs until 29th February and, although the recommendations are aimed mainly at publishers, they are also interested in feedback from the wider scholarly community. [via @simonhodson99]

One of the key NISO/NFAIS recommendations is around consistency and, interestingly, this was also one of the key discussion points raised during recent focus groups run by the JISC/AHRC-funded Open Access e-Books research project (OAPEN-UK). So far the project have heard from humanities and social sciences (HSS) monograph publishers, authors/readers and institutional representatives and next week they are running focus groups for research funders, e-book aggregators and learned societies. Incidentally, if you are interested in taking part in one of those focus groups then further details can be found on their Events page. [via @publishersrcly]

A couple of weeks ago it seemed to be ‘Big Data’ week on my twitter stream – all and sundry were tweeting about it and it wasn’t just the data geeks any more. It certainly seemed to suggest, as reported in this Museum Geek post, that “the era of Big Data has begun” but it struck me that the conversation around big data seems to be moving on from mostly logistical or functional discussions about gathering, storing, sharing and making use of data to a realisation that generating and circulating more data doesn’t solve anything on its own (see GigaOm’s article which likens it to virtual landfill via @paulmiller). In the world of building websites there’s a saying that ‘content is king’ but in the world of data it would appear that ‘content + context = king and queen’. Which had me pondering whether the Discovery initiative could usefully consider establishing Open Paradata Guidelines to sit alongside our Open Metadata Principles. And coming from a humanities background myself I found Michael Kramer’s assertion that “data is always already meta-data” an interesting point to mull over.

The Data Catalogs website, which was launched last summer, aims to be “the most comprehensive list of open data catalogs in the world”. I’m sure it’s relatively early days yet but there are already 212 catalogues listed and the list of experts involved in the website is impressive. It looks like it will grow into a useful centralised resource, particularly if a more advanced search is added, but I noticed that not all of the entries state what their metadata license is – it seems to me that there’s an opportunity to improve consistency and clarity by making that a mandatory field. What did impress/surprise me though is that any visitor to the website can improve a record simply by clicking on the ‘Please help improve this page by adding more information’ link at the bottom of the record and editing the fields that appear [via @rufuspollock]. If you are interested in the issues around licensing open data then Naomi Korn and Professor Charles Oppenheim’s practical guide is worth a read.

And finally, a few items of interest from the wider world of Discovery:

  • This article about book mashups on the Programmable Web ‘API News’ blog got me thinking about countless possibilities for making library and museum and gallery collections more visible and connected in new ways. Then this morning someone tweeted about the strangely hypnotic Flight Radar website and I wondered if one day I might find myself gazing at a map that shows books flying overhead as they wend their way from place to place as inter-library loans.
  • March is looking set to be Culture Hack Month, with events taking place on both sides of the Pennines. Hack for Culture takes place on the 3rd and 4th March in Liverpool and is bringing interested parties together “to explore the possibilities offered by joint experimentation with a wide variety of hidden cultural data sets”. The 24-hour CultureCode Hack takes place towards the end of March in Newcastle and will give cultural and arts organisations with open data the opportunity to work with developers and designers to create something new. You can take a peek at the hacks that were developed at the Culture Hack North event in Leeds last year to get an idea of what can be produced in such a short amount of time.

New Discovery open metadata projects

February 3, 2012

Five new Discovery projects started this week. They are all focused on the creation and release of open metadata from libraries, museums and archives in line with the Discovery open metadata and technical principles.

The projects are:

  • Bricolage - will publish catalogue metadata as Linked Open Data for two of its most significant collections: the Penguin Archive, a comprehensive collection of the publisher’s papers and books; and the Geology Museum, a 100,000 specimen collection housing many unique and irreplaceable resources. University of Bristol
  • Open Education Metadata UK - will publish metadata sourced from four significant UK education collections as Open Data in a variety of formats, for anyone to reuse as linked data in their own applications. In addition, subsets of two collections which have high latent potential for linked data will be catalogued. Institute of Education
  • Open Book - will release open metadata for the Fitzwilliam’s Designated Collection (over 150,000 records) and linked open data for the internationally important collection of illuminated manuscripts in the Fitzwilliam Museum (approximately 500 manuscript records). The Fitzwilliam Museum, University of Cambridge
  • Music Collections at Cardiff University: Advancing the Resource – focuses on a collection of manuscript and printed music from the eighteenth and nineteenth centuries, a resource of nearly 3000 items largely unknown to the wider scholarly community. This project will catalogue the material online, and make the data available through the Archives Hub and COPAC, as well as RISM (UK) (Répertoire International des Sources Musicales). Cardiff University
  • Trenches to Triples - will provide Linked Data markup to 200 collection level descriptions and 6,000 item level catalogue entries relating to the First World War from the Liddell Hart Centre for Military Archives and will also provide a demonstrator for using Linked Data to make appropriate connections between image databases, Serving Soldier, and detailed catalogues. King’s College London

The projects are just getting started but will all have blogs which will record their progress. Look out for further information on the projects via the Discovery website, where all of the learning and outputs from these projects will be summarised to ensure that others can benefit from what the projects learn and produce.

I have written an overview of all the current Discovery work on the JISC website.



The Digital Public Library of America. Highlights from Robert Darnton’s recent talk

January 24, 2012

I was fortunate to be among those attending Robert Darnton’s talk on the Digital Public Library of America initiative last week. Harvard Professor and Director of Harvard Library, Darnton is a pivotal figure behind DPLA and his talk – most concurred – was both provocative and inspirational. More than a description of the DPLA initiative, Darnton framed his talk with key issues and questions for us to reflect upon. How can we provide a context where more knowledge is, as far as possible, freely available to all? How can we leverage the internet to change the emerging patterns of locked down and monopolised chains of supply and demand? And as Professor David Baker highlighted in his introduction of Darnton, there is much alignment here with the broader and more aspirational ethos of Discovery: a striving to support new marketplaces, new patterns of demand, new business models – all in the ideal pursuit of the Public Good. Arguably naïve aspirations, but certainly the tenor in the room was one of consensus, a collective pleasure at being both challenged and inspired. Like Discovery, the DPLA is a vision, a movement, tackling these grand challenges, but also striving to make practical inroads along the way.

The remainder of this post attempts to capture Darnton’s key points, and also highlight some of the interesting themes emerging in the Q&A session that followed.

————-

“He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me.” Thomas Jefferson

 

To frame his talk, Darnton invoked this oft-cited tenet of Thomas Jefferson – that the spread of knowledge benefits all. He aptly applied this idea to the internet, and specifically to the principles of Open Access for the Public Good and the assumption that one citizen’s benefit does not diminish another’s. But of course, he cautioned, this does not mean information is free, and we face a challenging time where, even as more knowledge is being produced, an ever smaller proportion of it is being made openly available to the public. To illustrate this, he pointed to how the cost of academic journals has risen at four times the rate of inflation, and these costs are expected to keep rising even as universities and libraries face increasing cutbacks. We need to ask, how can that increase in price be sustained? Health care may be a Public Good, but information about health is monopolised by those who will push it as far as the market will bear.

Darnton acknowledged that publishers will reply by deprecating the naiveté of the Jeffersonian framing of the issue. And, he conceded, journal suppliers clearly add value; it’s fair they should benefit – but how much? Publishers often invoke the concept of a ‘marketplace of ideas’. But in a free marketplace, the best will survive. For Darnton, we are not currently operating in a free marketplace, as demand is simply not flexible – publishers create niche journals, territorialise, and then crush the competition.

The questions remain, then: how can we provide a context where more knowledge is, as far as possible, freely available to all? How can we leverage the internet to change these locked down and monopolised chains of supply and demand? The remainder of Darnton’s talk outlined the approaches being taken by the DPLA initiative. It’s early days, he acknowledged, but significant inroads are already being made.

So what is DPLA? A brief overview

Darnton addressed (relatively briefly) the scope and content of DPLA, the costs, the legal issues being tackled, technical approaches, and governance.

Scope and content: Like Discovery, the DPLA is not to be One Big Database – instead, the approach is to establish a distributed system aggregating collections from many institutions. Their vision is to provide one-click access to many different resource types, with the initial focus on producing a resource that gives full text access to books in the public domain, e.g. from HathiTrust, the Internet Archive, and U.S. and international research libraries. Darnton also carefully highlighted that the DPLA vision is being openly and deliberately defined in a manner that makes the service distinct from those offered by public libraries, for instance by excluding everything from the last 5-10 years (with a wall that moves annually as more content enters the public domain).

The key tactic to maximise impact and reduce costs will be to aggregate collections that already exist, and so when it opens, it will likely only contain a stock of public domain items, and will grow as fast as funding allows. To achieve this, it will be designed to be as interoperable as possible with other digital libraries (for example, an agreement has already been made with Europeana). So far funding has been dedicated to building this technical architecture, but there is also a strong focus on ongoing digitisation and on collaboratively funding such initiatives.

In terms of legal issues, Darnton anticipates that DPLA will be butting heads against copyright legislation – he clearly has strong personal views in this area (e.g. referring to the Google Books project as a ‘great idea gone wrong’ with Google’s failure to pursue making the content available under Fair Use) but he was careful to distinguish these views from any DPLA policy in this regard. As DPLA will be not-for-profit, he suggested that they might stand a good chance of invoking the Fair Use defence in the case of orphan works, for example, although he acknowledged this is difficult and new territory. Other open models referenced included a Scandinavian-style licence for public digital access to all books. He also stated that he sees the potential for private partnerships in developing value-added monetised services such as apps – while keeping the basic open access principles of the DPLA untouched.

The technical work of DPLA is still very much in progress, with a launch date of April 2013 for a technical prototype along with 6 winning ideas from a beta sprint competition. More information will be released soon.

In terms of governance, a committee has been convened and has only just started to study options for running DPLA.

Some questions from UK stakeholders

The Q&A session was kicked off by Martin Hall, VC of Salford University, who commented that in many ways there is much to be hopeful for in the UK in terms of the Open agenda. Open Access is going strong in the UK, with 120 open access repositories and, he stated, a government that seems to ‘get it’, largely because of a fascination with forces in the open market. As a result there is a clause in new policy about making public datasets ‘openly’ available. This is quite an extraordinary statement, Hall commented, given the implications for public health, etc., and it possibly indicates a step change. It all perhaps contributes to the quiet revolution occurring around Open Access.

Darnton responded by highlighting that in the USA they may have open access repositories, but that there is a low compliance rate in terms of depositing (and of course this is an issue in the UK too). But Harvard has recently mandated deposit; while compliance was below 4% before, it is now over 50%, and the repository “is bulging with new knowledge.”

In addition, Darnton reminded the group, while the government might be behind ‘Open,’ we still face opposition from the private sector. A lot of vested interests feel threatened by open access, and there is always a danger of vested interest groups capturing the attention of the government. It is good, he said, to see hard business reasons being argued as well as cultural ones, but we need to be very careful.

Building on this issue, Eric Thomas, Vice Chancellor of Bristol University, raised the issue of demonstrating public value – how do we achieve this? He noted that the focus of Darnton’s talk was on the supply side, but what about demand? To what extent are DPLA looking at ways to demonstrate public value, i.e. ‘this is what is happening now that couldn’t happen before…’?

In his response, Darnton referred to a number of grassroots approaches that are addressing this ‘demand’ side of the equation, including a roving Winnebago ‘roadshow’ to get communities participating in curating and digitising local artefacts. In short, DPLA is not about a website, but an organic, living entity… This approach, he later commented, was about encouraging participation from the top down and the bottom up.

Alistair Dunning from JISC posed the question of what will ‘stop people from going to Google?’ Darnton was keen to point out that while he critiqued Google’s approach to the million books copyright situation, DPLA was in no way about ‘competing’ with Google. People must and will use Google, and DPLA will open their metadata and indexes to ensure they are discoverable by search engines. DPLA would highly value a collaborative partnership with Google.

Peter Burnhill from EDINA raised the critical question of licensing. Making data ‘open’ through APIs can allow people to do ‘unimaginable things’; what will the licensing provision for DPLA be? CC-0? Darnton acknowledged that this was still a matter of debate in terms of policy decisions – and especially around content. He agreed that there were unthought-of possibilities in terms of apps using DPLA, and they want to add value by taking this approach (and presumably consider sustainability options moving forward). In short, the content would be open access, and metadata likely openly licensed, but in terms of reuse of the content itself, this *could* be commercialised in order to sustain the DPLA.

In a later comment, Caroline Brazier from the British Library expressed admiration for the vision and the energy and the drive. She explained that from the BL perspective ‘we’re there for anybody who wants to do research’. She highlighted how the British Library, and the community more broadly, has a huge amount to do to push on with advocacy, particularly around copyright issues. This forces institutions of all sizes to rethink their roles in this environment – there are no barriers here, she suggested: we can do things differently. We need to think individually about what we do uniquely. What do we do? What do we invest in? What do we stop doing? Funding will be precious, and we really need to maximise the possibility of getting funding.

Darnton agreed, and stated that there is a role for any library that has something unique to make it available (and of course, the British Library is the pinnacle of this). The U.S. has many independent research libraries (the Huntington, Newberry, etc.) and DPLA very much want to make room for them; they want to reach out to these research libraries, which may be open-minded but are behind closed doors in terms of the broader public.

The final question (and perhaps one of the most thought-provoking) came from Graham Taylor from the Publishers Association. He stated that he concurred with much of what Darnton had to say (perhaps surprising, he suggested, given his role) but he did comment that throughout the afternoon he had “not heard anything good about publishers.” So, he asked, where do publishers fit? In many regards, publishers are the risk-takers, the ones who work to protect intellectual property, and get all works out there – including those that pose ‘risk’ because they are not guaranteed blockbusters.

Darnton strongly agreed that publishers do add value, but, he explained, what he’s attacking are commercial practices so excessive and monopolistic that they are damaging the world of knowledge. He was struck by Taylor’s comment on risk-taking, though, for indeed publishing is a very risky business. But sometimes the way risk is dealt with is unfortunate, with that emphasis on the blockbuster as opposed to a quality, sound backlist. So what can be done about this risk-taking and sharing the burden? Later this year, he said, Harvard would be hosting a conference that explores business opportunities in open access publishing. If publishers are gatekeepers of quality, how can open access be used to the benefit of publishing, and so alleviate that risk-taking and raise quality?

