Radically Open Cultural Heritage Data at SXSW Interactive 2012

April 11, 2012

Image

Posted by Adrian Stevenson

I had the privilege of attending the annual South by South-west Interactive, Film and Music conference (SXSW) a few weeks ago in Austin, Texas.    I was there as part of the ‘Radically Open Cultural Heritage Data on the Web’ Interactive panel session, along with Jon Voss from Historypin, Julie Allinson from the University of York digital library, and Rachel Frick from the Council on Library and Information Resources (CLIR). We were delighted to see that Mashable.com voted it as one of ’22 SXSW Panels You Can’t Up This Year’.

All of our panelists covered themes and issues addressed by the Discovery initiative, including the importance of open licenses, and the need for machine readable data via APIs to facilitate the easy transfer, aggregation and link-up of library, archives and museum content.

Jon gave some background on the ‘Linked Open Data in Libraries, Archives and Museums’ (LOD-LAM) efforts around the world, talking about how the first International LODLAM Summit held in San Francisco last year helped galvanise the LODLAM community. Jon also covered some recent work Historypin are doing to allow users to dig into archival records.

Julie then covered some of the technical aspects of publishing Linked Data through the lens of the OpenArt Discovery project, which recently released the ‘London Art World 1660-1735’ data. She mentioned some of the benefits of the Linked Data approach, and explained how they’ve been linking to VIAF for names and Geonames for location.

I gave a quick overview of the LOCAH and Linking Lives projects, before giving a heads up to the World War One Discovery project. LOCAH has been making archival records from the Archives Hub national service available as Linked Data, and Linking Lives is a continuation project that’s using Linked Data from a variety of sources to create an interface based around the names of people in the Archives Hub. After attempting to crystallise what I see are the key benefits of Linked Data, I finished up by focusing on particular challenges we’ve met on our projects.

Rachel considered how open data might affect policies, procedures and the organisational structure of the library world.  She talked about the Digital Public Library of America, a growing initiative started in Oct 2010. The DPLA vision is to have an “open distributed network of comprehensive online resources that draw on the nations living history from libraries, universities, archives and museums to educate, inform, and empower everyone in current and future generations”. After outlining how the DPLA is aiming to achieve this vision, she explained how interested parties can get involved.

There’s an audio recording of the panel on our session page, as well as recordings of all sessions mentioned below on their respective SXSW pages. I’ve also included the slides for our session at the bottom of this post.

Not surprisingly, there were plenty of other great sessions at SXSW. I’ve picked a few highlights that I thought would be of interest to readers of this blog.

Probably of most relevance to Discovery was the lightening fast ‘Open APIs: What’s Hot and What’s Not’ session from John Musser, founder of Programmableweb.com, who gave us what he sees as the eight hottest API trends. He mentioned that the REST style of software architecture is rapidly growing in popularity, being regarded as easier to use than other API technologies such as SOAP (see image below). JSON is very popular with 60% of APIs now supporting it. It was also noted that one in five APIs don’t support XML.

Hot API Protocols and Styles from John Musser of Programmableweb.com

The rise of REST – ‘Hot API Protocols and Styles’ from John Musser of Programmableweb.com at SXSW 2012

Musser suggested that APIs need to be supported, with Hackathons and funded prizes being a good way to get people interested. He noted that the hottest trend right now is that VCs are providing significant funding to incentivise people to use their APIs, Twilio being one of the first to do this. He also mentioned that your API documentation needs to be live if you’re to get interest and maintain use. Invisible mashups are also hot, with operating systems such as Apple’s OS cited as being examples of such. Musser suggests the overall meta-trend is that APIs are now ubiquitous. John’s now made his slides available on slideshare.

The many users of laptops amongst us will have been interested to hear about the ‘Future of Wireless Power’.  The session didn’t go into great detail, but the message was very much “it’s not a new technology, and it’ll be here very soon”. Expect wireless power functionality in mobile devices in the next few years, using the Qi standard.

Some very interesting folks from MIT gave the thought provoking ‘MIT Media Lab: Making Connections’ session. Joi Ito, Director of MIT Media Labs explained how it’s all about the importance of connecting people, stating that “we’re now beyond the cognitive limits of individuals, and are in an era where we rely on networks to make progress”. He suggested that traditional roadmaps are outmoded, and that we should throw them away and embrace serendipity if we’re to make real progress in technology. Ito mentioned that MIT has put significant funding into undirected research and an ‘anti-disciplinary’ approach. He said that we now have much agility in hardware as well as software, and that the agile software mentality is being applied to hardware development. He pointed to a number of projects that are embracing these ideas – idcubed, affectiva, sourcemap and formlabs.

Josh Greenberg talked about ‘macroscopy’ in the ‘Data Visualization and the Future of Research’ session, which is essentially about how research is starting to be done at large scale. Josh suggested that ‘big data’ and computation are now very important for doing science, with macroscopy being the implementation of big data to research. He referred to the ‘Fourth Paradigm’ book which presents the idea that research is now about data intensive discovery. Lee Dirks from Microsoft gave us a look at some new open source tools they’ve been developing for data visualisation, including Layerscape, which allows users to explore and discover data, and Chronozoom, which looked useful for navigating through historical big data.  Lee mentioned Chronozoom was good for rich data sources such as archive & museum data, demoing it using resources relating to the Industrial Revolution.

So that was about it for the sessions I was able to get to as part of the SXSW Interactive conference. It was a really amazing event, and I’d highly recommend it to anyone as a great way to meet some of the top people in the technology sector, and of course, hear some great sessions.

The slides from our session:


The Digital Public Library of America. Highlights from Robert Darnton’s recent talk

January 24, 2012

I was fortunate to be among those attending Robert Darnton’s talk on the Digital Public Library of America initiative last week. Harvard Professor and Director of Harvard Library, Darnton is a pivotal figure behind DPLA and his talk – most concurred – was both provocative and inspirational. More than a description of the DPLA initiative, Darnton framed his talk with key issues and questions for us to reflect upon. How can we provide a context where more knowledge is as much as possible freely available to all? Where we can leverage the internet to change the emerging patterns of locked down and monopolised chains of supply and demand?  And as Professor David Baker highlighted in his introduction of Darnton, there is much alignment here with the broader and more aspirational ethos of Discovery: a striving to support new marketplaces, new patterns of demand, new business models – all in the ideal pursuit of the Public Good. Arguably naïve aspirations, but certainly the tenor in the room was one of consensus, a collective pleasure at being both challenged and inspired. Like Discovery, the DPLA is a vision, a movement, tackling these grand challenges, but also striving to make practical inroads along the way.

The remainder of this post attempts to capture Darnton’s key points, and also highlight some of the interesting themes emerging in the Q&A session that followed.

————-

 “He who receives ideas from me, receives instruction himself without lessening mine; as he who lights his taper at mine receives light without darkening me” Thomas Jefferson

 

To frame his talk, Darnton invoked this oft-cited tenet of Thomas Jefferson – that the spread of knowledge benefits all. He aptly applied this concept to the concept of the internet and specifically the principles of Open Access for the Public Good, and the assumption that one citizen’s benefit does not diminish another. But of course, he cautioned, this does not mean information is free and we face a challenging time where, even as more knowledge is being produced, an increasingly smaller proportion of it is being made available to the public openly. To illustrate this, he pointed to how academic journals have increased in costs at four times the cost of inflation, and we are anticipating that these rates will continue to rise, even as Universities and libraries face increasing cutbacks. We need to ask, how can that increase in price be sustained? Health care may be a Public Good, but information about health is monopolised by those who will push it as far as the market will bear.

Darnton acknowledged that publishers will reply by deprecating the naiveté of the Jeffersonian framing of the issue. And, he conceded, journal suppliers clearly add value; it’s fair they should benefit – but how much? Publishers often invoke the concept of ‘marketplace of ideas’ But in a free marketplace, the best will survive. For Darnton, we are not currently operating in a free marketplace, as demand is simply not flexible  – publishers create niche journals, territorialise, and then crush the competition.

The questions remain, then, how can we provide a context where more knowledge is as much as possible freely available to all? Where we can leverage the internet to change these locked down and monopolised chains of supply and demand?  The remainder of Darnton’s talk outlined the approaches being taken by the DPLA initiative. It’s early days, he acknowledged, but significant inroads are already being made.

So what is DPLA? A brief overview

Darnton addressed (in relative brief) the scope and content of DPLA, the costs, the legal issues being tackled, technical approaches, and governance.

Scope and content: Like Discovery, the DPLA is not to be One Big Database – instead, the approach is to establish a distributed system aggregating collections from many institutions. Their vision is to provide one click access to many different resource types, with the initial focus on producing a resource that gives full text access to books in public domain, e.g. from  Hathi Trust, the Internet Archive, and U.S and international research libraries. Also carefully highlighted that the DPLA vision is being openly and deliberately defined in a manner that makes the service distinct from those services offered by public libraries, for instance excluding everything from the last 5-10 years (with a moving wall annually as more content come available as Public Domain).

The key tactic to maximise impact and reduce costs will be to aggregate collections that already exist, and so when it opens, it will likely only contain a stock of public domain items, and will grow as fast as funding commits. To achieve this, it will be designed in a way that as much as possible makes it interoperable with other Digital Libraries (for example, an agreement has already been made with Europeana). So far funding has been dedicated to building this technical architecture, but there is also a strong concentration on ongoing digitisation and collaboratively funding such initiatives.

In terms of legal issues Darnton anticipates that DPLA will be butting heads against copyright legislation – he clearly has strong personal views in this area (e.g. referring to the Google Books project as a ‘great idea gone wrong’ with Google’s failure to pursue making the content available under Fair Use)  but he was careful to distinguish these views from any DPLA policy in this regard.  But as DPLA will be not-for-profit, he suggested that they might stand a good chance to invoke the Fair Use defence in the case of orphan works, for example. But he also acknowledged this is difficult and new territory. Other open models referenced included the case of a Scandinavian style licence for public digital access to all books. He also stated that he sees the potential for private partnerships in developing value-added monetised services such as apps – while keeping the basic open access principles of the DPLA untouched.

The technical work of DPLA is still very much in progress, with a launch date of April 2013 for a technical prototype along with 6 winning ideas from a beta sprint competition. More information will be released soon.

In terms of governance, a committee has been convened and has only just started to study options for running DPLA.

Some questions from UK stakeholders

The Q&A session was kicked off by Martin Hall, VC of Salford University, who commented that in many ways there is much to be hopeful for in the UK in terms of the Open agenda. Open Access is going strongly in the UK with 120 open access repositories; and, he stated, a government that seems to ‘get it’ largely because of a fascination with forces in the open market. As a result there is a clause in new policy about making available ‘openly’ public datasets.  This is quite an extraordinary statement, Hall commented, given the implications for public health, etc. and this is possibly indicating a step change. But it all perhaps contributes to the quiet revolution occurring around Open Access.

Darnton responded by highlighting that in the USA they may have open access repositories, but that there is a low compliance rate in terms of depositing (and of course this is an issue in the UK too). But Harvard has recently mandated the deposit; and while there was less than 4% before, there is now over 50% compliance, and the repository “is bulging with new knowledge.”

In addition, Darnton reminded the group, while the government might be behind ‘Open,’ we still face opposition from the private sector. A lot of vested interests feel threatened by open access; and there is always a danger of vested interest groups capturing attention of the government.  But, he said, it’s good to see hard business reasons are being argued as well as cultural ones, but we need to be very careful.

Building on this issue, Eric Thomas, Vice Chancellor of Bristol University raised the issue of demonstrating the public value – how do we achieve this? He noted that the focus of Darnton’s talk was on supply side, but what about demand? To what extend are DPLA looking at ways to demonstrate public value, i.e. ‘this is what is happening now that couldn’t happen before…’?

In his response, Darnton referred to a number of grassroots approaches that are addressing this ‘demand’ side of the equation, including a roving Winnebago ‘roadshow’ to get communities participating in curating and digitising local artefacts. In short, DPLA is not about a website, but an organic, living entity… This approach, he later commented was about encouraging participation from the top down and bottom up.

Alistair Dunning from JISC posed the question of what will ‘stop people from going to Google?; Darnton was keen to point out that while he critiqued Google’s approach to the million books copyright situation, DPLA was in no way about ‘competing’ with Google.  People must and will use Google, and DPLA will open their metadata and indexes to ensure they are discoverable by search engines. DPLA would highly value a collaborative partnership with Google.

Peter Burnhill from EDINA raised the critical question of licensing. Making data ‘open’ through APIs can allow people to do ‘unimaginable things’; what will the licensing provision for DPLA be? CC-0?  Darnton acknowledged that this was still a matter of debate in terms of policy decisions – and especially around content. He agreed that there were unthought of possibilities in terms of Apps using DPLA, and they want to add value by taking this approach (and presumably consider sustainability options moving forward).  In short, the content would be open access, and metadata likely openly licensed, but in terms of reuse of the content itself, this *could* be commercialised in order to sustain the DPLA.

In a later comment, Caroline Brazier from the British Library expressed admiration for the vision and the energy and the drive. She explained that from the BL perspective ‘we’re there for anybody who wants to do research’; She highlighted how the British Library and the community more broadly has a huge amount to do to push on with advocacy, particularly around copyrighting issues.  This, forces all institutions of all sizes to rethink their roles in this environment – there are no barriers here, she suggested: we can do things differently. We need to think individually about what we do uniquely. What do we do? What do we invest in? What do we stop doing? Funding will be precious, and we really need to maximise the possibility to get funding.

Darnton agreed, and stated that there is a role for any library that has something unique to make it available (and of course, the British Library is the pinnacle of this). The U.S. has many independent research libraries (the Huntington, Newberry, etc) and they very much want to make room for them in the DPLA; they want to reach out to these research libraries who may be open minded but are behind closed doors in terms of broader public.

The final (and perhaps one of the most thought-provoking questions) came from Graham Taylor from the Publishers Association. He stated that he concurred with much of what Darnton had to say (perhaps surprising, he suggested, given his role) but he did comment that throughout the afternoon he had “not heard anything good about publishers.” So, he asked, where do publishers fit? In many regards, publishers are the risk-takers, the ones who work to protect intellectual property, and get all works out there – including those that pose ‘risk’ because they are not guaranteed blockbusters.

Darnton strongly agreed that publishers do add value, but, he explained, what he’s attacking is excessive, monopolistic commercial practices to such an extent that they are damaging the world of knowledge.  He was struck by Taylor’s comment on risk-taking, though, for indeed publishing is a very risky business. But sometimes the way risk is dealt with is unfortunate, with that emphasis on the blockbuster as opposed to a quality, sound backlist. So what can be done about this risktaking and sharing the burden? Later this year, he said, Harvard would be hosting a conference that explores business opportunities in publishing in open access. If publishers are gatekeepers of quality, how can open access can be used to the benefit of publishing, and so alleviate that risk-taking and raise quality?


Follow

Get every new post delivered to your Inbox.