Hello! I'm Mark Matienzo, the Director of Technology for the Digital Public Library of America. Thank you for giving me the opportunity to speak at BCLC and to travel to Vancouver.
What is DPLA?
To start, I would first like to ask you a few questions. How many of you have heard of the Digital Public Library of America before? How many of you have used DPLA? If you're not familiar with DPLA, that's alright!
The Digital Public Library of America brings together the riches of America’s libraries, archives, and museums, and makes them freely available to the world. It strives to contain the full breadth of human expression, from the written word, to works of art and culture, to records of America’s heritage, to the efforts and data of science. The "full range of human expression" means all kinds of stuff—cultural heritage materials such as images, sound files, moving images, journals, books, data sets, etc. The DPLA aims to expand this crucial realm of openly available materials, and make those riches more easily discovered and more widely usable and used.
DPLA has a unique opportunity to connect people across the United States and its territories with cultural heritage materials. In this presentation, I will be talking about some ways in which DPLA has established local connections, including in terms of our organizational infrastructure, our outreach, and our portal and ways in which we could improve that going forward.
History
DPLA didn’t just appear one day as a small start-up non-profit. Instead, it was the result of a two-year grassroots planning initiative that started in 2010. The DPLA planning process began in October 2010 at a meeting in Cambridge, MA. During this meeting, 40 leaders from libraries, foundations, academia, and technology projects agreed to work together to create "an open, distributed network of comprehensive online resources that would draw on the nation’s living heritage from libraries, universities, archives, and museums in order to educate, inform, and empower everyone in current and future generations." That single sentence banded together hundreds toward a common goal of building a national digital library platform.
In December 2010, the Berkman Center for Internet & Society at Harvard University convened leading experts in libraries, technology, law, and education to begin work on this ambitious project. A two-year process of intense grassroots community organization, beginning in October 2011 and hosted at the Berkman Center under the aegis of the DPLA Secretariat, brought together hundreds of public and research librarians, innovators, digital humanists, and other volunteers. Organized into six workstreams and led by a distinguished Steering Committee, they helped to scope, design, and construct the DPLA. The culmination of all of this hard work was DPLA’s successful launch on April 18, 2013. While the actual in-person celebration event had to be canceled on account of the Boston Marathon bombing, DPLA’s site and services launched right on time. The April launch also marked the transition from the Harvard-based planning phase to what we call DPLA’s operational phase, or the start of DPLA as an independent 501(c)(3) non-profit organization.
The DPLA is ...
a Portal
a Platform
an advocate for the Public Option
We describe DPLA in three ways: First, DPLA is a portal that delivers students, teachers, scholars, and the public to incredible resources, wherever they may be in America. Secondly, DPLA is a platform that enables new and transformative uses of our digitized cultural heritage. Thirdly, DPLA is an advocate for a strong public option in the twenty-first century.
The DPLA portal is a tool for the discovery of content. It currently contains records for 5.8 million digital objects that are open and freely accessible to all users anywhere. When discussing the DPLA as a portal for discovery, we like to emphasize the "one-stop shop" idea. This means that through one portal, you can access many collections with related content and see connections between that content in new ways. A search for John Steinbeck might produce results from many different DPLA partners – taken together they represent a more robust collection of Steinbeck material accessed easily through a simple search. This model means that a record from a small historical society can have the same status in DPLA as a record from a large institution like the National Archives.
When you first visit DPLA's portal, you are given a variety of ways to find cultural materials.
For example, you can perform simple searches, sort the results, and filter them by format, contributing institution or partner, date, language, location, or subject.
In addition to a familiar search paradigm, we provide a few additional interfaces that allow users to find and interact with collections in new ways. The DPLA Timeline uses available time information (year, month, day) to chart records related to a search over time. Using the red slider on the right, a user can capture a particular period of time that will display in the blue section at the bottom. Within this section, they can click on search results for a particular decade or a particular year (indicated by the vertical bars; longer bars have more results). This can make it easier for some users to browse large result sets.
The DPLA Bookshelf is yet another innovative mechanism for users to interact with materials available through our portal. The items on Bookshelf represent digitized books available through the portal, from providers such as the University of California, the University of Illinois, and the New York Public Library. The shelf is shown as a vertical stack so that the titles and authors are more easily readable on their spines. The width of the book represents the actual height of the physical book, and its thickness represents its page count. The spine is colored with one of ten depths of blue to indicate how relevant the work is to the reader's search.
When a reader clicks on one of the books, additional information about it is displayed to its right. The reader can open the book with the click of a button. Further, when a reader clicks on a book, the DPLA Bookshelf displays thumbnails of images within the DPLA collection related to that book’s subject areas. Clicking on a thumbnail displays the image and additional information about it.
In addition, users can explore further by clicking on one of the subjects under which the book has been categorized. This replaces the existing shelf with a shelf containing all the other books in the DPLA collection categorized under that same subject.
We also provide a map-based interface that allows users to identify the places associated with a given item. I'll be talking about the map, how we augment the data we receive to produce this map, and some of the issues we've identified in the process, later in the presentation.
In addition to the interfaces I've just discussed, DPLA also provides curated exhibitions. The exhibitions currently on the site were curated by our partners, and through a pilot project with groups of MLIS students. Exhibition contributors work with DPLA content from multiple institutions around a topic of national significance. Exhibitions offer some opportunity to create juxtapositions between items, use them in narratives, and give them useful context. We plan to add to our group of exhibitions through a Public Library Partnerships Project which offers digital skills training and content aggregation avenues to public libraries. I will be describing that project in more detail later in the presentation. We're particularly interested in the potential exhibitions have for teachers at many levels. They can be a useful way to introduce DPLA to new users.
The DPLA platform is one of the most important parts of our technical infrastructure. It provides us, as well as our technically-inclined users, with the ability to search and retrieve metadata ingested from our partners. In fact, the Platform directly provides this functionality to the DPLA portal. Most importantly, we provide free and open access to the Platform and the metadata available from within it.
How is it free?
How is it free? As part of the contribution process, we require all of our partners to license their metadata under the CC Zero license. This CC license lets creators and owners of copyright-protected content waive all copyright interests in their works and thereby place them as completely as possible in the public domain, so that others may freely build upon, enhance and reuse the works for any purposes without restrictions under copyright. In addition to the open licensing for the metadata, we have also released all of our infrastructure - the platform, the portal, and our exhibition code - under open source licenses.
Platform Data
For those developers who want to work with all of our metadata at once, or with all of the metadata from a specific provider, we provide a bulk download option.
The DPLA API
Access to millions of items, for any purpose
But the DPLA is not just a database or a website, and it should be easy for a software developer to get started working with our metadata. We provide a set of tools that anyone can use to build their own application or interface on top of the DPLA’s aggregated data. This toolset is called an Application Programming Interface (API). APIs let computer programs talk to other computer programs, enabling application components to fit together like Lego blocks. Right now, the API gets more hits than the portal!
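To make the idea of one program talking to another a little more concrete, here is a minimal sketch in Python of what a keyword search against the API might look like. The base URL, the parameter names (q, page_size, api_key), and the response fields (docs, sourceResource, title) are assumptions on my part and should be checked against the API documentation; the sample response is hand-written for illustration and is not real DPLA data.

```python
from urllib.parse import urlencode

# Assumed base URL for item search; confirm against the API documentation.
BASE_URL = "https://api.dp.la/v2/items"


def build_search_url(query, api_key, page_size=10):
    """Build a keyword-search URL (parameter names are assumptions)."""
    params = {"q": query, "page_size": page_size, "api_key": api_key}
    return BASE_URL + "?" + urlencode(params)


def titles_from_response(payload):
    """Collect titles from a decoded JSON response of the assumed shape."""
    titles = []
    for doc in payload.get("docs", []):
        title = doc.get("sourceResource", {}).get("title")
        if isinstance(title, list):
            titles.extend(title)
        elif title:
            titles.append(title)
    return titles


# Hand-written stand-in for a response, so the sketch runs without a network.
sample_response = {
    "count": 2,
    "docs": [
        {"sourceResource": {"title": "Maxwell automobile"}},
        {"sourceResource": {"title": ["Street scene", "St. Peter"]}},
    ],
}

url = build_search_url("Maxwell automobile", "YOUR_API_KEY", page_size=2)
print(url)
print(titles_from_response(sample_response))
```

An application would fetch the URL, decode the JSON, and hand the result to something like titles_from_response; everything the portal shows is built from responses of this general kind.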
What powers the platform, the portal, and apps is our metadata within the platform. DPLA harvests metadata in many different formats, such as Dublin Core, MODS, MARCXML, and others. As part of the process to bring in a partner's metadata, we map the incoming data to our Metadata Application Profile. In this process, we also enrich the data.
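As a rough illustration of the mapping step, here is a small Python sketch that takes a Dublin Core style record and maps it to a common set of field names. The crosswalk table, the target field names, and the sample record are all hypothetical; they are not the actual Metadata Application Profile, and the only "enrichment" shown is trimming whitespace and dropping empty values.

```python
# Hypothetical crosswalk from Dublin Core elements to common field names.
DC_CROSSWALK = {
    "dc:title": "title",
    "dc:creator": "creator",
    "dc:subject": "subject",
    "dc:coverage": "spatial",
    "dc:date": "date",
}


def map_record(dc_record, provider):
    """Map one incoming record to the common field names."""
    mapped = {"provider": provider}
    for source_field, target_field in DC_CROSSWALK.items():
        values = dc_record.get(source_field)
        if values is None:
            continue
        if isinstance(values, str):
            values = [values]  # treat every field as repeatable
        # Very light clean-up: trim whitespace and drop empty values.
        cleaned = [v.strip() for v in values if v and v.strip()]
        if cleaned:
            mapped[target_field] = cleaned
    return mapped


# Hand-written sample record for illustration only.
incoming = {
    "dc:title": "  Maxwell automobile ",
    "dc:subject": ["Automobiles", ""],
    "dc:coverage": "St. Peter, Minnesota",
}
record = map_record(incoming, provider="Example Hub")
print(record)
```

Each incoming format would need its own crosswalk, but once records share the same field names, the portal and the API can treat a record from a small historical society and one from a large institution in exactly the same way.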
We encourage developers to build applications and interfaces using our API. We also provide a public app library, which allows submissions. Submitting your app to the app library can help with promotion, and helps us demonstrate that people are actively using the API. In addition to the apps listed here, apps submitted to the library include OpenPics, an iOS application for access to cultural heritage images, several map-based visualizations, and Serendip-o-matic, a tool that analyzes your research materials for keywords and finds related items in DPLA and other sources.
A Strong Public Option
For most of American history, the ability to access materials for free through public libraries has been a central part of our culture. The DPLA works with others to ensure that this critical, open intellectual landscape remains vibrant and broad in the face of increasingly restrictive digital options. The DPLA seeks to multiply openly accessible materials to strengthen the public option that libraries represent in their communities.
As an aggregator of metadata from many institutions, DPLA is in a unique position to help our partners recognize and manage data quality issues. In October 2013, Europeana and the DPLA organized a first joint rights management workshop to explore this possibility in Boston, Massachusetts. As a result of further discussions between the DPLA and Europeana, a small joint working group to explore the possibilities for concrete collaborations in more detail has been established. This important work will, above all, make rights clear to the end user and provide a framework for aggregators and our partners. With the creation and standardization of actionable rights statements, users will know when a work is in the Public Domain, covered under a Creative Commons license or is Rights Restricted, among other possible labels. In addition, we recently submitted an entry to the Knight Foundation's Knight News Challenge to potentially support this work.
How is this possible?
Now, you may wonder how this is possible, because those 5.8 million items had to come from somewhere. In addition, DPLA has a small staff - currently, we're seven full-time employees and one intern. There's no special magic here - just lots and lots of collaboration.
DPLA Hubs
We have a partnership model which we refer to as our "hubs model." The Hubs Program is designed to establish a national network of state and regional digital libraries, as well as large institutional digital libraries. The hubs model allows us to bring together digitized content from across the country into a single access point for end users, and an open platform for developers.
Content Hubs
The DPLA Content Hubs are large digital libraries, museums, archives, or repositories that maintain a one-to-one relationship with the DPLA. Content hubs tend to be larger, with collections exceeding 250,000 records and content previews (such as thumbnails or low resolution clips of audio/visual material). Content hubs work with DPLA to make their metadata globally interoperable, meaning that they work with DPLA to normalize, clean, update their data, and investigate new methods for data sharing.
Service Hubs
The DPLA Service Hubs are state or regional digital libraries that aggregate information about digital objects from libraries, archives, museums, and other cultural heritage institutions within their given state or region. Like content hubs, service hubs share metadata and content previews and work with DPLA to make their metadata globally interoperable. In addition, each service hub represents its community as a single metadata aggregation point (state, region, etc., but perhaps differently defined communities in the future). Each Service Hub also offers its state or regional partners a full menu of standardized digital services, including digitization, metadata, data aggregation and storage services, as well as locally hosted community outreach programs, bringing users in contact with digital content of local relevance.
A Network of Partners
But, the DPLA is really made up of over 1,100 partners - institutions and organizations from across the US - that provide content to or are hosted by (or have some other relationship with) our Hubs. In turn, the Hubs serve up this content to DPLA.
This chart represents the breakdown as of 2013 in terms of types of partners whose content is represented in DPLA. As you can see, there is a wide range of types of institutions, including government agencies, museums, historical societies, academic libraries, and so on.
WHY THE HUBS MODEL? Sustainability! The image on the left shows that one-to-one partnerships can be resource intensive, requiring more staffing and processing power to ingest and update, and to continually manage the variety of individual metadata standards and quality and feed types. The image on the right, however, shows that the Service and Content Hubs model supports the sharing of responsibilities for metadata management and feeds. It encourages collaboration, which in turn increases the likelihood of more complete and higher quality metadata and sustainable curation models.
I like to think of us more like a water cycle, wherein all partners play an equally visible and valuable role in the content sharing process.
For example, the Maxwell automobile company was formed in 1904 and ceased to exist in 1925. Say you love Maxwell cars: how would you know that this image even exists?
This image is from the Nicollet County Historical Society in St. Peter, Minnesota. The historical society doesn't have a digital collection on their website. You’ve never heard of Nicollet County or even St. Peter, since you don't live in Minnesota.
Luckily, the Nicollet County Historical Society works with the Minnesota Digital Library, which manages their digital collections. Luckily, MDL works with DPLA. And, luckily for you, you’ve found DPLA and all of the Maxwell Automobile images available from six institutions across the US, including that one from the NCHS.
The luck continues--now you know about MDL because you’ve followed the link back to their site to see that awesome image. And, maybe now you know that the NCHS exists. And, just maybe you’ll visit their site, contact them, or spread the good word about how you found that image and where it comes from.
Partnership = Local Connections
As you can see, the hubs model allows local collections to become more easily discoverable. The hubs that help us obtain the content and the partners that provide that content both serve a vital function. The hubs help mediate the relationship between the partners and DPLA. The partners are the local institutions with direct expertise and knowledge in the collections and related topics. Together, the partners and hubs provide an important local connection.
DPLA has another opportunity to help strengthen the local connection between service hubs and the areas they serve. With the Public Library Partnerships Project, DPLA has the opportunity to provide public librarians in a small number of states with digital skills training. This project also allows public librarians in libraries with special collections to connect with their local service hub, which can provide additional resources at the state or regional level. The project will host 12 total workshops that will reach approximately 180 public librarians, teaching skills such as writing for the web, exhibition development, and understanding intellectual property rights. At the conclusion of the project, we will release a public version of the training materials that others can reuse. Through PLPP, more public library content will appear in DPLA. Although the quality and sustainability of relationships between DPLA, Hubs, and public libraries is the bigger project priority, we will also be happy to grow our number of public library partners and establish an even stronger local connection.
Personal Connections
Of course, the hubs model allows us to do one kind of outreach. In addition to undertaking outreach to librarians and archivists, DPLA should also provide outreach to the general public. However, the DPLA staff is small, with seven current employees. How can we improve outreach to new communities with this limitation, plus further limitations on our travel budgets?
Community Reps
In September 2013, DPLA staff conceived of a program of community representatives, or volunteers that would help spread the word about DPLA. While DPLA was still in its planning phase, my colleagues found that our web forums and committees provided a forum for interested and motivated people to help out and give us feedback.
Initial Reps
Our first class of community reps was announced in January 2014. We admitted approximately 100 people in the first class, which includes representation from K-12 education, public libraries, state libraries, municipal archives, public history and museums, publishing, media, genealogy, and many areas of higher education. Proposed activities by our initial class of reps include creating materials to leverage DPLA as a teaching and learning resource, hackathons and other events targeted at software developers, and outreach to rural public libraries. Community Reps are assigned a contact who is a full-time DPLA employee, to whom they can direct any questions, and are provided with basic training through a webinar.
Our first class of Community Reps comes from 36 out of 50 states and two countries outside of the US, and helps to extend our outreach significantly.
Responsibilities of Community Reps
Represent DPLA formally and informally
Organize activities that promote DPLA using DPLA materials
Share materials and feedback from outreach efforts
Check in with DPLA staff about progress and share experiences
Be willing to participate in speaking or event opportunities as requested by DPLA
While Community Reps are volunteers, DPLA provides a small set of responsibilities that we use to help guide their efforts. First, we expect Community Reps to represent DPLA both formally, through recognition on our website, and informally, through various networks or organizations. Secondly, we ask them to organize at least one activity or event that helps spread the word about DPLA as a portal, platform, or public option. We provide Community Reps with materials such as presentations, flyers, and so forth, and supply them with DPLA merchandise to give away, such as t-shirts, stickers, and mugs. Thirdly, we ask them to publicize their own efforts through means such as writing posts for the DPLA blog or aggregating notes or tweets from an event or activity. We also expect Community Reps to check in with a DPLA staff member on a roughly quarterly basis. Finally, we may ask Community Reps on occasion to give a presentation or participate in an event at our request. For example, if they are attending or live near a conference at which the organizers have asked for a DPLA presentation, we may ask them to take this on if DPLA staff are unable to participate.
What's Next for Reps?
The Community Reps program is already showing some early signs of success, and we could not keep up with the enthusiasm shown by the community! We have just announced recruitment for our second class of community reps, and we are looking to greatly extend our reach by searching for applicants from states and territories in which we have no reps. Applications close on April 30th, and we look forward to further expanding our reach and geographic diversity. Given some initial review of the Community Reps program, we have decided to more explicitly ask applicants to identify the communities to which they intend to provide outreach. Doing so will allow us to have a better understanding of where our strengths and gaps may be, and can provide important information about where to undertake targeted outreach in the future, either through the Reps program or using DPLA staff time.
Finding Items
The DPLA Map
Finally, perhaps the most obvious way in which DPLA provides a place-based connection is through the map in the Portal.
The map interface presents items contributed by DPLA partners that have some degree of geographic information included within their metadata. It is important to note that not all records in DPLA have geographic information. As the search box at the top indicates, only records with geographic information appear on the map, and while that is a large portion of the DPLA’s collection, it is not all items related to a search.
One very common misconception about the DPLA map is that it organizes records geographically by the institutions they come from. For example, users of this search might assume that we have 55 "baseball" items owned by Arizona institutions. In reality, we have 55 items related to baseball that have been identified as "being about", representing, or depicting Arizona – like images taken there.
As you zoom in, you can more clearly see the places depicted or represented in these items, based on the geographic information in their metadata. But how do we get this information?
This goes back to the process by which we work with the metadata we receive. Many of our providers include place names or other geographic headings in their metadata, and as part of the enrichment process we identify those headings. We send the text of those headings to a geocoding service, which looks them up in a database, and returns a set of latitude and longitude coordinates for that place. Once we have those coordinates, we send them back to another service that "reverse geocodes" them and provides us with a full place hierarchy, including country, state or province, region or county, and city, when available.
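The two-step lookup I just described can be sketched in a few lines of Python. This is only an illustration: the two lookup tables stand in for the external geocoding and reverse geocoding services, the function names are hypothetical, and the coordinates for St. Peter are approximate values chosen for the example.

```python
# Stand-in for a geocoding service: place heading -> (latitude, longitude).
# The coordinates are approximate and for illustration only.
GAZETTEER = {
    "st. peter, minnesota": (44.32, -93.96),
}

# Stand-in for a reverse geocoding service: coordinates -> place hierarchy.
HIERARCHY = {
    (44.32, -93.96): {
        "country": "United States",
        "state": "Minnesota",
        "county": "Nicollet County",
        "city": "St. Peter",
    },
}


def geocode(heading):
    """Look up coordinates for a place heading, or None if unknown."""
    return GAZETTEER.get(heading.strip().lower())


def reverse_geocode(coords):
    """Look up the place hierarchy for a pair of coordinates."""
    return HIERARCHY.get(coords, {})


def enrich_place(heading):
    """Geocode a heading, then reverse geocode it into a hierarchy."""
    place = {"name": heading}
    coords = geocode(heading)
    if coords is None:
        return place  # leave the record as it arrived
    place["lat"], place["lon"] = coords
    place.update(reverse_geocode(coords))
    return place


enriched = enrich_place("St. Peter, Minnesota")
unknown = enrich_place("Somewhere unlisted")
print(enriched)
print(unknown)
```

The important design point is the last one: when the lookup fails, the record keeps its original heading rather than being assigned a guessed location.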
Usually, this works well, except when it doesn't. One example of this is what I call the Kansas problem. If you look at the map, it suggests that there are nearly 450,000 records that are associated with Kansas. However, very few (if any) of these items are about Kansas. So, why is this happening? Naturally, we discovered it was a bug in our process. The only spatial value being passed to us from the Hubs is "United States." When we get ONLY that value, we're grabbing coordinates for the center of the country and then reverse identifying the location... as Kansas. We've had other problems as well, such as geocoding services misidentifying which place we were talking about.
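One possible guard against the Kansas problem, sketched in Python, is to refuse to place a point on the map when the only spatial value is as broad as a whole country. This is an illustration of the idea rather than a description of how we fixed the bug; the list of overly broad values and the function name are hypothetical.

```python
# Hypothetical list of place values too broad to plot as a single point.
TOO_COARSE = {"united states", "united states of america", "usa"}


def geocodable_values(spatial_values):
    """Keep only the spatial values specific enough to geocode."""
    return [v for v in spatial_values if v.strip().lower() not in TOO_COARSE]


only_country = geocodable_values(["United States"])
mixed = geocodable_values(["United States", "Topeka, Kansas"])
print(only_country)
print(mixed)
```

With a filter like this, a record whose only spatial value is "United States" would get no map point at all, while a record that really is about a place in Kansas would still be geocoded.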
Transparency
Improving our metadata enrichment process is a repetitive and iterative process, and requires feedback. Problems like this are serious, however, as we really do want to provide our user community with a sense of confidence in what we do. In addition, if we have problems with our metadata that impact our user experience, this could easily threaten the level of trust that our users might have in our ability to provide reliable information. We are thinking about how to provide better transparency to our users about these processes, given that our metadata enhancement processes evolve quickly to address changing needs, new partners, and other issues. There is some concern about how to do this in a way that explains things without relying on technical or professional jargon and without oversimplifying our work.
Feedback
We have also been thinking about better mechanisms that will allow people to give us feedback about the data we present. How should we be gathering feedback from users about the accuracy of our data? One possibility would be to provide more opportunities to include user commentary, either through the site itself or through an application that uses the DPLA Platform API. In addition, how should that feedback flow back to the hubs and the partners from which we received the data? One area in which DPLA could lead would be to help provide better infrastructure to allow multiple narratives, such as lighter weight exhibitions, which we could either host or which could use a minimal amount of infrastructure.
Accountability
As a small, non-profit organization, we also may not always have the subject or regional expertise about a topic to know whether the information is accurate. However, as a national-level aggregator, DPLA has some degree of responsibility here. But what is that level of responsibility? We do have some degree of influence, but we cannot always control what our hubs and partners do. In addition, how do we respect the expertise of those local partners, whose metadata may be dismissed by some as incorrect, incomplete, or otherwise problematic? What is the balance of accountability that we should strive to achieve?
Lather, Rinse, Repeat
We do not yet have the solutions, but we're certainly looking for suggestions. DPLA is uniquely situated because of its independence, and we need to take the same iterative approach to building trust and developing a connection with local communities as we use to enhance metadata received from our partners.
Thank you for your time and for bringing me to Vancouver. I look forward to speaking with you further, and I would greatly appreciate any questions or feedback about the work we're doing at DPLA.