North Carolina Heritage Index Project

PROJECT SPECIFICATION

Description:

The aim of the "Heritage Indexing" proof-of-concept project is to implement a "one-stop shop" for librarians and library users to simultaneously search metadata for the digital library assets of participating North Carolina libraries. During the pilot phase, the collections will come from the following libraries: the North Carolina Digital Heritage Center, the State Library of North Carolina, and the University of North Carolina at Chapel Hill. The index and search interface will reside on NC LIVE servers and be accessible through the nclive.org site. (UPDATE: The completed site was formerly available at ncecho.org.)

Target Launch Date:

The "beta" version, which will allow searches across the participating pilot collections through a web-based graphical user interface, will launch by the end of February 2012.

Target Audience:

The envisioned end users are librarians and library patrons across the state who are interested in discovering North Carolina digital assets.

Requirements:

The core requirements of the project are to use open technologies to:
 

  • Develop software capable of retrieving XML-formatted metadata feeds from open, RESTful APIs in a schema-agnostic manner. The software will be executable from the command line to support repeatable, scheduled execution, thereby keeping the index "fresh". The software will read external XML setup files that specify which feeds to index and how to crosswalk the metadata into an indexable format, making it extensible without the need to modify the source code.
  • Build an NC LIVE-hosted Solr index of descriptive metadata from the pilot library collections.
  • Customize an NC LIVE-hosted, Drupal-compatible graphical interface with which to search the collection.
  • Customize an NC LIVE-hosted RESTful API to allow programmatic querying of the index. (UPDATE: This requirement was later removed because the chosen indexer, Solr, already provides a freely available API.)
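The harvesting and crosswalking requirement in the first bullet above can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration only: the setup-file layout (`<feeds>`, `<feed>`, and `<map>` elements and their attributes), the sample record, the example URL, and the function names `load_setup` and `crosswalk` are hypothetical and do not describe the project's actual setup-file schema. The sketch reads a setup file, then uses ElementTree path expressions to crosswalk an arbitrary XML record into flat field/value lists that could be posted to Solr. Network fetching is omitted so the example runs offline.

```python
"""Hypothetical sketch: config-driven, schema-agnostic metadata crosswalk.

The setup-file layout below is invented for illustration; it is not the
project's actual configuration format.
"""
import xml.etree.ElementTree as ET

# Hypothetical setup file: one <feed> per source, each with crosswalk rules
# mapping an ElementTree path in the source record to a Solr field name.
SETUP_XML = """
<feeds>
  <feed name="sample" url="https://example.org/feed.xml" record="item">
    <map source="title" field="title"/>
    <map source="subject" field="subject"/>
    <map source="location/county" field="county"/>
  </feed>
</feeds>
"""

# Hypothetical feed response standing in for an HTTP fetch.
FEED_XML = """
<response>
  <item>
    <title>Main Street, 1925</title>
    <subject>Streets</subject>
    <subject>Automobiles</subject>
    <location><county>Orange</county></location>
  </item>
</response>
"""


def load_setup(setup_xml):
    """Return a list of feed configurations parsed from the setup file."""
    feeds = []
    for feed in ET.fromstring(setup_xml).findall("feed"):
        rules = [(m.get("source"), m.get("field")) for m in feed.findall("map")]
        feeds.append({
            "name": feed.get("name"),
            "url": feed.get("url"),
            "record": feed.get("record"),
            "rules": rules,
        })
    return feeds


def crosswalk(feed_xml, config):
    """Map each record in a feed to a dict of Solr field -> list of values."""
    docs = []
    root = ET.fromstring(feed_xml)
    for record in root.iter(config["record"]):
        doc = {}
        for source_path, field in config["rules"]:
            # Collect every non-empty match; repeated elements become lists.
            values = [el.text.strip() for el in record.findall(source_path)
                      if el.text and el.text.strip()]
            if values:
                doc.setdefault(field, []).extend(values)
        docs.append(doc)
    return docs


if __name__ == "__main__":
    config = load_setup(SETUP_XML)[0]
    for doc in crosswalk(FEED_XML, config):
        print(doc)
```

Because the mapping lives in the setup file, adding a new feed or changing a field mapping means editing XML rather than code. In a real run, each feed's XML would be fetched from its configured `url` (for example with `urllib.request.urlopen`), and the script could be scheduled with a tool such as cron to keep the index fresh.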


Features:

The public-facing feature is an NC LIVE-hosted, Drupal-compatible graphical interface with which to search the collection. It will allow users to enter basic keywords into a simple search box and retrieve lists of matching metadata for digital items from participating collections. Boolean searches will be supported.

Users will be able to read the basic descriptive metadata on the NC LIVE page and visit the item, via hyperlink, on the original collection site.

In addition to search, browsing via facets will be supported. Given the project's North Carolina focus, the ability for end users to browse the collection by North Carolina county is a highly desired feature.
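Because the index is built on Solr, keyword search, Boolean search, and browse-by-county can all be expressed as parameters on a Solr `select` request. The sketch below assumes a hypothetical core URL (`http://localhost:8983/solr/heritage/select`), a hypothetical `county` field, and a hypothetical helper name `build_search_url`. The parameters `q`, `fq`, `rows`, `wt`, `facet`, `facet.field`, and `facet.mincount` are standard Solr request parameters, and Solr's default query parser accepts `AND`, `OR`, and `NOT`.

```python
"""Hypothetical sketch: building a Solr search URL with a county facet."""
from urllib.parse import urlencode

# Hypothetical Solr core location; not the project's actual endpoint.
SOLR_SELECT = "http://localhost:8983/solr/heritage/select"


def build_search_url(keywords, county=None, rows=20):
    """Return a Solr select URL for a keyword or Boolean search.

    - `keywords` is passed straight through as `q`, so Boolean syntax such
      as 'tobacco AND NOT advertisement' is handled by Solr's query parser.
    - Facet counts are requested on a hypothetical `county` field so the
      interface can offer browse-by-county links.
    - If `county` is given, results are narrowed with a filter query (`fq`).
    """
    params = [
        ("q", keywords),
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", "county"),
        ("facet.mincount", "1"),  # hide counties with no matching items
    ]
    if county:
        # Quote the value so multi-word county names match as a phrase.
        params.append(("fq", 'county:"%s"' % county))
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("tobacco AND NOT advertisement", county="New Hanover"))
```

A request like this is also what a developer could send directly to Solr's HTTP interface, which is the reason the separate API requirement was dropped.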

Lastly, programmatic access to the index will be provided by a simple API, allowing developers to use the aggregated data in the index for their own projects. (UPDATE: see earlier note under "Requirements").

Out of Scope:

The project "will not replace the content management systems currently in use at the contributing institutions ... Nor will the federated search replace the custom searching and browsing options available on the individual partner's websites." - Graham, Nicholas, North Carolina Digital Heritage Center, internal project memo.

In addition, the project workflow will not involve the following:
 

  • Metadata remediation. If a contributing library wishes to remediate its metadata, this must take place at the point of origin, not within the index itself.
  • Support for metadata retrieval from databases or repositories lacking open methods of metadata harvesting via HTTP/RESTful APIs.
  • Support for metadata retrieval from databases or repositories that cannot or will not expose metadata in a consistent, structured manner.
  • Customized indexing methods for specific collections or specific institutions.


As the project develops, additional issues may emerge that need to be explicitly designated as "out of scope."

Open Issues:

No open issues are known at this time.

Cost Estimate:

Costs are expected to consist of in-kind contributions and staff time. In-kind contributions include items such as server space, provided by NC LIVE, to host the index and run the necessary applications.

Team:

  • Project Manager: Nitin Arora, NC LIVE
  • Product Owner: Tim Rogers, NC LIVE
  • Content Partners:
      o North Carolina Digital Heritage Center
      o University of North Carolina at Chapel Hill
      o State Library of North Carolina
  • Collection Group:
      o Nicholas Graham, NC Digital Heritage Center, Lead
      o Amy Rudersdorf, State Library of NC
  • Technical Group:
      o Stephanie Williams, NC Digital Heritage Center, Lead
      o Nitin Arora
  • Testing Group:
      o Jennifer Ricker, State Library of NC, Lead
  • Metadata Group:
      o Amy Rudersdorf, State Library of NC, Lead
      o Maggie Dickson, NC Digital Heritage Center
      o Lisa Gregory, State Library of NC
  • Technical Support:
      o Scott Ross, NC LIVE
      o Dean Farrell, State Library of NC

*Leads are responsible for coordinating meetings among colleagues within their group and for providing the necessary documentation related to project communications, such as public-facing documents and meeting minutes.