NASIG 2012: Automated metadata creation - possibilities and pitfalls

Slides from the Presentation

Click here to download the slides.

Screencast of the Presentation

Click here to view the presentation.

Summary from NASIG schedule:

This program presents an overview of automated indexing and automated metadata creation, and then discusses a project completed last summer at the Florida State University Law Research Center (formerly the Law Library), which used computer-created metadata to index individual pages of a looseleaf resource. The program begins with an overview of machine-created metadata: Internet search engines rely on it almost exclusively, and some library projects and database companies also use automated indexing. It then highlights an index and search designed to retrieve pages from a looseleaf resource as each page appeared on a specific date over a 20-year period. This search is located at www.fsulawrc.com. The project was indexed using scripts that extracted most of the metadata; staff then completed missing metadata fields and audited for errors. I will present on the cost-effectiveness of automated metadata creation, given error rates and costs for human- and machine-produced metadata, and offer an overall assessment of its potential for digital library projects. The goal is to help catalogers understand what is possible, what is difficult, and what is easy when using techniques for automated metadata creation.

The Database Presented: Florida Administrative Code (1970-1983)

This is an interactive database representing a looseleaf resource with monthly update pages over a 14-year period. The search interface lets the user retrieve a single page at a time, as that page appeared on a specific date.
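To make the "page as it appeared on a specific date" lookup concrete, here is a minimal sketch of one way such a query could work. It is not the project's actual schema or code: the table name `page_version`, its columns, the page label, the dates, and the file names are all made up for illustration, and Python's built-in `sqlite3` module stands in for the real database.

```python
import sqlite3

# Hypothetical schema: one row per printed version of a page.
# valid_from is the date the page was issued; valid_to is the date a
# replacement page superseded it (NULL while the page is still current).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE page_version (
        page_label TEXT NOT NULL,
        valid_from TEXT NOT NULL,   -- ISO dates sort correctly as text
        valid_to   TEXT,
        image_file TEXT NOT NULL
    )
""")

# Invented sample rows: three successive versions of the same page.
conn.executemany(
    "INSERT INTO page_version VALUES (?, ?, ?, ?)",
    [
        ("10-1", "1970-01-01", "1975-06-01", "10-1_v1.pdf"),
        ("10-1", "1975-06-01", "1980-03-01", "10-1_v2.pdf"),
        ("10-1", "1980-03-01", None, "10-1_v3.pdf"),
    ],
)


def page_as_of(conn, page_label, as_of):
    """Return the file for the version of page_label in force on as_of.

    as_of is an ISO date string (YYYY-MM-DD). Returns None if no
    version of the page existed on that date.
    """
    row = conn.execute(
        """
        SELECT image_file FROM page_version
        WHERE page_label = ?
          AND valid_from <= ?
          AND (valid_to IS NULL OR valid_to > ?)
        """,
        (page_label, as_of, as_of),
    ).fetchone()
    return row[0] if row else None


print(page_as_of(conn, "10-1", "1977-09-15"))
```

With the invented rows above, the call prints `10-1_v2.pdf`, the version issued in 1975 and not yet replaced in 1977. Storing a half-open validity interval per version is only one possible design; it keeps the lookup to a single range test, at the cost of updating the old row whenever a replacement page arrives.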

The search is built in MySQL and PHP. The most interesting aspects of this project are the attempt to represent a resource that has different states over time (i.e., different pages are in play during different time periods) and the use of automated indexing to extract metadata for the pages.
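As a rough illustration of script-based metadata extraction followed by staff review, the sketch below pulls a few fields out of page text with regular expressions and lists the fields it could not find. The project's own scripts were written in Visual Basic in Excel; Python is used here only as a stand-in. The field names, the patterns, and the sample text are all assumptions, and the real page layout may look quite different.

```python
import re

# Hypothetical patterns for illustration; the real pages may be laid out
# differently and would need patterns written against actual samples.
PATTERNS = {
    "chapter": re.compile(r"CHAPTER\s+([0-9A-Z]+-[0-9]+)"),
    "supplement": re.compile(r"Supp\.\s*No\.\s*(\d+)"),
    # A line containing nothing but digits is treated as the page number.
    "page_number": re.compile(r"^[ \t]*(\d+)[ \t]*$", re.MULTILINE),
}


def extract_metadata(text):
    """Extract each field from page text; list the fields that need a human.

    Returns a dict with one key per field (None when no match was found)
    plus "needs_review", the list of fields staff would fill in by hand.
    """
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        record[field] = match.group(1) if match else None
    record["needs_review"] = [f for f in PATTERNS if record[f] is None]
    return record


# Invented sample text with no standalone page-number line.
sample = "CHAPTER 10-1\nGENERAL PROVISIONS\nSupp. No. 45\n"
print(extract_metadata(sample))
```

For the invented sample, the chapter and supplement are found and `page_number` ends up in `needs_review`. This mirrors the division of labor described above: the script fills in what it can match, and staff complete and audit the rest.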

There are currently no digital library platforms that allow storage and retrieval of a looseleaf resource, that is, a resource which changes state over time. Representations of the Code of Federal Regulations and United States Code in LexisNexis, Westlaw, Cornell's LII, and other databases do not store the material as a dynamic resource, but instead record how it looked at one point during the year. This project might serve as one model for building a platform to hold looseleaf resources.

Click here for documentation on the database to represent a looseleaf resource and on automated metadata creation using Visual Basic in Excel.