John Resig - PHAROS Italian Art Database Proposal

PHAROS Italian Art Database Proposal

Note: This project received funding via a Kress Digital Resources Grant in May 2015 and was completed in June 2016.

The goal of this project is to build a unified database of images of Italian art aggregated from multiple art historical photo archives that uses the image as the vehicle for performing the searches. Italian works of art were selected because they represent a critical mass of existing digitized images held by as many as six consortium institutions. This project will be publicly accessible and will serve as one of the primary means of collaboration between members of the PHAROS consortium.

The primary objective of this database will be to enable researchers to browse solely through visual connections to discover meaningful similarities between works of art documented in multiple archives. In some cases the connections will be due to two or more archives having records for the same work of art. In other instances the connections will be for images cataloged under different attributions in different archives or for previously unknown studies, copies, and versions of the work. With these connections established, researchers will have a powerful tool to facilitate discoveries and to correct and enhance attributions and other documentation. This is the goal of virtually any “linked” network; however, such networks usually require that all data be aligned in advance, necessitating time-consuming planning and significant up-front implementation costs. None of that is required for this database because of its reliance on image recognition, and the individual archives in the consortium will be able to reap the benefits almost immediately.

A secondary objective is to give researchers the ability to introduce a new image for comparison against the holdings of multiple photo archives simultaneously. Positive matches will enable researchers to instantly command information about the work of art that otherwise might take weeks to discover. This kind of image search has already proved to be of tremendous value to scholars using the Ukiyo-e.org project.

Project timeline

The development of this project will take approximately one year and will be carried out entirely by John Resig. It will utilize portions of the open source infrastructure that was developed for the Ukiyo-e.org site, which will keep the full development cost reasonable. In addition to the database being made publicly accessible, an open source tool will be made available online for institutions and individuals to freely download.

Laying the Framework (2 months, June – July 2015)

Fact-finding

The following questions will all require input from members of the PHAROS consortium. The answers will help to define the scope of the rest of the project.

What metadata will be stored with photos?
Will we have separate ‘artwork’ records, or just photo records?
What is the format of the institutions’ data and how will they prefer uploading new data?
What is the desired date searching interaction?

Building Data Models

Based upon the results of the fact-finding, data models will be constructed to represent the data. The data models will be used to store all of the information brought into the database. Additionally, an initial draft of the data format that institutions will need to provide will be created and given to all those participating.
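
As an illustration only, here is a minimal Python sketch of what such models might look like. It assumes the fact-finding settles on separate artwork and photo records, which is still an open question above. Every class and field name (Photo, Artwork, source, date_text, and so on) is hypothetical, and the sample values are invented placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Photo:
    """One photograph held by one archive (all field names are hypothetical)."""
    photo_id: str
    source: str                       # contributing institution, e.g. "frick"
    image_file: str                   # file name of the digitized image
    artist: Optional[str] = None      # attribution as recorded by the archive
    title: Optional[str] = None
    date_text: Optional[str] = None   # date exactly as written by the archive


@dataclass
class Artwork:
    """A work of art documented by one or more photos."""
    artwork_id: str
    photos: List[Photo] = field(default_factory=list)

    def sources(self) -> List[str]:
        # Institutions holding at least one photo of this work, sorted by name.
        return sorted({photo.source for photo in self.photos})


# Placeholder data, not real archive records.
work = Artwork("a1")
work.photos.append(Photo("frick-001", "frick", "001.jpg", artist="Anonymous"))
work.photos.append(Photo("zeri-417", "zeri", "417.jpg", artist="Artist A"))
print(work.sources())
```

Keeping the archive's own attribution on each photo, rather than on the artwork, is one way to preserve differing attributions side by side.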

Initial Data Import

An initial import of the data will occur. This will be based on the data already received from the Frick Art Reference Library’s Anonymous Italian Art archive and the Zeri Foundation’s 15th and 16th century photo archives.

UI Design

Work will begin on the design of the site. Mock-ups of the design will be created for use during the construction of the database.

Deliverables

Models representing the data to be stored in the database. Additionally, a draft of the data format will be provided to participating institutions, allowing them to start formatting their data appropriately. Finally, a rough design for the site will be mocked up (likely in a purely visual, non-functional form).

Building the Database (5 months, August – December 2015)

Visual Image Connections

Work will be done to provide connections between images that are visually similar to one another. This will build on the previous work comparing the Frick and Zeri collections, but will provide a usable interface that the institutions can use to compare the results.

Text Searching

A means of text searching will be provided that indexes all of the text supplied by the institutions and makes it searchable. For example, a search for “cat” will find all artworks and images that have the text “cat” associated with them.
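
A keyword search of this kind could be sketched as a small inverted index. This is a minimal illustration in Python, not the project's implementation: the function names (tokenize, build_index, search) and the sample records are invented, and the sketch assumes whole-word matching, so “cat” would not match “cathedral”.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Lower-case the text and split it into runs of word characters.
    return re.findall(r"\w+", text.lower())


def build_index(records: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the set of record ids whose text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for record_id, text in records.items():
        for word in tokenize(text):
            index[word].add(record_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return ids of records containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Invented sample metadata.
records = {
    "p1": "Saint Jerome with a cat",
    "p2": "Madonna and Child",
    "p3": "Cat seated beside the Madonna",
}
index = build_index(records)
print(sorted(search(index, "cat")))
```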

Date Searching

It will be possible to find artworks that have a date specified by searching for a particular date or date range. The exact means through which the searching will occur has yet to be decided.
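
Since the date search mechanism is undecided, the following is only one possible approach, sketched in Python. It assumes each record's date has already been normalized to an inclusive range of years, so that an approximate or century-level date becomes a wider range. The names YearRange, overlaps, and search_by_date, and the sample data, are hypothetical.

```python
from typing import Dict, List, Tuple

YearRange = Tuple[int, int]  # inclusive (start_year, end_year)


def overlaps(a: YearRange, b: YearRange) -> bool:
    # Two inclusive ranges overlap when each one starts before the other ends.
    return a[0] <= b[1] and b[0] <= a[1]


def search_by_date(dates: Dict[str, YearRange], query: YearRange) -> List[str]:
    """Return the ids of records whose date range overlaps the query range."""
    return sorted(rid for rid, rng in dates.items() if overlaps(rng, query))


# Invented sample data; the comments show the kind of text each range might come from.
dates = {
    "p1": (1480, 1490),  # an approximate date around 1485
    "p2": (1500, 1599),  # a whole century
    "p3": (1420, 1420),  # an exact year
}
print(search_by_date(dates, (1485, 1505)))
```

A single year can be searched by passing the same value for both ends of the query range.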

Image Searching

Researchers will be able to upload an image, or provide a link to one, and find visually similar images inside the collection. This will work similarly to the functionality on Ukiyo-e.org.
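
To give a feel for how visual similarity can be computed at all, here is a small Python sketch of an average hash compared by Hamming distance. This is a stand-in technique chosen for brevity; it is not the matching technology used by this project or by Ukiyo-e.org. It assumes each image has already been reduced to a tiny grid of grayscale values, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence

Grid = Sequence[Sequence[int]]  # rows of grayscale pixel values, 0-255


def average_hash(pixels: Grid) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    # Number of bit positions in which the two hashes differ.
    return bin(a ^ b).count("1")


def most_similar(query: Grid, collection: Dict[str, Grid]) -> List[str]:
    """Return record ids ordered from most to least similar to the query."""
    query_hash = average_hash(query)

    def distance(record_id: str) -> int:
        return hamming_distance(query_hash, average_hash(collection[record_id]))

    return sorted(collection, key=distance)


# Toy 2x2 "images": the second is a uniformly lighter copy of the first.
original = [[10, 200], [220, 30]]
lighter_copy = [[40, 230], [250, 60]]
different = [[200, 10], [30, 220]]

ranked = most_similar(original, {"copy": lighter_copy, "other": different})
print(ranked)
```

Because the hash only records whether each pixel is above the image's own mean, a uniformly lighter or darker reproduction of the same photograph yields the same hash in this toy case.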

Deliverables

A rough but usable version of the site will be ready for use by members of PHAROS. It will likely only contain data from the Frick and Zeri collections. It will be possible to browse the institutions’ artworks and images and search through them using text, dates, and image uploads.

Importing the Data (3 months, January – March 2016)

Institutional Data Import Interface

An interface for institutions to import their data and images into the database will be constructed. It will allow them to easily upload large numbers of images, and their corresponding data, and have it immediately appear in the database.

Private interfaces will be provided to the institutions so that only authorized personnel are permitted to contribute data.

Institutional Browsing Interface

An interface will be provided for institutions to browse their artworks and examine the connections between their artworks and those in other institutions. Specifically, institutions will be able to see when there are visually similar artworks in other institutions and potentially even see where the records differ from one institution to another. Finally, a way will be provided to identify artworks that match similar artworks in the same institution (helping to identify potentially mis-cataloged images).
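
One way such connections might be organized is sketched below in Python: given pairs of photos already judged visually similar, group them into connected sets and flag any set whose attributions disagree. This is an illustrative assumption rather than the project's design; the function names, identifiers, and attributions are all invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_matches(ids: Iterable[str], pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group photo ids that are connected, directly or indirectly, by a match."""
    parent = {photo_id: photo_id for photo_id in ids}

    def find(x: str) -> str:
        # Follow parent links to the group's representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = defaultdict(list)
    for photo_id in parent:
        groups[find(photo_id)].append(photo_id)
    return sorted(sorted(group) for group in groups.values())


def conflicting_attributions(group: List[str], artist: Dict[str, str]) -> bool:
    # True when photos grouped as the same work carry different artist names.
    return len({artist[photo_id] for photo_id in group}) > 1


# Invented identifiers and attributions.
ids = ["frick-1", "frick-7", "zeri-9", "zeri-12"]
pairs = [("frick-1", "zeri-9"), ("zeri-9", "zeri-12")]
artist = {"frick-1": "Anonymous", "frick-7": "Anonymous",
          "zeri-9": "Artist A", "zeri-12": "Artist A"}

groups = group_matches(ids, pairs)
print(groups)
```

In this toy data the first group contains two photos from the same archive, which is the kind of case the same-institution check described above would surface.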

Deliverables

Institutions should now be able to import data and images from their collections directly into the database. Additionally they’ll be able to browse their data and gain a greater understanding of the data held in their collections, and compare it with other institutions.

Finalizing the Database (2 months, April – May 2016)

Internationalization

The interface of the site will be marked up so that all of the English text on the site can be translated into other languages. A means will be provided to the institutions to help translate the English text into their native language. Alternative forms of the site will be constructed for each language (en, it, de). Researchers and the public will be able to browse the version of the site that is most useful for them.
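
The usual mechanism behind this kind of markup is a lookup of interface strings by key and language, with a fallback to the default language when a translation is missing. A minimal Python sketch follows; the catalog layout, the keys, and the translate function are hypothetical, and the Italian and German entries are left incomplete on purpose to show the fallback.

```python
from typing import Dict

# Hypothetical message catalog keyed by language code, then by label key.
MESSAGES: Dict[str, Dict[str, str]] = {
    "en": {"search": "Search", "upload": "Upload an image", "results": "Results"},
    "it": {"search": "Cerca", "results": "Risultati"},
    "de": {"search": "Suchen"},
}

DEFAULT_LANG = "en"


def translate(key: str, lang: str) -> str:
    """Return the label for `key` in `lang`, falling back to English."""
    catalog = MESSAGES.get(lang, {})
    if key in catalog:
        return catalog[key]
    # Fall back to the default language, then to the key itself.
    return MESSAGES[DEFAULT_LANG].get(key, key)


print(translate("search", "it"))
print(translate("upload", "de"))
```

With a fallback in place, a partially translated interface still shows a usable English label rather than a blank.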

Finalize Design Work

The design of the site and database will be finalized. Any lingering design or usability concerns will be addressed at this time.

Write Documentation and Finalize Open Source Code

The code base will be finalized and the entirety of the project will be made available as open source code. Documentation on how the system works, and how someone could run their own version of the database, will be provided.

Deliverables

A fully-usable form of the database will be available for the PHAROS institutions, and the public, to use. All of the interface will be made available in multiple languages and extensive documentation will be provided to help those who wish to run their own version of the database.

Database Components

The database will have the following components:

Browsing Images and Text Searching

Images and metadata will be aggregated from the contributing archives and will be presented in an easy-to-browse format. The corresponding metadata for the works of art will be collected with the images, organized within the database, and will be searchable via a traditional search box. The specific artist attribution, title, and any descriptive text will be keyword searchable. For example, “Madonna” will find all images associated with that word in the metadata. In the context of this project, however, no attempt will be made to unify search terms across languages or to group terms with similar meanings. Therefore, to obtain the most complete results from a text search for “Child,” the researcher would also have to search the Italian and German words “bambino” and “kind,” while for “Madonna” a search for “Virgin” would expand the results. The unification of the data will be addressed in the future.

Image Similarity Analysis

When viewing the images associated with a single work of art, researchers will be able to see all of the images that are visually similar such as copies, versions, or portions of the work of art. Seamless navigation between images will allow for ease of comparison.

Searching by Image

Using the image similarity analysis, researchers will be able to upload an image and find all identical or nearly identical works of art recorded in the contributing photo archives. This will greatly reduce the time required to find a work of art in the combined databases, since researchers will no longer need to rely on the text documentation, such as artist attribution, or slowly and manually browse through hundreds or thousands of images for matches.

Institutional Data Upload Interface

The digital images and metadata for the works of art will be contributed to the database individually by the participating archives. An easy-to-use interface will be developed so that staff at each institution can manually upload batches of new images and data to the site. An interface for logging in and contributing to the database will be provided for each institution. The requisite format for images and data will be clearly documented.
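
Part of such an interface would likely be a check that each uploaded batch follows the documented format. The Python sketch below shows one way that check might look. The required field names (id, image_file) and the validate_batch function are assumptions made for illustration, since the actual format is defined elsewhere.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("id", "image_file")  # hypothetical required fields


def validate_batch(rows: List[Dict[str, str]]) -> List[str]:
    """Return a list of problems; an empty list means the batch is acceptable."""
    problems: List[str] = []
    seen = set()
    for number, row in enumerate(rows, start=1):
        # Every required field must be present and non-blank.
        for name in REQUIRED_FIELDS:
            if not row.get(name, "").strip():
                problems.append(f"row {number}: missing {name}")
        # Record ids must be unique within the batch.
        record_id = row.get("id", "").strip()
        if record_id:
            if record_id in seen:
                problems.append(f"row {number}: duplicate id {record_id}")
            seen.add(record_id)
    return problems


# Invented sample batch with two faulty rows.
rows = [
    {"id": "001", "image_file": "001.jpg"},
    {"id": "001", "image_file": ""},
    {"id": "", "image_file": "003.jpg"},
]
for problem in validate_batch(rows):
    print(problem)
```

Reporting every problem at once, rather than stopping at the first, lets staff correct a whole batch in one pass.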

Multilingual Interface

The principal interface (buttons and navigation labels) for the database will be available in multiple languages as the database will be used by individuals and institutions from around the world. When the project is launched, the default language will be English, but the interface will be available in English, German, and Italian, reflecting the majority of PHAROS consortium members. Translations for the interface will be provided by the participating institutions. None of the content of the database will be automatically translated at this time.

System Documentation

Documentation provided with the system will clearly explain how the database can be used with any institution’s digitized photo archive collection. This will include information on how images and image metadata should be prepared for uploading to the service. Detailed instructions on how to set up and run the database also will be compiled to ensure long-term sustainability.

System Maintenance

The Getty Research Institute has committed to long-term maintenance of the server; however, some additional maintenance (for example, fixing minor bugs) will be required from time to time. A small amount of work will also be required to add new institutions to the database and train the new participants in the database procedures. The anticipated amount of work over the course of two years after the project launch has been amortized into the proposed budget.

Future Projects

There are a few areas of potential further research. Most of these will likely happen during future updates of this project.

Auto-identify word expansion across multiple languages to improve text searching (e.g. child, bambino, and kind all can map to the same term).
Explore alternative technology for image similarity matching, especially if we’re able to find one that’s open source and freely available.
Attempt to unify records for artist names, likely using something like the Getty ULAN Vocabulary.
Attempt to unify other attributes, such as location (also using Getty Vocabularies).
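
The word expansion idea in the first item could start from something as simple as the Python sketch below, which expands a search term into every term in its concept group. The two groups are taken from the examples given earlier in this proposal. The table structure and the expand function are hypothetical, and a real version would presumably build its groups automatically rather than by hand.

```python
from typing import List, Set

# Hand-written concept groups for illustration only.
CONCEPTS: List[Set[str]] = [
    {"child", "bambino", "kind"},
    {"madonna", "virgin"},
]


def expand(term: str) -> Set[str]:
    """Return every term that shares a concept group with `term`."""
    word = term.lower()
    for group in CONCEPTS:
        if word in group:
            return set(group)
    # Unknown words expand only to themselves.
    return {word}


print(sorted(expand("Bambino")))
```

A text search would then look up each expanded term and merge the results.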
