Italian Art Computer Vision Analysis

I’m currently working on a project to improve collaboration between art history photo archives, focusing for the time being on Italian art.

The work has been progressing in a number of phases:

Phase 1: (Completed) Computer vision analysis was performed on the Frick Photoarchive’s collection of anonymous Italian art images. The results of this analysis can be found here.

Phase 2: (Completed) The images from the Frick Photoarchive’s anonymous Italian collection and the Zeri Foundation’s 15th and 16th century Italian art collections were combined and analyzed with computer vision techniques to find similar photos across the two collections. The results of this work were detailed in a presentation at the annual PHAROS meeting.

Funding for this portion of the project was provided by a Digital Resources grant from the Kress Foundation, in cooperation with the Frick Photoarchive.

Phase 3: (Funded, In Progress) Creating a unified, searchable database of Italian art photos compiled from the member institutions of PHAROS (an international consortium of art history photo archives). A proposal describing the planned database is available, and work is currently in progress.

Code and Tools

If you’re interested in analyzing your own data, you can use the tools that I’ve created. All of them are freely available as Open Source projects that anyone can use.

Detailed information about the projects, their purpose, and how to use them can be found on their respective GitHub pages.

  • MatchEngine Data Analysis: Tools for analyzing the similarity data that comes from MatchEngine results.
  • MatchEngine Tools: Tools for efficiently uploading and downloading data from TinEye’s MatchEngine service.

Data Collection

To produce a useful analysis, two pieces of information are required: a collection of JPEG images representing the artworks in the collection, and a data file that relates the images to the artworks and to URLs pointing to further information about them.


The first item to prepare is a collection of the Italian art images in your collection. Depending on the size of your collection, this may be many gigabytes.

The best way to format the images is as follows:

  • Make sure all the images are properly formatted JPEGs (with a .jpg extension).
  • Each image should have a unique name (there should be no duplicate file names). For example there should be only one 3816.jpg file.
  • The images should be no smaller than 300 pixels in the smallest dimension. The larger they are the better the potential results.
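
The three rules above can be checked before sending anything. The following is a minimal sketch, not part of the published tools: the function names are hypothetical, it assumes the images sit in one folder (possibly with subfolders), and it reads the pixel size straight from the JPEG frame header using only the Python standard library.

```python
import struct
from collections import Counter
from pathlib import Path

MIN_SIDE = 300  # smallest allowed value for the shorter dimension, in pixels


def jpeg_size(data):
    """Return (width, height) from a JPEG frame header, or None if not found."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None  # lost track of the segment structure
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker, skip it
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # markers without a length
            pos += 2
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        # SOF markers carry the frame size; C4, C8 and CC are not SOF markers
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            if pos + 9 > len(data):
                return None
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        pos += 2 + length
    return None


def check_images(folder):
    """Return a list of human-readable problems found under `folder`."""
    problems = []
    files = sorted(p for p in Path(folder).rglob("*") if p.is_file())
    name_counts = Counter(p.name.lower() for p in files)
    for path in files:
        if path.suffix != ".jpg":
            problems.append(f"{path}: extension is not .jpg")
            continue
        if name_counts[path.name.lower()] > 1:
            problems.append(f"{path}: file name is not unique")
        size = jpeg_size(path.read_bytes())
        if size is None:
            problems.append(f"{path}: not a readable JPEG")
        elif min(size) < MIN_SIDE:
            problems.append(f"{path}: smallest dimension is {min(size)} pixels")
    return problems
```

Calling `check_images("images")` on a folder named `images` (an assumed name) returns an empty list when every file passes. An image library would be a more robust way to read dimensions; the header parsing here only keeps the sketch free of dependencies.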

CSV Data File

In addition to the images, a data file should be provided that describes the relationships between the images and the artworks they represent.

  • Data should be formatted as a CSV (Comma-Separated Value) file (and have an extension of .csv).
  • CSV files can be easily generated from a spreadsheet in either Microsoft Excel or Google Spreadsheets. You can select the “Export as CSV file” option.
  • It’s ok to include extra data in the CSV file, as long as the first four columns are as specified below.

Each row of the file represents a single image. There should be four pieces of information specified on each row of the file:

  1. Source Name: The name of the institution. This should be a single word, lowercase. For example for the Frick Photoarchive it’s just “frick”.
  2. Artwork ID: An ID for the artwork that this image depicts. If your institution has only a single image for every artwork, simply specify the image ID here instead.
  3. Image ID: An ID representing the image itself. This ID should correspond directly to the name of the JPEG file that was sent in the image Zip file. For example, if the image file was named “3816.jpg” then this column should be “3816”.
  4. Artwork or Image URL: An accessible URL for researchers to learn more about the image, or artwork, in this row.
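
As an illustration of the four columns above, here is a small sketch that checks a CSV file before it is sent. It is not one of the published tools: `validate_rows` is a hypothetical helper, the `example.org` URL is a placeholder, and the sketch assumes the file has no header row and that URLs begin with `http://` or `https://`.

```python
import csv
import io


def validate_rows(csv_text, image_ids=None):
    """Check the first four columns of every row and return a list of problems.

    `image_ids` is an optional set of image file names without the .jpg
    extension; when given, each Image ID must appear in it.
    """
    problems = []
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) < 4:
            problems.append(f"row {line_no}: expected at least 4 columns, found {len(row)}")
            continue
        # Extra columns after the first four are allowed and ignored
        source, artwork_id, image_id, url = (cell.strip() for cell in row[:4])
        if not source or source != source.lower() or len(source.split()) != 1:
            problems.append(f"row {line_no}: source name should be one lowercase word")
        if not artwork_id:
            problems.append(f"row {line_no}: artwork ID is empty")
        if not image_id:
            problems.append(f"row {line_no}: image ID is empty")
        elif image_ids is not None and image_id not in image_ids:
            problems.append(f"row {line_no}: no image file named {image_id}.jpg")
        if not url.startswith(("http://", "https://")):
            problems.append(f"row {line_no}: URL should start with http:// or https://")
    return problems


# A row in the expected shape (placeholder URL, not a real record)
sample = "frick,3816,3816,https://example.org/artworks/3816\n"
print(validate_rows(sample, image_ids={"3816"}))
```

Passing the set of image file names as `image_ids` ties the two deliverables together, so a row whose Image ID has no matching JPEG is reported early.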

Examples of equivalent data generated for the Frick Photoarchive and Zeri Foundation are as follows:

Frick Photoarchive


Zeri Foundation
