Software Bertillonage: Finding the provenance of an entity
Julius Davies (1), Daniel M. German (1), Michael W. Godfrey (2), Abram Hindle (3)
(1) University of Victoria, (2) University of Waterloo, (3) UC Davis
Replication & Data
This site was built to help researchers replicate our 'Bertillonage' study of a proprietary e-commerce application. We also offer processed data for researchers who wish to see the results in more detail.
Click on the hyperlinks in the two tables to the right to see additional processed data. When you drill down, you will see two types of files for each jar.
The full replication package is available for download: 2011-bertillonage-replication.zip (29.2 MB).
This zip file contains 80 Java binary archives (jars) that are very similar to the 84 we found in a proprietary e-commerce application running inside a North American financial institution. We also include instructions for obtaining an 81st jar.
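After downloading the package, a quick sanity check is to list the jar entries it contains and compare the count with the 80 stated above. The following is a minimal sketch only: the class name `ReplicationZipCheck` and the method `listJarEntries` are hypothetical, and the internal directory layout of the zip is not assumed (every entry ending in `.jar` is counted, wherever it sits).

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of the replication package.
final class ReplicationZipCheck {

    // Returns the sorted names of all non-directory entries ending in ".jar".
    static List<String> listJarEntries(Path zipPath) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            return zip.stream()
                    .filter(entry -> !entry.isDirectory())
                    .map(ZipEntry::getName)
                    .filter(name -> name.toLowerCase(Locale.ROOT).endsWith(".jar"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Default file name taken from the download link on this page.
        Path zipPath = Paths.get(args.length > 0 ? args[0] : "2011-bertillonage-replication.zip");
        List<String> jars = listJarEntries(zipPath);
        jars.forEach(System.out::println);
        System.out.println(jars.size() + " jar entries found");
    }
}
```

If the count differs from 80, the download may be incomplete, or the jars may be packaged in a nested archive that this sketch does not open.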
To perform a full replication, we recommend mirroring the Maven 2 central repository. As of late 2010, the Maven 2 central repository required approximately 200 GB of disk space and took about 5 days to download. Instructions for mirroring can be found in our original paper.
About the data
We were unable to take the jars from our original study outside of the financial institution. For replication purposes, we have re-downloaded the closest possible match for each library. All of the jars in this replication package (summarized in the tables to the right) come from the original project sites (e.g. apache.org, sourceforge.net). We believe this creates a good approximation of the original artifacts that is also unencumbered by any legal or proprietary concerns.
Many of the jars (around 30%) were byte-for-byte identical to jars from the original study. This is an interesting observation in itself, since it shows that many libraries are never recompiled from source, even in Maven 2.
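One way to test whether two jars are byte-for-byte identical is to compare their sizes and then their cryptographic digests. The sketch below illustrates the idea under stated assumptions: the class `JarDigest` and its methods are hypothetical names, SHA-256 is chosen only for illustration, and this is not necessarily how the comparison in the original study was carried out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the replication package.
final class JarDigest {

    // Streams the file through SHA-256 and returns the digest as lowercase hex.
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Two files are treated as identical when their sizes and digests both match.
    static boolean sameBytes(Path a, Path b) throws IOException, NoSuchAlgorithmException {
        if (Files.size(a) != Files.size(b)) {
            return false; // different lengths can never be byte-identical
        }
        return sha256Hex(a).equals(sha256Hex(b));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java JarDigest.java <first.jar> <second.jar>");
            return;
        }
        boolean same = sameBytes(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(same ? "identical" : "different");
    }
}
```

Note that a digest mismatch says nothing about how similar two jars are: recompiling the same source, or merely repackaging it with new timestamps, typically changes the bytes of the archive.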