Tag Archives: alpha

A Timeline of Event Horizons

We’ve added a new experimental feature to the collections website. It’s an interactive visualization depicting when an object was produced and when that object was collected using some of the major milestones and individuals involved in the Cooper-Hewitt’s history itself as a bracketing device.

Specifically the years 1835 when Andrew Carnegie was born and 2014 when the museum will re-open after a major renovation to Carnegie’s New York City mansion where the collection is now housed. It’s not that Andrew Carnegie’s birth signals the beginning of time but rather it is the first of a series of events that shape the Cooper-Hewitt as we know it today.

The timeline’s goal is to visualize an individual object’s history relative to the velocity of major events that define the larger collection.

Many of those events overlap. The lives of Andrew Carnegie and the Hewitt Sisters all overlapped one another and they were all alive during the construction of Carnegie’s mansion and the creation of Hewitt Sister’s Cooper Union Museum for the Arts of Decoration. The life of the mansion overlaps the Cooper-Hewitt becoming part of the Smithsonian in 1976 and assuming the mantle of the National Design Museum in the mid-1990s.

Wherever possible we show both the start and end dates for an object represented as its own underlined event span. If we only know the start date for an object we indicate that using a blue arrow. The date that the object was acquired by the museum is indicated using a white arrow.

The soundtrack of histories that surround an object are depicted as a series of sequential and semi-transparent blocks layered one atop the other to try and reflect a density of proximate events. If you mouse over the label for an event it is highlighted, in orange, in the overall timeline.

We had three motivations in creating the timeline:

  • To continue to develop a visual language to represent the richness and the complexity of our collection. To create views that allows a person to understand the outline of a history and invite further investigation.
  • To start understanding the ways in which we need to expose the collection metadata so that it can play nicely with data visualization tools.
  • To get our feet wet with the D3 Javascript library which is currently the (friendly) 800-pound gorilla in the data visualization space. D3 is incredibly powerful but also a bit of a head-scratch to get started with so this is us, getting started.

This is only the first of many more visualizations to come and we are hoping to develop a series of building blocks and methodologies to allow to build more and more experimental features as quickly as we can think of them.

So head over to the experimental section of the collections website and enable the feature flag for the Object Timeline and have a play and let us know what you think!

We’ve also made the Github repository for the underlying Javascript library that powers the timeline public and released the code under a BSD license. It should be generic enough to work for any dataset that follows a similar pattern to ours and is not specific to a museum collection.

If you look under the hood you might be horrified at what you see. We made a very conscious decision, at this stage of things while we get to know D3, to focus more on the functionality of the timeline itself rather than the elegance of the code. This is a very early experiment and we would be grateful for bug fixes and suggestions for how to make it better.

Default Sort, or what would Shannon do?

Claude Shannon

Up until recently our collections website displayed search results ordered by, well, nothing in particular. This wasn’t necessarily by design, we just didn’t have any idea of how we should sort the results. We tossed around the idea of sorting things by date or alphabet, but this seemed kind of arbitrary. And as search results get more complex, ‘keyword frequency’ isn’t necessarily equivalent to ‘relevance’.

We added the ability on our ‘fancy search‘ to sort by all of these things, but we still needed a way to present search results by default.

Enter Claude Shannon and a bit of high level math (or ‘maths’ if, like some of the team, are from this thing called ‘the Commonwealth’).

Claude Shannon was a pretty smart guy and back in 1948 in a paper titled “A Mathematical Theory of Communication” he presented the idea of Entropy, or information theory. The concept is actually rather simple, and relies on a quick analysis of a dataset to discover the probability of different parts of data within the set.

For images you can think about it by looking at a histogram and thinking of the height of each bar in the histogram as a representation of the probability that a particular pixel value will be present. With this in mind you can get a sense of how “complex” an image is. Images with really flat histograms ( lots of pixel values present lots of times ) will have a very high Entropy, where as images with severely spiked histograms ( all black or all white for example ) will have a very low Entropy.

In other words, images with more fine detail have a higher Entropy and are more complicated to express, and usually take up more room on disk when compressed.

Sidewall, a-w-793, 193040

Think of an image of a wallpaper pattern like this one. It has a really high Entropy value because within the image there is lots of fine detail and texture. If we look at the histogram for the image we can see that there are lots of pixel values represented pretty evenly across the graph with a few spikes in the middle most likely representing the overall palette of the image.

Screen Shot 2013-06-21 at 10.24.25 AM

On the other hand, check out this image of a pretty smooth vase on a white background. The histogram for this image is less evenly distributed, leaning towards the right of the graph and thus has a much lower Entropy value.

Vase, 2010-6-3, 188388

Screen Shot 2013-06-21 at 10.36.42 AMWe thought it might be interesting to sort all of the images in our collection by Entropy, displaying the more complex and finer detailed images first, so I built a simple python script that takes an image as input and returns its “Shannon Entropy” as a float.

To chew through the entire collection we built this into a simple “httpony” and built a background task to run through every image in our collection and add its Shannon Entropy as a value in the collection database. We then indexed these values in Solr and added the option to sort by “image complexity” in our Fancy Search page.

Screen Shot 2013-06-21 at 10.41.49 AM

Sorting by Shannon Entropy is kind of interesting, and we noticed right away that a small byproduct of this process is that objects that simply dont have an image wind up at the end of the sort. In the end we liked the search results so much that we made “image complexity” the default sort across the entire website. You can always go into Fancy Search and change the sort criteria to your liking, but we thought image complexity seemed to be a pretty good place to start.

But what is the relationship between Claude Shannon and Shannen Doherty? Well, it looks like Shannen, herself, has a very high Shannon Entropy…



Screen Shot 2013-06-21 at 10.58.46 AM

And another award!

MUSE award-1024

This time we picked up a Gold award from the American Association of Museum’s Media and Technology MUSE awards. We won in the ‘APIs and applications’ category against some stiff competition from some very polished tablet and mobile apps. The category rewards “digital presentations, applications, and mashups that utilize existing data and online resources to transform content into new meaningful tools or experiences.”

Once again it is nice to see recognition, this time from the broader museum sector, for the value of ‘public alpha’ releases.

We won an award


The annual international gathering that is Museums and the Web has just passed and this year we were lucky enough to win one of the Best of the Web Awards in the Research/Collections category.

We are especially proud of this award because it represents critical evaluation by our peers. And we love that they called out its tone, experimental nature, and its early alpha release. These are exactly the qualities that we believe offer the most to others in the field – something that shiny, polished, and ‘finished’ projects often don’t. What we are doing can (and perhaps, should) be copied by others.

We dedicate the award to Bill Moggridge and we’d like to particularly thank the generosity of curatorial and registration staff in letting us experiment to try re-inventing the collections online paradigm – a task that is far from over.

Congratulations to all the other winners – it is nice to be in such great company!

tms-tools == this is a blog post about code

icebergs are kind of like giant underwater unicorns when you think about it


This is a blog post about code. Which means it’s really a blog post about data.

tms-tools is a suite of libraries and scripts to extract data from TMS as CSV files. Each database table is dumped as a separate CSV file. That’s it really.

It’s a blog post about data. Which means it’s really a blog post about control. It’s a blog post about preserving a measure of control over your own data.

At the end of it all TMS is a MS-SQL database and, in 2013, it still feels like an epic struggle just to get the raw data out of TMS so that single task is principally what these tools deal with.

tms-tools is the name we gave to the first set of scripts and libraries we wrote when we undertook to rebuild the collections website in the summer of 2012. The first step in that journey was creating a read-only clone of the collections database.

Quite a lot of this functionality can be accomplished from the TMS or MS-SQL applications themselves but that involves running a Windows machine and pressing a lot of buttons. This code is designed to be part of an otherwise automated system for working with your data.

TMS will remain the ultimate source of truth for our collection metadata but for us TMS didn’t turn out to be the best choice for developing and managing the public face of that data. The code in the tms-tools repository is meant to act as a bridge between those two different needs.

There is no attempt to interpret the data or the reconcile the twisty maze of relationships between the many tables in TMS. That is left as an exercise to the reader. This is not a one-button magic pony. This is code that works for us today. It has issues. If you choose to use it you will probably discover new issues. Yay, adventure!

We’re making the tms-tools code available today on Github, released under a BSD license.

We are making this code available because we know many others in our community face similar challenges. Maybe the work we’ve done so far can help others and going forward we can try to make things a little better, together.