Category Archives: Collection data

A colophon for bias

The term [colophon] derives from tablet inscriptions appended by a scribe to the end of a … text such as a chapter, book, manuscript, or record. In the ancient Near East, scribes typically recorded information on clay tablets. The colophon usually contained facts relative to the text such as associated person(s) (e.g., the scribe, owner, or commissioner of the tablet), literary contents (e.g., a title, “catch” phrase, number of lines), and occasion or purpose of writing.

Wikipedia

A couple of months ago we added the ability to search the collections website by color using more than one palette. A brief refresher: Our search by color functionality works by first extracting the dominant palette for an index. That means the top 5 colors out of a possible 32 million choices. 32 million is too large a surface area to search against so each of the five results are then “snapped” to their closest match on a much smaller grid of possible colors. These matches are then indexed and used to query our database when someone searches for objects matching a specific color.

It turns out that the CSS3 color palette which defines a fixed set of 138 colors is an excellent choice for doing this sort of thing. CSS is the acronym for Cascading Style Sheets (CSS) which is a “language used to describe the presentation” of a webpage separate from its content. Instead of asking people searching the collections website to be hyper-specific in their queries we take the color they are searching for and look for the nearest match in the CSS palette.

For example: #ef0403 becomes #ff0000 or “red”. #f2e463 becomes #f0e68c or “khaki” and so on.

This approach allows us to not only return matches for a specific color but also to show objects that are more like a color than not. It’s a nice way to demonstrate the breadth of the collection and also an invitation to pair objects that might never be seen together.

search-is-over.020-640

From the beginning we’ve always planned to support multiple color palettes. Since the initial search-by-color functionality was built in a hurry with a focus on seeing whether we could get it to work at all adding support for multiple palettes was always going to require some re-jiggering of the original code. Which of course means that finding the time to make those changes had to compete with the crush of everything else and on most days it got left behind.

Earlier this year Rebecca Alison Meyer the 6-year old daughter of Eric Meyer, a long-standing member of the CSS community, died of cancer. Eric’s contributions and work to promote the CSS standard can not be overstated. The web would be an entirely other (an entirely poorer) space without his efforts and so some people suggested that a 139th color be added to the CSS Color module to recognize his work and honor his daughter. In June Dominique Hazaël-Massieux wrote:

I’m not sure about how one goes adding names to CSS colors, and what the specific purpose they fulfill, but I think it would be a good recognition of @meyerweb ‘s impact on CSS, and a way to recognize that standardization is first and foremost a social process, to name #663399 color “Becca Purple”.

In reply Eric Meyer wrote:

I have been made aware of the proposal to add the named color beccapurple (equivalent to #663399) to the CSS specification, and also of the debate that surrounds it.

I understand the arguments both for and against the proposal, but obviously I am too close to both the subject and the situation to be able to judge for myself. Accordingly, I let the editors of the Colors specification know that I will accept whatever the Working Group decides on this issue, pro or con. The WG is debating the matter now.

I did set one condition: that if the proposal is accepted, the official name be rebeccapurple. A couple of weeks before she died, Rebecca informed us that she was about to be a big girl of six years old, and Becca was a baby name. Once she turned six, she wanted everyone (not just me) to call her Rebecca, not Becca.

She made it to six. For almost twelve hours, she was six. So Rebecca it is and must be.

Shortly after that #663399 or rebeccapurple was added to the CSS4 Colors module specification. At which point it only seemed right to finally add support for multiple color palettes to the collections website.

20140818-rebeccapurple-sm

Over the course of a month or so, in the margins of day, all of the search-by-color code was rewritten to work with more than a single palette and now you can search the collection for objects in the shade of rebeccapurple.

In addition to the CSS3 and CSS4 color palettes we also added support for the Crayola color palette. For example, the closest color to “rebeccapurple” in the Crayola scheme of things is “cyber grape”.

You can see all the possible nearest-colors for an object by appending /colors to an object page URL. For example:

https://collection.cooperhewitt.org/objects/18380795/colors

The dominant color for this object is #683e7e which maps to #58427c or “cyber grape” in Crayola-speak and #483d8b or “dark slate blue” in CSS3-speak and #663399 or “rebeccapurple” in CSS4-speak.

Now that we’ve done the work to support multiple palettes the only limits to adding more is time and imagination. I would like to add a greyscale palette. I would like to add one or more color-blind palettes. I would especially like to add a “blue” palette – one that spans non-photo blue through International Klein Blue all the way to Kind of Bloop midnight blue just to see where along that spectrum objects which aren’t even a little bit blue would fall.

Screen Shot 2014-10-26 at 12.42.02 PM

The point being that there are any number of color palettes that we can devise and use as a lens through which to see our collection. Part of the reason we chose to include the Crayola color palette in version “2” of search-by-color is because the colors they’ve chosen have been given expressive names whose meaning is richer than the sum of their descriptive parts. What does it mean for an object’s colors to be described as macaroni and cheese-ish or outer space-ish in nature? Erika Hall’s 2007 talk Copy is Interface is an excellent discussion of this idea.

I spoke about some of these things last month at the The Search is Over workshop, in London. I described the work we have done on the collections website, to date, as a kind of managing of absence. Specifically the absence of metadata and ways to compensate for its lack or incompleteness while still providing a meaningful catalog and resource.

It is through this work that we started to articulate the idea that: The value of the whole in aggregate, for all its flaws, outweighs the value of a perfect subset. The irregular nature of our collection metadata has also forced us to consider that even if there were a single unified interface to convey the complexities of our collection it is not a luxury we will enjoy any time soon.

search-is-over.023-640

Further the efforts of more and more institutions (the Cooper Hewitt included) to embark on mass-digitization projects forces an issue that we, as a sector, have been able to side-step until now: That no one, including lots of people who actually work at museums, have ever seen much of the work in our collections. So in relatively short-order we will transition from a space defined by an absence of data to one defined by a surfeit of, at the very lest, photographic evidence that no one will know how to navigate.

To be clear: This is a good problem to have but it does mean that we will need to starting thinking about models to recognize the shape of the proverbial elephant in the room and building tools to see it.

It is in those tools that another equally important challenge lies. The scale and the volume of the mass-digitization projects being undertaken means that out of necessity any kind of first-pass cataloging of that data will be done by machines. There simply isn’t the time (read: money) to allow things to be cataloged by human hands and so we will inevitably defer to the opinion of computer algorithms.

This is not necessary as dour a prediction as it might sound. Color search is an example of this scenario and so far it’s worked out pretty well for us. What search-by-color and other algorithmic cataloging points to is the need to develop an iconography, or a colophon, to indicate machine bias. To design and create language and conventions that convey the properties of the “extruder” that a dataset has been shaped by.

search-is-over.033-640

Those conventions don’t really exist yet. Bracketing search by color with an identifiable palette (a bias) is one stab at the problem but there are so many more places where we will need to signal the meaning (the subtext?) of an automated decision. We’ve tried to address one facet of this problem with the different graphic elements we use to indiciate the reasons why an object may not have an image.

missing-nnot-available-n

no-photography-n



Left to right: We’re supposed to have a picture for this object… but we can’t find it; This object has not been photographed; This object has been photographed but for some reason we’re not allowed to show it to you… you know, even though it’s been acquired by the Smithsonian.

Another obvious and (maybe?) easy place to try out this idea is search itself. Search engines are not, in fact, magic. Most search engines work the same way: A given string is “tokenized” and then each resultant piece is “filtered”. For the example the phrase “checkered Girard samples” might typically be tokenized by splitting things on whitespace but you could just as easily tokenize it by any pattern that can be expressed to a computer. So depending on your tokenized you might end up with a list like:

  • checkered
  • Girard
  • samples

Or:

  • checkered Girard
  • samples

Each one of those “tokens” are then analyzed and filtered according to their properties. Maybe they get grouped by their phonetics, which is essentially how the snap-to-grid trick works for the collection’s color search. Maybe they are grouped by what type of word they are: proper nouns, verbs, prepositions and so on. I’ve never actually seen a search engine that does this but there is nothing technically to prevent someone from doing it either.

The simplest and dumbest thing would be to indicate on a search results page that your query results were generated using one or more tokenizers or filters. In our case that would be (1) tokenizer and (5) filters.

Tokenizers:

    1. Unicode Standard Annex #29

Filters:

      1. Remove English possessives
      2. Lowercase all tokens
      3. Ingore a set list of stopwords
      4. Stem tokens according to the Porter Stemming Algorithm
      5. Convert non-ascii characters to ascii

That’s not very sexy or ooh-shiny but not everything needs to be. What it does, though, is provide a measure of transparency for people to gauge the reality that any result set is the product of choices which may have little or no relationship to the question being asked or the person asking that question.

These are devices, for sure, and they are not meant to replace a more considered understanding or contemplation of a topic but they can act as an important shorthand to indicate the arc of an answer’s motive.

search-is-over.038-640

And that’s just for search engines. Now imagine what happens when we all start pointing computer vision algorithms at our collections…


Update: Since publishing this blog post the nice people working on the GOV.UK websites launched “info” pages. Visitors can now append /info to any of the pages on the gov.uk website will and see what and who and how that part of the website is supposed to do. Writing about the project they say:

An ‘info’ page contains the user needs the page is intended to meet … Providing an easy way to jump from content to the underpinning needs allows content designers coming to a new topic to understand the need and build empathy with the users quicker. Publishing the GOV.UK user needs should also make the team’s work more transparent and traceable.

Bravo!

Video Capture for Collection Objects

Stepping inside a museum storage facility is a cool experience. Your usual gallery ambience (dramatic lighting, luxurious swaths of empty space, tidy labels that confidently explain all) is completely reversed. Fluorescent lights are overhead, keycode entry pads protect every door, and official ID badges are worn by every person you see. It’s like a hospital, but instead of patients there are 17th century nightgowns and Art Deco candelabras. Nestled into tiny, sterile beds of acid-free tissue paper and archival linen, the patients are occasionally woken and gently wheeled around for a state-of-the-art microscope scan, an elaborate chemical test, or a loving set of sutures.

A gloved, cardigan-ed museum worker pushing a rolling cart down a hallway of large white shelving units.

A rare peek inside the storage facility.

If you ask a staff member for an explanation of this or that object on the nearest cart or shelf, they might tell you a detailed story, or they might say that so far, not much is known. I like the element of unevenness in our knowledge, it’s very different from the uniform level of confidence one sees in a typical exhibition.

The web makes it possible to open this space to the public in all its unpolished glory – and many other museums have made significant inroads into new audiences by pulling back the curtain. The prospect is like catnip for the intellectually curious, but hemlock for most museum employees.

Typically, the only form of media that escapes this secretive storage facility are hi-res TIFFs artfully shot in an on-site photography studio. The seamless white backdrop and perfectly staged lighting, while beautiful and ideal for documentation, completely belie the working lab environment in which they were made.

We just launched a new video project called “Collections in Motion.” The idea is super simple: short videos that demonstrate collections objects that move, flip, click, fold, or have any moveable part.

Here are some of the underlying thoughts framing the project:

  • Still images don’t suffice for some objects. Many of them have moving parts, make sounds, have a sense of weight, etc that can’t be conveyed through images.
  • Our museum’s most popular videos on YouTube are all kinetic, kinda entrancing, moving objects. (Contour Craft 3D Printing, A Folding Bicycle, and a Pop-up Book, for example).
  • Videos played in the gallery generally don’t have sound or speakers available.
  • In research interviews with various types of visitors, many people said that they wouldn’t be interested in watching a long, involved video in a museum context.
  • Animated GIFs, 6-second Vines, and 15-second Instagram videos loom large in our contemporary visual/communication culture.
  • How might we think of the media we produce (videos, images, etc) as a part of an iterative process that we can learn from over time? Can we get comfortable with a lower quality but higher number of videos going out to the public, and seeing what sticks (through likes, comments, viewcount, etc)?

 

A screenshot from YouTube Analytics showing most popular videos: Contour Crafting, Folding Bicycle, Puss in Boots Pop-up book, et cetera

Our most popular YouTube videos for this quarter. They are all somewhat mesmerizing/cabinet-of-curiosity type things.

Here are some of the constraints on the project:

  • No budget (pairs nicely with the preceding bullet).
  • Moving collections objects is a conservation no-no. Every human touch, vibration and rub is bad for the long-long-longevity of the object (and not to mention the peace of mind of our conservators).
  • Conservators’ and curators’ time is in HIGH demand, especially as we get closer to our re-opening. They are busy writing new books, crafting wall labels, preparing gallery displays, etc. Finding a few hours to pull an object from storage and move it around on camera is a big challenge.

So, nerd world, what do you think?

Dataclimber explores colors in the Cooper Hewitt collection

Rubén Abad's #museumselfie outside of a museum

Rubén Abad’s #museumselfie outside of a museum

A few weeks ago we became aware of Rubén Abad’s poster which shows all the colours in our collection by decade. We sent a few questions over to Spain to find out more . . .

Q: What were some of the precursors to the color poster? What inspired you?

A: The idea came when I first saw Lev Manovich’s ‘Software Takes Command‘ book cover. When I started looking at the data, another couple of paintings came to my mind. For example, Salvador Dalí’s series about visual perception and ‘pixels’, as in Homage to Rothko (The Dalí Museum). By chance, I attended an exhibition here in Madrid where I discovered ‘Study for Index: Map of the World‘, by Art & Language (MACBA). By the time I came back home, it was clear that I wanted to display color evolution over time using a mosaic.

Q: Did you have any expectation about what the final product would look like? Did the end result surprise you?

A: I didn’t have any preconceived notion. I liked to see how groups of pieces appeared.

Q: What were the challenges of working with the dataset? What were the holes, problems? How could we make it better/easier to work with?

A: Being used to work with data made really easy for me to work with the collection’s dataset, so thanks for releasing it! The only complain I might have is having to parse some fields, like medium, to be able to store the information in a more comfortable format to be queried.

Q: What would you like to do next?

A: I have a network of people and objects in mind, in order to display who has the biggest ‘influence’ in the collection.

Q: If other museums made their data available like this, what might you do with it?

A: I’d like to work on a history of the object project. If we were able to access all the dates and places importants in the object history, we could try to cross all the objects info and maybe, it’s never known, find new hubs where pieces happened to be at the same time and why they were there. Another interesting project would be to find gender inequality among collections, not only when looking at artists/designers, but also with donors and funders and even among representations (iconography). Have this roles changed over the years? Are different depending on countries?

Dataclimber's color poster.

Dataclimber’s color poster.

Welcome to object phone. Your call has been placed in a queue.

I made another small thing. Again, another way for me to experiment with the Collection API, and again, another way to experiment with new ways of accessing the collection. This time, there aren’t many screen shots to display–there is no website to look at. This time, it’s “Welcome to object phone!”

(718) 213-4915

Object Phone” is ( presently ) a very, very simple implementation of a way to explore our collection by dialing a telephone, or sending a text message. I had been thinking of a few of the more popular museum oriented audio tour products, and how they all seem to be very CMS style in their design, and wondering if we could just use our own API.

For example, TourML and TAP ( which offer the web programmer a very powerful framework for programming a mobile guide using the Drupal CMS ) are very nice, but they are still very dependent on content production. The developer or content manager has to build and curate all of the content for the “tour.” This might be a good way to go about things, especially if you are leaning on an existing Drupal installation for a good deal of your content, but I was looking for a way to access existing data, and specifically the data in our collection website.

In the beginning of developing our collection website, we went through the process of assigning EVERYTHING a unique “bigint” in the form of what we are referring to as an “artisinal integer.” This means that each object record, each person record and each, well, everything else has a unique integer which no other thing can have. This is not in place of accession numbers–we will probably always have accession numbers The nice thing about unique integers is that they’re really easy to deal with on a programmatic level.

For example, if you text 18704235 to 718-213-4915 you should get a response that looks like the screenshot below. In fact you can text any object id number from our collection and get a similar response.

2013-04-18 10.15.18

You can also dial that same number and use your keypad to either search the collection by object ID, or ask for a random object. The application will respond to you using a text to speech converter, which is usually pretty good.

Presently, the app is not replying with a whole lot of information. You essentially get the object’s title and medium field if it has one. In many cases, asking for a random object may just result in something like “Drawing.” Many of our object records don’t have much more useful information than this, and also, I am trying to wrangle with the idea of how much information is useful in a voice and text message ( with a 160 character limit per SMS).

The whole system is leveraging the Twilio service and API. Twilio offers quite a range of possibilities, and I am very excited to experiment with more. For example, instead of text to speech, Twilio can play back .wav files. Additionally, Twilio can do things like dial another phone number, forward calls and record the caller’s voice. There are so many possibilities here that I wont even begin to list them, but for example, I could easily see us using this to capture user feedback in our galleries by phone and text.

I’m very interested in figuring out a way to search by voice. I’m sort of dreaming of programming the thing to go “Why don’t you just tell me the object number!” as in this great episode of Seinfeld which you can watch by clicking the image below.

Screen Shot 2013-04-18 at 10.35.01 AMIf you are interested, I have also made the code public on this Gist. It’s pretty messy and redundant right now, but you’ll get the idea.

One of the more complicated aspects of this project will be designing the phone interface so it makes sense. Currently, once you hear an object play back, the system just hangs up on you. It would be nice to offer the user a better way to manipulate the system which is still pleasant and easy to understand. By that same token, there is a completely different approach that is needed for the SMS end of things as you don’t really have a menu tree, but instead of list of possible commands the user need to learn. Fortunately, there is a ton of great work that has already been accomplished in this arena, specifically by the Walker Art Center’s very long running and very yellow website Art on Call.

Source code at github.com/cooperhewitt/objectphone

“cmd-P”

I made us a print stylesheet for object pages on the collections website. (What does that mean? It means you can print out the webpage and it will look nice).

Printout of Object #18621871 before stylesheet

Printout of Object #18621871.. before stylesheet.

Printout of Object #18621871 after stylesheet. Much better.

Printout of Object #18621871 after stylesheet. Much better. Office carpet courtesy of Tandus flooring.

This should be very useful for us in-house, especially curators and education.. and anyone doing exhibition planning.. (which right now is many of us).

It’s not very fancy or anything. Basically I just stripped away all the extraneous information and got right to the essential details, kind of like designing for mobile.

six printouts on standard paper from the collections website, taped in two rows to an iMac screen.

cascading style sheet is cascading.

In a moment of caffeinated Friday goofiness, Aaron printed out a bunch of weird objects he found (e.g. iPad described for aliens as “rectangular tablet computer with rounded corners”) and Scotch taped them all over Seb’s computer screen as a nice decorative touch for his return the next morning.

What we realized in looking at all the printouts, though, is that the simplified view of a collection record resembles a gallery wall label. And we’re currently knee-deep in the wall label discussion here at the Museum as we re-design the galleries (what does it need? what doesn’t it need? what can it do? how can it delight? how can it inform?).

I don’t yet have any conclusions to draw from that observation.. other than it’s a good frame to talk about our content and its presentation.

..to be continued!