Author Archives: asc

Exporting your visits

Screen Shot 2015-05-12 at 7.03.10 PM

Starting today you can export the items you have collected or created during your visits to the museum. When you export a visit we will bundle up all the objects you’ve collected and all the items you’ve created into a static website that is then compressed and made available for you to download directly.

A static website means that you can view all of your visit items in any old web browser, even when it’s not connected to the Internet. It means that if you have your own website you can copy your visit export over it and host it and share it and, well… do whatever you want with it.

Where “whatever you want” means “so long as you comply with the Smithsonian Terms of Use, or assert your rights under Fair Use if you are based in the US”.

We think that this is of particular importance to educators who may not have unfiltered or functional internet connections in their classrooms.

Screen Shot 2015-05-13 at 2.44.53 PM

A visit export doesn’t have all the same bells and whistles that your visit on the Cooper Hewitt collections website does but everything you need to view an export (except a web browser obviously) is contained in the file you download. There is a landing page, and a paginated view of everything you’ve done and a page for every object collected and each one of your creations.

Visit exports also come with a friendly and detailed JSON file for every item you’ve collected or created. If you don’t know what that last sentence means, don’t worry about it. It just means that everything you’ve done during a visit also has a file containing structured metadata about that activity which your developer friends may get excited about.
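For the developer friends, here is a minimal Python sketch of reading those JSON files back in. It assumes the export has been unzipped to a local folder and that the item files end in .json. The folder name is hypothetical, and because the fields inside each file are not described in this post the code only loads each file and lists its top-level keys.

```python
import json
from pathlib import Path


def load_export_items(export_dir):
    """Load every JSON file found under an unzipped visit export.

    Returns a dict mapping each file's path (relative to export_dir,
    written with forward slashes) to its parsed contents.
    """
    root = Path(export_dir)
    items = {}
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            items[path.relative_to(root).as_posix()] = json.load(fh)
    return items


if __name__ == "__main__":
    # "my-visit-export" is a hypothetical folder name.
    for name, data in load_export_items("my-visit-export").items():
        keys = sorted(data) if isinstance(data, dict) else type(data).__name__
        print(name, keys)
```

Nothing here depends on the collections website being reachable, which is rather the point of a static export.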

Screen Shot 2015-05-12 at 7.03.29 PM

Visit exports use our very own js-cooperhewitt-images library to manage square-cropped thumbnails that reveal the complete thumbnail when you mouse over them, just like on the collections website.

Screen Shot 2015-05-12 at 5.34.16 PM

Images for loan objects are not included with your visit download. That’s because they’re loan objects and we only have permission to host those images from our own collections website. Instead of including the images locally in your visit download every time there is a loan object we link directly to the image hosted on our own website.

If you’re not online (or your web browser hasn’t already cached a copy of the image on your hard drive) then your visit pages are smart enough to load a placeholder image for that object. Like this:

Screen Shot 2015-05-13 at 2.43.08 PM

We do the same for individual item pages too:

Screen Shot 2015-05-13 at 2.44.08 PM

online
Screen Shot 2015-05-13 at 2.43.41 PM

offline

Visit exports are deliberately minimal. They contain a small amount of HTML markup that’s been enhanced with a little bit of JavaScript and CSS to create a minimally elegant export that people can easily tailor to their own needs. Some people may quibble with the idea that including both the jQuery and Bootstrap libraries really counts as a “little bit of JavaScript and CSS” but we hope that we have done things in such a way that it’s easy for people to change if they choose to.

Visit exports are currently only available for visits that have been “paired” with your Cooper Hewitt account. A visit that has been exported is cached on our servers but it can be regenerated when something about your visit changes – you delete an item, or add a note and so on – not more than once per day. Each one of your visits (remember: each one of your paired visits) has a handy export button at the bottom of each page and you can see a list of all your exported/exportable visits by going to: https://collection.cooperhewitt.org/you/visits/exports/

Screen Shot 2015-05-13 at 11.05.26 AM

The exports themselves are generated using our own API and the recently released cooperhewitt.visit and cooperhewitt.visit.items family of methods. There is a bunch of bespoke code that we’ve written to manage how exports are scheduled and stored but the part that actually builds your export is a plain-vanilla API application using the same public API methods that you might use to generate your own visit export.

Screen Shot 2015-05-19 at 2.19.04 PM

In time we may open source the API application we’ve written but for now we’re going to keep putting it through its paces to make sure that it works consistently, as expected, and to force ourselves to use the same tools we’re making available to people outside the “hula hoop”.

Screen Shot 2015-05-12 at 5.53.41 PM

Finally, a little bit of administrivia: Your visit exports are made available under the Smithsonian Terms of Use agreement. You can read the entire document but the short (and relevant) bits are:

The Smithsonian Institution (the “Smithsonian”) provides the content on this website (www.si.edu), other Smithsonian websites, and third-party sites on which it maintains a presence (“SI Websites”) in support of its mission for the “increase and diffusion of knowledge.” The Smithsonian invites you to use its online content for personal, educational and other non-commercial purposes; this means that you are welcome to make fair use of the Content as defined by copyright law. Information on United States copyright fair use law is available from the United States Copyright Office. Please note that you are responsible for determining whether your use is fair and for responding to any claims that may arise from your use.

In addition, the Smithsonian allows personal, educational, and other non-commercial uses of the Content on the following terms:

You must cite the author and source of the Content as you would material from any printed work.

You must also cite and link to, when possible, the SI Website as the source of the Content.

You may not remove any copyright, trademark, or other proprietary notices including attribution information, credits, and notices, that are placed in or near the text, images, or data.

In addition to copyright, you must comply with all other terms or restrictions (such as trademark, publicity and privacy rights, or contractual restrictions) as may be specified in the metadata or as may otherwise apply to the Content. Please note that you are responsible for making sure that your use does not violate or infringe upon the rights of anyone else.

Screen Shot 2015-05-18 at 3.59.53 PM

Enjoy!

Publishing is as publishing does – revealing ‘books’ in the collection

Screen Shot 2015-04-23 at 10.48.30 AM

Note: This book is actually 144 pages long and the count is a by-product of the way we’ve stitched things together. By the time you read this that problem may be fixed. So it goes, right?

We’ve added a new section to the Collections website: publications. You know, books.

This is the simplest dumbest thing we could think of to create a bridge between analog publications and the web. It’s only a handful of recent publications at the moment and whether or not older publications will be supported remains an open question, for now.

To be clear – there are already historical publications available for viewing on the main Cooper Hewitt website. As I was writing this blog post Micah reminded me that we’ve even uploaded them to the Internet Archive so you can use their handy book reader to view the books online. All of which means that we’ll likely be importing those publications to the collections website soon enough.

All of this (newer) work is predicated on the fact that we have the luxury, with these specific publications, of operating outside the “work” versus “edition” dilemma that many other kinds of books have to negotiate. All we’ve done is create stable permanent URLs for each book and each page in that book. That’s it.

Screen Shot 2015-04-23 at 11.28.22 AM

The goal is not to reproduce the book online, for all the usual reasons, but to give meaningful atomic units of a book – pages – a presence on the Interwebs and a scaffolding for future stuff (object lists, additional photographs, notes and other ancillary materials and so on) as time and circumstance permit.

Related, Emily Fildes’ and Allison Foster’s Museums and the Web (2015) paper, What the Fonds?! The ups and downs of digitising Tate’s Archive, is a good discussion of the issues, both technical and user-facing, that are raised as various sources of disparate data (artworks, library and archive data, curatorial files) all start to share the same conceptual space on the web.

Screen Shot 2015-04-23 at 10.47.28 AM

We’re not there yet and it may take us a while to get there so in the meantime every page URL has a small half-toned reproduction of the book page in question. That’s meant to give people a visual cue and confidence in the URL itself — specifically they look the same — such that you might bookmark it, share it with a friend, or whatever awesome use you dream up without having to wonder whether the ground will shift out from underneath it.

Kind of like books, right?

Finally, all the links indicating how many pages a particular book has are “magic” – click on them and you’ll be redirected to a random page inside that book.

Enjoy!

Screen Shot 2015-04-23 at 12.29.48 PM
Screen Shot 2015-04-23 at 12.30.14 PM

We choose Bao Bao!

So, the Pen went live on March 10. We’re handing them out to every visitor and people are collecting objects all over the place. Yay!

The Pen not only represents a whole world of brand-new for the museum but an equally enormous world of change for staff and the ways they do their jobs. One of the places this has manifested itself is the sort of awkward reality of being able to collect an object in the galleries only to discover that the image for that object or, sometimes, the object itself still hasn’t been marked as public in the collections database.

It’s unfortunate but we’ll sort it all out over time. The more important question right now is how we handle objects that people have collected in the galleries (that are demonstrably public) but whose ground truth hasn’t bubbled back up to our own canonical source of truth.

In the early days when we were building and testing the API methods for recording the objects that people collected the site would return a freak-out-and-die error the moment it encountered something that a visitor didn’t have permissions to see. This is a pretty normal approach in software and systems development but it made testing of the overall system complicated and time-consuming.

In the interest of expediency we replaced the code that threw a temper tantrum with code that effectively said la la la la la… I can’t hear you! If a visitor tried to collect something that they didn’t have permissions to see we would simply drop it on the floor and pretend it never happened. This was useful in fleshing out the rest of the overall workflow of the system but we also understood that it was temporary at best.

Screen Shot 2015-03-20 at 12.07.07 PM

Allowing a user to collect something in the gallery and then denying any evidence of the event on their visit webpage would be… not good. So now we record the item being collected but we also record a status flag next to that event assuming that the disconnect between reality and the database will work itself out in favour of the visitor.

It also means that the act of collecting an object still has a permalink; something that a visitor can share or just hold on to for future reference even if the record itself is incomplete. And that record exists in the context of the visit itself. If you can see the other objects that you collected around the same time as a not-quite-public-yet object then they can act as a device to remember what that mystery thing is.

Which raises an important question: What should we use as a placeholder? Until a couple of days ago this is what we showed visitors.

streetview-cat-words

Although the “Google Street View Cat” has a rich pedigree of internet meme-iness it remains something of an acquired taste. This was a case of early debugging and blowing-off-steam code leaking into production. It was also the result of a bug ticket that I filed for Sam on January 21 being far enough down the list of things to do before and immediately after the launch of the Pen that it didn’t get resolved until this week. The ticket was simply titled “Animated pandas”.

As in, this:

This is the same thread that we’ve been pulling on ever since we started rebuilding the collections website: When we are unable to show something to a visitor (for whatever reason) what do we replace the silence with?

We choose Bao Bao!

API methods (new and old) to reflect reality

design-eagle-cloud

A quick end-of-week blog post to mention that now that the museum has re-opened we have updated the cooperhewitt.galleries.openingHours and cooperhewitt.galleries.isOpen API methods to reflect… well, reality.

In addition to the cooperhewitt.galleries API methods we’ve also published corresponding openingHours and isOpen methods for the cafe!

For example, cooperhewitt.galleries.isOpen.

curl 'https://api.collection.cooperhewitt.org/rest/?method=cooperhewitt.galleries.isOpen&access_token=***'

{
	"open": 0,
	"holiday": 0,
	"hours": {
		"open": "10:00",
		"close": "18:00"
	},
	"time": "18:01",
	"timezone": "America/New_York",
	"stat": "ok"
}
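If you would rather do something with that answer than read it, here is a minimal Python sketch. The endpoint, method name and response fields are the ones in the curl example; the helper names are made up for illustration, and reading "hours" as that day's opening hours is an assumption. No request is actually sent, so the sample response is pasted in as a string.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://api.collection.cooperhewitt.org/rest/"


def build_request_url(method, access_token):
    """Build the GET URL for an API method, as in the curl example."""
    query = urlencode({"method": method, "access_token": access_token})
    return API_ENDPOINT + "?" + query


def describe_is_open(rsp):
    """Turn an isOpen response (a dict) into a sentence."""
    if rsp.get("stat") != "ok":
        return "The API did not return a usable answer."
    hours = rsp["hours"]
    state = "open" if rsp["open"] else "closed"
    return "The galleries are {} at {} ({}); today's hours are {} to {}.".format(
        state, rsp["time"], rsp["timezone"], hours["open"], hours["close"])


# The sample response from the curl call, pasted in as a string.
sample = json.loads("""
{"open": 0, "holiday": 0, "hours": {"open": "10:00", "close": "18:00"},
 "time": "18:01", "timezone": "America/New_York", "stat": "ok"}
""")

print(build_request_url("cooperhewitt.galleries.isOpen", "YOUR-ACCESS-TOKEN"))
print(describe_is_open(sample))
```

One minute past closing time, as it happens.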

Or, cooperhewitt.cafe.openingHours.

curl -X GET 'https://api.collection.cooperhewitt.org/rest/?method=cooperhewitt.cafe.openingHours&access_token=***'

{
	"hours": {
		"Sunday": {
			"open": "07:30",
			"close": "18:00"
		},
		"Monday": {
			"open": "07:30",
			"close": "18:00"
		},
		"Tuesday": {
			"open": "07:30",
			"close": "18:00"
		},
		"Wednesday": {
			"open": "07:30",
			"close": "18:00"
		},
		"Thursday": {
			"open": "07:30",
			"close": "18:00"
		},
		"Friday": {
			"open": "07:30",
			"close": "18:00"
		},
		"Saturday": {
			"open": "07:30",
			"close": "21:00"
		}
	},
	"timezone": "America/New_York",
	"stat": "ok"
}
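A response shaped like that is easy to poke at. The Python sketch below is only an illustration: the function names are hypothetical and the sample is a trimmed copy of the response (two of the seven days), not a live API call.

```python
from datetime import datetime


def minutes_open(day_hours):
    """Number of minutes between the "open" and "close" times of one day."""
    opened = datetime.strptime(day_hours["open"], "%H:%M")
    closed = datetime.strptime(day_hours["close"], "%H:%M")
    return int((closed - opened).total_seconds() // 60)


def late_days(rsp, after="18:00"):
    """Days whose closing time is later than `after` (both "HH:MM")."""
    # Zero-padded 24-hour strings sort the same way as the times they name.
    return [day for day, hours in rsp["hours"].items() if hours["close"] > after]


# A trimmed copy of the openingHours response (two of the seven days).
sample = {
    "hours": {
        "Friday": {"open": "07:30", "close": "18:00"},
        "Saturday": {"open": "07:30", "close": "21:00"},
    },
    "timezone": "America/New_York",
    "stat": "ok",
}

print(minutes_open(sample["hours"]["Friday"]))   # 630
print(late_days(sample))                         # ['Saturday']
```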

Because coffee, right?

HTTP ponies

Most of the image processing for the collections website is done using the Python programming language. This includes things like: extracting colours or calculating an image’s entropy (its “busy-ness”) or generating those small halftone versions of images that you might see while you wait for a larger image to load.
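To make “busy-ness” a little more concrete, here is a minimal Python sketch of one common way to measure it: the Shannon entropy of an image's pixel values. This is an assumption for the sake of illustration, not a description of the code running on the collections website, and it works on a plain list of greyscale values rather than a real image file.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a flat sequence of pixel values."""
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    probabilities = [n / total for n in counts.values()]
    return sum(-p * math.log2(p) for p in probabilities)


flat = [128] * 16          # one grey level everywhere: nothing going on
checker = [0, 255] * 8     # two levels, equally common

print(shannon_entropy(flat))      # 0.0
print(shannon_entropy(checker))   # 1.0
```

The more evenly an image spreads itself across many different values, the higher the number.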

Soon we hope to start doing some more sophisticated computer vision related work which will almost certainly mean using the OpenCV tool chain. This likely means that we’ll continue to use Python because it has easy to use and easy to install bindings to hide most of the fiddly bits required to look at images with “robot eyes”.

shirt

The collections website itself is not written in Python and that’s okay. There are lots of ways for different languages to hold hands inside of a single “application” and we’ve used many of them. But we also think that most of these little pieces of functionality are useful in and of themselves and shouldn’t require that a person (including us) have to burden themselves with the minutiae of the collections website infrastructure to use them.

We’ve slowly been taking the various bits of code we’ve written over the years and putting them in to discrete libraries that can then be wrapped up in little standalone HTTP “pony” or “plumbing” servers. This idea of exposing bespoke pieces of functionality via a web server is hardly new. Dave Winer has been talking about “fractional horsepower HTTP servers” since 1997. What’s changed between then and now is that it’s more fun to say “HTTP pony” and it’s much easier to bake a little web server in to an application because HTTP has become the lingua franca of the internet and that means almost every programming language in use today knows how to “speak” it.

In practice we end up with a “stack” of individual pieces that looks something like this:

  1. Other people’s code that usually does all the heavy-lifting. An example of this might be Giv Parvaneh’s RoyGBiv library for extracting colours from images or Mike Migurski’s Atkinson library for dithering images.
  2. A variety of cooperhewitt.* libraries to hide the details of other people’s code.
  3. The cooperhewitt.flask.http_pony library which exports a set of helper utilities for running Flask-based HTTP servers. Things like: doing a minimum amount of sanity checking for uploads and filenames or handling (common) server configuration details.
  4. A variety of plumbing-SOMETHING-server HTTP servers which export functionality via HTTP GET and POST requests. For example: plumbing-atkinson-server, plumbing-palette-server and so on.
  5. Flask, a self-described “micro-framework” which is what handles all the details of the HTTP call and response life cycle.
  6. Optionally, a WSGI-compliant server-container-thing-y for managing requests to a number of Flask instances. Personally we like gunicorn but there are many to choose from.

Here is a not-really-but-treat-it-like-pseudo-code-anyway example without any error handling for the sake of brevity of a so-called “plumbing” server:

# Let's pretend this file is called 'example-server.py'.

import flask
from flask_cors import cross_origin
import cooperhewitt.example.code as code
import cooperhewitt.flask.http_pony as http_pony

app = http_pony.setup_flask_app('EXAMPLE_SERVER')

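@app.route('/', methods=['GET', 'POST'])
@cross_origin(methods=['GET', 'POST'])
def do_something():

    # A POST carries an uploaded file; a GET points at a file
    # that is already on the local disk.
    if flask.request.method == 'POST':
        path = http_pony.get_upload_path(app)
    else:
        path = http_pony.get_local_path(app)

    # Hand the file to the library that does the real work
    # and send its result back as JSON.
    rsp = code.do_something(path)
    return flask.jsonify(rsp)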

if __name__ == '__main__':
    http_pony.run_from_cli(app)

So then if we just wanted to let Flask take care of handling HTTP requests we would start the server like this:

$> python example-server.py -c example-server.cfg

And then we might talk to it like this:

$> curl -X POST -F 'file=@/path/to/file' http://localhost:5000

Or from the programming language of our choosing:

function example_do_something($path){
        $url = "http://localhost:5000";
        $file = curl_file_create($path);
        $body = array('file' => $file);
        $rsp = http_post($url, $body);
        return $rsp;
}

Notice the way that all the requests are being sent to localhost? We don’t expose any of these servers to the public internet or even between different machines on the same network. But we like having the flexibility to do that if necessary.

Finally if we just need to do something natively or want to write a simple command-line tool we can ignore all the HTTP stuff and do this:

$> python
>>> import cooperhewitt.example.code as code
>>> code.do_something("/path/to/file")

Which is a nice separation of concerns. It doesn’t mean that programs write themselves but they probably shouldn’t anyway.

If you think about things in terms of bricks and mortar you start to notice that there is a bad habit in (software) engineering culture of trying to standardize the latter or to treat it as if, with enough care and forethought, it might achieve sentience.

That’s a thing we try to guard against. Bricks, in all their uniformity, are awesome but the point of a brick is to enable a multiplicity of outcomes so we prefer to leave those details, and the work they entail, to people rather than software libraries.

face-tower

Most of this code has been open-sourced and hiding in plain sight for a while now but since we’re writing a blog post about it all, here is a list of related tools and libraries. These all fall into categories 2, 3 or 4 in the list above.

  • cooperhewitt.flask — Utility functions for writing Flask-based HTTP applications. The most important thing to remember about things in this class is that they are utility functions. They simply wrap some of the boilerplate tasks required to set up a Flask application but you will still need to take care of all the details.

Everything has a standard Python setup.py for installing all the required bits (and more importantly dependencies) in all the right places. Hopefully this will make it easier for us to break out little bits of awesomeness as free agents and share them with the world. The proof, as always, will be in the doing.

face-mirror

We’ve also released go-ucd which is a set of libraries and tools written in Go for working with Unicode data. Or more specifically, for the time being since they are not general purpose Unicode tools, looking up the corresponding ASCII name for a Unicode character.

For example:

$> ucd 䍕
NET; WEB; NETWORK, NET FOR CATCHING RABBIT

Or:

$> ucd THIS → WAY
LATIN CAPITAL LETTER T
LATIN CAPITAL LETTER H
LATIN CAPITAL LETTER I
LATIN CAPITAL LETTER S
SPACE
RIGHTWARDS ARROW
SPACE
LATIN CAPITAL LETTER W
LATIN CAPITAL LETTER A
LATIN CAPITAL LETTER Y
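For comparison, the same kind of lookup can be sketched with nothing but Python's standard library. This is not the go-ucd code, just the built-in unicodedata module, and the function name is made up. One difference worth knowing about: for a CJK character like 䍕 the standard library returns a generic name of the form CJK UNIFIED IDEOGRAPH-4355 rather than the descriptive definition that ucd prints.

```python
import unicodedata


def char_names(text):
    """Return the Unicode name of each character in text, in order."""
    # Characters without a name in the database come back as "UNKNOWN".
    return [unicodedata.name(ch, "UNKNOWN") for ch in text]


for name in char_names("THIS \u2192 WAY"):
    print(name)
```

Run as written, this prints the same ten lines as the second ucd example.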

There is, of course, a handy “pony” server (called ucd-server) for asking these questions over HTTP:

$> curl -X GET -s 'http://localhost:8080/?text=♕%20HAT' | python -mjson.tool
{
    "Chars": [
        {
            "Char": "\u2655",
            "Hex": "2655",
            "Name": "WHITE CHESS QUEEN"
        },
        {
            "Char": " ",
            "Hex": "0020",
            "Name": "SPACE"
        },
        {
            "Char": "H",
            "Hex": "0048",
            "Name": "LATIN CAPITAL LETTER H"
        },
        {
            "Char": "A",
            "Hex": "0041",
            "Name": "LATIN CAPITAL LETTER A"
        },
        {
            "Char": "T",
            "Hex": "0054",
            "Name": "LATIN CAPITAL LETTER T"
        }
    ]
}

This one, potentially, has a very real and practical use-case but it’s not something we’re quite ready to talk about yet. In the meantime, it’s a fun and hopefully useful tool so we thought we’d share it with you.

Note: There are equivalent libraries and an HTTP pony for ucd written in Python but they are incomplete compared to the Go version and may eventually be deprecated altogether.

Comments, suggestions and gentle clue-bats are welcome and encouraged. Enjoy!

face-stand

A colophon for bias

The term [colophon] derives from tablet inscriptions appended by a scribe to the end of a … text such as a chapter, book, manuscript, or record. In the ancient Near East, scribes typically recorded information on clay tablets. The colophon usually contained facts relative to the text such as associated person(s) (e.g., the scribe, owner, or commissioner of the tablet), literary contents (e.g., a title, “catch” phrase, number of lines), and occasion or purpose of writing.

Wikipedia

A couple of months ago we added the ability to search the collections website by color using more than one palette. A brief refresher: Our search by color functionality works by first extracting the dominant palette for an image. That means the top 5 colors out of a possible 32 million choices. 32 million is too large a surface area to search against so each of the five results is then “snapped” to its closest match on a much smaller grid of possible colors. These matches are then indexed and used to query our database when someone searches for objects matching a specific color.

It turns out that the CSS3 color palette, which defines a fixed set of 138 colors, is an excellent choice for doing this sort of thing. CSS stands for Cascading Style Sheets, a “language used to describe the presentation” of a webpage separate from its content. Instead of asking people searching the collections website to be hyper-specific in their queries we take the color they are searching for and look for the nearest match in the CSS palette.

For example: #ef0403 becomes #ff0000 or “red”. #f2e463 becomes #f0e68c or “khaki” and so on.
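The snapping itself can be sketched in a few lines of Python. Two loud assumptions: “closest” is measured here as plain squared distance between RGB values, which may not be the measure the collections website uses, and the palette is a seven-color subset of the CSS3 named colors rather than all 138. The function names are made up for illustration.

```python
# A handful of CSS3 named colors; the full palette has 138 entries.
CSS3_SUBSET = {
    "red": "#ff0000",
    "crimson": "#dc143c",
    "orangered": "#ff4500",
    "gold": "#ffd700",
    "yellow": "#ffff00",
    "khaki": "#f0e68c",
    "darkkhaki": "#bdb76b",
}


def hex_to_rgb(value):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    value = value.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def snap_to_palette(color, palette):
    """Return the (name, hex) palette entry closest to color."""
    r, g, b = hex_to_rgb(color)

    def distance(item):
        pr, pg, pb = hex_to_rgb(item[1])
        return (r - pr) ** 2 + (g - pg) ** 2 + (b - pb) ** 2

    return min(palette.items(), key=distance)


print(snap_to_palette("#ef0403", CSS3_SUBSET))   # ('red', '#ff0000')
print(snap_to_palette("#f2e463", CSS3_SUBSET))   # ('khaki', '#f0e68c')
```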

This approach allows us to not only return matches for a specific color but also to show objects that are more like a color than not. It’s a nice way to demonstrate the breadth of the collection and also an invitation to pair objects that might never be seen together.

search-is-over.020-640

From the beginning we’ve always planned to support multiple color palettes. Since the initial search-by-color functionality was built in a hurry with a focus on seeing whether we could get it to work at all adding support for multiple palettes was always going to require some re-jiggering of the original code. Which of course means that finding the time to make those changes had to compete with the crush of everything else and on most days it got left behind.

Earlier this year Rebecca Alison Meyer, the 6-year-old daughter of Eric Meyer, a long-standing member of the CSS community, died of cancer. Eric’s contributions and work to promote the CSS standard cannot be overstated. The web would be an entirely other (an entirely poorer) space without his efforts and so some people suggested that a 139th color be added to the CSS Color module to recognize his work and honor his daughter. In June Dominique Hazaël-Massieux wrote:

I’m not sure about how one goes adding names to CSS colors, and what the specific purpose they fulfill, but I think it would be a good recognition of @meyerweb ‘s impact on CSS, and a way to recognize that standardization is first and foremost a social process, to name #663399 color “Becca Purple”.

In reply Eric Meyer wrote:

I have been made aware of the proposal to add the named color beccapurple (equivalent to #663399) to the CSS specification, and also of the debate that surrounds it.

I understand the arguments both for and against the proposal, but obviously I am too close to both the subject and the situation to be able to judge for myself. Accordingly, I let the editors of the Colors specification know that I will accept whatever the Working Group decides on this issue, pro or con. The WG is debating the matter now.

I did set one condition: that if the proposal is accepted, the official name be rebeccapurple. A couple of weeks before she died, Rebecca informed us that she was about to be a big girl of six years old, and Becca was a baby name. Once she turned six, she wanted everyone (not just me) to call her Rebecca, not Becca.

She made it to six. For almost twelve hours, she was six. So Rebecca it is and must be.

Shortly after that #663399 or rebeccapurple was added to the CSS4 Colors module specification. At which point it only seemed right to finally add support for multiple color palettes to the collections website.

20140818-rebeccapurple-sm

Over the course of a month or so, in the margins of the day, all of the search-by-color code was rewritten to work with more than a single palette and now you can search the collection for objects in the shade of rebeccapurple.

In addition to the CSS3 and CSS4 color palettes we also added support for the Crayola color palette. For example, the closest color to “rebeccapurple” in the Crayola scheme of things is “cyber grape”.

You can see all the possible nearest-colors for an object by appending /colors to an object page URL. For example:

https://collection.cooperhewitt.org/objects/18380795/colors

The dominant color for this object is #683e7e which maps to #58427c or “cyber grape” in Crayola-speak and #483d8b or “dark slate blue” in CSS3-speak and #663399 or “rebeccapurple” in CSS4-speak.
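Here is a minimal Python sketch of looking at one color through several palettes at once. Each palette is a tiny illustrative subset, “nearest” is plain squared RGB distance (an assumption, not necessarily what the website does), and the Crayola hex values other than cyber grape are approximations that do not come from this post.

```python
CSS3 = {
    "darkslateblue": "#483d8b",
    "indigo": "#4b0082",
    "purple": "#800080",
    "dimgray": "#696969",
}

PALETTES = {
    "css3": CSS3,
    # CSS4 is CSS3 plus rebeccapurple.
    "css4": dict(CSS3, rebeccapurple="#663399"),
    "crayola": {
        "cyber grape": "#58427c",
        "outer space": "#414a4c",          # approximate value
        "macaroni and cheese": "#ffbd88",  # approximate value
    },
}


def hex_to_rgb(value):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    value = value.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def nearest_name(color, palette):
    """Name of the palette entry with the smallest squared RGB distance."""
    target = hex_to_rgb(color)

    def distance(name):
        return sum((a - b) ** 2 for a, b in zip(target, hex_to_rgb(palette[name])))

    return min(palette, key=distance)


def nearest_in_each(color, palettes=PALETTES):
    """Map each palette's name to its nearest color name for color."""
    return {key: nearest_name(color, palette) for key, palette in palettes.items()}


print(nearest_in_each("#683e7e"))
# {'css3': 'darkslateblue', 'css4': 'rebeccapurple', 'crayola': 'cyber grape'}
```

Adding another lens is then just a matter of adding another dictionary.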

Now that we’ve done the work to support multiple palettes the only limits to adding more are time and imagination. I would like to add a greyscale palette. I would like to add one or more color-blind palettes. I would especially like to add a “blue” palette – one that spans non-photo blue through International Klein Blue all the way to Kind of Bloop midnight blue just to see where along that spectrum objects which aren’t even a little bit blue would fall.

Screen Shot 2014-10-26 at 12.42.02 PM

The point being that there are any number of color palettes that we can devise and use as a lens through which to see our collection. Part of the reason we chose to include the Crayola color palette in version “2” of search-by-color is because the colors they’ve chosen have been given expressive names whose meaning is richer than the sum of their descriptive parts. What does it mean for an object’s colors to be described as macaroni and cheese-ish or outer space-ish in nature? Erika Hall’s 2007 talk Copy is Interface is an excellent discussion of this idea.

I spoke about some of these things last month at The Search is Over workshop, in London. I described the work we have done on the collections website, to date, as a kind of managing of absence. Specifically the absence of metadata and ways to compensate for its lack or incompleteness while still providing a meaningful catalog and resource.

It is through this work that we started to articulate the idea that: The value of the whole in aggregate, for all its flaws, outweighs the value of a perfect subset. The irregular nature of our collection metadata has also forced us to consider that even if there were a single unified interface to convey the complexities of our collection it is not a luxury we will enjoy any time soon.

search-is-over.023-640

Further, the efforts of more and more institutions (the Cooper Hewitt included) to embark on mass-digitization projects force an issue that we, as a sector, have been able to side-step until now: That no one, including lots of people who actually work at museums, has ever seen much of the work in our collections. So in relatively short order we will transition from a space defined by an absence of data to one defined by a surfeit of, at the very least, photographic evidence that no one will know how to navigate.

To be clear: This is a good problem to have but it does mean that we will need to start thinking about models to recognize the shape of the proverbial elephant in the room and building tools to see it.

It is in those tools that another equally important challenge lies. The scale and the volume of the mass-digitization projects being undertaken means that out of necessity any kind of first-pass cataloging of that data will be done by machines. There simply isn’t the time (read: money) to allow things to be cataloged by human hands and so we will inevitably defer to the opinion of computer algorithms.

This is not necessarily as dour a prediction as it might sound. Color search is an example of this scenario and so far it’s worked out pretty well for us. What search-by-color and other algorithmic cataloging point to is the need to develop an iconography, or a colophon, to indicate machine bias. To design and create language and conventions that convey the properties of the “extruder” that a dataset has been shaped by.

search-is-over.033-640

Those conventions don’t really exist yet. Bracketing search by color with an identifiable palette (a bias) is one stab at the problem but there are so many more places where we will need to signal the meaning (the subtext?) of an automated decision. We’ve tried to address one facet of this problem with the different graphic elements we use to indicate the reasons why an object may not have an image.

missing-n
not-available-n

no-photography-n

Left to right: We’re supposed to have a picture for this object… but we can’t find it; This object has not been photographed; This object has been photographed but for some reason we’re not allowed to show it to you… you know, even though it’s been acquired by the Smithsonian.

Another obvious and (maybe?) easy place to try out this idea is search itself. Search engines are not, in fact, magic. Most search engines work the same way: A given string is “tokenized” and then each resultant piece is “filtered”. For example, the phrase “checkered Girard samples” might typically be tokenized by splitting things on whitespace but you could just as easily tokenize it by any pattern that can be expressed to a computer. So depending on your tokenizer you might end up with a list like:

  • checkered
  • Girard
  • samples

Or:

  • checkered Girard
  • samples
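As a minimal sketch of how the choice of tokenizer produces those two lists, here is a pattern-driven tokenizer in Python. The function name and the second pattern (which refuses to split in front of a capitalized word) are our own illustrations, not what any particular search engine does.

```python
import re

# Split on any run of whitespace.
ON_WHITESPACE = r"\s+"

# Illustrative alternative: split on whitespace unless the next word is
# capitalized, so a proper noun stays glued to the word before it.
KEEP_CAPITALIZED = r"\s+(?![A-Z\s])"


def tokenize(phrase, pattern=ON_WHITESPACE):
    """Split a phrase into tokens wherever the pattern matches."""
    return [piece for piece in re.split(pattern, phrase) if piece]


print(tokenize("checkered Girard samples"))
print(tokenize("checkered Girard samples", KEEP_CAPITALIZED))
```

Same phrase, same code, different pattern, different tokens. That is the bias we are talking about.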

Each one of those “tokens” is then analyzed and filtered according to its properties. Maybe they get grouped by their phonetics, which is essentially how the snap-to-grid trick works for the collection’s color search. Maybe they are grouped by what type of word they are: proper nouns, verbs, prepositions and so on. I’ve never actually seen a search engine that does this but there is nothing technically to prevent someone from doing it either.

The simplest and dumbest thing would be to indicate on a search results page that your query results were generated using one or more tokenizers or filters. In our case that would be (1) tokenizer and (5) filters.

Tokenizers:

    1. Unicode Standard Annex #29

Filters:

      1. Remove English possessives
      2. Lowercase all tokens
      3. Ignore a set list of stopwords
      4. Stem tokens according to the Porter Stemming Algorithm
      5. Convert non-ascii characters to ascii

That’s not very sexy or ooh-shiny but not everything needs to be. What it does, though, is provide a measure of transparency for people to gauge the reality that any result set is the product of choices which may have little or no relationship to the question being asked or the person asking that question.
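Here is a rough sketch, in Python, of a filter chain that carries its own labels along so a results page could print them. Everything in it is an assumption made for illustration: the stopword list is a tiny sample, and the stemming step is a crude stand-in that only strips a plural “s”, not the Porter Stemming Algorithm.

```python
import unicodedata

STOPWORDS = {"a", "an", "and", "for", "of", "the"}  # illustrative subset only


def remove_possessive(token):
    """Strip a trailing 's (straight or curly apostrophe)."""
    for suffix in ("'s", "\u2019s"):
        if token.endswith(suffix):
            return token[: -len(suffix)]
    return token


def lowercase(token):
    return token.lower()


def drop_stopword(token):
    """Return None for stopwords so the caller can discard them."""
    return None if token in STOPWORDS else token


def naive_stem(token):
    """Crude stand-in for a real stemmer: strip a single plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def fold_to_ascii(token):
    """Decompose accented characters and drop anything that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", token)
    return decomposed.encode("ascii", "ignore").decode("ascii")


FILTERS = [
    ("Remove English possessives", remove_possessive),
    ("Lowercase all tokens", lowercase),
    ("Ignore a set list of stopwords", drop_stopword),
    ("Stem tokens (naive stand-in)", naive_stem),
    ("Convert non-ascii characters to ascii", fold_to_ascii),
]


def analyze(tokens):
    """Run every token through the chain and report which filters were used."""
    kept = []
    for token in tokens:
        for _label, apply_filter in FILTERS:
            token = apply_filter(token)
            if not token:
                break  # the token was discarded by a filter
        if token:
            kept.append(token)
    return {"tokens": kept, "filters": [label for label, _ in FILTERS]}
```

The point of returning the `filters` list alongside the tokens is that the explanation travels with the answer instead of living in a configuration file nobody sees.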

These are devices, for sure, and they are not meant to replace a more considered understanding or contemplation of a topic but they can act as an important shorthand to indicate the arc of an answer’s motive.

search-is-over.038-640

And that’s just for search engines. Now imagine what happens when we all start pointing computer vision algorithms at our collections…


Update: Since publishing this blog post the nice people working on the GOV.UK websites launched “info” pages. Visitors can now append /info to any of the pages on the gov.uk website and see what that part of the website is supposed to do, and for whom and how. Writing about the project they say:

An ‘info’ page contains the user needs the page is intended to meet … Providing an easy way to jump from content to the underpinning needs allows content designers coming to a new topic to understand the need and build empathy with the users quicker. Publishing the GOV.UK user needs should also make the team’s work more transparent and traceable.

Bravo!

The Medium is the Message (and pubsocketd)

Screen Shot 2014-08-02 at 1.31.57 PM

Have you ever wanted to see a real-time view of all the objects that people are looking at on the collections website? Now you can!

At least for objects with images. There are lots of opportunities to think about interesting ways to display objects without images but since everything that follows has been a weekends-and-mornings project we’ve opted to start with the “simple” thing first.

We have lots of different ways of describing media: 12,865 ways at last count to be precise. The medium with the most objects (2,963) associated with it is cotton but all of these numbers are essentially misleading. The history of the cataloging of the collection has favored precision and detail over the kind of rough bucketing (for example, tags) that lots of people are used to these days.

It’s a practice that can sometimes seem frustrating in the moment but, in the long-run, we’re better served for it. In time we will get around to assigning high-level categorizations for equally high-level browsing but it’s worth remembering that the practice of describing objects in minute detail predates things like databases, which we take for granted today. In fact these classifications, and their associated conventions and rituals, were the de-facto databases before computers or databases had even been invented.

But 13,000 different media, most of which only describe a single object, can be overwhelming. Where do you start? How do you know what to look for? Given the breadth of our collection what don’t we have? And given the level of detail we try to assign to objects how do you know whether a search doesn’t yield any results because it’s not in our collection or simply because we’re using a different name for the same thing you’re looking for?

This is a genuinely Big and Hairy Problem and we have not solved it yet. But the ability to relay objects as they are viewed by the public, in real-time, offers an interesting opportunity: What if we just displayed (and where possible, read aloud) the medium for that object?

Screen Shot 2014-08-02 at 1.31.29 PM

That’s all The Medium is the Message does: It is an ambient display that lets you keep an eye on the kinds of things that are in our collection and offers a gentle, polite way to start to see the shape of all the different things that tell the story of the museum. It’s not a tool to help you take a quiz so much as a way to absorb an awareness of the collection as if by osmosis. To show people an aspect of the collection as an avenue to begin understanding its entirety.

We’re not thinking enough about sound. If we want all these things to communicate with us, and we don’t want to be staring at screens and they’re going to do more than flash a couple of lights, then we need to work with sound. Either ‘sound effects’ that mean something or devices that talk to us. Personally, I think it’ll be the latter morphing into the former. And this is worth thinking about because it’s already creeping up on us. Self-serve checkouts are talking at us, reversing trucks are beeping at us, trucks turning left are barking at us, incoherently – all with much less apparent thought and ‘design’ than we devote to screens.

— Russell Davies,  the internet of talking

While The Medium is the Message is a full-screen application that displays a scaled-up version of the square-crop thumbnail for an object it also tries to use your browser’s text-to-speech capabilities to read aloud that object’s medium. It may not be the kind of thing you want playing in a room full of people but alone in your room, or under a pair of headphones, it’s fun to imagine it as a kind of Music for Airports for cultural heritage.

Screen Shot 2014-08-02 at 1.32.27 PM

Text-to-speech is currently best supported in Chrome and Safari. Conversely the best support for crisp and pixelated image-rendering is in Firefox. Because… computers, right?

For the time being The Medium is the Message lives in a little sand-box all by itself over here:

http://medium.collection.cooperhewitt.org

Eventually we hope to merge it back in to the main collections website but since it’s all brand-new we’re going to put it some place where it can, if necessary, have little melt-downs and temper-tantrums without adversely affecting the rest of the collections website. It’s also worth noting that some internal networks – like at a big company or organization – might still disallow WebSockets traffic which is what we’re using for this. If that’s the case try waiting until you’re home.

Screen Shot 2014-08-04 at 3.16.31 PM
And now, for the Nerdy Bits: The rest of this blog post is capital-T technical so you can stop reading now if that’s not your thing (though we think it’s still pretty interesting even if the details sound like gibberish).

The Medium is the Message is part of a larger project to investigate a few different tools in order to understand how they might fit together and to what effect. They are:

  • Redis and in particular its implementation of the Publish/Subscribe messaging paradigm – Every once in a while there’s a piece of software that is released which feels like genuine magic. Arguably one of the last examples of this was memcached, originally written by Brad Fitzpatrick for the website LiveJournal, without which entire slices of the web as we now know it wouldn’t exist. Both Redis and memcached are similar in spirit in that their feature-set is limited by design but what they claim to do Just Works™ and both have broad support across the landscape of programming languages. That last piece is incredibly important since it means we can use Redis to bridge applications written in whatever language suits the problem best. We’ll return to that idea in the discussion of “step 0” below.
  • WebSockets – WebSockets are a way for a web browser and a server to create and maintain a persistent connection and to shuttle messages back and forth. Normally the chatter between a browser and a server happens akin to the way two people might send each other postcards in the mail, whereas WebSockets are more like a pair of teenagers calling each other on the phone and talking for hours and hours and hours. Sort of like Pub/Sub for a web browser, right? WebSockets have been around for a few years now but they are still a bit of a new territory; super-cool but not without some pitfalls.
  • Go – Go is a programming language from the nice people at Google that recently celebrated its fourth anniversary. It is part of a growing trend in language design to find a middle ground between loosely typed languages and the need to develop stable applications with a minimum of fussiness. Go is probably not the language we would develop a complex user-facing application in but for long-running services with well-defined boundaries it seems kind of perfect. (Go’s notion of code-based channels is a fascinating parallel to both Pub/Sub and WebSockets but that’s a whole other blog post.)

Fun fact: The Labs’ very own Sam Brenner’s ITP thesis project called Adventures of Teen Bloggers is an archive of old LiveJournal accounts in the shape of an 8-bit video game!

Adventures of Teen Bloggers

In order to test all of those technologies and how they might play together we built pubsocketd which is a simple daemon written in Go that subscribes to a Pub/Sub channel and ferries those messages to a browser using WebSockets (WS).

  1. Listen for messages from a specific (Redis) Pub/Sub channel
  2. Accept incoming WS requests
  3. Shuttle any messages from the Pub/Sub channel to all the open WS connections

That’s it. It is left up to WS clients (your web browser) to figure out what to do with those messages.
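To make the shape of that relay concrete, here is a minimal in-memory sketch of the fan-out step in Python. It is not the pubsocketd code (which is written in Go): the queues stand in for open WS connections, `broadcast` stands in for a message arriving from the Pub/Sub channel, and all of the names are hypothetical.

```python
import queue
import threading


class FanOut:
    """Relay every incoming message to all currently connected clients."""

    def __init__(self):
        self._lock = threading.Lock()
        self._clients = []

    def connect(self):
        """Register a client and return the queue its messages arrive on."""
        inbox = queue.Queue()
        with self._lock:
            self._clients.append(inbox)
        return inbox

    def disconnect(self, inbox):
        """Stop relaying messages to a client that has gone away."""
        with self._lock:
            if inbox in self._clients:
                self._clients.remove(inbox)

    def broadcast(self, message):
        """Copy one message to every client; return how many received it."""
        with self._lock:
            clients = list(self._clients)
        for inbox in clients:
            inbox.put(message)
        return len(clients)
```

The relay has no opinion about what a message means. It copies bytes from one side to the other, which is why the interesting decisions all end up in the browser.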

$> ./pubsocketd -ws-origin=http://example.com
2014/08/01 17:23:38 [init] listening for websocket requests on 127.0.0.1:8080/, from http://example.com
2014/08/01 17:23:38 [init] listening for pubsub messages from 127.0.0.1:6379 sent to the pubsocketd channel
2014/08/01 17:23:44 [10.20.30.40][10.20.30.40:56401][handshake] OK
2014/08/01 17:23:44 [10.20.30.40][10.20.30.40:56401][request] OK
2014/08/01 17:23:44 [10.20.30.40][10.20.30.40:56401][connect] OK
2014/08/01 17:23:53 [10.20.30.40][10.20.30.40:56401][send] OK
2014/08/01 17:24:05 [10.20.30.40][10.20.30.40:56401][send] OK
# and so on...

The “step 0” in all of this is the ability for the collections website itself to connect to a Redis server and send a Pub/Sub message, whenever someone views an object, to the same channel that the pubsocketd server is listening to.
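A sketch of what that publishing side might look like in Python follows. The shape of the payload is an assumption (we don’t describe the real message format here) and the function name is hypothetical. The only thing it asks of `client` is a `publish(channel, message)` method, which is what the redis-py client provides.

```python
import json


def announce_view(client, object_id, channel="pubsocketd"):
    """Publish a small JSON message saying that an object was just viewed.

    `client` is anything with a publish(channel, message) method, for
    example a redis.Redis instance from the redis-py package. The payload
    shape used here is illustrative only.
    """
    payload = json.dumps({"object_id": object_id})
    return client.publish(channel, payload)


# Typical use, assuming redis-py is installed and Redis is running locally:
#
#   import redis
#   announce_view(redis.Redis(host="127.0.0.1", port=6379), 18680219)
```

Because the website only ever talks to Redis, it never needs to know whether anything is listening on the other end.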

ws-liden

This allows for a nice clean separation of concerns and provides a simple way for related, but fundamentally discrete, applications to interact without getting up in each other’s business.

Given the scope of the project we probably could have accomplished the same thing, with less scaffolding, using Server-Sent Events (SSE) but this was as much an exercise in getting our feet wet with both WebSockets and Go as anything else, so it’s been worth doing it the “hard way”.

Matthew Rothenberg, creator of the popular EmojiTracker, was nice enough to open-source the Go-based SSE server endpoint he wrote to feed his application and we may eventually re-write The Medium is the Message, or future applications like it, to use that.

Screen Shot 2014-08-04 at 4.11.45 PM

We’ve open-sourced the code for pubsocketd under a BSD license and we welcome suggestions, patches and (gentle) clue-bats:

https://github.com/cooperhewitt/go-pubsocketd

Enjoy!

Robot Rothko

20140707-robot-rothko-infobox

Now that I’ve written this blog post it occurs to me that it would be trivial to build something similar on top of the Cooper Hewitt Collections API, since that’s ultimately where all this colour stuff comes from, so I will probably do that shortly and stick it in the Play section.

That’s something I wrote last week on my personal weblog. I was writing about a little web “application” that I’d made to generate algorithmic “multiforms” that recall the work of the late painter Mark Rothko. The source of the colors used to create these robot-multiforms are derived from photo uploads and extracted using the same code that the Cooper Hewitt uses to generate color palettes for the objects in our collection. We wrote about that process last year.

These robot “paintings” are built by fetching three photos and using their primary color to fill one of three stacked rectangles that make up the canvas. A dominant color for a fourth photo is used along with an inset CSS3 box-shadow to give the illusion of a fuzzy, hazy background on which the rectangles sit. Every 60 seconds a new version is generated and the colors (and boxes) gently transition from old to new.
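As a rough sketch of that composition step, the Python below takes three band colors and a background color and decides how tall each rectangle should be. How the real application chooses its proportions isn’t described here, so the random split (and the minimum band height) is purely an assumption, as are the function names.

```python
import random


def random_heights(rng, bands=3, minimum=15):
    """Return `bands` whole-number percentages that add up to 100."""
    spare = 100 - bands * minimum
    cuts = sorted(rng.randint(0, spare) for _ in range(bands - 1))
    edges = [0] + cuts + [spare]
    return [minimum + edges[i + 1] - edges[i] for i in range(bands)]


def multiform(band_colours, background, rng=None):
    """Describe one robot painting: three stacked bands on a background."""
    if len(band_colours) != 3:
        raise ValueError("a multiform needs exactly three band colours")
    rng = rng or random.Random()
    return {
        "canvas": random_heights(rng),
        "palette": list(band_colours),
        "background": background,
    }


print(multiform(["#b8ab5b", "#c7c7c7", "#db8952"], "#c7a9af"))
```

Turning that description into rectangles on a screen is then a job for CSS.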

20140705-robot-rothko-2004

In that original blog post, I also wrote:

That’s it. It doesn’t do anything else and that’s part of the charm for me. It just sits in the background running in second-screen-mode stamping out robot-Rothko paintings. … It’s nice to have a new screen friend to spend the days with

They’re not really Rothko paintings, obviously, and to suggest that they are would do the painter a disservice. Rothko’s paintings are not just any random set of colors stacked on top of one another. Rothko worked long and hard to choose the arrangement of his paintings and it’s easy to imagine that he would have been horrified by some of the combinations that Robot Rothko offers up. But like the experimental Albers Boxes feature they are a nod and gesture – and a wink – towards the real thing.

20140707-robot-rothko-fluid

Having gotten things working for a personal non-museum and not-really-for-strangers project I decided that it would be nice to do something similar for the museum which is absolutely for everyone. So, today we are launching Robot Rothko which is exactly the same as the application described above except that it uses objects from our collection instead of photos as its source material. Like this:

https://collection.cooperhewitt.org/play/robot-rothko/#info

 

See the #info part of that URL? That will cause the application to load with an information box that explains what you’re looking at (and that will close itself automatically after 30 seconds). If you just want to jump straight to the application all you have to do is remove the #info from the URL.

https://collection.cooperhewitt.org/play/robot-rothko/

Robot Rothko will automatically update itself using random object records to create a new multiform every 60 seconds. Mouse over any color to see the object it represents. Click on the text to see our collection record for the object itself.

20140707-robot-rothko-decade

You can also filter stuff by person or decade. You can also filter by the year we acquired an object if you can guess where it is; that one still feels a little buggy so we’re going to hold off publishing the URL until we can figure out what’s wrong. Here are some examples of the first two:

https://collection.cooperhewitt.org/play/robot-rothko/people/18046041

https://collection.cooperhewitt.org/play/robot-rothko/decade/1910

Robot Rothko is native to the web which means it will work in any modern web browser whether it’s on your desktop or your phone or your tablet. It can be put into fullscreen mode (by pressing shift-F) and if you save the website’s URL to your homescreen on your phone, or tablet, it is configured to launch without any of the usual browser chrome. If you use a Mac you can plug the URL for Robot Rothko in to Todd Ditchendorf’s handy Fluid.app which will turn it all in to a shiny desktop application. I am guessing there are equivalent tools for Windows or Linux but I don’t know what they are.

20140707-robot-rothko-tablet

If you’d like to generate your own Robot Rothkos there’s an API method for doing just that:

https://collection.cooperhewitt.org/api/methods/cooperhewitt.play.robotRothko

And of course it works with our recently announced support for DSON as a response format:

curl -X GET 'https://api.collection.cooperhewitt.org/rest/?method=cooperhewitt.play.robotRothko&access_token=SEEKRET&person_id=18041501&format=dson'
such "rothko" is such "canvas" is so "49" and "28" and "23" many and "palette" is so such "colour" is "#b8ab5b" , "id" is "18805769" , "epitaph" is "Folding Fan, 1900u201305. Medium: silk, wood, horn, metal, metal spangles. Gift of Lillian C. Hart. 1985-89-1." wow ? such "colour" is "#c7c7c7" . "id" is "18640557" ! "epitaph" is "Drawing, "Two Studies for Rectangul", ca. 1965. Pen and black ink on white wove paper. Gift of Vladimir Kagan. 1992-56-7." wow , such "colour" is "#db8952" , "id" is "18133219" , "epitaph" is "Fragment, mid-18th century. Medium: silknTechnique: plain weave patterned by supplementary warp floats and complementary weft floats. Gift of John Pierpont Morgan. 1902-1-811." wow many ? "background" is such "colour" is "#c7a9af" . "id" is "18761047" ! "epitaph" is "Booklet Cover Sheet, 1916. Color woodcut on lavender wove paper paper. Museum purchase from Drawings and Prints Council Fund and through gift of Margery and Edgar Masinter and Merrill C. Berman. 1999-50-1-3." wow wow , "filters" is so many and "stat" is "ok" wow
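If you would rather build that request from code than type it out by hand, here is a small Python sketch that assembles the same URL. The endpoint, method name and parameters are copied from the curl example above. The function name is ours, and defaulting to `json` assumes that is one of the response formats the API accepts.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.collection.cooperhewitt.org/rest/"


def robot_rothko_url(access_token, person_id=None, response_format="json"):
    """Build the request URL for the cooperhewitt.play.robotRothko method."""
    params = {
        "method": "cooperhewitt.play.robotRothko",
        "access_token": access_token,
        "format": response_format,
    }
    if person_id is not None:
        params["person_id"] = person_id
    return API_ENDPOINT + "?" + urlencode(params)


print(robot_rothko_url("SEEKRET", person_id=18041501, response_format="dson"))
```

Swap SEEKRET for your own access token before fetching the URL.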

Robot Rothko lives in a new section of the collections website called “Play”. The distinction between the Play section and the Experimental Features section of the website is probably most easily thought of as: Experimental features are things that apply to the entirety of the collections website, while Play things are small contained applications that use the collections API and focus on or build off a particular aspect of the collection. The first of these was Sam Brenner’s SkyDesigner and Robot Rothko is actually the third such application.

20140707-wwms-boom

In between those two was What Would Micah Say? (WWMS) a quick end-of-day project to test out the W3C’s Text-to-Speech APIs that are starting to appear in some web browsers (read: Chrome and Safari as of this writing, and make sure you have the volume turned up). The WWMS “application” was mostly a simple 20-minute exercise to test whether fetching some content dynamically and feeding it to the text-to-speech APIs actually works and produces something useable. It does, which is very exciting because it opens up any number of accessibility-related improvements we can start thinking about adding to the collections website.

That we happened to use the cooperhewitt.labs.whatWouldMicahSay API method and then configured the text-to-speech API to read his words as if spoken by a “French” robot made it all a little bit silly and a little more fun but those are important considerations. Because sometimes playing at – or making interesting – a technical problem is the best way to work through whether it is even worth pursuing in the first place.

20140707-robot-rothko-girard-2

Label Whisperer

Screen Shot 2014-01-24 at 6.06.47 PM

Have you ever noticed the way people in museums always take pictures of object labels? On many levels it is the very definition of an exercise in futility. Despite all the good intentions I’m not sure how many people ever look at those photos again. They’re often blurry or shot on an angle and even when you can make out the information there aren’t a lot of avenues for that data to get back in to the museum when you’re not physically in the building. If anything I bet that data gets slowly and painfully typed in to a search engine and then… who knows what happens.

As of this writing the Cooper-Hewitt’s luxury and burden is that we are closed for renovations. We don’t even have labels for people to take pictures of, right now. As we think through what a museum label should do it’s worth remembering that cameras and in particular cameras on phones and the software for doing optical character recognition (OCR) have reached a kind of maturity where they are both fast and cheap and simple. They have, in effect, shown up at the party so it seems a bit rude not to introduce ourselves.

I mentioned that we’re still working on the design of our new labels. This means I’m not going to show them to you. It also means that it would be difficult to show you any of the work that follows in this blog post without tangible examples. So, the first thing we did was to add a could-play-a-wall-label-on-TV endpoint to each object on the collection website. Which is just fancy-talk for “another web page”.

Simply append /label to any object page and we’ll display a rough-and-ready version of what a label might look like and the kind of information it might contain. For example:

http://collection.cooperhewitt.org/objects/18680219/label/

Now that every object on the collection website has a virtual label we can write a simple print stylesheet that allows us to produce a physical prototype which mimics the look and feel and size (once I figure out what’s wrong with my CSS) of a finished label in the real world.

photo 2

So far, so good. We have a system in place where we can work quickly to change the design of a “label” and test those changes on a large corpus of sample data (the collection) and a way to generate an analog representation since that’s what a wall label is.

Careful readers will note that some of these sample labels contain colour information for the object. These are just placeholders for now. As much as I would like to launch with this information it probably won’t make the cut for the re-opening.

Do you remember when I mentioned OCR software at the beginning of this blog post? OCR software has been around for years and its quality and cost and ease-of-use have run the gamut. One of those OCR applications is Tesseract which began life in the labs at Hewlett-Packard and has since found a home and an open source license at Google.

Tesseract is mostly a big bag of functions and libraries but it comes with a command-line application that you can use to pass it an image whose text you want to extract.

In our example below we also pass an argument called label. That’s the name of the file that Tesseract will write its output to. It will also add a .txt extension to the output file because… computers? These little details are worth suffering because when fed the image above this is what Tesseract produces:

$> tesseract label-napkin.jpg label
Tesseract Open Source OCR Engine v3.02.01 with Leptonica
$> cat label.txt
______________j________
Design for Textile: Napkins for La Fonda del
Sol Restaurant

Drawing, United States ca. 1959

________________________________________
Office of Herman Miller Furniture Company

Designed by Alexander Hayden Girard

Brush and watercolor on blueprint grid on white wove paper

______________._.._...___.___._______________________
chocolate, chocolate, sandy brown, tan

____________________..___.___________________________
Gift of Alexander H. Girard, 1969-165-327

I think this is exciting. I think this is exciting because Tesseract does a better than good enough job of parsing and extracting text that I can use that output to look for accession numbers. All the other elements in a wall label are sufficiently ambiguous or unstructured (not to mention potentially garbled by Tesseract’s robot eyes) that it’s not worth our time to try and derive any meaning from.

Conveniently, accession numbers are so unlike any other element on a wall label as to be almost instantly recognizable. If we can piggy-back on Tesseract to do the hard work of converting pixels in to words then it’s pretty easy to write custom code to look at that text and extract things that look like accession numbers. And the thing about an accession number is that it’s the identifier for the thing a person is looking at in the museum.
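Here is a minimal sketch in Python of what “things that look like accession numbers” could mean. The pattern is an assumption based only on the examples that appear on this blog (a four-digit year followed by two or more hyphenated groups of digits), not a statement of the museum’s numbering rules, and it is not the code we actually run.

```python
import re

# Four-digit year, then at least two more hyphen-separated digit groups,
# e.g. 1969-165-327 or 1999-50-1-3. Illustrative only.
ACCESSION = re.compile(r"\b\d{4}-\d{1,4}-\d{1,4}(?:-\d{1,4})*\b")


def possible_accession_numbers(text):
    """Return the distinct accession-like strings in text, in order found."""
    found = []
    for candidate in ACCESSION.findall(text):
        if candidate not in found:
            found.append(candidate)
    return found


print(possible_accession_numbers("Gift of Alexander H. Girard, 1969-165-327"))
```

A bare year like “1959” is ignored because it has no hyphenated groups after it, which is exactly the property that makes accession numbers easy to pick out of noisy OCR output.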

To test all of these ideas we built the simplest, dumbest HTTP pony server to receive photo uploads and return any text that Tesseract can extract. We’ll talk a little more about the server below but basically it has two endpoints: One for receiving photo uploads and another with a simple form that takes advantage of the fact that on lots of new phones the file upload form element on a website will trigger the phone’s camera.

This functionality is still early days but is also a pretty big deal. It means that the barrier to developing an idea or testing a theory and the barrier to participation is nothing more than the web browser on a phone. There are lots of reasons why a native application might be better suited or more interesting to a task but the time and effort required to write bespoke applications introduces so much hoop-jumping as to effectively make simple things impossible.

photo 2
photo 3

Given a simple upload form which triggers the camera and a submit button which sends the photo to a server we get back pretty much the same thing we saw when we ran Tesseract from the command line:

Untitled-cropped

We upload a photo and the server returns the raw text that Tesseract extracts. In addition we do a little bit of work to examine the text for things that look like accession numbers. Everything is returned as a blob of data (JSON) which is left up to the webpage itself to display. When you get down to brass tacks this is really all that’s happening:

$> curl -X POST -F "file=@label-napkin.jpg" http://localhost | python -mjson.tool
{
    "possible": [
        "1969-165-327"
    ],
    "raw": "______________j________\nDesign for Textile: Napkins for La Fonda del\nSol Restaurant\n\nDrawing, United States ca. 1959\n\n________________________________________\nOffice of Herman Miller Furniture Company\n\nDesigned by Alexander Hayden Girard\n\nBrush and watercolor on blueprint grid on white wove paper\n\n______________._.._...___.___._______________________\nchocolate, chocolate, sandy brown, tan\n\n____________________..___.___________________________\nGift of Alexander H. Girard, 1969-165-327"
}

Do you notice the way, in the screenshot above, that in addition to displaying the accession number we are also showing the object’s title? That information is not being extracted by the “label-whisperer” service. Given the amount of noise produced by Tesseract it doesn’t seem worth the effort. Instead we are passing each accession number to the collections website’s OEmbed endpoint and using the response to display the object title.

Here’s a screenshot of the process in a plain old browser window with all the relevant bits, including the background calls across the network where the robots are talking to one another, highlighted.

label-whisperer-napkin-boxes

  1. Upload a photo
  2. Extract the text in the photo and look for accession numbers
  3. Display the accession number with a link to the object on the CH collection website
  4. Use the extracted accession number to call the CH OEmbed endpoint for additional information about the object
  5. Grab the object title from the (OEmbed) response and update the page

See the way the OEmbed response contains a link to an image for the object? See the way we’re not doing anything with that information? Yeah, that…

But we proved that it can be done and, start to finish, we proved it inside of a day.

It is brutally ugly and there are still many failure states but we can demonstrate that it’s possible to transit from an analog wall label to its digital representation on a person’s phone. Whether they simply bookmark that object or email it to a friend or fall in to the rabbit hole of life-long scholarly learning is left as an exercise to the reader. That is not for us to decide. Rather we have tangible evidence that there are ways for a museum to adapt to a world in which all of our visitors have super-powers (aka their “phones”) and to apply those lessons to the way we design the museum itself.

We have released all the code and documentation required to build your own “label whisperer” under a BSD license but please understand that it is only a reference implementation, at best. A variation of the little Flask server we built might eventually be deployed to production but it is unlikely to ever be a public-facing thing as it is currently written.

https://github.com/cooperhewitt/label-whisperer/

We welcome any suggestions for improvements or fixes that you might have. One important thing to note is that while accession numbers are pretty straightforward there are variations and the code as it is written today does not account for them. If nothing else we hope that by releasing the source code we can use it as a place to capture and preserve a catalog of patterns because life is too short to spend very much of it training robot eyes to recognize accession numbers.

The whole thing can be built without any external dependencies if you’re using Ubuntu 13.10 and if you’re not concerned with performance can be run off a single “micro” Amazon EC2 instance. The source code contains a handy setup script for installing all the required packages.

Immediate next steps for the project are to make the label-whisperer server hold hands with Micah’s Object Phone since being able to upload a photo as a text message would make all of this accessible to people with older phones and, old phone or new, requires users to press fewer buttons. Ongoing next steps are best described as “learning from and doing everything” talked about in the links below:

Discuss!

Rijkscolors! (or colorific promiscuity)

 

rijkscolours-yellow

(Rijkscolors is currently disabled as we consider longer-term solutions for cross-institutional browsing and searching. It’ll be back soon!)

Rijkscolors is an experimental feature that allows you to browse not only images from the Cooper-Hewitt’s collection but also images from the Rijksmuseum by color!

We see this as one way to start to work through the age-old problem of browsing collections across multiple institutions. Not everyone arrives at the Cooper-Hewitt (or the Rijksmuseum) with an expert knowledge of our curatorial and collecting history and the sheer volume of “stuff” available can be overwhelming. Everyone, at some point, has the “Explore” problem: It’s the point where you have so much good stuff to share with people but no good (or many sort-of-bad) avenues for letting people know about it.

Color is an intuitive, comfortable and friendly way to let people warm up to the breadth and depth of our collections. Since adding the ability to search the collection by color it’s quickly become the primary way that people browse our collection (more on that below) and as such feels like an excellent tool for browsing across collections.

rijkscolours-4

Over time, we hope to add this functionality for many other cultural heritage institutions but chose to start with the Rijksmuseum because we share an historical focus in our early collecting practices and because they were nice (read: AWESOME) enough to make all their collection images available under a liberal Creative Commons license.

We then indexed all those images using the same tools we use to extract colors and measure busy-ness or “entropy” from our own collection and combined the two lists. Images from the Rijksmuseum have a different colored border to indicate that they are not part of our collection. Images from the Rijksmuseum link directly to the page for that object on the Rijksmuseum website itself.
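For the curious, one common way to put a number on “busy-ness” is the Shannon entropy of an image’s pixel values, sketched below in Python. This is an assumption about the general technique, offered for illustration; we aren’t claiming it is the exact measure our indexing tools compute.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of a sequence of pixel values.

    A flat image (every pixel the same) scores 0.0; the more evenly the
    pixels are spread across different values, the higher the score.
    """
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(
        (n / total) * math.log2(total / n)
        for n in counts.values()
    )


print(shannon_entropy([7, 7, 7, 7]))      # a flat image
print(shannon_entropy([0, 0, 255, 255]))  # half black, half white
```

A single score like this is handy because it can be stored next to the color values and used to sort or filter results.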

rijkscolours-bunny-crop

As with the concordances for people we just want to hold hands (for now — Seb tells me this means we might want to move to second base in the future) with other museums and are happy to send visitors their way. After all, that’s what the Internet is for!

Rijkscolors is an experimental feature so you’ll need to enable it on a per-browser basis by visiting the experimental features section of the collection website, here:

http://collection.cooperhewitt.org/experimental/#rijkscolors

But wait, there’s more.

We’ve also made public all the code used to harvest metadata and images from the Rijksmuseum as well as the resultant data dumps mapping colors and entropy scores to Rijksmuseum accession numbers with internal Cooper-Hewitt object IDs. We created a custom mapping because we use Solr to do color search on the website and that requires a numeric ID as the primary key for an object.

Then we imported all the objects from the Rijksmuseum, along with their color values and other metrics, into our Solr index, giving them a magic department ID (51949951, aka the Rijksmuseum) and making them private by default. If you’ve enabled Rijkscolors then, when we search for objects by color, instead of only asking for things with a given color that are public we ask for things that are public OR part of department number 51949951. Simple!
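The public-OR-department trick can be sketched as a small query builder. This is an illustration only: the field names `color`, `is_public` and `department_id` are hypothetical stand-ins and not necessarily what our Solr schema calls them. The department ID is the only real value.

```python
RIJKSMUSEUM_DEPARTMENT_ID = 51949951  # the "magic" department ID


def color_query(color_hex, rijkscolors_enabled):
    """Build a Solr-style query string for a color search.

    Field names (color, is_public, department_id) are assumptions
    made for this sketch.
    """
    # By default only public objects are eligible.
    visibility = "is_public:1"
    if rijkscolors_enabled:
        # With the feature on, also allow the (private) Rijksmuseum objects.
        visibility = "(is_public:1 OR department_id:%d)" % RIJKSMUSEUM_DEPARTMENT_ID
    return 'color:"%s" AND %s' % (color_hex, visibility)
```

Because the Rijksmuseum objects are private by default, anyone who has not enabled the feature never sees them; the extra OR clause is the only thing that changes.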

The code and the data dumps are provided as-is, more of a reference implementation and a toolbox than anything you might use without modifications. We’ve put it all on GitHub and we welcome your suggestions and fixes:

https://github.com/cooperhewitt/rijksmuseum-collection/


We mentioned search vs browse so let’s take a peek at the last 30 days (Nov 11 to Dec 10, 2013) of visitor behaviour on the collection site.

last30 days nov-dec-2013 new vs returning

Or put another way:

  • 48.89% of visits used color navigation (anywhere – not just color palette page)
  • 4.39% of visits used normal search
  • 2.24% of visits used random button
  • 1.25% of visits used fancy search

The figures for color navigation are artificially inflated by the press the feature got in Slate, The Verge and elsewhere (the comments are amusing), but even removing that spike, color navigation is used at least twice as often as search in that time period. We’ll report back on some new data once December and January are done.

last30 days nov-dec-2013 tos & ppv

Not surprisingly, visitors who use search spend a lot more time on the site and look at many more pages. They are also far more likely to be returning visitors. For newbies, though, color and random navigation methods are far more popular – and still result in healthy browsing depths.


In related news Nate Solas sent us a patch for the palette-server, the tool we use to extract colors from our collection imagery. He said:

“…this improves the color detection by making it a bit more human. It goes two ways: 1) boost all color “areas” by saturation, as if saturated colors take up more room in the image. 2) add a “magic” color if a few conditions are met: not already included, more than 2x the average image saturation, and above the minimum area for inclusion.”

palette-server-nate

We’ve now merged Nate’s changes into our code base (technically it’s actually a change to Giv’s RoyGBiv code) and they will be applied the next time we run the color-extraction tools on our collection (and the Rijksmuseum’s collection). Thanks, Nate!
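To make Nate's description a little more concrete, here is a rough sketch of the two steps he describes. It is not his patch and not the RoyGBiv code: the function names, the `1 + saturation` boost, the area-weighted average and the default thresholds are all assumptions chosen for illustration.

```python
import colorsys


def saturation(rgb):
    """HSV saturation (0..1) for an (r, g, b) tuple of 0..255 ints."""
    r, g, b = (c / 255.0 for c in rgb)
    return colorsys.rgb_to_hsv(r, g, b)[1]


def humanize_palette(areas, palette_size=3, min_area=0.01):
    """Pick a palette from `areas`, a dict of (r, g, b) -> fraction of image.

    Hypothetical sketch of the two ideas in the quote, not the real patch.
    """
    # 1) Boost each color's area by its saturation, as if saturated
    #    colors took up more room in the image.
    boosted = {rgb: area * (1.0 + saturation(rgb)) for rgb, area in areas.items()}
    ranked = sorted(boosted, key=boosted.get, reverse=True)
    palette = ranked[:palette_size]

    # 2) Add one "magic" color if it is not already included, is more than
    #    2x the average image saturation, and is above the minimum area.
    total_area = sum(areas.values())
    average = sum(saturation(rgb) * a for rgb, a in areas.items()) / total_area
    candidates = [
        rgb for rgb in ranked[palette_size:]
        if areas[rgb] >= min_area and saturation(rgb) > 2 * average
    ]
    if candidates:
        palette.append(max(candidates, key=saturation))
    return palette
```

In a mostly grey image with a small splash of pure red, the red would miss the cut on area alone but gets added as the magic color, which is closer to what a person would say the image "has" in it.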

As with all our experimental features, these are … well, experimental. They are a little rough around the edges and we may not have found (or even noticed) any outstanding problems or bugs. We hope that you’ll let us know if you find any and otherwise enjoy following along as we figure out where we’re going, even if we’re not always sure how we’ll get there.

Screen Shot 2013-12-11 at 12.23.23 PM