
Object concordances – what is the simplest thing to match like with like?

eames-concordances-full

Do you notice anything special about this screenshot of Charles Eames’ famous No. 670 Chair?

It might be hard to see because it’s a tall screenshot and this is a small thumbnail. Have a look at the large version. Hint: It’s not the part where the chair is missing in the picture. It’s actually this, on the right-hand side of the object details:

eames-concordances-crop

Object concordances! With other museums! To the same objects in their collections!! On their own websites!!!

Before you get too excited (and think it’s actually working ‘Linked Data’), we should point out that as of this writing we have only “concordified” four distinct objects – this one, this one, this one and that one – eight times with four separate organizations, one of which is our own shop, so there is a lot of work left to do.

Screen Shot 2015-05-28 at 6.57.20 PM

If you look carefully you can see that most of the concordances, to date, were added within about 90 minutes of one another. That’s because Seb and I were talking about object concordances over lunch that day and agreed that we could probably push the simplest and dumbest thing out the door before I went home. It’s something that has been on the agenda since mid-2012.

Specifically, we maintain a fixed list of institutions with whom we will “concordify” objects. If your institution isn’t on that list yet it’s not personal. We can add as many institutions as we want but we think the narrow focus helps to explain the purpose of the tool. Then we simply record that institution’s unique ID, the object ID for something in our collection and the object ID for something in their collection. That’s it.
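For the curious, a concordance record boils down to something like this – a minimal sketch where the field names and IDs are made up for illustration and are not our actual database schema:

# An illustrative concordance record: one institution ID plus the two
# object IDs being matched. Field names and values here are hypothetical.

concordance = {
    "institution_id": 5,              # our internal ID for the other museum
    "our_object_id": 18446435,        # an object in our collection (made-up ID)
    "their_object_id": "O1234567",    # the same object in their collection (made-up ID)
}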

Screen Shot 2015-05-28 at 6.10.49 PM

Currently the tools for adding concordances, or editing institutions, are … terrible.

(Or rather, they are the unadorned plumbing that makes the whole thing work. So they are beautiful and elegant in their own way but most people would be forgiven for not seeing those qualities right away.)

Short-term the goal is to build some friendlier “admin” web pages for a few more people to add concordances without having to worry about the technical details. Medium-term the goal is to create restricted API methods for doing fancy-pants buttons and pop-up dialogs on the object pages themselves to allow staff to add concordances as they think of them or are otherwise just poking around the collections website. Maybe in the long term, ‘the crowd’ might be invited to do it too.

Screen Shot 2015-05-28 at 6.11.02 PM

Somewhere between those two things we will also build proper “index” pages on the collections website of all the objects that have been concordified, all the institutions that have concordified objects and so on. Just like we’ve already done for people.

The other thing we’ll do shortly is make sure that these concordances are included in the CC0 Cooper Hewitt collections metadata dump which is available on GitHub.

When we said “the simplest thing” we meant it.

There isn’t much yet but it’s a start – a tangible proof of what it could be – and if we’ve done our job right then it is one of those things that will grow exponentially, as always, as time and circumstance permit.

(If you’ve been a long-time reader you might remember we did Rijkscolors back in 2013 as an experiment in automatically matching objects – but we were undone by language and structural differences in metadata, and the reality that humans might still be better at this, at least until the sector irons out a few things)

Collect all the things – shoeboxes, shop items and the Pen

Screen Shot 2015-05-27 at 6.15.22 PM

You can now collect any object in the collection, or on display, from the collections website itself. Just like in the galleries there is a small “collect” icon on the top right-hand side of every object page on the collections website. It’s not just individual object pages but also all the object list pages, too. So many “collect” icons!

20150527-shoebox-visit-icon

  Objects that haven’t been collected yet have a grey icon.

  Objects that have been collected in the galleries, as part of a visit to the museum, have a pink icon.

  Objects that have been collected on the collections website have an orange icon.

Simply click the grey icon to collect an object or click one of the orange or pink icons to remove or un-collect that object.

That’s it!

Screen Shot 2015-05-27 at 6.14.55 PM

Just like visit items, things you collect on the website have a permanent URL that can be made public to share with other people and can be given a bespoke title or description. Objects that you collect on the collections website live in something we’re calling the “shoebox”.

You can get to your shoebox by visiting https://collection.cooperhewitt.org/users/YOUR-USERNAME/shoebox or if you’re already logged in to your Cooper Hewitt account by visiting https://collection.cooperhewitt.org/you/shoebox/.

There is also a handy link in the Your stuff menu, located at the top-left of every page on the collections website.

The shoebox is the set of all the objects you’ve collected (or created) on the website or during your visits to the museum. Although visits and visit items overlap with things in your shoebox we still treat them differently: you need to be logged in to your Cooper Hewitt account to add things to your shoebox, but a visit to the museum can be entirely anonymous if a visitor so chooses.

The default view for the shoebox is to display everything together in reverse-chronological order but you can filter the view to show only things collected online or things collected during a visit. You can also see the set of all the objects you’ve made public or private.

20150528-shoebox-loggedout

logged out view (large version)

2015-shoebox-loggedin

logged in view (large version)

But it’s not just objects, either. You can already collect videos during your museum visit so those are included too. Ultimately the only limit to what you might collect with the Pen is time-and-typing. Things we’re thinking about making collect-able include: entire exhibitions or the introductory texts on the wall for an exhibition or people or individual rooms in the Mansion.

Museum retail

We’ve started this process by allowing you to collect things in the museum Shop.

By “things in the Shop” we mean all the things that have ever been sold in the Shop over the years. And by “all the things” we mean almost all the things. There is some technical hoop-jumping related to inventory management systems and that is why we don’t have everything yet but we’ll get there in time.

We are a capital-D design museum with a capital-D design shop and many of the things that have been available in the Shop have gone on to become part of our permanent collection so it only makes sense to give them a home on the collections website. In fact MoMA already does something similar with their “find related products in the MoMA Store” feature though ours is a bit different.

20150527-shop-landing

You can see for yourself at https://collection.cooperhewitt.org/shop

The /shop section is divided into two parts: Brands and Items (and all the items for a given brand of course). There isn’t a whole lot of extra information beyond titles and links to the SHOP Cooper Hewitt website for those items that are currently in-stock but it’s a start. Like the rest of the collections website we’ve started from the idea that providing permanent, stable URLs that people can have confidence in creates something that can be improved on over time.

20150527-shop-brands-crop

Shop items and brands don’t get updated as regularly as we’d like yet. We are still working through the fiddly details of bridging our systems with the Shop’s ecommerce and POS system and some things still need to be done by hand. We’ve been able to get this far though so we expect things will only get better.

20150527-shoebox-listview-shop

You might be wondering…

You might be reading this and starting to wonder Hmmm… does that mean I can also collect things in the Shop as I walk around the museum with the Pen? The answer is… Yes!

As of this writing there are only one or two items that can be collected with the Pen because the Shop staff are still getting familiar with the tools and thinking about how making collect-able labels changes their day-to-day workflow. The obvious future of this might be the infamous ‘wedding register’, however we believe that many museum visitors actually would like to bookmark objects to possibly buy later, or just remember as part of their overall visit to the ‘museum campus’.

In practical terms, that has meant some changes to Sam’s “tag writer” application (the subject of a future blog post) to fetch shop items via our API, and then letting the Shop folks decide what they want to tag and when they want to do it.

There has been a whole lot of change here over the course of the last three years, and allowing the various parts of the museum to warm up to the possibilities that the Pen affords – at their own pace, with a minimum of fuss and plenty of wiggle-room for experimentation – is really important.

In the meantime we hope that you enjoy collecting at least more, if not all, of the things that make up the museum.

Exporting your visits

Screen Shot 2015-05-12 at 7.03.10 PM

Starting today you can export the items you have collected or created during your visits to the museum. When you export a visit we will bundle up all the objects you’ve collected and all the items you’ve created in to a static website that is then compressed and made available for you to download directly.

A static website means that you can view all of your visit items in any old web browser, even when it’s not connected to the Internet. It means that if you have your own website you can copy your visit export over to it and host it and share it and, well… do whatever you want with it.

Where “whatever you want” means “so long as you comply” with the Smithsonian Terms of Use or assert your rights under Fair Use if you are based in the US.

We think that this is of particular importance to educators who may not have unfiltered or functional internet connections in their classrooms.

Screen Shot 2015-05-13 at 2.44.53 PM

A visit export doesn’t have all the same bells and whistles that your visit on the Cooper Hewitt collections website does but everything you need to view an export (except a web browser obviously) is contained in the file you download. There is a landing page, and a paginated view of everything you’ve done and a page for every object collected and each one of your creations.

Visit exports also come with a friendly and detailed JSON file for every item you’ve collected or created. If you don’t know what that last sentence means, don’t worry about it. It just means that everything you’ve done during a visit also has a file containing structured metadata about that activity which your developer friends may get excited about.

Screen Shot 2015-05-12 at 7.03.29 PM

Visit exports use our very own js-cooperhewitt-images library to manage square-cropped thumbnails that reveal the complete thumbnail when you mouse over them, just like on the collections website.

Screen Shot 2015-05-12 at 5.34.16 PM

Images for loan objects are not included with your visit download. That’s because they’re loan objects and we only have permission to host those images on our own collections website. Instead of including the images locally in your visit download, every time there is a loan object we link directly to the image hosted on our own website.

If you’re not online (or your web browser hasn’t already cached a copy of the image on your hard drive) then your visit pages are smart enough to load a placeholder image for that object. Like this:

Screen Shot 2015-05-13 at 2.43.08 PM

We do the same for individual item pages too:

Screen Shot 2015-05-13 at 2.44.08 PM

online
Screen Shot 2015-05-13 at 2.43.41 PM

offline

Visit exports are minimal, by design. They contain a small amount of HTML markup that’s been enhanced with a little bit of JavaScript and CSS to create a minimally elegant export that people can easily tailor to their own needs. Some people may quibble that including both the jQuery and Bootstrap libraries is not really a “little bit of JavaScript and CSS” but we hope that we have done things in such a way that it’s easy for people to change if they choose to.

Visit exports are currently only available for visits that have been “paired” with your Cooper Hewitt account. A visit that has been exported is cached on our servers but it can be regenerated when something about your visit changes – you delete an item, or add a note and so on – not more than once per day. Each one of your visits (remember: each one of your paired visits) has a handy export button at the bottom of each page and you can see a list of all your exported/exportable visits by going to: https://collection.cooperhewitt.org/you/visits/exports/

Screen Shot 2015-05-13 at 11.05.26 AM

The exports themselves are generated using our own API and the recently released cooperhewitt.visit and cooperhewitt.visit.items family of methods. There is a bunch of bespoke code that we’ve written to manage how exports are scheduled and stored but the part that actually builds your export is a plain-vanilla API application using the same public API methods that you might use to generate your own visit export.
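To make that a bit more concrete, here is a rough sketch of how an export-builder might call the API from Python. The method comes from the cooperhewitt.visit.items family mentioned above, but the exact method name, parameters and response handling are assumptions for the purposes of illustration:

import requests

API_ENDPOINT = "https://api.collection.cooperhewitt.org/rest/"

def fetch_visit_items(access_token, visit_id):
    # ask the (hypothetical) cooperhewitt.visit.items method for everything
    # collected or created during a single visit
    params = {
        "method": "cooperhewitt.visit.items",
        "access_token": access_token,
        "visit_id": visit_id,
    }
    rsp = requests.get(API_ENDPOINT, params=params)
    rsp.raise_for_status()
    return rsp.json()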

Screen Shot 2015-05-19 at 2.19.04 PM

In time we may open source the API application we’ve written but for now we’re going to keep putting it through its paces to make sure that it works consistently, as expected, and to force ourselves to use the same tools we’re making available to people outside the “hula hoop“.

Screen Shot 2015-05-12 at 5.53.41 PM

Finally, a little bit of administrivia: Your visit exports are made available under the Smithsonian Terms of Use agreement. You can read the entire document but the short (and relevant) bits are:

The Smithsonian Institution (the “Smithsonian”) provides the content on this website (www.si.edu), other Smithsonian websites, and third-party sites on which it maintains a presence (“SI Websites”) in support of its mission for the “increase and diffusion of knowledge.” The Smithsonian invites you to use its online content for personal, educational and other non-commercial purposes; this means that you are welcome to make fair use of the Content as defined by copyright law. Information on United States copyright fair use law is available from the United States Copyright Office. Please note that you are responsible for determining whether your use is fair and for responding to any claims that may arise from your use.

In addition, the Smithsonian allows personal, educational, and other non-commercial uses of the Content on the following terms:

You must cite the author and source of the Content as you would material from any printed work.

You must also cite and link to, when possible, the SI Website as the source of the Content.

You may not remove any copyright, trademark, or other proprietary notices including attribution information, credits, and notices, that are placed in or near the text, images, or data.

In addition to copyright, you must comply with all other terms or restrictions (such as trademark, publicity and privacy rights, or contractual restrictions) as may be specified in the metadata or as may otherwise apply to the Content. Please note that you are responsible for making sure that your use does not violate or infringe upon the rights of anyone else.

Screen Shot 2015-05-18 at 3.59.53 PM

Enjoy!

Publishing is as publishing does – revealing ‘books’ in the collection

Screen Shot 2015-04-23 at 10.48.30 AM

Note: This book is actually 144 pages long and the count is a by-product of the way we’ve stitched things together. By the time you read this that problem may be fixed. So it goes, right?

We’ve added a new section to the Collections website: publications. You know, books.

This is the simplest, dumbest thing we could think of to create a bridge between analog publications and the web. It’s only a handful of recent publications at the moment and whether or not older publications will be supported remains an open question, for now.

To be clear – there are already historical publications available for viewing on the main Cooper Hewitt website. As I was writing this blog post Micah reminded me that we’ve even uploaded them in to the Internet Archive so you can use their handy book reader to view the books online. All of which means that we’ll likely be importing those publications to the collections website soon enough.

All of this (newer) work is predicated on the fact that we have the luxury, with these specific publications, of operating outside the “work” versus “edition” dilemma that many other kinds of books have to negotiate. All we’ve done is create stable, permanent URLs for each book and each page in that book. That’s it.

Screen Shot 2015-04-23 at 11.28.22 AM

The goal is not to reproduce the book online, for all the usual reasons, but to give meaningful atomic units of a book – pages – a presence on the Interwebs and a scaffolding for future stuff (object lists, additional photographs, notes and other ancillary materials and so on) as time and circumstance permit.

Related: Emily Fildes’ and Allison Foster’s Museums and the Web (2015) paper What the Fonds?! The ups and downs of digitising Tate’s Archive is a good discussion around the issues, both technical and user-facing, that are raised as various sources of disparate data (artworks, library and archive data, curatorial files) all start to share the same conceptual space on the web.

Screen Shot 2015-04-23 at 10.47.28 AM

We’re not there yet and it may take us a while to get there so in the meantime every page URL has a small half-toned reproduction of the book page in question. That’s meant to give people a visual cue and confidence in the URL itself — specifically they look the same — such that you might bookmark it, share it with a friend, or whatever awesome use you dream up without having to wonder whether the ground will shift out from underneath it.

Kind of like books, right?

Finally, all the links indicating how many pages a particular book has are “magic” – click on them and you’ll be redirected to a random page inside that book.

Enjoy!

Screen Shot 2015-04-23 at 12.29.48 PM
Screen Shot 2015-04-23 at 12.30.14 PM

We choose Bao Bao!

So, the Pen went live on March 10. We’re handing them out to every visitor and people are collecting objects all over the place. Yay!

The Pen not only represents a whole world of brand-new for the museum but an equally enormous world of change for staff and the ways they do their jobs. One of the places this has manifested itself is the sort of awkward reality of being able to collect an object in the galleries only to discover that the image for that object or, sometimes, the object itself still hasn’t been marked as public in the collections database.

It’s unfortunate but we’ll sort it all out over time. The more important question right now is how we handle objects that people have collected in the galleries (that are demonstrably public) but whose ground truth hasn’t bubbled back up to our own canonical source of truth.

In the early days when we were building and testing the API methods for recording the objects that people collected the site would return a freak-out-and-die error the moment it encountered something that a visitor didn’t have permissions to see. This is a pretty normal approach in software and systems development but it made testing the overall system complicated and time-consuming.

In the interest of expediency we replaced the code that threw a temper tantrum with code that effectively said la la la la la… I can’t hear you! If a visitor tried to collect something that they didn’t have permissions to see we would simply drop it on the floor and pretend it never happened. This was useful in fleshing out the rest of the overall workflow of the system but we also understood that it was temporary at best.

Screen Shot 2015-03-20 at 12.07.07 PM

Allowing a user to collect something in the gallery and then denying any evidence of the event on their visit webpage would be… not good. So now we record the item being collected but we also record a status flag next to that event assuming that the disconnect between reality and the database will work itself out in favour of the visitor.
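In code, the idea is no more complicated than this sketch – the field and status names are invented here, but the point is that the event is recorded and flagged rather than silently dropped:

collected_items = []

def record_collect_event(visit_id, object_id, is_public):
    event = {
        "visit_id": visit_id,
        "object_id": object_id,
        # flag the mismatch instead of pretending it never happened,
        # assuming the object will eventually be marked public
        "status": "ok" if is_public else "pending-public",
    }
    collected_items.append(event)
    return event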

It also means that the act of collecting an object still has a permalink; something that a visitor can share or just hold on to for future reference even if the record itself is incomplete. And that record exists in the context of the visit itself. If you can see the other objects that you collected around the same time as a not-quite-public-yet object then they can act as a device to remember what that mystery thing is.

Which raises an important question: What should we use as a placeholder? Until a couple of days ago this is what we showed visitors.

streetview-cat-words

Although the “Google Street View Cat” has a rich pedigree of internet meme-iness it remains something of an acquired taste. This was a case of early debugging and blowing-off-steam code leaking in to production. It was also the result of a bug ticket that I filed for Sam on January 21 being far enough down the list of things to do before and immediately after the launch of the Pen that it didn’t get resolved until this week. The ticket was simply titled “Animated pandas”.

As in, this:

This is the same thread that we’ve been pulling on ever since we started rebuilding the collections website: When we are unable to show something to a visitor (for whatever reason) what do we replace the silence with?

We choose Bao Bao!

API methods (new and old) to reflect reality

design-eagle-cloud

A quick end-of-week blog post to mention that now that the museum has re-opened we have updated the cooperhewitt.galleries.openingHours and cooperhewitt.galleries.isOpen API methods to reflect… well, reality.

In addition to the cooperhewitt.galleries API methods we’ve also published corresponding openingHours and isOpen methods for the cafe!

For example, cooperhewitt.galleries.isOpen.

curl 'https://api.collection.cooperhewitt.org/rest/?method=cooperhewitt.galleries.isOpen&access_token=***'

{
	"open": 0,
	"holiday": 0,
	"hours": {
		"open": "10:00",
		"close": "18:00"
	},
	"time": "18:01",
	"timezone": "America/New_York",
	"stat": "ok"
}

Or, cooperhewitt.cafe.openingHours.

curl -X GET 'https://api.collection.cooperhewitt.org/rest/?method=cooperhewitt.cafe.openingHours&access_token=***'

{
	"hours": {
		"Sunday": {
			"open": "07:30",
			"close": "18:00"
		},
		"Monday": {
			"open": "07:30",
			"close": "18:00"
		},
		"Tuesday": {
			"open": "07:30",
			"close": "18:00"
		},
		"Wednesday": {
			"open": "07:30",
			"close": "18:00"
		},
		"Thursday": {
			"open": "07:30",
			"close": "18:00"
		},
		"Friday": {
			"open": "07:30",
			"close": "18:00"
		},
		"Saturday": {
			"open": "07:30",
			"close": "21:00"
		}
	},
	"timezone": "America/New_York",
	"stat": "ok"
}

Because coffee, right?
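If you would rather ask from code than from the command line, a minimal sketch in Python might look like this – the method name and response fields are the ones shown above, and only the access token is a placeholder:

import requests

API_ENDPOINT = "https://api.collection.cooperhewitt.org/rest/"

def galleries_are_open(access_token):
    params = {
        "method": "cooperhewitt.galleries.isOpen",
        "access_token": access_token,
    }
    rsp = requests.get(API_ENDPOINT, params=params).json()
    # "open" is 1 when the galleries are open and 0 otherwise
    return bool(rsp.get("open"))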

HTTP ponies

Most of the image processing for the collections website is done using the Python programming language. This includes things like: extracting colours or calculating an image’s entropy (its “busy-ness”) or generating those small halftone versions of an image that you might see while you wait for a larger image to load.
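As a flavour of what that looks like, here is a small sketch – not our production code, just the general idea – of pulling a dominant colour out of an image with Pillow:

from PIL import Image

def dominant_colour(path, num_colours=5):
    # shrink the image, quantize it down to a handful of colours and
    # treat the most common one as the "dominant" colour
    img = Image.open(path).convert("RGB")
    img.thumbnail((200, 200))
    quantized = img.quantize(colors=num_colours)
    counts = sorted(quantized.getcolors(), reverse=True)
    index = counts[0][1]
    palette = quantized.getpalette()
    r, g, b = palette[index * 3: index * 3 + 3]
    return "#%02x%02x%02x" % (r, g, b)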

Soon we hope to start doing some more sophisticated computer vision related work which will almost certainly mean using the OpenCV tool chain. This likely means that we’ll continue to use Python because it has easy to use and easy to install bindings to hide most of the fiddly bits required to look at images with “robot eyes”.

shirt

The collections website itself is not written in Python and that’s okay. There are lots of ways for different languages to hold hands inside of a single “application” and we’ve used many of them. But we also think that most of these little pieces of functionality are useful in and of themselves and shouldn’t require that a person (including us) have to burden themselves with the minutiae of the collections website infrastructure to use them.

We’ve slowly been taking the various bits of code we’ve written over the years and putting them in to discrete libraries that can then be wrapped up in little standalone HTTP “pony” or “plumbing” servers. This idea of exposing bespoke pieces of functionality via a web server is hardly new. Dave Winer has been talking about “fractional horsepower HTTP servers” since 1997. What’s changed between then and now is that it’s more fun to say “HTTP pony” and it’s much easier to bake a little web server in to an application because HTTP has become the lingua franca of the internet and that means almost every programming language in use today knows how to “speak” it.

In practice we end up with a “stack” of individual pieces that looks something like this:

  1. Other people’s code that usually does all the heavy-lifting. An example of this might be Giv Parvaneh’s RoyGBiv library for extracting colours from images or Mike Migurski’s Atkinson library for dithering images.
  2. A variety of cooperhewitt.* libraries to hide the details of other people’s code.
  3. The cooperhewitt.flask.http_pony library which exports a set of helper utilities for running Flask-based HTTP servers. Things like: doing a minimum amount of sanity checking for uploads and filenames or handling (common) server configuration details.
  4. A variety of plumbing-SOMETHING-server HTTP servers which export functionality via HTTP GET and POST requests. For example: plumbing-atkinson-server, plumbing-palette-server and so on.
  5. Flask, a self-described “micro-framework” which is what handles all the details of the HTTP call and response life cycle.
  6. Optionally, a WSGI-compliant server-container-thing-y for managing requests to a number of Flask instances. Personally we like gunicorn but there are many to choose from.

Here is a not-really-but-treat-it-like-pseudo-code-anyway example, without any error handling for the sake of brevity, of a so-called “plumbing” server:

# Let's pretend this file is called 'example-server.py'.

import flask
from flask_cors import cross_origin
import cooperhewitt.example.code as code
import cooperhewitt.flask.http_pony as http_pony

app = http_pony.setup_flask_app('EXAMPLE_SERVER')

@app.route('/', methods=['GET', 'POST'])
@cross_origin(methods=['GET', 'POST'])
def do_something():

    if flask.request.method=='POST':
       path = http_pony.get_upload_path(app)
    else:
       path = http_pony.get_local_path(app)

    rsp = code.do_something(path)
    return flask.jsonify(rsp)

if __name__ == '__main__':
    http_pony.run_from_cli(app)

So then if we just wanted to let Flask take care of handling HTTP requests we would start the server like this:

$> python example-server.py -c example-server.cfg

And then we might talk to it like this:

$> curl -X POST -F 'file=@/path/to/file' https://localhost:5000

Or from the programming language of our choosing:

function example_do_something($path){
        $url = "https://localhost:5000";
        $file = curl_file_create($path);
        $body = array('file' => $file);
        // send the file as a multipart POST and return the response body
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $rsp = curl_exec($ch);
        curl_close($ch);
        return $rsp;
}

Notice the way that all the requests are being sent to localhost? We don’t expose any of these servers to the public internet or even between different machines on the same network. But we like having the flexibility to do that if necessary.

Finally if we just need to do something natively or want to write a simple command-line tool we can ignore all the HTTP stuff and do this:

$> python
>>> import cooperhewitt.example.code as code
>>> code.do_something("/path/to/file")

Which is a nice separation of concerns. It doesn’t mean that programs write themselves but they probably shouldn’t anyway.

If you think about things in terms of bricks and mortar you start to notice that there is a bad habit in (software) engineering culture of trying to standardize the latter or to treat it as if, with enough care and forethought, it might achieve sentience.

That’s a thing we try to guard against. Bricks, in all their uniformity, are awesome but the point of a brick is to enable a multiplicity of outcomes so we prefer to leave those details, and the work they entail, to people rather than software libraries.

face-tower

Most of this code has been open-sourced and hiding in plain sight for a while now but since we’re writing a blog post about it all, here is a list of related tools and libraries. These all fall into categories 2, 3 or 4 in the list above.

  • cooperhewitt.flask — Utility functions for writing Flask-based HTTP applications. The most important thing to remember about things in this class is that they are utility functions. They simply wrap some of the boilerplate tasks required to set up a Flask application but you will still need to take care of all the details.

Everything has a standard Python setup.py for installing all the required bits (and more importantly dependencies) in all the right places. Hopefully this will make it easier for us to break out little bits of awesomeness as free agents and share them with the world. The proof, as always, will be in the doing.

face-mirror

We’ve also released go-ucd which is a set of libraries and tools written in Go for working with Unicode data. Or more specifically, for the time being – since they are not general-purpose Unicode tools – for looking up the corresponding ASCII name for a Unicode character.

For example:

$> ucd 䍕
NET; WEB; NETWORK, NET FOR CATCHING RABBIT

Or:

$> ucd THIS → WAY
LATIN CAPITAL LETTER T
LATIN CAPITAL LETTER H
LATIN CAPITAL LETTER I
LATIN CAPITAL LETTER S
SPACE
RIGHTWARDS ARROW
SPACE
LATIN CAPITAL LETTER W
LATIN CAPITAL LETTER A
LATIN CAPITAL LETTER Y

There is, of course, a handy “pony” server (called ucd-server) for asking these questions over HTTP:

$> curl -X GET -s 'https://localhost:8080/?text=♕%20HAT' | python -mjson.tool
{
    "Chars": [
        {
            "Char": "u2655",
            "Hex": "2655",
            "Name": "WHITE CHESS QUEEN"
        },
        {
            "Char": " ",
            "Hex": "0020",
            "Name": "SPACE"
        },
        {
            "Char": "H",
            "Hex": "0048",
            "Name": "LATIN CAPITAL LETTER H"
        },
        {
            "Char": "A",
            "Hex": "0041",
            "Name": "LATIN CAPITAL LETTER A"
        },
        {
            "Char": "T",
            "Hex": "0054",
            "Name": "LATIN CAPITAL LETTER T"
        }
    ]
}

This one, potentially, has a very real and practical use-case but it’s not something we’re quite ready to talk about yet. In the meantime, it’s a fun and hopefully useful tool so we thought we’d share it with you.

Note: There are equivalent libraries and an HTTP pony for ucd written in Python but they are incomplete compared to the Go version and may eventually be deprecated altogether.

Comments, suggestions and gentle clue-bats are welcome and encouraged. Enjoy!

face-stand

A colophon for bias

The term [colophon] derives from tablet inscriptions appended by a scribe to the end of a … text such as a chapter, book, manuscript, or record. In the ancient Near East, scribes typically recorded information on clay tablets. The colophon usually contained facts relative to the text such as associated person(s) (e.g., the scribe, owner, or commissioner of the tablet), literary contents (e.g., a title, “catch” phrase, number of lines), and occasion or purpose of writing.

Wikipedia

A couple of months ago we added the ability to search the collections website by color using more than one palette. A brief refresher: Our search by color functionality works by first extracting the dominant palette for an image. That means the top 5 colors out of a possible 32 million choices. 32 million is too large a surface area to search against so each of the five results is then “snapped” to its closest match on a much smaller grid of possible colors. These matches are then indexed and used to query our database when someone searches for objects matching a specific color.

It turns out that the CSS3 color palette which defines a fixed set of 138 colors is an excellent choice for doing this sort of thing. CSS is the acronym for Cascading Style Sheets, a “language used to describe the presentation” of a webpage separate from its content. Instead of asking people searching the collections website to be hyper-specific in their queries we take the color they are searching for and look for the nearest match in the CSS palette.

For example: #ef0403 becomes #ff0000 or “red”. #f2e463 becomes #f0e68c or “khaki” and so on.
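The “snapping” itself can be as simple as a nearest-neighbour search in RGB space. Here is a minimal sketch with a deliberately truncated palette; the real code may weight or convert colours differently:

CSS3_PALETTE = {
    "red": "#ff0000",
    "khaki": "#f0e68c",
    "darkslateblue": "#483d8b",
}

def hex_to_rgb(hex_colour):
    h = hex_colour.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

def snap_to_palette(hex_colour, palette=CSS3_PALETTE):
    r, g, b = hex_to_rgb(hex_colour)

    def distance(item):
        pr, pg, pb = hex_to_rgb(item[1])
        return (r - pr) ** 2 + (g - pg) ** 2 + (b - pb) ** 2

    # pick the palette entry with the smallest squared RGB distance
    name, value = min(palette.items(), key=distance)
    return name, value

# snap_to_palette("#ef0403") returns ("red", "#ff0000")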

This approach allows us to not only return matches for a specific color but also to show objects that are more like a color than not. It’s a nice way to demonstrate the breadth of the collection and also an invitation to pair objects that might never be seen together.

search-is-over.020-640

From the beginning we’ve always planned to support multiple color palettes. Since the initial search-by-color functionality was built in a hurry, with a focus on seeing whether we could get it to work at all, adding support for multiple palettes was always going to require some re-jiggering of the original code. Which of course means that finding the time to make those changes had to compete with the crush of everything else and on most days it got left behind.

Earlier this year Rebecca Alison Meyer, the 6-year-old daughter of Eric Meyer, a long-standing member of the CSS community, died of cancer. Eric’s contributions and work to promote the CSS standard cannot be overstated. The web would be an entirely other (an entirely poorer) space without his efforts and so some people suggested that a 139th color be added to the CSS Color module to recognize his work and honor his daughter. In June Dominique Hazaël-Massieux wrote:

I’m not sure about how one goes adding names to CSS colors, and what the specific purpose they fulfill, but I think it would be a good recognition of @meyerweb ‘s impact on CSS, and a way to recognize that standardization is first and foremost a social process, to name #663399 color “Becca Purple”.

In reply Eric Meyer wrote:

I have been made aware of the proposal to add the named color beccapurple (equivalent to #663399) to the CSS specification, and also of the debate that surrounds it.

I understand the arguments both for and against the proposal, but obviously I am too close to both the subject and the situation to be able to judge for myself. Accordingly, I let the editors of the Colors specification know that I will accept whatever the Working Group decides on this issue, pro or con. The WG is debating the matter now.

I did set one condition: that if the proposal is accepted, the official name be rebeccapurple. A couple of weeks before she died, Rebecca informed us that she was about to be a big girl of six years old, and Becca was a baby name. Once she turned six, she wanted everyone (not just me) to call her Rebecca, not Becca.

She made it to six. For almost twelve hours, she was six. So Rebecca it is and must be.

Shortly after that #663399 or rebeccapurple was added to the CSS4 Colors module specification. At which point it only seemed right to finally add support for multiple color palettes to the collections website.

20140818-rebeccapurple-sm

Over the course of a month or so, in the margins of the day, all of the search-by-color code was rewritten to work with more than a single palette and now you can search the collection for objects in the shade of rebeccapurple.

In addition to the CSS3 and CSS4 color palettes we also added support for the Crayola color palette. For example, the closest color to “rebeccapurple” in the Crayola scheme of things is “cyber grape”.

You can see all the possible nearest-colors for an object by appending /colors to an object page URL. For example:

https://collection.cooperhewitt.org/objects/18380795/colors

The dominant color for this object is #683e7e which maps to #58427c or “cyber grape” in Crayola-speak and #483d8b or “dark slate blue” in CSS3-speak and #663399 or “rebeccapurple” in CSS4-speak.

Now that we’ve done the work to support multiple palettes the only limits to adding more is time and imagination. I would like to add a greyscale palette. I would like to add one or more color-blind palettes. I would especially like to add a “blue” palette – one that spans non-photo blue through International Klein Blue all the way to Kind of Bloop midnight blue just to see where along that spectrum objects which aren’t even a little bit blue would fall.

Screen Shot 2014-10-26 at 12.42.02 PM

The point being that there are any number of color palettes that we can devise and use as a lens through which to see our collection. Part of the reason we chose to include the Crayola color palette in version “2” of search-by-color is because the colors they’ve chosen have been given expressive names whose meaning is richer than the sum of their descriptive parts. What does it mean for an object’s colors to be described as macaroni and cheese-ish or outer space-ish in nature? Erika Hall’s 2007 talk Copy is Interface is an excellent discussion of this idea.

I spoke about some of these things last month at the The Search is Over workshop, in London. I described the work we have done on the collections website, to date, as a kind of managing of absence. Specifically the absence of metadata and ways to compensate for its lack or incompleteness while still providing a meaningful catalog and resource.

It is through this work that we started to articulate the idea that: The value of the whole in aggregate, for all its flaws, outweighs the value of a perfect subset. The irregular nature of our collection metadata has also forced us to consider that even if there were a single unified interface to convey the complexities of our collection it is not a luxury we will enjoy any time soon.

search-is-over.023-640

Further the efforts of more and more institutions (the Cooper Hewitt included) to embark on mass-digitization projects forces an issue that we, as a sector, have been able to side-step until now: That no one, including lots of people who actually work at museums, has ever seen much of the work in our collections. So in relatively short-order we will transition from a space defined by an absence of data to one defined by a surfeit of, at the very least, photographic evidence that no one will know how to navigate.

To be clear: This is a good problem to have but it does mean that we will need to start thinking about models to recognize the shape of the proverbial elephant in the room and building tools to see it.

It is in those tools that another equally important challenge lies. The scale and the volume of the mass-digitization projects being undertaken means that out of necessity any kind of first-pass cataloging of that data will be done by machines. There simply isn’t the time (read: money) to allow things to be cataloged by human hands and so we will inevitably defer to the opinion of computer algorithms.

This is not necessarily as dour a prediction as it might sound. Color search is an example of this scenario and so far it’s worked out pretty well for us. What search-by-color and other algorithmic cataloging points to is the need to develop an iconography, or a colophon, to indicate machine bias. To design and create language and conventions that convey the properties of the “extruder” that a dataset has been shaped by.

search-is-over.033-640

Those conventions don’t really exist yet. Bracketing search by color with an identifiable palette (a bias) is one stab at the problem but there are so many more places where we will need to signal the meaning (the subtext?) of an automated decision. We’ve tried to address one facet of this problem with the different graphic elements we use to indicate the reasons why an object may not have an image.

missing-n
not-available-n

no-photography-n

Left to right: We’re supposed to have a picture for this object… but we can’t find it; This object has not been photographed; This object has been photographed but for some reason we’re not allowed to show it to you… you know, even though it’s been acquired by the Smithsonian.

Another obvious and (maybe?) easy place to try out this idea is search itself. Search engines are not, in fact, magic. Most search engines work the same way: A given string is “tokenized” and then each resultant piece is “filtered”. For example, the phrase “checkered Girard samples” might typically be tokenized by splitting things on whitespace but you could just as easily tokenize it by any pattern that can be expressed to a computer. So depending on your tokenizer you might end up with a list like:

  • checkered
  • Girard
  • samples

Or:

  • checkered Girard
  • samples

Each one of those “tokens” is then analyzed and filtered according to its properties. Maybe they get grouped by their phonetics, which is essentially how the snap-to-grid trick works for the collection’s color search. Maybe they are grouped by what type of word they are: proper nouns, verbs, prepositions and so on. I’ve never actually seen a search engine that does this but there is nothing technically to prevent someone from doing it either.

The simplest and dumbest thing would be to indicate on a search results page that your query results were generated using one or more tokenizers or filters. In our case that would be (1) tokenizer and (5) filters.

Tokenizers:

    1. Unicode Standard Annex #29

Filters:

      1. Remove English possessives
      2. Lowercase all tokens
      3. Ignore a set list of stopwords
      4. Stem tokens according to the Porter Stemming Algorithm
      5. Convert non-ascii characters to ascii
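To make that concrete, here is a toy version of such a pipeline. A real search engine would use a proper UAX #29 tokenizer and a real Porter stemmer; whitespace splitting and a crude suffix rule stand in for both here, and the stopword list is deliberately tiny:

STOPWORDS = {"the", "a", "an", "of", "and"}

def analyze(query):
    tokens = query.split()                  # stand-in for a UAX #29 tokenizer
    out = []
    for token in tokens:
        if token.endswith("'s"):            # remove English possessives
            token = token[:-2]
        token = token.lower()               # lowercase all tokens
        if token in STOPWORDS:              # ignore stopwords
            continue
        if len(token) > 3 and token.endswith("s"):
            token = token[:-1]              # (very) crude stand-in for stemming
        out.append(token)
    return out

# analyze("checkered Girard samples") returns ['checkered', 'girard', 'sample']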

That’s not very sexy or ooh-shiny but not everything needs to be. What it does, though, is provide a measure of transparency for people to gauge the reality that any result set is the product of choices which may have little or no relationship to the question being asked or the person asking that question.

These are devices, for sure, and they are not meant to replace a more considered understanding or contemplation of a topic but they can act as an important shorthand to indicate the arc of an answer’s motive.

search-is-over.038-640

And that’s just for search engines. Now imagine what happens when we all start pointing computer vision algorithms at our collections…


Update: Since publishing this blog post the nice people working on the GOV.UK websites launched “info” pages. Visitors can now append /info to any of the pages on the gov.uk website and see what that part of the website is supposed to do, who it is for and how. Writing about the project they say:

An ‘info’ page contains the user needs the page is intended to meet … Providing an easy way to jump from content to the underpinning needs allows content designers coming to a new topic to understand the need and build empathy with the users quicker. Publishing the GOV.UK user needs should also make the team’s work more transparent and traceable.

Bravo!

The Medium is the Message (and pubsocketd)

Screen Shot 2014-08-02 at 1.31.57 PM

Have you ever wanted to see a real-time view of all the objects that people are looking at on the collections website? Now you can!

At least for objects with images. There are lots of opportunities to think about interesting ways to display objects without images but since everything that follows has been a weekends-and-mornings project we’ve opted to start with the “simple” thing first.

We have lots of different ways of describing media: 12,865 ways at last count, to be precise. The medium with the most objects (2,963) associated with it is cotton but all of these numbers are essentially misleading. The history of the cataloging of the collection has preferenced precision and detail over the kind of rough bucketing (for example, tags) that lots of people are used to these days.

It’s a practice that can sometimes seem frustrating in the moment but, in the long-run, we’re better served for it. In time we will get around to assigning high-level categorizations for equally high-level browsing but it’s worth remembering that the practice of describing objects in minute detail predates things like databases, which we take for granted today. In fact these classifications, and their associated conventions and rituals, were the de-facto databases before computers or databases had even been invented.

But 13,000 different media, most of which only describe a single object, can be overwhelming. Where do you start? How do you know what to look for? Given the breadth of our collection what don’t we have? And given the level of detail we try to assign to objects how do you know whether a search doesn’t yield any results because it’s not in our collection or simply because we’re using a different name for the same thing you’re looking for?

This is a genuinely Big and Hairy Problem and we have not solved it yet. But the ability to relay objects as they are viewed by the public, in real-time, offers an interesting opportunity: What if we just displayed (and where possible, read aloud) the medium for that object?

Screen Shot 2014-08-02 at 1.31.29 PM

That’s all The Medium is the Message does: It is an ambient display that lets you keep an eye on the kinds of things that are in our collection and offers a gentle, polite way to start to see the shape of all the different things that tell the story of the museum. It’s not a tool to help you take a quiz so much as a way to absorb an awareness of the collection as if by osmosis. To show people an aspect of the collection as an avenue to begin understanding its entirety.

We’re not thinking enough about sound. If we want all these things to communicate with us, and we don’t want to be staring at screens and they’re going to do more than flash a couple of lights, then we need to work with sound. Either ‘sound effects’ that mean something or devices that talk to us. Personally, I think it’ll be the latter morphing into the former. And this is worth thinking about because it’s already creeping up on us. Self-serve checkouts are talking at us, reversing trucks are beeping at us, trucks turning left are barking at us, incoherently – all with much less apparent thought and ‘design’ than we devote to screens.

— Russell Davies,  the internet of talking

While The Medium is the Message is a full-screen application that displays a scaled-up version of the square-crop thumbnail for an object it also tries to use your browser’s text-to-speech capabilities to read aloud that object’s medium. It may not be the kind of thing you want playing in a room full of people but alone in your room, or under a pair of headphones, it’s fun to imagine it as a kind of Music for Airports for cultural heritage.

Screen Shot 2014-08-02 at 1.32.27 PM

Text-to-speech is currently best supported in Chrome and Safari. Conversely the best support for crisp and pixelated image-rendering is in Firefox. Because… computers, right?

For the time being The Medium is the Message lives in a little sand-box all by itself over here:

https://medium.collection.cooperhewitt.org

Eventually we hope to merge it back in to the main collections website but since it’s all brand-new we’re going to put it some place where it can, if necessary, have little melt-downs and temper-tantrums without adversely affecting the rest of the collections website. It’s also worth noting that some internal networks – like at a big company or organization – might still disallow WebSockets traffic which is what we’re using for this. If that’s the case try waiting until you’re home.

Screen Shot 2014-08-04 at 3.16.31 PM
And now, for the Nerdy Bits: The rest of this blog post is capital-T technical so you can stop reading now if that’s not your thing (though we think it’s still pretty interesting even if the details sound like gibberish).

The Medium is the Message is part of a larger project to investigate a few different tools in order to understand how they might fit together and to what effect. They are:

  • Redis and in particular its implementation of the Publish/Subscribe messaging paradigm – Every once in a while there’s a piece of software that is released which feels like genuine magic. Arguably one of the last examples of this was memcached, originally written by Brad Fitzpatrick for the website LiveJournal and without which entire slices of the web as we now know it wouldn’t exist. Both Redis and memcached are similar in spirit in that their feature-set is limited by design but what they claim to do Just Works™ and both have broad support across the landscape of programming languages. That last piece is incredibly important since it means we can use Redis to bridge applications written in whatever language suits the problem best. We’ll return to that idea in the discussion of “step 0” below.
  • Websockets – WebSockets are a way for a web browser and a server to create and maintain a persistent connection and to shuttle messages back and forth. Normally the chatter between a browser and a server happens akin to the way two people might send each other postcards in the mail and WebSockets are more like a pair of teenagers calling each other on the phone and talking for hours and hours and hours. Sort of like Pub/Sub for a web browser, right? WebSockets have been around for a few years now but they are still a bit of new territory; super-cool but not without some pitfalls.
  • Go – Go is a programming language from the nice people at Google, that recently celebrated its fourth anniversary. It is part of a growing trend in language design to find a middle ground between loosely typed languages, and the need to develop stable applications with a minimum of fussiness. Go is probably not the language we would develop a complex user-facing application in but for long-running services with well-defined boundaries it seems kind of perfect. (Go’s notion of code-based channels is a fascinating parallel to both Pub/Sub and WebSockets but that’s a whole other blog post.)

Fun fact: The Labs’ very own Sam Brenner‘s ITP thesis project called Adventures of Teen Bloggers is an archive of old LiveJournal accounts in the shape of an 8-bit video game!

Adventures of Teen Bloggers

In order to test all of those technologies and how they might play together we built pubsocketd, which is a simple daemon written in Go that subscribes to a Pub/Sub channel and ferries those messages to a browser using Websockets (WS). It does three things:

  1. Listen for messages from a specific (Redis) Pub/Sub channel
  2. Accept incoming WS requests
  3. Shuttle any messages from the Pub/Sub channel to all the open WS connections

That’s it. It is left up to WS clients (your web browser) to figure out what to do with those messages.

$> ./pubsocketd -ws-origin=https://example.com
2014/08/01 17:23:38 [init] listening for websocket requests on 127.0.0.1:8080/, from https://example.com
2014/08/01 17:23:38 [init] listening for pubsub messages from 127.0.0.1:6379 sent to the pubsocketd channel
2014/08/01 17:23:44 [10.20.30.40][10.20.30.40:56401][handshake] OK
2014/08/01 17:23:44 [10.20.30.40][10.20.30.40:56401][request] OK
2014/08/01 17:23:44 [10.20.30.40][10.20.30.40:56401][connect] OK
2014/08/01 17:23:53 [10.20.30.40][10.20.30.40:56401][send] OK
2014/08/01 17:24:05 [10.20.30.40][10.20.30.40:56401][send] OK
# and so on...

The “step 0” in all of this is the ability for the collections website itself to connect to a Redis server and send a Pub/Sub message, whenever someone views an object, to the same channel that the pubsocketd server is listening to.
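On the website side “step 0” might look something like this minimal sketch using the redis-py client – the message shape is an assumption, but the channel name matches the one pubsocketd listens to in the log output above:

import json
import redis

r = redis.StrictRedis(host="localhost", port=6379)

def on_object_view(object_id, medium, image_url):
    message = {
        "object_id": object_id,
        "medium": medium,
        "image": image_url,
    }
    # pubsocketd is subscribed to this channel and relays anything published
    # here to all of its open WebSocket connections
    r.publish("pubsocketd", json.dumps(message))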

ws-liden

This allows for a nice clean separation of concerns and provides a simple way for related, but fundamentally discrete, applications to interact without getting up in each other’s business.

Given the scope of the project we probably could have accomplished the same thing, with less scaffolding, using Server-Sent Events (SSE) but this was as much an exercise designed to get our feet wet with both WebSockets and Go so it’s been worth doing it the “hard way”.

Matthew Rothenberg, creator of the popular EmojiTracker, was nice enough to open-source the Go-based SSE server endpoint he wrote to feed his application and we may eventually re-write The Medium is the Message, or future applications like it, to use that.

Screen Shot 2014-08-04 at 4.11.45 PM

We’ve open-sourced the code for pubsocketd under a BSD license and we welcome suggestions, patches and (gentle) clue-bats:

https://github.com/cooperhewitt/go-pubsocketd

Enjoy!

Robot Rothko

20140707-robot-rothko-infobox

Now that I’ve written this blog post it occurs to me that it would be trivial to build something similar on top of the Cooper Hewitt Collections API — since that’s ultimately where all this colour stuff comes from — so I will probably do that shortly and stick in it the Play section.

That’s something I wrote last week on my personal weblog. I was writing about a little web “application” that I’d made to generate algorithmic “multiforms” that recall the work of the late painter Mark Rothko. The source of the colors used to create these robot-multiforms are derived from photo uploads and extracted using the same code that the Cooper Hewitt uses to generate color palettes for the objects in our collection. We wrote about that process last year.

These robot “paintings” are built by fetching three photos and using their primary color to fill one of three stacked rectangles that make up the canvas. A dominant color for a fourth photo is used along with an inset CSS3 box-shadow to give the illusion of a fuzzy, hazy background on which the rectangles sit. Every 60 seconds a new version is generated and the colors (and boxes) gently transition from old to new.

20140705-robot-rothko-2004

In that original blog post, I also wrote:

That’s it. It doesn’t do anything else and that’s part of the charm for me. It just sits in the background running in second-screen-mode stamping out robot-Rothko paintings. … It’s nice to have a new screen friend to spend the days with

They’re not really Rothko paintings, obviously, and to suggest that they are would do the painter a disservice. Rothko’s paintings are not just any random set of colors stacked on top of one another. Rothko worked long and hard to choose the arrangement of his paintings and it’s easy to imagine that he would have been horrified by some of the combinations that Robot Rothko offers up. But like the experimental Albers Boxes feature they are a nod and gesture – and a wink – towards the real thing.

20140707-robot-rothko-fluid

Having gotten things working for a personal non-museum and not-really-for-strangers project I decided that it would be nice to do something similar for the museum which is absolutely for everyone. So, today we are launching Robot Rothko which is exactly the same as the application described above except that it uses objects from our collection instead of photos as its source material. Like this:

https://collection.cooperhewitt.org/play/robot-rothko/#info

 

See the #info part of that URL? That will cause the application to load with an information box explaining what you’re looking at (and that will close itself automatically after 30 seconds). If you just want to jump straight to the application all you have to do is remove the #info from the URL.

https://collection.cooperhewitt.org/play/robot-rothko/

Robot Rothko will automatically update itself using random object records to create a new multiform every 60 seconds. Mouse over any color to see the object it represents. Click on the text to see our collection record for the object itself.

20140707-robot-rothko-decade

You can also filter stuff by person or decade. You can also filter by the year we acquired an object if you can guess where it is; that one still feels a little buggy so we’re going to hold off publishing the URL until we can figure out what’s wrong. Here are some examples of the first two:

https://collection.cooperhewitt.org/play/robot-rothko/people/18046041

https://collection.cooperhewitt.org/play/robot-rothko/decade/1910

Robot Rothko is native to the web which means it will work in any modern web browser whether it’s on your desktop or your phone or your tablet. It can be put in to fullscreen mode (by pressing shift-F) and if you save the website’s URL to your homescreen on your phone, or tablet, it is configured to launch without any of the usual browser chrome. If you use a Mac you can plug the URL for Robot Rothko in to Todd Ditchendorf’s handy Fluid.app which will turn it all in to a shiny desktop application. I am guessing there are equivalent tools for Windows or Linux but I don’t know what they are.

20140707-robot-rothko-tablet

If you’d like to generate your own Robot Rothkos there’s an API method for doing just that:

https://collection.cooperhewitt.org/api/methods/cooperhewitt.play.robotRothko

And of course it works with our recently announced support for DSON as a response format:

curl -X GET 'https://api.collection.cooperhewitt.org/rest/?method=cooperhewitt.play.robotRothko&access_token=SEEKRET&person_id=18041501&format=dson'
such "rothko" is such "canvas" is so "49" and "28" and "23" many and "palette" is so such "colour" is "#b8ab5b" , "id" is "18805769" , "epitaph" is "Folding Fan, 1900u201305. Medium: silk, wood, horn, metal, metal spangles. Gift of Lillian C. Hart. 1985-89-1." wow ? such "colour" is "#c7c7c7" . "id" is "18640557" ! "epitaph" is "Drawing, "Two Studies for Rectangul", ca. 1965. Pen and black ink on white wove paper. Gift of Vladimir Kagan. 1992-56-7." wow , such "colour" is "#db8952" , "id" is "18133219" , "epitaph" is "Fragment, mid-18th century. Medium: silknTechnique: plain weave patterned by supplementary warp floats and complementary weft floats. Gift of John Pierpont Morgan. 1902-1-811." wow many ? "background" is such "colour" is "#c7a9af" . "id" is "18761047" ! "epitaph" is "Booklet Cover Sheet, 1916. Color woodcut on lavender wove paper paper. Museum purchase from Drawings and Prints Council Fund and through gift of Margery and Edgar Masinter and Merrill C. Berman. 1999-50-1-3." wow wow , "filters" is so many and "stat" is "ok" wow

Robot Rothko lives in a new section of the collections website called “Play“. The distinction between the Play section and the Experimental Features section of the website can probably be easiest thought of as: Experimental features are things that apply to the entirety of the collections website, while Play things are small contained applications that use the collections API and focus on or build off a particular aspect of the collection. The first of these was Sam Brenner’s SkyDesigner and Robot Rothko is actually the third such application.

20140707-wwms-boom

In between those two was What Would Micah Say? (WWMS), a quick end-of-day project to test out the W3C’s Text-to-Speech APIs that are starting to appear in some web browsers (read: Chrome and Safari as of this writing, and make sure you have the volume turned up). The WWMS “application” was mostly a simple 20-minute exercise to test whether fetching some content dynamically and feeding it to the text-to-speech APIs actually works and produces something useable. It does, which is very exciting because it opens up any number of accessibility-related improvements we can start thinking about adding to the collections website.

That we happened to use the cooperhewitt.labs.whatWouldMicahSay API method and then configured the text-to-speech API to read his words as if spoken by a “French” robot made it all a little bit silly and a little more fun but those are important considerations. Because sometimes playing at – or making interesting – a technical problem is the best way to work through whether it is even worth pursuing in the first place.

20140707-robot-rothko-girard-2