A couple of weeks ago we blogged about a new feature and a new kind of Custom Search Engine (CSE) that you could create on the fly. Today, we thought we'd dig a bit deeper and describe the underlying infrastructure that powers this. With our new Linked CSEs, we are exposing the API to create and control CSEs.
Until now, you created a CSE either by using the wizard or by writing an XML file and uploading it to Google (via the "Advanced" tab on the control panel). To change any aspect of the CSE, you had to either use the control panel or upload the new XML specification. This imposed several limitations:
- Creating and maintaining a CSE was a manual process.
- It was difficult to create a large number of CSEs.
- It was difficult to use other data sources such as iCal, RSS, Google Base, etc. to programmatically create CSEs.
The search box code for a CSE created this way (found on the "Code" tab in the control panel) includes a "cx" parameter with every search request (for example, <input type="hidden" name="cx" value="005946352831473999820:qs1idu8ptku" />), which specifies an internal identifier for the CSE.
Linked CSEs overcome these limitations. In short, you can now specify your CSE using a "cref" parameter that points to a URL anywhere on the web. You maintain the specification at that URL yourself, so you don't have to upload it or edit your CSE using our tools. The URL can take arguments to produce dynamic CSEs based on the current page, the user currently visiting your site, and so on. You can see this in action on our "on the fly" demo page: when you type "http://www.cs.berkeley.edu/~russell/ai.html" in the form's text field, the JavaScript on that page constructs a "cref" parameter that contains http://www.google.com/cse/tools/makecse?url=http://www.cs.berkeley.edu/~russell/ai.html. This URL (visit it!) returns an XML specification for a CSE. You can use any script you want, or reference a static file, when creating your CSE. And there's nothing special about our makecse example script: we're hoping that our developers and the developer community will build many other such CSE-generating tools.
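To make the "cref" construction concrete, here is a minimal Python sketch of what the demo page's script does: wrap a page URL in a makecse URL. The helper name make_cref is hypothetical, and percent-encoding the nested URL is a precaution on our part (so that a page URL with its own query string survives intact), not something the example above requires.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# The makecse endpoint mentioned in this post.
MAKECSE = "http://www.google.com/cse/tools/makecse"


def make_cref(page_url: str) -> str:
    """Return a CSE specification URL that asks makecse to build a CSE
    from the links on page_url (hypothetical helper for illustration)."""
    # urlencode percent-encodes the nested URL so it is carried as a
    # single "url" argument.
    return MAKECSE + "?" + urlencode({"url": page_url})


page = "http://www.cs.berkeley.edu/~russell/ai.html"
cref = make_cref(page)
print(cref)

# Decoding the query string gives back the original page URL.
decoded = parse_qs(urlsplit(cref).query)["url"][0]
print(decoded == page)
```

The resulting string is what you would place in the "cref" parameter of your search request.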
How does this work? With Linked CSEs, you supply a CSE specification URL with each search request (as a hidden form field in your search box HTML code). Google retrieves the CSE specification from that URL when your user searches in the CSE. We cache and refresh the results so that only the first search in your CSE incurs any delay. The flexibility to specify how your search engine should behave at the moment your user runs a query, using whatever data sources you want, opens up many possibilities:
- You can use our makecse tool to generate CSEs from different sources of links.
- You can combine multiple sources of links using our makeannotations tool and the <Include> tag. For example, it's easy to create a search engine from the links on the front pages of techmeme, slashdot and digg.
- You can write your own tools to produce <Annotations /> XML from other data sources such as Google Calendar or iCal feeds, Google Base or any other structured source of information.
- You can automatically generate any number of CSEs, each possibly tuned to a particular user. For example, we've created a sample that builds a CSE from a user's digg.com friend network and submissions using the Digg API. Try it out and view the source. This makes use of two simple Python CGI scripts:
  - diggannos.py generates <Annotations> from the specified user's submitted stories.
  - diggcse.py generates <GoogleCustomizations> from the specified user's friend network. For each friend, it generates an <Include> element pointing to the appropriate diggannos.py URL.
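The two-script pattern in the Digg sample can be sketched in Python roughly as follows. This is not the source of diggannos.py or diggcse.py. Only <Annotations>, <GoogleCustomizations> and <Include> come from this post; the inner elements and attributes (Annotation/about, Label, CustomSearchEngine, Context, BackgroundLabels, type, href) are assumptions about the CSE XML format and should be checked against the Custom Search XML reference. The example.com URLs, user names and label are made up.

```python
import xml.etree.ElementTree as ET


def annotations_xml(url_patterns, label):
    """Build an <Annotations> document tagging each URL pattern with label.
    Element and attribute names below <Annotations> are assumptions."""
    root = ET.Element("Annotations")
    for pattern in url_patterns:
        annotation = ET.SubElement(root, "Annotation", about=pattern)
        ET.SubElement(annotation, "Label", name=label)
    return ET.tostring(root, encoding="unicode")


def cse_xml(title, label, annotation_urls):
    """Build a <GoogleCustomizations> document with one <Include> per
    annotations URL. The CSE definition layout is an assumption."""
    root = ET.Element("GoogleCustomizations")
    cse = ET.SubElement(root, "CustomSearchEngine")
    ET.SubElement(cse, "Title").text = title
    context = ET.SubElement(cse, "Context")
    background = ET.SubElement(context, "BackgroundLabels")
    ET.SubElement(background, "Label", name=label, mode="FILTER")
    for href in annotation_urls:
        ET.SubElement(root, "Include", type="Annotations", href=href)
    return ET.tostring(root, encoding="unicode")


# Hypothetical friend list; a real script would fetch it from an API.
friends = ["alice", "bob"]
hrefs = [f"http://example.com/diggannos.py?user={name}" for name in friends]

spec = cse_xml("Stories from my friends", "friends_stories", hrefs)
annos = annotations_xml(["example.org/*", "example.net/news/*"], "friends_stories")
print(spec)
print(annos)
```

Served from a CGI script or any other web endpoint, the first document would be what a "cref" URL returns, and each <Include> would point at a URL returning a document like the second.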
You can test any Custom Search Engine XML by going to http://www.google.com/coop/cse/cref and entering the URL. Putting a search box on your site is as easy as copying a small bit of HTML code and modifying the "cref" parameter.
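If you generate your pages from a script, the search box itself can be produced the same way. The sketch below renders a form with the specification URL in a hidden "cref" field, mirroring the "cx" example earlier in this post. The form action, the "q" and "sa" field names, and the example.com specification URL are assumptions for illustration; copy the exact markup from the code we provide.

```python
from html import escape


def search_box_html(cref_url: str) -> str:
    """Return search box HTML that passes cref_url as a hidden "cref" field.
    The form action and the other field names are assumptions."""
    # Escape the URL so characters such as & and " stay valid inside
    # the HTML attribute value.
    value = escape(cref_url, quote=True)
    return (
        '<form action="http://www.google.com/cse">\n'
        f'  <input type="hidden" name="cref" value="{value}" />\n'
        '  <input type="text" name="q" size="40" />\n'
        '  <input type="submit" name="sa" value="Search" />\n'
        '</form>'
    )


# Hypothetical specification URL that takes per-user arguments.
box = search_box_html("http://example.com/cse.py?user=alice&lang=en")
print(box)
```

Changing the CSE behind the box then only requires changing the URL passed to search_box_html.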