Tuesday, June 26, 2007

Custom Search Engine APIs

Posted by: Matt Wytock, Software Engineer

A couple of weeks ago we blogged about a new feature and a new kind of Custom Search Engine (CSE) that you could create on the fly. Today, we thought we'd dig a bit deeper and describe the underlying infrastructure that powers this. With our new Linked CSEs, we are exposing the API to create and control CSEs.

Until now, you created a CSE either by using the wizard or by writing an XML file and uploading it to Google (via the "Advanced" tab on the control panel). To change any aspect of the CSE, you had to either use the control panel or upload the new XML specification. This imposed several limitations:

  • Creating and maintaining a CSE was a manual process.
  • It was difficult to create a large number of CSEs.
  • It was difficult to use other data sources such as iCal, RSS, Google Base, etc. to programmatically create CSEs.

The search box code for these CSEs (found on the "Code" tab in the control panel) includes a "cx" parameter with every search request (for example, <input type="hidden" name="cx" value="005946352831473999820:qs1idu8ptku" />), which specifies an internal identifier for the CSE.

Linked CSEs overcome these limitations. In short, you can now specify your CSE using a "cref" parameter that points to a URL anywhere on the web. You update the specification at that URL on your end; you don't have to upload it or edit your CSE using our tools. The URL can take arguments to produce dynamic CSEs based on the current page, the current user visiting your site, etc. You can see this in action on our "on the fly" demo page: when you type "http://www.cs.berkeley.edu/~russell/ai.html" in the form text field, the JavaScript on that page constructs a "cref" parameter that contains http://www.google.com/cse/tools/makecse?url=http://www.cs.berkeley.edu/~russell/ai.html. This URL (visit it!) returns an XML specification for a CSE. You can use any script you want, or reference a static file, when creating your CSE. And there's nothing special about our makecse example script: we're hoping that our developers and the developer community will build many other such CSE-generating tools.
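To make the "cref" construction concrete, here is a minimal Python sketch of what the demo page's script does: it wraps a page URL in a makecse request and renders that as a hidden form field. The helper names build_cref and cref_hidden_field are hypothetical, and percent-encoding the nested URL is an assumption made for safety (the example above passes it unencoded). The field name "cref" mirrors the "cx" field shown earlier.

```python
from html import escape
from urllib.parse import urlencode

# Endpoint taken from the makecse example in this post.
MAKECSE_ENDPOINT = "http://www.google.com/cse/tools/makecse"


def build_cref(page_url: str) -> str:
    """Return a CSE specification URL for the given page (hypothetical helper)."""
    # Assumption: the nested URL is percent-encoded so that its own query
    # string, if any, cannot be confused with the outer one.
    return MAKECSE_ENDPOINT + "?" + urlencode({"url": page_url})


def cref_hidden_field(cref: str) -> str:
    """Render the hidden form field that carries the specification URL."""
    # HTML-escape the value so "&" and quotes do not break the attribute.
    return '<input type="hidden" name="cref" value="%s" />' % escape(cref, quote=True)


if __name__ == "__main__":
    cref = build_cref("http://www.cs.berkeley.edu/~russell/ai.html")
    print(cref_hidden_field(cref))
```

The same two steps apply if the "cref" points at your own script or a static XML file instead of makecse; only the URL passed to cref_hidden_field changes.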

How does this work? With Linked CSEs, you designate a CSE specification URL with each search request (as a hidden form field in your search box HTML code). Google retrieves the CSE specification from that URL when your user searches in the CSE. We cache and refresh the results so that only the first search to your CSE incurs any delay. The flexibility to specify how your search engine should behave at the moment your user runs the query, using whatever data sources you want, opens up many possibilities.
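The fetch-then-cache behavior described above can be modeled with a small Python sketch. This is only an illustration of the general pattern, not Google's implementation: the SpecCache class, the one-hour refresh interval, and the injected fetch function are all assumptions introduced here.

```python
import time
from typing import Callable, Dict, Tuple


class SpecCache:
    """Toy model of fetch-on-first-use caching with periodic refresh."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 3600.0,  # hypothetical refresh interval
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # retrieves the XML specification for a URL
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, cref: str) -> str:
        """Return the specification for cref, fetching only when needed."""
        now = self._clock()
        entry = self._entries.get(cref)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # cache hit: no fetch delay
        spec = self._fetch(cref)  # first search, or the entry went stale
        self._entries[cref] = (now, spec)
        return spec


if __name__ == "__main__":
    fetched = []

    def fake_fetch(url: str) -> str:
        # Stand-in for an HTTP request to the specification URL.
        fetched.append(url)
        return "<spec/>"

    cache = SpecCache(fake_fetch)
    cache.get("http://example.com/cse.xml")
    cache.get("http://example.com/cse.xml")
    print(len(fetched))  # the second lookup is served from the cache
```

Under this model, only the first search for a given specification URL (or the first one after the entry expires) pays the cost of retrieving the XML.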

You can test any Custom Search Engine XML by going to http://www.google.com/coop/cse/cref and entering the URL. Putting a search box on your site is as easy as copying a small bit of HTML code and modifying the "cref" parameter.

Linked CSEs are a very big step for Google Custom Search. We hope you will find them as cool as we do. As always, thank you for your support and keep the feedback coming.