request and uses the above specification and the supplied query. You can simulate this process by visiting:
A different specification is generated for every Wikipedia page (based on its URL) by a tiny AppEngine application at http://googlecustomsearch.appspot.com. The specification defines a search engine with two facets, labeled "internal" (linked Wikipedia pages) and "external" (linked non-Wikipedia pages). The list of "internal" and "external" webpages to search over is provided by this line in the specification:
<Include href="http://googlecustomsearch.appspot.com/wikipedia/annotations.do?url=en.wikipedia.org%2Fwiki%2FNASA" type="Annotations"/>
This causes Google to visit the webapp at a new URL (annotations.do). Our webapp then collects the links from the NASA article, classifies them as "internal" or "external", and returns the annotations in an XML format. You can see the result by opening the annotations.do URL from the line above (view source in your browser).
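As a rough illustration of how the href in the Include line is put together, here is a minimal Python sketch. The endpoint and the url parameter come from the specification line above; the function name annotations_href is made up for this example, and the real webapp may build the URL differently.

```python
from urllib.parse import quote

# Endpoint copied from the Include line in the specification.
ANNOTATIONS_ENDPOINT = "http://googlecustomsearch.appspot.com/wikipedia/annotations.do"


def annotations_href(page: str) -> str:
    """Return the Include href for a Wikipedia page given as host + path.

    Hypothetical helper: it only reproduces the URL shape shown in the post.
    """
    # safe="" makes quote() encode "/" as %2F, as in the example href.
    return f"{ANNOTATIONS_ENDPOINT}?url={quote(page, safe='')}"


print(annotations_href("en.wikipedia.org/wiki/NASA"))
```

For the NASA article this reproduces the href shown in the specification line.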
Now Google can finish building the Custom Search engine for the NASA article, and compute the results for [mars]. The results are returned to your web browser and displayed in the appropriate tab.
But wait! Our little AppEngine webapp doesn't have the CPU horsepower or bandwidth to scan Wikipedia pages on demand, or in near real time, for thousands of Wikipedia users. Instead, the webapp asks Google to scan the page via a Custom Search tool called makeannotations. The request looks something like this:
After makeannotations returns the list of links in the NASA article as XML, the webapp simply rewrites the XML according to the domain of each link, labeling each one "internal" or "external".
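The rewriting step can be sketched in a few lines of Python. This is not the webapp's actual code. It assumes a link counts as "internal" when its host is wikipedia.org or one of its subdomains, and it assumes the usual Custom Search annotations layout (Annotations, Annotation, Label). The function names and the sample links are made up for illustration.

```python
from urllib.parse import urlparse
import xml.etree.ElementTree as ET


def classify(link: str) -> str:
    """Label a link "internal" if it points at Wikipedia, else "external".

    Assumption: "Wikipedia" means wikipedia.org or any subdomain of it.
    """
    host = (urlparse(link).hostname or "").lower()
    if host == "wikipedia.org" or host.endswith(".wikipedia.org"):
        return "internal"
    return "external"


def to_annotations(links: list[str]) -> str:
    """Build an Annotations document with one labeled Annotation per link."""
    root = ET.Element("Annotations")
    for link in links:
        annotation = ET.SubElement(root, "Annotation", about=link)
        ET.SubElement(annotation, "Label", name=classify(link))
    return ET.tostring(root, encoding="unicode")


# Sample links chosen for illustration only.
links = [
    "http://en.wikipedia.org/wiki/Mars",
    "http://www.nasa.gov/",
]
print(to_annotations(links))
```

Matching on the host suffix rather than a substring keeps a link such as http://example.com/wikipedia.org from being labeled "internal" by mistake.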
Since we create the per-page search engines on demand, there can sometimes be a short delay in the creation of the search engine, e.g., for new or obscure pages. For popular Wikipedia pages, however, these definitions should be cached, and you should see no delay. In fact, we use a ping method to load the Custom Search engine in advance, before you search. Remember that if there are not many links on the Wikipedia page you are searching from, you may sometimes find no matches for linked pages.
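The idea of warming a cache with a ping can be shown with a generic sketch. The post does not describe how Google caches the search engine definitions or how the ping is implemented, so everything here (the SpecCache class, its methods, and the stand-in build function) is hypothetical and only illustrates the pattern.

```python
from typing import Callable, Dict


class SpecCache:
    """In-memory cache of per-page search engine definitions (illustrative only)."""

    def __init__(self, build: Callable[[str], str]) -> None:
        self._build = build               # slow function that creates a definition
        self._specs: Dict[str, str] = {}  # page URL -> cached definition
        self.builds = 0                   # how many times the slow path ran

    def get(self, page_url: str) -> str:
        """Return the definition for a page, building it only on a cache miss."""
        if page_url not in self._specs:
            self.builds += 1
            self._specs[page_url] = self._build(page_url)
        return self._specs[page_url]

    def ping(self, page_url: str) -> None:
        """Warm the cache ahead of the first search; the result is discarded."""
        self.get(page_url)


# Stand-in for the expensive step of creating a search engine definition.
cache = SpecCache(lambda url: f"<spec for {url}>")
cache.ping("en.wikipedia.org/wiki/NASA")        # e.g. sent when the page loads
spec = cache.get("en.wikipedia.org/wiki/NASA")  # served from the cache
print(cache.builds)  # prints 1: the definition was built once, by the ping
```

With this pattern, the cost of building the definition is paid when the page loads rather than when the first query is typed.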
We've open sourced the code for this application. Feel free to work with it, and to extend the skin beyond Monobook and Vector. We built this skin with the help of Wikipedia, and hope that you will provide feedback on your experience. You can also provide your feedback directly to Wikipedia.
Posted by: Paul Komarek, Software Engineer and Jeffrey Scudder, Developer Programs Engineer