Programmable Search Engine Blog
The latest news, updates and tips from the Programmable Search Engine team
Specifying patterns for your Custom Search Engine
Thursday, February 21, 2008
Posted by: Vrishali Wagle, Software Engineer
Creating a basic Custom Search Engine (CSE) is very easy. You enter a list of sites, select a few basic preferences, and you are done, right? But in fact there's more to Custom Search: consider it a very powerful way of building your own search engine on top of Google search. You can exclude sites, add labels for drill-down, and even change the ranking of results for your search engine. In this blog post, we look at the basic element of Custom Search: URL patterns.
URL patterns specify the part of the web you want to search or exclude from your search. Custom Search is based on approximation algorithms that use these patterns to give you your customized results.
Consider the "I Love Veggies" search engine that we created. Here's how it made effective use of patterns:
Be very specific. Use the longest possible pattern for specifying a site. For example, in the "I Love Veggies" search engine, we wanted to search all of www.goveg.com, so we added "www.goveg.com/*" as a pattern. But we wanted to search only the vegetarian part of the "allrecipes.com" site. So instead of adding all of "allrecipes.com/*" we added the more specific "allrecipes.com/Recipes/Everyday-Cooking/Vegetarian/*".
Specify multiple pages in a site with a "*" at the end of the pattern. If you specify just "www.goveg.com", Custom Search will search just the single page http://www.goveg.com. You need to remember this only if you write your XML file of annotations directly. If you are using the Control Panel, it automatically adds the "/*" at the end for you, unless you indicate otherwise.
Sometimes, you might have a few hosts on a domain with the same path that you want to search. In our example, we wanted to search "mideastfood.about.com/od/vegetarianrecipes/*" and "indianfood.about.com/od/vegetarianrecipes/*". In such a case it is better to specify these patterns individually instead of using a very general "*.about.com/od/vegetarianrecipes/*", because the more specific the patterns, the better the approximation.
You can only use the * in the hostname at the beginning of the pattern and it can only represent a full token. For example, "*.about.com/*" is a valid pattern and so is "*.food.about.com/*". However, "*ood.about.com/*" is not valid, nor is "food.*.about.com/*".
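To make the rules above concrete, here is a minimal Python sketch of how they could be read literally. It is not how Custom Search works internally (the post describes the real system as based on approximation algorithms), and the function names `is_valid_pattern` and `matches` are hypothetical. It assumes patterns are written without a scheme, that a pattern without a trailing "*" names a single page, and that a leading "*." stands for one or more full host labels.

```python
import re


def is_valid_pattern(pattern: str) -> bool:
    """Check wildcard placement in the hostname part of a pattern.

    Assumption: the pattern has no scheme, so everything before the
    first "/" is the hostname.
    """
    host, _, _path = pattern.partition("/")
    if not host:
        return False
    for index, label in enumerate(host.split(".")):
        # A "*" is allowed only as the entire first label of the host.
        if "*" in label and not (index == 0 and label == "*"):
            return False
    return True


def matches(pattern: str, url: str) -> bool:
    """Literal reading of a pattern against a URL (illustration only)."""
    url = re.sub(r"^https?://", "", url)  # drop the scheme if present
    p_host, _, p_path = pattern.partition("/")
    u_host, _, u_path = url.partition("/")

    if p_host.startswith("*."):
        # Assumption: "*.about.com" covers subdomains, not "about.com" itself.
        if not u_host.endswith(p_host[1:]):
            return False
    elif p_host != u_host:
        return False

    if p_path.endswith("*"):
        # Trailing "*": everything under the given path prefix.
        return u_path.startswith(p_path[:-1])
    # No trailing "*": only the single page named by the pattern.
    return u_path.rstrip("/") == p_path.rstrip("/")


print(is_valid_pattern("*.food.about.com/*"))   # True
print(is_valid_pattern("food.*.about.com/*"))   # False
print(matches("www.goveg.com", "http://www.goveg.com/recipes"))    # False
print(matches("www.goveg.com/*", "http://www.goveg.com/recipes"))  # True
```

The last two calls show why the trailing "/*" matters when you write annotations by hand: without it, only the single page is covered.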
Keep reading this blog for more tips and tricks as we develop our "I Love Veggies" search engine. If you have specific questions or feature requests, you can visit our Help Center or ask a question on the Discussion group.