Finding search engine requests

"Some things Mankind was never meant to know. For everything else, there's Google."

Search engine keywords script

The different log files created by your web server tell you who came to see your site (the IP address), when they came, what they looked at, and also why they came. Nowadays most people turn to Google to find anything on the internet, and the referrer field of the access_log file contains the full Google address that led to your site, including the search keywords used. It's a goldmine of information, and it works with most other search engines as well.

I know that most traffic analysis tools return search engine keywords as well, but they have severe drawbacks: some are payware, some are complex to install/configure, some return insufficient data (they don't know all the search engine names and query syntaxes). If you notice a new search engine, it takes only a few seconds to customize this script.

The following is the shell script I wrote to analyze those searches. It extracts search engine referrers from the log file (Apache access_log with combined referrer field, but can also work with MS IIS) and sorts the most searched pages, the most used search phrases and keywords.
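The idea behind such a script can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the shell script described above: the regular expression assumes Apache's combined log format, and the `ENGINES` table (which host fragment goes with which query parameter) is a hypothetical starting point that needs one new entry per search engine you notice.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Combined log format: ... "METHOD /page PROTO" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<page>\S+)[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)

# Assumed table: host fragment -> query parameter holding the search phrase.
ENGINES = {"google.": "q", "bing.": "q", "yahoo.": "p"}


def search_phrase(referer):
    """Return the normalized search phrase found in a referrer URL, or None."""
    parts = urlsplit(referer)
    host = parts.netloc.lower()
    for fragment, param in ENGINES.items():
        if fragment in host:
            values = parse_qs(parts.query).get(param)
            if values:
                # Lowercase and collapse whitespace so identical searches group together.
                return " ".join(values[0].lower().split())
    return None


def analyze(lines):
    """Count searched pages, search phrases and single keywords in log lines."""
    pages, phrases, keywords = Counter(), Counter(), Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if not match:
            continue
        phrase = search_phrase(match.group("referer"))
        if phrase is None:
            continue
        pages[match.group("page")] += 1
        phrases[phrase] += 1
        keywords.update(phrase.split())
    return pages, phrases, keywords
```

Calling `analyze(open("access_log"))` and printing `phrases.most_common(20)` would give the top search phrases; the file name here is only an example.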

The script also keeps a list of already used keywords and gives you, at the end, the list of 'new' searches for that run. I run this script daily as a cron job and glance at the result some time during the day. It's a priceless indicator of why people come to your website, what they are looking for, and maybe what they are missing (if you see search keywords that are not actually relevant to your site). This list will also make you aware of groundbreaking news (or links) affecting your site: for instance, if some big site just started linking to you from their front page, you'll know why your server just ground to a halt (this is called the Slashdot effect).
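The "new searches" bookkeeping can be sketched as follows, assuming the already seen phrases are kept one per line in a plain text file. The function name `new_searches` and the file layout are hypothetical choices for this illustration, not taken from the original script.

```python
from pathlib import Path


def new_searches(phrases, seen_file):
    """Return the phrases never seen in earlier runs and remember them."""
    path = Path(seen_file)
    if path.exists():
        seen = set(path.read_text(encoding="utf-8").splitlines())
    else:
        seen = set()

    fresh = []
    for phrase in phrases:
        if phrase not in seen:
            seen.add(phrase)  # also removes duplicates within this run
            fresh.append(phrase)

    if fresh:
        # Append only the new phrases so the file grows run after run.
        with path.open("a", encoding="utf-8") as handle:
            handle.writelines(phrase + "\n" for phrase in fresh)
    return fresh
```

Run from a daily cron job, the returned list is the part worth glancing at: everything in it is a search that had never led to the site before.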

The script runs on any Unix, but also on my Win2000 PC in a batch file, with cygwin and perl installed and very few modifications (%% instead of %, "" instead of ", " instead of ', different temporary files...).



Keywords for my site

The following is data I pulled from the referrer field of the log files. It shows which pages are found most often by search engines and which search strings or keywords the curious user typed in order to arrive on my site. What is this wombat doing here ?!? I'm surprised it is searched for more often than penguins. Actually the most used keywords are not the most interesting: while I expect searches for "Antarctic Penguin" or "climbing wallpapers" to lead to my site, there are some really weird requests at the bottom of the pile. I also made the following discovery... I wonder in how many other categories I come out on top.

The script above is not meant for spamming search engines but for writing a better site. For instance when I discovered that most people were asking questions about Antarctica, I decided to answer the questions myself and wrote several FAQs about penguins, how to get to Antarctica, global warming and more... This short script is a very useful tool that allows me to tailor my website to what visitors really want.



Preventing image theft

On seeing the list of referrers, I noticed some abuse by blogs inlining my images directly ("stealing my bandwidth"); some popular blogs with thousands of daily hits were using my large images as a background... The following sites are banned as of 2003/03/01: migente.com, ezboard.com, blackplanet.com, livejournal.com, geocities.com, blogspot.com, asianavenue.com, xanga.com, keenspot.com, neopets.com... Other blogs can keep inlining my images as long as they keep a low profile.
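Spotting this kind of abuse in the log can be sketched like this: count image requests per referring host, ignoring your own site. It assumes Apache's combined log format, and `own_hosts` is a hypothetical parameter naming the hosts that are allowed to embed the images.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# An image request followed by status, size and the quoted referrer.
IMAGE_HIT = re.compile(
    r'"GET (?P<path>\S+\.(?:jpg|gif|png))(?:\?\S*)? [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"',
    re.IGNORECASE,
)


def hotlinkers(lines, own_hosts):
    """Return (host, hits) pairs for foreign pages that embed our images."""
    hits = Counter()
    for line in lines:
        match = IMAGE_HIT.search(line)
        if not match:
            continue
        host = urlsplit(match.group("referer")).netloc.lower()
        # An empty host means no referrer was sent ("-"); skip it.
        if host and host not in own_hosts:
            hits[host] += 1
    return hits.most_common()
```

Image search engines will show up in this list too, so it is a list of suspects to look at rather than a ban list.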

How do I disallow them you ask ? Very simple, on an Apache server, all I need to do is put the following in my .htaccess file in the main directory:

# Image theft prevention .htaccess file for Apache
SetEnvIfNoCase Referer "^http://([^/]*)board([^/]*)/" spam_ref=1
SetEnvIfNoCase Referer "^http://([^/]*)chat([^/]*)/" spam_ref=1
SetEnvIfNoCase Referer "^http://([^/]*)blog([^/]*)/" spam_ref=1
SetEnvIfNoCase Referer "^http://([^/]*)forum([^/]*)/" spam_ref=1
SetEnvIfNoCase Referer "^http://([^/]*)member([^/]*)/" spam_ref=1
SetEnvIfNoCase Referer "^http://([^/]*)journal([^/]*)/" spam_ref=1
SetEnvIfNoCase Referer "^http://([^/]*)diary([^/]*)/" spam_ref=1
SetEnvIfNoCase Referer "^http://([^/]*)discuss([^/]*)/" spam_ref=1

SetEnvIfNoCase Referer "^http://([^/]*)migente\.com/" spam_ref=1
SetEnvIfNoCase Referer "^http://([^/]*)blackplanet\.com/" spam_ref=1
SetEnvIfNoCase Referer "^http://([^/]*)geocities\.com/" spam_ref=1
SetEnvIfNoCase Referer "^http://([^/]*)asianavenue\.com/" spam_ref=1
SetEnvIfNoCase Referer "^http://([^/]*)xanga\.com/" spam_ref=1
SetEnvIfNoCase Referer "^http://([^/]*)keenspot\.com/" spam_ref=1
SetEnvIfNoCase Referer "^http://([^/]*)neopets\.com/" spam_ref=1
SetEnvIfNoCase Referer "^http://([^/]*)xyz\.cz/" spam_ref=1
SetEnvIfNoCase Referer "^http://([^/]*)eendje\.be/" spam_ref=1

<FilesMatch "\.jpg$">
	Order Allow,Deny
	Allow from all
	Deny from env=spam_ref
</FilesMatch>
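Before banning anyone, it can help to try the referrer patterns against a few addresses. The sketch below rebuilds the same regular expressions in Python, where these particular patterns behave as they do in Apache's regex engine; the helper name `is_blocked` is an invention for this illustration.

```python
import re

# Words banned anywhere in the host name, and fully banned domains,
# copied from the .htaccess rules above.
GENERIC_WORDS = ["board", "chat", "blog", "forum", "member",
                 "journal", "diary", "discuss"]
BANNED_DOMAINS = ["migente.com", "blackplanet.com", "geocities.com",
                  "asianavenue.com", "xanga.com", "keenspot.com",
                  "neopets.com", "xyz.cz", "eendje.be"]

PATTERNS = [
    re.compile(r"^http://([^/]*)" + word + r"([^/]*)/", re.IGNORECASE)
    for word in GENERIC_WORDS
] + [
    re.compile(r"^http://([^/]*)" + re.escape(domain) + r"/", re.IGNORECASE)
    for domain in BANNED_DOMAINS
]


def is_blocked(referer):
    """True if the referrer would get spam_ref=1 under the rules above."""
    return any(pattern.search(referer) for pattern in PATTERNS)


for url in ("http://www.livejournal.com/users/someone/",
            "http://www.google.com/search?q=blog",
            "http://www.chatsworth.example/"):
    print(url, is_blocked(url))
```

The last address shows the price of the generic words: any host that merely contains "chat" or "blog" is caught, whether it is a blog or not. Only the host part is tested, so a banned word in the path or query does no harm.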

And there's no way around it for the blogmaster except to ask his visitors to use a browser that doesn't send the referrer field... A more violent way to do this would be to ban image theft from all other websites (but then nothing will show on the Google image search):

SetEnvIfNoCase Referer "^http://www\.yourwebsite\.com/" good_ref=1
<FilesMatch "\.(jpg|gif|png)$">
	Order Deny,Allow
	Deny from all
	Allow from env=good_ref
</FilesMatch>

Forcing use of the www

And while we are editing the .htaccess file, here's a way to use mod_rewrite to change anything.website.com or website.com into www.website.com.

# Redirect website.com and anything.website.com to www.website.com
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.website\.com$ [NC]
RewriteRule ^.*$ http://www.website.com%{REQUEST_URI} [R=301,L]
#RewriteRule ^.*$ http://%{SERVER_NAME}%{REQUEST_URI}
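The decision the rewrite rule makes can be spelled out in a few lines. This is a sketch that assumes an exact, case-insensitive comparison of the host name; `canonical_redirect` is a hypothetical name, and a `Host` header carrying a port number is not handled.

```python
def canonical_redirect(host, request_uri, canonical="www.website.com"):
    """Return the URL to redirect to, or None if the host is already canonical."""
    if host.lower() == canonical:
        return None  # already on www.website.com, serve the page as usual
    # Keep the path and query string, change only the host name.
    return "http://" + canonical + request_uri


print(canonical_redirect("website.com", "/penguins.html"))
print(canonical_redirect("www.website.com", "/penguins.html"))
```

The first call yields the www address and the second yields `None`, which is why the rule cannot loop: once the visitor is on the canonical host, the condition no longer applies.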

No snooping for images

If you have directories that contain only images and don't want people to browse the entire directory, you can either disallow browsing in the server config file, which you may not have access to, or you can simply put the following index.html file in each directory:

<!DOCTYPE html>
<HTML>
<HEAD>
<TITLE>No snooping for images !</TITLE>
<META HTTP-EQUIV="Refresh" CONTENT="1; URL=../">
<META HTTP-EQUIV="Expires" CONTENT="now">
<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">
</HEAD>
<BODY>
<A HREF="../">Sorry, no snooping around allowed !</A>
</BODY>
</HTML>
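Copying that file by hand into every image directory gets tedious, so here is a sketch that does it for a whole tree. It assumes all directories under `root` are image directories and never overwrites an existing index.html; the function name `add_index_files` is hypothetical, and the `PAGE` constant is a compact copy of the page above.

```python
from pathlib import Path

# Compact copy of the redirect page shown above.
PAGE = """<!DOCTYPE html>
<HTML>
<HEAD>
<TITLE>No snooping for images !</TITLE>
<META HTTP-EQUIV="Refresh" CONTENT="1; URL=../">
<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">
</HEAD>
<BODY>
<A HREF="../">Sorry, no snooping around allowed !</A>
</BODY>
</HTML>
"""


def add_index_files(root, html=PAGE):
    """Write index.html into root and each subdirectory that lacks one."""
    root = Path(root)
    directories = [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    created = []
    for directory in directories:
        target = directory / "index.html"
        if not target.exists():  # never clobber a real index page
            target.write_text(html, encoding="utf-8")
            created.append(target)
    return created
```

Running it a second time creates nothing, so it is safe to call again whenever new image directories are added.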


Google tricks

Search for a specific filename
"Index +of" filename where the quotes are relevant will look into publicly served directories. For instance "Index +of" Alpha-HOWTO.pdf
Search for plenty of files
"index of /" directory modified keywords or +"Index of" +filename.exe +"parent directory" and look down the list...
Full word wildcard
Use "something * somethingelse" where the quotes are relevant. For instance "How to * a penguin". Note that you cannot use it to search for incomplete words (like helico*) or at the end or start of a sentence.
Boolean search
Google accepts a boolean syntax with '-', 'OR' and parentheses: download game (chopper OR choplifter OR helicopter) -crash
What is the most popular site of all ?
Just search for http
Looking for a list ?
Use Google Sets, like thyme/sage/parsley to figure out what to put in the next recipe or to find other bands you'll like.
Looking for a definition ?
Use the define: option like in define:http
Searching for synonyms with the '~' prefix
For instance linux ~tutorial
The '.'
Use search.for.this instead of "search for this" for a little less typing
Local search
Use local.google.com to search locally. Type in your zip code and wifi for instance.

Strange but true web searches

There are some really funny (or pathetic) keyword combinations used by people to arrive on my site; are the search engines on crack or what ? I didn't make those up, so culprits' IPs and search engine names have been withheld to protect the guilty, but I've kept the (mis)spelling. And since they led to my site, I consider it's the same as asking me directly, so here are a few answers...

Where can I buy a penguin?
Why don't you ask your mom to make you a little brother instead ? For a while it will be just as smelly and noisy.
hot chick
hairy chick
Sorry, penguins here only. Disappointed ?
pichers of pengwines
Never heard of that wine. I'll bring you a pitcher of Chateau Pétrus 1976 instead.
Do Penguins die when they fall over?
No. Do you ?
do people in antarctica shower?
Hmmm, are you sure you want an answer to that one ?
How penguin survive in the arctic
They didn't, there aren't any anymore... C:-(
how to keep a pizza warm in antarctica
In your stomach
how People help Leopard seals
Mostly by staying too close to them when they are hungry...
screwed a penguin
Forget it, their feet are too cold and they have fish on their breath. And seals are much more sexy.
pictures of sex with fram animals
The Fram was the sailboat used by Amundsen to explore Antarctica. I really don't know what they were doing with the dogs and donkeys they had on board but none of them made it back alive...
women having sexual intercourse with dogs for pleasure
You might be more lucky if you leave out at least the last 2 words in your request... I know, you are wondering why there are women willing to have sex with dogs but not with you ! Sad, heh ?
how many times a day do men think about sex
I could probably answer you but I'm busy thinking about... something
how many times in a minute do men think about sex
Now we're talking!
if the world is a cube what would happen to the weather
Err, and the point of the question is...?
teach me
Err, yeah, OK, as soon as I have a minute, OK ?
where can i find five minute speeches that would be humorous to teenage boys?
Hmmm, how about writing one. If you care the least.
what is kangaroos real name?
Well, there's one called Taz in some cartoons, but I think his actor's name is different. You should check www.imdb.com.
How Often Are Animals Eaten
Daily. Sometimes more.
weird ugly
Hey! Thanks!!!
stole al capone bike god
I'm still wondering about this one
what is meaning of french phrase che che la femme?
'chercher', and I see you are still looking...
young pussy being fucked til it bleeds
What about searching for "English grammar", "sensitivity training", "being nice when meeting people"...?
how do woman walk after sexual intercourse
Well, if you've been really good they might walk with their legs apart, otherwise they might be more like running away...
what can I do to make my woman a nymphomaniac?
Take showers ? Pop your zits ? Be nice ? Get a life ? Purchase a personality ? That will already improve your chances a lot...
taking a shower?
You have to look on the internet to learn how to do that !?!
what is an itch?
Stop washing for a month and you'll figure it out by yourself.
how to determine womens sizes and weight?
OK, there are things called meters and scales that are used for that purpose, but they involve the use of numbers, which might be just a wee bit confusing. Or you might just ask, but it might prove dangerous.
where can I find quotes about men who cheat on their girlfriends?
I see you are getting prepared...
how do you deal with arseholes for husbands
Preparation H
how do you get your back hole so big for sex?
Hmmm, let me get a mirror, I think I need to check my back...
how long after tonsillectomy oral sex
You should ask your surgeon, I'm sure it'll make his day
how do you ask a soldier to be the last person to die for a mistake?
Politely.
how do you know if your hamster is having sex
The little smile on his little face
how many meters can a grizzly bear run in one second
More than you anyway...
how many dinosaurs were found in between 1992-2004
None. I'm pretty sure.
how many faces are carved on Mt. Rushmore?
5, there's also a self portrait from Billy-Bob at the base.
how do you know when you don't have to buy her anymore drinks?
Because she just finished the BJ
how do you know when you've had really good sex
You are smoking, and it's not a cigarette
how long do i have to lay here after sex
Until he falls asleep...
how do you impress a man?
Show up naked. Bring beer.
hitting your nuts on a pole
I wonder if this has anything to do with climbing...
How fast do cuts to the penis heal?
I told you not to play with that meat cleaver!
how do you pronounce guillotine
Ciak!
you don't have to be beautiful to turn me on I just need your body baby you don't need expirience to turn me out I show you all about you have to be rich to be my girl lyrics
See, you already know the entire lyrics to the song, so why look for it ?
Should I date an engineer?
Absolutely. Didn't you know that engineers are the best lovers of all, the most romantic, entertaining, best husbands, best fathers... Did I forget something ?
how many odd number are there?
As many as prime numbers...
My girlfriend won't stop eating chocolate. She's getting fatter every day
Sucks to be you dude... Cover yourself in chocolate, maybe ?
does a full floppy disk weigh more than a blank one?
Yes. That's why you should reformat all your floppies before moving. They will be easier to transport.
How is the efficiency of the computer affected if the CPU breaks?
Well, you should start by reading 'Computers for Dummies'. No, make that for imbeciles.
how much to put mega bites on a computer
You first have to find someone who's really hungry
How much for a computer with all the accessories and herd drives
'Herd drive' ? Must be some kind of groupware...
why do men's feet smell worse than womens?
You obviously haven't met Jenny after a week of climbing in the desert... But then again mine are surely worse. Hey, good question !
what can I eat to freshen my breath after drinking wine?
Garlic ? Fresh tripe ? Gorgonzola ? A year old egg ? A couple of thoroughly chewed cigarettes ?
how do you tell if an egg is fresh when you live on top of a mountain
If it's broken and spilled all over your backpack, then it may have been fresh.
where can i open a virtual fortune cookie?
In a virtual Chinese restaurant, but their virtual Chinese food is terrible, so I don't recommend it.
How many calories are in one cup of lazagna
In Italy they eat it on plates. Cups are for their espresso. And it's spelled with an 's'.
how many snails france drinks in a year
Now you're confused. We eat them. It's stupid frat boys who drink tequila worms.
how to freeze dry moral mushrooms
Now that I think of it, I know at least one immoral mushroom...
what to do if your hard drive stops working?
Cry.
how to fix computer of system hated without dick
?!?
how to fix sex object in computer
In C++:
#include <sex.h>
sex.fix();
Do you Care If I Dont Know What 2 Say
Does the time you save by abbreviating 'To' to '2' make it worth capitalizing every word in your sentence ?
where can I find a free spring wallpaper?
They are not very convenient, they bounce off the walls.
jeffrey dahmer wallpaper
I don't even want to start thinking about it... Argh !
maradona lobotomy
Go ahead.
what does pericolo mean in english?
Watch o... Oooops, too late.
how to like french people?
Why don't you take a trip to France ? Hmmm, no maybe not a good idea. Why don't you invite some over ? Hmmm, now that I think about it, that may be a hard question... Just send in a check and you can like me for a while.
where can i see pictures of climbers taking a fall?
Not here, go to joesimpson.com...
what does a computer engineer do?
Downloads porn all night and tries to make a witty website during the day.