I was reading Scott’s post on WASP (gotta install that, I’m a busybody, too!), and he talks a little about how some of the big sites don’t use online analytics (services like Google Analytics, Omniture and CoreMetrics); presumably they do log file analysis instead. It reminded me of a talk I recently had with some guys at another major website who also have their own proprietary log analysis tools. So I wondered a bit about the pros and cons.

Here’s the thing: for dead-on accuracy, it is most likely not possible to beat log file analysis. If the website served it, it’s going in the log. A javascript or image based beacon on your site can fail for a wide variety of reasons: not enough time spent on the page to make the call, javascript disabled, javascript incompatible, some weird firewall/adware type blocker. A lot of things could potentially interfere with beacons on the page. So the log files give you the opportunity to be more accurate, but the trick is actually turning that raw data into accurate numbers.

The obvious problem with log files is that you get a lot of traffic from robots and spiders. These shouldn’t be counted toward your analytics numbers, but it is getting tougher to weed them all out: there are a lot of spiders now and more springing up every day. Online beacons skirt this problem because most spiders don’t execute the javascript at all, so the beacon never fires and there is nothing to filter. There are various tricks you can do with cookies on the log side, but a cookie won’t appear in your log until after that user’s first hit. They actually have to hit your page before you can cookie them, so that first request will not carry their information. In this day and age, first visits are also often last visits. So you need to be clever about such things.
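
To make the weeding-out concrete, here is a minimal sketch in Python of the simplest approach: drop hits whose user agent looks like a spider. It assumes your server writes the common "combined" log format, and the list of markers is my own illustrative guess, nowhere near complete. Spiders that pretend to be browsers will sail right through, which is exactly the problem.

```python
import re

# Assumed layout (combined log format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Illustrative, deliberately incomplete list of user-agent fragments.
BOT_MARKERS = ("bot", "spider", "crawl", "slurp")


def is_probable_bot(user_agent):
    """Guess whether a user-agent string belongs to a robot."""
    agent = user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def count_human_hits(lines):
    """Count log lines that parse cleanly and do not look like robot traffic."""
    hits = 0
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that don't fit the assumed format
        if is_probable_bot(match.group("agent")):
            continue
        hits += 1
    return hits
```

You would feed it the lines of an access log, e.g. `count_human_hits(open("access.log"))`, where the file name is just a placeholder.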

Firefox’s precaching is another example of log file difficulty. Firefox will grab a page because there’s a link to it on the page you are currently viewing. If you click that link, it comes up right away; if not, no big deal for you, but the request went right into the logs. That’s a really hard line to know to disregard, since it looks and feels just like a real Firefox page view. Again, javascript neatly skirts the problem.
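
One possible handle, as far as I know: Firefox marks its prefetch requests with an `X-moz: prefetch` request header. Standard log formats don’t record it, so the sketch below assumes you have changed your log format to append that header as one extra quoted field at the end of each line (in Apache, something like adding `"%{X-moz}i"` to the `LogFormat`). The function name and the field position are my own assumptions, not anything standard.

```python
def is_prefetch(line):
    """Return True if the last quoted field of a log line is 'prefetch'.

    Assumes the X-moz request header was appended to the log format as a
    final quoted field; lines without it simply return False.
    """
    line = line.rstrip()
    if not line.endswith('"'):
        return False
    # Find the opening quote of the final quoted field.
    start = line.rfind('"', 0, len(line) - 1)
    if start == -1:
        return False
    return line[start + 1:-1].strip().lower() == "prefetch"
```

Lines where this returns True can be dropped before counting page views. It only helps for browsers that send the header, and only from the day you start logging it.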

The other advantage that online analytics have is that they tend to be run by folks with lots of computing power, and the stats you get are close to real time (if delayed by a few hours for some services). This is a real boon, as you can see how things are doing during the course of the day and react to that. Log analysis doesn’t inherently prevent you from knowing this information, but most of the big analysis runs I’ve heard of (especially for the big sites) happen once a day, so you only find out how you did the day after.

If you’re a big organization, have development resources to devote to analysis, and need to suss out really detailed information about the usage of your website, log analysis could be the ticket. You can log a lot of interesting information and develop code that understands the nature of your URLs to give you a really specific understanding of your website. If that’s not you, I don’t really see any advantage to using a standard log analysis package over something like Google Analytics. The numbers are going to be wildly different, with online analytics generally coming in significantly lower than any log analysis package, and you’ll have to believe one. I’m guessing that the Google numbers are closer to true than those from a log analysis package, just because there are far too many variables for such a package to account for them all.
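
As a rough idea of what "code that understands the nature of your URLs" might look like, here is a small Python sketch that buckets request paths into page types and counts them. The URL scheme (dated article paths, `/tag/...` pages) and the names `URL_RULES`, `classify` and `summarize` are entirely made up for illustration; a real site would plug in its own rules.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical URL scheme; replace with patterns that match your own site.
URL_RULES = [
    ("article", re.compile(r"^/\d{4}/\d{2}/[^/]+/?$")),
    ("tag", re.compile(r"^/tag/[^/]+/?$")),
    ("home", re.compile(r"^/$")),
]


def classify(url):
    """Map a requested URL to a page type, ignoring any query string."""
    path = urlsplit(url).path or "/"
    for label, pattern in URL_RULES:
        if pattern.match(path):
            return label
    return "other"


def summarize(urls):
    """Count requests per page type."""
    return Counter(classify(url) for url in urls)
```

Feeding it the request paths pulled from a day of logs gives you a per-section breakdown that an off-the-shelf package, knowing nothing about your URLs, can’t easily produce.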
