Web Metrics are hard
So, online metrics are hard; there’s no question about it. Some are harder than others (time spent on site, say), but they are all difficult to some degree. Even the most basic metric, hits or page views, is difficult. Counting visits, which builds on page views, is harder still, and counting unique visitors is the most perilous of the three.
The problem with counting hits is that all sorts of critters now visit your site that aren’t really people: spiders, crawlers and robots that shouldn’t be counted as part of your legitimate, ad-loving traffic. Logfile analysis is one way to pull this data, since the log has a record of everything your webserver has ever served up. The rub is that it is tough to tell real traffic from automated traffic. Looking at User Agents works somewhat, but it is nearly impossible to compile a list of every known robot out there and then keep it current. Getting most of them right, especially the big ones, might be close enough for some. On the other hand, web beacons (like Google Analytics and Omniture) use JavaScript to count your pageviews live. This has the advantage of never counting robots (since robots, to my knowledge, never interpret JavaScript) but may miss pageviews if the browser doesn’t support JavaScript, has it turned off, or the user simply clicks through to another page before the beacon executes.
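To make the User Agent approach concrete, here is a minimal sketch in Python. It assumes the server writes the common "combined" log format, and the marker list, function names and sample lines are all made up for illustration; a real robot list would be far longer and would need constant upkeep.

```python
import re

# Hypothetical and deliberately incomplete list of substrings that mark a robot.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")

# Assumes the "combined" log format:
#   ... "METHOD path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def is_probable_robot(user_agent):
    """Guess whether a User Agent string belongs to a robot."""
    lowered = user_agent.lower()
    return any(marker in lowered for marker in BOT_MARKERS)


def count_page_views(lines):
    """Return (human, robot) request counts; unparseable lines are skipped."""
    human = robot = 0
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue
        if is_probable_robot(match.group("agent")):
            robot += 1
        else:
            human += 1
    return human, robot


# Made-up sample lines. The third is an automated fetcher that no marker catches.
SAMPLE_LOG = [
    '1.2.3.4 - - [22/Oct/2007:12:15:00 -0400] "GET /post HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/2.0"',
    '5.6.7.8 - - [22/Oct/2007:12:15:02 -0400] "GET /post HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '9.9.9.9 - - [22/Oct/2007:12:15:05 -0400] "GET /about HTTP/1.1" 200 2048 '
    '"-" "ExampleFetcher/1.0"',
]

human_views, robot_views = count_page_views(SAMPLE_LOG)
print(human_views, robot_views)
```

Note that the unknown fetcher on the third line is counted as a human, which is exactly the weakness of list-based filtering.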
Counting unique visitors is even more difficult. Even assuming that you are counting your pageviews correctly, it’s remarkably difficult to determine accurately who counts as one visitor. The most enduring example of this is the user who surfs both from home and from work: there is really no generic way to determine that the two are the same person, and I’d guess every analytics package, online or off, double counts them.
So, as you can see, pulling those numbers reliably is a feat. However, in general, I think that most web beacons are very close to the truth on this. I won’t say the same for most log analysis, but I don’t have a breadth of experience with log analyzers, so there may very well be many that get it right. But check out this piece from the New York Times that talks about how ComScore’s and Nielsen’s numbers are radically different from (and lower than) the publishers’ own numbers.
At first blush you figure that the independent auditors’ numbers must be more right, since it is obviously in the publishers’ favour to inflate their traffic numbers (thus making them more attractive to advertisers). But here’s the thing: ComScore and Nielsen use neither of the above methods of pulling this data. They’re using the old method of selecting a panel of users, installing some software on their systems to monitor where they go, and then extrapolating to the world of internet users from that base. This is literally the dumbest thing I’ve heard all day (sure, the day is young, but I’m confident I’ll hear nothing dumber).
Selecting a panel introduces all sorts of bias into the data. For an extreme example of this sample bias, look no further than the oft and justifiably maligned Alexa, whose numbers are so demonstrably wrong it’s amazing. I can imagine that Nielsen has a better distribution, but there’s always going to be selection bias. Do you or anyone you know or anyone you know know anyone who is part of the Nielsen NetRatings panel? Does that give you any confidence that you are being represented in their scores?
In this online world, where all the data is there to pull real numbers, I don’t understand why they would use panel sampling to estimate numbers that can be counted directly. Far from understanding their own fallibility, or at least their weaknesses and strengths when compared to these other numbers, they are attacking the other numbers and assuming theirs to be the one true metric. Nielsen isn’t used to a world with metrics that compete with their own (and I suspect that if there were other TV metrics, you would find a similar difference). If they employed a combination of panel and beacon/log analysis and did some kind of super fancy PhD voodoo to reconcile the two, that might provide the most interesting insight into traffic. If they partnered with Omniture, as an example, that could be a huge partnership in the metrics world. Unfortunately, they don’t seem to be interested in this.
Beyond even all that are the numbers that third-party advertising systems, like DoubleClick, pull. They tell their users how many impressions each ad receives. This often diverges from how many impressions the publisher believes they’ve delivered, because of the way impressions are counted. For example, DoubleClick stops counting impressions after 3 or 5 deliveries of an ad to the same user. A publisher who wasn’t doing that would continue counting and thus be over-delivering by DoubleClick’s measures. The problem is that there is no standard.
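A toy illustration of how the same deliveries produce two different totals, written as a rough Python sketch. The cap of 3 comes from the "3 or 5" figure above and is not a confirmed description of how DoubleClick actually counts; the function name, user names and ad id are hypothetical.

```python
from collections import Counter


def count_impressions(deliveries, cap=None):
    """Count ad impressions from (user_id, ad_id) delivery events.

    With cap=None every delivery counts. With a numeric cap, only the
    first `cap` deliveries of a given ad to a given user are counted.
    """
    seen = Counter()
    counted = 0
    for user_id, ad_id in deliveries:
        seen[(user_id, ad_id)] += 1
        if cap is None or seen[(user_id, ad_id)] <= cap:
            counted += 1
    return counted


# Made-up traffic: one user sees the ad five times, another sees it twice.
deliveries = [("alice", "ad-1")] * 5 + [("bob", "ad-1")] * 2

publisher_count = count_impressions(deliveries)      # publisher counts everything
ad_server_count = count_impressions(deliveries, cap=3)  # capped count

print(publisher_count, ad_server_count)
```

Here the publisher reports 7 impressions and the capped count is 5, so both parties are "right" by their own rules and still disagree.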
As you can see there’s a lot of trickiness to pulling metrics. It is a little bit of the wild west out there with many competing standards. It would be good to get some kind of organizational body that had the clout to force some sort of uniformity on all these different services, but I doubt that that will happen anytime soon. Still, if I had to put my money on one of these horses, I’d bet that the big web beacons are the most accurate of the lot - handily beating log and panel analysis.
UPDATE: Well, as Kirkunit mentions below… Quantcast is one pioneer of the hybrid model. And it’s great. I’d sure like to see more people enter that game, though.

October 22nd, 2007 at 12:15 pm
Nice, meaty post.
Don’t be so quick to sell the concept of random sampling short. It’s mathematically rigorous, and the people behind it are both serious and seriously smart. Random sampling gives us remarkably accurate predictions of election returns, and it gives us reliable statistics on things that would be impossible to count directly, such as the recent Lancet article estimating the number of civilian deaths in Iraq since the invasion.
However, in the two cases I cite, the result set is quite limited — a household is going to vote for one of a small set of candidates; or a household has either had somebody die or it hasn’t. With a range of results as deeply variable as “what websites did you visit today” Nielsen and comScore seem to fall short. I’m making a wild guess that the size of their sample just doesn’t work with that much variability — especially when it comes to smaller sites, niche/regional/local sites, etc. Of course, there are ways to calculate the confidence of your numbers, and I’m betting the people at Nielsen didn’t have to take Stats 101 twice to pass it like I did.
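As a rough sketch of that confidence calculation in Python: this assumes a simple random sample and the usual normal approximation, which is the textbook case and not a description of how Nielsen or comScore actually build their panels. The panel size and the site shares are invented for illustration.

```python
import math


def relative_margin_of_error(share, panel_size, z=1.96):
    """Approximate 95% margin of error, expressed as a fraction of the share.

    `share` is the fraction of the population that visits the site.
    Assumes a simple random sample and the normal approximation.
    """
    standard_error = math.sqrt(share * (1.0 - share) / panel_size)
    return z * standard_error / share


PANEL_SIZE = 100_000  # hypothetical panel size

for share in (0.2, 0.01, 0.0001):
    error = relative_margin_of_error(share, PANEL_SIZE)
    print(f"share {share:.2%}: estimate is good to within about {error:.0%}")
```

With these invented numbers, a site reaching a fifth of the population is measured to within about 1%, while a niche site reaching one person in ten thousand is only good to within roughly 60%.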
Hitwise uses a much larger sample. As I understand it, Hitwise samples traffic from partner ISPs and therefore has millions of people in its sample. Nielsen objects to the non-randomness of Hitwise’s sample, Hitwise objects to the size and make-up of Nielsen’s, and so on.
A nice middle ground is Quantcast. They use Hitwise-like ISP relationships for baseline numbers, then invite sites to install beacons to “correct” their entries. This is a compelling model because, unlike Nielsen/comScore, which charge exorbitant rates to advertising agencies for their data (and therefore have an economic incentive to keep their results and their methodologies a secret), Quantcast is built on openness: everybody can see the top-line numbers, and publishers have a way of knowing how Quantcast is representing their traffic and a way to true up the numbers so they match their own internal research.
By the way, I don’t work in TV, but I bet this same set of problems is going to start cropping up as TV acts more like the Internet. Have the Nielsen samples gotten big enough to handle the 400 TV channels we now have? My cable company knows what channels I watch — How do Nielsen’s numbers compare to my cable company’s numbers?
October 22nd, 2007 at 1:07 pm
That’s a really good point. I think the panel method works, as you say, when there are very few choices that the panel is deciding among. There aren’t any niches. But with the web (and TV, too) niches are everywhere.
And I agree with you again. I’m sure that this problem already exists in TV, but because right now there’s only one game in town we don’t know it. Look at the movements to save “unwatched shows” like Jericho and, going way back, Party of Five. Clearly there was a large audience that simply was not represented in the ratings.
Notice how these disputes are between the providers of metrics, not among their consumers. As I say, the consumers don’t care whether the numbers are real or not: as long as all their competitors believe the same fiction, the playing field is level.
October 22nd, 2007 at 1:30 pm
It’s interesting to me that the publishers, who are accused of having an economic interest in reporting a bigger audience, cite all sorts of proof behind their numbers — registered user counts, log files, beacon stats, etc. while the big research companies tend to rely on their reputation and on the secretiveness of their methodology. I’m not sure which side the advertisers, on the whole, believe more.
A sad truth for small and medium sites is that, as a practical matter, advertisers can’t look at every website’s media kit to figure out where to buy ads. They have to rely on the big research houses to find good sites on which to advertise, and if those research houses are misrepresenting your site’s traffic (or not representing your site at all), you’re disqualified before the game even starts.
By the way, I didn’t mean to imply that random sampling itself “doesn’t work” with such a variety of choices; just that the sampling methodology probably needs to be updated.
If random sampling itself is flawed, human progress is screwed. It’s how we organize our democratic institutions, how we test the safety and efficacy of medicine, etc.
October 22nd, 2007 at 1:35 pm
I should have also added that the actual practitioners of “web analytics” have long since moved on from this particular debate. The really interesting stuff about analyzing web traffic is in visitor engagement. People like Eric Peterson are working this kind of stuff out, like, today.
October 22nd, 2007 at 1:56 pm
I think that random sampling works as well, but as the number of options grows, the size of the sample necessarily needs to grow too. And I’m no statistician, so I’m probably wrong, but it would seem like there would be an exponential relationship there, as you need to ensure that you capture an ever-increasing number of niches. For the web especially, I’m not really clear how practical this is if you care about even medium-sized sites.
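For what it’s worth, the relationship can be worked out under the textbook assumption of a simple random sample (an assumption for illustration, not a description of how any panel vendor actually samples). Take a site visited by a fraction p of the population, a panel of n people, a confidence multiplier z (about 1.96 for 95%) and a target relative error ε. Requiring the margin of error to stay within ε times the share gives:

```latex
\[
z\sqrt{\frac{p(1-p)}{n}} \;\le\; \varepsilon\,p
\quad\Longrightarrow\quad
n \;\ge\; \frac{z^{2}\,(1-p)}{\varepsilon^{2}\,p}
\]
```

So under that assumption the required panel grows roughly in proportion to 1/p, that is, inversely with the share of the smallest site you care about rather than exponentially. That is still a very large panel once p gets down to niche-site levels.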
Visitor engagement is an interesting concept. But that, to me, doesn’t compete with the metrics discussed above. That is, the model he proposes seems necessarily to be an internal model, one that doesn’t allow comparison with other people’s models of their sites. It is subjective, and its power comes from your ability to adapt it to make sense of your own particular set of circumstances. So it could be a useful tool to optimize your own site, but not a comparative metric against others.
At least, that’s what it seems like to me.
November 16th, 2007 at 10:47 am
[...] as far as I know, there aren’t really any widely available ones that had good information. Quantcast would be one, but very few sites actually use it. So I hunted around EatonWeb’s site and [...]
May 14th, 2008 at 10:38 am
[...] Online metrics are hard enough when you’ve limited it to just a specific site with all the help the metric measurer could want. It becomes orders of magnitude harder to try and figure out this information internet wide. It seems like Mozilla is planning on stepping into that ring. [...]
August 6th, 2008 at 10:45 am
[...] situation. Reporting on RSS subscriptions is an approximate business at best. You know how general web metrics are really hard to get? And that’s with all the benefits of browser sophistication and what not? RSS is even [...]