<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Web Metrics are hard</title>
	<atom:link href="http://comments.deasil.com/2007/10/22/web-metrics-are-hard/feed/" rel="self" type="application/rss+xml" />
	<link>http://comments.deasil.com/2007/10/22/web-metrics-are-hard/</link>
	<description>escape colon w q</description>
	<pubDate>Wed, 07 Jan 2009 18:25:36 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: Don&#8217;t blame Feedburner/Google, it&#8217;s Netvibes&#8217; fault</title>
		<link>http://comments.deasil.com/2007/10/22/web-metrics-are-hard/#comment-12117</link>
		<dc:creator>Don&#8217;t blame Feedburner/Google, it&#8217;s Netvibes&#8217; fault</dc:creator>
		<pubDate>Wed, 06 Aug 2008 14:45:44 +0000</pubDate>
		<guid isPermaLink="false">http://comments.deasil.com/2007/10/22/web-metrics-are-hard/#comment-12117</guid>
		<description>[...] situation. Reporting on RSS subscriptions is an approximate business at best. You know how general web metrics are really hard to get? And that&#8217;s with all the benefits of browser sophistication and what not? RSS is even [...]</description>
		<content:encoded><![CDATA[<p>[...] situation. Reporting on RSS subscriptions is an approximate business at best. You know how general web metrics are really hard to get? And that&#8217;s with all the benefits of browser sophistication and what not? RSS is even [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mozilla&#8217;s data project could be useful, but still biased.</title>
		<link>http://comments.deasil.com/2007/10/22/web-metrics-are-hard/#comment-10736</link>
		<dc:creator>Mozilla&#8217;s data project could be useful, but still biased.</dc:creator>
		<pubDate>Wed, 14 May 2008 14:38:47 +0000</pubDate>
		<guid isPermaLink="false">http://comments.deasil.com/2007/10/22/web-metrics-are-hard/#comment-10736</guid>
		<description>[...] Online metrics are hard enough when you&#8217;ve limited it to just a specific site with all the help the metric measurer could want. It becomes orders of magnitude harder to try and figure out this information internet wide. It seems like Mozilla is planning on stepping into that ring. [...]</description>
		<content:encoded><![CDATA[<p>[...] Online metrics are hard enough when you&#8217;ve limited it to just a specific site with all the help the metric measurer could want. It becomes orders of magnitude harder to try and figure out this information internet wide. It seems like Mozilla is planning on stepping into that ring. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: PageRank, worth it or not?</title>
		<link>http://comments.deasil.com/2007/10/22/web-metrics-are-hard/#comment-7073</link>
		<dc:creator>PageRank, worth it or not?</dc:creator>
		<pubDate>Fri, 16 Nov 2007 15:47:01 +0000</pubDate>
		<guid isPermaLink="false">http://comments.deasil.com/2007/10/22/web-metrics-are-hard/#comment-7073</guid>
		<description>[...] as far as I know, there aren&#8217;t really any widely available ones that had good information. Quantcast would be one, but very few sites actually use it. So I hunted around EatonWeb&#8217;s site and [...]</description>
		<content:encoded><![CDATA[<p>[...] as far as I know, there aren&#8217;t really any widely available ones that had good information. Quantcast would be one, but very few sites actually use it. So I hunted around EatonWeb&#8217;s site and [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: felix</title>
		<link>http://comments.deasil.com/2007/10/22/web-metrics-are-hard/#comment-6412</link>
		<dc:creator>felix</dc:creator>
		<pubDate>Mon, 22 Oct 2007 18:56:55 +0000</pubDate>
		<guid isPermaLink="false">http://comments.deasil.com/2007/10/22/web-metrics-are-hard/#comment-6412</guid>
		<description>I think that random sampling works as well, but as the number of options grows, the size of the sample necessarily needs to grow too. And I'm no statistician, so I'm probably wrong, but it would seem like there would be an exponential relationship there, as you need to ensure the capture of ever-increasing numbers of niches. For the web especially, I'm not really clear how practical this is if you care about even medium-sized sites.

Visitor engagement is an interesting concept. But that, to me, doesn't compete with the metrics discussed above. That is, the model he proposes seems, necessarily, an internal model that doesn't allow comparison with other people's models of their sites. It is subjective, and its power comes from your ability to adapt it to make sense of your own particular set of circumstances. So it could be a useful tool to optimize your own site, but not a comparative metric with others.

At least, that's what it seems like to me.</description>
		<content:encoded><![CDATA[<p>I think that random sampling works as well, but as the number of options grows, the size of the sample necessarily needs to grow too. And I&#8217;m no statistician, so I&#8217;m probably wrong, but it would seem like there would be an exponential relationship there, as you need to ensure the capture of ever-increasing numbers of niches. For the web especially, I&#8217;m not really clear how practical this is if you care about even medium-sized sites.</p>
<p>Visitor engagement is an interesting concept. But that, to me, doesn&#8217;t compete with the metrics discussed above. That is, the model he proposes seems, necessarily, an internal model that doesn&#8217;t allow comparison with other people&#8217;s models of their sites. It is subjective, and its power comes from your ability to adapt it to make sense of your own particular set of circumstances. So it could be a useful tool to optimize your own site, but not a comparative metric with others.</p>
<p>At least, that&#8217;s what it seems like to me.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: kirkunit</title>
		<link>http://comments.deasil.com/2007/10/22/web-metrics-are-hard/#comment-6411</link>
		<dc:creator>kirkunit</dc:creator>
		<pubDate>Mon, 22 Oct 2007 18:35:51 +0000</pubDate>
		<guid isPermaLink="false">http://comments.deasil.com/2007/10/22/web-metrics-are-hard/#comment-6411</guid>
		<description>I should have also added that the actual practitioners of "web analytics" have long since moved on from this particular debate. The really interesting stuff about analyzing web traffic is in visitor engagement. People like Eric Peterson are &lt;a href="http://blog.webanalyticsdemystified.com/weblog/2007/10/how-to-measure-visitor-engagement-redux.html" rel="nofollow"&gt;working this kind of stuff out&lt;/a&gt;, like, today.</description>
		<content:encoded><![CDATA[<p>I should have also added that the actual practitioners of &#8220;web analytics&#8221; have long since moved on from this particular debate. The really interesting stuff about analyzing web traffic is in visitor engagement. People like Eric Peterson are <a href="http://blog.webanalyticsdemystified.com/weblog/2007/10/how-to-measure-visitor-engagement-redux.html" rel="nofollow">working this kind of stuff out</a>, like, today.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: kirkunit</title>
		<link>http://comments.deasil.com/2007/10/22/web-metrics-are-hard/#comment-6410</link>
		<dc:creator>kirkunit</dc:creator>
		<pubDate>Mon, 22 Oct 2007 18:30:27 +0000</pubDate>
		<guid isPermaLink="false">http://comments.deasil.com/2007/10/22/web-metrics-are-hard/#comment-6410</guid>
		<description>It's interesting to me that the publishers, who are accused of having an economic interest in reporting a bigger audience, cite all sorts of proof behind their numbers -- registered user counts, log files, beacon stats, etc., while the big research companies tend to rely on their reputation and on the secretiveness of their methodology. I'm not sure which side the advertisers, on the whole, believe more.

A sad truth for small and medium sites is that, as a practical matter, advertisers can't look at every website's media kit to figure out where to buy ads; they have to rely on people like the big research houses to find good sites on which to advertise, and if they're misrepresenting your site's traffic (or not representing your site at all), you're disqualified before the game even starts.

By the way, I didn't mean to imply that random sampling itself "doesn't work" with such a variety of choices; just that the sampling methodology probably needs to be updated. 

If random sampling itself is flawed, human progress is screwed. It's how we organize our democratic institutions, how we test the safety and efficacy of medicine, etc.</description>
		<content:encoded><![CDATA[<p>It&#8217;s interesting to me that the publishers, who are accused of having an economic interest in reporting a bigger audience, cite all sorts of proof behind their numbers &#8212; registered user counts, log files, beacon stats, etc., while the big research companies tend to rely on their reputation and on the secretiveness of their methodology. I&#8217;m not sure which side the advertisers, on the whole, believe more.</p>
<p>A sad truth for small and medium sites is that, as a practical matter, advertisers can&#8217;t look at every website&#8217;s media kit to figure out where to buy ads; they have to rely on people like the big research houses to find good sites on which to advertise, and if they&#8217;re misrepresenting your site&#8217;s traffic (or not representing your site at all), you&#8217;re disqualified before the game even starts.</p>
<p>By the way, I didn&#8217;t mean to imply that random sampling itself &#8220;doesn&#8217;t work&#8221; with such a variety of choices; just that the sampling methodology probably needs to be updated. </p>
<p>If random sampling itself is flawed, human progress is screwed. It&#8217;s how we organize our democratic institutions, how we test the safety and efficacy of medicine, etc.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: felix</title>
		<link>http://comments.deasil.com/2007/10/22/web-metrics-are-hard/#comment-6409</link>
		<dc:creator>felix</dc:creator>
		<pubDate>Mon, 22 Oct 2007 18:07:07 +0000</pubDate>
		<guid isPermaLink="false">http://comments.deasil.com/2007/10/22/web-metrics-are-hard/#comment-6409</guid>
		<description>That's a really good point. I think the panel method works, as you say, when there are very few choices that the panel is deciding among. There aren't any niches. But with the web (and TV, too) niches are everywhere.
And I agree again with you - I'm sure that this problem already exists in TV, but because right now there's only one game in town, we don't know it. Look at the movements to save "unwatched shows" like Jericho and, going way back, Party of Five. Clearly there was a large audience that simply was not represented in the ratings.
Notice how these disputes are between the providers of metrics, not their consumers. As I say, the consumers don't care whether the numbers are real or not: as long as all their competitors believe the same fiction, the playing field is level.</description>
		<content:encoded><![CDATA[<p>That&#8217;s a really good point. I think the panel method works, as you say, when there are very few choices that the panel is deciding among. There aren&#8217;t any niches. But with the web (and TV, too) niches are everywhere.<br />
And I agree again with you - I&#8217;m sure that this problem already exists in TV, but because right now there&#8217;s only one game in town, we don&#8217;t know it. Look at the movements to save &#8220;unwatched shows&#8221; like Jericho and, going way back, Party of Five. Clearly there was a large audience that simply was not represented in the ratings.<br />
Notice how these disputes are between the providers of metrics, not their consumers. As I say, the consumers don&#8217;t care whether the numbers are real or not: as long as all their competitors believe the same fiction, the playing field is level.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: kirkunit</title>
		<link>http://comments.deasil.com/2007/10/22/web-metrics-are-hard/#comment-6403</link>
		<dc:creator>kirkunit</dc:creator>
		<pubDate>Mon, 22 Oct 2007 17:15:38 +0000</pubDate>
		<guid isPermaLink="false">http://comments.deasil.com/2007/10/22/web-metrics-are-hard/#comment-6403</guid>
		<description>Nice, meaty post. 

Don't be so quick to sell the concept of random sampling short -- it's mathematically rigorous and the people behind it are both serious and seriously smart. Random sampling gives us remarkably accurate predictions of election returns, and it gives us reliable estimates of things that would be impossible to count directly, such as the number of civilian deaths in Iraq since the invasion, estimated in the recent Lancet article.

However, in the two cases I cite, the result set is quite limited -- a household is going to vote for one of a small set of candidates; or a household has either had somebody die or it hasn't. With a range of results as deeply variable as "what websites did you visit today," Nielsen and comScore seem to fall short. I'm making a wild guess that the size of their sample just doesn't work with that much variability -- especially when it comes to smaller sites, niche/regional/local sites, etc. Of course, there are ways to calculate the confidence of your numbers, and I'm betting the people at Nielsen didn't have to take Stats 101 twice to pass it like I did.

Hitwise uses a much larger sample -- as I understand it, Hitwise samples traffic from partner ISPs and therefore has millions of people in its sample. Nielsen objects to the non-randomness of Hitwise's sample, Hitwise objects to Nielsen's sample size and make-up, etc.

A nice middle ground is Quantcast. They use Hitwise-like ISP relationships for baseline numbers, then invite sites to install beacons to "correct" their entries. This is a compelling model because unlike Nielsen/comScore, which charge exorbitant rates to advertising agencies for their data (and therefore have an economic incentive to keep their results and their methodologies a secret), Quantcast is built on openness -- everybody can see the top-line numbers, and publishers have a way of knowing how Quantcast is representing their traffic, and have a way to true up the numbers so they match with their own internal research.

By the way, I don't work in TV, but I bet this same set of problems is going to start cropping up as TV acts more like the Internet. Have the Nielsen samples gotten big enough to handle the 400 TV channels we now have? My cable company knows what channels I watch -- how do Nielsen's numbers compare to my cable company's numbers?</description>
		<content:encoded><![CDATA[<p>Nice, meaty post. </p>
<p>Don&#8217;t be so quick to sell the concept of random sampling short &#8212; it&#8217;s mathematically rigorous and the people behind it are both serious and seriously smart. Random sampling gives us remarkably accurate predictions of election returns, and it gives us reliable estimates of things that would be impossible to count directly, such as the number of civilian deaths in Iraq since the invasion, estimated in the recent Lancet article.</p>
<p>However, in the two cases I cite, the result set is quite limited &#8212; a household is going to vote for one of a small set of candidates; or a household has either had somebody die or it hasn&#8217;t. With a range of results as deeply variable as &#8220;what websites did you visit today,&#8221; Nielsen and comScore seem to fall short. I&#8217;m making a wild guess that the size of their sample just doesn&#8217;t work with that much variability &#8212; especially when it comes to smaller sites, niche/regional/local sites, etc. Of course, there are ways to calculate the confidence of your numbers, and I&#8217;m betting the people at Nielsen didn&#8217;t have to take Stats 101 twice to pass it like I did.</p>
<p>Hitwise uses a much larger sample &#8212; as I understand it, Hitwise samples traffic from partner ISPs and therefore has millions of people in its sample. Nielsen objects to the non-randomness of Hitwise&#8217;s sample, Hitwise objects to Nielsen&#8217;s sample size and make-up, etc.</p>
<p>A nice middle ground is Quantcast. They use Hitwise-like ISP relationships for baseline numbers, then invite sites to install beacons to &#8220;correct&#8221; their entries. This is a compelling model because unlike Nielsen/comScore, which charge exorbitant rates to advertising agencies for their data (and therefore have an economic incentive to keep their results and their methodologies a secret), Quantcast is built on openness &#8212; everybody can see the top-line numbers, and publishers have a way of knowing how Quantcast is representing their traffic, and have a way to true up the numbers so they match with their own internal research.</p>
<p>By the way, I don&#8217;t work in TV, but I bet this same set of problems is going to start cropping up as TV acts more like the Internet. Have the Nielsen samples gotten big enough to handle the 400 TV channels we now have? My cable company knows what channels I watch &#8212; how do Nielsen&#8217;s numbers compare to my cable company&#8217;s numbers?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
