<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Dissecting the Netflix Dataset</title>
	<atom:link href="http://www.igvita.com/2006/10/29/dissecting-the-netflix-dataset/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.igvita.com/2006/10/29/dissecting-the-netflix-dataset/</link>
	<description>A goal is a dream with a deadline.</description>
	<pubDate>Fri, 29 Aug 2008 04:14:36 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: data warehousing and mining</title>
		<link>http://www.igvita.com/2006/10/29/dissecting-the-netflix-dataset/#comment-27725</link>
		<dc:creator>data warehousing and mining</dc:creator>
		<pubDate>Wed, 21 Mar 2007 16:27:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2006/10/29/dissecting-the-netflix-dataset/#comment-27725</guid>
		<description>&lt;strong&gt;data warehousing and mining&lt;/strong&gt;

 Here</description>
		<content:encoded><![CDATA[<p><strong>data warehousing and mining</strong></p>
<p> Here</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: &#187; Evolving Datasets and the Netflix Prize [ Data Sciences Analytics ]</title>
		<link>http://www.igvita.com/2006/10/29/dissecting-the-netflix-dataset/#comment-26908</link>
		<dc:creator>&#187; Evolving Datasets and the Netflix Prize [ Data Sciences Analytics ]</dc:creator>
		<pubDate>Mon, 19 Mar 2007 09:36:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2006/10/29/dissecting-the-netflix-dataset/#comment-26908</guid>
		<description>[...] For a good look at some of the characteristics of it, see  Dissecting the Netflix Dataset [...]</description>
		<content:encoded><![CDATA[<p>[...] For a good look at some of the characteristics of it, see  Dissecting the Netflix Dataset [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Markus</title>
		<link>http://www.igvita.com/2006/10/29/dissecting-the-netflix-dataset/#comment-15722</link>
		<dc:creator>Markus</dc:creator>
		<pubDate>Sat, 17 Feb 2007 05:33:22 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2006/10/29/dissecting-the-netflix-dataset/#comment-15722</guid>
		<description>It was quite useful reading, found some interesting details about this topic. Thanks.</description>
		<content:encoded><![CDATA[<p>It was quite useful reading, found some interesting details about this topic. Thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kris Brower &#187; Blog Archive &#187; Netflix Data Statistics</title>
		<link>http://www.igvita.com/2006/10/29/dissecting-the-netflix-dataset/#comment-14014</link>
		<dc:creator>Kris Brower &#187; Blog Archive &#187; Netflix Data Statistics</dc:creator>
		<pubDate>Thu, 01 Feb 2007 11:31:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2006/10/29/dissecting-the-netflix-dataset/#comment-14014</guid>
		<description>[...] I found some interesting analysis of the Netflix Prize dataset, but it was not exactly what I was looking for. I downloaded the dataset and decided to do some analysis myself. I went with what I thought was the easy route by installing pyflix, which in hindsight might not have been the fastest approach (pyflix took about 90 minutes to install after I figured out how to install NumPy in Ubuntu). Here are the results of my analysis of movie ratings over time. [...]</description>
		<content:encoded><![CDATA[<p>[...] I found some interesting analysis of the Netflix Prize dataset, but it was not exactly what I was looking for. I downloaded the dataset and decided to do some analysis myself. I went with what I thought was the easy route by installing pyflix, which in hindsight might not have been the fastest approach (pyflix took about 90 minutes to install after I figured out how to install NumPy in Ubuntu). Here are the results of my analysis of movie ratings over time. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ilya Grigorik</title>
		<link>http://www.igvita.com/2006/10/29/dissecting-the-netflix-dataset/#comment-12059</link>
		<dc:creator>Ilya Grigorik</dc:creator>
		<pubDate>Mon, 01 Jan 2007 15:27:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2006/10/29/dissecting-the-netflix-dataset/#comment-12059</guid>
		<description>Alexandre, very good points. I've never been a Netflix user myself, but route #1 certainly makes sense. Your second proposition made me chuckle. This is exactly what we're pursuing in this competition, but somehow, it never occurred to me that Netflix could have applied the same strategy themselves! :)</description>
		<content:encoded><![CDATA[<p>Alexandre, very good points. I&#8217;ve never been a Netflix user myself, but route #1 certainly makes sense. Your second proposition made me chuckle. This is exactly what we&#8217;re pursuing in this competition, but somehow, it never occurred to me that Netflix could have applied the same strategy themselves! :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Alexandre Rafalovitch</title>
		<link>http://www.igvita.com/2006/10/29/dissecting-the-netflix-dataset/#comment-12022</link>
		<dc:creator>Alexandre Rafalovitch</dc:creator>
		<pubDate>Sun, 31 Dec 2006 23:18:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2006/10/29/dissecting-the-netflix-dataset/#comment-12022</guid>
		<description>&#62; Are early adopters (techies) more discerning in their ratings/choices? Are the movies getting better? (Doubt it!)

I can see two options here:
1) Movies (carried by Netflix) did get better. I think it was fairly recently that they started buying festival movies and adding other sources with less 'popular' but more interesting titles.
2) Algorithms. If their own suggestion algorithms got better over time, people would be shown better movies to rate.</description>
		<content:encoded><![CDATA[<p>&gt; Are early adopters (techies) more discerning in their ratings/choices? Are the movies getting better? (Doubt it!)</p>
<p>I can see two options here:<br />
1) Movies (carried by Netflix) did get better. I think it was fairly recently that they started buying festival movies and adding other sources with less &#8216;popular&#8217; but more interesting titles.<br />
2) Algorithms. If their own suggestion algorithms got better over time, people would be shown better movies to rate.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Statistical Modeling, Causal Inference, and Social Science</title>
		<link>http://www.igvita.com/2006/10/29/dissecting-the-netflix-dataset/#comment-11719</link>
		<dc:creator>Statistical Modeling, Causal Inference, and Social Science</dc:creator>
		<pubDate>Sat, 23 Dec 2006 18:38:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2006/10/29/dissecting-the-netflix-dataset/#comment-11719</guid>
		<description>&lt;strong&gt;Distributions of rankings&lt;/strong&gt;

A few postings ago, Andrew wondered about the shape of the long tail. The OneEyedMan's comment reminded us that the extensive NetFlixPrize dataset contains information about almost half a million users' ratings on almost 20000 movies. It's an excell...</description>
		<content:encoded><![CDATA[<p><strong>Distributions of rankings</strong></p>
<p>A few postings ago, Andrew wondered about the shape of the long tail. The OneEyedMan&#8217;s comment reminded us that the extensive NetFlixPrize dataset contains information about almost half a million users&#8217; ratings on almost 20000 movies. It&#8217;s an excell&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ilya Grigorik</title>
		<link>http://www.igvita.com/2006/10/29/dissecting-the-netflix-dataset/#comment-10044</link>
		<dc:creator>Ilya Grigorik</dc:creator>
		<pubDate>Wed, 01 Nov 2006 22:05:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2006/10/29/dissecting-the-netflix-dataset/#comment-10044</guid>
		<description>Chris, that's an interesting point. I didn't know Netflix allowed half-star ratings before. However, looking at my data, it seems like they must have converted all such ratings to whole numbers. I'm not able to find a single half rating. 

Now the question is, when they rounded the numbers, did they round up or down? :)</description>
		<content:encoded><![CDATA[<p>Chris, that&#8217;s an interesting point. I didn&#8217;t know Netflix allowed half-star ratings before. However, looking at my data, it seems like they must have converted all such ratings to whole numbers. I&#8217;m not able to find a single half rating. </p>
<p>Now the question is, when they rounded the numbers, did they round up or down? :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris</title>
		<link>http://www.igvita.com/2006/10/29/dissecting-the-netflix-dataset/#comment-10027</link>
		<dc:creator>Chris</dc:creator>
		<pubDate>Wed, 01 Nov 2006 17:08:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2006/10/29/dissecting-the-netflix-dataset/#comment-10027</guid>
		<description>In response to your question: "in 1998 the average was 3.4, by 2005 it steadily moved to 3.8. I wonder what accounts for the drift?"

Netflix used to allow half-star ratings. I would imagine that this would have an effect on the scores. Are people more likely to "round up" their ratings (so that a 3.5 becomes a 4 rather than a 3) than to round down?

Can you isolate the ratings made during the time when half-star ratings were possible from those made when they were not?

Maybe Netflix should consider going back to a half-star rating system to improve the accuracy of their predictions before worrying about designing a new algorithm?</description>
		<content:encoded><![CDATA[<p>In response to your question: &#8220;in 1998 the average was 3.4, by 2005 it steadily moved to 3.8. I wonder what accounts for the drift?&#8221;</p>
<p>Netflix used to allow half-star ratings. I would imagine that this would have an effect on the scores. Are people more likely to &#8220;round up&#8221; their ratings (so that a 3.5 becomes a 4 rather than a 3) than to round down?</p>
<p>Can you isolate the ratings made during the time when half-star ratings were possible from those made when they were not?</p>
<p>Maybe Netflix should consider going back to a half-star rating system to improve the accuracy of their predictions before worrying about designing a new algorithm?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ilya Grigorik</title>
		<link>http://www.igvita.com/2006/10/29/dissecting-the-netflix-dataset/#comment-9928</link>
		<dc:creator>Ilya Grigorik</dc:creator>
		<pubDate>Tue, 31 Oct 2006 00:41:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2006/10/29/dissecting-the-netflix-dataset/#comment-9928</guid>
		<description>Grant, I added a log-log plot right at the bottom of the post. 

Technically, you will never find a true power-law (Zipf, Pareto) distribution in real life, simply because you will never have a sample of infinite size: you will always have a drooping tail. However, just as you pointed out, the drooping tail itself can give us a lot of information. In this case, it looks like Netflix could really push demand further down the tail by adopting more niche titles.</description>
		<content:encoded><![CDATA[<p>Grant, I added a log-log plot right at the bottom of the post. </p>
<p>Technically, you will never find a true power-law (Zipf, Pareto) distribution in real life, simply because you will never have a sample of infinite size: you will always have a drooping tail. However, just as you pointed out, the drooping tail itself can give us a lot of information. In this case, it looks like Netflix could really push demand further down the tail by adopting more niche titles.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
