<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Loading Netflix Dataset into SQL</title>
	<atom:link href="http://www.igvita.com/2006/12/01/loading-netflix-dataset-into-sql/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.igvita.com/2006/12/01/loading-netflix-dataset-into-sql/</link>
	<description>A goal is a dream with a deadline.</description>
	<pubDate>Thu, 20 Nov 2008 20:07:49 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: Ilya Grigorik</title>
		<link>http://www.igvita.com/2006/12/01/loading-netflix-dataset-into-sql/#comment-103825</link>
		<dc:creator>Ilya Grigorik</dc:creator>
		<pubDate>Fri, 06 Jun 2008 12:07:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2006/12/01/loading-netflix-dataset-into-sql/#comment-103825</guid>
		<description>Awesome, thanks for sharing the tips Larry!</description>
		<content:encoded><![CDATA[<p>Awesome, thanks for sharing the tips Larry!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Larry Freeman</title>
		<link>http://www.igvita.com/2006/12/01/loading-netflix-dataset-into-sql/#comment-103603</link>
		<dc:creator>Larry Freeman</dc:creator>
		<pubDate>Sun, 01 Jun 2008 00:15:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2006/12/01/loading-netflix-dataset-into-sql/#comment-103603</guid>
		<description>Hi,

I just discovered your web site.  I really like it! :-)

I just started a blog myself which will focus on analyzing algorithms.  I've started with the algorithm that won the Netflix Progress Prize.  At this point, my approach is pure SQL on MySQL with very minor help from awk and Java.

I've hacked a script for loading up all the Netflix data with indexing in 3 hours.  Here are the &lt;a href="http://setupandconfig.blogspot.com/2008/04/loading-up-netflix-prize-data-into.html" rel="nofollow"&gt;details&lt;/a&gt;

I've also used this script to reproduce the 11 global effects reported in the BellKor paper.  It's &lt;a href="http://algorithmsanalyzed.blogspot.com/2008/05/bellkor-algorithm-global-effects.html" rel="nofollow"&gt;here&lt;/a&gt;

Cheers,

-Larry</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>I just discovered your web site.  I really like it! :-)</p>
<p>I just started a blog myself which will focus on analyzing algorithms.  I&#8217;ve started with the algorithm that won the Netflix Progress Prize.  At this point, my approach is pure SQL on MySQL with very minor help from awk and Java.</p>
<p>I&#8217;ve hacked a script for loading up all the Netflix data with indexing in 3 hours.  Here are the <a href="http://setupandconfig.blogspot.com/2008/04/loading-up-netflix-prize-data-into.html" rel="nofollow">details</a></p>
<p>I&#8217;ve also used this script to reproduce the 11 global effects reported in the BellKor paper.  It&#8217;s <a href="http://algorithmsanalyzed.blogspot.com/2008/05/bellkor-algorithm-global-effects.html" rel="nofollow">here</a></p>
<p>Cheers,</p>
<p>-Larry</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ilya Grigorik</title>
		<link>http://www.igvita.com/2006/12/01/loading-netflix-dataset-into-sql/#comment-103429</link>
		<dc:creator>Ilya Grigorik</dc:creator>
		<pubDate>Thu, 22 May 2008 10:55:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2006/12/01/loading-netflix-dataset-into-sql/#comment-103429</guid>
		<description>Rasha, I've heard that Netflix changed their data files slightly (I believe they released more data since this was posted). Hence, it's not surprising that my transform script is not working as well as it should. More likely than not, it's the regular expression that's outdated - take a close look at your data files and how the data is being extracted.</description>
		<content:encoded><![CDATA[<p>Rasha, I&#8217;ve heard that Netflix changed their data files slightly (I believe they released more data since this was posted). Hence, it&#8217;s not surprising that my transform script is not working as well as it should. More likely than not, it&#8217;s the regular expression that&#8217;s outdated - take a close look at your data files and how the data is being extracted.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rasha</title>
		<link>http://www.igvita.com/2006/12/01/loading-netflix-dataset-into-sql/#comment-103391</link>
		<dc:creator>Rasha</dc:creator>
		<pubDate>Tue, 20 May 2008 10:49:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2006/12/01/loading-netflix-dataset-into-sql/#comment-103391</guid>
		<description>I have tried this over 10 times; the transform script creates all 17000 files in the data-load folder, but they are all empty!!! :(</description>
		<content:encoded><![CDATA[<p>I have tried this over 10 times; the transform script creates all 17000 files in the data-load folder, but they are all empty!!! :(</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ilya Grigorik</title>
		<link>http://www.igvita.com/2006/12/01/loading-netflix-dataset-into-sql/#comment-58392</link>
		<dc:creator>Ilya Grigorik</dc:creator>
		<pubDate>Tue, 07 Aug 2007 12:51:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2006/12/01/loading-netflix-dataset-into-sql/#comment-58392</guid>
		<description>Francisco, looks great. Good luck!</description>
		<content:encoded><![CDATA[<p>Francisco, looks great. Good luck!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Francisco Marco-Serrano</title>
		<link>http://www.igvita.com/2006/12/01/loading-netflix-dataset-into-sql/#comment-58390</link>
		<dc:creator>Francisco Marco-Serrano</dc:creator>
		<pubDate>Tue, 07 Aug 2007 12:43:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2006/12/01/loading-netflix-dataset-into-sql/#comment-58390</guid>
		<description>Netflix Prize for Dummies.
====================

Hi, I'm quite limited in programming and databases so I'm using the Prize as my Quest for learning. I'm documenting here: http://blogs.kproductivity.com/fmwaves/category/culture/movies/netflix/

PS The dummy is ME!</description>
		<content:encoded><![CDATA[<p>Netflix Prize for Dummies.<br />
====================</p>
<p>Hi, I&#8217;m quite limited in programming and databases so I&#8217;m using the Prize as my Quest for learning. I&#8217;m documenting here: <a href="http://blogs.kproductivity.com/fmwaves/category/culture/movies/netflix/" rel="nofollow">http://blogs.kproductivity.com/fmwaves/category/culture/movies/netflix/</a></p>
<p>PS The dummy is ME!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ilya Grigorik</title>
		<link>http://www.igvita.com/2006/12/01/loading-netflix-dataset-into-sql/#comment-35349</link>
		<dc:creator>Ilya Grigorik</dc:creator>
		<pubDate>Mon, 30 Apr 2007 14:27:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2006/12/01/loading-netflix-dataset-into-sql/#comment-35349</guid>
		<description>Peter, thanks for the link. That looks like a nice way to go about it!</description>
		<content:encoded><![CDATA[<p>Peter, thanks for the link. That looks like a nice way to go about it!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peter</title>
		<link>http://www.igvita.com/2006/12/01/loading-netflix-dataset-into-sql/#comment-35026</link>
		<dc:creator>Peter</dc:creator>
		<pubDate>Sun, 29 Apr 2007 17:31:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2006/12/01/loading-netflix-dataset-into-sql/#comment-35026</guid>
		<description>Hi, 

I just found an even better way, which avoids the unnecessary creation of the temporary files.

http://www.juretta.com/log/2007/04/24/loading_the_netflix_dataset_into_mysql/

Peter</description>
		<content:encoded><![CDATA[<p>Hi, </p>
<p>I just found an even better way, which avoids the unnecessary creation of the temporary files.</p>
<p><a href="http://www.juretta.com/log/2007/04/24/loading_the_netflix_dataset_into_mysql/" rel="nofollow">http://www.juretta.com/log/2007/04/24/loading_the_netflix_dataset_into_mysql/</a></p>
<p>Peter</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ilya Grigorik</title>
		<link>http://www.igvita.com/2006/12/01/loading-netflix-dataset-into-sql/#comment-25912</link>
		<dc:creator>Ilya Grigorik</dc:creator>
		<pubDate>Fri, 16 Mar 2007 15:27:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2006/12/01/loading-netflix-dataset-into-sql/#comment-25912</guid>
		<description>Ben, my import went really fast, so it must be something with the DB configuration. A couple of thoughts:

1) Set your 'max packet size' (max_allowed_packet) to something much higher than the allotted ~1 MB.
2) Flat out remove all innodb references in your config file.
3) Remove some of the keys on the table and see if that helps. Building the index is usually by far the most expensive operation.
4) Run the import, and then do 'show full processlist' to get a feel for where the bottleneck is.

I'm not running any special hardware on my end: AthlonXP 2800, 2GB RAM, 10K RPM hard drive. Chances are, that could be worse than what you've got.</description>
		<content:encoded><![CDATA[<p>Ben, my import went really fast, so it must be something with the DB configuration. A couple of thoughts:</p>
<p>1) Set your &#8216;max packet size&#8217; (max_allowed_packet) to something much higher than the allotted ~1 MB.<br />
2) Flat out remove all innodb references in your config file.<br />
3) Remove some of the keys on the table and see if that helps. Building the index is usually by far the most expensive operation.<br />
4) Run the import, and then do &#8216;show full processlist&#8217; to get a feel for where the bottleneck is.</p>
<p>I&#8217;m not running any special hardware on my end: AthlonXP 2800, 2GB RAM, 10K RPM hard drive. Chances are, that could be worse than what you&#8217;ve got.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ben</title>
		<link>http://www.igvita.com/2006/12/01/loading-netflix-dataset-into-sql/#comment-25571</link>
		<dc:creator>Ben</dc:creator>
		<pubDate>Thu, 15 Mar 2007 16:53:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2006/12/01/loading-netflix-dataset-into-sql/#comment-25571</guid>
		<description>I have followed the directions above but importing the data is taking hours... you say it only took you 20 minutes!?  I am on OS X, MySQL 5, with pretty good hardware.  I am using the my-huge.cnf as suggested but the process is still taking forever.  To turn off InnoDB/transaction support all you need to do is set the table type to MyISAM like listed above, correct?  Or is there a setting I have to use to turn it off?  Thanks.</description>
		<content:encoded><![CDATA[<p>I have followed the directions above but importing the data is taking hours&#8230; you say it only took you 20 minutes!?  I am on OS X, MySQL 5, with pretty good hardware.  I am using the my-huge.cnf as suggested but the process is still taking forever.  To turn off InnoDB/transaction support all you need to do is set the table type to MyISAM like listed above, correct?  Or is there a setting I have to use to turn it off?  Thanks.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
