<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments on: Bayes Classification in Ruby</title>
	<atom:link href="http://www.igvita.com/2007/05/23/bayes-classification-in-ruby/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.igvita.com/2007/05/23/bayes-classification-in-ruby/</link>
	<description>A goal is a dream with a deadline.</description>
	<pubDate>Fri, 12 Mar 2010 07:16:26 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Ilya Grigorik</title>
		<link>http://www.igvita.com/2007/05/23/bayes-classification-in-ruby/comment-page-1/#comment-201026</link>
		<dc:creator>Ilya Grigorik</dc:creator>
		<pubDate>Sun, 28 Jun 2009 15:33:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2007/05/23/bayes-classification-in-ruby/#comment-201026</guid>
		<description>Piyush: Garbage in, garbage out. You will need to take the data out of the database and deserialize it before you run the stemmer. Also, be aware that the Porter stemmer used in the gem is English-specific. I'm not sure how well, if at all, it would work on other languages.</description>
		<content:encoded><![CDATA[<p>Piyush: Garbage in, garbage out. You will need to take the data out of the database and deserialize it before you run the stemmer. Also, be aware that the Porter stemmer used in the gem is English-specific. I&#8217;m not sure how well, if at all, it would work on other languages.</p>
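<p>A minimal sketch of that clean-up step in plain Ruby, assuming the stored text contains numeric HTML entities such as <code>&amp;#8217;</code> (a right single quotation mark). The helper <code>normalize_for_stemming</code> is hypothetical and only decodes numeric entities; the standard library&#8217;s <code>CGI.unescapeHTML</code> also handles named ones such as <code>&amp;amp;</code>. The returned tokens are what you would then pass to the stemmer.</p>

```ruby
# Hypothetical helper: turn numeric HTML entities such as "&#8217;" back
# into real characters, map curly quotes to ASCII, then split into
# lowercase word tokens ready to hand to a stemmer.
def normalize_for_stemming(text)
  decoded = text.dup.force_encoding(Encoding::UTF_8)
  # Decimal entities, e.g. &#8217; -> U+2019
  decoded = decoded.gsub(/&#(\d+);/) { $1.to_i.chr(Encoding::UTF_8) }
  # Hex entities, e.g. &#x2019; -> U+2019
  decoded = decoded.gsub(/&#x(\h+);/i) { $1.to_i(16).chr(Encoding::UTF_8) }
  # Curly single/double quotes -> straight ASCII quotes
  decoded = decoded.tr("\u2018\u2019\u201C\u201D", "''\"\"")
  # Lowercase and keep apostrophes inside words ("don't" stays one token)
  decoded.downcase.scan(/[a-z0-9']+/)
end

tokens = normalize_for_stemming("Don&#8217;t stem &#8220;raw&#8221; entities")
p tokens # => ["don't", "stem", "raw", "entities"]
```

<p>Because the entities are decoded before tokenizing, the stemmer never sees stray fragments like <code>8217</code>.</p>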
]]></content:encoded>
	</item>
	<item>
		<title>By: Piyush</title>
		<link>http://www.igvita.com/2007/05/23/bayes-classification-in-ruby/comment-page-1/#comment-200997</link>
		<dc:creator>Piyush</dc:creator>
		<pubDate>Sun, 28 Jun 2009 11:26:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2007/05/23/bayes-classification-in-ruby/#comment-200997</guid>
		<description>How do I handle stemming of special characters?

I am serializing the data and saving it in the DB, then running the stemmer on the data, but stemming the serialized data gives wrong results: special characters like the apostrophe (') get converted to &amp;#8217;, and the stemmer strips the &amp;# and keeps 8217, which is wrong.

How can we handle such cases?

I hope I am clear.</description>
		<content:encoded><![CDATA[<p>How do I handle stemming of special characters?</p>
<p>I am serializing the data and saving it in the DB, then running the stemmer on the data, but stemming the serialized data gives wrong results: special characters like the apostrophe (') get converted to &amp;#8217;, and the stemmer strips the &amp;# and keeps 8217, which is wrong.</p>
<p>How can we handle such cases?</p>
<p>I hope I am clear.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Karl Baum</title>
		<link>http://www.igvita.com/2007/05/23/bayes-classification-in-ruby/comment-page-1/#comment-199611</link>
		<dc:creator>Karl Baum</dc:creator>
		<pubDate>Wed, 17 Jun 2009 13:18:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2007/05/23/bayes-classification-in-ruby/#comment-199611</guid>
		<description>Hi. This is an excellent gem, but I've noticed one limitation: it seems that this implementation never responds with "I don't know". For example, if the probability that some words are "spam" is 49% and the probability of "not spam" is 50%, should it respond "not spam"? As the user of the API, I would like to have control over this. I noticed that the following implementation introduces the idea of a threshold:

http://blog.saush.com/2009/02/naive-bayesian-classifiers-and-ruby/

The idea is that if the ratio between the probabilities of the best and second-best categories is below a certain threshold, it responds with the default category (i.e. it doesn't know).

Of course, the cited implementation is not conveniently packaged as a gem ;-).

Just wanted to get everyone's thoughts and see if I was missing something about the classifier gem.

thx.</description>
		<content:encoded><![CDATA[<p>Hi. This is an excellent gem, but I&#8217;ve noticed one limitation: it seems that this implementation never responds with &#8220;I don&#8217;t know&#8221;. For example, if the probability that some words are &#8220;spam&#8221; is 49% and the probability of &#8220;not spam&#8221; is 50%, should it respond &#8220;not spam&#8221;? As the user of the API, I would like to have control over this. I noticed that the following implementation introduces the idea of a threshold:</p>
<p><a href="http://blog.saush.com/2009/02/naive-bayesian-classifiers-and-ruby/" rel="nofollow">http://blog.saush.com/2009/02/naive-bayesian-classifiers-and-ruby/</a></p>
<p>The idea is that if the ratio between the probabilities of the best and second-best categories is below a certain threshold, it responds with the default category (i.e. it doesn&#8217;t know).</p>
<p>Of course, the cited implementation is not conveniently packaged as a gem ;-).</p>
<p>Just wanted to get everyone&#8217;s thoughts and see if I was missing something about the classifier gem.</p>
<p>thx.</p>
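<p>A minimal sketch of the threshold idea in plain Ruby. It assumes you already have a Hash of category to probability; the method name <code>classify_with_threshold</code>, the 1.5 default ratio, and the <code>"Unknown"</code> fallback are hypothetical choices, not part of the classifier gem. If your scores are log-probabilities rather than probabilities, compare their difference instead, since a ratio of probabilities corresponds to a difference of logs.</p>

```ruby
# Hypothetical helper: return the most likely category only when it beats
# the runner-up by at least `threshold` times; otherwise return `default`.
# `probabilities` is assumed to be a Hash of category => probability.
def classify_with_threshold(probabilities, threshold: 1.5, default: "Unknown")
  ranked = probabilities.sort_by { |_category, prob| -prob }
  return default if ranked.empty?

  best, runner_up = ranked[0], ranked[1]
  # A lone category, or a runner-up with zero probability, wins outright.
  return best[0] if runner_up.nil? || runner_up[1].zero?

  ratio = best[1] / runner_up[1]
  ratio >= threshold ? best[0] : default
end

p classify_with_threshold({ "spam" => 0.49, "not spam" => 0.50 }) # => "Unknown"
p classify_with_threshold({ "spam" => 0.90, "not spam" => 0.10 }) # => "spam"
```

<p>With the 49% vs. 50% example above, the ratio is only about 1.02, so the sketch answers &#8220;Unknown&#8221; instead of forcing &#8220;not spam&#8221;.</p>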
]]></content:encoded>
	</item>
	<item>
		<title>By: convert</title>
		<link>http://www.igvita.com/2007/05/23/bayes-classification-in-ruby/comment-page-1/#comment-161511</link>
		<dc:creator>convert</dc:creator>
		<pubDate>Mon, 19 Jan 2009 16:58:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2007/05/23/bayes-classification-in-ruby/#comment-161511</guid>
		<description>Thanks for the tip, Mike!</description>
		<content:encoded><![CDATA[<p>Thanks for the tip, Mike!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ilya Grigorik</title>
		<link>http://www.igvita.com/2007/05/23/bayes-classification-in-ruby/comment-page-1/#comment-156347</link>
		<dc:creator>Ilya Grigorik</dc:creator>
		<pubDate>Wed, 31 Dec 2008 14:33:36 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2007/05/23/bayes-classification-in-ruby/#comment-156347</guid>
		<description>Thanks for sharing the example, Mischa. Great stuff.</description>
		<content:encoded><![CDATA[<p>Thanks for sharing the example, Mischa. Great stuff.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mischa Fierer</title>
		<link>http://www.igvita.com/2007/05/23/bayes-classification-in-ruby/comment-page-1/#comment-156229</link>
		<dc:creator>Mischa Fierer</dc:creator>
		<pubDate>Tue, 30 Dec 2008 04:31:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2007/05/23/bayes-classification-in-ruby/#comment-156229</guid>
		<description>Here's an example of how to implement this in a blog-like Rails app:

http://gist.github.com/41508</description>
		<content:encoded><![CDATA[<p>Here&#8217;s an example of how to implement this in a blog-like Rails app:</p>
<p><a href="http://gist.github.com/41508" rel="nofollow">http://gist.github.com/41508</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ilya Grigorik</title>
		<link>http://www.igvita.com/2007/05/23/bayes-classification-in-ruby/comment-page-1/#comment-105421</link>
		<dc:creator>Ilya Grigorik</dc:creator>
		<pubDate>Tue, 08 Jul 2008 11:29:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2007/05/23/bayes-classification-in-ruby/#comment-105421</guid>
		<description>Nihar, this depends on how you're adding new training samples, but in general the classifier can change with every new sample you add to it. You need a large, representative training sample to get good results.

Having said that, you also don't want to fall into the &lt;a href="http://en.wikipedia.org/wiki/Overfitting" rel="nofollow"&gt;overfitting trap&lt;/a&gt;: you should be testing your classifier on different samples between runs.</description>
		<content:encoded><![CDATA[<p>Nihar, this depends on how you&#8217;re adding new training samples, but in general the classifier can change with every new sample you add to it. You need a large, representative training sample to get good results.</p>
<p>Having said that, you also don&#8217;t want to fall into the <a href="http://en.wikipedia.org/wiki/Overfitting" rel="nofollow">overfitting trap</a>: you should be testing your classifier on different samples between runs.</p>
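<p>A minimal sketch of testing on held-out data in plain Ruby. <code>holdout_split</code> and <code>accuracy</code> are hypothetical helpers, and <code>train_toy</code> is a deliberately simple word-overlap stand-in (not naive Bayes, and not the classifier gem) that you would replace with code that trains your real classifier. The fixed <code>seed</code> keeps the split identical across runs, so repeated runs on the same data are directly comparable.</p>

```ruby
# Hypothetical helper: shuffle labelled [text, label] pairs with a fixed
# seed and hold out a fraction of them for testing.
def holdout_split(samples, test_fraction: 0.25, seed: 42)
  shuffled = samples.shuffle(random: Random.new(seed))
  test_size = (samples.size * test_fraction).ceil
  [shuffled.drop(test_size), shuffled.take(test_size)] # [train, test]
end

# Fraction of held-out samples whose predicted label matches the true label.
def accuracy(test_set, predict)
  return 0.0 if test_set.empty?
  correct = test_set.count { |text, label| predict.call(text) == label }
  correct.to_f / test_set.size
end

# Toy stand-in for a real classifier: predicts the label whose training
# texts share the most words with the input.
def train_toy(train_set)
  counts = Hash.new { |hash, label| hash[label] = Hash.new(0) }
  train_set.each do |text, label|
    text.downcase.split.each { |word| counts[label][word] += 1 }
  end
  lambda do |text|
    words = text.downcase.split
    counts.keys.max_by { |label| words.sum { |word| counts[label][word] } }
  end
end

samples = [
  ["great product love it", :positive], ["love this great phone", :positive],
  ["really great value", :positive], ["love it", :positive],
  ["terrible waste of money", :negative], ["awful terrible service", :negative],
  ["hate it awful", :negative], ["waste of time", :negative]
]

train, test = holdout_split(samples)
model = train_toy(train)
puts "held-out accuracy: #{accuracy(test, model)}"
```

<p>The point is that the score comes only from samples the model never saw during training; a high score on the training data alone says little about overfitting.</p>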
]]></content:encoded>
	</item>
	<item>
		<title>By: nihar gadkari</title>
		<link>http://www.igvita.com/2007/05/23/bayes-classification-in-ruby/comment-page-1/#comment-105353</link>
		<dc:creator>nihar gadkari</dc:creator>
		<pubDate>Mon, 07 Jul 2008 16:57:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2007/05/23/bayes-classification-in-ruby/#comment-105353</guid>
		<description>Hey, I am new to machine learning in general, and this is the first classifier I have ever used. I am using the classifier gem to classify sentences as positive or negative to carry out sentiment analysis. I am populating the training set on the fly, and I am getting different sentence classification outputs on each run for the same training data. My question is: is the classifier maintaining state between runs? If so, is there any way to prevent this from happening? Once again, I am a novice when it comes to machine learning fundamentals, so I would just like to know if I am missing something basic.</description>
		<content:encoded><![CDATA[<p>Hey, I am new to machine learning in general, and this is the first classifier I have ever used. I am using the classifier gem to classify sentences as positive or negative to carry out sentiment analysis. I am populating the training set on the fly, and I am getting different sentence classification outputs on each run for the same training data. My question is: is the classifier maintaining state between runs? If so, is there any way to prevent this from happening? Once again, I am a novice when it comes to machine learning fundamentals, so I would just like to know if I am missing something basic.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike Subelsky</title>
		<link>http://www.igvita.com/2007/05/23/bayes-classification-in-ruby/comment-page-1/#comment-103832</link>
		<dc:creator>Mike Subelsky</dc:creator>
		<pubDate>Fri, 06 Jun 2008 15:34:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2007/05/23/bayes-classification-in-ruby/#comment-103832</guid>
		<description>I was able to get this working;  all you have to do is run "Marshal.dump(classifier)" and store the output to a text column in your database.  For MySQL I got errors unless I made the column type "longtext", a la:

change_column :tags, :bayes_data, :longtext

Then to restore the classifier, you just do:

Marshal.load(bayes_data) and voila!</description>
		<content:encoded><![CDATA[<p>I was able to get this working; all you have to do is run <code>Marshal.dump(classifier)</code> and store the output to a text column in your database. For MySQL I got errors unless I made the column type <code>longtext</code>, a la:</p>
<p><code>change_column :tags, :bayes_data, :longtext</code></p>
<p>Then to restore the classifier, you just do:</p>
<p><code>Marshal.load(bayes_data)</code> and voila!</p>
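<p>A minimal sketch of that round trip in plain Ruby. <code>TrainedModel</code> is a hypothetical Struct standing in for the trained classifier object. The Base64 step (via <code>Array#pack("m0")</code> and <code>String#unpack1("m0")</code>) is an optional addition, not part of the tip above, that keeps Marshal&#8217;s binary output safe to store in a text column; it makes the data about a third larger, so a <code>longtext</code> column is still a sensible choice.</p>

```ruby
# Hypothetical stand-in for a trained classifier: Marshal can serialize
# ordinary object graphs such as Structs, Hashes, Arrays and Strings.
TrainedModel = Struct.new(:categories, :word_counts)

model = TrainedModel.new(
  %w[spam ham],
  { "spam" => { "free" => 3 }, "ham" => { "meeting" => 2 } }
)

# Marshal.dump returns a binary (ASCII-8BIT) string; "m0" packs it as
# strict Base64 (no newlines), which is plain ASCII text.
stored = [Marshal.dump(model)].pack("m0")

# Reverse the steps to restore it. Only Marshal.load data you wrote
# yourself: loading untrusted Marshal data is unsafe.
restored = Marshal.load(stored.unpack1("m0"))

p restored.word_counts["spam"]["free"] # => 3
```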
]]></content:encoded>
	</item>
	<item>
		<title>By: Ilya Grigorik</title>
		<link>http://www.igvita.com/2007/05/23/bayes-classification-in-ruby/comment-page-1/#comment-103827</link>
		<dc:creator>Ilya Grigorik</dc:creator>
		<pubDate>Fri, 06 Jun 2008 12:10:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/blog/2007/05/23/bayes-classification-in-ruby/#comment-103827</guid>
		<description>Thanks for the tip, Mike!</description>
		<content:encoded><![CDATA[<p>Thanks for the tip, Mike!</p>
]]></content:encoded>
	</item>
</channel>
</rss>
