<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Splunk Your Distributed Logs in EC2</title>
	<atom:link href="http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/</link>
	<description>A goal is a dream with a deadline.</description>
	<pubDate>Fri, 29 Aug 2008 03:28:59 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: links for 2008-07-24 &#171; Brent Sordyl&#8217;s Blog</title>
		<link>http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/#comment-107022</link>
		<dc:creator>links for 2008-07-24 &#171; Brent Sordyl&#8217;s Blog</dc:creator>
		<pubDate>Thu, 24 Jul 2008 14:31:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/#comment-107022</guid>
		<description>[...] Splunk Your Distributed Logs in EC2 Managing logs is hard as it is, and now imagine you have several (dozen) servers in EC2, the process becomes a chore, the debugging is hard, and frustration abounds. (tags: ec2 logging deployment) [...]</description>
		<content:encoded><![CDATA[<p>[...] Splunk Your Distributed Logs in EC2 Managing logs is hard as it is, and now imagine you have several (dozen) servers in EC2, the process becomes a chore, the debugging is hard, and frustration abounds. (tags: ec2 logging deployment) [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: People Over Process &#187; links for 2008-07-05</title>
		<link>http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/#comment-105128</link>
		<dc:creator>People Over Process &#187; links for 2008-07-05</dc:creator>
		<pubDate>Sat, 05 Jul 2008 07:37:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/#comment-105128</guid>
		<description>[...] Splunk Your Distributed Logs in EC2 (tags: splunk ec2 logmanagement cloud itmanagement redmonkclients) [...]</description>
		<content:encoded><![CDATA[<p>[...] Splunk Your Distributed Logs in EC2 (tags: splunk ec2 logmanagement cloud itmanagement redmonkclients) [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ilya Grigorik</title>
		<link>http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/#comment-104548</link>
		<dc:creator>Ilya Grigorik</dc:creator>
		<pubDate>Tue, 01 Jul 2008 00:38:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/#comment-104548</guid>
		<description>You had me at &lt;a href="http://www.splunkbase.com/apps/All/Technologies/app:Splunk+Replay" rel="nofollow"&gt;Splunk Replay&lt;/a&gt;, and &lt;a href="http://www.splunkbase.com/apps/All/Business_Intelligence/app:Splunk+Globe" rel="nofollow"&gt;Splunk Globe&lt;/a&gt; took the cake. Wasn't aware of all the additional apps available for Splunk!</description>
		<content:encoded><![CDATA[<p>You had me at <a href="http://www.splunkbase.com/apps/All/Technologies/app:Splunk+Replay" rel="nofollow">Splunk Replay</a>, and <a href="http://www.splunkbase.com/apps/All/Business_Intelligence/app:Splunk+Globe" rel="nofollow">Splunk Globe</a> took the cake. Wasn&#8217;t aware of all the additional apps available for Splunk!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Wilde</title>
		<link>http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/#comment-104219</link>
		<dc:creator>Michael Wilde</dc:creator>
		<pubDate>Wed, 25 Jun 2008 04:17:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/#comment-104219</guid>
		<description>Ilya and readers...  I just finished up a post which includes a longer-form video called "Splunk Ninja - Inside the Cloud".  I cover everything from CLI, Elasticfox, Rightscale, CloudStatus.com and a few new "Applications" on top of Splunk.  Check it out and keep the conversation going -- I'm game!

http://blogs.splunk.com/thewilde/2008/06/24/splunk-ninja-inside-the-cloud/</description>
		<content:encoded><![CDATA[<p>Ilya and readers&#8230;  I just finished up a post which includes a longer-form video called &#8220;Splunk Ninja - Inside the Cloud&#8221;.  I cover everything from CLI, Elasticfox, Rightscale, CloudStatus.com and a few new &#8220;Applications&#8221; on top of Splunk.  Check it out and keep the conversation going &#8212; I&#8217;m game!</p>
<p><a href="http://blogs.splunk.com/thewilde/2008/06/24/splunk-ninja-inside-the-cloud/" rel="nofollow">http://blogs.splunk.com/thewilde/2008/06/24/splunk-ninja-inside-the-cloud/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: splunk on ec2 &#8212; award tour</title>
		<link>http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/#comment-104196</link>
		<dc:creator>splunk on ec2 &#8212; award tour</dc:creator>
		<pubDate>Mon, 23 Jun 2008 23:33:07 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/#comment-104196</guid>
		<description>[...] Splunk Your Distributed Logs in EC2 - igvita.com. splunk always struck me as a cool app. [...]</description>
		<content:encoded><![CDATA[<p>[...] Splunk Your Distributed Logs in EC2 - igvita.com. splunk always struck me as a cool app. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: links for 2008-06-21 &#171; Mike Does Tech</title>
		<link>http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/#comment-104155</link>
		<dc:creator>links for 2008-06-21 &#171; Mike Does Tech</dc:creator>
		<pubDate>Sat, 21 Jun 2008 00:31:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/#comment-104155</guid>
		<description>[...] Splunk Your Distributed Logs in EC2 - igvita.com (tags: ec2 logging rubyonrails splunk) [...]</description>
		<content:encoded><![CDATA[<p>[...] Splunk Your Distributed Logs in EC2 - igvita.com (tags: ec2 logging rubyonrails splunk) [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Carl Mercier</title>
		<link>http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/#comment-104141</link>
		<dc:creator>Carl Mercier</dc:creator>
		<pubDate>Fri, 20 Jun 2008 13:09:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/#comment-104141</guid>
		<description>For the record, we ran plenty of disk benchmarks for a talk Morgan Tocker and I gave in April.  Here's the beef: http://tinyurl.com/5gg55t</description>
		<content:encoded><![CDATA[<p>For the record, we ran plenty of disk benchmarks for a talk Morgan Tocker and I gave in April.  Here&#8217;s the beef: <a href="http://tinyurl.com/5gg55t" rel="nofollow">http://tinyurl.com/5gg55t</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Wilde</title>
		<link>http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/#comment-104140</link>
		<dc:creator>Michael Wilde</dc:creator>
		<pubDate>Fri, 20 Jun 2008 12:48:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/#comment-104140</guid>
		<description>@Carl Mercier - I use XL's for pure performance.  Having all of those cores and memory makes for a pretty darn fast search, but let me break performance down for you. 

Architecture matters -- big time (well, depending on how much data you have).  When Splunk stores data we organize it by time.  On 32-bit systems, the largest time bucket we can go to is about 200MB, whereas on a 64-bit system the buckets can be up to 10GB.  64-bit can handle so much more when reading.  If you have a huge amount of data over a given span of time on a 32-bit system, we're gonna have to open a boatload of files to find your data, which we won't have to do on 64-bit.

Indexing = CPU-intensive operation.  Splunk will only open 1 thread by default for indexing, but can be configured to consume more threads and memory depending on your data rate.  First non-founding employee Brian Murphy has a great video on perf which I refer to a lot.  http://www.splunk.com/article/2183

Search - Architecture matters here as I mentioned above, but so does Disk I/O.  I have found that AWS is actually better than VMware when it comes to Disk I/O (but not by much).  Wanna make Splunk win on Expert in Guitar Hero? Give it RAIDed 15k RPM SAS drives, as search is a disk-intensive activity.  I have a Penguin Relion with dual quad-cores running at 3GHz, 16GB RAM and those fast disks I just mentioned -- I should be able to get about a 300GB per day indexing rate.

Personally, I haven't done any scientific-level benchmarking on AWS yet--our dev group is starting a QA project on EC2 (which I love!).  To benchmark disk I/O, use the Bonnie++ tool (or just do some heavy-duty gzipping to see max throughput on the most intensive disk operations).  To benchmark indexing, use the techniques in Brian Murphy's video.

@Carl &#38; @Ilya - Persistent storage is a problem so far on EC2. What I have done in some instances is use PersistentFS, which is a FUSE-based filesystem that lets you mount S3; through a syncing process, files are copied to and from S3.  I have used it for backup, and could use it for an archive in the "coldToFrozen" roll (see docs). You don't want to have Splunk index/search data on that type of volume--not sure if it'll even work because we do have FS-specific locking code we use -- but nonetheless speed would be way slow.</description>
		<content:encoded><![CDATA[<p>@Carl Mercier - I use XL&#8217;s for pure performance.  Having all of those cores and memory makes for a pretty darn fast search, but let me break performance down for you. </p>
<p>Architecture matters &#8212; big time (well, depending on how much data you have).  When Splunk stores data we organize it by time.  On 32-bit systems, the largest time bucket we can go to is about 200MB, whereas on a 64-bit system the buckets can be up to 10GB.  64-bit can handle so much more when reading.  If you have a huge amount of data over a given span of time on a 32-bit system, we&#8217;re gonna have to open a boatload of files to find your data, which we won&#8217;t have to do on 64-bit.</p>
<p>Indexing = CPU-intensive operation.  Splunk will only open 1 thread by default for indexing, but can be configured to consume more threads and memory depending on your data rate.  First non-founding employee Brian Murphy has a great video on perf which I refer to a lot.  <a href="http://www.splunk.com/article/2183" rel="nofollow">http://www.splunk.com/article/2183</a></p>
<p>Search - Architecture matters here as I mentioned above, but so does Disk I/O.  I have found that AWS is actually better than VMware when it comes to Disk I/O (but not by much).  Wanna make Splunk win on Expert in Guitar Hero? Give it RAIDed 15k RPM SAS drives, as search is a disk-intensive activity.  I have a Penguin Relion with dual quad-cores running at 3GHz, 16GB RAM and those fast disks I just mentioned &#8212; I should be able to get about a 300GB per day indexing rate.</p>
<p>Personally, I haven&#8217;t done any scientific-level benchmarking on AWS yet&#8211;our dev group is starting a QA project on EC2 (which I love!).  To benchmark disk I/O, use the Bonnie++ tool (or just do some heavy-duty gzipping to see max throughput on the most intensive disk operations).  To benchmark indexing, use the techniques in Brian Murphy&#8217;s video.</p>
<p>@Carl &amp; @Ilya - Persistent storage is a problem so far on EC2. What I have done in some instances is use PersistentFS, which is a FUSE-based filesystem that lets you mount S3; through a syncing process, files are copied to and from S3.  I have used it for backup, and could use it for an archive in the &#8220;coldToFrozen&#8221; roll (see docs). You don&#8217;t want to have Splunk index/search data on that type of volume&#8211;not sure if it&#8217;ll even work because we do have FS-specific locking code we use &#8212; but nonetheless speed would be way slow.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ilya Grigorik</title>
		<link>http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/#comment-104139</link>
		<dc:creator>Ilya Grigorik</dc:creator>
		<pubDate>Fri, 20 Jun 2008 11:15:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/#comment-104139</guid>
		<description>Bryan, you can certainly redirect all your logs into one file, but that's not what has me excited. For me, it's the ability to aggregate logs from multiple hosts, and the ability to search them intelligently from the comfort of my web browser.

Kord, thanks for addressing the questions!

Michael, thanks for the video reply. Didn't know about transactions and have to agree, that is incredibly powerful. I've added your reply at the bottom of the post.

Carl, those are great questions, I'll defer to Michael / Kord to give us the answers (pinged them). Having said that, for the S3 mirror, Splunk does not do this natively, but it wouldn't be a hard thing to set up.</description>
		<content:encoded><![CDATA[<p>Bryan, you can certainly redirect all your logs into one file, but that&#8217;s not what has me excited. For me, it&#8217;s the ability to aggregate logs from multiple hosts, and the ability to search them intelligently from the comfort of my web browser.</p>
<p>Kord, thanks for addressing the questions!</p>
<p>Michael, thanks for the video reply. Didn&#8217;t know about transactions and have to agree, that is incredibly powerful. I&#8217;ve added your reply at the bottom of the post.</p>
<p>Carl, those are great questions, I&#8217;ll defer to Michael / Kord to give us the answers (pinged them). Having said that, for the S3 mirror, Splunk does not do this natively, but it wouldn&#8217;t be a hard thing to set up.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Carl Mercier</title>
		<link>http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/#comment-104130</link>
		<dc:creator>Carl Mercier</dc:creator>
		<pubDate>Fri, 20 Jun 2008 03:30:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.igvita.com/2008/06/19/splunk-your-distributed-logs-in-ec2/#comment-104130</guid>
		<description>Forgot...

1. What kind of EC2 machine is recommended for Splunk?  An XL seems to be overkill for something like this.  How far will a small get me?  

2. Are the logs mirrored on S3 automatically, or are they simply kept on the local disk (i.e., /mnt)?

3. How many logging entries per second can I expect?  Has it been benchmarked?

Thanks! Great article!</description>
		<content:encoded><![CDATA[<p>Forgot&#8230;</p>
<p>1. What kind of EC2 machine is recommended for Splunk?  An XL seems to be overkill for something like this.  How far will a small get me?  </p>
<p>2. Are the logs mirrored on S3 automatically, or are they simply kept on the local disk (i.e., /mnt)?</p>
<p>3. How many logging entries per second can I expect?  Has it been benchmarked?</p>
<p>Thanks! Great article!</p>
]]></content:encoded>
	</item>
</channel>
</rss>
