<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Essential storage tradeoff: Simple Reads vs. Simple Writes</title>
	<atom:link href="http://codemonkeyism.com/essential-storage-tradeoff-simple-reads-simple-writes/feed/" rel="self" type="application/rss+xml" />
	<link>http://codemonkeyism.com/essential-storage-tradeoff-simple-reads-simple-writes/</link>
	<description></description>
	<lastBuildDate>Sat, 13 Mar 2010 16:12:40 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Games Outside the Box, Analog Computers, Read-Write-Costs etc. &#124; Gamlor</title>
		<link>http://codemonkeyism.com/essential-storage-tradeoff-simple-reads-simple-writes/comment-page-1/#comment-276306</link>
		<dc:creator>Games Outside the Box, Analog Computers, Read-Write-Costs etc. &#124; Gamlor</dc:creator>
		<pubDate>Thu, 11 Mar 2010 23:38:06 +0000</pubDate>
		<guid isPermaLink="false">http://codemonkeyism.com/?p=1207#comment-276306</guid>
		<description>[...] an old post about the read-/write-cost tradeoff. For most applications reads occur way more often than writes. Therefore this application should [...]</description>
		<content:encoded><![CDATA[<p>[...] an old post about the read-/write-cost tradeoff. For most applications reads occur way more often than writes. Therefore this application should [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: SQL is Dead. Long Live SQL. : Dataspora Blog</title>
		<link>http://codemonkeyism.com/essential-storage-tradeoff-simple-reads-simple-writes/comment-page-1/#comment-254186</link>
		<dc:creator>SQL is Dead. Long Live SQL. : Dataspora Blog</dc:creator>
		<pubDate>Thu, 26 Nov 2009 02:10:46 +0000</pubDate>
		<guid isPermaLink="false">http://codemonkeyism.com/?p=1207#comment-254186</guid>
		<description>[...] The second is a rejection of the strong typing of relational schemas, which make changes to data models, which are inevitable,  disastrously difficult to achieve.  It also makes writing to the data store a complex process. [...]</description>
		<content:encoded><![CDATA[<p>[...] The second is a rejection of the strong typing of relational schemas, which make changes to data models, which are inevitable,  disastrously difficult to achieve.  It also makes writing to the data store a complex process. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Scalable Database Links &#171; streamhacker.com</title>
		<link>http://codemonkeyism.com/essential-storage-tradeoff-simple-reads-simple-writes/comment-page-1/#comment-249249</link>
		<dc:creator>Scalable Database Links &#171; streamhacker.com</dc:creator>
		<pubDate>Mon, 26 Oct 2009 16:20:26 +0000</pubDate>
		<guid isPermaLink="false">http://codemonkeyism.com/?p=1207#comment-249249</guid>
		<description>[...] Code Monkeyism: Essential storage tradeoff: Simple Reads vs. Simple Writes [...]</description>
		<content:encoded><![CDATA[<p>[...] Code Monkeyism: Essential storage tradeoff: Simple Reads vs. Simple Writes [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ennuyer.net &#187; Blog Archive &#187; Rails Reading - Sept 21, 2009</title>
		<link>http://codemonkeyism.com/essential-storage-tradeoff-simple-reads-simple-writes/comment-page-1/#comment-244999</link>
		<dc:creator>Ennuyer.net &#187; Blog Archive &#187; Rails Reading - Sept 21, 2009</dc:creator>
		<pubDate>Mon, 21 Sep 2009 15:16:56 +0000</pubDate>
		<guid isPermaLink="false">http://codemonkeyism.com/?p=1207#comment-244999</guid>
		<description>[...]  Code Monkeyism: Essential storage tradeoff: Simple Reads vs. Simple Writes  [...]</description>
		<content:encoded><![CDATA[<p>[...]  Code Monkeyism: Essential storage tradeoff: Simple Reads vs. Simple Writes  [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Adams</title>
		<link>http://codemonkeyism.com/essential-storage-tradeoff-simple-reads-simple-writes/comment-page-1/#comment-244605</link>
		<dc:creator>Curt Adams</dc:creator>
		<pubDate>Sat, 19 Sep 2009 08:17:43 +0000</pubDate>
		<guid isPermaLink="false">http://codemonkeyism.com/?p=1207#comment-244605</guid>
		<description>If the denormalized views are constructed from a normalized database, the normalized data isn&#039;t dropped. It&#039;s still there, and can be used to verify or reconstruct documents. My understanding of the NoSQL movement is that it abandons the goal of a normalized data structure entirely.</description>
		<content:encoded><![CDATA[<p>If the denormalized views are constructed from a normalized database, the normalized data isn&#8217;t dropped. It&#8217;s still there, and can be used to verify or reconstruct documents. My understanding of the NoSQL movement is that it abandons the goal of a normalized data structure entirely.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jason Warner</title>
		<link>http://codemonkeyism.com/essential-storage-tradeoff-simple-reads-simple-writes/comment-page-1/#comment-244534</link>
		<dc:creator>Jason Warner</dc:creator>
		<pubDate>Fri, 18 Sep 2009 19:18:22 +0000</pubDate>
		<guid isPermaLink="false">http://codemonkeyism.com/?p=1207#comment-244534</guid>
		<description>If people haven&#039;t, now would be a great time to go and review Brewer&#039;s CAP Theorem!

http://www.julianbrowne.com/article/viewer/brewers-cap-theorem</description>
		<content:encoded><![CDATA[<p>If people haven&#8217;t, now would be a great time to go and review Brewer&#8217;s CAP Theorem!</p>
<p><a href="http://www.julianbrowne.com/article/viewer/brewers-cap-theorem" rel="nofollow">http://www.julianbrowne.com/article/viewer/brewers-cap-theorem</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eric Z. Beard</title>
		<link>http://codemonkeyism.com/essential-storage-tradeoff-simple-reads-simple-writes/comment-page-1/#comment-244499</link>
		<dc:creator>Eric Z. Beard</dc:creator>
		<pubDate>Fri, 18 Sep 2009 12:59:27 +0000</pubDate>
		<guid isPermaLink="false">http://codemonkeyism.com/?p=1207#comment-244499</guid>
		<description>I&#039;ve been studying this issue closely for a while now.  I have built and maintain a system with hundreds of millions of records that&#039;s growing at about 25% per quarter consistently, so I&#039;m obsessed with data storage.  My application relies on hundreds of queries that need to run in real-time against all of that transactional data - no offline cubes or Hadoop clusters.  I&#039;m considering a jump to NoSql, but the lack of ad-hoc queries against live data is just a killer.  I write probably a dozen ad-hoc queries a week to resolve support issues, and they normally need to run &quot;right now!&quot;  I might be analyzing tens of millions of records in several different tables or fixing some field that got corrupted by a bug in the software.  How do you do that with a NoSql system?

My solution for the moment is sharded databases, with a healthy dose of de-normalized aggregates sprinkled throughout the model, which are kept in synch with triggers or controlled data access via stored procedures.  There&#039;s no chance of data being different in two places.

And there&#039;s no edict on how to write queries - if you join 20 tables and it runs fast, fine.  If you need to write parallelized queries to return raw data from one table at a time and then join them on an application server, then that&#039;s fine too.

I think a lot of people new to this game don&#039;t get how good a SQL database can be.  On a single shard (a cheap 2U box), I can run 5 million customers, which is something like 150 GB of data, and everything runs fast enough to keep people happy.  In a NoSql system, how many nodes would you need to support both transactional and reporting applications for that much data?</description>
		<content:encoded><![CDATA[<p>I&#8217;ve been studying this issue closely for a while now.  I have built and maintain a system with hundreds of millions of records that&#8217;s growing at about 25% per quarter consistently, so I&#8217;m obsessed with data storage.  My application relies on hundreds of queries that need to run in real-time against all of that transactional data &#8211; no offline cubes or Hadoop clusters.  I&#8217;m considering a jump to NoSql, but the lack of ad-hoc queries against live data is just a killer.  I write probably a dozen ad-hoc queries a week to resolve support issues, and they normally need to run &#8220;right now!&#8221;  I might be analyzing tens of millions of records in several different tables or fixing some field that got corrupted by a bug in the software.  How do you do that with a NoSql system?</p>
<p>My solution for the moment is sharded databases, with a healthy dose of de-normalized aggregates sprinkled throughout the model, which are kept in synch with triggers or controlled data access via stored procedures.  There&#8217;s no chance of data being different in two places.</p>
<p>And there&#8217;s no edict on how to write queries &#8211; if you join 20 tables and it runs fast, fine.  If you need to write parallelized queries to return raw data from one table at a time and then join them on an application server, then that&#8217;s fine too.</p>
<p>I think a lot of people new to this game don&#8217;t get how good a SQL database can be.  On a single shard (a cheap 2U box), I can run 5 million customers, which is something like 150 GB of data, and everything runs fast enough to keep people happy.  In a NoSql system, how many nodes would you need to support both transactional and reporting applications for that much data?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rayk</title>
		<link>http://codemonkeyism.com/essential-storage-tradeoff-simple-reads-simple-writes/comment-page-1/#comment-244467</link>
		<dc:creator>Rayk</dc:creator>
		<pubDate>Fri, 18 Sep 2009 09:03:43 +0000</pubDate>
		<guid isPermaLink="false">http://codemonkeyism.com/?p=1207#comment-244467</guid>
		<description>Again a beautiful and nice posting, beside the content, which is very good!

For ad-hoc reporting, we BI experts are currently seeing the rise of column-based and column-indexed storage, mainly in combination with in-memory technology.

As a developer, one models the data as well as one can; in preparation for access, things are split into columns and the links between them. That is far away from 3NF or documents.

I suppose it&#039;s worth having a look at during your further investigations. I&#039;m looking forward to them.

Regards
Rayk</description>
		<content:encoded><![CDATA[<p>Again a beautiful and nice posting, beside the content, which is very good!</p>
<p>For ad-hoc reporting, we BI experts are currently seeing the rise of column-based and column-indexed storage, mainly in combination with in-memory technology.</p>
<p>As a developer, one models the data as well as one can; in preparation for access, things are split into columns and the links between them. That is far away from 3NF or documents.</p>
<p>I suppose it&#8217;s worth having a look at during your further investigations. I&#8217;m looking forward to them.</p>
<p>Regards<br />
Rayk</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: stephan</title>
		<link>http://codemonkeyism.com/essential-storage-tradeoff-simple-reads-simple-writes/comment-page-1/#comment-244444</link>
		<dc:creator>stephan</dc:creator>
		<pubDate>Fri, 18 Sep 2009 05:08:45 +0000</pubDate>
		<guid isPermaLink="false">http://codemonkeyism.com/?p=1207#comment-244444</guid>
		<description>@Curt: But aren&#039;t materialized views complex writes - done by the DB - to denormalize data? That is the same thing I write about in the post.</description>
		<content:encoded><![CDATA[<p>@Curt: But aren&#8217;t materialized views complex writes &#8211; done by the DB &#8211; to denormalize data? That is the same thing I write about in the post.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Adams</title>
		<link>http://codemonkeyism.com/essential-storage-tradeoff-simple-reads-simple-writes/comment-page-1/#comment-244395</link>
		<dc:creator>Curt Adams</dc:creator>
		<pubDate>Thu, 17 Sep 2009 22:54:49 +0000</pubDate>
		<guid isPermaLink="false">http://codemonkeyism.com/?p=1207#comment-244395</guid>
		<description>My experience with non-normalized data is that it will become inconsistent, guaranteed. You have to have a normalized backbone for important data. Complex views of normalized data are very useful and I&#039;ve been using a materialized view strategy for 25 years, long before I ever heard of any formal descriptions. I have never been able to get any other system to work.

In terms of chunking, I find extremely complex &quot;materialized views&quot; are very useful. I haven&#039;t used the Oracle system, but I&#039;ve written some pretty complicated update routines that I would be surprised to see in standardized packages. So I think your goals can be best realized (in most environments) with sophisticated systems comparable to materialized views - documents that update automatically based on an underlying verifiable data system.</description>
		<content:encoded><![CDATA[<p>My experience with non-normalized data is that it will become inconsistent, guaranteed. You have to have a normalized backbone for important data. Complex views of normalized data are very useful and I&#8217;ve been using a materialized view strategy for 25 years, long before I ever heard of any formal descriptions. I have never been able to get any other system to work.</p>
<p>In terms of chunking, I find extremely complex &#8220;materialized views&#8221; are very useful. I haven&#8217;t used the Oracle system, but I&#8217;ve written some pretty complicated update routines that I would be surprised to see in standardized packages. So I think your goals can be best realized (in most environments) with sophisticated systems comparable to materialized views &#8211; documents that update automatically based on an underlying verifiable data system.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: stephan</title>
		<link>http://codemonkeyism.com/essential-storage-tradeoff-simple-reads-simple-writes/comment-page-1/#comment-243944</link>
		<dc:creator>stephan</dc:creator>
		<pubDate>Tue, 15 Sep 2009 20:46:46 +0000</pubDate>
		<guid isPermaLink="false">http://codemonkeyism.com/?p=1207#comment-243944</guid>
		<description>@Yeroc: Yes, you are right. 

&quot;I think it’s worth mentioning that some SQL databases (Oracle for example) provide out-of-the-box solutions for this problem as well and arguably in a more seamless manner. For example, creating a materialized view based on a complex query gives you the benefits of a denormalized schema.&quot;

There are other benefits of chunks in documents and values though. Distribution comes to mind, which is inherently more difficult (might I say impossible above certain levels) to achieve with Oracle. And of course you can have denormalized data in every SQL database.</description>
		<content:encoded><![CDATA[<p>@Yeroc: Yes, you are right. </p>
<p>&#8220;I think it’s worth mentioning that some SQL databases (Oracle for example) provide out-of-the-box solutions for this problem as well and arguably in a more seamless manner. For example, creating a materialized view based on a complex query gives you the benefits of a denormalized schema.&#8221;</p>
<p>There are other benefits of chunks in documents and values though. Distribution comes to mind, which is inherently more difficult (might I say impossible above certain levels) to achieve with Oracle. And of course you can have denormalized data in every SQL database.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Yeroc</title>
		<link>http://codemonkeyism.com/essential-storage-tradeoff-simple-reads-simple-writes/comment-page-1/#comment-243942</link>
		<dc:creator>Yeroc</dc:creator>
		<pubDate>Tue, 15 Sep 2009 20:23:00 +0000</pubDate>
		<guid isPermaLink="false">http://codemonkeyism.com/?p=1207#comment-243942</guid>
		<description>I think it&#039;s worth mentioning that some SQL databases (Oracle for example) provide out-of-the-box solutions for this problem as well and arguably in a more seamless manner.  For example, creating a materialized view based on a complex query gives you the benefits of a denormalized schema.  This, in conjunction with query re-writing could mean your application would continue to operate normally without any knowledge of the denormalization going on behind the scenes.  Meanwhile, at the application level inserts and updates don&#039;t need to be concerned about manually updating the data across multiple tables since that will be handled automatically via the materialized view mechanism.</description>
		<content:encoded><![CDATA[<p>I think it&#8217;s worth mentioning that some SQL databases (Oracle for example) provide out-of-the-box solutions for this problem as well and arguably in a more seamless manner.  For example, creating a materialized view based on a complex query gives you the benefits of a denormalized schema.  This, in conjunction with query re-writing could mean your application would continue to operate normally without any knowledge of the denormalization going on behind the scenes.  Meanwhile, at the application level inserts and updates don&#8217;t need to be concerned about manually updating the data across multiple tables since that will be handled automatically via the materialized view mechanism.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rusty Wright</title>
		<link>http://codemonkeyism.com/essential-storage-tradeoff-simple-reads-simple-writes/comment-page-1/#comment-243914</link>
		<dc:creator>Rusty Wright</dc:creator>
		<pubDate>Tue, 15 Sep 2009 16:48:37 +0000</pubDate>
		<guid isPermaLink="false">http://codemonkeyism.com/?p=1207#comment-243914</guid>
		<description>A good example of where this played out was Twitter, back when it kept falling on its face because it was using an SQL database for storing everything.</description>
		<content:encoded><![CDATA[<p>A good example of where this played out was Twitter, back when it kept falling on its face because it was using an SQL database for storing everything.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Mahemoff</title>
		<link>http://codemonkeyism.com/essential-storage-tradeoff-simple-reads-simple-writes/comment-page-1/#comment-243905</link>
		<dc:creator>Michael Mahemoff</dc:creator>
		<pubDate>Tue, 15 Sep 2009 15:52:24 +0000</pubDate>
		<guid isPermaLink="false">http://codemonkeyism.com/?p=1207#comment-243905</guid>
		<description>+1 this is a great high-level summary of the architectures. The characterisation of a simple trade-off is a good starting point.</description>
		<content:encoded><![CDATA[<p>+1 this is a great high-level summary of the architectures. The characterisation of a simple trade-off is a good starting point.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: stephan</title>
		<link>http://codemonkeyism.com/essential-storage-tradeoff-simple-reads-simple-writes/comment-page-1/#comment-243904</link>
		<dc:creator>stephan</dc:creator>
		<pubDate>Tue, 15 Sep 2009 15:51:16 +0000</pubDate>
		<guid isPermaLink="false">http://codemonkeyism.com/?p=1207#comment-243904</guid>
		<description>@Joe: Yes, this might be the case under some circumstances and some applications. Those were certainly the mainstream in the 80s and 90s. Internal applications where 1000 concurrent users (flight terminals?) are a high number. 

But they only scale with massive caching. Today many applications experience millions of concurrent users. Then you need to trade consistency for scalability.

As I&#039;ve said to Stefan,  the downside is &quot;mainly the lack of adhoc reporting and adhoc data fixing.&quot;. Which sometimes is a major issue.

So for user data, payment data, orders it&#039;s still a good idea to keep your data normalized.

Cheers
Stephan</description>
		<content:encoded><![CDATA[<p>@Joe: Yes, this might be the case under some circumstances and some applications. Those were certainly the mainstream in the 80s and 90s. Internal applications where 1000 concurrent users (flight terminals?) are a high number. </p>
<p>But they only scale with massive caching. Today many applications experience millions of concurrent users. Then you need to trade consistency for scalability.</p>
<p>As I&#8217;ve said to Stefan,  the downside is &#8220;mainly the lack of adhoc reporting and adhoc data fixing.&#8221;. Which sometimes is a major issue.</p>
<p>So for user data, payment data, orders it&#8217;s still a good idea to keep your data normalized.</p>
<p>Cheers<br />
Stephan</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe</title>
		<link>http://codemonkeyism.com/essential-storage-tradeoff-simple-reads-simple-writes/comment-page-1/#comment-243900</link>
		<dc:creator>Joe</dc:creator>
		<pubDate>Tue, 15 Sep 2009 15:42:16 +0000</pubDate>
		<guid isPermaLink="false">http://codemonkeyism.com/?p=1207#comment-243900</guid>
		<description>In applications I have worked on, data consistency has been the highest concern. The performance loss to joins has been minimal in almost every case, and the ability to change/add data without having to remember to do it in 10 different tables is essential.

I will likely stick to Normalized Data for the majority of my applications, but I will have to keep an open mind towards DeNormalized data as well, I suppose.</description>
		<content:encoded><![CDATA[<p>In applications I have worked on, data consistency has been the highest concern. The performance loss to joins has been minimal in almost every case, and the ability to change/add data without having to remember to do it in 10 different tables is essential.</p>
<p>I will likely stick to Normalized Data for the majority of my applications, but I will have to keep an open mind towards DeNormalized data as well, I suppose.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: stephan</title>
		<link>http://codemonkeyism.com/essential-storage-tradeoff-simple-reads-simple-writes/comment-page-1/#comment-243898</link>
		<dc:creator>stephan</dc:creator>
		<pubDate>Tue, 15 Sep 2009 15:33:47 +0000</pubDate>
		<guid isPermaLink="false">http://codemonkeyism.com/?p=1207#comment-243898</guid>
		<description>@Stefan: Thanks. Well yes, there are many other tradeoffs. I&#039;m writing a follow up post about the &quot;Dark side of NoSQL&quot; - mainly the lack of adhoc reporting and adhoc data fixing.

Cheers
Stephan

PS: And many more things</description>
		<content:encoded><![CDATA[<p>@Stefan: Thanks. Well yes, there are many other tradeoffs. I&#8217;m writing a follow up post about the &#8220;Dark side of NoSQL&#8221; &#8211; mainly the lack of adhoc reporting and adhoc data fixing.</p>
<p>Cheers<br />
Stephan</p>
<p>PS: And many more things</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Stefan Schubert</title>
		<link>http://codemonkeyism.com/essential-storage-tradeoff-simple-reads-simple-writes/comment-page-1/#comment-243889</link>
		<dc:creator>Stefan Schubert</dc:creator>
		<pubDate>Tue, 15 Sep 2009 13:40:33 +0000</pubDate>
		<guid isPermaLink="false">http://codemonkeyism.com/?p=1207#comment-243889</guid>
		<description>Hey Stephan,

very neat and clean presentation of the trade-off between those two principles.
Unfortunately there is more involved in such a decision, but hopefully you&#039;ll get into that later ^^

Cheers
Stefan</description>
		<content:encoded><![CDATA[<p>Hey Stephan,</p>
<p>very neat and clean presentation of the trade-off between those two principles.<br />
Unfortunately there is more involved in such a decision, but hopefully you&#8217;ll get into that later ^^</p>
<p>Cheers<br />
Stefan</p>
]]></content:encoded>
	</item>
</channel>
</rss>
