<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Fraser-Green</title>
  <link href="http://www.fraser-green.com/atom.xml" rel="self"/>
  <link href="http://www.fraser-green.com/"/>
  <updated>2011-08-26T00:45:25+02:00</updated>
  <id>http://www.fraser-green.com/</id>
  <author>
    <name>Owen Fraser-Green</name>
    
      <email>owen@fraser-green.com</email>
    
  </author>

  
  <entry>
    <title>Managing seed data with Google Docs</title>
    <link href="http://www.fraser-green.com/blog/2011/08/23/using-google-docs-for-seed-data/"/>
    <updated>2011-08-23T15:15:00+02:00</updated>
    <id>http://www.fraser-green.com/blog/2011/08/23/using-google-docs-for-seed-data</id>
    <content type="html">&lt;p&gt;When you're building a website what should you do with all that rarely
(yet not never) changing seed data? I'm talking about things like
Localization strings, choices in some drop down menus and so on. This
is a situation I've faced a few times and I've gone through a variety
of solutions to the problem. A couple of projects back, we tried
putting everything in flat files - in the case of localization
strings, this tends to be encouraged by the framework - but then you
end up having to teach people Git just so they can fix a few
typos. Then when they're done (and you've cleaned up the mess they
made in Git) they complain:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;Foreign marketing dude:&lt;/strong&gt; 'elp, somezing eez wrong! I'm about to
do a demo and I do not see zee changes I made to zee site.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;me:&lt;/strong&gt; Oh, yeah. I was just notified by Jenkins that the last build
failed. I think I can fix it quickly, then I'll redploy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Foreign marketing dude:&lt;/strong&gt; Re-de-huh?! And 'oo eez zis Jenkins?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Too technical. So then we figured everything had to go in the database
and we'd make a user-friendly front-end. Every table needed its own
CRUD admin pages to maintain the data. Some grew huge and even needed
a search box - AJAX naturally. As far as the end-user website was
concerned, the project turned out to be a bit of a flop but the admin
site rocked! At least I thought it did til one day someone
accidentally deleted loads of localization strings. He didn't even
seem that concerned:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;Foreign marketing dude:&lt;/strong&gt; Zis Git, 'eez taken care of zee backups,
  no?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;me:&lt;/strong&gt; Erm...&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;So to my &lt;a href=&quot;http://www.myenergy.no&quot;&gt;current project&lt;/a&gt; and I started
pondering what a reasonable amount of development time was to spend on
seed data maintenance and in the end, I came to the conclusion: not a
lot. At the same time, we'd started using Google Docs for pretty much
anything we could so I figured we could use Google Docs spreadsheets
to maintain the seed data. They have some convenient properties:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Anyone you &lt;a href=&quot;(http://www.google.com/support/a/bin/answer.py?answer=60781&quot;&gt;give access
to&lt;/a&gt; can
edit the data.&lt;/strong&gt; We share the documents &quot;People at &lt;em&gt;Our Organization&lt;/em&gt;
who have the link can edit&quot;. We're only four but you can share with
individuals if you prefer.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;It has &lt;a href=&quot;https://docs.google.com/support/bin/answer.py?hl=en&amp;amp;answer=92199&quot;&gt;version
control&lt;/a&gt;.&lt;/strong&gt;
If someone messes it up you can see the whole history, click on
individual revisions to see what changed, and revert to a previous
version.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Everyone understands spreadsheets (at least better than git).&lt;/strong&gt;
In fact our previous attempt at making a seed data admin interface
began looking more and more like a spreadsheet since all the data's
tabular by nature and it's convenient seeing all the values in place
and being able to edit in situ.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;There's a &lt;a href=&quot;http://code.google.com/apis/spreadsheets/&quot;&gt;well-documented
API&lt;/a&gt; which provides CSV
export.&lt;/strong&gt; Authenticating is slightly tricky but thereafter it's just
one call to download a CSV-export.&lt;/li&gt;
&lt;/ol&gt;


&lt;h3&gt;So how does it work?&lt;/h3&gt;

&lt;p&gt;Each class of data (what would normally be a table in a database) has
its own spreadsheet, and a config file in the app declares all the
links.&lt;/p&gt;

&lt;p&gt;&lt;img class='' src='http://www.fraser-green.com/images/seed-data-spreadsheet.png' width='' height='' alt='' title=''&gt;&lt;/p&gt;

&lt;p&gt;When the app launches, it authenticates to Google Docs using a
dedicated apps account and, for each document it hasn't already cached
as a file, it saves the CSV export as a file in the cache directory,
and loads the data. The cache means that if the app is restarted and
Google Docs is having a bad day, the app can start fine using the
previous run's data. Also, if someone is in the middle of editing when
the app happens to restart, you don't want the half-baked changes
appearing.&lt;/p&gt;

&lt;p&gt;There's a single seed data admin page where site admins with links to
edit the spreadsheet and reload the data.&lt;/p&gt;

&lt;p&gt;&lt;img class='' src='http://www.fraser-green.com/images/seed-data-admin.png' width='' height='' alt='' title=''&gt;&lt;/p&gt;

&lt;p&gt;The cached file is replaced and the app begins using the new data
straight away. Note, when you're re-reading the data, don't just empty
the data structure holding the data and load it up again because if
any requests arrive while that's happening, they'll see partially
loaded data. Instead, load it into a new structure and make the actual
structure point to the new one. This also helps cope with bad data
since if the actual import crashes, the app can carry on using the old
data.&lt;/p&gt;

&lt;h3&gt;Details, details, details&lt;/h3&gt;

&lt;p&gt;Our app is built using &lt;a href=&quot;http://liftweb.net/&quot;&gt;Lift&lt;/a&gt; so we could use
Google's Java API to access Docs. We also used
&lt;a href=&quot;http://www.csvreader.com/java_csv.php&quot;&gt;JavaCSV&lt;/a&gt; to parse the CSV
data. Of course CSV is quite a trivial format so it's tempting just to
do it yourself, but there are a few corner cases like data containing
commas and new-lines which 3rd party libraries handle. Each set of
seed data is handled by a model class holding the dataset in an immutable
Map.&lt;/p&gt;

&lt;p&gt;Each of the model classes extends one of two generalizations. One
deals with multi-column data such as in the first screenshot above. In
these cases, lookups are typically made for a key to retrieve a class
with properties mapped to the columns in the spreadsheet. The other
generalization deals with data such as localization where a column is
picked first (e.g. the locale) and a key lookup is made to fetch a
single element.&lt;/p&gt;

&lt;h3&gt;Conclusions&lt;/h3&gt;

&lt;p&gt;So how's worked out? Pretty good. We've used this in production now
for six months and so far there haven't been any hiccoughs. You
wouldn't want to use this for large data sets since everything's held
in memory but in each case we've used it, that's worked to our
advantage. Doing round trips to the database to look up localization
strings when there could be 50 on a page, for example, would be very
expensive.&lt;/p&gt;

&lt;p&gt;If you try it yourself, let me know how it goes.&lt;/p&gt;
</content>
  </entry>
  
</feed>

