Now I’m faced with creating a customer-facing site that has (or will someday soon have) real requirements.
Here are a couple of the requirements I know so far:
- Relatively low volume traffic. The site will be public, but only registered users (customers) will have access. No product pages, no shopping carts, no ads, no social networking. The front page is a login screen.
- Reliable and secure transport and storage of medical data. At a minimum we must comply with HIPAA standards (privacy rules).
I don’t see web site development as really that different from building any other type of application. It’s all software. The architectural building blocks may be different, but the developer’s mind-set and methodologies for producing a quality product need to be the same.
I haven’t gotten far enough along to really understand all of the deployment and maintenance issues. I’m thinking about them though. The same goes for testing. I can foresee development vs. production platform testing issues that will have to be carefully considered.
What I want to do is walk you through my rationale for the selection of some of the major components and tools I’m considering using for this project.
Here’s a little historical perspective on selecting a web development framework:
Yep, that’s how it feels. There are at least 100 options (plus a couple of my additions):
Agavi | AIDA/Web | Ajile | Akelos | Apache Click | Apache Cocoon | Apache Struts | Apache Wicket | AppFuse | Aranea | ASP.NET MVC | Axiom Stack | BFC | CakePHP | Camping | Catalyst | CherryPy | CodeIgniter | ColdSpring | CSLA | CppCMS | Django | DotNetNuke | Drupal | ErlyWeb | eZ Components | Flex | FUSE | Fusebox | Google Web Toolkit | Grok | Grails | Hamlets | Horde | Interchange | ItsNat | IT Mill Toolkit | JavaServer Faces | Jaxer | JBoss Seam | Kepler | Kohana | Lift | LISA | ManyDesigns Portofino | Mason | Maypole | Mach-II | Merb | Midgard | Model-Glue | MonoRail | Morfik | Nitro | onTap | OpenACS | OpenLaszlo | OpenXava | Orbit | PEAR | Orinoco | Pyjamas | Pylons | Qcodo | Radicore | Reasonable Server Faces | RIFE | Ruby on Rails | Seaside | Shale | Simplicity | SilverStripe (Sapphire) | SmartClient | Sofia | SPIP | Spring | Stripes | Symfony | Tapestry | ThinWire | Tigermouse | Vaadin | TurboGears | Wavemaker | web2py | WebObjects | WebWork | Wigbi | Yii | Zend | ZK | Zoop | Zope 2 | Zope 3 | ztemplates
As a .NET developer, my first inclination was to look at ASP.NET MVC. The two most popular and active open source frameworks are Ruby on Rails (RoR) and Django (Python-based). To be honest, I have not spent a lot of time investigating any of the others.
Why is it that I often find myself in this situation? It’s usually not 100, but there always seem to be multiple well-developed solutions for these types of problems. I ran into the same thing a couple of years ago when I was selecting an ORM for a .NET project.
All you can do is start by taking the advice of others (“most popular”) and give one or two a try. Not only will you get a good sense of how well the framework meets your project requirements, but since there will inevitably be problems or questions, you’ll also be able to evaluate documentation and community activity.
It’s like making pasta — you throw a noodle against the wall and if it sticks, you’re done cooking. Well, not really… but you know what I mean.
One of the major considerations is hosting. I’ve previously explored the three major cloud computing platforms.
- Amazon EC2 would be overkill (see requirement #1). I don’t see a need for significant scale-up in the foreseeable future. Running a small on-demand EC2 instance 24/7 is more expensive (~$70/month) than just buying hosted services. Also, supporting a complete OS platform is unnecessary work.
- Microsoft Azure is currently in CTP (Community Technology Preview) and it’s still unclear what the pricing will be.
- That leaves Google App Engine. Based on the GAE Quotas, we would be able to operate under the limits for quite a while (exceeding the quotas would be a good thing). That means GAE can provide us free hosting, which is hard to beat.
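The ~$70/month EC2 figure above is simple arithmetic for an instance that never sleeps. The sketch below assumes an on-demand rate of about $0.10/hour for a small instance and a 30-day month; both numbers are illustrative assumptions, so plug in current EC2 pricing before relying on the result.

```python
def monthly_cost(hourly_rate, hours_per_day=24, days_per_month=30):
    """Cost of one on-demand instance left running around the clock."""
    return hourly_rate * hours_per_day * days_per_month


# Assumed rate of ~$0.10/hour for a small instance (illustrative only).
print(f"${monthly_cost(0.10):.2f} per month")  # prints "$72.00 per month"
```

Because the instance-hour charge accrues whether the site serves one request or a million, a low-traffic site pays the full amount for mostly idle capacity. That is why quota-based free hosting looks so attractive here.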
There are literally hundreds of hosting options, and most would meet our bandwidth and storage requirements at a nominal cost. Independent of storage (see below), I guess I’m biased towards a cloud solution for two reasons:
- “Good Enough” isn’t Good Enough: I’ve been hosting this domain on a commercial site for about 6 years. I’d classify my host as good enough for my personal use (family site, photo gallery, this blog, etc.). If my hosting service went away tomorrow, no big deal. I back up everything regularly and could be up and running on a comparable host pretty quickly. But for business purposes that involve critical customer medical data, “good enough” and the possibility of the host disappearing just don’t cut it.
- Large Infrastructure: This is what makes a cloud solution so attractive. With any of the three cloud options you are buying into reliability and stability. They already have multiple data centers, security, and disaster plans in place. You don’t have to worry about Amazon, Microsoft, or Google going away any time soon. Unless you have the resources to build it yourself, IMO using a cloud service is a good business decision.
So for now I’ll be using Google App Engine.
Now let’s look at requirement #2: reliable and secure data storage. At this time the best solution seems to be Amazon S3. Amazon has already put a lot of thought into this: Creating HIPAA-Compliant Medical Data Applications with Amazon Web Services (warning: PDF). S3 transfer and storage costs are very reasonable. Paying only for what you use is a real benefit.
Both Google and Microsoft are very active in the healthcare sector (Google Health and HealthVault), and I’m sure they will soon have cloud storage offerings with similar features.
There are a number of web hosting sites that claim HIPAA data storage compliance, but most seem to just be using “HIPAA” as a marketing tool to attract medically related clients. I’d stay away from these.
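Requirement #2 also asks for reliable transport, and S3 already provides a simple integrity check for uploads: if a request includes a `Content-MD5` header (the base64 encoding of the body’s raw 16-byte MD5 digest, not the familiar hex string), S3 rejects the upload when the bytes it received don’t match. A minimal sketch of computing that value; the helper name `content_md5` is hypothetical:

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Content-MD5 header value: base64 of the raw MD5 digest (not hex)."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


print(content_md5(b""))  # prints "1B2M2Y8AsgTpgAmY7PhCfg=="
```

This only catches corruption in transit; it is not a security control by itself, so the privacy side of HIPAA still rests on encrypted transport and the storage practices in Amazon’s paper.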
Web Frameworks (part 2)
Deciding to use GAE quickly narrows the web framework choice down. GAE supports Python (w/ Django) and the Java 6 runtime environment. I do not believe that either ASP.NET or RoR is supported on GAE. Done deal: Django.
I know what you’re thinking. There are many other Python-based web frameworks and even Java alternatives that I should be considering. That’s true, but Django is arguably the most popular and has a very active developer community. Also, there are several Google Code App Engine projects (see below) that support Django integration.
I did play around with RoR. The Ruby language itself is great. I love having five different ways to do the same thing. The RoR web framework is mature and has many of the same features as Django.
I looked at ASP.NET MVC, but only from a distance. Here’s a concise take from someone that recently jumped in: ASP.NET MVC Impressions after 1 week.
I initially set up a Windows-based Python/Django/GAE-SDK development environment but found it to be too clumsy. I’ve settled into Ubuntu 9.04 running in a VirtualBox VM.
The Ubuntu Package Manager handled installation of all the necessary prerequisite components. Now that I think of it, I didn’t have to do a single ./configure and make. That’s progress!
I’m an old Unix hack and I quickly fell back into my first love: Emacs. After the nostalgia wore off, I needed to find a real development IDE. There were two choices:
- Eclipse: I tried using the PyDev plug-in along with some Django integration instructions I found. Google also provides some Eclipse integration, but being able to start the server and other functions from the IDE was not that important to me. I’d rather use the command line. Also, Eclipse just seems like a real dog.
- NetBeans: With the Python plug-in, NetBeans works fine, so I’ll stick with it until something better comes along.
The four features that make Django attractive:
- Object-relational mapper: Define your data models entirely in Python. You get a rich, dynamic database-access API for free — but you can still write SQL if needed.
- Automatic admin interface: Save yourself the tedious work of creating interfaces for people to add and update content. Django does that automatically, and it’s production-ready.
- Elegant URL design: Design pretty, cruft-free URLs with no framework-specific limitations. Be as flexible as you like.
- Template system: Use Django’s powerful, extensible and designer-friendly template language to separate design, content and Python code.
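To make the URL-design point concrete: Django 1.0 routes requests through a list of regular expressions in `urls.py`, and named groups in a matching pattern are passed to the view as keyword arguments. The sketch below imitates that idea in plain Python so it runs without Django; `resolve()`, the patterns, and the view names are all hypothetical, and this is not Django’s actual resolver.

```python
import re

# Hypothetical (pattern, view name) table in the spirit of Django's urlpatterns.
# Like Django URLconfs, patterns are written without the leading slash.
URL_PATTERNS = [
    (r"^patients/$", "patient_list"),
    (r"^patients/(?P<patient_id>\d+)/$", "patient_detail"),
    (r"^patients/(?P<patient_id>\d+)/records/(?P<year>\d{4})/$", "records_by_year"),
]

_COMPILED = [(re.compile(pattern), view) for pattern, view in URL_PATTERNS]


def resolve(path):
    """Return (view_name, kwargs) for the first matching pattern, else None."""
    path = path.lstrip("/")
    for regex, view in _COMPILED:
        match = regex.match(path)
        if match:
            # Named groups become keyword arguments for the view.
            return view, match.groupdict()
    return None


print(resolve("/patients/42/records/2009/"))
# prints "('records_by_year', {'patient_id': '42', 'year': '2009'})"
```

Because the URL scheme is just data (regexes mapped to views), nothing like `.php` extensions or query-string IDs leaks into public URLs unless you put it there.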
Carefully walk through the four-part Django tutorial. Beware: there are three versions of the tutorial (0.96, 1.0, and “Latest”). Make sure you’re using the desired one.
For Django integration with GAE I’m using app-engine-patch. I had first tried Google App Engine Helper for Django, but I found that app-engine-patch works much better.
Data Integration (Back-end)
Getting data to and from the S3 server will be a critical component. I have only started looking into this, but the Amazon documentation seems very good. The Getting Started Guide examples are presented in multiple languages (PHP, C#, Java, Perl, Ruby, Python). A Python interface to Amazon Web Services, Boto, also looks like it might be useful.
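Amazon S3 POST (browser-based uploads through an HTML form) is an efficient way to move data to S3: the user’s browser sends the file straight to S3, so it never has to pass through our application servers.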
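A minimal sketch of the server side of an S3 POST upload, assuming the legacy Signature Version 2 scheme documented at the time of writing (current AWS documentation favors Signature Version 4 for new code). The application builds a JSON policy document (bucket, key prefix, ACL, size limit, expiration), base64-encodes it, and signs that string with HMAC-SHA1 using the AWS secret key. The returned values go into hidden inputs of an HTML form whose action is the bucket URL, with the file input as the last field. The function name, bucket, prefix, and limits are hypothetical placeholders.

```python
import base64
import datetime
import hashlib
import hmac
import json


def build_s3_post_fields(bucket, key_prefix, access_key, secret_key,
                         max_bytes=10 * 1024 * 1024,
                         expires_in=datetime.timedelta(hours=1),
                         now=None):
    """Hidden form fields for a browser upload to S3 (legacy SigV2 POST).

    All names and limits are placeholders, not values from a real setup.
    """
    now = now or datetime.datetime.now(datetime.timezone.utc)
    expiration = (now + expires_in).strftime("%Y-%m-%dT%H:%M:%S.000Z")

    # The policy tells S3 what to accept from the browser: this bucket,
    # keys under our prefix, private objects only, and a size cap.
    policy_doc = {
        "expiration": expiration,
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", key_prefix],
            {"acl": "private"},
            ["content-length-range", 0, max_bytes],
        ],
    }
    policy = base64.b64encode(json.dumps(policy_doc).encode("utf-8"))

    # SigV2 POST signature: base64(HMAC-SHA1(secret_key, base64_policy)).
    signature = base64.b64encode(
        hmac.new(secret_key.encode("utf-8"), policy, hashlib.sha1).digest()
    )

    return {
        "key": key_prefix + "${filename}",  # S3 substitutes the uploaded file name
        "AWSAccessKeyId": access_key,
        "acl": "private",
        "policy": policy.decode("ascii"),
        "signature": signature.decode("ascii"),
    }
```

The secret key never leaves the server; the browser only sees the signed policy, and S3 refuses any upload that is expired, oversized, or aimed at a different bucket or key prefix.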
The back-end will require much more investigation.
For the additional database needs (account management, logging, auditing, etc.) I’ll just use the GAE Datastore.
There’s a lot of “stuff” here. Investigating and evaluating it all plus making decisions is a daunting process.
The purpose of going through these selections is to reduce the number of variables so I can start concentrating on an architecture and design that will meet project requirements. There are still many unknowns though, and I’m sure there will be major bumps in the road that will cause me to change direction.
UPDATE (11/21/2010): Beware — you get what you pay for!: Goodbye Google App Engine (GAE)