VRM VPS Technology

Let's assume that we want to build a secure, self-contained data store with a VPS. What technologies would we build it with?

Here are my recommendations:

1) OpenVZ as the VPS container
This is the building block that holds the server. An alternative exists called Xen (from XenSource), but it isn't as well supported, so I've gone with OpenVZ.

2) Ubuntu Linux as the base OS
I'd prefer to use something like FreeBSD or NetBSD rather than Linux, for their security and extremely flexible licenses. But if we do need to go for a Linux distribution, Ubuntu should be the one: it has a nice package management system that allows for easy updating.

The Ubuntu community is also very helpful and responsive.

3) PostgreSQL or db4o as the database
The database deserves a post all of its own, and I'll get around to that next. So far I've narrowed it down to these two databases.

Why I've selected these two gets a little more complicated, so the details will follow in that later post.

4) Apache as the application server for the UI
We need a user interface that's accessible to the user anywhere they have an Internet connection. (Whether it's wise for a user to connect to their secure data repository anywhere they can find an Internet connection is a different matter.)

It will also act as our feeds server.

It's not a perfect solution but it helps us get something started, and I think having an imperfect but demonstrable product is preferable to having a perfect but non-existent one.

VRM - Consequences for Vendors

I tend to look at VRM from a user perspective and naturally it colours my development too.

What I've been wondering lately is how things are going to work for the Vendors.

For instance, how does a vendor discover a user's public intentions list?

To me, there seem to be four options:

1) A user signs up to a directory like a telephone book
This would be a directory of users who have an intentions list they wish to publish publicly, as opposed to a list of users who just run VRM apps without public intentions lists. Who would own and run these directories? They could be run like DNS.

We couldn't actually use DNS because then every site listed would come under assault from vendors the world over looking for users interested in their products/services.

2) A vendor trawls the web for every website and looks for a standard 'public intentions' list API on each site
I'm not sure if vendors would be happy to trawl the Internet for customers. It's not scalable either - if every supplier did that, we'd all get DoS'd off the Internet and the resultant extra traffic would bring hosting sites to their knees.

Site owners who aren't publishing an intentions list aren't going to be happy either.
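To put rough numbers on why trawling doesn't scale, here is a minimal sketch. The vendor and site counts are made-up figures purely for illustration, and `crawl_requests` is a hypothetical helper, not part of any VRM spec.

```python
def crawl_requests(vendors: int, sites: int, crawls_per_day: int = 1) -> int:
    """Total daily requests if every vendor probes every site for an intentions list."""
    return vendors * sites * crawls_per_day


# Hypothetical figures, purely illustrative.
vendors = 10_000
sites = 1_000_000

total = crawl_requests(vendors, sites)
per_site = total // sites  # every site is probed once by each vendor

print(f"{total:,} requests per day in total")
print(f"{per_site:,} requests per day hitting each site")
```

The load on each site grows with the number of vendors, whether or not that site publishes an intentions list at all.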

3) A user signs up to the companies they're interested in dealing with
This isn't really an option for a publicly accessible intentions list. It certainly holds no value for the user, and little value for a vendor who doesn't have the resources to reach those users.

4) An intermediary company aggregates our publicly listed intentions
This seems like the best idea. We now no longer get battered by every vendor looking to sell to us and the ISPs are happy because it's far less bandwidth intensive.

The unfortunate thing is someone has to pay for those aggregation servers. These servers will be all that keeps vendors from their customers' money and I'm sure someone somewhere would love to be that gatekeeper.
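As a rough sketch of the aggregator idea: users publish an intention once, and vendors query the aggregator instead of hitting each user's site. Everything here (the `Intention` fields, the `Aggregator` class and its method names) is a hypothetical illustration of the shape of the thing, held in memory, not a proposed API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Intention:
    user_id: str   # an opaque identifier, not necessarily the user's real identity
    category: str  # e.g. "car insurance"
    detail: str


class Aggregator:
    """In-memory stand-in for an intermediary that collects public intentions."""

    def __init__(self) -> None:
        self._by_category: Dict[str, List[Intention]] = defaultdict(list)

    def publish(self, intention: Intention) -> None:
        # Users push their intention once; categories are matched case-insensitively.
        self._by_category[intention.category.lower()].append(intention)

    def withdraw(self, user_id: str) -> None:
        # Remove everything a user has published.
        for category in list(self._by_category):
            remaining = [i for i in self._by_category[category] if i.user_id != user_id]
            self._by_category[category] = remaining

    def search(self, category: str) -> List[Intention]:
        # Vendors query here, so the users' own servers see no vendor traffic.
        return list(self._by_category.get(category.lower(), []))


agg = Aggregator()
agg.publish(Intention("user-1", "Car insurance", "Renewal due in March"))
agg.publish(Intention("user-2", "Broadband", "Looking for a 12-month contract"))

matches = agg.search("car insurance")
print(matches)
```

Vendor traffic lands on the aggregator alone, which is also exactly what makes it the gatekeeper.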

Where does a VRM user store their data?

Current Ideas

Before we can decide where the data should be stored, we ought to define some requirements (I'm only considering data - not the user interface to the data).

  • It needs to be 100% under the control of the user.
  • It needs to be secure.
  • It needs to be accessible 24/7/365.

In order to speed up adoption, I refer you to the goals of this well-written and easy-to-read paper:

  • invent as little as possible
  • reuse only popular technologies, techniques and user-interface metaphors in order to enable VRM, and…
  • provide maximal inclusiveness and extensibility to the VRM implementation, to permit the greatest potential for growth.

Now, at the VRM Conference in London, Adriana mentioned a few ways of storing data:

  1. On your own server/hosting provider
  2. On someone else's server, such as a storage company
  3. Distributed on some form of p2p

I know that these are just suggestions but I'd like to comment on why I don't feel they're practical.

Personally, I feel that p2p is ruled out immediately because, in its current form, it's not suitable for data storage for a multitude of reasons. It's possible to write a new form of p2p application just to serve the purposes of VRM, but then we're not inventing as little as possible. Technically, I don't see how you could reliably depend on your data being distributed evenly to all nodes. In much the same way as DNS takes time to propagate changes across the world, p2p would suffer similar problems, made worse by the potential volume of data that would need to be shuffled around.

Option two sounds like a contender if there's some sort of interface/API direct to our data. Unfortunately we've now got to encrypt data between our storage provider and where our data is being accessed from. It'd be much simpler if we had the UI and the storage on the same server which actually takes us to option 1.

With option 1 we could be in control of the UI and the data store BUT it would mean everyone would need to be able to run/configure/secure their own servers! That's not going to fly.

A Suggestion

How about a VPS?

A custom-built VPS could contain the core components of the data store, e.g. database, web server and web-based UI for accessing user data. It could be hardened against attack and it would be pre-created, so everything is already set up for the user. Any configuration changes could be done through the web interface.

This VPS would be encrypted or certainly the database would be encrypted so only the user would have access to the data. I've yet to consider the details of how this would be done so only the user has the ability to access that data despite the fact that the host has physical access to the VPS. It always comes down to the user needing to enter a pass-phrase to initiate the server but this wouldn't happen often with a reliable host.
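One common way to turn a pass-phrase into an encryption key is a key derivation function. The sketch below uses PBKDF2 from Python's standard library; the iteration count, salt size and the `derive_key` helper are my assumptions for illustration, and it only derives the key rather than encrypting anything.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; would need tuning for the host's hardware


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Stretch a pass-phrase into a 32-byte key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, ITERATIONS, dklen=32
    )


# The salt is not secret and can be stored next to the encrypted database.
salt = os.urandom(16)

key = derive_key("correct horse battery staple", salt)
same_key = derive_key("correct horse battery staple", salt)
wrong_key = derive_key("wrong guess", salt)

# The same pass-phrase and salt always give the same key; a wrong guess does not.
print(hmac.compare_digest(key, same_key))
print(hmac.compare_digest(key, wrong_key))
```

The pass-phrase itself never needs to be stored on the VPS, only the salt. The key would still sit in memory while the server is running, which is part of the detail I've yet to work through.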

Additionally a VPS can be moved around as a single file! Not happy with your current provider? Potentially, you could move to a different one by downloading your VPS and uploading to a different provider.
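If the VPS really does travel as one file, a checksum is a cheap way to confirm it arrived intact at the new provider. This is a minimal sketch; `image_digest` is a hypothetical helper and the idea of comparing digests before and after the move is my assumption about how a migration might be checked.

```python
import hashlib
from pathlib import Path


def image_digest(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read until f.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

You would record the digest of the image before downloading it from the old provider, then compare it with the digest of the uploaded copy at the new one.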

There's a lot to flesh out with the VPS idea but I'd like to know what everyone thinks about it. Are there considerations that I've missed?

Hurdles

There are a number of hurdles the VPS doesn't solve - for instance how data is to be structured and stored. I've been looking into some other interesting technologies that might be very useful but I'll leave that for another blog post.

Taking Things Forward

Everyone at the VRM meet-up was keen to move forward in some sort of semi-organised direction. I'm not sure what happened with that as I've not managed to keep up to date since the last meeting. I remember hearing about a VRM Bill of Rights so that'll probably be my first port of call.

If a VPS can solve our data storage issue, would it be any help to the VRM gang if I produce a small Virtual Machine image (maybe 50-150MB in size) that everyone could base their efforts on? It would mean we could all start working on applications rather than all work on re-inventing the storage wheel.

Let me have your thoughts folks!

Oversalted

I've had oversalted.com for a number of years now. I used to use it for rubbish and it eventually fell into neglect so I took it down. Now I'm going to use it for VRM research - as you can see it's basically Drupal with few changes (I removed the scary looking emblem).

I've been working on VRM for a few years now. I was introduced to it by Iain Henderson sometime around 2004, but back then it wasn't called VRM, and Iain and his team were years ahead of everyone else in the field. Since then I've joined the team and now help on the technical/development side of things.

Two weeks ago Iain introduced me to the VRM gang in London - it was an informal conference hosted by Adriana Lukas and I got a lot out of it.

Since getting back I've been looking into ways of solving a chunk of the problems associated with the storage of a user's data and I've collected quite a lot of random thoughts, thoughts I'd like to air and get people's feedback on.

Originally I was going to post them up on the VRMHub wiki, but it looks to be more for organising events and I don't want to spam it, so here I am on oversalted.com.

Feel free to leave comments!
