I have had the opportunity to consider from scratch what is needed to migrate applications into the cloud. The first question I usually get asked is which cloud vendor I recommend: Amazon, Azure, etc.
My reply is that I am not worried about that, and I ask instead whether there is a clear strategy to support multi-tenancy for the application being developed for, or ported to, the cloud. If not, read on for both an introduction and a checklist of practical steps I recommend.
So a quick summary of the problem (that you may care to skip) …
Imagine you are developing a cloud service for independent restaurants that allows food to be ordered online. Each restaurant that signs up for this service becomes a tenant. End users (restaurant customers) may then sign up with one or more restaurants, but each restaurant (tenant) must be isolated from the others. The restaurants should neither know nor care if other restaurants are sharing the same machines, databases, web servers, etc. A good test is whether an end user could register with more than one restaurant using the same e-mail address and be seen by each as a totally independent person.
There are a number of different design approaches to multi-tenancy, but a key consideration is whether the underlying databases (or other persistent storage) are shared or separate. A secondary consideration is whether the other components of the system (web servers, business logic) are shared or separate.
Where the storage is separate, this may be achieved in different ways: a separate schema in a single physical database, virtual private databases (where supported by your storage provider) or even separate databases, perhaps provisioned on different machines. The bottom line is that in a separate solution you rely on the infrastructure to partition tenants, whereas in a shared environment partitioning tenants becomes your application's problem.
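To make the distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module, with in-memory databases standing in for real storage. The table, the function names (`open_separate`, `open_shared`, `shared_orders_for`) and the restaurant names are all invented for illustration; treat it as one possible shape, not a recommended design.

```python
import sqlite3

# Separate storage: one database per tenant. The infrastructure does the
# partitioning, so the queries never mention a tenant at all.
def open_separate(tenant_ids):
    dbs = {}
    for tenant_id in tenant_ids:
        db = sqlite3.connect(":memory:")  # stand-in for a per-tenant database
        db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
        dbs[tenant_id] = db
    return dbs

# Shared storage: one database for everyone. Every row carries a tenant ID
# and every query must filter on it; a forgotten filter leaks data.
def open_shared():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE orders ("
        " tenant_id TEXT NOT NULL,"
        " id INTEGER NOT NULL,"
        " item TEXT,"
        " PRIMARY KEY (tenant_id, id))"
    )
    return db

def shared_orders_for(db, tenant_id):
    rows = db.execute(
        "SELECT item FROM orders WHERE tenant_id = ? ORDER BY id", (tenant_id,)
    )
    return [item for (item,) in rows]

separate = open_separate(["luigis", "golden-wok"])
separate["luigis"].execute("INSERT INTO orders (item) VALUES ('pizza')")
separate["golden-wok"].execute("INSERT INTO orders (item) VALUES ('noodles')")

shared = open_shared()
shared.execute("INSERT INTO orders VALUES ('luigis', 1, 'pizza')")
shared.execute("INSERT INTO orders VALUES ('golden-wok', 1, 'noodles')")

luigis_separate = [r[0] for r in separate["luigis"].execute("SELECT item FROM orders")]
luigis_shared = shared_orders_for(shared, "luigis")
```

Both variants return the same orders for a given restaurant. The difference is who is responsible for keeping the other restaurant's rows out of the result: the choice of connection in the first case, the `WHERE tenant_id = ?` clause in the second.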
Which approach is right depends on both your requirements and where you are starting from. The wrong approach can kill a system before it is used by the first customer.
Preferred approach for a few large tenants – if you only plan for 10 or 20 tenants then 10 or 20 virtual machines (or Azure or Amazon instances) will be the best way to go, so that each tenant has their own dedicated physical/virtual machine or instance in the cloud. This is very appropriate if you have an existing code base, or tenants with strong security or other isolation needs (e.g. independent upgrades and customization). It is essentially ‘clone and go’ with a copy of a master system. It can become expensive if tenant numbers grow and you have not 10 but 100s or 1000s of tenants to maintain, patch and upgrade.
Consider the shared approach for many, many small tenants, all with the same feature set and each with restricted resource needs. It generally delivers a lower ultimate deployment cost, but at the price of significantly more application design work to allow many tenants to share one machine. Operational costs may be far lower, as one upgrade will simultaneously reach many tenants. Many simpler applications (blog hosts, for example) support this deployment model. It can become a nightmare if the system you deploy to a tenant is made up of multiple communicating components.
The shared approach is, however, hard. Indeed it may be impossible to retrofit into a mature single-tenant system where you are trying to move an existing code base into the cloud.
There are many hidden costs that are overlooked in early designs with the shared approach, and most of these come back to haunt a business that has not planned for success. For example, if one tenant needs extra capacity, how can that tenant be moved from one machine/database to a new and possibly private database/instance?
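As a rough illustration of what moving a tenant involves, the sketch below copies one tenant's rows out of a shared database into a private one, again using sqlite3 with in-memory databases. The schema and the `move_tenant` function are hypothetical, and the sketch assumes every row carries a tenant ID. It ignores the harder parts of a real migration, such as writes arriving mid-move and switching request routing afterwards.

```python
import sqlite3

SCHEMA = (
    "CREATE TABLE orders ("
    " tenant_id TEXT NOT NULL,"
    " id INTEGER NOT NULL,"
    " item TEXT,"
    " PRIMARY KEY (tenant_id, id))"
)

def move_tenant(source, target, tenant_id):
    """Copy one tenant's rows to the target, then delete them from the source."""
    rows = source.execute(
        "SELECT tenant_id, id, item FROM orders WHERE tenant_id = ?", (tenant_id,)
    ).fetchall()
    with target:  # commit the copy as one transaction
        target.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    with source:  # remove the originals only after the copy has committed
        source.execute("DELETE FROM orders WHERE tenant_id = ?", (tenant_id,))
    return len(rows)

shared = sqlite3.connect(":memory:")
private = sqlite3.connect(":memory:")
for db in (shared, private):
    db.execute(SCHEMA)

shared.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("luigis", 1, "pizza"), ("luigis", 2, "calzone"), ("golden-wok", 1, "noodles")],
)
shared.commit()

moved = move_tenant(shared, private, "luigis")
remaining = shared.execute("SELECT tenant_id FROM orders").fetchall()
```

The move is only this simple because the tenant ID selects exactly the rows that belong to the tenant. With tables that lack a tenant ID, or IDs that clash between tenants, the same operation becomes a data-cleansing project.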
Aside: For more discussion on the benefits of each approach, read some of the older in-depth articles that describe the different approaches to multi-tenancy. I recommend the MSDN article Multi-Tenant Data Architecture from 2006, and see also the references at the end of this blog.
Having said this, if I am specifying a NEW cloud-based code development (and the associated developers are clever) I would generally recommend that the new design include an explicit tenant ID in the database, API, log files, etc. to allow the option of shared deployment in the future. Provided the tenant ID is correctly designed (contact the author for more help here, and refer to the checklist below), initial deployments could be on separate instances / virtual machines and, because the tenant ID is already anticipated in storage, API, etc., the functional code needed for sharing might be added later.
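One inexpensive way to get the tenant ID into log files from day one is to attach it to every log record automatically. The sketch below uses Python's standard logging and contextvars modules. The names `current_tenant`, `TenantFilter` and `handle_order` are made up for this example, and the output is captured in memory only so that it is easy to inspect.

```python
import contextvars
import io
import logging

# Holds the tenant for the request currently being processed.
current_tenant = contextvars.ContextVar("current_tenant", default="-")

class TenantFilter(logging.Filter):
    """Copy the current tenant ID onto every log record."""
    def filter(self, record):
        record.tenant_id = current_tenant.get()
        return True

stream = io.StringIO()  # stand-in for a real log file
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("tenant=%(tenant_id)s %(message)s"))
handler.addFilter(TenantFilter())

log = logging.getLogger("orders")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False

def handle_order(tenant_id, item):
    # Set the tenant once at the edge of the request; code further down
    # logs normally and never has to pass the tenant ID around.
    token = current_tenant.set(tenant_id)
    try:
        log.info("order placed: %s", item)
    finally:
        current_tenant.reset(token)

handle_order("luigis", "pizza")
handle_order("golden-wok", "noodles")
lines = stream.getvalue().splitlines()
```

Even when each tenant starts on its own instance, the tenant field in every line means the logs can be merged later without guessing which restaurant a line belongs to.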
Checklist – All multi-tenant solutions
- Is the tenant ID for each tenant unique in your system regardless of where the tenant is deployed?
- Have you anticipated that some organisations may need a number of tenants (e.g. for production, test) and may need to copy data between those tenants?
- What is your strategy to allow per tenant customisation, especially in a shared environment and across upgrades?
- Have you reviewed all of your configuration parameters – should these be global or per tenant (or better still can you remove low level technical configuration altogether)?
- Can different tenants be in different time zones (more on time zones in a future post)?
- Do you know how to monitor the resources used per tenant (and are able to identify or even throttle very active tenants)?
- Have you considered that for billing and other purposes tenants may have some weak grouping (e.g. a group of tenants may belong to some form of service provider or reseller – we will cover cloud and resellers and other multi-party commercial relationships in a future blog entry)?
- Do you know how you will route external requests (web pages, API calls, files sent to an FTP server) to the correct tenant based on some external ID?
- Do you minimise or eliminate the impact on the tenant if you have to change internally how a tenant is hosted (calling a tenant and asking if they could log in via a different IP address to a different machine tomorrow is generally not welcomed)?
- Have you included the tenant ID for logging and billing purposes where different tenants use some shared service? Consider:
  - Every tenant can access a common gateway to send SMS messages to their end users. Can you charge each SMS back to the correct tenant?
  - All the tenants use an external shared payment gateway (e.g. PayPal) where the customer is only identified by an e-mail address. How do you route interactions back to the correct tenant, especially where the same end user might be registered in multiple tenants?
- Have you considered how new tenants sign up and can self-provision their own service (and do you need this facility on day one – it can be hard to do)?
- Can customer IDs, account numbers, phone numbers, email addresses and all other external IDs co-exist in different tenants’ systems?
- Can tenants attach their own domain names to the tenanted service that they buy?
- Can tenants upload logos, style sheets and other user interface customization?
- Do you know how you can later add other components to speed up the system (e.g. caching) that will not break your tenancy models?
- Can you independently back up and restore tenants (including settings changes)?
- Have you instrumented your system so that you can apply different charging models to your tenants?
- Do you have all of your operational monitoring infrastructure to verify per tenant health in place before you go live?
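The routing questions in the checklist above (finding the tenant from an external name, supporting a tenant's own domain, and re-hosting a tenant without the tenant noticing) can be sketched as two lookups: external name to tenant ID, then tenant ID to current hosting location. Everything here is hypothetical, including the domain names, the `t-001` style IDs and the backend labels, and the host parsing assumes plain hostnames rather than IPv6 literals.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Placement:
    tenant_id: str  # stable internal ID, unique wherever the tenant lives
    backend: str    # where the tenant is hosted today; free to change

# External names (a tenant's own domain, or a subdomain we issue) map to a tenant ID.
DOMAIN_TO_TENANT = {
    "order.luigis.example": "t-001",
    "luigis.platform.example": "t-001",
    "golden-wok.platform.example": "t-002",
}

# The tenant ID maps to its current hosting location.
TENANT_TO_BACKEND = {
    "t-001": "shared-db-1",
    "t-002": "shared-db-1",
}

def resolve(host):
    """Return the placement for a request's Host header, or None if unknown."""
    name = host.strip().lower().split(":")[0]  # drop any port, ignore case
    tenant_id = DOMAIN_TO_TENANT.get(name)
    if tenant_id is None:
        return None
    return Placement(tenant_id, TENANT_TO_BACKEND[tenant_id])

def rehost(tenant_id, new_backend):
    """Move a tenant internally; external names are untouched."""
    TENANT_TO_BACKEND[tenant_id] = new_backend

before = resolve("Order.Luigis.example:443")
rehost("t-001", "private-db-7")
after = resolve("order.luigis.example")
```

Because tenants only ever see the external name, the `rehost` call changes where requests land without anyone having to be phoned about a new IP address.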
Additional checklist for separate multi-tenant solutions
- Does your environment provider, in whatever form – physical (e.g. IBM, HP), virtual (e.g. VMware, Xen) or IaaS/PaaS (e.g. Amazon / Azure) – provide much of the automation to allow you to create, clone, upgrade in parallel, back up, restore and move the dedicated instance associated with each tenant?
Additional checklist for shared multi-tenant solutions
- Tenant ID everywhere. Databases, API, logs, messaging requests.
- Beware that the tenant ID should not be on all database tables – many cold tables may not need a tenant ID.
- Does your design factor out much of the tenant sensitive logic into a few common reused code modules?
- Have you avoided kludges such as prefixing all external IDs that are used to reference customers (email addresses, account numbers, etc.) with a string containing the tenant ID?
- Do you have clever developers?
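Two of the shared-solution items above, factoring tenant-sensitive logic into a few reused modules and avoiding prefixed external IDs, can be illustrated together. The sketch below is one possible approach, not the only one: a small `TenantStore` class (a hypothetical name) is the only code that touches the table, and a composite key of tenant ID plus e-mail replaces any prefixing of the address.

```python
import sqlite3

class TenantStore:
    """The one module allowed to touch the customers table.

    Callers bind it to a tenant once; every statement it issues is filtered
    by that tenant ID, so no other code writes the filter by hand.
    """

    def __init__(self, db, tenant_id):
        self._db = db
        self._tenant_id = tenant_id

    def add_customer(self, email, name):
        with self._db:  # commit on success, roll back on error
            self._db.execute(
                "INSERT INTO customers (tenant_id, email, name) VALUES (?, ?, ?)",
                (self._tenant_id, email, name),
            )

    def find_customer(self, email):
        row = self._db.execute(
            "SELECT name FROM customers WHERE tenant_id = ? AND email = ?",
            (self._tenant_id, email),
        ).fetchone()
        return row[0] if row else None

db = sqlite3.connect(":memory:")
# Composite key: the e-mail is stored exactly as the user typed it, and
# uniqueness is enforced per tenant rather than by gluing the tenant ID
# onto the front of the address.
db.execute(
    "CREATE TABLE customers ("
    " tenant_id TEXT NOT NULL,"
    " email TEXT NOT NULL,"
    " name TEXT,"
    " PRIMARY KEY (tenant_id, email))"
)

luigis = TenantStore(db, "luigis")
golden_wok = TenantStore(db, "golden-wok")
luigis.add_customer("sam@example.com", "Sam at Luigi's")
golden_wok.add_customer("sam@example.com", "Sam at Golden Wok")
```

The same person registers with both restaurants under one e-mail address and each restaurant sees only its own record, which is the isolation test described in the introduction.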
References
- Multi-Tenant Data Architecture. Excellent article from 2006.
- Architecture Strategies for Catching the Long Tail. Another good Microsoft article.
- Multi-tenancy in the cloud: Why it matters. Good introduction from Computerworld.
- Multi-tenancy: does it have to be that hard? A good argument that shared solutions are hard.
- Designing Multitenant Applications on Windows Azure. The title says it all.
- Wikipedia on Multi-Tenancy.