Let's take a few minutes to chat about storage systems and why they are so important to a good CommuniGate Pro hosting platform deployment. One big storage mistake we find in just about every deployment is a misunderstanding of how the usage patterns and loads of CommuniGate Pro differ from those of, say, a database or file server in an Enterprise deployment. With that in mind, how about we set the record straight on how we like our discs and arrays to taste and digest?
CommuniGate Pro is highly adaptable in scale, up or down. The same binary can be deployed on nothing more than a single server with internal drives, or on a Dynamic Cluster with multiple storage systems: some internal to the physical machines, some attached as shared storage, or many arrays “attached” as logical systems that are presented to individual domains or realms. To set a target benchmark or “good example”, we will use a multiple-system architecture and a typical topology that is commonplace in our partner networks.
Let us begin with a case study of a regional Network Operator that has 1 million subscribers and sells broadband services both to residential consumers and to business subscribers on dedicated broadband links. For the residential subscribers, the operator provides email as a bundled service. For the business subscribers, the operator provides a richer, business-grade unified communications suite with value-added services like VoIP, premium groupware-style email, and storage. As you can imagine, the “load profiles” of these subscribers will be radically different, even where all subscribers use Webmail access (no IMAP/POP or SIP); the two example “group types” of service place different requirements on your SaaS platform and, as a result, on the storage subsystem.
Let's round it all down to some simple numbers for the purposes of this chat:
Our example network operator: “super duper broadband”
Total subscribers: 1 million
- 900k consumer subs in a single dedicated domain “superduper.com”
- 100k business subs spread across 1,000 domains (like “bluedonuts.com” and “dieselcroissant.fr”)
Profile type: medium load (90% of the subscribers are consumers who do not log in all day)
Quota type: 100 MB for consumers and 1 GB for business subscribers (<30% utilization)
Concurrent Web sessions (HTTPS + XIMSS): 70,000
SMTP inbound per hour: 800k
SMTP outbound per hour: 200k
Estimated IOPS on Backends: 7,000
Estimated Storage capacity total: 40TB
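A few derived figures help sanity-check a profile like this. The sketch below is plain arithmetic on the numbers listed above; the only assumption added is decimal units (1 TB = 1,000,000 MB), and the variable names are illustrative.

```python
# Back-of-envelope figures derived from the "super duper broadband" profile.
# Assumption: decimal units (1 TB = 1,000,000 MB) to keep the arithmetic simple.

CONSUMER_SUBS = 900_000
BUSINESS_SUBS = 100_000
CONSUMER_QUOTA_MB = 100
BUSINESS_QUOTA_MB = 1_000        # 1 GB
SMTP_IN_PER_HOUR = 800_000
SMTP_OUT_PER_HOUR = 200_000
CONCURRENT_WEB = 70_000
BACKEND_IOPS = 7_000
ESTIMATED_STORAGE_TB = 40


def derived_figures() -> dict:
    total_subs = CONSUMER_SUBS + BUSINESS_SUBS
    # Storage needed if every single account filled its quota.
    allocated_tb = (CONSUMER_SUBS * CONSUMER_QUOTA_MB
                    + BUSINESS_SUBS * BUSINESS_QUOTA_MB) / 1_000_000
    return {
        "allocated_quota_tb": allocated_tb,
        "implied_utilization": ESTIMATED_STORAGE_TB / allocated_tb,
        "smtp_in_per_sec": SMTP_IN_PER_HOUR / 3600,     # hourly average, not peak
        "smtp_out_per_sec": SMTP_OUT_PER_HOUR / 3600,
        "web_concurrency_ratio": CONCURRENT_WEB / total_subs,
        "iops_per_web_session": BACKEND_IOPS / CONCURRENT_WEB,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name:>24}: {value:,.3f}")
```

With these inputs the full quota allocation is 190 TB, so the 40 TB estimate implies roughly 21% utilization, in line with the <30% figure. Inbound SMTP averages about 222 messages per second; because that is an hourly average, bursts within the hour can be higher.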
Capacity planning begins with understanding your customers and their usage patterns. Consumer subscribers, for example, vary radically: some people buy an ADSL line, get 5 email accounts bundled, and never use them, yet over time those accounts fill with junk mail and notices, eating up storage for no good reason. Older people, on the other hand, tend to use the email accounts provided by the operator and put every photo and document they have into their folder tree. So when an account is “active”, we might find its owner wants and “expects” the speed and usability of an enterprise messaging solution.
Business subscriber profiles will obviously have completely different usage patterns, tied to working hours and to the attachment habits of business users, as the messaging system ends up being used almost like a file server. The email “storage and archival system” for business communications is fundamentally important to businesses in general, and operators can offer it as a “value add”. Finding a “weighted” profile is key, and only your network monitoring can provide the real parameters your subscribers generate. There are many “optimization” techniques and policies that we provide our partners to more effectively deploy and manage the storage and load characteristics of the CommuniGate Pro SaaS delivery platform.
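One way to build a “weighted” profile is to model each subscriber group separately and sum the results. The sketch below assumes two inputs per group, peak concurrency and backend IOPS per live session; the values shown are hypothetical placeholders, not measured CommuniGate Pro figures, and should be replaced with numbers from your own monitoring.

```python
from dataclasses import dataclass


@dataclass
class SubscriberGroup:
    name: str
    subscribers: int
    peak_concurrency: float   # fraction of subscribers with a live session at peak (assumed)
    iops_per_session: float   # backend IOPS generated per live session (assumed)

    def sessions(self) -> float:
        return self.subscribers * self.peak_concurrency

    def iops(self) -> float:
        return self.sessions() * self.iops_per_session


def weighted_profile(groups) -> dict:
    total_subs = sum(g.subscribers for g in groups)
    total_sessions = sum(g.sessions() for g in groups)
    total_iops = sum(g.iops() for g in groups)
    return {
        "sessions": total_sessions,
        "iops": total_iops,
        # Blended IOPS per subscriber, weighted by each group's size and behaviour.
        "iops_per_subscriber": total_iops / total_subs,
    }


# Hypothetical parameters: business users log in far more during work hours.
groups = [
    SubscriberGroup("consumer", 900_000, peak_concurrency=0.05, iops_per_session=0.08),
    SubscriberGroup("business", 100_000, peak_concurrency=0.25, iops_per_session=0.14),
]
profile = weighted_profile(groups)
print(profile)
```

With these placeholder values the 10% business population contributes almost half of the backend IOPS, which is exactly why a simple per-subscriber average can mislead.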
Perhaps one of the most commonly deployed CommuniGate Pro Dynamic Cluster “layouts” is the 4 x 3 architecture, where you have 4 physical or “bare metal” Frontend servers using internal SSD storage. These servers typically use storage only for logs and do not need much capacity, provided you have set up log rotation in a CRON job or have systems management handling housekeeping. Keep in mind that Frontend load is far more CPU intensive today, because WebRTC sessions and HTTPS to the edge are now encrypted with SSL/TLS or other techniques. On top of this come the loads that anti-abuse filters, policy management rules, and traffic shaping engines place on the Frontend server array.
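For the log housekeeping mentioned above, a small script run from cron is often enough. This is a minimal sketch, assuming the logs sit in /var/CommuniGate/SystemLogs and that a 14-day retention is acceptable; both the path and the retention window are assumptions to adjust for your layout. It defaults to a dry run so nothing is deleted until you flip the flag.

```python
import time
from pathlib import Path

# Assumptions: log location and retention window; adjust for your deployment.
LOG_DIR = Path("/var/CommuniGate/SystemLogs")
MAX_AGE_DAYS = 14


def prune_old_logs(log_dir: Path, max_age_days: int, dry_run: bool = True) -> list:
    """Return regular files older than max_age_days, deleting them unless dry_run."""
    cutoff = time.time() - max_age_days * 86_400
    expired = []
    for path in sorted(log_dir.glob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            expired.append(path)
            if not dry_run:
                path.unlink()
    return expired


if __name__ == "__main__":
    if LOG_DIR.is_dir():
        for p in prune_old_logs(LOG_DIR, MAX_AGE_DAYS, dry_run=True):
            print("would delete", p)
```

A crontab line such as `0 3 * * * /usr/bin/python3 /usr/local/sbin/prune_logs.py` (script path hypothetical) would run it nightly at 03:00.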
The example layout has 3 Backend servers, which are the only systems in the Dynamic Cluster connected to the “shared storage” where the account data resides. This storage is usually mounted as /var/CommuniGate/ on each Backend server. That means the Frontend servers do not have access to the shared storage: when they need to access an account, they make a request to the Backend array, and once the user is authenticated, the Backends decide which server will open the account directory and pass that information back to the session the Frontend is controlling (through Webmail, for example).
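The key idea here is that an open account is served by one Backend at a time, and the Frontend only needs to learn which one. The toy model below illustrates that idea with a least-loaded choice; it is a hypothetical sketch, not CommuniGate Pro's actual assignment algorithm, and the class and method names are invented for illustration.

```python
class ToyBackendDirectory:
    """Toy model: each open account is served by exactly one Backend at a time.

    NOT CommuniGate Pro's real algorithm; it only shows the Backend tier
    deciding who opens an account and telling the Frontend.
    """

    def __init__(self, backends):
        self.load = {b: 0 for b in backends}   # open accounts per Backend
        self.owner = {}                         # account -> Backend holding it open

    def open_account(self, account: str) -> str:
        # Reuse the Backend that already has the account open.
        if account in self.owner:
            return self.owner[account]
        # Otherwise pick the least-loaded Backend (ties broken by name).
        backend = min(self.load, key=lambda b: (self.load[b], b))
        self.owner[account] = backend
        self.load[backend] += 1
        return backend

    def close_account(self, account: str) -> None:
        backend = self.owner.pop(account, None)
        if backend is not None:
            self.load[backend] -= 1


directory = ToyBackendDirectory(["be1", "be2", "be3"])
print(directory.open_account("alice@superduper.com"))   # be1
print(directory.open_account("bob@bluedonuts.com"))     # be2
print(directory.open_account("alice@superduper.com"))   # be1 again
```

Because only Backends touch the account directories, two servers never write the same account at once, which is what lets the cluster avoid filesystem locks.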
As you can imagine, the Backend servers' load is weighted far more toward IOPS, whereas the Frontend servers are CPU heavy because they handle sessions doing authentication and/or encryption. Therefore, when choosing network switches and interfaces, the Backend servers should always be placed on dedicated switches and use 10G ports or Fibre Channel.
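A quick throughput estimate shows why 1G links run out of room. The sketch multiplies IOPS by an average I/O size; the 7,000 IOPS comes from the case study, while the I/O sizes tried are assumptions, and protocol overhead and peaks are ignored, so real traffic will be higher.

```python
def storage_throughput_gbps(iops: float, io_size_kib: float) -> float:
    """Sustained storage traffic in Gbit/s for an IOPS rate and average I/O size."""
    bytes_per_sec = iops * io_size_kib * 1024
    return bytes_per_sec * 8 / 1e9


# 7,000 IOPS from the case study; the average I/O sizes are assumptions.
for size_kib in (4, 16, 32, 64):
    print(f"{size_kib:>3} KiB -> {storage_throughput_gbps(7_000, size_kib):.2f} Gbit/s")
```

At an assumed 32 KiB average the storage traffic alone is about 1.84 Gbit/s, already beyond a single 1 GbE port before any overhead or peak.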
The diagram above shows a properly configured Dynamic Cluster network with 4 dedicated IP ranges and layers, ideally on dedicated switches. The following is a sample networking topology, but it should not replace talking with our engineers as part of any production Dynamic Cluster.
- Public Network – This is the externally facing network, on a routable IP block, normally with one or more IPs assigned to the load balancer on its “external interface”. All Frontend servers in the Dynamic Cluster will have one network interface with an IP on this network and a DNS entry.
- Inner Cluster Communications Network – This private network, using a non-routable IP address block (such as 192.168.x.x), is used for the cluster members to communicate with each other. No other traffic should be put on this private network. Frontend servers should have a second interface configured with an IP on this network.
- Management Network – This private network might be the shared LAN of the Operator or ISP NOC (Network Operations Center), or another non-routable network (such as 172.16.x.x). Each server should have another network interface configured so administrators can access the Administration UI or APIs for provisioning, billing, and other management duties.
- Note: Sometimes a fifth network is used to manage the server at the OS/BIOS level. Many Sun and HP servers have a “lights out” port that can be connected to secure VPNs or terminal servers, giving access to the machine when there are connectivity issues or when the server hardware or power has failed.
- Storage Network – This private network, with a non-routable IP block (such as 10.10.x.x), is used only by the Backend servers to communicate with the shared storage system. This network should be high speed: Fibre Channel or 10GbE.
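Before racking anything, it is worth checking the address plan mechanically. This sketch uses Python's standard `ipaddress` module to confirm that the private networks sit inside RFC 1918 blocks, that the public block does not, and that no two ranges overlap. The example subnets are assumptions; 203.0.113.0/24 is a documentation range standing in for your real public block.

```python
import ipaddress
from itertools import combinations

RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

# Example plan (assumed values); replace with your own ranges.
PLAN = {
    "public": "203.0.113.0/24",
    "cluster": "192.168.10.0/24",
    "management": "172.16.20.0/24",
    "storage": "10.10.0.0/24",
}


def check_plan(plan: dict) -> list:
    """Return a list of problems found in the address plan (empty means OK)."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    problems = []
    for name, net in nets.items():
        in_rfc1918 = any(net.subnet_of(block) for block in RFC1918)
        if name == "public" and in_rfc1918:
            problems.append(f"{name} {net} is a non-routable RFC 1918 block")
        if name != "public" and not in_rfc1918:
            problems.append(f"{name} {net} should use an RFC 1918 block")
    for (a, na), (b, nb) in combinations(nets.items(), 2):
        if na.overlaps(nb):
            problems.append(f"{a} {na} overlaps {b} {nb}")
    return problems


print(check_plan(PLAN) or "plan OK")
```

An optional fifth "lights out" network can be added to `PLAN` the same way.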
We will not dig much deeper into the networking, other than to say we want the storage LAN to be dedicated, carrying no other traffic. We also strongly recommend that the storage network be 10G and use SSDs whenever possible. NAS has advantages both in cost and in reduced complexity: the Dynamic Cluster does not use file locks, so our performance is orders of magnitude better than most NFS applications that need locking logic in the filesystem.
Back to our case study: we have finally arrived at the point where we talk about what new toys the geeks get to rack up. For the 900k consumer subscribers it might be totally reasonable to use a “spindle-based” NAS solution, while for our business subscribers we put all those domains and their storage on an SSD-based rack. CommuniGate Pro can move domains or accounts to logical arrays that are mounted as specified in a preferences file in the domain's directory.
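The tiering decision itself is simple enough to express as a policy function, for example inside a provisioning script. This is a hypothetical sketch: the mount paths are assumptions, and the actual placement is configured in CommuniGate Pro itself (via the domain's preferences), not by this code.

```python
# Hypothetical mount points; the real placement is configured in CommuniGate Pro.
SPINDLE_NAS = "/var/CommuniGate/Domains"   # assumed spindle-based NAS location
SSD_ARRAY = "/mnt/ssd-array/Domains"       # assumed SSD-backed logical array

CONSUMER_DOMAINS = {"superduper.com"}


def storage_for_domain(domain: str) -> str:
    """Pick the storage tier for a new domain (illustrative policy only)."""
    domain = domain.lower()
    if domain in CONSUMER_DOMAINS:
        return f"{SPINDLE_NAS}/{domain}"
    # Every business domain goes to the SSD tier.
    return f"{SSD_ARRAY}/{domain}"


for d in ("superduper.com", "bluedonuts.com", "dieselcroissant.fr"):
    print(d, "->", storage_for_domain(d))
```

Keeping the rule in one function makes it easy to add further tiers later, such as a premium tier with archival and encryption policies.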
For spindle-based NAS systems, we find a few things that often “trip up” the purchasing or specifications. RAID 5 should always be avoided; when possible we prefer RAID 1+0 (a.k.a. RAID10), and yes, this doubles the “spindle count”. Nevertheless, striping over a rack of drives gives us the IOPS we want, more economically. Often a storage vendor will treat CommuniGate Pro like an enterprise load without a good picture of how the Dynamic Cluster operates. In fact, they might be “honestly” looking to save some coin by using RAID levels that boost capacity.
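The classic RAID write-penalty model shows why RAID 5 hurts a write-heavy mail store: each frontend write costs 2 disk I/Os on RAID10 (two mirror copies) but 4 on RAID 5 (read data, read parity, write data, write parity). The sketch below applies that model; the 180 IOPS per drive and the 40/60 read/write mix are assumptions, not measured CommuniGate Pro values.

```python
# Disk I/Os consumed per frontend write under the classic write-penalty model.
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4}


def effective_iops(spindles: int, iops_per_disk: float,
                   read_fraction: float, raid: str) -> float:
    """Frontend IOPS an array can sustain for a given read/write mix."""
    raw = spindles * iops_per_disk
    write_fraction = 1.0 - read_fraction
    return raw / (read_fraction + WRITE_PENALTY[raid] * write_fraction)


# Assumed inputs: 96 spindles at ~180 IOPS each, write-heavy 40/60 read/write mix.
for level in ("RAID10", "RAID5"):
    print(level, round(effective_iops(96, 180, 0.40, level)))
```

With these assumptions RAID10 sustains about 10,800 IOPS while RAID 5 drops to about 6,171, below the 7,000 IOPS target of the case study, even though RAID 5 yields more usable capacity from the same drives.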
Another important area in the specification of a large NAS system with many spindles is to choose discs that are the fastest, not the largest. If the target is X terabytes of capacity, having 4 LUNs of drives, each a cabinet of, say, 24 drives, is far better than using discs of double the capacity in only 2 cabinets: 96 spindles versus 48.
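The arithmetic behind "fastest, not largest" is that IOPS scale with spindle count, not capacity. This sketch sizes the same raw capacity with two drive sizes; the 57,600 GB target, 24-drive cabinets, and 180 IOPS per drive are assumptions. Giving both drive sizes the same per-drive IOPS is generous to the larger drives, which are often slower.

```python
def spindle_plan(target_gb: int, drive_gb: int,
                 drives_per_cabinet: int, iops_per_drive: int) -> dict:
    """Drives and cabinets needed for a raw capacity target, and their raw IOPS."""
    drives = -(-target_gb // drive_gb)              # ceiling division
    cabinets = -(-drives // drives_per_cabinet)
    return {"drives": drives, "cabinets": cabinets, "raw_iops": drives * iops_per_drive}


# Hypothetical target and drive specs.
small = spindle_plan(57_600, 600, 24, 180)     # 600 GB drives
large = spindle_plan(57_600, 1_200, 24, 180)   # double-capacity drives
print("600 GB drives:", small)
print("1.2 TB drives:", large)
```

Same capacity, but the double-capacity option delivers half the raw IOPS: 48 spindles in 2 cabinets versus 96 in 4.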
Let's say, then, that we have a NAS head on 4 cabinets of drives, or 96 usable spindles. Using LVM, we span the LUNs into a single volume, present it to CommuniGate Pro as /var/CommuniGate, and stripe over all the drives in RAID10 to get the best IOPS performance, exceeding what the other configurations could deliver.
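Once the striped volume is built, measure it before going live rather than trusting the spec sheet. The sketch below only builds an fio command line for a small-block random read/write test; the 4k block size, 40% read mix, queue depth, and the test-file path are assumptions meant to approximate a mail store, not an official CommuniGate Pro benchmark profile.

```python
import shlex


def fio_mail_job(target: str, runtime_s: int = 60, read_pct: int = 40,
                 block_size: str = "4k", iodepth: int = 32, numjobs: int = 4) -> list:
    """Build an fio command for a small-block random read/write test on `target`."""
    return [
        "fio",
        "--name=cgp-mailstore",
        f"--filename={target}",
        "--rw=randrw",
        f"--rwmixread={read_pct}",
        f"--bs={block_size}",
        "--ioengine=libaio",
        f"--iodepth={iodepth}",
        f"--numjobs={numjobs}",
        "--direct=1",              # bypass the page cache to measure the array
        "--size=4G",
        f"--runtime={runtime_s}",
        "--time_based",
        "--group_reporting",
    ]


if __name__ == "__main__":
    # Hypothetical scratch file on the new volume, before any account data exists.
    print(shlex.join(fio_mail_job("/var/CommuniGate/fio-testfile")))
```

Passing the list to `subprocess.run(..., check=True)` executes the test; it writes a 4 GB file, so point it only at a scratch location, never at live account data.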
Our business subscribers get a link to the optimal storage array when their domain is created. That is, the domain is created on an SSD-based storage system that can also offer value-added services like backups, archival rotation, and encryption policies.
A properly configured Dynamic Cluster has the potential for 100% uptime; we have dozens of examples where the operator partner tells us that CommuniGate Pro is the longest running and most stable app in the data center. In some cases the only “major upgrades” to the system result from the hardware vendor's EOL or phasing out of support for the servers. Several of our partners have had Dynamic Clusters up for 8+ years non-stop. That being said, another good tip for your architecture is to plan your change management and how you will share load during spikes and peaks.
The CommuniGate Pro Dynamic Cluster has a “rolling updates” mechanism that allows software or hardware to be serviced or swapped with no downtime. Cluster members can be updated one by one: the administrator puts a member into “non-ready” mode, in which it stops taking new connections, and it can then be upgraded or its hardware replaced entirely. You can also easily add new cluster members to handle load, or switch operating systems and hardware vendors. It is even possible to run mixed systems, such as FreeBSD Frontends with Linux, Solaris, or AIX Backends.
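A rolling update is essentially a loop over cluster members. The sketch below is hypothetical: `set_ready`, `active_sessions`, and `service` are placeholder callbacks you would wire to your own administration tooling, not CommuniGate Pro API calls, and waiting for sessions to drain is an assumed precaution on top of the non-ready step described above.

```python
import time
from typing import Callable, Iterable


def rolling_update(members: Iterable[str],
                   set_ready: Callable[[str, bool], None],
                   active_sessions: Callable[[str], int],
                   service: Callable[[str], None],
                   poll_s: float = 5.0,
                   drain_timeout_s: float = 3600.0) -> None:
    """Service cluster members one at a time so the rest keep carrying the load."""
    for member in members:
        set_ready(member, False)                 # non-ready: no new connections
        deadline = time.monotonic() + drain_timeout_s
        while active_sessions(member) > 0:       # let existing sessions finish
            if time.monotonic() > deadline:
                raise TimeoutError(f"{member} did not drain in time")
            time.sleep(poll_s)
        service(member)                          # upgrade software or swap hardware
        set_ready(member, True)                  # rejoin before touching the next one
```

Handling one member at a time and returning it to service before moving on keeps at most one member out of the cluster at any moment.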
One thing to remember is that when you take a cluster member offline for service, you remove the load capacity that member provides to the entire Dynamic Cluster subscriber base. In our 4 x 3 example, the Backends each carry 33% of the load, and when you place one Backend into “non-ready” mode, the remaining two are in a 50/50 load situation. The Backend servers should be designed to handle that load during maintenance or hardware failures. It is far better to have more cluster members than a few big servers that each handle a large percentage of the load. My rule of thumb is to never have more than 33% load on any cluster member, and that is “peak” load, not nominal operating parameters.
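The headroom rule can be checked with one line of arithmetic: if each of N members runs at a peak utilization u and k members drop out, the survivors each run at u × N / (N − k). The sketch below assumes the load spreads evenly across the remaining members.

```python
def utilization_after_failures(members: int, peak_util: float, failed: int = 1) -> float:
    """Per-member utilization if `failed` members drop out and the rest share the load evenly."""
    if failed >= members:
        raise ValueError("at least one member must remain")
    return peak_util * members / (members - failed)


# Each member at the 33% peak rule of thumb; one member in non-ready mode.
for n in (3, 4, 5, 6):
    print(f"{n} members: {utilization_after_failures(n, 0.33):.1%} each with one out")
```

Three members at 33% land at 49.5% each with one out, matching the 50/50 situation above; adding members shrinks the jump, which is why more, smaller cluster members are preferable.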