The Riverbed Blog (testing)

A blog in search of a tagline

Can this WAN optimization solution scale?

Posted by riverbedtest on July 14, 2008

Not all wide-area data services (WDS) solutions can scale to large enterprise networks with hundreds or thousands of sites and heavy traffic loads. Many WAN optimization vendors are promoting their products, but some of those products are best confined to the test lab or to small production networks. So what distinguishes the solutions that can scale for large enterprise deployments?

In this post I’ve compiled a short list of characteristics that can hint at the scalability of any given WAN optimization solution.

1)  No per-peer data store: A scalable solution has a single universal data store that it uses to communicate with all remote WAN optimization devices. Some WAN optimization solutions instead use a separate data store for each peer device, and a solution built on this fragmented approach cannot scale the storage capacity of the central core device. For example, if users at 10 different branch offices access an identical file, the relevant data should be stored only once on the central WAN optimization device, not 10 separate times as it would be with a per-peer data store. The situation gets significantly worse with 100 sites, and worse still with 1000. The core device in a solution with a segmented per-peer data store will not be able to scale its storage capacity to meet the requirements of large enterprise networks.

2)  Only one type of data store: A scalable solution stores the relevant data in only one type of data store, at the byte level, in a single unified data store. Some solutions have multiple types of data stores: a file cache, an email cache, an FTP cache, and so on, and on top of that a byte-level cache. This may work for small-scale tests in a lab, but it won’t scale in a large enterprise network where hundreds or thousands of users may be accessing data through the WAN optimization device simultaneously. In such a demanding environment, a product that has to read and write the same data to disk twice, once at the file level for the file cache and again at the byte level for the byte-level cache, is clearly not going to keep up with the processing and throughput demands. It makes little sense for a single file access over the WAN to generate twice as much disk activity on the WAN optimization device as it does on the original file server.

3)  Asymmetric routing: Asymmetric routing is inevitable in large IP networks. Does the product documentation explicitly describe how to handle asymmetric routing, for both in-path and out-of-path deployments? If not, you should be suspicious. Asymmetric routing can only be handled with explicit mechanisms, particularly in in-path/in-line deployments. If the solution addresses application-level latency and protocol chattiness, then the state of each connection must be held by a single device on each side of the WAN. If the network routes asymmetrically and there is no explicit mechanism to ensure that a connection’s traffic is deterministically forwarded to the same pair of WAN optimization devices, the solution will break.

4)  No tunnels: Tunnels are a nightmare to configure and roll out across large networks. A large enterprise network may have hundreds or thousands of IP subnets, and with a tunneling solution each and every one of them must be individually identified and configured in the tunneling solution’s central manager. Okay, that’s more configuration work, but so what? Well, most enterprise networks already use IP address management tools to track IP address and subnet assignments. With a tunneling solution, you now have two places where those IP subnets must be updated: first in the original IP address management tool, and second in the tunneling solution’s management station. There is no automatic mechanism to keep the two IP addressing databases synchronized, so who is responsible for reconciling any differences?

5)  Has plenty of references available: If a solution is truly scalable, others should already have deployed it before you. This is especially true of WAN optimization solutions offered by large, established network equipment vendors. As I’ve said in previous posts, one of the best ways to make sure a solution will scale when deployed in your network is to ask the vendor for references from other customers with a similar or greater number of sites and end users. If the solution is as scalable as the vendor claims, the vendor should easily be able to provide them. I strongly advise talking to a real customer on a confidential call rather than accepting marketing brochures, case studies, or secondhand anecdotal accounts.
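To put rough numbers on point 1, here is a minimal Python sketch of the two data store designs. It is a toy model, not any vendor’s implementation: `UnifiedStore`, `PerPeerStore` and `CHUNK_SIZE` are hypothetical names, and it assumes data is split into fixed-size 4 KB chunks identified by SHA-256 digests.

```python
import hashlib
from collections import defaultdict

# Assumption: fixed-size chunks keep the sketch short; real products may
# segment data differently (e.g. variable, content-defined boundaries).
CHUNK_SIZE = 4096


def chunks(data: bytes, size: int = CHUNK_SIZE):
    """Yield consecutive fixed-size chunks of data."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


class UnifiedStore:
    """Hypothetical model: one data store shared by all remote peers."""

    def __init__(self) -> None:
        self.segments = {}  # SHA-256 digest -> chunk

    def learn(self, peer: str, data: bytes) -> None:
        # The peer is ignored: a chunk seen from any branch is kept once.
        for c in chunks(data):
            self.segments.setdefault(hashlib.sha256(c).digest(), c)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.segments.values())


class PerPeerStore:
    """Hypothetical model: a separate data store for each remote peer."""

    def __init__(self) -> None:
        self.segments = defaultdict(dict)  # peer -> {digest -> chunk}

    def learn(self, peer: str, data: bytes) -> None:
        # Each peer gets its own copy, even of chunks other peers already sent.
        store = self.segments[peer]
        for c in chunks(data):
            store.setdefault(hashlib.sha256(c).digest(), c)

    def stored_bytes(self) -> int:
        return sum(len(c) for s in self.segments.values() for c in s.values())


if __name__ == "__main__":
    # One 32 KB "file" built from distinct chunks, accessed from every branch.
    shared_file = bytes(i % 251 for i in range(8 * CHUNK_SIZE))
    for branches in (10, 100, 1000):
        unified, per_peer = UnifiedStore(), PerPeerStore()
        for n in range(branches):
            unified.learn(f"branch-{n}", shared_file)
            per_peer.learn(f"branch-{n}", shared_file)
        print(branches, unified.stored_bytes(), per_peer.stored_bytes())
```

In this model the unified store holds 32,768 bytes whether 10, 100 or 1000 branches fetch the file, while the per-peer store grows linearly with the number of branches: 327,680 bytes at 10 branches and 32,768,000 bytes at 1000.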
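Point 2 is essentially arithmetic. The sketch below counts bytes written under the simplifying assumption that every independent store on the device persists its own full copy of newly learned data; `Appliance` and the 1000-user, 5 MB workload are hypothetical, and reads, metadata and I/O patterns are ignored.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Appliance:
    """Hypothetical model of a WAN optimizer's disk writes.

    `stores` names the independent on-disk stores; in this model each one
    persists its own copy of newly learned data.
    """
    stores: Tuple[str, ...]
    bytes_written: int = 0

    def learn_file(self, size: int) -> None:
        # Every store writes the same bytes again.
        for _ in self.stores:
            self.bytes_written += size


if __name__ == "__main__":
    users, file_size = 1000, 5_000_000  # assumed: 1000 users, each fetching a new 5 MB file
    unified = Appliance(stores=("byte-level",))
    layered = Appliance(stores=("file cache", "byte-level cache"))
    for _ in range(users):
        unified.learn_file(file_size)
        layered.learn_file(file_size)
    print(f"unified: {unified.bytes_written / 1e9:.0f} GB written")
    print(f"layered: {layered.bytes_written / 1e9:.0f} GB written")
```

With one byte-level store the model writes 5 GB for that workload; adding a file cache in front of it doubles the total to 10 GB. In the model, each additional store adds another full copy of the data it handles.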
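A rough Python sketch of the requirement in point 3: two hypothetical data-center devices, `dc-1` and `dc-2`, where a client’s SYN passes through `dc-1` but the server’s reply returns through `dc-2`. Without shared connection ownership, `dc-2` has no state for the flow; with it (modeled here as a direct method call, standing in for whatever network protocol a real product would use), `dc-2` knows the packet belongs to `dc-1`. The `Optimizer` class and its methods are illustrative assumptions, not any vendor’s mechanism.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (src_ip, src_port, dst_ip, dst_port)
FlowKey = Tuple[str, int, str, int]


def canonical(key: FlowKey) -> FlowKey:
    """Order the two endpoints so both directions of a connection share one key."""
    a, b = key[:2], key[2:]
    return a + b if a <= b else b + a


@dataclass
class Optimizer:
    """Hypothetical WAN optimizer on one side of the WAN."""
    name: str
    # Devices on the same side of the WAN that share connection ownership.
    neighbors: List["Optimizer"] = field(default_factory=list, repr=False, compare=False)
    owners: Dict[FlowKey, str] = field(default_factory=dict, repr=False)

    def on_syn(self, key: FlowKey) -> None:
        # The device that sees the connection open owns its state and
        # announces ownership to its neighbors.
        k = canonical(key)
        self.owners[k] = self.name
        for n in self.neighbors:
            n.owners[k] = self.name

    def owner_of(self, key: FlowKey) -> Optional[str]:
        """Device that must process this packet, or None if no state is known."""
        return self.owners.get(canonical(key))


if __name__ == "__main__":
    dc1, dc2 = Optimizer("dc-1"), Optimizer("dc-2")
    conn = ("10.1.1.10", 51000, "10.9.9.9", 445)   # client -> server
    reply = ("10.9.9.9", 445, "10.1.1.10", 51000)  # server -> client
    dc1.on_syn(conn)
    print(dc2.owner_of(reply))  # None: the reply took another path, state is lost
    dc1.neighbors.append(dc2)
    dc2.neighbors.append(dc1)
    dc1.on_syn(conn)
    print(dc2.owner_of(reply))  # dc-1: dc2 knows to hand the packet to dc1
```

The first lookup returns `None` because `dc-2` never saw the connection open; once the two devices share ownership, the same lookup returns `dc-1`.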
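To illustrate the reconciliation chore in point 4, here is a small Python sketch that uses the standard-library `ipaddress` module to compare a subnet list exported from an IP address management tool with the subnets configured in a tunnel manager. The export format (a plain list of CIDR strings), the `subnet_drift` helper and the sample subnets are all assumptions; real tools have their own export formats.

```python
import ipaddress
from typing import Iterable, Set, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse(subnets: Iterable[str]) -> Set[Network]:
    """Parse CIDR strings; strict=True rejects entries with host bits set."""
    return {ipaddress.ip_network(s, strict=True) for s in subnets}


def subnet_drift(ipam: Iterable[str], tunnel_mgr: Iterable[str]) -> Tuple[Set[Network], Set[Network]]:
    """Compare the two subnet lists by exact match.

    Returns (missing, stale): subnets in IPAM but not in the tunnel manager,
    and subnets still in the tunnel manager that IPAM no longer lists.
    """
    ipam_nets, tunnel_nets = parse(ipam), parse(tunnel_mgr)
    return ipam_nets - tunnel_nets, tunnel_nets - ipam_nets


if __name__ == "__main__":
    # Hypothetical exports from the two systems.
    ipam_export = ["10.1.0.0/24", "10.1.1.0/24", "10.2.0.0/24"]
    tunnel_config = ["10.1.0.0/24", "10.2.0.0/24", "10.3.0.0/24"]
    missing, stale = subnet_drift(ipam_export, tunnel_config)
    print("missing from tunnel manager:", sorted(str(n) for n in missing))
    print("stale in tunnel manager:", sorted(str(n) for n in stale))
```

Exact matching is deliberately naive: if one system lists 10.1.0.0/23 and the other lists 10.1.0.0/24 and 10.1.1.0/24, this check reports both sides as different even though they cover the same addresses, which is one more reason the reconciliation work is rarely trivial.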

4 Responses to “Can this WAN optimization solution scale?”

  1. Curious said

    What about TCP sessions? Shouldn’t the device also be able to support a large number of TCP sessions to handle all of the users and all of the sites, especially in the core?

  2. Josh Tseng said

    TCP connections can be one metric, but without other context they are an incomplete measure of how scalable a device is. For example, a vendor may claim support for up to 50,000 TCP connections. How many of those connections are active? How many are idle? What is the average throughput across the 50,000 connections? Without the rest of this information, you can’t tell how scalable or capable a product is. I know that some vendors claim an unlimited number of TCP connections. Does that mean their devices can support an unlimited amount of traffic? Probably not. Rather, it just means their devices are more or less stateless at the TCP layer. In other words, if their devices see 1000 packets, they don’t care whether those are 1 packet each from 1000 TCP connections or 1000 packets from the same TCP connection; they treat the traffic the same way in either case. For that reason, just because Vendor A claims its devices support more TCP connections than Vendor B doesn’t mean Vendor A’s product is more scalable.
    Best regards,
    Josh Tseng

  3. Stewart said

    Joshua,
    Yes, scaling is a difficult problem to master. Perhaps the real measure is actual deployed networks: the floating 240-accelerator network Expand has deployed in the Maritime fleet for the government, the 3,800-accelerator deployment in the insurance industry, or the 800 network accelerators Expand has deployed for one customer. (That is not the extent, however: Expand has just passed its 10,000th box deployed across the various branches of the US government.)
    As for TCP connections, sessions and whatnot, I think your conclusions have more to do with the limitations of your product, such as the difficulty you have configuring tunnels. There are other approaches, and as you can see from the above, they have very little impact on the scale of the deployment.
    What is your largest network deployment? I heard it was in the 250-unit range, which is impressive.
    Best Regards,
    Stewart

  4. Josh Tseng said

    Stewart,
    Congratulations to you and Expand on deploying so many devices. However, you fail to mention that those large deployments use your older diskless compression devices, and that they took place a long time ago. I believe Packeteer has similar if not larger deployments with its PacketShaper products (again, diskless), and look at where they are now.
    The main issue with those older products (both Expand’s and Packeteer’s) is that they are simple packet compression devices that do not address application-level protocol chattiness. In a high-latency WAN environment with applications like CIFS, Exchange, NFS, etc., your memory-based packet compression devices do very little to improve performance.
    Josh Tseng
