Major Manny Dominguez, USAF, Chief Information Officer (CIO) for the
Medical Education and Training Campus (METC), asks: "In moving
capabilities to the cloud, it will be important for Government/DoD
organizations to have an understanding of continuity of operations,
failover, and backup and recovery capabilities, with associated SLAs.
Please describe the key elements of these capabilities and how you
believe Government/DoD customers can verify them, and be contractually
guaranteed of their effectiveness."
|
|
Teresa Carlson
Vice President
Microsoft Federal
In a cloud environment, the principles of continuity of operations planning, failover, and backup and recovery aren't much different from those in a traditional IT infrastructure. The big difference is that the potential scale of cloud computing ensures computing resources are available to agencies when they need them.
Large cloud providers offer environments that are worldwide in scale, with the ability to handle and route massive amounts of data. The data centers are enormous, and when there is spillover, or if a data center experiences a service interruption, traffic is automatically transferred to another data center with available capacity. The best cloud-based systems are redundant by design, with standardized processes for dealing with unexpected or unusual computing patterns. Not only does this provide greater flexibility and resources, but it allows providers to be completely transparent about where data is being stored or relocated.
In terms of backup and recovery, large cloud environments provide capabilities that protect both the physical equipment and the applications themselves. Applications are replicated and stored in multiple data centers, so that if one location experiences a problem, the application can be accessed from a secondary data center. It's failover on steroids, and it's all because of scale. Major cloud providers build these capabilities from the ground up, and they add an incredible amount of resiliency to the entire operation.
Verifying these capabilities and ensuring effectiveness is a major issue not only for providers, but for legislators as well. Vendors have to be more accountable from a legal perspective – especially when protecting sensitive government data and applications. Citizens and organizations need guaranteed access to secure data, and cloud vendors must be transparent about documentation and controls. To make this a reality, the U.S. needs to adapt its communications technology laws to reflect the modern computing environment. Microsoft's Brad Smith has called for the creation of a "Cloud Computing Advancement Act" to establish best practices and increase confidence in the privacy, security and resiliency of the cloud. Steps in the right direction include:
- Reforming the Electronic Communications Privacy Act to include stronger security protections
- Updating the Computer Fraud and Abuse Act to provide law enforcement with the resources it needs to combat emerging forms of online crime
- Adopting transparency provisions that ensure citizens and organizations have a right to know exactly how their information will be used, accessed and protected by service providers
- Initiating discussions with countries around the world to establish global cloud standards, because it's not uncommon for data that originates in one country to be hosted in another
Industry, government and consumer groups must work together to create legislation that encourages innovation while demanding security and protecting privacy. In the meantime, it's up to vendors to be completely transparent with government agencies about resiliency and continuity capabilities, and for agency IT leaders to demand adherence to industry best practices.
For more information, please see Teresa Carlson's FutureFed blog at: http://blogs.msdn.com/USPUBLICSector/
Posted: June 14, 2010
|
Gregg (Skip) Bailey, Ph.D.
Director
Deloitte Consulting LLP
It is interesting that continuity of operations, failover, and backup and recovery are great strengths of cloud computing. In fact, these may be good areas in which to launch your cloud experience. There are at least three possible ways to use the cloud in your continuity of operations plan (COOP) and backup plans. First, you can provision and use the cloud as a backup site for a traditional data center application. Second, you can use a traditional data center to back up a cloud implementation. Third, you can use one cloud implementation to back up another. With all three approaches, security is the place to begin. For the purposes of this discussion, I will assume that you are comfortable with your security platform (a discussion for another time). I have also heard of using the cloud to test the COOP process without using live data, which addresses some of the security concerns.
Let us take the first scenario, which I believe is one of the most effective ways to get experience with the cloud. In this scenario, you would have your traditional data center in whatever state of virtualization you may be in, and you would use a cloud offering to provide COOP and/or backup. Backup may be the easiest to provide security for as you can keep data encrypted the entire time it is on the cloud. Either way, you would be able to procure the compute or storage capabilities on the fly and only pay for what you use. If one service or application is deemed mission critical today, then you can provide for it. If over time you remove that application from the mission critical list, you can ratchet back the appropriate cloud services. In this scenario, you could gain valuable experience without interfering with the day-to-day operations of your information technology services.
In the second scenario, you could make use of your legacy data center to support your new cloud offering. This is not a likely scenario, but it could be useful in some cases if the legacy environment were capable of backing up your new environment. In the third scenario, you would be moving to a totally new environment, and the issues would be very much like the ones discussed in the first scenario.
Now, a word about SLAs and the ability to determine whether the uptime and service are acceptable. Obviously, your availability needs will depend on your mission and the requirements of your work. You should secure appropriate guarantees based on those needs. The good news is that most cloud providers have built in the precautions needed for such SLAs, such as power and communications redundancies and, in some cases, geographic diversity. The key is to know what the provider is providing and what you are responsible for. The cloud has not yet matured to the point where you can treat it as a black box. You should understand what the cloud is and how it is being provided to you.
For further information, please contact Gregg (Skip) Bailey at: gbailey@deloitte.com
Posted: June 23, 2010
|
Peter Coffee
Director of Platform Research
salesforce.com inc.
Public cloud services are clearing and illuminating the landscape of IT risk. Major cloud providers run homogeneous systems at nearly constant workload, with high degrees of automated or otherwise systematized management and fault mitigation. Further, the multi-tenant architecture of true clouds enables enormous reduction of points of failure and number of distinct failure modes – improving reliability, and also enabling superior visibility into operational state (as demonstrated by public Web sites such as trust.salesforce.com/trust/status and status.aws.amazon.com).
The initial question of SLAs suggests a more important question. Wouldn't any customer organization, in public sector or private, prefer reliable service – combined with ample warning of degradation or interruption – rather than merely receiving after-the-fact credit for the price of any services not received? Anything more than this would get into the realm of consequential damages, which is the domain of insurance companies rather than cloud service providers – but statistically significant data from large cloud providers will enable, in the very near future, a far more efficient marketplace in such coverage. There's every reason to expect that insurers will reward cloud service providers who set high standards for transparency, operational excellence, and consistently high performance by giving the customers of those services better rates for service interruption policies.
We must compare service level confidence, not to a theoretical ideal, but to present-day reality. At a conference in Singapore last year, a Red Hat executive observed that people ask all the time about the Service Level Agreement that they'll receive from a cloud service provider – while seeming not to notice that there are rarely any service level agreements to protect their on-premises IT assets.
"If the data center goes dark, or that server in the corner bursts into flame," Red Hat's Frank Feldmann rhetorically inquired, "do you have SLAs with the power company and the fire department?" Corporations don't have SLAs with such services, because those services respond all the time to similar incidents. This gives people a reasonable basis for estimating the risk they'll take by relying on those services, and lets them make a sound decision about any further steps they might need to follow (such as engaging private emergency-response firms, or employing their own local disaster teams). Cloud services' customers also have comparable options.
As more information services move to public clouds, agencies will be able to modernize their mitigation of IT risk. Today, every traditional data center is its own, uniquely configured, hand-built tower of toothpicks – with failure modes that are not precisely the same as those of any other facility. The hardware in use, the software version and configuration, and the skills of the operators vary enormously from one such center to another and even from one work shift to the next. There's no statistically meaningful data on operational reliability, which means there's no sound basis for pricing risk.
The operational advantages of true clouds, and the relentless competitive pressure of a marketplace of transparent service measurement, will drive cloud-based IT to achieve new standards of assurance.
For further information, please contact Peter Coffee at: pcoffee@salesforce.com
Posted: June 25, 2010
|
Larry Pizette
Principal Engineer
MITRE
Thank you, Major Dominguez, for sharing an insightful question that many civilian and Federal/DoD IT leaders are considering. Also, many thanks to the private sector leaders who shared their insight and perspective this month.
Continuity of operations, failover, and backup and recovery are capabilities that must be periodically tested and exercised, and this testing is not without service provider effort and expense. Consequently, from its initiation, a contract with a service provider should have language to account for the testing, expected performance levels, and expected measurement techniques for the COOP-related SLAs mentioned in the question. The contract will give the Government a means to exercise and verify the capabilities in an operational context. During the solicitation process it is useful for the Government to offer a document template of the service SLAs and their measurement techniques, along with a post-award operational verification plan. The content of these documents may be developed as part of the proposal by the potential vendors, as framed by the Government's requirements. A vendor's willingness to do special continuity testing for a government customer will likely depend on the business return the vendor anticipates, and not every procurement will provide that business return.
While COOP-focused requirements for all cloud deployment models flow from a systems engineering process driven by operational requirements, key elements include the specification of multiple distributed physical locations for data and processing capability, periodicity of backup activities, timing metrics, up-time expectations, failover recovery metrics, and network connectivity/throughput. It's important to include points of escalation and delineation of responsibility (e.g., responsibility for the network between a vendor and government organization). For community and public clouds, the agreement between the Government organization and the cloud provider should include mechanisms for verification and joint risk reduction. Potential areas for Government risk reduction are audits, testing backup and recovery capabilities in concert with the provider, and review of periodic metrics provided by the provider to the Government to give visibility into their operational performance.
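As a rough illustration of reviewing the periodic metrics described above, the following Python sketch compares a provider's reported figures against contractual targets. The class names, field names, and numeric thresholds are all hypothetical, chosen only for illustration; an actual agreement would define its own metrics and measurement techniques.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoopSla:
    """Hypothetical contractual targets (names and units are illustrative)."""
    min_uptime_percent: float
    max_backup_interval_hours: float
    max_failover_minutes: float
    min_sites: int


@dataclass(frozen=True)
class ProviderReport:
    """Hypothetical periodic metrics supplied by the provider."""
    uptime_percent: float
    longest_backup_gap_hours: float
    slowest_failover_minutes: float
    active_sites: int


def find_violations(sla: CoopSla, report: ProviderReport) -> list[str]:
    """Return the names of the SLA elements the report fails to meet."""
    violations = []
    if report.uptime_percent < sla.min_uptime_percent:
        violations.append("uptime")
    if report.longest_backup_gap_hours > sla.max_backup_interval_hours:
        violations.append("backup interval")
    if report.slowest_failover_minutes > sla.max_failover_minutes:
        violations.append("failover time")
    if report.active_sites < sla.min_sites:
        violations.append("site count")
    return violations


# Assumed example values, not taken from any real contract or provider.
sla = CoopSla(min_uptime_percent=99.5, max_backup_interval_hours=24.0,
              max_failover_minutes=15.0, min_sites=2)
report = ProviderReport(uptime_percent=99.7, longest_backup_gap_hours=26.0,
                        slowest_failover_minutes=12.0, active_sites=2)
print(find_violations(sla, report))
```

A check of this kind only covers the metrics the provider reports; audits and joint backup and recovery tests would still be needed to confirm that the reported figures reflect actual performance.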
Please note that there are several efforts in the works across the Government in this general area. For example, the GSA is "currently undergoing a procurement to award multi-vendor BPAs for IaaS offerings available on Apps.gov." The request for quotation (RFQ) requires backup and recovery capabilities and an SLA with a minimum of 99.5% uptime. The RFQ has categories for Operational Management and Trouble Management. Similarly, for security certification and accreditation, government organizations may be able to leverage the upcoming Federal Risk and Authorization Management Program (FedRAMP).
In the influential white paper Above the Clouds: A Berkeley View of Cloud Computing, the authors listed availability of a service as the number one obstacle to cloud computing. To many, the question is about trust: Will the capability be there all the time when I need it? And will it be available under all circumstances, including crisis situations – natural or man-made – that may affect a broad area of infrastructure and other capabilities? As noted by Major Dominguez's question, Government organizations must pay careful attention to COOP-related SLAs, as they are key to establishing trust and meeting operational needs.
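To put the 99.5% uptime figure in concrete terms, the short Python sketch below converts an uptime percentage into a downtime allowance. The measurement windows (a 30-day month and a 365-day year) are assumptions for illustration; the source does not state the period over which uptime would be measured.

```python
def allowed_downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Return the hours of downtime an uptime SLA permits within a period."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (100.0 - uptime_percent) / 100.0


# Assumed measurement windows; a real SLA would specify its own.
HOURS_PER_30_DAY_MONTH = 30 * 24
HOURS_PER_365_DAY_YEAR = 365 * 24

monthly = allowed_downtime_hours(99.5, HOURS_PER_30_DAY_MONTH)
yearly = allowed_downtime_hours(99.5, HOURS_PER_365_DAY_YEAR)
print(f"99.5% uptime allows about {monthly:.1f} h/month "
      f"and {yearly:.1f} h/year of downtime")
```

Under these assumptions, a 99.5% minimum works out to roughly 3.6 hours of permitted downtime in a 30-day month, or about 43.8 hours over a year.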
For further information, please contact Larry Pizette at: cloudbloggers-list@lists.mitre.org
Posted: July 2, 2010
Welcome
"Ahead in the Clouds" is a public forum to provide federal government agencies with meaningful answers to common cloud computing questions, drawing from leading thinkers in the field. Each month we pose a new question, then post both summary and detailed responses.