
Wednesday, September 25, 2013

Outages ARE Relevant!


By Bill Moran, Rich Ptak

Once again Amazon is in the public media, this time for outages in AWS, its public cloud.[1] Our earlier blog[2] commented on the GAO-issued report[3] about the CIA awarding a contract for cloud services to Amazon and IBM’s subsequent protest of the award. The GAO accepted IBM’s protest but disallowed IBM’s attempt to point out AWS’s history of significant outages, as reported in the NY Times and other media over the past year.

In that blog, we did not accept the government’s rejection of this part of IBM’s protest. Since the CIA planned to move the vendor’s public cloud into a government datacenter, we thought the track record of the vendor’s cloud offering in the marketplace was clearly germane. We recommended that the government require vendors to provide data on their clouds’ marketplace performance[4]. One reason the government gave for rejecting IBM’s protest was that no information was available about Amazon’s SLAs (service level agreements). We suggested the government require Amazon and any other bidder to supply such information.

Frankly, we don’t know whether or not the Amazon cloud has the reliability and security necessary to satisfy the intelligence community’s requirements. However, a failure to make a proper assessment of these issues could be very costly for the buyer. Based on the evidence published in the GAO report, it does not appear that any such assessment was made; at the least, little detail about any assessment has been released.

Let’s explore this a bit further. Some years ago, we heard Scott McNealy, Sun’s CEO at the time, discuss a conversation with an early purchaser of a new large Sun server. Sun had just entered the server business. The customer said they were planning to host a 911 service on the server. Scott admitted he was stunned as the customer described the significance and variety of potential problems if the system went down. Until then, Sun had been a workstation company selling most of its products to engineers. It took Sun some time to adjust to the enterprise marketplace and the realities of enterprise reliability requirements.

Whatever else one might say about IBM, one has to admit that, as a company, they understand enterprise requirements. They have produced many successful products targeted at the enterprise.
Amazon, on the other hand, has almost no track record in producing enterprise products. Of course, this does not prove that the Amazon cloud will not meet the intelligence community’s needs. However, it does indicate that the burden of proof is clearly on the CIA to require Amazon to demonstrate that its cloud can do the job. It should be Amazon’s responsibility to provide the necessary data on the operation of its public cloud.

Also significant is that it is no easy matter to re-architect a large and complex hardware/software product. There are many examples of such costly efforts, including some within the US government. At times, vendors struggle for years with these systems to meet customer requirements; in other cases, the projects have been abandoned. Generally, the problems do not really surface until after delivery, when customer implementation begins. It seems reasonable to us that the CIA should take extra care to ensure its cloud project does not add to this unfortunate list.

All in all, the final outcome of this bid is still unclear. It does appear that there are some serious weaknesses in the process that need to be addressed. One lesson to be learned is that the overall process is severely lacking in transparency. No one, not the tax-paying public, not the vendors, and not the government, is well served by the secrecy that appears to be integral to the existing process. We suggest that the GAO and the agencies consider a more transparent process.


[4] The project was rebid, but an Amazon court case is pending. Because of the secrecy, we do not know what requirements the CIA put on the bidders concerning the reliability of their public clouds.

Thursday, September 19, 2013

Top-of-Mind: Enterprise Concerns with the Cloud

By Bill Moran, Rich Ptak



The rush to the Cloud is well underway. For some, the transition has gone smoothly. For others, the switch has been more expensive and problematic than anticipated. The difference is due to the level of preparation, education and planning undertaken prior to making the move. The leap into the Cloud (or any new technology) has never been as easy or smooth as promised by promoters and some vendors. Most enterprises are reluctant to share a public analysis of their problems. To our benefit, many government entities are subject to different rules and motivations, so follies and missteps get identified and publicly aired, allowing others to learn from them. Let’s look at one case.

Enterprises evaluating the Cloud can learn something from NASA’s experience. Recently, NASA’s Inspector General (IG) published an audit[1] of the agency’s Cloud usage that highlighted some issues to which enterprises should pay attention. There is no intent here to bash NASA. The agency has a well-known history of pioneering in the Cloud. In fact, NASA created and contributed key technology that forms the basis of OpenStack, which has been widely adopted as the foundation for industry-standard Cloud technology.[2]

We use NASA’s experience to provide examples of issues that can arise when dealing with the Cloud. The reader should keep in mind that NASA accepted the findings of the IG. It has already made plans to correct the problems that were uncovered. The points that we are making apply equally to both government agencies and private enterprises.

The issues that we explore break into four broad categories: 1) Governance, 2) Security, 3) Reliability, and 4) Interoperability and Open Standards. Let’s examine each of these in turn.

Governance

Governance relates to the overall management of the Cloud. The NASA auditors began by doing a survey of the different divisions of the agency to determine the actual usage of the Cloud. They discovered that the NASA CIO was unaware of all of the ways the Cloud was being used in the different parts of the agency. The various departments had signed contracts with a variety of different Cloud suppliers. In fact, some individuals had used NASA credit cards to buy time from Cloud providers. These deals were made while generally ignoring processes and criteria relating to security requirements as well as existing Federal standards for Cloud contracts. 

We suspect that most companies might be surprised to find their own internal Cloud usage is more widespread and unregulated than anyone truly knows[3]. The first item of business, then, is to determine how many departments have made their own arrangements with any of the numerous and varied Cloud suppliers. For most organizations, we believe the CIO’s organization will be expected to take responsibility for the governance of the Cloud. The NASA example shows that this assignment of responsibility needs to be clearly communicated throughout the organization. In most situations, it will be the CEO’s responsibility to clearly and unequivocally communicate this fact to everyone. Failure to communicate in a clear, pervasive, structured manner was a key factor in NASA’s problems.

Next, it is necessary to catalog and review all of the business arrangements that these departments have made with the various suppliers. In most cases, they have probably just signed the ‘boiler-plate’ agreements the vendor presented to them. The organization will need to develop its own specific standards for contracts and service level agreements to be used in Cloud service procurements. The CIO will need to inform those responsible in the different functions that they must follow these standards. Good governance requires monitoring to assure compliance. Where existing agreements do not follow these standards, they will have to be cancelled, modified or renegotiated to comply, whichever makes the most sense for the given department or group.
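To make the cataloging step concrete, here is a minimal sketch, in Python, of the kind of inventory and compliance check a CIO’s office might keep. The field names, the 99.9% availability threshold, and the department and provider names are purely illustrative assumptions on our part, not NASA’s data or any vendor’s terms.

```python
# Illustrative sketch only: a minimal inventory of departmental Cloud
# agreements, checked against hypothetical organization-wide standards.
from dataclasses import dataclass

@dataclass
class CloudAgreement:
    department: str
    provider: str
    has_sla: bool                  # provider committed to a written SLA
    availability_target: float     # e.g., 0.999 = 99.9% promised uptime
    meets_security_standard: bool  # passed the organization's security review

# Hypothetical governance threshold set by the CIO's office.
REQUIRED_AVAILABILITY = 0.999

def needs_renegotiation(agreement: CloudAgreement) -> bool:
    """Flag agreements that must be cancelled, modified, or renegotiated."""
    return (not agreement.has_sla
            or agreement.availability_target < REQUIRED_AVAILABILITY
            or not agreement.meets_security_standard)

inventory = [
    CloudAgreement("Research", "ProviderA", True, 0.9995, True),
    CloudAgreement("Outreach", "ProviderB", False, 0.0, False),  # boiler-plate deal
]

for a in inventory:
    if needs_renegotiation(a):
        print(f"{a.department}/{a.provider}: does not meet governance standards")
```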

Security

The next item to be addressed is security. Before getting into Cloud-specific security requirements, the enterprise must analyze its applications to identify and assess the risk undertaken if each application moves to the Cloud. The Federal government created a three-tier classification of applications which provides a good starting point for other organizations; the NASA IG used this method. It divides applications into low, moderate, and high risk. Low risk means there would be little damage if the application were compromised. Moderate risk means there would be damage, but it could be contained. Finally, high risk means there is the potential for significant damage. Security requirements escalate in scope and depth as the risk increases. Enterprises cannot move forward with a Cloud security strategy until they have an appropriate classification for each of their applications. It is worth noting that NASA organizations moved two moderate-risk applications to outside Cloud providers with no special security steps taken.
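As a purely illustrative sketch of how such a classification might be put to work, the snippet below maps the three risk tiers to escalating pre-migration review steps. The tier names follow the low/moderate/high scheme described above; the specific steps and the application name are our own hypothetical examples, not a federal checklist.

```python
# Illustrative sketch: a three-tier (low/moderate/high) risk classification
# and the escalating review steps an organization might attach to each tier.
# The steps below are hypothetical examples, not an official control baseline.

RISK_REQUIREMENTS = {
    "low":      ["standard provider terms", "basic access controls"],
    "moderate": ["documented SLA", "encryption at rest and in transit",
                 "provider security assessment before migration"],
    "high":     ["keep on-premises or in an accredited environment",
                 "continuous monitoring", "independent security audit"],
}

def required_steps(app_name: str, risk_level: str) -> list:
    """Return the review steps to complete before 'app_name' may move to a Cloud."""
    if risk_level not in RISK_REQUIREMENTS:
        raise ValueError(f"Unknown risk level: {risk_level}")
    return RISK_REQUIREMENTS[risk_level]

# Hypothetical application name, for illustration only.
print(required_steps("mission-telemetry-archive", "moderate"))
```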

The recent Snowden case dramatically makes the point that many of the most egregious security violations originate from within an organization. Therefore, using a private Cloud and keeping it within the organization’s firewall does not eliminate all security threats. On the other hand, using a public Cloud does demand that the organization review, evaluate and monitor the security offered by the Cloud provider.
At the most fundamental level, good security demands that enterprises take the necessary steps to ensure that unauthorized parties are denied access to sensitive information. This requirement continues when such information resides in a Cloud. We believe that as enterprises study their data, they will decide that some information is too sensitive, or its release too damaging, to be allowed into the Cloud.

We said earlier that many organizations might be surprised at the amount of work that has already been moved to the Cloud. Generally, there is little long-term harm when developers do this. However, moving production work is another matter entirely. Business units tend to take a very short-term view, with a focus on ‘getting the job done’ or meeting immediate goals. This can lead to serious problems when a quick solution results in data breaches or security, reliability, or availability failures. History teaches us that correcting the resulting problems can be very expensive. In the category of production work we include enterprise web sites. The NASA IG discovered that NASA’s key web site (NASA.gov) had been moved to the Cloud. No test of the security of the Cloud in question had ever been done, nor does it appear any attention was paid to appropriate contracts or SLAs. The web site did not comply with government policies.

Reliability

For an enterprise running applications in its own datacenter, the cost of reliability and its cousin availability is clear. There must be redundancy in the servers, storage, software, and networking. The mantra is: ‘eliminate any single points of failure’. Periodically, failover drills are conducted to assure that the operations staff can manage moving applications to backup systems. Moreover, some provision needs to be made to handle environmental threats such as fire, flood, hurricanes, etc. All of these steps need to be taken; otherwise, reliability and availability can be seriously compromised.

We find that when organizations move applications to the Cloud, many ignore these requirements. The thinking is that they can be skipped because the responsibility has shifted to the Cloud provider. However, this is only true if it is detailed and specified in the contract. Even then, reliability and performance guarantees need to be analyzed to assure they cover what the organization needs and requires. Money saved moving to the Cloud is illusory if necessary reliability is compromised or not provided at all. Reliability is not free in an organization’s own datacenter; it will not be free in the Cloud either. Organizations must have a clear understanding of their reliability requirements, communicate their expectations to their Cloud vendor, and then monitor and test to confirm that these are met over time. We expect wise organizations will split their work between different Cloud providers in different geographies, or at the least between different and geographically separate datacenters of their Cloud provider.
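The arithmetic behind an availability guarantee is worth spelling out, because a figure like “99.9%” sounds more reassuring than the downtime it actually permits. The sketch below, using availability targets and a measured downtime figure we chose purely for illustration, converts an SLA percentage into minutes of allowable downtime per month and checks a provider against it.

```python
# Illustrative arithmetic: translate an availability percentage from an SLA
# into the downtime it actually permits, and compare it with measured uptime.
# The targets and the measured figure below are examples, not any provider's terms.

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes

def allowed_downtime_minutes(availability: float,
                             period_minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Downtime permitted per period by a given availability target."""
    return period_minutes * (1.0 - availability)

for target in (0.999, 0.9995, 0.9999):
    print(f"{target:.2%} availability allows "
          f"{allowed_downtime_minutes(target):.1f} minutes of downtime per month")

# Example check of a provider against a 99.95% target:
measured_downtime = 75.0  # minutes observed this month (hypothetical)
if measured_downtime > allowed_downtime_minutes(0.9995):
    print("Provider missed the availability target this month")
```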

Interoperability & Open standards

For widespread adoption of the Cloud, it is critical that enterprises be able to move applications and data from one Cloud to another. Col. Hill of the US Army, the head of the Futures Directorate, said that interoperability between Clouds, as well as the ability to move data from one Cloud to another, is an important factor in the acceptance of the Cloud[4]. He went on to say that the Army needed an open architecture that would allow it to use the best features of the various Clouds on the market now. To us, OpenStack currently shows the most promise of being that architecture. We recommend that organizations consider supporting and participating in OpenStack activities.
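As a rough illustration of why interoperability matters at the code level, the sketch below hides provider-specific details behind a common interface so the same deployment logic can run against either of two hypothetical Clouds. The class and method names are our own invention; real portability efforts would build on an open standard such as OpenStack rather than a toy abstraction like this one.

```python
# Illustrative sketch: a common interface that keeps application code
# independent of any one Cloud provider. All names here are hypothetical.
from abc import ABC, abstractmethod

class CloudProvider(ABC):
    @abstractmethod
    def launch_instance(self, image: str, size: str) -> str: ...
    @abstractmethod
    def upload_object(self, bucket: str, key: str, data: bytes) -> None: ...

class ProviderA(CloudProvider):
    def launch_instance(self, image: str, size: str) -> str:
        return f"providerA-instance-{image}-{size}"
    def upload_object(self, bucket: str, key: str, data: bytes) -> None:
        print(f"stored {len(data)} bytes in providerA:{bucket}/{key}")

class ProviderB(CloudProvider):
    def launch_instance(self, image: str, size: str) -> str:
        return f"providerB-instance-{image}-{size}"
    def upload_object(self, bucket: str, key: str, data: bytes) -> None:
        print(f"stored {len(data)} bytes in providerB:{bucket}/{key}")

def deploy(provider: CloudProvider) -> None:
    """Deployment logic is identical regardless of which Cloud runs it."""
    instance = provider.launch_instance(image="app-image", size="medium")
    provider.upload_object("app-data", "config.json", b"{}")
    print("deployed:", instance)

deploy(ProviderA())
deploy(ProviderB())  # switching providers requires no application changes
```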

Conclusions

It is clear that any enterprise evaluating the Cloud has a great deal of work to do. It needs to make sure that its data and reputation will be protected in the Cloud. The Cloud can offer substantial savings, but it is critical that these savings do not come at the expense of the security, reliability, and interoperability that will be needed in the long term. Moving to the Cloud does not relieve enterprise IT of its responsibilities for governance, security, reliability, interoperability and standards. It does mean that while the work related to these is done by others, the enterprise must have a clear understanding of what is necessary and required in each case. Then, it must monitor, manage, analyze and verify that its internal requirements are met.



[1] For the report see http://oig.nasa.gov/audits/reports/FY13/IG-13-021.pdf
[2] HP, Intel, and IBM are just a few of the companies that have adopted OpenStack as the basis for their Cloud technology. There are currently more than one hundred companies in the OpenStack organization. See http://www.openstack.org/
[3] Developers, in particular, are likely to go outside the organization to accelerate the development and testing of applications that they are responsible for.
[4] See  http://www.fiercegovernmentit.com/story/military-wont-commit-single-Cloud-computing-architecture-say-panelists/2011-05-17?utm_medium=nl&utm_source=internal#ixzz1RQqkS8Na

Wednesday, September 11, 2013

NeXtScale: it’s what’s happening with IBM System x!

By Rich Ptak

A little over a month ago, we asked ‘What’s happening with IBM System x?’ in response to rumors that IBM was considering ‘disinvesting’ and leaving the x86-based systems market. We examined the issue from a business, product and market perspective and reached the conclusion that “We don’t see any compelling evidence that IBM will or should abandon the x86.” Today’s announcements concerning improvements, extensions and new products and services for System x support our earlier conclusion, as they document IBM’s plans to greatly increase its investment and visibility in the market for highly flexible, general-purpose System x servers, from single machines to massive enterprise server farms.

IBM NeXtScale System™ represents an explosive expansion and movement by IBM into the High Performance Computing (HPC) and high-density computing market segments. It extends IBM’s current mix of offerings, which includes the x3100, x3250, x3530, x3550, x3630, x3650, x3690, x3750, x3850, iDataPlex, etc., to extremely large-scale systems.

IBM NeXtScale System steps up the game significantly as a key representative of x86-based NeXtGen systems. It presents a new architecture for IBM System x. The guiding design principles were flexibility, simplicity and scalability. The system directly targets the general-purpose server market, providing an attractively priced, high-quality alternative to the offerings from HP and DELL.

System statistics and specifications will impress IT development and operations staffs. A joint effort between IBM design/development teams in the US Research Triangle and Taiwan produced a system and roadmap that excels today and will grow smoothly into the future. The design scales from individual components to single or double chassis units to full single- or multi-rack deployments.

Developers will like the simple, light chassis, designed for ‘front-of-rack’ servicing, tool-less access to servers, and server removal without touching the power. The compute, storage and PCI-GPU/GPGPU components are designed to swap easily and to mix and match in standard configurations. All are compatible with standard racks. Storage and Graphics Acceleration or Co-Processing expansion units make upgrading easy, without any unique mid-plane dependencies.

Operations staff will like the front access to all components, including cable routing (if desired). All power and LEDs are forward-facing. Networking cables and switches are front-facing and connect directly to the system with no proprietary switching; all switching is done at the top of the rack. Support is available for 1/10/40 Gb, InfiniBand, FCoE and VFAs. The system can be shipped fully configured and ready to power on. All hardware, software and management are designed to assure maximum power efficiency. We could go on, but you get the idea. See here[1] for more details.

IBM identifies 8 key points of differentiation from competitor offerings; here are four that especially impressed us:

  1. No left/right servers needed (competitors require different servers for left and right sides of the chassis making replacement cumbersome)
  2. Simple, tool-free installation of parts speeds installation and on-boarding (most other dense platforms require tools to install PCI cards and HDDs)
  3. Operation at 40°C (104°F) inlet air temperature can save money in the data center (most competition stops at 35°C (95°F))
  4. NeXtScale supports full top-bin Intel E5-2600 v2 130W processors (many other dense designs only support up to 115W)


IBM positioned the NeXtScale systems as complementary to both the existing iDataPlex and IBM Flex System offerings. IBM also stated that it will continue to sell the iDataPlex through 2015. Gen 1 NeXtScale systems are not right for everyone today: they lack several features that iDataPlex has, including water cooling and 16 DIMM slots. Also, GPU/GPGPU support is available only on iDataPlex (NeXtScale support is planned for Q1 2014).

Conclusion
All in all, IBM effectively demonstrated that its commitment to and plans for the System x family extend well into the future. It is aggressively pursuing new market opportunities against established competitors with these systems, while making enhancements to all parts of the System x family. To paraphrase Mark Twain, one of our favorite authors, “reports of the death of the IBM System x family have been greatly exaggerated.”


Publication Date: September 11, 2013

Monday, September 9, 2013

IBM SmartCloud Orchestrator – an end-to-end success


IBM recently announced General Availability of IBM SmartCloud Orchestrator (SCO). This is the newest addition to the SmartCloud Foundation solution family of standards-based IT solutions for cloud-enabled data center environments, including private, public or hybrid implementation models. The family currently includes products that enable cloud environments with functions that range from monitoring to management of security, performance, storage, control desk, workload automation, and continuous delivery operations. It also includes the complete family of PureApplication and PureFlex systems.

Figure 1 - Cloud-based service creation
We’ve heard and written about infrastructure orchestration in the past. We endorsed these efforts at automating infrastructure management as a very good thing. However, the vast majority of those orchestration products address only a very limited part of the application and service delivery process. In fact, as illustrated in Figure 1, they have for the most part been limited to the activities of virtual machine provisioning (the part in the red box), not full cloud service provisioning.

IBM SmartCloud Orchestrator changes all that by automating the full end-to-end cycle of provisioning and deploying a cloud-based service. This new product enables IT to build and implement fully automated, dynamic, flexible, open, orchestrated provisioning and deployment of resources, workloads and services.
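To illustrate the distinction, the conceptual sketch below contrasts provisioning a single virtual machine, where most orchestration tools stop, with the end-to-end flow of standing up a complete service. The step names are our own shorthand for such a flow; they are not the SmartCloud Orchestrator API.

```python
# Conceptual sketch only: the difference between orchestrating a single VM
# and orchestrating a full Cloud service. The step names are our illustration
# of an end-to-end flow, not IBM's actual product interface.

def provision_vm(name: str) -> str:
    print(f"provisioning virtual machine '{name}'")
    return name

def orchestrate_service(service: str) -> None:
    """End-to-end provisioning: infrastructure, middleware, application,
    and the operational hooks that make the service usable."""
    vm = provision_vm(f"{service}-vm")          # where most tools stop
    print(f"configuring network and storage for {vm}")
    print(f"installing middleware on {vm}")
    print(f"deploying application workload for '{service}'")
    print(f"registering '{service}' with monitoring and metering")
    print(f"publishing '{service}' to the self-service catalog")

orchestrate_service("order-entry")
```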

Over the last few years, we’ve watched and commented on how IBM is building a complete architecture and basis for solutions targeted at major problems of IT responsiveness to business demands for faster service design, creation, development, testing, deployment, provisioning and updating. The challenge was magnified by the need to run on and use an infrastructure that was itself undergoing significant change and modification, not the least of which is the heterogeneity of cloud environments.

To that end, IBM invested time and effort working with and advancing a range of open-standards groups and organizations (TOSCA for platform services, OpenStack for infrastructure services, and OSLC for governance services) to provide the critical foundation necessary for integrated management in heterogeneous operating environments. SCO, with its demonstrated interoperability of open cloud services, is a testament to the benefits of and justification for those efforts.

The Final Word

We’ve seen multiple demonstrations of IBM’s SmartCloud Orchestrator. It comes with, and has access to, a variety of for-fee and free workload patterns built for IBM’s PureApplication system. Patterns are usable as-is or can be customized, even rebuilt, to uniquely fit your environment. All are reusable and can be combined and extended with pre-built images and process/configuration automation patterns available through the Cloud Marketplace. Standardized service development and delivery makes administration and support, as well as development and deployment, more efficient, which also lowers cost.

SCO yields benefits across the organization. It speeds the design and deployment of new services and applications, reducing IT costs as it improves efficiency. End-to-end exposure and detailed views of IT processes facilitate and simplify auditing and updating to eliminate inefficiencies and improve overall performance.

For the business staff, the result is reduced time-to-market and faster development of the new services needed to respond to and keep up with the competition. The relationship between IT and business improves as IT’s effectiveness and efficiency in responding to business requests for service visibly improve.

IBM SmartCloud Orchestrator combines with the rest of the SmartCloud Foundation to automate and ease the effort required to track the cost of cloud services, monitor and manage their health, and monitor and manage infrastructure provisioning and the capacity used and needed by cloud resources. It sounds like an all-around win for IT and business to us.