My Thoughts on Steve Jobs

I wanted to share some thoughts on Steve Jobs’ passing today.  In my lifetime I won’t see another person influence both business and technology at the level Steve did.  The tweets I’ve seen today from my tech friends and my cycling friends alike tell the story of that impact.

I didn’t think I’d see the second coming of Apple.  When I was very young, all the cool families had an Apple.  I was stuck with the uncool TRS-80.  I remember wandering around the malls, looking at the Apple from afar.  Then x86 took hold of me.  When I was at Mayo, I remember our first electronic medical record project, and the failed, heaping pile of Macs that sat in our building waiting to be donated away.  But even then there was a sadness for me.  I wanted the Newton to be successful.  NeXT was interesting to me, being a hard-core Unix guy.  The second coming was a great thing to see!

The legacy for me is the mixture that was Steve’s grasp on the world: business and design, and it just so happens he chose to enlighten IT.  I can’t imagine the complexity I would have in my personal and business life without Apple.  I am 100% sure I would never have thought I’d say that in the 90s or even the early 2000s.  Two of my favorite Jobisms…

Real artists ship — always move the ball forward.  Perhaps the 4S is exactly that.

Don’t ask customers what they want — Ford said much the same thing.  Jobs took it to another level and reduced complexity in a convenient, always-on way.

On the business side, Steve took on the music industry and brought consumers choice.  We probably didn’t see it at the time, but it had a huge impact on how we view the music world today.

RIP Steve. Thanks for the ride.

2010 in review

The stats helper monkeys at WordPress.com mulled over how this blog did in 2010, and here’s a high level summary of its overall blog health:

Healthy blog!

The Blog-Health-o-Meter™ reads Minty-Fresh™.

Crunchy numbers

A Boeing 747-400 passenger jet can hold 416 passengers. This blog was viewed about 1,800 times in 2010. That’s about 4 full 747s.

In 2010, there were 14 new posts, growing the total archive of this blog to 20 posts. There were 8 pictures uploaded, taking up a total of 3 MB.

The busiest day of the year was April 15th with 42 views. The most popular post that day was What You Don’t Want — the Cloud and Cost to Deliver.

Where did they come from?

The top referring sites in 2010 were cloudbook.net, twitter.com, linkedin.com, ethernetalliance.org, and Google Reader.

Some visitors came searching, mostly for felt f1 sl, felt f1 sl review, felt f1 review, felt f1 2010 review, and felt f1.

Attractions in 2010

These are the posts and pages that got the most views in 2010.

1. What You Don’t Want — the Cloud and Cost to Deliver (April 2010)
2. Initial review of Felt F1 SL Team (March 2010, 1 comment)
3. About (October 2009)
4. The Future of Cloud Data Center Networking (April 2010)
5. Cloud Deployment Best Practices (February 2010)

Development and Getting it Right

Lots of thoughts have been ruminating over the last week or so on the kickoff of a new development project and how to get it right. I’m reminded of some quotes from books and key folks in this space (e.g. Steve Jobs) that help form my thoughts around execution:

Real Artists Ship — Steve Jobs

Thrash Early – Seth Godin

Perfect is the Enemy of Good (something I use to describe cloud computing quite often)

Avoid “Just one more thing” — and this seems to get worse the longer you take between releases.

Got some more?

My Keynote from Ethernet Alliance – Next Gen Network and System Design

It’s been a while, but I wanted to post my presentation from this event —

A great mix of chip designers and network engineers, and a great session on TRILL vs. SPB and the status of work in these areas.

The entire afternoon was spent discussing the impact of virtualization on networking. Extreme Networks highlighted the one to two additional layers that appear in the network once you account for blades and/or virtual machines. I highlighted in my slides the policy shifts and changes that are happening in these various layers.

Regulating Scale and Velocity — Wall Street vs Clouds Part 1

The “sell-off” and market hiccup that occurred last Thursday formed an analogy to cloud computing for me. Trading is now a distributed process, something we were reminded of last week. Trades occurred, perhaps erroneously, perhaps not, that spanned various exchanges under “rules” apparently set by those who know best but who seemingly find enforcement a struggle (as it often is in distributed systems). Even though trading on some specific securities stopped on some exchanges, it was allowed to continue on others, and the “view” we have into this is a single asking price, apparently an accumulated or averaged set of values.

We learned that this single view was not reality, nor was it tied to a single security; it affected many in ways that we are still trying to understand.

IT also looks for a similar singular view. Few actually achieve it. Even today, most workloads still run inside fairly well-known, controlled environments (e.g. data centers) that you own or control. Or perhaps the workloads have ventured out into the “cloud” and some risk is spread across providers and on-premise data centers. Or maybe everything runs in the off-premise cloud for those who judge the risk (or the type of workload, data, etc.) acceptable.

But as the above highlights, changes are afoot. Increasingly, workloads are deployed across on-premise and off-premise clouds. Scale is increasing. Scale? Just like billions of stock trades, there are billions and billions of objects now marshaled to provide a “view” of a service, and to deliver that service to end users.

I probably pick on both IT vendors and IT users equally. I believe all sides can do a better job in building and implementing technologies to help with scale, velocity, and the distributed nature that is network computing.

We need to increase investment in three areas (a rough sketch of how they might fit together follows the list):

1) understanding how to provide IT policy management, definition, and distributed regulation

2) improved service definitions as an extension of workloads/payloads — this could be a subset of #1, but there is a “state change” when you go from modeled to deployed (e.g. instance immutability), and this is a critical point to understand

3) creating more reflective architectures — meaning creating connected autonomous systems that are regulated by #1 and run #2, with concrete well defined interfaces and policy enforcement controls (and where events can be triggered both inside and outside the system.)
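To make the three areas a little more concrete, here is a minimal sketch in Python. It is only an illustration, under my own assumptions, of how a declarative policy (1), a service definition with a modeled-to-deployed state change (2), and a reflective, event-driven enforcement point (3) might fit together; every class name and field below is invented for the sketch, not taken from an existing product.

```python
# A hypothetical sketch only: a declarative policy (1), a service definition
# with a modeled-to-deployed state change (2), and a reflective, event-driven
# enforcement point (3). All names and fields are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Policy:                        # area 1: policy definition / distributed regulation
    name: str
    rule: Callable[[Dict], bool]     # evaluated against a service's runtime facts

@dataclass
class ServiceModel:                  # area 2: the "modeled" form, still editable
    name: str
    workload: str
    attributes: Dict[str, str] = field(default_factory=dict)

    def deploy(self) -> "DeployedInstance":
        # The state change: once deployed, the instance is treated as immutable.
        return DeployedInstance(self.name, self.workload, dict(self.attributes))

@dataclass(frozen=True)
class DeployedInstance:              # the post-deployment, immutable form
    name: str
    workload: str
    attributes: Dict[str, str]

class AutonomousSystem:              # area 3: a reflective, policy-regulated unit
    def __init__(self, policies: List[Policy]):
        self.policies = policies

    def on_event(self, instance: DeployedInstance, facts: Dict) -> List[str]:
        # Events may be triggered inside or outside the system; either way,
        # every policy is evaluated at this well-defined interface.
        return [p.name for p in self.policies if not p.rule(facts)]

# Usage: one latency policy regulating one deployed service.
latency_ok = Policy("latency_under_200ms", lambda f: f.get("latency_ms", 0) < 200)
svc = ServiceModel("billing", "web", {"tier": "gold"}).deploy()
print(AutonomousSystem([latency_ok]).on_event(svc, {"latency_ms": 350}))
# ['latency_under_200ms']
```

The detail worth noticing is the frozen deployed instance: once the modeled definition becomes a running instance, regulation happens against observed facts at the interface rather than by mutating the definition.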

Dealing with these issues is not easy, especially in multi-vendor environments. It requires a strong partnership between vendor and user/customer.

More on this in the coming week, as well as my take on how to actually build solutions that incorporate these areas.

The Future of Cloud Data Center Networking

Jason Corollary: “Impossible” exists only because we haven’t stated or re-factored the problem so it is “possible.”

I have thought for the last few years that in virtualized, higher-density data centers (a PDF link) there has to be a better solution than MAC and IP.

IP is challenged by having location embedded in the address. That’s not good for a world that wants to move “cloudy” workloads across compute nodes, PODs, and data centers. One solution I’ve seen is late binding at the IP layer, but then session state becomes tied to that binding and cannot migrate without loss. And let’s not mention the oversubscription issues happening in most of these topologies today.

I’ve also (see my earlier posts) never liked the complexity of managing switches. With SDN we could basically take a switch out of the box, configure a couple of things, and be good to go. Companies are now linking virtualization management with switch management, which deals with some of the complexity, but route configuration in anything other than a “flat” layer 2 network is still managed by admins.

Today I enjoyed listening to a talk about PortLand (targeting 100,000 nodes and 1M VMs, with full bandwidth to each node), a UCSD project (with a great iTunes U talk) to provide a self-managing, scale-out layer 2 network design. They build in parallel with some of the work happening in TRILL and elsewhere around these issues. But I like their design a bit better: don’t assume a global host address space, and assume a multi-rooted tree (see corollary above). It also assumes a level of backwards compatibility, which is critical for widespread adoption.

They have created a simple protocol that seems to map well to the modular world of the future data center: the Location Discovery Protocol, implemented in the rather simple layer 2 switching fabric. There’s also a fabric manager, necessary to maintain some level of backwards compatibility and to manage the MAC mappings without changing protocols. Forwarding in the topology is done completely at this “pseudo” MAC layer. It is created dynamically and hierarchically, addressing the global (layer 2) addressing and memory constraints that some of the other solutions in this space seem to be facing.
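As a toy illustration of the pseudo-MAC idea, here is a short Python sketch that packs location hierarchically into a 48-bit address so forwarding decisions can be made from the address alone. The pod/position/port/vmid layout and field widths are an assumption of this sketch, based on how the PortLand scheme is commonly described, not a restatement of the actual protocol.

```python
# Toy pseudo-MAC encode/decode: location lives in the address itself.
# Field widths (16/8/8/16 bits for pod/position/port/vmid) are an assumption
# of this sketch, not a specification.
def encode_pmac(pod: int, position: int, port: int, vmid: int) -> str:
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ":".join(f"{b:02x}" for b in value.to_bytes(6, "big"))

def decode_pmac(pmac: str) -> dict:
    value = int(pmac.replace(":", ""), 16)
    return {
        "pod": (value >> 32) & 0xFFFF,
        "position": (value >> 24) & 0xFF,
        "port": (value >> 16) & 0xFF,
        "vmid": value & 0xFFFF,
    }

# A VM on pod 3, edge switch position 1, switch port 7, VM id 42:
pmac = encode_pmac(pod=3, position=1, port=7, vmid=42)
print(pmac)               # 00:03:01:07:00:2a
print(decode_pmac(pmac))  # {'pod': 3, 'position': 1, 'port': 7, 'vmid': 42}
```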

Looks promising!! I would like to see more work around latency, without the assumption of internet-connected services. The PortLand work does address dynamic hierarchy; if more intelligent proximity-based (or P2P) data stores could be used, it might address latency within a POD vs. outside it, since all of that is encoded at the pMAC level.

I even love some of the “self-integrity” monitoring by neighboring switches, like what we thought about for Project OpenSolaris DSC. I wonder how we can help them with their fabric manager?? Hmm…thinking…

What You Don’t Want — the Cloud and Cost to Deliver

Providing cloud computing at scale is about economics. The delivered services have a narrow range of margin over the first 2-3 years, as is to be expected when you outlay cash for infrastructure, design, implementation, etc. Hopefully after year 2 you achieve positive margin and start to pay yourself back. This works well in an environment that meets the cloud’s perceived “80/20” rule, where you’re meeting 80% of the needs of 80% of the customers out there. The economics and time to market should help offset the remaining 20% of need, and customers can shop elsewhere for the “doesn’t fit here” remainder.
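A quick back-of-the-envelope sketch of that payback curve, with entirely made-up numbers (none of these figures come from a real deployment), shows why the early years are the risky ones:

```python
# Back-of-the-envelope payback math with hypothetical numbers only.
capex = 2_000_000                                                    # upfront build-out
revenue = [2_000_000, 3_000_000, 4_500_000, 6_000_000, 7_000_000]   # years 1-5
op_margin = [0.00, 0.05, 0.15, 0.20, 0.22]                           # narrow early, wider later

recovered = 0.0
for year, (rev, m) in enumerate(zip(revenue, op_margin), start=1):
    recovered += rev * m
    status = "paid back" if recovered >= capex else f"${capex - recovered:,.0f} to go"
    print(f"Year {year}: cumulative margin ${recovered:,.0f} ({status})")
# With these assumed numbers, margin turns positive in year 2 but the build-out
# is not fully recovered until year 4; anything that raises cost to deliver
# pushes that point further out.
```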

With that said, how does one manage the cloud delivery process? Hosting companies have traditionally done customized negotiations, one-offs, and custom designs for clients, depending on how large the opportunity is. Many hosting companies are getting into the cloud game, e.g. AT&T, Telstra, and Terremark. How does one balance custom requirements with the cloud? When does it make sense to deviate from the 80/20 rule? Or when does it make sense to sell 10-20% of your capacity to one customer or workload?

This can be a difficult decision. There are many factors. It plays to standardization and service management. It requires careful analysis on who else might want that feature and when.

I theorize that a significant change to the “cloud” services to provide a specific new feature for a customer or two will likely result in changing the cost of service for the entire platform in a bad way. The deviations need to be managed and well thought-out.

There will be an effect on cost to deliver: can your pricing sustain it? Can you pass that cost on to the new customer without affecting overall delivery margin? Maybe the addition of this new feature or platform will help drive adoption of your cloud, in which case it may well offset the overall change in cost. Can you do it without adding complexity? The “zen of cloud” would almost state that you must.

Hopefully working through these questions will help you avoid this…

[Image: whatyoudontwant_img.jpg]

What’s an example of a significant change? Let’s say your definition of a SMALL or LARGE compute service doesn’t fit a customer’s definition. They want something that is medium plus more memory. That’s OK, but if you spec’d your PODs (points of delivery) with re-stacking workloads in mind (and you should), you may now find yourself with spare capacity on a node that you cannot utilize.
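A toy packing example makes the stranded-capacity point. The node size and instance sizes below are hypothetical:

```python
# A toy packing example; node size and instance sizes are hypothetical.
NODE_RAM_GB = 64
STANDARD = {"small": 4, "medium": 8, "large": 16}   # assumed standard sizes, GB of RAM
CUSTOM = {"medium+": 12}                            # the customer's "medium plus more memory"

def stranded_per_node(instance_gb: int, node_gb: int = NODE_RAM_GB) -> int:
    # Fill the node with as many instances of this one size as fit; the
    # remainder is capacity you may never be able to sell as a standard unit.
    return node_gb % instance_gb

for name, size in {**STANDARD, **CUSTOM}.items():
    print(f"{name:8s} ({size:2d} GB): {stranded_per_node(size)} GB stranded per node")
# small, medium, and large divide the 64 GB node evenly; the 12 GB custom size
# strands 4 GB on every node it fills, a cost that quietly spreads across the platform.
```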

IT, Complexity, and Clouds Redux

Update:

I listened to Dr. Paul Borrill’s (ex-Sun DE and VP, Veritas CTO) talk at Stanford on iTunes the other day. I loved it. My favorite statement was TIME = CHANGE. It’s exactly in line with much of my thinking and observations after 15+ years of life in IT. I and some other smart people at Sun (now Oracle) put some of this to “code” with DSC (Dynamic Service Containers) a while back. But listening to his talk inspired me to do some more thinking in this space. I’m reposting this here; it was on my other blog a couple of years ago or so.

I also read Lori MacVittie’s post on Apathy vs Architecture — it highlights some other aspects of why we are here and why we must work harder to solve the hard problems.

More to come but here’s some background…

I’ve been doing some thinking lately around the cloud model and how enterprises might adopt it. Enterprises are challenged with a conflict between giving their developers control and choices, and maintaining operational control. Case in point: ownership of SLAs is often with the operations/administration org, not the developer. The developer in many cases is hoping that most of the “systemic qualities” will appear within the platform and not require lots of development time. An interesting example of improvement in this space is the Shoal project around GlassFish.

One of my employees is working on some modeling projects, trying to model the data center “as is” vs. deriving the model from a “perfect” state where choices are somewhat removed from the scenario. What I mean is that the data center is architected in specific ways that allow or disallow certain functionality. You see this in very large sites like Google and Yahoo: they have several major architecture patterns, and many or most services conform to those patterns. You want to deploy? You conform.
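A minimal sketch of that “you conform” gate might look like the following; the pattern names and required components are invented purely for illustration:

```python
# A hypothetical conformance gate: a deployment is accepted only if it matches
# one of a small set of approved architecture patterns.
from typing import Optional, Set

APPROVED_PATTERNS = {
    "three_tier_web": {"lb", "stateless_app", "replicated_db"},
    "batch_analytics": {"scheduler", "worker_pool", "object_store"},
}

def conforming_pattern(components: Set[str]) -> Optional[str]:
    """Return the first approved pattern this deployment request satisfies."""
    for name, required in APPROVED_PATTERNS.items():
        if required <= components:        # request must include every required component
            return name
    return None

request = {"lb", "stateless_app", "replicated_db", "cache"}
print(conforming_pattern(request) or "rejected: does not conform to an approved pattern")
# -> three_tier_web
```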

This battle is often uphill. The last 20% of a solution is where you spend the most time, convincing others of the design or that “good enough” will trump perfect. But I think we need to get over that; we can’t afford not to.

Graffiti is a good example. Handwriting recognition was very hard; companies failed trying to figure it out. Did they constrain the problem (and thus the solution) enough to progress to something that works without a whole bunch of “change”? Jeff got it right: fix the few letters that cause the problem (i vs. L) and constrain the problem. He found a solution. We’ve gotten a bit more flexible today, but it’s still the core thinking in the industry.

What problems can we solve today if we limit the choices, give a way a little control, and are able to take technology to the next level?

UPDATE: Forgot the Jason Corollary: “Impossible” exists only because we haven’t stated or re-factored the problem so it is “possible.”

Requirements for a real-time cloud marketplace

There’s been lots of press (AWS’s Spot Instances, Zimory, here’s one from Vinton Cerf) the last few weeks about the cloud moving towards becoming an interconnected set of compute/data-processing infrastructure. One step towards that is certainly some level of interoperability to deploy apps, e.g. what RightScale essentially does today, or libraries like libcloud that provide a consistent interface to many clouds/providers.

I sat down about 6 months back with Lou Springer to discuss the idea of a cloud marketplace, and we came up with a few items that would need to be addressed (a rough sketch of how a workload listing might capture them follows the list):

–workload transportability — it goes beyond encapsulation in images to a description language for a workload’s relationships to data and “data physics”

–network transportability — our DNS-based way of doing things on the net has been pretty broken for a while. One approach is to provide something almost like an escrow service that handles service delivery addressing. Payload size is also an issue (see above)

–workload rating/pricing — all apps have a time to live. How do I price my workload? What access does it need? What are the constraints on this model?

–capacity management — how do I know what capacity I have in order to provide a price?

–run-time permissions/revocation — at some point I will want to ensure that stray workloads are no longer run by providers I no longer want running them.

–provider indemnification — in some cases providers would rather remain largely unaware of what is running. Sure, network ports and the like are fair game, but some providers may not want to be able to run reports against your workload. Is there a model that allows us to encrypt “everything” and still provide the other values above?
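Here is a rough sketch of how those items might be captured in a single workload listing, written as a Python data structure. Every field name here is my own invention for illustration, not an agreed marketplace schema:

```python
# Hypothetical marketplace listing covering the items above: transportability,
# addressing, pricing, capacity, revocation, and provider indemnification.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class WorkloadListing:
    image_ref: str                       # the encapsulated workload image
    data_relationships: List[str]        # "data physics": what data it must sit near
    service_address: str                 # escrowed/indirect service delivery address
    max_price_per_hour: float            # rating/pricing constraint
    time_to_live_hours: int              # all apps have a time to live
    required_capacity: Dict[str, int]    # e.g. {"vcpu": 8, "ram_gb": 32}
    revocable: bool = True               # run-time permissions can be withdrawn
    encrypted_payload: bool = True       # provider sees ports, not contents
    excluded_providers: List[str] = field(default_factory=list)

listing = WorkloadListing(
    image_ref="registry.example/app:1.4",
    data_relationships=["orders_db: <5 ms", "reports_bucket: same-region"],
    service_address="escrow://svc/orders-api",
    max_price_per_hour=0.90,
    time_to_live_hours=72,
    required_capacity={"vcpu": 8, "ram_gb": 32},
)
print(listing.max_price_per_hour, listing.required_capacity)
```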

I’m sure there’s more. What’s missing from your cloud marketplace?

Part 3 – Cloud’s Impact on Data Center Architectures

This is part 3 of my work in progress that I hope to release in more detail as a paper! Find Part 1 and Part 2 here.

IT Architecture and the Business

It is often said that architecture is a study in tradeoffs. Perhaps it’s best to look at this architecture problem from the perspective of three different types of customers (or divisions, organizations, projects, etc.). These groups are not holistic in nature; every company has parts of its business in each category.

[Image: cloudeffect_usersjpg_crop.jpg]

All of these enterprises provide value to customers. The first provides value through its external connections and “systemness.” The second wants to build connections to what it has. The third is perhaps shrinking in market share and needs to rapidly redefine its cost structures. They share similar characteristics at times, but they also differ. The quickly growing ones often cannot use “off the shelf” technologies because those don’t exist at the necessary scale, are too expensive, etc.

The second is often looking at legacy data and providing it in new and exciting formats to create additional business value for itself by enabling others. It may have invested in large-scale databases at centralized facilities. Perhaps it wants to move towards real-time analytics and provide these functions across the world.

The third may more readily embrace internal consolidation strategies or public cloud strategies. Optimization for this type of business means quickly reducing complexity and increasing operational optics: determining what is core and what isn’t, and devising a strategy to deal with each accordingly. This category is constrained not only by cost but generally by the ability to go beyond a single IT platform layer.

These businesses must all make decisions about where to invest and where not to. How do you leverage what provides competitive advantage vs. something others already have? By increasing connections, by increasing value through making what you have available to more audiences, or by doubling down on what you need to do to stay in business?
