Challenges to DevOps and How to Resolve Them (2018)
The collaborative foundation of DevOps demands positive and well-intentioned communication. Defining rules of engagement that satisfy this expectation equips each team member for success. Knowing that communication underlies and perpetuates all aspects of DevOps encourages team members toward effective communication.
Each individual, whether DBA or DevOps team member, must take personal responsibility for the success of this talent merger. Years of resentment and uncooperativeness have brought team division to new heights. The cultural position of DevOps brings opportunities to bridge the divide and build true team partnerships.
Because DevOps continues to gain momentum, DBAs might need to come around a bit further than the already engaged DevOps players.
As with any movement or incipient technology framework, new nomenclature develops that takes time to learn and understand. Existing DevOps team members need to educate DBAs on terminology as much as practical DevOps techniques and tools.
After becoming familiar with the DevOps approach and pertinent processes and tools, DBAs introduce database practice experience to expand perceptions of data protection, schema management, data transformation, and database build best practices.
DBAs who meld database management approaches into DevOps practices that are aligned with shared goals are successful only if the DevOps team members understand DBA methods and can see the value brought to the overall DevOps model.
Rules of Engagement
Guidelines are important for communicating and working effectively because new collaborative frameworks pop up every few years under different names and with different bullet points. They all have the same purpose: to respect each person and the value he or she offers.
As a United States Army veteran, a term such as rules of engagement resonates. Aligned with the DevOps principles, here is an easily understandable set of guardrails to keep us all communicating and operating efficiently:
• Goal alignment: Have a collaborative approach among team members who agree on common goals and incentives: strive to harvest excellent software products hosted on sustainable and stable infrastructures while continuously improving processes, automation, and cycle time.
• Deliverable co-responsibility: No single actor should be allowed to dominate or distort the principles, direction, or team accountability and actions, thus safeguarding DevOps guidelines and the Agile self-forming team concept.
• Speak to the outcomes: Require constant and consistent verbal communications for expeditious task coordination and execution, matched by effective and timely decisions to drive expected outcomes.
• Change adaptation: Accept business and customer fluidity as product requirement drivers; letting go of traditional project management strategies better guarantees project success.
• Give the benefit of the doubt: Grant people grace, and trust that their intentions are good and intended for the team’s benefit. Embrace the possibility that you may be the person causing team tension and then stop doing so.
Continuous may be the most frequently heard word in DevOps conversations. Here’s why:
• flow: Work is always progressing and driven by automation, having value deliberately built in at a sustainable pace. Several Agile methods specifically limit the amount of work that can be in process at the same time.
Limiting work in progress grants focus and time to properly construct the product and product testing, which leads to better outcomes without overpressuring the staff.
• build: Build tests and code, preferably in that order. With QA shifting to the development stage, code with fewer defects can be created at a lower total cost of ownership.
• integration: Combine new or changed application code with the core product through extensive functional and performance testing, and correct defects immediately to produce the next product version.
• delivery: Ensure that the software product is positioned at all times for production release or deployment, encapsulating the building, testing, and integration processes. Successful integration produces the deliverable, making continuous delivery a product state, not a process.
• deployment: Where applicable, production deployments should occur as soon as the product is ready for release after integration (this is less likely for legacy environments).
• feedback: There should be persistent communications concerning the product quality, performance, and functionality intending to find, report, and fix bugs faster or to correct performance earlier in the pipeline. Commit to the “shift-left” concept.
• improvement: Apply lean principles to eliminate waste, build in value, reduce defects, and shorten cycle time to improve product quality. Team members should take time to reflect on completed projects or sprints to increase productivity by staking claim to value-adding tasks and shedding inefficiencies and unproductive steps.
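The work-in-progress limit mentioned under continuous flow can be sketched in a few lines. This is an illustrative model only; the Board class and task names are assumptions, not taken from any specific tool:

```python
# Toy model of a WIP limit: a board refuses new work once the cap is hit,
# granting the team focus to finish what is already in flight.

class Board:
    def __init__(self, wip_limit):
        self.wip_limit = wip_limit
        self.in_progress = []

    def start(self, task):
        """Start a task only if the WIP limit allows it."""
        if len(self.in_progress) >= self.wip_limit:
            return False  # finish something before starting more
        self.in_progress.append(task)
        return True

board = Board(wip_limit=2)
started = [board.start(t) for t in ("schema change", "api fix", "report job")]
# The third task is refused until capacity frees up.
```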
Depending on the tools used, product themes abound. There are chefs with recipes, cookbooks, kitchens, and supermarkets; a butler, puppets, blowfish, broccoli, maven, ant, and many other strange yet fun product names. Check out XebiaLabs’ Periodic Table of DevOps Tools.
Automation and Orchestration
Automation focuses on executing tasks quickly. Building a script to run a set of database change commands is automation.
Orchestration focuses on process or workflow. Building a series of steps to execute tasks in a defined order to produce an outcome is orchestration. Spinning up a virtual database host combines automation (the set of commands for each task) and orchestration to run the tasks logically.
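The distinction can be illustrated with a minimal sketch; the task names and helpers are hypothetical stand-ins for real provisioning calls, not an actual tool's API:

```python
# Automation executes a single task; orchestration runs tasks in a defined
# order to produce an outcome (here, a virtual database host).

def run_task(name, action):
    """Automation: execute one task quickly."""
    return f"{name}: {action()}"

def orchestrate(tasks):
    """Orchestration: run the tasks in their defined order."""
    return [run_task(name, action) for name, action in tasks]

# Each step is automation; the ordered sequence is orchestration.
steps = [
    ("provision_vm",  lambda: "vm created"),
    ("install_dbms",  lambda: "dbms installed"),
    ("create_db",     lambda: "database created"),
    ("register_node", lambda: "node registered"),
]
log = orchestrate(steps)
```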
Languages vary among DBAs. For example, application DBAs talk code execution efficiency, logical DBAs (aka data architects) talk about normal forms, and operational DBAs talk about performance.
Yet DBAs manage to keep databases humming along—most of the time. Although there are differences in DBA roles and responsibilities, the end game is database stability, performance, availability, security, and recoverability (to name just a handful). DevOps team members must understand the DBAs’ database protectiveness and self-preservation tendencies.
After spending long nights and weekends recovering from code deployments that took months to build and test, it makes less sense on the surface to reduce the time spent building and testing the software.
DevOps team members are challenged to shine a light on the new paradigm and emphasize that the speed is offset by fewer code changes, which improves the odds of a successful deployment.
Also let the DBAs know that as a DevOps team, failures cause all team members—including developers—to be all hands on deck. Now that it is in everyone’s best interest to implement change correctly, DBAs are no longer the only people pursuing self-preservation.
Language and Culture: More than the Spoken Tongue and Traditions
The IT world is diverse on many levels, which is great! I have learned much from working with people in the United States, but also in South Korea, West Germany (I still make the distinction because I was serving in West Germany when the Berlin Wall fell), and, for about a week, in Brazil. I have also learned things from people in other states, because diversity is needed.
As DBAs and DevOps team members come together, the differences add value. Think about it: if everyone on the team knows the same things, all but one person are redundant. People speaking different languages figure out how to communicate effectively, so DBAs and DevOps team members can do the same.
The difference is often perspective, which I have mentioned before: repetition reinforces ideas. DevOps is more a cultural shift for IT than a process shift. Sure, the tools and schedules are different, but those elements are easy to learn or adapt to; a culture shift requires time to digest the idea and bring everyone along.
Let’s take a look at the world of IT from different perspectives to begin to understand where DevOps is taking us all.
Resiliency versus Complexity
Resiliency describes the ability to sustain operations or to quickly restore operations when a failure occurs. For application systems with data stores, database clustering provides resiliency—the failure of one node does not reduce transactional throughput.
That happens when the cluster is built to withstand a single node failure, with the remaining nodes sized to maintain 100% capacity at mandated response times. A pool of web or application servers distributes the workload while improving resiliency because surviving nodes maintain operations when a node fails.
Resiliency can be scaled to meet financial considerations. Using the clustered database example, a single node loss could result in a 30% decrease in load capacity; mitigation must be preplanned to stop or deprioritize enough load to avoid impacting critical operations. For example, batch processing or reporting can be suspended until the system is back at full capacity.
DevOps provides an answer to the capacity problem if the database clustering can benefit from the host build template scripts. The loss of one node can be quickly offset by an automated build of a new node that can be introduced into the cluster. Furthermore, additional capacity can be activated when demand exceeds capacity.
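This self-healing pattern can be sketched as a toy model, under the assumption that a template-driven build can produce a replacement node; the Cluster class and node names are illustrative, not a real clustering API:

```python
# Toy model: when a node fails, an automated template build replaces it,
# keeping the cluster at its designed capacity.

def build_node(template="db-node-v1"):
    """Stand-in for the automated template build of a new node."""
    build_node.counter += 1
    return f"node{build_node.counter}({template})"
build_node.counter = 3  # three nodes already exist

class Cluster:
    def __init__(self, nodes):
        self.nodes = set(nodes)

    def fail(self, node):
        """A node failure triggers an automated rebuild to offset the loss."""
        self.nodes.discard(node)
        self.nodes.add(build_node())

cluster = Cluster({"node1(db-node-v1)", "node2(db-node-v1)", "node3(db-node-v1)"})
cluster.fail("node2(db-node-v1)")  # loss is offset by a freshly built node
```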
Resiliency from clustering and other high-availability solutions does have a drawback: complexity. Be sure to not increase complexity to an unsustainable level when designing critical systems.
Overly complex systems with tricky interdependencies that create situations in which maintenance and upgrades are postponed defeat the purpose of resiliency. Being resilient requires keeping pace with database upgrades and security patching to increase stability and prevent breaches or data theft.
Rolling upgrades and patches signal resiliency by demonstrating the capability to maintain continuous operations while simultaneously improving the platform. Extending this capability to be able to completely replace the database environment with an upgraded or different database altogether, and with a fallback plan in place to return to the previous platform, exemplifies resiliency.
DevOps brings about the opportunity to maintain resiliency with less complexity because the web, app, or database servers can be built in minutes or hours instead of the weeks or months it used to take to acquire and build servers. Virtualization is a major enabler of DevOps.
Simplifying architecture and application code can seem counterintuitive in real-life IT solutions design, yet it is still a smart move for the long run. True solutions design not only leads to the best possible product but also refrains from adding anything distracting to the product.
As DBAs and DevOps team members unite, they resolve to fight complexity with design elegance and minimalist tendencies and to prevent complexity from entangling DBA processes in ways that may harm pipeline efficiency. Excitement builds as expectations for simple, precise, and demonstrably improved business systems are realized from this joining of forces.
Packaging and Propagation
Thoughtful and well-planned database software build packaging and propagation can be used to maintain resiliency, as described previously, but it can also be used for on-demand capacity, multisite distributive processing, and maintenance of pipeline database consistency.
Packaging versioned releases for upgrade simplification must include database owner and other account privileges needed for distribution. Database installs in which an OS hook must be executed by an administrator account need to be scripted to pull needed credentials during execution. The scripting must also ensure that password information does not get written to the installation or audit logs.
The shift goes from lengthy and tedious manual installs or lightly automated installs to a completely automated build that can be done fast enough that IT has the agility to immediately respond to demand, not after weeks of struggling to keep a system running in overload mode.
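The credential-handling requirement above can be sketched as follows. This is a hedged illustration: the environment variable name and the scrub helper are assumptions standing in for a real secrets-manager integration:

```python
import os

# Pull the admin credential at execution time (e.g. injected by a secrets
# manager) instead of baking it into the package, and scrub it from any
# line that reaches the installation or audit logs.

def get_install_credential(var="DB_ADMIN_PASSWORD"):
    secret = os.environ.get(var)
    if not secret:
        raise RuntimeError(f"credential {var} not provided at execution time")
    return secret

def scrub(line, secret):
    """Mask the secret before a line is written to install/audit logs."""
    return line.replace(secret, "********")

os.environ["DB_ADMIN_PASSWORD"] = "s3cr3t"  # stand-in for a vault injection
pw = get_install_credential()
logged = scrub(f"running installer with password={pw}", pw)
```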
Structured and Unstructured
For decades, the relational database has been the database of choice, and large companies have invested millions in licensing and platforms. Without viable options, project data storage requirements landed in a relational database management system (RDBMS), regardless of the data structure or even the content.
More recently, many newer, viable database options are becoming mainstream, but it is still a hard sell to convince the upper echelon that additional investment is needed for another database ecosystem.
Even open-source databases come with staff support and hardware costs, or monthly DBaaS payments. Forcing data models into unsuitable databases deoptimizes solutions. From the start, performance is less than it would be if a better-fitting database engine managed the data.
Maturing DevOps organizations lean toward optimized solutions, making force-feeding data into a database unthinkable. Relational databases remain “top dogs” as databases of record for transactional data. As applications shift toward multiple database backends, services or APIs provide data call abstraction to maintain flow.
Unicorn companies start with very little cash flow, limiting the affordable scope of databases. Open-source databases enable individuals and small teams to build application software with a data store.
As these companies grew, the databases scaled to the point at which other companies took notice. When CIOs drive down IT costs, looking at alternative databases becomes a viable (and street-proven) option. DevOps leverages this learning, making it possible to store data in the database best suited for the content, pulling along cost-cutting options.
Audit reviews are a necessity when build automation replaces human control. DBAs who install software pay attention to the screen messages, responding to configuration questions and noting errors that need attention. The risk is that the same person might do a second install that is not exactly like the first.
Vendors have included automation scripts for years, but platform differences still happen. DevOps automation is meant to build the complete platform without a person making decisions because the decisions are built into the automation or gathered before automation execution.
A developer requesting a new web server should need to provide only primitive inputs up front—OS, web server brand, and a few sizing parameters—before the automation kicks off.
There are legitimate reasons to pause automation, but asking for more information should not be one of them. As mentioned, automation is task-based, so stopping the orchestration is more likely. Automation and orchestration need to generate audit trails.
True to DevOps, audit log checkout should be automated because no DBA or DevOps team member wants to review pages and pages of audit information. Learning which error codes or other failures to search for tightens the noose around inconsistency. More importantly, governing bodies require documentation for application changes, which makes the audit log that evidence.
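An automated audit-log checkout along these lines can be sketched simply; the error codes and log lines below are assumptions for illustration, not from any particular product:

```python
# Scan the install/audit log for the error codes the team has learned to
# watch for, instead of paging through the whole log by hand.

WATCHED = ("ORA-", "ERROR", "FATAL")  # illustrative codes to search for

def checkout(log_lines):
    """Return only the lines a human needs to review."""
    return [ln for ln in log_lines if any(code in ln for code in WATCHED)]

sample_log = [
    "2018-06-01 10:00:01 install step 1 complete",
    "2018-06-01 10:00:07 ORA-01017: invalid username/password",
    "2018-06-01 10:00:09 install step 2 complete",
]
findings = checkout(sample_log)  # only the failure surfaces for review
```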
Repeatability of tests or builds improves the efficiency of code, and infrastructure as code, along with the full continuous delivery pipeline. Being able to build servers quickly allows developers to experiment with different code techniques or operations to build capacity on demand.
DBAs are used to being responsible for database builds, so it may take a little time for them to get used to the idea of developers building and destroying databases at will.
DBAs can instead create templates for the way databases are built, which seems like a better deal. Limiting the numbers of unique database software installs and database builds has advantages. The code should execute exactly the same within a version. Troubleshooting narrows from having fewer possible variables.
Once a problem is found, knowing where to apply the fix is easy. When testing a change, the way the database executes the change should be consistent on like architectures.
As much as possible, the nonproduction environment should mirror production, decreasing the chance of change failure caused by underlying configuration differences. Build repeatability is a win for developers, DBAs, and DevOps team members.
Nothing causes a puckering posture more than a potential data breach. On the scale of security threat mitigation, preventing data breaches sits at or near the top. Partnering with the information security team, DBAs play an inherent role in data protection. DBAs, as custodians of the corporate data assets, consider security a key deliverable.
Although database access comes in many forms, in all cases access should be granted only after authentication, and each access needs to meet audit requirements. Authentication can be granted by the database, application, or single sign-on protocol. Each authentication must be logged for auditing.
Each access, whether as a user request, job initiation, or integration interface, should be uniquely identifiable for auditing. How the auditing is performed is less important than the auditing being done.
The auditing may be controlled within the database by using a built-in feature or with application code that writes the audit information to a table or file. Importantly, DBAs should not be able to alter the data once the audit record is created, which protects the information from less-scrupulous DBAs.
Data encryption protects data at rest, including data stored in the database or stored as files. Many database products offer encryption, though it may be easier to use storage-based encryption, which covers the database and file data.
At a minimum, Social Security numbers (SSNs), credit card numbers, personal health information, and other sensitive data elements must be protected, which should already be done where compliance with governance requirements such as SOX, HIPAA, PCI-DSS, and more are enforced and audited.
Secure SSL protects data in transit, to and from the database to the application tier or end-user device. Preventing “on the wire” breaches is nearly impossible, but at least it should be challenging for the data to be interpreted.
Developers do consider security and at times write code to implement data protection or data hiding; for example, not allowing application users to see full SSNs (just the last four digits) when the user’s role does not require knowing the full SSN.
Developers may also code in calls to encryption functions or packages to obfuscate data elements. Storage encryption solutions are usually easier to manage and provide full data coverage, but not all organizations scale to the level at which the cost can be justified.
DevOps automation and orchestration should include security implementations. Configuring SSL and installing certificates should be automated.
Creating service accounts needed for application access to the database should be automated. Disabling FTP and Telnet on the host should be automated. Each of these automation pieces is collected for orchestration.
Computers continue to increase in processing power (more importantly, in transactional throughput), which allows more work to be done in less time. No matter how fast computers become, overhead work always reduces the optimal ceiling.
Work minimization improves optimization. Lean methodologies drive out unnecessary work to improve process times and reduce waste and cost. IT shops are learning from lean methodologies, DevOps being one representative model.
Execution plans define how the database engine decides to retrieve, sort, and present the requested information or how to modify the data per instruction.
Optimizers do a terrific job building execution plans, although misses still occur. If a query is performing poorly, the execution plan should be an early check in the troubleshooting process.
DBAs must interrogate the execution plan to determine appropriateness, which requires experience. Developers make great partners when checking execution plans; they are capable of interpreting the plan in light of what the code was built to do.
Code consistency matters for some database engine implementations. During the process of building execution plans, these databases interpret uppercase and lowercase letters as different, making a simple one-character difference appear to be a completely different statement. Keeping code consistent increases the reusability of plans already stored in the cache.
Using replaceable variables may also help optimize cached statement use. As DBAs integrate into DevOps teams, ensuring that solid code practices are in place to ease the database load is a step in the right direction.
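Both practices can be seen in a short sketch using Python's DB-API, with SQLite standing in for any engine that caches statements; the table and query are illustrative:

```python
import sqlite3

# Keep statement text byte-for-byte consistent and use bind variables
# (placeholders) so the engine can match and reuse one cached statement
# instead of hard-parsing a new string variant on every call.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "OPEN"), (2, "CLOSED"), (3, "OPEN")])

# One consistent statement reused with different bind values: the database
# sees a single statement, not several casing or literal variants.
FIND_BY_STATUS = "SELECT id FROM orders WHERE status = ? ORDER BY id"
open_ids = [row[0] for row in conn.execute(FIND_BY_STATUS, ("OPEN",))]
closed_ids = [row[0] for row in conn.execute(FIND_BY_STATUS, ("CLOSED",))]
```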
“Hidden” predicates can make evaluating code and execution plans more challenging; just consider the possibility when the execution plan seems reasonable while performance lags. Security implementations may be the culprit, and one might expect the “secrets” to not be revealed.
An easy test to determine whether hidden predicates used by Oracle’s Virtual Private Database (VPD) are in play is simply to run the statement using an account with more authority.
Improved performance indicates the need to check for additional predicates. You may have to use a tool from a performance products vendor to find the predicates. Once discovered, improving performance may be as easy as elevating account privileges or executing with an account with more authority.
Sometimes reworking the code does not lead to enough performance improvement, making the privileges decision the fix. Also, if you know that something like VPD is implemented, and jobs and reports suddenly take a two-, three-, or fourfold (or more) dive in performance while the database was not changed, check account security; it is not beyond the realm of possibility that a security job was run to correct perceived audit discrepancies.
Optimized code sheds unneeded work and data touches; the latter is critical to result set size, and to reporting and ETL processes in the batch context.
Selective predicates—the where clause statements—reduce execution effort and time while also lessening the burden on the database as a whole. DBAs understand, and developers and DevOps team members need to learn, that each segment of work contributes to the overall database load. Therefore, anything that can be done to reduce work at the statement level benefits all database transactions.
Leverage indexes for improved performance. Performance drags when large data scans are performed unnecessarily, making index selection critical. Whether an index was not considered as the code was built and implemented, or the statement was written so the optimizer decided that no existing index met the execution needs, performance suffers.
Today’s computing power and high-performing database engines contribute to response times in the low milliseconds for simple transactional reads and writes, meaning that DBAs should seriously question response times of a second or longer.
Kernel configuration undergirds databases and applications, ensuring resource availability. DBAs who lack kernel-tuning experience are missing an opportunity to take full advantage of the underlying hardware and OS. Because DBaaS solutions come preconfigured, kernel configuration and tuning remain in the hands of the provider.
Otherwise, DBAs should work in tandem with SAs to monitor and tweak the kernel for better performance or go with a PaaS solution for more control over the database configuration, at the cost of increased maintenance overhead.
Network configuration is usually not a high-priority performance differentiator; it becomes a concern only when huge data sets have to be transferred over the network.
Even then, the primary focus is outside of the databases, requiring OS and/or network configuration tuning. Common modifications include increasing the maximum transmission unit (MTU) to pack more data into each network packet, or (when available) using “jumbo” packets that are dependent on platform options. Either way, the change needs to be done at both endpoints.
Data movement impacts performance based on volume. Remember that networks cannot get faster, only bigger. They are capable of moving more data at the same speed, but the amount of data that needs to be moved directly impacts the time needed for the move.
The larger data sets tend to be between the database and the application servers. Latency increases as the distance between point A and B increases, extending the time needed to move the data.
An easy test: place one application server in the data centre that hosts the database and another application server in a location geographically distanced from the database location.
Test data pulls of increasing size until the data move duration becomes apparent. Then consider that impact spread across thousands of customers. Even if the distance is not a concern, it remains a wise decision to limit the data volume because client machines possess varying network-traffic processing capabilities.
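A rough model of the volume-and-latency argument above: bandwidth caps throughput, so transfer time grows with data volume, while latency adds a per-round-trip cost that grows with distance. The numbers are illustrative assumptions, not measurements:

```python
# Simple transfer-time model: time = volume / bandwidth + round trips * latency.

def transfer_seconds(data_mb, bandwidth_mb_per_s, latency_ms, round_trips=1):
    return data_mb / bandwidth_mb_per_s + round_trips * latency_ms / 1000.0

same_dc = transfer_seconds(500, 100, 0.5)  # app server beside the database
remote = transfer_seconds(500, 100, 40)    # geographically distant app server
# The gap widens further for chatty workloads with many round trips.
```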
Virtualization has improved server resource usage and facilitated data centre consolidation through increased compute density per floor tile. DBAs need to ensure that the assigned virtual resources are “locked” so other guests cannot “steal” resources.
Resource reallocation generally helps to balance loads, and it produces excellent results in most cases. Databases are one exception because they do not play well with other kids in the same sandbox.
Just for fun, test the scenario in which a guest steals memory from the database guest. Nothing says “horrendous performance” faster than the database cache being swapped in and out of memory!
Being able to transfer information in your head to someone else’s head should be a required skill for all team members because one Agile precept (extended to DevOps) states that each team member should be able to perform all the team’s functions.
For DBAs, that implies that you are unlikely to be the only person creating automated database change scripts. Instead, you could be reviewing code and looking at audit files to improve automation execution.
Your DevOps teammates have the responsibility of making you a full-fledged team member. On the flipside, DBAs must teach DevOps team members how to manage database changes to support the development pipeline. This knowledge sharing is a great thing, especially if you want to ever take an uninterrupted vacation.
Teaching can and should be done formally and informally. Formal teaching requires planning, topic definition, and preparation to ensure that the information to be conveyed happens successfully.
Informal teaching can be done by sitting next to a team member (similar to extreme programming paired programming) and working through a database change or writing an automation script. Informal teaching includes talking with teammates while having lunch together or when gathered at the team’s favourite after-work watering hole.
Sharing knowledge within teams is step 1. Self-forming teams are a key Agile and DevOps element, but self-forming teams do not imply forever teams. As products and demands change, teams eventually disperse and reform differently, ready to complete new work. Team redistribution leads to knowledge distribution.
Something you taught to one team can now be shared within other teams, extending your impact and making teams more effective, while expanding the organizational knowledge base.
Personal development should not be replaced by team training; instead, personal development should inject fresh ideas and skills. Too often, people attend training only for their primary technology skill:
Java programmers take an advanced Java class, or DBAs take a backup and recovery class for the database platform they support. My approach has always considered three perspectives that I believe fit the DevOps model of shared work.
• Core technology or skill: Deepening your core skill, with the intent of becoming an “expert.”
• Aligned technology or skill: Expanding your sphere of impact by adding complementary skills, such as surrounding technologies.
• General or soft skills: Communications, leadership, time management, and business understanding.
After the code has been implemented, the final step in the pre-DevOps model is usually Operations team members figuring out how to implement backups, monitoring, batch processing, reporting, and more.
DevOps makes it feasible to gather the operational information earlier in the process, which allows automation to handle much of the operationalization. For example:
• Backups: Backup software or agents can be installed and configured during the server build, including setting the schedule.
• Monitoring: Like backups, software or agents can be installed and configured, and registered to the administrative or master console, including baseline performance settings.
• Scheduling batch and report jobs: Load management pertains to distributing background work across the day interval to not impact transactional systems while completing batch and report work. Scheduling can be automated, even with load protections to delay execution for a prescribed time if the system load is high.
• Capacity management: Not annual growth predictions, but real-time activity monitoring that provides opportunities to take proactive steps to add capacity on demand, or at least to plan for capacity to be added soon. Adding a fifth application server in real time to a four-node app server farm quickly provides 25% more capacity, provided you have the needed server build automation in place.
Once the server is built and ready for traffic, a quick update to the load balancer can be made to start directing traffic to the expansion server. Imagine being able to upgrade the entire farm by building replacement servers with higher transaction throughput and then making changes at the load balancer to insert the new servers and delete the old servers.
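The scale-out flow just described can be sketched as follows; the LoadBalancer class and server names are illustrative stand-ins, not a real provisioning or load-balancer API:

```python
# Build an expansion server from a template, then update the load balancer
# pool to start directing traffic to it. The same add/remove pair supports
# rolling replacement of an entire farm.

class LoadBalancer:
    def __init__(self, pool):
        self.pool = list(pool)

    def add(self, server):
        self.pool.append(server)      # start directing traffic to it

    def remove(self, server):
        self.pool.remove(server)      # drain and retire an old server

def build_from_template(name, template="appserver-v2"):
    """Stand-in for the automated server build."""
    return f"{name}({template})"

lb = LoadBalancer([f"app{i}(appserver-v1)" for i in range(1, 5)])
lb.add(build_from_template("app5"))   # fifth node: 25% more capacity
```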
DevOps automation opens new windows to improve operational performance and resiliency with real-time capacity management. Capacity management needs to consider more than database growth; instead, it should encompass the full IT supply chain, up- and downstream.
Resiliency is the capability of a system to continue to function or recover quickly from failure. Resiliency designed and baked into an application architecture, resulting in a high-availability infrastructure able to tolerate single device failures, affords strong continuous business capability.
A three-node server cluster built with enough horsepower that a single node failure can be absorbed by the two remaining servers without performance degradation demonstrates resiliency.
Failover is a methodology for moving a failed or significantly impaired production environment onto another similar system, usually located in close proximity to the primary system. One caution comes in the form of the statement “failover to DR,” which may not mean exactly what is stated.
Cost and complexity decisions weighed against business needs may lead to investments in like systems or smaller investments to provide a portion of the transactional capability of the primary systems as a stopgap until the main production platform can be operationally restored.
The transition involves redirecting all computers communicating to the failed system, which may simply be updating a few entries in a load balancer or be a complicated and tedious effort to manually point each interfacing system to the temporary production environment, having to repeat the same effort to fall back to normal operations.
Recovery dictates backup requirements. Many DBAs ask, “How should I back up an X TB database?” The question should actually be this: “How should I back up an X TB database when the business demands a 2-hour recovery window?” Understand that the recovery requirements drive the backup solution.
With DBaaS, the recovery requirement needs to be settled as a deliverable in the SLA. An in-house backup may be to disk or to a virtual tape solution (disk) that is capable of recovering an X TB database in under 2 hours.
Because the business wants the recovery to take no longer than 2 hours, the recovery must allow time to start the database and reconnect dependent systems before access is granted.
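As a rough sketch of that arithmetic (the throughput and startup figures below are assumptions, not vendor numbers), the restore time, database start time, and reconnection time must all fit inside the recovery window:

```python
# Back-of-the-envelope check: can the backup solution meet the business's
# recovery window? All figures are illustrative assumptions.

def meets_rto(db_size_tb, restore_tb_per_hour, db_start_hours,
              reconnect_hours, rto_hours):
    """Restore plus startup plus reconnection must fit the window."""
    restore_hours = db_size_tb / restore_tb_per_hour
    total = restore_hours + db_start_hours + reconnect_hours
    return total <= rto_hours

# A 10 TB database, disk-based restore at an assumed 8 TB/hour, with
# 15 minutes to start the database and 15 minutes to reconnect
# dependent systems, against a 2-hour window:
print(meets_rto(10, 8, 0.25, 0.25, 2.0))  # True (1.25 + 0.5 = 1.75 hours)
```

The same check makes it obvious why the recovery requirement, not the backup technology, must be settled first: double the database size and the same solution blows the window.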
Server failures without the aforementioned resiliency models in place to maintain operations are more complicated to recover. The whole system may have to be recovered from backup, or, with DevOps automation, the host environment could be rebuilt anew from a template package, followed by the database restore.
Disaster recovery is a program designed to protect the business from a catastrophic failure, most likely the destruction of a data centre.
This form of recovery must be specifically planned and exercised, with predetermined executives authorized to declare the event and open the chequebook to cover the costs of people, vendors, and computing resources needed to recover automated business operations in a geographically distanced data centre. DRaaS options are relatively new, albeit gaining respect and maturing quickly.
Business continuity is the business-side recovery process when disaster strikes, including an event requiring the disaster recovery program to be activated. Business continuity is more likely to be activated due to a natural disaster or civil unrest than the failure of the company’s data centre. Planning and exercising options lead to success.
Knowing how to operate the business during the crisis—civil, natural disaster, or technology unavailability—covers a much broader scope than the disaster recovery program.
DevOps automation, mentioned briefly here, brings a new and exciting option to the world of recovery. The capability to quickly generate new virtual hosts or full application host environments on demand presents the opportunity to improve recovery times. Database hosts can be rebuilt, but the database itself must be recovered.
More to the point, web and application servers built from predefined templates and install packages should be considered and tested for recovery-time comparison. Having the ability to build systems quickly as a recovery process frees traditional resources for other work.
If the current disaster recovery program includes replication between the primary and DR sites, consider stopping the replication of web and app servers, for example, and instead opt to build these servers on demand, potentially saving bandwidth costs. Ensure that the DR location has resources available for the automated build restores.
DevOps is an opportunity. Bringing together talented professionals to complete new missions by using new methods and tools facilitates business agility and growth while improving customer experience and developing IT team members.
Two obstacles—language and culture—can be easily overcome with frequent communication, the willingness to share experiences, and selfless knowledge sharing. The end game is to build great DevOps teams that are capable of delivering software and infrastructure better and faster than ever.
Adding DBAs to DevOps teams amps up team capabilities while making it possible to reduce risk by incorporating database builds, configurations, and changes into the Agile pipeline. This addition also turns a long-sidelined process outlier into just another automation to be included in the orchestration.
Making the IT Process Visible
I like to start any DevOps consulting activity with a value stream mapping exercise. The reason is quite simple: it is the most reliable exercise to align everyone in the organization and my team to what the IT process looks like.
You could look at the methodology page or some large Visio diagrams for the IT delivery process, but more often than not, reality has evolved away from those documented processes.
I have outlined the process to run such an exercise at the end of the blog section so that you can try this too. In short, you are bringing representatives from all parts of the organization together in a room to map out your current IT delivery process as it is being experienced by the people on the ground and, perhaps more importantly, to reveal areas within the system that can be improved.
I suggest that you engage an experienced facilitator or at least find someone unbiased to run the meeting for you.
Ideally, we want to be able to objectively measure the IT process in regard to throughput, cycle time, and quality. Unfortunately, this is often a work-intensive exercise.
Running a value stream mapping exercise every three to six months (depending on how quickly you change and improve things) will give you a good way to keep progress on the radar while investing just a few hours each time.
It will highlight your current process, the cycle time, and any quality concerns. You want to make the result of the exercise visible somewhere in your office, as that will help focus people on improving this process. It will act as a visible reminder that improving this process is important to the organization.
Once you have a good understanding of the high-level IT process and the areas that require improvement, you can then create the first roadmap for the transformation.
Creating the First Roadmap
Roadmaps are partly science and partly art. Many roadmaps look similar at the high level, yet on a more detailed level, no two people create the exact same roadmap.
The good news is that there is no one right answer for roadmaps anyway. In true Agile fashion, it is most important to understand the direction and to have some milestones for evaluating progress and making IT visible.
Many things will change over time, and you will need to manage this. There are a few guidelines on how to create a good roadmap for this transformation.
Based on the value stream map of your IT delivery process, you will be able to identify bottlenecks in the process. As systems thinking, the theory of constraints, and queuing theory teach us, unless we improve one of the bottlenecks in the process, no other improvement will lead to a faster outcome overall.
This is important, as sometimes we spend our change energy on “shiny objects” rather than focusing on things that will make a real difference. One good way to identify bottlenecks is to use the value stream mapping exercise and let all stakeholders in the room vote on the problems that, if addressed, will make a real difference to overall IT delivery. The wisdom of the crowd in most cases does identify a set of bottlenecks that are worth addressing.
There are two other considerations for your roadmap to be a success: a focus on flow, and an evaluation based on speed of delivery rather than cost and quality. A focus on flow is the ultimate systems thinking device to break down silos in your organization.
In the past, the “owner” of a function, like the testing centre of excellence or the development factory, ran improvement initiatives to make its area of influence and control more effective.
Over time, this created highly optimized functions for larger batch sizes to the detriment of the overall flow of delivery. Flow improves with small batch sizes.
There are usually three ways to evaluate IT delivery: speed, cost, and quality. Traditionally, we focused our improvements on cost or quality, which, in turn, often reduced the speed of delivery.
If you evaluate your IT delivery by just looking to improve quality, you often introduce additional quality gates, which cost you more and take longer to adhere to.
If you evaluate your IT function based on reduced cost, the most common approaches are to push more work to less experienced people or to skip steps in the process, which often leads to lower quality and lower speed due to rework. Focusing on cost or quality without considering the impact on flow is, therefore, an antipattern for successful IT in my experience.
In contrast, focusing on speed, specifically on bottlenecks that prevent fast delivery of really small batches, will bring the focus back to the overall flow of delivery and hence improve the speed of delivery in general (even for larger batches), leading to improvements in quality and cost over time.
It is impossible to achieve higher speed if the quality is bad, as the required rework will ultimately slow you down. The only way to really improve speed is to automate and remove unnecessary steps in the process.
Just typing faster is unlikely to do much for the overall speed. So, speed is the ultimate forcing function for IT. I have been in transformations with clients where cost was reduced but the overall delivery experience continued to be bad for business stakeholders.
I have also seen a lot of quality improvement initiatives that stifled IT delivery and nearly ground it to a halt. I have yet to see the same problem with improvement initiatives that evaluate based on speed.
Two words of caution when it comes to speed: The first one is really not that bad of a problem. You can obviously “game” the speed evaluation criteria by breaking work down further and delivering smaller batches, which can be delivered faster.
While this does not result in a like-for-like comparison of the speed between batches, it is still a win for the organization, as smaller batches are less risky. The second warning is that people might look for shortcuts that increase risk or reduce quality.
To prevent this, you need to continue to look for quality measures on top of speed to make sure that quality is not dropping as speed increases. To evaluate for speed, you will look at work coming through your delivery lifecycle, and the process of measuring it will make it more visible to you.
Good measures for speed are cycle time for work items (cycle time = time from work item approved to work item completed and available in production) or volume of work delivered per time period.
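Those two measures can be sketched with a handful of made-up work items; the dates and identifiers below are purely illustrative:

```python
# Minimal sketch of the two speed measures: cycle time per work item
# (approval to completed-and-in-production) and volume delivered per
# period. All data is invented for illustration.
from datetime import date

work_items = [
    {"id": "A-1", "approved": date(2018, 3, 1),  "completed": date(2018, 3, 15)},
    {"id": "A-2", "approved": date(2018, 3, 5),  "completed": date(2018, 3, 12)},
    {"id": "A-3", "approved": date(2018, 3, 10), "completed": date(2018, 3, 31)},
]

cycle_times = [(w["completed"] - w["approved"]).days for w in work_items]
avg_cycle_time = sum(cycle_times) / len(cycle_times)
throughput = len(work_items)  # items delivered in the measured period

print(avg_cycle_time)  # 14.0 days on this sample
print(throughput)      # 3
```

The act of collecting these numbers is itself valuable: it forces the approval and completion events to be recorded somewhere queryable, which makes the work visible.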
Your overall transformation roadmap will likely have milestones focused on different functions and capabilities (e.g., automated regression testing available, lightweight business case introduced), which makes sense. However, there is another dimension, which is the coverage of applications and technologies.
In the next blog section, I will explain how to do an application portfolio analysis that allows you to identify sets of applications that you uplift as part of the transformation.
Your roadmap should include prioritized sets (often called waves) of applications, as an organization at scale will not be able to uplift absolutely everything. You shouldn’t anyway.
One last comment on the transformation roadmap: many capabilities and changes require a significant amount of time to implement. Unfortunately, organizations are not very patient with change programs, so you need to make sure that you build in some early and visible wins. For those early and visible wins, the other prioritization rules do not have to apply.
They can involve applications that are not critical for the business or areas that are not part of a bottleneck. The goal of those wins is to keep up the momentum and allow the organization to see progress.
You should see these as being part of the change-management activities of the transformation. Of course, ideally, the early and visible wins are also in one of the priority areas identified earlier.
Transforming your IT organization will take time.
As you adopt DevOps and become faster, you start to realize the organizational boundaries and speed bumps that are embedded in the operating model, which require some real organizational muscle and executive support to address. Don’t be discouraged if things don’t change overnight.
Governing the Transformation
As mentioned earlier, the roadmap is important, but without appropriate transformation governance, it is not going to get you much success. All too often, transformations get stuck.
It is not possible to foresee all the challenges that will hinder progress, and without appropriate governance that finds the right balance between discipline and flexibility, the transformation will stall. Transformation governance makes the progress of the transformation visible and allows you to steer it.
It’s different from the normal IT delivery governance that you run for your delivery initiatives (e.g., change review boards). In a meeting with a number of transformation change agents and consultants at the 2015 DevOps Enterprise Summit, we tried to identify what it takes to be successful when adopting DevOps.
We all had different ideas and were working in different organizations, but we could agree on one thing that we believed was the characteristic of a successful organization: the ability to continuously improve and manage the continuous improvement process.
This continuous improvement and the adaptation of the roadmap are the largest contributors to success in transforming your IT organization. DevOps and Agile are not goals; hence, there is no target state as such.
What does successful transformation governance look like? Governance covers a lot of areas, so it is important that you know what you are comparing against as you make progress with your transformation. This means you need to establish a baseline for the measures of success that you decide on before you start the transformation.
Too many transformations I have seen spent six months improving the situation but then could not provide evidence of what had changed beyond anecdotes such as “but we have continuous integration with Jenkins now.”
Unfortunately, this does not necessarily convince a business or other IT stakeholders to continue to invest in the transformation. In one case, even though the CIO was supportive, the transformation lost funding due to a lack of evidence of the improvements.
If you can, however, prove that by introducing continuous integration you were able to reduce the instances of build-related environment outages by 30%, now you have a great story to tell.
As a result, I strongly recommend running a baselining exercise at the beginning of the transformation. Think about all the things you care about and want to measure along the way and identify the right way to baseline them.
The other important aspect of transformation governance is creating flexibility and accountability. For each improvement initiative, as part of the roadmap, you want to leverage the scientific method:
Formulate a hypothesis, including a measure of success.
Baseline the measure.
Once the implementation is complete, evaluate the result against the hypothesis.
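The three steps above can be captured in a small record per initiative; the field names and outage figures here are illustrative, not prescriptive:

```python
# Sketch of a hypothesis-driven improvement record: formulate, baseline,
# then evaluate the measured result against the hypothesized target.
# All figures are invented for illustration.

def evaluate(initiative):
    """Compare the measured result against the hypothesized target."""
    improvement = initiative["baseline"] - initiative["measured"]
    initiative["succeeded"] = improvement >= initiative["target_reduction"]
    return initiative

ci_initiative = {
    "hypothesis": "CI will reduce build-related environment outages",
    "baseline": 20,          # outages per quarter before the change
    "measured": 13,          # outages per quarter after the change
    "target_reduction": 6,   # success threshold set up front
}
print(evaluate(ci_initiative)["succeeded"])  # True (20 - 13 = 7 >= 6)
```

The point of keeping the record this explicit is that a failed hypothesis still produces a learning, because the baseline and the measure were fixed before anyone had an incentive to massage them.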
Some things will work, some won’t; and during governance, you want to learn from both. Don’t blame the project team for a failed hypothesis (after all, it should have been we, as leaders, who originally approved the investment—so, who is really to blame?).
You should provide negative feedback only where the process has not been followed (e.g., measures were not in place or results were “massaged”), as those failures prevent you from learning.
As you learn, the next set of viable improvement initiatives will change. Your evaluation criteria of the initiatives you want to start next should be guided by previous learnings, the size of the initiative following a weighted shortest job first (WSJF) approach, and how well the team can explain the justification for the initiative.
Don’t allow yourself to be tempted by large business cases that require a lot of up-front investments; rather, ask for smaller initial steps to validate the idea before investing heavily.
You should keep an eye on the overall roadmap over time to see that the milestones are achievable. If they are not anymore, you can either change the number of improvement initiatives or, when unavoidable, update the roadmap.
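The WSJF ordering mentioned above divides the cost of delay by the job size and starts with the highest score; the initiative names and scores below are invented for illustration:

```python
# Sketch of weighted shortest job first (WSJF) ordering for candidate
# improvement initiatives: cost of delay / job size, highest first.
# All names and scores are made up.

initiatives = [
    {"name": "automated regression testing", "cost_of_delay": 8, "size": 4},
    {"name": "lightweight business case",    "cost_of_delay": 5, "size": 1},
    {"name": "self-service environments",    "cost_of_delay": 9, "size": 9},
]

for i in initiatives:
    i["wsjf"] = i["cost_of_delay"] / i["size"]

ranked = sorted(initiatives, key=lambda i: i["wsjf"], reverse=True)
print([i["name"] for i in ranked])
# ['lightweight business case', 'automated regression testing',
#  'self-service environments']
```

Note how the cheapest initiative wins despite having the lowest cost of delay: WSJF systematically favours small steps, which is exactly the bias against large up-front business cases argued for above.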
In the transformation governance process, you want representation from all parts of the organization to make sure the change is not biased toward a specific function (e.g., test, development, operations). Governance meetings should be held at least once a month and should require as little documentation as possible.
Having the transformation team spend a lot of time on elaborate PowerPoint presentations for each meeting is not going to help your transformation. Ideally, you will look at real-time data, your value stream map, and lightweight business cases for the improvement ideas.
Making IT Delivery Visible
Talking about making things visible and using real data, it should be clear that some of the DevOps capabilities can be extremely useful for this. One of the best visual aids in your toolkit is the deployment pipeline.
A deployment pipeline is a visual representation of the process that software follows from the developer to production, with all the stages in between. This visual representation shows what is happening to the software as well as any positive or negative results of it.
This deployment pipeline provides direct insights into the quality of your software in real time. You might choose to provide additional information in a dashboard as an aggregate or to enrich the core data with additional information, but the deployment pipeline provides the core backbone.
It also creates a forcing function, as all the steps are represented and enforced, and the results can be seen directly from the dashboard, which reduces the chance of people doing things that are not visible.
Any improvements and process changes will be visible in the deployment pipeline as long as it remains the only allowed way to deliver changes. Where you don’t have easy access to metrics, you can also add steps to each stage to log metrics for later consumption in your analytics solution.
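As one hedged example, a pipeline stage could emit a metric in the Graphite plaintext protocol (a line of `<path> <value> <timestamp>` sent over TCP, by default on port 2003); the host name and metric path below are assumptions:

```python
# Sketch of emitting a pipeline-stage metric to Graphite's plaintext
# listener. The host and metric names are placeholders.
import socket
import time

def format_metric(path, value, timestamp):
    """Graphite plaintext line: '<path> <value> <timestamp>\n'."""
    return f"{path} {value} {timestamp}\n"

def send_metric(path, value, host="graphite.example.com", port=2003):
    line = format_metric(path, value, int(time.time()))
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))

# At the end of a deployment-pipeline stage, for example:
# send_metric("pipeline.deploy.duration_seconds", 342)
```

Because every stage then emits the same simple line format, the analytics solution can aggregate across pipelines without any per-team integration work.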
Having an analytics solution in your company to create real-time dashboards is important. Most companies these days either use a commercial visualization or analytics solution or build something based on the many open source options (like Graphite).
The key here is to use the data that is being created all through the SDLC to create meaningful dashboards that can then be leveraged not only during the transformation governance but at any other point in time.
High-performing teams have connected their DevOps toolchains with analytics dashboards, allowing them to see important information in real time: for example, how good the quality of a release is, how the quality of the release package relates to post-deployment issues, and how much test automation has improved the defect rate in later phases of the SDLC.
Governing IT Delivery
IT governance is, in my view, one of the undervalued elements in the transformation journey. Truth be told, most governance approaches are pretty poor and achieve very little of the outcome they are intended to achieve.
Most governance meetings I have observed or been part of are based on red/amber/green status reports, which are subjective in nature and are not a good way of representing status.
Furthermore, while the criteria for the colour scheme might be defined somewhere, it often comes down to the leadership looking the project manager in the eyes and asking what she really thinks.
Project managers from a Project Management Institute (PMI) background use the cost performance index (CPI) and schedule performance index (SPI), which are slightly better but rely on having a detailed and appropriate project plan to report against. I argue that most projects evolve over time, which means that if you prepare a precise plan for the whole project, you plan to be precisely wrong.
Additionally, by the time the status report is presented at the meeting, it is—at best—a few hours old. At worst, it’s an unconscious misrepresentation, because so many different messages needed to be aggregated and the project manager had to work with poor inputs.
Too often, a status report remains green over many weeks just to turn red all of a sudden when the bad news cannot be avoided anymore.
Or the status that moves up the chain of command becomes greener and greener the higher you go, because everyone wants to demonstrate that they are in control of the situation.
Remember, one of our goals with status reports is to make IT work visible, and we’re not doing that in a meaningful way if the information we’re presenting isn’t a factual representation of our processes and progress.
The Lean Treatment of IT Delivery
In transformations, our focus is often on technologies and technical practices, yet a lot can be improved by applying Lean to IT delivery governance. By IT delivery governance, I mean any step of the overall IT delivery process where someone has to approve something before it can proceed.
This can be project-funding checkpoints, deployment approvals for test environments, change control boards, and so on. During the SDLC there are usually many such governance steps for approvals or reviews, which all consume time and effort.
And governance processes often grow over time. After a problem has occurred, we do a post-implementation review and add another governance step to prevent the same problem from happening again.
After all, it can’t hurt to be extra sure by checking twice. Over time, this creates a bloated governance process with steps that do not add value and diffuse accountability.
I have seen deployment approval processes that required significantly more time than the actual deployment without adding value or improving quality. I find that some approval steps are purely administrative and have, over time, evolved to lose their meaning as the information is not really evaluated as it was intended. The following analysis will help you unbloat the process.
I want you to take a good, hard look at each step in your governance process to understand (a) how often a step actually makes an impact (e.g., an approval is rejected), (b) what the risk is of not doing it, and (c) what the cost is of performing this step.
Let’s look at each of the three aspects in more detail:
1. When you look at approvals and review steps during the SDLC, how often are approvals not given or how often did reviews find issues that had to be addressed?
(And I mean serious issues, not just rejections due to formalities such as using the wrong format of the review form.) The less often the process actually yields meaningful outcomes, the more likely it is that the process is not adding a lot of value. The same is true if approval rates are in the high nineties.
Perhaps a notification is sufficient rather than waiting for the approval, which is extremely likely to come anyway. Or perhaps you can cut this step completely.
I worked with one client whose deployment team had to chase approvals for pretty much every deployment after all the preparation steps were complete, adding hours or sometimes days to the deployment lead time.
The approver was not actually doing a meaningful review, which we could see from the little time it took to approve once the team followed up with the approver directly. It was clearly just a rubber-stamping exercise.
I recommended removing this approval and changing the process to send information to the approver before and after the deployment, including the test results. The lead time was significantly reduced, the approver had less work, and because a manual step was removed, we could automate the deployment process end to end.
2. If we went ahead without the approval or review step and something went wrong, how large is the risk? How long would it take us to find out we have a problem and correct it by either fixing it or withdrawing the change? If the risk is low, then, again, the governance step might best be skipped or changed to notification only.
3. What is the actual cost of the governance step in both effort and time? How long does it take to create the documentation for this step? How much time does each stakeholder involved spend on it? How much of the cycle time is being consumed while waiting for approvals to proceed?
With this information, you can calculate whether or not the governance step should continue to be used or whether you are better off abandoning or changing it. From my experience, about half the review and approval steps can either be automated (as the human stakeholder is following simple rules) or changed to a notification only, which does not prevent the process from progressing.
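Those three questions lend themselves to a back-of-the-envelope check; all figures below are assumptions for illustration:

```python
# Rough cost/benefit check for a governance step: how often it catches
# something, what a miss would cost, and what the step itself costs.
# All numbers are illustrative assumptions.

def step_worth_keeping(catch_rate, incident_cost, step_cost_per_run):
    """Keep the step only if its expected savings exceed its cost."""
    expected_savings = catch_rate * incident_cost
    return expected_savings > step_cost_per_run

# An approval that rejects roughly 1 in 200 requests, where a missed
# problem costs about 2 hours of rework, but where each approval cycle
# consumes 4 hours of waiting and effort:
print(step_worth_keeping(1 / 200, 2.0, 4.0))  # False: candidate for removal
```

A step that fails this check is not necessarily deleted outright; as described above, it can often be downgraded to a notification that no longer blocks the process.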
I challenge you to try this in your organization and see how many things you can remove or automate, getting as close as possible to the minimum viable governance process. I have added an exercise for this at the end of the blog section.
First Steps for Your Organization
There are three exercises that I find immensely powerful because they achieve a significant amount of benefit for very little cost: (1) value stream mapping of your IT delivery process, (2) baselining your metrics, and (3) reviewing your IT governance. With very little effort, you can get a much better insight into your IT process and start making improvements.
Value Stream Mapping of Your IT Delivery Process
While there is a formal process for how to do value stream mapping, I will provide you with a smaller-scale version that, in my experience, works reasonably well for the purpose that we are after: making the process visible and improving some of the bottlenecks.
Here is my shortcut version of value stream mapping:
1. Get stakeholders from all key parts of the IT delivery supply chain into a room (e.g., business stakeholders, development, testing, project management office (PMO), operations, business analysis).
2. Prepare a whiteboard with a high-level process for delivery. Perhaps write “business idea,” “business case,” “project kickoff,” “development,” “testing/QA,” “deployment/release,” and “value creation” on the board to provide some guidance.
3. Ask everyone in the room to write steps of the IT process on index cards for fifteen minutes. Next, ask them to post these cards on the whiteboard and work as a group to represent a complete picture of the IT delivery process on the whiteboard. Warning: you might have to encourage people to stand up and work together, or you may need to step in when/if discussions get out of hand.
4. Once the process is mapped, ask one or more people to walk the group through the overall process, and ask everyone to call out if anything is missing.
5. Now that you have a reasonable representation of the process, you can do some deep dives to understand cycle times of the process, hot spots of concerns for stakeholders due to the quality or other aspects, and tooling that supports the process.
6. Get people to vote on the most important bottleneck (e.g., give each person three votes to put on the board by putting a dot next to the process step).
In my experience, this exercise is the best way to make your IT delivery process visible. You can redo this process every three to six months to evaluate whether you addressed the key bottleneck and to see how the process has evolved.
You can make the outcome of this process visible somewhere in your office to show the improvement priorities for each person/team involved. The highlighted bottlenecks will provide you with the checkpoints for your initial roadmap, as those are the things that your initiatives should address.
Accepting the Multispeed Reality (for Now)
Clients I work with often have a thousand or more applications in their portfolio. Clearly, we cannot make changes to all of them at the same time. This blog section looks at how to navigate the desire for innovative new systems and the existing web of legacy applications. We will identify minimum viable clusters of applications to start your transformation and perform an application portfolio analysis to support this.
One of the trends in the industry that has caused an increase in interest in Agile and DevOps practices was the arrival of internet natives, as I mentioned in the introduction. Those companies have the advantage that their applications are newer than most applications are in a large-enterprise context.
“Legacy” is often used as a derogatory term in the industry, but the reality is that any code in production is really legacy already. And any new code we are writing today will be legacy tomorrow. Trying to differentiate between legacy and nonlegacy is a nearly impossible task over time.
In the past, organizations tried to deal with legacy through transformation projects that took many years and tried to replace older legacy systems with new systems. Yet very often, many old systems survived for one reason or another, and the overall application architecture became more complicated.
These big-bang transformations are not the way things are done anymore, as the speed of evolution requires organizations to be adaptable while they are changing their IT architecture.
I think we all can agree that what we want is really fast, flexible, and reliable IT delivery. So, should we throw away our “legacy” applications and build a new set of “fast applications”? I think the reality is more nuanced. I have worked with dozens of organizations that are struggling with the tension between fast digital applications and slow enterprise applications.
Some of these organizations just came off a large transformation that was trying to solve this problem, but at the end of the multiyear transformation, the new applications were already slow-legacy again. A new approach is required that is more practical and more maintainable, and still achieves the outcome.
Analyzing Your Application Portfolio
Large organizations often have hundreds if not thousands of applications, so it would be unrealistic to assume that we can uplift all applications at the same time. Some applications probably don’t need to be uplifted, as they don’t change often or are not of strategic importance. In the exercise section of this blog section, I provide details so that you can run your own analysis.
With this analysis, we can do a couple of things: we can prioritize applications into clusters (I will talk about that a little bit later) and gather the applications into three different groupings that will determine how we will deal with each application as we are transforming IT delivery. The groupings will determine how you will invest and how you will work with the software vendors and your delivery partners.
The first group is for applications that we want to divest from or keep steady at a low volume of change. Let’s call this true legacy to differentiate it from the word “legacy,” which is often used just for older systems.
In the true legacy category, you will sort applications that are hardly ever changing, that are not supporting business-critical processes, and in which you are not investing.
I think it is pretty obvious that you don’t want to spend much money automating the delivery life cycle for these applications.
For these applications, you will likely not spend much time with the software vendor of the application, and you will choose a low-cost delivery partner that “keeps the lights on” if you don’t want to deal with them in-house. And you really shouldn’t invest your IT skills in these applications.
The second group is for applications that are supporting your business but are a little bit removed from your customers. Think of ERP or HCM systems—these are the “workhorses” of your application portfolio. You spend the bulk of your money on running and updating these systems, and they are likely the ones that determine your overall speed of delivery for larger projects.
Improving workhorses will allow you to deliver projects faster and more reliably, but the technologies of many of these workhorses are not as easily adaptable to DevOps and Agile practices.
It is crucial for these systems that you work closely with the software vendor to make the technology more DevOps suitable. If you choose to get help maintaining and evolving these systems, make sure the partner you work with understands your need to evolve the way of working as well as the system itself.
The third group is your “innovation engines” applications. These are the customer-facing applications that you can use to drive innovation or, on the flip side, that can cause you a lot of grief if customers don’t like what you are presenting to them. The challenge here is that most of these will rely on the workhorses to deliver the right experience.
My favourite example is the banking mobile app, which you can experiment with but only in so far as it continues to show accurate information about your bank accounts; otherwise, you will get very upset as a customer.
Here, you will likely use custom technologies. If you choose a commercial off-the-shelf (COTS) product, work very closely with your software vendor, and treat the delivery partner as a co-creator, not just an implementer.
Now, this grouping of applications is not static. As your application architecture evolves, certain applications will move between groups; that means your vendor and delivery-partner strategy evolves with it.
Active application portfolio management is becoming increasingly important as the speed of evolution increases and application architectures become more modular.
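The three-way grouping described above can be sketched as a simple classification rule. This is a hedged illustration only: the application names, field names, and the change-volume threshold are invented assumptions, not prescribed criteria.

```python
# Sketch of the three-way portfolio grouping described above.
# Field names and the threshold of 4 changes/year are illustrative assumptions.

def classify(app: dict) -> str:
    """Place an application into one of the three portfolio groups."""
    if app["customer_facing"]:
        return "innovation engine"
    if app["changes_per_year"] < 4 and not app["business_critical"]:
        return "true legacy"
    return "workhorse"

# Hypothetical portfolio entries for illustration.
apps = [
    {"name": "mobile_banking", "customer_facing": True, "business_critical": True, "changes_per_year": 50},
    {"name": "erp", "customer_facing": False, "business_critical": True, "changes_per_year": 12},
    {"name": "old_reporting", "customer_facing": False, "business_critical": False, "changes_per_year": 1},
]

for app in apps:
    print(app["name"], "->", classify(app))
# mobile_banking -> innovation engine
# erp -> workhorse
# old_reporting -> true legacy
```

In practice, the inputs would come from your portfolio analysis (change volume, criticality, customer exposure) rather than hard-coded records.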
Finding a Minimum Viable Cluster
The Agile principle of small batch sizes applies to transformations as well. We can use the information from the application portfolio analysis above to guide us.
It is very likely that the categories of workhorses and innovation engines contain too many applications to work on at the same time. Rather than just picking the first x applications, you need to do a bit more analysis to find what I call a minimum viable cluster.
Applications don’t exist in isolation from each other. This means that most functional changes to your application landscape will require you to update more than one application.
This, in turn, means that even if you are able to speed up one application, you might not be able to actually speed up delivery, as you will continue to wait on the other applications to deliver their changes.
The analogy of the weakest link comes to mind; in this case, it is the slowest link that determines your overall delivery speed. What you need to determine is the minimum viable cluster of applications. The best way of doing this is to rank your applications based on several factors, such as customer centricity and volume of change.
The idea of the minimum viable cluster is that you start with your highest-priority application and analyze its dependencies, looking for the smallest subset of applications whose combined uplift delivers a significant improvement in delivery speed.
(Sometimes you might still have to deal with further dependencies, but in most cases, the subset should allow you to make significant changes independently with a little bit of creativity.)
You can continue the analysis for further clusters so that you have some visibility of the next applications you will start to address. Don’t spend too much time clustering all applications. As you make progress, you can do rolling-wave identification of the clusters.
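The dependency analysis above can be sketched as a small graph traversal: starting from the highest-priority application, pull in every application it depends on for delivery. The application names and the dependency graph below are made up for illustration; a real analysis would also weigh the ranking factors mentioned above.

```python
# Minimal sketch of finding a "minimum viable cluster": start from the
# highest-priority application and collect its transitive delivery dependencies.
from collections import deque

# Hypothetical dependency graph (app -> apps it depends on).
dependencies = {
    "mobile_app": ["accounts_api", "payments"],
    "accounts_api": ["core_banking"],
    "payments": ["core_banking"],
    "core_banking": [],
    "hr_portal": ["hcm"],
    "hcm": [],
}

def minimum_viable_cluster(start):
    """All applications that must be sped up together with `start`."""
    cluster, queue = {start}, deque([start])
    while queue:
        for dep in dependencies[queue.popleft()]:
            if dep not in cluster:
                cluster.add(dep)
                queue.append(dep)
    return cluster

print(sorted(minimum_viable_cluster("mobile_app")))
# ['accounts_api', 'core_banking', 'mobile_app', 'payments']
```

Note that the cluster for the mobile app excludes the HR systems entirely; that is exactly the point of the exercise, since you only uplift what actually gates your delivery speed.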
I want to mention a few other considerations when thinking about the prioritization of applications. First, I think it is important that you start to work on meaningful applications as early as possible. Many organizations experiment with new automation techniques on isolated applications with no serious business impact.
Many techniques that work for those applications might not scale to the rest of the IT landscape, and the rest of the organization might not identify with the change for that application. (“This does not work for our real systems” is a comment you might hear in this context.)
Because the uplift of your minimum viable cluster can take a while, it might make sense to find “easier” pilots to (a) provide some early wins and (b) allow you to learn techniques that are more advanced before you need to adapt them for your first minimum viable cluster.
The key to this is making sure that considerations from the minimum viable cluster are being proven with the simpler application so that the relevance is clear to the organization. Collaboration across the different application stakeholders is critical to achieving this.
How to Deal with True Legacy
We have spoken about the strategy that you should employ for the applications that continue to be part of your portfolio, but what should you do with the true legacy applications?
Obviously, the best thing to do would be to get rid of them completely. Ask yourself whether the functionality is still truly required. Too often we hang on to systems for small pieces of functionality that have not been replicated elsewhere; because the hidden cost of maintaining the application is not visible, not enough effort goes into decommissioning the system.
Assuming this is not an option, we should apply to our architecture what software engineers have long used in their code: the strangler pattern. The strangler pattern, in this case, means we try to erode the legacy application by moving functions to our newer applications bit by bit.
Over time, less and less functionality will remain in the legacy application until my earlier point comes true: the cost of maintaining the application just for the leftover functionality will become too high, and this will serve as the forcing function to finally decommission it.
The last trick in your “dealing with legacy” box is to make the real cost of the legacy application visible. The factors that should play into this cost are as follows:
the delay other applications encounter because of the legacy application,
the defects caused by the legacy application,
the amount of money spent maintaining and running the legacy application, and
the opportunity cost of things you cannot do because the legacy application is in place.
The more you are able to put a monetary number on this, the better your chances are to overcome the legacy complication over time by convincing the organization to do something about it.
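Putting a monetary number on the four factors above can be as simple as a yearly sum. All figures below are invented for illustration; the point is the structure, not the amounts.

```python
# Hedged illustration: a yearly dollar figure for a true-legacy application,
# built from the four cost factors listed above (all numbers are invented).
legacy_cost = {
    "delay_to_other_apps": 120_000,   # cost of delivery delays it causes elsewhere
    "defects_caused": 45_000,         # incident and rework cost
    "run_and_maintain": 200_000,      # licences, infrastructure, support staff
    "opportunity_cost": 150_000,      # initiatives blocked while it is in place
}

total = sum(legacy_cost.values())
print(f"Real yearly cost of the legacy application: ${total:,}")
# Real yearly cost of the legacy application: $515,000
```

A number like this, tracked over time, is usually far more persuasive to the organization than a qualitative complaint about "old systems."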
I said before that every application you build now will be the legacy of tomorrow. At the increasing speed of IT development, this statement should make us nervous, as we are creating more and more legacy ever faster. This means that, ultimately, the best way to deal with legacy is to build our new legacy with the right mindset.
There is no end-state architecture anymore (well, there never was, as we now know—in spite of what enterprise architects kept telling us). As a result of this new architecture mindset, each application should be built so that it can easily be decommissioned and to minimize its dependency on other applications.
How to Create Beneficial Strategic Partnerships with a System Integrator
Many organizations going down the path of Agile and DevOps decide to rely on in-house capabilities at first, because you have more control over your own people and the environment they work in (salaries, goals, incentives, policies) than you have over the people of your SI.
Unless you are really willing to take everything back in-house, you will at some stage start working with your SI partners. Fortunately, there are plenty of benefits to working with a partner.
The right partner can bring you experience from all the companies they work with, has relationships with your product vendors that are deeper than yours, and can offer an environment that attracts talent you might not be able to attract yourself.
IT is at the core of every business nowadays, but not every company can be an IT company. Strategic partnerships allow you to be a bit of both—to have enough intellectual property and insight into the way your system is built and run while permitting your strategic partner to deal with much of the core IT work.
Be open and willing to delegate IT when needed in order to maintain balance—and success—overall.
The world of technology is moving very fast, which means we have to learn new technologies all the time. If you have a good relationship with your partner, you might co-invest in new technologies and support the training of your partner's people; in return, the partner gains a credential they can use to showcase their ability with the new technology.
My heart warms every time I see a conversation like that take place—where two companies sit together truly as partners to look for win-win situations. Taking an active interest in the world of your partners is important.
In some of my projects, I was part of a blended team in which my people’s experience in technology worked together with the client’s employees’ intimate knowledge of the business. Those client teams could maintain and improve the solution long after we left, which is what real success looks like.
We not only built a better system but left the organization better off by having upskilled the people in new ways of working. As discussed in the application portfolio blog section, there might be applications where you don’t want to build in-house capability and for which this approach does not apply.
For your innovation and workhorse applications, you want to leverage the technology and project experience on the SI side with the business knowledge and continued intellectual property around the IT landscape from your organization.
You should avoid vendors who do not align with your intended ways of working and vendors into whose processes and culture you have no visibility. Otherwise, knowledge of your systems sits with individuals from these vendors or contractors, and most changes happen in what appears to be black-box mode.
This makes it very difficult for you to understand when things go wrong, and when they do, you don’t see it coming. One way to avoid this proliferation of vendors and cultures is to have a small number of strategic partners so that you can spend the effort to make the partnerships successful.
The fewer the partners, the fewer the variables you must deal with to align cultures. Cultural alignment in ways of working, incentives, and values, together with the required expertise, should be the main criteria for choosing your SI besides cost.
Importance of In-House IP
Your organization needs to understand how IT works and needs to have enough capacity, skill, and intellectual property to determine your own destiny. As we said before, IT is at the core of every business now; a minimum understanding of how this works is important so that you can influence how IT supports your business today, tomorrow, and the day after.
But what does it mean to have control of your own destiny in IT? While there are some trends that take “headaches” away from your IT department (think cloud, SaaS, or COTS), there is really no way of completely outsourcing the accountability and risk that comes with IT.
You will also have to think about the tools and processes that your partners bring to the table. It is great that your vendor brings additional tools, methods, and so on, but unless you are able to continue to use those tools and methods after you change vendors, they can become a hindrance later if those tools are critical for your IT delivery.
If they are not transparent to you and you don’t fully understand how they work, you have to take this into account in your partnering strategy, as you will be bound to them tighter than you might like.
Fortunately, there is a trend toward open methods and standards, which makes it a lot easier to communicate across company barriers. Agile methodologies like the Scaled Agile Framework (SAFe) and Large-Scale Scrum (LeSS) are good examples. It is likely that you will tailor your own method based on influences from many frameworks.
When you make using your method a condition for working with your organization, it helps you keep control. You do, however, need to make sure your methods are appropriate and be open to feedback from your partners. Your partners should absolutely bring their experience to the table and can help you improve your methods.
Standards are also important on the engineering side. Too many organizations have either no influence over or no visibility into how their partners develop solutions. Practices like automatic unit testing, static code analysis, and automated deployments are staples.
Yet many organizations don’t know whether and to what degree they are being used by their partner. Having the right structure and incentives in place makes it easier for your partner to use those practices, but it is up to you to get visibility into the engineering practices being used for your projects.
One practical way to address this is to have engineering standards for your organizations that every team has to follow no matter what, whether it’s in-house, single vendor, or multivendor.
These standards will also provide a common language that you can use with your partners to describe your vision for IT delivery (for example, what your definition of continuous integration is).
Changing the “Develop-Operate-Transition” Paradigm
In the past, contracts with system integrators often had something mildly Machiavellian to them, creating an adversarial work environment in which nobody wins. One model that suffers from unintended consequences over time is the develop-operate-transition (DOT) contract.
I am not sure how familiar you are with this contract term, so let me quickly explain what I mean. DOT contracts work on the basis that there are three distinct phases to a project: a delivery phase, where the product is created; an operating phase, where the product is maintained by another party; and a transition phase, where the product is brought back in-house.
Many organizations use two different vendors for development and operations or at least threaten to give the operational phase to someone else while working with a delivery partner.
There are a few things wrong with this model. First of all, if you have a partner who is only accountable for delivery, it is only natural that considerations for the operating phase of the project will be less important to them.
After all, the operating activities will be done by someone else. The operate party will try to protect their phase of the project, and you will likely see an increasing number of escalations toward hand-over. There is no ill intent here; it is just a function of the different focuses created by the scope of each contract.
The second problem is that many DOT projects are run as more or less black box projects, where the client organization is only involved as the stakeholder and, until it gets to the transition phase, has not built internal knowledge on how to run and maintain the system.
This causes problems not only during the transition but also when navigating misalignments between delivery and operating parties. With just a little tweaking, we can bring this model up to date.
Choose a partner that is accountable for both the delivery and operation. You can change the incentive model between the two phases to reflect the different characteristics. Make sure that there is team continuity between phases with your partner so that people who will operate the solution later are already involved during delivery.
Cultural Alignment in the Partnership
As mentioned earlier in the book, I have been on both sides of a partnership as a system integrator (SI) providing services to a client and in staff augmentation roles, where I had to work with SIs.
It is quite easy to blame the SIs for not doing the right thing—for not leveraging all the DevOps and Agile practices and for not experimenting with how to do things better.
The reality is that every person and every organization does what they think is the right thing to do in their context. No one is trying to be bad in software development.
Unfortunately, sometimes relationships have been built on distrust: because I don’t trust you, I will have a person looking after what you are doing. The vendor then creates a role for someone to deal with that person, and both sides add more process and more documents on each side to cover their backside.
More and more process, roles, and so on get introduced until we have several levels of separation between the real work and the people talking to each other from both organizations. To make things worse, all this is just non-value-added activities as payment for the distrust between partners.
But imagine you trusted your SI like you trust the best person on your team. What processes and documents would not be required, and what would that do to the cost and speed of delivery for you?
Despite these potential advantages, there is way too little discussion on how to make the relationship work. How could we create a joint culture that incentivizes all partners to move toward a more Agile and DevOps way of working, and how do we do this when we have long-standing relationships with contracts already in place?
First of all, I think it is important to understand your partner; as in any good marriage, you want to know what works and what doesn't work for them. And when I say partner, I mean partner. If you do the odd project with a vendor and it is purely transactional, then you don't have to worry about this.
But if you work with the same company for many years and for some of your core systems, then it does not make sense to handle them transactionally. You want to build a partnership and have a joint DevOps-aligned culture.
In a real partnership, you understand how the SI defines success, and both sides are open about what they want from the relationship. Career progression is one example: I have been lucky, as most of my clients understood when I discussed the career progression of my people with them and why I needed to move people out of their current roles.
From the client's perspective, they would have preferred to keep the same person in the same role for many years; but for me, that would not have been good, as my people would have looked for opportunities elsewhere.
Of course, all of this goes both ways, so you should not accept if the SI wants to behave like a black box—you want the relationship to be as transparent as you feel is right.
You have the choice to buy a “service” that can be a black box with only the interface defined. In this case, you don’t care how many people work on it or what they are doing; you just pay for the service.
This gives the SI the freedom to run independently—a model that works well for SaaS—and you might have some aspects of your IT that can work with an XaaS mindset.
For other projects that include working with your core IT systems and with people from your organization or other third parties, you want transparency. A vendor that brings in their own tools and methods is basically setting you up for a higher transition cost when you want to change.
You should have your own methods and tools, and each SI can help you improve this from their experience. You don’t want any black box behaviour. Fortunately, common industry frameworks such as SAFe or Scrum do help get to a common approach across organizations with little ramp-up time.
Thinking about partnerships, you should remember that you cannot outsource risk. I have often seen clients just say “well, that's your problem” when an SI brings up a possible issue. The reality is that if the project fails, the client is impacted. Closing your eyes and ears and making it the SI's problem will not make it go away.
Think of the disaster with the Australian census, where the delivery partner took a lot of negative publicity, or the Health Insurance Marketplace (HealthCare.gov) in the United States, where vendors blamed each other for the problems.
Even if the vendors were at fault, the organizations took a huge hit in reputation in both cases; and given that they were public services, it created a lot of negative press.
In Agile, we want flexibility and transparency. But have you structured your contracts in a way that allows for this? You can’t just use the same fixed-price, fixed-outcome contract where every change has to go through a rigorous change control process. Contracts are often structured with certain assumptions, and moving away from them means trouble.
Time-and-materials contracts for Agile can cause problems because they don't encourage adherence to outcomes, which is only okay if you have a partner experienced with Agile and a level of maturity and trust in your relationship.
Agile contracts require you to take more accountability and be more actively involved in scope management. In my experience, the best Agile contracts are the ones that are built on the idea of fixed capacity aligned to some outcome and flexible scope (the delivery of a number of features for which some details are defined as the project progresses).
There are ways to create Agile contracts that work for all parties, so let’s explore some basics of a typical Agile project. While Agile encourages teams to deliver production-ready code with each sprint, the reality often means that the delivery process is broken down into four phases:
1. scope exploration upfront and ongoing refinement (definition of ready)
2. sprint/iteration delivery of user stories (definition of done)
3. release readiness preparation/hardening and transition to production (definition of done-done)
4. post-go-live support/warranty (definition of done-done-done)
With that in mind, a contract should reflect these four phases. As a departure from the common deliverable- or phase-based pricing, where your partner is being paid based on the deliverable (such as design documents) or completion of a project phase (such as design or development), these contracts reflect user stories as units of work.
Each story goes through the four phases described above, and payments should be associated with that; a certain percentage should be paid as a story achieves the definition of ready and the different levels of done. Here is a sample breakdown that works well:
We have three hundred story points to be delivered in three iterations and one release to production: $1,000 total price.
A payment schedule of 10%/40%/30%/20% (first payment at kickoff, the second one as stories are done in iterations, third one once stories are released to production, last payment after a short period of warranty).
Signing contract: 10% = $100.
Iteration 1 (50 pts. done): 50/300 × 0.4 × $1,000 ≈ $67.
Iteration 2 (100 pts. done): 100/300 × 0.4 × $1,000 ≈ $133.
Iteration 3 (150 pts. done): 150/300 × 0.4 × $1,000 = $200.
Hardening and go-live: 30% = $300.
Warranty complete: 20% = $200.
This contract model ties the payments to the vendor to the delivery of scope. In my experience, it is a good intermediate point: you keep flexibility while paying only for working scope.
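The sample payment schedule above can be checked with a few lines of arithmetic. The figures ($1,000 total, 300 story points, a 10%/40%/30%/20% split) are the illustrative numbers from the text, not a recommended pricing model.

```python
# Sketch of the sample payment schedule above (illustrative numbers from the text):
# 300 story points, $1,000 total price, split 10%/40%/30%/20%.

TOTAL_PRICE = 1_000
TOTAL_POINTS = 300
SPLIT = {"kickoff": 0.10, "iterations": 0.40, "release": 0.30, "warranty": 0.20}

def iteration_payment(points_done: int) -> float:
    """Pay out the iteration tranche in proportion to story points done."""
    return points_done / TOTAL_POINTS * SPLIT["iterations"] * TOTAL_PRICE

payments = [
    ("Signing contract", SPLIT["kickoff"] * TOTAL_PRICE),
    ("Iteration 1 (50 pts)", iteration_payment(50)),
    ("Iteration 2 (100 pts)", iteration_payment(100)),
    ("Iteration 3 (150 pts)", iteration_payment(150)),
    ("Hardening and go-live", SPLIT["release"] * TOTAL_PRICE),
    ("Warranty complete", SPLIT["warranty"] * TOTAL_PRICE),
]

for label, amount in payments:
    print(f"{label}: ${amount:,.2f}")

# The tranches add back up to the total price.
assert abs(sum(amount for _, amount in payments) - TOTAL_PRICE) < 1e-6
```

The same structure scales to real contract values; only the split and the story-point plan change.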
There are things that you want to provide as part of the contract too: an empowered product owner who can make timely decisions, a definition of the necessary governance, and a work environment that supports Agile delivery (physical workspace, IT, infrastructure, etc.).
Very mature organizations can utilize time and material contracts as they operate with their own mature methodology to govern the quality and quantity of outcome; less mature organizations benefit from the phased contract outlined above.
Another aspect of contracts is aligned incentives. Let’s start with a thought experiment: You have a really good working relationship with an SI over many years, but somehow, with all the legacy applications you are supporting together, you didn’t invest in adopting DevOps practices.
You now want to change this. You agree on a co-investment scheme and quickly agree on a roadmap for your applications. A few months in, you see the first positive results with demos of continuous integration and test automation at your regular showcases.
Your SI approaches you at your regular governance meeting and says he wants to discuss the contract with you, as the average daily rate of his overall team has changed. What do you expect to see? That the average daily rate of a worker has gone down thanks to all the automation? I mean, it should be cheaper now, shouldn’t it?
Well, let's look at it together. The average daily rate is a blended rate: less-skilled work is paid less, work that requires more skill or experience is paid more, and the proportion of the two determines the average. When we automate, what do we automate first?
Of course: the easier tasks that require fewer skills. The automation itself usually requires a higher level of skill. Both of these mean that the proportion of higher-skilled work goes up and, with it, the average daily rate. Wait … does that mean things become more expensive? No.
Since we replaced some work with automation, in the long run, it will be cheaper overall. If you evaluate your SIs based on the average daily rate, you have to change your thought process. It is the overall cost, not daily rates, that matters.
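The average-daily-rate effect described above is easy to demonstrate with made-up numbers: automation removes most of the cheap, low-skill days and adds a few expensive automation days, so the blended rate rises while the total bill falls. All rates and day counts below are invented for illustration.

```python
# Hypothetical illustration of the average-daily-rate effect described above.
# Before automation: mostly low-skill work at a low rate.
before = {"low_skill_days": 80, "low_rate": 300, "high_skill_days": 20, "high_rate": 800}
# After automation: low-skill work largely disappears; some high-skill
# automation work is added.
after = {"low_skill_days": 20, "low_rate": 300, "high_skill_days": 25, "high_rate": 800}

def totals(mix):
    """Return (total cost, average daily rate) for a work mix."""
    cost = mix["low_skill_days"] * mix["low_rate"] + mix["high_skill_days"] * mix["high_rate"]
    days = mix["low_skill_days"] + mix["high_skill_days"]
    return cost, cost / days

cost_before, adr_before = totals(before)   # $40,000 over 100 days -> $400/day
cost_after, adr_after = totals(after)      # $26,000 over 45 days  -> ~$578/day

# The blended daily rate goes UP while the total cost goes DOWN.
assert adr_after > adr_before
assert cost_after < cost_before
```

This is exactly why evaluating an SI on the average daily rate alone punishes them for successful automation; total cost of the outcome is the number that matters.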
Partnerships from the SI Side
I also want to look at the company–service provider relationship from the other side—the side of the system integrator. This perspective is not spoken about much, but given that it is usually my side, I want to show you how we think and what our challenges are.
The influence of DevOps culture has started to transform relationships to be more open. Even in the request for proposal processes, I can see an increased openness to discuss the scope and approach to delivery.
I have worked with government clients for whom, during the process, I was able to help them shift the request to something more aligned with what they were after by adopting an Agile contract like the one I mentioned earlier.
Originally, they wanted to pay per deliverable, something unsuitable for Agile delivery. Together, we can usually come up with something that works for all parties and makes sure you get what you are after.
As system integrators, we are still relegated too often to talking to procurement departments that are not familiar with modern software delivery. Contracts are set up so tightly that there is no room for experimentation, and where some experimentation is accepted, only positive results are “allowed.”
If you think about your relationship with SIs, I am sure you can think of ways to improve the relationship and culture to become more open and aligned with your goals. I have added a little test to diagnose your culture alignment in the exercises of this blog section.
I want to spend the last part of this blog section on partner evaluation. Clearly, you don’t want your partner to just take your money and provide a sub-optimal service to your organization.
So what can you do to govern the relationship while considering the open culture you are trying to achieve? And how do you do that while still having the means to intervene if the performance deteriorates?
In the example of a project being done by one SI, you can use a balanced scorecard that considers a few different aspects:
One of them is delivery; in this area, you care about quality and predictability of delivery. How many defects slip into production, how accurate is the delivery forecast, and how are the financials tracking?
You might also want to add the evaluation of delivery quality by stakeholders, potentially as an internal net promoter score (NPS, the percentage of promoters minus the percentage of detractors) of business stakeholders. Cycle time and batch size are two other metrics you should care about to improve the overall flow of work through your IT.
The second aspect is technical excellence. At a bare minimum, you want to look at the compliance with your engineering methods (unit testing, test automation, continuous integration, continuous delivery …). If your delivery partner is taking shortcuts, technical debt will keep increasing, and at some stage, you will have to pay it down.
In my experience, clients do a good job checking the quality of the end product but often fail to govern the engineering practices that prevent technical debt from accruing.
Providing a clear set of expectations around engineering methods and regular inspection (e.g., showcases for test and deployment automation, code reviews, etc.) reduces the chances of further technical debt. I have had great discussions about engineering strategies with clients during such showcases.
The third aspect is throughput; you can approach this from a story point or even stories per release perspective. I assume, though, that your releases will change in structure as your capabilities mature.
In light of this, cycle time and batch size are better measures. The interesting aspect of cycle time is that if you optimize for speed, you tend to get quality and cost improvements for free, as discussed earlier.
You should also reserve a section of the scorecard for improvements. Which bottlenecks are you currently improving, and how are you measuring against your predictions? You can add costs per service here (e.g., the cost of a deployment or a go-live) to see specific improvements for salient services in your organization.
Last but not least, you should have a section for the interests of your partner. Career progression, predictability, and market recognition come to mind as some of the non-commercial aspects of the relationship.
Of course, revenue and profitability are two other areas of interest that you want to talk about from a qualitative perspective—is this a relationship that is beneficial for both?
I recommend having a section of the scorecard where you track two or three priorities from the partner perspective and evaluate those together on a regular basis.
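The balanced scorecard described above can be represented as a simple data structure with targets, so that breaches are flagged automatically for the regular partner review. The aspect names follow the text; the metric names, values, and targets are illustrative assumptions, not a prescribed standard.

```python
# Sketch of the balanced partner scorecard described above.
# Metric names, values, and targets are illustrative assumptions.
scorecard = {
    "delivery": {"escaped_defects": 3, "forecast_accuracy_pct": 92},
    "technical_excellence": {"unit_test_coverage_pct": 78, "ci_compliance_pct": 95},
    "throughput": {"cycle_time_days": 12, "avg_batch_size_stories": 8},
    "improvements": {"deployment_cost_usd": 450},
    "partner_interests": {"partner_priorities_on_track": 2, "partner_priorities_total": 3},
}

# (aspect, metric, target, higher_is_better) for the metrics you govern.
targets = [
    ("delivery", "forecast_accuracy_pct", 90, True),
    ("technical_excellence", "unit_test_coverage_pct", 80, True),
    ("throughput", "cycle_time_days", 10, False),
]

def breaches(card):
    """Return every governed metric that misses its target."""
    out = []
    for aspect, metric, target, higher_is_better in targets:
        value = card[aspect][metric]
        ok = value >= target if higher_is_better else value <= target
        if not ok:
            out.append((aspect, metric, value, target))
    return out

for aspect, metric, value, target in breaches(scorecard):
    print(f"{aspect}/{metric}: {value} misses target {target}")
```

In this made-up snapshot, test coverage and cycle time would be flagged for discussion at the next governance meeting, while forecast accuracy passes.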
First Steps for Your Organization
Horses for Courses—Determining the Partners You Need
This whole blog section is about finding the right partner that fits your ambition and culture. But the truth is that you probably need different partners for different parts of your portfolio.
If you have done the application portfolio activity in blog section 2, this exercise will be easier. There are three different types of applications for the purpose of this exercise:
Differentiator applications: These applications evolve very quickly, are usually directly exposed to your customers, and define how your company is perceived in the marketplace.
Workhorses: These applications drive the main processes in your organization, such as customer relationship management, billing, finance, and supply-chain processes. They are often referred to as enterprise or legacy systems.
They might not be directly exposed to the customer, but the company derives significant value from these applications and continues to make changes to them to support the evolving business needs.
True legacy: These applications are pretty stable and don’t require a lot of changes. In general, they tend to support your more stable, main processes or some fringe aspects of your business.
Based on these classifications, review your partner strategy to see whether you need to change either the partner itself or the way you engage with the existing one. For the first two categories, you want to engage strategic partners. For legacy applications, you are looking for a cost-effective partner who gets paid for keeping the system running.
The incentives for your strategic partners are different. Your partners for the workhorse applications should be evaluated by the efficiencies they can drive into those applications; for the differentiator applications, you want someone who is flexible and will co-invest with you. The outcome of this activity will feed into the second exercise for this blog section.
Run a Strategic Partners Workshop for Your Workhorse Applications
Organizations spend the majority of their money on their workhorse applications. This makes sense, as these applications are the backbone of the business. For this exercise, I want you to invite your strategic partners who support your workhorse applications (and possibly the differentiator ones) to a workshop.
You can do this with all of the partners together, which can be more difficult, or by running separate workshops for each partner. It is important to tell them to assume that the current contract structure is negotiable and to be open-minded for the duration of the workshop.
The structure of this workshop should be as follows:
Explain to your partner what is important for you in regard to priorities in your business and IT.
Discuss how you can measure success for your priorities.
Let your partner explain what is important for them in their relationship with you and what they require in their organization to see the relationship as successful.
Workshop how you can align your interests.
Brainstorm what the blocks are to truly achieve a win-win arrangement between your two organizations.
The key to this workshop is that both sides are open-minded and willing to truly collaborate. In my experience, it will take a few rounds of this before barriers truly break down—don’t be discouraged if all of the problems are not solved in one workshop.
Like everything else we talk about, it will be an iterative process, and it is possible that you will realize that you don’t have the right partners yet and need to make some changes in the makeup of your ecosystem.