8 - Systems Engineering Management

Transcript of 8 - Systems Engineering Management

  • SYSTEMS ENGINEERING

    MOOC 8 SYSTEMS ENGINEERING MANAGEMENT

    PART 1

    Systems engineering is not all about the process that results in the design and development of a solution. Systems engineers are also responsible for managing the process to ensure that it remains focused and delivers expected outcomes without exposing parties to excessive risk.

    In this session, I will look at some key systems engineering management issues and explain why they are important and what they achieve. I won't try to cover every conceivable element of management but rather give you a good feel for the types of management issues that arise.

    I will look briefly at our interest in thoroughly and progressively testing our systems before handing them over to our stakeholders. The need for testing a system prior to use is self-evident but what is sometimes not so clear is the time, money and risks associated with a comprehensive test program. Unless it is managed properly, waste will result and the testing program will fail to deliver the expected results.

    Managing the configuration of all of the elements of our system can be tedious but if it is not done properly, we will sentence the through-life support stakeholders to a life of pain and frustration as they attempt to support a system whose configuration is not known. Imagine trying to modify a complex system when the system does not appear to be accurately documented or when there seems to be significant variation in configuration across a fleet of systems that are meant to be identical.

    Risk is essentially the chance of something happening that adversely impacts on objectives. On a technical system development, there are plenty of risks that we face routinely. Should we use off-the-shelf systems or should we develop them from first principles? Should we use an experienced team of experts for the technical program or should we use less experienced personnel? Maybe there are technical risks that need to be addressed by the design. For example, what is the severity of the risk of an aircraft losing electrical power to flight or safety critical systems? How can we reduce those risks by designing redundancy into our design? Technical risk management is the focus of a lot of the things we do as systems engineers.

    Speaking of risk, it makes no sense to set and forget the design and development effort and only check on it when it is meant to be completed. Instead, we tend to look for periodic reviews of our actual progress compared to planned progress. This allows us to address issues, answer questions, clarify conflicts, consider design decisions and record rationales at discrete points in our process. It also supports the desire of systems engineering to address problems as soon as possible, not as late as possible. You will recall from an earlier module that addressing problems as soon as they arise is the most economically viable and time-effective approach.

    Another critical part of systems engineering management that is, strangely, often overlooked is the need for systems engineers to consider the unique situation in which they find themselves. The process we described in this MOOC is just an example of how a program may be structured. The approach we presented is often called the waterfall approach because the whole system is developed in one pass: starting at the system level at the top, then cascading down to the subsystem level and then finishing with the component level, in that order.

    I have heard some people say that the waterfall approach is dangerous and never works. In my view, this is just plain wrong. The waterfall approach can work, but it doesn't work all of the time. It relies on a thorough and complete understanding of the problem and the desired solution. It assumes the accurate translation of this into comprehensive system level requirements. It works best when these requirements do not change very often. It assumes that technology is available and works best when that technology remains relatively stable over the course of the system development. It assumes that we have enough time and enough money to solve the entire problem in one pass. A lot of assumptions, aren't there?

    Well, I have worked on projects where all of these assumptions were valid and the waterfall approach was employed with great effect, so to say it is dangerous and never works is not correct in my view. However, what I think those critics are probably saying is that it is not common to come across a situation where all of those assumptions are in place. If any of those assumptions are not in place, then systems engineers must think of alternative ways of executing the systems engineering process. Forcing a waterfall approach under unsatisfactory circumstances will expose the program to risks such as cost and time overruns (caused by potentially extensive rework) and delivery of a system that has been based on invalid or out-of-date sets of requirements. We will have a look at some alternatives to the waterfall approach in this session, but systems engineers must be capable of independent thought in this area and not just follow a process because that's how it has always been done.

    Once we have listed all of these things that we need to manage in systems engineering, it will come as no surprise that we have some planning to do. We will certainly produce a governing plan out of all of this, but it is the planning process that is the valuable exercise. The plan is just the artefact that results from planning. The big thing to remember about planning is that it needs to be ongoing in order to keep up with the current situation.

    Let's now work our way through these areas and cover some of the major themes.

    PART 2

    Why do we test things? We need to verify that our design and development effort has been successful by confirming that our design approaches have resulted in components, subsystems and eventually a system that meets its specified requirements. This helps identify areas where redesign might be necessary. This sort of verification is sometimes called Developmental Test and Evaluation. An example of component testing might be to test different types of concrete before deciding on what concrete to use in critical areas like footings or slabs. We would then test the concrete as it arrives on site prior to it being poured into the excavations. Although it would appear to be inconvenient and may even upset some people, it is much easier and cheaper to reject poor concrete before it is laid than to discover structural problems after our house is built, requiring expensive and time-consuming rectification action.

    As the system passes through production and construction, we need to verify the acceptability of the system generally against our system level requirements. This sort of verification is often called Acceptance Test and Evaluation because the aim of this verification is to allow the customer to formally accept that the system meets its system-level function and performance requirements. Acceptance testing will run for a finite period of time and will involve both the customer and the contractor. In our house example, it is likely to be a period of walking around the house and having the different key elements of the house ticked off.

    Once the system enters the utilisation stage, we continue to evaluate the system. Generally, this sort of evaluation aims to continually validate that the system is solving the problems that created the need in the first place.

    Naturally, this sort of exercise involves the system being employed in operational environments, being used by end-users who are trained by our training system, supported by our support system, and so on. As we live in our house, there are bound to be things about it that we don't like. For example, we may find out something about our house that was not apparent during acceptance testing that means that our house does not meet the specified requirements in some area. In this case, we would probably have some recourse against the contractor in the form of latent defect clauses or warranty provisions in our contract. We would use these provisions to have the defect rectified. In other cases, there may just be things about the house that we would have done differently if we had our time again. These experiences may raise issues that result in modifications or upgrades to the system. Modifications and upgrades are an opportunity to re-invigorate systems engineering as far as the system goes, as the upgrades may be considered a system in their own right.

    Verifying and validating the system in this way is a risk mitigator because it allows us to confirm design adequacy, detect problems early, confirm rectification action and so on. In other words, it helps protect us from ending up with a system that doesn't work the way it was intended to work. Leaving testing until right at the end of the production process is leaving things too late. Progressive evaluation is the key.

  • We mentioned examples of verification methods in earlier modules. For example, we spoke about using tests, demonstrations, inspections and analyses to perform verification. Programs to adequately evaluate a system must be planned and managed. If they are not planned and managed, there will be serious ramifications such as project cost and schedule blow-outs and acceptance into service of a poorly evaluated system. The sorts of things that need consideration include specialised facilities and test equipment, personnel availability and training, approved evaluation procedures, availability of necessary external systems for the evaluation program and so on. If things are planned properly, we will not have to repeat evaluation unnecessarily either. This will also save time and money. The bottom line is that evaluation is a critical part of systems engineering and it must be planned and managed from the earliest possible stage in the systems engineering lifecycle.

    In my experience, if this planning is not done you'll be left with a couple of very undesirable choices to make, such as: do I blow the project cost and schedule to ensure that the system is adequately evaluated, or do I deliver a system on time and budget without thoroughly evaluating it?

    These aren't good choices to have to make, so avoid having to make them by planning the evaluation process and allowing for it in your cost and schedule estimates.

    Configuration management is a very important part of systems engineering. It is there to make sure that we maintain control over the versions of all of the different things within our system design. This includes our documentation (such as specifications and drawings) and the hardware, software and interfaces that make up the design.

    For example, it is critical in building our house that all parties involved in the house project are running off the same set of drawings and associated descriptions. Imagine the difficulties that would be caused if the customer, architect and builder all had different versions of a document that listed the windows to be installed in our house. This could happen if changes had been made to the document without all parties being involved in the change process. Configuration management aims to avoid this sort of problem by establishing and maintaining control.

  • There are four basic elements to configuration management. Firstly, we identify everything that we are going to place under control. In our house, this will include the set of drawings, specifications and other documents that will be used on the project. Only those documents listed are authorised for use. This will also include pieces of hardware and software used in our project. For example, the make and model of washing machine, oven, hotplate and security sensors are all examples of things that are likely to be specified and agreed upon. In terms of software, it is possible that integrated entertainment systems, security systems or automated watering systems will make use of specific operating systems and other software. It might be that the version of software running on some of these computer-based systems is also specified.

    Once we have identified what we are controlling, we need to be able to communicate that to all parties. We do this by being able to communicate what the current configuration baseline is for any part of the system. For example, we would want to be able to see what the current agreed configuration of kitchen appliances is and how this has changed over time. This is called status accounting in configuration management.

    Speaking of change, a critical element of configuration management is the ability to manage change. Change is not bad in itself but change without adequate control and visibility is potentially disastrous. Imagine if the customer and architect kept making changes to the configuration baseline of the kitchen without involving the cabinet maker and builder. Naturally, the cabinet maker and builder will build the kitchen against their baseline but the customer will expect the kitchen to be delivered against their baseline. Change management is all about being able to gather the appropriate parties together and look at change proposals. In this case, the customer, architect, cabinet maker and builder would get together to discuss the proposed change. The cost and schedule impact of the change could be discussed and a decision can be made about whether to make the change or not. Either way, everyone is informed and decisions are made based on accurate information, which is the aim of the change management process.

    Finally, configuration management also involves periodically auditing the process to make sure it is all working properly. We check that everyone is using the latest agreed documentation in performing their work and we confirm that the materials and the construction process being used are in accordance with the design documentation. Hopefully, our audits will confirm that everything is working well. Poor audit results indicate that there may be something wrong with our configuration management system and therefore warrant investigation. For example, if our plumber is using pipes that are different from what the drawings specify, this could be caused by the plumber doing the wrong thing or it could be caused by the plumber working from the wrong drawings. The former requires action against the plumber whilst the latter requires tightening of our configuration management process.
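
    The four elements of configuration management described here (identification, status accounting, change control and audit) can be sketched in a few lines of Python. This is only an illustration, not a real configuration management tool: the class name `Baseline`, the item names, the revision labels and the list of approving parties are all assumptions invented for the house example.

```python
from dataclasses import dataclass, field


@dataclass
class Baseline:
    """Hypothetical configuration baseline for the house project."""
    required_approvers: frozenset
    items: dict = field(default_factory=dict)    # item name -> current agreed version
    history: list = field(default_factory=list)  # (item, old version, new version)

    def identify(self, item, version):
        """Identification: place an item under configuration control."""
        self.items[item] = version
        self.history.append((item, None, version))

    def propose_change(self, item, new_version, approvers):
        """Change control: apply a change only if every required party agreed."""
        if item not in self.items:
            raise KeyError(f"{item} is not under configuration control")
        if not self.required_approvers <= set(approvers):
            return False  # someone was left out, so the baseline stays as it is
        self.history.append((item, self.items[item], new_version))
        self.items[item] = new_version
        return True

    def status(self, item):
        """Status accounting: current version plus how it changed over time."""
        changes = [h for h in self.history if h[0] == item]
        return self.items[item], changes

    def audit(self, in_use):
        """Audit: list items whose version in use differs from the baseline."""
        return sorted(
            item for item, version in in_use.items()
            if self.items.get(item) != version
        )


parties = frozenset({"customer", "architect", "cabinet maker", "builder"})
baseline = Baseline(required_approvers=parties)
baseline.identify("kitchen drawings", "rev A")
baseline.identify("plumbing drawings", "rev A")

# Customer and architect alone cannot change the kitchen baseline.
rejected = baseline.propose_change("kitchen drawings", "rev B", {"customer", "architect"})
# With all parties involved, the change is accepted and recorded.
accepted = baseline.propose_change("kitchen drawings", "rev B", parties)

# The plumber is found working from a revision that is not the agreed one.
findings = baseline.audit({"kitchen drawings": "rev B", "plumbing drawings": "rev 0"})
```

    An audit finding in this sketch only says that something differs from the baseline. As in the plumber example, it still takes investigation to decide whether the person or the process is at fault.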

    PART 3

    There are all sorts of risks facing projects including schedule risk (the risk of delivering a project later than expected) and cost risk (the risk of going over budget). Systems engineering assists in the management of an array of risks including schedule and cost risk but it is technical risk that is a primary focus of systems engineering. Technical risk could include delivering a system that:

    is not up to the required standard in terms of its function and performance,

    is not able to be maintained in accordance with the support concept,

    is not sufficiently reliable to carry out its intended missions, or

    is too expensive or difficult to produce in the required quantities.

    To understand individual risks and then to compare different risks to one another, we need to look at all of them to appreciate the likelihood of the risk occurring and understand the nature of the resultant impact. At one end of the spectrum, we may face risks that are extremely likely to occur and will have dire consequences on our objectives if they occur. At the other end of the spectrum, we may face risks that are very unlikely and will have only a small impact on our objectives. The former are extreme risks and the latter are insignificant risks. Naturally, in between, the different combinations of likelihoods and consequences result in an array of risk severity assessments.
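
    One common way to combine likelihood and consequence is a simple risk matrix. The sketch below is illustrative only: the five-point scales, the idea of multiplying the two ratings, the thresholds between severity bands and the example risks with their ratings are all assumptions, not something prescribed in this MOOC.

```python
# Assumed five-point rating scales; real projects define their own.
LIKELIHOOD = {"rare": 1, "unlikely": 2, "possible": 3, "likely": 4, "almost certain": 5}
CONSEQUENCE = {"negligible": 1, "minor": 2, "moderate": 3, "major": 4, "dire": 5}


def risk_score(likelihood, consequence):
    """Combine the two ratings into a single number (assumed: simple product)."""
    return LIKELIHOOD[likelihood] * CONSEQUENCE[consequence]


def risk_severity(likelihood, consequence):
    """Map a score onto an assumed set of severity bands."""
    score = risk_score(likelihood, consequence)
    if score >= 15:
        return "extreme"
    if score >= 8:
        return "high"
    if score >= 4:
        return "medium"
    return "insignificant"


# Hypothetical risks for the house project, with made-up ratings.
risks = [
    ("paint colour slightly off", "unlikely", "negligible"),
    ("subsidence on the sloped block", "likely", "dire"),
    ("late delivery of kitchen appliances", "possible", "moderate"),
]

# Rank the risks so that the most severe one is looked at first.
ranked = sorted(risks, key=lambda r: risk_score(r[1], r[2]), reverse=True)
for name, likelihood, consequence in ranked:
    print(f"{risk_severity(likelihood, consequence):>13}: {name}")
```

    Ranking by a single score is a convenience for comparing risks to one another. The underlying likelihood and consequence ratings still need to be judged and agreed by people who understand the project.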

    Systems engineering is a discipline that continually assists with risk management. For example:

    we conduct progressive design reviews as we pass through the design phase to try to detect and correct errors as early as possible,

    we conduct rolling evaluation programs and audits as the system passes through design and development and into construction and production to ensure that we have come up with a properly documented design that works, and

    we are always considering alternatives and choosing the most balanced design approach to our problems.

    Sometimes we are confronted with risks that are so severe that they need action. Classic responses to severe technical risks are to avoid the risk (by taking an alternative design approach) or to reduce the risk by reducing its likelihood, its impact, or both. Let's say that the land upon which we are building our house is a sloped block. There may be a risk of subsidence on the block that is deemed too high. To avoid this risk, we could include a retaining wall in our design concept or we could use excavation to level the block. It should be noted that by avoiding one risk (in this case subsidence) we are invariably exposing ourselves to others. For example, excavation allows us to avoid the subsidence risk but the process might be very expensive and time consuming, exposing us to cost and schedule risk. Let's look at an example of reducing rather than avoiding risk. Our house design concept might include a spiral staircase. Although compact and pleasing to the eye, the probability of someone falling down the spiral staircase might be deemed too high by the house owners and a more traditional staircase requested instead. There is still a risk that someone will fall down the staircase but the probability will be much reduced and the revised risk severity might now be acceptable.

    Other examples of risk management within our house design might be to build spare capacity into the house for future growth. For example, we might reduce the risk of overloading our electrical system by ensuring that each circuit is only carrying 50% of its design capacity. This reduces the risk of overloading our circuits and provides a platform for future growth. Another design approach that we may take is to build redundancy into our designs, especially for safety or mission critical elements of the design. For example, we may consider our watering system for our garden to be critical. If our watering system fails, we risk losing very expensive plants, lawns and gardens. Of course, we might be relying on the garden for our food also. In this case, we might design a watering system that uses the house power under normal conditions but is also backed up by a battery in the case of electrical failure. Another risk mitigator is to use design diversity in our systems. Design diversity is where we use different design approaches in our design of redundant systems so that something that causes one approach to fail will not necessarily cause the other approach to fail.

  • Let's look at another example of using the concepts of design redundancy and diversity to reduce technical risk. Let's say we are working on an aircraft system and there is a computer system on board the aircraft that must not lose electrical power. It might be a flight control computer, for example. Engineers will be tasked to design an electrical system to provide electrical power to that flight control computer. Initially, their design might involve driving a generator from one of the engines to provide the necessary power. Upon review, this design may be viewed as an unacceptable risk because the probability of generator failure may be too high and the impact on flight safety may be dire. In short, failed generator = no electrical power = no computer = plane crash.

    So, the engineers revise their design and use the concept of redundancy by adding a second generator run off the other engine as a backup. If the main generator fails, the backup generator can take over.

    Another review of the design reveals that there are circumstances that could cause both generators to fail. This circumstance is often called a common mode of failure. Engineers may solve this by using the concept of design diversity and adding a battery. The battery performs the same function as the generators (it provides electrical power) but it does this in a different way, by using chemical rather than mechanical means.

    By using design techniques like redundancy and diversity, the engineers have addressed the technical risk (in this case, the risk of losing electrical power to a critical computer). Note that they have not reduced the impact of the risk (if the computer loses power, the aircraft still crashes) but they have reduced the probability of the computer losing power. Because risk severity is a function of both probability and impact, they have reduced the risk severity by reducing the probability of it occurring.
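
    The effect of redundancy and diversity on probability can be shown with a little arithmetic. The sketch below is a simplified model under stated assumptions: a common-mode event takes out every generator, the generators otherwise fail independently, and the battery fails independently of the generators. The function name and all of the probabilities are invented purely for illustration and do not come from any real aircraft.

```python
def loss_probability(n_generators, p_generator, p_common, p_battery=None):
    """Probability of losing all electrical power to the critical computer.

    Assumptions (illustrative only):
    - a common-mode event (probability p_common) takes out every generator;
    - otherwise each generator fails independently with probability p_generator;
    - the battery, if fitted, fails independently of the generators.
    """
    all_generators_fail = p_common + (1 - p_common) * p_generator ** n_generators
    if p_battery is None:
        return all_generators_fail
    return all_generators_fail * p_battery


# Invented per-flight probabilities, chosen only to show the trend.
P_GENERATOR = 1e-3
P_COMMON = 1e-4
P_BATTERY = 1e-2

single = loss_probability(1, P_GENERATOR, P_COMMON)
redundant = loss_probability(2, P_GENERATOR, P_COMMON)
diverse = loss_probability(2, P_GENERATOR, P_COMMON, P_BATTERY)

for label, p in [("one generator", single),
                 ("two generators", redundant),
                 ("two generators + battery", diverse)]:
    print(f"{label:<26} {p:.2e}")
```

    With these made-up numbers, the second generator cannot push the probability below that of the common-mode event, which is why a diverse source such as the battery is worth adding. The impact is unchanged throughout; only the probability moves.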

    Spare capacity, redundancy and design diversity have been incorporated in the design of many technical systems around us in order to mitigate risks. Examples include cars, transport systems, aircraft, medical facility design and so on.

    Sometimes, though, we consider the risks that we are facing and decide to take on risks because of the potential benefits that may accrue as a result of taking those risks. For example, making use of leading-edge technology in our designs can be risky due to the unknown nature of leading-edge technology. However, there may be major advantages in terms of function and performance associated with using leading-edge technology. In other words, sometimes the risk=return adage is worth considering.

    Throughout this MOOC, we have discussed the idea of periodically reviewing our work at logical points in the design and development process. This is an effective way of detecting errors, conflicts or problems with our design as early as possible. After all, we know that the earlier we detect issues, the easier and cheaper they are to rectify.

    In this MOOC, we are suggesting reviewing things after major transitions. For example, we spoke of a system-level review called the System Design Review when we have transitioned from stakeholder to system-level requirements.

    We also suggested a detailed review that we called the Critical Design Review when the designers believed that they had completed the detailed design for our system.

  • Conducting reviews of this nature just makes sense; these are not mysterious systems engineering activities. They are simply technical meetings that are conducted in a controlled and professional manner, involving appropriate groups of people, that aim to review work packages, approve plans for the next stages, and resolve any problems that are facing the development effort. When I say that the meetings are conducted in a controlled and professional manner, I am referring to standard meeting protocol such as:

    an agenda for each review outlining what is going to be covered, how long it is going to take and who is leading the presentations;

    minutes to be taken and agreed prior to the conclusion of the meeting; and

    an agreed chairperson(s) to maintain control over the meeting.

    These technical reviews must be held at the appropriate time within the development program. If they are held too early, the development effort will not be sufficiently advanced for the review to be meaningful. If they are held too late, we may miss opportunities to rectify issues or problems in a timely fashion. My overwhelming experience is that reviews tend to be held too early rather than too late. People like to be seen to be progressing on schedule even if the technical program is lagging. The attitude seems to be that the technical program will catch up after the review. Conducting reviews against the project schedule (despite delays in the technical program) gives the impression that the project is on schedule. Conducting technical reviews too early in order to artificially adhere to a project schedule is not recommended. Fixing problems with the technical program as early as possible in the systems engineering process is recommended. There is a great expression that applies here: bad news is not like red wine. Bad news does not get better with age. It is best to recognise problems as early as possible rather than pushing them to later (where they have invariably become much worse).

    Systems engineering planning must take account of the technical program and plan for these design reviews. Design reviews take time, cost money and will involve a number of different people. That's why planning for them is important. The number and nature of technical reviews will be different for every project so we must think about every project as a unique undertaking when considering how to conduct the reviews. A risky project using developmental technology and involving large sums of money will be reviewed more thoroughly than a project at the other end of the spectrum.

    PART 4

    A fundamental systems engineering management task is to determine the most appropriate technical strategy to use to take our stakeholders' expectations and turn them into a system that solves the stakeholders' problems.

    In this MOOC, we have used a classic process known sometimes as the waterfall approach to development. We chose this approach for this MOOC for a few reasons:

    it remains a popular approach to systems engineering,

    it is logical and sequential, making it ideal for explaining the whole systems engineering process, and

    even if it is not used as the development approach, it still represents the basic building blocks of other popular approaches.

    To recap how we explained systems engineering in this MOOC, using the house as an example, you will recall that we assumed that we were going to do the whole house project in one, single pass. That is, we were going to go from a complete conceptual design to a complete physical design in one pass. In doing this:

    first, the system is understood via requirements engineering;

    then all of the system elements are identified and understood;

    then all of the elements that need designing are designed, integrated and tested;

    then all of the elements are integrated to form the system and it is tested; and

    then the whole system goes through production.

  • This strategy does work in certain circumstances. For example, when:

    we have enough time and money to do the whole lot at once,

    we understand our requirements at the system level well enough to base the whole effort on those requirements,

    our requirements are sufficiently stable that they don't keep changing (forcing expensive and time-consuming rework), and

    technology and expertise are sufficiently available and stable to be able to solve the whole problem at once.

    Note that to be successful, all of these things need to be in place. If one or two are missing, then the waterfall approach may not be the best approach. There are alternatives.

    Let's say that we understood all of our requirements for our house and we had the technology and expertise in place to design and build the house but we did not have enough time or money to do the whole project at once. How would we proceed?

    Common sense would tell us that we would design the whole house so that the design accounts for everything we want but that we would implement the design in a series of interconnected stages or phases. In between the stages or phases, we would be living in the house and saving up money for the next phase. Because we had taken the subsequent phases into account right at the start, phase 2 would be able to build on phase 1, phase 3 would be able to build on phase 2 and so on. In systems engineering, we would call this an incremental approach.

  • What if we didn't really understand all of our requirements in a lot of detail? We were certain of some requirements but not others. We might build the house based on the requirements we understood and build plenty of spare capacity into the design so as to address our future needs when those future needs become apparent. As we live in the house, we develop our requirements for additional capability. When we have enough time and money and we understand our requirements a little better, we can embark on an evolution of the original house. This might be in the form of an extension or a reconfiguration. Naturally, in this case, we will be constrained by whatever form the house currently takes. Because we didn't have a thorough picture in mind when we started (like we did with the incremental approach or the waterfall approach) we may need to evolve in a sub-optimal manner. We might, for example, find ourselves saying "if only I had realised that I would want to do this extension when I was building the original house, I would have...". You can fill in the rest of the sentence with things like "built a stronger concrete slab" or "located the storm water drain in a different place" and so on.

    The bottom line is that there are many ways to execute the systems engineering process. We have discussed the waterfall approach in this MOOC and also touched on alternatives such as incremental and evolutionary approaches. When people say that systems engineering has not worked on a project, they are probably saying that an inappropriate systems engineering approach was employed on the project. Systems engineering is definitely not a one-size-fits-all process. What works in one situation probably won't work so well in other situations. Systems engineering must be tailored to suit different situations.
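
    The reasoning in this part can be summarised as a small decision sketch. The function below is a deliberately crude illustration: its name, its boolean inputs and the order of the checks are assumptions, and the situations it does not cover simply fall through to "tailor". Choosing a development approach in practice is a judgement call, not a lookup.

```python
def suggest_approach(*, requirements_understood, requirements_stable,
                     technology_available, enough_time_and_money):
    """Suggest a development approach from the four waterfall assumptions."""
    if (requirements_understood and requirements_stable
            and technology_available and enough_time_and_money):
        # Every assumption holds, so a single pass is a reasonable choice.
        return "waterfall"
    if not requirements_understood:
        # Build on what is understood and evolve as needs become apparent.
        return "evolutionary"
    if requirements_stable and technology_available:
        # Only time or money is short: design it all, deliver it in phases.
        return "incremental"
    # Anything else is outside this sketch and needs case-by-case thought.
    return "tailor"


# The phased house: everything is understood, but the budget is not there yet.
phased_house = suggest_approach(
    requirements_understood=True,
    requirements_stable=True,
    technology_available=True,
    enough_time_and_money=False,
)
print(phased_house)
```

    The fall-through case matters as much as the named ones: it is the reminder that systems engineers have to think about their own situation rather than pick from a fixed menu.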

    All of the preceding discussions should highlight the critical importance of planning the overall systems engineering effort. For example, in the preceding discussions, we have explained that we really do need to plan:

    What strategy are we using?

    Who is doing what?

    When are the reviews happening?

    What design, development and production resources are required?

    What are some of the big risks we are facing?

    What is our approach to key systems engineering issues like T&E, configuration management and requirements engineering?

    In developing an idea of the answer to all of these key questions, we will be going through a planning process. When we have agreed on the answers and written those answers down, we will have a plan. In systems engineering, this plan is generally called the Systems Engineering Management Plan or SEMP.

    The critical thing to remember is that the plan is only an artefact. It is the planning effort that is the most vital component of producing the plan. I am sometimes asked to help organisations to produce a SEMP. Sometimes, the organisation is focused on producing an artefact that complies with some formatting and content requirement. Really, this is missing the point. What they really need is to go through the planning process and discuss how they are going to tackle all of the elements of the systems engineering process and then write the plan. I cannot stress enough that the planning process results in the plan.

    Another point to make is that the plan (SEMP) will not remain static over a typical project, so systems engineers must continue to plan and update their strategies to meet the challenges of a changing situation. This is a fact of life on a typical project.

    Systems engineering rarely, if ever, exists in its own right independent from other professional disciplines. Bringing a solution to a complex problem into existence will involve a lot of different disciplines working together.

    Systems engineering is closely related to the discipline of project management. In some cases, project managers will need input from systems engineers to organise things like scope, cost and schedule estimates. In other cases, systems engineers will need assistance from project managers in order to do their job. There is a very strong relationship between the systems engineering effort and the project management effort.

    Systems engineering is a lifecycle discipline. At various points in this MOOC, we have discussed lifecycle concepts that require us to think about maintenance and support, facilities, training, personnel, disposal and so on. A critical technical discipline known as Integrated Logistics Support is focussed on influencing the design and development of our system with through life support in mind. To that end, there is a very strong relationship between systems engineering and integrated logistics support. Both disciplines need to work closely together in order to achieve a system that both meets customer requirements and is supportable through its life.

    On technical projects, the systems engineering effort will be responsible for managing, directing, controlling and supporting an array of classic technical disciplines. The nature of these disciplines will vary depending on the nature of the project and system. For example, on our house, we will be dealing with trades such as carpenters, joiners, plumbers, electricians, bricklayers and so on. On more complex systems like a modern aircraft, we will be dealing with aerospace engineers, jet engine specialists, materials specialists, electronic engineers, software engineers and so on. Naturally, ensuring that all of these disciplines are working closely and cooperatively together will be a major determinant of success. This is a major role of systems engineering management.
