Kiva Design Multicore Strategy
Pete Wilson • Kiva Design • [email protected]


Multicore Strategy
Customer Issues
What customers want from multicore:
Lower cost
Finer granularity
Higher performance
Lower power
More flexibility

What stands in their way:
Legacy software doesn't scale
No established multicore software models/standards
No multicore-oriented development chains
New languages/language extensions/programming models needed
Difficulty in mapping problems to software/cores, and lack of tuning tools/methodologies
Lack of skilled staff to write new-generation software
Silicon Vendor Business
Opportunities:
Sharply reduced next-generation core development costs and time
Improved re-usability of methodologies and IP across multiple business units
A very rich environment for the creation of new valuable IP for licensing or patenting
Probable need for more custom products

Risks:
Adopting wrong/inappropriate standards
Being perceived as missing the bandwagon
Pressure to revamp product lines
New roadmaps needed
New vendor/third-party toolchains needed
New vendor/third-party modelling/simulation tools needed
Risk of competitors gaining valuable fundamental IP/patents
Probable need for more custom products
Silicon Vendor Technology
Opportunities:
Simplified core designs
Unified SoC architectures
Easier to leverage industry tools
Coherent interconnect/interface/programming models for DSPs, CPUs, accelerators, ...
Another dimension for power management

Challenges:
Lack of simulation tools to allow architectural decisions/tradeoffs
Power management becomes a more complex issue
Lack of customer system/software examples and knowledge
Investment needed in development toolchains, languages, simulation technologies
Pressures on architectures/microarchitectures likely to drive change
Customer Resistance
Customers will resist changing tools, approaches, languages, or designs in the absence of software standards supported by toolchain vendors
By definition, standards lag technology
Since widespread creation of highly-concurrent software has yet to happen, current deployable technology is at best immature
Thus there is a challenge to the creation of software standards
Good software standards should:
First, do no harm
Secondly, be open to evolution
Thirdly, be expressible as standard APIs, even where language extensions would be clearer, safer, and more performant
Define a “Concurrency Abstraction Layer”?
Evolve - and be replaced - over time
Standards Activities
Investigation of standards bodies active in this area is needed, followed by embracing a select few and active involvement in driving standards and tracking directions.
Customer Issues: New Technologies
Multicore implies widespread and possibly deep changes to software design and implementation
Customer buy-in will be a plus
Positioning the vendor as a multicore leader through its commitment to tools, toolchain vendors, and standards will likely be beneficial
Technology discussions with customers can lay a foundation
What the company is trying to do
Describing investment areas
Explaining the software approaches under investigation, and the toolset for measuring, experimenting, and simulating
Explaining architectural possibilities
... and such discussions provide useful feedback on "hot buttons"
The good things as well as the bad
... and an opportunity to obtain useful application information
What are they trying to get multicore machinery to do?
What are the barriers to their success?
What tools, technologies, approaches, heuristics… would help them do what they want to do?
Customer Issues: Legacy Conversion
Customers tend to have a lot of working, qualified code tuned to run on (mostly) a single core or processor
Future cost-effective high-performance machinery will at best run this code at more or less current performance levels.
The customer will want a tool which takes this legacy code and refactors it to be able to leverage increasing numbers of cores
Initially, such a tool will only need to scale from one to two or perhaps four cores, but over time the number of cores will increase at roughly Moore’s Law rates
For appropriate nested loops, automatic compiler parallelisation is a good fix
Offered by several vendors
But, unfortunately, an automatic conversion tool seems totally impractical for control-oriented code
It may be feasible, however, to identify a number of “concurrency templates” which provide heuristic guidance on how to repartition a given shape of app into multi-core suitability
Conversion will need to be done manually
And it may be feasible to develop a tool which can provide some confidence to the customer that the meaning of the new software is the same as that of the original version
Investment needed in these areas, both internally and in partnership with key toolchain vendors
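The nested-loop case above is the easy one because the outer iterations are independent, so parallelising them is a mechanical transformation that cannot change the result. The Python sketch below is purely illustrative (the function names and the thread-pool approach are assumptions of this sketch, not anything from the slides):

```python
from concurrent.futures import ThreadPoolExecutor

def row_work(row):
    # Stand-in per-row kernel; a real application's inner loop goes here.
    return sum(x * x for x in row)

def process_serial(matrix):
    return [row_work(r) for r in matrix]

def process_parallel(matrix, workers=4):
    # The outer iterations are independent, so distributing them
    # across workers cannot change the result.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(row_work, matrix))
```

Control-oriented code has no such obviously independent unit of work, which is why the slide argues it needs templates and manual refactoring instead.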
Customer Issues: Vendors, Tools & Learning
Customers will rely on tools vendors to provide appropriate tools. But they will also need training in the implications of the new hardware, design approaches, debug, tuning…
Having appropriate training materials implies first generating the basic knowledge from which those materials can be built
Not just semi vendor-generated material, but books, papers etc written by others
Standards will play an important role
Partnerships between semi vendor and outsiders will play an important role
Tools vendors will be slow to adopt any approaches/technologies not supported by the industry, their own customers and standards
Pump-priming by semi vendor is likely to be necessary, through both funding and technology transfer
Customer Issues: SW Design, Partition & Tuning
Customers do not have years of experience designing and developing multicore software
How to partition tasks across cores:
Made more interesting in the presence of heterogeneous cores and of accelerators
Dynamic or static load-balancing?
Tuning the software?
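The static-versus-dynamic question can be made concrete with a sketch: statically, tasks are assigned to workers before execution begins; dynamically, idle workers pull the next task from a shared queue. The Python below is illustrative only (the names and structure are assumptions of this sketch):

```python
import queue
import threading

def static_partition(tasks, n_workers):
    # Static: assign tasks round-robin before execution begins.
    # Cheap, but one slow task can leave other workers idle.
    return [tasks[i::n_workers] for i in range(n_workers)]

def dynamic_run(tasks, n_workers, work_fn):
    # Dynamic: idle workers pull the next task from a shared queue,
    # balancing load automatically at the cost of queue contention.
    q = queue.Queue()
    for t in tasks:
        q.put(t)
    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                task = q.get_nowait()
            except queue.Empty:
                return
            r = work_fn(task)
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results
```

Heterogeneous cores and accelerators tilt the trade-off: a static partition must then also match task shape to core capability, which is exactly where tuning tools are missing.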
Adopting wrong/inappropriate standards
Pressure to revamp product lines
New roadmaps needed
New vendor/3rd party modelling/simulation tools needed
Silicon Vendor Business Issues: Standards
Standards can be a nuisance
Standards wars - backing the wrong side can cause customer perception issues, as well as wasting time, money and resources
Avoiding action until standards become established can indicate a lack of commitment to customers, in the absence of visible pro-active measures being taken
Standards in such an IP-target rich environment as multicore can be a double-edged sword - valuable created IP may need to be “given away” just to get silicon to be usable
New standards imply new tools, new training, new app notes, new problems. Someone has to pay for all this, and someone has to put in the effort to drive third-party toolchain vendors into action.
Silicon Vendor Business Issues: Inaction
It’s well-known that it’s a multicore world - what are you doing to demonstrate that your company is on top of the implications?
It’s easy to generate a belief among customers and in the industry that a vendor doesn’t “get” multicore and all its implications
A technology roadmap indicating how a customer will be able to leverage new multicore technologies without abandoning legacy software/IP seems necessary - or at least highly desirable
The roadmap needs to cover both silicon and software
Inaction allows competitors to create a monopoly in valuable new IP
Silicon Vendor Business Issues: Roadmaps
Multicore is new and exciting and scary - where’s the roadmap?
Interplay of hardware, software, technology, positioning - need to choose a direction through the minefield
Need to strike a reasonable balance between short, mid-term, long-term investments
Need to be able to tell the story behind the plans
Need to inspire third-party vendors to track and support roadmap
Need to strike partnerships with competitors and partners in the industry
Silicon Vendor Business Issues: System Simulation
Customers need to be confident that they can partition software, allocate tasks to cores appropriately and choose the right-sized platform
This implies system-level, multicore-oriented simulation/modeling tools which can model software at multiple levels of abstraction as well as modeling hardware.
Where’s the technology for this coming from? Where’s the prototype? The products? The funding? The support?
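A modelling tool of this kind must work at several levels of abstraction. At the coarsest, un-clocked level, a what-if question such as "how does completion time change with core count?" needs only task costs and a scheduler model. The sketch below is illustrative only; the greedy earliest-free-core policy is an assumption of this sketch, not a real modelling tool:

```python
import heapq

def simulate_schedule(task_costs, n_cores):
    """Makespan under greedy list scheduling, with no clocked detail.

    Each task runs whole on the earliest-free core; this is the
    coarsest useful abstraction level, far above cycle accuracy.
    """
    free_at = [0.0] * n_cores  # when each core next becomes free
    heapq.heapify(free_at)
    makespan = 0.0
    for cost in task_costs:
        start = heapq.heappop(free_at)   # earliest-free core
        end = start + cost
        makespan = max(makespan, end)
        heapq.heappush(free_at, end)
    return makespan
```

Running the same task set at different core counts is exactly the kind of right-sizing experiment customers need before committing to a platform.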
Silicon Vendor Technology Issues: Architectures
Current architectures have appalling context switch times, lack any architected message-passing/communications capability, and bear the area and power burden of exquisitely complex architectures
Current architectures are tuned for computers and do not match the needs of data-movement-intensive, multi-engine systems
Resources wasted in massive SIMD subsystems for which no language support can be made available; I/O is run naked without any MMU support; data movement is not even a decent afterthought
Current architectures assume control is the king - that there is one CPU - while the future probably needs data movement to be the king
Architected, power-efficient, language-accessible, asynchronous, low-latency data-movers seem desirable
Multicore implies multithreaded software and interthread communication
Current languages are purely sequential, and rely on libraries or system calls to effect concurrency and communication. This means that there is a huge number of tricky concurrency problems that cannot be found at compile time - language extensions which support concurrency and communication are needed to fix this
The introduction of language extensions such as these will be a fairly long process, with lots of customer and industry involvement and the eventual driving of the extensions into the appropriate language standards
And the underlying concurrency architecture has to be available to current compilers/languages/tools and users as a well-supported library or perhaps Concurrency Abstraction Layer
Language features should allow message-passing to be about as cheap as passing arguments to a function; spawning threads should be about as cheap (in code space and path length) as calling a function
Although doing this properly needs something different from (and simpler than) a vanilla RISC architecture
While message-passing is almost certainly the best technological solution, shared-store cache-coherent systems will continue to flourish (perhaps as small SMP nodes in a larger SoC) - and so locking needs to be safe and efficient. New semantics such as transactional memory may vastly ease this problem - and improve performance - and also call for new language capabilities
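The programming model being argued for can be sketched, though not its cost, in an ordinary threaded language. The `Channel` and `spawn` names below are inventions of this sketch; the slides' point is that with architectural and language support, both operations could cost about as much as a function call, which a library on a conventional OS cannot achieve:

```python
import queue
import threading

class Channel:
    """Blocking message channel: the model only, not the proposed cost."""
    def __init__(self):
        self._q = queue.Queue()

    def send(self, msg):
        self._q.put(msg)

    def recv(self):
        return self._q.get()

def spawn(fn, *args):
    # Here spawning costs a full OS thread; architectural support
    # could make it roughly as cheap as calling a function.
    t = threading.Thread(target=fn, args=args)
    t.start()
    return t

def doubler(inbox, outbox):
    # A trivial server: receive one message, send back its double.
    outbox.send(inbox.recv() * 2)
```

The same interface could sit on top of a Concurrency Abstraction Layer, letting programs written this way migrate to hardware-supported messaging later.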
Modelling tools and technology
There are no effective tools for what-if design analysis of partitioning functions across engines and across cores of various capabilities. As a result, vendors cannot choose appropriate silicon partitioning when designing SoCs, nor do evidence-driven design of appropriate architectures and microarchitectures for their engines and cores; and customers cannot partition their systems or choose resource-management strategies that share SoC or system resources effectively
A legacy of concentrating on overly-complex “clock-accurate” models of complex cores is a poor foundation for systems modeling
In designing architecture, it’s important that the proposal be shown to work across a range of microarchitectures, not that it works on a single microarchitecture
A systems modeling tool needs to be able to model software at various levels of abstraction, not just clocked hardware
For some applications, large, complex cores will still be needed
As the use of concurrency matures, many of these “sequential” problems may transmute into efficient concurrent solutions
But meanwhile the availability of programmable swarms of engines will enable new applications and markets
For other apps, a swarm of simpler cores can be much better than one large one
The small cores can contain little but register resources and the computational blocks needed, with little or no supporting infrastructure
No register renaming, completion buffers, complex memory queues, vast branch-prediction structures, large shared global buses, sprawling SIMD computational units.…
Instead, performance will be obtained by having most of the cores doing something useful most of the time
Multithreading is a possible extra, although the simplicity of implementing architectures with little context is more attractive
These cores will dissipate less power per instruction executed than larger cores, and will allow a new dimension of power management
Voltage and frequency scaling will still work, as will varying the number of cores used for the work. This will likely need to be supported mainly through software, which will need appropriate "introspection" capabilities to understand what's going on in the silicon
This will drive (at least some of) the onchip interconnect to support asynchronous intercore communication, and communications density will play a part in choosing how to dynamically reconfigure the swarm to manage power effectively
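The software side of that power-management dimension reduces to a policy: given introspected load, choose how many cores to keep active, with voltage/frequency scaling layered on top. A deliberately simple illustrative policy follows; the interface, units, and thresholds are all invented for this sketch:

```python
import math

def active_cores(load, total_cores, per_core_capacity=1.0):
    """Choose the fewest cores that cover the offered load.

    `load` and `per_core_capacity` are in the same abstract work
    units; both the interface and the policy are invented for this
    sketch, standing in for real silicon introspection data.
    """
    needed = math.ceil(load / per_core_capacity)
    return max(1, min(total_cores, needed))
```

A real policy would also weigh communications density, since powering down the wrong cores can lengthen intercore paths in the swarm.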
Making efficient, competitive, attractive products which leverage multicore will require many (probably inconvenient) changes, and careful attention to their interplay and how they are presented to customers and the industry at large
Processor architecture changes
Embracing intelligences which are not mainstream processors but still need software tools
Power management has new dimensions, for both good and ill
Providing consistent system APIs from bare-metal 1000-engine chips to a rich Unix-like OS running on a few processors, perhaps all in the same SoC
SoC interconnect suitable for multiscale network-on-a-chip-like systems
Toolchains which handle heterogeneity in all its glory
Simulation/modeling technologies and tools to allow investigations
Cross-organisational efforts probably needed
… and more
Which of these need to be addressed first and why, and what the metrics for success might be, cannot be specified at this juncture. Decisions need to be driven by:
Planning horizon
Business imperatives
... and commitment from the company to embark on a program and see it through
Strategy: How Do We Get There from Here?
First, identify the scope of the need and financial resources available
Key inputs are business needs within a chosen, defined time horizon. To be useful, there should be 1-, 3- and 5-year horizons, and these should incorporate good competitive/industry-trend projections as a backdrop
Also desirable is sufficient information to support evaluating various what-if analyses to estimate the likely effect on revenues/profit/markets/customers of some reasonable number of possible investment scenarios
With that as background, propose minimal, preferred and maximal technology investment projects, partnership plans (quantified with time, money, people etc); choose one plan; and obtain commitment, funding and resources through the 5 year horizon for the chosen plan
Real plans will need evidence-based refinement over time. Changes to the initial plan do not de facto represent failures of planning or execution.
How Much Will It Cost?
It’s not practical to cost any plan right now, but a sketch may prove helpful
Assume that what is needed technically is several new architectures and associated toolchains from established vendors, along with one new programming language. To do this, a reasonable guess at what investment is needed might be:
An architecture description tool able to drive efficient compiler back-end generation and simulation engines along with the creation of those compilers and simulations:
5 people over 3 years: 15 py
Three new microarchitectures ranging in scale from an ARM7 class machine to a MIPS 24K class machine, all completely synthesisable: 20 people over 2 years: 40 py
Four new application-specific accelerators, all needing software toolchains:
20 people over 2 years: 40 py
New onchip interconnect family with “interconnect compilers” which select the right variants for a given SoC:
5 people over 2 years: 10 py
Business unit support for customers and partners, including app notes, specs, boards, ...
ramping from 5 to 20 people over 5 years: 50 py
Payments to compiler, OS etc vendors to support new technology:
$10M over 5 years
Total: about 150 py plus $10M. At roughly $150K per person-year, 150 py costs $22.5M; $22.5M + $10M = $32.5M. Call it $35M
This is significantly cheaper than the cost of a traditional “next-generation processor” project and provides IP for product use much more quickly