LCU14 303- Toolchain Collaboration
LCU14 BURLINGAME
Ryan Arnold, LCU14
LCU14-303: Toolchain Collaboration
● Participants
  ● Linaro
  ● ARM
  ● QuIC
  ● Cavium
  ● ST
● Topics
  ● Participant Introductions and Development Focus
  ● GNU Toolchain Roadmaps
  ● GNU Toolchain Specifics
  ● LLVM Roadmaps
  ● LLVM Specifics
  ● System Libraries, Linkers, Debuggers, and Tools
Toolchain Collaboration For The Next 6 Months
● Representation
  ● Ryan Arnold - Engineering Manager
  ● Maxim Kuvyrkov - Tech Lead
  ● Team - 6 Linaro employees and 6 member assignees
    ● Kugan Vivekenandarajah, Venkataramanan Kumar, Bernie Ogden, Omair Javaid, Will Newton, Rob Savoye, Michael Collison, Christophe Lyon, Charles Baylis, Yvan Roux, Renato Golin, Wang Deqiang
● Purpose
  ● Improve Collaboration
  ● Eliminate Roadmap Redundancy
  ● Identify gaps in eco-system
Linaro - Introduction & Purpose
● Product Validation Framework Improvements
  ● Backport, Release, and Binary Toolchain validation automation and reporting
● Toolchain Performance
  ● GCC and LLVM Performance
● Benchmark Automation
  ● Backport, Release, and Binary Toolchain benchmark automation and reporting
● Product offering expansions in 2015
  ● x86_64 hosted cross toolchains
  ● AArch32 targeted cross toolchains
  ● ARMv7 and ARMv8 hosted toolchains
Linaro - Focus from LCA14 into 2015
PUBLIC
Open Source Core Toolchains: The Next Six Months
Matthew Gretton-Dann, August 2014
▪ Tell you what ARM plans to work on, and what its current priorities are
▪ However, things are likely to change – so:
  ▪ We do not promise to achieve all of this in the next six months; nor
  ▪ Do we promise not to do other work
▪ If your plans include the same topics, or work in the same areas
  ▪ Come and talk to us – we should work together
  ▪ Preferably this conversation should happen in the appropriate upstream communities.
▪ If you feel that we’re doing the wrong thing
  ▪ Come and talk to us – we’re happy to work out a better way forward
▪ We are moving to tracking all our ‘public’ work in the appropriate community Bugzilla databases.
  ▪ This is the best place to have the conversation about best ways forward.
Purpose of this Presentation
▪ Support the Architecture & Cores
  ▪ Teams are involved in development of new cores and architecture extensions
  ▪ We will not discuss those here
  ▪ However, we plan to upstream functionality as soon as possible after public announcements
▪ Support the Community
▪ Improve Performance:
  ▪ Focus on Cortex-A57 performance improvement
  ▪ Focus on a range of benchmarks, including industry standard CPU benchmarks.
  ▪ We analyze benchmarks both:
    ◦ for improvements we can make to the toolchains; and
    ◦ to note any regressions and get them fixed in co-operation with the community
Overview of Goals for Year
ST - Introduction & Purpose
ST - Focus from LCU14 into 2015
QuIC - Introduction & Purpose
QuIC - Focus from LCU14 into 2015
● Supports GNU based ThunderX toolchain internally (and other Cavium products)
● Make sure that GCC performance areas are covered, but not twice
● Implemented ILP32 support in the kernel and glibc, plus parts of the GCC and binutils support
● Helped in getting some performance improvements for AArch64 already
  ○ Naveen implemented many patterns in the back-end for instructions which were not being emitted
  ○ Andrew helped with part of conditional compares; improving ifcombine
  ○ Added issue rate to the AArch64 cost table
  ○ Added trap pattern so the abort function is not used for __builtin_trap
  ○ Removed some redundant cmp’s
● Added many new testcases
Cavium - Introduction
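The `__builtin_trap` item above is about how the builtin is lowered: with a trap pattern in the back end, the compiler can emit a trapping instruction inline rather than a call to `abort`. A minimal sketch of source that exercises the builtin; `checked_div` and `checked_div_selfcheck` are hypothetical names used only for illustration, and the exact instruction emitted depends on the compiler version and target.

```c
#include <assert.h>

/* Hypothetical helper: integer division that stops the program at once
 * on a zero divisor. __builtin_trap() is a GCC/Clang builtin that does
 * not return. */
int checked_div(int num, int den)
{
    if (den == 0)
        __builtin_trap();   /* candidate for an inline trap instruction */
    return num / den;
}

/* Usage: only the non-trapping path is exercised here. */
void checked_div_selfcheck(void)
{
    assert(checked_div(10, 2) == 5);
    assert(checked_div(-9, 3) == -3);
}
```

Compiling this with optimization and inspecting the assembly is one way to see whether a given toolchain emits a trap instruction or a call for the zero-divisor path.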
● Finish upstreaming ILP32 support
  ○ Including gdb and glibc support
  ○ glibc patch is almost done, just finalizing the patch set
● Upstream base ThunderX support
  ○ Will not include a schedule model to begin with
● Upstreaming patches for GCC 6 stage 1
  ○ Conditional moves improvements
  ○ Improvements to conditional compares
  ○ Large system extension support in GCC
    ■ Joel posted an infrastructure change that was rejected; might need to rewrite it
  ○ LSE HWCAP support in glibc and kernel
    ■ Need to know what path is acceptable for glibc
  ○ Some tweaks to the cost tables in AArch64; needed for ThunderX support
● Looking into prefetch loop arrays
Cavium - Focus from LCU14 into 2015
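The LSE items above involve two layers: the compiler emitting Large System Extension atomic instructions, and a HWCAP bit that lets user space detect them at run time. The sketch below assumes the interface as it exists in later Linux and glibc releases (`getauxval` with `HWCAP_ATOMICS`); the slide notes that the acceptable path was still open at the time, so this is not necessarily the design under discussion. `cpu_has_lse`, `bump`, and `lse_selfcheck` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>    /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>   /* HWCAP_ATOMICS (assumes reasonably recent kernel headers) */
#endif

/* Returns 1 if the kernel reports LSE atomics, 0 otherwise.
 * On non-Linux or non-AArch64 hosts this always returns 0. */
int cpu_has_lse(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_ATOMICS)
    return (getauxval(AT_HWCAP) & HWCAP_ATOMICS) != 0;
#else
    return 0;
#endif
}

/* A C11 atomic increment. When built for a target with LSE enabled, a
 * compiler may lower this to a single atomic instruction instead of an
 * exclusive load/store retry loop. Returns the new value. */
long bump(atomic_long *counter)
{
    return atomic_fetch_add(counter, 1) + 1;
}

/* Usage: the result of the increment is the same on either code path. */
void lse_selfcheck(void)
{
    atomic_long c = 0;
    assert(bump(&c) == 1);
    assert(bump(&c) == 2);
}
```

The source-level semantics do not change with LSE; only the instruction sequence the compiler or library selects does, which is why run-time detection matters for libraries that must run on both kinds of core.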
GNU Toolchain Collaboration
Linaro - GNU Toolchain Roadmap
[Roadmap chart. Timeline axis: quarters Q1 to Q4 across 2014 and 2015, then Future. Status legend: Closed, Development, Drafting, Community/External, Upstream, Approved, Scheduled, Ongoing.]
● Continue Member Driven Optimizations; current examples in development:
  ● Zero/sign-extension elimination using value-range propagation
  ● NEON intrinsics improvements in Libvpx on ARM & AArch64
  ● STREAMS performance improvements
● Identify Linaro Toolchain product driven optimizations
  ● Benchmarking Linaro toolchain products
  ● Identifying Regressions
  ● Improving performance based on investigations
● Performance Comparisons
  ● Identify potential optimizations based on performance gains seen on other architectures.
● Future
  ● Whole System Profiling & Workload Profiling
  ● Feature exploitation
  ● LTO for AArch64
Linaro - What’s Next for GCC Performance?
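The extension-elimination item above relies on value-range propagation: when the compiler can prove a narrow value's range, widening it to 64 bits needs no separate extension instruction. A minimal sketch of the kind of source where this can apply; the function names are hypothetical, and whether a particular GCC release removes the extension here is not something the slide states.

```c
#include <assert.h>
#include <stddef.h>

/* Sum table entries selected by the low byte of each key. */
long sum_by_low_byte(const int *table, const unsigned *keys, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* After masking, the range of idx is provably [0, 255]. */
        int idx = (int)(keys[i] & 0xffu);
        /* Widening idx to a 64-bit address offset therefore needs no
         * explicit sign extension: the upper bits are known to be zero. */
        total += table[idx];
    }
    return total;
}

/* Usage: keys 0x101, 0x2ff, 3 select entries 1, 255, 3. */
void sum_by_low_byte_selfcheck(void)
{
    int table[256];
    for (int i = 0; i < 256; i++)
        table[i] = i;
    unsigned keys[3] = { 0x101u, 0x2ffu, 3u };
    assert(sum_by_low_byte(table, keys, 3) == 259);
}
```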
● Improve NEON testing coverage and correctness
● GCC community stewardship
  ● bug triage
  ● patch review
● Unified Driver Development
● LLVM Community Releases
Linaro - Community Involvement
● Improve validation of Linaro GCC source package backports
  ● Improve automation
  ● Add default configurations validated per backport: 8 17
● Provide expansive source release validation of existing products
  ● all default configurations
  ● all enabled secondary configurations
  ● all supported languages
  ● various tunings
● Offer new products
  ● arm and aarch64 native binary toolchains
  ● x86_64 hosted cross toolchains
  ● AArch32 targeted cross toolchains
Linaro - What’s Next for Product Offerings?
● Release Candidate Benchmarking
  ○ Current Release Benchmarking
    ■ Manual SPEC2K (looking for release regressions)
  ○ Future Release Benchmarking
    ■ Automated SPEC2K, SPEC2K6, EEMBC Suite
● Backport Validation Benchmarking
  ○ Current Backport Benchmarking
    ■ None
  ○ Future Backport Benchmarking
    ■ Automated Coremark in development
● Reporting - uploading permitted relative results to members only portal
● Why does Linaro do benchmarking?
  ○ Guides future development
  ○ Informs validity of patches in development
    ■ Current Development Benchmarking
      ● as-needed: Coremark, SPEC2K, SPEC2K6
Linaro - What’s next for Benchmarking?
GNU Roadmap : Cortex-A
[Roadmap chart. Lanes: Mobile, Enterprise, Common. Timeline: 2014, H1 2015, Future. Status legend: Released, Development, Adv. Planning, Concept. Release markers: GCC 4.9, GCC 4.10 / 5.0. Lane and status of each item are not recoverable from the extraction. Chart items:]
▪ ARMv8 A32 - ISA extension
▪ Cortex-A12 - Arch support
▪ A64 toolchain production ready - GCC 4.9
▪ Cortex-A12/A17 - uArch tuning, cost model
▪ A64 performance gains
▪ ACLE 64 - Specification
▪ Cortex-A57 - uArch tuning, cost model
▪ ILP32 - User space & production
▪ ACLE 64 - Implementation
▪ Big Endian - AArch64 auto-vectorization
▪ Maths libraries
▪ A64 GOLD
▪ Big Endian – Basic AArch64 support
▪ Performance optimization - CPU-centric performance enhancements
▪ Toolchain features - Continuous ecosystem contribution for performance and features, NEON intrinsics
▪ A7/A15 A32 big.LITTLE
▪ Reworked AArch64 RTX costs
▪ Improved Neon intrinsics code generation
▪ PUSH_ARGS_REVERSED improvements.
▪ GLIBC math library improvements for AArch32 and AArch64
▪ Improved code generation for copysign intrinsic
▪ Improved choice of spill size for FP registers (decreasing memory bandwidth)
▪ Restructured and improved prologue/epilogue sequences – especially with -fomit-frame-pointer.
▪ Improved addressing modes for vectors on AArch64
▪ Improve AArch32 memset inlining
What We’ve Done In the Past Three Months or So: GNU Toolchain
▪ General bug fixes and maintenance
▪ Enable shrink-wrapping for AArch64 (GNUTOOLS-2476)
▪ Investigate and initial RFC for better load store pair generation (GNUTOOLS-154)
▪ Improved bit field handling instructions (GNUTOOLS-197)
▪ Big Endian AArch64 fixes (Focused on SIMD and vectorisation correctness)
▪ Improved Register move costs (GNUTOOLS-4528)
▪ Misc performance improvements based on scheduler / backend tweaks (GNUTOOLS-4317, GNUTOOLS-4508)
▪ Improved csinc / csneg generation (GNUTOOLS-4335)
▪ Conditional compares
▪ Core tuning: Cortex-A57, Cortex-A12 and Cortex-A17
▪ IVOpts improvements
▪ Memcpy for AArch64 – inlining and improved alignment
What’s Next: GCC – Things to do before Stage 1 closes (mid-October 2014)
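The csinc / csneg and conditional-compare items above concern branch-free selection on AArch64. As a rough illustration, these are source shapes that are natural candidates for those instructions; the function names are hypothetical, and the instructions actually chosen depend on the compiler version, cost model, and flags.

```c
#include <assert.h>

/* Select x or its negation: a candidate for CMP + CSNEG.
 * (Undefined for INT_MIN, as with any signed negation.) */
int magnitude(int x)
{
    return x < 0 ? -x : x;
}

/* Select a or b + 1: a candidate for CMP + CSINC. */
int pick_or_next(int cond, int a, int b)
{
    return cond ? a : b + 1;
}

/* Two chained tests: a candidate for CMP followed by CCMP, which
 * avoids a branch between the first and second comparison. */
int both_match(int a, int b)
{
    return a == 0 && b == 5;
}

/* Usage: the results are the same whichever instructions are selected. */
void select_selfcheck(void)
{
    assert(magnitude(-7) == 7 && magnitude(4) == 4);
    assert(pick_or_next(1, 10, 20) == 10);
    assert(pick_or_next(0, 10, 20) == 21);
    assert(both_match(0, 5) == 1 && both_match(1, 5) == 0);
}
```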
▪ Stage 3
  ▪ Bug fixes/Regression fixes.
  ▪ Improved conformance and performance for Advanced SIMD Intrinsics.
▪ Stage 4
  ▪ Regression fixes.
  ▪ Help community get GCC 5.0 released.
What’s Next: GCC – During Stage 3 (October – December 2014) and Stage 4 (Early 2015)
▪ Maintenance
▪ Support the architectural roadmap
▪ Help community get Binutils 2.25 released.
What’s Next: Binutils & GDB
QuIC - GNU Toolchain Roadmap
QuIC - GNU Toolchain Details
ST - GNU Toolchain Roadmap
ST - GNU Toolchain Details
LLVM Collaboration
Linaro - LLVM Roadmap
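[Roadmap chart. Timeline axis: quarters Q1 to Q4 across 2014 and 2015, then Future. Status legend: Closed, Development, Drafting, Community/External, Upstream, Approved, Scheduled, Ongoing. The chart also marks a "current staff coverage" line.]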
● Become the compiler of choice for all Qualcomm processor cores
  ● Today LLVM is the compiler of choice for DSP and GPU
  ● Would like to see LLVM reach that level of acceptance for CPU before the end of 2015
● Realize the full benefits of code hygiene on ARM from LLVM’s family of projects, i.e., sanitizers.
QuIC - Goals for LLVM
● Collaborated with ARM on initial AArch64 backend
● Worked with the community on the ARM64/AArch64 merge
● Cortex-A53 machine description
● Cortex-A57 machine description
● Contributed initial AArch64 ELF support to lld
● ASAN bug fixes
QuIC - What has QuIC done with LLVM
● Continue weekly collaboration with ARM on performance optimizations, particularly AArch64.
● Greedy inliner
● PGO
● Incremental use of sanitizers
QuIC - What QuIC will be working on
● Community Maintainership
  ● LLVM 3.5 and LLVM 3.6 release maintainership
● Support
  ● LLVM Kernel initiative, Android bugs, buildbots, member support
● LLVM Toolchain Stability
  ● Assembler, compiler libraries, linker, tools, libc++
● LLVM Performance
  ● Benchmarking & Profiling
  ● Comparing against GCC/x86
  ● Performance parity of 32-bit vs. 64-bit
● Sanitizers - might be covered under GCC development plan
● LLVM Linker
● LLVM Integration on Android for AArch64
Linaro - What’s Next For LLVM in Linaro?
LLVM Roadmap : Cortex-A
[Roadmap chart. Lanes: Mobile, Enterprise, Common. Timeline: 2014, H1 2015, Future. Status legend: Released, Development, Adv. Planning, Concept. Release markers: LLVM 3.4, LLVM 3.5, LLVM 3.6. Lane and status of each item are not recoverable from the extraction. Chart items:]
▪ v8 NEON - AArch64
▪ Big Endian - Basic
▪ Benchmarking infrastructure – Public performance tracking buildbot
▪ libc++ buildbot
▪ Initial Autovectorization
▪ AArch32 buildbot
▪ Cortex-A53 - uArch tuning
▪ AArch64 and Cortex-A57 - Performance tuning
▪ ARM64 / AArch64 backend merge
▪ Completion of the ARM64 and AArch64 backend merge
▪ Performance improvements:
  ▪ Improve code generation for converting in-memory 16-bit integer to 64-bit float (LLVM-1508)
  ▪ Optimistically use ‘sqrt’ instruction where available, and only fall back to a library call in the presence of NaNs (LLVM-1509)
  ▪ Reduce spilling of Q registers (LLVM-1538)
  ▪ Improve code selection between conditional instructions and branches. (LLVM-1489)
  ▪ A57 Fused multiply tuning (LLVM-1610)
  ▪ Improve Global Value Numbering (LLVM-1612)
▪ Re-engineering of ARM Neon intrinsic support
▪ Big Endian Support - AArch32 & initial AArch64 support
▪ Stack size reduction patches – some work still to do.
What We’ve Done In the Past Three Months or So: LLVM Toolchain
▪ Inline parameter tuning (LLVM-1500)
▪ Improve spilling heuristics (LLVM-1524, LLVM-1504, LLVM-1586)
▪ Common expression hoisting (LLVM-1247, LLVM-1490, LLVM-1550)
▪ TBNZ and CBNZ optimization (LLVM-1575)
▪ Register coalesce and rematerialization (LLVM-1582)
▪ Redundant common comparison expressions (LLVM-1491)
▪ Loop induction variable selection (LLVM-1492)
▪ Remove redundant stores
▪ Improved usage of vectorization opportunities using structs (LLVM-1501)
▪ Reduce xzr assignment on cbz target (LLVM-1583)
What’s Next: LLVM – Performance
▪ Global variable store should be hoisted (LLVM-1493)
▪ Too many MOVs on function call boundaries (LLVM-1504)
▪ Optimise LDR, LDRSW sequence into LDR, SXTW (LLVM-1581)
▪ Tune loop unrolling (LLVM-1587, LLVM-1590, LLVM-1646)
What’s Next: LLVM – Performance (continued)
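One reading of the "global variable store should be hoisted" item (an interpretation, since the slide gives no detail) is that a store to a global inside a loop can be replaced by a register accumulation with a single store outside the loop. A hand-applied sketch of that transformation, using a hypothetical global `g_total` and hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>

int g_total;   /* hypothetical global, for illustration only */

/* As written: the loop body updates the global on every iteration. */
void accumulate_naive(const int *v, size_t n)
{
    g_total = 0;
    for (size_t i = 0; i < n; i++)
        g_total += v[i];
}

/* Hand-applied version: keep the running sum in a local (a register)
 * and store to the global once, after the loop. */
void accumulate_promoted(const int *v, size_t n)
{
    int local = 0;
    for (size_t i = 0; i < n; i++)
        local += v[i];
    g_total = local;
}

/* Usage: both versions leave the same value in the global. */
void accumulate_selfcheck(void)
{
    int v[4] = { 1, 2, 3, 4 };
    accumulate_naive(v, 4);
    assert(g_total == 10);
    accumulate_promoted(v, 4);
    assert(g_total == 10);
}
```

A compiler can only do this automatically when it can prove that nothing else observes or aliases the global during the loop, which is part of what makes the optimization non-trivial.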
▪ Buildbots & benchmarking infrastructure
  ▪ Plan to set up a public performance tracking bot on Juno-A57
  ▪ To be publicly visible, maintained, and continuously producing performance numbers
  ▪ Running the LLVM LNT test-suite as a benchmark
▪ Various bug fixes and improvements
  ▪ Focus on ARMv8-A, ARMv7, and ARMv6-M.
▪ Support for selected ACLE (non Neon) intrinsics
What’s Next: LLVM - Other
ST - LLVM Toolchain Roadmap
ST - LLVM Toolchain Details
System Libraries, Tools, Debuggers Collaboration
● System Libraries
  ● malloc benchmarking
  ● malloc improvements
  ● string and memory function optimizations for arm-linux-gnueabihf
  ● Linaro GDB and glibc source package releases with backported optimizations
● GDB
  ● Finish GDB on Android for ARMv8 support - catchpoints
  ● AArch32/AArch64 completeness - test-suite parity
  ● AArch32 mixed-mode debugging (Thumb and ARM modes)
Linaro - What’s next for system libs & dev tools?
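For the malloc benchmarking item above, the simplest possible starting point is timing repeated allocate/free pairs. The sketch below is only that: `time_malloc_free` and `malloc_bench_selfcheck` are hypothetical names, and a realistic benchmark would also vary block sizes, allocation lifetimes, and thread counts, none of which the slide specifies.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Time `rounds` malloc/free pairs of `size` bytes (size must be >= 1).
 * Returns elapsed CPU seconds, or -1.0 if an allocation fails. */
double time_malloc_free(size_t size, long rounds)
{
    clock_t start = clock();
    for (long i = 0; i < rounds; i++) {
        volatile unsigned char *p = malloc(size);
        if (p == NULL)
            return -1.0;
        p[0] = (unsigned char)i;   /* touch the block so the pair is not optimized away */
        free((void *)p);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Usage: a successful run reports a non-negative time. */
void malloc_bench_selfcheck(void)
{
    assert(time_malloc_free(64, 100000) >= 0.0);
}
```

Because `clock()` measures CPU time at coarse resolution, short runs can report zero; larger round counts give more stable numbers.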
▪ String routine improvements
▪ Maintenance activities.
▪ Help community get 2.21 released.
What’s Next: Glibc – up to 2.21 release (end of 2014)
▪ Linkers: LLD & Gold
▪ Libc++
▪ Sanitizers
▪ ILP32
What We Are Not Currently Doing But Are Interested In…
QuIC - Libraries, Linkers, Debuggers, Tools
ST - Libraries, Linkers, Debuggers, Tools
More about Linaro Connect: connect.linaro.org
Linaro members: www.linaro.org/members
More about Linaro: www.linaro.org/about/