REAL-TIME LINUX & TOOLS FOR THE JETSON TX1 · 2017. 5. 18. · Time-proven APIs for hard real-time...
2017 - CONFIDENTIAL AND PROPRIETARY INFORMATION
REAL-TIME LINUX & TOOLS FOR THE JETSON TX1
KEN JACKSON
MAY 8, 2017
REDHAWK Real-Time Linux
RedHawk kernels are based on NVIDIA’s kernel source plus 280 Concurrent-developed real-time feature patches
Approximately 100K lines of Concurrent code added
Time-proven APIs for hard real-time development
High-performance real-time features can be used to improve application performance
Provides many transparent performance enhancements
Does not depend on the open-source PREEMPT_RT patch
REDHAWK Real-Time Linux
Software is installed on a Jetpack-initialized Jetson TX1
RedHawk real-time Linux kernels
• Run-time — absolute highest performance, but rarely used
• Trace — extremely low overhead kernel event tracing
• Debug — mainly used for driver debug
Command-line tools
Libraries and header files
Documentation
REDHAWK CPU Shielding
Best approach for maximizing real-time performance
Shielded CPUs are sheltered from all unnecessary activity
Real-time processes can then be assigned to shielded CPUs for lowest latency and maximum determinism in execution times
Ability to disable some or even all interrupts
Automatically moves kernel daemons off of shielded CPUs
Minimizes the effects of cross-processor interrupts
Fully dynamic configuration while system is running
REDHAWK CPU Shielding
The shield command is unique to RedHawk
Used to shield CPUs from one or all of:
• Arbitrary processes
• Arbitrary interrupts
• Local-timer interrupts
Examples
• shield -p 1,3 -i 0 -l 2
• shield -a 1-3
• shield -r
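The CPU arguments in the examples above use comma-separated lists ("1,3") and ranges ("1-3"). The sketch below shows how such a list could expand into a set of CPU numbers and a bitmask. It assumes the conventional Linux CPU-list semantics; `parse_cpu_list` and `cpu_mask` are hypothetical helper names, not part of the shield command or any RedHawk library.

```python
def parse_cpu_list(spec: str) -> set[int]:
    """Expand a CPU list such as "1,3" or "1-3" into a set of CPU numbers."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"descending range: {part!r}")
            cpus.update(range(lo, hi + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus


def cpu_mask(cpus: set[int]) -> int:
    """Return a bitmask with bit N set for every CPU N in the set."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


if __name__ == "__main__":
    for spec in ("1,3", "1-3", "0"):
        cpus = parse_cpu_list(spec)
        print(f"{spec!r:7} -> CPUs {sorted(cpus)} mask {cpu_mask(cpus):#x}")
```

On a four-core system such as the Jetson TX1, "1-3" would therefore leave only CPU 0 outside the selection.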
REDHAWK Process Binding
The run command is unique to RedHawk
Can specify CPUs, scheduling class, priority and time quantum
Use at program startup or to change running processes
Memory-locks a process’s current and future pages; no source code required
Display bindings for one, all, or any subset of processes
Examples
• run -b 1 -s fifo -P 90 --lock=all ./controller
• run -b 2-3 -s rr -q 50ms -n data-logger
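For comparison, a process can request similar settings for itself through standard Linux system calls. The sketch below is not how the run command is implemented; it only illustrates the three ideas (CPU binding, scheduling class and priority, memory locking) using `sched_setaffinity`, `sched_setscheduler` and `mlockall`. The function name `bind_realtime` is hypothetical, and the `MCL_*` constants are the Linux values on x86 and ARM64.

```python
import ctypes
import ctypes.util
import os

MCL_CURRENT = 1  # lock pages that are currently mapped (Linux x86/ARM64 value)
MCL_FUTURE = 2   # lock pages that get mapped later (Linux x86/ARM64 value)


def bind_realtime(cpus, priority, lock_memory=True):
    """Apply affinity, SCHED_FIFO priority and memory locking to this process.

    Returns a dict mapping each step to True (applied) or an error string.
    """
    results = {}

    try:
        os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
        results["affinity"] = True
    except OSError as exc:
        results["affinity"] = str(exc)

    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        results["scheduler"] = True
    except OSError as exc:  # typically EPERM without real-time privileges
        results["scheduler"] = str(exc)

    if lock_memory:
        libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
        if libc.mlockall(MCL_CURRENT | MCL_FUTURE) == 0:
            results["mlockall"] = True
        else:
            results["mlockall"] = os.strerror(ctypes.get_errno())

    return results


if __name__ == "__main__":
    # Roughly the settings of the first example above, applied to this process.
    print(bind_realtime({1}, 90))
```

Unlike this sketch, the run command can apply such settings to other programs at startup or while they are already running, which is why no source code is needed.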
REDHAWK Kernel Event Tracing
Fully lockless kernel event tracing implementation
Extremely low-overhead event tracing with per-CPU buffers
Dynamically enable/disable tracing of all or a subset of events
Dynamically enable/disable tracing on all or a subset of CPUs
Many modes of operation, including exhaustive and wrap-around
Events can be viewed live or collected for later analysis
Much more about kernel event tracing in the next section covering the NightStar tools
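The per-CPU buffer and wrap-around ideas can be pictured with a small model. This is an illustrative sketch only, not RedHawk's implementation: the class `PerCpuTrace` is hypothetical, it does not demonstrate lockless concurrency, and its non-wrapping mode (stop recording when full) is an assumed contrast to wrap-around that may not match RedHawk's exhaustive mode.

```python
from collections import deque


class PerCpuTrace:
    """Toy model of per-CPU trace buffers with selectable CPUs and events."""

    def __init__(self, n_cpus, capacity, wrap=True):
        self.wrap = wrap
        self.capacity = capacity
        # In wrap-around mode, deque(maxlen=...) silently drops the oldest event.
        self.buffers = [deque(maxlen=capacity if wrap else None)
                        for _ in range(n_cpus)]
        self.enabled_cpus = set(range(n_cpus))
        self.enabled_events = None  # None means every event type is traced
        self.dropped = 0

    def log(self, cpu, timestamp, event):
        """Record an event; return True if it was stored."""
        if cpu not in self.enabled_cpus:
            return False
        if self.enabled_events is not None and event not in self.enabled_events:
            return False
        buf = self.buffers[cpu]  # each CPU only ever touches its own buffer
        if not self.wrap and len(buf) >= self.capacity:
            self.dropped += 1    # buffer full: keep the oldest events
            return False
        buf.append((timestamp, cpu, event))
        return True

    def merged(self):
        """Return all buffered events from all CPUs in timestamp order."""
        return sorted(e for buf in self.buffers for e in buf)


if __name__ == "__main__":
    trace = PerCpuTrace(n_cpus=2, capacity=3)
    for ts in range(5):
        trace.log(0, ts, "irq")
    trace.log(1, 1, "sched")
    print(trace.merged())
```

Because each CPU writes only to its own buffer, writers never contend with each other; merging by timestamp happens later, on the reader's side.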
REDHAWK Real-Time Demo
Runs the Cyclictest real-time benchmark on a CPU
• https://rt.wiki.kernel.org/index.php/Cyclictest
Creates a background load using the simple POSIX Stress utility
• http://people.seas.harvard.edu/~apw/stress/
Continuously samples and graphs results
Interactively start and stop background load
Interactively toggle between shielded and unshielded modes
Keeps track of worst-case result seen, zeroed on mode change
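The kind of measurement the demo graphs can be sketched as a loop that sleeps until a periodic deadline and records how late each wake-up is. This is a rough Python analogue, not Cyclictest itself; `measure_wakeup_latency` is a hypothetical name, and interpreter overhead means its numbers are not comparable with Cyclictest results.

```python
import time


def measure_wakeup_latency(interval_us=1000, loops=200):
    """Sleep to periodic deadlines; return (min, avg, max) wake-up latency in µs."""
    interval_ns = interval_us * 1000
    latencies = []
    next_wakeup = time.monotonic_ns() + interval_ns
    for _ in range(loops):
        delay_ns = next_wakeup - time.monotonic_ns()
        if delay_ns > 0:
            time.sleep(delay_ns / 1e9)
        now = time.monotonic_ns()
        latencies.append((now - next_wakeup) / 1000)  # how late we woke up
        next_wakeup += interval_ns  # fixed schedule, so lateness does not drift
    return min(latencies), sum(latencies) / len(latencies), max(latencies)


if __name__ == "__main__":
    lo, avg, hi = measure_wakeup_latency()
    print(f"Min {lo:.0f}µs Average {avg:.0f}µs Max {hi:.0f}µs")
```

The maximum is the figure that matters for real-time work, which is why the demo keeps track of the worst-case result rather than only the average.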
REDHAWK Real-Time CUDA Demo
Measure impact of CUDA activity on real-time performance
Runs the Cyclictest real-time benchmark on a CPU
Creates a background load using the CUDA Reduction example
Continuously samples and graphs results
Interactively start and stop background load
Interactively toggle between shielded and unshielded modes
Keeps track of worst-case result seen, zeroed on mode change
REDHAWK Formal Benchmarking
12 hour Cyclictest results on Jetson TX1
RedHawk with Stress load
• Min 8µs Average 16µs Max 38µs
RedHawk with Stress and CUDA load
• Min 8µs Average 16µs Max 49µs
NVIDIA r24.2.1 kernel
• Unable to run Cyclictest due to lack of voluntary preemption in kernel
• Concurrent’s home-grown benchmarks showed a worst case of more than 100 milliseconds
REDHAWK Many More Features
Frequency-based scheduler
High-resolution process accounting
Ptrace extensions
• fast breakpoints
• debugger visibility and control
Optionally receive SIGBUS on page faults
NUMA memory shielding and user/kernel text replication
• Only on x86 today, but coming soon to ARM – via ARMv8.1-A spec
NightStar for the Jetson TX1
NIGHTSTAR FOR THE JETSON TX1 Debugging & Analysis Tools
NightTrace: Trace & Performance Utility
NightView: Symbolic Application Debugger
NightTune: System Activity & Tuning
NightProbe: Data Monitoring Utility
NightSim: Cyclic Process Scheduler
NIGHTSTAR
Hosted on a variety of Linux distributions
• CentOS, Fedora, Ubuntu, RHEL
• X86 and ARM64 systems
Targets include ARM64 and X86 Linux systems
• Certified ARM64 targets include NVIDIA Jetson TX1™ and Applied Micro Circuits X-C1™
• X86: any 32-bit or 64-bit Intel or AMD64 system
• Host and target system may be the same machine
• Cross-target usage is supported in either direction (X86 or ARM64)
NightTrace
Incredibly powerful method of tracking code activity and system activity in a time-synchronized graphical display.
Presents a cohesive view of the operation of individual threads, processes, CPUs, and the operating system as a whole.
Invaluable tool for troubleshooting software under development and on-site customer problems.
CUDA Tracing – GPU kernels, CUDA API usage.
NIGHTTRACE
Allows programmers to automatically trace CUDA API function calls and examine the values of parameters passed and returned without changing their source code.
Allows users to add trace points into the CUDA kernels that are executed by the GPU.
Provides CUDA-centric display panels for analysis.
NIGHTTRACE Linux Kernel Tracing
Concurrent patches to kernel.org.
Real-time performance (very minimal overhead).
Scalable (operates well on systems with a large number of CPUs).
Tracepoints are already inserted in the kernel.
While useful to kernel developers, it is aimed at users who need to understand what is happening in both the kernel and their application.
NIGHTTRACE
NIGHTTRACE & CUDA
NIGHTTRACE GPU-TRACING
NIGHTSTAR NightView
A complete symbolic debugger supporting Ada (Concurrent’s Ada product), Fortran, and C/C++.
Debugs multiple threads and multiple processes on multiple systems, all from a single interface.
Superior multi-threaded debugging features, especially important for real-time applications.
Designed for the lowest amount of process intrusion.
Includes debugging CUDA user applications on the GPU.
Wipes out “heisenbugs”
NIGHTVIEW
NIGHTVIEW & CUDA
NIGHTTUNE
Provides a graphical and integrated view of a wide range of system metrics, process activities, and real-time performance.
It’s more than just graphical presentation.
A user interface to control RedHawk CPU shielding; a key part of RedHawk’s value-add.
Provides for remote monitoring and management.
All metrics and events it measures can be recorded for off-line analysis.
Provides details about the GPU cards installed in the system and dynamic statistics of GPU activity.
NIGHTTUNE
NIGHTTUNE & CUDA Configuration Panel
Details about each CUDA device
• Kernel Version
• Compute Capability
• Clock Speeds
• Cores, Warps, and Lanes
• Grid & Thread Dimensions
• GPU Details
• Memory
NIGHTTUNE & CUDA Activity Panels
NIGHTSTAR NightProbe
A non-intrusive tool for sampling data from a process
Browse variables in the program
Change values on the fly
Provides list, tabular, and graphical displays of data over time
Includes support for synchronizing data capture with a process
Provides an API for locating and describing variables within a program file – the basis for customer-developed applications.
NIGHTSTARNightSim
Provides a graphical interface to RedHawk’s Frequency-Based Scheduler (FBS).
Schedule threads and processes based on user-defined cycles.
Typically driven from an interrupt source or real-time clock.
Monitors thread and process execution on a per-cycle basis.
Supports deadline detection.
Exports scripts for use with the FBS command line interface.
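Cycle-based scheduling with deadline detection can be illustrated by a small simulation. This is a sketch under stated assumptions, not the FBS API: `Task`, `simulate`, the single non-preemptive CPU, and the rule that a task overruns when it finishes after its next scheduled release are all choices made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    period: int   # run once every `period` cycles
    offset: int   # first cycle in which the task runs
    cost_us: int  # simulated execution time per activation


def simulate(tasks, cycle_us, n_cycles):
    """Return (cycle, task name, finish_us, overran) for every activation."""
    log = []
    cpu_free_at = 0  # time at which the simulated CPU becomes idle
    for cycle in range(n_cycles):
        cycle_start = cycle * cycle_us
        for task in tasks:  # list order doubles as priority order
            if cycle < task.offset or (cycle - task.offset) % task.period:
                continue  # not scheduled in this cycle
            start = max(cycle_start, cpu_free_at)
            finish = start + task.cost_us
            cpu_free_at = finish
            deadline = cycle_start + task.period * cycle_us  # next release
            log.append((cycle, task.name, finish, finish > deadline))
    return log


if __name__ == "__main__":
    tasks = [Task("control", period=1, offset=0, cost_us=300),
             Task("logger", period=2, offset=1, cost_us=900)]
    for entry in simulate(tasks, cycle_us=1000, n_cycles=4):
        print(entry)
```

In a real system the cycle boundary would come from an interrupt source or real-time clock rather than from a loop counter.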