Sx Issue- Debug/Triage BKM

Sx WG Rev 0.2

Sx Issue Triage and Debug Steps
Version 0.2
12/05/2014


Copyright and Disclaimer

Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel's terms and conditions of sale for such products, Intel assumes no liability whatsoever and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Unless otherwise agreed in writing by Intel, the Intel products are neither designed nor intended for any application in which the failure of the Intel product could create a situation where personal injury or death may occur.

Intel may make changes to specifications and product descriptions at any time without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained at: http://www.Intel.com/design/literature.htm

All products, platforms, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. All dates specified are target dates, and are provided for planning purposes only; and are subject to change.

This document contains information on products in the design phase of development. Do not finalize a design with this information. Revised information will be published when the product is available. Verify with your local sales office that you have the latest datasheet before finalizing a design. Code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.

*Other names and brands may be claimed as the property of others.

Copyright © 2013 Intel Corporation. All rights reserved.


Table of Contents

1. Sx Issue- Debug/Triage BKM
1.1 Scope
1.2 Target audience
1.3 Details
1.4 Requirements
1.4.1 Hardware requirements
1.4.2 Software requirements
1.5 Description
1.6 Cycling
1.6.1 Requirements
1.6.2 Hardware requirement
1.6.3 Software requirements
1.7 Description
1.7.1 Examples on collecting PMC log is shown below
1.8 Collect PMC log using RW tool – as explained below
1.9 Requirements
1.9.1 Software requirements
1.9.2 Description
1.10 System hangs in OS Phase
1.11 Requirements
1.11.1 Hardware requirement
1.11.2 Software requirements
1.12 Description
1.13 BCDEDIT
1.14 How to analyze crash dump
1.15 OS Crash (BSOD) during cycling
1.16 System hangs in BIOS Phase (POST code hangs) during cycling
1.17 Requirements
1.17.1 Hardware requirement
1.17.2 Software requirements
1.18 Description
1.19 The hardware connections snap
1.20 Teraterm Configuration
1.21 The cause of Sx failure viz., ME
1.22 Requirements
1.22.1 Hardware requirement
1.22.2 Software requirements
1.23 Description
1.24 Point of Contact

Revision Sheet

Release No.   Date         Revision Description
Rev. 0.2      12/05/2014   Revised Document


1. Sx Issue- Debug/Triage BKM

1.1 Scope

The purpose of this document is to explain the basic triage and debug that must be performed on an Sx failure before filing a sighting. This is first-level triage and debug, so it may not cover every issue scenario.

1.2 Target audience

Sx validation and debug team

1.3 Details

First-level debug steps for the different Sx issues that we come across are explained in detail below.

CATERR failure

Collect the MCDump using ITP. Commands to collect the MCDump:

    itp.unlock()
    import sys
    sys.path.append(r"\\hsw-tb\hsw\itp_scripts")
    import BdwMCDump

[Note: this is for BDW CPU]

Collect the AFD dump as explained below.

1.4 Requirements

1.4.1 Hardware requirements

ITP box, 5V/2.5A adaptor

1.4.2 Software requirements

Platform debug tool kit, DFx Abstraction Layer (DAL) Python console

1.5 Description

Step1: Connect the ITP to the board.

Step2: Launch the configuration console and select the appropriate target.


Step3: Start the Intel DAL Python Console


Step4: Make sure ITP connection is established with the target by typing 'itp.devicelist'

Step5: Unlock the ITP with the 'itp.unlock()' command.

Step6: Open Platform debug kit and navigate to State Freeze and dump under View


Step7: Select dump type as Hang Base and click on run.


Note: This triggers AFD collection; the progress can be seen in the Message log window.


Step8: The dump file is saved in the output folder specified in the Output tab, as shown below. Once the AFD has been generated, the 'Run' tab changes to 'Stop'.

1.6 Cycling

System reset / shutdown unexpectedly during cycling

Collect the PMC log file using Stardebug. Refer this link for BKM.

1.6.1 Requirements

1.6.2 Hardware requirement

UTAG

1.6.3 Software requirements

Stardebug application


1.7 Description

Below are the steps to establish a connection between Stardebug and the PCH.

Step1: Connect Stardebug to the rear XDP port.

Step2: Download the latest version of Stardebug from the link below: https://houston.fm.intel.com/wiki/doku.php?id=swtwiki:userdocs:tools:stardebug:start#download

Step3: Locate Stardebug.exe in the extracted folder.


Step4: On launching stardebug.exe, the above highlighted debug blocks should be displayed. This indicates that the connection between Stardebug and the PCH is established.

1.7.1 Examples on collecting PMC log is shown below

Once the PCH is connected to Stardebug (Step4 above), log collection can proceed.

Before initiating any log collection, make sure the script file that creates the log is placed in the same folder as stardebug.exe.

Locate the 'dft' tab by typing the 'sw dft' command.

Now start log file collection by entering run LptPmDump1.5M.lua, as shown in the above screenshot.

A text file, Pmdump.text, will have been created in the same folder. This is our PMC log file.

1.8 Collect PMC log using RW tool – as explained below

1.9 Requirements

1.9.1 Software requirements

RWEverything

1.9.2 Description

Step1: Launch RWEverything and click on the Memory icon as shown in the below figure.


Step2: Write 0x03030002 to 0xfed1f320

Step3: Read DWORD from 0xFED1F338

Step4: Decode the PMC value from the below table:

Reg 0x303 (bit 0 to bit 7):

BIT 7: LTRESET# With Policy 1 (LTRST_POL1): This bit is set to '1' by hardware when a global reset is triggered by an LTRESET# assertion with LT_E2STS.LT_RESET_POLICY = 1.

BIT 6: ME-Initiated Global Reset (ME_GBL): This bit is set to '1' by hardware when a global reset is triggered by an ME FW write of 1's to both GENCTL."ME Partition Reset" and GENCTL."ME-initiated Host Reset with Power Cycle" in the same write cycle (this is ME FW's method of requesting a global reset).


BIT 5: CPU Thermal Trip (CPU_TRIP): This bit is set to '1' by hardware when a global reset is triggered by a CPU thermal trip event (i.e. an assertion of the THRMTRIP# pin).

BIT 4: ME-Initiated Power Button Override (ME_PBO): This bit is set to '1' by hardware when a global reset is triggered by an ME FW write of '1' to GENCTL."ME-Initiated Power Button Override".

BIT 3: ICH Catastrophic Temperature Event (ICH_CAT_TMP): This bit is set to '1' by hardware when a global reset is triggered by a catastrophic temperature event from the ICH internal thermal sensor.

BIT 2: PMC SUS RAM Uncorrectable Error (PMC_UNC_ERR): This bit is set to '1' by hardware when a global reset is triggered due to an uncorrectable parity error on a data read from one of the PMC SUS well register files.

BIT 1: Power Button Override (PB_OVR): This bit is set to '1' by hardware when a global reset is triggered by a power button override (i.e. an assertion of the PWRBTN# pin for 5 seconds).

BIT 0: SUS Well Power Failure Status (SUSFLR_STS): This bit is set to '1' by hardware when a global reset is triggered by loss of SUS well power. This includes DeepSx entry and G3.

Reg 0x304 (bit 8 to bit 15):

BIT 5: AS Well Power Failure (ASW_FLR): This bit is set to '1' by hardware when a global reset is triggered by an unexpected loss of ASW power (i.e. a de-assertion of APWROK at an unexpected time).

BIT 4: SYS_PWROK Failure (SYSPWR_FLR): This bit is set to '1' by hardware when a global reset is triggered by an unexpected loss of SYS_PWROK. FW arms this global reset source via GBLRST_CTL.EN_SYSPWR_FLR.

BIT 3: PCH_PWROK Failure (PCHPWR_FLR): This bit is set to '1' by hardware when a global reset is triggered by an unexpected loss of PCH_PWROK. FW arms this global reset source via GBLRST_CTL.EN_PCHPWR_FLR.

BIT 2: PMC Firmware Global Reset (PMC_FW): This bit is set to '1' by hardware when a global reset is triggered by a request from PMC firmware (i.e. a write of '1' to the GBLRST_CTL.TRIG_GBL bit).

BIT 1: ME Firmware Watchdog Timer (ME_WDT): This bit is set to '1' by hardware when a global reset is triggered by the second expiration of the ME firmware watchdog timer.

BIT 0: PMC Firmware Watchdog Timer (PMC_WDT): This bit is set to '1' by hardware when a global reset is triggered by the second expiration of the PMC firmware watchdog timer.

Reg 0x305 (bit 16 to bit 23):

BIT 4: Over-Clocking WDT Expiration In ICC Survivability Mode (OC_WDT_EXP_ICCSURV): This bit is set to '1' by hardware when a global reset is triggered by the expiration of the over-clocking watchdog timer while running in a mode that has ICC survivability impact (OC_WDT_ICCSURV=1).

BIT 3: Over-Clocking WDT Expiration In Non-ICC Survivability Mode (OC_WDT_EXP_NO_ICCSURV): This bit is set to '1' by hardware when a global reset is triggered by the expiration of the over-clocking watchdog timer while running in a mode that does not have ICC survivability impact (OC_WDT_ICCSURV=0).

BIT 2: ADR GPIO Reset (ADR_GPIO_RST): This bit is set to '1' by hardware when a global reset is triggered by the assertion of the GPIO assigned to ADR.

BIT 1: ME HW Uncorrectable Error (ME_UNCOR_ERR): This bit is set to '1' by hardware when a global reset is triggered by ME hardware due to the detection of an uncorrectable ECC or parity error on a data read from one of its SRAMs.

BIT 0: CPU Thermal Runaway Watchdog Timer (CPU_THRM_WDT): This bit is set to '1' by hardware when a global reset is triggered by the expiration of the CPU Thermal Runaway Watchdog Timer.
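Decoding the DWORD by hand against the table above is error-prone, so a small helper may be useful. The sketch below is not part of any Intel tool; it assumes that the DWORD read in Step3 packs Reg 0x303 into bits 0-7, Reg 0x304 into bits 8-15 and Reg 0x305 into bits 16-23, as the "bit N to bit M" labels in the table suggest. The names `GLOBAL_RESET_CAUSES` and `decode_global_reset_causes` are made up for illustration, and the sample value is hypothetical.

```python
# Bit names transcribed from the table above.
# Assumption: DWORD bit = 8 * (register - 0x303) + bit number within the register.
GLOBAL_RESET_CAUSES = {
    # Reg 0x303 (bit 0 to bit 7)
    0: "SUSFLR_STS",
    1: "PB_OVR",
    2: "PMC_UNC_ERR",
    3: "ICH_CAT_TMP",
    4: "ME_PBO",
    5: "CPU_TRIP",
    6: "ME_GBL",
    7: "LTRST_POL1",
    # Reg 0x304 (bit 8 to bit 15)
    8: "PMC_WDT",
    9: "ME_WDT",
    10: "PMC_FW",
    11: "PCHPWR_FLR",
    12: "SYSPWR_FLR",
    13: "ASW_FLR",
    # Reg 0x305 (bit 16 to bit 23)
    16: "CPU_THRM_WDT",
    17: "ME_UNCOR_ERR",
    18: "ADR_GPIO_RST",
    19: "OC_WDT_EXP_NO_ICCSURV",
    20: "OC_WDT_EXP_ICCSURV",
}


def decode_global_reset_causes(dword):
    """Return the names of all documented cause bits set in the DWORD, lowest bit first."""
    if not 0 <= dword <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit value")
    return [
        name
        for bit, name in sorted(GLOBAL_RESET_CAUSES.items())
        if dword & (1 << bit)
    ]


if __name__ == "__main__":
    # Hypothetical value for illustration only: bits 1 and 9 set.
    print(decode_global_reset_causes(0x00000202))
```

Bits that the table does not describe are ignored by this helper, so an empty list means only that none of the documented causes is flagged.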

Collect the Windows Event Viewer (Eventvwr) log file. BKM: run EventVwr on the Windows system.

1.10 System hangs in OS Phase (system control transferred from BIOS to OS) during cycling. Example: blank screen, window display freeze.

Use Windbg to analyze the current status of the system. Refer below for the BKM:

1.11 Requirements

1.11.1 Hardware requirement

Ajay's USB debug cable

1.11.2 Software requirements

Windbg setup (x64 preferable)

1.12 Description

Step1: Install the USB to USB convertor driver on both the host and the target machine. (Driver copied here: \\akasha1\PSPV-Tools\windbg-driver). WinBlue OS has an inbox driver for the cable and will install the driver automatically.

Step2: Using the USBVIEW tool, find USB port1 on the target machine and connect the debug cable to port1 (usually the debug port is port1).

Step3: Change the BIOS settings as described in the following steps; press F2 while booting to enter the BIOS.

Step4: Go to Intel Advanced Menu -> PCH-IO Configuration -> USB Configuration and set XHCI Mode to Manual.


Step5: Set "Route USB 2.0 pins to which HC?" to Route Per-Pin and route all the pins to XHCI except pin#1 and pin#11. Pin#1 and pin#11 should stay routed to EHCI.

Step6: After setting the BIOS, boot into the OS.

1.13 BCDEDIT

On an elevated command prompt, run the commands below:

    bcdedit /debug on
    bcdedit /dbgsettings usb targetname:<name>            (type any name)
    bcdedit /set {dbgsettings} busparams 0.29.0           (bus, device and function of the USB root controller)

Restart the target system.

Open Windbg on the host machine and enter the target name under the USB tab (File -> Kernel Debugging -> USB).

The target will now start pumping debug messages to the kernel debug window.
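When many targets have to be set up, the three bcdedit commands above can be generated rather than typed. The following is a minimal sketch under stated assumptions: the helper names `build_bcdedit_commands` and `apply_usb_debug_settings` and the target name "sxdebug" are hypothetical, and the default bus parameters are simply the 0.29.0 value quoted above. It only prints the commands unless `dry_run` is set to False, in which case it must be run on Windows from an elevated prompt.

```python
import subprocess


def build_bcdedit_commands(target_name, busparams="0.29.0"):
    """Return the three bcdedit invocations from section 1.13 as argument lists."""
    if not target_name:
        raise ValueError("target name must not be empty")
    return [
        ["bcdedit", "/debug", "on"],
        ["bcdedit", "/dbgsettings", "usb", f"targetname:{target_name}"],
        ["bcdedit", "/set", "{dbgsettings}", "busparams", busparams],
    ]


def apply_usb_debug_settings(target_name, busparams="0.29.0", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_bcdedit_commands(target_name, busparams):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if bcdedit reports a failure.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    apply_usb_debug_settings("sxdebug")  # "sxdebug" is a placeholder target name
```

Keeping command construction separate from execution makes it possible to review the exact commands before anything on the target is changed.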


1.14 How to analyze crash dump

Step1: Navigate to File > Open Crash Dump, then select the crash dump to be analyzed.

Step2: In the command bar type '!analyze -v'; this is the command to analyze the crash dump.


1.15 OS Crash (BSOD) during cycling

Collect the dump file and analyze it. If no dump was created, connect Windbg and take a dump, as explained above.

1.16 System hangs in BIOS Phase (POST code hangs) during cycling

Collect the BIOS serial log. Refer below for the BKM.

1.17 Requirements

1.17.1 Hardware requirement

RS232 null-modem cable
RVP with debug BIOS flashed


1.17.2 Software requirements

Any UART terminal utility such as Putty or Teraterm

1.18 Description

Step1: Flash the debug BIOS, which is downloadable from client download [ex: HSW_LP_LPT_V106.3_Debug.rom].

Step2: Enter the BIOS using F2 -> Intel Advanced Menu -> Debug Configuration -> Serial Debug messages -> set the value as per your requirement.

Step3: Connect the null-modem cable to the host and the RVP (a USB to serial adapter may be needed to connect to the host).

Step4: Install a terminal program (Putty, Teraterm, or Termite) on the host with the following settings:

Port: (look in Device Manager/Ports)
Baud rate: 115200
Data: 8 bit
Parity: none
Stop: 1 bit
Control: none

Step5: Open Putty -> set Serial -> select the COM port as shown in the client Device Manager.

Step6: Boot the system (the system will start pumping debug messages to Putty).

Step7: Stop the log file after the system boots to the OS.

The BIOS serial log will look like BIOS_Serial dump.txt.
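For unattended cycling runs, the Step4 terminal settings can also be applied from a script instead of a GUI terminal. The sketch below assumes the third-party pyserial package is installed; the helper names, the "COM3" port, the one-second read timeout and the idle-based stop condition are all assumptions for illustration and are not part of the original BKM, which stops the log manually once the system boots to the OS.

```python
def serial_settings(port):
    """Keyword arguments for pyserial's serial.Serial matching the Step4 settings."""
    return {
        "port": port,
        "baudrate": 115200,  # Baud rate: 115200
        "bytesize": 8,       # Data: 8 bit
        "parity": "N",       # Parity: none
        "stopbits": 1,       # Stop: 1 bit
        "xonxoff": False,    # Control: none (no software flow control)
        "rtscts": False,     # Control: none (no hardware flow control)
        "timeout": 1,        # read timeout in seconds (assumption)
    }


def capture_serial_log(port, log_path, max_idle_reads=30):
    """Append incoming serial data to log_path until the port stays idle."""
    # pyserial is imported lazily so serial_settings() works without it installed.
    import serial

    idle = 0
    with serial.Serial(**serial_settings(port)) as link, open(log_path, "ab") as log:
        while idle < max_idle_reads:
            data = link.readline()  # returns b"" when the read times out
            if data:
                log.write(data)
                log.flush()
                idle = 0
            else:
                idle += 1


if __name__ == "__main__":
    # "COM3" is a placeholder; look up the real port in Device Manager/Ports.
    print(serial_settings("COM3"))
```

Flushing after every line keeps the log on disk current, which matters when the target hangs and the capture has to be interrupted.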


1.19 The hardware connections snap


1.20 Teraterm Configuration

1.21 The cause of Sx failure viz., ME

Please collect the ME debug log. Refer to the below BKM.

1.22 Requirements

1.22.1 Hardware requirement

Dediprog hardware

1.22.2 Software requirements

Dediprog flash utility
FITC tool


1.23 Description

Step1: Take the SPI BIOS file.

Step2: Install the FITC tool.

Step3: Browse for the full BIOS image (16MB) file, modify it with the settings below, build a new image and flash it on your target system.

Step4: Make sure it's not a LAN-less image.

Step5: In FITC, under ME Region -> Configuration -> ME Debug Event Service, set as shown below:

Step6: Please make sure in Event Filters, group 87 has the value 0x1 (you can leave other groups as-is)

Step7: To record it, connect another computer to the same LAN as the DUT (note: the DUT must be connected using the built-in LAN, not any external PCIe card). On that other computer, run PDA (Platform Debug Analyzer) or WireShark, to record all the packets sent. You should see quite a lot (hundreds or more) packets sent on UDP port 64507.

Step8: Reproduce your issue on the target


Step9: Go to the location \\akasha1\temp\nramalin\Tools and install PDA.

Step10: Connect a LAN cable from the target to the host machine and ensure ping is successful.

Step11: Launch the PDA app and start capturing the log.
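Before starting a long reproduction run, it can help to confirm that debug packets are arriving at all on the UDP port named in Step7. The sketch below is only a sanity check and not a replacement for PDA or WireShark: a plain socket sees only datagrams that are broadcast or addressed to the capturing host, and whether the ME debug events are sent that way is an assumption here. The helper names `open_listener` and `count_packets` are hypothetical.

```python
import socket
import time

ME_DEBUG_UDP_PORT = 64507  # UDP port named in Step7


def open_listener(port=ME_DEBUG_UDP_PORT, host=""):
    """Bind a UDP socket; an empty host listens on all interfaces."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    return sock


def count_packets(sock, duration):
    """Count the datagrams received on sock during `duration` seconds."""
    deadline = time.monotonic() + duration
    count = 0
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return count
        sock.settimeout(remaining)
        try:
            sock.recvfrom(65535)
        except socket.timeout:
            return count
        count += 1


if __name__ == "__main__":
    listener = open_listener()
    try:
        print("packets seen in 10 s:", count_packets(listener, 10))
    finally:
        listener.close()
```

A count of zero does not prove that the DUT is silent; in that case fall back to the PDA or WireShark capture described in Step7.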

Checkpoints before proceeding with a sighting:

Make sure to use the latest BKC stack from here: http://pspv.intel.com/sites/PSIV-BKC/Reports/SitePages/Home.aspx

Make sure system has all mandatory rework as applicable – Use this link to know applicable rework-https://sharepoint.gar.ith.intel.com/sites/RVP_CrescentBay/CRB-RVP-CSF/SitePages/Home.aspx?RootFolder=%2Fsites%2FRVP%5FCrescentBay%2FCRB%2DRVP%2DCSF%2FShared%20Documents%2FBroadwell%20U%20%28ULT%29RVP&FolderCTID=0x012000FFD2A2D256C7284E9CA6E17A025EA5E5&View={2DC03B06-DDA9-4EAA-ABA6-CA8A28FBF446}

Make sure to use the recommended BIOS settings – Refer to this link to get the recommended BIOS settings: http://bkclc1.amr.corp.intel.com:7076/api/service/1975925211/447894345.zip

Verify if there is similar issue reported in Sx_WG – Refer this link to get known issue list - http://pspv.intel.com/sites/pspv/broadwell/Lists/Blocking%20Issues/AllItems.aspx?ShowInGrid=True&View={6D9F3299-A5E6-4D20-B7B3-1FD542C2D19F}&InitialTabId=Ribbon.List&VisibilityContext=WSSTabPersistence

1.24 Point of Contact

Please mail [email protected] for feedback/query.
