The Beginners Guide to Analyzing GSM Data in MatLab



2007-03-02, The GSM Scanner Project (GSMSP) http://scratchpad.wikia.com/wiki/Gsm

Abstract: The GSMSP uses the USRP hardware device to receive data from the GSM band. This data is raw and of little use unless it is filtered correctly. The GSMSP set a challenge to extract as much information as possible from 3 example USRP data dumps. Tore/Norway won the challenge. He used MatLab and routines from the GSMsim toolkit to extract meaningful data. His results are available on our webpage. Not everyone is familiar with MatLab, so this document gives a step-by-step walkthrough of how to analyze the example data dumps with MatLab.

What Is MatLab?

MatLab is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. It comes with a powerful scripting language (the famous .m files) that can be used to process/filter and visualize/display data.

At the time of writing there is no public software GSM implementation. Tore decided to use MatLab because he was familiar with it and it gives results quickly. MatLab is good for prototyping and for processing small amounts of data. The GSMSP is hoping to develop its own software GSM implementation to process GSM data in real time once the theory is understood.

Some Basics

The USRP is a generic hardware receiver/transmitter (transceiver) for receiving data from any frequency band (http://www.ettus.com). It is cheap ($750) and can be used to receive raw data from the GSM frequency band. The USRP runs at 64 MHz. This means that 64 million times per second it converts the signal (amplitude) on the frequency band from analog to digital. This is called Discrete Digital Signal Processing (there is more to it. Read more!).

The GSM band is 25 MHz wide. It is divided into 125 carrier frequencies. This makes each carrier 200 kHz wide (25,000,000 / 125 = 200,000). Our sample rate has to be at least 400 kHz (twice the carrier width, following Nyquist's theorem).

A GSM burst period lasts for 15/26 msec (0.577 msec). Each burst period is 156.25 bits long. Only 148 bits carry data. The other 8 ¼ bits are used as a 'Guard Period' at the end of each burst. This prevents bursts from overlapping. There is no useful information in them. This means that every (15/26) / 156.25 msec (about 3.69 microseconds) we have a new bit of information. That makes 270,833 bits per second (1000 / ((15/26) / 156.25)).
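
The arithmetic above is easy to check for yourself. The following is a minimal sketch in Python rather than MatLab (chosen only for illustration; it is not part of the GSMSP scripts), and every variable name in it is made up for this example. It uses exact fractions so that 15/26 and 156.25 do not pick up rounding errors.

```python
from fractions import Fraction

# Figures quoted in the text above
BAND_WIDTH_HZ = 25_000_000          # width of the GSM band
CARRIERS = 125                      # number of carrier frequencies
BURST_MS = Fraction(15, 26)         # one burst period in milliseconds
BITS_PER_BURST = Fraction(625, 4)   # 156.25 bit periods per burst

# Width of one carrier and the sample rate the guide asks for
carrier_width_hz = BAND_WIDTH_HZ // CARRIERS
min_sample_rate_hz = 2 * carrier_width_hz

# Duration of one bit and the resulting bit rate
bit_period_ms = BURST_MS / BITS_PER_BURST
bit_rate = 1000 / bit_period_ms     # bits per second, still an exact fraction

print("carrier width (Hz):   ", carrier_width_hz)
print("min. sample rate (Hz):", min_sample_rate_hz)
print("bit period (us):      ", float(bit_period_ms) * 1000)
print("bit rate (bits/s):    ", float(bit_rate))
```

The exact bit rate comes out as 812,500 / 3 bits per second, which the guide rounds down to 270,833.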


The USRP can give us up to 64,000,000 samples per second. That would be very precise but would be far too much data. The GSM data dumps that Robert created with GNU Radio and the USRP end in *_128.cfile or *_64.cfile. Example: GSMSP_20070204_robert_dbsrx_941.0MHz_128.cfile. This means that he used a decimation of 128: instead of receiving 64,000,000 samples per second he configured his USRP to deliver 64,000,000 / 128 samples per second. In other words, 128 samples are merged into 1 sample.

Let's calculate if this is fast enough: We need at least 400,000 samples per second. Robert's example file above has 64,000,000 / 128 = 500,000 samples per second. Just enough! The GNU Radio M&M clock recovery block requires a minimum sample rate of twice the symbol rate (270,833 / sec). That would have required at least 541,666 samples per second. This requirement is irrelevant for MatLab.

The next step is to oversample the signal so that we get a signal sampled at N * 270,833 samples per second. This is because we want to feed the next stage with samples as close to the centre of the symbol interval as possible, and we need some time resolution to do this. A large N gives good accuracy but requires more processing power. We use N = 4 and get good results.

8 bursts are grouped into 1 TDMA frame (Burst 0 - Burst 7. GSM people like to call them TS0 - TS7). 1 burst lasts 15/26 msec and 1 TDMA frame lasts 8 * 15/26 = 120/26 msec. Once we have found one TS0 burst we get the next TS0 burst after waiting 7 more bursts (TS1 – TS7). 51 consecutive TS0 bursts are called a 51-multiframe. 51-multiframes are used on the Control Channel (TS0). The other timeslots are either control channels or traffic channels (TCH) depending on the configuration. 26-multiframes are used on the TCH. If we are only interested in the 51-multiframe we have to read 408 bursts to get a full TS0 51-multiframe (51 * 8 = 408; 51 * 7 of the bursts read belong to TS1 – TS7). Our quest should be how to find the start of a 51-multiframe.
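
The rate and timing figures above can be recomputed in a few lines. This is a hedged sketch in Python (used here only for illustration, not taken from the GSMSP scripts); the constant and function names are invented for this example, and the two thresholds are simply the ones named in the text: 400,000 samples per second and twice the symbol rate for the M&M block.

```python
from fractions import Fraction

USRP_RATE = 64_000_000               # samples per second before decimation
SYMBOL_RATE = Fraction(812_500, 3)   # 270,833.33 symbols per second
NYQUIST_MIN = 400_000                # 2 * 200 kHz carrier width
OVERSAMPLING = 4                     # the N used in this guide


def decimated_rate(decimation):
    """Sample rate of a dump recorded with the given decimation."""
    return Fraction(USRP_RATE, decimation)


# Compare both dump types against the two minimum rates from the text
for decimation in (128, 64):
    rate = decimated_rate(decimation)
    print(f"decimation {decimation}: {int(rate):,} samples/s, "
          f"{float(rate / SYMBOL_RATE):.3f} samples per symbol, "
          f"Nyquist ok: {rate >= NYQUIST_MIN}, "
          f"M&M ok: {rate >= 2 * SYMBOL_RATE}")

# Rate after oversampling to N samples per symbol
oversampled_rate = OVERSAMPLING * SYMBOL_RATE
print("oversampled rate (samples/s):", float(oversampled_rate))

# TDMA timing: 8 bursts per frame, 51 TS0 bursts per 51-multiframe
burst_ms = Fraction(15, 26)
frame_ms = 8 * burst_ms
bursts_for_multiframe = 51 * 8
multiframe_ms = 51 * frame_ms
print("TDMA frame (ms):        ", float(frame_ms))
print("bursts to read:         ", bursts_for_multiframe)
print("51-multiframe span (ms):", float(multiframe_ms))
```

With decimation 128 the dump holds fewer than 2 samples per symbol, which is why it passes the 400 kHz check but would not satisfy the M&M block.
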
This is what a 51-multiframe looks like:

FSBBBBCCCCFSCCCCCCCCFSCCCCCCCCFSCCCCCCCCFSCCCCCCCCI
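
Because this layout is just a string of 51 letters, a position inside the multiframe (0 to 50) can be turned into a channel type by indexing into it. The sketch below is written in Python for illustration only; the names MULTIFRAME_51, CHANNEL_NAMES and channel_at are hypothetical, and the mapping is valid only for the example configuration shown above. The one-letter codes are the ones explained in the legend that follows.

```python
# The example 51-multiframe layout, copied from the line above
MULTIFRAME_51 = "FSBBBBCCCCFSCCCCCCCCFSCCCCCCCCFSCCCCCCCCFSCCCCCCCCI"

# One-letter codes used in the layout string
CHANNEL_NAMES = {
    "F": "FCCH",
    "S": "SCH",
    "B": "BCCH",
    "C": "CCCH",
    "I": "Idle",
}


def channel_at(position):
    """Channel type at a multiframe position; larger numbers wrap modulo 51."""
    return CHANNEL_NAMES[MULTIFRAME_51[position % 51]]


# Where do the frequency correction bursts sit?
fcch_positions = [i for i, code in enumerate(MULTIFRAME_51) if code == "F"]

print("length:", len(MULTIFRAME_51))
print("FCCH positions:", fcch_positions)
for position in (0, 1, 2, 5, 6, 9, 50):
    print(position, channel_at(position))
```

The FCCH positions come out 10 apart, and every one of them is immediately followed by an SCH.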

F: FCCH (Frequency Correction Channel)
S: SCH (Synchronisation Channel)
B: BCCH (Broadcast Control Channel)
C: CCCH (Common Control Channel)
I: Idle (nothing)

We get an FCCH burst every 10 bursts on TS0. This is always followed by an SCH burst. Remember that these are bursts from TS0 only. In fact, if we find an FCCH burst we have to wait 7 more bursts (TS1 – TS7) before we get another burst from TS0, which then has to be an SCH.

The information from the SCH allows us to calculate the Frame Number (FN). For each TDMA frame the FN is incremented by 1 until it reaches 26 * 51 * 2048 - 1 = 2,715,647. It then starts at 0 again. If we take the FN modulo 51 we know exactly whether the current burst is at the beginning of a 51-multiframe or somewhere else. The SCH also tells us the Base Station Color Code (BCC) and the Network Color Code (NCC).

Note: The 51-multiframe above is only one of many possible configurations. It's up to the base station. They all start with FSBBBBCCCC and end with an I. The base station tells the MS in the BCCH message (4 BCCH bursts = 1 message) how the rest of the 51-multiframe is structured. Oh, and there can be 51-multiframes on TCH (TS0 – TS7) as well; more on that later.

MatLab

Extract the example files into c:\gsmsp. Start MatLab. You should see a 'Command Window' on the right. Change to the directory that contains all the files and list its contents by typing cd and dir:

To get started, select MATLAB Help or Demos from the Help menu.
>> cd c:\gsmsp
>> dir

Output:

.
..
DeMUX.m
GSMSP_20070204_robert_dbsrx_941.0MHz_128.cfile
GSMSP_20070204_robert_dbsrx_953.6MHz_128.cfile
GSMSP_20070204_robert_dbsrx_953.6MHz_64.cfile
T_SEQ_gen.m
calc_freq_offset.m
[…more data here…]

Open step1.m to get a feel for what MatLab scripts look like. You will see that it loads GSMSP_20070204_robert_dbsrx_953.6MHz_128.cfile and sets the sample rate to 500,000 (64 MHz / 128). 953.6 MHz is within the European GSM band for the downlink (Base Station (BS) to Mobile Station (MS)). Uncomment lines 6-10 in step1.m if you want to load a different dump file.

>> step1

Output:

fcch_start = 12962
frequency_offset_before_Hz = 8.6956e+003


frequency_offset_after_Hz = 7.3344

The script also plots some graphics.

Wow, a Frequency Correction Channel burst (FCCH) was found at position 12962 in the example dump file. This is the position before oversampling by 4. Again, take a look at step1.m to see how the FCCH is found. It also calculates that the frequency offset is 8695 Hz. This is a hardware issue with the USRP: the received signal is shifted in frequency by 8695 Hz. We can compensate for this in software (see step1.m). After the correction the offset is only around 7 Hz (7.3344 Hz). This is good enough for the GSMsim demodulator to work with.

An FCCH is always followed by an SCH burst. Finding the SCH is our goal in step2.m. We oversample by 4, so fcch_start after oversampling is at 4 * 12962 = 51848. We expect the SCH burst 8 bursts (or 156.25 * 8 bits) later (skipping TS1 – TS7). Because we oversampled by 4 we have to skip 156.25 * 8 * 4 = 5000 positions. We expect the SCH burst at position 51848 + 5000 = 56848. Nothing is precise and we might find the actual SCH a couple of positions before or after.
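
Both ideas from this page, compensating the frequency offset in software and predicting where the SCH should be, can be sketched briefly. The code below is Python, used only for illustration, and it is NOT how step1.m or step2.m do it; look at those scripts for the real thing. It assumes the dump consists of complex samples and removes the offset with the common textbook method of rotating each sample back by the phase the offset has built up. The function names and the synthetic test tone are made up for this example.

```python
import cmath
import math

SAMPLE_RATE_HZ = 500_000    # 64 MHz / 128, as set in step1.m
OVERSAMPLING = 4            # N used in this guide
BITS_PER_BURST = 156.25     # bit periods in one burst
BURSTS_PER_FRAME = 8        # TS0 to TS7


def remove_frequency_offset(samples, offset_hz, sample_rate_hz):
    """Rotate sample n back by the phase an offset of offset_hz has accumulated."""
    step = -2j * math.pi * offset_hz / sample_rate_hz
    return [s * cmath.exp(step * n) for n, s in enumerate(samples)]


def expected_sch_start(fcch_start, oversampling=OVERSAMPLING):
    """Oversampled position one TDMA frame (8 bursts) after the FCCH start."""
    skip = round(BITS_PER_BURST * BURSTS_PER_FRAME * oversampling)
    return fcch_start * oversampling + skip


# A pure tone at the measured offset stands in for the shifted signal
offset_hz = 8695.6
tone = [cmath.exp(2j * math.pi * offset_hz * n / SAMPLE_RATE_HZ)
        for n in range(1000)]
corrected = remove_frequency_offset(tone, offset_hz, SAMPLE_RATE_HZ)

# After correction the tone no longer rotates: its phase stays near zero
max_phase = max(abs(cmath.phase(s)) for s in corrected)
print("largest remaining phase (rad):", max_phase)

print("expected SCH position:", expected_sch_start(12962))
```

The second function reproduces the 51848 + 5000 = 56848 estimate from the text; a real search would still look a few positions either side of it.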


Take a look at step2.m and you will see that we start looking for the SCH at around position fcch_start + 5000. step2.m helps us find the exact location of the SCH burst.

>> step2

Output:

sync_burst_start = 56850
BCC = 0
PLM = 7
FN = 857107

The first synchronization burst is found at 56850. This is only 2 positions later than what we calculated above! The frame number can be calculated and is 857107 for the first SCH burst. The frame number is important and will later help us figure out what type of burst we have (CCCH, SCH, BCCH, …). The SCH burst also contains the Base Station Identity Code (BSIC), which here is 56 (PLMN color = 7, BS color = 0 makes BSIC = 7 * 8 + 0 = 56). See 3GPP standard GSM_44.018:9.1.30a for more information on how to decode the BSIC.

Press space to continue the script. The script will look for the FCCH 9 TS0 bursts later and then again find the SCH burst 1 TS0 burst later. A second SCH burst is found:

sync_burst_start = 106850
BCC = 0
PLM = 7
FN = 857117

It contains the same information as the first SCH burst. Press space again to find a third SCH burst:

sync_burst_start = 156849
BCC = 0
PLM = 7
FN = 857127

Press space once more to finish the script. No further SCH bursts are found.

We found the FCCH and the SCH. To work out whether the bursts that follow are BCCH or CCCH we take the FN modulo 51. The first SCH has FN 857107, and 857107 mod 51 = 1, so it sits at position 1 of the 51-multiframe. The four TS0 bursts that follow it (positions 2 to 5) are therefore BCCH bursts. Let's execute step3.m!

>> step3

Output:


rx_burst =
Columns 1 through 13
0 0 0 1 0 1 0 1 1 0 0 1 1
Columns 14 through 26
0 1 1 1 1 0 1 0 0 1 0 0 1
Columns 27 through 39
0 0 0 0 0 1 1 0 1 0 0 0 1
Columns 40 through 52
1 0 0 0 0 1 1 0 1 1 0 0 1
Columns 53 through 65
1 1 0 0 1 0 0 0 1 0 0 1 0
Columns 66 through 78
0 1 0 1 1 1 0 0 0 0 1 0 0
Columns 79 through 91
0 1 0 0 1 0 1 1 1 1 1 1 0
Columns 92 through 104
0 1 1 0 0 1 0 1 1 0 1 0 1
Columns 105 through 117
1 0 0 0 1 1 1 0 0 1 0 0 0
Columns 118 through 130
0 0 1 0 0 1 0 1 0 1 1 1 0
Columns 131 through 143
1 1 0 1 0 0 1 1 1 0 0 0 1
Columns 144 through 148
0 1 0 0 0

This is the output of 1 burst. A BCCH or a CCCH message spans 4 bursts. Press space 3 times to process the other bursts. We can calculate from the frame number (FN) whether it is a BCCH or a CCCH. The next output shows the final message:

Checksum correct!
frame_number_mod_51 = 5
Channel type = BCCH
message =
1 0 0 1 0 0 1 0
0 1 1 0 0 0 0 0
1 1 0 1 1 0 0 0
1 1 0 1 1 1 0 0
1 0 1 1 0 1 0 0
0 1 0 0 1 1 1 0
0 1 0 0 1 1 1 1
0 0 0 0 0 1 0 0
0 1 0 1 1 1 0 0
0 0 0 1 0 1 0 1
0 0 0 0 1 0 1 1
1 1 0 0 0 0 0 0
0 0 1 1 1 1 0 0
0 0 1 0 1 0 1 0
1 0 1 0 0 1 1 0
0 1 1 0 0 0 0 0
1 0 0 1 1 1 0 1
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1
0 0 0 0 0 1 0 0
0 1 0 0 0 0 1 0
1 1 0 1 0 0 0 0
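
If the message is read as 23 octets of 8 bits with the lowest bit printed first, some fields can be picked out by hand. The sketch below is Python for illustration only and is not part of step3.m. It assumes that octet 3 is the message type and that octets 6 to 8 carry the country and network codes as BCD digits in the usual Location Area Identification layout (swapped nibbles, 0xF marking an unused digit); the helper octet_value and all variable names are made up for this example.

```python
def octet_value(bits_lsb_first):
    """Turn a printed group of 8 bits (bit 0 first) into an integer 0..255."""
    return sum(int(bit) << i for i, bit in enumerate(bits_lsb_first.split()))


# Octet 3 and octets 6 to 8 of the BCCH message printed above
message_type = octet_value("1 1 0 1 1 0 0 0")
octet6 = octet_value("0 1 0 0 1 1 1 0")
octet7 = octet_value("0 1 0 0 1 1 1 1")
octet8 = octet_value("0 0 0 0 0 1 0 0")

# Country code: digit 1 and 2 share octet 6, digit 3 is the low half of octet 7
mcc = f"{octet6 & 0x0F}{octet6 >> 4}{octet7 & 0x0F}"

# Network code: digits 1 and 2 in octet 8, optional digit 3 in the high half of octet 7
mnc_digits = [octet8 & 0x0F, octet8 >> 4, octet7 >> 4]
mnc = "".join(str(digit) for digit in mnc_digits if digit != 0x0F)

print("message type:", hex(message_type))
print("octets 6-8:  ", hex(octet6), hex(octet7), hex(octet8))
print("MCC:", mcc)
print("MNC:", mnc)
```

Under these assumptions the three octets are 0x72, 0xf2 and 0x20, giving the country and operator codes that the guide reports for this message.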


Each row shows bit 0 to bit 7 from left to right. We see 23 rows of 8 bit. This is a System Information Type 3 message. See 3GPP standard 44:018. It tells us the the country is 272 (Ireland) and the network operator is 02 (“O2 / Digifone mmO2”) and many more details. Frame number 5 means step2 analyzed frame 2,3,4 and 5 of the 51-multiframe. Press space 5 more times to see 4 CCCH bursts and the full CCCH message. Checksum correct! frame_number_mod_51 =9 Channel type = CCCH message = 1 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 Frame number 7-9 in the 51-multiframe is a PACH (Paging and Access Grant Channel) burst. This is an empty paging fill message (see 3GPP 04.06:5.4.2.3). Step3.m skips over the FCH and SCH and shows us another PACH message after pressing space 5 more times:


Checksum correct! frame_number_mod_51 =15 Channel type = CCCH message = 1 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 This message is from TS0 burst 12 – 15 of the 51-multiframe. It’s again an empty page message. Press space 5 more times for the message in burst 16 – 19: Checksum correct! frame_number_mod_51 =19 Channel type = CCCH message = 1 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0

Page 9: The Beginners Guide to Analyzing GSM Data in MatLab

1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 (see 3GPP 04.07:11.2.3.1.1 for more information.) 1st octet: L2 Pseudo Length octet: first two bits are reserved (10). 101000 = 5 bytes of layer 3 data follows. (Remember that the bits are in reverse order, the lowest bit first). 2nd octet: 0 1 1 0 means it’s a radio resource message. The next 4 bits are unused (0 0 0 0) 3rd octet: Message type (0x21) = Paging Request Type 1 4th octet: Page mode = 0, Normal 5th octet: channel needed = 1 6th octet: Mobile identity = 0 The other rows are filled with ‘fill bits’. Press space 3 more time and the script hits the end of robert’s dump file. That’s it. Edit the start of step1.m to process the other two dump files! Regards, The GSMSP Team http://scratchpad.wikia.com/wiki/Gsm