Visual Basic Fishing Data.…  · Web viewTutorial 1: Visual Basic. Fishing for data. Today, one...


Figure 1.

Tutorial 1: Visual Basic. Fishing for data

Today, one of the main constraints in the development of science is not the lack of data but the lack of ideas for what to do with all the data that are out there.

In this tutorial, you will learn how to use Microsoft Excel and its built-in programming language, Visual Basic for Applications (“Visual Basic”), to gather data from the internet.

Most web pages that host data use a consistent template to store the data and display them on the web. Identifying these patterns allows you to use Visual Basic to automate the repetitive work of gathering such data. Let’s look at one example:

The CIA World Factbook contains several sources of data for each country in the world. By looking at its URL (Fig. 1), you can notice that every country has a different abbreviation. Albania, for instance, is “al” in the URL, so if you change “al” to “ag” in the URL, you will get the data for Algeria. Now imagine that you want to collect GDP data for all countries. You could do it country by country, but that could take you a few hours, with a high risk of errors: you may miss a country, copy the data for the wrong country, and so on. An alternative is to use Excel and Visual Basic to automate such a repetitive task.
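For readers who want to check the idea outside Excel, here is a minimal Python sketch of the URL trick: the page address stays the same and only the country abbreviation changes. The template below is a hypothetical placeholder, not the real Factbook address; copy the actual URL of one country from your browser and put `{code}` where the abbreviation appears.

```python
# Hypothetical template: replace it with the real address pattern you see in
# your browser, writing {code} where the country abbreviation appears.
URL_TEMPLATE = "https://www.example.org/the-world-factbook/geos/{code}.html"


def country_url(code: str, template: str = URL_TEMPLATE) -> str:
    """Build the page address for one country from its abbreviation."""
    # The abbreviations in the example URLs ("al", "ag") are lower case.
    return template.format(code=code.strip().lower())


# One URL per abbreviation, e.g. Albania ("al") and Algeria ("ag").
urls = [country_url(c) for c in ["al", "ag"]]
print(urls)
```

Keeping the template as a parameter means the same function still works if the site changes its address layout; only the template string has to be updated.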

1. Let’s find the abbreviations for all countries. For this, right-click on the page and select “View page source” (see Figure 2). That will show you the HTML code the browser uses to display the page: the code behind the page, as opposed to the rendered page you normally see.

Figure 2

2. If you scroll down, you will find a line of code containing all the country names (see Figure 3). That line is the code behind the drop-down box in Fig. 1, from which you can choose any country. Select that line of code and copy it.

Figure 3

3. Now paste the copied text into Excel. To do this, go to Excel, click on cell A1, right-click and click Paste (see Figure 4).

Figure 4

4. As you can see in Figure 4, the abbreviations of all countries are in a single cell. What we want is to put all countries in a single column, each country occupying its own cell, so that Excel can later run the online search once for each row of that column. To move each country into its own cell, we need to split the text in cell A1 at something that separates the country names. If you look at the text in cell A1, you will notice that the “<” sign is a good place to split it, and to split the text at those places we use the Excel function “Text to Columns” on the Data tab (see Figure 5).

Figure 5.

5. Click on “Text to Columns”, choose “Delimited” and click Next, then tick “Other” and type the “<” sign (Figure 6).

Figure 6


6. Text to Columns splits the text every time the “<” sign appears. Now let’s move all of that data into a single column. Select cell B1, then hold down “Shift”+“Ctrl” and press “→” (the key with an arrow pointing to the right). That should select all of your cells with text. Right-click and select Copy, then select cell A1, right-click, select “Paste Special” and tick “Transpose”. The entire row will now be in one column (Figure 7).

Figure 7.

7. Now we want to remove the intermediate rows we do not need. Select column A, then click on “Sort” (Fig. 8). Because you are sorting only one column, Excel will warn you that there is data next to your selection and ask whether you want to expand the selection. Select “Continue with the current selection” and confirm the dialogs that follow. Once the data are sorted, the rows you do not need are grouped together: select them, right-click and select “Delete”. You should end up with a list of countries as shown in Figure 8.

Figure 8
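Steps 4 to 7 amount to splitting one long line of text and keeping only the useful pieces. The Python sketch below mirrors them under an assumption: the copied line is made of `<option value="../geos/xx.html">Country</option>` entries. That markup is hypothetical (your page source may look different), and the filter that replaces the manual sort-and-delete keeps only pieces that mention `geos/`.

```python
# Hypothetical sample of the drop-down line copied from "View page source";
# the real markup on the Factbook page may differ.
SAMPLE_LINE = (
    '<select><option value="">Please select a country</option>'
    '<option value="../geos/al.html">Albania</option>'
    '<option value="../geos/ag.html">Algeria</option></select>'
)


def split_options(line: str) -> list[str]:
    """Mimic steps 4-7: split on '<', one piece per row, keep country rows."""
    pieces = line.split("<")  # Text to Columns with '<' as the delimiter
    # Transposing is implicit: a Python list already behaves like one column.
    # Sorting and deleting unwanted rows becomes a filter that keeps only the
    # <option> pieces pointing to a country page (the "geos/" part is assumed).
    return [p for p in pieces if p.startswith("option") and "geos/" in p]


for row in split_options(SAMPLE_LINE):
    print(row)
```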


8. Now we need to clean up the data further, because we want the name of each country in one column and the abbreviation used by the CIA in another. Use “Text to Columns” again, but this time use “>” as the place to split the columns. You should end up with two columns of text like in Figure 9.

Figure 9

9. As you will notice, we now have the country names in one column, as we wanted, but the abbreviations are still mixed with other text in column A (Fig. 9). To get the abbreviations, move column A to column C: right-click on column A, click Cut (Fig. 9), then select column C and click Paste. Next, delete column A, which is now empty. Your data should look like Figure 10.

Figure 10

10. Now, let’s use “Text to Columns” again to split the text in column B. Click on column B, click on “Text to Columns”, choose “Delimited”, then tick “Other” and type “/”. Your data should look like Fig. 11.

Figure 11


11. The abbreviations are now in column D, so delete columns B and C, which contain data you do not need. Click on the header of column B, drag your mouse to column C, right-click and select “Delete” (see Figure 12).

12. The country abbreviations are now in column B, but they still contain unnecessary text. Use Text to Columns once more, this time splitting on “.”: click on column B, click on Text to Columns, choose Delimited, tick “Other” and type “.”. You should end up with the data as in Figure 13: a list of country names in column A and their abbreviations in column B.
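Steps 8 to 12 are three more splits, on “>”, “/” and “.”. Here they are sketched in Python for a single row, assuming each row looks like `option value="../geos/al.html">Albania`. That row format is hypothetical, chosen to be consistent with the splits in the steps; check it against your own page source.

```python
def parse_option(row: str) -> tuple[str, str]:
    """Mimic steps 8-12 on one row such as 'option value="../geos/al.html">Albania'."""
    # Step 8: split on '>' -> the tag text and the country name.
    tag, name = row.split(">", 1)
    # Step 10: split the tag text on '/' -> the last piece is 'al.html"'.
    last_piece = tag.split("/")[-1]
    # Step 12: split on '.' -> the abbreviation is the part before the dot.
    code = last_piece.split(".")[0]
    return name.strip(), code


rows = ['option value="../geos/al.html">Albania',
        'option value="../geos/ag.html">Algeria']
table = [parse_option(r) for r in rows]
print(table)  # [('Albania', 'al'), ('Algeria', 'ag')]
```

The resulting pairs correspond to columns A and B of Sheet1 in Figure 13.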

13. Now the idea is to use Visual Basic to go to the internet, search for each country, get the data and bring it into Excel. Let’s first do this once by hand, before recording the macro.

14. At the bottom of your workbook, select “Sheet2” (Fig. 14), so that the data from the internet are added to that sheet and not to Sheet1, where we have the names of the countries.

Figure 14.


15. Then go to the “Data” tab (Fig. 15) and click on “From Web”. This allows you to import data from a web page.

Figure 15.

16. In your browser, visit the CIA World Factbook web page, select one country, copy the URL and go back to Excel. In the box that appeared after you clicked “From Web”, paste the URL in the area indicated in Fig. 16 and click “Go”. That will load the web page in Excel.

Figure 16

17. Click Import (Fig. 17)

Figure 17.


18. Select where you want your data to appear. For this example, select cell D5 (Fig. 18)

Figure 18

19. Excel has now imported the data from the web page into Sheet2, starting at cell D5 (Fig. 19).

Figure 19

21. Somewhere in Sheet2 is the GDP figure you want. To find its row number, click on “Find & Select” (Fig. 20), choose Find, type “GDP” and click “Find Next”, repeating until you reach the row that holds the data you need.

Figure 20


22. In this case, the GDP data we need are in row 753 (Figure 21). The problem is that some countries may be missing some of the data, so that row could be different for each country. We therefore need a dynamic way for Excel to find the GDP data, and one way to do it is to use the function MATCH.

Figure 21

23. MATCH looks for a value in a range and returns its position within that range; when the range is a whole column, that position is the row number. Here we will look for “GDP (purchasing power parity):”, because the number we want for each country is right below that text. Double-click on the cell that contains the title the CIA uses for GDP, in this case cell D754 (it could be different in your case, depending on the country you are using), select the whole content of that cell and then right-click and Copy (Fig. 22).

Figure 22

24. Now go to the top of Sheet2, select cell A1, then right-click and Paste (Fig. 23).

Figure 23

25. Now let’s find the text of cell A1 in column D, where all of our data are. Select cell B1, type “=MATCH(A1,D:D,0)” and press Enter (Fig. 24). MATCH looks for the value given before the first comma (the text in cell A1) in the range given after it (column D:D), and the final 0 asks for an exact match. The lookup range must be a single column or row, which is why it is D:D rather than several columns. With match type 0, MATCH also accepts the wildcards * and ? for partial matches, but here we are looking specifically for the cell containing “GDP (purchasing power parity):”.

Figure 24
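To make the behaviour of MATCH concrete, here is a rough Python analogue of `=MATCH(A1,D:D,0)`. It is a sketch, not Excel’s implementation: it returns the 1-based position of the first exact match, ignores letter case as Excel does for text, and raises an error where Excel would show #N/A; wildcards are not handled. The cell contents are made-up placeholders.

```python
def excel_match(lookup_value: str, column: list[str]) -> int:
    """Rough analogue of =MATCH(lookup_value, D:D, 0).

    Returns the 1-based position of the first cell equal to lookup_value,
    ignoring letter case. Wildcards (* and ?) are not supported here.
    """
    target = lookup_value.casefold()
    for position, cell in enumerate(column, start=1):
        if str(cell).casefold() == target:
            return position
    # Excel would display #N/A at this point.
    raise LookupError(f"#N/A: {lookup_value!r} not found")


# Column D as a list: list index 0 stands for row 1 (placeholder contents).
column_d = ["", "", "Economy", "GDP (purchasing power parity):", "sample GDP figure"]
print(excel_match("GDP (purchasing power parity):", column_d))  # 4
```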


26. In this case, the text in cell A1 is found in row 754 (Fig. 25).

Figure 25

27. However, we do not need the row number of “GDP (purchasing power parity):” itself; what we need is the number that appears right below it. To get it, we use the function INDEX, which returns the value at a given row (and column) of a range. Select cell C1, type “=INDEX(D:D,B1+1)” and press Enter (Fig. 26). As you will notice, this gives you the GDP of the country: INDEX returns the value in column D at row 754+1. The +1 is there because the number we want is exactly one row below the row where the text “GDP (purchasing power parity):” is found. Finally, we have the GDP data for one country. Now imagine repeating this whole tutorial, from the start up to here, for each of the more than 250 countries. Do not panic: Visual Basic is here to help.

Figure 26
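The point of combining INDEX with MATCH is that the lookup is relative to the label, not to a fixed row, which is why it still works when a country is missing some entries and its GDP lands on a different row. A minimal Python sketch of `=INDEX(D:D, MATCH(label, D:D, 0) + 1)`, with made-up page contents for two hypothetical countries; unlike Excel, it returns `None` instead of an error when the label is missing.

```python
from typing import Optional


def value_below(label: str, column: list[str]) -> Optional[str]:
    """Analogue of =INDEX(D:D, MATCH(label, D:D, 0) + 1).

    Finds the first cell equal to `label` (ignoring case) and returns the
    cell directly below it, or None where Excel would show an error.
    """
    target = label.casefold()
    for i, cell in enumerate(column):
        if str(cell).casefold() == target:
            return column[i + 1] if i + 1 < len(column) else None
    return None


# The label sits on a different row for each (made-up) country page.
page_a = ["Economy", "GDP (purchasing power parity):", "figure for country A"]
page_b = ["Economy", "Population:", "GDP (purchasing power parity):", "figure for country B"]
print(value_below("GDP (purchasing power parity):", page_a))  # figure for country A
print(value_below("GDP (purchasing power parity):", page_b))  # figure for country B
```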

28. First, let’s clean up the web data you retrieved for this one country. Select row 3 (Fig. 27), then press “Shift”+“Ctrl”+“↓” (the key with an arrow pointing downwards). Right-click and select “Delete”. Excel will ask whether you want to delete the retrieved data; click “Yes” (Fig. 27).

Figure 27

29. Now let’s record a macro. Click on the “Developer” tab, then on “Record Macro” (Fig. 28). From now on, everything you do in Excel will be recorded as Visual Basic code. If the Developer tab or macros are unavailable in your copy of Excel, follow this tutorial on enabling them (http://www.isunshare.com/blog/how-to-enable-create-and-delete-macros-in-ms-excel-2010/).

Figure 28


30. You will be asked to enter a name for your macro and, optionally, a description. For now, we will use Excel’s default name, so just click “OK” (Fig. 29).

Figure 29

31. Now repeat steps 15 to 18. After the data are imported from the web, the GDP for the given country will appear automatically in cell C1 (Fig. 30), because the formulas in B1 and C1 recalculate on the new data.

Figure 30

32. Select cell C1, right-click and select Copy. Then click on Sheet1 (Fig. 31).

Figure 31.

33. Select cell C1, right-click and choose “Paste Values” (Fig. 32), so that Sheet1 stores the number itself rather than a copy of the formula. We have now collected the data for one country; next, we need Excel to do this for all countries in Sheet1.

Figure 32


34. Before we adjust the macro, let’s go back to Sheet2 (Fig. 33) and delete the data retrieved for this country, so that the data from the next countries do not pile up on top of each other.

Figure 33

35. To clean up the retrieved data, select row 3 (Fig. 34), then press “Shift”+“Ctrl”+“↓” (the key with an arrow pointing downwards). Right-click and select “Delete”. Excel will ask whether you want to delete the retrieved data; click “Yes” (Fig. 34).

Figure 34

36. Stop recording by clicking the square Stop Recording button at the bottom left of the Excel window (Fig. 35).

Figure 35

37. You can test your macro by going to the “Developer” tab, clicking on “Macros”, selecting the macro you just recorded (“Macro1” in this case) and clicking “Run” (Fig. 36). The macro will go to the internet, import the data for that one country, take the GDP, paste it into cell C1 of Sheet1 and delete the retrieved data in Sheet2.

Figure 36


38. Now let’s make the macro go through each country in Sheet1 to get its GDP. Go to the “Developer” tab, click on “Macros”, select the macro you just recorded (“Macro1” in this case) and click “Edit” (Fig. 37). This opens the Visual Basic Editor.

Figure 37

39. In the code recorded for your macro, you will notice that the country abbreviation appears inside the URL, highlighted in the red box in Figure 38. We need to replace that fixed abbreviation with the abbreviations listed in Sheet1.

Figure 38

40. Replace the following lines of code:


41. Click Run (the Play button, Fig. 39) and let Visual Basic do the work for you. Excel will be busy while the macro is running, so it is better not to use the computer for anything else. Depending on the speed of your internet connection, this could take a while, so it may be best to leave it running overnight.

Figure 39
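For readers who want to see the logic of the finished macro outside Excel, here is a hedged Python sketch of the same loop. It is not the Visual Basic code referred to in step 40 and does not reproduce it. The names `collect_gdp` and `fetch_cells`, the URL template and the one-second pause are all assumptions for illustration. Passing the download function in as a parameter keeps the loop testable without a network connection; a real version could wrap `urllib.request.urlopen` plus an HTML parser.

```python
import time
from typing import Callable, Optional

LABEL = "GDP (purchasing power parity):"


def value_below(label: str, cells: list[str]) -> Optional[str]:
    """Return the cell right after the first cell equal to `label`, if any."""
    for i, cell in enumerate(cells[:-1]):
        if cell.strip().casefold() == label.casefold():
            return cells[i + 1]
    return None


def collect_gdp(
    countries: list[tuple[str, str]],
    fetch_cells: Callable[[str], list[str]],
    url_template: str,
    delay_seconds: float = 1.0,
) -> dict[str, Optional[str]]:
    """Loop over (name, code) pairs the way the edited macro loops over Sheet1.

    fetch_cells is a hypothetical helper that downloads one page and returns
    its text cells in order (roughly what "From Web" pastes into column D).
    """
    results: dict[str, Optional[str]] = {}
    for name, code in countries:
        url = url_template.format(code=code)
        try:
            cells = fetch_cells(url)
        except OSError:  # network errors such as urllib.error.URLError
            cells = []   # record "no data" instead of stopping the whole run
        results[name] = value_below(LABEL, cells)
        time.sleep(delay_seconds)  # pause between requests to spare the server
    return results


# Usage with a fake downloader, so no internet connection is needed.
fake_pages = {
    "https://example.org/geos/al.html": ["Economy", LABEL, "figure A"],
    "https://example.org/geos/ag.html": ["Economy", "Population:", "n/a"],
}
result = collect_gdp(
    [("Albania", "al"), ("Algeria", "ag")],
    fetch_cells=lambda url: fake_pages[url],
    url_template="https://example.org/geos/{code}.html",
    delay_seconds=0,
)
print(result)  # {'Albania': 'figure A', 'Algeria': None}
```

Returning `None` for a country without the label mirrors step 22’s warning that some countries are missing data; the macro in Excel would instead show an error value in that row.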