Extract data from a list of URLs with similar layouts using Octoparse -- Advanced mode

Posted on 11-Apr-2017

Extract data from a list of URLs with similar layouts -- Advanced mode

www.octoparse.com

Step 1. Advanced Mode: Go to “Advanced Mode” ➜ “List of URL” ➜ “Start”.

Step 2. Complete the basic information ➜ Click “Continue” ➜ Click “Next”.

Step 3. Drag a “Loop Item” action and drop it into the Workflow Designer.

Step 4. Click "Copy URLs" ➜ Paste the URLs in the textbox.

Step 5. Enter a list of URLs with similar page structure. Paste the URLs in the textbox ➜ Click "Save".
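Before pasting, it can help to tidy the URL list so the loop does not visit blank lines, duplicates, or malformed entries. The following is a minimal Python sketch of that idea, independent of Octoparse; the function name clean_url_list and the example.com URLs are hypothetical and only illustrate the approach.

```python
from urllib.parse import urlparse


def clean_url_list(pasted_text):
    """Return unique, well-formed http(s) URLs in their original order.

    Hypothetical helper: expects one URL per line, as pasted into a textbox.
    """
    seen = set()
    urls = []
    for line in pasted_text.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue  # skip entries that are not http(s) URLs
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Placeholder input: two valid URLs, one duplicate, one blank line, one bad entry.
pasted = """
https://example.com/articles/1
https://example.com/articles/2

https://example.com/articles/1
not a url
"""
for url in clean_url_list(pasted):
    print(url)
```

The cleaned list can then be pasted into the textbox, one URL per line.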

Step 6. Wait until the page has loaded, then extract the title and content of the first page ➜ Click these two elements.

Click “Extract Text”. After you extract the elements on the first page, Octoparse will extract the data with the same layout from the other pages.

All the content will be selected in Data Fields. Click the "Field Name" to modify it ➜ Click “Next”.

Click “Next”
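The idea behind Step 6 is that an extraction rule defined once on the first page can be reused on every page that shares its layout. The sketch below illustrates that idea in plain Python and is not a description of how Octoparse works internally. It assumes, purely for illustration, that the title sits in an h1 tag and the content in p tags; the class name TitleContentParser, the function extract_fields, and the inline HTML strings standing in for downloaded pages are all hypothetical.

```python
from html.parser import HTMLParser


class TitleContentParser(HTMLParser):
    """Collect text inside <h1> (title) and <p> (content) elements."""

    def __init__(self):
        super().__init__()
        self._current = None  # list currently receiving text, or None
        self.title_parts = []
        self.content_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._current = self.title_parts
        elif tag == "p":
            self._current = self.content_parts

    def handle_endtag(self, tag):
        if tag in ("h1", "p"):
            if self._current is not None:
                self._current.append(" ")  # keep adjacent blocks separated
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)


def extract_fields(html):
    """Apply the same title/content rule to one page of HTML."""
    parser = TitleContentParser()
    parser.feed(html)
    parser.close()
    return {
        # Collapse runs of whitespace into single spaces.
        "title": " ".join("".join(parser.title_parts).split()),
        "content": " ".join("".join(parser.content_parts).split()),
    }


# Placeholder pages with the same layout (stand-ins for downloaded HTML).
pages = {
    "https://example.com/articles/1": "<h1>First post</h1><p>Hello.</p><p>More text.</p>",
    "https://example.com/articles/2": "<h1>Second post</h1><p>Other text.</p>",
}
rows = [{"url": url, **extract_fields(html)} for url, html in pages.items()]
for row in rows:
    print(row)
```

Because the rule refers only to the page structure, it works unchanged on any page in the list that follows the same layout.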

Step 7. Click “Local Extraction” ➜ Click “OK” to run the task on your computer. Octoparse will automatically extract all the selected data.

The extracted data will be shown in the "Data Extracted" pane. Click the export button to export the results to an Excel file, a database, or another format, and save the file to your computer.
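For readers who post-process results outside Octoparse, the export step amounts to writing one row per page with one column per data field. A minimal Python sketch, assuming the rows are dictionaries with hypothetical "title" and "content" fields, and using CSV as a format that Excel can open:

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dict rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder rows standing in for extracted data.
sample_rows = [
    {"title": "First post", "content": "Hello, world."},
    {"title": "Second post", "content": "Other text."},
]
print(rows_to_csv(sample_rows, ["title", "content"]))
```

Fields that contain commas are quoted automatically by the csv module, so the columns stay aligned when the file is opened in a spreadsheet.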

Happy Data Hunting