

How do I know what causes the cloud extraction failure?

Some web scraping tools require you to have some programming skills in order to configure an advanced scraping, for example, Apify. Other tools, like Octoparse, provide scraping templates and services, which are a great bonus for companies lacking data scraping skill sets or reluctant to devote time to web scraping. The latest version of this tutorial is available here.

Sometimes when executing your task in the cloud after a successful test run with local extraction, you may find no data extracted. Below are the main reasons why no data is returned:

1) The target website fails to load completely, or the data to be extracted has not loaded

Website loading time depends on the network conditions and on the website itself. When you test the website on a local computer, the loading time may be shorter than in the cloud. So if you find no data extracted, please try increasing the timeout for the "Go To Web Page" action.

2) Cloud IPs are restricted from accessing the website due to heavy scraping frequency

Many websites apply anti-scraping techniques to avoid being scraped. They may limit how many times an IP can access the site within a certain period and block any IP that exceeds the limit. Some websites may even block all IPs from one location; for example, a Japanese website may not open in Canada. An IP blacklisted for scraping too frequently can be dealt with by adding wait time to slow down the extraction, but the restriction on IP location currently remains an issue, as all of Octoparse's cloud IPs are based in the United States.

3) Login steps or saved cookies fail in the cloud

If you set up login steps or save cookies in a task to scrape a website, local extraction may work perfectly while cloud extraction fails, because different IPs rotate during execution. Many websites ask for verification before you log in, and such verification, like a captcha, is not resolvable in cloud extraction. In addition, a saved cookie always has a valid time and will no longer work once it expires. To solve this, you will need to go through the login steps once again, adding the proper actions to obtain and save an updated cookie. (Check out how to save cookie.)

4) The website's HTML design is different when opened in the cloud

For Octoparse, extracting web data actually means picking content out of the page's source code, its HTML file: it needs to recognize the HTML code to know what data to extract. So if the website's design is different in the cloud, the extraction can fail. For example, when you open Sephora's site with an IP from China, the page is redirected to Sephora.cn, and the design of the site for each location is totally different. So when using Octoparse cloud extraction, please make sure you are extracting a site that will not be redirected according to IP location. Even if the website is not redirected, the source code can still change a little in a different browser or under different network conditions. The Octoparse cloud extraction process cannot be observed the way local extraction can, but there is a simple way to test what happens in the cloud: extract the outer HTML of the whole website page and compare it with what you get locally.
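The "add wait time" fix for over-frequent scraping is just rate limiting: guaranteeing a minimum pause between successive page requests. Below is a minimal Python sketch of that idea, independent of Octoparse (which configures wait time through its UI); the `Throttle` class name and the 0.2-second interval are illustrative assumptions, not part of any tool's API.

```python
import time

class Throttle:
    """Enforce a minimum gap between requests so the scraper's IP
    is less likely to exceed a site's access limit and get blocked."""

    def __init__(self, wait_seconds):
        self.wait_seconds = wait_seconds
        self.last = None  # time of the previous request, if any

    def pause(self):
        # Sleep only for whatever part of the gap has not already elapsed.
        if self.last is not None:
            elapsed = time.monotonic() - self.last
            if elapsed < self.wait_seconds:
                time.sleep(self.wait_seconds - elapsed)
        self.last = time.monotonic()

throttle = Throttle(0.2)
start = time.monotonic()
for _ in range(3):
    throttle.pause()  # in a real task: throttle.pause(); html = fetch(url)
total = time.monotonic() - start
```

With three requests and a 0.2-second gap, the first request goes out immediately and the remaining two each wait, so the loop takes roughly 0.4 seconds in total.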
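Once you have extracted the outer HTML of the page both locally and in the cloud, a line-by-line diff makes any design difference easy to spot. This is a small sketch using Python's standard difflib; the two HTML snippets are made-up placeholders standing in for a saved local capture and a saved cloud capture of the same page.

```python
import difflib

# Placeholder snippets: pretend these are the outer HTML saved from a
# local run and from a cloud run of the same product page.
local_html = "<div class='price'>$20</div>\n<div class='title'>Lipstick</div>\n"
cloud_html = "<div class='price-cn'>¥140</div>\n<div class='title'>Lipstick</div>\n"

# unified_diff yields lines prefixed with '-' (local only) and '+' (cloud only).
diff = list(difflib.unified_diff(
    local_html.splitlines(), cloud_html.splitlines(),
    fromfile="local", tofile="cloud", lineterm=""))

for line in diff:
    print(line)
```

Any `-`/`+` pair here flags an element whose class or content differs in the cloud, which is exactly the kind of mismatch that makes an XPath built locally fail during cloud extraction.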
