What I’ve done
In this workshop, we tried using Web Scraper to collect data.
POINT 1
Web Scraper is a browser plugin. After choosing "Inspect" on the web page, we need to find the Web Scraper entry, which is the last option in the panel that opens.
POINT 2
- Create a new sitemap.
- Set the selector parameters to obtain the data we want.
- I tried to set up "next button" pagination, but it often shows an error.
POINT 3
- Export the data.
- The exported file contains no data, even though the data can be previewed normally.
- How can I select data that contains specified text?
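For the question about selecting data that contains specified text, one possible approach outside the plugin is to filter elements by their text in Python. The sketch below is only an illustration using the standard library. The class `TextMatchParser`, the function `select_containing`, and the sample HTML are made up for this example and are not part of Web Scraper. It also assumes the target elements are not nested inside each other in a way that matters.

```python
from html.parser import HTMLParser


class TextMatchParser(HTMLParser):
    """Collect the text of each <tag> element whose text contains a keyword."""

    def __init__(self, tag, keyword):
        super().__init__()
        self.tag = tag
        self.keyword = keyword.lower()  # case-insensitive comparison
        self.depth = 0                  # > 0 while inside a target element
        self.buffer = []                # text pieces of the current element
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            if self.depth == 1:
                self.buffer = []        # start collecting a new element

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Collapse runs of whitespace before testing the keyword.
                text = " ".join("".join(self.buffer).split())
                if self.keyword in text.lower():
                    self.matches.append(text)


def select_containing(html, tag, keyword):
    """Return the text of every `tag` element that contains `keyword`."""
    parser = TextMatchParser(tag, keyword)
    parser.feed(html)
    parser.close()
    return parser.matches


# Hypothetical page fragment, standing in for a real listing page.
SAMPLE_HTML = """
<ul>
  <li class="item">Workshop: <b>Web Scraping</b> basics</li>
  <li class="item">Workshop: Data cleaning basics</li>
  <li class="item">Talk: scraping dynamic pages</li>
</ul>
"""

if __name__ == "__main__":
    print(select_containing(SAMPLE_HTML, "li", "scraping"))
```

Run on the sample fragment, this keeps the first and third list items and drops the second. Some scraping tools also accept a jQuery-style `:contains("text")` selector for the same purpose, but whether a given tool supports it is something to check in its documentation.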
More thoughts
In this Web Scraping workshop, I started to look at websites from the perspective of “a webpage is also data,” and once I used the developer tools to inspect the HTML, the structure of a page suddenly felt much clearer. When I tried using tools like OutWit Hub and WebScraper.io, I realised that scraping is more complicated than it looks. For example, some sites only load new content when you scroll down, but the tools couldn’t scroll automatically, so I could only capture whatever appeared on the first screen. The auto-generated selectors were also not always accurate, and I often ended up scraping empty fields or duplicated items. I later tried inspecting the code manually and adjusting the CSS selectors myself, which improved the accuracy; I also found that tools like Octoparse can simulate scrolling, making them more suitable for dynamic pages. Overall, this workshop made me realise that while tools are useful, the real key to successful scraping is understanding how webpages are structured and being willing to experiment when things don’t work. In the future, combining these tools with Python would likely help me scrape data more efficiently and accurately.
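As a small illustration of how Python could help with the empty fields and duplicated items mentioned above, the sketch below cleans an exported CSV after scraping. It is a minimal example under assumptions: the column names `title` and `link`, the sample rows, and the helper `clean_rows` are all hypothetical, and a real export would have different columns.

```python
import csv
import io


def clean_rows(rows, required_fields):
    """Drop rows with an empty required field and repeated rows, keeping order."""
    seen = set()
    cleaned = []
    for row in rows:
        # Trim whitespace; treat missing values (None) as empty strings.
        values = {
            key: (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        if any(not values.get(field) for field in required_fields):
            continue  # an empty field usually means the selector missed
        fingerprint = tuple(values[field] for field in required_fields)
        if fingerprint in seen:
            continue  # the same item was scraped more than once
        seen.add(fingerprint)
        cleaned.append(values)
    return cleaned


# Hypothetical export: one row with an empty title, one exact duplicate.
SAMPLE_CSV = """title,link
First post,https://example.com/1
,https://example.com/2
First post,https://example.com/1
Second post,https://example.com/3
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned = clean_rows(rows, required_fields=["title", "link"])

if __name__ == "__main__":
    for row in cleaned:
        print(row["title"], row["link"])
```

With the sample data, only "First post" and "Second post" remain. Cleaning after the export does not fix an inaccurate selector, so it complements adjusting the CSS selectors rather than replacing it.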
Reading references (Rogers, 2024)
- Digital methods are techniques for the study of societal change and cultural condition with online data. They make use of available digital objects such as the hyperlink, tag, timestamp, like, share and retweet, and seek to learn from how the objects are treated by the methods built into the dominant devices online, such as Google Web Search. They endeavour to repurpose the online methods and services with a social research outlook. Ultimately the question is the location of the baseline, and whether the findings made may be grounded online.
- In those cases, the data are good because they exist or have been captured from the beginning, cover long periods of time, and are complete, or mostly so. One knows the percentage of missing data. With the web much of the data is from a recent past, covers a short period of time and is incomplete, where there is often a difficulty in grasping what complete data would be.
- For other techniques virtual methods seek to overcome some difficulties inherent in the web as a site of study and data collection realm. When surveying, the question is how to find the respondents, and whether one knows a response rate. For sampling, similarly, there are questions about whether one can estimate the population of websites or Facebook pages on a given topic. The migration of methods online could be said to raise questions about the fit between the method and the medium.