I would like to do some data analysis of journal articles. I really just need the titles of the articles, perhaps the abstracts as well. Essentially, I'm analyzing how citation counts, "trendy topics", etc. change over the years and from one journal to the next.
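To make the goal concrete, here is a minimal sketch of the kind of analysis I have in mind. It is written in Python purely for illustration. The record fields (`journal`, `year`, `title`), the function name `term_counts_by_year`, and the sample titles are hypothetical placeholders, not real data or an existing API.

```python
from collections import Counter, defaultdict


def term_counts_by_year(records, terms):
    """Count, per (journal, year), how many titles mention each term.

    Matching is a plain case-insensitive substring test, which is an
    assumption made for brevity; real topic detection would need more care.
    """
    counts = defaultdict(Counter)
    for rec in records:
        title = rec["title"].lower()
        for term in terms:
            if term.lower() in title:
                counts[(rec["journal"], rec["year"])][term] += 1
    return counts


# Hypothetical placeholder records standing in for the article list I need.
records = [
    {"journal": "Journal A", "year": 2010, "title": "Graphene transistors at room temperature"},
    {"journal": "Journal A", "year": 2011, "title": "Deep learning for image labelling"},
    {"journal": "Journal A", "year": 2011, "title": "A survey of deep learning methods"},
    {"journal": "Journal B", "year": 2011, "title": "Graphene oxide membranes"},
]

counts = term_counts_by_year(records, ["graphene", "deep learning"])
for (journal, year), counter in sorted(counts.items()):
    print(journal, year, dict(counter))
```

Everything here depends on having a complete, reliable list of such records per journal and year, which is exactly the part I am missing.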
The problem is that I'm not sure how to get a list of articles. Journal article titles and abstracts are normally available for free; I'm definitely not going to do any illegal downloading here.
Currently, I am using Google Scholar and a Julia script to scrape data from search results. This has a few problems:
- Inexact. Some articles are missing from the list; the number of search results seems to vary quite a bit.
- Messy. Google has rate-limiting measures in place (CAPTCHAs). Coding around these is difficult and time-consuming.
- Not allowed. While not illegal, this activity is of course against Google's Terms of Service, so I can't publicize any results I produce.
I would rather not contact each journal or scrape each journal's website individually. Every site is designed differently, so I would have to write a new script for each one, and there are a lot of journals to go through. What I'm looking for is a service with an existing API, or an existing list of articles that I can obtain.