How do I retrieve the documents I want to mine?
If you already know the DOI (or the PII, a proprietary Elsevier identifier) of the documents you want to mine, you can retrieve them one by one by calling this URL:
https://api.elsevier.com/content/article/doi/[DOI] or https://api.elsevier.com/content/article/pii/[PII]
Notes:
- Pass your APIKey in the request header:
X-ELS-APIKey: [APIKey]
This is the APIKey that is created for you when you register your text mining project here.
- Alternatively, you can pass the APIKey as a URL parameter:
https://api.elsevier.com/content/article/doi/[DOI]?APIKey=[APIKey]
E.g. https://api.elsevier.com/content/article/doi/10.1016/j.ibusrev.2010.09.002?APIKey=665f15638156da2156b60a48095b4abc (please note that the APIKey in this example is not actually valid).
- To request the full-text view explicitly, add the view parameter:
https://api.elsevier.com/content/article/doi/[DOI]?view=FULL
This may make client-side exception handling a little easier to code.
- You can choose the response format with the Accept header:
Accept: text/xml (for full XML) or Accept: text/plain (for stripped-down full text)
Alternatively, the format can be passed in as a URL parameter:
https://api.elsevier.com/content/article/doi/[DOI]?httpAccept=text/xml or https://api.elsevier.com/content/article/doi/[DOI]?httpAccept=text/plain
You can also combine several URL parameters in one request, e.g. the APIKey and the request format:
https://api.elsevier.com/content/article/doi/[DOI]?APIKey=[APIKey]&httpAccept=text/plain
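As a rough illustration of the two ways of passing the APIKey and the format (as headers or as URL parameters), here is a minimal Python sketch using only the standard library. The function names build_article_request and build_article_url are hypothetical, and the sketch only builds requests; it does not send them.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "https://api.elsevier.com/content/article"

def _article_url(identifier, id_type):
    # id_type selects the /doi/ or /pii/ form of the retrieval URL.
    if id_type not in ("doi", "pii"):
        raise ValueError("id_type must be 'doi' or 'pii'")
    # Leave "/" unescaped so DOIs keep the form shown in this FAQ.
    return f"{BASE}/{id_type}/{quote(identifier, safe='/')}"

def build_article_request(identifier, api_key, id_type="doi", accept="text/xml"):
    """Build (but do not send) a request carrying APIKey and format as headers."""
    return Request(
        _article_url(identifier, id_type),
        headers={"X-ELS-APIKey": api_key, "Accept": accept},
    )

def build_article_url(identifier, api_key, id_type="doi", http_accept="text/plain"):
    """Build a URL that carries APIKey and format as URL parameters instead."""
    # safe="/" keeps "text/plain" readable, as in the examples above.
    params = urlencode({"APIKey": api_key, "httpAccept": http_accept}, safe="/")
    return f"{_article_url(identifier, id_type)}?{params}"
```

To actually fetch a document you would pass the Request to urllib.request.urlopen, which raises urllib.error.HTTPError for error responses; catching that is where the client-side exception handling mentioned above would go.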
Where can I find the DTD for your XML articles?
Here: https://www.elsevier.com/author-schemas/elsevier-xml-dtds-and-transport-schemas
How do I select the corpus I want to mine?
Or, in other words: how do I find out which documents are relevant? Ultimately, your corpus selection process will have to lead to a list of URIs for the documents you want to mine. Generating that list can generally be done in two ways:
- searching an index that returns a list of documents. This approach allows you to limit your corpus to documents that match certain search terms, such as keywords, author, date, etc. See more below.
- browsing a resource to find references to documents and collating those references in a list. This approach allows you to limit a corpus by the way it is referenced or structured on a site - for example by using citation links between documents to mine a set of documents and all the documents they reference. See more below.
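Whichever route you take, you end up with identifiers that have to become retrieval URIs. A minimal Python sketch, assuming the collected identifiers are bare DOIs or PIIs (optionally prefixed with "doi:"); to_retrieval_uri and collate are hypothetical helper names, and the rule "starts with 10. means DOI" relies on every DOI beginning with the "10." prefix.

```python
BASE = "https://api.elsevier.com/content/article"

def to_retrieval_uri(identifier):
    """Map a DOI or a PII to its retrieval URI.

    Assumption: anything starting with "10." is a DOI; everything else
    is treated as a PII.
    """
    identifier = identifier.strip()
    if identifier.lower().startswith("doi:"):
        identifier = identifier[4:]
    if identifier.startswith("10."):
        return f"{BASE}/doi/{identifier}"
    return f"{BASE}/pii/{identifier}"

def collate(identifiers):
    """Turn collected identifiers into a de-duplicated, order-preserving URI list."""
    seen = set()
    uris = []
    for ident in identifiers:
        uri = to_retrieval_uri(ident)
        if uri not in seen:
            seen.add(uri)
            uris.append(uri)
    return uris
```

The same list can then be fed, one URI at a time, into the retrieval requests described earlier.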
How do I search for documents I want to mine?
Elsevier's own search index for ScienceDirect can be targeted through:
https://api.elsevier.com/content/search/scidir?query=[query]
A request to this URL returns a list of documents matching [query], with their basic metadata and the URIs for retrieving them from api.elsevier.com.
Notes:
- Pass your APIKey in the request header:
X-ELS-APIKey: [APIKey]
- [query] can be any search query that is valid on the expert search form on www.sciencedirect.com.
- To retrieve a large results set, request it in pages, using count for the number of results per request and start for the offset of the first result in each page:
https://api.elsevier.com/content/search/scidir?query=[query]&count=200
https://api.elsevier.com/content/search/scidir?query=[query]&count=200&start=200
https://api.elsevier.com/content/search/scidir?query=[query]&count=200&start=400
https://api.elsevier.com/content/search/scidir?query=[query]&count=200&start=600
etc.
- Also, the ScienceDirect search index will never return more than the first 5,000 or so results (the exact number varies) for any given query. This means that with this request:
https://api.elsevier.com/content/search/scidir?query=[query]&count=200&start=5000
... you will likely hit the end of the available results set. This can be worked around (to some degree) by making the query itself more restrictive. For example, if your search is ?query=heart+attack, you can add a date limiter, ?query=heart+attack+AND+PUBYEAR(2012), to first collate the results from 2012, then move on to PUBYEAR(2011), and so on, all the while using &count=200&start=... to step through the results for each query.
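The paging and per-year splitting described above can be sketched as two URL generators. This is a sketch under stated assumptions: start is treated as a zero-based offset, the 5,000 cap is approximate, total_results would come from the first response (here it is just a parameter), and page_urls and yearly_queries are hypothetical names. Each URL still needs the X-ELS-APIKey header when it is sent.

```python
from urllib.parse import urlencode

SEARCH = "https://api.elsevier.com/content/search/scidir"
PAGE_SIZE = 200    # the count value used in this FAQ
RESULT_CAP = 5000  # approximate; the exact limit varies

def page_urls(query, total_results, page_size=PAGE_SIZE, cap=RESULT_CAP):
    """Yield one search URL per page of results.

    Assumes start is a zero-based offset, so pages begin at 0, 200, 400, ...
    Results beyond the cap are not requested, since the index won't return them.
    """
    limit = min(total_results, cap)
    for start in range(0, limit, page_size):
        params = {"query": query, "count": page_size}
        if start:
            params["start"] = start
        # safe="()" keeps PUBYEAR(2012) readable; spaces become "+".
        yield f"{SEARCH}?{urlencode(params, safe='()')}"

def yearly_queries(base_query, first_year, last_year):
    """Split one broad query into per-year queries, newest year first."""
    for year in range(last_year, first_year - 1, -1):
        yield f"{base_query} AND PUBYEAR({year})"
```

In practice you would loop over yearly_queries, send the first page of each to learn its result count, and then walk the remaining page_urls for that year, so that each per-year query stays under the cap.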
Can I use other sources to help me select my corpus before I download it?
Of course! Here are some options:
- Google and Google Scholar: if you append "site:sciencedirect.com OR site:linkinghub.elsevier.com" to your query, you'll limit search results to content indexed on ScienceDirect.com. In the URLs of the results, you'll see an identifier like S0190962202001020 or B0741521410016381; this is the PII of that article, which you can use to construct a request that retrieves it (see above).
- If you use A&I databases like PubMed or Web of Science or Elsevier's own Scopus, they will often return the DOI for documents in results sets. You can use that DOI to construct a retrieval request to our full-text API (see above) to try to retrieve that document; if the document is not an Elsevier document or if you are not subscribed to it, you will simply get an error.
- CrossRef's Metadata services, while not available to everyone, allow metadata-based requests to retrieve DOIs.
- Many Elsevier journals also have their own separate sites, such as cell.com and thelancet.com; they use DOIs and PIIs to identify articles as well, and thus could help in identifying the corpus to mine.
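For the Google route, pulling PIIs out of result URLs can be scripted. A minimal Python sketch, assuming the PII follows a /pii/ path segment (as in URLs of the form sciencedirect.com/science/article/pii/... or linkinghub.elsevier.com/retrieve/pii/...) and has the 17-character shape of the examples above: S or B, 15 digits, then a final digit or X. Both the pattern and the name piis_from_urls are assumptions for illustration.

```python
import re

# Assumed PII shape: S or B, 15 digits, then a final digit or X (check character).
PII_RE = re.compile(r"/pii/([SB]\d{15}[\dX])", re.IGNORECASE)

def piis_from_urls(urls):
    """Return the unique PIIs found in result URLs, in first-seen order."""
    found = []
    for url in urls:
        match = PII_RE.search(url)
        if match:
            pii = match.group(1).upper()
            if pii not in found:
                found.append(pii)
    return found
```

Each PII found this way can be plugged into https://api.elsevier.com/content/article/pii/[PII].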