4

I am searching a scientific database for abstracts of papers containing the words project management. Here is the link:

To get an abstract, I have to click on a paper, which opens a new page. How can I do that for all 68 papers? I program in R and bash.

Dawny33
Hamideh

2 Answers

3

Try RSelenium with PhantomJS. The data is requested and filled in by AJAX calls, so static web scraping tools won't work.

I managed to get the list on the first page.

http://cran.r-project.org/web/packages/RSelenium/vignettes/RSelenium-headless.html

Here is a sample of what I managed to pull:

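# Clear objects left over from an earlier session (safe to skip in a fresh one)
remove(mopub, m, run, rx, x, first1)
library(RSelenium)

# Start PhantomJS; point pjs_cmd at your own phantomjs executable
pjs <- phantom(pjs_cmd = "C:/Users/bhavin.patel/Downloads/phantomjs-2.0.0-windows/bin/phantomjs.exe")
Sys.sleep(5)  # give PhantomJS a moment to start

remDr <- remoteDriver(browserName = 'PhantomJS')
dsurl <- "http://en.journals.sid.ir/SearchPaper.aspx?str=project%20management"
remDr$open()
remDr$navigate(dsurl)

# The result list is rendered inside the element with id "Table3"
allt3 <- remDr$findElements('id', 'Table3')
lapply(allt3, FUN = function(dst) dst$getElementText())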

[[1]]
[[1]][[1]]
[1] " 1 :   EFFECTIVE FACTORS ON RURAL PEOPLE’S NON-PARTICIPATION OF     MAHABAD’S DAM CATCHMENT IN WATERSHED MANAGEMENT PROJECTS\nAuthor(s): RASOULIAZAR SOLEIMAN*,FEALY SAEID\nJournal: INTERNATIONAL JOURNAL OF AGRICULTURAL MANAGEMENT AND DEVELOPMENT (IJAMAD)\nNumber: MARCH 2015 , Volume  5 , Number  1 ; Page(s) 19 To 26.\nKeyword(s): NON-PARTICIPATION, CATCHMENT, WATERSHED MANAGEMENT, MAHABAD TOWNSHIP, IRAN\nReference(s):  (0)      Citation(s):  (0) FullText:"
0

Another workaround is to get the listing by POST requests using curl in bash.

You can get the curl POST statement from Firebug (Firefox, F12): under Network, filter for XHR requests and copy the last request, the one for SearchPaper.aspx?str=project+management (right-click -> Copy as cURL).

In this POST request, set the parameter ctl00$ContentPlaceHolder1$txtPageNo to the desired page number (1-6 in this case).
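The pagination step might look like the sketch below. It assumes the rest of the copied POST body (the other form fields the browser sent) has been saved, without the page-number field, in a file called form_fields.txt. That file name, the helper names page_field and fetch_pages, and the output file names are all made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: request result pages by changing only the page-number field.
# Assumption: form_fields.txt (hypothetical name) holds the rest of the
# URL-encoded POST body copied from the browser, minus the page-number field.

SEARCH_URL='http://en.journals.sid.ir/SearchPaper.aspx?str=project+management'

# Print the URL-encoded page-number field for page $1 ("$" is encoded as %24).
page_field() {
  printf 'ctl00%%24ContentPlaceHolder1%%24txtPageNo=%s' "$1"
}

# Fetch pages $1..$2 and save each response as page_<n>.html.
fetch_pages() {
  local first="$1" last="$2" page
  for ((page = first; page <= last; page++)); do
    # curl joins repeated --data options with "&" into one POST body
    curl --silent \
      --data @form_fields.txt \
      --data "$(page_field "$page")" \
      --output "page_${page}.html" \
      "$SEARCH_URL"
    sleep 1  # pause between requests to go easy on the server
  done
}
```

With the form fields in place, `fetch_pages 1 6` would write page_1.html through page_6.html. Whether the server accepts a replayed request depends on the copied fields still being valid, so treat this as a starting point.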

Then pass the output to a static XML parsing tool to get your data.
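For a quick first pass, the parsing step could also be sketched in bash. The function below pulls href values out of saved HTML and keeps those containing a given string, which is one way to collect the links to the individual paper pages where the abstracts live. It assumes the links are plain href="..." attributes; the helper name extract_links is invented here, and the "ViewPaper" pattern in the usage note is a guess rather than something taken from the site.

```shell
#!/usr/bin/env bash
# Sketch: list the href targets in saved HTML that contain a given string.
# Assumption: links are written as plain href="..." attributes. This is a
# regex shortcut, not a real HTML parser.

# Read HTML on stdin and print each unique href value containing $1.
extract_links() {
  grep -o 'href="[^"]*"' \
    | sed -e 's/^href="//' -e 's/"$//' \
    | grep -F -- "$1" \
    | sort -u
}
```

For example, `extract_links 'ViewPaper' < results.html` would print one matching link per line, and each could then be fetched with curl. A proper XML/HTML parser is more robust if the markup is irregular.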

Tomas Pazur