Penelope Bilton wrote:
<snipped>
Example:
> tt <- "https://www.gofundme.com/f/baba039s-brave-battle-i-can-breathe"
> html_nodes(read_html(tt),".a-image--background")
{xml_nodeset (1)}
[1] <div class="a-image a-image--background" style="background-image:url(https://images.gofundme.com/OBX ...
I can't get any further than this. I don't know how to extract the information from the html_nodes object.
***
Hello,
I think the R package xml2 will help here.
First save the result of html_nodes() to an object:
> nodes <- html_nodes(read_html(tt),".a-image--background")
Then look at its attributes:
> xml_attrs(nodes)
[[1]]
class
"a-image a-image--background"
style
"background-image:url(https://images.gofundme.com/OBX8u6ExqYkPs9mGp_zXCI-VYY4=/720x405/https://d2g8igdw686xgo.cloudfront.net/28360424_15211039920_r.jpeg)"
It seems you want the value of the "style" attribute:
> xml_attr(nodes,"style")
[1] "background-image:url(https://images.gofundme.com/OBX8u6ExqYkPs9mGp_zXCI-VYY4=/720x405/https://d2g8igdw686xgo.cloudfront.net/28360424_15211039920_r.jpeg)"
The result is a plain character string, so you can extract the information you need with your favourite regexp method. Does this help?
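For instance, here is a minimal sketch of one regexp approach in base R. It assumes that what you are after is the image URL wrapped in url(...); the names style and img_url are only illustrative, and the string is pasted in from the xml_attr() output above so the snippet runs without fetching the page.

```r
# The "style" value as returned by xml_attr(nodes, "style") above
style <- "background-image:url(https://images.gofundme.com/OBX8u6ExqYkPs9mGp_zXCI-VYY4=/720x405/https://d2g8igdw686xgo.cloudfront.net/28360424_15211039920_r.jpeg)"

# Keep whatever sits between "url(" and the closing ")"
# (assumption: the style holds a single url(...) entry)
img_url <- sub("^.*url\\((.*)\\)\\s*;?\\s*$", "\\1", style, perl = TRUE)

# Drop optional quotes around the URL, in case the page uses url("...")
img_url <- gsub("^[\"']|[\"']$", "", img_url)

img_url
```

With live data you would replace the pasted string with xml_attr(nodes, "style"); sub() and gsub() are vectorised, so the same two lines also work if the node set has more than one element.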
Regards,
Jason