Java Web Crawler: Difference between revisions

From Chorke Wiki
A web crawler, or spider, is a type of bot that is typically operated by search engines such as Google and Bing. Its purpose is to index the content of websites across the Internet so that those websites can appear in search engine results.
==Selenium Docker==
Each column below starts one standalone Selenium browser container. The two variants in a column are alternatives: either enlarge the container's shared memory with <code>--shm-size 2g</code> or mount the host's <code>/dev/shm</code>, since browsers can crash when shared memory runs out. All three containers publish host port 4444, so run one at a time or change the host side of <code>--publish</code>.
{|
| valign="top" |
<source lang="bash">
docker run --detach \
--publish 4444:4444 \
--hostname firefox \
--name firefox \
--shm-size 2g \
selenium/standalone-firefox:80.0
</source>
<code>'''--OR--'''</code>
<source lang="bash">
docker run --detach \
--publish 4444:4444 \
--hostname firefox \
--name firefox \
--volume /dev/shm:/dev/shm \
selenium/standalone-firefox:80.0
</source>
http://localhost:4444/wd/hub
| valign="top" |
<source lang="bash">
docker run --detach \
--publish 4444:4444 \
--hostname chrome \
--name chrome \
--shm-size 2g \
selenium/standalone-chrome:85.0
</source>
<code>'''--OR--'''</code>
<source lang="bash">
docker run --detach \
--publish 4444:4444 \
--hostname chrome \
--name chrome \
--volume /dev/shm:/dev/shm \
selenium/standalone-chrome:85.0
</source>
http://localhost:4444/wd/hub
| valign="top" |
<source lang="bash">
docker run --detach \
--publish 4444:4444 \
--hostname opera \
--name opera \
--shm-size 2g \
selenium/standalone-opera:71.0
</source>
<code>'''--OR--'''</code>
<source lang="bash">
docker run --detach \
--publish 4444:4444 \
--hostname opera \
--name opera \
--volume /dev/shm:/dev/shm \
selenium/standalone-opera:71.0
</source>
http://localhost:4444/wd/hub
|}
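Before pointing a crawler at one of the containers above, it can help to confirm that the server is accepting sessions. The following is a minimal sketch, not a definitive implementation. It assumes Java 11 or later and assumes that the server answers a status request at <code>/wd/hub/status</code> with JSON containing a <code>"ready": true</code> flag. The class name <code>HubStatus</code> and its helper methods are hypothetical names chosen for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Pattern;

// Hypothetical helper: checks whether a standalone Selenium container is ready.
final class HubStatus {
    // Crude match for the "ready": true flag; this is not a full JSON parse.
    private static final Pattern READY = Pattern.compile("\"ready\"\\s*:\\s*true");

    // Builds the status URL under the /wd/hub base path listed on this page
    // (the /status suffix is an assumption about the server's endpoint).
    static URI statusUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/wd/hub/status");
    }

    // Returns true only when the status body reports ready == true.
    static boolean isReady(String statusJson) {
        return statusJson != null && READY.matcher(statusJson).find();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(statusUri("localhost", 4444))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        // Throws an IOException if no container is listening on the port.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(isReady(response.body())
                ? "Selenium is ready"
                : "Selenium is not ready yet");
    }
}
```

Only the JDK's built-in HTTP client is used here, so the check runs without adding the Selenium client library to the classpath. A crawler would normally go on to open a session against http://localhost:4444/wd/hub with that library.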


==References==

Revision as of 08:03, 9 October 2020
