{"id":349,"date":"2013-07-09T10:33:09","date_gmt":"2013-07-09T15:33:09","guid":{"rendered":"http:\/\/www.marcblase.com\/blog\/?p=349"},"modified":"2019-08-27T09:33:47","modified_gmt":"2019-08-27T14:33:47","slug":"download-css-images-with-wget","status":"publish","type":"post","link":"https:\/\/ma.rcbla.se\/blog\/2013\/07\/download-css-images-with-wget\/","title":{"rendered":"Download CSS images with WGET"},"content":{"rendered":"<p>Sometimes you just don&#8217;t have the access you need when working on a site. While the thought of downloading all the included files individually sounds like fun, there&#8217;s always a better use of time. So I did some research and found that the best solution is via the command line using WGET. Here&#8217;s the command:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nwget --page-requisites http:\/\/site\/path\/page.html\r\n<\/pre>\n<p>If you need to download a whole site, give this a go:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nwget -m -p -E -k -K -np http:\/\/site\/path\/\r\n<\/pre>\n<p><small>Thanks to the Stack Overflow users for this <a href=\"http:\/\/stackoverflow.com\/questions\/1581551\/download-webpage-and-dependencies-including-css-images\" target=\"_blank\" rel=\"noopener noreferrer\">solution<\/a>.<\/small><\/p>\n<p>UPDATE:<\/p>\n<p>I ran into an issue while trying to back up a WP site for a client whose original dev had taken it hostage and wouldn&#8217;t release it. They had turned on the &#8220;Discourage search engines&#8221; option in the admin, and <code>WGET<\/code> was failing. So here&#8217;s a new method to circumvent a robots.txt file with <code>Disallow: \/<\/code>.<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nwget -e robots=off -k -K -E -r -l 10 -p -N -F --restrict-file-names=windows -nH http:\/\/DOMAIN.TLD\/\r\n<\/pre>\n<p>UPDATE:<\/p>\n<p>I ran into an issue with a client site hosted on a SaaS Member Solutions website. 
Some of the assets, URLs, etc., if not placed properly, would use the client&#8217;s proxy URL for the service, e.g. https:\/\/client-service-url.com, so when running the above WGET commands anything that used the proxy URL would not be downloaded. Thus spawned the following, which uses the <code>-D<\/code> flag to list the domains to follow.<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nwget -e robots=off -k -K -r -l 3 -E -p -N -F --restrict-file-names=windows -D CLIENT_DOMAIN.com,CLIENT_VARIANT_DOMAIN.com -w 1 -nH https:\/\/CLIENT_DOMAIN.com\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Sometimes you just don&#8217;t have the access you need when working on a site. While the thought of downloading all the included files individually sounds like fun, there&#8217;s always a better use of time. So I did some research and found that the best solution is via the command line using WGET. Here&#8217;s the command: [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[51,3,34],"tags":[],"class_list":["post-349","post","type-post","status-publish","format-standard","hentry","category-cli","category-discoveries","category-networking"],"_links":{"self":[{"href":"https:\/\/ma.rcbla.se\/blog\/wp-json\/wp\/v2\/posts\/349","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ma.rcbla.se\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ma.rcbla.se\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ma.rcbla.se\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ma.rcbla.se\/blog\/wp-json\/wp\/v2\/comments?post=349"}],"version-history":[{"count":7,"href":"https:\/\/ma.rcbla.se\/blog\/wp-json\/wp\/v2\/posts\/349\/revisions"}],"predecessor-version":[{"id":624,"href":"https:\/\/ma.rcbla.se\/blog\/wp-json\/wp\/v2\/posts\/349\/revisions\/624
"}],"wp:attachment":[{"href":"https:\/\/ma.rcbla.se\/blog\/wp-json\/wp\/v2\/media?parent=349"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ma.rcbla.se\/blog\/wp-json\/wp\/v2\/categories?post=349"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ma.rcbla.se\/blog\/wp-json\/wp\/v2\/tags?post=349"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}