This seems to be the more appropriate URL:
http://hosts-file.net/ad_servers.txt.
I compared it to the original URL using opendiff and they are exactly
the same.
In my case there had been no whitespace in piholeIP (I configured it statically because 'hostname -I' returns multiple IP addresses, which results in an erroneous hosts file), so it is safer to add one between the IP and the hostname.
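A minimal sketch of the whitespace fix, assuming the domains arrive one per line on stdin. The function name `to_hosts_lines` and the address 192.168.1.10 are placeholders for illustration and are not taken from gravity.sh:

```shell
#!/bin/sh
# Hypothetical helper: prefix every domain with the Pi's IP and an
# explicit space, so the entry is valid even when piholeIP itself
# carries no trailing whitespace.
to_hosts_lines() {
    # $1 = IP address; domains are read from stdin, one per line.
    # NF skips blank lines; $1 drops stray whitespace around the domain.
    awk -v ip="$1" 'NF { print ip " " $1 }'
}

piholeIP="192.168.1.10"   # assumed static address, no trailing space
printf 'ads.example.com\ntracker.example.net\n' | to_hosts_lines "$piholeIP"
```

Putting the separator in the formatting step means the result no longer depends on how piholeIP was set.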
Further, I removed the obsolete latentBlacklist variable.
Due to so many users having issues with this list blocking legitimate
domains such as microsoft.com, apple.com, xkcd.com and more, I am
turning it off by default. While this drastically reduces the number
of domains blocked, ad blocking still seems to work quite well.
Long-time users can simply uncomment the list, but this will make it so
new users have a more pleasant experience.
If you already have the mahakala list, you will need to remove it first
with this command:
sudo rm /etc/pihole/list.2.adblock.mahakala.is.domains
There are a few other lists I am looking at including, but this will
provide an immediate fix.
I'm not 100% sure if you can add multiple IP addresses to one domain on
a single line, so I just broke it up into two. It makes the file
bigger and longer, but I'll have to see how it works.
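Roughly what the two-line approach looks like, assuming the Pi has one IPv4 and one IPv6 address. A hosts line takes a single address followed by one or more names, so each domain gets one line per address. `dual_stack_lines` and both addresses are made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: emit one hosts line per address family for
# every domain read from stdin.
dual_stack_lines() {
    # $1 = IPv4 address, $2 = IPv6 address
    awk -v v4="$1" -v v6="$2" 'NF { print v4 " " $1; print v6 " " $1 }'
}

# Assumed addresses, for illustration only.
printf 'ads.example.com\n' | dual_stack_lines "192.168.1.10" "fd00::10"
```

This doubles the line count of the resulting file, which matches the size increase described above.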
Due to users having issues with the IP not being entered properly, I am restoring the original way to store `piholeIP` until a better solution can be formed.
This will increase the swap file to 500MB before downloading the lists.
Most of the issue comes from the mahakala list, which is very large. If
no swap file is found, one is created.
Prepend "^" to start of latentWhitelist.txt lines.
The -x switch requires a full line match of the regexp, whereas -w
will try to find the match somewhere in the line, looking for word
breaks. Combined with turning the whitelist lines into full regexps,
this results in significantly faster parsing.
Having "^" prepended to the lines also keeps false whitelisting from
occurring, such as the following example:
If whitelist.txt contains "google.com" it would whitelist many other
sites that end in "google.com" as long as there is a non-word
character preceding the google (such as "-", or ".").
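A small sketch of the full-line whitelist match, not the exact code in gravity.sh. The function name `filter_whitelist` and its arguments are hypothetical, and escaping the dots is my own assumption about what "full regexps" means here:

```shell
#!/bin/sh
# Hypothetical helper: print the blocklist entries that are NOT whitelisted.
filter_whitelist() {
    # $1 = whitelist file (one domain per line), $2 = blocklist file
    patterns=$(mktemp)
    # Escape the dots and prepend "^" so each whitelist line is an
    # anchored regexp (assumed form of latentWhitelist.txt).
    sed -e 's/\./\\./g' -e 's/^/^/' "$1" > "$patterns"
    # -x: the pattern must match the whole line; -v: keep the
    # non-matching lines; -f: read the patterns from a file.
    grep -vxf "$patterns" "$2"
    rm -f "$patterns"
}
```

With google.com in the whitelist, only the exact line google.com is dropped from the blocklist; ads-google.com and tracker.google.com stay blocked.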
manually running gravity.sh
This will print "Getting $domain list... " for each domain, followed
by either "Done" if data was received and validated, or "Skipping
list because it does not have any new entries" if no updates were
needed.
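The messages could be produced along these lines. This is only a sketch: `report_list` is a hypothetical name, and comparing the fresh download against the saved copy with cmp is an assumption standing in for whatever validation gravity.sh really does:

```shell
#!/bin/sh
# Hypothetical helper: report on one list and keep the new copy only
# when it is non-empty and differs from the saved one.
report_list() {
    domain=$1   # name shown in the status line
    new=$2      # freshly downloaded file
    saved=$3    # copy kept from the previous run (may not exist yet)
    printf 'Getting %s list... ' "$domain"
    if [ -s "$new" ] && ! cmp -s "$new" "$saved"; then
        mv "$new" "$saved"
        echo "Done"
    else
        rm -f "$new"
        echo "Skipping list because it does not have any new entries"
    fi
}
```

Calling it twice with the same downloaded content prints "Done" the first time and the skip message the second time.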
I also wanted to replace the for loop iterating over indices with
something like:
`for url in ${sources[@]}`
It made the use of `$i` in the save location more annoying though.
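For reference, one way to iterate over the values and still have `$i` is a manual counter. This is a sketch requiring bash, not what was committed; `list_save_locations` and the second URL are hypothetical, and the `list.$i.$domain.domains` name is inferred from the file name mentioned earlier in this log:

```shell
#!/usr/bin/env bash
# Example sources; the second URL is a placeholder.
sources=("http://hosts-file.net/ad_servers.txt"
         "http://lists.example.org/hosts")

# Hypothetical helper: print the save location for each source.
list_save_locations() {
    local i=0 url domain
    for url in "${sources[@]}"; do
        # The third "/"-separated field of the URL is the host name.
        domain=$(echo "$url" | cut -d/ -f3)
        echo "list.$i.$domain.domains"
        i=$((i + 1))
    done
}

list_save_locations
```

The extra counter is the annoyance mentioned above: the loop reads better, but the index has to be maintained by hand.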
Some people use a hostname other than raspberrypi, so their hostname
did not resolve to 127.0.0.1. I replaced that hardcoded value with a
variable so that does not happen.
I also added a few comments and minor formatting adjustments.
Pushing files so they are available when the new article gets posted.
If the Pi's loopback is set in the hosts file, clients using it as a
DNS server will try to connect to their own loopback, which does not
have a Web server. So the real IP of the Pi is used. It is
recommended to use a static IP since this will be acting as a server.
Made one small change from some hard coded values to a variable.
Originally, I had this set to /run/shm (in RAM) but ran into errors
when the list reached 900,000 entries.
Then I moved it to /tmp.
Finally, I decided to just put the files in the pihole dir so they are
available after reboots. This will help with only downloading the
lists when absolutely needed--respecting the bandwidth of the people
serving the lists.
It is also possible to add addn-hosts=/path/to/hosts.conf within the
dnsmasq.conf file if you don't want to use hosts. For simplicity and
speed, I just use the regular hosts file.
Still need to get lighttpd to use IPv6. I am doing this because some
ads can get through using IPv6 if the IPv4 version is blocked. Also,
performance seems fine even though it doubles the file size.
Also added a few comments for better documentation.