This is a project to build filtering capabilities comparable to those of Muffin into Squid. It consists of a filtering framework and a set of filter modules. Currently available filters:
Usually, a filtering proxy runs standalone and does nothing but filtering. Users have to configure this proxy in their browsers, and if they use a caching proxy too, chain them after the filter. In situations where the user runs Squid anyway (mostly because of caching for different browsers or a small LAN), it is convenient to build this capability into Squid.
You need the Squid sources, everything for compiling them, GNU patch, autoconf 2.50 and automake 1.6.
gzip -cd squid-3.0stable9-filter-0.2.patch.gz | patch -p1
sh bootstrap.sh
sh configure (options...) --enable-filters
--with-morefilters="/path/to/file.cc /path/to/other.cc.."
filter_module name [ arguments... ] [ * {allow|deny} acls... ]
This directive tells Squid to define a filter of the given type. The filter modules can take arguments as documented for the individual modules. Arguments are separated by whitespace, with the same quoting mechanisms as used elsewhere in squid.conf. A filter type can be specified in more than one filter_module line; in that case several filter instances with different parameters are created. See below on chaining filters.
Each filter line can optionally take an ACL list. This must start
with an asterisk (surrounded by whitespace), followed by either the
keyword allow or deny, followed by one or more ACLs defined before
the filter line.
A filter with no ACL specification is applied to every request. A filter with an ACL specification is applied to each request that the ACL denies. In other words, a request that the ACL allows bypasses the filter.
There is a new option for the http_port directive:
The flag nofilter specifies that requests arriving on this port will
not be filtered. Effectively, this runs a filtering and a
non-filtering proxy at once, on different ports.
A pattern file contains regular expressions (in grep -E syntax), one
pattern per line, against which the URI is matched. Blank lines and
lines starting with a "number sign" are ignored in the usual fashion.
Whenever a pattern file is changed, it is reloaded automatically at
the next request; no reconfigure is needed. A pattern is marked as
case-insensitive by prepending a dash. (To place a real dash at the
start of a pattern, use a class, like [-].) Patterns may not contain
literal TABs; use \t instead.
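The pattern file rules above can be illustrated with a short Python sketch. It is not the PatFile implementation: the function names are hypothetical, Python's re syntax only approximates grep -E syntax, and unanchored matching against the URI is an assumption.

```python
import re

def parse_pattern_lines(lines):
    """Compile the patterns of a simple pattern list (illustrative only)."""
    patterns = []
    for raw in lines:
        line = raw.rstrip("\n")
        # Blank lines and lines starting with a number sign are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        flags = 0
        # A leading dash marks the pattern as case-insensitive.
        if line.startswith("-"):
            flags = re.IGNORECASE
            line = line[1:]
        patterns.append(re.compile(line, flags))
    return patterns

def matches_any(patterns, uri):
    """Return True if any pattern matches somewhere in the URI (assumed unanchored)."""
    return any(p.search(uri) for p in patterns)
```

A pattern line such as -ads\.example\.com would then match the host in any letter case, while [-]banner matches a literal dash followed by "banner" case-sensitively.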
There are two types of pattern files: simple lists and replacement lists.
Replacement lists work in a sed s///-like fashion. This type of
pattern file is used by the redirection filter. Each line in the file
consists of two elements separated by (at least) one TAB character.
The first is a pattern, the second a replacement. The replacement may
contain \1, \2... \9 references to parenthesized subpatterns; \0
means the whole match and \* means the complete original URI. The
replacement may also contain \_0, \_1..., \_* references which copy
the same subpatterns in modified base64 encoding (see below).
A special replacement can be given as a shortcut for patterns which have no explicit replacement. This default is specified as replacement for the pattern consisting of a single exclamation mark, which should be the first line in the file. Negative match does not work in a replacement list.
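How the references in a replacement expand can be sketched in Python as follows. This is only an illustration under assumptions: expand_replacement is a hypothetical name, Python's re module stands in for the grep -E matcher, and the \_ (modified base64) references are left out for brevity.

```python
import re

# Matches \0 .. \9 and \* inside a replacement string.
_REFERENCE = re.compile(r"\\([0-9*])")

def expand_replacement(replacement, match, uri):
    """Expand \\0..\\9 and \\* in a replacement, given a regex match on the URI."""
    def substitute(ref):
        which = ref.group(1)
        if which == "*":
            return uri                      # \* is the complete original URI
        index = int(which)
        if index > match.re.groups:
            return ""                       # no such subpattern in this pattern
        return match.group(index) or ""     # \0 is the whole match
    return _REFERENCE.sub(substitute, replacement)
```

For instance, with the pattern ads\.([a-z]+) matched against http://ads.example.com/x, the replacement \0|\1|\* expands to ads.example|example|http://ads.example.com/x.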
The modified base64 encoding is standard base64 with the characters
+ / = (plus, slash, equals) replaced by - _ . (dash, underscore, dot)
respectively. This yields a URL-safe encoding of request URIs or
parts thereof (which may be useful for script-based postprocessing of
redirect results).
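A minimal Python sketch of this character substitution, for example for a script that postprocesses redirect results. The function names are hypothetical, and the sketch assumes the input is otherwise encoded as standard base64.

```python
import base64

# Standard base64 characters and their URL-safe substitutes, as described above.
_TO_MODIFIED = str.maketrans("+/=", "-_.")
_FROM_MODIFIED = str.maketrans("-_.", "+/=")

def modified_base64_encode(data: bytes) -> str:
    """Encode bytes as base64 with + / = replaced by - _ . respectively."""
    return base64.b64encode(data).decode("ascii").translate(_TO_MODIFIED)

def modified_base64_decode(text: str) -> bytes:
    """Reverse the substitution and decode the result as standard base64."""
    return base64.b64decode(text.translate(_FROM_MODIFIED))
```

A redirect target script could call modified_base64_decode on the value produced by a \_* reference to recover the original URI.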
A request_header_replace clause must be set up to filter out the
Accept-Encoding and Accept-Ranges request headers.
request_header_replace Accept-Encoding identity
request_header_replace Accept-Ranges none
See below for the exact reason.
Currently there are the following filters:
Removes JavaScript (SCRIPT tags, on... handlers and browser-specific
ways of inserting JavaScript into tag attributes) from HTML pages.
(To block JavaScript files as well, use an ACL against the
"application/x-javascript" file type.)
OBJECT tags from HTML pages. The tags are preserved; only the classid
parameter is replaced by a dummy, so the page will still be processed
correctly (as if by a non-ActiveX browser).
This filter takes a pattern file as an optional argument. This file
contains a list of CLSIDs which are allowed through.
Each content filter specifies the MIME content type(s) to which it
applies (like image/gif for the gifanim module) and ignores all other
types.
Content filters can be chained. When more than one filter applies to a given MIME content type, every filter operates on the results of its predecessor.
To bypass filtering for a request, append .X.nofilter to the host
name in the URL, where X is replaced by Squid's visible host name.
Example: to get http://www.example.com/foo/bar unfiltered from a
Squid called squid.cache, use the URI
http://www.example.com.squid.cache.nofilter/foo/bar.
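The host name rewriting in the example above can be sketched in Python, for instance for a bookmarklet-style helper script. The function name nofilter_url is hypothetical, and keeping an explicit port after the tag is an assumption that the text does not cover.

```python
from urllib.parse import urlsplit, urlunsplit

def nofilter_url(url: str, visible_hostname: str) -> str:
    """Append .<visible_hostname>.nofilter to the host name of a URL."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError("URL has no host name: %r" % url)
    netloc = "%s.%s.nofilter" % (parts.hostname, visible_hostname)
    # Assumption: an explicit port stays attached after the NOFILTER tag.
    if parts.port is not None:
        netloc += ":%d" % parts.port
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Calling nofilter_url("http://www.example.com/foo/bar", "squid.cache") yields http://www.example.com.squid.cache.nofilter/foo/bar, matching the example.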
The NOFILTER tag as part of the hostname in the URL implies that correctly written relative links, including images, linked scripts etc. on the same server, will also be unfiltered. Apply the necessary caution.
The reason for including Squid's host name is to prevent web servers
from adding the NOFILTER tag to their junk banner links themselves.
This works best when visible_hostname, unique_hostname and the
canonical (DNS) host name of the proxy are all different and not too
closely related, because the origin server sees the latter two but
not the first.
Since ".nofilter" is not a valid top level domain, it can't clash with real host names.
Another possible way to bypass filters is to use a non-filtering port, as described above. Requests arriving on that port will always bypass all filters.
A class diagram (created with ArgoUML) for the filter classes is here: http://sites.inka.de/bigred/devel/filter-patch.zargo.
PatFile provides the pattern file facility described above. It is
included in the Squid core and described in PatFile.h.
The following debug sections and levels (see the debug_options
directive) are used:
Section 92 | Filter framework |
Section 93 | Filter modules |
Section 94 | Library modules (PatFile etc.) |
Level 1   | Error messages |
Level 3   | "Filter caught something" messages |
Level 4   | Initialization/finalization messages |
Level 5   | Initialization/finalization trace |
Level 8   | Minor trace |
Level 9   | Full trace (big!) |
script applied to a file with compression encoding can silently
deliver corrupted files, but mostly this is caught by the HTML parser
not accepting null characters.)
For this reason, the Accept-Encoding headers should always be
filtered out with an appropriate header_replace clause. Setting
Accept-Encoding: identity forces the origin server to always send
unencoded data. Another header_replace which sets the Accept-Ranges
header to none keeps the client from ever trying Range requests,
which obviously are unfilterable too.
The cache always stores unfiltered objects. Content filtering happens in the data path from cache or memory to the client. The filter object is expected to copy the data into a new buffer, so it can do anything with it, including insertions and deletions.
The only exception to the rule that filtering happens only in the path to the client is the filters which alter the request. This applies to the redirect module.
In a cache hierarchy, a filtering cache should only be placed at the bottom, i.e. where only clients directly access it. If another cache sits between the filter and client, that one will cache filtered pages and break the NOFILTER feature.
The load_module directive has been replaced by filter_module, with
slightly different syntax.
The nofilter_port directive has been replaced by the nofilter option
in http_port.
acl allow_activex url_regex "/usr/local/squid/etc/allowlist_activex"
filter_module activex * allow allow_activex
The double quotes around the path tell the ACL to read its patterns
from a file. The syntax of this file should be compatible with the
old allow lists. You have to reconfigure when this file is changed,
however.
header_access clauses (use Cookie and Set-Cookie with ACLs for allow lists).
rep_mime_type ACLs.
The Junkbusters web page has one of the oldest and best known web filters as well as a very comprehensive resources list covering most issues from "What is this all about?" to a list of filtering software (by now most of them are either for Windows or for pay or both, which indicates there is a real demand for filtering).
The latest release is filter 0.2 for Squid 3.0.STABLE9. Download at http://sites.inka.de/bigred/devel/squid-3.0stable9-filter-0.2.patch.gz.
For use and distribution of this package, the same terms and conditions as for the Squid package itself (i.e. the GNU General Public License) apply. Note, however, that using a version or installation setup which has the NOFILTER feature removed or restricted in any way is in gross contradiction to the author's intentions, and people who do so should feel guilty of abuse.