Prepared By Visolve Squid Team
Abstract
Introduction
What is Transparent Caching?
How to implement transparent caching using Squid?
Policy Based Routing
Smart Switching
How Switch Operates
Comparison of L4 and L7 Switches
Squid Box as a Gateway
About IP chains
About IP tables
Squid in transparent mode
Hardware Recommendations
Comparative Study
Conclusion
About Visolve.com
Internet traffic is growing at a phenomenal rate, and such rapid increase in network traffic has created numerous networking challenges for ISPs and enterprises, like increased bandwidth cost for ISP's, bandwidth congestion, poor response time for end users and cost-efficient network / customer base scalability. The most efficient solution to these networking problems is to use your existing network infrastructure to localize traffic patterns, enabling content requests to be fulfilled locally. Increased speed/ decreased latency without the cost of additional bandwidth has catapulted caching software and appliances into a prominent place among the fastest growing segments of Internet technology.
A transparent cache is so named because it works by intercepting the network traffic transparently to the browser. In this mode, the cache short-circuits the retrieval process if the desired file is in the cache. Transparent caches are especially useful to ISPs because they require no browser setup modification. Transparent caches are also the simplest way to use a cache internally on a network, because they do not require explicit coordination with other caches. The purpose of this white paper is to discuss the various methods of implementating transparent caching using Squid on Linux with a policy based router, an externalL4 switch, and an L4 switch inside the Linux Squid box. First, some basic concepts will be discussed, followed by the advantages of transparent caching, and finally redirecting packets to Squid using IP-Chains.
The full explanation about the term "Transparent Caching and Transparent Proxying" depends on the context, but we can assume the context here is HTTP proxy/caches with transparent hijacking of port 80, which is the default HTTP traffic in the internet.
The difference is that the cache includes a cache, while the proxy only proxies without caching. The term transparent is overloaded, having different meanings depending on the situation. To some it means a setup that hijacks port 80 traffic where the client tried to go to other servers, to some it means a semantically transparent proxy that does not change the meaning or content of requests/replies. There is no such thing as a truly transparent proxy, only semitransparent and certainly not such a thing as a truly transparent cache. Squid can be configured to act transparently. In this mode, clients are not required to configure their browsers to access the cache, but Squid will transparently pick up the appropriate packets and cache requests. This solves the biggest problem with caching: i.e. getting users to use the cache server.
Advantages of Transparent Caching
As might be expected, the advantages and disadvantages of transparent caching are largely the reverse of those cited for proxy caching. In the advantages category we have the following :
Disadvantages of Transparent Caching
How to implement transparent caching using squid?
Transparent caching can be implemented by three ways.
Transparent caching using policy based routing
This arrangement uses a router to route WWW traffic (via policy routing) to the Squid cache box. Because the router can change only the IP address of a packet, the Squid Linux box must be configured to redirect the destination port of the packet. The Router policy redirects packets with port 80 to the Squid box and redirects other traffic to the Internet directly. To set the router policy rules, refer to your router's manual. Using the IP-Chains tool in the Squid box, one can redirect packets which are sent by router to the Squid application. See later chapters for more details about configuring IP-Chains. Since some routers (e.g. Cisco series) do not recognize Squid cache failures, if Squid does malfunction, service to the WWW breaks. To overcome this problem, a cache guard (a Perl script running on the computer inside serviced network) can be use to regularly query the Squid box for a cached object. When the cache guard fails repeatedly to retrieve the object from the cache, the cache guard changes the router configuration (by SNMP) to pass the WWW traffic directly to the Internet. In this way, a fail over strategy can be implemented
.
Transparent caching using smart switching
This arrangement uses a Layer 4 or Layer 7 router to route WWW traffic to the transparent Squid cache. Because the router can change only the IP address of a packet, the Squid Linux box must be configured to redirect the destination port of the packet. Both L4 and L7 switches intercept outgoing traffic and pass HTTP requests, typically port 80 traffic, to the squid proxy server that the switch is configured to recognize. The switch forwards non- HTTP traffic to other destinations. The architecture shows a switch passing HTTP traffic to the Squid proxy server and non-HTTP traffic to the Internet
L4 and L7 switches derive their names from the level of the Open Systems Interconnection (OSI) Reference model at which they operate. The capabilities of these switches are determined by the layer in the OSI model at which they operate.
An L7 switch has the same features that an L4 switch has, plus additional, more sophisticated features, as described in this section.
Similar features
How the L7 switch is different
Performance comparison between L4 and L7 switches :
This setup is used in small LAN or WAN where number of clients are less. Here it is mandatory to configure Squid box as a default Gateway in all machines. This method requires more configuration in the Squid box as compared to the other methods.
Squid box Configuration
Steps to be followed to implement Transparent caching.
pchains is an extremely powerful program that allows the user to set up complex IP filtering and accounting rules.
Purpose : To set up a firewall in the Squid/Linux box with the minimal options needed for transparent proxy. Here is the simplest method.
Details : Make sure that the following options in the kernel are enabled.
Else you must recompile the kernel. Also, make sure IP-forwarding is enabled in the kernel using the following command.
cat /proc/sys/net/ipv4/ip_forward
This should return 1. Else, do the command
echo 1 > /proc/sys/net/ipv4/ip_forward
PORT Redirection
The following command enables transparent caching :
ipchains -A input -j REDIRECT 3128 -p tcp -s 0.0.0.0/0 -d 0.0.0.0/0 80
This command redirects all the requests, irrespective of source IP Addresses, with destination port 80 to destination port 3128 in which Squid (in Transparent mode) is running.
IP-Masquerading
This is essential when the third method is implemented, where as this is not applicable for the other two methods. When squid is in transparent mode, the local network will not be able to access other protocols available on Squid. Squid supports ftp, gopher, and https only when clients of each specified protocol are aware of the cache. Hence this is not possible when squid is in transparent mode. Here, IP-Masquerading can be used to enable access to other protocols. The following are the rules for masquerading the protocols SMTP, FTP, POP, SSH, TELNET, and HTTPS. Here, assume the Squid box is connected to the Internet through a router using the eth1 interface.
ipchains -N good-bad
// New-User defined Rule is declared
ipchains -A forward -s 172.16.1.0/24 -i eth1 -j good-bad
// good-bad rule is added to the ipchains rule table. This is forwarding all the requests coming from the source 172.16.1.1 -254 to the interface through which internet is connecting to.
// In the following set of lines define the user defined rule good-bad
ipchains -A good-bad -p tcp -dport ssh -j MASQ
ipchains -A good-bad -p tcp -dport telnet -j MASQ
ipchains -A good-bad -p tcp -dport ftp -j MASQ
ipchains -A good-bad -p tcp -dport smtp -j MASQ
ipchains -A good-bad -p tcp -dport 110 -j MASQ
For more information about ipchain configurations visit us at https://www.visolve.com/squid
The iptables module ( for kernel 2.4.x series and above) which is a part of the Netfilter framework is a good upgrade of old ipchains( for kernel 2.2.x).
Kernel setup
To run the pure basics of iptables the following options are to be configured into the kernel :
CONFIG_PACKET
CONFIG_NETFILTER
And of course your interfaces are needed to be configured properly to work, ie. Ethernet, PPP and SLIP interfaces. The following are to be set in the kernel if more advanced options are needed :
Port redirection
iptables -t nat -A PREROUTING -p TCP --dport 80 -j REDIRECT --to-port 3128
The above rule redirects port 80 requests, irrespective of source ip address to port 3128 (or whichever port in which squid is running in transparent mode).
IP-Masquerading
iptables -t nat -A POSTROUTING -p TCP -s 0/0 --dport 21 -j MASQUERADE
iptables -t nat -A POSTROUTING -p TCP -d 0/0 --dport 20 -j MASQUERADE
iptables -t nat -A POSTROUTING -p TCP --dport 25 -j MASQUERADE
iptables -t nat -A POSTROUTING -p TCP --dport 110 -j MASQUERADE
iptables -t nat -A POSTROUTING -p TCP --dport 22 -j MASQUERADE
iptables -t nat -A POSTROUTING -p TCP --dport 23 -j MASQUERADE
The above rules are essential when we connect modem or squid is in between two different network to make TELNET, FTP, SMTP, POP, HTTPS to communicate to INTERNET.
To Run Squid in a transparent mode, enable the following directives in Squid.conf.
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
The httpd_accel_port directive tells which port the origin server is listening on (port 80). Squid does not need to know how requests arrive at its listening port (3128). This must be done by the operating system or router. Squid sees a request for a URL and connects to port 80 on the server where it thinks the URL resides. Squid does not have any control over what types of request arrive to it. If Squid is listening to port 3128 then it assumes the data arriving there is a protocol it can handler (HTTP, FTP, etc). The type of packets that are redirected to Squid is determined entirely by the TCP/IP implementation of the host (i.e. ipchains forwarding) and is out of Squid's control.
Processor :Intel P3 550MHz CPU
Hard Drive :For high performance and stability, a SCSI disk is highly recommended or use UDMA 66 Drive instead of IDE Disk. Typically 9 GB Disks are preferred.
Ethernet :High performance Ethernet is preferred.
RAM :For every 1 GB cache, 10 MB of RAM is required. For the above case, Minimum of 300 MB is required preferably 512 MB RAM.
Policy based routing
Advantages:
Policy based routing
Disadvantages :
Using smart switching
Advantages :
Disadvantages:
Squid box as a Gateway
Advantages:
Disadvantages:
This paper has outlined the various methods of implementing Transparent Caching using Squid. Each of these methods has its advantages, the choice is left to the implementation team which has to decide based on their network, data access pattern, volume of data, request rate, criticality and budget available. Web caching is a matured technology and Squid is very widely used web caching application, the choice and method of implementation as said may vary, although other features present in the implementation may continue or be enhanced, the underlying fundamentals will be the same as those discussed here. There are other tools available to supplement the system like reporting tools, configuration and management tools and load balancing for implementing multiple cache boxes. And finally the overall success largely depends on the configuration and fine-tuning of both Squid and Linux.