| Abstract | Introduction | What is Transparent Caching? | How to implement transparent caching using Squid? | Policy Based Routing | Smart Switching | How Switch Operates | Comparison of L4 and L7 Switches | Squid Box as a Gateway | About IP chains | About IP tables | Squid in transparent mode | Hardware Recommendations | Comparative Study | Conclusion | About Visolve.com |
||||||
Abstract |
Internet traffic is growing at a phenomenal rate, and this rapid increase has created numerous networking challenges for ISPs and enterprises: higher bandwidth costs, bandwidth congestion, poor response times for end users, and the difficulty of scaling the network and customer base cost-efficiently. The most efficient solution to these networking problems is to use your existing network infrastructure to localize traffic patterns, enabling content requests to be fulfilled locally. Increased speed and decreased latency without the cost of additional bandwidth have catapulted caching software and appliances into a prominent place among the fastest growing segments of Internet technology. |
|||||
Introduction |
A transparent cache is so named because it intercepts network traffic transparently to the browser. In this mode, the cache short-circuits the retrieval process if the desired file is already in the cache. Transparent caches are especially useful to ISPs because they require no change to browser settings. Transparent caches are also the simplest way to use a cache internally on a network, because they do not require explicit coordination with other caches. The purpose of this white paper is to discuss the various methods of implementing transparent caching using Squid on Linux: with a policy based router, with an external L4 switch, and with an L4 switch inside the Linux Squid box. First, some basic concepts will be discussed, followed by the advantages of transparent caching, and finally redirecting packets to Squid using IP-Chains. | |||||
What is transparent caching? |
The full meaning of the terms "transparent caching" and "transparent proxying" depends on the context, but we can assume the context here is HTTP proxies/caches that transparently hijack port 80, the default port for HTTP traffic on the Internet. The difference between the two is that a cache stores the content it relays, while a proxy only relays requests without caching. The term "transparent" is overloaded, having different meanings depending on the situation. To some it means a setup that hijacks port 80 traffic that the client addressed to other servers; to others it means a semantically transparent proxy that does not change the meaning or content of requests and replies. There is no such thing as a truly transparent proxy, only semitransparent ones, and certainly no such thing as a truly transparent cache.
Squid can be configured to act transparently. In this mode, clients are not required to configure their browsers to access the cache; Squid transparently picks up the appropriate packets and caches the requests. This solves the biggest problem with caching: getting users to use the cache server.
Advantages of Transparent Caching
As might be expected, the advantages and disadvantages of transparent caching are largely the reverse of those cited for proxy caching. In the advantages category we have the following:
Disadvantages of Transparent Caching
|
|||||
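How to implement transparent caching using Squid? |
Transparent caching can be implemented in three ways: with policy based routing, with smart switching (an L4 or L7 switch), or with the Squid box acting as a gateway.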
|
|||||
Transparent caching using policy based routing |
This arrangement uses a router to route WWW traffic (via policy routing) to the Squid cache box. Because the router can change only the IP address of a packet, the Squid Linux box must be configured to redirect the destination port of the packet. The router policy sends packets with destination port 80 to the Squid box and sends all other traffic directly to the Internet. To set the router policy rules, refer to your router's manual. Using the IP-Chains tool on the Squid box, one can redirect the packets sent by the router to the Squid application. See the later sections for more details about configuring IP-Chains.
Some routers (e.g. Cisco series) do not recognize Squid cache failures, so if Squid malfunctions, service to the WWW breaks. To overcome this problem, a cache guard (a Perl script running on a computer inside the serviced network) can be used to regularly query the Squid box for a cached object. When the cache guard fails repeatedly to retrieve the object from the cache, it changes the router configuration (by SNMP) to pass the WWW traffic directly to the Internet. In this way, a failover strategy can be implemented. |
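The cache guard logic described above can be sketched as follows. This is only an illustration under stated assumptions: the paper describes a Perl script, while this sketch uses shell and curl; the proxy address, test URL, failure threshold, and the `bypass_cache` placeholder are all hypothetical, and the actual router change (SNMP in the paper) is device specific and is not shown.

```shell
#!/bin/sh
# Cache guard sketch: probe the cache and bypass it after repeated failures.
# PROXY, TEST_URL, MAX_FAILS and bypass_cache are illustrative assumptions.

PROXY="${PROXY:-http://192.168.1.2:3128}"        # hypothetical Squid address
TEST_URL="${TEST_URL:-http://www.example.com/}"  # object expected to be cached
MAX_FAILS="${MAX_FAILS:-3}"                      # consecutive failures tolerated

# Return 0 if the object can be fetched through Squid within 10 seconds.
probe_cache() {
    curl --silent --fail --max-time 10 --proxy "$PROXY" \
         --output /dev/null "$TEST_URL"
}

# Placeholder only: the real action would reconfigure the router.
bypass_cache() {
    echo "cache unreachable: switch the router policy to direct routing here"
}

# next_fail_count PREVIOUS_COUNT PROBE_STATUS
# Prints 0 after a successful probe, otherwise the previous count plus one.
next_fail_count() {
    if [ "$2" -eq 0 ]; then
        echo 0
    else
        echo $(( $1 + 1 ))
    fi
}

# Probe every 30 seconds; trigger the bypass after MAX_FAILS failures in a row.
guard_loop() {
    fails=0
    while :; do
        probe_cache
        status=$?
        fails=$(next_fail_count "$fails" "$status")
        if [ "$fails" -ge "$MAX_FAILS" ]; then
            bypass_cache
            fails=0
        fi
        sleep 30
    done
}
```

Calling `guard_loop` starts the monitor. Counting consecutive failures rather than reacting to a single one keeps a momentary timeout from diverting all traffic away from the cache.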
|||||
Transparent caching using smart switching |
This arrangement uses a Layer 4 or Layer 7 switch to route WWW traffic to the transparent Squid cache. Because the switch can change only the IP address of a packet, the Squid Linux box must be configured to redirect the destination port of the packet. Both L4 and L7 switches intercept outgoing traffic and pass HTTP requests, typically port 80 traffic, to the Squid proxy server that the switch is configured to recognize. The switch forwards non-HTTP traffic to other destinations. In this architecture, the switch passes HTTP traffic to the Squid proxy server and non-HTTP traffic to the Internet. |
|||||
How a Switch Operates |
L4 and L7 switches derive their names from the layer of the Open Systems Interconnection (OSI) Reference Model at which they operate, and that layer determines their capabilities.
|
|||||
Comparing L4 and L7 Switches |
An L7 switch has the same features that an L4 switch has, plus additional, more sophisticated features, as described in this section.
Similar features
How the L7 switch is different
Performance comparison between L4 and L7 switches:
|
|||||
Squid box as a gateway |
This setup is used in a small LAN or WAN where the number of clients is small. Here it is mandatory to configure the Squid box as the default gateway on all machines. This method requires more configuration on the Squid box than the other methods.
Squid box Configuration
|
|||||
About IP Chains
|
Ipchains is an extremely powerful program that allows the user to set up complex IP filtering and accounting rules.
Purpose: To set up a firewall on the Squid/Linux box with the minimal options needed for a transparent proxy. Here is the simplest method.
Details: Make sure that the following options in the kernel are enabled.
Check that IP forwarding is enabled:
    cat /proc/sys/net/ipv4/ip_forward
This should return 1. If it does not, run:
    echo 1 > /proc/sys/net/ipv4/ip_forward
Redirect web traffic to Squid:
    ipchains -A input -j REDIRECT 3128 -p tcp -s 0.0.0.0/0 -d 0.0.0.0/0 80
This command redirects all requests with destination port 80, irrespective of source IP address, to port 3128, on which Squid (in transparent mode) is running.
Declare a new user-defined chain and add it to the forward chain:
    ipchains -N good-bad
    ipchains -A forward -s 172.16.1.0/24 -i eth1 -j good-bad
This sends all requests from sources 172.16.1.1 to 172.16.1.254 that leave through eth1, the interface connected to the Internet, to the good-bad chain.
Define the rules of the user-defined chain good-bad:
    ipchains -A good-bad -p tcp --dport ssh -j MASQ
    ipchains -A good-bad -p tcp --dport telnet -j MASQ
    ipchains -A good-bad -p tcp --dport ftp -j MASQ
    ipchains -A good-bad -p tcp --dport smtp -j MASQ
    ipchains -A good-bad -p tcp --dport 110 -j MASQ
For more information about ipchains configuration, visit us at http://squid.visolve.com/squidconf.html |
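The forwarding check described above (read the flag, set it to 1 only if needed) can be wrapped in a small helper. This is a minimal sketch; the function name `ensure_ip_forward` and its optional file argument are assumptions introduced here so the logic can be tried against an ordinary file before it is run as root against the real kernel switch.

```shell
#!/bin/sh
# ensure_ip_forward [FILE]
# FILE defaults to the kernel's IP forwarding switch. The optional argument
# is an illustrative addition that allows a dry run on a scratch file.
ensure_ip_forward() {
    file="${1:-/proc/sys/net/ipv4/ip_forward}"
    if [ "$(cat "$file")" = "1" ]; then
        echo "forwarding already enabled"
    else
        # Writing 1 turns forwarding on (root is required for the real file).
        echo 1 > "$file" && echo "forwarding enabled"
    fi
}
```

Running `ensure_ip_forward` with no argument acts on `/proc/sys/net/ipv4/ip_forward`. The setting made this way does not survive a reboot, so the call would normally sit in a startup script alongside the ipchains rules.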
|||||
About IP tables |
The iptables module (for the 2.4.x kernel series and above), which is part of the Netfilter framework, is a good upgrade from the old ipchains (for kernel 2.2.x).
Kernel setup
To run the pure basics of iptables, the following options must be configured into the kernel:
    CONFIG_PACKET
    CONFIG_NETFILTER
And of course your interfaces, i.e. Ethernet, PPP and SLIP interfaces, must be configured properly to work. The following are to be set in the kernel if more advanced options are needed:
    iptables -t nat -A PREROUTING -p TCP --dport 80 -j REDIRECT --to-port 3128
The above rule redirects port 80 requests, irrespective of source IP address, to port 3128 (or whichever port Squid is running on in transparent mode).
IP-Masquerading
    iptables -t nat -A POSTROUTING -p TCP -s 0/0 --dport 21 -j MASQUERADE
    iptables -t nat -A POSTROUTING -p TCP -d 0/0 --dport 20 -j MASQUERADE
    iptables -t nat -A POSTROUTING -p TCP --dport 25 -j MASQUERADE
    iptables -t nat -A POSTROUTING -p TCP --dport 110 -j MASQUERADE
    iptables -t nat -A POSTROUTING -p TCP --dport 22 -j MASQUERADE
    iptables -t nat -A POSTROUTING -p TCP --dport 23 -j MASQUERADE
The above rules are essential when a modem is connected, or when Squid sits between two different networks, so that FTP, SMTP, POP, SSH and TELNET traffic can reach the Internet. |
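Because the masquerading rules above differ only in the port number, they can be generated in a loop. The sketch below prints the commands instead of executing them, so the rule set can be reviewed first; the function name `print_nat_rules` and its argument order are assumptions made for this example, not part of iptables.

```shell
#!/bin/sh
# print_nat_rules SQUID_PORT PORT...
# Prints (does not execute) one REDIRECT rule that sends port 80 traffic to
# SQUID_PORT, followed by one MASQUERADE rule per listed destination port.
print_nat_rules() {
    squid_port="$1"
    shift
    echo "iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port $squid_port"
    for port in "$@"; do
        echo "iptables -t nat -A POSTROUTING -p tcp --dport $port -j MASQUERADE"
    done
}
```

For the ports used in this section, `print_nat_rules 3128 20 21 22 23 25 110` lists the rules, and piping that output to `sh` as root would apply them.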
|||||
Squid in transparent mode |
To run Squid in transparent mode, enable the following directives in squid.conf:
    httpd_accel_host virtual
    httpd_accel_port 80
    httpd_accel_with_proxy on
    httpd_accel_uses_host_header on
The httpd_accel_port directive tells Squid which port the origin server is listening on (port 80). Squid does not need to know how requests arrive at its listening port (3128); this must be arranged by the operating system or router. Squid sees a request for a URL and connects to port 80 on the server where it thinks the URL resides. Squid does not have any control over what types of request arrive at it. If Squid is listening on port 3128, it assumes the data arriving there is a protocol it can handle (HTTP, FTP, etc.). The type of packets that are redirected to Squid is determined entirely by the TCP/IP implementation of the host (i.e. ipchains forwarding) and is out of Squid's control. |
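Before restarting Squid, it can help to confirm that all four directives listed above are present and not commented out. The following is a minimal sketch; the function name `check_transparent_conf` is hypothetical and the path to squid.conf is passed in by the caller because it varies between installations. It checks only the directive names used in this paper, so it should be adapted if your Squid version uses a different syntax for interception.

```shell
#!/bin/sh
# check_transparent_conf FILE
# Prints one "missing: ..." line for each directive from this section that is
# absent from FILE (or commented out). Returns 0 only if all four are present.
check_transparent_conf() {
    conf="$1"
    missing=0
    for directive in \
        "httpd_accel_host virtual" \
        "httpd_accel_port 80" \
        "httpd_accel_with_proxy on" \
        "httpd_accel_uses_host_header on"
    do
        name="${directive% *}"    # text before the space
        value="${directive#* }"   # text after the space
        # The directive must start the line (leading blanks allowed) and the
        # value must be followed by a blank or the end of the line.
        if ! grep -Eq "^[[:space:]]*${name}[[:space:]]+${value}([[:space:]]|\$)" "$conf"; then
            echo "missing: $directive"
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call would be `check_transparent_conf /etc/squid/squid.conf`, where the path shown is only an example location.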
|||||
Recommended Hardware for Transparent Caching |
Processor: Intel P3 550 MHz CPU
Hard Drive: For high performance and stability, a SCSI disk is highly recommended; otherwise use a UDMA 66 drive instead of a plain IDE disk. Typically 9 GB disks are preferred.
Ethernet: High performance Ethernet is preferred.
RAM: For every 1 GB of cache, 10 MB of RAM is required. For the above case, a minimum of 300 MB is required, preferably 512 MB of RAM. |
|||||
| Comparison |
Policy based routing
Advantages:
Disadvantages:
Using smart switching
Disadvantages:
Comparison of using a router to using an L4 or L7 switch
For many routers, complex filters, such as a filter for intercepting HTTP (port 80) or NNTP
Squid box as a Gateway
Disadvantages:
|
|||||
Conclusion
|
This paper has outlined the various methods of implementing transparent caching using Squid. Each of these methods has its advantages; the choice is left to the implementation team, which has to decide based on its network, data access pattern, volume of data, request rate, criticality and available budget. Web caching is a mature technology and Squid is a very widely used web caching application. The choice and method of implementation may vary, and other features of the implementation may continue or be enhanced, but the underlying fundamentals will be the same as those discussed here. Other tools are available to supplement the system, such as reporting tools, configuration and management tools, and load balancing for multiple cache boxes. Finally, the overall success largely depends on the configuration and fine-tuning of both Squid and Linux. | |||||
|
About ViSolve.com
ViSolve is an international corporation that provides technical services for Internet based systems to clients around the globe. ViSolve has been in the business of providing software solutions since 1995. We have experience of executing several major projects and we are now completely focused on leading Internet technologies, Testing QA and support. We are committed to the Open Source movement and, along the same lines, we provide free support for products like Linux, Apache and Squid to the user community. |
||||||
| Document Version: 1.0 | Created On: 28-01-02 | Updated On: 29-05-06 |
||||||