In a parent relationship, the child cache forwards requests to its parent cache. If the parent does not hold a requested object, it forwards the request on behalf of the child. A cache hierarchy should closely follow the underlying network topology. Parent caches should be located along the network paths towards the greater Internet. For example, if your Internet Service Provider (ISP) operates a cache, it should probably be a parent to yours, since your Web traffic will have to travel along your ISP's infrastructure anyway.
In a sibling relationship, a peer may only request objects already held in the cache; a sibling cannot forward cache misses on behalf of the peer. The sibling relationship should be used for caches that are "nearby" but not in the direction of your route to the Internet. For example, it may make sense for a number of department-specific caches within an organization to have sibling relationships among them. This approach is even more compelling when there is no parent cache available for the organization as a whole.
A unicast packet is the complete opposite: one machine is talking to only one other machine. All TCP connections are unicast, since they can only have one destination host for each source host. UDP packets are almost always unicast too, though in some cases they can be sent to the broadcast address so that they reach every machine on the network.
A multicast packet goes from one machine to one or more others. The difference between a multicast packet and a broadcast packet is that hosts receiving multicast packets can be on different LANs, and that each multicast data stream is transmitted between networks only once, not once per machine on the remote network. Rather than each machine connecting to a video server, the multicast data is streamed per network, and multiple machines just listen in on the multicast data once it is on the network.
An IP address has two components, the network address and the host address. For example, consider the IP address 172.16.1.25. Assuming this is part of a Class B network, the first two numbers (172.16) represent the Class B network address, and the second two numbers (1.25) identify a particular host on this network.
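The split described above can be sketched in a few lines of Python. This is only an illustration under the same Class B assumption; the function names split_class_b and to_binary are made up for this example.

```python
def split_class_b(address: str) -> tuple[str, str]:
    """Split a dotted-quad address into (network, host), assuming Class B."""
    octets = address.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {address!r}")
    # Class B: the first two octets name the network, the last two the host.
    return ".".join(octets[:2]), ".".join(octets[2:])


def to_binary(address: str) -> str:
    """Render each octet of the address as eight binary digits."""
    return ".".join(f"{int(octet):08b}" for octet in address.split("."))


network, host = split_class_b("172.16.1.25")
print(network, host)             # 172.16 1.25
print(to_binary("172.16.1.25"))  # 10101100.00010000.00000001.00011001
```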
Subnetting enables the network administrator to further divide the host part of the address into two or more subnets. In this case, a part of the host address is reserved to identify the particular subnet. This is easier to see if we show the IP address in binary format. The full address is:
10101100.00010000.00000001.00011001
The Class B network part is:
The cache_dir type in Squid has nothing to do with the underlying filesystem type; it defines the storage method (implementation).
Currently Squid has 4 different implementations:
Both aufs (async ufs) and diskd are designed to work around the problem of blocking I/O in a Unix process. aufs gets around this by using threads to complete disk I/O. diskd uses external processes to complete disk I/O.
aufs works just that little bit faster, but only works on systems where threads can do asynchronous disk I/O without blocking the main process. Systems with user threads (e.g. FreeBSD) cannot use it effectively. diskd, being implemented as an external process, gets around this. If the cache is only lightly loaded, the difference is not noticeable. diskd and aufs are only useful when the cache is under high load.
In case it was not clear, asynchronous I/O (diskd/aufs) is beneficial for single-drive configurations with "higher" request loads, in many cases allowing you to push about 100% more I/O through the drive before latency creeps up too high.
For multiple-drive configurations, asynchronous I/O is almost a requirement if you want to use the I/O capacity of the extra drives. Without it, a multiple-disk configuration is effectively limited to almost the speed of a single-disk configuration. With asynchronous I/O, disk I/O scales quite well (at least for the first few drives; other limits become very apparent when you have more than about 3 drives).
cache_peer proxy.visolve.com1 parent 3128 3130 no-query default
In other words, the round-robin option is similar to default, except that Squid forwards the request to the parent with the lowest use count. The cache_peer_domain restrictions still apply, of course. A typical configuration might look like:
cache_peer proxy.visolve.com1 parent 3128 3130 round-robin no-query
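The "lowest use count" rule can be illustrated with a small Python sketch. This is not Squid's code; the Parent class, the pick_round_robin function and the host names are hypothetical, and cache_peer_domain filtering is left out.

```python
from dataclasses import dataclass


@dataclass
class Parent:
    host: str
    use_count: int = 0


def pick_round_robin(parents: list[Parent]) -> Parent:
    """Return the parent with the lowest use count and charge it one use."""
    chosen = min(parents, key=lambda parent: parent.use_count)
    chosen.use_count += 1
    return chosen


# Hypothetical parents; ties go to the one listed first.
parents = [Parent("parent-a.example"), Parent("parent-b.example")]
order = [pick_round_robin(parents).host for _ in range(4)]
print(order)
```

Because every selection raises the chosen parent's count, requests end up alternating between parents that are equally used.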
Squid will wait for up to dead_peer_timeout seconds after sending out an ICP request before deciding to ignore a peer. With a multicast group, peers can leave and join at will, and it should make no difference to a client. This presents a problem for Squid: it can't wait for a number of seconds each time. (What if the caches are on the same network, and responses come back in milliseconds? The waiting just adds latency.) Squid gets around this problem by occasionally sending ICP probes to the multicast address. Each host in the group responds to the probe, so Squid knows how many machines are currently in the group. When sending a real request, Squid waits until it gets at least as many responses as were returned in the last probe: if more arrive, great. If fewer arrive, though, Squid waits until the dead_peer_timeout value is reached. If there is still no reply, Squid marks that peer as down, so that all connections are not held up by one peer.
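The waiting rule for multicast replies can be written down as a single decision. The sketch below is only a restatement of the paragraph above in Python; keep_waiting and its parameter names are invented for the example and do not correspond to Squid internals.

```python
def keep_waiting(replies: int, expected: int, elapsed: float,
                 dead_peer_timeout: float) -> bool:
    """Decide whether to keep waiting for ICP replies from a multicast group.

    replies  -- replies received so far for this request
    expected -- number of replies returned by the last probe
    elapsed  -- seconds since the request was sent
    """
    if replies >= expected:
        # Everyone counted in the last probe has answered: stop waiting.
        return False
    # Fewer replies than expected: wait, but only up to the timeout.
    return elapsed < dead_peer_timeout


print(keep_waiting(replies=3, expected=3, elapsed=0.01, dead_peer_timeout=10))
print(keep_waiting(replies=2, expected=3, elapsed=0.5, dead_peer_timeout=10))
```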
An accelerator caches incoming requests for outgoing data (i.e., that which you publish to the world). It takes load away from your HTTP server and internal network. You move the server away from port 80 (or whatever your published port is), and substitute the accelerator, which then pulls the HTTP data from the "real" HTTP server (only the accelerator needs to know where the real server is). The outside world sees no difference (apart from an increase in speed, with luck).
The httpd_accel_uses_host_header option
A normal HTTP request consists of three values: the type of transfer (normally a GET, which is used for downloads); the path and filename to be retrieved (or executed, in the case of a CGI program); and the HTTP version.
This layout is fine if you only have one web site on a machine. On systems where you have more than one site, though, it makes life difficult: the request does not contain enough information, since it doesn't include information about the destination domain. Most operating systems allow you to have IP aliases, where you have more than one IP address per network card. By allocating one IP per hosted site, you could run one web server per IP address. Once the programs were made more efficient, one running program could act as a server for many sites: the only requirement was that you had one IP address per domain. Server programs would find out which of the IP addresses clients were connected to, and would serve data from different directories for each IP.
There are a limited number of IP addresses, and they are fast running out. Some systems also have a limited number of IP aliases, which means that you cannot host more than a (fairly arbitrary) number of web sites on a machine. If the client were to pass the destination host name along with the path and filename, the web server could listen on only one IP address, and would find the right destination directories by looking in a simple hostname table.
From version 1.1 on, the HTTP standard supports a special Host header, which is passed along with every outgoing request. This header also makes transparent caching and acceleration easier: by pulling the host value out of the headers, Squid can translate a standard HTTP request to a cache-specific HTTP request, which can then be handled by the standard Squid code. Turning on the httpd_accel_uses_host_header option enables this translation. You will need to use this option when doing transparent caching.
It's important to note that ACLs are checked before this translation. You must combine this option with strict source-address checks, which means you cannot use it to accelerate multiple backend servers (this is certain to change in a later version of Squid).
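The translation from a standard request to a cache-style request amounts to combining the Host header with the requested path. The following Python sketch shows the idea only; to_absolute_url is a made-up helper, the "http" scheme is assumed, and none of this is taken from the Squid source.

```python
def to_absolute_url(request_line: str, headers: dict[str, str],
                    scheme: str = "http") -> str:
    """Build the full URL a cache needs from a request line and its headers."""
    method, path, version = request_line.split()
    if "://" in path:
        # Already a cache-style request carrying a full URL.
        return path
    # Header names are case-insensitive, so compare them in lower case.
    host = next((value for name, value in headers.items()
                 if name.lower() == "host"), None)
    if host is None:
        raise ValueError("request has no Host header; destination unknown")
    return f"{scheme}://{host.strip()}{path}"


print(to_absolute_url("GET /index.html HTTP/1.1", {"Host": "www.example.com"}))
```

A request without a Host header raises an error here, which mirrors the problem described earlier: the request line alone does not say which site is wanted.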
Squid checks all always_direct tags before it checks any never_direct tags. If a matching always_direct tag is found, Squid does not check the never_direct tags, but decides immediately which cache to talk to. The following example demonstrates this behavior: Squid will attempt to go directly to the machine intranet, even though the same host is also matched by the all acl.
Bypassing a parent for a local machine
cache_peer proxy.visolve.com parent 3128 3130
Now suppose a request arrives for an external host. Squid works through the always_direct lines, and finds that none of them match. The never_direct lines are then checked. The all acl matches the connection, so Squid marks the connection as never to be forwarded directly to the origin server.
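The order of evaluation can be summarised in a short Python sketch. It is a simplification: acls are modelled as plain functions that return True on a match, allow/deny handling is ignored, and the names routing_decision, is_intranet and match_all are hypothetical.

```python
from typing import Callable

Acl = Callable[[str], bool]


def routing_decision(host: str, always_direct: list[Acl],
                     never_direct: list[Acl]) -> str:
    """Return how a request for host should be forwarded."""
    # always_direct is consulted first; a match ends the search.
    if any(acl(host) for acl in always_direct):
        return "direct"
    # Only if nothing matched are the never_direct acls checked.
    if any(acl(host) for acl in never_direct):
        return "parent"
    return "either"


def is_intranet(host: str) -> bool:
    return host == "intranet"


def match_all(host: str) -> bool:
    return True


print(routing_decision("intranet", [is_intranet], [match_all]))
print(routing_decision("www.example.com", [is_intranet], [match_all]))
```

The first call goes direct even though match_all also matches, and the second is sent to the parent, just as in the two cases described above.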
The native access.log has ten (10) fields. There is one entry here for each HTTP (client) request and each ICP query. HTTP requests are logged when the client socket is closed. A single dash (-) indicates unavailable data.
1. Timestamp
2. Elapsed Time
3. Client Address
4. Log Tag / HTTP Code
5. Size
6. Request Method
7. URL
8. Ident
9. Hierarchy Data / Hostname
10. Content Type
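Field 4 combines a log tag and an HTTP status code separated by a slash, and the tag's prefix tells you which port the request arrived on: tags beginning with TCP_ come from the HTTP port and tags beginning with UDP_ from the ICP port. The Python sketch below splits such a field; classify_log_tag is a made-up helper, and the two sample values are illustrative rather than taken from a real log.

```python
from typing import Optional


def classify_log_tag(field: str) -> tuple[str, str, Optional[int]]:
    """Split a 'Log Tag / HTTP Code' field into (port, tag, status code)."""
    tag, _, code = field.partition("/")
    if tag.startswith("TCP_"):
        port = "HTTP"
    elif tag.startswith("UDP_"):
        port = "ICP"
    else:
        port = "unknown"
    # A missing or non-numeric code is reported as None.
    status = int(code) if code.isdigit() else None
    return port, tag, status


print(classify_log_tag("TCP_MISS/200"))
print(classify_log_tag("UDP_HIT/000"))
```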
"TCP_" refers to requests on the HTTP port.
"UDP_" refers to requests on the ICP port
Squid switched from a Time-To-Live based expiration model to a Refresh-Rate model. Objects are no longer purged from the cache when they expire. Instead of assigning TTL's when the object enters the cache, we now check freshness requirements when objects are requested. If an object is 'fresh' it is given directly to the client. If it is 'stale' then we make an If-Modified-Since request for it.
delay_parameters pool aggregate network individual
The variables here are:
pool is a pool number, i.e., a number between 1 and the number specified in delay_pools, as used in delay_class lines; aggregate is the parameter for the aggregate bucket, network for the network bucket, and individual for the individual bucket. Aggregate is only useful for classes 1, 2 and 3, individual for classes 2 and 3, and network for class 3. Each of these parameters is specified as restore/maximum, restore being the number of bytes per second restored to the bucket, and maximum being the number of bytes that can be in the bucket at any time. It is important to remember that they are in bytes per second, not bits. To specify that a parameter is unlimited, use -1.
If we wish to express a limit given in bits per second, divide the amount by 8, and use the result for both the restore and the maximum value. For example, to restrict the entire proxy to 64 kbps, use:
delay_parameters 1 8000/8000
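The arithmetic, and the way a restore/maximum pair behaves, can be sketched in Python. This is a toy model of a single bucket, not Squid's implementation: the Bucket class is invented for the example, the bucket is assumed to start full, and the unlimited value (-1) is not modelled.

```python
def bits_to_bytes_per_second(bits_per_second: int) -> int:
    """delay_parameters values are in bytes, so divide a bit rate by 8."""
    return bits_per_second // 8


class Bucket:
    def __init__(self, restore: int, maximum: int) -> None:
        self.restore = restore    # bytes added back per second
        self.maximum = maximum    # most bytes the bucket can hold
        self.level = maximum      # assumption: the bucket starts full

    def refill(self, seconds: int) -> None:
        """Restore bytes for the elapsed time, never exceeding the maximum."""
        self.level = min(self.maximum, self.level + self.restore * seconds)

    def take(self, wanted: int) -> int:
        """Hand out as many bytes as are available, up to wanted."""
        granted = min(wanted, self.level)
        self.level -= granted
        return granted


rate = bits_to_bytes_per_second(64000)  # 64 kbps
line = f"delay_parameters 1 {rate}/{rate}"
print(line)

bucket = Bucket(restore=rate, maximum=rate)
print(bucket.take(10000))  # only what the bucket holds is granted
bucket.refill(1)
print(bucket.level)
```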
Squid can act as a proxy server for various Internet protocols. The most commonly used protocol is HTTP, but the File Transfer Protocol (FTP) is still alive and well.
FTP was written for authenticated file transfer (it requires a username and password). To provide public access, a special account is created: the anonymous user. When you log into an FTP server you use this as your username. As a password, you generally use your email address. Most browsers these days automatically enter a useless email address.
It's polite to give an address that works, though. If one of your users abuses a site, it allows the site admin to get hold of you easily.
Squid allows you to set the email address that is used with the ftp_user tag. You should probably create a firstname.lastname@example.org email address specifically for people to contact you on.
There is another reason to enter a proper address here: some servers require a real email address. For your proxy to log into these ftp servers, you will have to enter a real email address here.
Squid can only bind to low numbered ports (such as port 80) if it is started as root. Squid is normally started by your system's rc scripts when the machine boots. Since these scripts run as root, Squid is started as root at bootup time.
Once Squid has been started, however, there is no need to run it as root. Good security practice is to run programs as root only when it's absolutely necessary, and for this reason Squid changes user and group IDs once it has bound to the incoming network port.
The cache_effective_user and cache_effective_group tags tell Squid what IDs to change to. The Unix security system would be useless if it allowed all users to change their IDs at will, so Squid only attempts to change IDs if the main program is started as root.
If you do not have root access to the machine, and are thus not starting Squid as root, you can simply leave this option commented out. Squid will then run with whatever user ID starts the actual Squid binary.
Now let us assume that you have created both a squid user and a squid group on your cache machine. The above tags should thus both be set to 'squid'.
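The general pattern of giving up root after startup looks roughly like the Python sketch below. It is a generic Unix illustration, not Squid's code: drop_privileges is a made-up function, the euid argument exists only so the non-root case can be tried safely, and clearing the supplementary groups is a choice made for this sketch.

```python
import grp
import os
import pwd
from typing import Optional


def drop_privileges(user: str, group: str, euid: Optional[int] = None) -> bool:
    """Switch to user/group if running as root; return True if IDs changed."""
    euid = os.geteuid() if euid is None else euid
    if euid != 0:
        # Not root: the system would refuse the change, so keep the
        # identity of whoever started the program.
        return False
    gid = grp.getgrnam(group).gr_gid
    uid = pwd.getpwnam(user).pw_uid
    os.setgroups([])  # drop root's supplementary groups
    os.setgid(gid)    # change the group first, while still root
    os.setuid(uid)    # after this call root privileges are gone
    return True


# Simulating a start by an ordinary user: nothing is changed.
print(drop_privileges("squid", "squid", euid=1000))
```

The group has to be changed before the user, because once the user ID is no longer root the process is not allowed to change its group.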
Half closed clients:
Fully closed clients:
The syntax is:
htpasswd [ -c ] passwdfile username
The redirector program is NOT a standard part of the Squid package. However there are a couple of user-contributed redirectors in the "contrib/" directory. Since everyone has different needs, it is up to the individual administrators to write their own implementation. For testing, and a place to start, this very simple Perl script can be used:
The redirector program must read URLs (one per line) on standard input, and write rewritten URLs or blank lines on standard output. Note that the redirector program cannot use buffered I/O.
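As an illustration of that read/write loop, here is a minimal redirector sketched in Python. It is not the Perl script mentioned above; the rewrite rule and the example.com host names are hypothetical, and the code takes the first whitespace-separated field of each line as the URL in case other fields follow it.

```python
import sys


def rewrite(url: str) -> str:
    """Return a replacement URL, or an empty string to leave the URL alone."""
    old_prefix = "http://old.example.com/"  # hypothetical rule
    if url.startswith(old_prefix):
        return "http://new.example.com/" + url[len(old_prefix):]
    return ""


def run(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Answer every input line with exactly one output line."""
    for line in stdin:
        fields = line.split()
        url = fields[0] if fields else ""
        stdout.write(rewrite(url) + "\n")
        # Flush after each reply so the answer is not held in a buffer.
        stdout.flush()
```

A real script would end with a call to run(). The explicit flush after each line is what addresses the buffering requirement: without it, replies could sit in the output buffer while the caller waits for them.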
The data channel varies depending on whether you ask for passive FTP or not. When you request data in a non-passive environment, your client tells the server "I am listening on <IP address> <port>." The server then connects FROM port 20 to the IP address and port specified by your client. This requires your "security device" to permit any host outside, from port 20, to connect to any host inside on any port >1023. Somewhat of a hole.
In passive mode, when you request a data transfer, the server tells the client "I am listening on <IP address> <port>." Your client then connects to the server on that IP address and port, and data flows.
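In both modes the address and port travel over the control channel as six comma-separated numbers: four for the IP address and two for the port, where the port is the fifth number times 256 plus the sixth. The Python sketch below decodes that form; decode_host_port is a made-up helper and the sample addresses are documentation addresses chosen for the example.

```python
import re


def decode_host_port(text: str) -> tuple[str, int]:
    """Extract (ip, port) from an h1,h2,h3,h4,p1,p2 sequence in text."""
    match = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", text)
    if match is None:
        raise ValueError(f"no host/port sequence in {text!r}")
    numbers = [int(number) for number in match.groups()]
    if any(number > 255 for number in numbers):
        raise ValueError(f"value out of range in {text!r}")
    ip = ".".join(str(number) for number in numbers[:4])
    port = numbers[4] * 256 + numbers[5]
    return ip, port


# Active mode: the client announces where it is listening.
print(decode_host_port("PORT 192,0,2,20,4,1"))
# Passive mode: the server announces where it is listening.
print(decode_host_port("227 Entering Passive Mode (192,0,2,10,19,137)"))
```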
The byte-hit ratio measures the ratio of total bytes from cached objects over the total bytes of objects requested.
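As a small worked example, the ratio can be computed from a list of requests, each recorded as a size and a flag saying whether it was served from the cache. The function name byte_hit_ratio and the sample figures are made up for the illustration.

```python
from typing import Sequence


def byte_hit_ratio(requests: Sequence[tuple[int, bool]]) -> float:
    """requests holds (size_in_bytes, served_from_cache) pairs."""
    total_bytes = sum(size for size, _ in requests)
    hit_bytes = sum(size for size, was_hit in requests if was_hit)
    # With no traffic at all the ratio is reported as zero.
    return hit_bytes / total_bytes if total_bytes else 0.0


# One small hit and one large miss: half the requests, a quarter of the bytes.
sample = [(1000, True), (3000, False)]
print(byte_hit_ratio(sample))
```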