HTTP 1.0 Logs Considered Harmful

Ramón Cáceres, Balachander Krishnamurthy, and Jennifer Rexford

AT&T Labs-Research; 180 Park Avenue

Florham Park, NJ  07932 USA

{ramon, bala, jrex}@research.att.com

Motivation

Virtually all Web performance evaluation work has focused on server logs, proxy logs, or packet traces based on HTTP 1.0 traffic. HTTP 1.1 [1] introduces several new features that may substantially change the characteristics of Web traffic in the coming years. However, there is very little end-to-end HTTP 1.1 traffic in the Internet today. This has led to a dependence on HTTP 1.0 logs and synthetic load generators to postulate improvements to HTTP 1.1, and to evaluate new proxy and server policies. We believe that Web performance studies should use more realistic logs that take into account changes to the HTTP protocol. In particular, we suggest techniques for converting an HTTP 1.0 log into a semi-synthetic HTTP 1.1 log, based on information extracted from packet-level traces and our knowledge of the HTTP 1.1 protocol. As part of this study, we plan to collect detailed packet-level server traces at AT&T's Easy World Wide Web (EW3) platform [2], the Web-hosting part of AT&T WorldNet.

Differences Between HTTP 1.0 and HTTP 1.1

The changes in the HTTP protocol address a number of key areas, including caching, hierarchical proxies, persistent TCP connections, and virtual hosts. We focus on the specific new features that are likely to alter workload characteristics; Table 1 summarizes these features and their expected effects.


 
Table 1: Effects of changes in the HTTP protocol

    HTTP 1.1 Feature            Implication
    ------------------------    --------------------------------------
    Persistent connections      Lowers number of connection set-ups
    Pipelining                  Shortens interarrival of requests
    Expires                     Lowers number of validations
    Entity tags                 Lowers frequency of validations
    Max-age, max-stale, etc.    Changes frequency of validations
    Range request               Lowers bytes transferred
    Chunked encoding            Lowers user-perceived latency
    Expect/Continue             Lowers error responses and bandwidth
    Host header                 Reduces proliferation of IP addresses
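
To make the table concrete, the following illustrative exchange (hypothetical URLs and header values, shown as Python string constants) exercises several of these features: the Host header for virtual hosting, entity-tag validation, a cache-control directive, a byte-range request, and a second request reusing the same persistent connection. An HTTP 1.0 client would instead open a new TCP connection for each request and omit these headers.

    # A typical HTTP 1.0 request: one TCP connection per request, no Host header.
    HTTP10_REQUEST = (
        "GET /index.html HTTP/1.0\r\n"
        "\r\n"
    )

    # HTTP 1.1 requests sent over a single persistent connection (all values
    # hypothetical): Host header for virtual hosting, entity-tag validation,
    # a cache-control directive, and a byte-range request.
    HTTP11_REQUESTS = (
        "GET /index.html HTTP/1.1\r\n"
        "Host: www.example.com\r\n"
        'If-None-Match: "abc123"\r\n'
        "Cache-Control: max-age=60\r\n"
        "\r\n"
        "GET /logo.png HTTP/1.1\r\n"        # reuses the same TCP connection
        "Host: www.example.com\r\n"
        "Range: bytes=2048-\r\n"            # fetch only the missing bytes
        "\r\n"
    )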

Using Packet Traces to Model HTTP 1.1

Research on Internet workload characterization has typically focused on creating generative models based on packet traces of various applications [5,6]. These models range from capturing basic traffic properties, like interarrival and duration distributions, to representing application-level characteristics. Synthetic workload generators based on these models can drive a wide range of simulation experiments, allowing researchers to perform accurate experiments without incurring the extensive overhead of packet trace collection. A synthetic modeling approach has also been applied to develop workload generators for Web traffic [7,8]. Although these synthetic models of HTTP 1.0 traffic are clearly valuable, it may be difficult to project how these synthetic workloads would change under the new features in HTTP 1.1.
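
As a point of reference, the following minimal sketch shows the flavor of such a generative model: request interarrival times and response sizes are drawn from fitted distributions. The exponential and Pareto choices and their parameters are placeholders chosen for illustration, not values derived from any of the cited traces.

    # Minimal illustration of a generative workload model: interarrival times
    # and response sizes are sampled from assumed distributions.
    import random

    def synthetic_requests(n, mean_interarrival=0.5, size_alpha=1.2, size_min=1000):
        """Yield (timestamp, response size) pairs for n synthetic requests."""
        t = 0.0
        for _ in range(n):
            t += random.expovariate(1.0 / mean_interarrival)    # exponential gaps
            size = size_min * random.paretovariate(size_alpha)  # heavy-tailed sizes
            yield t, int(size)

    for timestamp, nbytes in synthetic_requests(5):
        print(f"{timestamp:7.3f}s  {nbytes:8d} bytes")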

In contrast to Internet packet traces, many sites do maintain Web proxy or server logs. Having a way to convert these HTTP 1.0 logs to representative HTTP 1.1 logs would allow these sites to evaluate the potential impact of various changes to the protocol. These semi-synthetic HTTP 1.1 traces could also be converted into synthetic workload models that capture the characteristics of HTTP 1.1. The process of converting HTTP 1.0 logs to representative HTTP 1.1 logs requires insight into the components of delay in responding to user requests, as well as other information that is not typically available in logs. A packet trace, collected at the Web proxy or server site, can provide important information that server logs do not record, such as the timing of individual packets and TCP-level events like connection set-up, tear-down, and resets.
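
As a simple illustration of the conversion idea, the sketch below groups back-to-back requests from the same client in an HTTP 1.0 log into candidate persistent connections. The log-record fields and the 15-second idle threshold are assumptions made for illustration; in the actual study, the grouping would be guided by the packet-trace measurements described below.

    from collections import namedtuple

    # One request from an HTTP 1.0 server log (hypothetical, simplified fields).
    LogEntry = namedtuple("LogEntry", "client timestamp url size")

    def group_into_connections(entries, idle_timeout=15.0):
        """Group requests that could plausibly share one persistent connection:
        consecutive requests from the same client separated by at most
        idle_timeout seconds."""
        groups = []        # each group models one HTTP 1.1 persistent connection
        open_group = {}    # client -> index of its currently open group
        for e in sorted(entries, key=lambda e: e.timestamp):
            idx = open_group.get(e.client)
            if idx is not None and e.timestamp - groups[idx][-1].timestamp <= idle_timeout:
                groups[idx].append(e)          # reuse the open connection
            else:
                open_group[e.client] = len(groups)
                groups.append([e])             # model a new connection set-up
        return groups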

The value of packet traces has been demonstrated in recent studies on the impact of TCP dynamics on the performance of Web proxies and servers [9,10]. Similarly, a complete collection of packet traces of both request and response traffic at a Web server would provide a unique opportunity to gauge how a change to HTTP 1.1 would affect the workload.

For example, the packet trace could be used to estimate the latency reductions under persistent connections by measuring the delay involved in closing and reopening a TCP connection between a client and the server for consecutive transfers. As a more complicated example, consider the potential use of range requests in HTTP 1.1 to fetch the partial contents of an aborted response message. If a client aborts a request during the transmission of the response, the client (or proxy, if one exists) may receive only a subset of the response. Abort operations can be detected in a packet trace by noting the client RST packet, whereas the server log may or may not include an entry for the aborted request/response. The packet trace would also indicate how much of the transfer completed before the abort reached the server. If the client initiates a second request for the resource, an HTTP 1.0 server would transfer the entire contents again. However, an HTTP 1.1 client (or proxy) could initiate a range request to transfer only the missing portion of the resource. The HTTP 1.0 packet traces would enable us to recognize the client's second request and model the corresponding range request in HTTP 1.1, assuming the partially downloaded contents are still in the cache.
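
The following sketch illustrates the last step of this example: once the trace analysis has determined how many response bytes reached the client before the abort, the client's retry can be modeled as an HTTP 1.1 range request for the missing suffix. The function and its fields are hypothetical, and the rewrite assumes the partially downloaded prefix is still cached at the client.

    def rewrite_retry_as_range(bytes_received, total_size, url, host):
        """Model the retry of an aborted transfer as an HTTP 1.1 range request,
        assuming the partially downloaded prefix is still cached at the client."""
        if bytes_received == 0 or bytes_received >= total_size:
            return None    # nothing cached, or the transfer actually completed
        return (
            f"GET {url} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"Range: bytes={bytes_received}-{total_size - 1}\r\n"
            "\r\n"
        )

    # Example: 8 KB of a 20 KB response arrived before the client's RST.
    print(rewrite_retry_as_range(8192, 20480, "/images/map.gif", "www.example.com"))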

Implementation

Server Packet Traces

During the past year and a half, AT&T Labs has built and deployed two high-performance packet monitors at strategic locations inside AT&T WorldNet. Traces from these PacketScopes have been used for a number of research studies [10,11]. For the purposes of this study, we are constructing a third PacketScope to be installed at AT&T's EW3 Web-hosting complex.

This third packet monitor consists of a dedicated 500-MHz Alpha workstation attached to two FDDI rings that together carry all traffic to and from the EW3 server farm. The monitor runs the tcpdump utility [12], which has been extended to process HTTP packet headers and keep only the information relevant to our study [13]. The monitor stores the resulting data first to a 10-gigabyte array of striped magnetic disks, then to a 140-gigabyte magnetic tape robot. We ensure that the monitor is passive by running a modified FDDI driver that can receive but not send packets, and by not assigning an IP address to the FDDI interface. We control the monitor by connecting to it over an AT&T-internal network that does not carry customer traffic. We make our traces anonymous by encrypting IP addresses as soon as packets come off the FDDI link, before writing any packet data to stable storage. Our experience with an identical monitor elsewhere in WorldNet indicates that these instruments can capture more than 150 million packets per day with less than 0.3% packet loss.
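
As an illustration of the anonymization step, the sketch below replaces each IP address with a stable, keyed one-way mapping before anything reaches stable storage. A truncated keyed hash is used here only as a stand-in; the PacketScope itself encrypts addresses, and its exact scheme is not described in this paper.

    import hashlib
    import hmac
    import ipaddress

    SECRET_KEY = b"per-deployment secret, kept separate from the traces"

    def anonymize_ip(addr):
        """Map an IPv4 address to a stable pseudonym before storage."""
        digest = hmac.new(SECRET_KEY, ipaddress.ip_address(addr).packed,
                          hashlib.sha256).digest()
        # Fold the first four digest bytes back into dotted-quad form so the
        # anonymized trace still looks like IPv4 addresses.
        return str(ipaddress.IPv4Address(int.from_bytes(digest[:4], "big")))

    print(anonymize_ip("192.0.2.17"))   # the same input always yields the same pseudonym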

Augmented Server Logs

In addition to collecting packet traces, we plan to extend the server logging procedures in EW3 to record additional timing information. A server could log the time it (i) starts processing the client request; (ii) starts writing data into the TCP send socket; and (iii) finishes writing data into the TCP send socket. Typically, servers log just one of the three (often (ii)). But logging all three would allow us to isolate the components of delay at the server. For example, the first two timestamps would allow us to determine the latency in processing client requests (e.g., due to disk I/O, or the generation of dynamic content). The packet traces, coupled with the extended server logs, provide a detailed timeline of the steps involved in satisfying a client request, with limited interference at the server (to log the additional time fields).
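
A minimal sketch of how the three proposed timestamps could be combined, assuming hypothetical field names and times in seconds: the gap between (i) and (ii) captures request-processing latency, and the gap between (ii) and (iii) captures the time spent writing the response into the socket.

    def delay_components(t_request_start, t_first_write, t_last_write):
        """Split server-side delay into request processing vs. socket-write time."""
        processing = t_first_write - t_request_start   # e.g., disk I/O, dynamic content
        sending = t_last_write - t_first_write         # writing the response into TCP
        return processing, sending

    proc, send = delay_components(100.000, 100.042, 100.310)
    print(f"request processing: {proc * 1000:.0f} ms, socket writes: {send * 1000:.0f} ms")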

Client Packet Traces

Although our initial study will focus on the server packet traces and the augmented server logs, future work could consider additional measurements at (a limited subset of) the client sites. For example, a packet monitor is already installed at one of the main access points for WorldNet modem customers; this data was used in a recent study of Web proxy caching [10]. This data set would provide a detailed view of the Web traffic for the (admittedly small) subset of EW3 requests that stems from these WorldNet modem customers. By measuring Web transfers at multiple locations, and through multiple measurement techniques, we hope to create a clearer picture of how both the network and the server affect Web performance.

Acknowledgments: We thank Dave Kristol for clarifying several aspects of HTTP 1.1.

References

[1] R. Fielding, J. Gettys, J. C. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee, "Hypertext transfer protocol -- HTTP/1.1," September 11, 1998.
ftp://ftp.ietf.org/internet-drafts/draft-ietf-http-v11-spec-rev-05.txt.

[2] AT&T Easy World Wide Web.
http://www.ipservices.att.com/wss/hosting.

[3] J. C. Mogul, "The case for persistent-connection HTTP," in Proc. ACM SIGCOMM, pp. 299-313, August/September 1995.
http://www.acm.org/sigcomm/sigcomm95/papers/mogul.html.

[4] H. F. Nielsen, J. Gettys, A. Baird-Smith, E. Prud'hommeaux, H. W. Lie, and C. Lilley, "Network performance effects of HTTP/1.1, CSS1, and PNG," in Proc. ACM SIGCOMM, pp. 155-166, August 1997.
http://www.inria.fr/rodeo/sigcomm97/program.html.

[5] R. Caceres, P. Danzig, S. Jamin, and D. Mitzel, "Characteristics of wide-area TCP/IP conversations," in Proc. ACM SIGCOMM, pp. 101-112, September 1991.
http://www.research.att.com/~ramon/papers/sigcomm91.ps.gz.

[6] K. C. Claffy, H.-W. Braun, and G. C. Polyzos, "A parameterizable methodology for Internet traffic flow profiling," IEEE Journal on Selected Areas in Communications, vol. 13, pp. 1481-1494, October 1995.
http://www.nlanr.net/Flowsresearch/Flowspaper/flows.html.

[7] P. Barford and M. Crovella, "Generating representative web workloads for network and server performance evaluation," in Proc. ACM SIGMETRICS, June 1998.
http://cs-www.bu.edu/faculty/crovella/paper-archive/sigm98-surge.ps.

[8] B. Mah, "An empirical model of HTTP network traffic," in Proc. IEEE INFOCOM, April 1997.
http://www.ca.sandia.gov/~bmah/Papers/Http-Infocom.ps.

[9] H. Balakrishnan, V. N. Padmanabhan, S. Seshan, M. Stemm, and R. H. Katz, "TCP behavior of a busy Internet server: Analysis and improvements," in Proc. IEEE INFOCOM, April 1998.
http://http.cs.berkeley.edu/~padmanab/index.html.

[10] R. Caceres, F. Douglis, A. Feldmann, G. Glass, and M. Rabinovich, "Web proxy caching: The devil is in the details," in Proc. ACM SIGMETRICS Workshop on Internet Server Performance, June 1998.
http://www.cs.wisc.edu/~cao/WISP98.html.

[11] A. Feldmann, A. Gilbert, and W. Willinger, "Data networks as cascades: Explaining the multifractal nature of Internet WAN traffic," in Proc. ACM SIGCOMM, pp. 42-55, September 1998.
http://www.acm.org/sigcomm/sigcomm98/tp/abs_04.html.

[12] V. Jacobson, C. Leres, and S. McCanne, "tcpdump," June 1989.
ftp://ftp.ee.lbl.gov.

[13] A. Feldmann, "Continuous online extraction of HTTP traces from packet traces," October 1998. In submission to the W3C Workload Characterization Workshop.