Coping with Large Log Files
Log files can easily become very, very large, taking hours to download at normal modem speeds. In this document, we detail a few strategies for dealing with large log files.
FastStats can access your log files three ways: (1) it can read them from your hard drive, (2) it can download them from an FTP server, and (3) it can download them from a web server. FastStats will intelligently download the files that you tell it to -- that is, it will not download a log file unless it has changed since its last download.
Most of the time, it is perfectly acceptable to give FastStats your FTP server name and a wildcard that covers all of your log files, and let it manage the downloading process. However, you can get extra performance by manually downloading and deleting your log files. The boost in performance depends on how your ISP accumulates log files:
A new, uniquely named log file each day
For example, www_log.100199, www_log.100299, etc.
This is the best type of log file storage system. FastStats' intelligent download system will only download the log files that have been updated since the last download. The only manual tuning possible is to occasionally log into your FTP server and delete old log files that have already been downloaded, which saves you disk space.
A "rotating name" log file system
Today's log file is www_log.0, yesterday's log file is www_log.1, the day before yesterday's log file is www_log.2, etc.
From FastStats' point of view, this is a bad type of log file storage system. Each day, every log file on the system changes (.0 goes to .1, .1 goes to .2, etc.), so FastStats will download completely new log files each day. You can help things along by, every week or so, downloading the log files into a separate folder and then deleting them from the server. This way you save both server disk space and download time.
A "one file" system
The log file is always access_log, and new log file information is written to the end of the log file.
This is not a good system. With every hit stored in access_log, the log file changes and needs to be re-downloaded by FastStats. We recommend downloading your access_log file to a folder on your computer once a week and then deleting it from your server. Your log files will be smaller and easier to download.
You should probably check with your web hosting provider before deleting any log files. If you have any other good tips for managing log files, e-mail us at firstname.lastname@example.org and we'll post them here.
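To see why the naming scheme matters so much, here is a minimal Python sketch of a "download only what changed" check. It is not FastStats' actual logic: the source only says a file is skipped unless it has changed, so comparing size and timestamp is an assumption, and the function name `files_to_download` and the sample listings are hypothetical.

```python
from typing import Dict, List, Tuple

# Maps a file name to (size in bytes, last-modified stamp).
# Using these two fields as the "has it changed?" test is an assumption.
Listing = Dict[str, Tuple[int, int]]


def files_to_download(remote: Listing, seen: Listing) -> List[str]:
    """Return the names whose metadata differs from the last download."""
    return sorted(name for name, meta in remote.items() if seen.get(name) != meta)


# Uniquely named daily files: only the newest file is unknown.
seen_daily = {"www_log.100199": (5000, 1), "www_log.100299": (6000, 2)}
remote_daily = {**seen_daily, "www_log.100399": (5500, 3)}
daily = files_to_download(remote_daily, seen_daily)

# Rotating names: every name now refers to different content.
seen_rotating = {"www_log.0": (6000, 2), "www_log.1": (5000, 1)}
remote_rotating = {
    "www_log.0": (5500, 3),
    "www_log.1": (6000, 2),
    "www_log.2": (5000, 1),
}
rotating = files_to_download(remote_rotating, seen_rotating)
```

With uniquely named files, `daily` holds only the one new file. With rotating names, `rotating` holds every file on the server, which is why that scheme costs so much download time.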
The Dates Reported In My Log Files Are Grossly Incorrect. I Think FastStats Might Be Confusing The Day And Month. How Do I Fix It?
You should check your log files; the time data may be written in European (DD-MM-YY) format. FastStats, by default, reads log files in United States (MM-DD-YY) date format.
To change the date format, go to the Report menu and choose Options. Make sure "All log files are in European date format" is checked. The change will take effect when the report is regenerated.
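The ambiguity is easy to reproduce. The Python sketch below parses the same text under both conventions. The function name `parse_log_date` and the dash-separated two-digit-year layout are assumptions for illustration; your log files may use a different layout.

```python
from datetime import datetime


def parse_log_date(text: str, european: bool = False) -> datetime:
    """Parse a date as DD-MM-YY (European) or MM-DD-YY (United States)."""
    fmt = "%d-%m-%y" if european else "%m-%d-%y"
    return datetime.strptime(text, fmt)


# The same text yields two different days.
us = parse_log_date("03-04-99")                 # March 4, 1999
eu = parse_log_date("03-04-99", european=True)  # April 3, 1999
```

A date such as 25-12-99 cannot be read as MM-DD-YY at all, so days after the 12th of a month are the quickest way to tell which format your log files use.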
FastStats is Linked to Missing Export .. in COMCTL32.DLL
This problem is caused by a very old version of the COMCTL32.DLL file being installed on your system. You can download a new version of the COMCTL32.DLL file from this URL:
The COMCTL32.DLL file is used by a wide variety of applications, and upgrading will most likely slightly improve user interface performance of those applications and eliminate some errors, in addition to enabling FastStats to work.
Things that Can Mess Up The Log File Analysis Process
This help topic tells you about common problems in configuring log file analysis that can cause FastStats to mess up.
1. If your log files are stored locally and you tell FastStats to parse an entire directory, beware! FastStats will parse not only every file in the directory you specified, but also every file in every subdirectory below it. People have run into problems where a directory containing .EXE files is a subdirectory of the log file directory. FastStats does its best to ignore .EXE files, but in some situations it may try to parse them and crash.
2. Never let FastStats parse your error_log file (if you have one). If you have multiple log files, only let FastStats parse your access_log file. A common problem is to have a directory with access_log, agent_log, referer_log, and possibly error_log. If you tell FastStats to analyze this entire directory, it will give you an error and may even crash on the error_log file. You should tell FastStats to only analyze access_log.
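Before pointing FastStats at a directory, it can help to list what a recursive scan would sweep up. The Python sketch below is a hypothetical helper, not part of FastStats: it walks a directory tree and separates files named access_log from everything else. The function name `split_log_candidates` is an assumption.

```python
import os
from typing import List, Tuple


def split_log_candidates(root: str, wanted: str = "access_log") -> Tuple[List[str], List[str]]:
    """Walk root recursively; return (files to analyze, files to keep out)."""
    keep: List[str] = []
    skip: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # Only an exact name match counts as an access log here.
            (keep if filename == wanted else skip).append(path)
    return sorted(keep), sorted(skip)
```

Anything in the second list (error_log, agent_log, stray .EXE files in subdirectories) is a file you would want to move elsewhere or exclude before running the analysis.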
Why is the "Geographical Location" report missing?
The Geographical Location report is generated by analyzing the domain name suffixes of users accessing your web site. For example, a hit from "user12.isp.us" is considered to be a hit from a US address, a hit from "user13.isp.de" is a hit from Germany, etc.
The Geographical Location report is enabled if FastStats comes across any domain names in the log file. The majority of log files store IP addresses, numbers that look like "126.96.36.199". Unless your web server automatically stores domain names instead of IP addresses in your log file, it is necessary for FastStats to perform a "Reverse DNS lookup" and translate IP addresses into domain names. Doing a Reverse DNS lookup dramatically slows down the web server log file analysis process.
In summary, the Geographical Location report is enabled if one or more of the following is true:
- Your log file contains mostly domain names
- You have enabled Reverse DNS in FastStats
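The suffix lookup described above can be sketched in a few lines of Python. This is an illustration only, not FastStats' implementation: the two-entry table `COUNTRY_BY_SUFFIX` and the function names are assumptions, and a real table would cover every country code.

```python
import ipaddress
from typing import Optional

# Tiny illustrative subset of a suffix-to-country table.
COUNTRY_BY_SUFFIX = {"us": "United States", "de": "Germany"}


def is_ip_address(host: str) -> bool:
    """Return True if host is a numeric IP address rather than a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def country_for_host(host: str) -> Optional[str]:
    """Map a domain name to a country by its last label, if known."""
    if is_ip_address(host):
        return None  # no suffix to inspect without a reverse DNS lookup
    suffix = host.rsplit(".", 1)[-1].lower()
    return COUNTRY_BY_SUFFIX.get(suffix)
```

A log file made up entirely of IP addresses yields `None` for every hit, which is the situation in which the report has nothing to show.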
Note: The Reverse DNS feature has been removed from FastStats 2.69a. Read more about that here.
The Times Reported In My Log File Are Off By Several Hours. How Do I Adjust Them?
It is common for the time data in your log files to be recorded in an inappropriate time zone. This problem is common if a third party hosts your web site; the third party's servers may be located on the opposite side of the country, in a completely different time zone. To correct this, FastStats allows you to add or subtract an offset to the hour recorded in the log file. You can adjust the offset to any value between -12 and 12. The best way to determine the offset for your log files is to examine the times recorded near the beginning of your log file. If you know what time your web-hosting provider generates the log files, then you can just subtract to determine the offset. It may take some experimentation. Note that some servers record their log files in GMT (which is 5 to 8 hours ahead of local times in the continental US).
To change the offset, go to the Report menu and choose Options. The option to change the Time Offset is down at the bottom of the page.
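As a worked example of the arithmetic, the Python sketch below shifts a logged time by a whole-hour offset. The function name `apply_hour_offset` is hypothetical, and carrying the shift across midnight into the previous or next day is an assumption about the sensible behavior, not a statement of what FastStats does internally.

```python
from datetime import datetime, timedelta


def apply_hour_offset(stamp: datetime, offset_hours: int) -> datetime:
    """Shift a logged time by a whole number of hours in the range -12..12."""
    if not -12 <= offset_hours <= 12:
        raise ValueError("offset must be between -12 and 12")
    return stamp + timedelta(hours=offset_hours)


# A hit logged at 02:30 GMT, viewed from a zone five hours behind GMT.
logged = datetime(2000, 3, 13, 2, 30)
local = apply_hour_offset(logged, -5)  # 21:30 on the previous day
```

To find your own offset, subtract the time shown in the log from the local time at which you know a hit happened, then enter the difference in hours.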
Why Don't My Hits Show Up Immediately in the Log File?
Most web servers, especially Apache, buffer the log file data. That is, they store the hit data internally, in memory, and only write it to the log file after it reaches a certain threshold -- generally when the log file data occupies more than a few kilobytes in memory. A 2-3 kilobyte buffer can hold several hundred hits.
You most likely ran into this problem when testing out FastStats. You visited your web site, pressed 'Reload' to ensure that a few hits were registered, and then re-downloaded your log files. Unless a few hundred people have "hit" the server between the time you visited your site and the time you downloaded the log file, your visits are most likely not in the log file (and therefore will not show up in the 'Recent Accesses' report).
Note: this information only applies if your web hosting provider (or system administrator) allows you to access the current day's log files. Most web hosts only let you access yesterday's log files (for a lot of good reasons).
No Reverse DNS Tab?
FastStats used to have a problem with the Reverse DNS process, which reverses an IP address, such as 188.8.131.52, into a host name like user.isp.com. We removed the Reverse DNS feature, including the Reverse DNS tab, from FastStats 2.69.
Update 3/13/2000: We've fixed the Reverse DNS problem in the updated version.
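For readers curious about what a reverse DNS lookup involves, here is a minimal Python sketch using the standard library's `socket.gethostbyaddr`. It is not FastStats' code. The wrapper name `reverse_dns` and its `resolver` parameter are assumptions, added so the function can be tried without network access.

```python
import socket
from typing import Callable, Optional


def reverse_dns(ip: str, resolver: Callable = socket.gethostbyaddr) -> Optional[str]:
    """Return the host name for an IP address, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
```

Each lookup is a network round trip, so resolving every distinct address in a large log file adds up quickly, which is why this step slows the analysis so much.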