September 2006

qmail error resolution: sorry, although i’m listed as a best-preference mx for that host, it isn’t in my control/locals file.

today, i had to reenable a domain through plesk. once the guy’s site was up and running, he said that he couldn’t receive email. i sent him a test email and it bounced with the following message:
Sorry. Although I’m listed as a best-preference MX or A for that host, it isn’t in my control/locals file, so I don’t treat it as local. (#5.4.6)
how come? i honestly never saw this problem before.
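well, qmail/plesk stores the hostnames it accepts mail for in a file located in /var/qmail/control/rcpthosts. i checked and the domain was there. so what gives?
my guess is that plesk did things too quickly, or not well enough. as the bounce says, the domain also has to be treated as local (control/locals, or control/virtualdomains for virtual domains), and qmail-send only rereads those two files when it starts up or receives a HUP signal. i ended up having to restart qmail. after that was done, he began receiving his messages again.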
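before restarting anything, it can help to see which control files actually list the domain. here’s a minimal python sketch of that check. the function name is made up for illustration, it assumes the stock /var/qmail/control layout, and it only does exact matches (it ignores the leading-dot wildcard entries that qmail also accepts).

```python
from pathlib import Path

CONTROL_FILES = ("rcpthosts", "locals", "virtualdomains")

def domain_in_control_files(domain, control_dir="/var/qmail/control"):
    """Return {filename: True/False} for each control file (hypothetical helper)."""
    domain = domain.lower()
    found = {}
    for name in CONTROL_FILES:
        path = Path(control_dir) / name
        listed = False
        if path.is_file():
            for line in path.read_text().splitlines():
                # virtualdomains lines look like "domain:prepend"; keep the domain part
                entry = line.strip().split(":", 1)[0].lower()
                if entry == domain:
                    listed = True
                    break
        found[name] = listed
    return found
```

if the domain turns up in rcpthosts but in neither locals nor virtualdomains, that matches the bounce above. if it’s listed everywhere it should be, restarting qmail is the next thing to try.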

qmail: 7 day mail queues? too long.

i’ve been taking a proactive stance in checking the mail queue in my office, since if it gets cluttered with newsletters or unnecessary stuff (including the occasional password phishing from code vulnerabilities in contact forms), it ends up slowing down other emails significantly.
by default, the qmail queue is 7 days long (604800 seconds). to check that, you can run the following:
# qmail-showctl | grep queue
queuelifetime: (Default.) Message lifetime in the queue is 604800 seconds.

(side point: there’s a lot of cool stuff you can see there related to the qmail setup if you don’t only grep for the queue.)
in my opinion, 7 days is just way too long. sometimes i’m checking the queue and an email is addressed to a wrong address… and the email just sits there while the mailserver repeatedly attempts to send the message to this nonexistent address. (for example, if you accidentally mistype the domain of the address you’re emailing, you’ll be waiting a long time for a bounceback, which might cause frustration and anger because you thought you sent it to the right guy to begin with.)
everything on linux can be tweaked, and it’s relatively easy to do at times. in this particular case, what is needed is a newly created file, /var/qmail/control/queuelifetime, which contains a single line: the number of seconds that you want the queue to last. in my case, i made it 172800 seconds (2 full days; a single day is 86400), so these emails get returned to sender informing them that they should get the right address or try later.
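the arithmetic and the file format are simple enough to script. this is just a sketch, assuming the standard control path; the function name is my own and not part of qmail.

```python
from pathlib import Path

SECONDS_PER_DAY = 86400

def write_queuelifetime(days, control_file="/var/qmail/control/queuelifetime"):
    """Write the lifetime in seconds as a single line and return the value."""
    seconds = int(days * SECONDS_PER_DAY)
    Path(control_file).write_text(f"{seconds}\n")
    return seconds
```

running write_queuelifetime(2) as root writes 172800 to the control file, the same value i typed in by hand.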
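once you’ve created this file, you can verify that the new queue length is in effect by running the following (qmail-send reads the file at startup, so restart qmail for the running server to pick up the new value):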
# /var/qmail/bin/qmail-showctl | grep queue
queuelifetime: Message lifetime in the queue is 172800 seconds.

note how it doesn’t say “Default” anymore like the previous execution of the same command did.
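if you’d rather check this from a script than by eye, the line is easy to parse. a small sketch, assuming the output keeps the exact wording shown in the two runs above; the function name is hypothetical.

```python
import re

LIFETIME_RE = re.compile(
    r"^queuelifetime: (\(Default\.\) )?Message lifetime in the queue is (\d+) seconds\."
)

def parse_queuelifetime(showctl_output):
    """Return (seconds, is_default) from qmail-showctl text, or None if not found."""
    for line in showctl_output.splitlines():
        match = LIFETIME_RE.match(line.strip())
        if match:
            return int(match.group(2)), match.group(1) is not None
    return None
```

feed it the text of qmail-showctl (for example the stdout captured with subprocess.run) and it tells you the lifetime and whether the default is still in use.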
to force qmail to retry those old emails right away, just run qmHandle -a and you’ll notice that the queue (qmHandle -l) has gotten a lot shorter.
if you don’t have qmHandle, you can get it on sourceforge. it’s not part of the regular qmail distribution. more information on qmHandle can be found in this blog entry.

robots.txt and spidering.

when you have content that is not for public consumption, it’s better to be safe than sorry and prevent the search engines from crawling (or spidering) the page and learning your link structure. for example, in a development environment, it would hardly be useful for the site to be indexed as if it’s public when it’s not ready yet.
enter robots.txt. this file is extremely important; search engines look for it to determine whether the site can be entered into their search cache or whether you want to keep it private.
the basic robots.txt file works like this: you stick the file in the root of your website (e.g. the public_html or httpdocs folder). it won’t work if it’s located anywhere else or in a subdirectory of the site.
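to make the “root only” rule concrete: whatever page a crawler is about to fetch, the only robots.txt it looks at is the one at the top of that host. a tiny python sketch (the function name and the example.com address are just for illustration):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url):
    """Return the one robots.txt location that applies to this page's site."""
    parts = urlsplit(page_url)
    # keep scheme and host, drop the page's own path, query and fragment
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("http://www.example.com/dev/site/index.html?x=1"))
```

a robots.txt sitting in /dev/site/ would never be requested, which is why it has to live in the document root.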
the crux of robots.txt is the User-agent and Disallow directives. if you don’t want any search engine bots to spider any files on your site, the basic file looks like this:
User-agent: *
Disallow: /

however, if you only want to keep the search engines out of a specific folder, e.g. /private/, you would create the file like so:
User-agent: *
Disallow: /private/

if you don’t want google to spider a specific folder called /newsletters/, then you would use the following:
User-agent: googlebot
Disallow: /newsletters/

there are hundreds of bots that you’d need to consider, but the main ones are probably google (googlebot), yahoo (slurp), and msn (msnbot).
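you can also target multiple user-agents in one robots.txt file, with a blank line between the records. this one keeps every bot out of the whole site except googlebot, which is only kept out of /cgi-bin/ and /private/:
User-agent: *
Disallow: /

User-agent: googlebot
Disallow: /cgi-bin/
Disallow: /private/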
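if you want to double-check how a set of rules like this will be read before uploading it, python ships a robots.txt parser in its standard library. this is a sketch of how i’d test the multi-agent file; the can_crawl helper and the sample paths are made up, and real crawlers may interpret edge cases a little differently than python does.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: googlebot
Disallow: /cgi-bin/
Disallow: /private/
"""

def can_crawl(user_agent, path, robots_txt=ROBOTS_TXT):
    """Ask python's parser whether this bot may fetch this path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

for agent, path in [
    ("googlebot", "/index.html"),
    ("googlebot", "/private/x.html"),
    ("msnbot", "/index.html"),
]:
    print(agent, path, can_crawl(agent, path))
```

the record that names a bot wins over the * record, so googlebot gets everything except the two folders while msnbot gets nothing.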

there’s a great reference on user agents on wikipedia. another great resource is this robots.txt file generator.
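where privacy is concerned, a robots.txt file makes a huge difference in keeping pages out of the search results. just remember that it’s only a request that well-behaved bots honor, and the file itself is readable by anyone, so anything truly sensitive should be password protected as well.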

showing and understanding mysql processes in detail.

i’ve learned a little trick on how to determine how your mysql server is running and where to pinpoint problems in the event of a heavy load. this is useful in determining how you might want to proceed in terms of mysql optimization.
# mysql -u [adminuser] -p
mysql> show processlist;

granted, on a server with heavy volume, you might see hundreds of rows and it will scroll off the screen. here are the key elements to the processlist table: Id, User, Host, db, Command, Time, State, Info, where:
Id is the connection identifier
User is the mysql user who issued the statement
Host is the hostname of the client issuing the statement. this will be localhost in almost all cases unless the connection comes from a remote server.
db is the database being used for the particular mysql statement or query.
Command can be one of many different commands issued in the particular query. the most common occurrence on a webserver is “Sleep,” which means that the particular database connection is waiting for new directions or a new statement.
Time is the number of seconds the connection has been in its current state (for a running query, how long it has been executing when the processlist is viewed)
State is an action, event, or state of the specific mysql command and can be one of hundreds of different values.
Info will show the actual statement being run in that instance
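another useful command is:
mysql> show full processlist;
which shows the complete statement in the Info column (the plain version cuts it off after 100 characters) and is equivalent to:
# mysqladmin -u [adminuser] -p -v processlist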
this shows my specific query as:
| 4342233 | adminusername | localhost | NULL | Query | 0 | NULL | show full processlist |

or you can display each field in a row format (vertical format), like so, simply by appending \G to the end of the query:
mysql> show full processlist\G
this list is very likely preferable in the event that your data scrolls off the screen and you want to find out the specific field name of a value in your database.
******** 55. row ********
Id: 4342233
User: adminusername
Host: localhost
db: NULL
Command: Query
Time: 0
State: NULL
Info: show full processlist
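the vertical format is also the easiest one to pick apart in a script, since every line is just a field name and a value. here’s a rough python sketch; the function name is mine, and it assumes each value fits on one line (a multi-line query in Info would need more care).

```python
def parse_vertical(output):
    """Split mysql's vertical processlist output into one dict per row."""
    rows, current = [], None
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("*") and "row" in stripped:
            # a "**** N. row ****" separator starts a new record
            current = {}
            rows.append(current)
        elif current is not None and ":" in stripped:
            field, _, value = stripped.partition(":")
            current[field.strip()] = value.strip()
    return rows
```

each row comes back as a dict keyed by the field names listed earlier (Id, User, Host, db, Command, Time, State, Info), with all values left as strings.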

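you can also check how many mysql connections each user has open by running the following command (the User column is the third |-separated field; the second awk trims usernames at the first underscore so that prefixed users are grouped together):
mysqladmin -u [adminuser] -p pr | awk -F\| '{print $3}' | awk -F_ '{print $1}' | sort | uniq -c | sort -n
to see which database has the most active queries, run the same thing against the db column, which is the fifth field:
mysqladmin -u [adminuser] -p pr | awk -F\| '{print $5}' | sort | uniq -c | sort -n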
oh, and since it’s useful… here’s a recommended addition to /etc/my.cnf (under the [mysqld] section):

slave_net_timeout = 50
delayed_insert_timeout = 50

another round of fine tuning would include the following, which is good for machines with plesk:

key_buffer = 128M
max_allowed_packet = 1M
table_cache = 512
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size = 64M
thread_concurrency = 8
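to get a feel for what these numbers add up to, here’s a quick python sketch that converts the size suffixes and totals a few of them. the split into “shared” and “per connection” buffers is my own rough rule of thumb, not something mysql reports, and the per-connection buffers are only allocated when a query actually needs them.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_size(value):
    """Turn a my.cnf size such as "128M" into bytes."""
    value = value.strip().upper()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)

def total_bytes(settings, names):
    """Add up the named size settings from a {name: value} dict."""
    return sum(parse_size(settings[name]) for name in names)

SETTINGS = {
    "key_buffer": "128M", "query_cache_size": "64M",
    "sort_buffer_size": "2M", "read_buffer_size": "2M",
    "read_rnd_buffer_size": "8M",
}
# assumption: a rough grouping into shared vs. per-connection buffers
SHARED = ("key_buffer", "query_cache_size")
PER_CONNECTION = ("sort_buffer_size", "read_buffer_size", "read_rnd_buffer_size")

shared_mb = total_bytes(SETTINGS, SHARED) // UNITS["M"]
per_connection_mb = total_bytes(SETTINGS, PER_CONNECTION) // UNITS["M"]
print(f"shared: {shared_mb}M, per connection: up to {per_connection_mb}M")
```

multiply the per-connection figure by the number of busy connections you expect and compare it with the machine’s memory before copying the values over.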

the above will help you optimize your mysql server as well, but the configuration isn’t for everyone; the right values depend on how much memory the machine has and what else is running on it.