Tuesday, June 24, 2008

Python + Bash + isoinfo + mysql = Python CD Integrity Verifier

I need to verify the CDs and DVDs that I burn against a known md5sum or sha1sum, so I wrote a bunch of scripts. I am not saying that this is the best way to skin this particular cat, but it is working.

First of all a bit of background info.

This stuff only works on Linux because the commands make use of Linux tools such as isoinfo and dd. I am sure Windows command line equivalents exist...

I have a mysql database with one table in it that has the following fields:
distro_label --- Volume ID of CD or DVD
distro_name ---- Name of the CD or DVD
hash_type ------ 1 = md5sum, 2 = sha1sum
hash_detail ---- Known good md5sum or sha1sum of the particular CD or DVD

Here is an example record:
distro_label --- Slack11d1
distro_name ---- Slackware 11 Disk 1
hash_type ------ 1
hash_detail ---- a7cfcb4be158beca63af21b3b4dbc69c
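
If you want to play with the lookup without setting up MySQL, here is a minimal sketch of the same table using Python's built-in sqlite3 module as a stand-in. The table name distro is the one queried by the Python script later in this post; the column types are my assumption, since only the field names are listed above, and the lookup helper is just for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the column types are assumed.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be read by column name, like a DictCursor
conn.execute(
    """CREATE TABLE distro (
           distro_label TEXT,     -- Volume ID of the CD or DVD
           distro_name  TEXT,     -- Name of the CD or DVD
           hash_type    INTEGER,  -- 1 = md5sum, 2 = sha1sum
           hash_detail  TEXT      -- known good checksum
       )"""
)
conn.execute(
    "INSERT INTO distro VALUES (?, ?, ?, ?)",
    ("Slack11d1", "Slackware 11 Disk 1", 1, "a7cfcb4be158beca63af21b3b4dbc69c"),
)


def lookup(volume_id):
    """Return the row for a volume id, or None if the disk is unknown."""
    cur = conn.execute(
        "SELECT * FROM distro WHERE distro_label = ?", (volume_id,)
    )
    return cur.fetchone()


row = lookup("Slack11d1")
print(row["distro_name"], row["hash_detail"])
```

The placeholder in the query lets the driver do the quoting, so a strange volume label cannot break the SQL.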

In case you are wondering how I know the volume id - try this while you have a CD or DVD in your cd / dvd drive:
[~]$  isoinfo -d -i /dev/cdrom

The python script has one dependency: it needs to be able to query the database.

  • MySQL-python - Download and install from here or if you are in Fedora try: yum install MySQL-python

Retrieving the volume id from the disk.
I have a small utility bash script that does this.
# small utility to find the volume id of a cd / dvd

isoinfo -d -i /dev/cdrom \
| grep "Volume id:" \
| cut -d ":" -f 2 \
| sed "s/ //g" 

It does the following:

  1. read the iso info from the disk

  2. Find the line with the Volume id

  3. Cut out the second field delimited by ":"

  4. remove all spaces
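
The same four steps can be sketched in Python (Python 3 here) if you would rather not shell out to grep, cut and sed. This is only an illustration: parse_volume_id is a name I made up, and the sample text is hand-written in the shape that the grep pattern above expects, not captured from a real disk.

```python
def parse_volume_id(isoinfo_output):
    """Return the volume id with all spaces removed, or None if absent."""
    for line in isoinfo_output.splitlines():
        if "Volume id:" in line:
            # Same as: cut -d ":" -f 2 | sed "s/ //g"
            return line.split(":")[1].replace(" ", "")
    return None


# Hand-written sample, not real isoinfo output.
sample = "System id: LINUX\nVolume id: Slack11d1\nVolume set id: \n"
print(parse_volume_id(sample))
```

On a real system the text would come from running isoinfo -d -i /dev/cdrom and capturing its standard output.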

Reading the disk
To read the CD accurately we first need to know some details about it, namely the block size and block count required by dd. I borrowed all the technical stuff for finding them from a very useful script called rawread.sh, which I found here.
Here it is for your convenience:

# device to read, taken from the first argument ( eg: /dev/cdrom )
device=$1

blocksize=`isoinfo -d -i $device | grep "^Logical block size is:" | cut -d " " -f 5`
if test "$blocksize" = ""; then
  echo catdevice FATAL ERROR: Blank blocksize >&2
  exit 1
fi

blockcount=`isoinfo -d -i $device | grep "^Volume size is:" | cut -d " " -f 4`
if test "$blockcount" = ""; then
  echo catdevice FATAL ERROR: Blank blockcount >&2
  exit 1
fi

command="dd if=$device bs=$blocksize count=$blockcount conv=notrunc,noerror"
echo "$command" >&2
# run the read; the raw data goes to stdout
$command
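
For comparison, here is a rough Python 3 sketch of the same block size and block count lookup. The function name parse_geometry and the numbers in the sample are made up for illustration; only the two line prefixes come from the grep patterns above.

```python
def parse_geometry(isoinfo_output):
    """Return (blocksize, blockcount) as ints; raise ValueError if one is missing."""
    blocksize = blockcount = None
    for line in isoinfo_output.splitlines():
        if line.startswith("Logical block size is:"):
            blocksize = int(line.split(" ")[4])   # cut -d " " -f 5
        elif line.startswith("Volume size is:"):
            blockcount = int(line.split(" ")[3])  # cut -d " " -f 4
    if blocksize is None:
        raise ValueError("Blank blocksize")
    if blockcount is None:
        raise ValueError("Blank blockcount")
    return blocksize, blockcount


# Hand-written sample values, not taken from a real disk.
sample = "Logical block size is: 2048\nVolume size is: 332800\n"
blocksize, blockcount = parse_geometry(sample)
print("dd would read %d bytes" % (blocksize * blockcount))
```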


Here is my modified version to suit a call from my python script.


# checksum program to use ( md5sum or sha1sum ), taken from the first argument
checksumtype=$1
# device to read ( assumed to be the same drive that volumeid.sh uses )
device=/dev/cdrom

#Find details of the device
blocksize=`isoinfo -d -i $device | grep "^Logical block size is:" | cut -d " " -f 5`
if test "$blocksize" = ""; then
 echo catdevice FATAL ERROR: Blank blocksize >&2
 exit 1
fi

blockcount=`isoinfo -d -i $device | grep "^Volume size is:" | cut -d " " -f 4`
if test "$blockcount" = ""; then
 echo catdevice FATAL ERROR: Blank blockcount >&2
 exit 1
fi

command="dd if=$device bs=$blocksize count=$blockcount conv=notrunc,noerror"

# execute the command to read the disk and pipe through md5sum or sha1sum
result=`$command | $checksumtype`

#get the checksum
checksumresult=`echo $result | cut -d " " -f1`

echo $checksumresult

This script does the same things as rawread.sh but lets the user specify a checksum type as a command line argument. When called from within our python script this bash script will simply return the real checksum of the disk in the cd / dvd device.

  1. Store the checksum type in a variable.

  2. Find the block size and block count values for the disk.

  3. Format the dd command

  4. Execute the dd command and pipe into checksum type. eg: dd .... | md5sum

  5. cut the resulting checksum from the output of the above.

  6. echo just the checksum
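
The dd and md5sum pipeline in steps 3 to 6 can also be approximated in pure Python 3 with hashlib. This is a sketch under my own assumptions, not what the scripts in this post do: checksum_prefix is a hypothetical helper, and the demo hashes a small temporary file so that it runs without a disk in the drive.

```python
import hashlib
import os
import tempfile


def checksum_prefix(path, blocksize, blockcount, algorithm="md5"):
    """Hash the first blocksize * blockcount bytes of path, like dd | md5sum."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as device:
        for _ in range(blockcount):
            block = device.read(blocksize)
            if not block:
                break  # the file ended before blockcount blocks were read
            digest.update(block)
    return digest.hexdigest()


# Demo on a temporary file: two 2048 byte blocks followed by extra data
# that should NOT be part of the checksum.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"A" * 4096 + b"trailing data")
print(checksum_prefix(tmp.name, 2048, 2))
os.remove(tmp.name)
```

For a real disk the call would look something like checksum_prefix("/dev/cdrom", blocksize, blockcount, "sha1"), with the two numbers taken from isoinfo. Limiting the read to the block count keeps anything that sits after the end of the image out of the checksum.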

The python script.
I have tried to comment it so it all makes sense. I look forward to a lively discussion in the comments. I will soon know how many of my readers actually care about python or, more to the point, how many readers I have...
#!/usr/bin/env python

# MySQLdb is the only dependency required for this script.
# popen2 comes with standard python 2.5

import MySQLdb
import popen2

class Verify:
    # Constructor uses volumeid.sh to find the volumeid of the cdrom
    # and checks it against the database.
    # If no match is found then an error is generated otherwise
    # checksum details are stored in class variables.
    def __init__(self):
        ## Get the volumeid
        # fout = stdout
        # fin = stdin
        # ferr = stderr
        (fout, fin, ferr) = popen2.popen3('./volumeid.sh')
        id = ''
        ## Check for errors
        errLineCount = 0
        while True:
            if ferr.readline():
                errLineCount += 1
            else:
                break
        if errLineCount > 0:
            print "Errors were found."
        ## We are reading each character of the standard out because
        ## we do not wish to capture the newline at the end.
        ## Stop at the first newline or at end of output.
        while True:
            c = fout.read(1)
            if c and c != "\n":
                id += c
            else:
                break
        ## Store the volumeid in the class variable.
        self.volumeid = id
        ## Clean up.
        fout.close()
        fin.close()
        ferr.close()
        ## Establish mysql connection and query database.
        conn = MySQLdb.connect(host = 'localhost',
                               user = 'resu',
                               passwd = 'drowsapp',
                               db = 'cdburner' )
        cursor = conn.cursor(MySQLdb.cursors.DictCursor)
        ## Pass the volumeid as a parameter so MySQLdb does the quoting.
        sql = "SELECT * FROM distro WHERE distro_label = %s"
        cursor.execute(sql, (self.volumeid,))

        row = cursor.fetchone()

        ## TO DO: Check for non existent entry in database and throw error.

        ## Find the required checksum type from the database.
        if row["hash_type"] == 1:
            self.checksumtype = 'md5sum'
        else:
            self.checksumtype = 'sha1sum'
 ## Find the known checksum from the database
        self.goodchecksum = row['hash_detail']
 ## Print some information to the user.
        print "Found [ %s ] in cd drive" % row['distro_name']
        print "Good Checksum = %s" % self.goodchecksum
        print "..."
        print "performing %s check on disk now..." % self.checksumtype
        ## Clean up.
        cursor.close()
        conn.close()
        ##  Read checksum from disk
        ## TO DO: change this to check for errors like the popen3 command above.
        cmd = "./verify.sh %s" % self.checksumtype
        (fout, fin) = popen2.popen2(cmd)
        checksum = ''
        ## Same as above in terms of not wanting the newline at the end of stdout.
        while True:
            c = fout.read(1)
            if c and c != "\n":
                checksum += c
            else:
                break
        ## Clean up
        fout.close()
        fin.close()

        ## Compare the checksums and report!
        print "Checksum found: %s" % checksum
        if checksum == self.goodchecksum:
            print "Checksums match."
        else:
            print "Checksums do NOT match."

## If this script is being executed then do this stuff.
## This block allows us to use the above as a class or library or as a simple script.
if __name__ == "__main__":
    print "DISK Verifier -- Console Application."
    print "by David Latham ( The Linux CD Store ) 2008"
    v = Verify()
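
A note for anyone trying this on a newer Python: the popen2 module was deprecated in Python 2.6 and removed in Python 3, where the subprocess module takes its place. Below is a minimal sketch of how the calls to the two bash scripts might look with subprocess (Python 3.7 or later, because of capture_output). The helper name run_and_capture is made up, and the demo runs a stand-in child process instead of volumeid.sh so that it works anywhere.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run a command; return (stdout without trailing newline, stderr, exit code)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.stdout.rstrip("\n"), proc.stderr, proc.returncode


# Stand-in child process; the real call would be something like
# run_and_capture(["./volumeid.sh"]) or run_and_capture(["./verify.sh", "md5sum"]).
out, err, code = run_and_capture([sys.executable, "-c", "print('Slack11d1')"])
print(out, code)
```

Because the whole output arrives as one string, there is no need to read it a character at a time to avoid the trailing newline, and the exit code gives a simple way to check for errors.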

All in all these scripts join together to give a non-technical user ( ie: My lovely wife ) the ability to verify Linux distros before she ships them out. Want to know more? Check out: http://www.thelinuxcdstore.com.
