Case sensitivity in MySQL

It is sometimes necessary to perform a case-sensitive query for text stored in a MySQL table. By default, searches are case insensitive.

In MySQL, the character set defines what kind of characters you can store, but not how they are compared or sorted. This is determined by the character set’s collation setting.

For example, take the following table:

CREATE TABLE `CHEM_FRAGMENT` (
`CFR_ID` bigint(20) NOT NULL auto_increment,
`CFR_SMILES` varchar(255) NOT NULL,
...
) ENGINE=InnoDB DEFAULT CHARSET=latin1

The character set is ‘latin1′. The collation settings are not shown because they are set to the default value. What’s that? Well:


mysql> show collation where charset = 'latin1';

+-------------------+---------+----+---------+----------+---------+
| Collation         | Charset | Id | Default | Compiled | Sortlen |
+-------------------+---------+----+---------+----------+---------+
| latin1_german1_ci | latin1  |  5 |         | Yes      |       1 |
| latin1_swedish_ci | latin1  |  8 | Yes     | Yes      |       1 |
| latin1_danish_ci  | latin1  | 15 |         | Yes      |       1 |
| latin1_german2_ci | latin1  | 31 |         | Yes      |       2 |
| latin1_bin        | latin1  | 47 |         | Yes      |       1 |
| latin1_general_ci | latin1  | 48 |         | Yes      |       1 |
| latin1_general_cs | latin1  | 49 |         | Yes      |       1 |
| latin1_spanish_ci | latin1  | 94 |         | Yes      |       1 |
+-------------------+---------+----+---------+----------+---------+

8 rows in set (0.00 sec)

The default collation is ‘latin1_swedish_ci’. Ignoring everything else, the important thing is the ‘_ci’ at the end, which stands for ‘case insensitive’. So change it! ‘latin1_general_cs’ is case sensitive, and so is ‘latin1_bin’.

To alter the collation for an entire table:

ALTER TABLE `THE_TABLE` COLLATE latin1_general_cs;

To alter the collation for a single column:

ALTER TABLE `THE_TABLE` MODIFY `THE_COLUMN` varchar(760) default NULL COLLATE latin1_general_cs;

How to recover from a tarbomb incident, and why not to do it

Tarbombs are irritating. I just experienced a minor one downloading the project.net version 9 beta 1 tarball. My working directory was filled with install scripts and a file called “ping.html” that I was fairly sure I didn’t care about.

There’s an easy way to fix it using xargs, that I found via google on Toolbox for IT, a new site for me.

All you do is ask tar to list the contents of the tarball and use xargs to delete those files. Look how easy it seems:

$ tar ztf project_net_installer_linux90b1.tar.gz | xargs rm -rf

HOWEVER. The first entry in this particular tarball is ‘./‘. So the first thing running that script did was to delete everything in my working directory (and tell me that it couldn’t delete the working directory itself). Awesome.

Nothing of value was lost, though I did have to redownload the tarball. And this is nowhere near as elegant, but to save youself from this one could try

$ tar ztf project_net_installer_linux90b1.tar.gz | egrep -v '^./$' | xargs rm -rf

It worked for me.

Working with SDFs using MX and groovy

Rich Apodaca recently introduced MX, a lightweight cheminformatics toolkit. He also recently published a quick demo to demonstrate how easy it was to use MX to read and write SDFiles using jruby to script it.

I wanted to replicate the functionality in groovy, another scripting language that runs on the JVM — one I’ve recently been using quite extensively (in conjunction with its web development framework called, inevitably, grails).

Here’s what I did to get this working

  1. Download mx
    $ wget wget http://mx-java.googlecode.com/files/mx-0.107.0.jar
  2. Copy to ~/.groovy/lib
    $ cp mx-0.107.0.jar ~/.groovy/lib
  3. Test reading an SD file in groovy (this is transliterated from the example provided by Mr Apodaca)
    $ groovysh
    Groovy Shell (1.5.6, JVM: 1.6.0_0-b12)
    Type 'help' or '\h' for help.
    -------------------------------------------------------------------------------
    groovy:000> import com.metamolecular.mx.io.mdl.SDFileReader
    groovy:000> r = new SDFileReader("example.sdf")
    ===> com.metamolecular.mx.io.mdl.SDFileReader@4e2892b
    groovy:000> r.nextRecord()
    ===> null
    groovy:000> m = r.molecule
    ===> com.metamolecular.mx.model.DefaultMolecule@64c272bc
    groovy:000> m.countAtoms()
    ===> 20
    groovy:000> r.keys.each{ key -> println "$key : ${r.getData(key)}" }
    Mol Weight : 275.2203
    Formula : C11H9N5O4
    ===> [Mol Weight, Formula]

There you have it! I am impressed with the relatively straightforward API, though it is rather tempting to wrap it inside a groovy builder to make a mini-DSL. SDFileBuilder has a certain ring to it…

Follow

Get every new post delivered to your Inbox.