Friday, 20 July 2007

Perl, unicode, utf-8, mysql

Handling Unicode/UTF-8 in Perl is fairly simple once you understand the "two kinds of strings" approach.

By default, Perl 5.8 handles UTF-8 data as a plain sequence of bytes (1 to 4 bytes per character). You can "compress" such octets into a Unicode (character) string with:
$wide_char_string = Encode::decode_utf8($octets)

The decoded string has the UTF-8 (Unicode) flag set; you can check it with:
utf8::is_utf8($wide_char_string)

If the data contains Latin-1, Latin-2, Cyrillic or other non-ASCII characters, then after decoding:
length($octets) > length($wide_char_string)
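Here is a minimal sketch of the decoding step. It assumes only the core Encode module, and the sample word and variable names are illustrative. The octets are written with \x escapes so that the source file itself stays pure ASCII:

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# UTF-8 octets for "caf" followed by U+00E9 (e with acute):
# 5 bytes that represent 4 characters.
my $octets = "caf\xc3\xa9";

# Decode the octets into a character ("wide") string.
my $wide_char_string = decode_utf8($octets);

printf "octets: length %d, flagged %s\n",
    length($octets), utf8::is_utf8($octets) ? "yes" : "no";
printf "chars:  length %d, flagged %s\n",
    length($wide_char_string), utf8::is_utf8($wide_char_string) ? "yes" : "no";

# Encoding goes the other way and gives back the original octets.
my $round_trip = encode_utf8($wide_char_string);
print "round trip ok\n" if $round_trip eq $octets;
```

The two-byte sequence for U+00E9 collapses into one character. That is why the decoded string is shorter than the octet string.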

Remember: the UTF-8 flag does not mean the string MUST contain wide characters.
A flagged string can contain wide characters (ord > 255), but it can also hold only characters below 256. One example is UTF-8 octets that were "unpacked" into a flagged string without being decoded.

Use Test::utf8 to check whether a string contains wide characters.
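If Test::utf8 is not at hand, a core-only check is easy to sketch. The helper has_wide_chars below is hypothetical and not part of Test::utf8. The example shows two things. First, utf8::upgrade sets the flag on a byte string without creating any wide characters. Second, the flag alone does not change what eq sees:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string has a character above U+00FF.
sub has_wide_chars {
    my ($str) = @_;
    return $str =~ /[^\x00-\xFF]/ ? 1 : 0;
}

# A plain byte string holding undecoded UTF-8 octets for U+00E9.
my $bytes = "caf\xc3\xa9";

# utf8::upgrade turns the flag on but keeps the same five characters
# (c, a, f, 0xC3, 0xA9): flagged, yet no wide characters.
my $upgraded = $bytes;
utf8::upgrade($upgraded);

# A real wide character: U+263A (white smiling face).
my $smiley = "\x{263A}";

for my $case ([bytes => $bytes], [upgraded => $upgraded], [smiley => $smiley]) {
    my ($name, $str) = @$case;
    printf "%-9s flagged=%d wide=%d\n",
        $name, utf8::is_utf8($str) ? 1 : 0, has_wide_chars($str);
}

# Same characters, different internal representation: eq compares characters.
print "bytes and upgraded compare equal\n" if $bytes eq $upgraded;
```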

FAQ / typical problems:

- Make sure all string sources in your program use the same representation: either Unicode character (wide) strings or UTF-8 byte strings. Wide-character strings are usually a better approach than byte strings (see perldoc Encode). All inputs and outputs, such as files, DBI and network connections, need to be converted to and from your chosen internal format.

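- Do not use the use utf8 pragma unless you really need it; it only declares that your source code is UTF-8 encoded. With it, string literals containing non-ASCII characters get the UTF-8 flag, but that does not mean they contain wide characters!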

- (octet_string eq unicode_string) == false whenever the text contains non-ASCII characters.
You cannot compare such strings without first decoding or encoding one side: they hold different sequences of characters.
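Here is a small sketch of this FAQ advice. It assumes core Encode, and an in-memory filehandle stands in for a real file or socket. The pattern is: decode at the input boundary, compare characters with characters, then encode again on the way out. The variable names are illustrative:

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# Simulated input: raw UTF-8 octets for the German word "Gruesse"
# written with U+00FC and U+00DF (7 octets, 5 characters).
my $input_octets = "Gr\xc3\xbc\xc3\x9fe";

# The same word built as a character string inside the program.
my $expected = "Gr\x{fc}\x{df}e";

# Octets vs characters: not equal, even though they "look" the same.
print "raw eq: ", ($input_octets eq $expected ? "equal" : "different"), "\n";

# Decode at the input boundary, then compare characters with characters.
my $input_chars = decode_utf8($input_octets);
print "decoded eq: ", ($input_chars eq $expected ? "equal" : "different"), "\n";

# A file handle can do the decoding with an :encoding(UTF-8) layer;
# here an in-memory handle stands in for a real file.
open(my $fh, '<:encoding(UTF-8)', \$input_octets) or die "open: $!";
my $line = <$fh>;
close($fh);
print "layer eq: ", ($line eq $expected ? "equal" : "different"), "\n";

# Encode at the output boundary before writing to a byte-oriented sink.
my $output_octets = encode_utf8($input_chars);
print "output matches input: ", ($output_octets eq $input_octets ? "yes" : "no"), "\n";
```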

Perl, utf8, MySQL

There are two approaches to getting Unicode working with MySQL:

1) After connecting to the database, run $dbh->do("SET NAMES 'utf8'");
All strings come back as UTF-8 octets, without the UTF-8 flag.

2) Connect through DBI with the mysql_enable_utf8 attribute (DBD::mysql version 4 or later).
All strings come back as decoded character strings (wide chars) with the UTF-8 flag.

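Only the second approach works correctly with the AutoReconnect flag. A SET NAMES issued after connecting applies only to that session and is lost when the driver reconnects.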

Good news: for inserts and updates, it makes no difference whether you pass octets or wide-character strings.
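Here is a sketch of both MySQL approaches, assuming DBI and DBD::mysql are installed. The helper names and the MYSQL_DSN/MYSQL_USER/MYSQL_PASS environment variables are hypothetical. The script only connects when MYSQL_DSN is set, so it can be run without a database:

```perl
use strict;
use warnings;

# Approach 2: attributes passed to connect() (DBD::mysql version 4 or later).
my %utf8_attrs = (
    RaiseError           => 1,
    AutoCommit           => 1,
    mysql_enable_utf8    => 1,  # text columns come back as decoded character strings
    mysql_auto_reconnect => 1,  # setting is tied to connect(), not to one session
);

# Hypothetical helper for approach 2; DSN and credentials are placeholders.
sub connect_utf8 {
    my ($dsn, $user, $pass) = @_;
    require DBI;    # loaded at run time so the sketch compiles without DBI
    return DBI->connect($dsn, $user, $pass, { %utf8_attrs });
}

# Hypothetical helper for approach 1: plain connection plus SET NAMES.
sub connect_set_names {
    my ($dsn, $user, $pass) = @_;
    require DBI;
    my $dbh = DBI->connect($dsn, $user, $pass, { RaiseError => 1 });
    $dbh->do("SET NAMES 'utf8'");   # session-level: lost after a reconnect
    return $dbh;
}

# Only talk to a server when a DSN is supplied, e.g.
#   MYSQL_DSN="dbi:mysql:database=test;host=localhost"
if (my $dsn = $ENV{MYSQL_DSN}) {
    my $dbh = connect_utf8($dsn, $ENV{MYSQL_USER}, $ENV{MYSQL_PASS});
    my ($charset) = $dbh->selectrow_array('SELECT @@character_set_client');
    print "character_set_client: $charset\n";
    $dbh->disconnect;
}
```

Passing the attribute to connect() ties the encoding choice to the connection setup itself. That is consistent with the AutoReconnect note in this post.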

More to read

See also: Martin Fowler - utf8 in perl

Linux - Mac OsX - Redhat - Centos - Ubuntu - FreeBSD - Solaris - Windows commands equivalents

Updated [2016-02]: OsX/Mac/Darwin/Yosemite/El Capitan
Update: Added Solaris (both OpenSolaris and SunOS) and Windows (where applicable) coverage for this set of commands.
Update: Coverage for redhat-fedora-centos Linux line / yum package manager.

Systems covered:
Linux groups:
* Redhat based: Redhat, Fedora, Centos (yum/rpm packages)
* Debian based: Debian, Ubuntu, Kubuntu (dpkg/.deb packages)
Mac OsX: Darwin/Yosemite/El Capitan
OsX is usually very close to BSD systems, but sometimes, surprisingly, has Linux-style commands and tools.
Windows: around XP/2007 and newer


Check which groups I belong to:
{linux, osx, freebsd, solaris}$ groups
{windows - active directory}> dsquery user -samid %USERNAME%|dsget user -memberof

Disk, filesystem

Disk usage
{linux, osx, bsd}$ du -sh

Show the size of each subdirectory of the current directory:
{linux}$ du --max-depth=1
{osx, bsd}$ du -d1
{SunOS}$ du

Typical approach to find biggest directories/files on disk:
{linux}$ du --max-depth=1 -kx|sort -n
{osx, bsd}$ du -d1 -kx|sort -n

Find certain kinds of files (the regex is matched against the whole path, so no begin/end anchors are needed):
{linux}$ find . -regextype posix-extended -type f -regex ".*\.(java|class)"
{osx, bsd}$ find -E . -type f -regex ".*\.(java|class)"

Show open files and programs:
{linux, osx}$ lsof
{freebsd}$ fstat

Real-time disk usage (is there something that shows results for every disk in Linux?):
{linux}$ vmstat 3
{osx, bsd}$ iostat 3

Swap info:
{linux}$ free
{freebsd}$ swapinfo
{osx}$ vm_stat
{osx}$ top -l 1 -s 0 -n 0


Show open ports and apps connected to them:
{linux}$ netstat -apne --inet
{osx}$ lsof -i
{freebsd}$ sockstat
{SunOS}$ netstat
{windows}$ netstat -b
{windows}$ netstat -b -v   # slower, but also shows the sequence of components involved

Kernel issues

Show loaded modules:
{linux}$ lsmod
{osx}$ kextstat
{freebsd}$ kldstat

Load kernel module:
{linux}$ modprobe SomeModule
{freebsd}$ kldload SomeModule

Remove loaded module:
{linux}$ rmmod SomeModule
{freebsd}$ kldunload SomeModule

Program development

Trace the system calls of a program:
{linux}$ strace
{osx}$ dtruss   (DTrace-based)
{freebsd}$ truss
  (strace is also available in /usr/ports/devel/strace)

Shared libraries - show all library paths and libraries:
{linux}$ ldconfig -p
{freebsd}$ ldconfig -r

Packages management

Each Linux distribution does this its own way. I'll focus on Debian-based distributions such as Debian, Ubuntu and Kubuntu.

Find which package a file belongs to:
{freebsd}$ pkg_info -W /path/to/checked_file
{debian ubuntu}$ dpkg -S /path/to/checked_file
{redhat centos}$ rpm -qf /path/to/file
{osx, for brew}$ ls -l `which node`|perl -lane '{print $F[-1]}'
On OsX this shows the original Homebrew package/file that the command (here node) is symlinked to.

Is there a package like...? (In (K)ubuntu you can also use friendlier tools such as Synaptic, apt-get or KPackageKit.)
{debian ubuntu}$ apt-cache search your_name
{freebsd}$ cd /usr/ports; make search key=your_name
                          make search name=pear display=name,path
    you can also try a plain locate (package names only):
{freebsd}$ locate -i your_name | grep "/usr/ports/"
{redhat centos}$ yum search name
{redhat centos}$ yum provides name

Install a binary package
{debian ubuntu}$ apt-get install package_name
{redhat centos}$ yum install name
{freebsd}$ pkg_add -r package_name
{windows}$ msiexec /i package.msi
{osx}$ brew install package_name   (Homebrew)
(In FreeBSD, binary packages are built at distribution release time; unfortunately there are no binary upgrades for a released version.)

Update binary packages
{debian ubuntu}$ apt-get update; apt-get upgrade
{redhat centos}$ yum update

Install a package from sources
{debian ubuntu}$ apt-src
{freebsd}$ cd /usr/ports/path/package; make install clean
{osx}$ use brew or macports

...The more I see the less I know...

Friday, 13 July 2007

YAPC Vienna 2007 - my talk

I will give a Catalyst talk in the Web track at Yet Another Perl Conference in Vienna, 28th to 30th August 2007.

More on the conference site.