Oracle Tip: Create linguistic helper functions in SQL
Takeaway: You can generalize simple tasks in a globalized environment by breaking down a particular language into a set of grammar rules and exceptions to those rules. Scott Stephens demonstrates how you can you do just that in SQL.
In a globalized environment, many simple tasks become more difficult because of the variety of grammar rules in different languages. You can generalize these tasks by breaking down a particular language into a set of grammar rules and exceptions to those rules (as well as a base vocabulary). In some languages (e.g., Perl and Java), there are public domain modules that do linguistic transformations on text.For a somewhat simple example, let's take the trick of converting a number into its spelled-out version (e.g., for writing checks and legal contracts). This trick has been around since the early days of Oracle and generally goes like this:
select to_char(to_date(12345,'J'),'Jsp') from
dual;
Twelve Thousand Three Hundred Forty-Five
The TO_DATE function converts the number into a date using the Julian date format. Then, the TO_CHAR takes the date and formats it as a string of spelled-out numbers for the Julian date again. This trick has some limitations.
First, Julian dates are only valid in Oracle until the year 9999, so the largest possible value is 5373484. The lower end of the range is 1 or 4712BC. Also, since there was no year "zero," it isn't possible to generate the text "zero" without an extra DECODE or CASE statement. The third big limitation is that it ignores your NLS settings. No matter which language you use, the numbers are always spelled out in American English. The same problem exists for simple operations such as spelling out the day. For instance, try to generate the phrase "Cinco de Mayo" in Spanish:
alter session set nls_language = 'SPANISH';
select to_char(to_date('0505','MMDD'),'Ddspth Month') from
dual;
Fifth Mayo
The grammar involved in generating numbers for most languages is actually fairly simple. Most of the work involves gathering all the different grammar rules and building up enough rules to generate the correct grammar patterns. (I'll avoid the problems involved in trying to match number and gender for now.)
First, I'll create two tables: one to hold the base words and exceptions, and a second to hold some simple template patterns for generating text. If a number exists in the first table, then my linguistic function will return that text. For every other number, I'll try to match it against a series of patterns and apply a template to generate the correct text.
create table numwords
(
lang varchar2(2),
num integer,
word varchar2(30),
constraint numwords_pk primary key
(lang,num)
);
create table numrules
(
lang varchar2(2),
seq integer,
p1 integer,
p2 integer,
temp0 varchar2(30),
temp varchar2(30),
constraint numrules_pk primary key
(lang,seq)
);
Here is the code needed to generate the spelled-out version of a number. I'll stick to cardinal numbers (such as one, two, and three); however, these functions can be used to generate ordinals (first, second, third) and plural versions by listing more exceptions and patterns for each language:
REM -- create a table of base words and
exceptions
create or replace package genword
as
function get_word(n number) return
varchar2;
function cardinal(n number) return
varchar2;
end genword;
/
show errors;
create or replace package body genword
as
function get_word(n number) return
varchar2
is
l_word
numwords.word%type;
begin
select word into
l_word from numwords
where lang =
sys_context('userenv','lang') and num = n;
return
l_word;
exception
when no_data_found
then
return
null;
end;
--
function cardinal(n number) return
varchar2
is
p
number; -- power
t varchar2(30); --
template
v
number; -- lower
portion
l_word numwords.word%type;
begin
if n < 0
then
l_word
:= get_word(-1);
if
l_word is null then
return
null;
end
if;
return
l_word||' '||cardinal(-n);
end if;
l_word :=
get_word(n);
if l_word is not
null then
return
l_word;
end if;
for row in
(
select
* from numrules
where
lang = sys_context('userenv','lang')
order
by seq
)
loop
if
length(n) <= row.p1 + row.p2 then
p
:= power(10,row.p2);
v
:= mod(n,p);
if
row.seq = 0 then
if
n < 20 then
return
replace(row.temp0,'~2',cardinal(v));
end
if;
else
if
v = 0 then
return
replace(row.temp0,'~1',cardinal(n/p));
else
return
replace(replace(nvl(row.temp,'~1 ~2'),
'~1',cardinal(n-v)),
'~2',cardinal(v));
end
if;
end
if;
end
if;
end loop;
return 'NUMBER TOO
LARGE';
end cardinal;
end genword;
/
show errors;
Finally, here is some data I gathered for English and German. I also copy the data from American English to British English and use the terms "thousand million" and "million million" instead of "billion" and "trillion" (in the American sense), which are often a source of confusion outside the United States. This is enough data to generate spelled-out versions of whole integers between -999,999,999,999 and 999,999,999,999, including zero.
REM -- American English
insert into numwords values ('US',-1,'negative');
insert into numwords values ('US',0,'zero');
insert into numwords values ('US',1,'one');
insert into numwords values ('US',2,'two');
insert into numwords values ('US',3,'three');
insert into numwords values ('US',4,'four');
insert into numwords values ('US',5,'five');
insert into numwords values ('US',6,'six');
insert into numwords values ('US',7,'seven');
insert into numwords values ('US',8,'eight');
insert into numwords values ('US',9,'nine');
insert into numwords values ('US',10,'ten');
insert into numwords values ('US',11,'eleven');
insert into numwords values ('US',12,'twelve');
insert into numwords values ('US',13,'thirteen');
insert into numwords values ('US',15,'fifteen');
insert into numwords values ('US',18,'eighteen');
insert into numwords values ('US',20,'twenty');
insert into numwords values ('US',30,'thirty');
insert into numwords values ('US',40,'forty');
insert into numwords values ('US',50,'fifty');
insert into numwords values ('US',80,'eighty');
insert into numwords select 'GB',num,word from numwords where lang
= 'US';
insert into numrules values ('US',0,1,1,'~2teen',null);
insert into numrules values ('US',1,1,1,'~1ty','~1-~2');
insert into numrules values ('US',2,1,2,'~1 hundred',null);
insert into numrules values ('US',3,3,3,'~1 thousand',null);
insert into numrules values ('US',4,3,6,'~1 million',null);
insert into numrules select 'GB',seq,p1,p2,temp0,temp
from numrules where lang = 'US';
insert into numrules values ('US',5,3,9,'~1 billion',null);
insert into numrules values ('GB',5,3,9,'~1 thousand
million',null);
insert into numrules values ('US',6,3,12,'~1
trillion',null);
insert into numrules values ('GB',6,3,12,'~1 million
million',null);
REM - German
insert into numwords values ('D',-1,'negativ');
insert into numwords values ('D',0,'null');
insert into numwords values ('D',1,'eins');
insert into numwords values ('D',2,'zwei');
insert into numwords values ('D',3,'drei');
insert into numwords values ('D',4,'vier');
insert into numwords values ('D',5,unistr('f\00FCnf'));
insert into numwords values ('D',6,'sechs');
insert into numwords values ('D',7,'sieben');
insert into numwords values ('D',8,'acht');
insert into numwords values ('D',9,'neun');
insert into numwords values ('D',10,'zehn');
insert into numwords values ('D',11,'elf');
insert into numwords values ('D',12,unistr('zw\00F6lf'));
insert into numwords values ('D',13,'dreizehn');
insert into numwords values ('D',16,'sechzehn');
insert into numwords values ('D',17,'siebzehn');
insert into numwords values ('D',20,'zwanzig');
insert into numwords values ('D',21,'einundzwanzig');
insert into numwords values ('D',30,unistr('drei\00DFig'));
insert into numwords values
('D',31,unistr('einunddrei\00DFig'));
insert into numwords values ('D',41,'einundvierzig');
insert into numwords values
('D',51,unistr('einundf\00FCnfzig'));
insert into numwords values ('D',60,'sechzig');
insert into numwords values ('D',70,'siebzig');
insert into numwords values ('D',100,'hundert');
insert into numwords values ('D',1000,'tausend');
insert into numwords values ('D',1e6,'eine Million');
insert into numwords values ('D',1e9,'eine Milliarde');
insert into numwords values ('D',1e12,'eine Billion');
insert into numrules values ('D',0,1,1,'~2zehn',null);
insert into numrules values ('D',1,1,1,'~1zig','~2und~1');
insert into numrules values ('D',2,1,2,'~1hundert','~1~2');
insert into numrules values ('D',3,3,3,'~1tausend','~1 und
~2');
insert into numrules values ('D',4,3,6,'~1 Millionen',null);
insert into numrules values ('D',5,3,9,'~1
Milliarden',null);
insert into numrules values ('D',6,3,12,'~1 Billionen',null);
Here are some simple SQL statements that use the functions and the above data. You can try setting the language to 'GERMAN', or 'ENGLISH' to try out the other two sets of data:
SQL> alter session set nls_language =
'AMERICAN';
SQL> select genword.cardinal(123456789) from dual;
one hundred twenty-three million four hundred fifty-six thousand
seven hundred
eighty-nine
TechRepublic's Oracle newsletter covers automating Oracle utilities, generating database alerts, solving directed graph problems, and more. Automatically subscribe today!
SponsoredWhite Papers, Webcasts, and Downloads
- Live Webcast: Enterprise Search Architectures of the Future Google
- Microsoft SQL Server and Dell EqualLogic PS Series Solution Brief Dell EqualLogic
- New Release - Diskeeper 2008 with InvisiTasking: It's Smart. It's Transparent. It Will Take Your PC from Zero to Sixty--Automatically! Diskeeper
- Improving Backup Performance with Defragmentation Diskeeper
- Nextel Direct Connect Fact Sheet Sprint
Article Categories
- Security
- Security Solutions, IT Locksmith
- Networking and Communications
- E-mail Administration NetNote, Cisco Routers and Switches
- CIO and IT Management
- Project Management, CIO Issues, Strategies that Scale
- Desktops, Laptops & OS
- Windows 2000 Professional, Microsoft Word, Microsoft Excel, Microsoft Access, Windows XP,
- Data Management
- Oracle, SQL Server
- Servers
- Windows NT, Linux NetNote, Windows Server 2003
- Career Development
- Geek Trivia
- Software/Web Development
- Web Development Zone, Visual Basic, .NET


