gcc/libjava/scripts/encodings.pl

# encodings.pl - Download IANA text and compute alias list.
# Assumes you are running this program from gnu/gcj/convert/.
# Output suitable for direct inclusion in IOConverter.java.

# Map IANA canonical names onto our canonical names.
%map = (
	'ANSI_X3.4-1968' => 'ASCII',
	'ISO_8859-1:1987' => '8859_1',
	'UTF-8' => 'UTF8',
	'Shift_JIS' => 'SJIS',
	'Extended_UNIX_Code_Packed_Format_for_Japanese' => 'EUCJIS',
	'UTF16-LE' => 'UnicodeLittle',
	'UTF16-BE' => 'UnicodeBig' 
	);

if ($ARGV[0] eq '')
{
    $file = 'character-sets';
    if (! -f $file)
    {
	# Fetching the file from within Perl is too painful; shell out to wget.
	system 'wget -o .wget-log http://www.iana.org/assignments/character-sets';
    }
}
else
{
    $file = $ARGV[0];
}

# Include canonical names in the output.
foreach $key (keys %map)
{
    $output{lc ($key)} = $map{$key};
}

open (INPUT, "< $file") || die "couldn't open $file: $!";

$body = 0;
$current = '';
while (<INPUT>)
{
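    # chomp strips only a trailing newline; chop would eat a real
    # character if the last line had none.
    chomp;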
    $body = 1 if /^Name:/;
    next unless $body;

    if (/^$/)
    {
	$current = '';
	next;
    }

    ($type, $name) = split (/\s+/);
    # Encoding names are case-insensitive.  We do all processing on
    # the lower-case form.
    my $lower = lc ($name);
    if ($type eq 'Name:')
    {
	$current = $map{$name};
	if ($current)
	{
	    $output{$lower} = $current;
	}
    }
    elsif ($type eq 'Alias:')
    {
	# The IANA list has some ugliness: skip empty aliases and the
	# placeholder alias `None'.
	if ($name ne '' && $lower ne 'none' && $current)
	{
	    $output{$lower} = $current;
	}
    }
}

close (INPUT);

foreach $key (sort keys %output)
{
    print "    hash.put (\"$key\", \"$output{$key}\");\n";
}
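
# Worked example (an illustrative sketch; the stanza below is modeled on
# the IANA file format and is an assumption, not quoted from the real
# list).  Given this input:
#
#   Name: Shift_JIS
#   MIBenum: 17
#   Alias: MS_Kanji
#   Alias: csShiftJIS
#
# the `Name:' line selects our canonical name `SJIS' via %map, and each
# `Alias:' line is lower-cased and mapped to it.  Lines with any other
# tag (such as `MIBenum:') are ignored, and the blank line that ends the
# stanza resets $current.  The output would then include:
#
#     hash.put ("csshiftjis", "SJIS");
#     hash.put ("ms_kanji", "SJIS");
#     hash.put ("shift_jis", "SJIS");
#
# in addition to one line for each lower-cased key of %map.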

# Revision history for scripts/encodings.pl, from the blame annotations:
#
# r35432   2000-08-02  New file.
# r37190   2000-11-01  Added `ASCII' alias.
# r43566   2001-06-26  Generate lower-case names.  Updated URL for
#                      `character-sets' file.
# r106490  2005-11-04  PR libgcj/14358, libgcj/24552: Recognize 'none',
#                      not 'NONE'.  Include canonical names in output.
#                      (%map): Added UnicodeLittle and UnicodeBig.