mirror of
https://gitlab.freedesktop.org/uchardet/uchardet.git
synced 2025-12-06 16:56:40 +08:00
LangModels: add Polish support.
With the following encodings: ISO-8859-2, ISO-8859-13, ISO-8859-16, Windows-1250, IBM852, MAC-CENTRALEUROPE. Test text from https://pl.wikipedia.org/wiki/Zofia_Holszańska
This commit is contained in:
parent
f314b76c0a
commit
3401ac70d0
154
script/BuildLangModelLogs/LangPolishModel.log
Normal file
154
script/BuildLangModelLogs/LangPolishModel.log
Normal file
@ -0,0 +1,154 @@
|
|||||||
|
= Logs of language model for Polish (pl) =
|
||||||
|
|
||||||
|
- Generated by BuildLangModel.py
|
||||||
|
- Started: 2016-09-21 17:06:43.735784
|
||||||
|
- Maximum depth: 5
|
||||||
|
- Max number of pages: 100
|
||||||
|
|
||||||
|
== Parsed pages ==
|
||||||
|
|
||||||
|
Krasnyj Krym (revision 46884814)
|
||||||
|
1913 (revision 46708474)
|
||||||
|
1915 (revision 46743905)
|
||||||
|
1917 (revision 46559521)
|
||||||
|
1925 (revision 46809935)
|
||||||
|
1928 (revision 46875978)
|
||||||
|
1929 (revision 46760445)
|
||||||
|
1935 (revision 46487358)
|
||||||
|
1936 (revision 46874348)
|
||||||
|
1939 (revision 46789269)
|
||||||
|
1941 (revision 46856112)
|
||||||
|
1942 (revision 46851808)
|
||||||
|
1943 (revision 46768330)
|
||||||
|
1944 (revision 46866229)
|
||||||
|
1949 (revision 46882598)
|
||||||
|
1953 (revision 46437607)
|
||||||
|
1957 (revision 46591716)
|
||||||
|
1959 (revision 46255886)
|
||||||
|
Admirał Butakow (revision 45993412)
|
||||||
|
Admirał Spiridow (revision 45993412)
|
||||||
|
Aparat torpedowy (revision 46633263)
|
||||||
|
Askold (revision 45787848)
|
||||||
|
Avro 504 (revision 44668646)
|
||||||
|
Ałmaz (1903) (revision 46472283)
|
||||||
|
Batumi (revision 46594611)
|
||||||
|
Bomba głębinowa (revision 46011227)
|
||||||
|
Brest (revision 45771242)
|
||||||
|
Burta (revision 45569092)
|
||||||
|
Cagliari (revision 46235605)
|
||||||
|
Cesariewicz (revision 40031486)
|
||||||
|
Czerwona Ukraina (revision 45993524)
|
||||||
|
Daty nowego i starego porządku (revision 45622575)
|
||||||
|
Drednot (revision 45789788)
|
||||||
|
Działo przeciwlotnicze (revision 45160162)
|
||||||
|
Flota Bałtycka Marynarki Wojennej Rosji (revision 45700667)
|
||||||
|
Gromoboj (revision 44328986)
|
||||||
|
Hulk (okręt) (revision 46020688)
|
||||||
|
II wojna światowa (revision 46871591)
|
||||||
|
I wojna światowa (revision 46869119)
|
||||||
|
Imperator Nikołaj I (okręt lotniczy) (revision 45520638)
|
||||||
|
Imperium Rosyjskie (revision 46604959)
|
||||||
|
Impierator Nikołaj I (1916) (revision 46534166)
|
||||||
|
Język rosyjski (revision 46433952)
|
||||||
|
Kanonierka (revision 41091952)
|
||||||
|
Kanonierki typu Ardagan (revision 46534166)
|
||||||
|
Kanonierki typu Bobr (revision 45788694)
|
||||||
|
Kanonierki typu Chiwiniec (revision 46534166)
|
||||||
|
Kanonierki typu Groziaszczij (revision 46534166)
|
||||||
|
Kanonierki typu Mandżur (revision 46534166)
|
||||||
|
Karabin maszynowy DSzK (revision 45587452)
|
||||||
|
Karabin maszynowy Vickers 12,7 mm (revision 44572918)
|
||||||
|
Kocioł parowy (revision 46716473)
|
||||||
|
Konstrukcyjna linia wodna (revision 37082620)
|
||||||
|
Kontrtorpedowce typu Biesstrasznyj (revision 46534166)
|
||||||
|
Kontrtorpedowce typu Brawyj (revision 46534166)
|
||||||
|
Kontrtorpedowce typu Grozowoj (revision 46534166)
|
||||||
|
Kontrtorpedowce typu Prytkij (revision 46534166)
|
||||||
|
Koń mechaniczny (revision 44722357)
|
||||||
|
Krab (1915) (revision 42791389)
|
||||||
|
Kronsztad (revision 46425497)
|
||||||
|
Krążownik lekki (revision 40661490)
|
||||||
|
Krążownik liniowy (revision 40601776)
|
||||||
|
Krążownik pancernopokładowy (revision 40055901)
|
||||||
|
Krążownik pancerny (revision 40324458)
|
||||||
|
Krążowniki lekkie typu Swietłana (revision 45993412)
|
||||||
|
Krążowniki liniowe typu Borodino (revision 45990866)
|
||||||
|
Krążowniki typu Admirał Nachimow (revision 45993521)
|
||||||
|
Krążowniki typu Bajan (revision 45991279)
|
||||||
|
Krążowniki typu Diana (revision 45991349)
|
||||||
|
Krążowniki typu Izumrud (revision 45991349)
|
||||||
|
Lend-Lease Act (revision 46877263)
|
||||||
|
Marynarka Wojenna Związku Socjalistycznych Republik Radzieckich (revision 45795993)
|
||||||
|
Maszyna sterowa (revision 28497888)
|
||||||
|
Mecidiye (1903) (revision 43956539)
|
||||||
|
Mila morska (revision 45754209)
|
||||||
|
Mina morska (revision 45781427)
|
||||||
|
Morze Czarne (revision 46729213)
|
||||||
|
Nadbudówka (revision 45292731)
|
||||||
|
Neapol (revision 46823083)
|
||||||
|
Niszczyciel (revision 45799132)
|
||||||
|
Niszczyciele rakietowe projektu 61 (revision 46498775)
|
||||||
|
Niszczyciele typu Finn (revision 46620140)
|
||||||
|
Niszczyciele typu Lejtienant Szestakow (revision 46620140)
|
||||||
|
Niszczyciele typu Ochotnik (revision 46620140)
|
||||||
|
Niszczyciele typu Ukraina (revision 46620140)
|
||||||
|
Noworosyjsk (revision 44721836)
|
||||||
|
Odessa (revision 45629804)
|
||||||
|
Oerlikon 20 mm (revision 45493862)
|
||||||
|
Okres międzywojenny (revision 46668249)
|
||||||
|
Okręt-baza wodnosamolotów (revision 45115462)
|
||||||
|
|
||||||
|
== End of Parsed pages ==
|
||||||
|
|
||||||
|
- Wikipedia parsing ended at: 2016-09-21 17:21:04.404471
|
||||||
|
|
||||||
|
78 characters appeared 1159291 times.
|
||||||
|
|
||||||
|
First 37 characters:
|
||||||
|
[ 0] Char a: 9.685575062689178 %
|
||||||
|
[ 1] Char i: 8.815819324052374 %
|
||||||
|
[ 2] Char o: 7.920185699707839 %
|
||||||
|
[ 3] Char e: 6.871613770830621 %
|
||||||
|
[ 4] Char r: 5.8672067668945935 %
|
||||||
|
[ 5] Char n: 5.763608964444647 %
|
||||||
|
[ 6] Char s: 4.736688199942896 %
|
||||||
|
[ 7] Char k: 4.722196583946568 %
|
||||||
|
[ 8] Char z: 4.519227700378939 %
|
||||||
|
[ 9] Char w: 4.279512219106333 %
|
||||||
|
[10] Char t: 4.0191806888865695 %
|
||||||
|
[11] Char c: 3.6891513864939864 %
|
||||||
|
[12] Char y: 3.565282573572986 %
|
||||||
|
[13] Char p: 3.0190004062828053 %
|
||||||
|
[14] Char d: 2.851052928039638 %
|
||||||
|
[15] Char l: 2.7930002044352973 %
|
||||||
|
[16] Char m: 2.7530620008263673 %
|
||||||
|
[17] Char u: 2.348504387595522 %
|
||||||
|
[18] Char j: 1.881236031332944 %
|
||||||
|
[19] Char ł: 1.6885320424293815 %
|
||||||
|
[20] Char b: 1.394559260789569 %
|
||||||
|
[21] Char g: 1.3928340684090534 %
|
||||||
|
[22] Char h: 1.163901039514669 %
|
||||||
|
[23] Char ę: 0.8066136975099435 %
|
||||||
|
[24] Char ó: 0.5971753425153823 %
|
||||||
|
[25] Char ą: 0.563275312238256 %
|
||||||
|
[26] Char f: 0.5245447432956868 %
|
||||||
|
[27] Char ż: 0.4545019326467643 %
|
||||||
|
[28] Char ś: 0.39567287247119143 %
|
||||||
|
[29] Char ń: 0.3857530162832283 %
|
||||||
|
[30] Char ć: 0.1397405828217419 %
|
||||||
|
[31] Char v: 0.12455888987320698 %
|
||||||
|
[32] Char ź: 0.10204512930748191 %
|
||||||
|
[33] Char x: 0.05468859846233603 %
|
||||||
|
[34] Char é: 0.020961087423261287 %
|
||||||
|
[35] Char á: 0.01707940456710179 %
|
||||||
|
[36] Char q: 0.011386269711401192 %
|
||||||
|
|
||||||
|
The first 37 characters have an accumulated ratio of 0.9993892818972973.
|
||||||
|
|
||||||
|
1321 sequences found.
|
||||||
|
|
||||||
|
First 512 (typical positive ratio): 0.9894531815946438
|
||||||
|
Next 512 (512-1024): 1.7251923805153322e-06
|
||||||
|
Rest: 0.0003530230403650733
|
||||||
|
|
||||||
|
- Processing end: 2016-09-21 17:21:04.878014
|
||||||
83
script/charsets/iso-8859-16.py
Normal file
83
script/charsets/iso-8859-16.py
Normal file
@ -0,0 +1,83 @@
|
|||||||
|
#!/usr/bin/python
|
||||||
|
# -*- coding: utf-8 -*-
|
||||||
|
|
||||||
|
# ##### BEGIN LICENSE BLOCK #####
|
||||||
|
# Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||||
|
#
|
||||||
|
# The contents of this file are subject to the Mozilla Public License Version
|
||||||
|
# 1.1 (the "License"); you may not use this file except in compliance with
|
||||||
|
# the License. You may obtain a copy of the License at
|
||||||
|
# http://www.mozilla.org/MPL/
|
||||||
|
#
|
||||||
|
# Software distributed under the License is distributed on an "AS IS" basis,
|
||||||
|
# WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||||
|
# for the specific language governing rights and limitations under the
|
||||||
|
# License.
|
||||||
|
#
|
||||||
|
# The Original Code is Mozilla Universal charset detector code.
|
||||||
|
#
|
||||||
|
# The Initial Developer of the Original Code is
|
||||||
|
# Netscape Communications Corporation.
|
||||||
|
# Portions created by the Initial Developer are Copyright (C) 2001
|
||||||
|
# the Initial Developer. All Rights Reserved.
|
||||||
|
#
|
||||||
|
# Contributor(s):
|
||||||
|
# Jehan <jehan@girinstud.io>
|
||||||
|
#
|
||||||
|
# Alternatively, the contents of this file may be used under the terms of
|
||||||
|
# either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||||
|
# the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||||
|
# in which case the provisions of the GPL or the LGPL are applicable instead
|
||||||
|
# of those above. If you wish to allow use of your version of this file only
|
||||||
|
# under the terms of either the GPL or the LGPL, and not to allow others to
|
||||||
|
# use your version of this file under the terms of the MPL, indicate your
|
||||||
|
# decision by deleting the provisions above and replace them with the notice
|
||||||
|
# and other provisions required by the GPL or the LGPL. If you do not delete
|
||||||
|
# the provisions above, a recipient may use your version of this file under
|
||||||
|
# the terms of any one of the MPL, the GPL or the LGPL.
|
||||||
|
#
|
||||||
|
# ##### END LICENSE BLOCK #####
|
||||||
|
|
||||||
|
from codepoints import *
|
||||||
|
|
||||||
|
# ISO-8859-1 is the full 8-bit range, IANA-defined, superset of ISO/CEI 8859-1.
|
||||||
|
# It is basically the same as ISO/CEI 8859-1, but with control characters.
|
||||||
|
# As far as I can see, `iconv` has no support for the ISO/CEI 8859-1 subset,
|
||||||
|
# so there is no need for us to support it anyway.
|
||||||
|
|
||||||
|
name = 'ISO-8859-16'
|
||||||
|
aliases = ['ISO_8859-16:2001', 'ISO_8859-16', 'iso-ir-226',
|
||||||
|
'csISO885916', 'latin10', 'l10']
|
||||||
|
|
||||||
|
language = \
|
||||||
|
{
|
||||||
|
# Languages with complete coverage.
|
||||||
|
# Some languages actually have several alphabets and only one of them is
|
||||||
|
# compatible with ISO-8859-1 (ex: Kurdish).
|
||||||
|
# Some don't have a ISO language code (like Leonese, for which I used
|
||||||
|
# a Glottolog code).
|
||||||
|
'complete': [ 'sq', 'hr', 'hu', 'pl', 'ro', 'sr', 'sl',
|
||||||
|
'fr', 'de', 'it', 'ga' ],
|
||||||
|
'incomplete': []
|
||||||
|
}
|
||||||
|
|
||||||
|
# X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF #
|
||||||
|
charmap = \
|
||||||
|
[
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, # 0X
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, # 1X
|
||||||
|
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, # 2X
|
||||||
|
NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, # 3X
|
||||||
|
SYM,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET, # 4X
|
||||||
|
LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,SYM,SYM,SYM,SYM,SYM, # 5X
|
||||||
|
SYM,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET, # 6X
|
||||||
|
LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,SYM,SYM,SYM,SYM,CTR, # 7X
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, # 8X
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, # 9X
|
||||||
|
SYM,LET,LET,LET,SYM,SYM,LET,SYM,LET,SYM,LET,SYM,LET,SYM,LET,LET, # AX
|
||||||
|
SYM,SYM,LET,LET,LET,SYM,SYM,SYM,LET,LET,LET,SYM,LET,LET,LET,LET, # BX
|
||||||
|
LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET, # CX
|
||||||
|
LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET, # DX
|
||||||
|
LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET, # EX
|
||||||
|
LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET,LET, # FX
|
||||||
|
]
|
||||||
81
script/langs/pl.py
Normal file
81
script/langs/pl.py
Normal file
@ -0,0 +1,81 @@
|
|||||||
|
#!/bin/python3
|
||||||
|
# -*- coding: utf-8 -*-
|
||||||
|
|
||||||
|
# ##### BEGIN LICENSE BLOCK #####
|
||||||
|
# Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||||
|
#
|
||||||
|
# The contents of this file are subject to the Mozilla Public License Version
|
||||||
|
# 1.1 (the "License"); you may not use this file except in compliance with
|
||||||
|
# the License. You may obtain a copy of the License at
|
||||||
|
# http://www.mozilla.org/MPL/
|
||||||
|
#
|
||||||
|
# Software distributed under the License is distributed on an "AS IS" basis,
|
||||||
|
# WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||||
|
# for the specific language governing rights and limitations under the
|
||||||
|
# License.
|
||||||
|
#
|
||||||
|
# The Original Code is Mozilla Universal charset detector code.
|
||||||
|
#
|
||||||
|
# The Initial Developer of the Original Code is
|
||||||
|
# Netscape Communications Corporation.
|
||||||
|
# Portions created by the Initial Developer are Copyright (C) 2001
|
||||||
|
# the Initial Developer. All Rights Reserved.
|
||||||
|
#
|
||||||
|
# Contributor(s):
|
||||||
|
# Jehan <jehan@girinstud.io>
|
||||||
|
#
|
||||||
|
# Alternatively, the contents of this file may be used under the terms of
|
||||||
|
# either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||||
|
# the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||||
|
# in which case the provisions of the GPL or the LGPL are applicable instead
|
||||||
|
# of those above. If you wish to allow use of your version of this file only
|
||||||
|
# under the terms of either the GPL or the LGPL, and not to allow others to
|
||||||
|
# use your version of this file under the terms of the MPL, indicate your
|
||||||
|
# decision by deleting the provisions above and replace them with the notice
|
||||||
|
# and other provisions required by the GPL or the LGPL. If you do not delete
|
||||||
|
# the provisions above, a recipient may use your version of this file under
|
||||||
|
# the terms of any one of the MPL, the GPL or the LGPL.
|
||||||
|
#
|
||||||
|
# ##### END LICENSE BLOCK #####
|
||||||
|
|
||||||
|
import re
|
||||||
|
|
||||||
|
## Mandatory Properties ##
|
||||||
|
|
||||||
|
# The human name for the language, in English.
|
||||||
|
name = 'Polish'
|
||||||
|
# Use 2-letter ISO 639-1 if possible, 3-letter ISO code otherwise,
|
||||||
|
# or use another catalog as a last resort.
|
||||||
|
code = 'pl'
|
||||||
|
# ASCII characters are also used in French.
|
||||||
|
use_ascii = True
|
||||||
|
# The charsets we want to support and create data for.
|
||||||
|
charsets = ['ISO-8859-2', 'ISO-8859-13', 'ISO-8859-16',
|
||||||
|
'Windows-1250', 'IBM852', 'MAC-CENTRALEUROPE']
|
||||||
|
|
||||||
|
## Optional Properties ##
|
||||||
|
|
||||||
|
# Alphabet characters.
|
||||||
|
# If use_ascii=True, there is no need to add any ASCII characters.
|
||||||
|
# If case_mapping=True, there is no need to add several cases of a same
|
||||||
|
# character (provided Python algorithms know the right cases).
|
||||||
|
alphabet = 'ąćęłńóśźż'
|
||||||
|
# The starred page which was rewarded on the main page when I created
|
||||||
|
# the data.
|
||||||
|
start_pages = ['Krasnyj Krym']
|
||||||
|
# give possibility to select another code for the Wikipedia URL.
|
||||||
|
wikipedia_code = code
|
||||||
|
# 'a' and 'A' will be considered the same character, and so on.
|
||||||
|
# This uses Python algorithm to determine upper/lower-case of a given
|
||||||
|
# character.
|
||||||
|
case_mapping = True
|
||||||
|
|
||||||
|
# A function to clean content returned by the `wikipedia` python lib,
|
||||||
|
# in case some unwanted data has been overlooked.
|
||||||
|
# Note that we are already cleaning away the '=' from the title syntax
|
||||||
|
# of Wikipedia, as well as double spaces. But sometimes, Wikipedia in
|
||||||
|
# some language may return weird syntax or UI text which should be
|
||||||
|
# discarded. If you encounter one of these cases, use this function.
|
||||||
|
def clean_wikipedia_content(content):
|
||||||
|
# Do your garbage text cleaning here.
|
||||||
|
return content
|
||||||
@ -20,6 +20,7 @@ set(
|
|||||||
LangModels/LangLithuanianModel.cpp
|
LangModels/LangLithuanianModel.cpp
|
||||||
LangModels/LangLatvianModel.cpp
|
LangModels/LangLatvianModel.cpp
|
||||||
LangModels/LangMalteseModel.cpp
|
LangModels/LangMalteseModel.cpp
|
||||||
|
LangModels/LangPolishModel.cpp
|
||||||
LangModels/LangPortugueseModel.cpp
|
LangModels/LangPortugueseModel.cpp
|
||||||
LangModels/LangRussianModel.cpp
|
LangModels/LangRussianModel.cpp
|
||||||
LangModels/LangSlovakModel.cpp
|
LangModels/LangSlovakModel.cpp
|
||||||
|
|||||||
298
src/LangModels/LangPolishModel.cpp
Normal file
298
src/LangModels/LangPolishModel.cpp
Normal file
@ -0,0 +1,298 @@
|
|||||||
|
/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
|
||||||
|
/* ***** BEGIN LICENSE BLOCK *****
|
||||||
|
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||||
|
*
|
||||||
|
* The contents of this file are subject to the Mozilla Public License Version
|
||||||
|
* 1.1 (the "License"); you may not use this file except in compliance with
|
||||||
|
* the License. You may obtain a copy of the License at
|
||||||
|
* http://www.mozilla.org/MPL/
|
||||||
|
*
|
||||||
|
* Software distributed under the License is distributed on an "AS IS" basis,
|
||||||
|
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||||
|
* for the specific language governing rights and limitations under the
|
||||||
|
* License.
|
||||||
|
*
|
||||||
|
* The Original Code is Mozilla Communicator client code.
|
||||||
|
*
|
||||||
|
* The Initial Developer of the Original Code is
|
||||||
|
* Netscape Communications Corporation.
|
||||||
|
* Portions created by the Initial Developer are Copyright (C) 1998
|
||||||
|
* the Initial Developer. All Rights Reserved.
|
||||||
|
*
|
||||||
|
* Contributor(s):
|
||||||
|
*
|
||||||
|
* Alternatively, the contents of this file may be used under the terms of
|
||||||
|
* either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||||
|
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||||
|
* in which case the provisions of the GPL or the LGPL are applicable instead
|
||||||
|
* of those above. If you wish to allow use of your version of this file only
|
||||||
|
* under the terms of either the GPL or the LGPL, and not to allow others to
|
||||||
|
* use your version of this file under the terms of the MPL, indicate your
|
||||||
|
* decision by deleting the provisions above and replace them with the notice
|
||||||
|
* and other provisions required by the GPL or the LGPL. If you do not delete
|
||||||
|
* the provisions above, a recipient may use your version of this file under
|
||||||
|
* the terms of any one of the MPL, the GPL or the LGPL.
|
||||||
|
*
|
||||||
|
* ***** END LICENSE BLOCK ***** */
|
||||||
|
|
||||||
|
#include "../nsSBCharSetProber.h"
|
||||||
|
|
||||||
|
/********* Language model for: Polish *********/
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Generated by BuildLangModel.py
|
||||||
|
* On: 2016-09-21 17:21:04.405363
|
||||||
|
**/
|
||||||
|
|
||||||
|
/* Character Mapping Table:
|
||||||
|
* ILL: illegal character.
|
||||||
|
* CTR: control character specific to the charset.
|
||||||
|
* RET: carriage/return.
|
||||||
|
* SYM: symbol (punctuation) that does not belong to word.
|
||||||
|
* NUM: 0 - 9.
|
||||||
|
*
|
||||||
|
* Other characters are ordered by probabilities
|
||||||
|
* (0 is the most common character in the language).
|
||||||
|
*
|
||||||
|
* Orders are generic to a language. So the codepoint with order X in
|
||||||
|
* CHARSET1 maps to the same character as the codepoint with the same
|
||||||
|
* order X in CHARSET2 for the same language.
|
||||||
|
* As such, it is possible to get missing order. For instance the
|
||||||
|
* ligature of 'o' and 'e' exists in ISO-8859-15 but not in ISO-8859-1
|
||||||
|
* even though they are both used for French. Same for the euro sign.
|
||||||
|
*/
|
||||||
|
static const unsigned char Ibm852_CharToOrderMap[] =
|
||||||
|
{
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
|
||||||
|
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
|
||||||
|
NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
|
||||||
|
SYM, 0, 20, 11, 14, 3, 26, 21, 22, 1, 18, 7, 15, 16, 5, 2, /* 4X */
|
||||||
|
13, 36, 4, 6, 10, 17, 31, 9, 33, 12, 8,SYM,SYM,SYM,SYM,SYM, /* 5X */
|
||||||
|
SYM, 0, 20, 11, 14, 3, 26, 21, 22, 1, 18, 7, 15, 16, 5, 2, /* 6X */
|
||||||
|
13, 36, 4, 6, 10, 17, 31, 9, 33, 12, 8,SYM,SYM,SYM,SYM,CTR, /* 7X */
|
||||||
|
47, 39, 34, 54, 40, 78, 30, 47, 19, 58, 49, 49, 77, 32, 40, 30, /* 8X */
|
||||||
|
34, 79, 80, 55, 38, 74, 74, 28, 28, 38, 39, 76, 76, 19,SYM, 44, /* 9X */
|
||||||
|
35, 37, 24, 51, 25, 25, 45, 45, 23, 23,SYM, 32, 44, 56,SYM,SYM, /* AX */
|
||||||
|
SYM,SYM,SYM,SYM,SYM, 35, 54, 46, 56,SYM,SYM,SYM,SYM, 27, 27,SYM, /* BX */
|
||||||
|
SYM,SYM,SYM,SYM,SYM,SYM, 53, 53,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* CX */
|
||||||
|
70, 70, 69, 58, 69, 81, 37, 77, 46,SYM,SYM,SYM,SYM, 65, 82,SYM, /* DX */
|
||||||
|
24, 57, 55, 29, 29, 83, 41, 41, 84, 51, 85, 86, 60, 60, 65,SYM, /* EX */
|
||||||
|
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, 87, 50, 50,SYM,SYM, /* FX */
|
||||||
|
};
|
||||||
|
/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
|
||||||
|
|
||||||
|
static const unsigned char Iso_8859_16_CharToOrderMap[] =
|
||||||
|
{
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
|
||||||
|
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
|
||||||
|
NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
|
||||||
|
SYM, 0, 20, 11, 14, 3, 26, 21, 22, 1, 18, 7, 15, 16, 5, 2, /* 4X */
|
||||||
|
13, 36, 4, 6, 10, 17, 31, 9, 33, 12, 8,SYM,SYM,SYM,SYM,SYM, /* 5X */
|
||||||
|
SYM, 0, 20, 11, 14, 3, 26, 21, 22, 1, 18, 7, 15, 16, 5, 2, /* 6X */
|
||||||
|
13, 36, 4, 6, 10, 17, 31, 9, 33, 12, 8,SYM,SYM,SYM,SYM,CTR, /* 7X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 8X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 9X */
|
||||||
|
SYM, 25, 25, 19,SYM,SYM, 41,SYM, 41,SYM, 62,SYM, 32,SYM, 32, 27, /* AX */
|
||||||
|
SYM,SYM, 44, 19, 45,SYM,SYM,SYM, 45, 44, 62,SYM, 75, 75, 88, 27, /* BX */
|
||||||
|
61, 35, 54, 53, 40, 30, 89, 47, 43, 34, 64, 58, 90, 37, 77, 91, /* CX */
|
||||||
|
70, 29, 66, 24, 55, 49, 38, 28, 92, 68, 51, 93, 39, 23, 72, 57, /* DX */
|
||||||
|
61, 35, 54, 53, 40, 30, 94, 47, 43, 34, 64, 58, 95, 37, 77, 96, /* EX */
|
||||||
|
70, 29, 66, 24, 55, 49, 38, 28, 97, 68, 51, 98, 39, 23, 72, 99, /* FX */
|
||||||
|
};
|
||||||
|
/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
|
||||||
|
|
||||||
|
static const unsigned char Iso_8859_2_CharToOrderMap[] =
|
||||||
|
{
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
|
||||||
|
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
|
||||||
|
NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
|
||||||
|
SYM, 0, 20, 11, 14, 3, 26, 21, 22, 1, 18, 7, 15, 16, 5, 2, /* 4X */
|
||||||
|
13, 36, 4, 6, 10, 17, 31, 9, 33, 12, 8,SYM,SYM,SYM,SYM,SYM, /* 5X */
|
||||||
|
SYM, 0, 20, 11, 14, 3, 26, 21, 22, 1, 18, 7, 15, 16, 5, 2, /* 6X */
|
||||||
|
13, 36, 4, 6, 10, 17, 31, 9, 33, 12, 8,SYM,SYM,SYM,SYM,CTR, /* 7X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 8X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 9X */
|
||||||
|
SYM, 25,SYM, 19,SYM, 74, 28,SYM,SYM, 41, 56, 76, 32,SYM, 45, 27, /* AX */
|
||||||
|
SYM, 25,SYM, 19,SYM, 74, 28,SYM,SYM, 41, 56, 76, 32,SYM, 45, 27, /* BX */
|
||||||
|
100, 35, 54, 53, 40,101, 30, 47, 44, 34, 23, 58, 46, 37, 77, 69, /* CX */
|
||||||
|
70, 29,102, 24, 55, 49, 38,SYM, 50,103, 51,104, 39, 60, 65, 57, /* DX */
|
||||||
|
105, 35, 54, 53, 40,106, 30, 47, 44, 34, 23, 58, 46, 37, 77, 69, /* EX */
|
||||||
|
70, 29,107, 24, 55, 49, 38,SYM, 50,108, 51,109, 39, 60, 65,SYM, /* FX */
|
||||||
|
};
|
||||||
|
/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
|
||||||
|
|
||||||
|
static const unsigned char Mac_Centraleurope_CharToOrderMap[] =
|
||||||
|
{
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
|
||||||
|
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
|
||||||
|
NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
|
||||||
|
SYM, 0, 20, 11, 14, 3, 26, 21, 22, 1, 18, 7, 15, 16, 5, 2, /* 4X */
|
||||||
|
13, 36, 4, 6, 10, 17, 31, 9, 33, 12, 8,SYM,SYM,SYM,SYM,SYM, /* 5X */
|
||||||
|
SYM, 0, 20, 11, 14, 3, 26, 21, 22, 1, 18, 7, 15, 16, 5, 2, /* 6X */
|
||||||
|
13, 36, 4, 6, 10, 17, 31, 9, 33, 12, 8,SYM,SYM,SYM,SYM,CTR, /* 7X */
|
||||||
|
40, 63, 63, 34, 25, 38, 39, 35, 25, 44, 40, 44, 30, 30, 34, 32, /* 8X */
|
||||||
|
32, 69, 37, 69,110,111, 71, 24, 71, 55, 38, 67, 51, 46, 46, 39, /* 9X */
|
||||||
|
SYM,SYM, 23,SYM,SYM,SYM,SYM, 57,SYM,SYM,SYM, 23,SYM,SYM,112,113, /* AX */
|
||||||
|
114, 73,SYM,SYM, 73,115,SYM,SYM, 19,116,117, 74, 74,118,119,120, /* BX */
|
||||||
|
121, 29,SYM,SYM, 29,122,SYM,SYM,SYM,SYM,SYM,123, 49, 67, 49, 42, /* CX */
|
||||||
|
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, 42,124,125, 50,SYM,SYM, 50,126, /* DX */
|
||||||
|
127, 41,SYM,SYM, 41, 28, 28, 35, 76, 76, 37, 45, 45, 59, 24, 55, /* EX */
|
||||||
|
59,128, 51,129,130,131,132,133, 60, 60,134, 27, 19, 27,135,SYM, /* FX */
|
||||||
|
};
|
||||||
|
/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
|
||||||
|
|
||||||
|
static const unsigned char Iso_8859_13_CharToOrderMap[] =
|
||||||
|
{
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
|
||||||
|
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
|
||||||
|
NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
|
||||||
|
SYM, 0, 20, 11, 14, 3, 26, 21, 22, 1, 18, 7, 15, 16, 5, 2, /* 4X */
|
||||||
|
13, 36, 4, 6, 10, 17, 31, 9, 33, 12, 8,SYM,SYM,SYM,SYM,SYM, /* 5X */
|
||||||
|
SYM, 0, 20, 11, 14, 3, 26, 21, 22, 1, 18, 7, 15, 16, 5, 2, /* 6X */
|
||||||
|
13, 36, 4, 6, 10, 17, 31, 9, 33, 12, 8,SYM,SYM,SYM,SYM,CTR, /* 7X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 8X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 9X */
|
||||||
|
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, 48,SYM,136,SYM,SYM,SYM,SYM,137, /* AX */
|
||||||
|
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, 48,SYM,138,SYM,SYM,SYM,SYM,139, /* BX */
|
||||||
|
25,140, 63, 30, 40, 52, 23,141, 44, 34, 32, 71,142,143, 73,144, /* CX */
|
||||||
|
41, 29,145, 24, 42, 67, 38,SYM,146, 19, 28, 59, 39, 27, 45, 57, /* DX */
|
||||||
|
25,147, 63, 30, 40, 52, 23,148, 44, 34, 32, 71,149,150, 73,151, /* EX */
|
||||||
|
41, 29,152, 24, 42, 67, 38,SYM,153, 19, 28, 59, 39, 27, 45,SYM, /* FX */
|
||||||
|
};
|
||||||
|
/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
|
||||||
|
|
||||||
|
static const unsigned char Windows_1250_CharToOrderMap[] =
|
||||||
|
{
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
|
||||||
|
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
|
||||||
|
NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
|
||||||
|
SYM, 0, 20, 11, 14, 3, 26, 21, 22, 1, 18, 7, 15, 16, 5, 2, /* 4X */
|
||||||
|
13, 36, 4, 6, 10, 17, 31, 9, 33, 12, 8,SYM,SYM,SYM,SYM,SYM, /* 5X */
|
||||||
|
SYM, 0, 20, 11, 14, 3, 26, 21, 22, 1, 18, 7, 15, 16, 5, 2, /* 6X */
|
||||||
|
13, 36, 4, 6, 10, 17, 31, 9, 33, 12, 8,SYM,SYM,SYM,SYM,CTR, /* 7X */
|
||||||
|
SYM,ILL,SYM,ILL,SYM,SYM,SYM,SYM,ILL,SYM, 41,SYM, 28, 76, 45, 32, /* 8X */
|
||||||
|
ILL,SYM,SYM,SYM,SYM,SYM,SYM,SYM,ILL,SYM, 41,SYM, 28, 76, 45, 32, /* 9X */
|
||||||
|
SYM,SYM,SYM, 19,SYM, 25,SYM,SYM,SYM,SYM, 56,SYM,SYM,SYM,SYM, 27, /* AX */
|
||||||
|
SYM,SYM,SYM, 19,SYM,SYM,SYM,SYM,SYM, 25, 56,SYM, 74,SYM, 74, 27, /* BX */
|
||||||
|
154, 35, 54, 53, 40,155, 30, 47, 44, 34, 23, 58, 46, 37, 77, 69, /* CX */
|
||||||
|
70, 29,156, 24, 55, 49, 38,SYM, 50,157, 51,158, 39, 60, 65, 57, /* DX */
|
||||||
|
159, 35, 54, 53, 40,160, 30, 47, 44, 34, 23, 58, 46, 37, 77, 69, /* EX */
|
||||||
|
70, 29,161, 24, 55, 49, 38,SYM, 50,162, 51,163, 39, 60, 65,SYM, /* FX */
|
||||||
|
};
|
||||||
|
/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
|
||||||
|
|
||||||
|
|
||||||
|
/* Model Table:
|
||||||
|
* Total sequences: 1321
|
||||||
|
* First 512 sequences: 0.9894531815946438
|
||||||
|
* Next 512 sequences (512-1024): 0.010193795364991133
|
||||||
|
* Rest: 0.0003530230403650733
|
||||||
|
* Negative sequences: TODO
|
||||||
|
*/
|
||||||
|
static const PRUint8 PolishLangModel[] =
|
||||||
|
{
|
||||||
|
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,0,0,3,3,3,3,3,3,3,2,0,0,2,
|
||||||
|
3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,3,3,3,2,3,3,3,2,3,2,3,3,3,2,2,2,2,2,
|
||||||
|
3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,3,3,3,3,0,0,0,3,3,3,3,2,3,2,2,0,0,1,
|
||||||
|
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,0,2,2,3,3,3,3,2,3,2,2,0,2,2,
|
||||||
|
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,2,3,3,2,3,3,3,2,2,2,0,2,2,0,1,2,2,2,
|
||||||
|
3,3,3,3,3,3,3,3,3,2,3,3,3,2,3,2,2,3,2,1,2,3,2,3,3,3,3,2,0,0,0,2,0,2,2,2,2,
|
||||||
|
3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,2,3,2,3,3,2,0,0,0,0,2,0,0,2,2,2,
|
||||||
|
3,3,3,3,3,2,3,3,1,3,3,3,2,2,2,3,3,3,2,3,2,2,2,3,3,3,2,3,2,0,0,2,0,1,2,2,0,
|
||||||
|
3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,2,3,3,3,3,3,3,2,3,2,3,1,2,0,0,0,2,0,0,2,2,2,
|
||||||
|
3,3,3,3,3,3,3,3,3,2,2,3,3,3,2,3,2,3,0,3,2,2,2,3,3,3,1,0,2,0,0,0,0,0,1,0,0,
|
||||||
|
3,3,3,3,3,3,3,3,3,3,3,2,3,2,2,3,2,3,2,3,2,2,3,3,3,3,2,1,0,0,0,2,0,0,2,2,0,
|
||||||
|
3,3,3,3,2,3,2,3,3,2,3,2,3,1,2,3,2,3,3,2,1,2,3,2,3,3,0,0,0,0,0,2,0,0,2,0,2,
|
||||||
|
3,2,2,2,3,3,3,3,3,3,3,3,0,3,3,3,3,2,3,3,3,3,2,0,0,0,3,3,3,3,3,2,2,0,0,0,0,
|
||||||
|
3,3,3,3,3,3,3,2,2,2,2,3,3,3,2,3,2,3,1,3,2,2,3,2,3,2,2,0,0,0,0,2,0,0,2,2,0,
|
||||||
|
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,2,2,2,3,3,2,2,1,0,0,2,2,0,2,2,1,
|
||||||
|
3,3,3,3,2,3,3,3,2,3,3,3,2,2,3,3,3,3,2,0,3,3,2,3,2,3,3,2,0,0,0,2,0,1,2,2,0,
|
||||||
|
3,3,3,3,2,3,3,2,1,2,2,3,3,3,2,2,3,3,2,2,3,2,1,3,3,2,2,1,1,0,0,1,0,0,2,2,0,
|
||||||
|
3,3,2,3,3,3,3,3,3,3,3,3,2,3,3,3,3,2,3,3,3,3,2,0,1,0,3,3,2,3,2,2,2,2,2,2,2,
|
||||||
|
3,3,3,3,2,3,3,2,2,3,2,3,0,2,3,2,3,3,2,2,2,2,1,3,3,3,1,1,3,1,0,1,0,0,0,2,0,
|
||||||
|
3,0,3,3,1,3,2,3,2,2,3,2,3,2,2,0,2,3,0,2,2,3,1,3,3,3,0,2,0,0,0,0,0,0,0,0,0,
|
||||||
|
3,3,3,3,3,3,3,3,2,2,1,2,3,2,2,3,2,3,2,3,2,1,2,3,2,2,2,0,0,0,0,0,0,0,2,2,0,
|
||||||
|
3,3,3,3,3,3,2,2,2,3,2,1,2,2,3,3,2,3,2,3,2,2,3,2,3,2,2,1,0,0,0,2,0,0,2,2,2,
|
||||||
|
3,3,3,3,3,3,2,2,1,2,3,2,3,2,2,3,3,3,2,2,2,1,1,2,2,2,2,0,2,0,0,2,0,0,2,2,0,
|
||||||
|
0,0,0,0,0,0,3,3,3,1,3,3,0,3,3,2,0,0,0,3,3,3,0,0,0,0,0,3,3,0,2,0,2,0,0,0,0,
|
||||||
|
0,0,0,0,3,2,2,2,3,3,2,3,1,2,3,3,2,0,3,3,3,2,0,0,0,0,2,3,1,0,0,1,3,0,0,0,0,
|
||||||
|
0,0,0,0,0,0,2,2,3,2,3,3,0,3,3,0,0,0,0,3,2,3,0,0,0,0,0,3,0,0,2,0,0,0,0,0,0,
|
||||||
|
3,3,3,3,3,2,2,2,1,2,2,2,2,1,1,3,2,3,2,2,2,2,2,2,2,2,3,0,0,0,0,1,0,0,2,1,1,
|
||||||
|
3,2,3,3,0,3,2,2,0,2,0,2,3,0,2,3,2,3,0,0,3,1,0,2,2,3,0,0,0,0,0,0,0,0,0,0,0,
|
||||||
|
0,0,0,0,3,3,0,2,0,3,0,3,0,2,0,3,3,0,0,0,0,0,0,0,0,0,0,0,0,1,3,0,0,0,0,0,0,
|
||||||
|
1,0,0,0,0,0,3,2,0,0,0,3,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
|
||||||
|
1,0,0,1,0,0,0,2,0,2,0,0,0,0,2,0,2,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
|
||||||
|
3,3,3,3,2,2,2,2,1,0,0,0,2,2,1,2,0,2,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,2,2,0,
|
||||||
|
0,0,0,0,2,3,2,0,0,2,0,2,0,0,3,2,2,0,0,0,2,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,
|
||||||
|
2,3,2,2,0,1,0,1,0,1,1,0,1,2,1,2,1,2,1,0,2,0,2,0,0,0,2,0,0,0,0,2,0,2,0,0,0,
|
||||||
|
2,1,2,2,2,2,2,0,1,0,2,2,1,1,2,2,2,1,1,0,1,2,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,
|
||||||
|
0,1,0,1,2,2,2,2,2,0,2,2,0,2,1,2,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,
|
||||||
|
1,2,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
const SequenceModel Ibm852PolishModel =
|
||||||
|
{
|
||||||
|
Ibm852_CharToOrderMap,
|
||||||
|
PolishLangModel,
|
||||||
|
37,
|
||||||
|
(float)0.9894531815946438,
|
||||||
|
PR_TRUE,
|
||||||
|
"IBM852"
|
||||||
|
};
|
||||||
|
|
||||||
|
const SequenceModel Iso_8859_16PolishModel =
|
||||||
|
{
|
||||||
|
Iso_8859_16_CharToOrderMap,
|
||||||
|
PolishLangModel,
|
||||||
|
37,
|
||||||
|
(float)0.9894531815946438,
|
||||||
|
PR_TRUE,
|
||||||
|
"ISO-8859-16"
|
||||||
|
};
|
||||||
|
|
||||||
|
const SequenceModel Iso_8859_2PolishModel =
|
||||||
|
{
|
||||||
|
Iso_8859_2_CharToOrderMap,
|
||||||
|
PolishLangModel,
|
||||||
|
37,
|
||||||
|
(float)0.9894531815946438,
|
||||||
|
PR_TRUE,
|
||||||
|
"ISO-8859-2"
|
||||||
|
};
|
||||||
|
|
||||||
|
const SequenceModel Mac_CentraleuropePolishModel =
|
||||||
|
{
|
||||||
|
Mac_Centraleurope_CharToOrderMap,
|
||||||
|
PolishLangModel,
|
||||||
|
37,
|
||||||
|
(float)0.9894531815946438,
|
||||||
|
PR_TRUE,
|
||||||
|
"MAC-CENTRALEUROPE"
|
||||||
|
};
|
||||||
|
|
||||||
|
const SequenceModel Iso_8859_13PolishModel =
|
||||||
|
{
|
||||||
|
Iso_8859_13_CharToOrderMap,
|
||||||
|
PolishLangModel,
|
||||||
|
37,
|
||||||
|
(float)0.9894531815946438,
|
||||||
|
PR_TRUE,
|
||||||
|
"ISO-8859-13"
|
||||||
|
};
|
||||||
|
|
||||||
|
const SequenceModel Windows_1250PolishModel =
|
||||||
|
{
|
||||||
|
Windows_1250_CharToOrderMap,
|
||||||
|
PolishLangModel,
|
||||||
|
37,
|
||||||
|
(float)0.9894531815946438,
|
||||||
|
PR_TRUE,
|
||||||
|
"WINDOWS-1250"
|
||||||
|
};
|
||||||
@ -136,6 +136,13 @@ nsSBCSGroupProber::nsSBCSGroupProber()
|
|||||||
mProbers[52] = new nsSingleByteCharSetProber(&Mac_CentraleuropeSlovakModel);
|
mProbers[52] = new nsSingleByteCharSetProber(&Mac_CentraleuropeSlovakModel);
|
||||||
mProbers[53] = new nsSingleByteCharSetProber(&Ibm852SlovakModel);
|
mProbers[53] = new nsSingleByteCharSetProber(&Ibm852SlovakModel);
|
||||||
|
|
||||||
|
mProbers[54] = new nsSingleByteCharSetProber(&Windows_1250PolishModel);
|
||||||
|
mProbers[55] = new nsSingleByteCharSetProber(&Iso_8859_2PolishModel);
|
||||||
|
mProbers[56] = new nsSingleByteCharSetProber(&Iso_8859_13PolishModel);
|
||||||
|
mProbers[57] = new nsSingleByteCharSetProber(&Iso_8859_16PolishModel);
|
||||||
|
mProbers[58] = new nsSingleByteCharSetProber(&Mac_CentraleuropePolishModel);
|
||||||
|
mProbers[59] = new nsSingleByteCharSetProber(&Ibm852PolishModel);
|
||||||
|
|
||||||
Reset();
|
Reset();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@ -40,7 +40,7 @@
|
|||||||
#define nsSBCSGroupProber_h__
|
#define nsSBCSGroupProber_h__
|
||||||
|
|
||||||
|
|
||||||
#define NUM_OF_SBCS_PROBERS 54
|
#define NUM_OF_SBCS_PROBERS 60
|
||||||
|
|
||||||
class nsCharSetProber;
|
class nsCharSetProber;
|
||||||
class nsSBCSGroupProber: public nsCharSetProber {
|
class nsSBCSGroupProber: public nsCharSetProber {
|
||||||
|
|||||||
@ -197,5 +197,12 @@ extern const SequenceModel Iso_8859_2SlovakModel;
|
|||||||
extern const SequenceModel Ibm852SlovakModel;
|
extern const SequenceModel Ibm852SlovakModel;
|
||||||
extern const SequenceModel Mac_CentraleuropeSlovakModel;
|
extern const SequenceModel Mac_CentraleuropeSlovakModel;
|
||||||
|
|
||||||
|
extern const SequenceModel Windows_1250PolishModel;
|
||||||
|
extern const SequenceModel Iso_8859_2PolishModel;
|
||||||
|
extern const SequenceModel Iso_8859_13PolishModel;
|
||||||
|
extern const SequenceModel Iso_8859_16PolishModel;
|
||||||
|
extern const SequenceModel Ibm852PolishModel;
|
||||||
|
extern const SequenceModel Mac_CentraleuropePolishModel;
|
||||||
|
|
||||||
#endif /* nsSingleByteCharSetProber_h__ */
|
#endif /* nsSingleByteCharSetProber_h__ */
|
||||||
|
|
||||||
|
|||||||
3
test/pl/ibm852.txt
Normal file
3
test/pl/ibm852.txt
Normal file
@ -0,0 +1,3 @@
|
|||||||
|
Zofia (Sonka) Holszaäska herbu Hippocentaurus (ur. ok. 1405, zm. 21 wrze<7A>nia 1461 w Krakowie)
|
||||||
|
ksi©ľniczka litewska, kr˘lowa Polski, od 1422 roku czwarta i ostatnia ľona W<>adys<79>awa II
|
||||||
|
Jagie<EFBFBD><EFBFBD>y.
|
||||||
3
test/pl/iso-8859-13.txt
Normal file
3
test/pl/iso-8859-13.txt
Normal file
@ -0,0 +1,3 @@
|
|||||||
|
Zofia (Sonka) Holszańska herbu Hippocentaurus (ur. ok. 1405, zm. 21 wrzeúnia 1461 w Krakowie)
|
||||||
|
ksićýniczka litewska, królowa Polski, od 1422 roku czwarta i ostatnia ýona Wůadysůawa II
|
||||||
|
Jagieůůy.
|
||||||
3
test/pl/iso-8859-16.txt
Normal file
3
test/pl/iso-8859-16.txt
Normal file
@ -0,0 +1,3 @@
|
|||||||
|
Zofia (Sonka) Holszańska herbu Hippocentaurus (ur. ok. 1405, zm. 21 wrze÷nia 1461 w Krakowie)
|
||||||
|
ksiýżniczka litewska, królowa Polski, od 1422 roku czwarta i ostatnia żona Władysława II
|
||||||
|
Jagiełły.
|
||||||
3
test/pl/iso-8859-2.txt
Normal file
3
test/pl/iso-8859-2.txt
Normal file
@ -0,0 +1,3 @@
|
|||||||
|
Zofia (Sonka) Holszańska herbu Hippocentaurus (ur. ok. 1405, zm. 21 września 1461 w Krakowie)
|
||||||
|
księżniczka litewska, królowa Polski, od 1422 roku czwarta i ostatnia żona Władysława II
|
||||||
|
Jagiełły.
|
||||||
3
test/pl/mac-centraleurope.txt
Normal file
3
test/pl/mac-centraleurope.txt
Normal file
@ -0,0 +1,3 @@
|
|||||||
|
Zofia (Sonka) HolszaÄska herbu Hippocentaurus (ur. ok. 1405, zm. 21 wrzećnia 1461 w Krakowie)
|
||||||
|
ksi«ýniczka litewska, kr—lowa Polski, od 1422 roku czwarta i ostatnia ýona W¸adys¸awa II
|
||||||
|
Jagie¸¸y.
|
||||||
3
test/pl/utf-8.txt
Normal file
3
test/pl/utf-8.txt
Normal file
@ -0,0 +1,3 @@
|
|||||||
|
Zofia (Sonka) Holszańska herbu Hippocentaurus (ur. ok. 1405, zm. 21 września 1461 w Krakowie)
|
||||||
|
księżniczka litewska, królowa Polski, od 1422 roku czwarta i ostatnia żona Władysława II
|
||||||
|
Jagiełły.
|
||||||
3
test/pl/windows-1250.txt
Normal file
3
test/pl/windows-1250.txt
Normal file
@ -0,0 +1,3 @@
|
|||||||
|
Zofia (Sonka) Holszańska herbu Hippocentaurus (ur. ok. 1405, zm. 21 września 1461 w Krakowie)
|
||||||
|
księżniczka litewska, królowa Polski, od 1422 roku czwarta i ostatnia żona Władysława II
|
||||||
|
Jagiełły.
|
||||||
Loading…
x
Reference in New Issue
Block a user