mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git
synced 2025-12-06 16:56:40 +08:00
LangModels: add Slovene support.
Encodings: ISO-8859-2, ISO-8859-16, Windows-1250, IBM852 and MAC-CENTRALEUROPE. Test text from https://sl.wikipedia.org/wiki/Naseljivi_planet
parent fbd2efdbe9
commit d62154bd6e
@@ -132,6 +132,12 @@ Techniques used by universalchardet are described at http://www.mozilla.org/proj
 * ISO-8859-2
 * IBM852
 * MAC-CENTRALEUROPE
+* Slovene
+* ISO-8859-2
+* ISO-8859-16
+* Windows-1250
+* IBM852
+* MAC-CENTRALEUROPE
 * Spanish
 * ISO-8859-1
 * ISO-8859-15
148
script/BuildLangModelLogs/LangSloveneModel.log
Normal file
@@ -0,0 +1,148 @@
|
|||||||
|
= Logs of language model for Slovene (sl) =
|
||||||
|
|
||||||
|
- Generated by BuildLangModel.py
|
||||||
|
- Started: 2016-09-28 22:00:35.243966
|
||||||
|
- Maximum depth: 5
|
||||||
|
- Max number of pages: 100
|
||||||
|
|
||||||
|
== Parsed pages ==
|
||||||
|
|
||||||
|
XCOM: Enemy Unknown (revision 4704271)
|
||||||
|
1UP.com (revision 4547348)
|
||||||
|
2K Games (revision 4110089)
|
||||||
|
Android (operacijski sistem) (revision 4619359)
|
||||||
|
Animator videoigre (revision 4702643)
|
||||||
|
App Store (revision 3903089)
|
||||||
|
Artefakt (revision 4484504)
|
||||||
|
Athlon (revision 4524746)
|
||||||
|
Avstralazija (revision 4623530)
|
||||||
|
Avtopsija (revision 4541344)
|
||||||
|
Bralno-pisalni pomnilnik (revision 4256388)
|
||||||
|
Civilization (serija) (revision 4645770)
|
||||||
|
Deus Ex: Human Revolution (revision 4694860)
|
||||||
|
Digitalna distribucija (revision 4696215)
|
||||||
|
DirectX (revision 4477913)
|
||||||
|
Dishonored (revision 4619444)
|
||||||
|
Edge (magazine) (revision 4690049)
|
||||||
|
Electronic Entertainment Expo (revision 4538691)
|
||||||
|
Enoigralska videoigra (revision 4610359)
|
||||||
|
Eurogamer (revision 4694860)
|
||||||
|
Evropa (revision 4687833)
|
||||||
|
Fantasy Flight Games (revision 4649361)
|
||||||
|
Firaxis Games (revision 4110089)
|
||||||
|
GameRankings (revision 3934020)
|
||||||
|
GameSpot (revision 4238015)
|
||||||
|
GameSpy (revision 4538691)
|
||||||
|
GameTrailers (revision 4704271)
|
||||||
|
Game Informer (revision 4704271)
|
||||||
|
GamesTM (revision 4704271)
|
||||||
|
Grafična kartica (revision 4257980)
|
||||||
|
Granata (revision 3859332)
|
||||||
|
Holograf (revision 4477482)
|
||||||
|
IGN (revision 4576233)
|
||||||
|
IOS (revision 4597264)
|
||||||
|
Igra igranja vlog (revision 4642276)
|
||||||
|
Igra na deski (revision 4649363)
|
||||||
|
Igralna konzola (revision 4649866)
|
||||||
|
Igralni pogon (revision 4622773)
|
||||||
|
Intel (revision 4626025)
|
||||||
|
International Standard Book Number (revision 4015087)
|
||||||
|
Izdelovalec videoigre (revision 3851747)
|
||||||
|
Joker (revija) (revision 3867772)
|
||||||
|
Kotaku (revision 4613535)
|
||||||
|
Kristal (revision 4156234)
|
||||||
|
Linux (revision 4524740)
|
||||||
|
Lovec prestreznik (revision 4102792)
|
||||||
|
MTV (revision 4621758)
|
||||||
|
Mac OS X (revision 4601645)
|
||||||
|
Machinima (revision 4601716)
|
||||||
|
Major (revision 4245802)
|
||||||
|
Mednarodna različica (revision 4116054)
|
||||||
|
Metacritic (revision 3934020)
|
||||||
|
Michael McCann (skladatelj) (revision 4694860)
|
||||||
|
MicroProse (revision 4382810)
|
||||||
|
Microsoft Windows (revision 4691357)
|
||||||
|
Nezemeljsko življenje (revision 4620576)
|
||||||
|
NowGamer (revision 4704271)
|
||||||
|
OS X (revision 4601645)
|
||||||
|
Ognjena ekipa (revision 4694450)
|
||||||
|
Operacijski sistem (revision 4698515)
|
||||||
|
Ostrostrelec (revision 4529694)
|
||||||
|
Pilot (revision 4069093)
|
||||||
|
PlayStation 3 (revision 4382944)
|
||||||
|
PlayStation Network (revision 4382944)
|
||||||
|
PlayStation Vita (revision 3944025)
|
||||||
|
Pogon igre (revision 4622773)
|
||||||
|
Procesor (revision 4702518)
|
||||||
|
Producent videoiger (revision 4599904)
|
||||||
|
Razvijalec videoiger (revision 4093281)
|
||||||
|
Računalniška miška (revision 4385579)
|
||||||
|
Računalniška platforma (revision 4673669)
|
||||||
|
Severna Amerika (revision 4643798)
|
||||||
|
Sid Meier (revision 4061487)
|
||||||
|
Stealth (revision 4618630)
|
||||||
|
Steam (revision 4696215)
|
||||||
|
Strateška videoigra (revision 4236795)
|
||||||
|
Tablični računalnik (revision 4409985)
|
||||||
|
Take-Two Interactive (revision 4110089)
|
||||||
|
Telepatija (revision 4481192)
|
||||||
|
The Bureau: XCOM Declassified (revision 4704271)
|
||||||
|
The Guardian (revision 3929479)
|
||||||
|
Trdi disk (revision 4644623)
|
||||||
|
UFO: Enemy Unknown (revision 4704271)
|
||||||
|
Unreal Engine (revision 4622773)
|
||||||
|
Unreal Engine 3 (revision 4622773)
|
||||||
|
Uporabniški vmesnik (revision 4552473)
|
||||||
|
Valve Corporation (revision 4110105)
|
||||||
|
Večigralska videoigra (revision 4618639)
|
||||||
|
VideoGamer.com (revision 4704271)
|
||||||
|
Vohunski satelit (revision 4215166)
|
||||||
|
Vojaška taktika (revision 3970259)
|
||||||
|
Vojaški čini (revision 4363026)
|
||||||
|
|
||||||
|
== End of Parsed pages ==
|
||||||
|
|
||||||
|
- Wikipedia parsing ended at: 2016-09-28 22:06:46.133919
|
||||||
|
|
||||||
|
41 characters appeared 411226 times.
|
||||||
|
|
||||||
|
First 29 characters:
|
||||||
|
[ 0] Char a: 10.090315301075321 %
|
||||||
|
[ 1] Char e: 9.90477255815537 %
|
||||||
|
[ 2] Char i: 9.666703953543793 %
|
||||||
|
[ 3] Char o: 9.177921629468953 %
|
||||||
|
[ 4] Char n: 7.28309980400072 %
|
||||||
|
[ 5] Char r: 5.808241696779873 %
|
||||||
|
[ 6] Char s: 4.575586174025961 %
|
||||||
|
[ 7] Char t: 4.4963110309173056 %
|
||||||
|
[ 8] Char j: 4.343840126840229 %
|
||||||
|
[ 9] Char l: 4.2672399118732764 %
|
||||||
|
[10] Char v: 3.802775116359374 %
|
||||||
|
[11] Char p: 3.5216644861949393 %
|
||||||
|
[12] Char k: 3.5136397017698293 %
|
||||||
|
[13] Char d: 3.0387183689747244 %
|
||||||
|
[14] Char m: 2.9487435132992563 %
|
||||||
|
[15] Char z: 2.350775485985808 %
|
||||||
|
[16] Char u: 1.9719083910064055 %
|
||||||
|
[17] Char g: 1.9342162217369525 %
|
||||||
|
[18] Char b: 1.5392995579073308 %
|
||||||
|
[19] Char c: 1.2924766430138173 %
|
||||||
|
[20] Char h: 1.1864522184881305 %
|
||||||
|
[21] Char č: 1.137087635509428 %
|
||||||
|
[22] Char š: 0.6932927392723223 %
|
||||||
|
[23] Char ž: 0.45303555709026183 %
|
||||||
|
[24] Char f: 0.40707542811009034 %
|
||||||
|
[25] Char x: 0.19381070263067024 %
|
||||||
|
[26] Char y: 0.19040624863213904 %
|
||||||
|
[27] Char w: 0.18919037220409216 %
|
||||||
|
[28] Char q: 0.011186063138031156 %
|
||||||
|
|
||||||
|
The first 29 characters have an accumulated ratio of 0.9998978663800442.
|
||||||
|
|
||||||
|
727 sequences found.
|
||||||
|
|
||||||
|
First 512 (typical positive ratio): 0.9983524317161332
|
||||||
|
Next 512 (512-1024): 2.4317528560937295e-06
|
||||||
|
Rest: -3.859759734048396e-17
|
||||||
|
|
||||||
|
- Processing end: 2016-09-28 22:06:46.601266
|
||||||
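As a reading aid (not part of the commit): the log above ranks characters by frequency and reports the accumulated ratio of the first 29. The following is a minimal sketch of that kind of computation, assuming a plain letter count over lower-cased text. The function names `char_frequencies` and `accumulated_ratio` are hypothetical and are not taken from BuildLangModel.py, whose actual filtering rules may differ.

```python
from collections import Counter


def char_frequencies(text):
    """Return (char, ratio) pairs for the letters of text, most frequent first.

    Hypothetical helper: only alphabetic characters are counted, after
    lower-casing, and ties are broken alphabetically.
    """
    letters = [c.lower() for c in text if c.isalpha()]
    counts = Counter(letters)
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(ch, n / total) for ch, n in ranked]


def accumulated_ratio(freqs, first_n):
    """Sum the ratios of the first_n most frequent characters."""
    return sum(ratio for _, ratio in freqs[:first_n])


if __name__ == "__main__":
    sample = "zmožen razviti in ohranjati življenje"
    freqs = char_frequencies(sample)
    for rank, (ch, ratio) in enumerate(freqs[:5]):
        print(f"[{rank:2}] Char {ch}: {ratio * 100} %")
    print("accumulated ratio of the first 5:", accumulated_ratio(freqs, 5))
```

On the real corpus, the cut-off of 29 covers almost everything (ratio 0.9998978663800442 in the log), which suggests why only those 29 characters take part in the sequence model.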
59
script/langs/sl.py
Normal file
@@ -0,0 +1,59 @@
|
|||||||
|
#!/bin/python3
|
||||||
|
# -*- coding: utf-8 -*-
|
||||||
|
|
||||||
|
# ##### BEGIN LICENSE BLOCK #####
|
||||||
|
# Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||||
|
#
|
||||||
|
# The contents of this file are subject to the Mozilla Public License Version
|
||||||
|
# 1.1 (the "License"); you may not use this file except in compliance with
|
||||||
|
# the License. You may obtain a copy of the License at
|
||||||
|
# http://www.mozilla.org/MPL/
|
||||||
|
#
|
||||||
|
# Software distributed under the License is distributed on an "AS IS" basis,
|
||||||
|
# WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||||
|
# for the specific language governing rights and limitations under the
|
||||||
|
# License.
|
||||||
|
#
|
||||||
|
# The Original Code is Mozilla Universal charset detector code.
|
||||||
|
#
|
||||||
|
# The Initial Developer of the Original Code is
|
||||||
|
# Netscape Communications Corporation.
|
||||||
|
# Portions created by the Initial Developer are Copyright (C) 2001
|
||||||
|
# the Initial Developer. All Rights Reserved.
|
||||||
|
#
|
||||||
|
# Contributor(s):
|
||||||
|
# Jehan <jehan@girinstud.io>
|
||||||
|
#
|
||||||
|
# Alternatively, the contents of this file may be used under the terms of
|
||||||
|
# either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||||
|
# the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||||
|
# in which case the provisions of the GPL or the LGPL are applicable instead
|
||||||
|
# of those above. If you wish to allow use of your version of this file only
|
||||||
|
# under the terms of either the GPL or the LGPL, and not to allow others to
|
||||||
|
# use your version of this file under the terms of the MPL, indicate your
|
||||||
|
# decision by deleting the provisions above and replace them with the notice
|
||||||
|
# and other provisions required by the GPL or the LGPL. If you do not delete
|
||||||
|
# the provisions above, a recipient may use your version of this file under
|
||||||
|
# the terms of any one of the MPL, the GPL or the LGPL.
|
||||||
|
#
|
||||||
|
# ##### END LICENSE BLOCK #####
|
||||||
|
|
||||||
|
import re
|
||||||
|
|
||||||
|
## Mandatory Properties ##
|
||||||
|
|
||||||
|
name = 'Slovene'
|
||||||
|
code = 'sl'
|
||||||
|
use_ascii = True
|
||||||
|
charsets = ['ISO-8859-2', 'ISO-8859-16',
|
||||||
|
'Windows-1250', 'IBM852', 'MAC-CENTRALEUROPE']
|
||||||
|
|
||||||
|
## Optional Properties ##
|
||||||
|
|
||||||
|
# Alphabet characters.
|
||||||
|
alphabet = 'čšž'
|
||||||
|
# The starred page which was rewarded on the main page when I created
|
||||||
|
# the data.
|
||||||
|
start_pages = ['XCOM: Enemy Unknown']
|
||||||
|
wikipedia_code = code
|
||||||
|
case_mapping = True
|
||||||
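As a reading aid (not part of the commit): the three non-ASCII letters in `alphabet` sit at different byte values in each of the listed charsets, which is why the generated model carries one byte-to-order table per charset. The sketch below uses Python's built-in codecs. The mapping from the uchardet charset names to Python codec names (`iso8859_2`, `iso8859_16`, `cp1250`, `cp852`, `mac_latin2`) is an assumption made for this illustration.

```python
# Illustrative only; not part of script/langs/sl.py.
# Assumed correspondence between the charset names in `charsets`
# and the names of Python's standard codecs.
PYTHON_CODECS = {
    "ISO-8859-2": "iso8859_2",
    "ISO-8859-16": "iso8859_16",
    "Windows-1250": "cp1250",
    "IBM852": "cp852",
    "MAC-CENTRALEUROPE": "mac_latin2",
}


def alphabet_bytes(alphabet="čšž"):
    """Map each charset name to {letter: single-byte value}."""
    return {
        name: {ch: ch.encode(codec)[0] for ch in alphabet}
        for name, codec in PYTHON_CODECS.items()
    }


if __name__ == "__main__":
    for name, mapping in alphabet_bytes().items():
        cells = ", ".join(f"{ch}=0x{b:02X}" for ch, b in mapping.items())
        print(f"{name:18} {cells}")
```

These byte values can be checked against the generated tables: in each charset's `CharToOrderMap`, the cells at these positions hold orders 21, 22 and 23, the ranks of č, š and ž in the build log.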
@@ -30,6 +30,7 @@ set(
 LangModels/LangRomanianModel.cpp
 LangModels/LangRussianModel.cpp
 LangModels/LangSlovakModel.cpp
+LangModels/LangSloveneModel.cpp
 LangModels/LangSpanishModel.cpp
 LangModels/LangThaiModel.cpp
 LangModels/LangTurkishModel.cpp
259
src/LangModels/LangSloveneModel.cpp
Normal file
@@ -0,0 +1,259 @@
|
|||||||
|
/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
|
||||||
|
/* ***** BEGIN LICENSE BLOCK *****
|
||||||
|
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
|
||||||
|
*
|
||||||
|
* The contents of this file are subject to the Mozilla Public License Version
|
||||||
|
* 1.1 (the "License"); you may not use this file except in compliance with
|
||||||
|
* the License. You may obtain a copy of the License at
|
||||||
|
* http://www.mozilla.org/MPL/
|
||||||
|
*
|
||||||
|
* Software distributed under the License is distributed on an "AS IS" basis,
|
||||||
|
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
|
||||||
|
* for the specific language governing rights and limitations under the
|
||||||
|
* License.
|
||||||
|
*
|
||||||
|
* The Original Code is Mozilla Communicator client code.
|
||||||
|
*
|
||||||
|
* The Initial Developer of the Original Code is
|
||||||
|
* Netscape Communications Corporation.
|
||||||
|
* Portions created by the Initial Developer are Copyright (C) 1998
|
||||||
|
* the Initial Developer. All Rights Reserved.
|
||||||
|
*
|
||||||
|
* Contributor(s):
|
||||||
|
*
|
||||||
|
* Alternatively, the contents of this file may be used under the terms of
|
||||||
|
* either the GNU General Public License Version 2 or later (the "GPL"), or
|
||||||
|
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
|
||||||
|
* in which case the provisions of the GPL or the LGPL are applicable instead
|
||||||
|
* of those above. If you wish to allow use of your version of this file only
|
||||||
|
* under the terms of either the GPL or the LGPL, and not to allow others to
|
||||||
|
* use your version of this file under the terms of the MPL, indicate your
|
||||||
|
* decision by deleting the provisions above and replace them with the notice
|
||||||
|
* and other provisions required by the GPL or the LGPL. If you do not delete
|
||||||
|
* the provisions above, a recipient may use your version of this file under
|
||||||
|
* the terms of any one of the MPL, the GPL or the LGPL.
|
||||||
|
*
|
||||||
|
* ***** END LICENSE BLOCK ***** */
|
||||||
|
|
||||||
|
#include "../nsSBCharSetProber.h"
|
||||||
|
|
||||||
|
/********* Language model for: Slovene *********/
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Generated by BuildLangModel.py
|
||||||
|
* On: 2016-09-28 22:06:46.134717
|
||||||
|
**/
|
||||||
|
|
||||||
|
/* Character Mapping Table:
|
||||||
|
* ILL: illegal character.
|
||||||
|
* CTR: control character specific to the charset.
|
||||||
|
* RET: carriage/return.
|
||||||
|
* SYM: symbol (punctuation) that does not belong to word.
|
||||||
|
* NUM: 0 - 9.
|
||||||
|
*
|
||||||
|
* Other characters are ordered by probabilities
|
||||||
|
* (0 is the most common character in the language).
|
||||||
|
*
|
||||||
|
* Orders are generic to a language. So the codepoint with order X in
|
||||||
|
* CHARSET1 maps to the same character as the codepoint with the same
|
||||||
|
* order X in CHARSET2 for the same language.
|
||||||
|
* As such, it is possible to get missing order. For instance the
|
||||||
|
* ligature of 'o' and 'e' exists in ISO-8859-15 but not in ISO-8859-1
|
||||||
|
* even though they are both used for French. Same for the euro sign.
|
||||||
|
*/
|
||||||
|
static const unsigned char Iso_8859_2_CharToOrderMap[] =
|
||||||
|
{
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
|
||||||
|
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
|
||||||
|
NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
|
||||||
|
SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 4X */
|
||||||
|
11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,SYM, /* 5X */
|
||||||
|
SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 6X */
|
||||||
|
11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,CTR, /* 7X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 8X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 9X */
|
||||||
|
SYM, 41,SYM, 42,SYM, 43, 44,SYM,SYM, 22, 45, 46, 47,SYM, 23, 48, /* AX */
|
||||||
|
SYM, 49,SYM, 50,SYM, 51, 52,SYM,SYM, 22, 53, 54, 55,SYM, 23, 56, /* BX */
|
||||||
|
57, 32, 58, 59, 60, 61, 37, 34, 21, 29, 62, 36, 63, 30, 64, 65, /* CX */
|
||||||
|
66, 67, 68, 31, 35, 69, 70,SYM, 71, 72, 39, 73, 74, 40, 75, 76, /* DX */
|
||||||
|
77, 32, 78, 79, 80, 81, 37, 34, 21, 29, 82, 36, 83, 30, 84, 85, /* EX */
|
||||||
|
86, 87, 88, 31, 35, 89, 90,SYM, 91, 92, 39, 93, 94, 40, 95,SYM, /* FX */
|
||||||
|
};
|
||||||
|
/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
|
||||||
|
|
||||||
|
static const unsigned char Iso_8859_16_CharToOrderMap[] =
|
||||||
|
{
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
|
||||||
|
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
|
||||||
|
NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
|
||||||
|
SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 4X */
|
||||||
|
11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,SYM, /* 5X */
|
||||||
|
SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 6X */
|
||||||
|
11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,CTR, /* 7X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 8X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 9X */
|
||||||
|
SYM, 96, 97, 98,SYM,SYM, 22,SYM, 22,SYM, 99,SYM,100,SYM,101,102, /* AX */
|
||||||
|
SYM,SYM, 21,103, 23,SYM,SYM,SYM, 23, 21,104,SYM,105,106,107,108, /* BX */
|
||||||
|
109, 32,110,111,112, 37,113, 34,114, 29, 33, 36,115, 30,116,117, /* CX */
|
||||||
|
118,119,120, 31, 35,121,122,123,124,125, 39,126,127,128,129,130, /* DX */
|
||||||
|
131, 32,132,133,134, 37,135, 34,136, 29, 33, 36,137, 30,138,139, /* EX */
|
||||||
|
140,141,142, 31, 35,143,144,145,146,147, 39,148,149,150,151,152, /* FX */
|
||||||
|
};
|
||||||
|
/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
|
||||||
|
|
||||||
|
static const unsigned char Windows_1250_CharToOrderMap[] =
|
||||||
|
{
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
|
||||||
|
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
|
||||||
|
NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
|
||||||
|
SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 4X */
|
||||||
|
11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,SYM, /* 5X */
|
||||||
|
SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 6X */
|
||||||
|
11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,CTR, /* 7X */
|
||||||
|
SYM,ILL,SYM,ILL,SYM,SYM,SYM,SYM,ILL,SYM, 22,SYM,153,154, 23,155, /* 8X */
|
||||||
|
ILL,SYM,SYM,SYM,SYM,SYM,SYM,SYM,ILL,SYM, 22,SYM,156,157, 23,158, /* 9X */
|
||||||
|
SYM,SYM,SYM,159,SYM,160,SYM,SYM,SYM,SYM,161,SYM,SYM,SYM,SYM,162, /* AX */
|
||||||
|
SYM,SYM,SYM,163,SYM,SYM,SYM,SYM,SYM,164,165,SYM,166,SYM,167,168, /* BX */
|
||||||
|
169, 32,170,171,172,173, 37, 34, 21, 29,174, 36,175, 30,176,177, /* CX */
|
||||||
|
178,179,180, 31, 35,181,182,SYM,183,184, 39,185,186, 40,187,188, /* DX */
|
||||||
|
189, 32,190,191,192,193, 37, 34, 21, 29,194, 36,195, 30,196,197, /* EX */
|
||||||
|
198,199,200, 31, 35,201,202,SYM,203,204, 39,205,206, 40,207,SYM, /* FX */
|
||||||
|
};
|
||||||
|
/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
|
||||||
|
|
||||||
|
static const unsigned char Mac_Centraleurope_CharToOrderMap[] =
|
||||||
|
{
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
|
||||||
|
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
|
||||||
|
NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
|
||||||
|
SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 4X */
|
||||||
|
11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,SYM, /* 5X */
|
||||||
|
SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 6X */
|
||||||
|
11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,CTR, /* 7X */
|
||||||
|
208,209,210, 29,211,212,213, 32,214, 21,215, 21, 37, 37, 29,216, /* 8X */
|
||||||
|
217,218, 30,219, 38, 38,220, 31,221, 35,222,223, 39,224,225,226, /* 9X */
|
||||||
|
SYM,SYM,227,SYM,SYM,SYM,SYM,228,SYM,SYM,SYM,229,SYM,SYM,230,231, /* AX */
|
||||||
|
232,233,SYM,SYM,234,235,SYM,SYM,236,237,238,239,240,241,242,243, /* BX */
|
||||||
|
244,245,SYM,SYM,246,247,SYM,SYM,SYM,SYM,SYM,248,249,249,249,249, /* CX */
|
||||||
|
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,249,249,249,249,SYM,SYM,249,249, /* DX */
|
||||||
|
249, 22,SYM,SYM, 22,249,249, 32,249,249, 30, 23, 23,249, 31, 35, /* EX */
|
||||||
|
249,249, 39,249,249,249,249,249, 40, 40,249,249,249,249,249,SYM, /* FX */
|
||||||
|
};
|
||||||
|
/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
|
||||||
|
|
||||||
|
static const unsigned char Ibm852_CharToOrderMap[] =
|
||||||
|
{
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
|
||||||
|
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
|
||||||
|
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
|
||||||
|
NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
|
||||||
|
SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 4X */
|
||||||
|
11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,SYM, /* 5X */
|
||||||
|
SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 6X */
|
||||||
|
11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,CTR, /* 7X */
|
||||||
|
34,249, 29,249,249,249, 37, 34,249, 36,249,249,249,249,249, 37, /* 8X */
|
||||||
|
29,249,249, 35,249,249,249,249,249,249,249,249,249,249,SYM, 21, /* 9X */
|
||||||
|
32, 30, 31, 39,249,249, 23, 23,249,249,SYM,249, 21,249,SYM,SYM, /* AX */
|
||||||
|
SYM,SYM,SYM,SYM,SYM, 32,249,249,249,SYM,SYM,SYM,SYM,249,249,SYM, /* BX */
|
||||||
|
SYM,SYM,SYM,SYM,SYM,SYM,249,249,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* CX */
|
||||||
|
249,249,249, 36,249,249, 30,249,249,SYM,SYM,SYM,SYM,249,249,SYM, /* DX */
|
||||||
|
31,249, 35,249,249,249, 22, 22,249, 39,249,249, 40, 40,249,SYM, /* EX */
|
||||||
|
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,249,249,249,SYM,SYM, /* FX */
|
||||||
|
};
|
||||||
|
/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
|
||||||
|
|
||||||
|
|
||||||
|
/* Model Table:
|
||||||
|
* Total sequences: 727
|
||||||
|
* First 512 sequences: 0.9983524317161332
|
||||||
|
* Next 512 sequences (512-1024): 0.0016475682838668457
|
||||||
|
* Rest: -3.859759734048396e-17
|
||||||
|
* Negative sequences: TODO
|
||||||
|
*/
|
||||||
|
static const PRUint8 SloveneLangModel[] =
|
||||||
|
{
|
||||||
|
2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,
|
||||||
|
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,
|
||||||
|
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,2,0,
|
||||||
|
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,
|
||||||
|
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,2,3,2,2,
|
||||||
|
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,3,2,2,
|
||||||
|
3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,2,3,2,3,3,3,2,0,0,3,2,3,3,2,
|
||||||
|
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,0,0,0,3,2,3,3,0,
|
||||||
|
3,3,3,3,3,2,3,3,0,0,3,3,3,3,3,2,3,2,3,3,3,2,3,0,0,0,0,0,0,
|
||||||
|
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,2,3,2,3,3,2,3,2,0,
|
||||||
|
3,3,3,3,3,3,3,3,3,3,0,3,3,3,3,3,3,3,2,3,3,3,3,2,2,2,2,0,0,
|
||||||
|
3,3,3,3,3,3,3,3,2,3,0,3,3,3,2,2,3,3,3,3,3,2,2,0,0,0,3,2,2,
|
||||||
|
3,3,3,3,3,3,3,3,3,3,3,0,2,3,3,2,3,0,2,3,3,0,3,0,2,0,3,2,0,
|
||||||
|
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,2,3,2,2,3,2,0,
|
||||||
|
3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,0,3,2,3,3,2,2,2,0,2,2,3,2,0,
|
||||||
|
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,2,3,2,0,2,0,0,0,
|
||||||
|
3,3,3,2,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,3,3,3,2,0,0,
|
||||||
|
3,3,3,3,3,3,3,2,0,3,3,3,2,2,2,0,3,2,3,2,3,0,0,0,2,2,2,2,0,
|
||||||
|
3,3,3,3,3,3,3,3,3,3,3,2,2,3,3,0,3,0,2,2,0,3,3,2,2,0,3,0,0,
|
||||||
|
3,3,3,3,3,3,3,3,0,3,2,3,3,3,2,2,3,2,2,3,3,0,0,0,2,2,3,2,2,
|
||||||
|
3,3,3,3,3,3,2,3,0,3,3,3,3,2,2,2,3,0,2,0,0,2,0,0,2,0,2,2,0,
|
||||||
|
3,3,3,3,3,3,0,0,3,3,2,2,3,2,0,0,3,0,2,2,0,0,2,0,0,0,0,0,0,
|
||||||
|
3,3,3,3,3,2,0,3,3,3,2,3,3,0,0,0,3,0,0,0,0,3,0,2,0,0,0,0,0,
|
||||||
|
3,3,3,2,3,2,0,2,3,3,2,0,3,0,0,0,3,2,3,2,0,0,0,2,0,0,0,0,0,
|
||||||
|
3,3,3,3,2,3,3,3,0,3,0,0,0,2,2,0,3,2,0,2,2,0,0,0,3,2,2,2,0,
|
||||||
|
3,3,3,3,2,2,2,3,0,0,2,3,0,2,2,0,3,2,3,3,2,0,0,0,2,2,2,2,0,
|
||||||
|
3,3,2,3,3,2,3,3,3,3,0,2,2,2,2,0,2,2,2,3,2,0,0,0,0,2,0,2,0,
|
||||||
|
3,3,3,3,3,0,3,0,0,2,0,0,0,0,2,0,2,2,2,0,2,0,0,0,2,0,2,3,0,
|
||||||
|
0,0,0,0,2,0,0,2,0,2,0,0,0,0,0,0,3,0,0,2,0,0,0,0,0,0,0,0,0,
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
const SequenceModel Iso_8859_2SloveneModel =
|
||||||
|
{
|
||||||
|
Iso_8859_2_CharToOrderMap,
|
||||||
|
SloveneLangModel,
|
||||||
|
29,
|
||||||
|
(float)0.9983524317161332,
|
||||||
|
PR_TRUE,
|
||||||
|
"ISO-8859-2"
|
||||||
|
};
|
||||||
|
|
||||||
|
const SequenceModel Iso_8859_16SloveneModel =
|
||||||
|
{
|
||||||
|
Iso_8859_16_CharToOrderMap,
|
||||||
|
SloveneLangModel,
|
||||||
|
29,
|
||||||
|
(float)0.9983524317161332,
|
||||||
|
PR_TRUE,
|
||||||
|
"ISO-8859-16"
|
||||||
|
};
|
||||||
|
|
||||||
|
const SequenceModel Windows_1250SloveneModel =
|
||||||
|
{
|
||||||
|
Windows_1250_CharToOrderMap,
|
||||||
|
SloveneLangModel,
|
||||||
|
29,
|
||||||
|
(float)0.9983524317161332,
|
||||||
|
PR_TRUE,
|
||||||
|
"WINDOWS-1250"
|
||||||
|
};
|
||||||
|
|
||||||
|
const SequenceModel Mac_CentraleuropeSloveneModel =
|
||||||
|
{
|
||||||
|
Mac_Centraleurope_CharToOrderMap,
|
||||||
|
SloveneLangModel,
|
||||||
|
29,
|
||||||
|
(float)0.9983524317161332,
|
||||||
|
PR_TRUE,
|
||||||
|
"MAC-CENTRALEUROPE"
|
||||||
|
};
|
||||||
|
|
||||||
|
const SequenceModel Ibm852SloveneModel =
|
||||||
|
{
|
||||||
|
Ibm852_CharToOrderMap,
|
||||||
|
SloveneLangModel,
|
||||||
|
29,
|
||||||
|
(float)0.9983524317161332,
|
||||||
|
PR_TRUE,
|
||||||
|
"IBM852"
|
||||||
|
};
|
||||||
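As a reading aid (not part of the commit): the tables in this file can be read in two steps. A byte is first mapped to a frequency order through the charset's `CharToOrderMap` (the row is the high nibble, the column the low nibble), and a pair of orders then indexes the 29 x 29 `SloveneLangModel` matrix. The sketch below copies one row of each table from the file. It assumes the matrix row corresponds to the first character of a pair and the column to the second, which matches the data (for 'q', only 'u' gets the top category) but is not stated in the file itself. The helper names are hypothetical.

```python
# Frequency order, taken from the "First 29 characters" list in the build log.
ORDER = "aeionrstjlvpkdmzugbchčšžfxywq"

# Row "EX" (bytes 0xE0-0xEF) copied from Iso_8859_2_CharToOrderMap.
ISO_8859_2_ROW_EX = [77, 32, 78, 79, 80, 81, 37, 34, 21, 29, 82, 36, 83, 30, 84, 85]

# Last row of SloveneLangModel; assumed to describe pairs whose first
# character has order 28 ('q').
ROW_Q = [0, 0, 0, 0, 2, 0, 0, 2, 0, 2, 0, 0, 0, 0, 0, 0,
         3, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0]


def order_in_row_ex(byte):
    """Look up the order of a byte in the 0xE0-0xEF range of ISO-8859-2."""
    if byte >> 4 != 0xE:
        raise ValueError("this sketch only carries row EX of the table")
    return ISO_8859_2_ROW_EX[byte & 0x0F]


def followers(row, category):
    """List the characters whose cell in a model row equals category."""
    return [ORDER[i] for i, value in enumerate(row) if value == category]


if __name__ == "__main__":
    order = order_in_row_ex(0xE8)
    print("byte 0xE8 -> order", order, "->", ORDER[order])
    print("category 3 after 'q':", followers(ROW_Q, 3))
    print("category 2 after 'q':", followers(ROW_Q, 2))
```

Orders of 29 and above (for example 77 at 0xE0) belong to letters outside the 29 most frequent ones, so they fall outside the matrix.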
@@ -179,6 +179,12 @@ nsSBCSGroupProber::nsSBCSGroupProber()
 mProbers[87] = new nsSingleByteCharSetProber(&Iso_8859_16RomanianModel);
 mProbers[88] = new nsSingleByteCharSetProber(&Ibm852RomanianModel);

+mProbers[89] = new nsSingleByteCharSetProber(&Windows_1250SloveneModel);
+mProbers[90] = new nsSingleByteCharSetProber(&Iso_8859_2SloveneModel);
+mProbers[91] = new nsSingleByteCharSetProber(&Iso_8859_16SloveneModel);
+mProbers[92] = new nsSingleByteCharSetProber(&Mac_CentraleuropeSloveneModel);
+mProbers[93] = new nsSingleByteCharSetProber(&Ibm852SloveneModel);
+
 Reset();
 }

@@ -40,7 +40,7 @@
 #define nsSBCSGroupProber_h__


-#define NUM_OF_SBCS_PROBERS 89
+#define NUM_OF_SBCS_PROBERS 94

 class nsCharSetProber;
 class nsSBCSGroupProber: public nsCharSetProber {
@@ -240,5 +240,11 @@ extern const SequenceModel Iso_8859_2RomanianModel;
 extern const SequenceModel Iso_8859_16RomanianModel;
 extern const SequenceModel Ibm852RomanianModel;

+extern const SequenceModel Windows_1250SloveneModel;
+extern const SequenceModel Iso_8859_2SloveneModel;
+extern const SequenceModel Iso_8859_16SloveneModel;
+extern const SequenceModel Ibm852SloveneModel;
+extern const SequenceModel Mac_CentraleuropeSloveneModel;
+
 #endif /* nsSingleByteCharSetProber_h__ */

9
test/sl/ibm852.txt
Normal file
@@ -0,0 +1,9 @@
Naseljívi planét je planet ali naravni satelit (redkeje tudi asteroid[1]), ki je
zmožen razviti in ohranjati življenje.

Ker je obstoj nezemeljskega življenja trenutno negotov, je raziskovanje
naseljivih planetov v glavnem ekstrapolacija razmer na Zemlji in značilnosti
Sonca in celotnega Osončja, ki govorijo v prid razvitju življenja. Še posebej so
pomembni faktorji, ki so ohranili zapletene, mnogocelične organizme in ne le
preprosta, enocelična živa bitja, mikroorganizme. Raziskovanje in teorija v tej
smeri je del planetologije in razvijajoče astrobiologije.
9
test/sl/iso-8859-16.txt
Normal file
@@ -0,0 +1,9 @@
Naseljívi planét je planet ali naravni satelit (redkeje tudi asteroid[1]), ki je
zmožen razviti in ohranjati življenje.

Ker je obstoj nezemeljskega življenja trenutno negotov, je raziskovanje
naseljivih planetov v glavnem ekstrapolacija razmer na Zemlji in značilnosti
Sonca in celotnega Osončja, ki govorijo v prid razvitju življenja. Še posebej so
pomembni faktorji, ki so ohranili zapletene, mnogocelične organizme in ne le
preprosta, enocelična živa bitja, mikroorganizme. Raziskovanje in teorija v tej
smeri je del planetologije in razvijajoče astrobiologije.
9
test/sl/iso-8859-2.txt
Normal file
@@ -0,0 +1,9 @@
|
|||||||
|
Naseljívi planét je planet ali naravni satelit (redkeje tudi asteroid[1]), ki je
|
||||||
|
zmožen razviti in ohranjati življenje.
|
||||||
|
|
||||||
|
Ker je obstoj nezemeljskega življenja trenutno negotov, je raziskovanje
|
||||||
|
naseljivih planetov v glavnem ekstrapolacija razmer na Zemlji in značilnosti
|
||||||
|
Sonca in celotnega Osončja, ki govorijo v prid razvitju življenja. Še posebej so
|
||||||
|
pomembni faktorji, ki so ohranili zapletene, mnogocelične organizme in ne le
|
||||||
|
preprosta, enocelična živa bitja, mikroorganizme. Raziskovanje in teorija v tej
|
||||||
|
smeri je del planetologije in razvijajoče astrobiologije.
|
||||||
9
test/sl/mac-centraleurope.txt
Normal file
@@ -0,0 +1,9 @@
Naseljívi planét je planet ali naravni satelit (redkeje tudi asteroid[1]), ki je
zmožen razviti in ohranjati življenje.

Ker je obstoj nezemeljskega življenja trenutno negotov, je raziskovanje
naseljivih planetov v glavnem ekstrapolacija razmer na Zemlji in značilnosti
Sonca in celotnega Osončja, ki govorijo v prid razvitju življenja. Še posebej so
pomembni faktorji, ki so ohranili zapletene, mnogocelične organizme in ne le
preprosta, enocelična živa bitja, mikroorganizme. Raziskovanje in teorija v tej
smeri je del planetologije in razvijajoče astrobiologije.
9
test/sl/utf-8.txt
Normal file
@@ -0,0 +1,9 @@
|
|||||||
|
Naseljívi planét je planet ali naravni satelit (redkeje tudi asteroid[1]), ki je
|
||||||
|
zmožen razviti in ohranjati življenje.
|
||||||
|
|
||||||
|
Ker je obstoj nezemeljskega življenja trenutno negotov, je raziskovanje
|
||||||
|
naseljivih planetov v glavnem ekstrapolacija razmer na Zemlji in značilnosti
|
||||||
|
Sonca in celotnega Osončja, ki govorijo v prid razvitju življenja. Še posebej so
|
||||||
|
pomembni faktorji, ki so ohranili zapletene, mnogocelične organizme in ne le
|
||||||
|
preprosta, enocelična živa bitja, mikroorganizme. Raziskovanje in teorija v tej
|
||||||
|
smeri je del planetologije in razvijajoče astrobiologije.
|
||||||
9
test/sl/windows-1250.txt
Normal file
@@ -0,0 +1,9 @@
|
|||||||
|
Naseljívi planét je planet ali naravni satelit (redkeje tudi asteroid[1]), ki je
|
||||||
|
zmožen razviti in ohranjati življenje.
|
||||||
|
|
||||||
|
Ker je obstoj nezemeljskega življenja trenutno negotov, je raziskovanje
|
||||||
|
naseljivih planetov v glavnem ekstrapolacija razmer na Zemlji in značilnosti
|
||||||
|
Sonca in celotnega Osončja, ki govorijo v prid razvitju življenja. Še posebej so
|
||||||
|
pomembni faktorji, ki so ohranili zapletene, mnogocelične organizme in ne le
|
||||||
|
preprosta, enocelična živa bitja, mikroorganizme. Raziskovanje in teorija v tej
|
||||||
|
smeri je del planetologije in razvijajoče astrobiologije.
|
||||||
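As a reading aid (not part of the commit): the test files hold the same paragraph in different single-byte encodings, and the same byte stands for different letters depending on the charset. The sketch below shows the effect with Python's standard codecs. The helper name `misread` is hypothetical, and the Python codec names are assumed to correspond to the charsets named in the commit.

```python
def misread(text, written_as, read_as):
    """Encode text in one charset and decode the bytes as another."""
    return text.encode(written_as).decode(read_as)


if __name__ == "__main__":
    word = "značilnosti"
    # Correct round trip: the word survives unchanged.
    print(misread(word, "iso8859_16", "iso8859_16"))
    # Wrong guess: byte 0xB9 is 'č' in ISO-8859-16 but 'š' in ISO-8859-2.
    print(misread(word, "iso8859_16", "iso8859_2"))
```

Both results are plausible-looking letter sequences, so a byte-validity check alone cannot separate the two charsets. This is presumably where the per-language sequence statistics come in.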