LangModels: add Slovene support.

Encodings: ISO-8859-2, ISO-8859-16, Windows-1250, IBM852 and
MAC-CENTRALEUROPE.
Test text from https://sl.wikipedia.org/wiki/Naseljivi_planet
Jehan 2016-09-28 22:11:19 +02:00
parent fbd2efdbe9
commit d62154bd6e
14 changed files with 540 additions and 1 deletions


@@ -132,6 +132,12 @@ Techniques used by universalchardet are described at http://www.mozilla.org/proj
* ISO-8859-2
* IBM852
* MAC-CENTRALEUROPE
* Slovene
* ISO-8859-2
* ISO-8859-16
* Windows-1250
* IBM852
* MAC-CENTRALEUROPE
* Spanish
* ISO-8859-1
* ISO-8859-15


@@ -0,0 +1,148 @@
= Logs of language model for Slovene (sl) =
- Generated by BuildLangModel.py
- Started: 2016-09-28 22:00:35.243966
- Maximum depth: 5
- Max number of pages: 100
== Parsed pages ==
XCOM: Enemy Unknown (revision 4704271)
1UP.com (revision 4547348)
2K Games (revision 4110089)
Android (operacijski sistem) (revision 4619359)
Animator videoigre (revision 4702643)
App Store (revision 3903089)
Artefakt (revision 4484504)
Athlon (revision 4524746)
Avstralazija (revision 4623530)
Avtopsija (revision 4541344)
Bralno-pisalni pomnilnik (revision 4256388)
Civilization (serija) (revision 4645770)
Deus Ex: Human Revolution (revision 4694860)
Digitalna distribucija (revision 4696215)
DirectX (revision 4477913)
Dishonored (revision 4619444)
Edge (magazine) (revision 4690049)
Electronic Entertainment Expo (revision 4538691)
Enoigralska videoigra (revision 4610359)
Eurogamer (revision 4694860)
Evropa (revision 4687833)
Fantasy Flight Games (revision 4649361)
Firaxis Games (revision 4110089)
GameRankings (revision 3934020)
GameSpot (revision 4238015)
GameSpy (revision 4538691)
GameTrailers (revision 4704271)
Game Informer (revision 4704271)
GamesTM (revision 4704271)
Grafična kartica (revision 4257980)
Granata (revision 3859332)
Holograf (revision 4477482)
IGN (revision 4576233)
IOS (revision 4597264)
Igra igranja vlog (revision 4642276)
Igra na deski (revision 4649363)
Igralna konzola (revision 4649866)
Igralni pogon (revision 4622773)
Intel (revision 4626025)
International Standard Book Number (revision 4015087)
Izdelovalec videoigre (revision 3851747)
Joker (revija) (revision 3867772)
Kotaku (revision 4613535)
Kristal (revision 4156234)
Linux (revision 4524740)
Lovec prestreznik (revision 4102792)
MTV (revision 4621758)
Mac OS X (revision 4601645)
Machinima (revision 4601716)
Major (revision 4245802)
Mednarodna različica (revision 4116054)
Metacritic (revision 3934020)
Michael McCann (skladatelj) (revision 4694860)
MicroProse (revision 4382810)
Microsoft Windows (revision 4691357)
Nezemeljsko življenje (revision 4620576)
NowGamer (revision 4704271)
OS X (revision 4601645)
Ognjena ekipa (revision 4694450)
Operacijski sistem (revision 4698515)
Ostrostrelec (revision 4529694)
Pilot (revision 4069093)
PlayStation 3 (revision 4382944)
PlayStation Network (revision 4382944)
PlayStation Vita (revision 3944025)
Pogon igre (revision 4622773)
Procesor (revision 4702518)
Producent videoiger (revision 4599904)
Razvijalec videoiger (revision 4093281)
Računalniška miška (revision 4385579)
Računalniška platforma (revision 4673669)
Severna Amerika (revision 4643798)
Sid Meier (revision 4061487)
Stealth (revision 4618630)
Steam (revision 4696215)
Strateška videoigra (revision 4236795)
Tablični računalnik (revision 4409985)
Take-Two Interactive (revision 4110089)
Telepatija (revision 4481192)
The Bureau: XCOM Declassified (revision 4704271)
The Guardian (revision 3929479)
Trdi disk (revision 4644623)
UFO: Enemy Unknown (revision 4704271)
Unreal Engine (revision 4622773)
Unreal Engine 3 (revision 4622773)
Uporabniški vmesnik (revision 4552473)
Valve Corporation (revision 4110105)
Večigralska videoigra (revision 4618639)
VideoGamer.com (revision 4704271)
Vohunski satelit (revision 4215166)
Vojaška taktika (revision 3970259)
Vojaški čini (revision 4363026)
== End of Parsed pages ==
- Wikipedia parsing ended at: 2016-09-28 22:06:46.133919
41 characters appeared 411226 times.
First 29 characters:
[ 0] Char a: 10.090315301075321 %
[ 1] Char e: 9.90477255815537 %
[ 2] Char i: 9.666703953543793 %
[ 3] Char o: 9.177921629468953 %
[ 4] Char n: 7.28309980400072 %
[ 5] Char r: 5.808241696779873 %
[ 6] Char s: 4.575586174025961 %
[ 7] Char t: 4.4963110309173056 %
[ 8] Char j: 4.343840126840229 %
[ 9] Char l: 4.2672399118732764 %
[10] Char v: 3.802775116359374 %
[11] Char p: 3.5216644861949393 %
[12] Char k: 3.5136397017698293 %
[13] Char d: 3.0387183689747244 %
[14] Char m: 2.9487435132992563 %
[15] Char z: 2.350775485985808 %
[16] Char u: 1.9719083910064055 %
[17] Char g: 1.9342162217369525 %
[18] Char b: 1.5392995579073308 %
[19] Char c: 1.2924766430138173 %
[20] Char h: 1.1864522184881305 %
[21] Char č: 1.137087635509428 %
[22] Char š: 0.6932927392723223 %
[23] Char ž: 0.45303555709026183 %
[24] Char f: 0.40707542811009034 %
[25] Char x: 0.19381070263067024 %
[26] Char y: 0.19040624863213904 %
[27] Char w: 0.18919037220409216 %
[28] Char q: 0.011186063138031156 %
The first 29 characters have an accumulated ratio of 0.9998978663800442.
727 sequences found.
First 512 (typical positive ratio): 0.9983524317161332
Next 512 (512-1024): 2.4317528560937295e-06
Rest: -3.859759734048396e-17
- Processing end: 2016-09-28 22:06:46.601266
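
The per-character percentages and the accumulated ratio reported in this log can be approximated with a few lines of Python. This is a minimal sketch, not the code of BuildLangModel.py: the function name `char_frequencies` and the choice to count only letters folded to lower case are assumptions made for illustration.

```python
from collections import Counter

def char_frequencies(text, top=29):
    """Return (ranked, accumulated).

    ranked is a list of (char, ratio) pairs sorted by decreasing frequency;
    accumulated is the combined ratio of the first `top` characters.
    """
    # Assumption: only alphabetic characters count, folded to lower case.
    letters = [c.lower() for c in text if c.isalpha()]
    counts = Counter(letters)
    total = sum(counts.values())
    ranked = [(c, n / total) for c, n in counts.most_common()]
    accumulated = sum(ratio for _, ratio in ranked[:top])
    return ranked, accumulated

ranked, accumulated = char_frequencies("Naseljivi planet je planet ali naravni satelit")
for index, (char, ratio) in enumerate(ranked[:5]):
    print(f"[{index:2}] Char {char}: {ratio * 100} %")
```

With a large corpus, `accumulated` plays the role of the "accumulated ratio" line above: it shows how much of the text the most frequent characters cover.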

59
script/langs/sl.py Normal file

@@ -0,0 +1,59 @@
#!/bin/python3
# -*- coding: utf-8 -*-
# ##### BEGIN LICENSE BLOCK #####
# Version: MPL 1.1/GPL 2.0/LGPL 2.1
#
# The contents of this file are subject to the Mozilla Public License Version
# 1.1 (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
# http://www.mozilla.org/MPL/
#
# Software distributed under the License is distributed on an "AS IS" basis,
# WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
# for the specific language governing rights and limitations under the
# License.
#
# The Original Code is Mozilla Universal charset detector code.
#
# The Initial Developer of the Original Code is
# Netscape Communications Corporation.
# Portions created by the Initial Developer are Copyright (C) 2001
# the Initial Developer. All Rights Reserved.
#
# Contributor(s):
# Jehan <jehan@girinstud.io>
#
# Alternatively, the contents of this file may be used under the terms of
# either the GNU General Public License Version 2 or later (the "GPL"), or
# the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
# in which case the provisions of the GPL or the LGPL are applicable instead
# of those above. If you wish to allow use of your version of this file only
# under the terms of either the GPL or the LGPL, and not to allow others to
# use your version of this file under the terms of the MPL, indicate your
# decision by deleting the provisions above and replace them with the notice
# and other provisions required by the GPL or the LGPL. If you do not delete
# the provisions above, a recipient may use your version of this file under
# the terms of any one of the MPL, the GPL or the LGPL.
#
# ##### END LICENSE BLOCK #####
import re
## Mandatory Properties ##
name = 'Slovene'
code = 'sl'
use_ascii = True
charsets = ['ISO-8859-2', 'ISO-8859-16',
'Windows-1250', 'IBM852', 'MAC-CENTRALEUROPE']
## Optional Properties ##
# Alphabet characters.
alphabet = 'čšž'
# The starred page which was rewarded on the main page when I created
# the data.
start_pages = ['XCOM: Enemy Unknown']
wikipedia_code = code
case_mapping = True
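
One way to sanity-check a language description like this is to confirm that every extra alphabet character can be encoded in each listed charset. The sketch below relies on Python's standard codecs. The `PYTHON_CODECS` mapping from the names in `charsets` to Python codec names and the helper `unencodable` are assumptions of this example, not part of the script.

```python
# Assumed mapping from the charset names used above to Python codec names.
PYTHON_CODECS = {
    'ISO-8859-2': 'iso8859_2',
    'ISO-8859-16': 'iso8859_16',
    'Windows-1250': 'cp1250',
    'IBM852': 'cp852',
    'MAC-CENTRALEUROPE': 'mac_latin2',
}

def unencodable(alphabet, codec):
    """Return the characters of `alphabet` (both cases) that `codec` cannot encode."""
    missing = []
    for char in alphabet.lower() + alphabet.upper():
        try:
            char.encode(codec)
        except UnicodeEncodeError:
            missing.append(char)
    return missing

for charset, codec in PYTHON_CODECS.items():
    # An empty list means the charset covers the whole alphabet.
    print(charset, unencodable('čšž', codec))
```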


@@ -30,6 +30,7 @@ set(
LangModels/LangRomanianModel.cpp
LangModels/LangRussianModel.cpp
LangModels/LangSlovakModel.cpp
LangModels/LangSloveneModel.cpp
LangModels/LangSpanishModel.cpp
LangModels/LangThaiModel.cpp
LangModels/LangTurkishModel.cpp


@@ -0,0 +1,259 @@
/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* The Original Code is Mozilla Communicator client code.
*
* The Initial Developer of the Original Code is
* Netscape Communications Corporation.
* Portions created by the Initial Developer are Copyright (C) 1998
* the Initial Developer. All Rights Reserved.
*
* Contributor(s):
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
#include "../nsSBCharSetProber.h"
/********* Language model for: Slovene *********/
/**
* Generated by BuildLangModel.py
* On: 2016-09-28 22:06:46.134717
**/
/* Character Mapping Table:
* ILL: illegal character.
* CTR: control character specific to the charset.
* RET: carriage/return.
* SYM: symbol (punctuation) that does not belong to word.
* NUM: 0 - 9.
*
* Other characters are ordered by probabilities
* (0 is the most common character in the language).
*
* Orders are generic to a language. So the codepoint with order X in
* CHARSET1 maps to the same character as the codepoint with the same
* order X in CHARSET2 for the same language.
* As such, it is possible to get missing order. For instance the
* ligature of 'o' and 'e' exists in ISO-8859-15 but not in ISO-8859-1
* even though they are both used for French. Same for the euro sign.
*/
static const unsigned char Iso_8859_2_CharToOrderMap[] =
{
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 4X */
11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,SYM, /* 5X */
SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 6X */
11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,CTR, /* 7X */
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 8X */
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 9X */
SYM, 41,SYM, 42,SYM, 43, 44,SYM,SYM, 22, 45, 46, 47,SYM, 23, 48, /* AX */
SYM, 49,SYM, 50,SYM, 51, 52,SYM,SYM, 22, 53, 54, 55,SYM, 23, 56, /* BX */
57, 32, 58, 59, 60, 61, 37, 34, 21, 29, 62, 36, 63, 30, 64, 65, /* CX */
66, 67, 68, 31, 35, 69, 70,SYM, 71, 72, 39, 73, 74, 40, 75, 76, /* DX */
77, 32, 78, 79, 80, 81, 37, 34, 21, 29, 82, 36, 83, 30, 84, 85, /* EX */
86, 87, 88, 31, 35, 89, 90,SYM, 91, 92, 39, 93, 94, 40, 95,SYM, /* FX */
};
/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
static const unsigned char Iso_8859_16_CharToOrderMap[] =
{
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 4X */
11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,SYM, /* 5X */
SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 6X */
11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,CTR, /* 7X */
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 8X */
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 9X */
SYM, 96, 97, 98,SYM,SYM, 22,SYM, 22,SYM, 99,SYM,100,SYM,101,102, /* AX */
SYM,SYM, 21,103, 23,SYM,SYM,SYM, 23, 21,104,SYM,105,106,107,108, /* BX */
109, 32,110,111,112, 37,113, 34,114, 29, 33, 36,115, 30,116,117, /* CX */
118,119,120, 31, 35,121,122,123,124,125, 39,126,127,128,129,130, /* DX */
131, 32,132,133,134, 37,135, 34,136, 29, 33, 36,137, 30,138,139, /* EX */
140,141,142, 31, 35,143,144,145,146,147, 39,148,149,150,151,152, /* FX */
};
/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
static const unsigned char Windows_1250_CharToOrderMap[] =
{
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 4X */
11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,SYM, /* 5X */
SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 6X */
11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,CTR, /* 7X */
SYM,ILL,SYM,ILL,SYM,SYM,SYM,SYM,ILL,SYM, 22,SYM,153,154, 23,155, /* 8X */
ILL,SYM,SYM,SYM,SYM,SYM,SYM,SYM,ILL,SYM, 22,SYM,156,157, 23,158, /* 9X */
SYM,SYM,SYM,159,SYM,160,SYM,SYM,SYM,SYM,161,SYM,SYM,SYM,SYM,162, /* AX */
SYM,SYM,SYM,163,SYM,SYM,SYM,SYM,SYM,164,165,SYM,166,SYM,167,168, /* BX */
169, 32,170,171,172,173, 37, 34, 21, 29,174, 36,175, 30,176,177, /* CX */
178,179,180, 31, 35,181,182,SYM,183,184, 39,185,186, 40,187,188, /* DX */
189, 32,190,191,192,193, 37, 34, 21, 29,194, 36,195, 30,196,197, /* EX */
198,199,200, 31, 35,201,202,SYM,203,204, 39,205,206, 40,207,SYM, /* FX */
};
/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
static const unsigned char Mac_Centraleurope_CharToOrderMap[] =
{
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 4X */
11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,SYM, /* 5X */
SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 6X */
11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,CTR, /* 7X */
208,209,210, 29,211,212,213, 32,214, 21,215, 21, 37, 37, 29,216, /* 8X */
217,218, 30,219, 38, 38,220, 31,221, 35,222,223, 39,224,225,226, /* 9X */
SYM,SYM,227,SYM,SYM,SYM,SYM,228,SYM,SYM,SYM,229,SYM,SYM,230,231, /* AX */
232,233,SYM,SYM,234,235,SYM,SYM,236,237,238,239,240,241,242,243, /* BX */
244,245,SYM,SYM,246,247,SYM,SYM,SYM,SYM,SYM,248,249,249,249,249, /* CX */
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,249,249,249,249,SYM,SYM,249,249, /* DX */
249, 22,SYM,SYM, 22,249,249, 32,249,249, 30, 23, 23,249, 31, 35, /* EX */
249,249, 39,249,249,249,249,249, 40, 40,249,249,249,249,249,SYM, /* FX */
};
/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
static const unsigned char Ibm852_CharToOrderMap[] =
{
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 4X */
11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,SYM, /* 5X */
SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 6X */
11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,CTR, /* 7X */
34,249, 29,249,249,249, 37, 34,249, 36,249,249,249,249,249, 37, /* 8X */
29,249,249, 35,249,249,249,249,249,249,249,249,249,249,SYM, 21, /* 9X */
32, 30, 31, 39,249,249, 23, 23,249,249,SYM,249, 21,249,SYM,SYM, /* AX */
SYM,SYM,SYM,SYM,SYM, 32,249,249,249,SYM,SYM,SYM,SYM,249,249,SYM, /* BX */
SYM,SYM,SYM,SYM,SYM,SYM,249,249,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* CX */
249,249,249, 36,249,249, 30,249,249,SYM,SYM,SYM,SYM,249,249,SYM, /* DX */
31,249, 35,249,249,249, 22, 22,249, 39,249,249, 40, 40,249,SYM, /* EX */
SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,249,249,249,SYM,SYM, /* FX */
};
/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
/* Model Table:
* Total sequences: 727
* First 512 sequences: 0.9983524317161332
* Next 512 sequences (512-1024): 0.0016475682838668457
* Rest: -3.859759734048396e-17
* Negative sequences: TODO
*/
static const PRUint8 SloveneLangModel[] =
{
2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,2,0,
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,2,3,2,2,
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,3,2,2,
3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,2,3,2,3,3,3,2,0,0,3,2,3,3,2,
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,0,0,0,3,2,3,3,0,
3,3,3,3,3,2,3,3,0,0,3,3,3,3,3,2,3,2,3,3,3,2,3,0,0,0,0,0,0,
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,2,3,2,3,3,2,3,2,0,
3,3,3,3,3,3,3,3,3,3,0,3,3,3,3,3,3,3,2,3,3,3,3,2,2,2,2,0,0,
3,3,3,3,3,3,3,3,2,3,0,3,3,3,2,2,3,3,3,3,3,2,2,0,0,0,3,2,2,
3,3,3,3,3,3,3,3,3,3,3,0,2,3,3,2,3,0,2,3,3,0,3,0,2,0,3,2,0,
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,2,3,2,2,3,2,0,
3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,0,3,2,3,3,2,2,2,0,2,2,3,2,0,
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,2,3,2,0,2,0,0,0,
3,3,3,2,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,3,3,3,2,0,0,
3,3,3,3,3,3,3,2,0,3,3,3,2,2,2,0,3,2,3,2,3,0,0,0,2,2,2,2,0,
3,3,3,3,3,3,3,3,3,3,3,2,2,3,3,0,3,0,2,2,0,3,3,2,2,0,3,0,0,
3,3,3,3,3,3,3,3,0,3,2,3,3,3,2,2,3,2,2,3,3,0,0,0,2,2,3,2,2,
3,3,3,3,3,3,2,3,0,3,3,3,3,2,2,2,3,0,2,0,0,2,0,0,2,0,2,2,0,
3,3,3,3,3,3,0,0,3,3,2,2,3,2,0,0,3,0,2,2,0,0,2,0,0,0,0,0,0,
3,3,3,3,3,2,0,3,3,3,2,3,3,0,0,0,3,0,0,0,0,3,0,2,0,0,0,0,0,
3,3,3,2,3,2,0,2,3,3,2,0,3,0,0,0,3,2,3,2,0,0,0,2,0,0,0,0,0,
3,3,3,3,2,3,3,3,0,3,0,0,0,2,2,0,3,2,0,2,2,0,0,0,3,2,2,2,0,
3,3,3,3,2,2,2,3,0,0,2,3,0,2,2,0,3,2,3,3,2,0,0,0,2,2,2,2,0,
3,3,2,3,3,2,3,3,3,3,0,2,2,2,2,0,2,2,2,3,2,0,0,0,0,2,0,2,0,
3,3,3,3,3,0,3,0,0,2,0,0,0,0,2,0,2,2,2,0,2,0,0,0,2,0,2,3,0,
0,0,0,0,2,0,0,2,0,2,0,0,0,0,0,0,3,0,0,2,0,0,0,0,0,0,0,0,0,
};
const SequenceModel Iso_8859_2SloveneModel =
{
Iso_8859_2_CharToOrderMap,
SloveneLangModel,
29,
(float)0.9983524317161332,
PR_TRUE,
"ISO-8859-2"
};
const SequenceModel Iso_8859_16SloveneModel =
{
Iso_8859_16_CharToOrderMap,
SloveneLangModel,
29,
(float)0.9983524317161332,
PR_TRUE,
"ISO-8859-16"
};
const SequenceModel Windows_1250SloveneModel =
{
Windows_1250_CharToOrderMap,
SloveneLangModel,
29,
(float)0.9983524317161332,
PR_TRUE,
"WINDOWS-1250"
};
const SequenceModel Mac_CentraleuropeSloveneModel =
{
Mac_Centraleurope_CharToOrderMap,
SloveneLangModel,
29,
(float)0.9983524317161332,
PR_TRUE,
"MAC-CENTRALEUROPE"
};
const SequenceModel Ibm852SloveneModel =
{
Ibm852_CharToOrderMap,
SloveneLangModel,
29,
(float)0.9983524317161332,
PR_TRUE,
"IBM852"
};
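
To illustrate how a char-to-order map and a sequence model of this shape can be combined into a score, here is a simplified Python sketch. It is not the logic of nsSingleByteCharSetProber: the function `sequence_confidence`, the toy three-letter model, and the 0.99 cap are assumptions for illustration. The idea shown is that each pair of consecutive frequent characters is looked up in the flattened matrix, and the share of pairs in the highest category (3) is divided by the typical positive ratio.

```python
def sequence_confidence(data, char_to_order, model, freq_count, typical_positive_ratio):
    """Score `data` (bytes) against a flattened freq_count x freq_count model."""
    positive = 0
    total = 0
    last_order = None
    for byte in data:
        order = char_to_order.get(byte)
        if order is None or order >= freq_count:
            # Symbols, digits and rare characters break the current sequence.
            last_order = None
            continue
        if last_order is not None:
            total += 1
            # Category 3 marks the most frequent ("positive") sequences.
            if model[last_order * freq_count + order] == 3:
                positive += 1
        last_order = order
    if total == 0:
        return 0.0
    return min(positive / total / typical_positive_ratio, 0.99)

# Toy 3-letter model with hypothetical values: orders a=0, e=1, i=2.
toy_orders = {ord('a'): 0, ord('e'): 1, ord('i'): 2}
toy_model = [0, 3, 3,
             3, 0, 3,
             3, 3, 0]
print(sequence_confidence(b"aei a", toy_orders, toy_model, 3, 1.0))
```

In the generated file above, the same `SloveneLangModel` matrix is shared by all five `SequenceModel` entries; only the char-to-order map differs per charset.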


@@ -179,6 +179,12 @@ nsSBCSGroupProber::nsSBCSGroupProber()
mProbers[87] = new nsSingleByteCharSetProber(&Iso_8859_16RomanianModel);
mProbers[88] = new nsSingleByteCharSetProber(&Ibm852RomanianModel);
mProbers[89] = new nsSingleByteCharSetProber(&Windows_1250SloveneModel);
mProbers[90] = new nsSingleByteCharSetProber(&Iso_8859_2SloveneModel);
mProbers[91] = new nsSingleByteCharSetProber(&Iso_8859_16SloveneModel);
mProbers[92] = new nsSingleByteCharSetProber(&Mac_CentraleuropeSloveneModel);
mProbers[93] = new nsSingleByteCharSetProber(&Ibm852SloveneModel);
Reset();
}


@@ -40,7 +40,7 @@
#define nsSBCSGroupProber_h__
-#define NUM_OF_SBCS_PROBERS 89
+#define NUM_OF_SBCS_PROBERS 94
class nsCharSetProber;
class nsSBCSGroupProber: public nsCharSetProber {
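
Because the prober array indices and `NUM_OF_SBCS_PROBERS` are maintained by hand, a small consistency check can catch a mismatch when a language is added. The following is a hypothetical helper, not part of the repository: `check_prober_count` and the placeholder model names `A` and `B` are assumptions for illustration.

```python
import re

def check_prober_count(cpp_source, declared_count):
    """Return True when the mProbers[...] indices are exactly 0..declared_count-1."""
    found = re.findall(r"mProbers\[(\d+)\]\s*=", cpp_source)
    indices = sorted(int(index) for index in found)
    return indices == list(range(declared_count))

# Placeholder source text; A and B stand in for real SequenceModel names.
snippet = """
mProbers[0] = new nsSingleByteCharSetProber(&A);
mProbers[1] = new nsSingleByteCharSetProber(&B);
"""
print(check_prober_count(snippet, 2))
```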


@@ -240,5 +240,11 @@ extern const SequenceModel Iso_8859_2RomanianModel;
extern const SequenceModel Iso_8859_16RomanianModel;
extern const SequenceModel Ibm852RomanianModel;
extern const SequenceModel Windows_1250SloveneModel;
extern const SequenceModel Iso_8859_2SloveneModel;
extern const SequenceModel Iso_8859_16SloveneModel;
extern const SequenceModel Ibm852SloveneModel;
extern const SequenceModel Mac_CentraleuropeSloveneModel;
#endif /* nsSingleByteCharSetProber_h__ */

9
test/sl/ibm852.txt Normal file

@@ -0,0 +1,9 @@
Naseljívi planét je planet ali naravni satelit (redkeje tudi asteroid[1]), ki je
zmožen razviti in ohranjati življenje.
Ker je obstoj nezemeljskega življenja trenutno negotov, je raziskovanje
naseljivih planetov v glavnem ekstrapolacija razmer na Zemlji in značilnosti
Sonca in celotnega Osončja, ki govorijo v prid razvitju življenja. Še posebej so
pomembni faktorji, ki so ohranili zapletene, mnogocelične organizme in ne le
preprosta, enocelična živa bitja, mikroorganizme. Raziskovanje in teorija v tej
smeri je del planetologije in razvijajoče astrobiologije.

9
test/sl/iso-8859-16.txt Normal file

@@ -0,0 +1,9 @@
Naseljívi planét je planet ali naravni satelit (redkeje tudi asteroid[1]), ki je
zmožen razviti in ohranjati življenje.
Ker je obstoj nezemeljskega življenja trenutno negotov, je raziskovanje
naseljivih planetov v glavnem ekstrapolacija razmer na Zemlji in značilnosti
Sonca in celotnega Osončja, ki govorijo v prid razvitju življenja. Še posebej so
pomembni faktorji, ki so ohranili zapletene, mnogocelične organizme in ne le
preprosta, enocelična živa bitja, mikroorganizme. Raziskovanje in teorija v tej
smeri je del planetologije in razvijajoče astrobiologije.

9
test/sl/iso-8859-2.txt Normal file

@@ -0,0 +1,9 @@
Naseljívi planét je planet ali naravni satelit (redkeje tudi asteroid[1]), ki je
zmožen razviti in ohranjati življenje.
Ker je obstoj nezemeljskega življenja trenutno negotov, je raziskovanje
naseljivih planetov v glavnem ekstrapolacija razmer na Zemlji in značilnosti
Sonca in celotnega Osončja, ki govorijo v prid razvitju življenja. Še posebej so
pomembni faktorji, ki so ohranili zapletene, mnogocelične organizme in ne le
preprosta, enocelična živa bitja, mikroorganizme. Raziskovanje in teorija v tej
smeri je del planetologije in razvijajoče astrobiologije.


@@ -0,0 +1,9 @@
Naseljívi planét je planet ali naravni satelit (redkeje tudi asteroid[1]), ki je
zmožen razviti in ohranjati življenje.
Ker je obstoj nezemeljskega življenja trenutno negotov, je raziskovanje
naseljivih planetov v glavnem ekstrapolacija razmer na Zemlji in značilnosti
Sonca in celotnega Osončja, ki govorijo v prid razvitju življenja. Še posebej so
pomembni faktorji, ki so ohranili zapletene, mnogocelične organizme in ne le
preprosta, enocelična živa bitja, mikroorganizme. Raziskovanje in teorija v tej
smeri je del planetologije in razvijajoče astrobiologije.

9
test/sl/utf-8.txt Normal file

@@ -0,0 +1,9 @@
Naseljívi planét je planet ali naravni satelit (redkeje tudi asteroid[1]), ki je
zmožen razviti in ohranjati življenje.
Ker je obstoj nezemeljskega življenja trenutno negotov, je raziskovanje
naseljivih planetov v glavnem ekstrapolacija razmer na Zemlji in značilnosti
Sonca in celotnega Osončja, ki govorijo v prid razvitju življenja. Še posebej so
pomembni faktorji, ki so ohranili zapletene, mnogocelične organizme in ne le
preprosta, enocelična živa bitja, mikroorganizme. Raziskovanje in teorija v tej
smeri je del planetologije in razvijajoče astrobiologije.

9
test/sl/windows-1250.txt Normal file

@@ -0,0 +1,9 @@
Naseljívi planét je planet ali naravni satelit (redkeje tudi asteroid[1]), ki je
zmožen razviti in ohranjati življenje.
Ker je obstoj nezemeljskega življenja trenutno negotov, je raziskovanje
naseljivih planetov v glavnem ekstrapolacija razmer na Zemlji in značilnosti
Sonca in celotnega Osončja, ki govorijo v prid razvitju življenja. Še posebej so
pomembni faktorji, ki so ohranili zapletene, mnogocelične organizme in ne le
preprosta, enocelična živa bitja, mikroorganizme. Raziskovanje in teorija v tej
smeri je del planetologije in razvijajoče astrobiologije.
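
The commit does not say how the per-charset test files were produced. One plausible way, sketched here under that assumption, is to encode the same UTF-8 text with each target codec. The Python codec names and the helper `encode_all` are choices of this example, not taken from the repository.

```python
SAMPLE = "zmožen razviti in ohranjati življenje"

# Assumed Python codec names for the five charsets covered by this commit.
CODECS = ['iso8859_2', 'iso8859_16', 'cp1250', 'cp852', 'mac_latin2']

def encode_all(text, codecs):
    """Encode `text` once per codec and return a {codec: bytes} mapping."""
    return {codec: text.encode(codec) for codec in codecs}

encoded = encode_all(SAMPLE, CODECS)
for codec, data in encoded.items():
    # Index 3 is the letter 'ž'; each charset stores it as a different byte.
    print(codec, hex(data[3]))
```

Decoding any of these byte strings with the wrong charset produces different characters for the same bytes, which is exactly the ambiguity the detector has to resolve.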