Mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git (synced 2025-12-06 16:56:40 +08:00)
LangModels: add Swedish support.
Encodings: ISO-8859-1, ISO-8859-4, ISO-8859-9, ISO-8859-15 and WINDOWS-1252. Test text from https://sv.wikipedia.org/wiki/Mölle
parent d62154bd6e
commit 119fed7e8d
@@ -142,6 +142,12 @@ Techniques used by universalchardet are described at http://www.mozilla.org/proj
 * ISO-8859-1
 * ISO-8859-15
 * WINDOWS-1252
+* Swedish
+* ISO-8859-1
+* ISO-8859-4
+* ISO-8859-9
+* ISO-8859-15
+* WINDOWS-1252
 * Thai
 * TIS-620
 * ISO-8859-11
script/BuildLangModelLogs/LangSwedishModel.log (normal file, 151 lines)
@@ -0,0 +1,151 @@
+= Logs of language model for Swedish (sv) =
+
+- Generated by BuildLangModel.py
+- Started: 2016-09-28 22:26:37.221506
+- Maximum depth: 5
+- Max number of pages: 100
+
+== Parsed pages ==
+
+Kakapo (revision 36509929)
+Akut hotad (revision 32517788)
+Aotearoa (revision 36575359)
+Art (revision 36771341)
+Artepitet (revision 36771341)
+Auckland (revision 35752058)
+Auktorsnamn (revision 35976965)
+BBC (revision 36508743)
+Basalomsättning (revision 30567523)
+Beilschmiedia tawa (revision 29101923)
+Berguv (revision 36295501)
+Betesmark (revision 34292168)
+Biotop (revision 35528052)
+BirdLife International (revision 36124283)
+Bonaparte (revision 37325183)
+British Museum (revision 36420244)
+Bröstben (revision 30602527)
+Dacrydium cupressinum (revision 32986501)
+Digital object identifier (revision 27637223)
+Djur (revision 37300775)
+Djurpark (revision 37147093)
+Domän (biologi) (revision 33377709)
+Don Merton (revision 36509929)
+Douglas Adams (revision 36556245)
+Däggdjur (revision 37328286)
+Ekologisk nisch (revision 33898643)
+Ekosystem (revision 36598266)
+Endemisk (revision 30647109)
+Eukaryoter (revision 37095313)
+Evolution (revision 37093592)
+Familj (biologi) (revision 30280200)
+Femininum (revision 30597527)
+Fjäder (biologi) (revision 36364943)
+Fjäderdräkt (revision 36364943)
+Fladdermöss (revision 37307257)
+Flygg (revision 36479633)
+Frukter (revision 34088588)
+Frö (revision 37333131)
+Fågelläte (revision 34034723)
+Fåglar (revision 37387306)
+Fåglarnas liv (revision 36509929)
+Genitiv (revision 37388438)
+George Edward Grey (revision 36509929)
+George Robert Gray (revision 20426710)
+Haasts örn (revision 29175076)
+Hauturu/Little Barrier Island (revision 36509929)
+Hermelin (revision 36578682)
+Hertz (revision 37104488)
+Hjortdjur (revision 36493550)
+Hund (revision 37351832)
+Husdjur (revision 37384850)
+Huskatt (revision 32922967)
+Hāngi (revision 29609696)
+IUCN (revision 30570280)
+Iller (revision 30663158)
+Infraröd (revision 36770733)
+Internationella naturvårdsunionen (revision 30570280)
+Jordbruk (revision 37352625)
+Kahurangi National Park (revision 35956142)
+Kamouflage (revision 36579595)
+Kaniner (revision 36877621)
+Kapiti Island (revision 37395588)
+Katt (revision 36734686)
+Kelp (revision 30312471)
+Kivier (revision 36373234)
+Klass (biologi) (revision 30280201)
+Kroppsfett (revision 35066611)
+Könsdimorfism (revision 30816932)
+Könsfördelning (revision 24769321)
+Lamm- och fårkött (revision 36187205)
+Lek (fortplantningsbeteende) (revision 30508235)
+Mandel (revision 36577529)
+Maori (revision 32560474)
+Maorier (revision 35862066)
+Maoripapegojor (revision 36545138)
+Mark Carwardine (revision 20375916)
+Markpapegoja (revision 36295722)
+Maskulinum (revision 32704551)
+Masterton (revision 29859631)
+Metrosideros umbellata (revision 29071212)
+Milford Sound (revision 20284758)
+Morrhår (revision 36533839)
+Muskelmage (revision 31196380)
+Mustela (revision 20934105)
+Mårddjur (revision 37306347)
+Māori (revision 32560474)
+NHNZ (revision 36509929)
+Nattpapegoja (revision 33486517)
+Nordön (revision 24810231)
+Nya Zeeland (revision 36575359)
+Näbb (revision 23648463)
+Ollonår (revision 36509929)
+Ordning (biologi) (revision 30280196)
+
+== End of Parsed pages ==
+
+- Wikipedia parsing ended at: 2016-09-28 22:29:21.480287
+
+48 characters appeared 594415 times.
+
+First 31 characters:
+[ 0] Char a: 10.070741821791172 %
+[ 1] Char e: 9.737136512369304 %
+[ 2] Char r: 9.110638190489809 %
+[ 3] Char n: 8.378826240925951 %
+[ 4] Char t: 7.481305148759705 %
+[ 5] Char s: 5.828587771169974 %
+[ 6] Char i: 5.359891658184939 %
+[ 7] Char l: 5.173489901836259 %
+[ 8] Char o: 4.694195133029954 %
+[ 9] Char d: 4.597293136949774 %
+[10] Char k: 3.297359588839447 %
+[11] Char m: 3.1898589369379975 %
+[12] Char g: 3.004466576381821 %
+[13] Char v: 2.2324470277499726 %
+[14] Char f: 2.1988005013332437 %
+[15] Char p: 2.06017681249632 %
+[16] Char u: 2.0499146219392173 %
+[17] Char ä: 2.0475593650900468 %
+[18] Char h: 2.028380845032511 %
+[19] Char å: 1.5443755625278637 %
+[20] Char c: 1.442594820117258 %
+[21] Char ö: 1.3515809661600062 %
+[22] Char b: 1.268642278542769 %
+[23] Char j: 0.7302978558751041 %
+[24] Char y: 0.6699023409570755 %
+[25] Char x: 0.2111319532649748 %
+[26] Char w: 0.10262190557102362 %
+[27] Char z: 0.09151855185350302 %
+[28] Char é: 0.021197311642539303 %
+[29] Char ā: 0.011103353717520588 %
+[30] Char q: 0.007570468443764037 %
+
+The first 31 characters have an accumulated ratio of 0.999936071599808.
+
+748 sequences found.
+
+First 512 (typical positive ratio): 0.997323508584682
+Next 512 (512-1024): 1.6823263208364526e-06
+Rest: 1.7780915628762273e-17
+
+- Processing end: 2016-09-28 22:29:21.590354
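The character statistics in the log can be illustrated with a short sketch. This is not the code of BuildLangModel.py: the helper name `char_stats` and the sample sentence are assumptions made for this example. It folds case, ranks letters by frequency (the rank plays the role of the "order" in the log), and sums the share of the first `top_n` letters.

```python
from collections import Counter


def char_stats(text, top_n):
    """Rank letters by frequency and sum the share of the top_n most common.

    Hypothetical helper: it only mimics the kind of figures the log reports.
    """
    # Fold case so that 'Å' and 'å' count as the same character.
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    total = sum(counts.values())
    # most_common() sorts by descending count; the list position is the order.
    ranked = [(char, count / total) for char, count in counts.most_common()]
    accumulated = sum(ratio for _, ratio in ranked[:top_n])
    return ranked, accumulated


sample = "Mölle är en tätort på Kullahalvön i Brunnby socken"
ranked, accumulated = char_stats(sample, top_n=5)
for order, (char, ratio) in enumerate(ranked[:5]):
    print(f"[{order:2}] Char {char}: {ratio * 100} %")
print(f"The first 5 characters have an accumulated ratio of {accumulated}.")
```

On a corpus as large as the one in the log, a small set of letters covers almost all of the text, which is why the 31 most frequent characters reach an accumulated ratio close to 1.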
script/langs/sv.py (normal file, 56 lines)
@@ -0,0 +1,56 @@
+#!/bin/python3
+# -*- coding: utf-8 -*-
+
+# ##### BEGIN LICENSE BLOCK #####
+# Version: MPL 1.1/GPL 2.0/LGPL 2.1
+#
+# The contents of this file are subject to the Mozilla Public License Version
+# 1.1 (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+# http://www.mozilla.org/MPL/
+#
+# Software distributed under the License is distributed on an "AS IS" basis,
+# WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
+# for the specific language governing rights and limitations under the
+# License.
+#
+# The Original Code is Mozilla Universal charset detector code.
+#
+# The Initial Developer of the Original Code is
+# Netscape Communications Corporation.
+# Portions created by the Initial Developer are Copyright (C) 2001
+# the Initial Developer. All Rights Reserved.
+#
+# Contributor(s):
+# Jehan <jehan@girinstud.io>
+#
+# Alternatively, the contents of this file may be used under the terms of
+# either the GNU General Public License Version 2 or later (the "GPL"), or
+# the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
+# in which case the provisions of the GPL or the LGPL are applicable instead
+# of those above. If you wish to allow use of your version of this file only
+# under the terms of either the GPL or the LGPL, and not to allow others to
+# use your version of this file under the terms of the MPL, indicate your
+# decision by deleting the provisions above and replace them with the notice
+# and other provisions required by the GPL or the LGPL. If you do not delete
+# the provisions above, a recipient may use your version of this file under
+# the terms of any one of the MPL, the GPL or the LGPL.
+#
+# ##### END LICENSE BLOCK #####
+
+import re
+
+## Mandatory Properties ##
+
+name = 'Swedish'
+code = 'sv'
+use_ascii = True
+charsets = ['ISO-8859-1', 'ISO-8859-4', 'ISO-8859-9',
+            'ISO-8859-15', 'WINDOWS-1252']
+
+## Optional Properties ##
+
+alphabet = 'åäö'
+start_pages = ['Kakapo']
+wikipedia_code = code
+case_mapping = True
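One way to sanity-check a language file like this is to confirm that every declared charset is known to Python and can encode the extra alphabet. The sketch below assumes only Python's standard codecs; `charsets` and `alphabet` mirror the properties in sv.py, while `encodable` is a hypothetical helper that is not part of BuildLangModel.py.

```python
import codecs

charsets = ['ISO-8859-1', 'ISO-8859-4', 'ISO-8859-9',
            'ISO-8859-15', 'WINDOWS-1252']
alphabet = 'åäö'


def encodable(chars, charset):
    """Return True if every character in chars exists in charset."""
    try:
        chars.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


for charset in charsets:
    # codecs.lookup raises LookupError for names Python does not know.
    codec_name = codecs.lookup(charset).name
    # case_mapping is True in sv.py, so check the upper-case letters as well.
    ok = encodable(alphabet + alphabet.upper(), charset)
    print(f"{charset} ({codec_name}): {'ok' if ok else 'missing characters'}")
```

The same helper shows why ISO-8859-4 produces a slightly different table from the other charsets: it can encode `ā`, which appears in the Wikipedia corpus, while ISO-8859-1 cannot.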
@@ -31,6 +31,7 @@ set(
 LangModels/LangRussianModel.cpp
 LangModels/LangSlovakModel.cpp
 LangModels/LangSloveneModel.cpp
+LangModels/LangSwedishModel.cpp
 LangModels/LangSpanishModel.cpp
 LangModels/LangThaiModel.cpp
 LangModels/LangTurkishModel.cpp
src/LangModels/LangSwedishModel.cpp (normal file, 261 lines)
@@ -0,0 +1,261 @@
+/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
+/* ***** BEGIN LICENSE BLOCK *****
+* Version: MPL 1.1/GPL 2.0/LGPL 2.1
+*
+* The contents of this file are subject to the Mozilla Public License Version
+* 1.1 (the "License"); you may not use this file except in compliance with
+* the License. You may obtain a copy of the License at
+* http://www.mozilla.org/MPL/
+*
+* Software distributed under the License is distributed on an "AS IS" basis,
+* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
+* for the specific language governing rights and limitations under the
+* License.
+*
+* The Original Code is Mozilla Communicator client code.
+*
+* The Initial Developer of the Original Code is
+* Netscape Communications Corporation.
+* Portions created by the Initial Developer are Copyright (C) 1998
+* the Initial Developer. All Rights Reserved.
+*
+* Contributor(s):
+*
+* Alternatively, the contents of this file may be used under the terms of
+* either the GNU General Public License Version 2 or later (the "GPL"), or
+* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
+* in which case the provisions of the GPL or the LGPL are applicable instead
+* of those above. If you wish to allow use of your version of this file only
+* under the terms of either the GPL or the LGPL, and not to allow others to
+* use your version of this file under the terms of the MPL, indicate your
+* decision by deleting the provisions above and replace them with the notice
+* and other provisions required by the GPL or the LGPL. If you do not delete
+* the provisions above, a recipient may use your version of this file under
+* the terms of any one of the MPL, the GPL or the LGPL.
+*
+* ***** END LICENSE BLOCK ***** */
+
+#include "../nsSBCharSetProber.h"
+
+/********* Language model for: Swedish *********/
+
+/**
+* Generated by BuildLangModel.py
+* On: 2016-09-28 22:29:21.480940
+**/
+
+/* Character Mapping Table:
+* ILL: illegal character.
+* CTR: control character specific to the charset.
+* RET: carriage/return.
+* SYM: symbol (punctuation) that does not belong to word.
+* NUM: 0 - 9.
+*
+* Other characters are ordered by probabilities
+* (0 is the most common character in the language).
+*
+* Orders are generic to a language. So the codepoint with order X in
+* CHARSET1 maps to the same character as the codepoint with the same
+* order X in CHARSET2 for the same language.
+* As such, it is possible to get missing order. For instance the
+* ligature of 'o' and 'e' exists in ISO-8859-15 but not in ISO-8859-1
+* even though they are both used for French. Same for the euro sign.
+*/
+static const unsigned char Windows_1252_CharToOrderMap[] =
+{
+CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
+CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
+SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
+NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
+SYM, 0, 22, 20, 9, 1, 14, 12, 18, 6, 23, 10, 7, 11, 3, 8, /* 4X */
+15, 30, 2, 5, 4, 16, 13, 26, 25, 24, 27,SYM,SYM,SYM,SYM,SYM, /* 5X */
+SYM, 0, 22, 20, 9, 1, 14, 12, 18, 6, 23, 10, 7, 11, 3, 8, /* 6X */
+15, 30, 2, 5, 4, 16, 13, 26, 25, 24, 27,SYM,SYM,SYM,SYM,CTR, /* 7X */
+SYM,ILL,SYM, 34,SYM,SYM,SYM,SYM,SYM,SYM, 48,SYM, 49,ILL, 50,ILL, /* 8X */
+ILL,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, 51,SYM, 52,ILL, 53, 54, /* 9X */
+SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* AX */
+SYM,SYM,SYM,SYM,SYM, 55,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* BX */
+56, 44, 57, 58, 17, 19, 38, 40, 32, 28, 45, 59, 60, 61, 47, 62, /* CX */
+63, 64, 65, 66, 35, 67, 21,SYM, 37, 68, 69, 70, 31, 71, 72, 73, /* DX */
+74, 44, 75, 76, 17, 19, 38, 40, 32, 28, 45, 77, 78, 79, 47, 80, /* EX */
+81, 82, 83, 84, 35, 85, 21,SYM, 37, 86, 87, 88, 31, 89, 90, 91, /* FX */
+};
+/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
+
+static const unsigned char Iso_8859_9_CharToOrderMap[] =
+{
+CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
+CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
+SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
+NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
+SYM, 0, 22, 20, 9, 1, 14, 12, 18, 6, 23, 10, 7, 11, 3, 8, /* 4X */
+15, 30, 2, 5, 4, 16, 13, 26, 25, 24, 27,SYM,SYM,SYM,SYM,SYM, /* 5X */
+SYM, 0, 22, 20, 9, 1, 14, 12, 18, 6, 23, 10, 7, 11, 3, 8, /* 6X */
+15, 30, 2, 5, 4, 16, 13, 26, 25, 24, 27,SYM,SYM,SYM,SYM,CTR, /* 7X */
+CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 8X */
+CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 9X */
+SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* AX */
+SYM,SYM,SYM,SYM,SYM, 92,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* BX */
+93, 44, 94, 95, 17, 19, 38, 40, 32, 28, 45, 96, 97, 98, 47, 99, /* CX */
+100,101,102,103, 35,104, 21,SYM, 37,105,106,107, 31,108,109,110, /* DX */
+111, 44,112,113, 17, 19, 38, 40, 32, 28, 45,114,115,116, 47,117, /* EX */
+118,119,120,121, 35,122, 21,SYM, 37,123,124,125, 31, 42,126,127, /* FX */
+};
+/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
+
+static const unsigned char Iso_8859_1_CharToOrderMap[] =
+{
+CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
+CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
+SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
+NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
+SYM, 0, 22, 20, 9, 1, 14, 12, 18, 6, 23, 10, 7, 11, 3, 8, /* 4X */
+15, 30, 2, 5, 4, 16, 13, 26, 25, 24, 27,SYM,SYM,SYM,SYM,SYM, /* 5X */
+SYM, 0, 22, 20, 9, 1, 14, 12, 18, 6, 23, 10, 7, 11, 3, 8, /* 6X */
+15, 30, 2, 5, 4, 16, 13, 26, 25, 24, 27,SYM,SYM,SYM,SYM,CTR, /* 7X */
+CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 8X */
+CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 9X */
+SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* AX */
+SYM,SYM,SYM,SYM,SYM,128,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* BX */
+129, 44,130,131, 17, 19, 38, 40, 32, 28, 45,132,133,134, 47,135, /* CX */
+136,137,138,139, 35,140, 21,SYM, 37,141,142,143, 31,144,145,146, /* DX */
+147, 44,148,149, 17, 19, 38, 40, 32, 28, 45,150,151,152, 47,153, /* EX */
+154,155,156,157, 35,158, 21,SYM, 37,159,160,161, 31,162,163,164, /* FX */
+};
+/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
+
+static const unsigned char Iso_8859_4_CharToOrderMap[] =
+{
+CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
+CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
+SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
+NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
+SYM, 0, 22, 20, 9, 1, 14, 12, 18, 6, 23, 10, 7, 11, 3, 8, /* 4X */
+15, 30, 2, 5, 4, 16, 13, 26, 25, 24, 27,SYM,SYM,SYM,SYM,SYM, /* 5X */
+SYM, 0, 22, 20, 9, 1, 14, 12, 18, 6, 23, 10, 7, 11, 3, 8, /* 6X */
+15, 30, 2, 5, 4, 16, 13, 26, 25, 24, 27,SYM,SYM,SYM,SYM,CTR, /* 7X */
+CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 8X */
+CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 9X */
+SYM,165,166,167,SYM,168,169,SYM,SYM,170,171,172,173,SYM,174,SYM, /* AX */
+SYM,175,SYM,176,SYM,177,178,SYM,SYM,179,180,181,182, 43,183, 43, /* BX */
+29, 44,184,185, 17, 19, 38,186,187, 28,188,189, 39,190, 47, 41, /* CX */
+191,192, 33,193, 35,194, 21,SYM, 37, 36,195,196, 31,197, 46,198, /* DX */
+29, 44,199,200, 17, 19, 38,201,202, 28,203,204, 39,205, 47, 41, /* EX */
+206,207, 33,208, 35,209, 21,SYM, 37, 36,210,211, 31,212, 46,SYM, /* FX */
+};
+/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
+
+static const unsigned char Iso_8859_15_CharToOrderMap[] =
+{
+CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
+CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
+SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
+NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
+SYM, 0, 22, 20, 9, 1, 14, 12, 18, 6, 23, 10, 7, 11, 3, 8, /* 4X */
+15, 30, 2, 5, 4, 16, 13, 26, 25, 24, 27,SYM,SYM,SYM,SYM,SYM, /* 5X */
+SYM, 0, 22, 20, 9, 1, 14, 12, 18, 6, 23, 10, 7, 11, 3, 8, /* 6X */
+15, 30, 2, 5, 4, 16, 13, 26, 25, 24, 27,SYM,SYM,SYM,SYM,CTR, /* 7X */
+CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 8X */
+CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 9X */
+SYM,SYM,SYM,SYM,SYM,SYM,213,SYM,214,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* AX */
+SYM,SYM,SYM,SYM,215,216,SYM,SYM,217,SYM,SYM,SYM,218,219,220,SYM, /* BX */
+221, 44,222,223, 17, 19, 38, 40, 32, 28, 45,224,225,226, 47,227, /* CX */
+228,229,230,231, 35,232, 21,SYM, 37,233,234,235, 31,236,237,238, /* DX */
+239, 44,240,241, 17, 19, 38, 40, 32, 28, 45,242,243,244, 47,245, /* EX */
+246,247,248,249, 35,249, 21,SYM, 37,249,249,249, 31,249,249,249, /* FX */
+};
+/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
+
+
+/* Model Table:
+* Total sequences: 748
+* First 512 sequences: 0.997323508584682
+* Next 512 sequences (512-1024): 0.0026764914153179875
+* Rest: 1.7780915628762273e-17
+* Negative sequences: TODO
+*/
+static const PRUint8 SwedishLangModel[] =
+{
+3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,3,0,3,2,3,3,3,3,3,2,0,0,2,
+3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,3,2,3,2,3,3,3,3,3,3,0,0,2,
+3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,2,2,0,0,
+3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,2,2,2,0,2,
+3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,0,2,2,2,2,0,
+3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,2,2,2,0,2,
+3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,3,3,3,2,2,2,2,3,0,0,2,
+3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,0,2,2,2,0,2,
+3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,2,3,2,3,3,2,3,3,2,2,0,2,
+3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,2,3,3,3,3,0,2,0,2,0,2,
+3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,2,3,0,2,0,2,2,0,
+3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,0,2,0,3,3,2,
+3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,2,3,3,3,3,0,2,2,0,2,0,
+3,3,3,3,3,3,3,3,3,3,3,2,2,2,3,2,3,3,3,3,0,2,3,2,0,0,0,2,0,0,0,
+3,3,3,2,3,2,3,3,3,2,0,2,2,2,3,2,3,3,0,3,2,3,0,3,3,0,0,0,2,0,0,
+3,3,3,3,3,3,3,3,3,2,3,2,3,3,3,3,3,3,3,3,2,2,2,2,3,2,0,2,3,2,0,
+3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,3,2,0,2,0,3,2,3,2,0,3,0,0,0,2,0,
+2,2,3,3,3,3,0,3,0,3,3,3,3,3,3,3,2,2,0,0,3,0,3,0,0,3,0,0,0,0,0,
+3,3,3,3,3,2,3,2,3,2,2,2,2,0,0,0,3,3,2,3,2,3,2,3,3,0,0,3,0,2,0,
+2,3,3,3,3,3,2,3,0,3,3,3,3,3,2,0,0,0,2,0,0,2,3,0,0,0,0,0,0,0,0,
+3,3,3,3,3,2,3,3,3,2,3,2,2,2,2,0,3,0,3,0,3,2,2,0,3,0,0,2,2,0,2,
+3,3,3,3,3,3,2,3,2,3,3,3,3,3,2,3,0,2,2,0,3,2,2,3,0,0,0,0,0,0,0,
+3,3,3,3,3,3,3,3,3,2,2,2,0,2,2,2,3,3,2,3,3,3,3,3,3,0,0,2,2,0,0,
+3,3,0,2,2,3,2,3,3,3,2,0,0,0,2,0,3,3,0,0,0,3,2,0,0,0,0,0,2,0,0,
+3,2,3,3,3,3,2,3,3,3,3,3,3,2,3,3,2,0,2,0,3,0,3,2,0,3,0,2,0,0,0,
+3,3,0,3,3,0,3,2,3,0,2,2,0,0,2,3,2,0,2,0,0,0,2,0,2,2,0,0,0,0,0,
+3,3,2,2,2,3,3,2,3,2,2,0,0,0,0,0,2,0,2,0,0,0,0,0,2,0,2,2,0,0,0,
+3,3,0,2,2,0,2,0,3,0,2,0,0,0,0,0,2,0,2,0,0,0,2,0,2,0,0,2,0,0,0,
+0,3,2,2,0,2,0,2,2,2,0,0,0,2,0,2,0,0,2,0,0,2,2,0,0,0,0,0,0,0,0,
+0,0,0,2,0,0,2,0,3,0,2,2,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
+0,0,0,0,0,0,2,0,0,0,0,0,0,2,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
+};
+
+
+const SequenceModel Windows_1252SwedishModel =
+{
+Windows_1252_CharToOrderMap,
+SwedishLangModel,
+31,
+(float)0.997323508584682,
+PR_TRUE,
+"WINDOWS-1252"
+};
+
+const SequenceModel Iso_8859_9SwedishModel =
+{
+Iso_8859_9_CharToOrderMap,
+SwedishLangModel,
+31,
+(float)0.997323508584682,
+PR_TRUE,
+"ISO-8859-9"
+};
+
+const SequenceModel Iso_8859_1SwedishModel =
+{
+Iso_8859_1_CharToOrderMap,
+SwedishLangModel,
+31,
+(float)0.997323508584682,
+PR_TRUE,
+"ISO-8859-1"
+};
+
+const SequenceModel Iso_8859_4SwedishModel =
+{
+Iso_8859_4_CharToOrderMap,
+SwedishLangModel,
+31,
+(float)0.997323508584682,
+PR_TRUE,
+"ISO-8859-4"
+};
+
+const SequenceModel Iso_8859_15SwedishModel =
+{
+Iso_8859_15_CharToOrderMap,
+SwedishLangModel,
+31,
+(float)0.997323508584682,
+PR_TRUE,
+"ISO-8859-15"
+};
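The idea behind the character mapping tables (the same letter gets the same order in every charset) can be sketched in Python. This is an illustration under stated assumptions, not a port of the generated C++: `RANKED` is copied from the 31 ranked characters in the log, while `char_to_order_map` and `order_pairs` are hypothetical helpers. Unlike the real tables, the sketch stores `None` for everything outside the 31 ranked letters instead of SYM/NUM/CTR/ILL classes or higher orders.

```python
# The 31 most frequent letters, in the order reported by the log (a=0 ... q=30).
RANKED = "aerntsilodkmgvfpuähåcöbjyxwzéāq"


def char_to_order_map(charset):
    """Build a 256-entry table: byte value -> order, or None if unranked."""
    table = []
    for byte in range(256):
        try:
            # Case mapping: upper and lower case share one order.
            char = bytes([byte]).decode(charset).lower()
        except UnicodeDecodeError:
            table.append(None)  # byte is undefined in this charset
            continue
        order = RANKED.find(char)
        table.append(order if order >= 0 else None)
    return table


def order_pairs(data, table):
    """Yield (previous, current) orders for adjacent ranked letters in data."""
    previous = None
    for byte in data:
        order = table[byte]
        if order is not None and previous is not None:
            yield previous, order
        previous = order


latin1 = char_to_order_map("iso8859_1")
latin4 = char_to_order_map("iso8859_4")

# 0xE4 is 'ä' in both charsets, so both tables agree on order 17.
print(latin1[0xE4], latin4[0xE4])
# 0xE0 is 'ā' (order 29) in ISO-8859-4 but an unranked letter in ISO-8859-1.
print(latin1[0xE0], latin4[0xE0])
print(list(order_pairs("Mölle".encode("iso8859_1"), latin1)))
```

A prober can then look up each pair of orders in the 31 x 31 `SwedishLangModel` table, which is shared by all five `SequenceModel` definitions because the orders do not depend on the charset.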
@@ -185,6 +185,12 @@ nsSBCSGroupProber::nsSBCSGroupProber()
 mProbers[92] = new nsSingleByteCharSetProber(&Mac_CentraleuropeSloveneModel);
 mProbers[93] = new nsSingleByteCharSetProber(&Ibm852SloveneModel);
+
+mProbers[94] = new nsSingleByteCharSetProber(&Iso_8859_1SwedishModel);
+mProbers[95] = new nsSingleByteCharSetProber(&Iso_8859_4SwedishModel);
+mProbers[96] = new nsSingleByteCharSetProber(&Iso_8859_9SwedishModel);
+mProbers[97] = new nsSingleByteCharSetProber(&Iso_8859_15SwedishModel);
+mProbers[98] = new nsSingleByteCharSetProber(&Windows_1252SwedishModel);
 
 Reset();
 }
 
@@ -40,7 +40,7 @@
 #define nsSBCSGroupProber_h__
 
 
-#define NUM_OF_SBCS_PROBERS 94
+#define NUM_OF_SBCS_PROBERS 99
 
 class nsCharSetProber;
 class nsSBCSGroupProber: public nsCharSetProber {
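Adding probers means keeping two places in sync: the highest `mProbers[...]` index in the constructor and the `NUM_OF_SBCS_PROBERS` constant. The sketch below shows one possible consistency check. It is not part of uchardet; the helper names are hypothetical, and the excerpts are copied from the hunks of this commit rather than read from the real files.

```python
import re

# Excerpts copied from the hunks of this commit; a real check would read the files.
CPP_EXCERPT = """
mProbers[93] = new nsSingleByteCharSetProber(&Ibm852SloveneModel);
mProbers[94] = new nsSingleByteCharSetProber(&Iso_8859_1SwedishModel);
mProbers[95] = new nsSingleByteCharSetProber(&Iso_8859_4SwedishModel);
mProbers[96] = new nsSingleByteCharSetProber(&Iso_8859_9SwedishModel);
mProbers[97] = new nsSingleByteCharSetProber(&Iso_8859_15SwedishModel);
mProbers[98] = new nsSingleByteCharSetProber(&Windows_1252SwedishModel);
"""
HEADER_EXCERPT = "#define NUM_OF_SBCS_PROBERS 99"


def highest_prober_index(source):
    """Return the largest index N found in 'mProbers[N] =' assignments."""
    return max(int(n) for n in re.findall(r"mProbers\[(\d+)\]\s*=", source))


def declared_count(header):
    """Return the value of the NUM_OF_SBCS_PROBERS define."""
    match = re.search(r"#define\s+NUM_OF_SBCS_PROBERS\s+(\d+)", header)
    if match is None:
        raise ValueError("NUM_OF_SBCS_PROBERS not found")
    return int(match.group(1))


# Indices start at 0, so the count must be the highest index plus one.
assert highest_prober_index(CPP_EXCERPT) + 1 == declared_count(HEADER_EXCERPT)
print("prober count is consistent")
```

With five Swedish probers appended at indices 94 to 98, the constant has to move from 94 to 99, which is what the header hunk does.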
@@ -246,5 +246,11 @@ extern const SequenceModel Iso_8859_16SloveneModel;
 extern const SequenceModel Ibm852SloveneModel;
 extern const SequenceModel Mac_CentraleuropeSloveneModel;
 
+extern const SequenceModel Iso_8859_1SwedishModel;
+extern const SequenceModel Iso_8859_4SwedishModel;
+extern const SequenceModel Iso_8859_9SwedishModel;
+extern const SequenceModel Iso_8859_15SwedishModel;
+extern const SequenceModel Windows_1252SwedishModel;
+
 #endif /* nsSingleByteCharSetProber_h__ */
 
test/sv/iso-8859-1.txt (normal file, 10 lines)
@@ -0,0 +1,10 @@
+Mölle är en tätort på Kullahalvön i Brunnby socken i Höganäs kommun, Skåne län.
+
+Samhället var från början ett fiskeläge, men kom att spela en stor roll i den
+framväxande turismen i Sverige i slutet av 1800-talet. Till detta bidrog - och
+bidrar - Mölles natursköna läge invid Öresunds norra utlopp, med Kullaberg som
+bakgrund. Gemensamhetsbad för män och kvinnor introducerades i Ransvik i början
+av 1900-talet. Storhetstiden som turistort inträffade strax före första
+världskriget, men även under mellankrigstiden var turistströmmarna stora.
+Fortfarande är Mölle en populär turistort med en tredubbling av invånarantalet
+under sommarmånaderna.
test/sv/utf-8.txt (normal file, 10 lines)
@@ -0,0 +1,10 @@
+Mölle är en tätort på Kullahalvön i Brunnby socken i Höganäs kommun, Skåne län.
+
+Samhället var från början ett fiskeläge, men kom att spela en stor roll i den
+framväxande turismen i Sverige i slutet av 1800-talet. Till detta bidrog – och
+bidrar – Mölles natursköna läge invid Öresunds norra utlopp, med Kullaberg som
+bakgrund. Gemensamhetsbad för män och kvinnor introducerades i Ransvik i början
+av 1900-talet. Storhetstiden som turistort inträffade strax före första
+världskriget, men även under mellankrigstiden var turistströmmarna stora.
+Fortfarande är Mölle en populär turistort med en tredubbling av invånarantalet
+under sommarmånaderna.
test/sv/windows-1252.txt (normal file, 10 lines)
@@ -0,0 +1,10 @@
+Mölle är en tätort på Kullahalvön i Brunnby socken i Höganäs kommun, Skåne län.
+
+Samhället var från början ett fiskeläge, men kom att spela en stor roll i den
+framväxande turismen i Sverige i slutet av 1800-talet. Till detta bidrog – och
+bidrar – Mölles natursköna läge invid Öresunds norra utlopp, med Kullaberg som
+bakgrund. Gemensamhetsbad för män och kvinnor introducerades i Ransvik i början
+av 1900-talet. Storhetstiden som turistort inträffade strax före första
+världskriget, men även under mellankrigstiden var turistströmmarna stora.
+Fortfarande är Mölle en populär turistort med en tredubbling av invånarantalet
+under sommarmånaderna.
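The ISO-8859-1 test file uses a plain hyphen where the UTF-8 and WINDOWS-1252 files use an en dash. A short sketch, using only Python's standard codecs, suggests why: the en dash (U+2013) has a byte in WINDOWS-1252 but none in ISO-8859-1. The variable names are illustrative, and the sample is a fragment of the test text.

```python
EN_DASH = "\u2013"
utf8_line = "Till detta bidrog \u2013 och bidrar \u2013 Mölles natursköna läge"

# WINDOWS-1252 assigns the en dash to byte 0x96.
cp1252_bytes = utf8_line.encode("cp1252")
print(cp1252_bytes.count(b"\x96"))

# ISO-8859-1 has no en dash; with errors="replace" it degrades to '?'.
lossy = utf8_line.encode("iso8859_1", errors="replace")
print(lossy.count(b"?"))

# Substituting a hyphen, as the ISO-8859-1 test file does, encodes cleanly.
latin1_line = utf8_line.replace(EN_DASH, "-")
latin1_bytes = latin1_line.encode("iso8859_1")

# The Swedish letters themselves use the same bytes in both charsets.
print("Mölle".encode("iso8859_1") == "Mölle".encode("cp1252"))
```

Because å, ä and ö share byte values across these charsets, characters such as the en dash are among the few bytes that differ between the ISO-8859-1 and WINDOWS-1252 test files.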