Mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git (synced 2025-12-06 08:46:40 +08:00)
LangModels: add Swedish support.
Encodings: ISO-8859-1, ISO-8859-4, ISO-8859-9, ISO-8859-15 and WINDOWS-1252. Test text from https://sv.wikipedia.org/wiki/Mölle
parent d62154bd6e
commit 119fed7e8d
@@ -142,6 +142,12 @@ Techniques used by universalchardet are described at http://www.mozilla.org/proj
  * ISO-8859-1
  * ISO-8859-15
  * WINDOWS-1252
* Swedish
  * ISO-8859-1
  * ISO-8859-4
  * ISO-8859-9
  * ISO-8859-15
  * WINDOWS-1252
* Thai
  * TIS-620
  * ISO-8859-11
151
script/BuildLangModelLogs/LangSwedishModel.log
Normal file
@@ -0,0 +1,151 @@
= Logs of language model for Swedish (sv) =

- Generated by BuildLangModel.py
- Started: 2016-09-28 22:26:37.221506
- Maximum depth: 5
- Max number of pages: 100

== Parsed pages ==

Kakapo (revision 36509929)
Akut hotad (revision 32517788)
Aotearoa (revision 36575359)
Art (revision 36771341)
Artepitet (revision 36771341)
Auckland (revision 35752058)
Auktorsnamn (revision 35976965)
BBC (revision 36508743)
Basalomsättning (revision 30567523)
Beilschmiedia tawa (revision 29101923)
Berguv (revision 36295501)
Betesmark (revision 34292168)
Biotop (revision 35528052)
BirdLife International (revision 36124283)
Bonaparte (revision 37325183)
British Museum (revision 36420244)
Bröstben (revision 30602527)
Dacrydium cupressinum (revision 32986501)
Digital object identifier (revision 27637223)
Djur (revision 37300775)
Djurpark (revision 37147093)
Domän (biologi) (revision 33377709)
Don Merton (revision 36509929)
Douglas Adams (revision 36556245)
Däggdjur (revision 37328286)
Ekologisk nisch (revision 33898643)
Ekosystem (revision 36598266)
Endemisk (revision 30647109)
Eukaryoter (revision 37095313)
Evolution (revision 37093592)
Familj (biologi) (revision 30280200)
Femininum (revision 30597527)
Fjäder (biologi) (revision 36364943)
Fjäderdräkt (revision 36364943)
Fladdermöss (revision 37307257)
Flygg (revision 36479633)
Frukter (revision 34088588)
Frö (revision 37333131)
Fågelläte (revision 34034723)
Fåglar (revision 37387306)
Fåglarnas liv (revision 36509929)
Genitiv (revision 37388438)
George Edward Grey (revision 36509929)
George Robert Gray (revision 20426710)
Haasts örn (revision 29175076)
Hauturu/Little Barrier Island (revision 36509929)
Hermelin (revision 36578682)
Hertz (revision 37104488)
Hjortdjur (revision 36493550)
Hund (revision 37351832)
Husdjur (revision 37384850)
Huskatt (revision 32922967)
Hāngi (revision 29609696)
IUCN (revision 30570280)
Iller (revision 30663158)
Infraröd (revision 36770733)
Internationella naturvårdsunionen (revision 30570280)
Jordbruk (revision 37352625)
Kahurangi National Park (revision 35956142)
Kamouflage (revision 36579595)
Kaniner (revision 36877621)
Kapiti Island (revision 37395588)
Katt (revision 36734686)
Kelp (revision 30312471)
Kivier (revision 36373234)
Klass (biologi) (revision 30280201)
Kroppsfett (revision 35066611)
Könsdimorfism (revision 30816932)
Könsfördelning (revision 24769321)
Lamm- och fårkött (revision 36187205)
Lek (fortplantningsbeteende) (revision 30508235)
Mandel (revision 36577529)
Maori (revision 32560474)
Maorier (revision 35862066)
Maoripapegojor (revision 36545138)
Mark Carwardine (revision 20375916)
Markpapegoja (revision 36295722)
Maskulinum (revision 32704551)
Masterton (revision 29859631)
Metrosideros umbellata (revision 29071212)
Milford Sound (revision 20284758)
Morrhår (revision 36533839)
Muskelmage (revision 31196380)
Mustela (revision 20934105)
Mårddjur (revision 37306347)
Māori (revision 32560474)
NHNZ (revision 36509929)
Nattpapegoja (revision 33486517)
Nordön (revision 24810231)
Nya Zeeland (revision 36575359)
Näbb (revision 23648463)
Ollonår (revision 36509929)
Ordning (biologi) (revision 30280196)

== End of Parsed pages ==

- Wikipedia parsing ended at: 2016-09-28 22:29:21.480287

48 characters appeared 594415 times.

First 31 characters:
[ 0] Char a: 10.070741821791172 %
[ 1] Char e: 9.737136512369304 %
[ 2] Char r: 9.110638190489809 %
[ 3] Char n: 8.378826240925951 %
[ 4] Char t: 7.481305148759705 %
[ 5] Char s: 5.828587771169974 %
[ 6] Char i: 5.359891658184939 %
[ 7] Char l: 5.173489901836259 %
[ 8] Char o: 4.694195133029954 %
[ 9] Char d: 4.597293136949774 %
[10] Char k: 3.297359588839447 %
[11] Char m: 3.1898589369379975 %
[12] Char g: 3.004466576381821 %
[13] Char v: 2.2324470277499726 %
[14] Char f: 2.1988005013332437 %
[15] Char p: 2.06017681249632 %
[16] Char u: 2.0499146219392173 %
[17] Char ä: 2.0475593650900468 %
[18] Char h: 2.028380845032511 %
[19] Char å: 1.5443755625278637 %
[20] Char c: 1.442594820117258 %
[21] Char ö: 1.3515809661600062 %
[22] Char b: 1.268642278542769 %
[23] Char j: 0.7302978558751041 %
[24] Char y: 0.6699023409570755 %
[25] Char x: 0.2111319532649748 %
[26] Char w: 0.10262190557102362 %
[27] Char z: 0.09151855185350302 %
[28] Char é: 0.021197311642539303 %
[29] Char ā: 0.011103353717520588 %
[30] Char q: 0.007570468443764037 %

The first 31 characters have an accumulated ratio of 0.999936071599808.

748 sequences found.

First 512 (typical positive ratio): 0.997323508584682
Next 512 (512-1024): 0.0026764914153179875
Rest: 1.7780915628762273e-17

- Processing end: 2016-09-28 22:29:21.590354
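The per-character percentages and the "accumulated ratio" line in this log can be illustrated with a short script. This is a minimal sketch, not the code of BuildLangModel.py: the function name `char_stats` and the sample text are hypothetical, and folding case before counting is an assumption.

```python
from collections import Counter


def char_stats(text, top_n):
    """Return (ranked, accumulated_ratio) for the letters in `text`.

    `ranked` lists (char, percent) pairs sorted by descending frequency.
    `accumulated_ratio` is the share of all counted letters covered by
    the first `top_n` entries, comparable to the log's accumulated ratio.
    """
    # Count letters only; upper and lower case are folded (an assumption).
    counts = Counter(ch.lower() for ch in text if ch.isalpha())
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    # Sort by count, then by character so that ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    ranked = [(ch, 100.0 * n / total) for ch, n in ordered]
    accumulated = sum(n for _, n in ordered[:top_n]) / total
    return ranked, accumulated


# Hypothetical sample; the real log was built from 93 Wikipedia pages.
sample = "Mölle är en tätort på Kullahalvön"
ranked, ratio = char_stats(sample, top_n=5)
for index, (char, percent) in enumerate(ranked[:5]):
    print(f"[{index:2d}] Char {char}: {percent} %")
print(f"The first 5 characters have an accumulated ratio of {ratio}.")
```

The three sequence ratios in the log are the same kind of share, computed over character pairs instead of single characters, which is why they add up to 1.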
56
script/langs/sv.py
Normal file
@@ -0,0 +1,56 @@
#!/bin/python3
# -*- coding: utf-8 -*-

# ##### BEGIN LICENSE BLOCK #####
# Version: MPL 1.1/GPL 2.0/LGPL 2.1
#
# The contents of this file are subject to the Mozilla Public License Version
# 1.1 (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
# http://www.mozilla.org/MPL/
#
# Software distributed under the License is distributed on an "AS IS" basis,
# WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
# for the specific language governing rights and limitations under the
# License.
#
# The Original Code is Mozilla Universal charset detector code.
#
# The Initial Developer of the Original Code is
# Netscape Communications Corporation.
# Portions created by the Initial Developer are Copyright (C) 2001
# the Initial Developer. All Rights Reserved.
#
# Contributor(s):
# Jehan <jehan@girinstud.io>
#
# Alternatively, the contents of this file may be used under the terms of
# either the GNU General Public License Version 2 or later (the "GPL"), or
# the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
# in which case the provisions of the GPL or the LGPL are applicable instead
# of those above. If you wish to allow use of your version of this file only
# under the terms of either the GPL or the LGPL, and not to allow others to
# use your version of this file under the terms of the MPL, indicate your
# decision by deleting the provisions above and replace them with the notice
# and other provisions required by the GPL or the LGPL. If you do not delete
# the provisions above, a recipient may use your version of this file under
# the terms of any one of the MPL, the GPL or the LGPL.
#
# ##### END LICENSE BLOCK #####

import re

## Mandatory Properties ##

name = 'Swedish'
code = 'sv'
use_ascii = True
charsets = ['ISO-8859-1', 'ISO-8859-4', 'ISO-8859-9',
            'ISO-8859-15', 'WINDOWS-1252']

## Optional Properties ##

alphabet = 'åäö'
start_pages = ['Kakapo']
wikipedia_code = code
case_mapping = True
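A quick way to sanity-check a language definition like this one is to confirm that every character in `alphabet` can be encoded in each listed charset. The sketch below uses only Python's built-in codecs; the helper name `unencodable` is hypothetical and is not part of BuildLangModel.py.

```python
# Values copied from script/langs/sv.py.
alphabet = "åäö"
charsets = ["ISO-8859-1", "ISO-8859-4", "ISO-8859-9",
            "ISO-8859-15", "WINDOWS-1252"]


def unencodable(chars, charset):
    """Return the characters of `chars` that `charset` cannot encode."""
    missing = []
    for ch in chars:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# Check both cases, since `case_mapping = True` treats them alike.
coverage = {cs: unencodable(alphabet + alphabet.upper(), cs) for cs in charsets}
for charset, missing in coverage.items():
    print(charset, "missing:", missing)
```

An empty list for every charset means the Swedish-specific letters are representable everywhere, which is a necessary condition for the charset to be worth probing.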
@@ -31,6 +31,7 @@ set(
    LangModels/LangRussianModel.cpp
    LangModels/LangSlovakModel.cpp
    LangModels/LangSloveneModel.cpp
    LangModels/LangSwedishModel.cpp
    LangModels/LangSpanishModel.cpp
    LangModels/LangThaiModel.cpp
    LangModels/LangTurkishModel.cpp
261
src/LangModels/LangSwedishModel.cpp
Normal file
@@ -0,0 +1,261 @@
/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
/* ***** BEGIN LICENSE BLOCK *****
 * Version: MPL 1.1/GPL 2.0/LGPL 2.1
 *
 * The contents of this file are subject to the Mozilla Public License Version
 * 1.1 (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 * http://www.mozilla.org/MPL/
 *
 * Software distributed under the License is distributed on an "AS IS" basis,
 * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
 * for the specific language governing rights and limitations under the
 * License.
 *
 * The Original Code is Mozilla Communicator client code.
 *
 * The Initial Developer of the Original Code is
 * Netscape Communications Corporation.
 * Portions created by the Initial Developer are Copyright (C) 1998
 * the Initial Developer. All Rights Reserved.
 *
 * Contributor(s):
 *
 * Alternatively, the contents of this file may be used under the terms of
 * either the GNU General Public License Version 2 or later (the "GPL"), or
 * the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
 * in which case the provisions of the GPL or the LGPL are applicable instead
 * of those above. If you wish to allow use of your version of this file only
 * under the terms of either the GPL or the LGPL, and not to allow others to
 * use your version of this file under the terms of the MPL, indicate your
 * decision by deleting the provisions above and replace them with the notice
 * and other provisions required by the GPL or the LGPL. If you do not delete
 * the provisions above, a recipient may use your version of this file under
 * the terms of any one of the MPL, the GPL or the LGPL.
 *
 * ***** END LICENSE BLOCK ***** */

#include "../nsSBCharSetProber.h"

/********* Language model for: Swedish *********/

/**
 * Generated by BuildLangModel.py
 * On: 2016-09-28 22:29:21.480940
 **/

/* Character Mapping Table:
 * ILL: illegal character.
 * CTR: control character specific to the charset.
 * RET: carriage/return.
 * SYM: symbol (punctuation) that does not belong to word.
 * NUM: 0 - 9.
 *
 * Other characters are ordered by probabilities
 * (0 is the most common character in the language).
 *
 * Orders are generic to a language. So the codepoint with order X in
 * CHARSET1 maps to the same character as the codepoint with the same
 * order X in CHARSET2 for the same language.
 * As such, it is possible to get missing order. For instance the
 * ligature of 'o' and 'e' exists in ISO-8859-15 but not in ISO-8859-1
 * even though they are both used for French. Same for the euro sign.
 */
static const unsigned char Windows_1252_CharToOrderMap[] =
{
  CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
  CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
  SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
  NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
  SYM,  0, 22, 20,  9,  1, 14, 12, 18,  6, 23, 10,  7, 11,  3,  8, /* 4X */
   15, 30,  2,  5,  4, 16, 13, 26, 25, 24, 27,SYM,SYM,SYM,SYM,SYM, /* 5X */
  SYM,  0, 22, 20,  9,  1, 14, 12, 18,  6, 23, 10,  7, 11,  3,  8, /* 6X */
   15, 30,  2,  5,  4, 16, 13, 26, 25, 24, 27,SYM,SYM,SYM,SYM,CTR, /* 7X */
  SYM,ILL,SYM, 34,SYM,SYM,SYM,SYM,SYM,SYM, 48,SYM, 49,ILL, 50,ILL, /* 8X */
  ILL,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, 51,SYM, 52,ILL, 53, 54, /* 9X */
  SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* AX */
  SYM,SYM,SYM,SYM,SYM, 55,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* BX */
   56, 44, 57, 58, 17, 19, 38, 40, 32, 28, 45, 59, 60, 61, 47, 62, /* CX */
   63, 64, 65, 66, 35, 67, 21,SYM, 37, 68, 69, 70, 31, 71, 72, 73, /* DX */
   74, 44, 75, 76, 17, 19, 38, 40, 32, 28, 45, 77, 78, 79, 47, 80, /* EX */
   81, 82, 83, 84, 35, 85, 21,SYM, 37, 86, 87, 88, 31, 89, 90, 91, /* FX */
};
/*X0  X1  X2  X3  X4  X5  X6  X7  X8  X9  XA  XB  XC  XD  XE  XF */

static const unsigned char Iso_8859_9_CharToOrderMap[] =
{
  CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
  CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
  SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
  NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
  SYM,  0, 22, 20,  9,  1, 14, 12, 18,  6, 23, 10,  7, 11,  3,  8, /* 4X */
   15, 30,  2,  5,  4, 16, 13, 26, 25, 24, 27,SYM,SYM,SYM,SYM,SYM, /* 5X */
  SYM,  0, 22, 20,  9,  1, 14, 12, 18,  6, 23, 10,  7, 11,  3,  8, /* 6X */
   15, 30,  2,  5,  4, 16, 13, 26, 25, 24, 27,SYM,SYM,SYM,SYM,CTR, /* 7X */
  CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 8X */
  CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 9X */
  SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* AX */
  SYM,SYM,SYM,SYM,SYM, 92,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* BX */
   93, 44, 94, 95, 17, 19, 38, 40, 32, 28, 45, 96, 97, 98, 47, 99, /* CX */
  100,101,102,103, 35,104, 21,SYM, 37,105,106,107, 31,108,109,110, /* DX */
  111, 44,112,113, 17, 19, 38, 40, 32, 28, 45,114,115,116, 47,117, /* EX */
  118,119,120,121, 35,122, 21,SYM, 37,123,124,125, 31, 42,126,127, /* FX */
};
/*X0  X1  X2  X3  X4  X5  X6  X7  X8  X9  XA  XB  XC  XD  XE  XF */

static const unsigned char Iso_8859_1_CharToOrderMap[] =
{
  CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
  CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
  SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
  NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
  SYM,  0, 22, 20,  9,  1, 14, 12, 18,  6, 23, 10,  7, 11,  3,  8, /* 4X */
   15, 30,  2,  5,  4, 16, 13, 26, 25, 24, 27,SYM,SYM,SYM,SYM,SYM, /* 5X */
  SYM,  0, 22, 20,  9,  1, 14, 12, 18,  6, 23, 10,  7, 11,  3,  8, /* 6X */
   15, 30,  2,  5,  4, 16, 13, 26, 25, 24, 27,SYM,SYM,SYM,SYM,CTR, /* 7X */
  CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 8X */
  CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 9X */
  SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* AX */
  SYM,SYM,SYM,SYM,SYM,128,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* BX */
  129, 44,130,131, 17, 19, 38, 40, 32, 28, 45,132,133,134, 47,135, /* CX */
  136,137,138,139, 35,140, 21,SYM, 37,141,142,143, 31,144,145,146, /* DX */
  147, 44,148,149, 17, 19, 38, 40, 32, 28, 45,150,151,152, 47,153, /* EX */
  154,155,156,157, 35,158, 21,SYM, 37,159,160,161, 31,162,163,164, /* FX */
};
/*X0  X1  X2  X3  X4  X5  X6  X7  X8  X9  XA  XB  XC  XD  XE  XF */

static const unsigned char Iso_8859_4_CharToOrderMap[] =
{
  CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
  CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
  SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
  NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
  SYM,  0, 22, 20,  9,  1, 14, 12, 18,  6, 23, 10,  7, 11,  3,  8, /* 4X */
   15, 30,  2,  5,  4, 16, 13, 26, 25, 24, 27,SYM,SYM,SYM,SYM,SYM, /* 5X */
  SYM,  0, 22, 20,  9,  1, 14, 12, 18,  6, 23, 10,  7, 11,  3,  8, /* 6X */
   15, 30,  2,  5,  4, 16, 13, 26, 25, 24, 27,SYM,SYM,SYM,SYM,CTR, /* 7X */
  CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 8X */
  CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 9X */
  SYM,165,166,167,SYM,168,169,SYM,SYM,170,171,172,173,SYM,174,SYM, /* AX */
  SYM,175,SYM,176,SYM,177,178,SYM,SYM,179,180,181,182, 43,183, 43, /* BX */
   29, 44,184,185, 17, 19, 38,186,187, 28,188,189, 39,190, 47, 41, /* CX */
  191,192, 33,193, 35,194, 21,SYM, 37, 36,195,196, 31,197, 46,198, /* DX */
   29, 44,199,200, 17, 19, 38,201,202, 28,203,204, 39,205, 47, 41, /* EX */
  206,207, 33,208, 35,209, 21,SYM, 37, 36,210,211, 31,212, 46,SYM, /* FX */
};
/*X0  X1  X2  X3  X4  X5  X6  X7  X8  X9  XA  XB  XC  XD  XE  XF */

static const unsigned char Iso_8859_15_CharToOrderMap[] =
{
  CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
  CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
  SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
  NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
  SYM,  0, 22, 20,  9,  1, 14, 12, 18,  6, 23, 10,  7, 11,  3,  8, /* 4X */
   15, 30,  2,  5,  4, 16, 13, 26, 25, 24, 27,SYM,SYM,SYM,SYM,SYM, /* 5X */
  SYM,  0, 22, 20,  9,  1, 14, 12, 18,  6, 23, 10,  7, 11,  3,  8, /* 6X */
   15, 30,  2,  5,  4, 16, 13, 26, 25, 24, 27,SYM,SYM,SYM,SYM,CTR, /* 7X */
  CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 8X */
  CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 9X */
  SYM,SYM,SYM,SYM,SYM,SYM,213,SYM,214,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* AX */
  SYM,SYM,SYM,SYM,215,216,SYM,SYM,217,SYM,SYM,SYM,218,219,220,SYM, /* BX */
  221, 44,222,223, 17, 19, 38, 40, 32, 28, 45,224,225,226, 47,227, /* CX */
  228,229,230,231, 35,232, 21,SYM, 37,233,234,235, 31,236,237,238, /* DX */
  239, 44,240,241, 17, 19, 38, 40, 32, 28, 45,242,243,244, 47,245, /* EX */
  246,247,248,249, 35,249, 21,SYM, 37,249,249,249, 31,249,249,249, /* FX */
};
/*X0  X1  X2  X3  X4  X5  X6  X7  X8  X9  XA  XB  XC  XD  XE  XF */

/* Model Table:
 * Total sequences: 748
 * First 512 sequences: 0.997323508584682
 * Next 512 sequences (512-1024): 0.0026764914153179875
 * Rest: 1.7780915628762273e-17
 * Negative sequences: TODO
 */
static const PRUint8 SwedishLangModel[] =
{
  3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,3,0,3,2,3,3,3,3,3,2,0,0,2,
  3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,3,2,3,2,3,3,3,3,3,3,0,0,2,
  3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,2,2,0,0,
  3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,2,2,2,0,2,
  3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,0,2,2,2,2,0,
  3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,2,2,2,0,2,
  3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,3,3,3,2,2,2,2,3,0,0,2,
  3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,0,2,2,2,0,2,
  3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,2,3,2,3,3,2,3,3,2,2,0,2,
  3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,2,3,3,3,3,0,2,0,2,0,2,
  3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,2,3,0,2,0,2,2,0,
  3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,0,2,0,3,3,2,
  3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,2,3,3,3,3,0,2,2,0,2,0,
  3,3,3,3,3,3,3,3,3,3,3,2,2,2,3,2,3,3,3,3,0,2,3,2,0,0,0,2,0,0,0,
  3,3,3,2,3,2,3,3,3,2,0,2,2,2,3,2,3,3,0,3,2,3,0,3,3,0,0,0,2,0,0,
  3,3,3,3,3,3,3,3,3,2,3,2,3,3,3,3,3,3,3,3,2,2,2,2,3,2,0,2,3,2,0,
  3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,3,2,0,2,0,3,2,3,2,0,3,0,0,0,2,0,
  2,2,3,3,3,3,0,3,0,3,3,3,3,3,3,3,2,2,0,0,3,0,3,0,0,3,0,0,0,0,0,
  3,3,3,3,3,2,3,2,3,2,2,2,2,0,0,0,3,3,2,3,2,3,2,3,3,0,0,3,0,2,0,
  2,3,3,3,3,3,2,3,0,3,3,3,3,3,2,0,0,0,2,0,0,2,3,0,0,0,0,0,0,0,0,
  3,3,3,3,3,2,3,3,3,2,3,2,2,2,2,0,3,0,3,0,3,2,2,0,3,0,0,2,2,0,2,
  3,3,3,3,3,3,2,3,2,3,3,3,3,3,2,3,0,2,2,0,3,2,2,3,0,0,0,0,0,0,0,
  3,3,3,3,3,3,3,3,3,2,2,2,0,2,2,2,3,3,2,3,3,3,3,3,3,0,0,2,2,0,0,
  3,3,0,2,2,3,2,3,3,3,2,0,0,0,2,0,3,3,0,0,0,3,2,0,0,0,0,0,2,0,0,
  3,2,3,3,3,3,2,3,3,3,3,3,3,2,3,3,2,0,2,0,3,0,3,2,0,3,0,2,0,0,0,
  3,3,0,3,3,0,3,2,3,0,2,2,0,0,2,3,2,0,2,0,0,0,2,0,2,2,0,0,0,0,0,
  3,3,2,2,2,3,3,2,3,2,2,0,0,0,0,0,2,0,2,0,0,0,0,0,2,0,2,2,0,0,0,
  3,3,0,2,2,0,2,0,3,0,2,0,0,0,0,0,2,0,2,0,0,0,2,0,2,0,0,2,0,0,0,
  0,3,2,2,0,2,0,2,2,2,0,0,0,2,0,2,0,0,2,0,0,2,2,0,0,0,0,0,0,0,0,
  0,0,0,2,0,0,2,0,3,0,2,2,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,2,0,0,0,0,0,0,2,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
};

const SequenceModel Windows_1252SwedishModel =
{
  Windows_1252_CharToOrderMap,
  SwedishLangModel,
  31,
  (float)0.997323508584682,
  PR_TRUE,
  "WINDOWS-1252"
};

const SequenceModel Iso_8859_9SwedishModel =
{
  Iso_8859_9_CharToOrderMap,
  SwedishLangModel,
  31,
  (float)0.997323508584682,
  PR_TRUE,
  "ISO-8859-9"
};

const SequenceModel Iso_8859_1SwedishModel =
{
  Iso_8859_1_CharToOrderMap,
  SwedishLangModel,
  31,
  (float)0.997323508584682,
  PR_TRUE,
  "ISO-8859-1"
};

const SequenceModel Iso_8859_4SwedishModel =
{
  Iso_8859_4_CharToOrderMap,
  SwedishLangModel,
  31,
  (float)0.997323508584682,
  PR_TRUE,
  "ISO-8859-4"
};

const SequenceModel Iso_8859_15SwedishModel =
{
  Iso_8859_15_CharToOrderMap,
  SwedishLangModel,
  31,
  (float)0.997323508584682,
  PR_TRUE,
  "ISO-8859-15"
};
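The "Character Mapping Table" comment in this file says that each byte of a charset is mapped either to a class (ILL, CTR, RET, SYM, NUM) or to the frequency order of the letter it encodes. The sketch below rebuilds such a table from the ranking in the Swedish log using Python's built-in codecs. It is an illustration under stated assumptions, not the generator's code: the name `char_to_order` is hypothetical, and letters outside the 31 ranked characters get the placeholder `None` here, whereas the generated tables give them further order numbers.

```python
import unicodedata

# Ranking copied from the "First 31 characters" list in the Swedish log.
RANKED = "aerntsilodkmgvfpuähåcöbjyxwzéāq"


def char_to_order(charset, ranked=RANKED):
    """Build a 256-entry table mapping each byte of `charset` to an order."""
    order_of = {ch: index for index, ch in enumerate(ranked)}
    table = []
    for byte in range(256):
        try:
            ch = bytes([byte]).decode(charset)
        except UnicodeDecodeError:
            table.append("ILL")   # byte is unassigned in this charset
            continue
        if ch in "\r\n":
            table.append("RET")
        elif unicodedata.category(ch) == "Cc":
            table.append("CTR")
        elif ch in "0123456789":
            table.append("NUM")
        elif ch.lower() in order_of:
            # Upper and lower case share one order.
            table.append(order_of[ch.lower()])
        elif ch.isalpha():
            table.append(None)    # placeholder: letter outside the ranking
        else:
            table.append("SYM")
    return table


table = char_to_order("windows-1252")
print(table[0x41], table[0xC4], table[0x81])
```

Because orders belong to the language and not to the charset, the same letter gets the same order in every table; only the byte position that carries it changes.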
@@ -185,6 +185,12 @@ nsSBCSGroupProber::nsSBCSGroupProber()
  mProbers[92] = new nsSingleByteCharSetProber(&Mac_CentraleuropeSloveneModel);
  mProbers[93] = new nsSingleByteCharSetProber(&Ibm852SloveneModel);

  mProbers[94] = new nsSingleByteCharSetProber(&Iso_8859_1SwedishModel);
  mProbers[95] = new nsSingleByteCharSetProber(&Iso_8859_4SwedishModel);
  mProbers[96] = new nsSingleByteCharSetProber(&Iso_8859_9SwedishModel);
  mProbers[97] = new nsSingleByteCharSetProber(&Iso_8859_15SwedishModel);
  mProbers[98] = new nsSingleByteCharSetProber(&Windows_1252SwedishModel);

  Reset();
}
@@ -40,7 +40,7 @@
 #define nsSBCSGroupProber_h__

-#define NUM_OF_SBCS_PROBERS 94
+#define NUM_OF_SBCS_PROBERS 99

 class nsCharSetProber;
 class nsSBCSGroupProber: public nsCharSetProber {
@@ -246,5 +246,11 @@ extern const SequenceModel Iso_8859_16SloveneModel;
extern const SequenceModel Ibm852SloveneModel;
extern const SequenceModel Mac_CentraleuropeSloveneModel;

extern const SequenceModel Iso_8859_1SwedishModel;
extern const SequenceModel Iso_8859_4SwedishModel;
extern const SequenceModel Iso_8859_9SwedishModel;
extern const SequenceModel Iso_8859_15SwedishModel;
extern const SequenceModel Windows_1252SwedishModel;

#endif /* nsSingleByteCharSetProber_h__ */
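Adding a language touches two places that must agree: the `mProbers[...]` assignments and the `NUM_OF_SBCS_PROBERS` constant, which this commit raises from 94 to 99 for the five new probers at indices 94 to 98. The following is a small consistency check one could run over the sources. It is a sketch only; `check_prober_count` is a hypothetical helper, not part of the uchardet build.

```python
import re

# Excerpts copied from the hunks of this commit.
REGISTRATIONS = """
mProbers[92] = new nsSingleByteCharSetProber(&Mac_CentraleuropeSloveneModel);
mProbers[93] = new nsSingleByteCharSetProber(&Ibm852SloveneModel);
mProbers[94] = new nsSingleByteCharSetProber(&Iso_8859_1SwedishModel);
mProbers[95] = new nsSingleByteCharSetProber(&Iso_8859_4SwedishModel);
mProbers[96] = new nsSingleByteCharSetProber(&Iso_8859_9SwedishModel);
mProbers[97] = new nsSingleByteCharSetProber(&Iso_8859_15SwedishModel);
mProbers[98] = new nsSingleByteCharSetProber(&Windows_1252SwedishModel);
"""
DEFINE = "#define NUM_OF_SBCS_PROBERS 99"


def check_prober_count(source, define_line):
    """Return True if the highest mProbers index + 1 equals the declared count."""
    indices = [int(m) for m in re.findall(r"mProbers\[(\d+)\]\s*=", source)]
    match = re.search(r"NUM_OF_SBCS_PROBERS\s+(\d+)", define_line)
    if not indices or match is None:
        return False
    return max(indices) + 1 == int(match.group(1))


print(check_prober_count(REGISTRATIONS, DEFINE))
```

With the old value of 94 the check fails, which matches the need to bump the constant in the same commit.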
10
test/sv/iso-8859-1.txt
Normal file
@@ -0,0 +1,10 @@
Mölle är en tätort på Kullahalvön i Brunnby socken i Höganäs kommun, Skåne län.

Samhället var från början ett fiskeläge, men kom att spela en stor roll i den
framväxande turismen i Sverige i slutet av 1800-talet. Till detta bidrog - och
bidrar - Mölles natursköna läge invid Öresunds norra utlopp, med Kullaberg som
bakgrund. Gemensamhetsbad för män och kvinnor introducerades i Ransvik i början
av 1900-talet. Storhetstiden som turistort inträffade strax före första
världskriget, men även under mellankrigstiden var turistströmmarna stora.
Fortfarande är Mölle en populär turistort med en tredubbling av invånarantalet
under sommarmånaderna.
10
test/sv/utf-8.txt
Normal file
@@ -0,0 +1,10 @@
Mölle är en tätort på Kullahalvön i Brunnby socken i Höganäs kommun, Skåne län.

Samhället var från början ett fiskeläge, men kom att spela en stor roll i den
framväxande turismen i Sverige i slutet av 1800-talet. Till detta bidrog – och
bidrar – Mölles natursköna läge invid Öresunds norra utlopp, med Kullaberg som
bakgrund. Gemensamhetsbad för män och kvinnor introducerades i Ransvik i början
av 1900-talet. Storhetstiden som turistort inträffade strax före första
världskriget, men även under mellankrigstiden var turistströmmarna stora.
Fortfarande är Mölle en populär turistort med en tredubbling av invånarantalet
under sommarmånaderna.
10
test/sv/windows-1252.txt
Normal file
@@ -0,0 +1,10 @@
Mölle är en tätort på Kullahalvön i Brunnby socken i Höganäs kommun, Skåne län.

Samhället var från början ett fiskeläge, men kom att spela en stor roll i den
framväxande turismen i Sverige i slutet av 1800-talet. Till detta bidrog – och
bidrar – Mölles natursköna läge invid Öresunds norra utlopp, med Kullaberg som
bakgrund. Gemensamhetsbad för män och kvinnor introducerades i Ransvik i början
av 1900-talet. Storhetstiden som turistort inträffade strax före första
världskriget, men även under mellankrigstiden var turistströmmarna stora.
Fortfarande är Mölle en populär turistort med en tredubbling av invånarantalet
under sommarmånaderna.
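The ISO-8859-1 sample uses a plain hyphen where the UTF-8 and WINDOWS-1252 samples use an en dash. ISO-8859-1 has no en dash, while WINDOWS-1252 encodes it as byte 0x96, so that is a likely reason for the difference, although the commit does not say so. The sketch below shows the effect with Python's built-in codecs; the helper `encode_or_none` and the joined excerpt are illustrative and are not how the test files were produced.

```python
# Excerpt adapted from test/sv/utf-8.txt (two lines joined for brevity).
text = "Till detta bidrog – och bidrar – Mölles natursköna läge"


def encode_or_none(s, charset):
    """Encode `s` in `charset`, or return None if a character is missing."""
    try:
        return s.encode(charset)
    except UnicodeEncodeError:
        return None


# WINDOWS-1252 has an en dash, so the text encodes unchanged.
cp1252 = encode_or_none(text, "windows-1252")
# ISO-8859-1 has no en dash, so direct encoding fails.
latin1 = encode_or_none(text, "iso-8859-1")
# Replacing the en dash with a hyphen makes the text encodable.
latin1_hyphen = encode_or_none(text.replace("–", "-"), "iso-8859-1")

print(cp1252 is not None, latin1 is None, latin1_hyphen is not None)
```

The Swedish letters themselves occupy the same byte values in both encodings, so only the punctuation distinguishes the two samples at the byte level.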