Initialize

2023-12-29 00:08:10 +08:00
commit 5ed0fc646f
512 changed files with 54378 additions and 0 deletions

utils/text-encoding-0.6.3/.gitmodules vendored Normal file

@@ -0,0 +1,3 @@
[submodule "test/testharness.js"]
path = test/testharness.js
url = https://github.com/w3c/testharness.js.git


@@ -0,0 +1,29 @@
The encoding indexes, algorithms, and many comments in the code
derive from the Encoding Standard https://encoding.spec.whatwg.org/
Otherwise...
This is free and unencumbered software released into the public domain.
Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.
In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
For more information, please refer to <http://unlicense.org/>


@@ -0,0 +1,117 @@
text-encoding
==============
This is a polyfill for the [Encoding Living
Standard](https://encoding.spec.whatwg.org/) API for the Web, allowing
encoding and decoding of textual data to and from Typed Array buffers
for binary data in JavaScript.
By default it adheres to the spec and does not support *encoding* to
legacy encodings, only *decoding*. It is also implemented to match the
specification's algorithms, rather than for performance. The intended
use is within Web pages, so it has no dependency on server frameworks
or particular module schemes.
Basic examples and tests are included.
### Install ###
There are a few ways you can get the `text-encoding` library.
#### Node ####
`text-encoding` is on `npm`. Simply run:
```sh
npm install text-encoding
```
Or add it to your `package.json` dependencies.
#### Bower ####
`text-encoding` is on `bower` as well. Install with bower like so:
```sh
bower install text-encoding
```
Or add it to your `bower.json` dependencies.
### HTML Page Usage ###
```html
<!-- Required for non-UTF encodings -->
<script src="encoding-indexes.js"></script>
<script src="encoding.js"></script>
```
### API Overview ###
Basic Usage
```js
var uint8array = new TextEncoder().encode(string);
var string = new TextDecoder(encoding).decode(uint8array);
```
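As a concrete illustration of the two calls above, here is a minimal round-trip sketch. The sample string and the variable names are illustrative assumptions, not part of the library; the code relies only on `TextEncoder`, `TextDecoder`, `encode()` and `decode()` as described in this README.

```js
// Round-trip a short string through UTF-8 bytes and back.
// The sample text is an arbitrary choice for illustration.
var text = 'z\u00A2\u6C34'; // z, cent sign, CJK "water"

// encode() returns a Uint8Array holding the UTF-8 bytes.
var bytes = new TextEncoder().encode(text);

// decode() turns the bytes back into a JavaScript string.
var roundTripped = new TextDecoder('utf-8').decode(bytes);
```

Here `bytes` holds six bytes (one for `z`, two for the cent sign, three for the CJK character), and `roundTripped` equals `text`.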
Streaming Decode
```js
var string = "", decoder = new TextDecoder(encoding), buffer;
while (buffer = next_chunk()) {
string += decoder.decode(buffer, {stream:true});
}
string += decoder.decode(); // finish the stream
```
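The `stream` option matters when a multi-byte character is split across chunks. The sketch below assumes a hypothetical two-chunk input in which the three UTF-8 bytes of U+6C34 straddle the chunk boundary; the chunk contents and variable names are illustrative only.

```js
// U+6C34 encodes to three UTF-8 bytes: E6 B0 B4.
// The chunks below split that sequence after its first byte.
var chunks = [
  new Uint8Array([0x7A, 0xE6]), // 'z' plus the first byte of U+6C34
  new Uint8Array([0xB0, 0xB4])  // the remaining two bytes
];

var decoder = new TextDecoder('utf-8');
var out = '';
chunks.forEach(function(chunk) {
  // With {stream: true} the decoder holds back the incomplete sequence.
  out += decoder.decode(chunk, {stream: true});
});
out += decoder.decode(); // finish the stream

// For comparison: decoding the first chunk on its own, without streaming,
// replaces the dangling lead byte with U+FFFD.
var naive = new TextDecoder('utf-8').decode(chunks[0]);
```

After the loop, `out` is `'z\u6C34'`, whereas `naive` is `'z\uFFFD'` because the truncated sequence is treated as an error at end of input.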
### Encodings ###
All encodings from the Encoding specification are supported:
utf-8 ibm866 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6
iso-8859-7 iso-8859-8 iso-8859-8-i iso-8859-10 iso-8859-13 iso-8859-14
iso-8859-15 iso-8859-16 koi8-r koi8-u macintosh windows-874
windows-1250 windows-1251 windows-1252 windows-1253 windows-1254
windows-1255 windows-1256 windows-1257 windows-1258 x-mac-cyrillic
gb18030 hz-gb-2312 big5 euc-jp iso-2022-jp shift_jis euc-kr
replacement utf-16be utf-16le x-user-defined
(Some encodings may be supported under other names, e.g. ascii,
iso-8859-1, etc. See [Encoding](https://encoding.spec.whatwg.org/) for
additional labels for each encoding.)
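To see which encoding a label resolves to, you can read the decoder's `encoding` attribute. The helper below, `canonicalNames`, is a hypothetical name introduced for this sketch; it assumes only that `TextDecoder` accepts a label and exposes `encoding`, as exercised by this package's tests.

```js
// Map each label to the canonical encoding name reported by TextDecoder.
// Labels are matched case-insensitively.
function canonicalNames(labels) {
  var result = {};
  labels.forEach(function(label) {
    result[label] = new TextDecoder(label).encoding;
  });
  return result;
}

var names = canonicalNames(['UTF-8', 'utf-16']);
```

Here `names['UTF-8']` is `'utf-8'` and `names['utf-16']` is `'utf-16le'`. According to the tests shipped with this package, the labels `ascii` and `iso-8859-1` both resolve to `windows-1252`; with the polyfill, resolving those labels requires `encoding-indexes.js` to be loaded.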
Encodings other than **utf-8**, **utf-16le** and **utf-16be** require
an additional `encoding-indexes.js` file to be included. It is rather
large (596kB uncompressed, 188kB gzipped); portions may be deleted if
support for some encodings is not required.
### Non-Standard Behavior ###
As required by the specification, only encoding to **utf-8** is
supported. If you want to try encoding to a legacy encoding, you can
enable non-standard behavior by passing a label and the
`NONSTANDARD_allowLegacyEncoding` option to TextEncoder. For example:
```js
var uint8array = new TextEncoder(
'windows-1252', { NONSTANDARD_allowLegacyEncoding: true }).encode(text);
```
But note that the above won't work if you're using the polyfill in a
browser that natively supports the TextEncoder API, since the
polyfill won't be used!
You can force the polyfill to be used by running this before the polyfill is loaded:
```html
<script>
window.TextEncoder = window.TextDecoder = null;
</script>
```
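To check whether a native implementation or the polyfill is in use, one option is the heuristic this package's own tests rely on: stringify the constructor and look for `[native code]`. The helper name `isNative` is an assumption made for this sketch, and the result depends on how the engine stringifies built-ins, so treat it as a hint rather than a guarantee.

```js
// Heuristic: built-in functions usually stringify to "... { [native code] }".
// Some environments implement built-ins in JavaScript, so this can misreport.
function isNative(fn) {
  return typeof fn === 'function' && /\[native code\]/.test(String(fn));
}

var builtinLooksNative = isNative(Math.max);          // a real built-in
var scriptedLooksNative = isNative(function() {});    // an ordinary function
```

In a page, `isNative(window.TextEncoder)` returning `false` after the scripts have loaded suggests the polyfill is the active implementation.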
To support the legacy encodings (which may be stateful), the
TextEncoder `encode()` method accepts an optional dictionary with a
`stream` option, e.g. `encoder.encode(string, {stream: true});`. This
is not needed for standard encoding since the input is always in
complete code points.


@@ -0,0 +1,30 @@
{
"name": "text-encoding",
"version": "0.6.3",
"homepage": "https://github.com/inexorabletash/text-encoding",
"authors": [
"Joshua Bell <inexorabletash@gmail.com>",
"Rick Eyre <rick.eyre@outlook.com>",
"Eugen Podaru <eugen.podaru@live.com>",
"Filip Dupanović <filip.dupanovic@gmail.com>",
"Anne van Kesteren <annevk@annevk.nl>",
"Francis Avila <francisga@gmail.com>",
"Michael J. Ryan <tracker1@gmail.com>",
"Pierre Queinnec <pierre@queinnec.org>",
"Zack Weinberg <zackw@panix.com>"
],
"description": "Polyfill for the Encoding Living Standard's API",
"main": [ "lib/encoding.js", "lib/encoding-indexes.js" ],
"keywords": [
"decoding",
"encoding",
"living",
"standards"
],
"license": "Unlicense",
"ignore": [
"**/.*",
"test",
"*examples*.html"
]
}


@@ -0,0 +1,58 @@
<!DOCTYPE html>
<script src="lib/encoding.js"></script>
<script>
function encodeArrayOfStrings(strings) {
var encoder, encoded, len, i, bytes, view, offset;
encoder = new TextEncoder();
encoded = [];
len = Uint32Array.BYTES_PER_ELEMENT;
for (i = 0; i < strings.length; i += 1) {
len += Uint32Array.BYTES_PER_ELEMENT;
encoded[i] = encoder.encode(strings[i]); // reuse the encoder created above
len += encoded[i].byteLength;
}
bytes = new Uint8Array(len);
view = new DataView(bytes.buffer);
offset = 0;
view.setUint32(offset, strings.length);
offset += Uint32Array.BYTES_PER_ELEMENT;
for (i = 0; i < encoded.length; i += 1) {
len = encoded[i].byteLength;
view.setUint32(offset, len);
offset += Uint32Array.BYTES_PER_ELEMENT;
bytes.set(encoded[i], offset);
offset += len;
}
return bytes.buffer;
}
function decodeArrayOfStrings(buffer, encoding) {
var decoder, view, offset, num_strings, strings, i, len;
decoder = new TextDecoder(encoding);
view = new DataView(buffer);
offset = 0;
strings = [];
num_strings = view.getUint32(offset);
offset += Uint32Array.BYTES_PER_ELEMENT;
for (i = 0; i < num_strings; i += 1) {
len = view.getUint32(offset);
offset += Uint32Array.BYTES_PER_ELEMENT;
strings[i] = decoder.decode(
new DataView(view.buffer, offset, len));
offset += len;
}
return strings;
}
var strings = ["Hello", "string", "encoding!"];
var buffer = encodeArrayOfStrings(strings);
var results = decodeArrayOfStrings(buffer, "utf-8");
document.write("Encoded " + JSON.stringify(strings) + "<br>");
document.write("Decoded " + JSON.stringify(results) + "<br>");
</script>


@@ -0,0 +1,59 @@
<!DOCTYPE html>
<script src="lib/encoding-indexes.js"></script>
<script src="lib/encoding.js"></script>
<script>
function encodeArrayOfStrings(strings) {
var encoder, encoded, len, i, bytes, view, offset;
encoder = new TextEncoder();
encoded = [];
len = Uint32Array.BYTES_PER_ELEMENT;
for (i = 0; i < strings.length; i += 1) {
len += Uint32Array.BYTES_PER_ELEMENT;
encoded[i] = encoder.encode(strings[i]); // reuse the encoder created above
len += encoded[i].byteLength;
}
bytes = new Uint8Array(len);
view = new DataView(bytes.buffer);
offset = 0;
view.setUint32(offset, strings.length);
offset += Uint32Array.BYTES_PER_ELEMENT;
for (i = 0; i < encoded.length; i += 1) {
len = encoded[i].byteLength;
view.setUint32(offset, len);
offset += Uint32Array.BYTES_PER_ELEMENT;
bytes.set(encoded[i], offset);
offset += len;
}
return bytes.buffer;
}
function decodeArrayOfStrings(buffer, encoding) {
var decoder, view, offset, num_strings, strings, i, len;
decoder = new TextDecoder(encoding);
view = new DataView(buffer);
offset = 0;
strings = [];
num_strings = view.getUint32(offset);
offset += Uint32Array.BYTES_PER_ELEMENT;
for (i = 0; i < num_strings; i += 1) {
len = view.getUint32(offset);
offset += Uint32Array.BYTES_PER_ELEMENT;
strings[i] = decoder.decode(
new DataView(view.buffer, offset, len));
offset += len;
}
return strings;
}
var strings = ["Hello", "string", "encoding!"];
var buffer = encodeArrayOfStrings(strings);
var results = decodeArrayOfStrings(buffer, "utf-8");
document.write("Encoded " + JSON.stringify(strings) + "<br>");
document.write("Decoded " + JSON.stringify(results) + "<br>");
</script>


@@ -0,0 +1,9 @@
// This is free and unencumbered software released into the public domain.
// See LICENSE.md for more information.
var encoding = require("./lib/encoding.js");
module.exports = {
TextEncoder: encoding.TextEncoder,
TextDecoder: encoding.TextDecoder,
};

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large Load Diff


@@ -0,0 +1,37 @@
{
"name": "text-encoding",
"author": "Joshua Bell <inexorabletash@gmail.com>",
"contributors": [
"Joshua Bell <inexorabletash@gmail.com>",
"Rick Eyre <rick.eyre@outlook.com>",
"Eugen Podaru <eugen.podaru@live.com>",
"Filip Dupanović <filip.dupanovic@gmail.com>",
"Anne van Kesteren <annevk@annevk.nl>",
"Francis Avila <francisga@gmail.com>",
"Michael J. Ryan <tracker1@gmail.com>",
"Pierre Queinnec <pierre@queinnec.org>",
"Zack Weinberg <zackw@panix.com>"
],
"version": "0.6.3",
"description": "Polyfill for the Encoding Living Standard's API.",
"main": "index.js",
"files": [
"index.js",
"lib/encoding.js",
"lib/encoding-indexes.js"
],
"repository": {
"type": "git",
"url": "https://github.com/inexorabletash/text-encoding.git"
},
"keywords": [
"encoding",
"decoding",
"living standard"
],
"bugs": {
"url": "https://github.com/inexorabletash/text-encoding/issues"
},
"homepage": "https://github.com/inexorabletash/text-encoding",
"license": "Unlicense"
}

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@@ -0,0 +1,13 @@
// This is free and unencumbered software released into the public domain.
// See LICENSE.md for more information.
test(function() {
var cases = [
{bytes: [148, 57, 218, 51], string: '\uD83D\uDCA9' } // U+1F4A9 PILE OF POO
];
cases.forEach(function(c) {
assert_equals(new TextDecoder('gb18030').decode(new Uint8Array(c.bytes)),
c.string);
});
}, 'gb18030 ranges');

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@@ -0,0 +1,393 @@
// This is free and unencumbered software released into the public domain.
// See LICENSE.md for more information.
var THE_ENCODING = ['utf-8'];
var LEGACY_ENCODINGS = [
'ibm866', 'iso-8859-2', 'iso-8859-3', 'iso-8859-4', 'iso-8859-5',
'iso-8859-6', 'iso-8859-7', 'iso-8859-8', 'iso-8859-10',
'iso-8859-13', 'iso-8859-14', 'iso-8859-15', 'iso-8859-16', 'koi8-r',
'koi8-u', 'macintosh', 'windows-874', 'windows-1250', 'windows-1251',
'windows-1252', 'windows-1253', 'windows-1254', 'windows-1255',
'windows-1256', 'windows-1257', 'windows-1258', 'x-mac-cyrillic',
'gbk', 'gb18030', 'big5', 'euc-jp', 'iso-2022-jp', 'shift_jis',
'euc-kr', 'utf-16le', 'utf-16be'
];
var ASCII_SUPERSETS = THE_ENCODING.concat(LEGACY_ENCODINGS)
.filter(function(e) {
return e !== 'utf-16le' && e !== 'utf-16be';
});
// Miscellaneous tests
test(function() {
assert_false(/\[native code\]/.test(String(TextDecoder)),
'Native implementation present - polyfill not tested.');
}, 'TextDecoder Polyfill (will fail if natively supported)');
test(function() {
assert_false(/\[native code\]/.test(String(TextEncoder)),
'Native implementation present - polyfill not tested.');
}, 'TextEncoder Polyfill (will fail if natively supported)');
test(function() {
assert_true('encoding' in new TextEncoder());
assert_equals(new TextEncoder().encoding, 'utf-8');
assert_true('encoding' in new TextDecoder());
assert_equals(new TextDecoder().encoding, 'utf-8');
assert_equals(new TextDecoder('utf-16le').encoding, 'utf-16le');
assert_true('fatal' in new TextDecoder());
assert_false(new TextDecoder('utf-8').fatal);
assert_true(new TextDecoder('utf-8', {fatal: true}).fatal);
assert_true('ignoreBOM' in new TextDecoder());
assert_false(new TextDecoder('utf-8').ignoreBOM);
assert_true(new TextDecoder('utf-8', {ignoreBOM: true}).ignoreBOM);
}, 'Attributes');
test(function() {
var badStrings = [
{ input: '\ud800', expected: '\ufffd' }, // Surrogate half
{ input: '\udc00', expected: '\ufffd' }, // Surrogate half
{ input: 'abc\ud800def', expected: 'abc\ufffddef' }, // Surrogate half
{ input: 'abc\udc00def', expected: 'abc\ufffddef' }, // Surrogate half
{ input: '\udc00\ud800', expected: '\ufffd\ufffd' } // Wrong order
];
badStrings.forEach(
function(t) {
var encoded = new TextEncoder().encode(t.input);
var decoded = new TextDecoder().decode(encoded);
assert_equals(decoded, t.expected); // testharness.js order: (actual, expected)
});
}, 'bad data');
test(function() {
var bad = [
{ encoding: 'utf-8', input: [0xC0] }, // ends early
{ encoding: 'utf-8', input: [0xC0, 0x00] }, // invalid trail
{ encoding: 'utf-8', input: [0xC0, 0xC0] }, // invalid trail
{ encoding: 'utf-8', input: [0xE0] }, // ends early
{ encoding: 'utf-8', input: [0xE0, 0x00] }, // invalid trail
{ encoding: 'utf-8', input: [0xE0, 0xC0] }, // invalid trail
{ encoding: 'utf-8', input: [0xE0, 0x80, 0x00] }, // invalid trail
{ encoding: 'utf-8', input: [0xE0, 0x80, 0xC0] }, // invalid trail
{ encoding: 'utf-8', input: [0xFC, 0x80, 0x80, 0x80, 0x80, 0x80] }, // > 0x10FFFF
{ encoding: 'utf-16le', input: [0x00] }, // truncated code unit
{ encoding: 'utf-16le', input: [0x00, 0xd8] }, // surrogate half
{ encoding: 'utf-16le', input: [0x00, 0xd8, 0x00, 0x00] }, // surrogate half
{ encoding: 'utf-16le', input: [0x00, 0xdc, 0x00, 0x00] }, // trail surrogate
{ encoding: 'utf-16le', input: [0x00, 0xdc, 0x00, 0xd8] } // swapped surrogates
// TODO: Single byte encoding cases
];
bad.forEach(
function(t) {
assert_throws({name: 'TypeError'}, function() {
new TextDecoder(t.encoding, {fatal: true}).decode(new Uint8Array(t.input));
});
});
}, 'fatal flag');
test(function() {
var encodings = [
{ label: 'utf-8', encoding: 'utf-8' },
{ label: 'utf-16', encoding: 'utf-16le' },
{ label: 'utf-16le', encoding: 'utf-16le' },
{ label: 'utf-16be', encoding: 'utf-16be' },
{ label: 'ascii', encoding: 'windows-1252' },
{ label: 'iso-8859-1', encoding: 'windows-1252' }
];
encodings.forEach(
function(test) {
assert_equals(new TextDecoder(test.label.toLowerCase()).encoding, test.encoding);
assert_equals(new TextDecoder(test.label.toUpperCase()).encoding, test.encoding);
});
}, 'Encoding names are case insensitive');
test(function() {
var utf8_bom = [0xEF, 0xBB, 0xBF];
var utf8 = [0x7A, 0xC2, 0xA2, 0xE6, 0xB0, 0xB4, 0xF0, 0x9D, 0x84, 0x9E, 0xF4, 0x8F, 0xBF, 0xBD];
var utf16le_bom = [0xff, 0xfe];
var utf16le = [0x7A, 0x00, 0xA2, 0x00, 0x34, 0x6C, 0x34, 0xD8, 0x1E, 0xDD, 0xFF, 0xDB, 0xFD, 0xDF];
var utf16be_bom = [0xfe, 0xff];
var utf16be = [0x00, 0x7A, 0x00, 0xA2, 0x6C, 0x34, 0xD8, 0x34, 0xDD, 0x1E, 0xDB, 0xFF, 0xDF, 0xFD];
var string = 'z\xA2\u6C34\uD834\uDD1E\uDBFF\uDFFD'; // z, cent, CJK water, G-Clef, Private-use character
// missing BOMs
assert_equals(new TextDecoder('utf-8').decode(new Uint8Array(utf8)), string);
assert_equals(new TextDecoder('utf-16le').decode(new Uint8Array(utf16le)), string);
assert_equals(new TextDecoder('utf-16be').decode(new Uint8Array(utf16be)), string);
// matching BOMs
assert_equals(new TextDecoder('utf-8').decode(new Uint8Array(utf8_bom.concat(utf8))), string);
assert_equals(new TextDecoder('utf-16le').decode(new Uint8Array(utf16le_bom.concat(utf16le))), string);
assert_equals(new TextDecoder('utf-16be').decode(new Uint8Array(utf16be_bom.concat(utf16be))), string);
// matching BOMs split
var decoder8 = new TextDecoder('utf-8');
assert_equals(decoder8.decode(new Uint8Array(utf8_bom.slice(0, 1)), {stream: true}), '');
assert_equals(decoder8.decode(new Uint8Array(utf8_bom.slice(1).concat(utf8))), string);
assert_equals(decoder8.decode(new Uint8Array(utf8_bom.slice(0, 2)), {stream: true}), '');
assert_equals(decoder8.decode(new Uint8Array(utf8_bom.slice(2).concat(utf8))), string);
var decoder16le = new TextDecoder('utf-16le');
assert_equals(decoder16le.decode(new Uint8Array(utf16le_bom.slice(0, 1)), {stream: true}), '');
assert_equals(decoder16le.decode(new Uint8Array(utf16le_bom.slice(1).concat(utf16le))), string);
var decoder16be = new TextDecoder('utf-16be');
assert_equals(decoder16be.decode(new Uint8Array(utf16be_bom.slice(0, 1)), {stream: true}), '');
assert_equals(decoder16be.decode(new Uint8Array(utf16be_bom.slice(1).concat(utf16be))), string);
// mismatching BOMs
assert_not_equals(new TextDecoder('utf-8').decode(new Uint8Array(utf16le_bom.concat(utf8))), string);
assert_not_equals(new TextDecoder('utf-8').decode(new Uint8Array(utf16be_bom.concat(utf8))), string);
assert_not_equals(new TextDecoder('utf-16le').decode(new Uint8Array(utf8_bom.concat(utf16le))), string);
assert_not_equals(new TextDecoder('utf-16le').decode(new Uint8Array(utf16be_bom.concat(utf16le))), string);
assert_not_equals(new TextDecoder('utf-16be').decode(new Uint8Array(utf8_bom.concat(utf16be))), string);
assert_not_equals(new TextDecoder('utf-16be').decode(new Uint8Array(utf16le_bom.concat(utf16be))), string);
// ignore BOMs
assert_equals(new TextDecoder('utf-8', {ignoreBOM: true})
.decode(new Uint8Array(utf8_bom.concat(utf8))),
'\uFEFF' + string);
assert_equals(new TextDecoder('utf-16le', {ignoreBOM: true})
.decode(new Uint8Array(utf16le_bom.concat(utf16le))),
'\uFEFF' + string);
assert_equals(new TextDecoder('utf-16be', {ignoreBOM: true})
.decode(new Uint8Array(utf16be_bom.concat(utf16be))),
'\uFEFF' + string);
}, 'Byte-order marks');
test(function() {
assert_equals(new TextDecoder('utf-8').encoding, 'utf-8'); // canonical case
assert_equals(new TextDecoder('UTF-16').encoding, 'utf-16le'); // canonical case and name
assert_equals(new TextDecoder('UTF-16BE').encoding, 'utf-16be'); // canonical case and name
assert_equals(new TextDecoder('iso8859-1').encoding, 'windows-1252'); // canonical case and name
assert_equals(new TextDecoder('iso-8859-1').encoding, 'windows-1252'); // canonical case and name
}, 'Encoding names');
test(function() {
var string = '\x00123ABCabc\x80\xFF\u0100\u1000\uFFFD\uD800\uDC00\uDBFF\uDFFF';
var cases = [
{
encoding: 'utf-8',
encoded: [0, 49, 50, 51, 65, 66, 67, 97, 98, 99, 194, 128, 195, 191, 196,
128, 225, 128, 128, 239, 191, 189, 240, 144, 128, 128, 244, 143,
191, 191]
},
{
encoding: 'utf-16le',
encoded: [0, 0, 49, 0, 50, 0, 51, 0, 65, 0, 66, 0, 67, 0, 97, 0, 98, 0,
99, 0, 128, 0, 255, 0, 0, 1, 0, 16, 253, 255, 0, 216, 0, 220,
255, 219, 255, 223]
},
{
encoding: 'utf-16be',
encoded: [0, 0, 0, 49, 0, 50, 0, 51, 0, 65, 0, 66, 0, 67, 0, 97, 0, 98, 0,
99, 0, 128, 0, 255, 1, 0, 16, 0, 255, 253, 216, 0, 220, 0, 219,
255, 223, 255]
}
];
cases.forEach(function(c) {
for (var len = 1; len <= 5; ++len) {
var out = '', decoder = new TextDecoder(c.encoding);
for (var i = 0; i < c.encoded.length; i += len) {
var sub = [];
for (var j = i; j < c.encoded.length && j < i + len; ++j) {
sub.push(c.encoded[j]);
}
out += decoder.decode(new Uint8Array(sub), {stream: true});
}
out += decoder.decode();
assert_equals(out, string, 'streaming decode ' + c.encoding);
}
});
}, 'Streaming Decode');
test(function() {
var jis = [0x82, 0xC9, 0x82, 0xD9, 0x82, 0xF1];
var expected = '\u306B\u307B\u3093'; // Nihon
assert_equals(new TextDecoder('shift_jis').decode(new Uint8Array(jis)), expected);
}, 'Shift_JIS Decode');
test(function() {
ASCII_SUPERSETS.forEach(function(encoding) {
var string = '', bytes = [];
for (var i = 0; i < 128; ++i) {
// Encodings that have escape codes in 0x00-0x7F
if (encoding === 'iso-2022-jp' &&
(i === 0x0E || i === 0x0F || i === 0x1B))
continue;
string += String.fromCharCode(i);
bytes.push(i);
}
var ascii_encoded = new TextEncoder().encode(string);
assert_equals(new TextDecoder(encoding).decode(ascii_encoded), string, encoding);
});
}, 'Supersets of ASCII decode ASCII correctly');
test(function() {
assert_throws({name: 'TypeError'}, function() { new TextDecoder('utf-8', {fatal: true}).decode(new Uint8Array([0xff])); });
// This should not hang:
new TextDecoder('utf-8').decode(new Uint8Array([0xff]));
assert_throws({name: 'TypeError'}, function() { new TextDecoder('utf-16le', {fatal: true}).decode(new Uint8Array([0x00])); });
// This should not hang:
new TextDecoder('utf-16le').decode(new Uint8Array([0x00]));
assert_throws({name: 'TypeError'}, function() { new TextDecoder('utf-16be', {fatal: true}).decode(new Uint8Array([0x00])); });
// This should not hang:
new TextDecoder('utf-16be').decode(new Uint8Array([0x00]));
}, 'Non-fatal errors at EOF');
test(function() {
LEGACY_ENCODINGS.forEach(function(encoding) {
assert_equals(new TextDecoder(encoding).encoding, encoding);
assert_equals(new TextEncoder(encoding).encoding, 'utf-8');
});
}, 'Legacy encodings supported only for decode, not encode');
test(function() {
[
'csiso2022kr',
'hz-gb-2312',
'iso-2022-cn',
'iso-2022-cn-ext',
'iso-2022-kr'
].forEach(function(encoding) {
assert_equals(new TextEncoder(encoding).encoding, 'utf-8');
assert_throws({name: 'RangeError'},
function() {
var decoder = new TextDecoder(encoding, {fatal: true});
});
assert_throws({name: 'RangeError'},
function() {
var decoder = new TextDecoder(encoding, {fatal: false});
});
});
}, 'Replacement encoding labels');
test(function() {
var decoder = new TextDecoder();
var bytes = [65, 66, 97, 98, 99, 100, 101, 102, 103, 104, 67, 68, 69, 70, 71, 72];
var chars = 'ABabcdefghCDEFGH';
var buffer = new Uint8Array(bytes).buffer;
assert_equals(decoder.decode(buffer), chars,
'Decoding from ArrayBuffer should match expected text.');
['Uint8Array', 'Int8Array', 'Uint8ClampedArray',
'Uint16Array', 'Int16Array',
'Uint32Array', 'Int32Array',
'Float32Array', 'Float64Array'].forEach(function(typeName) {
var type = self[typeName];
var array = new type(buffer);
assert_equals(decoder.decode(array), chars,
'Decoding from ' + typeName + ' should match expected text.');
var subset = new type(buffer, type.BYTES_PER_ELEMENT, 8 / type.BYTES_PER_ELEMENT);
assert_equals(decoder.decode(subset),
chars.substring(type.BYTES_PER_ELEMENT, type.BYTES_PER_ELEMENT + 8),
'Decoding from ' + typeName + ' should match expected text.');
});
}, 'ArrayBuffer, ArrayBufferView and buffer offsets');
test(function() {
assert_throws({name: 'RangeError'},
function() { new TextDecoder(null); },
'Null should coerce to "null" and be invalid encoding name.');
assert_throws({name: 'TypeError'},
function() { new TextDecoder('utf-8', ''); },
'String should not coerce to dictionary.');
assert_throws({name: 'TypeError'},
function() { new TextDecoder('utf-8').decode(null, ''); },
'String should not coerce to dictionary.');
}, 'Invalid parameters');
test(function() {
assert_array_equals(
[249,249,249,233,249,235,249,234,164,81,164,202],
new TextEncoder('big5', {NONSTANDARD_allowLegacyEncoding: true})
.encode('\u2550\u255E\u2561\u256A\u5341\u5345'));
}, 'NONSTANDARD - regression tests');
test(function() {
// Regression test for https://github.com/whatwg/encoding/issues/22
assert_equals(
new TextDecoder('gb18030').decode(new Uint8Array([
0xA8, 0xBC,
0x81, 0x35, 0xF4, 0x37
])), '\u1E3F\uE7C7');
}, 'GB 18030 2000 vs 2005: U+1E3F, U+E7C7 (decoding)');
test(function() {
// Regression test for https://github.com/whatwg/encoding/issues/22
assert_array_equals(
new TextEncoder('gb18030', {NONSTANDARD_allowLegacyEncoding: true})
.encode('\u1E3F\uE7C7'),
[
0xA8, 0xBC,
0x81, 0x35, 0xF4, 0x37
]);
}, 'NONSTANDARD - GB 18030 2000 vs 2005: U+1E3F, U+E7C7 (encoding)');
test(function() {
// Regression test for https://github.com/whatwg/encoding/issues/17
assert_throws(
new TypeError,
function() {
new TextEncoder('gb18030', {NONSTANDARD_allowLegacyEncoding: true})
.encode('\uE5E5');
});
}, 'NONSTANDARD - gb18030: U+E5E5 (encoding)');
test(function() {
// Regression test for https://github.com/whatwg/encoding/issues/15
var encoder =
new TextEncoder('iso-2022-jp', {NONSTANDARD_allowLegacyEncoding: true});
[
//'\u000E', '\u000F', '\u001B',
'\u00A5\u000E', //'\u00A5\u000F', '\u00A5\u001B'
].forEach(function(s) {
assert_throws(new TypeError, function() { encoder.encode(s); });
});
}, 'NONSTANDARD - iso-2022-jp encoding attack (encoding)');
['utf-16le', 'utf-16be'].forEach(function(encoding) {
test(function() {
var encoder = new TextEncoder(encoding, {NONSTANDARD_allowLegacyEncoding: true});
var decoder = new TextDecoder(encoding);
var sample = "z\xA2\u6C34\uD834\uDD1E\uDBFF\uDFFD";
assert_equals(decoder.decode(encoder.encode(sample)), sample);
}, 'NONSTANDARD - ' + encoding + ' (encoding)');
});
test(function() {
var encoder = new TextEncoder();
assert_array_equals([].slice.call(encoder.encode(false)), [102, 97, 108, 115, 101]);
assert_array_equals([].slice.call(encoder.encode(0)), [48]);
}, 'encode() called with falsy arguments (polyfill bindings)');
test(function() {
// Regression test for https://github.com/inexorabletash/text-encoding/issues/59
assert_equals( // the decoded value is a string, not an array
new TextDecoder('windows-1255').decode(new Uint8Array([0xCA])), '\u05BA');
}, 'windows-1255 map 0xCA to U+05BA');

File diff suppressed because one or more lines are too long


@@ -0,0 +1,152 @@
// This is free and unencumbered software released into the public domain.
// See LICENSE.md for more information.
// Extension to testharness.js API which avoids logging enormous strings
// on a coding failure.
function assert_string_equals(actual, expected, description) {
// short circuit success case
if (actual === expected) {
assert_true(true, description + ": <actual> === <expected>");
return;
}
// length check
assert_equals(actual.length, expected.length,
description + ": string lengths");
for (var i = 0; i < actual.length; i++) {
var a = actual.charCodeAt(i);
var b = expected.charCodeAt(i);
if (a !== b)
assert_true(false,
description +
": code unit " + i.toString() + " unequal: " +
cpname(a) + " != " + cpname(b)); // doesn't return
}
// It should be impossible to get here, because the initial
// comparison failed, so either the length comparison or the
// codeunit-by-codeunit comparison should also fail.
assert_true(false, description + ": failed to detect string difference");
}
// Inspired by:
// http://ecmanaut.blogspot.com/2006/07/encoding-decoding-utf8-in-javascript.html
function encode_utf8(string) {
var utf8 = unescape(encodeURIComponent(string));
var octets = new Uint8Array(utf8.length), i;
for (i = 0; i < utf8.length; i += 1) {
octets[i] = utf8.charCodeAt(i);
}
return octets;
}
function decode_utf8(octets) {
var utf8 = String.fromCharCode.apply(null, octets);
return decodeURIComponent(escape(utf8));
}
// Helpers for test_utf_roundtrip.
function cpname(n) {
if (n+0 !== n)
return n.toString();
var w = (n <= 0xFFFF) ? 4 : 6;
return 'U+' + ('000000' + n.toString(16).toUpperCase()).slice(-w);
}
function genblock(from, len, skip) {
var block = [];
for (var i = 0; i < len; i += skip) {
var cp = from + i;
if (0xD800 <= cp && cp <= 0xDFFF)
continue;
if (cp < 0x10000) {
block.push(String.fromCharCode(cp));
continue;
}
cp = cp - 0x10000;
block.push(String.fromCharCode(0xD800 + (cp >> 10)));
block.push(String.fromCharCode(0xDC00 + (cp & 0x3FF)));
}
return block.join('');
}
function encode_utf16le(s) { return encode_utf16(s, true); }
function encode_utf16be(s) { return encode_utf16(s, false); }
function encode_utf16(s, le) {
var a = new Uint8Array(s.length * 2), view = new DataView(a.buffer);
s.split('').forEach(function(c, i) {
view.setUint16(i * 2, c.charCodeAt(0), le);
});
return a;
}
function test_utf_roundtrip () {
var MIN_CODEPOINT = 0;
var MAX_CODEPOINT = 0x10FFFF;
var BLOCK_SIZE = 0x1000;
var SKIP_SIZE = 31;
var TD_U16LE = new TextDecoder("UTF-16LE");
var TD_U16BE = new TextDecoder("UTF-16BE");
var TE_U8 = new TextEncoder();
var TD_U8 = new TextDecoder("UTF-8");
for (var i = MIN_CODEPOINT; i < MAX_CODEPOINT; i += BLOCK_SIZE) {
var block_tag = cpname(i) + " - " + cpname(i + BLOCK_SIZE - 1);
var block = genblock(i, BLOCK_SIZE, SKIP_SIZE);
// test UTF-16LE, UTF-16BE, and UTF-8 encodings against themselves
var encoded = encode_utf16le(block);
var decoded = TD_U16LE.decode(encoded);
assert_string_equals(block, decoded, "UTF-16LE round trip " + block_tag);
encoded = encode_utf16be(block);
decoded = TD_U16BE.decode(encoded);
assert_string_equals(block, decoded, "UTF-16BE round trip " + block_tag);
encoded = TE_U8.encode(block);
decoded = TD_U8.decode(encoded);
assert_string_equals(block, decoded, "UTF-8 round trip " + block_tag);
// test TextEncoder(UTF-8) against the older idiom
var exp_encoded = encode_utf8(block);
assert_array_equals(encoded, exp_encoded,
"UTF-8 reference encoding " + block_tag);
var exp_decoded = decode_utf8(exp_encoded);
assert_string_equals(decoded, exp_decoded,
"UTF-8 reference decoding " + block_tag);
}
}
function test_utf_samples () {
// z, cent, CJK water, G-Clef, Private-use character
var sample = "z\xA2\u6C34\uD834\uDD1E\uDBFF\uDFFD";
var cases = [
{ encoding: "utf-8",
expected: [0x7A, 0xC2, 0xA2, 0xE6, 0xB0, 0xB4, 0xF0, 0x9D, 0x84, 0x9E, 0xF4, 0x8F, 0xBF, 0xBD] },
{ encoding: "utf-16le",
expected: [0x7A, 0x00, 0xA2, 0x00, 0x34, 0x6C, 0x34, 0xD8, 0x1E, 0xDD, 0xFF, 0xDB, 0xFD, 0xDF] },
{ encoding: "utf-16",
expected: [0x7A, 0x00, 0xA2, 0x00, 0x34, 0x6C, 0x34, 0xD8, 0x1E, 0xDD, 0xFF, 0xDB, 0xFD, 0xDF] },
{ encoding: "utf-16be",
expected: [0x00, 0x7A, 0x00, 0xA2, 0x6C, 0x34, 0xD8, 0x34, 0xDD, 0x1E, 0xDB, 0xFF, 0xDF, 0xFD] }
];
cases.forEach(
function(t) {
var decoded = new TextDecoder(t.encoding)
.decode(new Uint8Array(t.expected));
assert_equals(decoded, sample,
"expected equal decodings - " + t.encoding);
});
}
test(test_utf_samples,
"UTF-8, UTF-16LE, UTF-16BE - Encode/Decode - reference sample");
test(test_utf_roundtrip,
"UTF-8, UTF-16LE, UTF-16BE - Encode/Decode - full roundtrip and "+
"agreement with encode/decodeURIComponent");


@@ -0,0 +1,15 @@
// This is free and unencumbered software released into the public domain.
// See LICENSE.md for more information.
test(
function() {
assert_equals(new TextEncoder('x-user-defined').encoding, 'utf-8');
var decoder = new TextDecoder('x-user-defined');
for (var i = 0; i < 0x80; ++i) {
assert_equals(decoder.decode(new Uint8Array([i])), String.fromCharCode(i));
assert_equals(decoder.decode(new Uint8Array([i + 0x80])), String.fromCharCode(i + 0xF780));
}
},
"x-user-defined encoding"
);


@@ -0,0 +1,27 @@
<!DOCTYPE HTML>
<title>Encoding API Tests</title>
<link rel="stylesheet" href="testharness.js/testharness.css">
<h1>Encoding API Tests</h1>
<script src="testharness.js/testharness.js"></script>
<script src="testharness.js/testharnessreport.js"></script>
<script>
setup({explicit_timeout: true});
// Hide native implementation so polyfill will be used
self.TextEncoder = null;
self.TextDecoder = null;
</script>
<script src="../lib/encoding-indexes.js"></script>
<script src="../lib/encoding.js"></script>
<script src="test-misc.js"></script>
<script src="test-utf.js"></script>
<!-- TODO: test for all single-byte encoding indexes -->
<script src="test-big5.js"></script>
<script src="test-euc-jp.js"></script>
<script src="test-iso-2022-jp.js"></script>
<script src="test-shift_jis.js"></script>
<script src="test-euc-kr.js"></script>
<script src="test-gb18030.js"></script>
<script src="test-x-user-defined.js"></script>


@@ -0,0 +1,28 @@
//
// Externs for Closure Compiler
// https://developers.google.com/closure/compiler/
//
// Usage:
// java -jar compiler.jar \
// --jscomp_warning reportUnknownTypes \
// --warning_level VERBOSE \
// --summary_detail_level 3 \
// --externs util/externs.js \
// lib/encoding.js
//
/**
* @param {string} name
* @return {*}
*/
function require(name) {}
/**
* @type {Object}
*/
var module;
/**
* @type {Object.<string,*>}
*/
module.exports;