# ENSNormalize.cs
Zero-dependency implementation of [ENSIP-15](https://docs.ens.domains/ens-improvement-proposals/ensip-15-normalization-standard) in C#
* Reference Implementation: [@adraffy/ens-normalize.js](https://github.com/adraffy/ens-normalize.js)
* Unicode: `15.0.0`
* Spec Hash: [`962316964553fce6188e25a5166a4c1e906333adf53bdf2964c71dedc0f8e2c8`](https://github.com/ensdomains/docs/blob/master/ens-improvement-proposals/ensip-15/spec.json)
* Passes **100%** [ENSIP-15 Validation Tests](https://github.com/ensdomains/docs/blob/master/ens-improvement-proposals/ensip-15/tests.json)
* Passes **100%** [Unicode Normalization Tests](https://unicode.org/Public/15.0.0/ucd/NormalizationTest.txt)
* Space Efficient: `~58KB .dll` using [Inline Blobs](./ENSNormalize/Blobs.cs) via [make.js](./Compress/make.js)
* Legacy Support: `netstandard1.1`, `net35`, `netcoreapp3.1`
* NuGet Package: [![NuGet version](https://badge.fury.io/nu/ADRaffy.ENSNormalize.svg)](https://badge.fury.io/nu/ADRaffy.ENSNormalize)
```c#
using ADRaffy.ENSNormalize;
ENSNormalize.ENSIP15 // Main Library (global instance)
```
### Primary API [ENSIP15](./ENSNormalize/ENSIP15.cs)
```c#
// string -> string
// throws on invalid names
ENSNormalize.ENSIP15.Normalize("RaFFY🚴.eTh"); // "raffy🚴♂.eth"
// works like Normalize()
ENSNormalize.ENSIP15.Beautify("1⃣2⃣.eth"); // "1⃣2⃣.eth"
```
### Additional [NormDetails](./ENSNormalize/NormDetails.cs) (Experimental)
```c#
// string -> NormDetails
// works like Normalize(), throws on invalid names
NormDetails details = ENSNormalize.ENSIP15.NormalizeDetails("💩ì.a");
// NormDetails members:
//   string Name;                    // normalized name
//   bool PossiblyConfusing;         // if name should be carefully reviewed
//   string GroupDescription;        // group summary for name, e.g. "Latin+Emoji"
//   HashSet<Group> Groups;          // unique groups in name
//   HashSet<EmojiSequence> Emojis;  // unique emoji in name
//   bool HasZWJEmoji;               // if any emoji contain U+200D (zero-width joiner)
```
### Output-based Tokenization [Label](./ENSNormalize/Label.cs)
```c#
// string -> Label[]
// never throws
Label[] labels = ENSNormalize.ENSIP15.Split("💩Raffy.eth_");
// [
// Label {
// Input: [ 128169, 82, 97, 102, 102, 121 ],
// Tokens: [
// OutputToken { Codepoints: [ 128169 ], IsEmoji: true }
// OutputToken { Codepoints: [ 114, 97, 102, 102, 121 ] }
// ],
// Normalized: [ 128169, 114, 97, 102, 102, 121 ],
// Group: Group { Name: "Latin", ... }
// },
// Label {
// Input: [ 101, 116, 104, 95 ],
// Tokens: [
// OutputToken { Codepoints: [ 101, 116, 104, 95 ] }
// ],
// Error: NormException { Kind: "underscore allowed only at start" }
// }
// ]
```
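Because `Split()` never throws, it can be used to collect per-label problems without a `try`/`catch`. The following is a minimal sketch, assuming the `Label.Error` and `NormException.Kind` members shown in the output above, and assuming that `Error` is `null` for valid labels. `SplitSketch` and `CollectErrorKinds` are hypothetical names, not part of the library.

```c#
using System.Collections.Generic;
using ADRaffy.ENSNormalize;

static class SplitSketch
{
    // Hypothetical helper: returns the error kind of every invalid label, in order.
    public static List<string> CollectErrorKinds(string name)
    {
        var kinds = new List<string>();
        foreach (var label in ENSNormalize.ENSIP15.Split(name))
        {
            // Assumption: Error is null when the label normalized successfully.
            if (label.Error != null)
            {
                kinds.Add(label.Error.Kind);
            }
        }
        return kinds;
    }
}
```

For the name used above, `CollectErrorKinds("💩Raffy.eth_")` should contain a single entry, `"underscore allowed only at start"`, matching the second label in the output.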
### Normalization Properties
* [Group](./ENSNormalize/Group.cs) — `ENSIP15.Groups: IList<Group>`
* [EmojiSequence](./ENSNormalize/EmojiSequence.cs) — `ENSIP15.Emojis: IList<EmojiSequence>`
* [Whole](./ENSNormalize/Whole.cs) — `ENSIP15.Wholes: IList<Whole>`
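These collections can be enumerated like any other `IList<T>`. A minimal sketch, assuming `Group` exposes the `Name` member that appears in the tokenization output. `PropertiesSketch` and `GroupNames` are hypothetical names used only for illustration.

```c#
using System.Collections.Generic;
using ADRaffy.ENSNormalize;

static class PropertiesSketch
{
    // Hypothetical helper: copies the name of every known group into a list.
    public static List<string> GroupNames()
    {
        var names = new List<string>();
        foreach (Group g in ENSNormalize.ENSIP15.Groups)
        {
            // Assumption: Name is a string, as suggested by `Group { Name: "Latin", ... }`.
            names.Add(g.Name);
        }
        return names;
    }
}
```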
### Error Handling
All errors are safe to print. [NormException](./ENSNormalize/NormException.cs) `{ Kind: string, Reason: string? }` is the base exception. Functions that accept names as input wrap their exceptions in [InvalidLabelException](./ENSNormalize/InvalidLabelException.cs) `{ Label: string, Error: NormException }` for additional context.
* `"disallowed character"` — [DisallowedCharacterException](./ENSNormalize/DisallowedCharacterException.cs) `{ Codepoint }`
* `"illegal mixture"` — [IllegalMixtureException](./ENSNormalize/IllegalMixtureException.cs) `{ Codepoint, Group, OtherGroup? }`
* `"whole-script confusable"` — [ConfusableException](./ENSNormalize/ConfusableException.cs) `{ Group, OtherGroup }`
* `"empty label"`
* `"duplicate non-spacing marks"`
* `"excessive non-spacing marks"`
* `"leading fenced"`
* `"adjacent fenced"`
* `"trailing fenced"`
* `"leading combining mark"`
* `"emoji + combining mark"`
* `"invalid label extension"`
* `"underscore allowed only at start"`
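The wrapping described above can be handled with an ordinary `try`/`catch`. The following is a minimal sketch that uses only the members listed in this section (`InvalidLabelException.Error`, `NormException.Kind`, `DisallowedCharacterException.Codepoint`). It assumes `Codepoint` is an `int`, as accepted by `SafeCodepoint()`. `ErrorHandlingSketch` and `TryNormalize` are hypothetical names, not part of the library.

```c#
using ADRaffy.ENSNormalize;

static class ErrorHandlingSketch
{
    // Hypothetical helper: returns true and the normalized name on success,
    // or false and a printable description of the failure otherwise.
    public static bool TryNormalize(string name, out string result)
    {
        try
        {
            result = ENSNormalize.ENSIP15.Normalize(name);
            return true;
        }
        catch (InvalidLabelException e)
        {
            NormException error = e.Error; // the underlying cause
            if (error is DisallowedCharacterException)
            {
                // Assumption: Codepoint is an int.
                int cp = ((DisallowedCharacterException)error).Codepoint;
                result = error.Kind + ": " + ENSNormalize.ENSIP15.SafeCodepoint(cp);
            }
            else
            {
                result = error.Kind;
            }
            return false;
        }
    }
}
```

The raw `Label` string is deliberately not echoed here, since it is unnormalized input. The description is built from `Kind` and `SafeCodepoint()` instead.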
### Utilities
Normalize name fragments for substring search:
```c#
// string -> string
// only throws InvalidLabelException w/DisallowedCharacterException
ENSNormalize.ENSIP15.NormalizeFragment("AB--");
ENSNormalize.ENSIP15.NormalizeFragment("..\u0300");
ENSNormalize.ENSIP15.NormalizeFragment("\u03BF\u043E");
// note: Normalize() throws on these
```
Construct safe strings:
```c#
// int -> string
ENSNormalize.ENSIP15.SafeCodepoint(0x303); // "◌̃"
ENSNormalize.ENSIP15.SafeCodepoint(0xFE0F); // "{FE0F}"
// IList<int> -> string
ENSNormalize.ENSIP15.SafeImplode(new int[]{ 0x303, 0xFE0F }); // "◌̃{FE0F}"
```
Determine if a character shouldn't be printed directly:
```c#
// ReadOnlyIntSet (like IReadOnlySet<int>)
ENSNormalize.ENSIP15.ShouldEscape.Contains(0x202E); // RIGHT-TO-LEFT OVERRIDE => true
```
Determine if a character is a combining mark:
```c#
// ReadOnlyIntSet
ENSNormalize.ENSIP15.CombiningMarks.Contains(0x20E3); // COMBINING ENCLOSING KEYCAP => true
```
### Unicode Normalization Forms [NF](./ENSNormalize/NF.cs)
```c#
using ADRaffy.ENSNormalize;
// string -> string
ENSNormalize.NF.NFC("\x65\u0300"); // "\xE8"
ENSNormalize.NF.NFD("\xE8"); // "\x65\u0300"
// IEnumerable<int> -> List<int>
ENSNormalize.NF.NFC(new int[]{ 0x65, 0x300 }); // [0xE8]
ENSNormalize.NF.NFD(new int[]{ 0xE8 }); // [0x65, 0x300]
```