Multi-byte characters within the PUA that cause some amount of trouble.

note: PUA (Private Use Area) here means the BMP range U+E000-U+F8FF
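Every code point in this range needs three bytes in UTF-8, which is where the "multi-byte" part of the trouble comes from. A minimal C++ sketch that checks the range and shows the encoding. `is_bmp_pua` and `utf8_encode` are hypothetical helpers for illustration, not taken from any Wolfram codebase.

```cpp
#include <cassert>
#include <string>

// True if cp lies in the BMP Private Use Area, U+E000..U+F8FF.
// (Planes 15 and 16 also hold private-use code points; they are ignored here.)
bool is_bmp_pua(char32_t cp) {
    return cp >= 0xE000 && cp <= 0xF8FF;
}

// Encode a single code point as UTF-8. Surrogates and values above
// U+10FFFF are not rejected; this is only meant to show byte lengths.
std::string utf8_encode(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {  // every BMP PUA character takes this branch
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, U+F761 (\[LeftSkeleton]) encodes as the bytes EF 9D A1. Every character in U+E000-U+EFFF starts with 0xEE, and every one in U+F000-U+F8FF starts with 0xEF.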

Table of contents

Codepoint WL long name
U+F360   \[InvisibleSpace]
U+F380   \[NegativeVeryThinSpace]
U+F39E   \[ImplicitPlus]
U+F51E   \[InlinePart]
U+F527   \[SelectionPlaceholder]
U+F528   \[Placeholder]
U+F760   \[AlignmentMarker]
U+F761   \[LeftSkeleton]
U+F762   \[RightSkeleton]
U+F764   \[AliasDelimiter]
U+F765   \[InvisibleComma]
U+F767   \[ErrorIndicator]
U+F76D   \[InvisibleApplication]
U+F7C0   \)
U+F7C1   \!
U+F7C2   \@
U+F7C5   \%
U+F7C6   \^
U+F7C7   \&
U+F7C8   \*
U+F7C9   \(
U+F7CA   \_
U+F7CB   \+
U+F7CC   \/
U+F7CD   \`
U+F7FC   \[NumberComma]

U+F360 \[InvisibleSpace]

\[InvisibleSpace] U+f360

invisible

not the same thing as U+200B ZERO WIDTH SPACE

Prints as 

should be public: https://bugs.wolfram.com/show?number=42036

U+F380, U+F382, U+F383, U+F384 Negative Spaces

\[NegativeVeryThinSpace] U+f380

invisible

\[NegativeThinSpace] U+f382

invisible

\[NegativeMediumSpace] U+f383

invisible

\[NegativeThickSpace] U+f384

invisible

U+F39E \[ImplicitPlus]

"ImplicitPlus" -> {PunctuationCharacter, 16^^f39e, <|"ASCIIReplacements" -> {"+"}|>},

invisible

not the same as U+2064 INVISIBLE PLUS

U+F3A2 \[COMPATIBILITYNoBreak]

"COMPATIBILITYNoBreak" -> {UnsupportedCharacter, 16^^f3a2, <||>},

invisible

U+F3AD \[AutoSpace]

\[AutoSpace] U+f3ad

invisible

U+F3B1 \[Continuation]

\[Continuation] U+f3b1

U+F3B2 \[RoundSpaceIndicator]

\[RoundSpaceIndicator] U+f3b2

U+F3B3, U+F3B4 Invisible Script Bases

"InvisiblePrefixScriptBase" -> {PunctuationCharacter, 16^^f3b3, <|"ASCIIReplacements" -> {""}|>},

invisible

"InvisiblePostfixScriptBase" -> {PunctuationCharacter, 16^^f3b4, <|"ASCIIReplacements" -> {""}|>},

invisible

U+F3BD, U+F3BE, U+F3BF, U+F3C6 Page Breaks

\[PageBreakAbove] U+f3bd

invisible

\[PageBreakBelow] U+f3be

invisible

\[DiscretionaryPageBreakAbove] U+f3bf

invisible

\[DiscretionaryPageBreakBelow] U+f3c6

invisible
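Many of the characters in these notes are marked invisible, which is a large part of why they cause trouble in logs and source files. A hedged C++ sketch of a lookup a linter might use, built only from the code points these notes mark as invisible (including ones in later sections), so it is not an exhaustive list of invisible WL characters. `kInvisiblePua` and `is_listed_invisible` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// PUA code points marked "invisible" in these notes, sorted ascending
// so that std::binary_search can be used.
constexpr std::array<char32_t, 17> kInvisiblePua = {
    0xF360, 0xF380, 0xF382, 0xF383, 0xF384, 0xF39E, 0xF3A2, 0xF3AD,
    0xF3B3, 0xF3B4, 0xF3BD, 0xF3BE, 0xF3BF, 0xF3C6, 0xF760, 0xF765, 0xF76D,
};

// True if cp is one of the invisible PUA characters listed above.
bool is_listed_invisible(char32_t cp) {
    return std::binary_search(kInvisiblePua.begin(), kInvisiblePua.end(), cp);
}
```

A sorted array keeps the check cheap and makes it obvious which characters were deliberately included.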

U+F51E \[InlinePart]

"InlinePart" -> {UnsupportedCharacter, 16^^f51e, <|("ASCIIReplacements" -> {"@>"})|>},

U+F760 \[AlignmentMarker]

\[AlignmentMarker] U+f760

invisible

Prints as 

U+F761, U+F762 Skeletons

"LeftSkeleton" -> {UninterpretableCharacter, 16^^f761, <|"ASCIIReplacements" -> {"<<"}|>},
"RightSkeleton" -> {UninterpretableCharacter, 16^^f762, <|"ASCIIReplacements" -> {">>"}|>},

The strange problem of seeing Asmall and Bsmall glyphs in the terminal on macOS

Prints as  

(font: SF Mono Regular 11 pt.)

Mentioned here:

http://www.renderx.com/glyphlist-old.txt

https://opensource.apple.com/source/cups/cups-30/data/psglyphs.auto.html

mentions Asmall:

https://tex.stackexchange.com/questions/56594/where-does-xelatex-look-for-smallcaps-shapes-in-fonts/56623

semi-explanation: https://mail-archive.wolfram.com/archive/l-frontend/2021/Jun00/0057.html

https://bugs.wolfram.com/show?number=411375

This is a bug report about how Failure[] objects are formatted, but it touches on a lot of other things.

To reproduce, on the Mac command line, do:

brenton@brenton2maclap compile % /Applications/Mathematica.app/Contents/MacOS/WolframKernel -noprompt -run a=ToString[PacletObject["Foo"]]\;Print[a]\;Exit[]
"No appropriate paclet with name  5  is installed."
brenton@brenton2maclap compile %

On a Mac, at least, the \[LeftSkeleton] and \[RightSkeleton] characters appear as a small A and a small B.

See screenshot

There is a lot going on here:

  • Failure objects are being Shortened
  • -noprompt is setting DefaultPrintForm to InputForm
  • Failure[] is assuming StandardForm
  • PUA characters are being printed

I am running into this because this is how LSPServer is started and how it logs problems, so this is not a contrived scenario. I see these badly formatted Failure[] objects all the time in log files and on the command line.

Anything to help improve the behavior here would be good.
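One way to keep log output readable is to substitute each PUA character's ASCII replacement before printing. A minimal C++ sketch, assuming the first entry of each "ASCIIReplacements" list quoted in these notes, and assuming the skeleton replacements are the ASCII strings "<<" and ">>". `to_ascii_safe` is a hypothetical helper, not something LSPServer or the kernel provides.

```cpp
#include <cassert>
#include <map>
#include <string>

// First ASCII replacement for a few PUA characters, per these notes.
// ASSUMPTION: the skeleton characters map to "<<" and ">>".
const std::map<char32_t, std::string>& ascii_replacements() {
    static const std::map<char32_t, std::string> table = {
        {0xF39E, "+"},    // \[ImplicitPlus]
        {0xF761, "<<"},   // \[LeftSkeleton]
        {0xF762, ">>"},   // \[RightSkeleton]
        {0xF765, ","},    // \[InvisibleComma]
        {0xF767, "^^^"},  // \[ErrorIndicator]
        {0xF76D, "@"},    // \[InvisibleApplication]
    };
    return table;
}

// Replace known PUA characters in well-formed UTF-8 with ASCII.
// All table entries are U+Fxxx, i.e. 3-byte sequences starting with 0xEF;
// every other byte is copied through unchanged.
std::string to_ascii_safe(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        if (b == 0xEF && i + 2 < in.size()) {
            char32_t cp = ((b & 0x0F) << 12) |
                          ((static_cast<unsigned char>(in[i + 1]) & 0x3F) << 6) |
                          (static_cast<unsigned char>(in[i + 2]) & 0x3F);
            auto it = ascii_replacements().find(cp);
            if (it != ascii_replacements().end()) {
                out += it->second;
                i += 3;
                continue;
            }
        }
        out += in[i];
        ++i;
    }
    return out;
}
```

With this substitution, a string containing raw U+F761 and U+F762 around "5" prints as "<<5>>" instead of whatever glyphs the terminal font happens to have at those code points.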

for background:

I started a thread on t-fonts about where the actual small A and small B glyphs are coming from: https://mail-archive.wolfram.com/archive/t-fonts/2021/Jun00/0000.html

https://mail-archive.wolfram.com/archive/l-frontend/2021/Jun00/0043.html

Not just Sublime, though: the default font for Terminal as well, which is SF Mono Regular 11 pt.

For example:

which is a confluence of several issues, and which I have reported as https://bugs.wolfram.com/show?number=411375

I think a fallback mechanism is definitely being used.

SF is installed here: /Applications/Utilities/Terminal.app/Contents/Resources/Fonts

but opening it in the Glyphs application does not show the small A glyph.

So I need to learn more about font fallbacks.

Brenton

----- On Jun 16, 2021, at 11:35 AM, John Fultz <jfultz@wolfram.com> wrote:

Ultimately, you’re asking what font Sublime was displaying these characters in. That’s not necessarily an easy question to answer.

If the specified code point had a glyph in the base font that Sublime was using, then that’s almost certainly where it came from. I.e., if the base font Sublime was using is in the list returned by Ian’s command, then there’s your answer. That font had the glyph, for some reason (why it would have such a glyph in the PUA is a question the font’s designer would have to answer), and Sublime just drew it.

But if the glyph didn’t exist in the base font that Sublime was using, then there’s one of two possibilities…

  • Sublime itself is actively using the PUA in some way. If so, you’d have to talk to the Sublime developers.
  • Either Sublime or the operating system is using some sort of font fallback mechanism. This is not uncommon. It’s what allows us to be able to, e.g., display Japanese characters even when we’re using a font that doesn’t have Japanese glyphs. Windows and Mac both have font fallback mechanisms, and both of their systems are documented…but not well documented. Definitely for the Windows one, there are cases which I’ve not been able to easily comprehend, except in some sort of hand-waving way. Furthermore, it’s entirely possible that Sublime could have created their own font fallback mechanism that overrides the one in the system.

Incidentally, the FE should never be exporting any PUA character in raw form. The FE forcefully converts all PUA characters (and a number of others, too) to long-names, even if you try to save text as UTF-8 or whatever.

-John
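The FE behavior John describes (never writing PUA characters raw) can be approximated outside the FE. A hedged C++ sketch: it rewrites each BMP PUA character as `\[LongName]` when the name is in a small illustrative table, and otherwise falls back to the WL four-hex-digit escape `\:xxxx`. The table, the fallback, and the helper names `escape_pua` and `escape_pua_text` are assumptions for illustration, not the FE's actual implementation.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>

// A few long names from these notes; a real table would be much larger.
const std::map<char32_t, std::string>& long_names() {
    static const std::map<char32_t, std::string> table = {
        {0xF360, "InvisibleSpace"}, {0xF39E, "ImplicitPlus"},
        {0xF761, "LeftSkeleton"},   {0xF762, "RightSkeleton"},
        {0xF764, "AliasDelimiter"}, {0xF765, "InvisibleComma"},
        {0xF767, "ErrorIndicator"}, {0xF76D, "InvisibleApplication"},
    };
    return table;
}

// \[Name] if a long name is known, otherwise the \:xxxx hex escape.
std::string escape_pua(char32_t cp) {
    auto it = long_names().find(cp);
    if (it != long_names().end()) return "\\[" + it->second + "]";
    char buf[8];
    std::snprintf(buf, sizeof buf, "\\:%04x", static_cast<unsigned>(cp));
    return buf;
}

// Escape every BMP PUA character in well-formed UTF-8 text. Such characters
// are 3-byte sequences starting with 0xEE or 0xEF; all other bytes are copied.
std::string escape_pua_text(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        if ((b == 0xEE || b == 0xEF) && i + 2 < in.size()) {
            char32_t cp = ((b & 0x0F) << 12) |
                          ((static_cast<unsigned char>(in[i + 1]) & 0x3F) << 6) |
                          (static_cast<unsigned char>(in[i + 2]) & 0x3F);
            if (cp >= 0xE000 && cp <= 0xF8FF) {
                out += escape_pua(cp);
                i += 3;
                continue;
            }
        }
        out += in[i];
        ++i;
    }
    return out;
}
```

Characters outside the PUA that also start with 0xEF, such as U+FEFF, fail the range check and are copied byte for byte.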

FWIW, I see the small A at 0xF761 using FontForge.

Checking with FontForge helped because I could see that the glyph really was in the SF Mono font that I was looking at!

I don’t think this has anything to do with fallbacks.

And then searching for “small caps” (thanks Brett) gave some more info.

https://en.wikipedia.org/wiki/Small_caps
https://github.com/fontforge/fontforge/issues/2448

Going to Glyphs, I can now see that the glyphs exist, but they are not mapped to any Unicode code point.

It seems like it was a recommendation from Adobe a long time ago to map these characters to the PUA so they could be accessible, but that is no longer the recommendation.

Software such as FontForge, and whatever Terminal.app and Sublime use to render their fonts, is still doing this mapping.

But Glyphs, for example, is not doing the mapping.

Brenton

screenshot 411375_0001.png
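The small A and small B are consistent with the legacy Adobe convention mentioned above, under which the small-cap glyphs Asmall through Zsmall sit at U+F761 through U+F77A (0xF700 plus the ASCII code of the lowercase letter). A minimal C++ sketch of that mapping, assuming that convention; `adobe_smallcap_name` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Legacy Adobe glyph name for a small-cap letter at U+F761..U+F77A
// (ASSUMPTION: the old "Asmall".."Zsmall" PUA assignments).
// Returns an empty string outside that range.
std::string adobe_smallcap_name(char32_t cp) {
    if (cp < 0xF761 || cp > 0xF77A) return "";
    char upper = static_cast<char>('A' + (cp - 0xF761));
    return std::string(1, upper) + "small";
}
```

Under the same assumption, other WL characters in that range, such as \[AliasDelimiter] at U+F764, would presumably also render as small-cap letters in a font that keeps this mapping.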

U+F764 \[AliasDelimiter]

\[AliasDelimiter] U+f764: this is a special letterlike character, but not really

I argue that this should not be serialized in notebooks

It is an input method

Prints as 

what is \[AliasDelimiter]?

\[AliasDelimiter] is strange

UnsupportedCharacter?

something new?

UninterpretableCharacter?

it is currently letterlike

ASK ON l-kernel

kernel suggestion:

make \[AliasDelimiter] uninterpretable

what is \[AliasIndicator]?

FE suggestion: it currently serializes in notebooks

maybe it should not serialize? it is an input mode

scan for \[AliasDelimiter] in layout

in VideoFrames.nb it is the name of the bookmark

/Users/brenton/development/stash/PAC/resourcefunctionhelpers/ResourceFunctionHelpers/Kernel/FunctionParity.wl

\[AliasDelimiter] is in a comment

stray AliasDelimiter: /Users/brenton/development/cvs-development/Mathematica/Documentation/English/System/ReferencePages/Symbols/BatchApplied.nb
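Finding stray occurrences like these can be automated. A minimal C++ sketch, assuming the file has already been read into a string (for example with std::ifstream): it reports byte offsets of \[AliasDelimiter] both as the raw UTF-8 character (EF 9D A4) and as the spelled-out long name. `find_alias_delimiters` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// Byte offsets of every \[AliasDelimiter] in text, raw or spelled out.
std::vector<std::size_t> find_alias_delimiters(const std::string& text) {
    const std::string raw = "\xEF\x9D\xA4";             // U+F764 as UTF-8
    const std::string named = "\\[AliasDelimiter]";      // long-name spelling
    std::vector<std::size_t> hits;
    for (const std::string& needle : {raw, named}) {
        for (std::size_t pos = text.find(needle); pos != std::string::npos;
             pos = text.find(needle, pos + needle.size())) {
            hits.push_back(pos);
        }
    }
    std::sort(hits.begin(), hits.end());  // report in file order
    return hits;
}
```

Searching for both spellings matters because, as John notes, the FE writes PUA characters as long names, while files produced by other tools may contain the raw character.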

talked about in UA meeting, Thu Sep 23, 2021

SW did agree:

\[AliasDelimiter] should be a syntax error

but kept saying “it’s not the most important thing”

U+F765 \[InvisibleComma]

"InvisibleComma" -> {PunctuationCharacter, 16^^f765, <|"ASCIIReplacements" -> {",", ""}|>},

invisible

Prints as 

not the same as U+2063 INVISIBLE SEPARATOR

U+F767 \[ErrorIndicator]

"ErrorIndicator" -> {UninterpretableCharacter, 16^^f767, <|"ASCIIReplacements" -> {"^^^"}|>},

Prints as 

U+F76D \[InvisibleApplication]

"InvisibleApplication" -> {PunctuationCharacter, 16^^f76d, <|"ASCIIReplacements" -> {"@", ""}|>},

invisible

Prints as 

not the same as U+2061 FUNCTION APPLICATION
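These notes repeatedly point out that the invisible WL characters are not the standard Unicode characters they resemble. A hedged C++ sketch of a lookup a converter or linter could use to warn about a look-alike instead of silently substituting one for the other. The pairs are exactly the ones stated in these notes, and `standard_lookalike` is a hypothetical name.

```cpp
#include <cassert>
#include <map>

// Standard Unicode character that a WL PUA character resembles but is
// NOT equivalent to, or 0 if these notes list no such pair.
char32_t standard_lookalike(char32_t pua) {
    static const std::map<char32_t, char32_t> table = {
        {0xF360, 0x200B},  // \[InvisibleSpace] vs ZERO WIDTH SPACE
        {0xF39E, 0x2064},  // \[ImplicitPlus] vs INVISIBLE PLUS
        {0xF765, 0x2063},  // \[InvisibleComma] vs INVISIBLE SEPARATOR
        {0xF76D, 0x2061},  // \[InvisibleApplication] vs FUNCTION APPLICATION
    };
    auto it = table.find(pua);
    return it == table.end() ? 0 : it->second;
}
```

Returning 0 for unlisted characters keeps the table limited to the pairs recorded here rather than implying a general equivalence.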

U+F7C0 - U+F7CD Linear Syntax

\) U+f7c0

\! U+f7c1

\@ U+f7c2

//UNUSED: constexpr codepoint CODEPOINT_LINEARSYNTAX_HASH(0xf7c3);

//UNUSED: constexpr codepoint CODEPOINT_LINEARSYNTAX_DOLLAR(0xf7c4);

\% U+f7c5

\^ U+f7c6

\& U+f7c7

\* U+f7c8

\( U+f7c9

\_ U+f7ca

\+ U+f7cb

\/ U+f7cc

\` U+f7cd

There should be a \ character

describe terrible linear syntax problem here
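The linear syntax list is regular enough to drive a table. A minimal C++ sketch that returns the two-character backslash spelling for each code point in U+F7C0..U+F7CD, treating the two slots marked UNUSED (U+F7C3 and U+F7C4) as having no spelling. `linear_syntax_ascii` is a hypothetical helper. (The first ten entries appear to follow the shifted 0 through 9 keys of a US keyboard.)

```cpp
#include <cassert>
#include <string>

// Character that follows the backslash for U+F7C0..U+F7CD, in order.
// The two '\0' slots are the UNUSED code points U+F7C3 and U+F7C4.
char linear_syntax_char(char32_t cp) {
    static const char table[] = ")!@\0\0%^&*(_+/`";
    if (cp < 0xF7C0 || cp > 0xF7CD) return '\0';
    return table[cp - 0xF7C0];
}

// Two-character ASCII spelling such as "\!" or "\(", or "" if cp is not
// a used linear syntax code point.
std::string linear_syntax_ascii(char32_t cp) {
    char c = linear_syntax_char(cp);
    return c == '\0' ? std::string() : std::string{'\\', c};
}
```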

U+F7FC \[NumberComma]

confusible with COMMA

FE - Kernel difference

troublesome characters

https://bugs.wolfram.com/show?number=172258

Ran into this while fuzz testing. These are the characters that are documented as letterlike, yet result in a RowBox[{"a", "xxx", "b"}] when typed into the FE:

\[Placeholder]

\[SelectionPlaceholder]

For example, typing a\[Dash]b into the FE results in RowBox[{"a", "\[Dash]", "b"}]. Expected: "a\[Dash]b".