Troublesome characters: PUA
Multi-byte characters within the PUA (Private Use Area) that cause some amount of trouble.
note: PUA is U+E000-U+F8FF
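A quick way to find these characters in a string or file is to scan for code points in that range. The following is only a sketch: the helper names `is_bmp_pua` and `find_pua` are made up for this page and are not part of any Wolfram tool.

```python
# Hypothetical helpers: locate characters in the Basic Multilingual Plane
# Private Use Area, U+E000 through U+F8FF.
PUA_FIRST = 0xE000
PUA_LAST = 0xF8FF


def is_bmp_pua(codepoint: int) -> bool:
    """Return True if codepoint lies in U+E000-U+F8FF."""
    return PUA_FIRST <= codepoint <= PUA_LAST


def find_pua(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every PUA character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if is_bmp_pua(ord(ch))
    ]


if __name__ == "__main__":
    # \[InvisibleSpace] (U+F360) followed later by ZERO WIDTH SPACE (U+200B)
    sample = "a\uf360b\u200bc"
    print(find_pua(sample))  # only the character at index 1 is in the PUA
```

U+200B is outside the range, so it is not reported even though it is also invisible.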
Table of contents
Codepoint | Description | WL |
---|---|---|
U+F360 | \[InvisibleSpace] | |
U+F380 | \[NegativeVeryThinSpace] | |
U+F39E | \[ImplicitPlus] | |
U+F51E | \[InlinePart] | |
U+F527 | \[SelectionPlaceholder] | |
U+F528 | \[Placeholder] | |
U+F760 | \[AlignmentMarker] | |
U+F761 | \[LeftSkeleton] | |
U+F762 | \[RightSkeleton] | |
U+F764 | \[AliasDelimiter] | |
U+F765 | \[InvisibleComma] | |
U+F767 | \[ErrorIndicator] | |
U+F76D | \[InvisibleApplication] | |
U+F7C0 | \) | |
U+F7C1 | \! | |
U+F7C2 | \@ | |
U+F7C5 | \% | |
U+F7C6 | \^ | |
U+F7C7 | \& | |
U+F7C8 | \* | |
U+F7C9 | \( | |
U+F7CA | \_ | |
U+F7CB | \+ | |
U+F7CC | \/ | |
U+F7CD | \` | |
U+F7FC | \[NumberComma] | |
U+F360 \[InvisibleSpace]
\[InvisibleSpace] U+f360
invisible
not the same thing as U+200B ZERO WIDTH SPACE
Prints as
should be public: https://bugs.wolfram.com/show?number=42036
U+F380, U+F382, U+F383, U+F384 Negative Spaces
\[NegativeVeryThinSpace] U+f380
invisible
\[NegativeThinSpace] U+f382
invisible
\[NegativeMediumSpace] U+f383
invisible
\[NegativeThickSpace] U+f384
invisible
U+F39E \[ImplicitPlus]
"ImplicitPlus" -> {PunctuationCharacter, 16^^f39e, <|"ASCIIReplacements" -> {"+"}|>},
invisible
not the same as U+2064 INVISIBLE PLUS
U+F3A2 \[COMPATIBILITYNoBreak]
"COMPATIBILITYNoBreak" -> {UnsupportedCharacter, 16^^f3a2, <||>},
invisible
U+F3AD \[AutoSpace]
\[AutoSpace] U+f3ad
invisible
U+F3B1 \[Continuation]
\[Continuation] U+f3b1
U+F3B2 \[RoundSpaceIndicator]
\[RoundSpaceIndicator] U+f3b2
U+F3B3, U+F3B4 Invisible Script Bases
"InvisiblePrefixScriptBase" -> {PunctuationCharacter, 16^^f3b3, <|"ASCIIReplacements" -> {""}|>},
invisible
"InvisiblePostfixScriptBase" -> {PunctuationCharacter, 16^^f3b4, <|"ASCIIReplacements" -> {""}|>},
invisible
U+F3BD, U+F3BE, U+F3BF, U+F3C6 Page Breaks
\[PageBreakAbove] U+f3bd
invisible
\[PageBreakBelow] U+f3be
invisible
\[DiscretionaryPageBreakAbove] U+f3bf
invisible
\[DiscretionaryPageBreakBelow] U+f3c6
invisible
U+F51E \[InlinePart]
"InlinePart" -> {UnsupportedCharacter, 16^^f51e, <|("ASCIIReplacements" -> {"@>"})|>},
U+F760 \[AlignmentMarker]
\[AlignmentMarker] U+f760
INVISIBLE
Prints as
U+F761, U+F762 Skeletons
"LeftSkeleton" -> {UninterpretableCharacter, 16^^f761, <|"ASCIIReplacements" -> {"<<"}|>},
"RightSkeleton" -> {UninterpretableCharacter, 16^^f762, <|"ASCIIReplacements" -> {">>"}|>},
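The "ASCIIReplacements" entries quoted on this page suggest one way to keep raw PUA characters out of terminals and log files: substitute the ASCII form before printing. The sketch below is an illustration only, not what the kernel or CodeParser actually does. The name `to_ascii` is hypothetical, the table holds only the entries quoted on this page, and it assumes the skeleton replacements are the ASCII pairs `<<` and `>>`. \[InlinePart] is left out because its replacement appears parenthesized in the entry quoted above.

```python
# ASCII replacements copied from the "ASCIIReplacements" entries on this page.
# Where an entry lists several replacements, the first one is used.
# Assumption: the skeleton replacements are the ASCII pairs "<<" and ">>".
ASCII_REPLACEMENTS = {
    0xF39E: "+",    # \[ImplicitPlus]
    0xF3B3: "",     # \[InvisiblePrefixScriptBase]
    0xF3B4: "",     # \[InvisiblePostfixScriptBase]
    0xF761: "<<",   # \[LeftSkeleton]
    0xF762: ">>",   # \[RightSkeleton]
    0xF765: ",",    # \[InvisibleComma]
    0xF767: "^^^",  # \[ErrorIndicator]
    0xF76D: "@",    # \[InvisibleApplication]
}


def to_ascii(text: str) -> str:
    """Replace the PUA characters listed above; leave everything else alone."""
    # str.translate accepts a dict from code point to replacement string.
    return text.translate(ASCII_REPLACEMENTS)


if __name__ == "__main__":
    # A shortened expression containing raw skeleton characters.
    print(to_ascii("Failure[\uf7615\uf762]"))
```

Characters that are not in the table pass through unchanged, so this is safe to apply to arbitrary log text.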
The weird problem of seeing Asmall and Bsmall in the terminal on MacOS
Prints as
SF Mono Regular 11 pt.
Mentioned here:
http://www.renderx.com/glyphlist-old.txt
https://opensource.apple.com/source/cups/cups-30/data/psglyphs.auto.html
mentions Asmall:
https://tex.stackexchange.com/questions/56594/where-does-xelatex-look-for-smallcaps-shapes-in-fonts/56623
semi-explanation: https://mail-archive.wolfram.com/archive/l-frontend/2021/Jun00/0057.html
https://bugs.wolfram.com/show?number=411375
This is a bug report about how Failure[] objects are formatted, but it touches on a lot of other things.
To reproduce, on a Mac command-line, do:
brenton@brenton2maclap compile % /Applications/Mathematica.app/Contents/MacOS/WolframKernel -noprompt -run a=ToString[PacletObject["Foo"]]\;Print[a]\;Exit[]
“No appropriate paclet with name 5 is installed.”
brenton@brenton2maclap compile %
On a Mac at least, the \[LeftSkeleton] and \[RightSkeleton] characters appear as a small A and a small B.
See screenshot
There is a lot going on here.
Failure objects are being Shortened
-noprompt is setting DefaultPrintForm to InputForm
Failure[] is assuming StandardForm
printing PUA characters
I am running into this because this is how LSPServer is started and how it logs problems, so this is not contrived or anything. I see these badly formatted Failure[] objects all the time in log files and the command-line.
Anything to help improve the behavior here would be good.
for background:
I started a thread on t-fonts about where the actual small A and small B glyphs are coming from: https://mail-archive.wolfram.com/archive/t-fonts/2021/Jun00/0000.html
https://mail-archive.wolfram.com/archive/l-frontend/2021/Jun00/0043.html
Not just Sublime, though; the default font for Terminal, SF Mono Regular 11, does this as well.
For example:
which is a confluence of several issues that I have reported as: https://bugs.wolfram.com/show?number=411375
I think a fallback mechanism is definitely being used.
SF is installed here: /Applications/Utilities/Terminal.app/Contents/Resources/Fonts
but opening it in the Glyphs application does not show the small A glyph.
So I need to learn more about font fallbacks.
Brenton
----- On Jun 16, 2021, at 11:35 AM, John Fultz jfultz@wolfram.com wrote:
Ultimately, you’re asking what font Sublime was displaying these characters in. That’s not necessarily an easy question to answer.
If the specified code point had a glyph in the base font that Sublime was using, then that’s almost certainly where it came from. I.e., if the base font Sublime was using is in the list returned by Ian’s command, then there’s your answer. That font had the glyph, for some reason (why it would have such a glyph in the PUA is a question the font’s designer would have to answer), and Sublime just drew it.
But if the glyph didn’t exist in the base font that Sublime was using, then there’s one of two possibilities…
- Sublime itself is actively using the PUA in some way. If so, you’d have to talk to the Sublime developers.
- Either Sublime or the operating system is using some sort of font fallback mechanism. This is not uncommon. It’s what allows us to be able to, e.g., display Japanese characters even when we’re using a font that doesn’t have Japanese glyphs. Windows and Mac both have font fallback mechanisms, and both of their systems are documented…but not well documented. Definitely for the Windows one, there are cases which I’ve not been able to easily comprehend, except in some sort of hand-waving way. Furthermore, it’s entirely possible that Sublime could have created their own font fallback mechanism that overrides the one in the system.
Incidentally, the FE should never be exporting any PUA character in raw form. The FE forcefully converts all PUA characters (and a number of others, too) to long-names, even if you try to save text as UTF-8 or whatever.
-John
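The conversion John describes, writing PUA characters as long names instead of raw code points, can be approximated outside the FE. The sketch below is not the FE's implementation: `escape_pua` is a hypothetical name, and the table holds only the long names listed at the top of this page. PUA characters without a known long name are written with the Wolfram Language `\:xxxx` four-hex-digit escape.

```python
import re

# Code point -> long name, taken from the table at the top of this page.
LONG_NAMES = {
    0xF360: "InvisibleSpace",
    0xF380: "NegativeVeryThinSpace",
    0xF39E: "ImplicitPlus",
    0xF51E: "InlinePart",
    0xF527: "SelectionPlaceholder",
    0xF528: "Placeholder",
    0xF760: "AlignmentMarker",
    0xF761: "LeftSkeleton",
    0xF762: "RightSkeleton",
    0xF764: "AliasDelimiter",
    0xF765: "InvisibleComma",
    0xF767: "ErrorIndicator",
    0xF76D: "InvisibleApplication",
    0xF7FC: "NumberComma",
}

# Matches any single character in U+E000-U+F8FF.
_PUA = re.compile("[\ue000-\uf8ff]")


def escape_pua(text: str) -> str:
    """Write known PUA characters as \\[Name] and unknown ones as \\:xxxx."""
    def repl(match: re.Match) -> str:
        cp = ord(match.group())
        name = LONG_NAMES.get(cp)
        return f"\\[{name}]" if name else f"\\:{cp:04x}"

    return _PUA.sub(repl, text)


if __name__ == "__main__":
    print(escape_pua("\uf7615\uf762"))
```

Unlike replacing with ASCII look-alikes, this form keeps the output unambiguous: a reader can tell a real `<<` from a skeleton character.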
FWIW, I see the small A at 0xF761 using FontForge.
Checking with FontForge helped because I could see that the glyph really was in the SF Mono font that I was looking at!
I don’t think this has anything to do with fallbacks.
And then searching for “small caps” (thanks Brett) gave some more info.
https://en.wikipedia.org/wiki/Small_caps
https://github.com/fontforge/fontforge/issues/2448
Going to Glyphs, I can now see that the glyphs exist, but they are not mapped to a Unicode code point.
It seems like it was a recommendation from Adobe a long time ago to map these characters to the PUA so they could be accessible, but that is no longer the recommendation.
Software such as FontForge, and whatever Terminal.app and Sublime use to render their fonts, is still doing this mapping.
But e.g. Glyphs is not doing the mapping.
Brenton
screenshot 411375_0001.png
U+F764 \[AliasDelimiter]
\[AliasDelimiter] U+f764: this is special letterlike, but not really
I argue that this should not be serialized in notebooks
It is an input method
Prints as
what is \[AliasDelimiter]?
\[AliasDelimiter] is strange
UnsupportedCharacter?
something new?
UninterpretableCharacter?
it is currently letterlike
ASK ON l-kernel
kernel suggestion:
make \[AliasDelimiter] uninterpretable
what is \[AliasIndicator]?
FE suggestion: it currently serializes in notebooks
maybe it should not serialize? it is an input mode
scan for \[AliasDelimiter] in layout
in VideoFrames.nb it is the name of the bookmark
/Users/brenton/development/stash/PAC/resourcefunctionhelpers/ResourceFunctionHelpers/Kernel/FunctionParity.wl
\[AliasDelimiter] is in a comment
stray AliasDelimiter: /Users/brenton/development/cvs-development/Mathematica/Documentation/English/System/ReferencePages/Symbols/BatchApplied.nb
talked about in UA meeting thurs sep 23 2021
SW did agree:
\[AliasDelimiter] should be a syntax error
but kept saying “it’s not the most important thing”
U+F765 \[InvisibleComma]
"InvisibleComma" -> {PunctuationCharacter, 16^^f765, <|"ASCIIReplacements" -> {",", ""}|>},
invisible
Prints as
not the same as U+2063 INVISIBLE SEPARATOR
U+F767 \[ErrorIndicator]
"ErrorIndicator" -> {UninterpretableCharacter, 16^^f767, <|"ASCIIReplacements" -> {"^^^"}|>},
Prints as
U+F76D \[InvisibleApplication]
"InvisibleApplication" -> {PunctuationCharacter, 16^^f76d, <|"ASCIIReplacements" -> {"@", ""}|>},
invisible
Prints as
not the same as U+2061 FUNCTION APPLICATION
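Several entries on this page note that a Wolfram PUA character is distinct from a similar standard Unicode character. One way to see the difference is through Python's `unicodedata` module: the standard characters have a name and the general category Cf (format), while the PUA code points have no name and the category Co (private use). The `PAIRS` list and the `describe` helper are illustrative names, not part of any existing tool.

```python
import unicodedata

# (Wolfram PUA code point, similar standard Unicode code point)
PAIRS = [
    (0xF360, 0x200B),  # \[InvisibleSpace]       vs ZERO WIDTH SPACE
    (0xF39E, 0x2064),  # \[ImplicitPlus]         vs INVISIBLE PLUS
    (0xF765, 0x2063),  # \[InvisibleComma]       vs INVISIBLE SEPARATOR
    (0xF76D, 0x2061),  # \[InvisibleApplication] vs FUNCTION APPLICATION
]


def describe(codepoint: int) -> tuple[str, str]:
    """Return (general category, Unicode name or '<unnamed>')."""
    ch = chr(codepoint)
    return unicodedata.category(ch), unicodedata.name(ch, "<unnamed>")


if __name__ == "__main__":
    for pua, standard in PAIRS:
        print(f"U+{pua:04X}", describe(pua))
        print(f"U+{standard:04X}", describe(standard))
```

Because the PUA code points carry no standard meaning, software that knows nothing about Wolfram Language cannot be expected to treat them as invisible.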
U+F7C0 - U+F7CD Linear Syntax
\) U+f7c0
\! U+f7c1
\@ U+f7c2
//UNUSED: constexpr codepoint CODEPOINT_LINEARSYNTAX_HASH(0xf7c3);
//UNUSED: constexpr codepoint CODEPOINT_LINEARSYNTAX_DOLLAR(0xf7c4);
\% U+f7c5
\^ U+f7c6
\& U+f7c7
\* U+f7c8
\( U+f7c9
\_ U+f7ca
\+ U+f7cb
\/ U+f7cc
\` U+f7cd
There should be a \
describe terrible linear syntax problem here
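For reference, the list above can be written as a lookup table from code point to linear syntax escape. This is only a transcription of the list for illustration; `LINEAR_SYNTAX` and `linear_syntax_escape` are hypothetical names, and the two code points marked UNUSED above are deliberately absent.

```python
from typing import Optional

# Code point -> linear syntax escape, transcribed from the list above.
# U+F7C3 and U+F7C4 are marked UNUSED in the list and are omitted here.
LINEAR_SYNTAX = {
    0xF7C0: r"\)",
    0xF7C1: r"\!",
    0xF7C2: r"\@",
    0xF7C5: r"\%",
    0xF7C6: r"\^",
    0xF7C7: r"\&",
    0xF7C8: r"\*",
    0xF7C9: r"\(",
    0xF7CA: r"\_",
    0xF7CB: r"\+",
    0xF7CC: r"\/",
    0xF7CD: r"\`",
}


def linear_syntax_escape(ch: str) -> Optional[str]:
    """Return the two-character escape for a linear syntax code point, or None."""
    return LINEAR_SYNTAX.get(ord(ch))


if __name__ == "__main__":
    for cp in range(0xF7C0, 0xF7CE):
        print(f"U+{cp:04X}", linear_syntax_escape(chr(cp)))
```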
U+F7FC \[NumberComma]
confusable with COMMA
FE - Kernel difference
troublesome characters
https://bugs.wolfram.com/show?number=172258
Ran into this while fuzz testing. These are the characters that are documented as letter-like, yet result in a RowBox[{"a","xxx","b"}] when typed into the FE:
\[Placeholder]
\[SelectionPlaceholder]
For example, typing a\[Dash]b into the FE results in RowBox[{"a", "\[Dash]", "b"}]. Expected: "a\[Dash]b"