Author Topic: [Java] What happens when you combine support for Unicode, DM encryption, and...  (Read 3759 times)

Offline Camel

  • Hero Member
  • *****
  • Posts: 1703
    • View Profile
    • BNU Bot
So I've just committed a major revision to my bot to enable sending encrypted Unicode characters. I've introduced a new ByteArray class, which implicitly does the String->UTF-8 conversion and vice versa. This allows my encryption modules to work on the UTF-8 encoded byte arrays directly, so they no longer have to worry about Unicode characters.

I've got a chat splitter in my bot; the idea is that if you type a really long line of text into the chat box and hit enter, or if a command has a long response, the core will automatically split it up into multiple properly formatted SID_CHATCOMMAND messages.

Here's the catch: non-ASCII Unicode characters manifest in UTF-8 byte arrays as multi-byte sequences (two or three bytes for most characters, four for those outside the Basic Multilingual Plane), but in order to calculate how much data to pull out of the buffer, I have to have already converted the Unicode characters to a UTF-8 byte array, because bnet limits you on the size of the byte array, not the Unicode string. This means that the chat splitter can potentially break a Unicode character across two lines of text.
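
To make the failure mode concrete, here is a minimal sketch of a byte-based cut landing in the middle of a character. It uses a plain byte[] rather than the bot's ByteArray class, and the class and method names (NaiveSplitDemo, naiveFirstPiece) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class NaiveSplitDemo {
    /** Cuts the first maxBytes bytes off a UTF-8 buffer with no regard for character boundaries. */
    static byte[] naiveFirstPiece(byte[] utf8, int maxBytes) {
        return Arrays.copyOfRange(utf8, 0, Math.min(maxBytes, utf8.length));
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is two bytes in UTF-8: 0xC3 0xA9
        byte[] utf8 = "abc\u00e9".getBytes(StandardCharsets.UTF_8); // 5 bytes in total
        byte[] piece = naiveFirstPiece(utf8, 4);                     // ends on 0xC3, the first half
        String decoded = new String(piece, StandardCharsets.UTF_8);
        // The dangling lead byte cannot be decoded and becomes U+FFFD (replacement character)
        System.out.println(decoded.endsWith("\uFFFD"));
    }
}
```

The receiving side would see a replacement character at the end of one line and another at the start of the next, instead of the original character.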

Any ideas on how to work around this situation?

<Camel> i said what what
<Blaze> in the butt
<Camel> you want to do it in my butt?
<Blaze> in my butt
<Camel> let's do it in the butt
<Blaze> Okay!

Offline Chavo

  • x86
  • Hero Member
  • *****
  • Posts: 2219
  • no u
    • View Profile
    • Chavoland
split before you convert the string (even if it means doing so outside your normal message splitter)
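
One way this suggestion could be sketched: walk the String by code point, add up how many UTF-8 bytes each one will need, and start a new piece whenever the byte budget would be exceeded. The encoded size is never guessed, so nobody gets limited to a worst-case character count. Utf8AwareSplitter and its methods are hypothetical names, and the sketch ignores the ellipsis and crypto overhead.

```java
import java.util.ArrayList;
import java.util.List;

final class Utf8AwareSplitter {
    /** Number of bytes a single code point occupies in UTF-8. */
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    /** Splits text into pieces whose UTF-8 encoding is at most maxBytes long each. */
    static List<String> split(String text, int maxBytes) {
        if (maxBytes < 4)
            throw new IllegalArgumentException("maxBytes must be able to hold any code point (>= 4)");
        List<String> pieces = new ArrayList<>();
        int start = 0; // char index where the current piece begins
        int bytes = 0; // encoded size of the current piece so far
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int width = utf8Length(cp);
            if (bytes + width > maxBytes) {
                // This code point would overflow the piece; close it on a character boundary
                pieces.add(text.substring(start, i));
                start = i;
                bytes = 0;
            }
            bytes += width;
            i += Character.charCount(cp); // 2 for surrogate pairs, 1 otherwise
        }
        if (start < text.length())
            pieces.add(text.substring(start));
        return pieces;
    }
}
```

Each returned piece can then be converted to bytes and encrypted on its own. An unpaired surrogate is counted as three bytes here, which overestimates its encoded size, so the budget is still respected.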

Offline Camel

  • Hero Member
  • *****
  • Posts: 1703
    • View Profile
    • BNU Bot
I can't do that, because I don't know how long the Unicode string will be once UTF-8 encoded. In the worst case, it will triple in size, but it would be foolish to limit the user to 66 characters (200 / 3).

This is roughly what I have, in very very simplified form:

sendChat(prefix="/w bnu-camel ", text="some really long unicode string", crypto=DM_ENCRYPTION) {
  int MAX_CHAT_LENGTH = 200; // just for shits
  int length_to_pull_from_buffer = MAX_CHAT_LENGTH - prefix.toUtf8.length;
  if(crypto == DM_ENCRYPTION)
    length_to_pull_from_buffer = (length_to_pull_from_buffer - 1) / 2; // DM has a prefix and doubles length

  for(int i = 0; i < text.length; i += length_to_pull_from_buffer) {
    part = prefix + encrypt(text.substr(i, length_to_pull_from_buffer));
  }
}


Offline Camel

  • Hero Member
  • *****
  • Posts: 1703
    • View Profile
    • BNU Bot
Expanded:
Code: [Select]
private void enqueueChat(ByteArray prefix, ByteArray text, int priority) {
    //Split up the text in to appropriate sized pieces
    int pieceSize = MAX_CHAT_LENGTH;
    if(prefix != null)
        pieceSize -= prefix.length();
    if(enabledCryptos != 0) {
        if((enabledCryptos & GenericCrypto.CRYPTO_REVERSE) != 0)
            pieceSize--; // Reverse has a prefix
        if((enabledCryptos & GenericCrypto.CRYPTO_MC) != 0)
            pieceSize--; // MC has a prefix
        if((enabledCryptos & GenericCrypto.CRYPTO_DM) != 0)
            pieceSize = (pieceSize - 1) / 2; // DM doubles in size and has a prefix
        if((enabledCryptos & GenericCrypto.CRYPTO_HEX) != 0)
            pieceSize = (pieceSize - 1) / 2; // Hex doubles in size and has a prefix
        if((enabledCryptos & GenericCrypto.CRYPTO_BASE64) != 0)
            pieceSize = (pieceSize - 1) * 3 / 4; // B64 increases 33% and has a prefix
    }

    ChatQueue cq = profile.getChatQueue();
    for(int i = 0; i < text.length(); i += pieceSize) {
        ByteArray piece = text.substring(i);
        if(i > 0) {
            // This is not the first piece; prepend ellipsis
            piece = new ByteArray("...").concat(piece);
            i -= 3;
        }
        if(piece.length() > pieceSize) {
            // This is not the last piece; append ellipsis
            piece = piece.substring(0, pieceSize - 3).concat("...".getBytes());
            i -= 3;
        }

        // Cryptos
        if(enabledCryptos != 0)
            piece = GenericCrypto.encode(piece, enabledCryptos);

        // Prepend the prefix
        if(prefix != null)
            piece = prefix.concat(piece);

        cq.enqueue(this, piece, priority);
    }
}
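
A possible adjustment for a byte-based splitter like the one above relies on a property of UTF-8: every continuation byte has the bit pattern 10xxxxxx, so a cut point can be moved backwards until it no longer lands on one. The sketch below works on a raw byte[] and assumes the buffer is well-formed UTF-8 that has not yet been run through any of the cryptos; Utf8Boundary and safeSplitIndex are hypothetical names, not part of the bot.

```java
final class Utf8Boundary {
    /**
     * Returns the largest index <= limit at which buf can be cut without
     * splitting a UTF-8 sequence. Assumes limit >= 0 and well-formed UTF-8.
     */
    static int safeSplitIndex(byte[] buf, int limit) {
        if (limit >= buf.length)
            return buf.length; // everything fits, nothing to cut
        int i = limit;
        // Continuation bytes look like 10xxxxxx; step back until buf[i] starts a character
        while (i > 0 && (buf[i] & 0xC0) == 0x80)
            i--;
        return i;
    }
}
```

The splitter would then take bytes [0, safeSplitIndex) as the piece and advance by that many bytes rather than by a fixed pieceSize, so a piece may come out up to three bytes shorter than the budget.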

Offline Camel

  • Hero Member
  • *****
  • Posts: 1703
    • View Profile
    • BNU Bot
I had an idea; ByteArrayEx extends ByteArray, and keeps track of the width of each unicode char, thus allowing the splitter to obtain a hint about where it's safe to split.

I got about halfway into considering how ugly this solution would be before I marked ByteArray as a final class.

Any other ideas? :)
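
For reference, the width-tracking idea does not strictly need a subclass. A rough sketch, assuming the original String is still available next to the byte array and contains no unpaired surrogates: compute the byte offset at which each character starts, and let the splitter cut only at those offsets. Utf8Offsets and boundaries are hypothetical names.

```java
final class Utf8Offsets {
    /**
     * Byte offsets, within the UTF-8 encoding of text, at which each code point
     * begins. The final entry is the total encoded length.
     */
    static int[] boundaries(String text) {
        int count = text.codePointCount(0, text.length());
        int[] offsets = new int[count + 1];
        int bytePos = 0;
        int n = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            offsets[n++] = bytePos;
            // UTF-8 width of this code point: 1, 2, 3 or 4 bytes
            bytePos += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            i += Character.charCount(cp);
        }
        offsets[n] = bytePos;
        return offsets;
    }
}
```

Every entry in the returned array is a safe place to split the byte array, so the splitter can pick the largest entry that does not exceed its byte budget.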

Offline Joe

  • B&
  • Moderator
  • Hero Member
  • *****
  • Posts: 10319
  • In Soviet Russia, text read you!
    • View Profile
    • Github
I swear this wasn't supposed to look like VB. But, what's wrong with the idea of doing it like this?

Code: [Select]
Const MaxLength = 200 //i did it for the lulz?

Proc SendText(Utf8Text)
    If Length(Utf8Text) < MaxLength Then Enqueue(ToUnicode(Utf8Text)); Return

    For I = 0 To Length(Utf8Text) - 1 Step MaxLength
        Enqueue Substring(ToUnicode(Utf8Text), I, MaxLength)
    Next I
End Proc
I'd personally do as Joe suggests

You might be right about that, Joe.


Offline Camel

  • Hero Member
  • *****
  • Posts: 1703
    • View Profile
    • BNU Bot
That doesn't prevent a UTF-8 encoded Unicode character from being split in half.

[edit] And, additionally, it's exactly what I already have :)

Offline MyndFyre

  • Boticulator Extraordinaire
  • x86
  • Hero Member
  • *****
  • Posts: 4540
  • The wait is over.
    • View Profile
    • JinxBot :: the evolution in boticulation
What if you prepend the byte length of the message?
I have a programming folder, and I have nothing of value there

Running with Code has a new home!

Our species really annoys me.

Offline Camel

  • Hero Member
  • *****
  • Posts: 1703
    • View Profile
    • BNU Bot
Quote from MyndFyre: What if you prepend the byte length of the message?

That doesn't address the issue either.