[Java] What happens when you combine support for Unicode, DM encryption, and...

Started by Camel, June 26, 2008, 03:42:08 PM

Camel

So I've just committed a major revision to my bot that adds the ability to send encrypted unicode characters. I've introduced a new ByteArray class, which implicitly does the String->UTF-8 conversion and vice-versa. This allows my encryption modules to work on the UTF-8 encoded byte arrays directly, thus circumventing the issue of worrying about unicode characters.

I've got a chat splitter in my bot; the idea is that if you type a really long line of text into the chat box and hit enter, or if a command has a long response, the core will automatically split it up into multiple SID_CHATCOMMAND messages which are properly formatted.

Here's the catch: non-ASCII unicode characters manifest in UTF-8 byte arrays as byte pairs, trios, or sometimes quads, but in order to calculate how much data to pull out of the buffer, I have to have already converted the unicode string to a UTF-8 byte array, because bnet limits you on the size of the byte array, not the unicode string. This means that the chat splitter can potentially break up a unicode character across two lines of text.
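To make the failure concrete, here is a minimal sketch of what a byte-offset cut does to a multi-byte character. It uses a plain byte[] rather than the bot's ByteArray class, and the class and method names (SplitDemo, naiveHead) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SplitDemo {
    // Cuts the UTF-8 encoding of s at a fixed byte offset and decodes the head.
    // Hypothetical helper; it stands in for a splitter that only counts bytes.
    static String naiveHead(String s, int byteOffset) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] head = Arrays.copyOfRange(utf8, 0, byteOffset);
        return new String(head, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "na\u00efve"; // U+00EF encodes as two bytes: C3 AF
        // Five chars, but six bytes once encoded.
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length);
        // Offset 3 falls between C3 and AF, so the decoder substitutes
        // U+FFFD for the dangling lead byte.
        System.out.println(naiveHead(text, 3));
    }
}
```

The receiving side would see the replacement character at the end of one line and another broken byte at the start of the next.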

Any ideas on how to work around this situation?

<Camel> i said what what
<Blaze> in the butt
<Camel> you want to do it in my butt?
<Blaze> in my butt
<Camel> let's do it in the butt
<Blaze> Okay!

Chavo

split before you convert the string (even if it means doing so outside your normal message splitter)
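One way this suggestion could be read, as a rough sketch: walk the string code point by code point, add up how many bytes each one will need in UTF-8, and cut just before the byte budget would be exceeded. BudgetSplitter, utf8Length and split are hypothetical names, and maxBytes stands in for whatever per-piece byte limit the bot computes after accounting for its prefix and cryptos.

```java
import java.util.ArrayList;
import java.util.List;

class BudgetSplitter {
    // Number of bytes UTF-8 needs for one code point.
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    // Splits text into pieces whose UTF-8 encodings are each at most
    // maxBytes long, cutting only between code points.
    static List<String> split(String text, int maxBytes) {
        if (maxBytes < 4) throw new IllegalArgumentException("maxBytes must be >= 4");
        List<String> pieces = new ArrayList<>();
        int start = 0; // index of the first char of the current piece
        int used = 0;  // bytes the current piece needs so far
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int width = utf8Length(cp);
            if (used + width > maxBytes) {
                // The next code point would not fit; close the piece here.
                pieces.add(text.substring(start, i));
                start = i;
                used = 0;
            }
            used += width;
            i += Character.charCount(cp);
        }
        if (start < text.length()) pieces.add(text.substring(start));
        return pieces;
    }
}
```

With this approach the encoded length never has to be known up front; each piece is converted to bytes only after it has been cut, so pure-ASCII text still gets the full budget.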

Camel

I can't do that, because I don't know how long the unicode string will be once utf8 encoded. In the worst case, it will triple in size, but it would be foolish to limit the user to 66 characters.

This is roughly what I have, in very very simplified form

sendChat(prefix="/w bnu-camel ", text="some really long unicode string", crypto=DM_ENCRYPTION) {
  int MAX_CHAT_LENGTH = 200; // just for shits
  int length_to_pull_from_buffer = MAX_CHAT_LENGTH - prefix.toUtf8.length;
  if(crypto == DM_ENCRYPTION)
    length_to_pull_from_buffer = (length_to_pull_from_buffer - 1) / 2; // DM has a prefix and doubles length

  for(int i = 0; i < text.length; i += length_to_pull_from_buffer) {
    part = prefix + encrypt(text.substr(i, length_to_pull_from_buffer));
  }
}


Camel

Expanded:
private void enqueueChat(ByteArray prefix, ByteArray text, int priority) {
    // Split up the text into appropriately sized pieces
    int pieceSize = MAX_CHAT_LENGTH;
    if(prefix != null)
        pieceSize -= prefix.length();
    if(enabledCryptos != 0) {
        if((enabledCryptos & GenericCrypto.CRYPTO_REVERSE) != 0)
            pieceSize--; // Reverse has a prefix
        if((enabledCryptos & GenericCrypto.CRYPTO_MC) != 0)
            pieceSize--; // MC has a prefix
        if((enabledCryptos & GenericCrypto.CRYPTO_DM) != 0)
            pieceSize = (pieceSize - 1) / 2; // DM doubles in size and has a prefix
        if((enabledCryptos & GenericCrypto.CRYPTO_HEX) != 0)
            pieceSize = (pieceSize - 1) / 2; // Hex doubles in size and has a prefix
        if((enabledCryptos & GenericCrypto.CRYPTO_BASE64) != 0)
            pieceSize = (pieceSize - 1) * 3 / 4; // B64 increases 33% and has a prefix
    }

    ChatQueue cq = profile.getChatQueue();
    for(int i = 0; i < text.length(); i += pieceSize) {
        ByteArray piece = text.substring(i);
        if(i > 0) {
            // This is not the first piece; prepend ellipsis
            piece = new ByteArray("...").concat(piece);
            i -= 3;
        }
        if(piece.length() > pieceSize) {
            // This is not the last piece; append ellipsis
            piece = piece.substring(0, pieceSize - 3).concat("...".getBytes());
            i -= 3;
        }

        // Cryptos
        if(enabledCryptos != 0)
            piece = GenericCrypto.encode(piece, enabledCryptos);

        // Prepend the prefix
        if(prefix != null)
            piece = prefix.concat(piece);

        cq.enqueue(this, piece, priority);
    }
}

Camel

I had an idea; ByteArrayEx extends ByteArray, and keeps track of the width of each unicode char, thus allowing the splitter to obtain a hint about where it's safe to split.
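A possible alternative to storing per-character widths, sketched here under the assumption that the buffer holds valid, not-yet-encrypted UTF-8: every continuation byte in UTF-8 has the bit pattern 10xxxxxx, so the splitter can step back from its proposed cut until it lands on a byte that starts a character. This works on a plain byte[] since the ByteArray API isn't shown here; Utf8Boundary and safeCut are hypothetical names.

```java
class Utf8Boundary {
    // Returns the largest index <= limit at which data can be cut without
    // splitting a multi-byte UTF-8 sequence. Assumes data is valid UTF-8.
    static int safeCut(byte[] data, int limit) {
        if (limit >= data.length) return data.length;
        int cut = limit;
        // A byte of the form 10xxxxxx continues the previous character,
        // so a piece may not start on it; back up to the lead byte.
        while (cut > 0 && (data[cut] & 0xC0) == 0x80) cut--;
        return cut;
    }
}
```

Since a UTF-8 sequence is at most four bytes long, the loop backs up at most three positions, and the piece simply comes out a few bytes shorter than the limit when the cut would have landed mid-character.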

Camel

Quote from: Camel on June 27, 2008, 03:19:07 AM
I had an idea; ByteArrayEx extends ByteArray, and keeps track of the width of each unicode char, thus allowing the splitter to obtain a hint about where it's safe to split.

I got about halfway into considering how ugly this solution would be before I marked ByteArray as a final class.

Any other ideas? :)

Joe

I swear this wasn't supposed to look like VB. But, what's wrong with the idea of doing it like this?

Const MaxLength = 200 //i did it for the lulz?

Proc SendText(Utf8Text)
    If Length(Utf8Text) < MaxLength Then Enqueue(ToUnicode(Utf8Text)); Return

    For I = 0 To Length(Utf8Text) - 1 Step MaxLength
        Enqueue(Substring(ToUnicode(Utf8Text), I, MaxLength))
    Next I
End Proc

Quote from: Camel on June 09, 2009, 04:12:23 PM
I'd personally do as Joe suggests

Quote from: AntiVirus on October 19, 2010, 02:36:52 PM
You might be right about that, Joe.


Camel

That doesn't prevent a utf8 encoded unicode character from being split in half

[edit] and, additionally, is exactly what i already have :)

MyndFyre

Quote from: Joe on January 23, 2011, 11:47:54 PM
I have a programming folder, and I have nothing of value there

Running with Code has a new home!

Quote from: Rule on May 26, 2009, 02:02:12 PM
Our species really annoys me.
