Author Topic: [Java] What happens when you combine support for Unicode, DM encryption, and... (Read 4285 times)

Camel · « **on:** June 26, 2008, 03:42:08 pm »

So I've just committed a major revision to my bot so that it can send encrypted unicode characters. I've introduced a new ByteArray class, which implicitly does the String->UTF-8 conversion and vice-versa. This allows my encryption modules to work on the UTF-8 encoded byte arrays directly, thus circumventing the issue of worrying about unicode characters.

I've got a chat splitter in my bot; the idea is that if you type a really long line of text in to the chat box and hit enter, or if a command has a long response, the core will automatically split it up in to multiple SID_CHATCOMMAND messages which are properly formatted.

Here's the catch: non-ASCII unicode characters manifest in UTF-8 byte arrays as sequences of two, three, or sometimes four bytes, but in order to calculate how much data to pull out of the buffer, I have to have already converted the unicode characters to a UTF-8 byte array, because bnet limits you on the size of the byte array, not the unicode string. This means that the chat splitter can potentially break up a unicode character across two lines of text.
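
To make the catch concrete, here is a minimal sketch in plain Java, using a raw `byte[]` rather than the bot's ByteArray class (whose API isn't shown in this thread). The class and method names are made up for illustration. It shows the character count and byte count diverging, and a byte offset that lands inside a character.

```java
import java.nio.charset.StandardCharsets;

class Utf8SplitDemo {
    // Returns true if cutting the UTF-8 encoding of text at byte offset `cut`
    // would land in the middle of a multi-byte character.
    public static boolean cutsInsideCharacter(String text, int cut) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        if (cut <= 0 || cut >= utf8.length) {
            return false; // cutting at either end never splits a character
        }
        // In UTF-8, every continuation byte has the bit pattern 10xxxxxx.
        return (utf8[cut] & 0xC0) == 0x80;
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo"; // U+00E9 takes two bytes in UTF-8
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // prints "5 chars, 6 bytes"
        System.out.println(text.length() + " chars, " + utf8.length + " bytes");
        // prints "true": byte offset 2 is the second byte of U+00E9
        System.out.println(cutsInsideCharacter(text, 2));
    }
}
```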

Any ideas on how to work around this situation?

Chavo · « **Reply #1 on:** June 26, 2008, 05:03:20 pm »

split before you convert the string (even if it means doing so outside your normal message splitter)
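
One possible reading of this suggestion, sketched under assumptions: let `java.nio.charset.CharsetEncoder` do the conversion into a fixed-size output buffer, so that the split happens on the String side but is still bounded by the encoded byte length. The encoder stops with an overflow result when the next character does not fit, so it never emits a partial character. `EncodeThenSplit`, `encodeInPieces`, and `maxBytes` are hypothetical names, and this ignores the prefix, ellipsis, and crypto size adjustments the real splitter would need.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class EncodeThenSplit {
    // Encodes text to UTF-8 in pieces of at most maxBytes bytes each,
    // never cutting a character in half.
    public static List<byte[]> encodeInPieces(String text, int maxBytes) {
        if (maxBytes < 4) {
            // A single code point can need up to four bytes in UTF-8.
            throw new IllegalArgumentException("maxBytes must be at least 4");
        }
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.wrap(text);
        List<byte[]> pieces = new ArrayList<>();
        while (in.hasRemaining()) {
            ByteBuffer out = ByteBuffer.allocate(maxBytes);
            encoder.reset();
            // Encodes as many whole characters as fit, then reports overflow.
            CoderResult result = encoder.encode(in, out, true);
            if (result.isError()) {
                // e.g. an unpaired surrogate in the input string
                throw new IllegalArgumentException("cannot encode input: " + result);
            }
            out.flip();
            byte[] piece = new byte[out.remaining()];
            out.get(piece);
            pieces.add(piece);
        }
        return pieces;
    }
}
```

For example, `encodeInPieces("abc\u00e9", 4)` should give two pieces: the three bytes of "abc", then the two bytes of U+00E9, because the two-byte character does not fit in the one byte left over.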

Camel · « **Reply #2 on:** June 27, 2008, 03:04:46 am »

I can't do that, because I don't know how long the unicode string will be once utf8 encoded. In the worst case, it will triple in size, but it would be foolish to limit the user to 66 characters.

This is roughly what I have, in very very simplified form

sendChat(prefix="/w bnu-camel ", text="some really long unicode string", crypto=DM_ENCRYPTION) {
int MAX_CHAT_LENGTH = 200; // just for shits
int length_to_pull_from_buffer = MAX_CHAT_LENGTH - prefix.toUtf8.length;
if(crypto == DM_ENCRYPTION)
length_to_pull_from_buffer = (length_to_pull_from_buffer - 1) / 2; // DM has a prefix and doubles length

for(int i = 0; i < text.length; i += length_to_pull_from_buffer) {
part = prefix + encrypt(text.substr(i, length_to_pull_from_buffer));
}
}

Camel · « **Reply #3 on:** June 27, 2008, 03:10:33 am »

Expanded:

Code:

	private void enqueueChat(ByteArray prefix, ByteArray text, int priority) {
		//Split up the text in to appropriate sized pieces
		int pieceSize = MAX_CHAT_LENGTH;
		if(prefix != null)
			pieceSize -= prefix.length();
		if(enabledCryptos != 0) {
			if((enabledCryptos & GenericCrypto.CRYPTO_REVERSE) != 0)
				pieceSize--; // Reverse has a prefix
			if((enabledCryptos & GenericCrypto.CRYPTO_MC) != 0)
				pieceSize--; // MC has a prefix
			if((enabledCryptos & GenericCrypto.CRYPTO_DM) != 0)
				pieceSize = (pieceSize - 1) / 2; // DM doubles in size and has a prefix
			if((enabledCryptos & GenericCrypto.CRYPTO_HEX) != 0)
				pieceSize = (pieceSize - 1) / 2; // Hex doubles in size and has a prefix
			if((enabledCryptos & GenericCrypto.CRYPTO_BASE64) != 0)
				pieceSize = (pieceSize - 1) * 3 / 4; // B64 increases 33% and has a prefix
		}

		ChatQueue cq = profile.getChatQueue();
		for(int i = 0; i < text.length(); i += pieceSize) {
			ByteArray piece = text.substring(i);
			if(i > 0) {
				// This is not the first piece; prepend ellipsis
				piece = new ByteArray("...").concat(piece);
				i -= 3;
			}
			if(piece.length() > pieceSize) {
				// This is not the last piece; append ellipsis
				piece = piece.substring(0, pieceSize - 3).concat("...".getBytes());
				i -= 3;
			}

			// Cryptos
			if(enabledCryptos != 0)
				piece = GenericCrypto.encode(piece, enabledCryptos);

			// Prepend the prefix
			if(prefix != null)
				piece = prefix.concat(piece);

			cq.enqueue(this, piece, priority);
		}
	}

Camel · « **Reply #4 on:** June 27, 2008, 03:19:07 am »

I had an idea; ByteArrayEx extends ByteArray, and keeps track of the width of each unicode char, thus allowing the splitter to obtain a hint about where it's safe to split.
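
Worth noting as an alternative to tracking widths: UTF-8 is self-synchronizing, so the byte array itself already carries the hint. Every continuation byte has the form 10xxxxxx, so a splitter can back up from the intended cut point until it reaches a byte that starts a character. Below is a rough sketch on a raw `byte[]`, not the actual ByteArray class. `Utf8Splitter`, `split`, and `maxBytes` are hypothetical names, and the ellipsis and crypto handling from the code above is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Utf8Splitter {
    // True for UTF-8 continuation bytes, which always look like 10xxxxxx.
    public static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    // Splits UTF-8 data into pieces of at most maxBytes bytes,
    // moving each cut point back to the nearest character boundary.
    public static List<byte[]> split(byte[] utf8, int maxBytes) {
        if (maxBytes < 4) {
            // A single code point can need up to four bytes in UTF-8.
            throw new IllegalArgumentException("maxBytes must be at least 4");
        }
        List<byte[]> pieces = new ArrayList<>();
        int start = 0;
        while (start < utf8.length) {
            int end = Math.min(start + maxBytes, utf8.length);
            // If the next piece would begin with a continuation byte,
            // the cut is inside a character: back up to its lead byte.
            while (end < utf8.length && end > start && isContinuation(utf8[end])) {
                end--;
            }
            if (end == start) {
                // Only reachable with malformed input; force progress.
                end = Math.min(start + maxBytes, utf8.length);
            }
            pieces.add(Arrays.copyOfRange(utf8, start, end));
            start = end;
        }
        return pieces;
    }
}
```

This only protects individual code points. Sequences that span several code points, such as a base letter followed by a combining mark, can still end up on separate lines.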

Camel · « **Reply #5 on:** June 30, 2008, 09:16:35 am »

Quote from: Camel on June 27, 2008, 03:19:07 am

I had an idea; ByteArrayEx extends ByteArray, and keeps track of the width of each unicode char, thus allowing the splitter to obtain a hint about where it's safe to split.

I got about half way in to considering how ugly this solution would be before I marked ByteArray as a final class.

Any other ideas?

Joe · « **Reply #6 on:** June 30, 2008, 12:26:59 pm »

I swear this wasn't supposed to look like VB. But, what's wrong with the idea of doing it like this?

Code:

Const MaxLength = 200 //i did it for the lulz?

Proc SendText(Utf8Text)
    If Length(Utf8Text) < MaxLength Then Enqueue(ToUnicode(Utf8Text)); Return

    For I = 0 to Length(Utf8Text);  I+=MaxLength
        Enqueue Substring(ToUnicode(Utf8Text), I, MaxLength)
    Next I
End Proc

Camel · « **Reply #7 on:** June 30, 2008, 01:22:56 pm »

That doesn't prevent a utf8 encoded unicode character from being split in half

[edit] and, additionally, is exactly what i already have

MyndFyre · « **Reply #8 on:** June 30, 2008, 05:09:36 pm »

What if you prepend the byte length of the message?

Camel · « **Reply #9 on:** June 30, 2008, 11:22:40 pm »

Quote from: MyndFyre on June 30, 2008, 05:09:36 pm

What if you prepend the byte length of the message?

That doesn't address the issue either.
