Peer-to-peer (P2P) is a buzzword these days. It is an umbrella term for any sort of decentralized communication, from small-scale direct connections on AIM to large-scale file sharing. A Virtual LAN (VLAN) is a logical network that exists on top of an existing network. It is similar to a VPN but is independent of the real network. It's usually used to network a host and multiple virtual machines together, which makes it an invaluable test tool.
Combining the two ideas could yield an invaluable networking tool that transcends network topologies such as NATs and firewalls and yet is independent of dedicated servers. It would allow friends and family to network their machines near and far. A similar tool called Hamachi exists; however, it isn't truly P2P, since it still relies on the vendor's server to coordinate networks. A P2P VLAN also has more powerful implications than transcending network topologies. It can offer other features, such as true anonymity, using multiple peers as one logical tunnel, and encrypting the underlying communication between two nodes.
The concept of a P2P VLAN is based on a very simple and straightforward analogy. Suppose there is a room full of people and suppose each person has friends. Now suppose that each person has a concealed card from a standard 52-card deck. Only the owner of a card knows what it is. Now, if one wanted to pass a message to the owner of the Ace of Diamonds (AD), one would need to know where to send it. To solve this problem, one asks each of one's friends who has the AD, and those friends repeat the process. The owner of the AD can respond that he/she knows who has the AD without revealing that he/she has the AD. Eventually that response reaches the original querier, most likely through multiple friends. Note that one cannot conclude who has the AD, since the response could have been forwarded. Now the original querier can pass a message by having friends relay it. Remember, he/she may have many different routes and can randomize which friend he/she relays a message through. The effect? Nobody knows who has the AD, yet communication can take place. In this analogy, the friends are directly connected peers and the cards represent IP addresses.
To test this idea, I wrote a demonstration that uses the tun(4) pseudo network interface. It initializes a tun interface and then attempts to connect to each peer, up to M peers, in a list file. The demonstration code does not choose an IP; one must manually configure the tun interface. Since our primary concern is routing, I conducted a test with three machines and forced a specific network configuration.
Our three machines:
LIGHTBULB - 172.23.0.1
BLENDER - 172.23.0.2
BOTTLE - 172.23.0.3
I forced a configuration of:
BOTTLE <-> LIGHTBULB <-> BLENDER
I did this by stripping the peer list file from BOTTLE and BLENDER. I chose this network configuration so BLENDER and BOTTLE are forced to route through LIGHTBULB when communicating with each other. This would demonstrate the routing technique described in our analogy above.
In order to coordinate the network, I designed a simple protocol based on battle.net's message format.
/* This header defines various events.
* Protocol format:
* <uint8_t event><uint16_t size><void>
* void is event specific ... all comments below refer to void format.
* The protocol is still in development.
*/
#ifndef PROTO_H
#define PROTO_H
/* Authorization technique.
* peer1 connects to peer2
* peer1 -> uint8_t clientid
* peer2 -> uint8_t clientid
* peer2 -> P2PI_CLIENTINFOS
* peer1 -> P2PI_CLIENTINFOR
*/
/* Other semantics not yet determined */
/* Non-existent Client ID */
#define P2PI_NULLCLIENT 0x00
/* These refer to the event byte */
#define P2PI_CLIENTINFOS 0x01 /* Client information exchange <struct client_info> */
#define P2PI_CLIENTINFOR 0x02 /* Response event <struct client_info> */
#define P2PI_WHOHASS 0x03 /* Route request <in_addr><void> ... void data is for anything arbitrary (e.g. public key) */
#define P2PI_WHOHASR 0x04 /* Route response <in_addr><void> ... void data is for anything arbitrary (e.g. public key) */
#define P2PI_PACKET 0x05 /* Send packet <in_addr><packet> */
#endif
The comments should be self-explanatory.
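As a rough illustration of the `<uint8_t event><uint16_t size><void>` framing, here is a minimal sketch of packing and unpacking one message. The helper names `p2pi_pack` and `p2pi_unpack` and the constant `P2PI_HDRSZ` are hypothetical, and the sketch assumes that `size` counts only the payload and is sent in network byte order; the header above does not settle either point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htons, ntohs */

#define P2PI_PACKET 0x05 /* from the protocol header above */
#define P2PI_HDRSZ  3    /* hypothetical name: 1 byte event + 2 bytes size */

/* Write one message into out. Returns the bytes written, or 0 if out is
 * too small. Assumption: size covers the payload only and travels in
 * network byte order. */
static size_t p2pi_pack(uint8_t *out, size_t outsz, uint8_t event,
                        const void *payload, uint16_t size)
{
    uint16_t nsize = htons(size);

    assert(out != NULL);
    if (outsz < (size_t)P2PI_HDRSZ + size)
        return 0;
    out[0] = event;
    memcpy(out + 1, &nsize, sizeof(nsize));
    if (size > 0)
        memcpy(out + P2PI_HDRSZ, payload, size);
    return (size_t)P2PI_HDRSZ + size;
}

/* Parse one message from in. Returns the total message length, or 0 if
 * the buffer does not yet hold a complete message. */
static size_t p2pi_unpack(const uint8_t *in, size_t insz, uint8_t *event,
                          const uint8_t **payload, uint16_t *size)
{
    uint16_t nsize;

    assert(in != NULL);
    if (insz < P2PI_HDRSZ)
        return 0;
    memcpy(&nsize, in + 1, sizeof(nsize));
    *size = ntohs(nsize);
    if (insz < (size_t)P2PI_HDRSZ + *size)
        return 0;
    *event = in[0];
    *payload = in + P2PI_HDRSZ;
    return (size_t)P2PI_HDRSZ + *size;
}
```

Returning 0 for an incomplete buffer lets a reader keep appending bytes from the socket and retry, which suits a stream transport where one read may contain part of a message.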
The tun interface on FreeBSD has a mode (TUNSLMODE) in which if_tun prepends a sockaddr to each packet. I make use of this so that I can easily extract the in_addr without examining the packet in any way. The struct in_addr is used for routing and route requests instead of sockaddr_in so as not to reveal the destination port (although the demonstration does not encrypt the packets, for simplicity).
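The following is a sketch of how a buffer read from the tun device in this mode might be split into the destination address and the packet itself. The function name `tun_split` is hypothetical, and the sketch assumes IPv4 traffic only, so that the prepended address occupies exactly `sizeof(struct sockaddr_in)` bytes; check tun(4) on your system before relying on that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Split a buffer read from tun (TUNSLMODE) into destination and packet.
 * Assumption: the prefix is a struct sockaddr_in (IPv4 only).
 * Returns a pointer to the start of the IP packet, or NULL if the buffer
 * is too short or the address family is not AF_INET. */
static const uint8_t *tun_split(const uint8_t *buf, size_t len,
                                struct in_addr *dst, size_t *pktlen)
{
    struct sockaddr_in sin;

    assert(dst != NULL && pktlen != NULL);
    if (buf == NULL || len < sizeof(sin))
        return NULL;
    memcpy(&sin, buf, sizeof(sin));   /* copy out to avoid unaligned access */
    if (sin.sin_family != AF_INET)
        return NULL;
    *dst = sin.sin_addr;              /* only the address, never the port */
    *pktlen = len - sizeof(sin);
    return buf + sizeof(sin);
}
```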
To handle routing and route requests, the demonstration uses a hash table and struct route_info:
struct route_info {
int route; /* Can route */
int routereq; /* Route request in progress */
};
The hash table relates in_addr -> route_info.
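The demonstration's hash table is not shown here, so the following is only a minimal sketch of one way to relate in_addr to route_info. It uses a fixed-size, open-addressing table with no deletion of single entries; the names `route_table`, `rt_get`, `rt_clear`, and `RT_SLOTS` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>

struct route_info {
    int route;     /* Can route */
    int routereq;  /* Route request in progress */
};

#define RT_SLOTS 64  /* hypothetical fixed capacity, must be a power of two */

struct rt_slot {
    int used;
    struct in_addr addr;
    struct route_info info;
};

struct route_table {
    struct rt_slot slot[RT_SLOTS];
};

/* Mix the address bits and reduce to a slot index. */
static unsigned rt_hash(struct in_addr a)
{
    uint32_t h = a.s_addr;

    h ^= h >> 16;
    h *= 0x45d9f3bU;
    h ^= h >> 16;
    return (unsigned)(h & (RT_SLOTS - 1));
}

/* Look up addr with linear probing. If create is nonzero, a missing
 * entry is inserted zeroed. Returns NULL if absent or the table is full. */
static struct route_info *rt_get(struct route_table *t, struct in_addr a,
                                 int create)
{
    unsigned i, h;

    assert(t != NULL);
    h = rt_hash(a);
    for (i = 0; i < RT_SLOTS; i++) {
        struct rt_slot *s = &t->slot[(h + i) & (RT_SLOTS - 1)];

        if (s->used && s->addr.s_addr == a.s_addr)
            return &s->info;
        if (!s->used) {
            if (!create)
                return NULL;
            s->used = 1;
            s->addr = a;
            memset(&s->info, 0, sizeof(s->info));
            return &s->info;
        }
    }
    return NULL;
}

/* Forget every route, e.g. when the owning connection dies. */
static void rt_clear(struct route_table *t)
{
    memset(t, 0, sizeof(*t));
}
```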
Furthermore, each connection is associated with a struct con_info which holds the file descriptor, hash table and other miscellaneous necessary information.
struct con_info {
int fd, mode;
struct sockaddr_in sin;
struct client_info cname;
uint8_t cid, buff[P2PI_BUFF];
size_t buffsz;
struct hash_table route;
};
Upon receiving a route request, the connection requesting an address is marked in its hash table as requesting a route while the demonstration forwards the route request to peers. Upon a response, the demonstration marks the address as routable on each connection that responded, forwards the response to all connections waiting for a route, and then unmarks the route request. While this is happening, for sanity reasons, packets read from the tun interface destined for an unroutable address are simply discarded. Queueing the packets and then sending them on response could have serious consequences, since TCP might react to the delay while a route is being sought and queue more packets, which might confuse the destination when they are finally sent.
Once an address is marked routable, packets read from tun are dispatched through connections able to route to the destination address. Peers that receive the packets should behave the same way.
One might ask what would happen if a peer in the middle of a route died. Since each connection is associated with a hash_table that holds route_info, the hash table for that connection would have been cleared, and the packet would be discarded since its destination is no longer routable. The peer will then repeat the method above to try to establish a new route.
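The request, response, and dead-peer handling described above can be sketched as a small state machine. This is a simplification, not the demonstration's code: it tracks a single destination address with one route_info per connection (the demonstration keeps a hash table per connection), and the names `dest_state`, `on_request`, `on_response`, `pick_route`, `on_disconnect`, and `MAXCON` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAXCON 8  /* hypothetical limit on direct peers */

struct route_info {
    int route;     /* Can route */
    int routereq;  /* Route request in progress */
};

/* Routing state for ONE destination address, one entry per connection. */
struct dest_state {
    struct route_info con[MAXCON];
};

/* A route request arrived on connection from. Mark it as waiting and
 * return a bitmask of the connections the request is forwarded to. */
static unsigned on_request(struct dest_state *d, int from, int ncon)
{
    unsigned fwd = 0;
    int i;

    assert(ncon <= MAXCON && from >= 0 && from < ncon);
    d->con[from].routereq = 1;
    for (i = 0; i < ncon; i++)
        if (i != from)
            fwd |= 1u << i;
    return fwd;
}

/* A route response arrived on connection from. Mark it routable, then
 * return a bitmask of the waiting connections and unmark them. */
static unsigned on_response(struct dest_state *d, int from, int ncon)
{
    unsigned fwd = 0;
    int i;

    assert(ncon <= MAXCON && from >= 0 && from < ncon);
    d->con[from].route = 1;
    for (i = 0; i < ncon; i++) {
        if (d->con[i].routereq) {
            fwd |= 1u << i;
            d->con[i].routereq = 0;
        }
    }
    return fwd;
}

/* Return a connection that can route to the destination, or -1, in which
 * case the packet is discarded rather than queued. */
static int pick_route(const struct dest_state *d, int ncon)
{
    int i;

    for (i = 0; i < ncon; i++)
        if (d->con[i].route)
            return i;
    return -1;
}

/* A peer died: forget everything learned through its connection. */
static void on_disconnect(struct dest_state *d, int c)
{
    assert(c >= 0 && c < MAXCON);
    memset(&d->con[c], 0, sizeof(d->con[c]));
}
```

Seen from LIGHTBULB in the experiment, connection 0 could be BLENDER and connection 1 BOTTLE: a request from BLENDER is forwarded to BOTTLE, BOTTLE's response is relayed back to BLENDER, and a disconnect from BOTTLE makes the address unroutable again.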
In the experiment, I did a simple test by pinging BOTTLE from BLENDER. The route was established through LIGHTBULB almost instantaneously, and I got ping times as low as 50ms. I was even able to ssh to BOTTLE from BLENDER over this P2P VLAN, all the while LIGHTBULB was routing under the covers. To test the above scenario with a dead route, I restarted BOTTLE's demonstration, which killed the route to BOTTLE. Soon after, ping packets routed to LIGHTBULB were discarded until LIGHTBULB re-established its connection with BOTTLE and the route between LIGHTBULB and BOTTLE was restored.
Here is a picture of the transactions:
That experiment demonstrates the feasibility of a P2P VLAN and shows how one might structure the software to handle the routing.
The real software will probably use UDP, since it is probably a bad idea to encapsulate a reliable protocol such as TCP inside another TCP connection. But the bigger issue is how to deal with liars. A liar is a peer that behaves badly: it responds wrongly to requests and perhaps uses other malicious tactics to damage the network.
One of the major security tactics, aside from encryption, is to use multiple, random routes to prevent any one peer from accumulating all packets. To that end, we treat multiple peers as one logical tunnel. This serves purposes beyond security: it allows for more robust communication and distributes bandwidth usage among peers. Most peers will not want to dedicate a large portion of their bandwidth to routing.
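A minimal sketch of randomizing the route might look like the following. The function name `pick_random_route` is hypothetical, and `rand()` is only a stand-in: it is predictable and has a slight modulo bias, so real software would want a better random source.

```c
#include <assert.h>
#include <stdlib.h>

/* Pick one connection at random among those flagged as able to route to
 * the destination. routable[i] is nonzero if connection i can route.
 * Returns the chosen index, or -1 if no connection can route. */
static int pick_random_route(const int *routable, int ncon)
{
    int i, k, count = 0;

    assert(routable != NULL && ncon >= 0);
    for (i = 0; i < ncon; i++)
        if (routable[i])
            count++;
    if (count == 0)
        return -1;

    k = rand() % count;          /* stand-in RNG, see note above */
    for (i = 0; i < ncon; i++)
        if (routable[i] && k-- == 0)
            return i;
    return -1;                   /* not reached */
}
```

Calling this once per outgoing packet spreads traffic over every peer that answered the route request, so no single peer sees the whole stream.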
The first and foremost problem to be dealt with is automatic determination of a node's IP. A node could ask its peers if an IP is in use, but a peer could lie. To overcome this problem, a node could generate a public key, a time-consuming process, and then take a hash of the key. If we were to use the 10.x.y.z IP block, we would want a 24-bit hash; perhaps I will write a CRC24 method. Although many 256-bit keys necessarily map to the same 24-bit hash, the chance that two given nodes end up with the same IP address would be about one in sixteen million (2^24). This method provides a) most likely a unique IP, b) easy validation of liar route responses (the public key is appended to responses), and c) difficulty in targeting any one specific IP address. CRC24 might not be a good enough hash. A real cryptographic hash might have to be used instead, since a CRC can supposedly be reversed to a degree: because so many 256-bit keys share each CRC value, one could possibly work backwards and pick a 256-bit number that hashes to a chosen address. But that only fixes one of many problems.
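As a rough sketch of the idea, the code below derives a 10.x.y.z address from the bytes of a public key and checks a claimed address against the key sent with a response. It uses the CRC-24 parameters from OpenPGP (RFC 4880) purely as a stand-in 24-bit hash; the function names `key_to_addr` and `claim_is_valid` are hypothetical, and the key is treated as an opaque byte string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC24_INIT 0xB704CEUL   /* OpenPGP CRC-24 initial value */
#define CRC24_POLY 0x1864CFBUL  /* OpenPGP CRC-24 generator polynomial */

/* Bitwise CRC-24 as specified for OpenPGP ASCII armor. */
static uint32_t crc24(const uint8_t *data, size_t len)
{
    uint32_t crc = CRC24_INIT;
    int i;

    assert(data != NULL || len == 0);
    while (len--) {
        crc ^= (uint32_t)(*data++) << 16;
        for (i = 0; i < 8; i++) {
            crc <<= 1;
            if (crc & 0x1000000UL)
                crc ^= CRC24_POLY;
        }
    }
    return crc & 0xFFFFFFUL;
}

/* Map a public key to an address in 10.0.0.0/8 (host byte order):
 * the top byte is 10 and the low 24 bits are the hash of the key. */
static uint32_t key_to_addr(const uint8_t *pubkey, size_t len)
{
    return ((uint32_t)10 << 24) | crc24(pubkey, len);
}

/* A response claiming addr is only plausible if the attached public key
 * hashes to that same address. */
static int claim_is_valid(uint32_t addr, const uint8_t *pubkey, size_t len)
{
    return key_to_addr(pubkey, len) == addr;
}
```

This only shows that the address and key are consistent with each other. Proving that the responder actually holds the matching private key would need a signature or challenge on top, which is outside this sketch.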
Other measures could include heuristics (using known information) and tolerance. Tolerance assumes that a majority of peers are not liars. It treats the response given by a majority of peers as the correct one, and if there is no majority response, the responses are discarded. Note that this would require each peer to have a minimum of three direct peers. For small networks, however, this could be easily overcome. As the number of fully functional peers in the network grows, it can tolerate more liars.
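The tolerance rule can be sketched as a strict-majority vote over the responses collected from direct peers. The function name `majority_response` is hypothetical, and each response is assumed to have been reduced to a comparable 32-bit value (for example, a hash of its contents).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of a response that more than half of the peers agree
 * on, or -1 if there is no strict majority (all responses discarded). */
static int majority_response(const uint32_t *resp, size_t n)
{
    size_t i, cand = 0, votes = 0, count = 0;

    assert(resp != NULL || n == 0);

    /* First pass (Boyer-Moore voting): find the only possible candidate. */
    for (i = 0; i < n; i++) {
        if (votes == 0) {
            cand = i;
            votes = 1;
        } else if (resp[i] == resp[cand]) {
            votes++;
        } else {
            votes--;
        }
    }

    /* Second pass: confirm the candidate really holds a strict majority. */
    for (i = 0; i < n; i++)
        if (resp[i] == resp[cand])
            count++;

    return (count * 2 > n) ? (int)cand : -1;
}
```

With two peers a single liar produces a tie and the answer is discarded, while with three peers one liar is outvoted, which matches the minimum of three direct peers.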
I will compile a list of experiments to test each security measure or a combination of the security measures. When I get more time, I will code the actual project with a friend. It's lots of fun.
A more interesting aspect of a P2P VLAN is that it could be used not only for small things but on a wide scale. Perhaps it could be a full-fledged virtual Internet ... all that would be needed is a P2P-driven DNS system. Again, such a P2P VLAN ensures anonymity, security and privacy.
Please comment ... especially on how to deal with lying peers.
Until next time
See also
http://freenet.sourceforge.net/