Tuesday, March 29, 2011

Personal server

For personal use I need a small server that is reachable over the Internet. I use it to manage my personal emails, queue long downloads and provide access services (ftp, http, ...).

My requirements:
  • Reachability: it should be reachable from almost every location, even behind proxies and firewalls
  • Stability: it should be as stable as possible
  • Usability: it should be able to run any software that I like, without restrictions
  • Security: pretty clear I guess
  • Manageability: it should be manageable via different interfaces and offer easy ways to back up data, etc.
  • Pay-ability: at the lowest cost possible

Bandwidth is not a tight requirement, since it will only be serving, well, just me. I might put some pages on the web server, but even then images or other heavy content will be placed somewhere else on the Internet. High availability is not important either, since it will only be for me (as long as there is enough stability). In the end I decided to install a server at home. This way I can also use it as a NAS to serve content, store backups and the like via the home network. My standard ISP contract does not include fixed IP addresses. But they do allow each port to be accessible from the Internet, so that's good (there are ISPs that block ports < 1024 without an option to open them). The bandwidth is not very high, but that is OK for me. Measured speeds are about 7 Mbps down and 418 Kbps up. So with that in mind, I arranged the following:
  1. Registered a .be domain with a provider that offers domain management
  2. Put an old laptop, aka 'server', somewhere on my desk (who needs a UPS if you have a laptop)
  3. Registered for a free domain alias service
  4. Configured the laptop with OS and other required software
  5. Configured the home router to make the server accessible over the Internet
The only extra cost for me is about 15 euro/year for the domain (registration + management), that's it.

Ok, so how did I wire this up:

First, the domain is a normal '.be' domain registered via a provider. The provider (aka registrar) registers the domain with the organization controlling the .be TLD zone. They supply their own name servers as NS records, so my domain will be resolved by my provider's name servers. Next, the provider gives me access to manage my domain on their name servers. This access is pretty elaborate: it's a web interface right on top of the named zone configuration. So I have full control and can actually configure it as if it were running on my own name server, sweet. Since I do not have a fixed IP address, I'm not able to point a host name in my domain directly to my home server. I use a free DNS alias service that is able to update a host name entry directly when my IP address changes. Before declaring me foobar, I'll give some clarifications at this point:

  • The dnsalias is also a name provider, just like my .be domain name provider. For free, they only give you a host within their own domain (like x.dnsalias.com, y.dynip.org, ...) and limited manageability. If you pay them, you can have both: they register your domain and you can use their DNS services to update it as you want. However, they do not support .be domains, since they are not an official .be registrar.
  • I could have dropped the '.be' domain and directly used the dnsalias. Now I have two name resolutions: a host in my domain resolves to the dnsalias, which then resolves to my home IP address. By directly using the dnsalias I would save one resolution. But I like to have my own .be domain. Furthermore, the dnsalias name is not as officially mine as a .be domain is, not even when I pay for their services. (Now I can always decide to get a fixed IP address and point directly to that from my .be domain.)
  • I could also have dropped the dnsalias and updated my IP directly with my domain service provider. But they don't offer an interface to do that in an automated fashion.
It took me 5 minutes to go through the free registration for a dnsalias, that's it. The fact that there are two hosts to be resolved is not noticeable in normal usage. To map a host in my domain to the dnsalias, I cannot use a normal DNS 'A' record, which maps a host name to an IP address, since it is not possible to use a host name instead of an IP address in an A record. Luckily there is such a thing as a 'CNAME' record, which allows a host name in your domain to be linked to another host name. So at my domain provider I have something like this set up in my zone:
host.domain.be. IN CNAME host.dnsalias.com.
You can also use wildcards, so '*.domain.be. IN CNAME host.dnsalias.com.' would resolve everything in *.domain.be to host.dnsalias.com. (Note the trailing dot on the target: without it, the name is taken relative to the zone.) The dnsalias service has a normal A record mapping to my home IP address with a very low TTL. If I query that record, it looks like this:
host.dnsalias.com. 60 IN A w.x.y.z
This means that once the information is propagated, it will only be cached for 60 seconds by intermediaries. Not what you typically want for a busy site. My server runs a small client that updates my IP address with the DNS alias service each time it changes. It does that by using an external IP check service, which returns the Internet-visible IP address. When that address differs from the last one seen, the client sends an update to the dnsalias service (along with my account information) with the new IP address. In the above zone snippet, the address w.x.y.z will then be updated.
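The "update only when the address changed" behaviour of such a client can be sketched roughly as below. This is not how ddclient is implemented, just a minimal illustration in Java; the class name DynDnsUpdater is made up, and the IP check service and the update call are passed in as plain functions (assumptions) so that the sketch runs without any network access.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Sketch of the "update only when the IP changed" logic of a dynamic DNS client. */
class DynDnsUpdater {
    private final Supplier<String> ipCheck;    // stands in for the external IP check service
    private final Consumer<String> dnsUpdate;  // stands in for the update call to the DNS alias service
    private String lastKnownIp;                // last address that was sent

    DynDnsUpdater(Supplier<String> ipCheck, Consumer<String> dnsUpdate) {
        this.ipCheck = ipCheck;
        this.dnsUpdate = dnsUpdate;
    }

    /** Runs one check; returns true when an update was sent. */
    boolean checkOnce() {
        String current = ipCheck.get();
        if (current == null || current.equals(lastKnownIp)) {
            return false; // lookup failed or nothing changed
        }
        dnsUpdate.accept(current);
        lastKnownIp = current;
        return true;
    }

    public static void main(String[] args) {
        // Simulated answers of the IP check service (documentation addresses)
        String[] answers = {"192.0.2.10", "192.0.2.10", "192.0.2.77"};
        int[] call = {0};
        DynDnsUpdater updater = new DynDnsUpdater(
                () -> answers[call[0]++],
                ip -> System.out.println("update host.dnsalias.com -> " + ip));
        for (int i = 0; i < answers.length; i++) {
            System.out.println("check " + i + ": updated=" + updater.checkOnce());
        }
    }
}
```

In this simulation only the first and the third check send an update, since the second check sees an unchanged address.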

As operating system I chose Ubuntu (10.04.2 LTS 32bit). It's rock solid, supports all hardware on the server, has a great user support base and it's pretty secure out of the box. However, I also like Windows for its Office and Outlook for my emails, and for some other programs that just work better under Windows. The laptop was originally installed with Windows, so it had a Windows XP CD key. Windows XP also has nice native remote desktop support. The RDP protocol shares the clipboard and sound, and it reverse maps your hard drive over the same protocol (so on the VM you see a share of the hard drive of the client from which you are connecting), and so on. Some will argue that a remote X server is maybe better, which might be true, but an RDP client is available on any Windows client. On Ubuntu it's also available by default. On other Linux distributions it is probably a simple download. Setting up a remote X on a Windows client will be more work and require more privileges, I think. To summarize, these are the steps I did to configure my Ubuntu:

  1. Installed VMware server, v2.0.2-203138, using a bridged Ethernet connection for the VM
  2. Installed Windows XP on the VM with the necessary software, and enabled remote desktop
  3. Installed ddclient for automatic DNS updates
  4. Configured my physical wireless Ethernet connection to autostart without logging in
  5. Installed sshd. Adjusted the sshd config file to listen on two ports (22 and 443) and allow it to forward (more on that later)
VMware server: On previous installs of Ubuntu (and VMware server) I never had problems. However, this time the installation failed. Thanks to the community I was able to pick up a patch for that: radu.cotescu.com. After that VMware server installed without any problems.

DDClient:

sudo apt-get install ddclient
sudo nano /etc/ddclient.conf

# Configuration file for ddclient generated by debconf
#
# /etc/ddclient.conf

protocol=dyndns2
use=web, web=checkip.dyndns.com, web-skip='IP Address'
server=members.dyndns.org
login=<your login>
password='<your password>'
yourhost.dnsalias.com
I followed this guide: ddns ubuntu

Wireless Ethernet connection to autostart: The network manager is only started once you log on in X. So I needed something to connect the server to the wireless network the moment Ubuntu booted. I followed this guide. Basically it came down to:

sudo gedit /etc/network/interfaces 

auto lo
iface lo inet loopback
auto wlan0
iface wlan0 inet static
address 192.168.0.2
gateway 192.168.0.1
dns-nameservers 195.238.2.22 195.238.2.21
netmask 255.255.255.0
wpa-driver wext
wpa-conf managed
wpa-ssid Gateway
wpa-ap-scan 2
wpa-proto RSN
wpa-pairwise CCMP
wpa-group TKIP
wpa-key-mgmt WPA-PSK
wpa-psk <the key>
The wpa-psk key is generated by this command:
wpa_passphrase <your_essid> <your_ascii_key>
With the command:
iwlist scan 
You should be able to find out the information you need from the AP you want to connect to. After that a network restart enabled my wireless on startup.
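For the curious: as far as I know, the key that wpa_passphrase prints is derived with PBKDF2 (HMAC-SHA1, the SSID as salt, 4096 iterations, 256 bit output). A minimal sketch of that derivation in Java, using only the JDK, could look like the code below. The class and method names are made up for this example, and the passphrase in main is just a placeholder.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

/** Sketch: derive a WPA/WPA2 pre-shared key from SSID and passphrase. */
class WpaPskSketch {

    /** Returns the 256 bit PSK as 64 lowercase hex characters. */
    static String derivePsk(String ssid, String passphrase) {
        try {
            // PBKDF2 with HMAC-SHA1, the SSID as salt, 4096 iterations, 256 bit output
            PBEKeySpec spec = new PBEKeySpec(passphrase.toCharArray(),
                    ssid.getBytes(StandardCharsets.UTF_8), 4096, 256);
            byte[] key = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1")
                    .generateSecret(spec).getEncoded();
            StringBuilder hex = new StringBuilder();
            for (byte b : key) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2WithHmacSHA1 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder values: use your own SSID and passphrase
        System.out.println("wpa-psk " + derivePsk("Gateway", "my secret passphrase"));
    }
}
```

The printed value has the same shape as the hex key that goes after wpa-psk in /etc/network/interfaces.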

SSHD:

sudo apt-get install openssh-server
sudo vi /etc/ssh/sshd_config
And add this:
# What ports, IPs and protocols we listen for
Port 22
Port 443
GatewayPorts clientspecified
Listening on two ports is all the tunnel setup below needs on the server side: sshd allows forwarding to other hosts on the local network by default (AllowTcpForwarding defaults to yes). The 'GatewayPorts' option only matters for remote (reverse) forwards: with 'clientspecified' the client may choose the address on which such a forwarded port is bound on the server.

Ok! The only thing left was the accessibility. I could port map the RDP port from the XP VM directly to the Internet via the router. I could say that 3389 should be port forwarded to 192.168.0.3. However, I only wanted to open one port (besides HTTP) to the Internet, and preferably I want to shield the windows VM completely from the Internet. Also, as I want to access my services from everywhere, sometimes places just give you Internet access via an http proxy. From those places I would not be able to connect directly to the RDP service. To solve this, I tunnel everything I need over SSH. My server exposes only 3 ports:

  • 22 (SSH)
  • 80 (http)
  • 443 (SSH)
The router is configured to port forward TCP 22, 80 and 443 to the host OS on 192.168.0.2. Instead of running an HTTPS (SSL) listener on 443, I configured sshd to listen on two ports simultaneously, 443 and 22, so it's not a typo. When I'm connecting from a remote location (me being the client) I just need putty. Putty can be configured to connect directly (using port 22) and make a forward tunnel. Doing this I can choose a local port on the client that maps to a port on the target. Even better, I can map it to a port on any target: the local network, or even back to the Internet. So, to connect to RDP, I need a tunnel mapping from
<any local port> : 192.168.0.3 : 3389
Remember: the Internet does not know (route) 192.168.0.3. But this address is valid in the 'tunnel' that putty sets up. Putty first establishes a connection to ssh.mydomain.be, and over that connection it makes a connection to 192.168.0.3, so requests for 192.168.0.3 are sent to the sshd, which then delivers them on the local network. The total picture looks like:
  • I let putty connect to ssh.mydomain.be at port 22 (or 443, see below)
  • Putty creates a forward tunnel from localhost:xxxx (over 192.168.0.2) to 192.168.0.3:3389 via the tunnel (xxxx is 4000 in the screenshot below).
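To make the idea of a forward tunnel a bit more concrete, here is a minimal sketch in Java of a plain local port forward: listen on a local port and relay every byte to a target host and port. It is only an illustration of the relay idea and not what putty or sshd do internally: there is no SSH and no encryption here, and the relay runs in one process instead of being split between client and server. The class and method names are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

/** Sketch of a plain (unencrypted) local port forward, to show the idea behind the tunnel. */
class LocalForwardSketch {

    /** Copies bytes from in to out until in reaches end of stream. */
    static void pump(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            out.flush();
        }
    }

    /** Accepts one client on the local listener and relays it to targetHost:targetPort. */
    static void forwardOnce(ServerSocket listener, String targetHost, int targetPort) throws IOException {
        try (Socket client = listener.accept();
             Socket target = new Socket(targetHost, targetPort)) {
            Thread clientToTarget = new Thread(() -> {
                try {
                    pump(client.getInputStream(), target.getOutputStream());
                    target.shutdownOutput(); // client finished sending
                } catch (IOException closed) {
                    // one side went away; the main thread cleans up
                }
            });
            clientToTarget.setDaemon(true);
            clientToTarget.start();
            // Relay the answers of the target back to the client in this thread
            pump(target.getInputStream(), client.getOutputStream());
        } // both sockets are closed here, which also ends the helper thread
    }

    public static void main(String[] args) throws IOException {
        // Same shape as the mapping in the text: local port 4000 -> 192.168.0.3:3389
        try (ServerSocket listener = new ServerSocket(4000)) {
            forwardOnce(listener, "192.168.0.3", 3389);
        }
    }
}
```

With the real tunnel, the listening part runs in putty on the client, the connection to 192.168.0.3:3389 is opened by the sshd at home, and everything in between travels inside the encrypted SSH connection.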
In putty that looks like this:


On the client computer from which I'm connecting, I point my RDP client to localhost:4000 and the connection is established. The nice thing about all of this is that there is an option in putty to run my tunnel over an HTTP proxy. So if I'm not allowed to go out on 22 directly, I configure putty to talk to the proxy to send out my packets. If you enable HTTP proxying in putty, putty sends an HTTP CONNECT <targethost:port> to the proxy. The proxy will then tunnel your connection further to the target.
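What such a CONNECT exchange looks like on the wire can be sketched as follows. This is a simplified illustration in Java, not putty's actual code: the class and method names are made up, proxy authentication is left out, and a real client would of course send the request over a socket to the proxy and read the reply from it.

```java
/** Sketch of the HTTP CONNECT exchange a client performs with an HTTP proxy. */
class HttpConnectSketch {

    /** Builds the request asking the proxy to open a raw TCP tunnel to host:port. */
    static String connectRequest(String host, int port) {
        String target = host + ":" + port;
        return "CONNECT " + target + " HTTP/1.1\r\n"
                + "Host: " + target + "\r\n"
                + "\r\n";
    }

    /** A 2xx status line from the proxy means the tunnel is open. */
    static boolean tunnelEstablished(String statusLine) {
        String[] parts = statusLine.trim().split("\\s+", 3);
        return parts.length >= 2
                && parts[0].startsWith("HTTP/")
                && parts[1].length() == 3
                && parts[1].charAt(0) == '2';
    }

    public static void main(String[] args) {
        System.out.print(connectRequest("ssh.mydomain.be", 443));
        System.out.println(tunnelEstablished("HTTP/1.1 200 Connection established")); // true
        System.out.println(tunnelEstablished("HTTP/1.1 403 Forbidden"));              // false
    }
}
```

After a 2xx answer the proxy just passes bytes back and forth, so from then on the client can start its SSH handshake over the same connection.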

However, sometimes proxies do not allow tunneling to a target port such as 22 (ssh). An easy trick is running the sshd on a second port, the SSL/TLS port 443. The HTTP handshake with the proxy for an SSL/TLS connection is the same as it is for a proxied SSH connection (it's also tunneled by HTTP CONNECT). So the proxy thinks you are going to do SSL (because of the port), but you are not. To do this, just let putty connect to port 443 instead of 22 (as shown in the first image above). Next you have to tell putty that it should use a proxy:

Before declaring me foobar (again), I'll give some clarifications at this point:

  • What I'm doing here is building a "poor man's" VPN to a certain extent. However, VPNs work with different protocols and require a VPN client (and a VPN server). They also require the network infrastructure to 'allow' setting up a VPN. So suppose you want to connect your (own) office with your home network, a real VPN would definitely be better and more scalable. However, this setup needs to work as lightweight as possible on any type of client (possibly not managed by me) and preferably on any type of network.
  • A really tightened-up proxy will discover fast (even if you do it over 443) that you are not SSL/TLS-ing. In that case I could still add an extra module which wraps my ssh tunnel in an SSL tunnel that runs over the proxy. By doing that the proxy is not able to distinguish your session from an SSL/TLS session that, for example, has been started by a client's browser. Since I have not yet met such tightened proxies, and this setup would require some additional client software as well, I will leave it like this until it's really necessary some day.

Saturday, March 26, 2011

java.text.Collator

Some time ago I got introduced to a part of the Java text API that was unexplored territory for me: the Collator.

Languages imply more complexity than one would think at first sight (check this if you have any doubt).
The main usage of the Collator is to help us with a part of that linguistic complexity, more specifically locale sensitive comparison. It implements the collation specification defined by the Unicode

The Java Collator roughly does these things:

  • Canonicalization of canonical equivalent characters
  • Multi level comparison

Comparing Java Strings with compareTo works by comparing the numeric Unicode values of their characters. This would mean that the position of a character in the Unicode code charts specifies its sorting weight, but linguistically that is not the case. Languages might have different sorting weights for the exact same characters.

For example, if you don't know anything about the German language, you might expect that ß (\u00DF) is sorted as if it were a 'b' or 'B'. That is not correct, since it actually represents the combination 'ss'. But even knowing this, a standard comparison with ß would yield wrong results, since its code point is higher than that of a normal 's'. So in the end it will not be sorted as 'ss' but as if it were 'higher' than 'z'.
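The difference can be made visible with a small sketch that compares the same two words once with String.compareTo and once with a Collator. The class name, the example words and the choice of Locale.GERMAN are my own assumptions for this illustration.

```java
import java.text.Collator;
import java.util.Locale;

/** Sketch: plain code unit order versus collator order for a word containing ß. */
class SharpSDemo {

    /** -1, 0 or 1 according to String.compareTo (numeric character values). */
    static int codeUnitOrder(String left, String right) {
        return Integer.signum(left.compareTo(right));
    }

    /** -1, 0 or 1 according to a collator for the German locale. */
    static int germanOrder(String left, String right) {
        Collator collator = Collator.getInstance(Locale.GERMAN);
        return Integer.signum(collator.compare(left, right));
    }

    public static void main(String[] args) {
        String masse = "Ma\u00DFe"; // "Maße"
        String mast = "Mast";
        System.out.println("compareTo: " + codeUnitOrder(masse, mast));
        System.out.println("Collator:  " + germanOrder(masse, mast));
    }
}
```

With compareTo, "Maße" should end up after "Mast" (ß has a higher value than 's'), while the Collator treats ß like 'ss' and should put "Maße" before "Mast".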

Multi level comparison solves this by offering 4 comparison levels: base letters, accents, case, punctuation. If the first level is used, only base character differences are considered. With the second level, base characters as well as accents are considered significant, etc.

Note: the Collator apparently does not support punctuation.

System.out.println("a equals b -> " + (collator.compare("a", "b")==0 ? "true":"false"));
System.out.println("a equals à -> " + (collator.compare("a", "à")==0 ? "true":"false"));
System.out.println("A equals a -> " + (collator.compare("a", "A")==0 ? "true":"false"));


With collator.setStrength(Collator.PRIMARY):
a equals b -> false
a equals à -> true
A equals a -> true

With collator.setStrength(Collator.SECONDARY);
a equals b -> false
a equals à -> false
A equals a -> true

With collator.setStrength(Collator.TERTIARY);
a equals b -> false
a equals à -> false
A equals a -> false
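A self-contained version of the comparison above could look like the sketch below. The snippet in the text does not show how the collator was created, so the use of Locale.ENGLISH here is an assumption, as are the class and method names.

```java
import java.text.Collator;
import java.util.Locale;

/** Sketch: the same comparisons at the three collator strengths. */
class CollatorStrengthDemo {

    /** True when the collator considers both strings equal at the given strength. */
    static boolean sameAt(int strength, String left, String right) {
        Collator collator = Collator.getInstance(Locale.ENGLISH); // assumed locale
        collator.setStrength(strength);
        return collator.compare(left, right) == 0;
    }

    public static void main(String[] args) {
        int[] strengths = {Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY};
        String[] names = {"PRIMARY", "SECONDARY", "TERTIARY"};
        for (int i = 0; i < strengths.length; i++) {
            System.out.println("With " + names[i] + ":");
            System.out.println("a equals b -> " + sameAt(strengths[i], "a", "b"));
            System.out.println("a equals \u00E0 -> " + sameAt(strengths[i], "a", "\u00E0"));
            System.out.println("A equals a -> " + sameAt(strengths[i], "a", "A"));
        }
    }
}
```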

Our first use case for the Collator was its first function: canonicalization. Unicode foresees different ways of representing certain characters. For example, ü is identified by a single code point \u00FC and thus a single character. However, it is also possible to form ü with a character + diacritical mark(°): u (\u0075) and ¨ (\u0308).

It makes sense if you think about it: on a classic typewriter you would also form ü by first printing u, going back one position and then printing ¨. The ¨ is a so called invisible character on the typewriter. On our keyboard we can do the same. You can press the button marked ü or you can type <altgr> + <¨> + <u>, which gives you ü.

The end result (on your screen at least) is the same regardless of how you form ü; however, it is stored differently. The single character ü would be saved as: 0x00FC. If you had formed ü by typing ¨ followed by u, it would be saved as: 0x0075 0x0308

As long as you just want to print those characters in a browser, console, text editor, ... you can (hopefully) rely on that software to display them right. However, if you are writing Java and want to do operations on character streams containing such characters, it becomes tricky.

Example (°°): let's take a typical German word such as "abgaskrümmerdichtung":
String single = "abgaskr\u00FCmmerdichtung";
String combined = "abgaskr\u0075\u0308mmerdichtung";

System.out.println("Single equals combined? " + single.equals(combined));
System.out.println("Single: " + single);
System.out.println("Combined: " + combined);

The first line will say that they are not equal: Single equals combined? false
However, when both are displayed, they look exactly the same:

Single: abgaskrümmerdichtung
Combined: abgaskrümmerdichtung

Our software needs comparison on a higher level than pure code point comparison. The strings need to be canonicalized first, by something that knows that \u0075\u0308 is in fact \u00FC. Collator to the rescue:

collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
String single = "abgaskr\u00FCmmerdichtung";
String combined = "abgaskr\u0075\u0308mmerdichtung";

System.out.println("Single equals combined? " + (collator.compare(single, combined) == 0 ? "true": "false"));

This will print: Single equals combined? true
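Put together as something that can be run on its own, the comparison could look like the sketch below. The locale is not shown in the snippet above, so Locale.GERMAN is an assumption. As a side note, the sketch also shows a different technique that is not what we used: java.text.Normalizer, which brings both strings to the same normalization form so that a plain equals works again. The class and method names are made up.

```java
import java.text.Collator;
import java.text.Normalizer;
import java.util.Locale;

/** Sketch: comparing a precomposed and a decomposed spelling of the same word. */
class CanonicalEquivalenceDemo {

    /** Compares with a Collator that applies canonical decomposition first. */
    static boolean collatorEquals(String left, String right) {
        Collator collator = Collator.getInstance(Locale.GERMAN); // assumed locale
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(left, right) == 0;
    }

    /** Alternative: normalize both strings to NFC and use a plain equals. */
    static boolean nfcEquals(String left, String right) {
        return Normalizer.normalize(left, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(right, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String single = "abgaskr\u00FCmmerdichtung";
        String combined = "abgaskr\u0075\u0308mmerdichtung";
        System.out.println("String.equals:  " + single.equals(combined));
        System.out.println("Collator:       " + collatorEquals(single, combined));
        System.out.println("Normalizer NFC: " + nfcEquals(single, combined));
    }
}
```

The Normalizer variant is handy when you only need equality; the Collator additionally gives you locale sensitive ordering.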

A second use case where the Collator came in handy: we needed to map characters from ISO8859-1 to 7bit ASCII. ISO8859-1 contains several accented characters that do not exist in 7bit ASCII. Our goal is to map these characters to their canonical equivalent that is supported in 7bit ASCII. For example: "çéàëê" could be mapped to "ceaee". Of course, other characters for which no obvious equivalent exists cannot be mapped (and will be converted to '?').

Remember: Java uses Unicode and UTF16 as encoding. ASCII and the ISO8859 family are both character maps and encodings in one. Unicode and ISO8859-1 share the same code points for the first 256 glyphs. They are also compatible on encoding level: if you take the letter 'A' and save it (codepoint + encoding = ISO8859) and you decode it as Unicode/UTF8, it will still print 'A'. (this is not the case with UTF16/32). ASCII only shares the first 128 glyphs (extended, 8bit, ASCII is not compatible with Unicode or ISO8859-1).
String ascii7Characters = " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~";

Collator ascii7Collator = Collator.getInstance(new Locale("nl", "BE"));
ascii7Collator.setStrength(Collator.PRIMARY);

Map<CollationKey, Character> ascii7CollationMappings = new HashMap<CollationKey, Character>();

for (char c : ascii7Characters.toCharArray()) {
   ascii7CollationMappings.put(ascii7Collator.getCollationKey(String.valueOf(c)), c);
}

String accented = "çéàëê";
StringBuilder ascii7 = new StringBuilder();

for (char character : accented.toCharArray()) {
   Character canonicalizedCharacter = ascii7CollationMappings.get(ascii7Collator.getCollationKey(String.valueOf(character)));
   if (canonicalizedCharacter == null) {
      ascii7.append('?'); // no 7bit ASCII equivalent known
   } else {
      ascii7.append(Character.isUpperCase(character) ? Character.toUpperCase(canonicalizedCharacter) : canonicalizedCharacter);
   }
}

System.out.println("ISO8859-1 converted to 7bit ASCII:" + ascii7.toString());
This will print: ISO8859-1 converted to 7bit ASCII:ceaee

The String created on line 12 is of course Unicode (and encoded in UTF16, but not relevant now).
But since these characters are in the range which is equal between ISO8859-1 and Unicode, it actually does not matter. In real life we would be reading in a byte stream which is explicitly decoded as ISO8859-1:
String accented = new String(inputInIso8859_1, "ISO8859-1");

What we did here is take the Collator's CollationKey and bind it to our normalized 7bit ASCII character. These keys are the canonicalized form of the character, depending on the strength and decomposition values you configured the Collator with. Characters that are canonically equal will also have the same CollationKey. You can use the CollationKey for linguistically correct sorting/searching, since it will yield the correct order based upon the Locale you initialized the Collator with (it implements Comparable).
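Sorting with CollationKeys, as mentioned above, could be sketched like this. The word list, the class name and the use of Locale.ENGLISH are assumptions made for the example.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Sketch: sorting words via their CollationKeys instead of their code units. */
class CollationKeySortDemo {

    /** Returns the words in the order defined by the collator of the given locale. */
    static List<String> sortWithKeys(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<CollationKey> keys = new ArrayList<>();
        for (String word : words) {
            keys.add(collator.getCollationKey(word)); // computed once per word
        }
        Collections.sort(keys); // CollationKey implements Comparable
        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("zoo", "\u00E9tude", "apple", "Zebra");
        List<String> natural = new ArrayList<>(words);
        Collections.sort(natural); // plain String order
        System.out.println("String order:   " + natural);
        System.out.println("Collator order: " + sortWithKeys(words, Locale.ENGLISH));
    }
}
```

The plain String order puts "Zebra" first and "étude" last, while the key based order is apple, étude, Zebra, zoo. Since each key is computed only once, this approach pays off when the same strings are compared many times, as in a sort.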

(°) Diacritical marks are special glyphs in the sense that they are combined with other glyphs and say something about the intonation. For example: <`> can be considered a diacritical mark.

(°°)The so called code points named in this text refer to glyphs in the Unicode map. They are shown in Java escaped hexadecimal form notation, so \u + 16bit hex. The encoded form is UTF16, with the given examples this means that it is 16bit per character and the code point matches with the encoded form (since the code points in the examples are between \u0000...\uD7FF and \uE000...\uFFFF)