StartSSL in Java

Yesterday I finally had a moment to try NetBeans 8.0 against our existing Subversion-managed code, which I had migrated to an https location with a StartSSL certificate. The web browser and, in my hazy memory of the past, TortoiseSVN had had no issue with the new location, so I was surprised to run into this error message:

Error validating server certificate for 'https://mysvnrepo.tld:443':
 - The certificate is not issued by a trusted authority. Use the fingerprint to validate the certificate manually!
I didn't try accepting it because the warning made me think I had something configured incorrectly. My NetBeans 7.3 install was working fine, but it was limping along in CLI mode for Subversion since it hadn't been updated for the latest Subversion client version that my updated working copies require.

Some searching around the NetBeans forums led me to suggestions for debugging the issue using -Djavax.net.debug=ssl, so I whipped up a test application that uses the Java URL class to GET the content of https pages. Accessing sites using GoDaddy certificates worked just fine, but the ones using StartSSL certificates were a no-go.
javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
StartSSL is relatively new to the CA arena when compared to the likes of Verisign and Thawte, having operated a Certification Authority (CA) since 2005. Their model of operation is appealing, especially to the budget conscious, because you pay for verification of individuals and organizations, not for the issuing of certificates. Compare their $59.90 fee for StartSSL Verified status, which gets you access to as many of their class 2/3 certificates as you need, to places that charge $150+/year for each web site certificate and $200+/year for each type of Microsoft Authenticode, Java, and Adobe AIR certificate, and you start to see why Thawte was worth so much to Verisign, who was in turn purchased by Symantec. StartSSL is supported by Microsoft Windows, the major browsers, and mobile devices because its root certificate is included in those browsers and operating systems.

It isn't included in the Java cacerts file.

To get a CA root certificate added to the cacerts file, a CA is supposed to apply to the Oracle Java Root Certificate program. The startcom / startssl user Admin indicated in 2011 that they had done this with no success. Users have also tried via bug reports to get the certificate included, and were rejected with the answer that it must be the CA, and not the users, that drives CA inclusion.

I think someone has their head on backwards and is facing the wrong crowd. If the developers, the users of your language, are interested in having a CA added, then the way they went about requesting it was exactly right. If a bunch of my users or potential users say they will stay with me or start using my product if I add support for API X, there is a strong incentive for me to contact the makers of API X and not wait for them to contact me. I suspect that in the beginning Sun didn't wait for Verisign and Thawte to come asking to be included in their cacerts file. Even Thawte's dead Personal Freemail CA is still in the list, and Thawte said it could be dropped in 2011.

Curious what certs are in your Java install's cacerts file? Have keytool tell you. The trick is to tell keytool to list, verbosely, the cacerts keystore, whose password by default is changeit. That's a lot of output, so you may want to filter it down to just the Owner lines.

*nix Shell:
echo 'changeit' | keytool -list -v -keystore $(find $JAVA_HOME -name cacerts) | grep 'Owner:'

Windows Power Shell:
PS C:\Program Files (x86)\Java\jre1.8.0_25\bin> .\keytool -list -v -keystore ..\lib\security\cacerts | select-string -pattern "Owner:"
Enter keystore password:  changeit

What to do?

The real bear is in desktop Java. For our server systems I can add StartSSL to the certificates. For my Subversion issue I can add an exception, and if that doesn't stick I can add the StartSSL root to my desktop's cacerts file. But it is not reasonable for anyone but Oracle to add the StartSSL CA root to every end user's cacerts file, which puts a damper on using StartSSL to sign Java Web Start applications or applets, or to access StartSSL-secured web sites.
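For the record, adding the root to a single machine's trust store is a one-liner; the pain is that every end user would have to do it. A sketch, assuming a JDK on the PATH and the StartCom root saved locally as startcom-ca.pem (a made-up filename):

```shell
# Locate the active cacerts file and import the CA root into it.
# startcom-ca.pem is a hypothetical local copy of the root certificate;
# "changeit" is the stock cacerts password. Guarded so it is a no-op
# on machines without a JDK.
cacerts=$(find "${JAVA_HOME:-/usr/lib/jvm}" -name cacerts 2>/dev/null | head -n 1)
if command -v keytool >/dev/null 2>&1 && [ -n "$cacerts" ]; then
  keytool -importcert -trustcacerts -noprompt \
          -alias startcom-ca \
          -file startcom-ca.pem \
          -keystore "$cacerts" \
          -storepass changeit
fi
```

Verify it took with the keytool -list command from above, grepping for startcom.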

To avoid the Java security code-signing restrictions I could switch away from Java Web Start to shipping "executable JAR files", but I would still have issues accessing https servers that use StartSSL-signed certificates, even though these sites work fine from the browser and from C++/MFC code on Windows using the CHttpFile class. I would still need to either add the StartSSL CA root to the cacerts file, disable certificate checking, or avoid the Java URL class and use something else, like the Apache Commons HttpClient, to dynamically insert trust for StartSSL. Blah.

I could go all-out: ship java.dll and friends and replace javaw(.exe) with MyApp(.exe), pointing at my own cacerts file, using my own icon, and trying harder to act like a native application. No waiting on Oracle. No trying to dynamically modify how certificates are checked. Keep using StartSSL. This would come at the cost of giving up everything Web Start and others are doing for me.

Of course I could also stop using StartSSL and switch back to one of the authorities whose root CA certificate is in Oracle's Java cacerts file, but I wanted to expand my use of HTTPS, not run it at minimal levels. Or I could switch away from Java.

If you care about this issue, maybe we can use social media to raise our voices instead of getting shut down in a bug report for "open"jdk or swept behind the scenes of a CA-only-apply-here forum. Let's see if we can get #startsslinjava trending. Share. Like. Blog. Pass along.


Yet Another Annoying Password Requirements List

Yesterday I tried creating an account on a hosting provider, and my first-line go-to program for creating passwords failed to meet their rules:

Password criteria:
  • must be 8-14 characters long
  • must start with a letter
  • must include a lower case letter
  • must include an upper case letter
  • must include a number
  • must include a special character (!, @, #, %)
  • must not contain the username
  • must not include any other special characters
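Just to show how fiddly these rules are, here is the whole list as a shell function (check_pw is my own throwaway name, not anything the host provides):

```shell
# A sketch of the provider's password rules as a POSIX shell function.
# Returns 0 if the password passes, 1 otherwise.
check_pw() {
  pw=$1 user=$2
  [ ${#pw} -ge 8 ] && [ ${#pw} -le 14 ]       || return 1  # 8-14 characters long
  case $pw in [!a-zA-Z]*) return 1;; esac                  # starts with a letter
  case $pw in *[a-z]*) ;; *) return 1;; esac               # a lower case letter
  case $pw in *[A-Z]*) ;; *) return 1;; esac               # an upper case letter
  case $pw in *[0-9]*) ;; *) return 1;; esac               # a number
  case $pw in *[@!#%]*) ;; *) return 1;; esac              # one of ! @ # %
  case $pw in *"$user"*) return 1;; esac                   # must not contain the username
  case $pw in *[!a-zA-Z0-9@#%!]*) return 1;; esac          # no other specials
  return 0
}

check_pw 'Xk9!horse' alice && echo accepted || echo rejected                     # → accepted
check_pw 'correct horse battery staple' alice && echo accepted || echo rejected  # → rejected (too long)
```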
Through 20 years of effort, we've successfully trained everyone to use passwords that are hard for humans to remember, but easy for computers to guess. – xkcd.com/936 Password Strength

My kingdom for the ability to use a passphrase like correct horse battery staple without these silly and seemingly arbitrary extra rules beyond length.


Non-global learnyounode without much typing

For whatever reason, when you dive into Node.js you come across lots of code that tells you to install command-line JavaScript programs "globally" into /usr/local. Lots of examples say to do this using the sudo command, e.g. `sudo npm install -g learnyounode`, and others say they get messed up doing that, so they suggest just changing the ownership of /usr/local to be you... I get the feeling that most Node.js creators and users are working off of MacBooks or something, have a very single-user view of their computer, and perhaps play a little loose with security.

This was a big hurdle for me to get over when I first started playing with Node.js on my Ubuntu desktops and Debian servers. It was like the Ruby version thing all over again. Times a thousand. I didn't really want one-off programs like learnyounode stuck in my /usr/local forever. I thought the thing to do was to use npm's prefix option, but even then I wasn't sure I wanted the prefix/bin files in my $PATH all the time.

Fortunately once I learned a bit and got the search terms right I found that others were also trying to solve this dilemma. One of the solutions I liked was using npm run to run scripts in node_modules/.bin. It let me use those binaries locally when I was in the package's folder without committing to them any other time. This appeals to me more than any of the $PATH modifying ones. So, to use nodeschool.io's javascripting or learnyounode interactive modules it was as simple as this:

mkdir -p node/learn
cd node/learn
npm install javascripting
npm install learnyounode
edit package.json
>>> in package.json
"scripts": {
    "test": "echo \"Error: no test specified\" && exit 1",
    "learnyounode": "learnyounode"
}
$ npm run learnyounode
I found it tedious after a while to type such a long command, especially when adding program arguments. So I used a simple alias for that shell instance:

$ alias learnyounode='npm run learnyounode'
$ alias lyn='npm run learnyounode'
You can do one or the other or whatever you like. I decided that even learnyounode was annoying to keep typing, so I used lyn.

I really like this solution for working with these interactive programs. I will see what challenges arise as I get more advanced in my node.js and npm usage. I can foresee wanting a "user local" install but still wanting to slip in and out of it. Maybe using a chroot or something.

One thing this method doesn't do is let package-installed binaries like learnyounode and javascripting work from any directory, so their directions to "make and change into a new directory" don't apply. Instead, the "learn" directory containing node_modules is where I create all my practice programs.


LXC multiple personality disorder

I have a couple of server systems running as Linux Containers (LXC), as a test, since Debian 6.0 (squeeze). The host system has been upgraded to Debian 7.0 (wheezy) with lxc version 0.8.0-rc1, and things generally worked fine, but I rebooted the other day and the containers fell to pieces.

After manually stopping the containers (or so I thought) and starting them up again, one of the containers was fine. The other, not so much. Connecting to it remotely with the PuTTY ssh client would fail either immediately with "Network error: Connection reset by peer", or it would work for some seemingly random amount of time, a second to minutes, before another error appeared: "Network error: Software caused connection abort".

Scouring the web I found lots of suggestions saying it was missing config files or keys in the instance's /etc/ssh/ directory, but I knew this was not the case. The files were there and the connection worked, sometimes. Plus I did some tests running netcat (nc) as a client and a server and those connections also failed either after a while or sometimes right away. Sometimes when connecting to the server instance I had just started I would be told the connection was refused.

I started to believe that I had another server running in my network that claimed the same IP address and server name on login. This belief moved to some kind of server multiple personality disorder when I saw that my tmux session sometimes existed and sometimes didn't on login even though the file I had created in the "tmux exists" connection was there in the "no tmux" connection.

I popped onto the lxc IRC channel on freenode for some advice. A fellow user, wam, ran me through some tests. I wasn't running out of memory. My configuration was very similar to the working container's. No firewalls were blocking stuff on this internal private network. He suggested that I use tshark to track down the possible TCP RST, so I went (t)shark fishing:

  4.999284 3com_c0:25:71 -> 46:4d:07:7e:87:9c ARP 60 Who has  Tell
  4.999327 46:4d:07:7e:87:9c -> 3com_c0:25:71 ARP 42 is at 46:4d:07:7e:87:9c
  5.007975 fa:11:43:eb:f6:eb -> 3com_c0:25:71 ARP 42 Who has  Tell (duplicate use of detected!)
  5.008086 3com_c0:25:71 -> fa:11:43:eb:f6:eb ARP 60 is at 00:50:da:c0:25:71 (duplicate use of detected!)

Duplicate use of detected with different mac addresses? I thought I had just ruled out multiple servers.

DeHackEd on the LXC IRC channel, #lxcontainers, suggested checking brctl showmacs, which I filtered further using other information he shared:

brctl showmacs br0 | grep -v '  1'
port no mac addr                is local?       ageing timer
  4     46:4d:07:7e:87:9c       no                70.08
  2     fa:11:43:eb:f6:eb       no                22.85
  2     fe:68:85:a8:dc:6e       yes                0.00
  3     fe:b0:fc:d7:2e:9f       yes                0.00
  4     fe:dd:ac:a6:ac:f4       yes                0.00

Both of the systems claiming are running on the LXC host. Odd. Using lxc-ls and lxc-list shows only the working container and the broken one. Not three. Another person, devonblzx, suggested that I just specify the hwaddr in the lxc config file. I had, in fact, done this once upon a time, and I had long since commented it out. I don't remember why. Maybe it wasn't the unique lxc.network.hwaddr but the non-unique lxc.network.name that was tripping me up. After what DeHackEd and devonblzx pointed out in /sys/class/net/$bridgename/brif/ and /sys/devices/virtual/net/$bridgename/, I bet it was the .name value. I should try it again. The question that nagged me was: how had I launched duplicate instances, and would setting hwaddr protect me? DeHackEd said I'm not supposed to be able to launch multiple instances, at least not as the same user, due to control channels that use names that would conflict. I thought it was maybe a bug in lxc 0.8.0, so I shared my ps output, which showed there were indeed three instances, with two pointing at the same config file:

root 9287 0.0 0.0 20920 664 ? Ss Feb13 0:00 lxc-start -n lxc -f /etc/lxc/auto/brokencontainer.conf -d
root 18816 0.0 0.0 20920 668 ? Ss Feb13 0:00 lxc-start -d -n brokencontainer

DeHackEd promptly said "no name conflict..." and it took me a minute to spot it. One had been started with the name lxc. I asked why lxc-ls or lxc-list don't show it but no one volunteered that answer so I dove into the start-up process to figure out why it started with -n lxc.

My configuration from Debian 6.0 worked like this: the /etc/default/lxc file specified both that I wanted containers run from init and listed the CONTAINERS I wanted started. The wheezy init script no longer cares about the CONTAINERS variable; instead it looks at /etc/lxc/auto/* and tries to derive the names from those entries. My /etc/lxc directory looked like this:

/etc/lxc/auto/workingcontainer.conf -> /etc/lxc/workingcontainer.conf
/etc/lxc/auto/brokencontainer.conf -> /etc/lxc/brokencontainer.conf

This is not quite how the README.Debian file suggests things should be:

LXC containers can be automatically started on boot. In order to enable this, the LXC init script has to be enabled in /etc/default/lxc, and any container that should be automatically started needs its configuration file symlinked (or copied) into the /etc/lxc/auto directory.
Note that the name in /etc/lxc/auto needs to be the container name, e.g.:
  /etc/lxc/auto/www.example.org -> /var/lib/lxc/www.example.org/config
I joined the #debian channel on the OFTC IRC network to get some advice and figure out whether my init.d/lxc script or something else was messed up, and peter1138 helped straighten me out. He said he had a similar setup when he first upgraded from squeeze, but when he created a new container in wheezy he saw that the config files were in /var/lib/lxc/containername/config and that /etc/lxc/auto/containername pointed there.

This means the init script works by extracting containername from the folder holding config. Nothing in that process cares what the file in /etc/lxc/auto/* is named; it just better be a symbolic link to a file in a directory whose name is the container name you want. I complained about config files in /var, broken upgrades, and a seemingly misleading emphasis on the auto/ name and how autostart works, and was given the "file a bug!" challenge.
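In other words, the wheezy init script's name derivation boils down to something like this, sketched here against a throwaway directory from mktemp rather than the real /etc and /var/lib:

```shell
# Mimic the wheezy layout: /var/lib/lxc/<name>/config with a symlink
# under /etc/lxc/auto, then derive the container name the way the init
# script does: from the directory that holds the config file.
root=$(mktemp -d)
mkdir -p "$root/var/lib/lxc/www.example.org"
touch "$root/var/lib/lxc/www.example.org/config"
mkdir -p "$root/etc/lxc/auto"
ln -s "$root/var/lib/lxc/www.example.org/config" "$root/etc/lxc/auto/anything-at-all"

for link in "$root"/etc/lxc/auto/*; do
  # Follow the symlink, then take the name of config's parent directory.
  name=$(basename "$(dirname "$(readlink -f "$link")")")
  echo "$name"   # prints www.example.org no matter what the symlink is called
done
```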

I think it would be even better if the script just read the lxc.utsname from the file, as peter1138 suggested; then it could be a symlink or a copy of any file, without needing a specific directory layout. I said it didn't seem that the name in auto/ needed to be the container name at all, and peter1138 agreed that for autostart purposes this was true, but noted that if the name was the container name then lxc-list would tag the container in its listing as autostart.
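Next time I will also try pinning both network values in the container config up front. A fragment of what I mean, using the pre-1.0 lxc.network.* keys (the MAC and veth name below are example values, not my real ones):

```text
# /etc/lxc/brokencontainer.conf (fragment)
lxc.network.type = veth
lxc.network.link = br0
# unique host-side interface name, so two containers can't collide
lxc.network.veth.pair = veth-broken0
# fixed MAC in the 00:16:3e (Xen/LXC) range -- example value
lxc.network.hwaddr = 00:16:3e:12:34:56
```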

I hope this is helpful to someone else facing similar sounding issues even if that someone else turns out to be a future me.


IRC SSL Client Certs

ChatZilla supports SSL connections and auto-identifying with SSL client certificates on the OFTC and freenode IRC networks, using CAcert WoT user certificates and StartSSL free email-verified certificates. You may have trouble using StartSSL verified user certificates. Tested with ChatZilla in Firefox 35.0.1.


How do I handle fstab mounts under run in Debian Wheezy?

A release goal for Debian 7.0 ("wheezy") was to introduce a new top-level directory, /run, and relocate there the system state information that does not need to persist through a reboot but that may need to be written early, or while the root filesystem is read-only. Other distributions are also introducing /run, and a proposal has been submitted to include it in the Filesystem Hierarchy Standard (FHS).

This is all fine and well, but it has tripped up automated mounting of /etc/fstab entries under /run (formerly /var/run).

The proposed update to debian-policy says this:
Files and directories residing in '/run' should be stored on a temporary filesystem and not be persistent across a reboot, and hence the presence of files or directories in any of these directories is not guaranteed and 'init.d' scripts must handle this correctly. This will typically amount to creating any required subdirectories dynamically when the 'init.d' script is run, rather than including them in the package and relying on 'dpkg' to create them.
Can I then conclude that /etc/init.d/mountall.sh is not handling /etc/fstab mounts under /run correctly? Or that there should be another init.d script to handle them? Or did the writers expect that fstab mounts under /run are invalid, and that everything under /run should be created programmatically by the individual services and generally fixed up by their init.d scripts?
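Whatever the intended answer, the policy's "create required subdirectories dynamically" approach looks something like this from an init.d start action. The /run/mydaemon path is illustrative, and the sketch defaults to a temp directory so it can run unprivileged:

```shell
# Recreate the service's runtime directory at start; /run is a fresh
# tmpfs after every boot, so nothing under it survives a reboot.
# In a real init.d script RUNDIR would be /run/mydaemon (made-up name).
RUNDIR="${RUNDIR:-$(mktemp -d)/mydaemon}"
[ -d "$RUNDIR" ] || mkdir -p "$RUNDIR"
chmod 0755 "$RUNDIR"
echo "runtime directory ready: $RUNDIR"
```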


Exploring StartSSL - Automated Registration Email

Reading about the decision to no longer include CAcert.org in the Debian ca-certificates package (Debian bug 718434, LWN: Debian and CAcert), I was introduced to StartCom's free certificate offering. As I investigated their site I was both intrigued by the free offering and the Web-of-Trust program idea, and put off by the lack of clear, or sometimes conflicting, information.

For the impatient, the TL;DR version is this:

  1. Sign up first for a free (class 1) certificate by clicking Sign-up For Free in the top left of the site. Everything else is confusing.
  2. Use an email address that doesn't do greylisting, spam filtering, or anything fancy, and that you have access to the logs for (is this service only for "techies"?)
    1. If you do have greylisting or spam filtering that blocks their web page's test, so that they give you big red text telling you you're all wrong, disable it or at least allow the names and IP addresses in their SPF record. (Yes, I guess this service is only for "techies.")
  3. If the form submits without telling you your mail server is wrong but you don't get an email pretty quick, log out (top right corner icon) and try registering again.
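For step 2.1, the SPF record is just a TXT record; `dig +short TXT startcom.org` fetches the real one. Extracting the addresses to whitelist looks like this, demonstrated against a made-up record of the same shape (the addresses below are documentation examples, not StartCom's):

```shell
# In real life: spf=$(dig +short TXT startcom.org)
# A made-up record of the same shape, for demonstration:
spf='v=spf1 ip4:192.0.2.10 ip4:198.51.100.0/24 mx include:example.net ~all'

# One mechanism per line, keep the ip4: entries, drop the prefix:
echo "$spf" | tr ' ' '\n' | grep '^ip4:' | cut -d: -f2
```

The include: and mx mechanisms would need their own lookups if you want to be thorough.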

If you'd like to learn more of the details or share my pain, read on:

All paths seemed to lead to getting a certificate, so I settled on starting with the StartSSL Free (Class 1) certificate, since I wasn't sure exactly what the requirements were to get the StartSSL Verified (Class 2) one. After deciding that "Sign Up" and "Express Lane" are the same thing, and seeing that I had to fill out the form as an individual, I entered my personal (Gmail) address.

This took me to a page asking me to check my email right away and copy/paste in the code they sent. Now, Gmail is usually very fast about showing new emails, but nothing was there. Not in Important and unread. Not in Everything else. Not even in Spam. Not several minutes later. The page was very insistent that I not leave or reload it, so in a new tab I started searching for answers.

The first answer I came across can be summarized thus: "it must be your problem", with no additional suggestions. I have come to identify this as a common communication style from StartCom:
Important! Experience has shown that the failure of email messages not arriving are always the fault of the receiving end. If the wizard confirms to having sent the message, i.e. no error occurred, than the message has been delivered and accepted by your mail server!
Surely they've had Gmail users go through this process before. So strange that it wouldn't work. After all, I wasn't using one of the blacklisted email providers listed on their enrollment page. I decided to try again from a different browser using my work email address, the one whose server I manage and have log access to. This is what I learned.

When you click Continue on the enrollment page, your server gets hit from one site; in my case it was []. If you have greylisting in place (the work server does) and it sends back an error like 450, the web page immediately tells you it couldn't deliver the email. It does mention that the problem could be greylisting, among other things, and basically says it's your fault. So you open up your greylisting to allow startcom.org through, but that doesn't seem to be enough, because for some reason the client name comes through as unknown. (Edit: I had recently upgraded our mail server, and I believe the "unknown" issue was a local configuration issue.)

So you add their IP address, and then the web page thinks all is well and sends you to the "wait for it" code confirmation page, but still no email. Why? Probably because the web page only does a test connection. Right after it sends you to the next page, another server, [] in my case, connects (also with client_name=unknown) and gets greylisted. So I sat there waiting, hoping for a retry, feeling stuck with no help. Back to searching in a new tab.

The second answer I came across also says "it must be your problem" :(
The program always sends the verification code! Do not blame us, if it does not arrive....we do not have control over your mail server and mail account!
Third time's the charm? Good thing I have three browsers installed. I checked the SPF (TXT) record for startcom.org, added all of the names and IP addresses listed there to my server's greylisting whitelist, and tried again from the third browser using the work email address. Success! The email made it to my inbox.

I didn't really want to finish the certificate process in my third-choice browser, so I went to the second browser and pasted the code there. It failed to verify, but the failure message told me something I would have loved to know long before. I didn't copy the exact message, sorry, but basically it said "if it fails, log out and try to sign in again". A "resend this request" button would have been better, but at least now I know that I don't have to stand like a deer in the headlights on the "wait for it" page when things fail.

Now I just have to wait 6 hours for the account to be reviewed, probably because I tried so many times.

Good luck. I may end up dabbling with CAcert, Comodo, or retreating to my own self-signed certificates again.