Thursday, November 19, 2009

Setting environment variables in Java

Have you ever noticed that the Java System class doesn't let you set environment variables? You can retrieve them using getenv() but there is no equivalent setenv() function.

First off, what is the environment? The manual entry for environ(7) describes the environment as:

an array of strings [that] have the form name=value. Common examples are USER (the name of the logged-in user), HOME (A user's login directory) and PATH (the sequence of directory prefixes that many programs use to search for a file known by an incomplete pathname).

It turns out that when you start up the JVM, it copies this environment into its own Map of Strings. The actual container it uses is an unmodifiable map, probably to be extra safe.

So in a running Java application we have 2 environments: the JVM copy that you can read via System.getenv() and the underlying environment that lives in the C library.
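
To see the read-only behaviour for yourself, here is a minimal sketch that tries to write to the Map returned by System.getenv(). The class name and the DEMO_VARIABLE name are made up for illustration; only System.getenv() comes from the JDK.

```java
import java.util.Map;

class ReadOnlyEnvDemo {
    // Returns true if the Map handed out by System.getenv() rejects writes.
    static boolean envRejectsWrites() {
        Map<String, String> env = System.getenv();
        try {
            // DEMO_VARIABLE is a hypothetical name used only for this test
            env.put("DEMO_VARIABLE", "value");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("System.getenv() is read-only: " + envRejectsWrites());
    }
}
```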

If we want to change the JVM's copy, we can do so using reflection. Or at least we should be able to, just as long as our code is not running in a sandbox. In that case you'd be right out of luck. Anyway, the code to fetch a modifiable copy of the environment could look like this:

import java.lang.reflect.Field;
import java.util.Map;
import java.util.HashMap;

public class Environment {
    @SuppressWarnings("unchecked")
    public static Map<String, String> getenv() {
        try {
            Map<String, String> unmodifiable = System.getenv();
            Class<?> cu = unmodifiable.getClass();
            // "m" is the private field that holds the wrapped, modifiable Map
            Field m = cu.getDeclaredField("m");
            m.setAccessible(true);
            return (Map<String, String>)m.get(unmodifiable);
        }
        catch (Exception e) {
            // reflection failed (e.g. in a sandbox): fall through to an empty Map
        }
        return new HashMap<String, String>();
    }
}

Calling System.getenv() returns an UnmodifiableMap of Strings containing the copy of the C environment. In order to get to its squishy modifiable heart, we access its m member variable. This works because UnmodifiableMap uses the proxy pattern - it implements the Map interface, but delegates all calls for retrieving values to a member variable Map that contains the values and does all the real work. The set methods throw UnsupportedOperationException. By accessing that m we can change the JVM's copy of the environment from under its very nose. Heh heh.
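
The delegation trick is easy to reproduce without any reflection. This is only a sketch using Collections.unmodifiableMap() on a Map of my own (the class name and the HOME value are invented for the example), but it shows why getting hold of the wrapped Map is enough: anything written to the backing Map shows up through the read-only view.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class UnmodifiableViewDemo {
    // Writes to the backing Map, then reads the value back through the view.
    static String readThroughView() {
        Map<String, String> backing = new HashMap<>();
        Map<String, String> view = Collections.unmodifiableMap(backing);
        backing.put("HOME", "/home/demo");   // allowed: we own the backing Map
        return view.get("HOME");             // the view delegates the lookup
    }

    // Returns true if writing through the view itself is refused.
    static boolean viewRejectsWrites() {
        Map<String, String> view = Collections.unmodifiableMap(new HashMap<String, String>());
        try {
            view.put("HOME", "/tmp");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("read through view: " + readThroughView());
        System.out.println("view rejects writes: " + viewRejectsWrites());
    }
}
```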

Not so fast. On Windows this is slightly different. The environment there contains a different Map that allows you to search for variables in a case-insensitive way. System.getenv() returns one Map (the unmodifiable one, same as Linux) whereas System.getenv("name") looks up the variable in the case-insensitive Map. In order to create a robust update-env implementation we should update both of these Maps. This needs some more reflection to get the Map in question out of the ProcessEnvironment, probably like this:

@SuppressWarnings("unchecked")
public static Map<String, String> getwinenv() {
    try {
        Class<?> sc = Class.forName("java.lang.ProcessEnvironment");
        Field caseinsensitive =
            sc.getDeclaredField("theCaseInsensitiveEnvironment");
        caseinsensitive.setAccessible(true);
        return (Map<String, String>)caseinsensitive.get(null);
    }
    catch (Exception e) {
        // the field could not be read: fall through to an empty Map
    }
    return new HashMap<String, String>();
}
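
If the idea of a case-insensitive Map sounds exotic, it can be pictured with a plain TreeMap and String.CASE_INSENSITIVE_ORDER. This is just a sketch of the behaviour, not the JDK's actual implementation, and the class name and values are invented.

```java
import java.util.Map;
import java.util.TreeMap;

class CaseInsensitiveEnvDemo {
    // A Map whose keys compare equal regardless of upper or lower case.
    static Map<String, String> newCaseInsensitiveMap() {
        return new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    }

    public static void main(String[] args) {
        Map<String, String> env = newCaseInsensitiveMap();
        env.put("Path", "C:\\Windows");
        // Looking up "PATH" finds the entry stored as "Path"
        System.out.println(env.get("PATH"));
    }
}
```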

An example of how this may be useful without going any further would be if you have the DISPLAY environment variable set but you cannot use it (for example you connect via ssh -X, start a background process, then disconnect closing the X session connection too). In this situation even if you tell java.awt to run headless it may see that DISPLAY is set, try to use it and throw an exception. By clearing DISPLAY we can use headless methods to create off-screen graphics, to send them to a printer or to save a fake screen shot to file maybe for automated testing.

However this is not enough to affect the environment for any child processes. They will still only see the original, unmodified environment that the JVM craftily made a copy of at start-up. The comment in java.lang.ProcessEnvironment says it all:

// We cache the C environment.  This means that subsequent calls
// to putenv/setenv from C will not be visible from Java code.

Grrrr! By far the easiest approach here is to just use the ProcessBuilder. This lets you change the environment before launching the child process. Win. End of post.

No! I really want to change the underlying C environment for no good reason!

In order to change that we have to resort to either JNI or JNA. Let's start with JNA, as it is the easier of the two and needs fewer tools. Just download a jar file and with the Java Compiler you're good to hack.

In case you have not heard of it, JNA is to Java what ctypes is to Python - a byte-code library that lets you open native, compiled, shared libraries and call the C routines within directly from Java code. JNI on the other hand requires you to write C or C++ functions, compile these, then call them from Java.

Armed with the JNA classes we can wrap the standard setenv() and unsetenv() functions from the C library.

import com.sun.jna.Library;
import com.sun.jna.Native;

public class Environment {
    public interface LibC extends Library {
        public int setenv(String name, String value, int overwrite);
        public int unsetenv(String name);
    }
    static LibC libc = (LibC) Native.loadLibrary("c", LibC.class);
}

That works fine on Linux, but on Windows we have to take a different approach. There, neither setenv nor unsetenv exists; instead we have to call _putenv. This function accepts a "name=value" string, and if we pass "name=" we can delete a variable from the environment. Unfortunately this multi-platform approach messes up the code a fair amount. Here is one way to do it:

// Inside the Environment class...
public interface WinLibC extends Library {
    public int _putenv(String name);
}
public interface LinuxLibC extends Library {
    public int setenv(String name, String value, int overwrite);
    public int unsetenv(String name);
}
static public class POSIX {
    static Object libc;
    static {
        if (System.getProperty("os.name").equals("Linux")) {
            libc = Native.loadLibrary("c", LinuxLibC.class);
        } else {
            libc = Native.loadLibrary("msvcrt", WinLibC.class);
        }
    }

    public int setenv(String name, String value, int overwrite) {
        if (libc instanceof LinuxLibC) {
            return ((LinuxLibC)libc).setenv(name, value, overwrite);
        }
        else {
            return ((WinLibC)libc)._putenv(name + "=" + value);
        }
    }

    public int unsetenv(String name) {
        if (libc instanceof LinuxLibC) {
            return ((LinuxLibC)libc).unsetenv(name);
        }
        else {
            return ((WinLibC)libc)._putenv(name + "=");
        }
    }
}

static POSIX libc = new POSIX();

Here we use JNA to load either libc on Linux or the msvcrt DLL (which contains _putenv) on Windows. There are ugly casts in there, and other OSes are left as an exercise to the reader, but this means that I can call POSIX.setenv() or unsetenv() and have it work.

To complete the picture, the JNI equivalent of this would be:

public class Environment {
    public static class LibC {
        public native int setenv(String name, String value, int overwrite);
        public native int unsetenv(String name);
        LibC() {
            System.loadLibrary("Environment_LibC");
        }
    }
    static LibC libc = new LibC();
}

The call to System.loadLibrary() loads a dynamic/shared library. On Linux it looks for "libEnvironment_LibC.so" and on Windows "Environment_LibC.dll". The implementation of those native calls could be like this C++ code:

#include "Environment_LibC.h"
#include <stdlib.h>
#ifdef WINDOWS
#include <string>
#endif

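// RAII wrapper: fetches the UTF-8 chars of a jstring, releases them on scope exit
struct JavaString
{
    JavaString(JNIEnv *env, jstring val):
        m_env(env),
        m_val(val),
        m_ptr(env->GetStringUTFChars(val, 0)) {}

    ~JavaString() {
        m_env->ReleaseStringUTFChars(m_val, m_ptr);
    }

    operator const char*() const {
        return m_ptr;
    }

    JNIEnv *m_env;
    // held by value: a reference to the constructor argument would dangle
    jstring m_val;
    const char *m_ptr;
};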

JNIEXPORT jint JNICALL Java_Environment_00024LibC_setenv
  (JNIEnv *env, jobject obj, jstring name, jstring value, jint overwrite)
{
    JavaString namep(env, name);
    JavaString valuep(env, value);
#ifdef WINDOWS
    std::string s(namep);
    s += "=";
    s += valuep;
    int res = _putenv(s.c_str());
#else
    int res = setenv(namep, valuep, overwrite);
#endif
    return res;
}

JNIEXPORT jint JNICALL Java_Environment_00024LibC_unsetenv
  (JNIEnv *env, jobject obj, jstring name)
{
    JavaString namep(env, name);
#ifdef WINDOWS
    std::string s(namep);
    s += "=";
    int res = _putenv(s.c_str());
#else
    int res = unsetenv(namep);
#endif
    return res;
}

You generate the header files using javah - that gives you the strange function names needed - and compile the code as C++ to produce a shared library:

javac Environment.java
javah Environment
g++ -shared Environment_LibC.cc -o libEnvironment_LibC.so -I$JAVA_HOME/include/linux -I$JAVA_HOME/include/

This library needs to be in one of the usual places to work - somewhere where it can be found by dlopen(). So either in /usr/lib, a directory where ldconfig looks, or in one of the paths in LD_LIBRARY_PATH. This depends on the OS you are using. For Windows, Solaris or Mac you need a whole different set of flags and incantations. You can see why I lean towards JNA, even though it has its own problems. For the record this is how to compile the JNI library on Windows using MinGW:

g++ -DWINDOWS -Wl,--kill-at -shared Environment_LibC.cc -o Environment_LibC.dll -I$JAVA_HOME/include/win32 -I$JAVA_HOME/include

That --kill-at switch is a real gotcha. Without it the function symbol that the MinGW compiler produces is not the one that the JVM was expecting. On Windows the library itself must be in the current directory, or in one of the directories listed in the PATH variable.

As you can see we repeat the whole setenv-and-unsetenv-do-not-exist dance and use _putenv() for both. Here I fudge it with a bit of ifdeffing. Meh.

Now that we have a way to call the C library's setenv() and unsetenv() (or equivalent), let's wrap it all up. Here are the final setenv() and unsetenv() functions that update the C environment and the Java one too:

// inside the Environment class...
public static int unsetenv(String name) {
    Map<String, String> map = getenv();
    map.remove(name);
    Map<String, String> env2 = getwinenv();
    env2.remove(name);
    return libc.unsetenv(name);
}

public static int setenv(String name, String value, boolean overwrite) {
    if (name.lastIndexOf("=") != -1) {
        throw new IllegalArgumentException(
            "Environment variable cannot contain '='");
    }
    Map<String, String> map = getenv();
    boolean contains = map.containsKey(name);
    if (!contains || overwrite) {
        map.put(name, value);
        Map<String, String> env2 = getwinenv();
        env2.put(name, value);
    }
    return libc.setenv(name, value, overwrite?1:0);
}

Curiously enough ProcessEnvironment is wired so as to validate the values that you add to the "unmodifiable" Map, but the case insensitive equivalent on Windows is not validated. If you try and add an invalid environment variable, such as one with a name that contains =, only the unmodifiable map will throw an IllegalArgumentException. This makes it fairly robust as the nasty name doesn't trickle down to the underlying C environment, but for Windows we have to do an extra check manually.

I've uploaded a tarball with all the files mentioned on here together with the dependencies to my GBA Remakes site. So now you've no excuse to not go off setting environment variables like mad.

Of course you shouldn't really do any of this. This post was 60% "if you really need to", 40% "might be useful". ProcessBuilder is the way to go for changing the environment in child processes.
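
For completeness, here is roughly what the sensible ProcessBuilder route looks like. It is a sketch only: the class name, the DEMO_GREETING variable and the choice of printenv as the child command (which assumes a Unix-like system) are mine, while ProcessBuilder.environment() is the standard JDK call.

```java
import java.util.Map;

class ChildEnvDemo {
    // Builds a ProcessBuilder whose child will see one extra variable.
    static ProcessBuilder withVariable(String name, String value, String... command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        // environment() starts as a copy of the parent's environment and,
        // unlike System.getenv(), it is modifiable
        Map<String, String> env = pb.environment();
        env.put(name, value);
        return pb;
    }

    public static void main(String[] args) throws Exception {
        // Assumes printenv is on the PATH; DEMO_GREETING is a made-up name
        ProcessBuilder pb = withVariable("DEMO_GREETING", "hello",
                                         "printenv", "DEMO_GREETING");
        pb.inheritIO();                      // child writes to our stdout
        int exitCode = pb.start().waitFor();
        System.out.println("child exited with " + exitCode);
    }
}
```

The same Map also supports remove(), so dropping something like DISPLAY for a child process needs no reflection or native code at all.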

The only thing that you might want to do is change the environment in a running Java process, but even then it is probably easier to create a wrapper script or batch file that launches your program and fiddle the environment prior to launching the JVM. Using reflection to access all those inner member variables is pretty flaky - if they change their name in some future version, your code will stop working. Happy hacking :-)

Wednesday, November 11, 2009

Optimisations

If you have been reading the changes that I've been pushing out to Bunjalloo lately, you will have noticed that there are quite a few that are aimed at optimising parts of the code. There was a lot of low-hanging fruit in there and the rendering of some 200+ comment threads on reddit was starting to annoy me.

There's an old rule when it comes to optimising your code; don't. 2nd verse, same as the first. The 3rd rule of optimisation is to profile your code first. So that's what I did.

The good old way to see where the bottlenecks are is to use gprof. You compile with profiling on by passing a couple of flags to gcc, run the program and it spits out a profile upon exit. You then run the gprof command line tool and that interprets the profile to give you a table of hot spots. From commit 5beaffd581ed it looked something like this:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 10.00      0.02     0.02   103228     0.00     0.00  Font::valueToIndex(unsigned int) const
 10.00      0.04     0.02    68648     0.00     0.00  std::_List_iterator<HtmlElement*>::operator--()
  7.50      0.06     0.01   701479     0.00     0.00  std::less<unsigned int>::operator()(unsigned int const&, unsigned int const&) const

Right there you can see that Font::valueToIndex was taking a lot of time, but I already knew that that method could be improved. The other C++ STL calls were trickier to track down. I use all sorts of STL containers, so the problem could have been anywhere...

A couple of years ago Google released their performance analysis tools. One of the advantages google-perftools' CPU profiler has over gprof is that the output shows function calls line-by-line in a nifty graphical representation. Running the profiler on the same code gave this output:

[Image: thumbnail of the analysis, linked to the original full-size image]

From there, I could trace the call graph back from the "slow" STL call to my code and see that the use of a std::set was causing problems. It was overkill anyway and a simple change improved things there.

The different output in the perftools analysis is probably due to the way it works compared to gprof. With gprof, you get a call to the profiling library added to each of your program's functions, which means you need to recompile all of your code. With the Google tools you don't need to recompile your code, you just have to link in the profiler library to your executable. It then hooks in to the start of the program and uses timers and signals to instrument the code, adding traces at the current point of execution. The result is that code that takes longer has more "hits", but the result is not exactly the same as gprof's output.

This is why being able to compile NDS code on the PC is a really good idea. These things are not impossible on the DS of course - counting the hblanks your slow function takes by changing the background color is a classic technique - but in general running unit tests, debugging, and doing performance analysis are much tougher to do on an embedded device.

Enough waffle. All this means that the next version of Bunjalloo may be a bit nippier, hopefully.

Friday, November 06, 2009

NDS thumbnail preview in Nautilus

When you insert a cartridge into a Nintendo DS, the start menu shows a little image or logo of the game. Homebrew games for the DS can also add their own logos and these show up when you view them in Moonshell or whatever menu your cartridge happens to use.

Ages ago I wrote a little script to show these images as the preview icon for NDS files in the Gnome file browser Nautilus. The script is about a hundred lines of Python and it's available here.

The setup file in that download tarball associates the icon-extracting command to the MIME-type of the Nintendo DS file. If you use an older distro then you may have to associate the script with the generic application/octet-stream MIME type. The script filters by file name anyway. This association is not done using the usual Nautilus scripting method, where you add a script to ~/.gnome2/nautilus-scripts/, but rather using gconf variables.

If you need to uninstall the icon preview for some reason, you can run:

gconftool-2 -u /desktop/gnome/thumbnailers/application@x-nintendo-ds-rom

Oh, and while we're here I recommend you go and play the absolutely brilliant nostalgia-fest that is Manic Miner in the Lost Levels.

Sunday, August 16, 2009

Game Boy Remakes moves home

Geocities has been around in one form or another since about 1995, but Yahoo has recently announced that the web hosting service will close this October. With this in mind, I've moved all my Game Boy Advance remakes and BBC emulation to a new home on Google Sites.

Do you remember when people said "don't forget to update your bookmarks!" when they changed their website address? Now nobody does that because everyone uses a search engine to find what they need. That means that when Geocities closes, all the search-juice for the old site will be lost. So here I'm sowing a little search seed, which will hopefully grow and let people find these remakes when Yahoo finally pulls the plug. BTW, closing down Geocities seems a really odd thing to do. The limits on bandwidth were tiny (4MB per hour!), the sites are full to the brim with adverts. It must have been costing Yahoo nothing to host all of them.

Anyway the migration was completely manual, but hopefully I haven't missed anything and the new web site should have all of the old content on it.

As I moved all of the content over, I was reading through a lot of the stuff I had up on the old site. I really miss remaking these old games, I'd love to get back into that again one more time. Reading through old assembly code figuring out the algorithms used and rewriting them in C is pretty good fun. And at the end you get a game to play too. I have a few ideas for a remake, but I don't want to promise anything. I know from experience that announcing something that is a work in progress, or even less than that, can suck the life out of the effort.

Monday, July 20, 2009

Migration to Mercurial

I've taken the plunge and decided to integrate all of the Bunjalloo code onto the Google Code site. This meant I migrated from the original Github-hosted Git repository to make use of the new Mercurial support that GC added not so long ago.

The main reason I migrated was to get all the issue tracking, wiki and code changes on to one site. I really had 2 choices: migrate the main site to Github and use the issues and wiki on there, or migrate the code to Google Code and use Mercurial. I quite like Github and I think it is amazing that there are so many brilliant free hosting services, but I really prefer the Google Code interface from a user's point of view. It is generally less cluttered.

So what does this change get us? First off, we lose github's famous "social coding" features. I had one fork in 2 years, with 0 additional commits. No great loss there then. You can still email patches to the discussion list of course, and of course with either DVCS the author information is retained. I gain the fact that I can now reference changes a bit easier in bug reports and close tickets straight from commit messages. Assuming I even write any more code or fix any bugs ;-)

I had a couple of pet peeves with Mercurial, mostly that I really missed gitk's features. Luckily I discovered what I assume everyone else must use: Tortoise HG. This provides "hgtk log" on Linux that works a lot better than the "hg view" history viewer, which is a port of a really ancient version of gitk. The most important feature that hgtk has is a refresh button so I don't have to keep killing and restarting the application each time I make some changes.

Some other areas that I saw as weaknesses were to do with Mercurial's history fiddling, or lack thereof. However I decided that I'm probably better off not messing with history too much anyway. Lately I've tended to avoid doing that in Git and I get more done, even if the revision log is not as clean as it could be. I finally understood the way to do this using Mercurial Queues anyway, even if it is a bit more fiddly.

My final missing feature was the "commit -v" feature, which shows you the patch of the commit that you are making without having to open up a separate console. This hasn't been fixed, but I've worked around it by writing a Vim script to do something similar. Pressing "K" shows the diff of the current tree in a new buffer. This actually works out pretty well as I can see a patch and write a comment at the same time, rather than having to jump back to the top as I did with the "commit -v" thing.

To do the actual Git to Mercurial conversion I used the "hg convert" extension that is included with core Mercurial. That worked flawlessly and made switching really easy. The conversion guide on the Google Code support site has detailed steps on what to do when converting from Subversion, but I'll describe a few gotchas that I found with the Git to Hg transition.

The recently released Mercurial 1.3 included a tiny patch that I wrote to generate a slightly nicer log for git conversions, so I used that version. You see, Git tracks both the author of a patch and the person who made the actual commit, but Mercurial only tracks the "user". The user is equivalent to Git's "committer" by default, while author information is assumed to be the same dude. When you ran hg convert, it added a line like "committer: A.Committer " to the Mercurial log message for every commit, even if the author and committer were one and the same in the original git repository. It looked a bit silly when 99.9% of the commits were my own. So my patch made sure that the "committer" line only got added if the author of the original change was not the same as the committer of the change.

Another interesting niggle: Git has 2 different types of tag; lightweight and annotated. Annotated tags can also be gpg signed, but that wasn't the case in my repository. The difference between lightweight and annotated tags is pretty subtle. As far as I understand it, a lightweight tag is simply a reference to a commit ID, while an annotated tag also has its own blob in the Git database.

By default "git tag mytag HEAD" will create a tag of the lightweight variety. This is apparently the Wrong Thing To Do and one of the few places that Git's default behaviour is not the best option. You really should pass the "-a" option to create an annotated tag. Suffice to say that I used the default lightweight type of tags for quite a while until I discovered my mistake. The "hg convert" extension doesn't convert lightweight tags at all, it only converts the chunky annotated kind. This is possibly by design (maybe by misunderstanding?) as it would be easy to fix the convert extension to convert either type of tag.

The easiest workaround for me was to just convert my git lightweight tags to annotated tags in the source git repository using the "--force" option to overwrite the old ones. The convert process picked these up and converted them over correctly. Interestingly enough Justin Williams had posted about a similar problem and his timing was perfect to ask it over on StackOverflow.com.

Now that I've used DVCS a bit more and the novelty of branching has worn off, I decided that I wanted the minimal number of heads in my new Mercurial repository. I also wanted to maintain as much of the released history as possible. Luckily the history was mostly linear. I did create a couple of branches for maintenance releases of Bunjalloo early on, but after about version 0.4 I just made releases from the trunk.

Originally the repository was in Subversion and I pulled in the tag branches with git-svn too. This led to a few branch stubs with a single commit ("creating tag blah") with a corresponding git tag that I must have created at some point later on. I used the Mercurial Queues extension to trim these out of the history where applicable so that the final repo has just 2 heads - the main trunk and an old, closed maintenance branch from the 0.3 days.

Oh, when you install Mercurial from source on Ubuntu (possibly on any Debian derivative?) it rather inexplicably creates an /etc/mercurial/hgrc file that enables all of the extensions. This led me to (re)discover a bug with the inotify extension when used in conjunction with Mercurial Queues. My solution was to simply disable the inotify extension (in fact just removing the /etc/mercurial directory and enabling what you need in $HOME/.hgrc is a better idea overall).

Anyway, feel free to check out the code and send me your patches to fix all of those open issues! :-)

Wednesday, July 08, 2009

Bunjalloo 0.7.5

I've just put up yet another new version of Bunjalloo. This one fixes a load of bugs that caused lots of top pages to be broken. In particular you can log in to GMail again. Yay!

Changes in 0.7.5:
  • Improvements to caching - logging in to GMail works again
  • Clicking preference icon goes straight to preferences
  • Fix encoding problems that caused crashes
  • Fixed lots of non-ascii character keyboard bugs
  • Fix configuration changes that use escapable % characters
You may have to manually fix the download path in your configuration settings. This is because the download path could have become messed up and show all path separators as %2F instead of /.

In the next release I want to fix cookies so that you don't have to enter your name and password into all the sites that you log in to. I changed the password on my google account from something like "password" to a good strong one and it's a pain typing it in all the time on the DS ;-)

Monday, July 06, 2009

Creative Zen Mozaic on Ubuntu 9.04 Jaunty

I've just been given a Creative Zen Mozaic portable audio player for my birthday (it's a bit early, but I'm not complaining!). I've had a Zen Micro for a few years now and it's a pretty decent gizmo. Small enough to fit in my pocket, has stood up to several falls, has a UI that gets out of the way and it works with Gnomad2.

Update for Ubuntu 9.10: The Mozaic now runs out of the box, no changes needed. I'll leave this here for posterity.

After plugging the new Mozaic in to my laptop, which runs the latest release of Ubuntu (9.04 Jaunty Jackalope at the time of writing), I discovered that (horror!) it wasn't recognised by Gnomad2. It seems like it's a common problem. The solution is sort of linked to on that bug report, but the help is in French and the file you have to change has moved since that was written. You couldn't make up a better "open source documentation sucks" anecdote even if you tried.

So.. using your favourite editor open up the file /lib/udev/rules.d/45-libmtp8.rules and add the magic lines:
# Creative ZEN Mozaic
ATTR{idVendor}=="041e", ATTR{idProduct}=="4161", SYMLINK+="libmtp-%k", MODE="660", GROUP="audio"
You'll have to do that with super powers, so use something like this to open the file with the correct permissions using gedit:

gksu gedit /lib/udev/rules.d/45-libmtp8.rules

Add the lines right before the first "#Creative ZEN" line that is already in there, just to be on the safe side. This can be derived from first principles actually, because if you run lsusb it'll say something like:

Bus 001 Device 004: ID 041e:4161 Creative Technology, Ltd

Which ties in pretty nicely to the line you have to add to the 45-libmtp8.rules file.
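
To spell out how the two tie together, here is a small sketch that pulls the vendor and product IDs out of an lsusb line and builds the matching rule. The class and method names are invented for the example, and the SYMLINK, MODE and GROUP parts are simply copied from the rule above rather than worked out from anything.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UdevRuleFromLsusb {
    // lsusb prints the IDs as "ID vendor:product", four hex digits each
    private static final Pattern ID =
        Pattern.compile("ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})");

    // Builds a libmtp-style udev rule from one line of lsusb output.
    static String ruleFor(String lsusbLine) {
        Matcher m = ID.matcher(lsusbLine);
        if (!m.find()) {
            throw new IllegalArgumentException("No vendor:product ID in: " + lsusbLine);
        }
        // %%k is an escaped percent sign, so the rule ends up containing %k
        return String.format(
            "ATTR{idVendor}==\"%s\", ATTR{idProduct}==\"%s\", "
            + "SYMLINK+=\"libmtp-%%k\", MODE=\"660\", GROUP=\"audio\"",
            m.group(1), m.group(2));
    }

    public static void main(String[] args) {
        System.out.println(ruleFor(
            "Bus 001 Device 004: ID 041e:4161 Creative Technology, Ltd"));
    }
}
```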

Anyway, after adding that unplug the Mozaic and plug it back in again. That forces udev to reread the settings you have just changed. Start Gnomad2 and hopefully it will recognise your player and list all the tracks on it. Yipee!