Thursday, December 20, 2012

git hangs after "Resolving deltas"

Have had a funy problem with Git. I suppose it's proxy-related. Writing it down, because sure that will have the same problem some time again. Also hope it will help to people who are also suffering with it.

As a precondition, I have a git with following in '.gitconfig':

[http]
proxy=http://user:password@proxy:8080

When I tried to clone repository I've got this:

$ git clone https://code.google.com/p/caliper/
Cloning into 'caliper'...
remote: Counting objects: 3298, done.
remote: Finding sources: 100% (3298/3298), done.
remote: Total 3298 (delta 1755)
Receiving objects: 100% (3298/3298), 7.14 MiB | 1.94 MiB/s, done.
Resolving deltas: 100% (1755/1755), done.

And then nothing, it just hangs. If you go and have a look, you can see that files are downloaded, but not unpacked. As all other people on Internet, I have no idea why that is happening, but eventually I have found a way to get files out of it.

When it hangs, just kill the process with Ctrl+C and run this command in repository folder:

$ git fsck
notice: HEAD points to an unborn branch (master)
Checking object directories: 100% (256/256), done.
Checking objects: 100% (3298/3298), done.
notice: No default references
dangling commit 2916d1238ca0f4adecbda580ef4329a649fc777c
Now just merge that dangling commit:
$ git merge 2916d1238ca0f4adecbda580ef4329a649fc777c
and from now on you can enjoy repository content in any way you want.

Thursday, December 13, 2012

File.setLastModified & File.lastModified

Have observed interesting behavior of File.lastModified file property on Linux. Basically, my problem was that I was incrementing the value of that property by 1 in one thread and monitoring the change in the other thread. And apparently no change in property's value happened, the other thread did not see increment. After some time trying to make it work, I realized that I have to increment it at least by a 1000 to make the change visible.

Wondering why that is happening, I have had a look at JDK source code and that's what I found:

JNIEXPORT jlong JNICALL
Java_java_io_UnixFileSystem_getLastModifiedTime(JNIEnv *env, jobject this,
                                                jobject file)
{
    jlong rv = 0;

    WITH_FIELD_PLATFORM_STRING(env, file, ids.path, path) {
        struct stat64 sb;
        if (stat64(path, &sb) == 0) {
            rv = 1000 * (jlong)sb.st_mtime;
        }
    } END_PLATFORM_STRING(env, path);
    return rv;
}

What happens is that on Linux File.lastModified has 1sec resolution and simply ignores milliseconds. I'm not an expert in Linux programming, so not sure is there any way get that time with millisecond resolution on Linux. Assume it should be possible because 'setLastModified' seems like is working as it is expected to work - sets modification time with millisecond resolution (you can find the source code in 'UnixFileSystem_md.c').

So, just a nice thing to remember: when you work with files on Linux, you may not see change in File.lastModified when it's value updated for less than 1000ms.

Wednesday, October 24, 2012

Effective Concurrency by Herb Sutter

Have never ever written feedback on events or courses, but here I decided to write one. It is about "Effective concurrency" course by Herb Sutter. Hopefully that post will help to someone to support an approval for that course :)

So, as I have already said, a few weeks back I was lucky enough to attend "Effective Concurrency" course by Herb Sutter. That guy is software architect at Microsoft where he has been the lead designer of C++/CLI, C++/CX, C++ AMP, and other technologies. He also has served for a decade as chair of the ISO C++ standards committee. Many people also know him for his books.

Tuesday, September 11, 2012

Building OpenJDK on Windows

Experimenting with some stuff, I found that it is often useful to have JDK source code available in hand to make some changes, play with it, etc. So I decided to download and compile that beast. Apparently, it took me some time to do that, although my initial thought was that it's should be as simple as running make command :). As you can guess, I found that it's not a trivial task and to simplify my life in future, it would be useful to keep some records of what I was doing.

Saturday, May 19, 2012

Bug in Java Memory Model implementation

Just have came around amazing question on stackoverwflow:

http://stackoverflow.com/questions/10620680/why-volatile-in-java-5-doesnt-synchronize-cached-copies-of-variables-with-main

Basically the guy there is trying to use "piggybacking" to publish non-volatile variable and it doesn't work. "piggybacking" is a technique that uses data visibility guarantees of volatile variable or monitor to publish non-volatile data. For example such technique is used in ConcurrentHashmap#containsValue() and ConcurrentHashmap#containsKey(). The fact that is doesn't work in that case is a bug in Oracle's Java implementation. And that is rather scary - concurrency problems are very hard to indentify even on bug-free JVM and such bugs in Memory Model implementation making things much worse. Hopefully that's the only bug related to JMM and Oracle has good test coverage for such cases.

The good news is that this particular problem appears just on C1 (client Hotspot compiler) and not in all cases. It doesn't happen on C2 (server compiler, enabled with "-server" switch). Fortunately, the most of people are running java on server side and there are quite a few client application which are using advanced concurrency features.

For ones who want to understand that case better, please, follow the link, I've provided at the beginning of post. Also there is very useful post on "concurrency-interest", which also has a good explanation of what is going on there: http://cs.oswego.edu/pipermail/concurrency-interest/2012-May/009449.html

Monday, February 6, 2012

What is behind System.nanoTime()?

In java world there is a very good perception about System.nanoTime(). There is always some guys who says that it is fast, reliable and, whenever possible, should be used for timings instead of System.currentTimemillis(). In overall he is absolutely lying, it is not bad at all, but there are some drawback which developer should be aware about. Also, although they have a lot in common, these drawbacks are usually platform-specific.

Windows

Functionality is implemented using QueryPerformanceCounter API, which is known to have some issues. There is possibility that it can leap forward, some people are reporting that is can be extremely slow on multiprocessor machines, etc. I spent a some time on net trying to find how exactly QueryPerformanceCounter works and what is does. There is no clear conclusion on that topic but there are some posts which can give some brief idea how it works. I would say that the most useful, probably are that and that ones. Sure, one can find more if search a little bit, but info will be more or less the same.

So, it looks like implementation is using HPET, if it is available. If not, then it uses TSC with some kind of synchronization of the value among CPUs. Interestingly that QueryPerformanceCounter promise to return value which increases with constant frequency. It means that in case of using TSC and several CPUs it may have some difficulties not just with the fact that CPUs may have just different value of TSC, but also may have different frequency. Keeping all that in mind Microsoft recommends to use SetThreadAffinityMask to stuck thread which calls to QueryPerformanceCounter to single processor, which, obviously, is not happening in JVM.

Linux

Linux is very similar to Windows, apart from the fact that it is much more transparent (I managed to download sources :) ). The value is read from clock_gettime with CLOCK_MONOTONIC flag (for real man, source is available in vclock_gettime.c from Linux source). Which uses either TSC or HPET. The only difference with Windows is that Linux not even trying to sync values of TSC read from different CPUs, it just returns it as it is. It means that value can leap back and jump forward with dependency of CPU where it is read. Also, in contrast to Windows, Linux doesn't keep change frequency constant. On the other hand, it definitely should improve performance.

Solaris

Solaris is simple. I believe that via gethrtime it goes to more or less the same implementation of clock_gettime as linux does. The difference is that Solaris guarantees that counter will not leap back, which is possible on Linux, but it is possible that the same value will be returned back. That guarantee, as can be observed from source code, is implemented using CAS, which requires sync with the main memory and can be relatively expensive on multi-processor machines. The same as on Linux, change rate can vary.

Conclusion

The conclusion is king of cloudy. Developer has to be aware that function is not perfect, it can leap back or just forward. It may not change monotonically and change rate can vary with dependency on CPU clock speed. Also, it is not as fast as many may think. On my Windows 7 machine in a single threaded test it is just about 10% faster than System.currentTimeMillis(), on multi threaded test, where number of threads is the same as number of CPUs, it is just the same. And on IBM Z400 workstation with WinXP System.nanoTime() is always approximately 8 times slower.

So, in overall, all it gives is increase in resolution, which may be important for some cases. And as a final note, even when CPU frequency is not changing, do no think that you can map that value reliably to system clock, see details here (this example describes just Windows, but more or less the same stuff is applicable to all other OSes).

Appendix

Appendix contains implementations of the function for different OSes. Source code is from OpenJDK v.7.

Solaris

// gethrtime can move backwards if read from one cpu and then a different cpu
// getTimeNanos is guaranteed to not move backward on Solaris
inline hrtime_t getTimeNanos() {
  if (VM_Version::supports_cx8()) {
    const hrtime_t now = gethrtime();
    // Use atomic long load since 32-bit x86 uses 2 registers to keep long.
    const hrtime_t prev = Atomic::load((volatile jlong*)&max_hrtime);
    if (now <= prev)  return prev;   // same or retrograde time;
    const hrtime_t obsv = Atomic::cmpxchg(now, (volatile jlong*)&max_hrtime, prev);
    assert(obsv >= prev, "invariant");   // Monotonicity
    // If the CAS succeeded then we're done and return "now".
    // If the CAS failed and the observed value "obs" is >= now then
    // we should return "obs".  If the CAS failed and now > obs > prv then
    // some other thread raced this thread and installed a new value, in which case
    // we could either (a) retry the entire operation, (b) retry trying to install now
    // or (c) just return obs.  We use (c).   No loop is required although in some cases
    // we might discard a higher "now" value in deference to a slightly lower but freshly
    // installed obs value.   That's entirely benign -- it admits no new orderings compared
    // to (a) or (b) -- and greatly reduces coherence traffic.
    // We might also condition (c) on the magnitude of the delta between obs and now.
    // Avoiding excessive CAS operations to hot RW locations is critical.
    // See http://blogs.sun.com/dave/entry/cas_and_cache_trivia_invalidate
    return (prev == obsv) ? now : obsv ;
  } else {
    return oldgetTimeNanos();
  }
}

Linux

jlong os::javaTimeNanos() {
  if (Linux::supports_monotonic_clock()) {
    struct timespec tp;
    int status = Linux::clock_gettime(CLOCK_MONOTONIC, &tp);
    assert(status == 0, "gettime error");
    jlong result = jlong(tp.tv_sec) * (1000 * 1000 * 1000) + jlong(tp.tv_nsec);
    return result;
  } else {
    timeval time;
    int status = gettimeofday(&time, NULL);
    assert(status != -1, "linux error");
    jlong usecs = jlong(time.tv_sec) * (1000 * 1000) + jlong(time.tv_usec);
    return 1000 * usecs;
  }
}

Windows

jlong os::javaTimeNanos() {
  if (!has_performance_count) {
    return javaTimeMillis() * NANOS_PER_MILLISEC; // the best we can do.
  } else {
    LARGE_INTEGER current_count;
    QueryPerformanceCounter(¤t_count);
    double current = as_long(current_count);
    double freq = performance_frequency;
    jlong time = (jlong)((current/freq) * NANOS_PER_SEC);
    return time;
  }
}

References

Inside the Hotspot VM: Clocks, Timers and Scheduling Events
Beware of QueryPerformanceCounter()
Implement a Continuously Updating, High-Resolution Time Provider for Windows
Game Timing and Multicore Processors
High Precision Event Timer (Wikipedia)
Time Stamp Counter (Wikipedia)