...

пятница, 8 февраля 2019 г.

Making Git for Windows work in ReactOS

Good day to you! image

My name is Stanislav and I like to write code. This is my first english article on Habr which I made due to several reasons:

This article is an english version of my very first article on russian.

Let me introduce the main figures in this story who actually fixed the bug preventing Git from running in ReactOS — the French developer Hermès Bélusca-Maïto (or just Hermes with hbelusca nickname) and of course me (with x86corez nickname).

The story begins with the following messages from the ReactOS Development IRC channel:

Jun 03 18:52:56 <hbelusca> Anybody want to work on some small problem? If so, can someone figure out why this problem https://jira.reactos.org/browse/CORE-12931 happens on ReactOS? :D
Jun 03 18:53:13 <hbelusca> That would help having a good ROS self-hosting system with git support.
Jun 03 18:53:34 <hbelusca> (the git assertion part only).

Debriefing

Since ReactOS target platform is Windows Server 2003, Git version 2.10.0 was chosen for the investigation — it's the last one supporting Windows XP and 2003.

Testing was done in the ReactOS Command Prompt, external symptoms of the problem were rather ambiguous. For the instance, when you run git without additional parameters, it displays the help message in the console without any problems. But once you try git clone or even just git --version in most cases the console just remains empty. Occasionally it displayed a broken message with an assertion:

git.exe clone -v "https://github.com/minoca/os.git" "C:\Documents and Settings\Administrator\Bureau\minocaos"

A s s e r t i o n   f a i l e d ! 

P r o g r a m :   C : \ P r o g r a m   F i l e s \ G i t \ m i n g w 3 2 \ b i n \ g i t . e x e 
F i l e :   e x e c _ c m d . c ,   L i n e   2 3 
E x p r e s s i o n :   a r g v 0 _ p a t h 

This application has requested the Runtime to terminate in an unusual way.
Please contact the application's support team for more information.

Fortunately git is an open source project, and it helped a lot in our investigation. It was not too hard to locate the actual code block where unhandled exception happened: https://github.com/git-for-windows/git/blob/4cde6287b84b8f4c5ccb4062617851a2f3d7fc78/exec_cmd.c#L23

char *system_path(const char *path)
{
    /* snip */

#ifdef RUNTIME_PREFIX
    assert(argv0_path); // it asserts here
    assert(is_absolute_path(argv0_path));

    /* snip */
#endif

    strbuf_addf(&d, "%s/%s", prefix, path);
    return strbuf_detach(&d, NULL);
}

And argv0_path variable value is assigned by this code:

const char *git_extract_argv0_path(const char *argv0)
{
    const char *slash;

    if (!argv0 || !*argv0)
        return NULL;

    slash = find_last_dir_sep(argv0);

    if (slash) {
        argv0_path = xstrndup(argv0, slash - argv0);
        return slash + 1;
    }

    return argv0;
}

Once I obtained all this information I've sent some messages to the IRC channel:

Jun 03 19:04:36 <x86corez> hbelusca: https://github.com/git-for-windows/git/blob/4cde6287b84b8f4c5ccb4062617851a2f3d7fc78/exec_cmd.c#L23
Jun 03 19:04:41 <x86corez> assertion is here
Jun 03 19:04:57 <hbelusca> yes I know, I've seen the code yesterday. The question is why it's FALSE on ROS but TRUE on Windows.
Jun 03 19:06:02 <x86corez> argv0_path = xstrndup(argv0, slash - argv0);
Jun 03 19:06:22 <x86corez> xstrndup returns NULL %-)
Jun 03 19:06:44 <hbelusca> ok, so what's the values of argv0 and slash on windows vs. on ROS? :P
Jun 03 19:08:48 <x86corez> good question!

The variable name suggests that the actual value is derived from argv[0] — usually it contains the executable name in zero index. But subsequently everything was not so obvious...

Jun 03 20:15:21 <x86corez> hbelusca: surprise... git uses its own xstrndup implementation
Jun 03 20:15:35 <x86corez> so I can't simply hook it xD
Jun 03 20:15:56 <hbelusca> well, with such a name "xstrndup" it's not surprising it's its own implementation
Jun 03 20:16:04 <x86corez> probably I would need an user-mode debugger... like OllyDbg
Jun 03 20:16:09 <hbelusca> that's everything but standardized function.
Jun 03 20:16:24 <hbelusca> x86corez: ollydbg should work on ROS.
Jun 03 20:16:30 <mjansen> what are you breaking today?
Jun 03 20:16:44 <x86corez> mjansen: https://jira.reactos.org/browse/CORE-12931
Jun 03 20:16:51 <hbelusca> (of course if you also are able to compile that git with symbols and all the stuff, it would be very nice)

Action

After that I decided to compile git straight from the source, so I'll be able to print variables of the interest "on the fly" directly to the console. I've followed this step-by-step tutorial and in order to compile compatible git version I've selected this branch: https://github.com/git-for-windows/git/tree/shears/v2.10.0-rc2

Setting up mingw32 toolchain went smoothly and I just started building. However almost always there are undocumented pitfalls in such step-by-step tutorials, and I immediately encountered one of them:

image

Through trial and error, as well as using hints from the hall (IRC channel), all compile time errors were fixed. If somebody wants to follow my steps, here is the diff to make compiler happy: https://pastebin.com/ZiA9MaKt

In order to avoid calling multiple functions during initialization and to reproduce the bug easily, I decided to put several debug prints right in the beginning of the main() function, which is located in common-main.c in the case of git:

int main(int argc, const char **argv)
{
    /*
     * Always open file descriptors 0/1/2 to avoid clobbering files
     * in die().  It also avoids messing up when the pipes are dup'ed
     * onto stdin/stdout/stderr in the child processes we spawn.
     */
    //DebugBreak();
    printf("sanitize_stdfds(); 1\n");
    sanitize_stdfds();

    printf("git_setup_gettext(); 1\n");
    git_setup_gettext();

    /*
     * Always open file descriptors 0/1/2 to avoid clobbering files
     * in die().  It also avoids messing up when the pipes are dup'ed
     * onto stdin/stdout/stderr in the child processes we spawn.
     */
    printf("sanitize_stdfds(); 2\n");
    sanitize_stdfds();

    printf("git_setup_gettext(); 2\n");
    git_setup_gettext();

    printf("before argv[0] = %s\n", argv[0]);
    argv[0] = git_extract_argv0_path(argv[0]);
    printf("after argv[0] = %s\n", argv[0]);

    restore_sigpipe_to_default();
    printf("restore_sigpipe_to_default(); done\n");

    return cmd_main(argc, argv);
}

I've got the following output:

C:\>git --version
sanitize_stdfds(); 1
git_setup_gettext(); 1
sanitize_stdfds(); 2
git_setup_gettext(); 2
before argv[0] = git
after argv[0] = git
restore_sigpipe_to_default(); done

A s s e r t i o n   f a i l e d ! 
(cutted a part of error message which is the same as the one above)

One can assume everything is fine here, argv[0] value is correct. I've got an idea to run git inside the debugger, OllyDbg for example, but something went wrong...

Jun 04 01:54:46 <sanchaez> now please try gdb/ollydbg in ROS
Jun 04 01:58:11 <sanchaez> you have gdb in RosBE
Jun 04 01:58:20 <sanchaez> just in case :p
Jun 04 01:59:45 <x86corez> ollydbg says "nope" with MEMORY_MANAGEMENT bsod
Jun 04 02:00:07 <x86corez> !bc 0x0000001A
Jun 04 02:00:08 <hTechBot> KeBugCheck( MEMORY_MANAGEMENT );
Jun 04 02:00:13 <hbelusca> :/
Jun 04 02:00:49 <sanchaez> welp
Jun 04 02:00:56 <sanchaez> you only have one option now :D

And right here sanchaez suggested an excellent idea that shed light on many things!

image

Assertion no longer occurred, and git successfully printed its version.

Jun 04 02:23:40 <x86corez> it prints!
Jun 04 02:23:44 <x86corez> but only in gdb
Jun 04 02:23:53 <hbelusca> oh
Jun 04 02:24:00 <hbelusca> C:\git/git.exe
Jun 04 02:24:13 <hbelusca> I wonder whether it's the same in windows, or not.

The case moved from the dead center, and I tried different ways to run git in the command prompt, and found the right way!

image

The problem was clearly that git expected the full path on the command line. So I compared its debug output on Windows. The results surprised me a bit.

image

For some reason argv[0] value contained the full path to the git.exe binary.

Jun 05 23:01:44 <hbelusca> x86corez: can you try to run git also by not using cmd.exe?
Jun 05 23:02:05 <hbelusca> (to exclude the possibility it's cmd that doesn't call Createprocess with a complete path)
Jun 05 23:02:09 <hbelusca> while I think it should...
Jun 05 23:02:30 <x86corez> not using cmd... moment
Jun 05 23:02:55 <hbelusca> x86corez: alternatively, on windows, try starting git using our own cmd.exe :)

Hermes suggested to check whether ReactOS cmd.exe is a guilty component here...

image

But this screenshot confirmed that the actual problem is somewhere else.

Jun 05 23:04:38 <x86corez> ROS cmd is not guilty
Jun 05 23:07:57 <hbelusca> If there was a possibility to consult the received path, before looking at the contents of argvs... ?
Jun 05 23:08:30 <x86corez> dump contents of actual command line?
Jun 05 23:08:39 <hbelusca> yeah
Jun 05 23:09:39 <hbelusca> The thing you retrieve using GetCommandLineW
Jun 05 23:10:03 <hbelusca> (which is, after simplifications, basically : NtCurrentPeb()->ProcessParameters->CommandLine )
Jun 05 23:10:59 <hbelusca> Also I was thinking it could be a side-effect of having (or not having) git path into the env-vars....
Jun 05 23:12:17 <x86corez> hbelusca, command line is "git  --version"
Jun 05 23:12:34 <hbelusca> Always?
Jun 05 23:12:39 <x86corez> Yes, even on Windows
Jun 05 23:15:13 <hbelusca> ok but then it would be nice if these different results are at least the same on Windows and on ROS, so that we can 100% exclude problems outside of msvcrt.

The last option was to test ReactOS msvcrt.dll in Windows. I tried to place the file in the same directory where git.exe is located, but it didn't help. Mark suggested to add an .local file:

Jun 05 22:59:01 <mjansen> x86corez: add .local file next to msvcrt.dll ;)
Jun 05 22:59:47 <mjansen> exename.exe.local
Jun 05 23:00:17 <x86corez> just an empty file?
Jun 05 23:00:21 <mjansen> yea
Jun 05 23:00:49 <hbelusca> mjansen: do we support these .local files?
Jun 05 23:00:52 <mjansen> we dont
Jun 05 23:00:54 <mjansen> windows does
Jun 05 23:15:48 <x86corez> moment... I'll try with .local
Jun 05 23:18:43 <x86corez> mjansen: I've created git.exe.local but it still doesn't load msvcrt.dll in this directory

But for some reason this method did not work either. Perhaps the fact that I did all the experiments on the server edition of Windows (2008 R2).

The last idea was suggested by Hermes:

Jun 05 23:19:28 <hbelusca> last solution: patch "msvcrt" name within git and perhaps other mingwe dlls ^^
Jun 05 23:20:12 <x86corez> good idea about patching!

So I replaced all occurrences of msvcrt in git.exe with msvcrd using WinHex, and renamed ReactOS msvcrt.dll accordingly, and here we are:

image

Jun 05 23:23:29 <x86corez> Yes! guilty is msvcrt :)
Jun 05 23:25:37 <hbelusca> ah, so as soon as git uses our msvcrt we get the problem on windows.
Jun 05 23:25:38 <x86corez> hbelusca, mjansen, https://image.prntscr.com/image/FoOWnrQ4SOGMD-66DLW16Q.png
Jun 05 23:25:58 <hbelusca> aha and it asserts <3
Jun 05 23:26:03 <hbelusca> (it shows the assertion now)

Now we hit the same assertion, but already in Windows! And that means the source of our problems is in the one of ReactOS msvcrt functions.

It's also worth to note that the assertion message is displayed correctly in Windows.

Jun 05 23:26:13 <x86corez> but it prints text and correctly.
Jun 05 23:26:20 <hbelusca> oh
Jun 05 23:26:33 <x86corez> and on ROS it doesn't print in most cases xD
Jun 05 23:26:38 <hbelusca> so also it excludes another hypothesis, namely that it could have been a bug in our msvcrt/crt
Jun 05 23:26:56 <hbelusca> So possibly a strange bug in our console

So, to solve the actual problem, we had to find the API function in msvcrt which gives the full path of the current application. I googled a bit and assumed the problem is with _pgmptr function.

Jun 06 00:07:43 <x86corez> https://msdn.microsoft.com/en-us/library/tza1y5f7.aspx
Jun 06 00:07:57 <x86corez> When a program is run from the command interpreter (Cmd.exe), _pgmptr is automatically initialized to the full path of the executable file.
Jun 06 00:08:01 <x86corez> this ^^)
Jun 06 00:08:50 <hbelusca> That's what GetModuleFileName does.
Jun 06 00:09:04 <x86corez> yeah
Jun 06 00:10:30 <hbelusca> Of course in ROS msvcrt we don't do this, but instead we initialize pgmptr to what argv[0] could be.
Jun 06 00:11:08 <hbelusca> That's one thing.
Jun 06 00:11:34 <hbelusca> The other thing is that nowhere it appears (in MS CRT from VS, or in wine) that argv is initialized using pgmptr.
Jun 06 00:13:33 <x86corez> hbelusca, I've checked argv[0] in some ROS command line tools, running them in Windows
Jun 06 00:13:56 <x86corez> they all interpret argv[0] as command line, not full path
Jun 06 00:14:04 <x86corez> so... I think it's git specific behaviour
Jun 06 00:14:16 <x86corez> or specific mingw compiler settings
Jun 06 00:28:12 <hbelusca> x86corez: I'm making a patch for our msvcrt, would be nice if you could test it :)
Jun 06 00:28:21 <x86corez> I'll test it

Hermes sent a link to the patch, I manually applied it and rebuilt the system, and after these moves the original problem magically disappeared!

image

Jun 06 00:34:26 <x86corez> hbelusca, IT WORKS!
Jun 06 00:35:10 <hbelusca> L O L
Jun 06 00:35:18 <hbelusca> So it seems that something uses pgmptr to rebuild an argv.
Jun 06 00:35:52 <x86corez> I've even able to clone :)
Jun 06 00:36:19 <hbelusca> \o/
Jun 06 00:36:21 <gigaherz> 2.10.0-rc2? not the release?
Jun 06 00:36:24 <hbelusca> ok I'm gonna commit that stuff.
Jun 06 00:36:43 <hbelusca> x86corez: gonna have ROS self-hosting <33
Jun 06 00:36:48 <x86corez> yeah!
Jun 06 00:37:01 <x86corez> gigaherz: I've built that from sources
Jun 06 00:37:37 <gigaherz> oh, for testing this bug? o_O
Jun 06 00:37:50 <sanchaez> yes, you missed the fun :p
Jun 06 00:39:46 <x86corez> git 2.10.0-windows.1 (release) works too!
Jun 06 00:39:54 <encoded> commit!!!

Afterword

That said, another bug that indirectly prevents ReactOS from building itself has been fixed thanks to collective efforts. The funny coincidence is the fact that not long before another bug was fixed in the same msvcrt dynamic library (namely, in the qsort function) which did not allow to compile the USB drivers in ReactOS.

I participate in the development of many projects written in different programming languages, both closed and open source. I'm contributing to the ReactOS project since 2014, but I began to actively help and actually write code only in 2017. It's especially interesting to work in this area because it's an entire operating system! You feel a huge scale of the result in which the efforts were invested, as well as a pleasant feeling that one bug became less! :)

Someone will probably wonder why I'm contributing to ReactOS and not Linux for example. So historically in most cases I write programs for Windows and my favorite programming language is Delphi. Perhaps that's why the architecture of Windows NT together with the Win32 API is very interesting to me, and the ReactOS project of the free Windows alternative makes the old dream come true — it allows you to find out how everything works under the hood in practice.

I hope you enjoyed my first english article here. I'm looking forward to your comments!



Let's block ads! (Why?)

Комментариев нет:

Отправить комментарий