


On 22 Nov 2017, at 02:32, comic fans <comicfans44@gmail.com> wrote:

With some dirty hacks, I've made the XRay runtime 'built' on Windows,

\\o/

but unfortunately I don't know enough about the linker and the
runtime, and the final executable didn't run. I'd like to share
my changes here in the hope that somebody can help me make it run on Windows.

Thanks for working on this!

If you're alright with it, maybe you can send some patches to review, preferably through the LLVM Phabricator instance? You can have me or Reid (who knows more about COFF and the Windows stuff) as reviewers.

In AsmPrinter, copy/paste the XRay section setup for the COFF target:

InstMap = OutContext.getCOFFSection("xray\_instr\_map", 0,
SectionKind::getReadOnlyWithRel());
FnSledIndex = OutContext.getCOFFSection("xray\_fn\_idx",
0,SectionKind::getReadOnlyWithRel());

In XRayArgs, allow the Windows platform to use the XRay args. With this, the
generated code seems to have the sleds and the XRay parts.


Nice, I suspect we can make this change with tests as well, which we can build on incrementally.

In the XRay runtime,
bool atomic\_compare\_exchange\_strong(volatile atomic\_sint32\_t \*a,
s32 \*cmp,
s32 xchg,
memory\_order mo)
is missing for MSVC, so I took the atomic\_uint32\_t implementation.


This is in compiler-rt/lib/sanitizer\_common/... right?
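For readers following along, here is a minimal sketch of what a signed 32-bit strong compare-and-swap could look like with an MSVC branch. The type names are hypothetical stand-ins rather than the real sanitizer\_common definitions, the memory order parameter is dropped for brevity, and the non-MSVC branch uses the GCC/Clang \_\_sync builtin only so that the sketch builds elsewhere.

```cpp
#include <cstdint>

#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Hypothetical stand-ins for the sanitizer_common types named in the thread.
typedef std::int32_t s32;
struct atomic_sint32_t {
  volatile s32 val_dont_use;
};

// Strong compare-and-swap: if *a equals *cmp, store xchg and return true.
// Otherwise write the value actually observed into *cmp and return false.
// The memory_order argument is omitted; both branches act as full barriers.
inline bool compare_exchange_strong(volatile atomic_sint32_t *a, s32 *cmp,
                                    s32 xchg) {
  const s32 expected = *cmp;
#if defined(_MSC_VER)
  // long is 32 bits wide on Windows, so the cast preserves the value.
  const s32 prev = static_cast<s32>(_InterlockedCompareExchange(
      reinterpret_cast<volatile long *>(&a->val_dont_use),
      static_cast<long>(xchg), static_cast<long>(expected)));
#else
  const s32 prev =
      __sync_val_compare_and_swap(&a->val_dont_use, expected, xchg);
#endif
  if (prev == expected)
    return true;
  *cmp = prev;
  return false;
}
```

On failure the caller gets the current value back through cmp, which is what lets a retry loop proceed without a separate load.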

MSVC 14.1 treats BufferQueue::Buffer::Buffer as the constructor instead of
a data member, so I renamed it: Buf.Buffer => Buf.Data


Interesting. That's an easy patch to merge. :)

FunctionRecord packing: \_\_attribute\_\_((packed)) => #pragma
pack(push,1). MSVC also requires bitfields to be of the same type to pack
them together (all types => uint32\_t).


Are you able to test this on other platforms?
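The packing change described above can be sketched roughly as follows. The field names and widths are hypothetical and only meant to illustrate the idea, not to reproduce XRay's actual FunctionRecord. The pragma form is accepted by MSVC, GCC and Clang, and the static\_assert is one cheap way to check the layout on each platform.

```cpp
#include <cstdint>

// Hypothetical record layout, for illustration only.
// All bit-fields share one underlying type (uint32_t) so that compilers
// which start a new storage unit on a type change still pack them together.
#pragma pack(push, 1)
struct PackedRecord {
  std::uint32_t RecordType : 1;
  std::uint32_t RecordKind : 3;
  std::uint32_t FuncId : 28;
  std::uint32_t TSCDelta;
};
#pragma pack(pop)

// 32 bits of bit-fields plus one 32-bit member should be exactly 8 bytes.
static_assert(sizeof(PackedRecord) == 8,
              "PackedRecord should occupy exactly 8 bytes");
```

Keeping the size check next to the struct means a layout regression shows up at compile time rather than as a corrupted log.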

FD: int => HANDLE. Most of the code logic is still valid (-1 as the invalid value);
the read/write API calls are replaced with their Windows equivalents.

mprotect => VirtualProtect

readTSC in xray\_x86\_64.inc also works for windows

replace read tsc from proc with QueryPerformanceFrequency
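As an illustration of the mprotect => VirtualProtect substitution mentioned above, a small wrapper could look like the sketch below. The helper name and its boolean parameter are hypothetical; the address is assumed to be page-aligned, and error reporting is reduced to a bool.

```cpp
#include <cstddef>

#if defined(_WIN32)
#include <windows.h>
#else
#include <sys/mman.h>
#endif

// Hypothetical helper: make [addr, addr + len) readable and writable, and
// optionally executable. Returns true on success.
inline bool setWritable(void *addr, std::size_t len, bool executable) {
#if defined(_WIN32)
  DWORD oldProtect = 0; // VirtualProtect requires somewhere to store this
  const DWORD newProtect = executable ? PAGE_EXECUTE_READWRITE : PAGE_READWRITE;
  return VirtualProtect(addr, len, newProtect, &oldProtect) != 0;
#else
  const int prot = PROT_READ | PROT_WRITE | (executable ? PROT_EXEC : 0);
  return mprotect(addr, len, prot) == 0;
#endif
}
```

One difference worth noting is that VirtualProtect hands back the previous protection, which makes restoring it after patching straightforward, while mprotect does not.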

MSVC cannot compile code such as
void setupNewBuffer(int (\*wall\_clock\_reader)(clockid\_t,
struct timespec \*));

The function pointer type must be typedef'd first. XRay uses clock\_gettime as the default
implementation, which is not friendly to Windows, so I created a fake one
based on chrono system\_clock (ignoring clockid\_t).


This one is definitely something to do, even for potentially supporting XRay on Darwin where older versions of the SDK (10.11 and lower) don't define clock\_gettime. Probably can be split off as a thing that can be reviewed and merged regardless.
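A portable fallback along the lines discussed here might look like the sketch below. ClockId, WallClockReader, chronoClockGettime and readWallClock are all hypothetical names; ClockId stands in for the POSIX clockid\_t so the example does not depend on it, and the clock id is simply ignored, as in the original description.

```cpp
#include <chrono>
#include <time.h>

// Hypothetical stand-in for the POSIX clockid_t type.
typedef int ClockId;

// Name the function-pointer type first, as suggested in the thread.
typedef int (*WallClockReader)(ClockId, struct timespec *);

// Fake clock_gettime built on std::chrono::system_clock.
// The clock id is ignored; the function always reports wall-clock time.
inline int chronoClockGettime(ClockId, struct timespec *ts) {
  using namespace std::chrono;
  const auto sinceEpoch = system_clock::now().time_since_epoch();
  const auto secs = duration_cast<seconds>(sinceEpoch);
  const auto nanos = duration_cast<nanoseconds>(sinceEpoch - secs);
  ts->tv_sec = static_cast<time_t>(secs.count());
  ts->tv_nsec = static_cast<long>(nanos.count());
  return 0; // same success convention as clock_gettime
}

// Accepts the reader through the typedef instead of an inline declarator.
inline int readWallClock(WallClockReader reader, struct timespec *out) {
  return reader(0, out);
}
```

Because the reader is passed as a plain function pointer, a platform that does have clock\_gettime could still pass the real one, given a matching signature.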

For the TLS destructor part, I've just commented it out (but
https://www.codeproject.com/Articles/8113/Thread-Local-Storage-The-C-Way
describes a thread-exit callback approach for COFF).


Interesting, thanks! This one is something that could be abstracted away on a per-platform basis.

The last thing, which I don't understand, is the weak symbols for
\_\_start\_xray\_instr\_map\[\]
\_\_stop\_xray\_instr\_map\[\]
\_\_start\_xray\_fn\_idx\[\]
\_\_stop\_xray\_fn\_idx\[\]

I replaced them with \_\_declspec(selectany), but I'm not sure it
has the same meaning.


The \_\_{start, stop}\_xray\_{instr\_map,fn\_idx}\[\] arrays are usually generated by the linker on ELF and ELF-like platforms. I'm not aware what the MSVC COFF linkers do, probably something others who know better can answer.
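To make the ELF side of this concrete, the sketch below shows the linker-provided bounds in isolation. It assumes an ELF target linked with GNU ld or lld, which define \_\_start\_ and \_\_stop\_ symbols for any section whose name is a valid C identifier. The section name and entry type are hypothetical, and the sketch does not answer what the COFF equivalent should be.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical entry type and section name, for illustration only.
struct DemoEntry {
  std::uint64_t Address;
  std::uint64_t Function;
};

// Place two entries into a custom section. "used" keeps the compiler from
// discarding them even though nothing refers to them by name.
static const DemoEntry EntryA
    __attribute__((section("demo_map"), used)) = {1, 10};
static const DemoEntry EntryB
    __attribute__((section("demo_map"), used)) = {2, 10};

// The program never defines these; the ELF linker provides them as the
// first byte of the section and the first byte past its end.
extern "C" const DemoEntry __start_demo_map[];
extern "C" const DemoEntry __stop_demo_map[];

// Number of entries the linker collected into the section.
inline std::size_t demoEntryCount() {
  return static_cast<std::size_t>(__stop_demo_map - __start_demo_map);
}
```

The runtime can then walk from the start symbol to the stop symbol without knowing how many object files contributed entries.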


some random generated code:
.text
.intel\_syntax noprefix
.def call;
.scl 2;
.type 32;
.endef
.globl call # -- Begin function call
.p2align 4, 0x90
call: # @call
.seh\_proc call
# BB#0: # %entry
.p2align 1, 0x90
.Lxray\_sled\_0:
.ascii "\\353\\t"
nop word ptr \[rax + rax + 512\]
sub rsp, 16
.seh\_stackalloc 16
.seh\_endprologue
mov dword ptr \[rsp + 12\], ecx
mov dword ptr \[rsp + 8\], 0
mov dword ptr \[rsp + 4\], 0
.LBB0\_1: # %for.cond
# =>This Inner Loop Header: Depth=1
mov eax, dword ptr \[rsp + 4\]
cmp eax, dword ptr \[rsp + 12\]
jge .LBB0\_4
# BB#2: # %for.body
# in Loop: Header=BB0\_1 Depth=1
mov eax, dword ptr \[rsp + 4\]
add eax, dword ptr \[rsp + 8\]
mov dword ptr \[rsp + 8\], eax
# BB#3: # %for.inc
# in Loop: Header=BB0\_1 Depth=1
mov eax, dword ptr \[rsp + 4\]
add eax, 1
mov dword ptr \[rsp + 4\], eax
jmp .LBB0\_1
.LBB0\_4: # %for.end
mov eax, dword ptr \[rsp + 8\]
add rsp, 16
.p2align 1, 0x90
.Lxray\_sled\_1:
ret
nop word ptr cs:\[rax + rax + 512\]
.seh\_handlerdata
.text
.seh\_endproc
# -- End function
.section xray\_instr\_map,"y"
.Lxray\_sleds\_start0:
.quad .Lxray\_sled\_0
.quad call
.byte 0x00
.byte 0x00
.byte 0x00
.zero 13
.quad .Lxray\_sled\_1
.quad call
.byte 0x01
.byte 0x00
.byte 0x00
.zero 13
.Lxray\_sleds\_end0:
.section xray\_fn\_idx,"y"
.p2align 4, 0x90
.quad .Lxray\_sleds\_start0
.quad .Lxray\_sleds\_end0
.text

and parts of obj dump:


SECTION HEADER #5
/16 name (xray\_instr\_map)
0 physical address
0 virtual address
40 size of raw data
198 file pointer to raw data (00000198 to 000001D7)
1D8 file pointer to relocation table
0 file pointer to line numbers
4 number of relocations
0 number of line numbers
100000 flags
1 byte align

RAW DATA #5
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000020: 56 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 V...............
00000030: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................


RELOCATIONS #5
Symbol Symbol
Offset Type Applied To Index Name
-------- ---------------- ----------------- -------- ------
00000000 ADDR64 00000000 00000000 0 .text
00000008 ADDR64 00000000 00000000 E call
00000020 ADDR64 00000000 00000056 0 .text
00000028 ADDR64 00000000 00000000 E call

SECTION HEADER #6
/4 name (xray\_fn\_idx)
0 physical address
0 virtual address
10 size of raw data
200 file pointer to raw data (00000200 to 0000020F)
210 file pointer to relocation table
0 file pointer to line numbers
2 number of relocations
0 number of line numbers
500000 flags
16 byte align

RAW DATA #6
00000000: 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 ........@.......

RELOCATIONS #6
Symbol Symbol
Offset Type Applied To Index Name
-------- ---------------- ----------------- -------- ------
00000000 ADDR64 00000000 00000000 8 xray\_instr\_map
00000008 ADDR64 00000000 00000040 8 xray\_instr\_map


This looks like it's actually worked, at least at CodeGen time.

Thanks again for sharing your experience, it'd be really great if you can have patches that we can review and land to potentially get XRay working on Windows!

Cheers

On Tue, Nov 21, 2017 at 7:46 PM, Dean Michael Berris
<dean.berris@gmail.com> wrote:

On 17 Nov 2017, at 00:44, comic fans via llvm-dev <llvm-dev@lists.llvm.org>
wrote:

I'm learning the XRay library and trying to see whether it can be built on Windows. In
xray\_fdr\_logging\_impl.h

at line 152, the comment reads
// Using pthread\_once(...) to initialize the thread-local data structures


but at lines 175 and 183, the code reads

thread\_local pthread\_key\_t key;

// Ensure that we only actually ever do the pthread initialization once.
thread\_local bool UNUSED Unused = \[\] {
new (&TLSBuffer) ThreadLocalData();
auto result = pthread\_key\_create(&key, +\[\](void \*) {
auto &TLD = \*reinterpret\_cast<ThreadLocalData \*>(&TLSBuffer);


I'm confused that pthread\_key\_t and Unused are both thread\_local
variables. Doesn't that mean the following lambda will run for each
thread and create one pthread\_key\_t for each thread's TLS data (instead of
only one pthread\_key\_t for all threads)? Also, what does the '+' before
the lambda expression mean? These may be stupid questions, but could somebody
kindly help?


Yeah, that comment is out-of-date (and the implementation is buggy) -- which
is a shame really. :/

But, the good news, is I think we've fixed this now in the top-of-trunk with
https://reviews.llvm.org/D39526 and https://reviews.llvm.org/D40164.
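On the two language questions raised above: a thread\_local variable is initialized once per thread, so an initializer lambda like the one quoted does run in every thread. The unary '+' in front of a captureless lambda forces its conversion to a plain function pointer, which is the kind of argument a C API such as pthread\_key\_create expects for its destructor. A minimal sketch, with hypothetical names:

```cpp
#include <type_traits>

// Hypothetical alias matching the shape of a C-style destructor callback.
typedef void (*Destructor)(void *);

inline Destructor makeResetter() {
  // Without '+', fn would have the lambda's unnamed closure type.
  // With '+', the captureless lambda is converted to a function pointer.
  auto fn = +[](void *p) { *static_cast<int *>(p) = 0; };
  static_assert(std::is_same<decltype(fn), void (*)(void *)>::value,
                "unary plus yields a plain function pointer");
  return fn;
}
```

The conversion only exists for lambdas with an empty capture list, so adding a capture to such a lambda turns the '+' into a compile error.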

Curiously though, how far did your exploration into getting XRay to build on
Windows go?

Cheers

-- Dean


-- Dean