RFC: StreamExecutor C API by annarev · Pull Request #257 · tensorflow/community (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation116 Commits65 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

annarev

jzhoulon

jzhoulon

Artem-B

jzhoulon

@annarev

yisitu

yisitu

yisitu

@yisitu

Defined a portable bool type TF_BOOL. Added structure size definitions and updated usage examples. Fixed typos where member functions are supposed to be member function pointers. Fixed missing extern "C".

@yisitu

@penpornk

@yisitu

jzhoulon

TF_Status* status);
// Causes the host code to synchronously wait for the event to complete.
void (*block_host_for_event)(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @annarev It seems that block_host_until_done is removed, can we keep this API? block_host_until_done is a more large-grained API, it provides more flexibility for the backend to implement, and lots of device runtime already provide the capability to match this API(CUDA: cuStreamSynchronize(stream),SYCL queue.wait() ). I saw current implementation is composed of create_event+ record_event + block_host_for_event + delete event, that means each back-end needs to implements all of these API, which limits the flexibility, and I am not sure whether it will bring some performance penalty (for DPC++, we need to enqueue a barrier to retrieve the event)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@penpornk

@penpornk

@annarev

I am planning to add the following SP_DeviceDescription struct. I made some fields optional and some required. @wchao1115, @kulinseth, @jzhoulon let me know if this looks reasonable (for e.g. if some other fields should be optional as well, such as pci_bus_id or if some optional fields should be required).

typedef struct SP_DeviceDescription { size_t struct_size;

// Device hardware name. Used for printing. // Must be null-terminated. char* name;

// Device vendor name. Used for printing. // Must be null-terminated. char* device_vendor;

// Returns the PCI bus identifier for this device, of the form // [domain]:[bus]:[device].[function] // Used for printing. // Must be null-terminated. char* pci_bus_id;

TF_Bool has_numa_node; // True if numa_node is set. // Returns the NUMA node associated with this device, for use in // determining socket locality. int numa_node;

TF_Bool has_memory_bandwidth; // True if memory_bandwidth is set. // Device's memory bandwidth in bytes/sec. (This is for reads/writes to/from // the device's own memory, not for transfers between the host and device.) int64_t memory_bandwidth;

TF_Bool has_gflops; // True if gflops field is set. // Estimate of floating point operations per second for this device * 10e-9. double gflops; } SP_DeviceDescription;

@penpornk

@penpornk

…gableDevice RFC (PR tensorflow#262).

@annarev

@annarev

@penpornk

@penpornk

Also some more minor edits.

@penpornk @yisitu

Co-authored-by: Situ Yi 60493024+yisitu@users.noreply.github.com

@penpornk @yisitu

Co-authored-by: Situ Yi 60493024+yisitu@users.noreply.github.com

@penpornk

Changed "shall" to "must" to make sure people would not misinterpret "shall".

@annarev

Updates based on recent changes and discussion on RFC PR tensorflow#262

@kulinseth

I am planning to add the following SP_DeviceDescription struct. I made some fields optional and some required. @wchao1115, @kulinseth, @jzhoulon let me know if this looks reasonable (for e.g. if some other fields should be optional as well, such as pci_bus_id or if some optional fields should be required).

Thanks!. This looks good to me. Just to confirm, all the versioning information is subsumed in struct_size, and we don't need any extra fields in DeviceDescription for version compatibility. Is that correct?

@annarev

I am planning to add the following SP_DeviceDescription struct. I made some fields optional and some required. @wchao1115, @kulinseth, @jzhoulon let me know if this looks reasonable (for e.g. if some other fields should be optional as well, such as pci_bus_id or if some optional fields should be required).

Thanks!. This looks good to me. Just to confirm, all the versioning information is subsumed in struct_size, and we don't need any extra fields in DeviceDescription for version compatibility. Is that correct?

Yes, we rely on struct_size and version value in SE_PlatformRegistrationParams.

@annarev

@annarev

I updated the doc to add a way to use a custom allocator:
21be6e9
Addition of a custom allocator was requested in Pluggable Device RFC:
#262

@penpornk

This RFC has been up for a while. The implementation has even been checked in over a month ago. I think we should wrap this up (i.e., merge this PR).

Changes to StreamExecutor C API since the last combined design review meeting with PluggableDevice on 8/13/20:

If anyone has anything else that needs to be discussed/resolved before this RFC can be merged (i.e., fundamental issues), please explicitly say so here by this Thursday, 9/24 at 11:59pm PT.

@kulinseth

If anyone has anything else that needs to be discussed/resolved before this RFC can be merged (i.e., fundamental issues), please explicitly say so here by this Thursday, 9/24 at 11:59pm PT.

Thanks Penporn, for reaching out. Nothing from our side.

@penpornk

@kulinseth Thank you for the quick confirmation! :)

@ematejska @alextp I believe there is no outstanding issue now. Could we move forward?

@alextp

theadactyl

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wdfnst

@penpornk Hi, is there any PluggableDevice template for beginner? TKS

Reviewers

@sanjoy sanjoy sanjoy left review comments

@kulinseth kulinseth kulinseth left review comments

@Artem-B Artem-B Artem-B left review comments

@jzhoulon jzhoulon jzhoulon left review comments

@penpornk penpornk penpornk left review comments

@yisitu yisitu yisitu left review comments

@theadactyl theadactyl theadactyl approved these changes

@ematejska ematejska Awaiting requested review from ematejska

@ewilderj ewilderj Awaiting requested review from ewilderj

@martinwicke martinwicke Awaiting requested review from martinwicke