RFC: StreamExecutor C API by annarev · Pull Request #257 · tensorflow/community (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation116 Commits65 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
Defined a portable bool type TF_BOOL. Added structure size definitions and updated usage examples. Fixed typos where member functions are supposed to be member function pointers. Fixed missing extern "C".
TF_Status* status); |
---|
// Causes the host code to synchronously wait for the event to complete. |
void (*block_host_for_event)( |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @annarev It seems that block_host_until_done is removed, can we keep this API? block_host_until_done is a more large-grained API, it provides more flexibility for the backend to implement, and lots of device runtime already provide the capability to match this API(CUDA: cuStreamSynchronize(stream),SYCL queue.wait() ). I saw current implementation is composed of create_event+ record_event + block_host_for_event + delete event, that means each back-end needs to implements all of these API, which limits the flexibility, and I am not sure whether it will bring some performance penalty (for DPC++, we need to enqueue a barrier to retrieve the event)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am planning to add the following SP_DeviceDescription
struct. I made some fields optional and some required. @wchao1115, @kulinseth, @jzhoulon let me know if this looks reasonable (for e.g. if some other fields should be optional as well, such as pci_bus_id
or if some optional fields should be required).
typedef struct SP_DeviceDescription { size_t struct_size;
// Device hardware name. Used for printing. // Must be null-terminated. char* name;
// Device vendor name. Used for printing. // Must be null-terminated. char* device_vendor;
// Returns the PCI bus identifier for this device, of the form // [domain]:[bus]:[device].[function] // Used for printing. // Must be null-terminated. char* pci_bus_id;
TF_Bool has_numa_node; // True if numa_node
is set.
// Returns the NUMA node associated with this device, for use in
// determining socket locality.
int numa_node;
TF_Bool has_memory_bandwidth; // True if memory_bandwidth
is set.
// Device's memory bandwidth in bytes/sec. (This is for reads/writes to/from
// the device's own memory, not for transfers between the host and device.)
int64_t memory_bandwidth;
TF_Bool has_gflops; // True if gflops
field is set.
// Estimate of floating point operations per second for this device * 10e-9.
double gflops;
} SP_DeviceDescription;
- Added the Implementation Conventions and Usage Overview section.
- Replaced the Stability / User Impact with the Versioning Strategy and Stability section. Also moved the section higher.
- Updated the API design according to the real implementation.
- Updated code examples.
…gableDevice RFC (PR tensorflow#262).
- Referenced semver and TensorFlow's Version Compatibility page.
- Added the Updating Guidelines, Convention, and Detecting Incompatibility sections.
- Merged the current code examples with @yisitu's newer ones on PR tensorflow#262 and simplified them a bit.
Also some more minor edits.
Co-authored-by: Situ Yi 60493024+yisitu@users.noreply.github.com
Co-authored-by: Situ Yi 60493024+yisitu@users.noreply.github.com
Changed "shall" to "must" to make sure people would not misinterpret "shall".
Updates based on recent changes and discussion on RFC PR tensorflow#262
I am planning to add the following
SP_DeviceDescription
struct. I made some fields optional and some required. @wchao1115, @kulinseth, @jzhoulon let me know if this looks reasonable (for e.g. if some other fields should be optional as well, such aspci_bus_id
or if some optional fields should be required).
Thanks!. This looks good to me. Just to confirm, all the versioning information is subsumed in struct_size, and we don't need any extra fields in DeviceDescription
for version compatibility. Is that correct?
I am planning to add the following
SP_DeviceDescription
struct. I made some fields optional and some required. @wchao1115, @kulinseth, @jzhoulon let me know if this looks reasonable (for e.g. if some other fields should be optional as well, such aspci_bus_id
or if some optional fields should be required).Thanks!. This looks good to me. Just to confirm, all the versioning information is subsumed in struct_size, and we don't need any extra fields in
DeviceDescription
for version compatibility. Is that correct?
Yes, we rely on struct_size
and version value in SE_PlatformRegistrationParams
.
I updated the doc to add a way to use a custom allocator:
21be6e9
Addition of a custom allocator was requested in Pluggable Device RFC:
#262
This RFC has been up for a while. The implementation has even been checked in over a month ago. I think we should wrap this up (i.e., merge this PR).
Changes to StreamExecutor C API since the last combined design review meeting with PluggableDevice on 8/13/20:
- Added more explanation of the overall functionality.
- Added more clarifications on versioning strategy.
- Corrected typos. Minor functionality additions such as
block_host_until_done
,unified_memory_allocate/deallocate
, etc.
If anyone has anything else that needs to be discussed/resolved before this RFC can be merged (i.e., fundamental issues), please explicitly say so here by this Thursday, 9/24 at 11:59pm PT.
If anyone has anything else that needs to be discussed/resolved before this RFC can be merged (i.e., fundamental issues), please explicitly say so here by this Thursday, 9/24 at 11:59pm PT.
Thanks Penporn, for reaching out. Nothing from our side.
@kulinseth Thank you for the quick confirmation! :)
@ematejska @alextp I believe there is no outstanding issue now. Could we move forward?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@penpornk Hi, is there any PluggableDevice template for beginner? TKS
Reviewers
sanjoy sanjoy left review comments
kulinseth kulinseth left review comments
Artem-B Artem-B left review comments
jzhoulon jzhoulon left review comments
penpornk penpornk left review comments
yisitu yisitu left review comments
theadactyl theadactyl approved these changes
ematejska Awaiting requested review from ematejska
ewilderj Awaiting requested review from ewilderj
martinwicke Awaiting requested review from martinwicke