Zac Yu - Academia.edu

Papers by Zac Yu

Syntharch: Interactive Image Search with Attribute-Conditioned Synthesis

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

The use of interactive systems has been found to be a promising approach for content-based image retrieval, the task of retrieving a specific image from a database based on its content. These systems allow the user to refine the set of results iteratively until the target is reached. In order to proceed with the search efficiently, conventional methods rely on some shared knowledge between the user and the system, such as semantic visual attributes of the images. Those approaches demand the images to be semantically labeled and introduce a semantic gap between the two parties' understanding. In this paper, we explore an alternative approach to interactive image search where feedback is elicited exclusively in visual forms, therefore eliminating the semantic gap and allowing for a generalized version of the method to operate on unlabeled databases. We present Syntharch, a novel interactive image search approach which uses synthesized images as options for feedback, instead of asking textual questions to gain information on the relative attribute values of the target image. We further demonstrate that by using synthesized images rather than real images retrieved from the database as feedback options, Syntharch causes less confusion to the user. Finally, we establish that our proposed search method performs similarly or better in comparison to the conventional approach.

Mave

Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, 2022

Attribute value extraction refers to the task of identifying values of an attribute of interest from product information. Product attribute values are essential in many e-commerce scenarios, such as customer service robots, product ranking, retrieval and recommendations. While in the real world, the attribute values of a product are usually incomplete and vary over time, which greatly hinders the practical applications. In this paper, we introduce MAVE, a new dataset to better facilitate research on product attribute value extraction. MAVE is composed of a curated set of 2.2 million products from Amazon pages, with 3 million attribute-value annotations across 1257 unique categories. MAVE has four main and unique advantages: First, MAVE is the largest product attribute value extraction dataset by the number of attribute-value examples. Second, MAVE includes multi-source representations from the product, which captures the full product information with high attribute coverage. Third, MAVE represents a more diverse set of attributes and values relative to what previous datasets cover. Lastly, MAVE provides a very challenging zero-shot test set, as we empirically illustrate in the experiments. We further propose a novel approach that effectively extracts the attribute value from the multi-source product information. We conduct extensive experiments with several baselines and show that MAVE is an effective dataset for attribute value extraction task. It is also a very challenging task on zero-shot attribute extraction. Data is available at https://github.com/google-research-datasets/MAVE.

Opening Up an Intelligent Tutoring System Development Environment for Extensible Student Modeling

Lecture Notes in Computer Science, 2018

ITS authoring tools make creating intelligent tutoring systems more cost effective, but few authoring tools make it easy to flexibly incorporate an open-ended range of student modeling methods and learning analytics tools. To support a cumulative science of student modeling and enhance the impact of real-world tutoring systems, it is critical to extend ITS authoring tools so they easily accommodate novel student modeling methods. We report on extensions to the CTAT/Tutorshop architecture to support a plug-in approach to extensible student modeling, which gives an author full control over the content of the student model. The extensions enhance the range of adaptive tutoring behaviors that can be authored and support building external, student- or teacher-facing real-time analytics tools. The contributions of this work are: (1) an open architecture to support the plugging in, sharing, re-mixing, and use of advanced student modeling techniques, ITSs, and dashboards; and (2) case studies illustrating diverse ways authors have used the architecture.

Learning to Extract Attribute Value from Product via Question Answering: A Multi-task Approach

Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020

Attribute value extraction refers to the task of identifying values of an attribute of interest from product information. It is an important research topic which has been widely studied in e-Commerce and relation learning. There are two main limitations in existing attribute value extraction methods: scalability and generalizability. Most existing methods treat each attribute independently and build separate models for each of them, which are not suitable for large scale attribute systems in real-world applications. Moreover, very limited research has focused on generalizing extraction to new attributes. In this work, we propose a novel approach for Attribute Value Extraction via Question Answering (AVEQA) using a multi-task framework. In particular, we build a question answering model which treats each attribute as a question and identifies the answer span corresponding to the attribute value in the product context. A unique BERT contextual encoder is adopted and shared across all attributes to encode both the context and the question, which makes the model scalable. A distilled masked language model with knowledge distillation loss is introduced to improve the model generalization ability. In addition, we employ a no-answer classifier to explicitly handle the cases where there are no values for a given attribute in the product context. The question answering, distilled masked language model and the no answer classification are then combined into a unified multi-task framework. We conduct extensive experiments on a public dataset. The results demonstrate that the proposed approach outperforms several state-of-the-art methods with large margin.

GymCam

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2018

Worn sensors are popular for automatically tracking exercises. However, a wearable is usually attached to one part of the body, tracks only that location, and thus is inadequate for capturing a wide range of exercises, especially when other limbs are involved. Cameras, on the other hand, can fully track a user's body, but suffer from noise and occlusion. We present GymCam, a camera-based system for automatically detecting, recognizing and tracking multiple people and exercises simultaneously in unconstrained environments without any user intervention. We collected data in a varsity gym, correctly segmenting exercises from other activities with an accuracy of 84.6%, recognizing the type of exercise at 93.6% accuracy, and counting the number of repetitions to within ± 1.7 on average. GymCam advances the field of real-time exercise tracking by filling some crucial gaps, such as tracking whole body motion, handling occlusion, and enabling single-point sensing for a multitude of users.
