Yimeng Deng | National University of Singapore

Papers by Yimeng Deng

Contrastive Label Correlation Enhanced Unified Hashing Encoder for Cross-modal Retrieval

Proceedings of the 31st ACM International Conference on Information & Knowledge Management

Cross-modal hashing (CMH) has been widely used in multimedia retrieval applications for its low storage cost and fast indexing speed. Thanks to the success of deep learning, cross-modal hashing has made significant progress with high-quality deep features. However, the modality gap is still a crucial bottleneck for existing cross-modal hashing methods: the commonly used convolutional neural network and bag-of-words encoders are each customized for a single-modality prior, which limits the models' ability to learn semantic representations in a cross-modal space. To overcome modality heterogeneity, we propose a shared transformer encoder (UniHash) to unify cross-modal hashing into the same semantic space. A contrastive label correlation learning (CLC) loss, which uses the category labels as a modality bridge, is designed alongside it to improve representation quality. Moreover, we take advantage of the multi-hot label space and propose a negative label generation (NegLG) strategy to obtain richer and uniformly distributed negative labels for contrast. Extensive experiments on three benchmarks verify the advantage of our proposed method. The proposed UniHash significantly outperforms state-of-the-art cross-modal hashing methods, establishing a new important baseline for cross-modal hashing research. Code is released at github.com/idealwhite/Unihash.
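The storage and speed argument behind cross-modal hashing can be illustrated with a small, generic sketch. This is not the UniHash model: the sign-based binarization, the helper names (`sign_hash`, `hamming`, `retrieve`), and the four-dimensional toy features are all assumptions chosen for illustration. The only point is that once both modalities are mapped to binary codes in one shared space, retrieval reduces to XOR and bit counting.

```python
from typing import Dict, List, Sequence, Tuple


def sign_hash(features: Sequence[float]) -> int:
    """Binarize a real-valued feature vector into a hash code packed into an int.

    Hypothetical helper: each non-negative feature becomes bit 1, otherwise 0.
    """
    code = 0
    for value in features:
        code = (code << 1) | (1 if value >= 0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Hamming distance between two packed codes: XOR, then count set bits."""
    return bin(a ^ b).count("1")


def retrieve(query_code: int, database: Dict[str, int], top_k: int = 2) -> List[Tuple[str, int]]:
    """Return the top_k database items closest to the query in Hamming space."""
    ranked = sorted(
        database.items(),
        key=lambda item: (hamming(query_code, item[1]), item[0]),
    )
    return [(name, hamming(query_code, code)) for name, code in ranked[:top_k]]


# Toy features standing in for encoder outputs (illustrative values only).
image_codes = {
    "img_cat": sign_hash([0.9, -0.2, 0.4, -0.7]),
    "img_dog": sign_hash([-0.5, 0.3, 0.8, -0.1]),
    "img_car": sign_hash([-0.6, -0.4, -0.9, 0.2]),
}
text_query = sign_hash([0.7, -0.1, 0.5, -0.3])

results = retrieve(text_query, image_codes)
print(results)  # [('img_cat', 0), ('img_dog', 2)]
```

Each item costs only as many bits as the code length, and ranking needs no floating-point arithmetic, which is where the low storage cost and fast indexing come from.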

VLDeformer: Vision–Language Decomposed Transformer for fast cross-modal retrieval

Knowledge-Based Systems

Cross-modal retrieval has emerged as one of the most important upgrades for text-only search engines (SE). Recently, with powerful representation for pairwise text-image inputs via early interaction, the accuracy of vision-language (VL) transformers has outperformed existing methods for text-image retrieval. However, when the same paradigm is used for inference, the efficiency of VL transformers is still too low to be applied in a real cross-modal SE. Inspired by the mechanism of human learning and using cross-modal knowledge, this paper presents a novel Vision-Language Decomposed Transformer (VLDeformer), which greatly increases the efficiency of VL transformers while maintaining their outstanding accuracy. With the proposed method, cross-modal retrieval is separated into two stages: the VL transformer learning stage and the VL decomposition stage. The latter stage plays the role of single-modal indexing, which is to some extent like the term indexing of a text SE. The model learns cross-modal knowledge from early-interaction pre-training and is then decomposed into an individual encoder. The decomposition requires only small target datasets for supervision and achieves both 1000+ times acceleration and less than 0.6% average recall drop. VLDeformer also outperforms state-of-the-art visual-semantic embedding methods on COCO and Flickr30k.

A Design Framework For Event Recommendation In Novice Low-Literacy Communities

The proliferation of user-generated content (UGC) results in huge opportunities to explore event patterns. However, existing event recommendation systems primarily focus on advanced information technology users. Little work has been done to address novice and low-literacy users. The next billion users providing and consuming UGC are likely to include communities from developing countries who are ready to use affordable technologies for subsistence goals. Therefore, we propose a design framework for providing event recommendations to address the needs of such users. Grounded in information integration theory (IIT), our framework advocates that effective event recommendation is supported by systems capable of (1) reliable information gathering through structured user input, (2) accurate sense making through spatial-temporal analytics, and (3) intuitive information dissemination through interactive visualization techniques. A mobile pest management application is developed as an instan...

Private Communications

Mobile ICT and Knowledge Sharing in Underserved

How Does Mobile Computing Develop Transactive Memory in Virtual Team? A Social Identification View

The advancement in mobile computing technologies has shown great potential to drive efficiency and effectiveness of knowledge work in virtual teams. Despite their ubiquity, theoretical and empirical research investigating the impact of mobile computing artifacts on the development of transactive memory in virtual teams is in its infancy. Drawing on the social psychology literature, we propose a social identity based view to understand how the use of mobile computing artifacts is associated with the development of a transactive memory system (TMS) in virtual teams. Specifically, the use of four categories of mobile computing artifacts (i.e., ubiquitous co-presence, status disclosure, context search, and customized notification) is proposed to enhance social identification, which thereafter promotes TMS development in terms of specialization, credibility, and coordination. This study offers a new perspective on the mechanisms through which mobile computing artifacts facilitate TMS development, and it yields important implications for the design of mobile strategy in organizations.

VLDeformer: Learning Visual-Semantic Embeddings by Vision-Language Transformer Decomposing

ArXiv, 2021

Vision-language transformers (VL transformers) have shown impressive accuracy in cross-modal retrieval. However, most of the existing VL transformers use an early-interaction dataflow that computes a joint representation for the text-image input. In the retrieval stage, such models need to infer on all the matched text-image combinations, which causes high computing costs. The goal of this paper is to decompose the early-interaction dataflow inside the pre-trained VL transformer to achieve acceleration while maintaining its outstanding accuracy. To achieve this, we propose a novel Vision-language Transformer Decomposing (VLDeformer) to modify the VL transformer as an individual encoder for a single image or text through contrastive learning, which accelerates retrieval speed by thousands of times. Meanwhile, we propose to compose bimodal hard negatives for the contrastive learning objective, which enables the VLDeformer to maintain the outstanding accuracy of the backbone VL transformer...
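As a rough illustration of why separate encoders speed up retrieval, the sketch below pairs a standard in-batch contrastive (InfoNCE) loss with a retrieval step over precomputed image embeddings. It is not the VLDeformer implementation: the function names, the temperature value, and the two-dimensional toy vectors are assumptions, and the bimodal hard negatives mentioned in the abstract are replaced here by plain in-batch negatives.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(text_emb: List[Vector], image_emb: List[Vector], temperature: float = 0.1) -> float:
    """Text-to-image InfoNCE loss.

    Pair i (text i, image i) is the positive; every other image in the
    batch acts as a negative. The temperature is an assumed default.
    """
    texts = [normalize(t) for t in text_emb]
    images = [normalize(i) for i in image_emb]
    loss = 0.0
    for i, t in enumerate(texts):
        logits = [dot(t, img) / temperature for img in images]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        loss += log_sum - logits[i]
    return loss / len(texts)


def rank_images(query: Vector, index: List[Vector]) -> List[int]:
    """Rank precomputed (already normalized) image embeddings for one text query."""
    q = normalize(query)
    scores = [dot(q, img) for img in index]
    return sorted(range(len(index)), key=lambda j: -scores[j])


# Offline: image embeddings are computed once and stored.
image_index = [normalize(v) for v in ([1.0, 0.0], [0.0, 1.0], [-1.0, 0.0])]

# Online: the text is encoded once, then compared with cheap dot products.
ranking = rank_images([0.9, 0.1], image_index)
print(ranking)  # [0, 1, 2]

aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)  # True
```

With an early-interaction model, answering one query over N images takes N joint forward passes. With independent encoders, the image side is indexed ahead of time and a query costs one text encoding plus N dot products.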

Knowledge Sharing In Underserved Communities

Organizing principles, exchange relationships, and technology affordance of underserved communities in emerging markets are different from those of privileged communities. This paper investigates knowledge sharing and the use of mobile ICT in a rural farming community in India. Our qualitative field study reveals that value creating and claiming norms are key enablers of knowledge sharing in underserved communities. The findings also identify the importance and challenges of mobile ICT innovations that foster knowledge sharing among dispersed underserved communities. We discuss the implications for theory and suggest a practical guide to enhance knowledge sharing through mobile ICT innovations in underserved communities.

Demystifying continuous participation in game applications at social networking sites

Internet Research

Purpose: Drawing on the social playfulness literature and the elaboration likelihood model, the purpose of this paper is to propose and test a research model to examine users’ continuous participation in SNS game applications.
Design/methodology/approach: A field survey with 133 subjects was conducted to test the research model.
Findings: Two identified design features, symbolic physicality and inherent sociability, are found to influence users’ perceived curiosity and perceived enjoyment toward playing SNS game applications. Perceived enjoyment is significantly associated with perceived curiosity and predicts users’ continuous participation in SNS game applications. The authors also observed a gender difference in the effect of social playfulness design on perceived curiosity.
Research limitations/implications: Use intention was used as a proxy for actual use behavior, since objective data on continuance behavior was not available. Additionally, the contributions of this study may be constrained by ...

Information Visualization and Location-Based Services on Mobile Devices

Task-Technology Fit for Low-Literate Consumers: Implications for IS Innovations in the Developing Regions

Mobile ICT and Knowledge Sharing in Underserved Communities

An Exploration of Social Media in Public Opinion Convergence: Elaboration Likelihood and Semantic Networks on Political Events

This study investigated the use of social media to enable public opinion convergence in activities with wide-scale interaction, such as political events. Using theoretical foundations in elaboration likelihood, we explored the process of opinion convergence by analyzing Twitter data from the Singapore General Election 2011. Our quantitative analyses showed that informative tweets were more effective than affective tweets in opinion convergence,
