Huy Vu - Academia.edu

Papers by Huy Vu

M237I sample/ forward reads/ part 1 (please join with TGACCA1_part2.fq for a complete sample). Represents the unprocessed p53 core domain that contains the cancer mutation M237I but no other introduced mutations. This sample was prepared to analyze and control the baseline error rates associated with the procedure, as well as the intrinsic NGS sequencing error rates

M237I_RESCUE sample/ reverse reads. Identical to M237I_ACS sample except that transformants were selected for active p53 by culturing the cells in media lacking uracil, thus requiring active p53 for growth

M237I_RESCUE sample/ forward reads. Identical to M237I_ACS sample except that transformants were selected for active p53 by culturing the cells in media lacking uracil, thus requiring active p53 for growth

M237I_ACS sample/ reverse reads/ part 2 (please join with ACAGTG2_part1.fq for a complete sample). Corresponds to ACS mutagenesis performed on the p53 core domain containing the cancer mutant M237I. No selective pressure for p53 activity was applied. This sample controls for, and allows analysis of, the diversity of the ACS library

The CHOPER filtering implementation in Python

M237I sample/ reverse reads/ part 2 (please join with TGACCA2_part1.fq for a complete sample). Represents the unprocessed p53 core domain that contains the cancer mutation M237I but no other introduced mutations. This sample was prepared to analyze and control the baseline error rates associated with the procedure, as well as the intrinsic NGS sequencing error rates

M237I sample/ reverse reads/ part 1 (please join with TGACCA2_part1.fq for a complete sample). Represents the unprocessed p53 core domain that contains the cancer mutation M237I but no other introduced mutations. This sample was prepared to analyze and control the baseline error rates associated with the procedure, as well as the intrinsic NGS sequencing error rates

Proceedings of the Seventh Symposium on Information and Communication Technology, 2016

Geographic routing is well suited for large-scale wireless sensor networks (WSNs) because of its simplicity and scalability. In the presence of routing holes, however, geographic routing suffers from the so-called local minimum phenomenon and from traffic concentrating on the hole boundary. Several recent proposals attempt to fix these issues by deploying a special keep-away area around the hole, which helps to relieve congestion on the hole boundary, but they remain deficient when the source or destination is close to the hole. We propose a novel approach to this problem of routing in the proximity of a hole while ensuring the two main requirements of energy efficiency and load balancing. Our simulation experiments show that the proposed routing scheme strongly outperforms previous approaches to routing in the proximity of a hole, especially in energy efficiency and load balancing.

Annotation of a gene is the process of defining its biological functions and using this knowledge in studying relevant gene products that are encoded by genomes. This process includes extraction, definition, and interpretation of features on the genome sequence derived by integrating computational tools and biological knowledge. It is notable that many annotations describe features that constitute a gene, but other annotations do not, e.g., an STS or a sequence overlap. Annotation of the human genome is a central problem in bioinformatics of this century due to the demands from other areas, such as medicine. Lander has reviewed the development of human genome sequencing and its applications [Lan11]. There are currently a number of approaches and efficient algorithms for annotation of the human genome, such as the ones in the projects ENCODE [Con04], modENCODE [CSB+11], and 1000 Genomes [Con10]. The ENCODE project provides a comprehensive picture of functional elements in 44 human genomes, w...

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021

While deep learning (DL) architectures like convolutional neural networks (CNNs) have enabled effective solutions in image denoising, in general their implementations overly rely on training data, lack interpretability, and require tuning of a large parameter set. In this paper, we combine classical graph signal filtering with deep feature learning into a competitive hybrid design, one that utilizes interpretable analytical low-pass graph filters and employs 80% fewer network parameters than the state-of-the-art DL denoising scheme DnCNN. Specifically, to construct a suitable similarity graph for graph spectral filtering, we first adopt a CNN to learn feature representations per pixel, and then compute feature distances to establish edge weights. Given a constructed graph, we next formulate a convex optimization problem for denoising using a graph total variation (GTV) prior. Via an l1 graph Laplacian reformulation, we interpret its solution in an iterative procedure as a graph low-pass filter and derive its frequency response. For fast filter implementation, we realize this response using a Lanczos approximation. Experimental results show that in the case of statistical mismatch, our algorithm outperformed DnCNN by up to 3 dB in PSNR.

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 2019

Bitcoin was launched in 2009, becoming the world’s first ever decentralized digital currency. It uses a publicly distributed ledger called the blockchain to record the transaction history of the network. The Bitcoin network is structured as a decentralized peer-to-peer network, where there are no central or supernodes, and all peers are seen as equal. Nodes in the network do not have a complete view of the entire network and are only aware of the nodes that they are directly connected to. In order to propagate information across the network, Bitcoin implements a gossip-based flooding protocol. However, the current flooding protocol is inefficient and wasteful, producing a number of redundant and duplicated messages. In this paper, we present an alternative approach to the current flooding protocol implemented by Bitcoin. We propose a novel protocol that changes the current flooding protocol to a probabilistic flooding approach. Our approach allows nodes to maintain certain probabili...
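
The contrast between plain flooding and probabilistic flooding can be illustrated with a small simulation. The sketch below is a generic textbook-style model, not the protocol proposed in the paper and not Bitcoin's actual relay logic: the six-node topology, the single fixed forwarding probability, and the function name `flood` are all assumptions made for illustration.

```python
import random
from collections import deque


def flood(adjacency, source, forward_prob, rng):
    """Spread one message from `source` through the graph.

    Each time a node considers relaying to a neighbor, it does so with
    probability `forward_prob` (1.0 reproduces plain flooding).
    Returns (set of nodes that received the message, messages sent).
    """
    reached = {source}
    queue = deque([source])
    messages_sent = 0
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if rng.random() >= forward_prob:
                continue  # this relay is skipped
            messages_sent += 1  # counted even if the neighbor already has it
            if neighbor not in reached:
                reached.add(neighbor)
                queue.append(neighbor)
    return reached, messages_sent


# Hypothetical topology: 6 peers, each connected to 3 others.
adjacency = {
    0: [1, 5, 3],
    1: [0, 2, 4],
    2: [1, 3, 5],
    3: [2, 4, 0],
    4: [3, 5, 1],
    5: [4, 0, 2],
}

full_reached, full_sent = flood(adjacency, 0, 1.0, random.Random(0))
prob_reached, prob_sent = flood(adjacency, 0, 0.6, random.Random(0))
none_reached, none_sent = flood(adjacency, 0, 0.0, random.Random(0))

print("plain flooding:", len(full_reached), "nodes,", full_sent, "messages")
print("p = 0.6:       ", len(prob_reached), "nodes,", prob_sent, "messages")
```

With a forwarding probability of 1.0 every peer relays to every neighbor, so most messages are duplicates. Lowering the probability reduces traffic at the risk of leaving some peers unreached, which is the trade-off a probabilistic scheme has to manage.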

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

In human-level NLP tasks, such as predicting mental health, personality, or demographics, the number of observations is often smaller than the standard 768+ hidden state sizes of each layer within modern transformer-based language models, limiting the ability to effectively leverage transformers. Here, we provide a systematic study on the role of dimension reduction methods (principal components analysis, factorization techniques, or multi-layer auto-encoders) as well as the dimensionality of embedding vectors and sample sizes as a function of predictive performance. We first find that fine-tuning large models with a limited amount of data poses a significant difficulty, which can be overcome with a pre-trained dimension reduction regime. RoBERTa consistently achieves top performance in human-level tasks, with PCA giving benefit over other reduction methods in better handling users that write longer texts. Finally, we observe that a majority of the tasks achieve results comparable to the best performance with just 1/12 of the embedding dimensions.

Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Psychologists routinely assess people's emotions and traits, such as their personality, by collecting their responses to survey questionnaires. Such assessments can be costly in terms of both time and money, and often lack generalizability, as existing data cannot be used to predict responses for new survey questions or participants. In this study, we propose a method for predicting a participant's questionnaire response using their social media texts and the text of the survey question they are asked. Specifically, we use Natural Language Processing (NLP) tools such as BERT embeddings to represent both participants (via the text they write) and survey questions as embedding vectors, allowing us to predict responses for out-of-sample participants and questions. Our novel approach can be used by researchers to integrate new participants or new questions into psychological studies without the constraint of costly data collection, facilitating novel practical applications and furthering the development of psychological theory. Finally, as a side contribution, the success of our model also suggests a new approach to studying survey questions using NLP tools such as text embeddings rather than the response data used in traditional methods.

Modern Technologies and Scientific and Technological Progress, 2019

INTERACTIVE VOICE RESPONSE SYSTEM Abstract. The article is devoted to the development of an interactive voice response system whose main purpose is to monitor and control a remote object. Keywords: interactive voice response system, monitoring and control of a remote object, Goertzel algorithm, voice menu, microcontroller.

Data, 2019

Hydrologic soil groups play an important role in the determination of surface runoff, which, in turn, is crucial for soil and water conservation efforts. Traditionally, placement of soil into appropriate hydrologic groups is based on the judgement of soil scientists, primarily relying on their interpretation of guidelines published by regional or national agencies. As a result, large-scale mapping of hydrologic soil groups results in widespread inconsistencies and inaccuracies. This paper presents an application of machine learning for classification of soil into hydrologic groups. Based on features such as percentages of sand, silt and clay, and the value of saturated hydraulic conductivity, machine learning models were trained to classify soil into four hydrologic groups. The results of the classification obtained using algorithms such as k-Nearest Neighbors, Support Vector Machine with Gaussian Kernel, Decision Trees, Classification Bagged Ensembles and TreeBagger (Random Forest)...

Journal of Network and Computer Applications, 2019

Guaranteeing sufficient sensing coverage is of prime importance in wireless sensor networks. Unfortunately, for many reasons such as natural disruptions, adversarial attacks, or energy depletion, the occurrence of coverage holes is unavoidable. In order to assure the quality of service, coverage holes should be patched (i.e., by deploying new sensors) as soon as they appear. The solutions in state-of-the-art protocols still incur time complexity and energy overhead that increase with the size of coverage holes. To avoid that issue, this paper introduces a novel protocol (namely, TELPAC) which efficiently locates the hole boundary and determines the patching locations. The main idea behind TELPAC is to approximate the hole by a polygon whose edges are aligned with a regular triangular lattice. Based on this approximation, the patching locations are then determined using a regular hexagon tessellation. We theoretically prove that TELPAC can detect all coverage holes in the network. The simulation results show that the number of patching locations required by TELPAC is one of the smallest. Moreover, TELPAC can reduce the time consumed and the energy overhead by more than 50% in comparison with the existing protocols.

Computer Networks, 2017

Geographic routing has been widely used in wireless sensor networks because of its simplicity and efficiency resulting from its local and stateless nature. However, when subjected to routing holes (i.e., regions without sensor nodes that have communication capability), geographic routing suffers from the so-called local minimum phenomenon, where packets are stopped at the hole boundary. This local minimum phenomenon results in problems of load imbalance (i.e., a higher traffic intensity around the hole boundary) and routing path enlargement due to the long hole detour paths. Although several protocols have been proposed to address these issues, the load imbalance problem has not been solved thoroughly, and none of the existing protocols can solve both of these problems. In this article, we propose a distributed hole-bypassing routing protocol named ACOBA (Adaptive forbidden area-based COnstant stretch and load BAlancing), which can solve the load imbalance problem thoroughly while ensuring the constant stretch property of the routing path. Our theoretical analysis proves that the routing path stretch of the proposed protocol can be controlled to be as small as 1 + ε (for any predefined ε > 0), and the simulation experiments show that our protocol strongly outperforms state-of-the-art protocols in terms of load balancing.

http://www.theses.fr, 2006

This thesis studies the testability of data-flow designs of reactive systems developed with two development environments, SCADE and SIMULINK. Testability, used to estimate predictively how easy a system is to test, is evaluated by two measures: controllability and observability. We use the SATAN technology, which is based on information theory, to model the transfer of information in the system. The testability measures are computed from the information loss in the operator diagram, where each operator contributes to this loss. The information loss of an operator is computed either exhaustively, from the "truth table" of the operator's function, or statistically, from the operator's simulation results. Our approach has been integrated into a tool that enables automatic testability analysis.
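
As an illustration of the exhaustive, truth-table-based computation mentioned in this abstract, the sketch below measures the information loss of a Boolean operator as the entropy of its inputs minus the entropy of its output. This definition, the assumption of independent and uniformly distributed inputs, and the function names are illustrative assumptions; they are not taken from the thesis or from the SATAN technology.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def information_loss(operator, n_inputs):
    """Input entropy minus output entropy over the full truth table.

    Assumes independent, uniformly distributed Boolean inputs, so the
    input entropy is simply log2 of the number of truth-table rows.
    """
    rows = list(product([False, True], repeat=n_inputs))
    output_counts = Counter(operator(*row) for row in rows)
    input_entropy = log2(len(rows))
    return input_entropy - entropy(output_counts.values())


and_loss = information_loss(lambda a, b: a and b, 2)
xor_loss = information_loss(lambda a, b: a != b, 2)
not_loss = information_loss(lambda a: not a, 1)

print(f"AND loses {and_loss:.3f} bits")
print(f"XOR loses {xor_loss:.3f} bits")
print(f"NOT loses {not_loss:.3f} bits")
```

Under these assumptions a NOT operator loses nothing because it is invertible, while AND loses more than XOR because three of its four input rows collapse onto the same output value.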

Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, 2010

We investigate a higher-order query language that embeds operators of the positive relational algebra within the simply-typed λ-calculus. Our language allows one to succinctly define ordinary positive relational algebra queries (conjunctive queries and unions of conjunctive queries) and, in addition, second-order query functionals, which allow the transformation of CQs and UCQs in a generic (i.e., syntax-independent) way. We investigate the equivalence and containment problems for this calculus, which subsumes traditional CQ/UCQ containment. Query functionals are said to be equivalent if the output queries are equivalent, for each possible input query, and similarly for containment. These notions of containment and equivalence depend on the class of (ordinary relational algebra) queries considered. We show that containment and equivalence are decidable when query variables are restricted to positive relational algebra and we identify the precise complexity of the problem. We also identify classes of functionals where containment is tractable. Finally, we provide upper bounds to the complexity of the containment problem when functionals act over other classes.

Tourism Management, 2015

Highlights: A novel approach to travel behavior analysis is introduced, based on geotagged photos. A dataset comprising photos posted on Flickr by Hong Kong inbound tourists is built. The spatial and temporal information of the photos is used to infer tourists' movement trajectories. The advantage of the approach is shown by analyzing inbound tourist travel behavior. The study benefits destination development, transportation planning, and impact management.