Ali Bashir - Academia.edu

Papers by Ali Bashir

Phylogenetic Inference

Machine learning guided aptamer refinement and discovery

Nature Communications, 2021

Aptamers are single-stranded nucleic acid ligands that bind to target molecules with high affinity and specificity. They are typically discovered by searching large libraries for sequences with desirable binding properties. These libraries, however, are practically constrained to a fraction of the theoretical sequence space. Machine learning provides an opportunity to intelligently navigate this space to identify high-performing aptamers. Here, we propose an approach that employs particle display (PD) to partition a library of aptamers by affinity, and uses such data to train machine learning models to predict affinity in silico. Our model predicted high-affinity DNA aptamers from experimental candidates at a rate 11-fold higher than random perturbation and generated novel, high-affinity aptamers at a greater rate than observed by PD alone. Our approach also facilitated the design of truncated aptamers 70% shorter and with higher binding affinity (1.5 nM) than the best experimental ...

Author Correction: A robust benchmark for detection of germline large deletions and insertions

Nature Biotechnology, 2020

In the version of this article initially published online, orange and black were switched in the legend to Fig. 5. The error has been corrected in the print, PDF and HTML versions of the article.

Deep diversification of an AAV capsid protein by machine learning

Nature Biotechnology, 2021

Guest Editorial: Advanced ICT and IoT Technologies for the Fourth Industrial Revolution

Intelligent Automation and Soft Computing, 2019

Metabolic profile and renal function of lambs fed spineless cactus as a replacement for maniçoba hay.

PathogenDB: A Modular Software Suite Integrating Genomic Clinical Microbiology and Epidemiology

Open Forum Infectious Diseases, 2016

Background. Next-generation sequencing (NGS) technologies have reduced the cost of acquiring genomic data from active infections in hospitals, with the potential to rapidly characterize patient-to-patient transmission with extreme precision. A barrier to widespread adoption of NGS in clinical microbiology is a lack of easy-to-use software for converting these data into species identifications, phylogenies, and drug susceptibilities. A clinical application should ideally provide a unified pipeline that could be deployed at a clinical microbiology lab, running semi-automated analyses that inform infection control interventions. Methods. We developed a modular open-source software suite called PathogenDB that implements major functionalities needed for genomic clinical microbiology and pathogen surveillance. A central laboratory information management system runs on a standard open-source Linux/Apache/MySQL/PHP stack. A modular genomics workflow, PathogenDB-pipeline, was publicly released in 2014. It automates de novo assembly of reads with HGAP, circularizes contigs with Circlator, annotates genes with Prokka, and predicts epigenetic motifs. The pipeline also post-processes assemblies to evaluate quality and provide visualizations using a custom genome browser (ChromoZoom). A comparative genomics module, PathogenDB-comparison, performs semi-automated phylogenetic analysis with Mugsy and RAxML. Results. PathogenDB-pipeline has been used to assemble and annotate 232 genomes from 7 species, and runs in <12 hours end-to-end. At an urban tertiary-care hospital, PathogenDB-comparison has genomically characterized one MRSA outbreak, two transmissions via solid organ transplant, and pseudo-outbreaks of S. maltophilia and B. cepacia. Both software packages are freely available on GitHub. Conclusion. We have created modular, open-source software that automates significant portions of a genomic clinical microbiology workflow and can characterize transmissions within an outbreak. Further work could add visualizations based on epidemiological trend data and geospatial analysis, allowing rapid, unprecedented insight into transmission events and potential outbreaks occurring within an NGS-equipped hospital. Disclosures. All authors: No reported disclosures.

Impact of HCV core gene quasispecies on hepatocellular carcinoma risk among HALT-C trial patients

Scientific Reports, 2016

Mutations at positions 70 and/or 91 in the core protein of genotype 1b hepatitis C virus (HCV) are associated with hepatocellular carcinoma (HCC) risk in Asian patients. To evaluate this in a US population, the relationship between the percentage of 70 and/or 91 mutant HCV quasispecies in baseline serum samples of chronic HCV patients from the HALT-C trial and the incidence of HCC was determined by deep sequencing. Quasispecies percentage cut-points of ≥42% non-arginine at position 70 (non-R 70) or ≥98.5% non-leucine at position 91 (non-L 91) had optimal sensitivity for discerning higher or lower HCC risk. In baseline samples, 88.5% of chronic HCV patients who later developed HCC and 68.8% of matched HCC-free control patients had ≥42% non-R 70 quasispecies (P = 0.06). Furthermore, 30.8% of patients who developed HCC and 54.7% of matched HCC-free patients had quasispecies with ≥98.5% non-L 91 (P = 0.06). By Kaplan-Meier analysis, HCC incidence was higher, though not significantly so, among patients with quasispecies ≥42% non-R 70 (P = 0.08), while HCC incidence was significantly reduced among patients with quasispecies ≥98.5% non-L 91 (P = 0.01). In a Cox regression model, non-R 70 ≥42% was associated with increased HCC risk. This study of US patients indicates the potential utility of HCV quasispecies analysis as a non-invasive biomarker of HCC risk.

Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering

Microbiome, Jan 5, 2015

High-throughput bacterial 16S rRNA gene sequencing followed by clustering of short sequences into operational taxonomic units (OTUs) is widely used for microbiome profiling. However, clustering of short 16S rRNA gene reads into biologically meaningful OTUs is challenging, in part because nucleotide variation along the 16S rRNA gene is only partially captured by short reads. The recent emergence of long-read platforms, such as single-molecule real-time (SMRT) sequencing from Pacific Biosciences, offers the potential for improved taxonomic and phylogenetic profiling. Here, we evaluate the performance of long- and short-read 16S rRNA gene sequencing using simulated and experimental data, followed by OTU inference using computational pipelines based on heuristic and complete-linkage hierarchical clustering. In simulated data, long-read sequencing was shown to improve OTU quality and decrease variance. We then profiled 40 human gut microbiome samples using a combination of Illumina MiSeq...
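The complete-linkage clustering mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's pipeline: the function names, the mismatch-fraction distance on pre-aligned equal-length sequences, and the threshold value are all assumptions chosen for illustration.

```python
from itertools import combinations


def mismatch_fraction(a, b):
    """Fraction of differing positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def complete_linkage(c1, c2, seqs):
    """Complete linkage: the LARGEST pairwise distance between two clusters."""
    return max(mismatch_fraction(seqs[i], seqs[j]) for i in c1 for j in c2)


def cluster_otus(seqs, threshold):
    """Agglomerative clustering; stops once the closest pair of clusters
    is farther apart than `threshold` (a hypothetical parameter)."""
    clusters = [[i] for i in range(len(seqs))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = complete_linkage(clusters[a], clusters[b], seqs)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


# Toy reads (made up): three near-identical sequences and one outlier.
reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "TTTTACGTAC"]
otus = cluster_otus(reads, threshold=0.15)
print(otus)  # [[0, 1, 2], [3]]
```

Because complete linkage uses the maximum pairwise distance, every pair of members in a resulting cluster is within the threshold of each other, which is what distinguishes it from single-linkage or greedy centroid heuristics.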

Whole-genome sequencing identifies emergence of a quinolone resistance mutation in a case of Stenotrophomonas maltophilia bacteremia

Antimicrobial Agents and Chemotherapy, Jan 31, 2015

Whole genome sequences for Stenotrophomonas maltophilia serial isolates from a bacteremic patient before and after development of levofloxacin resistance were assembled de novo and differed by one single-nucleotide variant in smeT, a repressor for multidrug efflux operon smeDEF. Along with sequenced isolates from five contemporaneous cases, they displayed considerable diversity compared against all published complete genomes. Whole genome sequencing and complete assembly can conclusively identify resistance mechanisms emerging in S. maltophilia strains during clinical therapy.

Detecting epigenetic motifs in low coverage and metagenomics settings

BMC Bioinformatics, 2014

Background: It has recently become possible to rapidly and accurately detect epigenetic signatures in bacterial genomes using third generation sequencing data. Monitoring the speed at which a single polymerase inserts a base in the read strand enables one to infer whether a modification is present at that specific site on the template strand. These sites can be challenging to detect in the absence of high coverage and reliable reference genomes. Methods: Here we provide a new method for detecting epigenetic motifs in bacteria on datasets with low coverage, with incomplete references, and with mixed samples (i.e. metagenomic data). Our approach treats motif inference as a kmer comparison problem. First, genomes (or contigs) are deconstructed into kmers. Then, native genome-wide distributions of interpulse durations (IPDs) for kmers are compared with corresponding whole genome amplified (WGA, modification free) IPD distributions using log likelihood ratios. Finally, kmers are ranked and greedily selected by iteratively correcting for sequences within a particular kmer's neighborhood. Conclusions: Our method can detect multiple types of modifications, even at very low coverage and in the presence of mixed genomes. Additionally, we are able to predict modified motifs when genomes with "neighbor" modified motifs exist within the sample. Lastly, we show that these motifs can provide an alternative source of information by which to cluster metagenomics contigs and that iterative refinement on these clustered contigs can further improve both sensitivity and specificity of motif detection.
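The kmer comparison step described above can be sketched as follows. This is only an illustration under stated assumptions: each kmer's IPDs are modeled as a single Gaussian (the abstract does not say which distribution the authors use), the function names and IPD values are hypothetical, and the greedy neighborhood-correction step is omitted.

```python
import math
from statistics import fmean, pstdev


def gaussian_loglik(xs, mu, sigma):
    """Log-likelihood of observations xs under Normal(mu, sigma)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in xs
    )


def kmer_llr(native, control, min_sigma=1e-3):
    """Log likelihood ratio of the native IPDs under a model fitted to the
    native data versus a model fitted to the WGA (modification-free) control.
    min_sigma is an assumed floor that avoids division by zero."""
    mu_n, sd_n = fmean(native), max(pstdev(native), min_sigma)
    mu_c, sd_c = fmean(control), max(pstdev(control), min_sigma)
    return gaussian_loglik(native, mu_n, sd_n) - gaussian_loglik(native, mu_c, sd_c)


def rank_kmers(native_ipds, control_ipds):
    """Score every kmer seen in both datasets, highest ratio first."""
    scores = {
        k: kmer_llr(native_ipds[k], control_ipds[k])
        for k in native_ipds
        if k in control_ipds
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up IPD values: GATC is slowed in the native sample, ACGT is not.
native = {"GATC": [2.0, 2.2, 1.9, 2.1], "ACGT": [1.0, 1.1, 0.9, 1.0]}
control = {"GATC": [1.0, 1.1, 0.9, 1.0], "ACGT": [1.0, 0.9, 1.1, 1.0]}
ranked = rank_kmers(native, control)
print(ranked[0][0])  # GATC
```

A kmer whose native IPDs look like the control scores near zero, while a kmer whose polymerase kinetics shift in the native sample scores high and moves to the top of the ranking.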

Resolving complex tandem repeats with long reads

Bioinformatics, 2014

Motivation: Resolving tandemly repeated genomic sequences is a necessary step in improving our understanding of the human genome. Short tandem repeats (TRs), or microsatellites, are often used as molecular markers in genetics, and clinically, variation in microsatellites can lead to genetic disorders like Huntington's disease. Accurately resolving repeats, and in particular TRs, remains a challenging task in genome alignment, assembly and variation calling. Though tools have been developed for detecting microsatellites in short-read sequencing data, these are limited in the size and types of events they can resolve. Single-molecule sequencing technologies may potentially resolve a broader spectrum of TRs given their increased length, but require new approaches given their significantly higher raw error profiles. However, due to inherent error profiles of the single-molecule technologies, these reads present a unique challenge in terms of accurately identifying and estimating the TRs. Results: Here we present PACMONSTR, a reference-based probabilistic approach, to identify the TR region and estimate the number of these TR elements in long DNA reads. We present a multistep approach that requires as input a reference region and the reference TR element. Initially, the TR region is identified from the long DNA reads via a 3-stage modified Smith-Waterman approach and then the expected number of TR elements is calculated using a pair hidden Markov model-based method. Finally, TR-based genotype selection (or clustering: homozygous/heterozygous) is performed with Gaussian mixture models, using the Akaike information criterion and coverage expectations.
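For readers unfamiliar with the alignment step, here is a sketch of the textbook Smith-Waterman local alignment score with a linear gap penalty. It is the unmodified algorithm, not the 3-stage modified variant the abstract refers to, and the scoring parameters and example sequences are assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best local alignment score, (end index in a, end index in b)).

    End indices are 1-based positions in the dynamic-programming matrix,
    i.e. the alignment ends just after a[i-1] and b[j-1].
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos


# Made-up read containing three copies of a CAG repeat element.
read = "TTTCAGCAGCAGGGG"
repeat_region = "CAGCAGCAG"
score, end = smith_waterman(read, repeat_region)
print(score, end)  # 18 (12, 9)
```

The score of 18 corresponds to nine matched bases at 2 points each, and the end position locates where the repeat region finishes in the read.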

Characterization of structural variants with single molecule and hybrid sequencing approaches

Hepatitis C virus genetics affects miR-122 requirements and response to miR-122 inhibitors

Nature Communications, Jan 18, 2014

Hepatitis C virus (HCV) replication is dependent on a liver-specific microRNA (miRNA), miR-122. A recent clinical trial reported that transient inhibition of miR-122 reduced viral titres in HCV-infected patients. Here we set out to better understand how miR-122 inhibition influences HCV replication over time. Unexpectedly, we observed the emergence of an HCV variant that is resistant to miR-122 knockdown. Next-generation sequencing revealed that this was due to a single nucleotide change at position 28 (G28A) of the HCV genome, which falls between the two miR-122 seed-binding sites. Naturally occurring HCV isolates encoding G28A are similarly resistant to miR-122 inhibition, indicating that subtle differences in viral sequence, even outside the seed-binding site, greatly influence HCV's miR-122 concentration requirement. In addition, we found that HCV itself reduces miR-122's activity in the cell, possibly through binding and sequestering miR-122. Our study provides insight ...

How do students react to analyzing their own genomes in a whole-genome sequencing course?: outcomes of a longitudinal cohort study

Genetics in Medicine, 2015

Results: All students (n = 19) opted to analyze their own genomes. At T5, 12 of 15 students stated that analyzing their own genomes had been useful. Ten reported they had applied their knowledge in the workplace. Technical WGS knowledge increased (mean of 63.8% at T3, mean of 72.5% at T4; P = 0.005). In-depth interviews suggested that analyzing their own genomes may increase students' motivation to learn and their understanding of the patient experience. Most (but not all) of the students reported low levels of WGS results-related distress and low levels of regret about their decision to analyze their own genomes. Conclusion: Giving students the option of analyzing their own genomes may increase motivation to learn, but some students may experience personal WGS results-related distress and regret. Additional evidence is required before considering incorporating optional personal genome analysis into medical education on a large scale.

A sequence-based survey of the complex structural organization of tumor genomes

A Context-Aware Service Discovery Consideration in 6LoWPAN

2008 Third International Conference on Convergence and Hybrid Information Technology, 2008

To make the vision of ubiquity a reality, different kinds of networks are being integrated, and this integration makes service discovery more challenging because different service discovery mechanisms have been developed for particular networks. Context-aware service discovery for these hybrid networks differs from the usual case, especially when 6LoWPANs interwork with IP networks: 6LoWPANs are characterized by low power, low bandwidth, short range and low cost, which makes the challenge hard. Moreover, interworking of 6LoWPAN with IP networks raises many challenges for context-aware service discovery. In this paper we propose an advanced service discovery architecture and mechanism that supports proximity-based and context-aware service discovery in an interworked IP network and LoWPAN environment. The results show that our architecture helps to discover the closest and exact services from inside as well as outside the LoWPAN according to user requirements. It also considerably reduces the traffic overhead of service discovery.

Energy Efficient In-network RFID Data Filtering Scheme in Wireless Sensor Networks

Sensors, 2011

RFID (Radio frequency identification) and wireless sensor networks (WSNs) are backbone technologies for pervasive environments. When RFID is integrated with WSNs, RFID data uses WSN protocols for multi-hop communication. Energy is a critical issue in WSNs; however, RFID data contains a lot of duplication. These duplicates can be eliminated at the base station, but unnecessary transmissions of duplicate data within the network still occur, which consume nodes' energy and shorten network lifetime. In this paper, we propose an in-network RFID data filtering scheme that efficiently eliminates the duplicate data. For this we use a clustering mechanism where cluster heads eliminate duplicate data and forward filtered data towards the base station. Simulation results show that our approach saves considerable amounts of energy in terms of communication and computational cost, compared to existing filtering schemes.
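The idea of filtering duplicates at a cluster head before forwarding can be sketched as below. The abstract does not define what counts as a duplicate, so the time-window rule, the `ClusterHead` class, and the `(tag_id, timestamp)` reading format are all assumptions made for illustration.

```python
class ClusterHead:
    """Hypothetical cluster head that suppresses repeated reads of a tag.

    A reading is treated as a duplicate if the same tag was already
    forwarded within `window_s` seconds (an assumed rule).
    """

    def __init__(self, window_s=5.0):
        self.window_s = window_s
        self._last_forwarded = {}  # tag_id -> timestamp of last forwarded read

    def filter(self, readings):
        """Return only the readings that should travel on to the base station."""
        forwarded = []
        for tag_id, ts in readings:
            last = self._last_forwarded.get(tag_id)
            if last is None or ts - last > self.window_s:
                forwarded.append((tag_id, ts))
                self._last_forwarded[tag_id] = ts
        return forwarded


# Made-up readings collected from member nodes of one cluster.
head = ClusterHead(window_s=5.0)
readings = [("A", 0.0), ("B", 1.0), ("A", 2.0), ("A", 6.0), ("B", 3.0)]
kept = head.filter(readings)
print(kept)  # [('A', 0.0), ('B', 1.0), ('A', 6.0)]
```

Here two of the five readings never leave the cluster, which is where the energy saving in such a scheme would come from: fewer multi-hop transmissions towards the base station.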

Mobile RFID and its design security issues

IEEE Potentials, 2011

Radio Frequency Identification (RFID) is a technology that automatically identifies objects in its vicinity by combining readers, tags and backend servers into a single system. That system has proved versatile, with applications in many areas; much has been revealed and the rest is yet to be explored. Because RFID is a new paradigm that is very vulnerable to unauthorized attacks, strict security measures have to be adopted. Laws have been made whose violation may result in total chaos. A system using RFID needs proper and efficient scanning. The wide adoption of RFID technology is foreseeable mainly because of low-cost RFID tags, but sometimes the cost is paid in the form of compromised privacy. In less than a decade, quite a number of research papers dealing with the security issues facing RFID technology have appeared. In this paper we attempt to summarize the research done so far in the area of RFID security, from where it started, and discuss some of its open issues. We first outline some of the research work done so far and the security risks facing RFID, then review some of the major applications of RFID, and finally conclude with some suggestions for future work.

Famine in Somalia: Evidence for a declaration

Global Food Security, 2012

Nations declared famine in parts of Somalia. Here, we report the methods, data and analysis that underpinned this declaration along with the review of trends in mortality and malnutrition. Methods: During July 2011, 16 population-based nutrition and mortality surveys were conducted in southern Somalia. Data on food access, collected through seasonal assessments and monthly monitoring, were analyzed using Household Economy methods. Results: In 11 of 16 survey locations, the prevalence of Global Acute Malnutrition exceeded the Integrated Food Security Phase Classification threshold for Phase 5 (Famine) of 30%. In five areas, Crude Death Rates exceeded the Integrated Food Security Phase Classification Phase 5 (Famine) threshold of 2/10,000/day. In agro-pastoral zones of the south, where access was most limited, more than 20% of households faced extreme food shortages. Comment: Survey findings and analysis confirm that a famine occurred in parts of southern Somalia during 2011 and raise the question of why strong early warning analysis did not trigger an earlier, better funded and more effective, response.
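The thresholds quoted in this abstract lend themselves to a small worked example. The sketch below checks one survey area against the 30% Global Acute Malnutrition and 2/10,000/day Crude Death Rate thresholds; treating the 20% household figure as a third threshold is an assumption based on the number reported in the abstract, and the survey values, class and function names are entirely hypothetical.

```python
from dataclasses import dataclass

GAM_THRESHOLD = 0.30            # prevalence of Global Acute Malnutrition
CDR_THRESHOLD = 2.0             # deaths per 10,000 people per day
FOOD_SHORTAGE_THRESHOLD = 0.20  # share of households (assumed threshold)


@dataclass
class SurveyArea:
    name: str
    gam_prevalence: float
    deaths: int
    population: int
    recall_days: int
    households_extreme_shortage: float


def crude_death_rate(deaths, population, days):
    """Deaths per 10,000 people per day over the recall period."""
    return deaths / (population * days) * 10_000


def threshold_flags(area):
    """Report which of the indicators exceed their threshold."""
    cdr = crude_death_rate(area.deaths, area.population, area.recall_days)
    return {
        "gam": area.gam_prevalence > GAM_THRESHOLD,
        "cdr": cdr > CDR_THRESHOLD,
        "food_shortage": area.households_extreme_shortage > FOOD_SHORTAGE_THRESHOLD,
    }


# Made-up survey: 81 deaths among 3,000 people over a 90-day recall period.
area = SurveyArea("hypothetical area", 0.35, 81, 3000, 90, 0.15)
flags = threshold_flags(area)
print(flags)  # {'gam': True, 'cdr': True, 'food_shortage': False}
```

In this made-up case the death rate works out to about 3 per 10,000 per day, so the malnutrition and mortality indicators are exceeded while the food shortage indicator is not.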

A hybrid approach for the automated finishing of bacterial genomes

Nature Biotechnology, 2012

Research paper thumbnail of Phylogenetic Inference

Research paper thumbnail of Machine learning guided aptamer refinement and discovery

Nature Communications, 2021

Aptamers are single-stranded nucleic acid ligands that bind to target molecules with high affinit... more Aptamers are single-stranded nucleic acid ligands that bind to target molecules with high affinity and specificity. They are typically discovered by searching large libraries for sequences with desirable binding properties. These libraries, however, are practically constrained to a fraction of the theoretical sequence space. Machine learning provides an opportunity to intelligently navigate this space to identify high-performing aptamers. Here, we propose an approach that employs particle display (PD) to partition a library of aptamers by affinity, and uses such data to train machine learning models to predict affinity in silico. Our model predicted high-affinity DNA aptamers from experimental candidates at a rate 11-fold higher than random perturbation and generated novel, high-affinity aptamers at a greater rate than observed by PD alone. Our approach also facilitated the design of truncated aptamers 70% shorter and with higher binding affinity (1.5 nM) than the best experimental ...

Research paper thumbnail of Author Correction: A robust benchmark for detection of germline large deletions and insertions

Nature Biotechnology, 2020

In the version of this article initially published online, orange and black were switched in the ... more In the version of this article initially published online, orange and black were switched in the legend to Fig. 5. The error has been corrected in the print, PDF and HTML versions of the article.

Research paper thumbnail of Deep diversification of an AAV capsid protein by machine learning

Nature Biotechnology, 2021

Research paper thumbnail of Guest Editorial: Advanced ICT and IoT Technologies for the Fourth Industrial Revolution

Intelligent Automation and Soft Computing, 2019

/agrariacad Metabolic profile and renal function of lambs fed with maniçoba hay replacement by sp... more /agrariacad Metabolic profile and renal function of lambs fed with maniçoba hay replacement by spineless cactus. Perfil metabólico e função renal de ovinos alimentados com palma forrageira em substituição ao feno de maniçoba

Research paper thumbnail of PathogenDB: A Modular Software Suite Integrating Genomic Clinical Microbiology and Epidemiology

Open Forum Infectious Diseases, 2016

Background. Next-generation sequencing (NGS) technologies have reduced the cost of acquiring geno... more Background. Next-generation sequencing (NGS) technologies have reduced the cost of acquiring genomic data from active infections in hospitals, with the potential to rapidly characterize patient-to-patient transmission with extreme precision. A barrier to widespread adoption of NGS in clinical microbiology is a lack of easy-to-use software for converting these data into species identifications, phylogenies, and drug susceptibilities. A clinical application should ideally provide a unified pipeline that could be deployed at a clinical microbiology lab, running semi-automated analyses that inform infection control interventions. Methods. We developed a modular open-source software suite called PathogenDB that implements major functionalities needed for genomic clinical microbiology and pathogen surveillance. A central laboratory information management system runs on a standard open-source Linux/Apache/MySQL/PHP stack. A modular genomics workflow, PathogenDB-pipeline, was publicly released in 2014. It automates de novo assembly of reads with HGAP, circularizes contigs with Circlator, annotates genes with Prokka, and predicts epigenetic motifs. The pipeline also post-processes assemblies to evaluate quality and provide visualizations using a custom genome browser (Chro-moZoom). A comparative genomics module, PathogenDB-comparison, performs semi-automated phylogenetic analysis with Mugsy and RAxML. Results. PathogenDB-pipeline has been used to assemble and annotate 232 genomes from 7 species, and runs in <12 hours end-to-end. At an urban tertiary-care hospital, PathogenDB-comparison has genomically characterized one MRSA outbreak, two transmissions via solid organ transplant, and pseudo-outbreaks of S. maltophilia and B. cepacia. Both software packages are freely available on GitHub. Conclusion. 
We have created modular, open-source software that automates significant portions of a genomic clinical microbiology workflow and can characterize transmissions within an outbreak. Further work could add visualizations based on epidemiological trend data and geospatial analysis, allowing rapid, unprecedented insight into transmission events and potential outbreaks occurring within a NGS-equipped hospital. Disclosures. All authors: No reported disclosures.

Research paper thumbnail of Impact of HCV core gene quasispecies on hepatocellular carcinoma risk among HALT-C trial patients

Scientific Reports, 2016

Mutations at positions 70 and/or 91 in the core protein of genotype-1b, hepatitis C virus (HCV) a... more Mutations at positions 70 and/or 91 in the core protein of genotype-1b, hepatitis C virus (HCV) are associated with hepatocellular carcinoma (HCC) risk in Asian patients. To evaluate this in a US population, the relationship between the percentage of 70 and/or 91 mutant HCV quasispecies in baseline serum samples of chronic HCV patients from the HALT-C trial and the incidence of HCC was determined by deep sequencing. Quasispecies percentage cut-points, ≥42% of non-arginine at 70 (non-R 70) or ≥98.5% of non-leucine at 91 (non-L 91) had optimal sensitivity at discerning higher or lower HCC risk. In baseline samples, 88.5% of chronic HCV patients who later developed HCC and 68.8% of matched HCC-free control patients had ≥42% non-R 70 quasispecies (P = 0.06). Furthermore, 30.8% of patients who developed HCC and 54.7% of matched HCC-free patients had quasispecies with ≥98.5% non-L 91 (P = 0.06). By Kaplan-Meier analysis, HCC incidence was higher, but not statistically significant, among patients with quasispecies ≥42% non-R 70 (P = 0.08), while HCC incidence was significantly reduced among patients with quasispecies ≥98.5% non-L 91 (P = 0.01). In a Cox regression model, non-R 70 ≥42% was associated with increased HCC risk. This study of US patients indicates the potential utility of HCV quasispecies analysis as a non-invasive biomarker of HCC risk.

Research paper thumbnail of Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering

Microbiome, Jan 5, 2015

High-throughput bacterial 16S rRNA gene sequencing followed by clustering of short sequences into... more High-throughput bacterial 16S rRNA gene sequencing followed by clustering of short sequences into operational taxonomic units (OTUs) is widely used for microbiome profiling. However, clustering of short 16S rRNA gene reads into biologically meaningful OTUs is challenging, in part because nucleotide variation along the 16S rRNA gene is only partially captured by short reads. The recent emergence of long-read platforms, such as single-molecule real-time (SMRT) sequencing from Pacific Biosciences, offers the potential for improved taxonomic and phylogenetic profiling. Here, we evaluate the performance of long- and short-read 16S rRNA gene sequencing using simulated and experimental data, followed by OTU inference using computational pipelines based on heuristic and complete-linkage hierarchical clustering. In simulated data, long-read sequencing was shown to improve OTU quality and decrease variance. We then profiled 40 human gut microbiome samples using a combination of Illumina MiSeq...

Research paper thumbnail of Whole-genome sequencing identifies emergence of a quinolone resistance mutation in a case of Stenotrophomonas maltophilia bacteremia

Antimicrobial agents and chemotherapy, Jan 31, 2015

Whole genome sequences for Stenotrophomonas maltophilia serial isolates from a bacteremic patient... more Whole genome sequences for Stenotrophomonas maltophilia serial isolates from a bacteremic patient before and after development of levofloxacin resistance were assembled de novo and differed by one single-nucleotide variant in smeT, a repressor for multidrug efflux operon smeDEF. Along with sequenced isolates from five contemporaneous cases, they displayed considerable diversity compared against all published complete genomes. Whole genome sequencing and complete assembly can conclusively identify resistance mechanisms emerging in S. maltophilia strains during clinical therapy.

Research paper thumbnail of Detecting epigenetic motifs in low coverage and metagenomics settings

BMC Bioinformatics, 2014

Background: It has recently become possible to rapidly and accurately detect epigenetic signature... more Background: It has recently become possible to rapidly and accurately detect epigenetic signatures in bacterial genomes using third generation sequencing data. Monitoring the speed at which a single polymerase inserts a base in the read strand enables one to infer whether a modification is present at that specific site on the template strand. These sites can be challenging to detect in the absence of high coverage and reliable reference genomes. Methods: Here we provide a new method for detecting epigenetic motifs in bacteria on datasets with lowcoverage, with incomplete references, and with mixed samples (i.e. metagenomic data). Our approach treats motif inference as a kmer comparison problem. First, genomes (or contigs) are deconstructed into kmers. Then, native genome-wide distributions of interpulse durations (IPDs) for kmers are compared with corresponding whole genome amplified (WGA, modification free) IPD distributions using log likelihood ratios. Finally, kmers are ranked and greedily selected by iteratively correcting for sequences within a particular kmer's neighborhood. Conclusions: Our method can detect multiple types of modifications, even at very low-coverage and in the presence of mixed genomes. Additionally, we are able to predict modified motifs when genomes with "neighbor" modified motifs exist within the sample. Lastly, we show that these motifs can provide an alternative source of information by which to cluster metagenomics contigs and that iterative refinement on these clustered contigs can further improve both sensitivity and specificity of motif detection.

Research paper thumbnail of Resolving complex tandem repeats with long reads

Bioinformatics, 2014

Motivation: Resolving tandemly repeated genomic sequences is a necessary step in improving our un... more Motivation: Resolving tandemly repeated genomic sequences is a necessary step in improving our understanding of the human genome. Short tandem repeats (TRs), or microsatellites, are often used as molecular markers in genetics, and clinically, variation in microsatellites can lead to genetic disorders like Huntington's diseases. Accurately resolving repeats, and in particular TRs, remains a challenging task in genome alignment, assembly and variation calling. Though tools have been developed for detecting microsatellites in short-read sequencing data, these are limited in the size and types of events they can resolve. Single-molecule sequencing technologies may potentially resolve a broader spectrum of TRs given their increased length, but require new approaches given their significantly higher raw error profiles. However, due to inherent error profiles of the single-molecule technologies, these reads presents a unique challenge in terms of accurately identifying and estimating the TRs. Results: Here we present PACMONSTR, a reference-based probabilistic approach, to identify the TR region and estimate the number of these TR elements in long DNA reads. We present a multistep approach that requires as input, a reference region and the reference TR element. Initially, the TR region is identified from the long DNA reads via a 3stage modified Smith-Waterman approach and then, expected number of TR elements is calculated using a pair-Hidden Markov Models-based method. Finally, TR-based genotype selection (or clustering: homozygous/heterozygous) is performed with Gaussian mixture models, using the Akaike information criteria, and coverage expectations.

Characterization of structural variants with single molecule and hybrid sequencing approaches

Hepatitis C virus genetics affects miR-122 requirements and response to miR-122 inhibitors

Nature Communications, Jan 18, 2014

Hepatitis C virus (HCV) replication is dependent on a liver-specific microRNA (miRNA), miR-122. A recent clinical trial reported that transient inhibition of miR-122 reduced viral titres in HCV-infected patients. Here we set out to better understand how miR-122 inhibition influences HCV replication over time. Unexpectedly, we observed the emergence of an HCV variant that is resistant to miR-122 knockdown. Next-generation sequencing revealed that this was due to a single nucleotide change at position 28 (G28A) of the HCV genome, which falls between the two miR-122 seed-binding sites. Naturally occurring HCV isolates encoding G28A are similarly resistant to miR-122 inhibition, indicating that subtle differences in viral sequence, even outside the seed-binding site, greatly influence HCV's miR-122 concentration requirement. In addition, we found that HCV itself reduces miR-122's activity in the cell, possibly through binding and sequestering miR-122. Our study provides insight ...

How do students react to analyzing their own genomes in a whole-genome sequencing course?: outcomes of a longitudinal cohort study

Genetics in Medicine, 2015

Results: All students (n = 19) opted to analyze their own genomes. At T5, 12 of 15 students stated that analyzing their own genomes had been useful. Ten reported they had applied their knowledge in the workplace. Technical WGS knowledge increased (mean of 63.8% at T3, mean of 72.5% at T4; P = 0.005). In-depth interviews suggested that analyzing their own genomes may increase students' motivation to learn and their understanding of the patient experience. Most (but not all) of the students reported low levels of WGS results-related distress and low levels of regret about their decision to analyze their own genomes. Conclusion: Giving students the option of analyzing their own genomes may increase motivation to learn, but some students may experience personal WGS results-related distress and regret. Additional evidence is required before considering incorporating optional personal genome analysis into medical education on a large scale.

A sequence-based survey of the complex structural organization of tumor genomes

A Context-Aware Service Discovery Consideration in 6LoWPAN

2008 Third International Conference on Convergence and Hybrid Information Technology, 2008

To realize the vision of ubiquity, different kinds of networks are being integrated, and this integration makes service discovery more challenging because different service discovery mechanisms have been developed for particular networks. Context-aware service discovery for these combined networks differs from the usual case, especially for 6LoWPANs interworking with IP networks, because 6LoWPANs are characterized by low power, low bandwidth, short range and low cost, which makes the problem hard. Moreover, interworking of 6LoWPAN with IP networks raises many challenges for context-aware service discovery. In this paper we propose an advanced service discovery architecture and mechanism that support proximity-based and context-aware service discovery in an interworked IP network and LoWPAN environment. The results show that our architecture helps to discover the closest and most exact services from inside as well as outside the LoWPAN according to user requirements. It also considerably reduces the traffic overhead of service discovery.

Energy Efficient In-network RFID Data Filtering Scheme in Wireless Sensor Networks

Sensors, 2011

RFID (Radio frequency identification) and wireless sensor networks (WSNs) are backbone technologies for pervasive environments. When RFID and WSNs are integrated, RFID data uses WSN protocols for multi-hop communications. Energy is a critical issue in WSNs; however, RFID data contains a lot of duplication. These duplicates can be eliminated at the base station, but unnecessary transmissions of duplicate data within the network still occur, which consume nodes' energy and affect network lifetime. In this paper, we propose an in-network RFID data filtering scheme that efficiently eliminates the duplicate data. For this we use a clustering mechanism where cluster heads eliminate duplicate data and forward filtered data towards the base station. Simulation results show that our approach saves considerable amounts of energy in terms of communication and computational cost, compared to existing filtering schemes.
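
The idea of filtering duplicates at a cluster head before forwarding can be sketched as follows. This is a minimal illustration rather than the scheme evaluated in the paper: the abstract does not define what counts as a duplicate, so the sketch assumes a reading is a duplicate when the same tag was already forwarded within a time window. The names `filter_duplicates` and `window` are hypothetical.

```python
def filter_duplicates(readings, window=5.0):
    """Drop repeated tag readings at a cluster head before forwarding.

    readings: iterable of (tag_id, timestamp) pairs received from member nodes.
    window:   assumed duplicate window in seconds; a reading of a tag already
              forwarded less than `window` seconds earlier is discarded.
    Returns the readings to forward towards the base station, in time order.
    """
    last_forwarded = {}  # tag_id -> timestamp of the last forwarded reading
    forwarded = []
    for tag_id, timestamp in sorted(readings, key=lambda reading: reading[1]):
        previous = last_forwarded.get(tag_id)
        if previous is None or timestamp - previous >= window:
            forwarded.append((tag_id, timestamp))
            last_forwarded[tag_id] = timestamp
    return forwarded
```

Every reading dropped here is one fewer multi-hop transmission, which is where the energy saving described in the abstract would come from.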

Mobile RFID and its design security issues

IEEE Potentials, 2011

Radio Frequency Identification (RFID) is a technology that automatically identifies objects in its vicinity by combining readers, tags and backend servers into a single system. The system has proved versatile, with applications in many areas; much has been revealed and the rest is yet to be explored. Because RFID is a new paradigm that is very vulnerable to unauthorized attacks, strict security measures have to be adopted. Laws have been made whose violation may result in total chaos. A system using RFID needs proper and efficient scanning. The adoption of RFID technology is foreseeable mainly because of low-cost RFID tags, but sometimes the cost is paid in the form of compromised privacy. Within less than a decade, quite a number of research papers dealing with the security issues facing RFID technology have appeared. In this paper we attempt to summarize the research done in the area of RFID security from where it started, and we also discuss some open issues. First, we outline some of the research work done so far and the security risks RFID faces, and then we review some of the major applications of RFID. The paper concludes with some suggestions for future work.

Famine in Somalia: Evidence for a declaration

Global Food Security, 2012

Nations declared famine in parts of Somalia. Here, we report the methods, data and analysis that underpinned this declaration along with the review of trends in mortality and malnutrition. Methods: During July 2011, 16 population-based nutrition and mortality surveys were conducted in southern Somalia. Data on food access, collected through seasonal assessments and monthly monitoring, were analyzed using Household Economy methods. Results: In 11 of 16 survey locations, the prevalence of Global Acute Malnutrition exceeded the Integrated Food Security Phase Classification threshold for Phase 5 (Famine) of 30%. In five areas, Crude Death Rates exceeded the Integrated Food Security Phase Classification Phase 5 (Famine) threshold of 2/10,000/day. In agro-pastoral zones of the south, where access was most limited, more than 20% of households faced extreme food shortages. Comment: Survey findings and analysis confirm that a famine occurred in parts of southern Somalia during 2011 and raise the question of why strong early warning analysis did not trigger an earlier, better funded and more effective response.
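
The two Phase 5 thresholds quoted in this abstract (Global Acute Malnutrition above 30% and a Crude Death Rate above 2/10,000/day) can be expressed as a short worked example. The function names and the input figures in the usage note are hypothetical and are not taken from the surveys; the sketch only shows how the rate is normalized and compared.

```python
GAM_THRESHOLD = 0.30  # Global Acute Malnutrition prevalence, as quoted in the abstract
CDR_THRESHOLD = 2.0   # deaths per 10,000 people per day, as quoted in the abstract


def crude_death_rate(deaths, population, days):
    """Deaths per 10,000 people per day over the recall period."""
    return deaths * 10_000 / (population * days)


def exceeds_famine_thresholds(gam_prevalence, cdr):
    """Report which of the two quoted Phase 5 thresholds a survey area exceeds."""
    return {
        "malnutrition": gam_prevalence > GAM_THRESHOLD,
        "mortality": cdr > CDR_THRESHOLD,
    }
```

For instance, 90 deaths in a surveyed population of 10,000 over a 30-day recall period gives `crude_death_rate(90, 10_000, 30)`, i.e. 3 deaths per 10,000 per day, which would exceed the mortality threshold.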

A hybrid approach for the automated finishing of bacterial genomes

Nature Biotechnology, 2012