Luis Meira - Academia.edu

Papers by Luis Meira

Algoritmos para problemas de classificação e particionamento em grafos (Algorithms for classification and partitioning problems in graphs)

The work developed in this doctorate consisted of designing algorithms for a series of NP-hard problems under the approximability approach, complemented with heuristic and integer programming results. The study focused on classification and partitioning problems in graphs, such as metric labeling, balanced cut, and clustering. It balanced theory and applicability, yielding algorithms with good approximation factors as well as algorithms that obtained good-quality solutions in competitive time. The study concentrated on three problems: the Uniform Metric Labeling Problem, the Balanced Cut Problem, and the continuous version of the Facility Location Problem. We first worked on the Uniform Metric Labeling Problem, for which we proposed an O(log n)-approximation algorithm. In the experimental validation, this algorithm obtained good-quality solutions in less time than the traditional algorithms. For the Balanced Cut Problem, we proposed heuristics and an exact algorithm. Experimentally, we used a semidefinite programming solver to solve the relaxation of the problem, and we substantially improved the time to solve the relaxation by building our own solver, which uses the cutting-plane method over a linear programming system. Finally, we worked on the continuous variant of the Facility Location Problem. For this problem, we presented approximation algorithms for the ℓ2 and ℓ2² metrics. This algorithm was applied to obtain approximation algorithms for the k-Means problem, a classic clustering problem. In the experimental comparison with a known implementation from the literature, the presented algorithms proved competitive, obtaining, in several cases, better-quality solutions in comparable time. The studies on these problems resulted in three papers, detailed in the chapters that make up this thesis.

Modelagem para o Problema de Entrega de Refeições em Rio Claro-SP (Modeling the Meal Delivery Problem in Rio Claro-SP)

Routing problems are problems in which a set of customers is served by a set of vehicles. In this work, we model on a 2D map a multi-objective benchmark based on a realistic routing problem: the delivery of boxed meals (marmitas) by motorcycle in the city of Rio Claro-SP. The generated map of Rio Claro has 1566 streets, with coordinates extracted manually and modeled as polygonal chains. We generated a total of 23 instances containing 2 to 7 depots and up to 2000 delivery points. This work is an extension of [23], in which the authors model the problem of mail delivery by postal carriers in the city of Artur Nogueira. The work in [23] has a single depot and models Artur Nogueira with 537 streets, whereas this work has multiple depots and models Rio Claro with 1566 streets. The generated instances will be made available so that the scientific community can validate and compare optimization algorithms.

Multi-Objective Vehicle Routing Problem Applied to Large Scale Post Office Deliveries

ArXiv, 2018

The number of optimization techniques in the combinatorial domain is large and diversified. Nevertheless, real-world-based benchmarks for testing algorithms are few. This work creates an extensible real-world mail delivery benchmark for the Vehicle Routing Problem (VRP) in a planar graph embedded in the 2D Euclidean space. The problem is multi-objective, on a roadmap with up to 25 vehicles and 30,000 deliveries per day. Each instance models one generic day of mail delivery, allowing both comparison and validation of optimization algorithms for routing problems. The benchmark may be extended to model other scenarios.

Improving representativeness in a scenario reduction process to aid decision making in petroleum fields

Journal of Petroleum Science and Engineering, 2019

Clustering through Continuous Facility Location Problems

Theoretical Computer Science, 2017

We consider the Continuous Facility Location Problem (ConFLP). Given a finite set of clients C ⊂ R^d and a number f ∈ R_+, ConFLP consists in opening a set F ⊂ R^d of facilities, each at cost f, and connecting each client to an open facility. The objective is to minimize the costs of opening facilities and connecting clients. We reduce ConFLP to the standard Facility Location Problem (FLP) by using the so-called approximate center sets. This reduction preserves the approximation, except for an error ε, and implies (1.488 + ε)- and (2.04 + ε)-approximations when the connection cost is given by the Euclidean distance and the squared Euclidean distance, respectively. Moreover, we obtain approximate center sets for the case in which the connection cost is the αth power of the Euclidean distance, achieving approximations for the corresponding problems, for any α ≥ 1. As a byproduct, we also obtain a polynomial-time approximation scheme for the k-median problem with this cost function, for any fixed k.
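The ConFLP objective described above can be made concrete with a small evaluator. This is a minimal sketch that only scores a given candidate solution; it is not the reduction or any approximation algorithm from the paper. The function name `conflp_cost` and its parameters are hypothetical, and the connection cost is assumed to be the αth power of the Euclidean distance to the nearest open facility.

```python
import math


def conflp_cost(clients, facilities, f, alpha=1.0):
    """Total cost of a candidate ConFLP solution (illustrative helper).

    clients, facilities: sequences of equal-length coordinate tuples in R^d.
    f: cost of opening one facility.
    alpha: exponent applied to the Euclidean distance (assumed alpha >= 1).
    """
    if not facilities:
        raise ValueError("at least one facility must be open")
    # Opening cost: every facility costs the same amount f.
    opening = f * len(facilities)
    # Connection cost: each client connects to its cheapest open facility.
    connection = sum(
        min(math.dist(c, p) ** alpha for p in facilities) for c in clients
    )
    return opening + connection
```

For example, with clients at (0, 0) and (4, 0), one facility at (1, 0) and f = 2, the connection distances are 1 and 3, so the cost is 2 + 1 + 3 under the Euclidean distance and 2 + 1 + 9 under the squared Euclidean distance.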

Selection of Representative Models for Decision Analysis Under Uncertainty

Computers & Geosciences, 2016

The decision-making process in oil fields includes a step of risk analysis associated with the uncertainties present in the variables of the problem. Such uncertainties lead to hundreds, even thousands, of possible scenarios that must be analyzed so that an effective production strategy can be selected. Given this high number of scenarios, a technique to reduce this set to a smaller, feasible subset of representative scenarios is imperative. The selected scenarios must be representative of the original set and also free of optimistic and pessimistic bias. This paper proposes an assisted methodology to identify representative models in oil fields. First, a mathematical function was developed to model the representativeness of a subset of models with respect to the full set that characterizes the problem. Then, an optimization tool was implemented to identify the representative models of any problem, considering not only the cross-plots of the main output variables, but also the risk curves and the probability distribution of the attribute-levels of the problem. The proposed technique was applied to two benchmark cases, and the results, evaluated by experts in the field, indicate that the obtained solutions are richer than those identified by previously adopted manual approaches.

A Greedy Approximation Algorithm for the Uniform Labeling Problem Analyzed by a Primal-Dual Technique

Lecture Notes in Computer Science, 2004

In this paper we present a new fast approximation algorithm for the Uniform Metric Labeling Problem. This is an important classification problem that occurs in many applications that assign objects to labels in a way that is consistent with some observed data, including the relationships between the objects. The known approximation algorithms are based on solutions of large linear programs and are impractical for moderate and large instances. We present an 8 log n-approximation algorithm analyzed by a primal-dual technique which, although its factor is greater than those of the previous algorithms, can be applied to large instances. We obtained experimental results on computationally generated and image processing instances with the new algorithm and two other LP-based approximation algorithms. For these instances our algorithm presents a considerable gain in computational time, and the error ratio, when comparison was possible, was less than 2% from the optimum.

Attribute-value specification in customs fraud detection: a human-aided approach

DG.O (Inter)National Conference on Digital Government Research, 2009

With the growing importance of foreign commerce come greater opportunities for fraudulent behaviour. As such, governments must try to detect frauds as soon as they take place if they are to avoid the profound damage frauds may cause to society. Although current fraud detection systems can be used in this endeavour with reasonable accuracy, they still...

Sim, É Possível Ordenar Com Complexidade Estritamente Abaixo de n lg n (Yes, It Is Possible to Sort with Complexity Strictly Below n lg n)

The sorting problem is undoubtedly one of the most studied in Computer Science. Within modern computing, after more than 60 years of study, there is still much research aimed at developing algorithms that sort faster or with fewer resources than other known algorithms. There are several kinds of sorting algorithms: some are faster, others are more economical in space, and others place some restrictions on the input data. The goal of this work is to explain the Fusion Tree data structure, which is responsible for the first sorting algorithm with running time below n lg n, a bound that created some confusion by giving rise to the mistaken belief that it is the smallest possible for this type of problem.

Fusion Tree Sorting

The sorting problem is one of the most relevant problems in computer science. Within the scope of modern computer science, the sorting problem has been studied for more than 70 years. In spite of this, new sorting algorithms have been developed in recent years. Among the several types of sorting algorithms, some are quicker, others are more economical in space, and others place a few restrictions on the input data. This paper aims to explain the fusion tree data structure, which is responsible for the first sorting algorithm with time complexity below n lg n. The n lg n time complexity has led to some confusion and generated the wrong belief that it is the minimum possible for this type of problem.

acc-Motif Detection Tool

Background: Network motif algorithms have been a topic of research mainly since the seminal 2002 paper by Milo et al., which presented motifs as a way to uncover the basic building blocks of most networks. In Bioinformatics, motifs have been applied mainly in the field of gene regulation networks. Results: This paper proposes new algorithms to exactly count isomorphic pattern motifs of sizes 3, 4 and 5 in directed graphs. Let G(V, E) be a directed graph with m = |E|. We describe an O(m√m) time complexity algorithm to count isomorphic patterns of size 3. To count isomorphic patterns of size 4, we propose an O(m^2) algorithm. To count patterns with 5 vertices, the algorithm is O(m^2 n). Conclusion: The new algorithms were implemented and compared with the FANMOD and Kavosh motif detection tools. The experiments show that our algorithms are considerably faster than FANMOD and Kavosh. We also make our motif-detection tool available on the Internet.
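To give a feel for how a size-3 pattern can be counted within an O(m√m)-style bound, the sketch below counts triangles in an undirected simple graph by orienting every edge from its lower-degree to its higher-degree endpoint. This is a standard textbook technique shown only as a simplified analogue; it is not the acc-Motif algorithm, which works on directed graphs and counts all size-3 patterns. The function name `count_triangles` is hypothetical.

```python
from collections import defaultdict


def count_triangles(edges):
    """Count triangles in an undirected simple graph given as (u, v) pairs."""
    # Build adjacency sets; self-loops and repeated edges are ignored.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Rank vertices by degree; the position in the sorted list is unique,
    # so ties between equal-degree vertices are broken consistently.
    order = sorted(adj, key=lambda v: len(adj[v]))
    rank = {v: i for i, v in enumerate(order)}

    # Orient each edge from the lower-ranked to the higher-ranked endpoint.
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}

    # A triangle with ranks a < b < c is found exactly once: at the
    # oriented edge (a, b), where c is a common out-neighbour.
    total = 0
    for u in out:
        for v in out[u]:
            total += len(out[u] & out[v])
    return total
```

With this orientation every vertex keeps only O(√m) out-neighbours, which is why the technique is usually analysed as O(m√m) overall. As a quick check, the complete graph on four vertices contains four triangles, while a simple path contains none.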

How Far Do We Get Using Machine Learning Black-Boxes?

International Journal of Pattern Recognition and Artificial Intelligence, 2012

With several good research groups actively working on machine learning (ML) approaches, we now have self-contained machine learning solutions that often work out of the box, leading to the concept of ML black-boxes. Although it is important to have such black-boxes helping researchers deal with several problems, they come with an inherent and increasingly evident problem: we have observed that researchers and students are progressively relying on ML black-boxes and, usually, achieving results without knowing the machinery of the classifiers. In this regard, this paper discusses the use of machine learning black-boxes and poses the question of how far we can get using these out-of-the-box solutions instead of going deeper into the machinery of the classifiers. The paper focuses on three aspects of classifiers: (1) the way they compare examples in the feature space; (2) the impact of using features with variable dimensionality; and (3) the impact of usi...

Accelerated Motif Detection Using Combinatorial Techniques

Network motif algorithms have been a topic of research mainly since the seminal 2002 paper by Milo et al., which presented motifs as a way to uncover the basic building blocks of most networks. This article proposes new algorithms to exactly count isomorphic pattern motifs of sizes 3 and 4 in directed graphs. The algorithms are accelerated by combinatorial techniques. Let G(V, E) be a directed graph with m = |E|. We describe an O(m√m) time complexity algorithm to count isomorphic patterns of size 3. To count isomorphic patterns of size 4, we propose an O(m^2) algorithm. The new algorithms were implemented and compared with the Fanmod motif detection tool. The experiments show that our algorithms are considerably faster than Fanmod. We also make our motif-detection tool, acc-MOTIF, available on the Internet.

A systematic approach to bound factor-revealing LPs and its application to the metric and squared metric facility location problems

Mathematical Programming, 2014

A systematic technique to bound factor-revealing linear programs is presented. We show how to derive a family of upper bound factor-revealing programs (UPFRP), and show that each such program can be solved by a computer to bound the approximation factor of an associated algorithm. Obtaining a UPFRP is straightforward, and it can be used as an alternative to analytical proofs, which are usually very long and tedious. We apply this technique to the Metric Facility Location Problem (MFLP) and to a generalization where the distance function is a squared metric. We call this generalization the Squared Metric Facility Location Problem (SMFLP) and prove that there is no approximation factor better than 2.04, assuming P ≠ NP. Then, we analyze the best known algorithms for the MFLP based on primal-dual and LP-rounding techniques when they are applied to the SMFLP. We prove very tight bounds for these algorithms, and show that the LP-rounding algorithm achieves a ratio of 2.04, and therefore has the best factor for the SMFLP. We use UPFRPs in the dual-fitting analysis of the primal-dual algorithms for both the SMFLP and the MFLP, improving some of the previous analyses for the MFLP.
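One way to see why guarantees for the MFLP do not carry over unchanged to the SMFLP is that squaring a metric weakens the triangle inequality. The short derivation below is a standard fact added here for illustration, not a step quoted from the paper; d is assumed to be any metric and x, y, z arbitrary points.

```latex
d(x,z)^2 \;\le\; \bigl(d(x,y) + d(y,z)\bigr)^2 \;\le\; 2\,d(x,y)^2 + 2\,d(y,z)^2
```

The first step is the triangle inequality for d, and the second follows from (a + b)^2 ≤ 2a^2 + 2b^2, which holds because (a - b)^2 ≥ 0. A squared metric therefore satisfies only a relaxed triangle inequality with factor 2, so analyses that rely on the exact triangle inequality have to be redone.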

A continuous facility location problem and its application to a clustering problem

Proceedings of the 2008 ACM symposium on Applied computing - SAC '08, 2008

We consider a new problem, which we denote by Continuous Facility Location (ConFL), and its application to the k-Means Problem. Problem ConFL is a natural extension of the Uncapacitated Facility Location Problem in which a facility can be any point in R^q. The proposed algorithms are based on a primal-dual technique for spaces with constant dimension. For the ConFL Problem, we present algorithms with approximation factors 3 + ε and 1.861 + ε for Euclidean distances and 9 + ε for squared Euclidean distances. For the k-Means Problem (which is restricted to squared Euclidean distances), we present an algorithm with approximation factor 54 + ε. Comparisons with known algorithms show that the proposed algorithms have good practical behaviour in small dimensions.

acc-Motif: Accelerated Network Motif Detection

IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2014

Network motif algorithms have been a topic of research mainly since the seminal 2002 paper by Milo et al. [1], which presented motifs as a way to uncover the basic building blocks of most networks. Motifs have been applied mainly in Bioinformatics, to gene regulation networks. Motif detection is based on induced subgraph counting. This paper proposes an algorithm to count subgraphs of size k + 2 based on the set of induced subgraphs of size k. The general technique was applied to detect motifs of sizes 3, 4 and 5 in directed graphs. These algorithms have time complexity O(a(G)m), O(m^2) and O(nm^2), respectively, where a(G) is the arboricity of G(V, E). The computational experiments on public datasets show that the proposed technique was one order of magnitude faster than Kavosh and FANMOD. When compared to NetMODE, acc-Motif had slightly better performance.

Uses of artificial intelligence in the Brazilian customs fraud detection system

DG.O (Inter)National Conference on Digital Government Research, 2008

There is increasing concern about the control of customs operations. While globalization encourages the opening of markets, increasing volumes of imports and exports have been used to conceal several illicit activities, such as tax evasion, smuggling, money laundering, and drug trafficking. This fact makes it paramount for governments to find automatic or semi-automatic solutions to guide the customs'...

Squared Metric Facility Location Problem

Jain et al. proposed two well-known algorithms for the Metric Facility Location Problem (MFLP), which achieve approximation ratios of 1.861 and 1.61. Mahdian et al. combined the latter algorithm with scaling and greedy augmentation techniques, obtaining a 1.52-approximation for the MFLP. We consider a generalization of the Squared Euclidean Facility Location Problem in which the distance function is a squared metric, which we call the Squared Metric Facility Location Problem (SMFLP). We show that the algorithms of Jain et al. and of Mahdian et al., when applied to this variant of facility location, achieve approximation ratios of 2.87, 2.43, and 2.17, respectively. It is shown that, for the SMFLP, there is no 2.04-approximation algorithm, assuming P ≠ NP. In our analysis, we used nonlinear factor-revealing programs to obtain both lower and upper bounds on the approximation factors, and we propose a systematic way to derive such factor-revealing programs. This research was partially supported by CNPq and FAPESP.

How to assess your Smart Delivery System?

Smart Delivery Systems, 2020

The number of optimization techniques in the combinatorial domain is large and diversified. Nevertheless, real-world-based benchmarks for testing algorithms are few. This work creates an extensible real-world mail delivery benchmark for the Vehicle Routing Problem (VRP) in a planar graph embedded in the 2D Euclidean space. The problem is a multiobjective Smart Delivery System (SDS) case study on a roadmap with up to 30,000 deliveries per day. Each instance models one generic day of mail delivery, allowing both comparison and validation of optimization algorithms for routing problems. The benchmark may be extended to model other scenarios.

How Far You Can Get Using Machine Learning Black-Boxes

Graphics, Patterns and …, Jan 1, 2010

Research paper thumbnail of Algoritmos para problemas de classificação e particionamento em grafos

O trabalho desenvolvido neste doutorado consistiu em conceber algoritmos para uma serie de proble... more O trabalho desenvolvido neste doutorado consistiu em conceber algoritmos para uma serie de problemas NP-dificeis sob a abordagem de aproximabilidade, complementado com resultados heuristicos e tambem de programacao inteira. O estudo foi focado em problemas de classificacao e particionamento em grafos, como classificacao metrica, corte balanceado e clusterizacao. Houve um equilibrio entre teoria e aplicabilidade, ao obterse algoritmos com bons fatores de aproximacao e algoritmos que obtiveram solucoes de qualidade em tempo competitivo. O estudo concentrou-se em tres problemas: o Problema da Classificacao Metrica Uniforme, o Problema do Corte Balanceado e o Problema da Localizacao de Recursos na versao continua. Inicialmente trabalhamos no Problema da Classificacao Metrica Uniforme, para o qual propusemos um algoritmo O (logn)-aproximado. Na validacao experimental, este algoritmo obteve solucoes de boa qualidade em um espaco de tempo menor que os algoritmos tradicionais. Para o Problema do Corte Balanceado, propusemos heuristicas e um algoritmo exato. Experimentalmente, utilizamos um resolvedor de programacao semidefinida para resolver a relaxacao do problema e melhoramos substancialmente o tempo de resolucao da relaxacao ao construir um resolvedor proprio utilizando o metodo de insercao de cortes sobre um sistema de programacao linear. Finalmente, trabalhamos com o problema de Localizacao de Recursos na variante continua. Para este problema, apresentamos algoritmos de aproximacao para as metricas l2 e l2 2. Este algoritmo foi aplicado para obter algoritmos de aproximacao para o problema k-Means, que ´e um problema classico de clusterizacao. Na comparacao ao experimental com uma implementacao conhecida da literatura, os algoritmos apresentados mostraram-se competitivos, obtendo, em varios casos, solucoes de melhor qualidade em tempo equiparavel. 
Os estudos relativos a estes problemas resultaram em tres artigos, detalhados nos capitulos que compoem esta tese. Abstract

Research paper thumbnail of Modelagem para o Problema de Entrega de Refei\c{c}\~oes em Rio Claro-SP

Problemas de Roteamento são problemas em que um conjunto de clientes é atendido por um conjunto d... more Problemas de Roteamento são problemas em que um conjunto de clientes é atendido por um conjunto de véıculos. Neste trabalho, modelamos em um mapa 2D um benchmark multiobjetivo baseado em um problema de roteamento reaĺıstico de entregas de marmitas por motocicletas na cidade de Rio Claro-SP. O mapa gerado de Rio Claro apresenta 1566 ruas com coordenadas extráıdas manualmente e modeladas através de cadeias poligonais. Geramos um total de 23 instâncias contendo de 2 a 7 depósitos e até 2000 pontos de entrega. Este trabalho é uma extensão de [23] no qual os autores modelam o problema de entrega de correspondências por carteiros na cidade de Artur Nogueira. O trabalho [23] possui um depósito, enquanto neste trabalho temos múltiplos depósitos. Em [23] foi modelada a cidade de Artur Nogueira, com 537 ruas. Neste trabalho, foi modelado um mapa de Rio Claro, com 1566 ruas. As instâncias geradas serão disponibilizadas para que a comunidade cient́ıfica valide e compare algoritmos de otimização.

Research paper thumbnail of Multi-Objective Vehicle Routing Problem Applied to Large Scale Post Office Deliveries

ArXiv, 2018

The number of optimization techniques in the combinatorial domain is large and diversified. Never... more The number of optimization techniques in the combinatorial domain is large and diversified. Nevertheless, real-world based benchmarks for testing algorithms are few. This work creates an extensible real-world mail delivery benchmark to the Vehicle Routing Problem (VRP) in a planar graph embedded in the 2D Euclidean space. Such problem is multi-objective on a roadmap with up to 25 vehicles and 30,000 deliveries per day. Each instance models one generic day of mail delivery, allowing both comparison and validation of optimization algorithms for routing problems. The benchmark may be extended to model other scenarios.

Research paper thumbnail of Improving representativeness in a scenario reduction process to aid decision making in petroleum fields

Journal of Petroleum Science and Engineering, 2019

This is a PDF file of an article that has undergone enhancements after acceptance, such as the ad... more This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Research paper thumbnail of Clustering through Continuous Facility Location Problems

Theoretical Computer Science, 2017

We consider the Continuous Facility Location Problem (ConFLP). Given a finite set of clients C ⊂ ... more We consider the Continuous Facility Location Problem (ConFLP). Given a finite set of clients C ⊂ R d and a number f ∈ R + , ConFLP consists in opening a set F ⊂ R d of facilities, each at cost f , and connecting each client to an open facility. The objective is to minimize the costs of opening facilities and connecting clients. We reduce ConFLP to the standard Facility Location Problem (FLP), by using the so-called approximate center sets. This reduction preserves the approximation, except for an error ε, and implies 1.488 + ε and 2.04 + ε-approximations when the connection cost is given by the Euclidean distance and the squared Euclidean distance, respectively. Moreover, we obtain approximate center sets for the case that the connection cost is the αth power of the Euclidean distance, achieving approximations for the corresponding problems, for any α ≥ 1. As a byproduct, we also obtain a polynomial-time approximation scheme for the k-median problem with this cost function, for any fixed k.

Research paper thumbnail of Selection of Representative Models for Decision Analysis Under Uncertainty

Computers & Geosciences, 2016

The decision-making process in oil fields includes a step of risk analysis associated with the un... more The decision-making process in oil fields includes a step of risk analysis associated with the uncertainties present in the variables of the problem. Such uncertainties lead to hundreds, even thousands, of possible scenarios that are supposed to be analyzed so an effective production strategy can be selected. Given this high number of scenarios, a technique to reduce this set to a smaller, feasible subset of representative scenarios is imperative. The selected scenarios must be representative of the original set and also free of optimistic and pessimistic bias. This paper is devoted to propose an assisted methodology to identify representative models in oil fields. To do so, first a mathematical function was developed to model the representativeness of a subset of models with respect to the full set that characterizes the problem. Then, an optimization tool was implemented to identify the representative models of any problem, considering not only the cross-plots of the main output variables, but also the risk curves and the probability distribution of the attribute-levels of the problem. The proposed technique was applied to two benchmark cases and the results, evaluated by experts in the field, indicate that the obtained solutions are richer than those identified by previously adopted manual approaches.

Research paper thumbnail of A Greedy Approximation Algorithm for the Uniform Labeling Problem Analyzed by a Primal-Dual Technique

Lecture Notes in Computer Science, 2004

In this paper we present a new fast approximation algorithm for the Uniform Metric Labeling Probl... more In this paper we present a new fast approximation algorithm for the Uniform Metric Labeling Problem. This is an important classification problem that occur in many applications which consider the assignment of objects into labels, in a way that is consistent with some observed data that includes the relationship between the objects. The known approximation algorithms are based on solutions of large linear programs and are impractical for moderated and large size instances. We present an 8 log n-approximation algorithm analyzed by a primal-dual technique which, although has factor greater than the previous algorithms, can be applied to large sized instances. We obtained experimental results on computational generated and image processing instances with the new algorithm and two others LP-based approximation algorithms. For these instances our algorithm present a considerable gain of computational time and the error ratio, when possible to compare, was less than 2% from the optimum.

Research paper thumbnail of Attribute-value specification in customs fraud detection: a human-aided approach

DG.O (Inter)National Conference on Digital Government Research, 2009

With the growing importance of foreign commerce comes also greater opportunities for fraudulent b... more With the growing importance of foreign commerce comes also greater opportunities for fraudulent behaviour. As such, governments must try to detect frauds as soon as they take place, if they are to avoid the profound damage to the so- ciety frauds may cause. Although current fraud detection systems can be used on this endeavour with reasonable ac- curacy, they still

Research paper thumbnail of Sim,\'E Poss\'iivel Ordenar Com Complexidade Estritamente Abaixo de $ n $ lg $ n$

Resumo O problema da ordenaçãoé sem dúvida um dos mais estudados na Ciência da Computação. No esc... more Resumo O problema da ordenaçãoé sem dúvida um dos mais estudados na Ciência da Computação. No escopo da computação moderna, depois de mais de 60 anos de estudos, ainda existem muitas pesquisas que objetivam o desenvolvimento de algoritmos que solucionem uma ordenação mais rápida ou com menos recursos comparados a outros algoritmos já conhecidos. Há vários tipos de algoritmos de ordenação, alguns mais rápidos, outros mais econômicos em relação ao espaço e outros com algumas restrições com relaçãoà entrada de dados. O objetivo deste trabalhoé explicar a estrutura de dadosÁvore de Fusão, responsável pelo primeiro algoritmo de ordenação com tempo inferior a n lg n, tempo esse que criou certa confusão, gerando uma errada crença de ser o menor possível para esse tipo de problema.

Fusion Tree Sorting

The sorting problem is one of the most relevant problems in computer science. Within the scope of modern computer science, the sorting problem has been studied for more than 70 years. Nevertheless, new sorting algorithms have been developed in recent years. Among the several types of sorting algorithms, some are faster, others are more economical in space, and others place a few restrictions on the input data. This paper aims to explain the fusion tree data structure, which is responsible for the first sorting algorithm with time complexity below n lg n. The n lg n time complexity has led to some confusion and generated the wrong belief that it is the minimum possible for this type of problem.
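The n lg n bound applies to sorting by comparisons only; algorithms that inspect the bits of integer keys are not subject to it. A fusion tree is too intricate for a short listing, so the sketch below uses a different and much simpler technique, least-significant-digit radix sort, purely to illustrate that point. It is not the algorithm described in the paper, and the function name and the `bits_per_pass` parameter are assumptions made for this example.

```python
def radix_sort(values, bits_per_pass=8):
    """Sort non-negative integers without comparing keys to each other.

    Keys are distributed into buckets by successive groups of bits,
    starting from the least significant group. Each pass is stable,
    so after the last pass the list is fully sorted.
    """
    if not values:
        return []
    if min(values) < 0:
        raise ValueError("this sketch handles non-negative integers only")

    base = 1 << bits_per_pass          # number of buckets per pass
    mask = base - 1                    # selects one group of bits
    result = list(values)
    max_value = max(result)
    shift = 0
    # One pass per group of bits that is still non-zero in the largest key.
    while (max_value >> shift) > 0:
        buckets = [[] for _ in range(base)]
        for v in result:
            buckets[(v >> shift) & mask].append(v)
        # Concatenating buckets in order keeps the pass stable.
        result = [v for bucket in buckets for v in bucket]
        shift += bits_per_pass
    return result
```

For keys of w bits the loop performs about w / `bits_per_pass` linear passes, so the running time depends on the key width rather than on a comparison tree.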

acc-Motif Detection Tool

Background: Network motif algorithms have been a topic of research mainly since the seminal 2002 paper by Milo et al., which presented motifs as a way to uncover the basic building blocks of most networks. In Bioinformatics, motifs have been mainly applied to gene regulation networks. Results: This paper proposes new algorithms to exactly count isomorphic pattern motifs of sizes 3, 4 and 5 in directed graphs. Let G(V, E) be a directed graph with n = |V| and m = |E|. We describe an O(m√m) time complexity algorithm to count isomorphic patterns of size 3. To count isomorphic patterns of size 4, we propose an O(m^2) algorithm. To count patterns with 5 vertices, the algorithm is O(m^2 n). Conclusion: The new algorithms were implemented and compared with the FANMOD and Kavosh motif detection tools. The experiments show that our algorithms are considerably faster than FANMOD and Kavosh. We also make our motif detection tool available on the Internet.
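To give a feel for where a bound such as O(m√m) can come from, here is a minimal sketch of a well-known degree-ordering technique for counting triangles. It is a simplification made for illustration: it works on undirected simple graphs and counts only triangles, whereas the paper counts all size-3 patterns in directed graphs, and it should not be read as the authors' algorithm. The function name is hypothetical.

```python
from collections import defaultdict


def count_triangles(edges):
    """Count triangles in a simple undirected graph given as (u, v) pairs."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:                      # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)

    # Rank vertices by degree, breaking ties by label, so the order is total.
    ordered = sorted(neighbors, key=lambda x: (len(neighbors[x]), str(x)))
    rank = {v: i for i, v in enumerate(ordered)}

    # Keep only neighbors of higher rank; these sets have size O(sqrt(m)).
    forward = {v: {w for w in neighbors[v] if rank[w] > rank[v]} for v in neighbors}

    triangles = 0
    for u in neighbors:
        for v in forward[u]:
            # A common higher-ranked neighbor closes a triangle; each triangle
            # is found exactly once, from its lowest-ranked vertex.
            triangles += len(forward[u] & forward[v])
    return triangles
```

Orienting every edge toward its higher-ranked endpoint is what keeps the intersected sets small, and hence the total work within m times the square root of m.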

How Far Do We Get Using Machine Learning Black-Boxes?

International Journal of Pattern Recognition and Artificial Intelligence, 2012

With several good research groups actively working on machine learning (ML) approaches, we now have self-contained machine learning solutions that often work out of the box, leading to the concept of ML black-boxes. Although it is important to have such black-boxes helping researchers deal with many problems, they come with an inherent and increasingly evident problem: we have observed that researchers and students are progressively relying on ML black-boxes and, usually, achieving results without knowing the machinery of the classifiers. In this regard, this paper discusses the use of machine learning black-boxes and poses the question of how far we can get using these out-of-the-box solutions instead of going deeper into the machinery of the classifiers. The paper focuses on three aspects of classifiers: (1) the way they compare examples in the feature space; (2) the impact of using features with variable dimensionality; and (3) the impact of usi...

Accelerated Motif Detection Using Combinatorial Techniques

Network motif algorithms have been a topic of research mainly since the seminal 2002 paper by Milo et al., which presented motifs as a way to uncover the basic building blocks of most networks. This article proposes new algorithms to exactly count isomorphic pattern motifs of sizes 3 and 4 in directed graphs. The algorithms are accelerated by combinatorial techniques. Let G(V, E) be a directed graph with m = |E|. We describe an O(m√m) time complexity algorithm to count isomorphic patterns of size 3. To count isomorphic patterns of size 4, we propose an O(m^2) algorithm. The new algorithms were implemented and compared with the Fanmod motif detection tool. The experiments show that our algorithms are considerably faster than Fanmod. We also make our motif detection tool, acc-MOTIF, available on the Internet.
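One way to read "accelerated by combinatorial techniques" is that some patterns can be counted by a formula instead of being enumerated one by one. The sketch below illustrates that idea on the simplest case, under assumptions of my own: an undirected simple graph, where the number of induced 3-vertex paths equals the number of neighbor pairs minus three times the number of triangles. It is not taken from the paper, and the function name is hypothetical.

```python
from collections import defaultdict
from math import comb


def count_three_vertex_patterns(edges):
    """Return (triangles, induced_paths) for a simple undirected graph."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:                      # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)

    # Summing common neighbors over ordered adjacent pairs sees every
    # triangle 6 times (3 edges, 2 orientations each).
    closed = sum(len(neighbors[u] & neighbors[v])
                 for u in neighbors for v in neighbors[u])
    triangles = closed // 6

    # Every pair of neighbors of a vertex forms a path centered there.
    centered_pairs = sum(comb(len(adj), 2) for adj in neighbors.values())

    # A triangle contributes 3 such pairs that are not induced paths,
    # so the induced paths are obtained by subtraction, not enumeration.
    induced_paths = centered_pairs - 3 * triangles
    return triangles, induced_paths
```

Only the triangles are actually searched for; the induced paths, usually far more numerous, come from degree information alone.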

A systematic approach to bound factor-revealing LPs and its application to the metric and squared metric facility location problems

Mathematical Programming, 2014

A systematic technique to bound factor-revealing linear programs is presented. We show how to derive a family of upper bound factor-revealing programs (UPFRP), and show that each such program can be solved by a computer to bound the approximation factor of an associated algorithm. Obtaining a UPFRP is straightforward, and the technique can be used as an alternative to analytical proofs, which are usually very long and tedious. We apply this technique to the Metric Facility Location Problem (MFLP) and to a generalization where the distance function is a squared metric. We call this generalization the Squared Metric Facility Location Problem (SMFLP) and prove that there is no approximation factor better than 2.04, assuming P ≠ NP. Then, we analyze the best known algorithms for the MFLP based on primal-dual and LP-rounding techniques when they are applied to the SMFLP. We prove very tight bounds for these algorithms, and show that the LP-rounding algorithm achieves a ratio of 2.04, and therefore has the best factor for the SMFLP. We use UPFRPs in the dual-fitting analysis of the primal-dual algorithms for both the SMFLP and the MFLP, improving some of the previous analyses for the MFLP.

A continuous facility location problem and its application to a clustering problem

Proceedings of the 2008 ACM symposium on Applied computing - SAC '08, 2008

We consider a new problem, which we denote by Continuous Facility Location (ConFL), and its application to the k-Means Problem. Problem ConFL is a natural extension of the Uncapacitated Facility Location Problem where a facility can be any point in R^q. The proposed algorithms are based on a primal-dual technique for spaces with constant dimensions. For the ConFL Problem, we present algorithms with approximation factors 3+ε and 1.861+ε for Euclidean distances and 9+ε for squared Euclidean distances. For the k-Means Problem (which is restricted to squared Euclidean distance), we present an algorithm with approximation factor 54+ε. Comparisons with known algorithms show that the proposed algorithms have good practical behaviour in small dimensions.
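The two objectives mentioned above can be written down in a few lines. This is a minimal sketch under stated assumptions: points are tuples of numbers, distances are squared Euclidean, and every facility has the same opening cost `opening_cost`, which is a simplification introduced here and not something the abstract states. The function names are hypothetical, and the code only evaluates a candidate solution; it does not implement the primal-dual algorithms.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, centers):
    """k-means objective: each point pays the squared distance to its nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


def facility_location_cost(points, facilities, opening_cost):
    """Facility location objective with a uniform opening cost (an assumption).

    Facilities may be arbitrary points of the space, which is what makes
    the problem continuous; the connection cost uses squared distances.
    """
    return opening_cost * len(facilities) + kmeans_cost(points, facilities)


# Hypothetical one-dimensional data embedded in the plane.
points = [(0, 0), (2, 0), (10, 0)]
centers = [(1, 0), (10, 0)]
clustering_cost = kmeans_cost(points, centers)
total_cost = facility_location_cost(points, centers, opening_cost=5)
```

The only difference between the two objectives is the opening term: k-means fixes the number of centers in advance, while facility location lets the opening cost decide how many facilities are worth paying for.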

acc-Motif: Accelerated Network Motif Detection

IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2014

Network motif algorithms have been a topic of research mainly since the seminal 2002 paper by Milo et al. [1], which presented motifs as a way to uncover the basic building blocks of most networks. Motifs have been mainly applied in Bioinformatics, regarding gene regulation networks. Motif detection is based on induced subgraph counting. This paper proposes an algorithm to count subgraphs of size k + 2 based on the set of induced subgraphs of size k. The general technique was applied to detect 3, 4 and 5-sized motifs in directed graphs. Such algorithms have time complexity O(a(G)m), O(m^2) and O(nm^2), respectively, where a(G) is the arboricity of G(V, E). The computational experiments on public datasets show that the proposed technique was one order of magnitude faster than Kavosh and FANMOD. When compared to NetMODE, acc-Motif had a slightly improved performance.

Uses of artificial intelligence in the Brazilian customs fraud detection system

DG.O (Inter)National Conference on Digital Government Research, 2008

There is an increasing concern about the control of customs operations. While globalization encourages the opening of markets, increasing amounts of imports and exports have been used to conceal several illicit activities, such as tax evasion, smuggling, money laundering, and drug trafficking. This fact makes it paramount for governments to find automatic or semi-automatic solutions to guide the customs'

Squared Metric Facility Location Problem

Jain et al. proposed two well-known algorithms for the Metric Facility Location Problem (MFLP), which achieve approximation ratios of 1.861 and 1.61. Mahdian et al. combined the latter algorithm with scaling and greedy augmentation techniques, obtaining a 1.52-approximation for the MFLP. We consider a generalization of the Squared Euclidean Facility Location Problem in which the distance function is a squared metric, which we call the Squared Metric Facility Location Problem (SMFLP). We show that the algorithms of Jain et al. and of Mahdian et al., when applied to this variant of facility location, achieve approximation ratios of 2.87, 2.43, and 2.17, respectively. It is shown that, for the SMFLP, there is no 2.04-approximation algorithm, assuming P ≠ NP. In our analysis, we used nonlinear factor-revealing programs to obtain both lower and upper bounds on the approximation factors, and we propose a systematic way to derive such factor-revealing programs. This research was partially supported by CNPq and FAPESP.
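A squared metric is generally not a metric, which is why guarantees proved for the MFLP do not carry over unchanged. The sketch below checks this on a small hypothetical set of points on a line: the squared distance violates the triangle inequality but satisfies it with a factor of 2, which follows from (x + y)^2 ≤ 2x^2 + 2y^2. The helper names and the `slack` parameter are assumptions made for this illustration.

```python
from itertools import permutations


def squared(x, y):
    """Squared distance between two points on the real line."""
    return (x - y) ** 2


def triangle_holds(dist, points, slack=1):
    """Check dist(a, c) <= slack * (dist(a, b) + dist(b, c)) for all triples."""
    return all(
        dist(a, c) <= slack * (dist(a, b) + dist(b, c))
        for a, b, c in permutations(points, 3)
    )


points = [0, 1, 2, 5]
# With 0, 1, 2: squared(0, 2) = 4 exceeds squared(0, 1) + squared(1, 2) = 2.
plain_ok = triangle_holds(squared, points)
# The relaxed inequality with factor 2 holds for every squared metric.
relaxed_ok = triangle_holds(squared, points, slack=2)
```

Here `plain_ok` is false and `relaxed_ok` is true, so analyses that rely on the triangle inequality lose a constant factor when distances are squared.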

How to assess your Smart Delivery System?

Smart Delivery Systems, 2020

The number of optimization techniques in the combinatorial domain is large and diversified. Nevertheless, real-world-based benchmarks for testing algorithms are few. This work creates an extensible real-world mail delivery benchmark for the Vehicle Routing Problem (VRP) in a planar graph embedded in the 2D Euclidean space. This problem is a multiobjective Smart Delivery System (SDS) case study on a roadmap with up to 30,000 deliveries per day. Each instance models one generic day of mail delivery, allowing both comparison and validation of optimization algorithms for routing problems. The benchmark may be extended to model other scenarios.

How Far You Can Get Using Machine Learning Black-Boxes

Graphics, Patterns and …, Jan 1, 2010