Frequency Analysis of the Portuguese Language
Related papers
Letter Frequency Analysis of Languages Using Latin Alphabet
International Linguistics Research, 2018
The evaluation of the peculiarities of alphabets, particularly the frequency of letters, is essential when designing keyboards, analysing texts, designing alphabet-based games, and doing some text mining. Thus, it is important to determine what might be useful for designers of text input tools and of other technologies related to sets of letters. Knowledge of common features among different languages makes it possible to take advantage of the experience of other languages. Nowadays an increasing number of texts are published on the Internet. In order to adequately compare the frequencies of letters in different languages used in the online space, Wikipedia texts have been selected as the source material for investigation. This paper presents the Method of the Adjacent Letter Frequency Differences in the frequency line, which helps to evaluate frequency breakpoints. This is a uniform evaluation criterion for 25 main languages using Latin script in order to highlight the similarities ...
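The abstract names the method but does not define it, so the following is only a minimal sketch of one plausible reading: letters are ranked by relative frequency (the "frequency line"), and the difference between each pair of adjacent ranks is computed so that the largest gaps can be read as candidate breakpoints. The function names `letter_frequencies`, `adjacent_differences`, and `largest_breakpoint` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def letter_frequencies(text):
    """Return (letter, relative frequency) pairs, most frequent first."""
    letters = [ch.lower() for ch in text if ch.isalpha()]
    total = len(letters)
    counts = Counter(letters)
    # most_common() sorts by count in descending order.
    return [(ch, n / total) for ch, n in counts.most_common()]


def adjacent_differences(freqs):
    """Frequency drop between each pair of neighbouring ranks."""
    return [
        (freqs[i][0], freqs[i + 1][0], freqs[i][1] - freqs[i + 1][1])
        for i in range(len(freqs) - 1)
    ]


def largest_breakpoint(freqs):
    """Assumption: the biggest adjacent drop marks a candidate breakpoint."""
    return max(adjacent_differences(freqs), key=lambda d: d[2])


if __name__ == "__main__":
    sample = "A frequência das letras varia de língua para língua."
    line = letter_frequencies(sample)
    print(line[:5])
    print(largest_breakpoint(line))
```

Because `str.isalpha` accepts accented characters, letters such as "ê" and "í" are counted as distinct symbols here; whether the paper folds diacritics into base letters is not stated in the abstract.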
Letter counting: A stem cell for cryptology, quantitative linguistics, and statistics
Historiographia Linguistica, 2013
Counting letters in written texts is a very ancient practice. It has accompanied the development of Cryptology, Quantitative Linguistics, and Statistics. In Cryptology, counting frequencies of the different characters in an encrypted message is the basis of the so-called frequency analysis method. In Quantitative Linguistics, the proportion of vowels to consonants in different languages was studied long before authorship attribution. In Statistics, the alternation of vowels and consonants was the only example that Markov ever gave of his theory of chained events. A short history of letter counting is presented. The three domains, Cryptology, Quantitative Linguistics, and Statistics, are then examined, focusing on the interactions with the other two fields through letter counting. In conclusion, the eclecticism of scholars of past centuries, their background in the humanities, and their familiarity with cryptograms are identified as contributing factors to the mutual enrichment process described here.
PLOS ONE, 2019
Languages have inherent characteristics that make them distinct entities within their phyla and families. Even messages written in any language and later encrypted by cryptographic systems do not lose all of their characteristics; some aspects remain that help the cryptanalyst recover them without knowing the decryption keys. To characterize the languages, we consider the frequencies of their graphemic and phonetic units and the Index of Coincidence, tools of fundamental utility in the field of Cryptography. Their diachronic invariance, or survival over time in one language, and their ability to discriminate between languages will be analyzed. To do so, we examine a total of 101 languages, from which 261 texts have been taken. The texts are very diverse in style and time, covering a wide linguistic and temporal spectrum from the 6th century BC to the present day.
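For readers unfamiliar with the Index of Coincidence mentioned above, here is a minimal sketch using the standard textbook definition: the probability that two letters drawn at random without replacement from a text are the same, IC = Σ nᵢ(nᵢ − 1) / (N(N − 1)), where nᵢ is the count of letter i and N is the total number of letters. The function name `index_of_coincidence` is hypothetical, and the paper may apply a different normalization or unit of analysis (for example, phonetic units).

```python
from collections import Counter


def index_of_coincidence(text):
    """Probability that two randomly chosen letters of the text coincide."""
    letters = [ch.lower() for ch in text if ch.isalpha()]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters to compute the index")
    counts = Counter(letters)
    # Ordered pairs of identical letters, divided by all ordered pairs.
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


if __name__ == "__main__":
    print(index_of_coincidence("A frequência das letras varia de língua para língua."))
```

A simple substitution cipher only relabels letters, so it leaves this value unchanged, which is why the index survives that kind of encryption and can hint at the underlying language.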
Experimental Analysis of the Dorabella Cipher with Statistical Language Models
Proceedings of the 4th International Conference on Historical Cryptology HistoCrypt 2020, 2021
The Dorabella cipher is a symbolic message written in 1897 by English composer Edward Elgar. We analyze the cipher using modern computational and statistical techniques. We consider several open questions: Is the underlying message natural language text or music? If it is language, what is the most likely language? Is Dorabella a simple substitution cipher? If so, why has nobody managed to produce a plausible decipherment? Are some unusual-looking patterns in the cipher likely to occur by chance? Can state-of-the-art algorithmic solvers decipher at least some words of the message? This work is intended as a contribution towards finding answers to these questions.
Using Letters Frequency Analysis in Caesar Cipher with Double Columnar Transposition Technique.
International Journal of Engineering Sciences & Research Technology, 2013
In this paper we propose a modification of the Caesar cipher technique intended to make it more efficient and secure. We use the relative frequency of letters in the alphabet, arranging the letters in order of increasing frequency. We then apply a modified Caesar cipher together with a double columnar transposition technique.
Analysis of Four Historical Ciphers Against Known Plaintext Frequency Statistical Attack
International Journal of Integrated Engineering, 2018
The need to keep information secure arose thousands of years ago. Information is kept secure by scrambling the message into an unreadable form, called ciphertext. This process is called encryption; decryption is the reverse process. In the past, historical ciphers were used to perform encryption and decryption. Common historical ciphers include the Hill cipher, Playfair cipher, Random Substitution cipher and Vigenère cipher. This research examines and analyses the security level of these four historical ciphers using a known plaintext frequency statistical attack. The results show that the Playfair cipher and Hill cipher have better security compared with the Vigenère cipher and Random Substitution cipher.
Contemporary Advancements in Information Technology Development in Dynamic Environments
The purpose of this chapter is to present current research on the modern Bulgarian language, one of the oldest European languages. An information system for managing an electronic archive of texts in Bulgarian is described. It provides the possibility of processing the collected text information. Detailed and comprehensive research on letter and word frequency in the modern Bulgarian language, drawing on varied sources (fiction, scientific and popular science literature, press, legal texts, government bulletins, etc.), is performed, and the results are presented. The index of coincidence of the Bulgarian language as a whole and for the individual sources is computed. The results can be used by different specialists – computer scientists, linguists, cryptanalysts, and others. Furthermore, with mathematical modeling, the authors found the letter and word frequency distributions and their models and they estimated their standard deviati...
IJERT-A Comparative Study of Classical Substitution Ciphers
International Journal of Engineering Research and Technology (IJERT), 2014
https://www.ijert.org/a-comparative-study-of-classical-substitution-ciphers https://www.ijert.org/research/a-comparative-study-of-classical-substitution-ciphers-IJERTV3IS090345.pdf With the rapid development of technology, the call for security has grown louder, and Information Security has become an important issue in recent decades. Cryptography, which emerged as a solution, has secured an unassailable place in the field of security. The principal objective guiding the design of any cryptographic algorithm must be the security it provides against unauthorized attack. However, the performance and implementation cost of the algorithms are also factors that cannot be ignored. There is therefore always a need to analyze, standardize and present these algorithms to future researchers and struggling students so that they can learn to design effective and innovative techniques for securing data. In this paper, 7 classical substitution algorithms, i.e., Affine, Atbash, Caesar, Modified Caesar, Baconian, Polybius square and Letter number ciphers, are implemented, and their performance is compared by encoding input files of various sizes on the LINUX platform. All the algorithms are implemented in the C++ language using QT Creator, so that a fair comparison of execution speeds can be made. On the basis of the experiments, it is concluded that the Caesar cipher is the best among the algorithms selected for implementation.