A refinement strategy for identification of scientific software from bioinformatics publications