
Research Papers On Data Mining 2012 Gmc


JRS 2012 Data Mining Competition: Topical Classification of Biomedical Research Papers is a special event of the Joint Rough Sets Symposium (JRS 2012), which will take place in Chengdu, China, on August 17-20, 2012. The task is to predict the topical classification of scientific publications in the field of biomedicine. Cash prizes worth 1,500 USD in total will be awarded to the most successful teams. The contest is funded by the organizers of the JRS 2012 conference, Southwest Jiaotong University, with support from the University of Warsaw, the SYNAT project and TunedIT.

Introduction: The development of freely available biomedical databases allows users to search for documents containing highly specialized biomedical knowledge. The rapidly increasing size of scientific article meta-data and text repositories, such as MEDLINE [1] or PubMed Central (PMC) [2], emphasizes the growing need for accurate and scalable methods for automatic tagging and classification of textual data. For example, medical doctors often search through biomedical documents for information regarding diagnostics, drug dosage and effects, or possible complications resulting from specific treatments. In their queries, they use highly specialized terminology that can be properly interpreted only with the use of a domain ontology, such as Medical Subject Headings (MeSH) [3]. To facilitate the search process, documents in a database should be indexed with concepts from the ontology. Additionally, the search results could be grouped into clusters of documents that correspond to meaningful topics matching different information needs. Such clusters need not be disjoint, since one document may contain information related to several topics. In this data mining competition, we would like to address both of the above-mentioned problems, i.e. we are interested in identifying efficient algorithms for topical classification of biomedical research papers based on information about concepts from the MeSH ontology that were automatically assigned by our tagging algorithm. In our opinion, this challenge may appeal to all members of the Rough Set Community, as well as to other data mining practitioners, due to its strong relations to well-founded subjects, such as generalized decision rules induction [4], feature extraction [5], soft and rough computing [6], semantic text mining [7], and scalable classification methods [8].
To ensure the scientific value of this challenge, each participating team will be required to prepare a short report describing its approach. Those reports can be used for further validation of the results. Apart from prizes for the top three teams, authors of selected solutions will be invited to prepare a paper for presentation at the JRS 2012 special session devoted to the competition. Chosen papers will be published in the conference proceedings.
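
The multi-label setting described in the introduction, where one document may belong to several topics at once, can be illustrated with a small sketch. It is only a nearest-centroid baseline written for illustration, not the organizers' tagging algorithm or any participant's solution. The concept names, topic labels, weights and the similarity threshold are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def fit_centroids(docs, labels):
    """Average the concept vectors of all documents that carry each topic."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for vec, topics in zip(docs, labels):
        for topic in topics:
            counts[topic] += 1
            for concept, weight in vec.items():
                sums[topic][concept] += weight
    return {t: {c: w / counts[t] for c, w in s.items()} for t, s in sums.items()}


def predict(centroids, vec, threshold=0.5):
    """Return every topic whose centroid is similar enough (possibly several or none)."""
    return {t for t, c in centroids.items() if cosine(vec, c) >= threshold}


# Toy training data: each document is a dict of (hypothetical) MeSH-like
# concepts with association weights; each label is a set of (hypothetical) topics.
train_docs = [
    {"Aspirin": 1.0, "Dosage": 0.8},
    {"Aspirin": 0.9, "Hemorrhage": 0.7},
    {"Surgery": 1.0, "Hemorrhage": 0.6},
]
train_labels = [
    {"pharmacology"},
    {"pharmacology", "complications"},
    {"complications"},
]

centroids = fit_centroids(train_docs, train_labels)
new_doc = {"Aspirin": 0.8, "Hemorrhage": 0.8}
predicted = predict(centroids, new_doc)
print(sorted(predicted))
```

Because each topic is tested independently against the threshold, the predicted topic sets can overlap, which matches the non-disjoint clusters mentioned above. A competitive solution would likely replace the centroid rule with a stronger per-topic classifier.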

Contest Participation Rules:

  • The competition is open to all interested researchers, specialists and students. Only members of the Contest Organizing Committee cannot participate.
  • Participants may submit solutions as teams made up of one or more persons. Each team needs to designate a leader responsible for communication with the Organizers. One person may be a member of at most 2 teams.
  • The total number of submissions for any single team is limited to 200 solutions.
  • Each team is obliged to provide a short report describing their final solution. Reports must contain information such as the name of the team, the names of all team members, the last preliminary evaluation score and a brief overview of the approach used. Their length should not exceed 1000 words and they should be sent in PDF format to the Organizers by April 2, 2012. Only submissions made by teams that provided the reports will qualify for the final evaluation.

JRS 2012 conference special session: There will be a special session at the JRS 2012 conference devoted to the competition. We will invite authors of selected reports to extend them for publication in the proceedings (after reviews by Organizing Committee members) and presentation at the conference. The invited teams will be chosen based on their rank and innovativeness of approach.

Awards: The top-ranked solutions (based on the final evaluation scores) will receive the following prizes:

  • First Prize: 1,000 USD + free JRS 2012 conference registration,
  • Second Prize: 500 USD + free JRS 2012 conference registration,
  • Third Prize: free JRS 2012 conference registration.
Additionally, at the conference, authors of all papers accepted for presentation at the special session will receive a diploma and a competition T-shirt.


Important dates:

  • Jan. 2, 2012: start of the challenge, data sets become available,
  • Mar. 30, 2012: deadline for submitting the predictions,
  • Apr. 2, 2012: deadline for sending the reports, end of the challenge,
  • Apr. 6, 2012: on-line publication of final results, sending invitations for submitting short papers for the special session,
  • May 10, 2012: deadline for submissions of camera-ready papers selected for presentation at the JRS special session.

Contest Organizing Committee:

  • Andrzej Janusz (Chairman), University of Warsaw
  • Hung Son Nguyen, University of Warsaw
  • Dominik Ślęzak, University of Warsaw & Infobright Inc.
  • Sebastian Stawicki, University of Warsaw
  • Adam Krasuski, Main School of Fire Service & University of Warsaw


References:

[1] National Library of Medicine: PubMed: The Bibliographic Database. In McEntyre J., Ostell J. (Eds.): The NCBI Handbook. Available online.

[2] National Library of Medicine: PubMed Central (PMC): An Archive for Literature from Life Sciences Journals. In McEntyre J., Ostell J. (Eds.): The NCBI Handbook. Available online.

[3] National Library of Medicine: Introduction to MeSH - 2012. Available online (2012).

[4] Greco S., Pawlak Z., Słowiński R.: Generalized Decision Algorithms, Rough Inference Rules, and Flow Graphs. In Alpigini J. J., Peters J. F., Skowron A., Zhong N. (Eds.): Rough Sets and Current Trends in Computing 2002, LNCS 2475. Springer-Verlag, London, UK (2002)

[5] Guyon I. et al.: Feature Extraction: Foundations and Applications. Studies in Fuzziness and Soft Computing. Springer (August 2006)

[6] Hassanien A. E., Suraj Z., Ślęzak D., Lingras P. (Eds.): Rough Computing: Theories, Technologies and Applications. Idea Group Inc (2007)

[7] Stavrianou A., Andritsos P., Nicoloyannis N.: Overview and semantic issues of text mining. SIGMOD Rec. 36(3), pp. 23-34 (September 2007)

[8] Nguyen H. S.: Scalable Classification Method Based on Rough Sets. In Alpigini J. J., Peters J. F., Skowron A., Zhong N. (Eds.): Rough Sets and Current Trends in Computing 2002, LNCS 2475, pp. 433-440. Springer-Verlag, London, UK (2002)