Interoperability of electronic health records using Semantic Knowledge Graphs. A use case applied at the UTPL University Hospital

Abstract: Patient medical information is diverse, extensive and of high value in supporting informed medical decision-making. This information is highly complex, is distributed among different systems, presents high heterogeneity, is stored in different formats, and has different structuring levels. The management of this information poses interoperability challenges in tasks related to data integration and reuse. In this paper, an alternative is presented to face these challenges using semantic technologies. We propose to transform this heterogeneous, distributed, and unstructured information in a way that ensures high interoperability, reuse, and direct processing by machine agents. The pilot of this proposal was developed at the UTPL Hospital.


I. INTRODUCTION
National health ecosystems encompass public and private medical care systems, individuals, organizations, technologies, protocols, and processes involved in managing different forms of information on patients' health. Each technology in the health system has its own information management languages and standards. These system-specific conventions hinder the connection of systems that need to integrate, share, exchange, and access all forms of health information. This information includes clinical notes, surgical descriptions, medical examinations, complementary laboratory studies, diagnostic imaging, pharmacological prescriptions, billing data, insurance information, social and cultural information of the patient, and multimedia, among others.
These environments are home to many disparate systems, few of which are interconnected or exchange information automatically. For example, for a doctor to have a complete view of a patient's medical history, they must be authorized on, and log in to, several systems to see the previous procedures, x-rays, medical prescriptions, and laboratory results. The information is often duplicated, ambiguous, or incomplete, which impairs patient care.
In view of the lack of standardization and of architectures that enable integration and interoperability, different ecosystems are experimenting with and implementing strategies to connect their systems. The use of electronic medical records based on universal standards is encouraged, with programs pushing towards a generalized adoption that improves information integration and the interoperability of health systems.
The objective of interoperability is to eliminate information silos across health data sources and to connect the systems that bear on patients' outcomes. In general, the main types of architectures used to coordinate the exchange of medical care data are the following: centralized architectures, in which the patient's data are stored and managed in a central database with full control over the information; decentralized architectures, in which the databases are interconnected but independent and information exchanges are specific; and hybrid architectures [1].
Interoperability allows medical care systems to work together, which ultimately leads to better provision of medical care and better outcomes for patients. Interoperability also enhances the efficiency of processes, because paperwork can be replaced by digital tools that reduce the opportunity cost of time and resources. With appropriate privacy and security measures, patients' medical records and other confidential information can be safeguarded.
In Ecuador, the Public Health Ministry (Ministerio de Salud Pública, MSP), through the IS4H (Information Systems for Health) projects promoted by the Pan American Health Organization, works towards strengthening the health information systems through the implementation of electronic clinical documents and interoperability standards. This enables greater capability in managing the medical information generated in the Public and Complementary Health Network (for-profit and not-for-profit private health providers), with the aim of consolidating the National Electronic Clinical History project. Currently, the MSP has implemented the Electronic Clinical History System in more than 1,500 operational units, recording around 15 million service appointments [2]. In contrast, the institutions of the complementary network, such as the UTPL Hospital (Hospital UTPL, HUTPL), use proprietary systems to manage electronic clinical histories.
Due to the heterogeneity of storage formats, standards, and structuring levels of the data, health management systems are complex and expensive when it comes to information integration and exchange. This complexity poses challenges for process automation and decision-making. Further, the diversity of electronic medical record systems across health institutions calls for a strategy that exchanges information through semantic web technologies rather than through a unified national system. The objective of this work is to contribute to the interoperability efforts for medical data in our region through an architecture for the semantic transformation of information related to electronic medical records. Specifically, our work focuses on improving the ability to exchange the data and meanings of electronic medical records, using the Semantic Knowledge Graphs approach. To this end, an architecture for the semantic transformation of medical data is proposed, and a case study was developed to pilot this architecture. The pilot was developed from electronic medical records managed by a private hospital that is part of the Complementary Health Network of Ecuador.
This article includes a related work section, which reviews the semantic techniques and resources (controlled vocabularies, thesauri, and ontologies of biomedical concepts) used in medical information projects, as well as approaches based on medical standards, the Semantic Web [3], Linked Data, and Semantic Knowledge Graphs. The Proposed Architecture section describes the cycle of extraction, cleaning, anonymization, semantic modeling, transformation, linking, and publication of medical data; an important aspect is the identification of classes, relationships, and properties based on the Unique Clinical History Forms [4], which are the national standard for recording clinical information. An ontology was created in which concepts, properties, relationships, and restrictions were encoded, and its classes, entities, and properties were mapped to the SNOMED CT and ICD-10 terminologies. A transformation process is then defined in which the information is converted to RDF, observing the Linked Data design principles (naming things with URIs, using HTTP URIs, providing information through standards such as RDF and SPARQL, and linking to other URIs). We then report a case study in the outside consultation service of the HUTPL in Ecuador. Finally, the results obtained are discussed, and conclusions and future work are reported.

II. RELATED WORK
This section analyzes the scientific literature related to the semantic description of medical data, as well as the current ways to improve the capability of exchanging data and meanings across heterogeneous systems. Semantic interoperability improves the capability of exchanging information related to the medical care provided to the patients. Attaining semantic interoperability in medical information is a challenge.
The authors of [5] present a partial selection of implementations of interoperability standards gathered from different countries, such as France, the United Kingdom, the United States of America, the Netherlands, Australia, and Turkey. These countries have implemented HL7 CDA R2, LOINC, SNOMED CT, and HL7 FHIR, among others, in order to improve interoperability in their national health systems.
In Ecuador, through ministerial agreement 1190, the Ministry of Public Health approved the use of the Health Level Seven (HL7) standard and ISO/TC 215 [6]. TC 215 is the technical committee that works on the standardization of information and communication technologies (ICT) in health, to allow compatibility and interoperability between independent systems. Among the standards that the TC 215 working groups have delivered is ISO 13606, Health Informatics. In turn, the Ecuadorian Institute for Standardization (INEN), the body in charge of the analysis and adoption of quality standards in the country, has analyzed and adopted parts 1 and 5 of the international standard ISO 13606 under ministerial agreement 1190, so that the country's health institutions have a standard that regulates the communication of information generated in electronic health records.
The proposals in [7], [8], and [9] for promoting semantic interoperability in electronic health records present the most widely used health standards, such as openEHR, HL7, DICOM, and EN 13606. They also identify the challenges involved in attaining semantic interoperability in electronic health records: the very large volume of existing information, the different standards used for information storage, and the low degree of structure of the data. The authors propose the use of fuzzy ontologies, together with medical terminologies, to address these challenges and attain interoperability in health information systems; as future work, they mention applying their proposal to real data in a hospital.
In [10], a model for converting clinical information to RDF is described; the use case is applied to information on patients with dengue. The Text2Ontology framework is used to extract relationships from text, with limited capacity. In addition, the TypedDependency algorithm, based on typed dependency analysis, is used to extract data from the patients' case sheets for later conversion into RDF models. A semantification approach was also designed to map the clinical data extracted from the patients' reports to their corresponding triples, for later conversion to RDF.
On the other hand, [11] presents a semantic transformation approach with three steps: 1) transformation of medical data from heterogeneous sources to RDF, 2) application of semantic conversion rules to obtain data as instances for the CDM (Common Data Models) ontological model, and 3) population of the repositories, which meet the CDM specifications, by means of processing the RDF instances generated in step 2. The proposed approach has been applied in real health care environments where the Observational Medical Outcomes Partnership (OMOP) has been chosen as the common data model.
Another work related to the use of semantic web technology is [12], which proposes an ontology-supported interoperability solution for the tuberculosis treatment and follow-up scenario in Brazil. As a result, an interoperability layer was developed to retrieve data with the same meaning and in a structured way, enabling semantic and functional interoperability. Thus, health professionals could use the data gathered from several data sources to enhance the effectiveness of their actions and decisions. In [13], a mechanism is proposed to build ontologies of healthcare data, which are then stored in a triplestore. For each ontology built, the syntactic and semantic similarities with the different HL7 FHIR Resources ontologies are calculated, based on their Levenshtein distance and their semantic fingerprints, respectively. According to the mechanism's evaluation results, that work concluded that it is almost impossible to create syntactic or semantic patterns for understanding the nature of a healthcare dataset. Therefore, further work is needed to evaluate the mechanism and to compare it with similar ontology matching mechanisms on medical data of multiple kinds.
Finally, [14], [15], [16], [17], [18], and [19] show the use of knowledge graphs to improve queries over complex medical information; to identify hidden and unknown relationships between patients and treatment outcomes, or between drugs and allergic reactions for given individuals; to identify specialized classes for tumor diagnosis; and to improve biomedical relationship extraction tasks. They also present frameworks that provide a secure, semantic approach to data sharing among healthcare organizations.
The future roadmap for semantic interoperability in healthcare systems is highly promising. The efforts of several of the papers reviewed focus on the creation of standards, class identification mechanisms, certification, and security. This paper presents an application case to model the clinical data stored in the electronic clinical records as Semantic Knowledge Graphs, by using the Semantic Web view and the Linked Data technologies.

A. Approach overview
In general, patients receive care from a series of clinics, independent medical offices, and hospitals. Together, these individual interactions constitute their medical history. This history documents past symptoms, procedures applied, vaccines, vital signs over time, medical examinations, allergies, and complications, among others. Not having access to all those data points due to lack of integration is dangerous at worst and extremely inconvenient for the patient at best.
The main benefit of interoperability is total visibility of, and access to, the patient's data, both for the health institution and for the patient. In this sense, the general model of this work focuses on integrating the various sources of information generated by the different clinical areas. This information should be processed through extraction, anonymization, and cleaning tasks, and then mapped to EHR standards and terminologies, as well as to ontological and non-ontological resources.
In Fig. 1, the EHR component describes the patient information that is collected through different health systems. In Ecuador, the Public Health Ministry has established the Unique Clinical History forms as the national standard. We have used these forms as the basis for identifying the concepts and relationships of the outside consultation service that we intend to model. Most of the data in electronic health record systems are not normalized under standards; the systems present free-text entry interfaces, where health professionals describe the information briefly or with acronyms, so the data lack the attributes needed to be considered quality data. This hinders text processing and the understanding of the information by machine agents. Together with techniques such as natural language processing and ontologies, the UMLS thesauri and terminologies (SNOMED CT, Systematized Nomenclature of Medicine Clinical Terms; ICD-10, International Classification of Diseases; etc.) help to identify the meaning of, and the relationships between, the entities of an electronic health record, as can be seen in Fig. 1. Therefore, it is necessary to apply strategies that facilitate information management. The information extracted and mapped with ontological resources must then be transformed into a medical information knowledge graph. The resulting semantic knowledge graph has the potential to support actions aimed at improving the semantic exchange of electronic medical records, reducing clinical errors, improving safety, managing chronic diseases, and enhancing clinical research (see Fig. 1).

B. Interoperability standards
The main standards that provide reference models, clinical data structure definitions, and message exchange are ISO 13606, openEHR, and HL7 FHIR; their objective is to support the representation of clinical information. One of the most popular interoperability standards for joining heterogeneous systems and implementing applications for interoperability and health information exchange is FHIR (Fast Healthcare Interoperability Resources). FHIR provides the specification that systems require to exchange data. The standard is proposed by Health Level Seven International (HL7) [20], a not-for-profit, ANSI-accredited standards development organization devoted to providing a comprehensive framework and related standards for the exchange, integration, and retrieval of electronic health information.
OpenEHR is a health informatics standard, consisting of open specifications, clinical models and software that can be used to create standards and create information and interoperability solutions for healthcare [21].
ISO 13606 is an International Organization for Standardization (ISO) standard, originally designed by the European Committee for Standardization (CEN). This standard defines a rigorous and stable information architecture for communicating part or all of a patient's electronic health record (EHR) between EHR systems, or between EHR systems and a centralized EHR data repository system [22].
An important effort to promote the creation of interoperable medical systems is the Unified Medical Language System (UMLS), a set of files and software that gathers health and biomedical vocabularies (CPT, ICD-10, MeSH, SNOMED CT, DSM-IV, LOINC) and standards. UMLS integrates and distributes key terminology, classification and coding standards, and associated resources to promote the creation of more effective and interoperable biomedical information systems and services, including electronic medical records, classification tools, dictionaries, and language translators [23].
The Semantic Web [3] is one of the current paradigms that improve interoperability and data structuring in the Web. The Semantic Web is a Web extension that adds semantics to the current format of data representation. The semantic web technologies are driven by W3C (World Wide Web Consortium). These technologies allow for the global identification of information resources, for the creation of knowledge items in the Web, for the creation of vocabularies, and for writing rules to handle the data through RDF (Resource Description Framework) and SPARQL [24]. Linked data refers to the application of best practices in order to publish and connect the information [25] so as to improve interoperability in the systems [26].
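To make the RDF data model concrete, the following minimal sketch represents a few statements as subject-predicate-object triples and matches them against a pattern, which is roughly what a single SPARQL triple pattern does. It uses only the Python standard library; the `http://example.org/ehr/` namespace, the property names, and the identifiers are assumptions made for illustration and do not come from the HUTPL data.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace and identifiers, used only for illustration.
EX = "http://example.org/ehr/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

graph: Set[Triple] = {
    (EX + "patient/p1", RDF_TYPE, EX + "Patient"),
    (EX + "visit/v1", RDF_TYPE, EX + "Visit"),
    (EX + "visit/v1", EX + "hasPatient", EX + "patient/p1"),
    (EX + "visit/v1", EX + "hasDiagnosis", EX + "diagnosis/I10"),
}


def match(g: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield the triples that fit the pattern; None plays the role of a variable."""
    for triple in g:
        subj, pred, obj = triple
        if s is not None and subj != s:
            continue
        if p is not None and pred != p:
            continue
        if o is not None and obj != o:
            continue
        yield triple


# Roughly: SELECT ?visit WHERE { ?visit ex:hasPatient <.../patient/p1> }
visits = sorted(t[0] for t in match(graph, p=EX + "hasPatient",
                                    o=EX + "patient/p1"))
```

A production system would rely on an RDF library and a SPARQL endpoint rather than on this hand-written matcher; the sketch only shows the shape of the data and of a query.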
Among the different ways to represent knowledge in the Semantic Web are the ontologies and the vocabularies. They are used with the objective of attributing sense and meaning to the content of the documents, serving as a tool for knowledge representation, for example, OWL (Web Ontology Language), RDFS (Resource Description Framework Schema) and SKOS (Simple Knowledge Organization System) [24].
In [27], an ontology is defined as a common, sharable and reusable view of a specific domain. Therefore, ontologies can be used to construct semantic representations of the domain knowledge so that they can be interpreted by machine agents, which will allow them to make inferences on assertions in a specific domain model. In addition, ontologies accelerate the implementation of interoperability due to the availability of robust tools and technological frameworks that promote information reuse [28].

IV. PROPOSED ARCHITECTURE
The overall medical data transformation moves through a life cycle, as depicted in Fig. 2. Through its six phases, we explore how established best practices can be used to transform medical data into a semantic knowledge graph. The phases of the life cycle are: 1) Visioning process: developing a requirements specification by identifying the intended purpose and scope; 2) Selection of data sources: analyzing and specifying the internal and external data sources on which the data transformation is performed; 3) Ontology development: acquiring medical knowledge and reusing related resources from which the ontology will be built; 4) RDF generation and publication: data cleansing, disambiguation, anonymization, transformation, linking, and publication; 5) RDF exploitation: producing end-user interfaces that provide discovery and enhanced navigation; 6) Continual improvement: ongoing refinement of the data and processes through incremental changes.

A. Analysis of the data sources
The purpose of this phase is to verify and select the various sources of information in which a medical record is stored, such as structured and unstructured data, documents, and videos, among others.
The modeled clinical information was obtained from the database of electronic health records maintained by the UTPL Hospital. The tables involved correspond to information of patients, medical staff, medical specialties, record of personal and family history, record of vital signs, record of visits in outside consultation, orders for complementary tests such as laboratory tests, diagnostic imaging, and drug prescription.

B. Personal data and legislation
Ecuador recognizes the protection of personal data as a fundamental right, based on its Constitution as well as on international human rights treaties that protect privacy, but it lacks an internal legal structure to guarantee such protection, which leaves patients defenseless. Ecuador is one of the few countries in Latin America that does not have a Personal Data Protection Law, while other countries in the region have had one for more than 20 years or have recently approved one [29]. However, the Ministry of Public Health (MSP) of Ecuador bases its actions on the health legislation and on the principles contained in the Constitution in order to protect the privacy of personal data. On September 19, 2019, the President of the Republic of Ecuador submitted the Draft Organic Law on Personal Data Protection to the National Assembly. This law is still under discussion in the Assembly. Once approved, it is expected to guarantee security in the handling of Ecuadorians' information and to make it easier for health institutions to create robust regulations that facilitate access to information without risking the privacy of their patients.
In the context of this work, the data that are part of the scope of the case study were anonymized. According to the legal framework in force in Ecuador, there is a prohibition to disclose medical information without the express authorization of its owner, because it contravenes the right to confidentiality and protection of personal data of Ecuadorians. The following legislation has been considered in this case study: The Constitution of the Ecuadorian State, in its Art. 66 1 , numeral 19, establishes as a citizen's right "to protection of personal information, including access to and decision about information and data of this nature, as well as its corresponding protection. The gathering, filing, processing, distribution or dissemination of these data or information shall require authorization from the holder or a court order.". Art. 7 of the Organic Health Law 2 states that every person has the right to have a "single clinical history written in precise,  3 , states that: "Every patient has the right to have the consultation, examination, diagnosis, discussion, treatment and any type of information related to the medical procedure to be applied to him/her, have the character of confidentiality". Art. 6 of the Law of The National System of Public Data Registry 4 declares confidential personal data, such as state of health, and other data related to personal privacy. Access to this data will only be possible with the express authorization of the owner of the information, by law or by court order.

C. Data anonymization
Differences exist between nations in the definitions, approaches, and legal practices about de-identification or anonymization [30]. Data holders can mitigate patient privacy risk through data reduction techniques. Anonymization is a data processing technique that eliminates or modifies personal identification information; its result is anonymous data that cannot be associated with any person. In this way, patient privacy is improved, and the risk of data breaches and cyberattacks is reduced. It is also a critical component of the commitment inherent to the HUTPL and of the scope of this paper regarding privacy.
By ensuring anonymous data, safe and valuable semantic features will be available, all while safeguarding the users' identities. We can also share anonymous data externally in a safe manner, making them useful for others without compromising patient privacy.
The recommendations proposed in [31] are that data holders generate de-identified datasets by masking or generalizing direct identifiers and some indirect identifiers. They also recommend access control and appropriate security levels when transferring data or providing access; one solution is the use of a secure "locked box" system.
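As an illustration of masking direct identifiers and generalizing indirect ones, the sketch below drops identifying fields and replaces the birth date with a ten-year age band. The field names, the sample record, and the band width are assumptions made for this example; they are not taken from [31] or from the HUTPL schema.

```python
from datetime import date

# Hypothetical column names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"national_id", "full_name", "phone"}


def age_band(birth: date, today: date, width: int = 10) -> str:
    """Generalize an exact birth date into an age interval such as '30-39'."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    age = today.year - birth.year - (0 if had_birthday else 1)
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record: dict, today: date) -> dict:
    """Remove direct identifiers and generalize the birth date."""
    out = {key: value for key, value in record.items()
           if key not in DIRECT_IDENTIFIERS and key != "birth_date"}
    out["age_band"] = age_band(record["birth_date"], today)
    return out


record = {
    "national_id": "0000000000",   # made-up value
    "full_name": "Jane Doe",       # made-up value
    "phone": "555-0100",           # made-up value
    "birth_date": date(1985, 6, 15),
    "diagnosis": "I10",
}
clean = deidentify(record, today=date(2020, 2, 1))
```

Which fields count as indirect identifiers, and how coarse the generalization must be, depends on the dataset and on the applicable legislation.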

D. Reuse of ontological resources
Integrating computerized health systems depends on a semantically rich interoperability base across systems, processes, and information. Interoperability standards and ontologies related to medical care can facilitate access to information by increasing the accuracy of searches, which would otherwise suffer from problems such as polysemy, ambiguity, and synonymy that inflate and blur the results. Through a review of the state of the art, ontological and non-ontological resources related to Health Care and Life Sciences were identified.
Among the ontologies that we have used to model the information is the SNOMED CT multilingual taxonomy, which includes more than 300,000 concepts of clinical terms divided into 19 categories and with support in five languages [32]. The concepts and relationships of this taxonomy are available for download, as well as the OWL SNOMED CT guide, which includes detailed information on the structure, content and use of OWL reference sets [33].
The ICD-10 international classification of diseases was also used for the representation of medical diagnoses. These classifications are part of the set of UMLS files and software [34].
Another ontological resource used is the Anatomical Therapeutic Chemical Classification (ATC) ontology [35], which classifies the active components of drugs according to the organ or system on which they act and to their therapeutic, pharmacological, and chemical properties. Ecuador has adopted this international classification to elaborate the National Chart of Basic Medications, which is the pharmacological regulation in effect in the country [36].
These ontologies add context to the patient's medical history documents, and automatically create links between the concepts, terms, medications, diagnoses and procedures, laboratory tests and exams. As a result, the queries are more effective, and the results are closer to the search terms.

E. Creation of the ontology
The ontology has been developed in Protégé and covers the domain of the electronic clinical documents of the outside consultation area. The definition of classes, relationships and properties is based on the Unique Clinical History forms established by the MSP. A partial view of the ontology can be seen in Fig. 3.
Footnote 4: Law of The National System of Public Data Registry (2010), see https://www.globalregulation.com/translation/ecuador/3350052/law-of-thenational-system-of-public-data-registry.html

F. Data mapping and ontology
In this step, selected classes of the electronic health records system were mapped to the existing ontologies (OWL SNOMED CT, ICD10CM, ATC).
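A class-level mapping of this kind can be sketched as a lookup table that is serialized as rdfs:subClassOf statements in N-Triples. The `hutpl` namespace IRI and the local class names below are hypothetical, since the paper does not list them here; the SNOMED CT identifiers are meant to be the top-level hierarchy concepts and should be verified against the release in use.

```python
# Hypothetical namespace for the local ontology; the real IRI is not given.
HUTPL = "http://example.org/hutpl#"
SCT = "http://snomed.info/id/"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Illustrative mapping from local classes to SNOMED CT hierarchy concepts.
CLASS_MAP = {
    "Diagnosis": SCT + "404684003",        # Clinical finding
    "LaboratoryOrder": SCT + "71388002",   # Procedure
    "Medication": SCT + "373873005",       # Pharmaceutical / biologic product
    "VitalSign": SCT + "363787002",        # Observable entity
}


def subclass_axioms(class_map: dict) -> list:
    """Serialize each mapping as one N-Triples rdfs:subClassOf statement."""
    return [
        f"<{HUTPL}{local}> <{RDFS_SUBCLASS_OF}> <{target}> ."
        for local, target in sorted(class_map.items())
    ]


axioms = subclass_axioms(CLASS_MAP)
```

Keeping the mapping in a plain table makes it easy to review with clinical staff before any axiom is loaded into the ontology.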
The February 2020 version of SNOMED CT was downloaded from the official site of the National Library of Medicine [37]. To transform the taxonomy concepts to OWL, the Snomed OWL Toolkit [38] was used, which is an open-source toolkit used in SNOMED's international audit environment. The transformation can be performed through the command line or by using its Java API (Application Programming Interface) [39].

G. Data cleaning and conversion to RDF
To extract information from the electronic health records system, SQL views were created. The identification information of patients and doctors was anonymized through the MD5 hash function of the database. The process performed in this stage is described in detail in the following section.
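The combination of an SQL view and an MD5 hash can be sketched as follows. The example uses an in-memory SQLite database as a stand-in for the hospital's database engine, which the paper does not name; because SQLite has no built-in MD5 function, one is registered from Python. The table, the columns, and the sample row are assumptions made for illustration.

```python
import hashlib
import sqlite3


def md5_hex(value):
    """Return the MD5 hex digest of a value, or None for SQL NULL."""
    if value is None:
        return None
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
# Register md5() so that SQL statements and views can call it.
conn.create_function("md5", 1, md5_hex)

conn.executescript("""
CREATE TABLE patient (
    national_id TEXT,
    full_name   TEXT,
    sex         TEXT,
    birth_year  INTEGER
);
INSERT INTO patient VALUES ('0000000001', 'Jane Doe', 'F', 1985);

-- The view exposes a hashed key instead of the identifying columns.
CREATE VIEW v_patient_anon AS
    SELECT md5(national_id) AS patient_key, sex, birth_year
    FROM patient;
""")

rows = conn.execute(
    "SELECT patient_key, sex, birth_year FROM v_patient_anon"
).fetchall()
```

Because the same input always yields the same digest, records of one patient remain linkable across views. An unkeyed hash of a short numeric identifier can in principle be reversed by enumeration, so a keyed hash is a possible hardening step.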

V. APPLICATION CASE: UTPL HOSPITAL
The system for the electronic management of medical records maintained by the HUTPL makes it possible to obtain the information required by the State health bodies, such as the MSP, through the Unique Clinical History forms. With the current computerized system, the doctors working in the HUTPL must assume the daily task of typing all the medical information into the electronic clinical history system. Because this content is stored as free text, it is difficult to process, and neither the data nor their meanings can be directly understood by machine agents.
The motivation of the present paper lies in improving the medical care practices and the development of better medical products from the capability of sharing and linking the medical data collected. By obtaining results in real time from different systems, the medical practices can reduce the number of repetitive tasks and enhance the quality of the care provided, and even treat more patients. The patients will have more control over their own data in an environment where their privacy is safeguarded. The remaining administrative, clinical and information reporting functions will be simplified due to the integration of data and knowledge coming from dispersed systems.

A. Project scope
A sample of medical records from 1,300 patients of the outside consultation service of the HUTPL was considered. Both the patients' and the medical staff's identification information was anonymized, ensuring due privacy.

B. Data sources
The information was extracted from the relational database that manages the electronic medical records system maintained by the HUTPL. Around 40 tables of the database schema were used. The data dictionary of each table was obtained in order to identify the fields to be used in the SQL views, which reduced the complexity of data extraction. As an example, Table I presents the information of the Patients entity. Although the ontology covers all the patient's identification information, this information is not modeled in the case study.
A mirror copy of the HUTPL database was created, and modification techniques such as the combination and substitution of characters were applied to mask some of the data. This makes reverse engineering or detection extremely difficult, or even impossible.
The individual identification data of the clinical history were anonymized in order to minimize the risk of disclosing patient's information.

C. Data anonymization
Certain data elements can be linked to specific individuals more easily than others. To protect them, we use generalization: part of the data is removed or replaced with a common value, combined with some level of diversity in the sensitive values. For example, segments of all the ZIP codes or telephone numbers are substituted by the same sequence of digits, combined with a certain level of diversity. As a result, no person can be identified by their ZIP code.
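A minimal sketch of this generalization step is shown below: the trailing digits of each ZIP code are replaced by a common mask, and the number of distinct sensitive values in each generalized group is counted as a simple diversity check. The number of digits kept, the diversity threshold, and the sample records are assumptions for illustration, not the parameters used in the case study.

```python
from collections import defaultdict


def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Keep the first `keep` characters and mask the rest with '*'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def diversity_by_group(records, keep: int = 3) -> dict:
    """Count distinct sensitive values within each generalized ZIP group."""
    groups = defaultdict(set)
    for zip_code, diagnosis in records:
        groups[generalize_zip(zip_code, keep)].add(diagnosis)
    return {group: len(values) for group, values in groups.items()}


# Made-up (ZIP code, diagnosis code) pairs.
records = [
    ("110101", "I10"),
    ("110102", "E11"),
    ("110150", "J45"),
    ("170501", "I10"),
    ("170502", "I10"),
]

diversity = diversity_by_group(records)
# Groups whose sensitive values are too uniform need further generalization.
weak_groups = sorted(g for g, n in diversity.items() if n < 2)
```

In this sample, the group in which every record carries the same diagnosis is flagged, because knowing that a person belongs to it would reveal the diagnosis.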
In addition, differential privacy [40] has been used to protect each individual; this technique adds mathematical noise to the data so as to mask them. In this way, it is difficult to determine whether a person is part of the dataset, because the output of a given algorithm looks essentially the same regardless of whether that person's information is included or omitted.
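The paper does not state which differential privacy mechanism was applied. As one common instance, the sketch below shows the Laplace mechanism for a counting query: since adding or removing one person changes a count by at most 1, noise drawn from a Laplace distribution with scale 1/epsilon is added to the true count. The epsilon value and the count are assumptions for illustration.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(true_count: int, epsilon: float,
                  rng: random.Random) -> float:
    """Release a count with Laplace noise; a count has sensitivity 1."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(42)  # fixed seed only to make the sketch repeatable
noisy = private_count(true_count=120, epsilon=0.5, rng=rng)
```

A smaller epsilon gives stronger privacy at the price of noisier answers, so its choice is a policy decision as much as a technical one.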
Anonymization is a process that we use to maintain our commitment to the privacy of the people involved. Other data governance processes to ensure a consistent level of protection in the entire system include the following: strict controls over the access to the user's data, policies to control and limit the use of datasets that can identify the users, and centralized review of the anonymization techniques [41].

D. Ontology design
In this phase, the ontology classes and properties (data properties and object properties) were identified, and the base classes of the existing ontologies to which they are linked were determined. The latter step helps to attain interoperability of the HUTPL data with other systems.
In order to simplify the names of the classes, prefixes are used to identify the namespaces. Thus, hutpl corresponds to the namespace of the outside consultation ontology. The prefix sct identifies the classes of the SNOMED CT taxonomy. For the description of the person-type classes (Person class), the namespace foaf is used and, for the semantic description of organizations (Organization class), the namespace org is used.
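The role of these prefixes can be shown with a small Python sketch that expands a prefixed name into a full IRI. The foaf and org namespace IRIs are the published ones; the hutpl IRI is a placeholder, and the sct IRI follows the SNOMED International URI convention, which may differ from the namespace of the OWL SNOMED CT distribution used in the paper.

```python
PREFIXES = {
    "hutpl": "http://example.org/hutpl/ontology#",  # hypothetical namespace
    "sct": "http://snomed.info/id/",                # assumed, see lead-in
    "foaf": "http://xmlns.com/foaf/0.1/",
    "org": "http://www.w3.org/ns/org#",
}


def expand(prefixed_name: str) -> str:
    """Turn a prefixed name such as 'foaf:Person' into a full IRI."""
    prefix, _, local_name = prefixed_name.partition(":")
    if prefix not in PREFIXES or not local_name:
        raise ValueError(f"cannot expand {prefixed_name!r}")
    return PREFIXES[prefix] + local_name


print(expand("hutpl:OutsideConsultation"))
print(expand("foaf:Person"))
print(expand("org:Organization"))
```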
According to the data sources within the scope of this paper, 49 concepts were identified. The classes are described in Table II and were categorized according to the SNOMED CT taxonomy. The main class is hutpl:OutsideConsultation, which is linked to all Electronic clinical document classes. From this class, requests for complementary examinations, medication prescriptions, and patient personal and social data are generated. In addition, Table III describes the properties identified.

E. Semantic Description of Outside Consultation
The Ecuadorian national standard for medical data collection is composed of 18 Unique Clinical History Forms. The forms that were modeled through the ontology are the following: Form 002 - Outside Consultation, Form 010 - Clinical Laboratory, Form 012 - Imaging, Form 013 - Histopathology, Form 007 - Consultation specialty (interconsultation), Form 022 - Drug Administration, and Form 053 - Reference. In the ontology for outside consultation, 12 of the 19 categories from the OWL SNOMED CT class hierarchy were used. Fig. 4 shows the categories that were reused. Around 1,500 concepts were exported among classes, properties, axioms, and equivalences between classes. The outside consultation ontology classes were created as subclasses of OWL SNOMED CT. This enhances the interoperability of the Unique Clinical History Forms, which are mapped to one of the most widely used terminologies at the international level. Other ontologies used are ATC and ICD-10-CM, which were fully loaded into the outside consultation ontology.

F. Mapping of data sources with the ontology
In this stage, the data obtained in the Ontology design section are represented with the classes of the outside consultation ontology. We describe the mapped classes in Table IV.

G. Data pre-processing, prior to conversion to RDF
SQL views were created in order to streamline the conversion process and reduce the complexity of extracting data stored in multiple tables of the electronic health records system.
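How a SQL view can flatten several tables into one row per consultation is sketched below with Python's built-in sqlite3 module. The table names, columns and sample row are invented for the example, and SQLite only stands in for the database engine of the HUTPL system, which the paper does not name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, heavily simplified schema.
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, sex TEXT, birth_year INTEGER);
CREATE TABLE consultation (
    consultation_id INTEGER PRIMARY KEY,
    patient_id INTEGER REFERENCES patient(patient_id),
    visit_date TEXT
);
CREATE TABLE diagnosis (
    consultation_id INTEGER REFERENCES consultation(consultation_id),
    icd10_code TEXT
);
INSERT INTO patient VALUES (1, 'F', 1980);
INSERT INTO consultation VALUES (10, 1, '2020-01-15');
INSERT INTO diagnosis VALUES (10, 'J45');

-- The view hides the joins from the later RDF conversion step.
CREATE VIEW v_consultation AS
SELECT c.consultation_id, c.visit_date, p.patient_id, p.sex, d.icd10_code
FROM consultation AS c
JOIN patient AS p ON p.patient_id = c.patient_id
JOIN diagnosis AS d ON d.consultation_id = c.consultation_id;
""")

conn.row_factory = sqlite3.Row  # rows can then be read by column name
rows = [dict(row) for row in conn.execute("SELECT * FROM v_consultation")]
print(rows)
```

The conversion step can then read one flat record per consultation from the view without knowing how the underlying tables are joined.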

H. Transformation to RDF
The mapping schemes described in the Ontology design and Mapping of data sources with the ontology sections were applied to the information obtained in the pre-processing. On that basis, the information was transformed to RDF. Fig. 5 shows the RDF graph of a patient's visit in outside consultation.
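The transformation step can be sketched as a function that turns one pre-processed record into N-Triples statements. The namespaces, the property names hasPatient and visitDate, and the shape of the input record are assumptions made for the example; only the class name OutsideConsultation comes from the ontology described in this paper.

```python
HUTPL = "http://example.org/hutpl/ontology#"   # hypothetical ontology namespace
DATA = "http://example.org/hutpl/resource/"    # hypothetical instance namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str, datatype: str = "") -> str:
    """Serialize a literal, escaping the characters N-Triples reserves."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"^^{iri(datatype)}' if datatype else f'"{escaped}"'


def row_to_triples(row: dict) -> list:
    """Map one record of a (hypothetical) consultation view to triples."""
    visit = iri(f"{DATA}consultation/{row['consultation_id']}")
    patient = iri(f"{DATA}patient/{row['patient_id']}")
    return [
        f"{visit} {iri(RDF_TYPE)} {iri(HUTPL + 'OutsideConsultation')} .",
        f"{visit} {iri(HUTPL + 'hasPatient')} {patient} .",
        f"{visit} {iri(HUTPL + 'visitDate')} {literal(row['visit_date'], XSD_DATE)} .",
    ]


sample = {"consultation_id": 10, "patient_id": 1, "visit_date": "2020-01-15"}
for triple in row_to_triples(sample):
    print(triple)
```

Each output line is one statement, so the result can be written to a file and bulk-loaded into an RDF store.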

I. RDF storage
The information was stored in GraphDB, an RDF [43] database that handles large volumes of linked data. Fig. 7 shows the active repository of the electronic clinical documents that were uploaded. A total of 2 million medical consultation triples were uploaded to the RDF database. GraphDB allows navigating the class hierarchy. An example of a class is shown in Fig. 8, where the class OutsideConsultation is presented with its domain and range. Fig. 9 shows the dependencies between 10 classes. The resource explorer facilitates the navigation of RDF data; Fig. 10 shows the statements of a lab order. Another option to query RDF data is through SPARQL. Fig. 11 shows a query for patients diagnosed with diseases of the respiratory system.
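A query of this kind could be prepared from Python as sketched below. This is not the query of Fig. 11: the property names hasPatient and hasDiagnosisCode, the hutpl namespace IRI, the repository name and the endpoint URL are assumptions. The filter relies on the fact that ICD-10 codes for diseases of the respiratory system begin with the letter J (J00 to J99). The request follows the SPARQL 1.1 Protocol and is only built here, not sent.

```python
import urllib.parse
import urllib.request

# Hypothetical query; the property names are not taken from the paper.
QUERY = """
PREFIX hutpl: <http://example.org/hutpl/ontology#>
SELECT DISTINCT ?patient ?code WHERE {
  ?visit a hutpl:OutsideConsultation ;
         hutpl:hasPatient ?patient ;
         hutpl:hasDiagnosisCode ?code .
  FILTER(STRSTARTS(STR(?code), "J"))
}
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Prepare a SPARQL 1.1 Protocol query as a URL-encoded POST request."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Assumed local endpoint with a repository called "hutpl".
request = build_request("http://localhost:7200/repositories/hutpl", QUERY)
print(request.get_method(), request.full_url)
# Passing `request` to urllib.request.urlopen would send it to a running server.
```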

VI. DISCUSSION
One of the most challenging problems in health systems is the capability to integrate data and interoperate across different information systems. However, some clinical management systems do not follow clinical standards, which makes it difficult to map information to medical terminologies and standards. In our case study, the UTPL Hospital's clinical data management system only follows a national standard for data collection (Unique Clinical History Forms). Its architecture does not contemplate the HL7 standards for data transmission and exchange, and it is not associated with any semantic clinical terminology such as SNOMED CT. In addition, the data are generally incomplete, that is, they do not contain sufficient information to describe the context needed to understand them adequately. Data entry into the system is also inadequate: we found non-formal terms used in medical descriptions, as well as acronyms and abbreviations. Medical data that have not been adequately processed contribute little value. In this work, the costliest tasks were data cleaning and the mapping of data sources to the ontological resources, due to the difficulties mentioned above.
The Semantic Web uses vocabularies and ontologies to create a common language that results in more precise and unambiguous specifications. This is key to enabling the exchange of information between healthcare institutions. In fact, the Semantic Web establishes a relationship between the different information sources by mapping the data to repositories of biological, medical, and clinical practice terminologies and taxonomies, and it allows the extracted knowledge to be modeled. For example, if myocardial infarction is mentioned in the medical history of a patient, it acquires meaning according to the specific context. It could be a statement indicating that the patient was diagnosed with myocardial infarction, or that it is part of the patient's family history. It could also document a complication of smoking or a drug contraindication. It can refer to the application of a protocol in a cardiac complication, be considered a possible diagnosis that justifies a laboratory test or diagnostic imaging, or even be a diagnosis excluded on the basis of the tests. The terminologies and ontologies of the medical domain allow this information to be mapped according to its context, which not only improves the quality of the patient's clinical history but also facilitates accurate data retrieval; for example, it would be incorrect to retrieve patients with a family history of myocardial infarction when searching for patients diagnosed with myocardial infarction.
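The difference between a context-aware and a context-free search can be illustrated with a few statements held in memory. The SNOMED CT identifier 22298006 denotes myocardial infarction; the property names and patient identifiers are hypothetical and do not reproduce the properties listed in Table III.

```python
MI = "sct:22298006"  # SNOMED CT concept: myocardial infarction

# (subject, predicate, object) statements; predicates are hypothetical.
statements = [
    ("patient/1", "hutpl:hasDiagnosis", MI),
    ("patient/2", "hutpl:hasFamilyHistoryOf", MI),
    ("patient/3", "hutpl:hasExcludedDiagnosis", MI),
]


def patients_mentioning(concept):
    """Context-free search: any statement that mentions the concept."""
    return sorted(s for s, _, o in statements if o == concept)


def patients_with(predicate, concept):
    """Context-aware search: the concept must appear in the given role."""
    return sorted(s for s, p, o in statements if p == predicate and o == concept)


print(patients_mentioning(MI))                  # all three patients
print(patients_with("hutpl:hasDiagnosis", MI))  # only the diagnosed patient
```

The first search would wrongly return the patients with a family history or an excluded diagnosis, whereas the second uses the property to restrict the result to actual diagnoses.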

VII. CONCLUSIONS AND FUTURE WORK
The preliminary case study results showed that the Semantic Web approach and the linked data technologies applied to the electronic medical records worked as expected. The use of interoperability standards (URI, RDF, SPARQL, OWL) also helps the Semantic Web attain high levels of interoperability, as they allow the diversity of information sources to be represented in different types of formats [26]. In this paper, we have shown that Semantic Web technologies are a feasible alternative to solve data interoperability and integration problems, as well as to improve data semantics in electronic clinical histories. The use of semantic resources such as SNOMED CT or ICD-10-CM facilitates the identification of the knowledge contained in the clinical histories and yields quality data, because the specifications are more accurate and unambiguous.
The proposal offers a perspective to solve the need to have a health ecosystem at the national level for the exchange of data and semantics across different health organizations and networks. This would be significantly useful for the patients and reduce the costs for all members of the ecosystem, as the providers gradually adopt more intuitive and faster systems and become connected to all the necessary data sources.
The proposal is aligned with the vision of structuring medical information as a global database of linked entities and meanings. These linked entities and meanings are important to improve the current health tools for doctors and patients, as well as to further enhance local and national health services. Medical information interoperability is the driving force behind real-time results and improved quality of care. Ultimately, both health professionals and patients want health care that is efficient and effective.
The benefits for the researchers are centered on the fact that information semantics will allow them to find more evidence to propose more accurate therapeutic treatments for the diseases, while the doctors will have better tools at their disposal for individualized clinical treatments of their patients. The States also reap benefits from the information, which allows them to elaborate evidence-based public policies and allocate resources to the areas that need them most.
As discussed in this work, information exchange in medical care, as in all other facets of modern life, is fundamental for the digital transformation. Interoperability not only addresses this need but also increases the quality of the health care received and improves the efficiency of medical care. In this sense, the capability to transform structured and unstructured medical information into a Semantic Knowledge Graph is a key component in advancing towards semantic medical records that make patient-related information available with high levels of integration and interoperability.
Having an efficient and effective way to manage health data at their disposal can help doctors achieve a high probability of adequate diagnoses and treatments. The main challenge to attain this ambitious objective, which is our future work, is not only to allow for the integration of data encompassing heterogeneous sources and formats but also the development of instruments and good practices that allow conducting flexible searches, searches based on natural language processing, medical assistants, automated annotations of medical texts, data analysis, and easy-to-use interfaces.
We plan to move towards data mining and machine learning techniques to find useful patterns and knowledge in the unstructured data that are entered as free text in the medical records. We also intend to refine the identification of knowledge in Spanish-language text and in text written with acronyms or abbreviations of medical terms. This will enable techniques for detecting or classifying the symptoms and signs related to certain diseases, as well as tasks such as the extraction of relations between medications and adverse drug events. Another line of work is the identification of hospital processes that can reduce the costs caused by excessive requests for unnecessary complementary diagnostic studies. Finally, given that the HUTPL is a university hospital that supports the training of medical students, an important step is to transform the anonymized medical information into linked data that can be used in various educational resources [44].