Students in Information Management teams have built a cloud-connecting data transfer process

The phase one team (Alex Thorpe, Olga Matytsina, and Ted Maurstad) and the phase two team (Matt Janachek, Christian Ellison, and Scott Conover) are members of the Students in Information Management (SIM) club. With support from Dr. Byron Marshall, the club’s faculty advisor, the students worked with Malcolm LeMay, Gwen Wolfram, Nitan Mohan, and others in the College of Business (COB) to improve the flow of data.

Focus: Moving data from Digital Measures to Drupal, SQL Server, and Salesforce.com

Purpose: Facilitating data transfer among multiple cloud-based systems

Techniques and Technologies: C#, credentialed HTTP access, XML, SQL stored procedures, Salesforce Data Loader

What the project means

Today’s organizations often use multiple cloud-based systems and need to transfer data accurately from one to another. The scholarly data project explores a diverse microcosm of ETL (extract, transform, load) issues and technologies.

The process focuses on disseminating publication data drawn from faculty curricula vitae (CVs). COB faculty members produce journal articles, books, conference submissions, presentations, and other intellectual contributions. These contributions are used to assess faculty performance, to promote the value and attractiveness of the college to its stakeholders, and to support the institution’s accreditation. Faculty members initially enter contribution records in a cloud-based system developed by Digital Measures, which generates materials for performance assessment. However, the college promotes itself through a Drupal website that presents some of the same data, and other integration and analysis tasks that require the data are carried out in Salesforce. Our task is to extract the data from Digital Measures, transform it for sharing, track changes, and load it into the other systems.
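The "track changes" step is easiest to picture as a comparison between the previous extract and the latest one. The team handled this with SQL Server stored procedures and Salesforce triggers; the Python below is only a minimal sketch of the same idea, assuming every contribution carries a stable ID from the source system. `Snapshot`, `ChangeSet`, and `diff_snapshots` are hypothetical names used for illustration.

```python
from dataclasses import dataclass

# Hypothetical shape: each snapshot maps a stable contribution ID
# (assumed to come from the source system) to the fields published downstream.
Snapshot = dict[str, dict[str, str]]


@dataclass
class ChangeSet:
    added: list[str]      # IDs present only in the new extract
    changed: list[str]    # IDs present in both, with different field values
    deleted: list[str]    # IDs missing from the new extract (stale downstream)
    unchanged: list[str]  # IDs present in both with identical values


def diff_snapshots(previous: Snapshot, current: Snapshot) -> ChangeSet:
    """Classify every contribution ID by comparing two extracts."""
    added = sorted(k for k in current if k not in previous)
    deleted = sorted(k for k in previous if k not in current)
    shared = [k for k in current if k in previous]
    changed = sorted(k for k in shared if current[k] != previous[k])
    unchanged = sorted(k for k in shared if current[k] == previous[k])
    return ChangeSet(added, changed, deleted, unchanged)


if __name__ == "__main__":
    last_run = {
        "pub-1": {"title": "Cloud ETL"},
        "pub-2": {"title": "Data Quality"},
        "pub-4": {"title": "Withdrawn Paper"},
    }
    this_run = {
        "pub-1": {"title": "Cloud ETL"},
        "pub-2": {"title": "Data Quality, 2nd ed."},
        "pub-3": {"title": "New Article"},
    }
    print(diff_snapshots(last_run, this_run))
```

In a real pipeline, `added` and `changed` would feed an upsert-style load, and `deleted` identifies records to retire, roughly the division of labor the components below describe.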

Organizational challenges

This project demonstrates some of the technological and organizational complexities of repurposing data. Faculty members who enter data in the Digital Measures system use it to create an integrated report of their educational, scholarly, and service activities over time. Digital Measures supports this task well, but the COB web presence is managed in Drupal, a widely used content management system, and additional analysis is done both in a SQL Server environment and on the Salesforce platform. Each of these platforms needs the publication data, but each comes with its own set of organizational and technological features and constraints. Digital Measures provides access through a secure web link that serves data as XML. Drupal and its Biblio module support a variety of value-added bibliographic functions, but those resources are appropriately and tightly secured by Oregon State’s central IT staff. Real-time links between Drupal and Digital Measures are possible but would raise a variety of security and reliability concerns.
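Credentialed access to an XML web link can be sketched in a few lines. The team’s version was a C# program; the Python below is only an illustration, and it assumes the endpoint accepts HTTP Basic authentication over HTTPS, which the description above does not confirm. The helper names and the example URL are hypothetical. The password manager caches the credentials once, and the opener reuses them whenever the server challenges a request under that URL prefix.

```python
import urllib.request

# Assumption: the XML service uses HTTP Basic authentication over HTTPS.
# make_password_manager / make_opener / fetch_xml are hypothetical helpers.


def make_password_manager(
    base_url: str, username: str, password: str
) -> urllib.request.HTTPPasswordMgrWithDefaultRealm:
    """Cache one set of credentials for every URL under base_url."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None: use these credentials whatever realm the server names
    manager.add_password(None, base_url, username, password)
    return manager


def make_opener(
    manager: urllib.request.HTTPPasswordMgrWithDefaultRealm,
) -> urllib.request.OpenerDirector:
    """Build an opener that answers 401 challenges from the cached credentials."""
    return urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(manager))


def fetch_xml(opener: urllib.request.OpenerDirector, url: str, timeout: float = 30.0) -> bytes:
    """Request one XML document; callers never handle the password again."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Placeholder values; a real run would read credentials from secure storage.
    mgr = make_password_manager("https://xml.example.org/api/", "svc-user", "not-a-real-password")
    opener = make_opener(mgr)
    # xml_bytes = fetch_xml(opener, "https://xml.example.org/api/records.xml")
```

Keeping the credentials in one manager object means every later request in a batch run reuses them, which is the "supplies and caches credentials" behavior the components list describes.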

Components we built

  • A C# program supplies and caches credentials to programmatically request and ingest XML files containing the desired data. This data is processed and reorganized into a set of flat files with lists of articles, presentations, authors, and linkages.
  • One of the flat files is uploaded to the Biblio module in Drupal. Custom PHP code we developed on the Drupal site uses Biblio functions to select, format, and organize the items on faculty profile pages, adding links to Google Scholar and to citation export modules.
  • Another C# program loads the lists into a database and executes a series of SQL Server stored procedures to integrate the data, tracking which items have been added or changed.
  • Scripted functions in the Salesforce Apex Data Loader transfer the data (using its ‘Upsert’ operation) into a set of custom-designed Salesforce objects. Salesforce triggers manage the process so that outdated items are recognized and deleted from the dataset.
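The extract-and-flatten step in the first component can be pictured with a small sketch. The element names (`Faculty`, `Publication`, `Type`, `Title`, `Author`) are hypothetical stand-ins, not the actual Digital Measures schema, and the team’s implementation was in C#, not Python. The idea is that one nested XML document becomes separate flat lists of items, unique authors, and item-to-author linkages, each of which can be written out as a CSV file for the downstream loads.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML shape; the real Digital Measures schema differs.
SAMPLE_XML = """<Data>
  <Faculty username="jdoe">
    <Publication id="101">
      <Type>Journal Article</Type>
      <Title>Data Integration in Practice</Title>
      <Author>Jane Doe</Author>
      <Author>Sam Roe</Author>
    </Publication>
    <Publication id="102">
      <Type>Presentation</Type>
      <Title>Cloud ETL</Title>
      <Author>Jane Doe</Author>
    </Publication>
  </Faculty>
</Data>"""


def flatten(xml_text: str) -> tuple[list[dict], list[str], list[dict]]:
    """Split nested XML into flat item, author, and linkage lists."""
    root = ET.fromstring(xml_text)
    items, links = [], []
    for faculty in root.iter("Faculty"):
        owner = faculty.get("username", "")
        for pub in faculty.iter("Publication"):
            pub_id = pub.get("id", "")
            items.append({
                "pub_id": pub_id,
                "owner": owner,
                "type": pub.findtext("Type", default="").strip(),
                "title": pub.findtext("Title", default="").strip(),
            })
            # enumerate from 1 so position matches the listed author order
            for position, author in enumerate(pub.iter("Author"), start=1):
                links.append({
                    "pub_id": pub_id,
                    "author": (author.text or "").strip(),
                    "position": position,
                })
    authors = sorted({link["author"] for link in links})
    return items, authors, links


def to_csv(rows: list[dict], fieldnames: list[str]) -> str:
    """Serialize one flat list as CSV text, header first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    items, authors, links = flatten(SAMPLE_XML)
    print(to_csv(items, ["pub_id", "owner", "type", "title"]))
    print(to_csv(links, ["pub_id", "author", "position"]))
```

Recording each author’s position in the linkage rows keeps the citation order recoverable after flattening.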

Why it’s a cool project

Our project brings technology, processes, policies, and people together to move important data between local and cloud services, giving the College of Business (and eventually other colleges) the ability to gather, collate, and present citation data in ways it could not before. Although it represents only a small corner of cloud computing, this project let us explore the knowledge needed to move, parse, and reassemble existing data to meet new opportunities, a key skill set for today’s information systems professionals.

The members of the project team have expanded their knowledge, honed their skills, and given back to their college and university of choice: the College of Business at Oregon State University. Along the way, they gained hands-on experience with:

  • Credential-cached HTTP access and XML ingestion
  • Interacting with the Drupal Biblio module
  • SQL stored procedures for data synchronization
  • Apex Data Loader scripts and Salesforce triggers

Interested in getting involved? Contact SIM Club President Matt Janachek at: [email protected]

Media Personnel – Questions? Thoughts? Contact the SIM Media Relations Manager: [email protected]