
Saritor: A Hadoop Ecosystem to Advance Clinical Research and Practice

Charles Boicey, MS, RN-BC (1); Lisa Dahm, PhD (1); David Gonzalez (1); Mahesh Rangarajan (2); Rushipriya Panda (2); Jeff Markham (3)

(1) University of California, Irvine; (2) CMC Americas; (3) Hortonworks

The Clinical and Translational Science Awards (CTSA) is a registered trademark of DHHS.
Introduction
Facebook, Twitter, LinkedIn and Yahoo share the same underlying infrastructure, Apache Hadoop. All four of these applications consume, process and store millions of records consisting of structured, unstructured, image and video data. Because healthcare data shares many of the characteristics of the data found in Facebook, Twitter, LinkedIn and Yahoo, Hadoop should be an ideal environment for the ingestion, storage and utilization of healthcare data.

Methods
A virtual Apache Hadoop version 1.0 infrastructure consisting of a single NameNode server and four Task Node servers was set up within the UCI Medical Center data center. Ubuntu Linux running on VMware was the chosen OS. The Hadoop modules utilized were Hadoop Common, Hadoop Distributed File System (HDFS), MapReduce, Pig, Mahout and ZooKeeper. Java scripted routines processed the legacy data. A Mirth HL7 listener and a Java scripted routine processed the HL7 data.

Results
The legacy data of 1.2 million patients, contained in 9 million patient medical records, was successfully ingested into the Saritor Hadoop Distributed File System. For researchers, the drag-and-drop query and visualization tool allowed visualization of the legacy data. For clinicians in patient care, complete patient records were retrieved via a web browser. HL7 messages from all source systems, physiological monitoring data and ventilator data in one-minute intervals, and EMR-generated data were ingested and stored. Algorithms for sepsis, hospital-acquired conditions and 30-day readmissions can be built into Mahout for real-time surveillance.

Discussion
Our initial findings demonstrated that the Hadoop ecosystem is well suited for the ingestion, storage and retrieval of both legacy EMR data and runtime EMR data. Minimal programming is required to process legacy data, and the processing of runtime EMR data requires the cloning of existing interfaces. Real-time clinical surveillance opens up a wide range of use cases. Hadoop is an affordable, scalable, highly available ecosystem that allows clinical research and clinical practice to coexist in the same system.
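The HL7 ingestion step named in Methods can be illustrated with a minimal sketch. The poster does not show the actual routine, so everything here is an assumption made for illustration: the function names `parse_hl7_message` and `to_record`, the sample message, and the choice of fields to keep. It relies only on the standard HL7 v2 delimiters (carriage return between segments, `|` between fields, `^` between components).

```python
from typing import Dict, List


def parse_hl7_message(raw: str) -> Dict[str, List[List[str]]]:
    """Group the segments of one HL7 v2 message by segment ID (hypothetical helper)."""
    segments: Dict[str, List[List[str]]] = {}
    # HL7 v2 separates segments with carriage returns; tolerate newlines as well.
    for line in raw.replace("\n", "\r").split("\r"):
        line = line.strip()
        if not line:
            continue
        fields = line.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


def to_record(raw: str) -> dict:
    """Flatten a message into a dict that could be stored as one record (assumed layout)."""
    msg = parse_hl7_message(raw)
    msh = msg["MSH"][0]
    pid = msg["PID"][0]
    return {
        # In MSH the field separator itself counts as MSH-1, so MSH-9 is index 8.
        "message_type": msh[8],
        # PID-3 is the patient identifier list; keep its first component.
        "patient_id": pid[3].split("^")[0],
        "observations": [
            {
                "code": obx[3].split("^")[0],  # OBX-3: observation identifier
                "value": obx[5],               # OBX-5: observation value
                "units": obx[6],               # OBX-6: units
            }
            for obx in msg.get("OBX", [])
        ],
    }


# Hypothetical sample message, not taken from the poster.
SAMPLE = (
    "MSH|^~\\&|MONITOR|ICU|SARITOR|HOSP|201301010830||ORU^R01|MSG0001|P|2.3\r"
    "PID|1||12345^^^HOSP||DOE^JANE\r"
    "OBX|1|NM|HR^Heart Rate||88|bpm\r"
    "OBX|2|NM|RR^Respiratory Rate||18|/min\r"
)

record = to_record(SAMPLE)
print(record["patient_id"], record["message_type"], len(record["observations"]))
```

A production interface would also need to handle escape sequences, repeating fields and acknowledgements, which this sketch leaves out.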

[Figure: Saritor architecture diagram. Recovered labels follow.]

Clinician Viewer: Events (Sepsis) / Chronic Disease Monitoring; Legacy Data Viewer; Predictive Analytics

Research Viewer: Legacy + EMR Data; Cohort Discovery; Relationship / Graph Analysis; De-identified at presentation

Quality/Operations Viewer: Patient Throughput (RTLS); Quality Measures; Patient Engagement; Asset Utilization Metrics

Cohort Discovery

User/Role Based Access Control

Saritor Business Services

Request / Reply processing engine (HTML5 / RESTful Services / JSON driven)

Saritor Surround Ecosystem

External Interfaces

Legacy Data Visualization

Compute pattern

Mahout

Graph Database: Neo4j

MongoDB

MapReduce: Generate and filter raw data from HDFS

Hive

Store data matrix for pattern recognition

Query Language

Algorithm Management

Input Data Attributes, Rules, Parameters

Available Data Set

Training Data Set
Hypothesis / Algorithm Model (Core Engine with the Equations / Analysis)
Statistical Techniques
Output / Results (Actual)
Analyze Output for Model Behavior (Actual versus Desired)
Not Satisfactory: Identify Improvements; Feedback and Refine the Model
Satisfactory Result: Matches Expectation; Release for Testing the Model

Test Data Set
Hypothesis / Algorithm Model (Core Engine with the Equations / Analysis)
Statistical Techniques
Output / Results (Actual)
Analyze Output for Model Behavior (Actual versus Desired)
Not Satisfactory: Identify Improvements; Feedback and Refine the Model
Satisfactory Result: Matches Expectation; Baseline the Pattern; Publish new version to Repository

Diagnosis Patterns Repository

Hadoop Distributed File System (HDFS)

External Data

TDS (Legacy System): 22 Years Patient Data; 1.2M Patients; 9M Records; Orders; Labs; Transcribed Results; Patient Record

HL7 Feed

EMR Generated Data: RN Documentation; Provider Documentation

Real-time Data Feeds

Lab Results; Physiological Monitors; Ventilators; Transcribed Reports; Radiology Results; Endoscopy Results; Orders

Home Monitoring; Personal Health Record; Social Media (Twitter, Foursquare, Yelp, RSS & Blog)

Feed-forward Learning

Training and Test Data sets for testing the model hypothesis

Historical Data Sets

Input Data Attributes, Rules, Parameters

Modeling Possibilities:
Linear Equation (to start with)
Regression Models (Linear / Multivariate)
Neural Networks (Layers of knowledge)

Hypothesis / Algorithm Model (Core Engine with the Equations / Analysis)

Statistical Techniques

New Learning (Pattern Refinement)

Publish new version to Repository

Output / Results (Actual)

Use the new baseline for real-time analysis of the incoming feeds

Create layers of knowledge that improve the understanding, one layer at a time
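The feed-forward learning labels describe a loop: fit a model on a training set, compare actual output with desired output on a test set, and publish a new baseline only when the result is satisfactory. A minimal sketch of that loop follows, using the "linear equation (to start with)" option. The poster does not specify a fitting method or an acceptance rule, so ordinary least squares, the mean-absolute-error check, the `tolerance` value, and all function names are assumptions made for illustration. This is plain Python, not Mahout.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Line = Tuple[float, float]  # (slope, intercept)


def fit_line(points: Sequence[Point]) -> Line:
    """Ordinary least-squares fit of y = slope * x + intercept (assumed method)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
    return slope, y_bar - slope * x_bar


def mean_abs_error(model: Line, points: Sequence[Point]) -> float:
    """Average gap between actual and desired output on a data set."""
    slope, intercept = model
    return mean(abs((slope * x + intercept) - y) for x, y in points)


def train_and_evaluate(train: Sequence[Point], test: Sequence[Point],
                       tolerance: float) -> dict:
    """Fit on the training set, then judge the model on the test set."""
    model = fit_line(train)
    error = mean_abs_error(model, test)
    return {"model": model, "test_error": error, "satisfactory": error <= tolerance}


def publish_if_satisfactory(repository: List[Line], result: dict) -> bool:
    """Append the model as a new baseline version only when it passed the test."""
    if result["satisfactory"]:
        repository.append(result["model"])
    return result["satisfactory"]


# Hypothetical data: training points lie on y = 2x + 1, test points are noisy.
train_set = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
test_set = [(4.0, 9.5), (5.0, 10.5)]

repository: List[Line] = []
result = train_and_evaluate(train_set, test_set, tolerance=1.0)
published = publish_if_satisfactory(repository, result)
print(result, published)
```

When the check fails, the model is not published and the repository keeps its previous baseline, which corresponds to the "feedback and refine the model" branch.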
