
Definition and some facts of BIG DATA:
At the start, people who work in companies (employees) used to enter data into computer systems. Then the second generation came, where we users online started entering our own data into social networking sites. Now a third generation has come, where machines in companies or factories automatically enter data into computer systems.

Overall, BIG DATA is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured.

In Big Data there are 3 Vs: Big Volume, Big Velocity and Big Variety. These are the defining properties and the dimensions of big data.
  Volume refers to the amount of data.
  Variety refers to the number of types of data.
  Velocity refers to the speed at which the data arrives and is processed.
  Big Volume: with simple (SQL) analytics, or with complex (non-SQL) analytics.
  Big Velocity: drink from the fire hose.
  Big Variety: a large number of diverse data sources to integrate.

SQL stands for Structured Query Language. SQL is a standardized query language for requesting information from a database. The first commercial SQL database system was introduced in 1979 by the company that later became Oracle Corporation. Historically, SQL has been the favorite query language for database management systems running on minicomputers and mainframes.

Big data is a buzzword, or catchphrase, used to describe a massive volume of both structured and unstructured data that is so large that it is difficult to process using traditional database and software techniques. In most enterprise scenarios the data is too big, or it moves too fast, or it exceeds current processing capacity. Big data is also a term describing the storage and analysis of large and/or complex data sets using a series of techniques including, but not limited to, NoSQL, MapReduce and machine learning.
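To make "requesting information from a database" with SQL concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns and the function name top_customers are hypothetical and chosen only for illustration; they do not come from any system described above.

```python
import sqlite3

def top_customers(rows, min_total):
    """Load (customer, amount) rows into an in-memory table and query it with SQL.

    The table and column names are illustrative assumptions.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        # The query states WHAT is wanted (totals per customer above a
        # threshold); the database engine decides HOW to compute it.
        cur = conn.execute(
            "SELECT customer, SUM(amount) AS total "
            "FROM orders "
            "GROUP BY customer "
            "HAVING SUM(amount) >= ? "
            "ORDER BY total DESC, customer",
            (min_total,),
        )
        return cur.fetchall()
    finally:
        conn.close()
```

This kind of query works well while the data fits a fixed table layout on one machine, which is the "simple (SQL) analytics" case mentioned under Big Volume.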

VARIETY
Unstructured Data: refers to information that either does not have a predefined data model or is not organized in a predefined manner. Unstructured information is typically text-heavy. In other words, unstructured data sits at the other end of the spectrum from structured data. It might be in any form: text, audio, video. We definitely don't know from looking at the data what it means, unless we apply human understanding to it.

Examples of Unstructured Data:
  Book
  Story
  Heavy text
  Audio
  Video
  RSS feeds
  Word documents
  Excel spreadsheets
  Email messages

Structured Data: data that resides in a fixed field within a record or file is called structured data. This includes data contained in relational databases and spreadsheets. Structured data has the advantage of being easily entered, stored, queried and analyzed.

Examples of Structured Data:
  Census records (birth, income, employment, place, etc.)
  Library catalogues (date, author, place, subject, etc.)
  Phone numbers (and the phone book)
  Economic data (GDP, PPI, ASX, etc.)
  XML TEI (bringing structure to the text through tagging particular elements, like versions of the word "canal" in 17th-century Dutch)
  Databases
  Data warehouse
  Enterprise systems (CRM, ERP, etc.)

Relational Data: relational data is data that speaks for itself; typically this is the standard fare for data warehouses. It is extracted from ERP and other operational systems. We already know what the data means and what its structure is.
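The difference between structured and unstructured data can be sketched in a few lines of Python. The census-style records and the free-text note below are invented sample data, and the function names are hypothetical; the point is only that structured data can be queried by field name, while free text supports little more than a keyword match until someone interprets it.

```python
import csv
import io

# Structured: every value sits in a named, fixed field (invented sample data).
CENSUS_CSV = """name,birth_year,employment,place
Asha,1980,employed,Pune
Ben,1995,student,Leeds
Chen,1972,employed,Pune
"""

# Unstructured: the same kind of facts, buried in free text (invented sample).
NOTE = "Met Asha yesterday; she moved to Pune years ago and still works there."

def people_in(place, csv_text=CENSUS_CSV):
    """Query structured records by field: names of people in a given place."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["name"] for row in reader if row["place"] == place]

def mentions(word, text=NOTE):
    """All we can do with raw text without interpretation: a keyword match."""
    return word.lower() in text.lower()
```

Here people_in("Pune") answers a precise question directly from the fields, whereas mentions("Pune") only says the word occurs; who lives there, and since when, still needs human (or machine-learned) understanding.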

VELOCITY
Velocity Rates:
  Real Time (Fastest)
  Near Real Time
  Periodic
  Batch (Slowest)

Real Time: a real-time big data analytics platform delivers ultra-fast, interactive analytical results with sub-second response time.

Batch: another way of processing data, in which data is collected and processed in groups rather than as it arrives; it is slower than real time.

Benefits of Batch Processing:
  It can shift the time of job processing to when the computing resources are less busy.
  It avoids idling the computing resources with minute-by-minute manual intervention and supervision.
  By keeping a high overall rate of utilization, it amortizes the cost of the computer, especially an expensive one.
  It allows the system to use different priorities for batch and interactive work.
  Rather than running one program multiple times to process one transaction each time, batch processes will run the program only once for many transactions, reducing system overhead.
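The last benefit, running the program once for many transactions, can be illustrated with a toy sketch. The Ledger class and its start-up counter are hypothetical stand-ins for a real program with a costly start-up step (loading configuration, opening connections and so on); no real batch system is being modeled.

```python
class Ledger:
    """Toy program whose start-up step is counted to show overhead."""

    def __init__(self):
        self.startups = 0   # how many times the costly start-up ran
        self.balance = 0    # running total of processed transactions

    def _start(self):
        # Stand-in for expensive work done every time the program is launched.
        self.startups += 1

    def process_one(self, amount):
        """Interactive style: one program run per transaction."""
        self._start()
        self.balance += amount

    def process_batch(self, amounts):
        """Batch style: one program run handles all transactions."""
        self._start()
        for amount in amounts:
            self.balance += amount
```

Processing three transactions with process_one pays the start-up cost three times, while process_batch pays it once and reaches the same balance; the price is that results are only available after the whole batch has run.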

BIG DATA Software:
Software and vendor: Hadoop (Apache Software Foundation); MongoDB (MongoDB, Inc.); Splunk (Splunk Inc.; can be accepted).

1. Apache Hadoop: Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. It is licensed under the Apache License 2.0 and is written in Java. The Apache Hadoop framework is composed of the following modules:
   a. Hadoop Common: contains libraries and utilities needed by other Hadoop modules.

   b. Hadoop Distributed File System (HDFS): a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
   c. Hadoop YARN: a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications.
   d. Hadoop MapReduce: a programming model for large-scale data processing.
2. MongoDB: MongoDB is big data software whose name comes from the word "humongous". It is written in C++. MongoDB is a cross-platform document-oriented database. A document-oriented database is a computer program designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. It is classified as NoSQL. A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.

MarkLogic is an American software company that makes a NoSQL database.
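The MapReduce programming model named in the Hadoop module list can be sketched in plain Python as a word count. This is a single-process illustration of the map, shuffle and reduce steps, not the Hadoop API; all function names are hypothetical, and a real Hadoop job would distribute these steps across the machines of a cluster.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each document is mapped independently and each key is reduced independently, both steps can in principle run in parallel on different machines, which is what makes the model suitable for large data sets.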

MarkLogic
  Enterprise NoSQL Database Technology
  Best Big Data Search
  Real-time Your Hadoop

Enterprise NoSQL Database Technology
For more than a decade, MarkLogic has delivered a powerful, agile, and trusted enterprise-grade NoSQL (Not Only SQL) database that enables organizations to turn all data into valuable and actionable information. Key features include ACID transactions, horizontal scaling, real-time indexing, high availability, disaster recovery, government-grade security, and more.

Best Big Data Search
MarkLogic's scale-out, real-time platform is more than a search engine linked to a content repository; it is the most complete platform for building search-oriented applications.
  Search all data for more value. Bring all relevant content back to users: unstructured and structured, internal and public.
  Real-time updates, real-time results. When documents are updated or inserted, they are available for search immediately.
  Able to query all types of data. Structured, semi-structured, and unstructured content are all supported within the same queries.
  Real-time alerts for fast response. MarkLogic has the highest-performance alerting engine available, capable of running millions of custom queries on each and every change to the document repository, with no polling required.
  Search you can bank on. Businesses that count on revenue through paid content search and retrieval trust MarkLogic to deliver.

Real-time Your Hadoop
Seamlessly combine the power of MapReduce with MarkLogic's real-time, interactive analysis and indexing on a single, unified platform.
  Get more power out of Hadoop. Hadoop and MarkLogic together can allow you to tackle problems that would be difficult or impossible to address by either technology alone.
  Save money by leveraging common infrastructure. Using MarkLogic and the Hadoop Distributed File System (HDFS) enables common batch-processing infrastructure to be used across many different projects and applications.
  Enterprise-class support for Hadoop. MarkLogic's partnership with Intel provides a strong, supported platform for building secure, enterprise-class Big Data applications with Apache Hadoop.

Advantages and Disadvantages of BIG DATA:
Advantages:
  Data mining lets you find correlations more easily.
  More is calculated now, therefore accuracy is higher.
  Data is now combined into one big mass, which allows links to be found. For example, a company with decades of information can make use of Big Data and data analysis to create competitive advantages and open new business opportunities.
  It started because companies have been finding it hard to manage all their data.
  It creates new growth opportunities and lots of jobs.
Disadvantages:
  Big risks to security and privacy.
  Challenges arise: it is expensive, and you need to spend a lot to get it working.
  A lot of analyzing is needed: uncovering patterns, applying algorithms, finding connections and relationships.
  Specialized analysts are still needed, and it is hard to find the right skill set.

Oracle's Big Data Solution: Oracle is the first vendor to offer a complete and integrated solution to address the full spectrum of enterprise big data requirements. Oracle's big data strategy is centered on the idea that you can extend your current enterprise information architecture to incorporate big data. New big data technologies, such as Hadoop and Oracle NoSQL Database, run alongside your Oracle data warehouse to deliver business value and address your big data requirements.

Things that you can accomplish with BIG DATA
1. Dialogue with Consumers
   Today's consumers are a tough nut to crack. They look around a lot before they buy. You want to get customers to buy your products.
   Big Data allows you to profile these increasingly vocal and fickle little tyrants in a far-reaching manner so that you can engage in an almost one-on-one, real-time conversation with them. This is not actually a luxury. If you don't treat them the way they want to be treated, they will leave you in the blink of an eye.
2. Redevelop your Products
   Big Data can also help you understand how others perceive your products so that you can adapt them.
   Analysis of unstructured social media text allows you to uncover the sentiments of your customers and even segment those in different geographical locations or among different demographic groups.
3. Perform Risk Analysis
   Success depends not only on how you run your company. Social and economic factors are crucial for your accomplishments as well. Predictive analytics, fueled by Big Data, allows you to scan and analyze newspaper reports or social media feeds so that you permanently keep up to speed on the latest developments in your industry and its environment.
   Detailed health tests on your suppliers and customers are another goodie that comes with Big Data. This will allow you to take action when one of them is at risk of defaulting.
4. Keeping your data safe
   You can map the entire data landscape across your company with Big Data tools, thus allowing you to analyze the threats that you face internally.
   You will be able to detect potentially sensitive information that is not protected in an appropriate manner and make sure it is stored according to regulatory requirements.

Utilization of BIG DATA:
Big Data is used in many fields, like:

Car Makers (Toyota):
  Fault logging and cost predictions: Car makers place hundreds of sensors on components around the car which constantly log data on performance and faults. All of this data can be used to re-engineer designs for more efficient products and to predict what the strain of warranty repairs is likely to be on cost and man resource.

Finance (Visa):
  B2B supplier profiling: Finance professionals can use big data to check on the health of their suppliers and business partners. They can monitor a variety of indicators, including when creditors pay their bills and whether there is any change.
  Fraud detection: Companies like Visa are using big data to create fraud detection models which can flag up potential fraudsters.

Utilities (oil & gas) (Chevron Corporation):
  Asset monitoring: As with the machines in manufacturing plants, the utilities companies use big data to keep track of all of their assets spread across a country, continent or the globe. This enables them to fix any broken asset (such as a sewage cleansing plant, a leaking pipe or a gas pump), perform pre-emptive running maintenance or isolate areas in which repair actions have been ineffective.

General Manufacturing (General Motors India Limited, GM):
  Simulations: Manufacturers can take real data from their products on the market and then run simulations based on what would happen if they changed one particular component or design aspect. They can then find ways to make the product cheaper, more reliable or more environmentally friendly. The Formula 1 racing teams are particularly adept in this area, as are advanced aerospace companies.
  Expanded product design modeling: Similarly, with new big-data-enabled computer-aided design programs, product designers can substitute components or materials from huge databases and then access in-depth information on how this affects the final product, including the ramifications on cost, production processes, environmental effects, legislative requirements, supply chain and so on.


Policing (CBI):
  Suspect tracking: By combining CCTV images, facial recognition software, travel trends and identifiers on travel cards, police forces can capture criminals by automatically linking people to their likely destinations on buses and metro systems. This allows police to catch those that they miss at the scene of the crime and also to control arrest statistics, meeting targets for arrests in one London borough, for instance, as needed.

Retail and Marketing (Air Jordans):
  Mood mapping: Retailers use feeds from social networks to build an understanding of how their products and company reputation are seen among the public. With the constant streams of opinions from Facebook, Twitter, Google+ and the like, companies are able to cheaply and quickly gather large samples of customer opinion.

Title, Where, Needs and Benefits for each field:

1. Car Makers (Toyota)
   Where: From the factories and from the sensors to the data center (headquarters).
   Type of data: What condition the car is in.
   Needs: Safety and quality analysis.
   Benefits: Feedback from design.

2. Finance (Visa)
   Where: Wherever they buy.
   Type of data: What they buy, where they buy, when they buy, how much they buy it for.
   Needs: Detect fraud; customers' behavior.
   Benefits: Personal recommendation.

3. General Manufacturing (GM)
   Where: Several branches; headquarters in Gurgaon.
   Type of data: What condition the motor is in.
   Needs: Safety and quality analysis.
   Benefits: Awareness and indication of what to fix.

4. Policing (CBI)
   Where: Several police departments to the main CBI headquarters.
   Type of data: Details of the person they are tracking.
   Needs: Detecting a person's behavior and actions.
   Benefits: Gives awareness of what that person is going to do next and what their next plan is.

5. Utilities, oil & gas (Chevron)
   Where: From the machines in the manufacturing plants to the data center (headquarters).
   Type of data: What is going on in the manufacturing plant.
   Needs: Keep track of what is going on in the manufacturing plants, like broken pipes, leakage, etc.
   Benefits: This gives them feedback from designs so they know how to improve the construction of the manufacturing plant, because that is their main source of oil and gas.

6. Retail and Marketing (Air Jordan)
   Where: From social media networking sites to the headquarters of the company (data center).
   Type of data: Customers' opinions or feedback on the product.
   Needs: Customers' behaviors (like it or not). Helps to find out consumers' opinions and feelings.
   Benefits: This gives them feedback on what the customers are thinking about the product. Gives feedback from audiences to improve the product.
