Python vs Big Data - unimi.itmarchi.ricerca.di.unimi.it/Teaching/BigData2018/L1/Introduction...
Python vs Big Data
"Python?? Why Python?" https://www.youtube.com/watch?v=Vru9xOEtOM
What Is Python
• Python is a high-level, interpreted and general-purpose dynamic programming language that focuses on code readability.
• Python is widely used and has a large and active programmer community.
• It has a large, comprehensive standard library, automatic memory management and dynamic features.
• It is easily extensible with other programming languages.
• https://www.python.org/
• https://en.wikipedia.org/wiki/Python_(programming_language)
Why Python ... some steps back ...
It's a dirty job, but someone has to do it
• People need to process data in order to extract results
[Diagram: data → transformation → result. Inside a computer: data → program → result, with digitalization on the input side and rendering on the output side.]
Data coding
• Digital computers can handle only binary signals: sequences of 0 and 1 (bit = binary digit)
• In order to transform data with a digital computer, the data must first be digitized, i.e. real samples (images, sound, etc.) are transformed into sequences of bits, packed for technological and historical reasons into groups of 8 bits, called bytes.
• The meaning of a sequence is given by the format used to encode and interpret the sequence, e.g. ASCII, bitmap, mp3.
Bits: 0100.0010 0011.1100 0100.0010
As ASCII codes: 66 60 66, i.e. the characters "B<B"
As a BITMAP: 3x8
https://en.wikipedia.org/wiki/ASCII
https://en.wikipedia.org/wiki/BMP_file_format
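The point that one bit sequence means different things under different formats can be sketched in Python. This is only an illustration: the variable names are hypothetical, and the "bitmap" is a plain one-bit-per-pixel rendering of 3 rows of 8 bits, not the real BMP file format.

```python
# The 24 bits from the slide, written without the separating dots.
bits = "010000100011110001000010"

# Split the sequence into bytes (groups of 8 bits).
chunks = [bits[i:i + 8] for i in range(0, len(bits), 8)]

# Interpretation 1: each byte is an ASCII code.
byte_values = [int(chunk, 2) for chunk in chunks]
text = "".join(chr(value) for value in byte_values)

# Interpretation 2: each byte is one row of a 3x8 black-and-white image
# ('#' for a 1 bit, '.' for a 0 bit). Assumption: one bit per pixel.
bitmap_rows = [chunk.replace("0", ".").replace("1", "#") for chunk in chunks]

print(byte_values)   # [66, 60, 66]
print(text)          # B<B
for row in bitmap_rows:
    print(row)
```

The same three bytes print as the text "B<B" in the first reading and as a small picture in the second; nothing in the bits themselves says which reading is the intended one.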
Computers at hardware level
A very schematic and simplified draft of a digital computer
[Diagram: CPU, RAM and IO (keyboard, printer, network, ...); program and data are shown in two places; the blocks are labelled executor, working area and storing area.]
Coding transformations
• A classical digital computer transforms digital data by following a program, i.e. a sequence of commands that describes the transformations to be applied to the data.
• A program can be written using various high-level programming languages, i.e. languages for humans, e.g. ADA, C, C++, Perl, Python, Java, Pascal, Basic.
• Computers, at hardware level, understand only a very elementary set of commands, the Assembly, a low-level programming language, a language for CPUs.
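High-level languages

BASIC:
    10 INPUT "Your name?: ", NAME$
    20 PRINT "Hello "; NAME$

C:
    #include <stdio.h>

    char name[100];   /* buffer for the name (an array of char, not of pointers) */

    int main() {
        printf("Your name?: ");
        scanf("%99s", name);   /* read at most 99 characters */
        printf("Hello %s\n", name);
        return 0;
    }

Python:
    name = input("Your name?: ")
    print("Hello", name)

Java:
    package stringvariables;

    import java.util.Scanner;

    public class StringVariables {
        // statements must live inside a method, here main
        public static void main(String[] args) {
            Scanner user_input = new Scanner(System.in);
            String name;
            System.out.print("Your name? ");
            name = user_input.next();
            System.out.print("Hello " + name);
        }
    }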
Assembly
https://www.researchgate.net/figure/Assembly-instructions-of-an-x86-example-optimizing-frequently-executed-pieces-of-code_fig2_3881320
[Figure: the instructions in memory as used by the CPU, next to the same instructions transliterated for humans]
[Diagram: data → program → result]
Use computers? Start problems!
https://www.youtube.com/watch?v=tiq6v39YliQ
• Data management
• Portability
• Code readability
• Code maintenance
• Code extensibility
• Speed?
• Cost!
Develop code: a job for teams
• Code should (must?) be:
  • readable: projects pass through many hands and may live, from change to change, for many years
  • easy to develop:
    • easy syntax → fast learning
    • not error-prone: syntax should aid good programming
  • with a lot of ready-made wheels: a wide library collection of good functions helps to build good code rapidly (don't reinvent the wheel)
  • cool: a large connected community of geeks who code in your programming language has probably already solved all of your possible problems.
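As a small illustration of the "ready-made wheels" point, the sketch below counts words twice: once by hand and once with `collections.Counter` from the Python standard library. The function names and the sample sentence are hypothetical, chosen only for this example.

```python
from collections import Counter


def count_words_by_hand(text):
    """Reinvent the wheel: count words with a plain dictionary."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def count_words_with_library(text):
    """Use the ready-made wheel from the standard library."""
    return Counter(text.lower().split())


sample = "the quick brown fox jumps over the lazy dog the end"

by_hand = count_words_by_hand(sample)
with_library = count_words_with_library(sample)

# Both approaches agree; the library version also offers extras
# such as most_common() at no additional cost.
print(by_hand == dict(with_library))
print(with_library.most_common(1))
```

The library version is shorter and already tested by many other users, which is the argument the slide makes for a rich library collection.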
Speed
• Speed generally conflicts with code maintenance.
• Fast code, in order to fully control the flow of instructions, (usually):
  • is written in a "raw" programming language (e.g. C, C++), so it is often unreadable.
  • does not use "abstractions" for implementing algorithms and managing data, so it becomes easy to make mistakes and introduce bugs.
  • has its libraries implemented from scratch in order to optimize the code or remove unused parts, thus "new code, new bugs".
"Don't run if there is no need"
Interpreter vs Compiler
• The translation from high level to low level can be done in two ways: translate the program with a compiler, or execute the program with an interpreter.
• Compilers:
  • take a lot of time in the compile phase, but the result, the executable, runs fast on the CPU.
  • any new release of the code has to be compiled again
  • there is no easy way to run the code step by step for testing (you have to use a debugger)
• Interpreters:
  • designed for interactive mode: easy to debug code
  • code is executed by an agent, not directly by the CPU
  • easy to port to a new kind of computer
  • not so fast: each line has to be translated every time it is executed
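The difference between "translate every time" and "translate once" can be sketched with a toy language. Everything here is hypothetical (the three commands `add`, `mul`, `sub` and all function names); it is not how a real compiler or the Python interpreter works, only a minimal model of the trade-off described above.

```python
import operator

# Toy language: each line is "<command> <integer>".
OPS = {"add": operator.add, "mul": operator.mul, "sub": operator.sub}


def parse(line):
    """Translate one source line into (function, argument)."""
    name, arg = line.split()
    return OPS[name], int(arg)


def interpret(program, value, repeat):
    """Interpreter style: every line is translated each time it runs."""
    for _ in range(repeat):
        for line in program:
            op, arg = parse(line)
            value = op(value, arg)
    return value


def compile_program(program):
    """Compiler style: translate all lines once, then only execute."""
    steps = [parse(line) for line in program]

    def run(value, repeat):
        for _ in range(repeat):
            for op, arg in steps:
                value = op(value, arg)
        return value

    return run


program = ["add 3", "mul 2", "sub 1"]
run = compile_program(program)

print(interpret(program, 1, 2))   # 19
print(run(1, 2))                  # 19
```

Both versions compute the same result; the compiled version pays the translation cost once up front, while the interpreted one pays it on every pass but can start executing immediately and is easy to inspect line by line.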
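Speed

C
    /* fragment: needs <stdio.h>, <stdlib.h> and <string.h> */
    char *aword = malloc(sizeof(char) * 10);
    scanf("%9s", aword);   /* read at most 9 characters plus the terminator */
    for (size_t i = 0; i < strlen(aword); i++) {
        printf("%c\n", aword[i]);
    }
    free(aword);
+ fast: compiled for the running CPU
+ small binary
- unreadable
- memory management is our duty
- easy to make mistakes on syntax

Python
    aword = input()
    for c in aword:
        print(c)
+ easy to understand
+ easy to find errors
+ memory management is delegated to the system
- not so fast: managing objects requires a background process that consumes some CPU time, and the code is interpreted.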
Speed constraints
• Speed depends mainly on:
  • data management:
    • how the objects holding data are created and, more importantly, destroyed.
    • how data is accessed with respect to the layered, cached memory
  • CPU parallelism:
    • modern CPUs are superscalar: they can do many steps at the same time, concurrently, if the code permits it.
Data management: a do-it-yourself view (C style)
INPUT "ABC", 1
1. create a word
2. create a number
3. create an X type
4. put "ABC" in the first word
5. put 1 in the first number
6. destroy the word
7. destroy the number
[Diagram: RAM holding a word type W ("ABC"), a number type N (1) and an X type. After steps 6 and 7 the X type (dead) is still there: uncollected garbage, wasted memory.]
Data management: a data-as-service view (Java style)
INPUT "ABC", 1
1. I need a word W for "ABC"
2. I need a number N for 1
[Diagram: RAM holding "ABC", 1 and an X type; garbage collection runs as a timed service.]
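The data-as-service view can be observed from Python itself. The sketch below is an assumption-laden illustration: the class `Word` is hypothetical, and the behaviour shown is that of CPython, which frees an object as soon as its last reference disappears and runs a separate collector for reference cycles. Other implementations may reclaim memory later, which is why `gc.collect()` is called explicitly.

```python
import gc
import weakref


class Word:
    """Hypothetical container standing in for the word W of the slide."""

    def __init__(self, text):
        self.text = text


w = Word("ABC")   # "I need a word W for 'ABC'"
n = 1             # "I need a number N for 1"

# A weak reference lets us observe the object without keeping it alive.
watcher = weakref.ref(w)
alive_before = watcher() is not None

# We never destroy the object ourselves: we only drop our reference
# and leave the cleanup to the runtime.
del w
gc.collect()      # ask the collector to run now instead of waiting
alive_after = watcher() is not None

print(alive_before)   # True
print(alive_after)    # False
```

Unlike the do-it-yourself view, there is no explicit "destroy" step in the program, so there is no dead object left behind when the programmer forgets one.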
Python spec
• General-purpose language
• Focused on readability
• Interpreted
• Modular
• Dynamic
• Object-oriented
• Portable
• Extensible in C++ & C
Snakify
• Snakify is a platform for e-learning Python 3
• Connect to https://snakify.org/
• Sign up using:
  • [email protected] (don't use your private email, if possible)
  • a password DIFFERENT from the one used for email
• tick the option "I have a teacher"
• put "[email protected]" in the field "Teacher's email"