temDM MSA basic version

Pavel Potapov

April 5, 2018

CONTENTS

1 Installation
2 Basic PCA treatment
3 Treat XEDS data
4 View Results
5 Rotation of PCA results
6 Clustering
7 View rotated and clustered results
8 EndMembers
9 Settings
10 Scripting support
11 Troubleshoot
References

Pavel Potapov • temDM MSA basic version 1.9

1 INSTALLATION

You should place the temDM MSA.gtk plugin into a PlugIns folder of DigitalMicrograph. There are typically several such folders, for instance C:/ProgramData/Gatan/Plugins. If you want to import Velox XEDS spectrum-images, you should also place an appropriate HDF5-reading plugin there. These open-source plugins were compiled by Tore Niermann, TU Berlin: hdf5_GMS2X_amd64.dll for the 64-bit system, hdf5_GMS2X_x86.dll for the 32-bit system.

The script find plugins folders.s included in the distribution package will help you locate the desired folders. Open find plugins folders.s in DigitalMicrograph and run it by pressing Execute or by pressing ENTER while holding the CTRL key. Read the list of available plugins folders. The first folder in the list is the most appropriate for placing the temDM plugins.

Some folders can be hidden in Windows. If you do not see all folders, make them visible in Windows Explorer:

Windows 7: Organize tab - Folder and search options - View tab - select the show hidden files, folders and drives option.

Windows 10: View tab - select the hidden items checkbox.

• Drop temDM MSA.gtk and hdf5_GMS2X_amd64.dll into the chosen Plugins folder.

• Restart DigitalMicrograph and find the MSA items in the menu temDM : MSA.

To update the version, just overwrite the plugin of the previous version in the Plugins folder. All versions of temDM MSA have the same name to avoid confusion from loading ambiguous commands. If you have several versions, it is recommended to keep the plugins in individual folders with meaningful names like temDM MSA basic version 1_XX.

2 BASIC PCA TREATMENT

This chapter introduces how to perform Principal Component Analysis (PCA) of STEM EELS (Electron Energy-Loss Spectroscopy) or XEDS (X-ray Energy Dispersive Spectroscopy) spectrum-images.

First of all you have to open the data cube where the results of your spectrum-imaging are stored. That might be EELS data acquired by the Gatan software and stored in the standard format, with spatial dimensions along X and Y and the energy axis along Z (not directly visible). For the purpose of learning you might open the simulated EELS spectrum-image available at http://temdm.com/web/msa/. The simulated data cube EELScube.dm3 mimics the format and hidden information tags of real experimental data cubes. You can inspect how the spectrum changes from pixel to pixel in EELScube.dm3.

• Put EELScube.dm3 in front and choose temDM : MSA : view spectrum in the DigitalMicrograph menu.

A green marker appears inside the image while a spectrum is displayed in a separate window. You will notice that the spectrum is always very noisy. This spectrum-image was generated noisy on purpose. You may drag the marker across the image - the spectrum will be updated live. By default, the marker points to a one-pixel area but you can enlarge the green rectangle and get the spectrum summed over the larger area. If the area is too large for live update, the spectrum will be updated as soon as you release the mouse button. A spectrum from a larger area looks much nicer, doesn't it? Well, averaging removes the random noise. However, you probably need the spatial information, not just a spectrum averaged over a large area, because you wish to know how spectra change from pixel to pixel. With PCA denoising, you can obtain such a nice spectrum at EACH pixel. Just continue and you will get it.

How to stop viewing the spectrum?

• Close the live-spectrum display without saving (you might keep the ALT key pressed to skip the DigitalMicrograph saving prompt).

Let's treat EELScube.dm3 and try to improve it.

All results of the statistical analysis will be kept in a special image EELScube MSA.dm3. Before you generate such an image, decide whether your Principal Component Analysis (PCA) will be weighted or not. The weighting treatment is needed to equalize the Poisson noise across the dataset. Weighting is strongly recommended. In particular, unweighted PCA will NOT pick up the relevant data variation in the example EELScube.dm3. Instead, the random variations of the low-energy background will be highlighted. Weighting really does a good job!

You may choose to weight over the spatial and energy dimensions or over energy only. Weighting over only the spatial dimensions makes no sense for typical STEM spectrum-images.
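The exact weighting formula used by the plugin is not described here. A common choice for Poisson-distributed counts is to divide each entry by the square root of the mean spectrum (energy weighting) and, optionally, by the square root of the mean image (spatial weighting). The sketch below illustrates that idea in Python with NumPy; the function name poisson_weight, the (pixels x channels) layout and the synthetic counts are assumptions made for illustration only, not the plugin's implementation.

```python
import numpy as np

def poisson_weight(matrix, spatial=True, eps=1e-12):
    """Scale a (pixels x channels) count matrix so that Poisson noise is roughly uniform.

    Assumed scheme: divide by sqrt(mean spectrum) along energy and, if
    spatial=True, also by sqrt(mean image) along the pixel axis.
    """
    matrix = np.asarray(matrix, dtype=float)
    mean_spectrum = matrix.mean(axis=0)            # one value per energy channel
    w_energy = 1.0 / np.sqrt(np.maximum(mean_spectrum, eps))
    if spatial:
        mean_image = matrix.mean(axis=1)           # one value per pixel
        w_pixel = 1.0 / np.sqrt(np.maximum(mean_image, eps))
    else:
        w_pixel = np.ones(matrix.shape[0])         # energy-only weighting
    weighted = matrix * w_pixel[:, None] * w_energy[None, :]
    return weighted, w_pixel, w_energy

# Synthetic counts with a strong low-energy background (hypothetical data).
rng = np.random.default_rng(0)
counts = rng.poisson(lam=np.linspace(50, 5, 20), size=(30, 20)).astype(float)
weighted, w_pixel, w_energy = poisson_weight(counts)
```

The weights are returned so that the scaling can be undone after the decomposition by dividing the results by the same factors.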

To generate EELScube MSA.dm3:

• Open the proceed PCA tool by choosing temDM : MSA : proceed PCA.

• Press the make matrix button while having the original EELScube.dm3 in front. The spectrum averaged over all pixels appears. You are prompted to choose the energy region you are interested in. Move the selected region as you desire.

• Press Treat selected. In case you click Cancel, the whole available energy range will be used. This guarantees that you do not lose any useful information but also implies higher memory consumption.

• If you feel something is going wrong, you can stop the conversion into EELScube MSA.dm3 at any moment by pressing stop.

You have obtained a new entity called EELScube MSA.dm3. This 2D image displays nothing but values averaged over all energy channels, i.e. an “average image”. However, the 3D data distribution is not lost! It is attached to the image in a matrix form suitable for further processing. Such an image will be called an MSA image throughout this guide. Do not be surprised that MSA images are quite large when stored. There is a lot of hidden information in them. All MSA results such as PCA loadings and scores will be kept in this image as tags. The key point is - you may play with different parameters of the PCA decomposition (like the number of components, centering, clustering) and still use the same EELScube MSA.dm3. There is no need to start again with the original data cube. But you do have to generate a new MSA image if you wish to change the energy range or to compare weighted vs unweighted treatment.
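The relation between the 3D data cube, its matrix form and the displayed average image can be sketched in a few lines of Python with NumPy. The axis order (y, x, energy) and the helper names unfold and fold are assumptions for illustration; how the plugin stores the matrix internally is not described here.

```python
import numpy as np

def unfold(cube):
    """Turn a (ny, nx, n_energy) cube into a (pixels x channels) matrix."""
    ny, nx, n_energy = cube.shape
    return cube.reshape(ny * nx, n_energy)

def fold(matrix, ny, nx):
    """Inverse of unfold: restore the (ny, nx, n_energy) cube."""
    return matrix.reshape(ny, nx, -1)

# A tiny 2 x 3 pixel cube with 4 energy channels (hypothetical data).
cube = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
matrix = unfold(cube)

# The 2D image shown on screen: each pixel averaged over all energy channels.
average_image = matrix.mean(axis=1).reshape(2, 3)
```

Nothing is lost in the unfolding: fold(matrix, 2, 3) returns the original cube, which is why the MSA image can carry the full dataset while displaying only the average image.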

The proceed PCA tool is your first temDM MSA tool. Later you will learn more tools. For your convenience, all temDM MSA tools “remember” their positions. Next time you open a tool, it appears exactly where you left it previously. If the position was accidentally set somewhere off-screen and you cannot find the tool when it is reopened, open the tool while holding the SHIFT key. The tool’s position will be reset.

Now you are ready to extract PCA components. Before doing that, it is a good idea to check a few parameters. One is centering, which means that all your data variations will be counted relative to the average spectrum. Centering is strongly recommended. What is also important is the number of extracted components. Keep in mind that a real system seldom consists of more than 5-7 independent variation trends. Thus, if you see some non-noise signal in your 15th PCA component, your treatment is most probably not optimal. However, the number of extracted components must exceed the number of expected variation trends, because you should make sure that the noise level has been reached and there is no need for further PCA decomposition.

It is recommended to always start with extracting 2 PCA components. That will give you a good idea about your dataset. Later you may add more PCA components without overwriting the first ones.

• Press PCA using 2 (default) components while having EELScube MSA.dm3 in front. Wait until the extraction is finished and the results appear.

• Input more components, for example 5, and press PCA again. Decline overwriting the previously extracted components by clicking cancel. Confirm that you do wish to add more components. Extraction will start. You can interrupt the extraction procedure at any moment by pressing stop.

Once the PCA decomposition has been executed successfully, the results are presented in several forms.

One is a scree plot, i.e. the variance of data within each PCA component. The PCA components are sorted in such a way that the variance decreases with increasing component index (this is actually the essence of PCA). Inspection of the scree plot might help you to find the number of meaningful components in the system. For instance, the EELScube.dm3 data shows two PCA components with large variance, while components of index 3-5 exhibit a lower variance that changes little from index to index. That is just noise! It is clear that all subsequent PCA components would not be very different because you have reached the noise level already. Thus, we conclude that this data cube exhibits two PCA components, i.e. two independent variation trends.

A scree plot is an important hint, but not the only one, for determining the number of non-noise components in a dataset. More tricks will be demonstrated below.
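To see what a scree plot measures, here is a minimal sketch of centered PCA in Python with NumPy, applied to synthetic data with two variation trends plus noise. It uses a full singular value decomposition rather than the iterative extraction of the plugin, and Gaussian instead of Poisson noise; the function name pca and all data are assumptions for illustration.

```python
import numpy as np

def pca(matrix, n_components, center=True):
    """PCA of a (pixels x channels) matrix via singular value decomposition."""
    matrix = np.asarray(matrix, dtype=float)
    mean = matrix.mean(axis=0) if center else np.zeros(matrix.shape[1])
    u, s, vt = np.linalg.svd(matrix - mean, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]     # one column per component
    loadings = vt[:n_components]                        # one spectrum per component
    variance = s[:n_components] ** 2 / matrix.shape[0]  # values for the scree plot
    return mean, scores, loadings, variance

# Two independent spectral trends with random abundances, plus noise.
rng = np.random.default_rng(0)
channels = np.arange(50)
trend_a = 5.0 * np.exp(-0.5 * ((channels - 15) / 3.0) ** 2)
trend_b = 5.0 * np.exp(-0.5 * ((channels - 35) / 3.0) ** 2)
data = (np.outer(rng.uniform(size=400), trend_a)
        + np.outer(rng.uniform(size=400), trend_b)
        + rng.normal(scale=0.1, size=(400, 50)))

mean, scores, loadings, variance = pca(data, n_components=5)
print(np.round(variance, 3))
```

In the printed list the first two variances stand far above the remaining three, which are nearly equal to each other. That plateau is the noise level discussed above.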

Other important PCA results are the spectra of the PCA components, or loadings. In this example, only the first two components show meaningful spectra. PCA components 3-5 (hidden in the shown display) are simply noise. The average spectrum is also displayed in the same graph. All PCA components should be considered as deviations from this average spectrum. Do not try to interpret what each component actually means at this stage of the treatment. Interpretation will be addressed later.

All this will look different if you choose to do uncentered PCA. Then the first component is effectively the average spectrum, while the total number of components is increased by one. Such a style of treatment does not make much sense. If you disagree, just try it!

A very important characteristic is a scatterplot, i.e. the joint distribution of the scores of two selected components. After the extraction of the PCA components, the scatterplot of the 1st vs 2nd score is displayed automatically. Later you may inspect the scatterplots between any pair of components. The scatterplot of the first two components is typically the most informative, which is why it is displayed automatically. In the present example you see that EELScube.dm3 shows two independent trends of data variation (two branches in the scatter figure).

All three pictures are for your information only. After you have had a look, just close them without saving. They are automatically stored in EELScube MSA.dm3. You can easily display them again whenever necessary.

If you agree that EELScube.dm3 consists of only two non-noise components, you can reconstruct the data cube using these two components only and get rid of the noise.

• Choose the number of components for reconstruction. As a little exercise, input 3, not 2, for EELScube.dm3.

• Click reconstruct. You get the list of components ready to be included in the reconstruction. That is your last chance to reduce the number of components. Uncheck PCA 3 and press OK.

Theoretically you can remove a PCA component in between the meaningful ones, for example remove PCA 2 but leave PCA 3. This, however, is NOT recommended unless you see a clear artefact in the intermediate component.

Finally the reconstructed data cube EELScube REC(3).dm3 appears.
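The reconstruction step can be sketched as follows in Python with NumPy: the data are rebuilt from the average spectrum plus the selected components only, so everything carried by the discarded components (mostly noise) disappears. The function name reconstruct, the 0-based component indexes and the synthetic data are assumptions for illustration, not the plugin's code.

```python
import numpy as np

def reconstruct(matrix, keep):
    """Rebuild a (pixels x channels) matrix from the PCA components listed in keep.

    keep holds 0-based component indexes, so keep=[0, 1] means PCA 1 and PCA 2.
    """
    matrix = np.asarray(matrix, dtype=float)
    mean = matrix.mean(axis=0)
    u, s, vt = np.linalg.svd(matrix - mean, full_matrices=False)
    keep = list(keep)
    return mean + (u[:, keep] * s[keep]) @ vt[keep]

# Noise-free two-trend data and a noisy copy of it (hypothetical data).
rng = np.random.default_rng(1)
channels = np.arange(60)
peak_a = np.exp(-0.5 * ((channels - 20) / 3.0) ** 2)
peak_b = np.exp(-0.5 * ((channels - 40) / 3.0) ** 2)
clean = 5.0 * (np.outer(rng.uniform(size=300), peak_a)
               + np.outer(rng.uniform(size=300), peak_b))
noisy = clean + rng.normal(scale=0.5, size=clean.shape)

denoised = reconstruct(noisy, keep=[0, 1])
rms_before = np.sqrt(np.mean((noisy - clean) ** 2))
rms_after = np.sqrt(np.mean((denoised - clean) ** 2))
print(rms_before, rms_after)
```

Because the noise-free data are known in this simulation, the residual error can be measured directly. With real data you only have the visual comparison described next.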

You might wish to compare it with the original data cube.

• Place EELScube REC(3).dm3 front-most and EELScube.dm3 second front-most; choose temDM : MSA : twin view spectrum in the DigitalMicrograph menu.

• To stop twin view spectrum, close the spectrum window without saving.

Green markers will appear in both images while the extracted spectra will be displayed as an overlay in a separate window. You may drag or resize the marker in either image. The companion marker will move synchronously.

Do you see the difference between the original and the reconstructed data cubes? Well, this is simulated data. Real examples are not as straightforward... Nevertheless, if you tune all steps carefully, the results might be astonishing, as plenty of unwanted noise is gone.

Now you probably wish to store the results of your PCA decomposition. Try to store EELScube MSA.dm3.

Ooops... Why is this 2D image so large? It is even a bit larger in size than the original spectrum-image. This is because the original 3D data is kept as a hidden matrix in EELScube MSA.dm3. But you can get rid of that.

• Click compress while having EELScube MSA.dm3 in front.

A new 2D image - EELScube COMP.dm3 - seemingly identical to EELScube MSA.dm3 - appears. It contains all the results of your analysis except the original noisy data. Store it and find out that it is VERY COMPACT.

What is important - for 95% of further operations, EELScube COMP.dm3 is identical to EELScube MSA.dm3. You may do further MSA (Multivariate Statistical Analysis): rotation, clustering, endmembering and others. Enjoy the wide range of treatment possibilities with the very compact compressed format! You do not necessarily need to store the large EELScube MSA.dm3 file. Just find the right number of PCA components, ensure that the reconstruction goes correctly and convert EELScube MSA.dm3 into EELScube COMP.dm3. Then you may close EELScube MSA.dm3 without saving.

You can always get the reconstructed spectrum-image from your compressed image.

• Click extract while having EELScube COMP.dm3 in front.

This will take a while because the reconstruction process has to be repeated. Actually, extract is equivalent to pressing the reconstruct button with the only difference that you need not specify the list of components for reconstruction. The component list used last time before the compression will be utilized for building the denoised spectrum-image.

At this point, the job is basically done. However, you are advised to explore more features of the temDM MSA package.

3 TREAT XEDS DATA

XEDS (X-ray Energy Dispersive Spectroscopy, sometimes called EDX or EDS) data cubes can be treated in the same way. You can convert XEDS data collected by the Bruker software (ESPRIT) into a DigitalMicrograph image. As an example, use the MagiCal EDX example, freely downloadable at http://temdm.com/web/msa/.

• Store the h-4.bcf data from ESPRIT as an h-4.raw file. If you do not have ESPRIT installed on your computer, use h-4.raw from the downloaded package.

• Open the import tool by choosing temDM : MSA : import in the DigitalMicrograph menu.

• Click the import button in the import Bruker ESPRIT box, browse and choose the desired h-4.raw file (the ripple file h-4.rpl must also be in the same folder). Conversion takes a while. You can stop it anytime by clicking stop.

Unfortunately the calibrations are not kept in the h-4.raw file, thus you should calibrate the converted data cube manually by entering the proper numbers in the fields “scale” (how many nanometers are in one pixel) and “dispersion” (how many keV are in one energy channel). The value “origin” is usually 0 for the spatial calibration but must be set to some positive number for the energy calibration. It varies from system to system but is fixed for a given spectrometer and a given spectrometer dispersion. Just check which channel is assigned to ZERO in your ESPRIT. For the MagiCal EDX example, the calibration should be

scale: 0.67 nm
dispersion: 0.01
origin: 48

• Click calibrate while having the converted spectrum-image front-most.
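The meaning of the three calibration numbers can be illustrated with a short Python sketch. It assumes the usual linear convention in which the origin is the index that maps to zero (as suggested by "which channel is assigned to ZERO"); the function name calibrated is hypothetical.

```python
def calibrated(index, scale, origin=0.0):
    """Convert a pixel or channel index into a calibrated value.

    Assumed convention: the index equal to origin corresponds to zero,
    and every further step adds one scale (or dispersion) unit.
    """
    return (index - origin) * scale

# MagiCal EDX example: dispersion 0.01 keV per channel, origin at channel 48.
print(calibrated(48, 0.01, 48))    # energy of the zero channel, in keV
print(calibrated(148, 0.01, 48))   # energy 100 channels further, in keV

# Spatial axis: 0.67 nm per pixel, origin 0.
print(calibrated(10, 0.67))        # position of pixel 10, in nm
```

If the energies of known X-ray lines come out shifted after calibration, the origin value is the first thing to check.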

You can also import XEDS spectrum-images stored by FEI Velox (version > 1.2). As an example, use the superalloy 1510 example, freely downloadable at http://temdm.com/web/msa/.

• Click the import button in the import FEI Velox box, browse and choose the superalloy 1510.emd file.

• The range of available frames to integrate into the spectrum-image appears. If you are sure all frames are appropriate for integration, just click OK.

This spectrum-image will already be properly calibrated. If it is not (which might happen because of misinterpretation of the attached metadata), perform the calibration manually.

XEDS data are typically quite sparse. This causes severe problems in PCA treatment. Therefore, it is strongly recommended to pre-treat the XEDS dataset before the PCA processing. You can, for instance, bin it.

• Open the filtering tool by choosing temDM : MSA : filtering in the DigitalMicrograph menu.

• Choose a binning factor of 2 for both X and Y.

• Click the run button.

This will reduce the sparseness of the data and greatly reduce its spatial dimensions, although the number of energy channels will be unchanged (for the moment, there is no energy binning in the temDM MSA package).
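Spatial binning can be sketched in Python with NumPy as follows. The sketch sums the counts of each block of pixels and drops incomplete blocks at the edges; whether the plugin sums or averages, and how it treats leftover rows and columns, is not stated here, so both choices are assumptions, as is the (y, x, energy) axis order.

```python
import numpy as np

def bin_spatial(cube, by, bx):
    """Sum blocks of by x bx pixels in a (ny, nx, n_energy) cube.

    The energy axis is left untouched. Incomplete blocks at the
    bottom and right edges are discarded.
    """
    ny, nx, n_energy = cube.shape
    ny_binned, nx_binned = ny // by, nx // bx
    trimmed = cube[:ny_binned * by, :nx_binned * bx]
    blocks = trimmed.reshape(ny_binned, by, nx_binned, bx, n_energy)
    return blocks.sum(axis=(1, 3))

# A 5 x 6 pixel cube with 3 channels, one count everywhere (hypothetical data).
cube = np.ones((5, 6, 3))
binned = bin_spatial(cube, 2, 2)
print(binned.shape)
```

With a factor of 2 in both directions each output pixel collects the counts of four input pixels, which is what makes the binned data less sparse.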

In case your PC has no problem handling large spectrum-images, you might smooth the very noisy data by applying the “gaussian” filter. This filter performs weighted averaging with a Gaussian kernel. The parameter is the standard deviation sigma (expressed in pixels) of the Gaussian distribution. Both spatial and energy filtering are available. Currently, the allowed range is 0.6 < sigma < 2.6. Just try it:

• Having the binned h-4_2x2.dm3 spectrum-image in front and leaving the default sigma=1 in the Gaussian filtering SI box, press spatial or energy to start filtering.

Gaussian filtering is a rather long procedure in the basic version, so reserve some time for it.
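What the Gaussian filter does along the energy axis can be sketched in Python with NumPy. The kernel is truncated at three sigma and the spectrum is treated as zero outside its range; these details, the function names and the (y, x, energy) axis order are assumptions for illustration and need not match the plugin. The sketch also does not enforce the allowed sigma range.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Normalized 1D Gaussian weights, truncated at 3 sigma by default."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    weights = np.exp(-0.5 * (x / sigma) ** 2)
    return weights / weights.sum()

def smooth_energy(cube, sigma=1.0):
    """Apply the Gaussian weighted average along the last (energy) axis."""
    kernel = gaussian_kernel(sigma)
    return np.apply_along_axis(
        lambda spectrum: np.convolve(spectrum, kernel, mode="same"), -1, cube)

# A single sharp spike in channel 10 of every spectrum (hypothetical data).
cube = np.zeros((2, 2, 21))
cube[:, :, 10] = 1.0
smoothed = smooth_energy(cube, sigma=1.0)
print(np.round(smoothed[0, 0, 7:14], 3))
```

The spike is spread over its neighbours while the total count is preserved. Spatial filtering applies the same kind of kernel along the X and Y axes instead.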

Finally you might wish to check what your average image (image averaged over all energy channels) or average spectrum (spectrum averaged over all probe positions) looks like.

• Click spectrum or image.

Then you may denoise the resulting h-4_2x2_xGss cube as described in the previous section.

4 VIEW RESULTS

• Open the view results tool by clicking temDM : MSA : view results.

With this tool you can at any time retrieve the stored information from EELScube MSA/COMP.dm3. For instance you can

• type the indexes of two components in the input fields and plot the corresponding scatterplot by clicking plot X-Y.

The order of the indexes matters! The index in the left-hand field will be your horizontal axis while the right-hand index will be the vertical one.

You can now investigate the data distribution in the factor space in as much depth as you wish. Actually, you will quickly discover that all scatterplots of PCA components with index 3 and higher are essentially the same - they form a round distribution characteristic of uncorrelated noise. This is another way to determine the number of non-noise PCA components in your dataset.

You can also display the extracted PCA loadings in the desired range of indexes.

• Type the initial and final components and click loadings.

You have already seen such a display immediately after the PCA decomposition was finished. You can display these spectra at any moment without storing them in individual files.

More information is retrieved by

• clicking scores after specifying the desired component range.

Then you get the maps showing how each PCA component is distributed across the object. The scores can be delivered as a number of 2D images or may be packed into a data cube (choose your preference through the spanner button). In the latter case, you can scroll among the different PCA components using the standard DigitalMicrograph tool - Slice. Again you see that the scores of the PCA components with index 3 and greater show nothing but noise.

Now explore the three buttons at the bottom of the view results tool.

• Clicking variance will plot the standard scree plot that you have already seen after finishing your PCA decomposition.

• Clicking kurtosis will calculate and plot the kurtosis for each extracted component.

Kurtosis is a measure of non-Gaussianity in the data distribution. It may be negative or positive for an arbitrary distribution but it must be exactly zero for ideally Gaussian-distributed data.
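A kurtosis that is zero for a Gaussian distribution is what is usually called the excess kurtosis: the fourth central moment divided by the squared variance, minus 3. A minimal Python sketch with NumPy is given below; the function name excess_kurtosis is hypothetical and the plugin may use a different estimator.

```python
import numpy as np

def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (zero for a Gaussian)."""
    values = np.asarray(values, dtype=float)
    deviations = values - values.mean()
    m2 = np.mean(deviations ** 2)
    m4 = np.mean(deviations ** 4)
    return m4 / m2 ** 2 - 3.0

rng = np.random.default_rng(0)
print(excess_kurtosis(rng.normal(size=200000)))    # close to zero: noise-like
print(excess_kurtosis(rng.uniform(size=200000)))   # negative: flat distribution
print(excess_kurtosis(rng.laplace(size=200000)))   # positive: heavy tails
```

Applied to the score map of each PCA component, values near zero point to noise-like components, while strongly positive values are typical of distributions with heavy tails or outliers.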

You might notice that the components with index 3 and greater show almost zero kurtosis. A Gaussian-distributed component is very likely just noise. This is an alternative way to determine the number of non-noise components. Caution: a very large kurtosis is also not good - it is a fingerprint of heavy outliers in the component.

Finally you can inspect how the PCA decomposition proceeded.

• Click iterations.

You now have access to the number of iterations required for finding each component. You can also see how the variance and precision (difference between two subsequent iterations) were progressing during the iterations. This inspection helps to identify the cases where the PCA algorithm did not converge for any reason.

5 ROTATION OF PCA RESULTS

You probably noticed that the obtained PCA loadings are not easily interpretable. This is a typical feature of PCA - you can successfully describe the important data variation with a few loadings but you cannot say what each loading actually means. You might try to improve interpretability by rotating the PCA loadings in the factor space. Three rotation methods are incorporated in the temDM MSA package:

Independent Component Analysis (ICA)

Maximal simplicity (varimax)

Free rotation

How to do that?

• Open the MSA rotation tool by clicking temDM : MSA : rotate results.

The rotation operation involves visualization of a scatterplot for the selected pair of components. This is needed for interaction with the user. Before starting the rotation, define the indexes of the most representative components. For the example EELScube MSA.dm3 (you can also use the compact EELScube COMP.dm3 version), leave the default 1-2 scatterplot.

Let's start with ICA. ICA attempts to find the directions along which the data distribution deviates most drastically from the Gaussian distribution. The criterion to be maximized is the kurtosis, which is zero for the Gaussian distribution and positive or negative for non-Gaussian ones.

• Click ICA. You get the list of PCA components to be rotated. You may uncheck any of the components; they will then be fixed, i.e. not rotated. Do not do that for the moment. Press OK.

The rotation matrix appears, which shows how the new components are composed from the old ones. You might notice that components 1-2 are rotated moderately while components 3-5 are altered very significantly (and randomly). This is because components 3-5 are pure noise, thus the algorithm is not able to find any optimal combination of them. Only the rotation of components 1 and 2 is meaningful.

Finally you see the kurtosis of the new components and their loadings. The kurtosis of the first two components is pretty large in magnitude (although negative for component 1 and positive for component 2) while it is close to zero for components 3-5. That is because the latter are noise.

You also see the 1-2 scatterplot, which indicates that the major variation trend lies almost along the loading of the ICA 1 component. This is the best job ICA can deliver for this dataset. In fact, the ICA results do not differ very much from the PCA ones for this example. In other datasets, this might be very different.

Let's now explore the varimax method. This method attempts to achieve the best simplicity of data - that means the majority of pixels must have a large coordinate along one axis while having zero or small coordinates along the other axes. In other words, the data points should be as close (on average) to the coordinate axes as possible.

• Click varimax. You get the list of PCA components to be rotated. You may uncheck any of the components; they will then be fixed, i.e. not rotated. Do not do that for the moment. Press OK.

The new component loadings and scatterplot appear.

You don’t see many changes? The PCA components are rotated by only 3 degrees and there is no particular simplicity in the results? That is because the rotation center was fixed. With such a rotation center (which is the average spectrum) you cannot get better simplicity. Try another option - varimax with redefinition of the center.

• Click varimax(C). The scatterplot appears with a red spot denoting the rotation center. Drag the spot to the desired place in the 1-2 component plane. Press New center. As before, you get the list of PCA components to be rotated. Change nothing and press OK.

The results are now much more reasonable. The majority of the data pixels are close to axis 1 or axis 2 and the loadings are easier to understand. Of course, the extracted loadings correspond to the expected latent factors only approximately. This is because the two trends are not perpendicular to each other in the factor space while the rotation is kept orthogonal in the temDM MSA package. More accurately interpretable loadings can be extracted by finding EndMembers, as will be described in the corresponding section.

Generally, the center of the factor space seldom lies on a PCA component axis. In this situation, the classical varimax rotation is not very successful unless you redefine the center. For the moment temDM MSA requires manual adjustment of the rotation center.
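The effect of the rotation center can be sketched for two components in Python with NumPy. The sketch shifts the scores to a chosen center, scans the rotation angle and keeps the angle that maximizes a varimax-style simplicity criterion (the summed variance of the squared coordinates). The brute-force angle scan, the unnormalized criterion and all names are assumptions for illustration; the plugin's varimax implementation is not described here.

```python
import numpy as np

def simplicity(points):
    """Varimax-style criterion: summed variance of the squared coordinates."""
    return float(np.sum(np.var(points ** 2, axis=0)))

def best_rotation(scores_2d, center, step_deg=1.0):
    """Scan rotation angles in [0, 90) degrees around center; return the best one."""
    shifted = scores_2d - np.asarray(center, dtype=float)
    best_angle, best_value = 0.0, -np.inf
    for angle in np.arange(0.0, 90.0, step_deg):
        t = np.deg2rad(angle)
        # Rows of rot are the new axes expressed in the old coordinates.
        rot = np.array([[np.cos(t), np.sin(t)],
                        [-np.sin(t), np.cos(t)]])
        value = simplicity(shifted @ rot.T)
        if value > best_value:
            best_angle, best_value = float(angle), value
    return best_angle, best_value

# Two perpendicular branches starting at (2, -1), tilted by 30 degrees.
rng = np.random.default_rng(2)
center = np.array([2.0, -1.0])
tilt = np.deg2rad(30.0)
dir_a = np.array([np.cos(tilt), np.sin(tilt)])
dir_b = np.array([-np.sin(tilt), np.cos(tilt)])
points = np.vstack([center + np.outer(rng.uniform(size=200), dir_a),
                    center + np.outer(rng.uniform(size=200), dir_b)])
points = points + rng.normal(scale=0.01, size=points.shape)

angle, value = best_rotation(points, center)
print(angle)
```

With the center placed where the two branches meet, the scan recovers the tilt of the branches, so after rotation each branch runs along one axis. With a badly placed center no rotation angle can bring both branches onto the axes.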

Such recentering may be performed prior to the ICA analysis too (button ICA(C)). This, however, would not deliver much better results in this particular example.

Finally you might practice choosing the components to rotate. Do the varimax rotation as above but uncheck components 3-5 in the list. You get the same varimax results as before but much faster, because the senseless rotation of the noise PCA components is omitted.

The trick of redefining the center can be naturally extended to a rotation fully defined by the user. This is called free rotation in the temDM MSA package.

• Click free. The scatterplot appears with a red arrow in the middle. The live spectrum profile shows the loading corresponding to the current direction of the arrow. Move the red arrow around the scatterplot while watching the resulting loading spectrum. Orient the arrow in such a way that its beginning denotes the rotation center while its end points in the desired direction for the best interpretable loadings. Press Rotate.

You get components 1 and 2 rotated as you specified. Of course, the rotation was performed in the 1-2 plane only, while the remaining components are unchanged. If you wish to rotate other axes, change the component indexes at the bottom of the MSA rotation tool. Free rotation affects only one pair of components at a time. This is different from ICA and varimax, which can rotate all available PCA components in one action.
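An orthogonal rotation within one pair of components can be sketched in Python with NumPy. Scores and loadings of the chosen pair are mixed with the same rotation matrix, all other components stay untouched, and the product of scores and loadings (the reconstructed data) is unchanged. The function name rotate_pair, the 0-based indexes and the random test data are assumptions for illustration; the shift of the rotation center is left out.

```python
import numpy as np

def rotate_pair(scores, loadings, i, j, angle_deg):
    """Rotate components i and j (0-based) by angle_deg within their plane."""
    t = np.deg2rad(angle_deg)
    c, s = np.cos(t), np.sin(t)
    new_scores = np.array(scores, dtype=float)       # copies, inputs stay intact
    new_loadings = np.array(loadings, dtype=float)
    new_scores[:, i] = c * scores[:, i] + s * scores[:, j]
    new_scores[:, j] = -s * scores[:, i] + c * scores[:, j]
    new_loadings[i] = c * loadings[i] + s * loadings[j]
    new_loadings[j] = -s * loadings[i] + c * loadings[j]
    return new_scores, new_loadings

# Three components, ten pixels, eight channels (hypothetical data).
rng = np.random.default_rng(3)
scores = rng.normal(size=(10, 3))
loadings = rng.normal(size=(3, 8))
new_scores, new_loadings = rotate_pair(scores, loadings, 0, 1, 25.0)
print(np.allclose(new_scores @ new_loadings, scores @ loadings))
```

The rotated loading of the first component is a mix of the two original loadings, weighted by the cosine and sine of the angle. This corresponds to the live profile changing as the arrow direction changes.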

This style of rotation is of course subjective; no quantitative criterion stands behind it. However, real systems are often so complicated that any established rotation criteria fail. Still, the human brain has a chance to pick up the hidden structure in the data distribution. A flexible look inside sometimes outperforms any fixed algorithm!

Note that you may use the free rotation tool to get an idea about the correspondence between a given direction in the factor space and the shape of a spectrum even if you are not going to rotate the components.

• Click free. Move the red arrow around and learn what kind of spectrum appears in the live profile. After you have investigated all directions, press Cancel. The rotation will NOT be executed.

The free rotation may be performed iteratively using different pairs of components. The results of the previous free rotation are not lost but are offered to you as the starting point for the next rotation operation. This is different from ICA and varimax, which always start from the original PCA components.

If you feel your multiple free rotations have come to a configuration you don’t understand anymore, you can always return to the original orientation of the PCA components.

• Just click erase. The next free rotation will startfrom the original PCA components.

6 CLUSTERING

The example EELScube MSA.dm3 shows two trends of variation, more or less independent of each other. This is a typical feature of STEM datasets; real systems might show even more complicated structures. One way to deal with the complexity of a dataset is to break it into several clusters in the factor space.

• Open the clustering tool by clicking temDM : MSA : clustering.

• Having EELScube MSA.dm3 in front, generate the scatterplot of the representative PCA components by clicking plot X-Y. The scatterplot of the first two PCA components is usually the most useful.

• Click new. The ellipse appears in the scatterplot.

• Orient the ellipse so that it covers the points to be included in the cluster. You can drag the main axis of the ellipse and tune its proportions by stretching or shrinking the minor axis. Click store when ready. You may also exit the process without storing a cluster by clicking cancel.

• Click again new cluster and manage the secondcluster. Click store. If some pixels from the oldcluster appear to lay within the new one, theywill change their affiliation, i.e. will be attachedto the new cluster.
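The steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code of the clustering tool: the functions inside_ellipse and store_cluster, the ellipse parameters and the toy scores are all hypothetical. The sketch shows the point-in-ellipse test and how a later cluster takes over pixels that already belonged to an earlier one.

```python
import math

def inside_ellipse(x, y, cx, cy, a, b, tilt_rad):
    """True if (x, y) lies inside an ellipse with center (cx, cy),
    semi-axes a (major) and b (minor), tilted by tilt_rad."""
    dx, dy = x - cx, y - cy
    c, s = math.cos(tilt_rad), math.sin(tilt_rad)
    u = c * dx + s * dy       # coordinate along the major axis
    v = -s * dx + c * dy      # coordinate along the minor axis
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0

def store_cluster(labels, scores_xy, index, ellipse):
    """Attach every pixel inside the ellipse to cluster `index`,
    overriding any earlier affiliation."""
    for i, (x, y) in enumerate(scores_xy):
        if inside_ellipse(x, y, *ellipse):
            labels[i] = index
    return labels

# Toy scores of five pixels in the plane of components 1 and 2.
scores_xy = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.0), (3.0, 0.1), (9.0, 9.0)]
labels = [0] * len(scores_xy)     # 0 = pixel does not belong to any cluster

# ellipse = (cx, cy, a, b, tilt_rad)
store_cluster(labels, scores_xy, 1, (1.0, 0.0, 1.5, 0.5, 0.0))
store_cluster(labels, scores_xy, 2, (2.5, 0.0, 1.0, 0.5, 0.0))
print(labels)
```

The third pixel lies in both ellipses and ends up in cluster 2, the one stored last.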

You might notice that every time you create a new cluster, a binary image appears that shows how your cluster looks in real space. As usual, you do not need to store these images: they are stored automatically in EELScube MSA.dm3. You may check this:

• Press show all. The binary images showing the geometry of each cluster in real space appear. Additionally, the average spectrum of each cluster is calculated and shown.
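What show all displays can be mimicked with a minimal sketch, assuming the cluster affiliation of each pixel is available as a list of integer labels. The names cluster_mask and average_spectrum and the toy spectra are hypothetical and do not belong to the temDM MSA package.

```python
def cluster_mask(labels, index):
    """Binary mask: 1 where the pixel belongs to cluster `index`, else 0."""
    return [1 if lab == index else 0 for lab in labels]

def average_spectrum(spectra, labels, index):
    """Channel-by-channel mean over all pixels of cluster `index`."""
    members = [spec for spec, lab in zip(spectra, labels) if lab == index]
    if not members:
        return None
    n = len(members)
    return [sum(channel) / n for channel in zip(*members)]

# Four pixels with three energy channels each (made-up counts).
spectra = [[10, 2, 0], [12, 4, 0], [1, 1, 8], [3, 1, 6]]
labels = [1, 1, 2, 2]

mask_1 = cluster_mask(labels, 1)
mean_1 = average_spectrum(spectra, labels, 1)
mean_2 = average_spectrum(spectra, labels, 2)
```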

Clusters are shown by points of different colors in a scatterplot. The default colors are 1: red, 2: green, 3: blue, 4: pink, 5: yellow, 6: light-blue, 7: brown. For your convenience the palette is always shown on the tool. All clusters with an index higher than 7 will be orange because the number of colors in the palette is limited.

One point should be stressed: although you design your clusters in a 2D plot, the actual structure is multi-dimensional. Thus, it is recommended to view it in several scatterplots.

• Type 3 in the first field of the scatter components. Press plot X-Y. A new scatterplot appears with the same clusters but seen from another perspective.

You may continue to design your clusters in any scatterplot you want. The changed clusters will be automatically repainted in the other scatterplots. Such multi-dimensional manipulation sounds challenging, but once you get used to scatterplots, you will really have fun exploring the n-dimensional space!

• You can delete the last designed cluster at any time. All its pixels will then be marked as not belonging to any cluster and repainted in black.

• You can also correct any of the existing clusters. In that case you will be prompted to choose the color of the cluster to be corrected; then the ellipse for choosing the new area will appear. Just drag it to the desired position and press store.

Important: the old pixels are not excluded from the corrected cluster but are ADDED to those located in the new ellipse. This way you can design clusters of a very complicated geometry.
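The additive behaviour of a correction, and the effect of deleting a cluster, can be written down as a short sketch. The functions correct_cluster and delete_cluster are hypothetical names used only for illustration; the label 0 stands for "no cluster" (the pixels painted black).

```python
def correct_cluster(labels, index, pixels_in_new_ellipse):
    """Add the pixels of the new ellipse to cluster `index`.
    Pixels that were already in the cluster keep their membership."""
    for i in pixels_in_new_ellipse:
        labels[i] = index
    return labels

def delete_cluster(labels, index):
    """Mark all pixels of cluster `index` as not belonging to any cluster."""
    return [0 if lab == index else lab for lab in labels]

labels = [1, 1, 0, 0, 2, 0]
labels = correct_cluster(labels, 1, [2, 3])   # cluster 1 grows by two pixels
after_delete = delete_cluster(labels, 2)      # cluster 2 is removed
```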

You now have two clusters stored in EELScube MSA.dm3. You should run PCA again, this time in each cluster individually. To do that:

• Open the proceed PCA tool by choosing temDM : MSA : proceed PCA.

• Enter 2 in number components (this is sufficient). Click PCA. The package recognizes that the data are clustered and asks whether you wish to treat the clusters.

• Choose treat all clusters and press OK. The clusters will be treated non-stop, one after the other.

Do you see the difference? Each cluster is now essentially a one-component object. It shows a simple scatterplot and a very transparent loading. As before, close all displayed loadings, variances and scatterplots without saving. Do not create a mess on the DigitalMicrograph desktop! All these results are already stored in EELScube MSA.dm3.

So, clustering can be used to break the data into elementary pieces with simply interpretable structures.

Often, users are interested in one particular part of the investigated area. In that case you might design only one cluster and then choose treat cluster 1; the rest of the data will be ignored.

Finally, you can make a reconstruction using the clustered data.

• Click Reconstruct. A list of available clusters will appear.

• Choose the desired clusters and the desired number of components in each cluster. One component in each cluster is sufficient for EELScube. Press OK.

The reconstructed cube combines the pixels from all chosen clusters. The pixels which do not belong to any chosen cluster will appear dark.
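A minimal sketch of such a cluster-wise reconstruction is given below. It assumes that each cluster has its own PCA model consisting of an average spectrum, loadings and scores, and that a pixel is rebuilt as the average plus the sum of score times loading. The data layout and the function name reconstruct are hypothetical; the actual storage inside the package may differ.

```python
def reconstruct(labels, models, chosen, n_channels):
    """Rebuild the spectrum of every pixel from its cluster model.

    models[index] = (mean_spectrum, loadings, scores), where scores maps
                    a pixel number to the list of its scores.
    chosen[index] = number of components to use in that cluster.
    Pixels outside the chosen clusters stay zero (dark)."""
    cube = []
    for pixel, lab in enumerate(labels):
        if lab not in chosen:
            cube.append([0.0] * n_channels)
            continue
        mean, loadings, scores = models[lab]
        spec = list(mean)
        for k in range(chosen[lab]):
            t = scores[pixel][k]
            spec = [v + t * p for v, p in zip(spec, loadings[k])]
        cube.append(spec)
    return cube

# Three pixels, two channels; pixel 2 belongs to no cluster (toy numbers).
labels = [1, 2, 0]
models = {
    1: ([1.0, 2.0], [[1.0, 0.0]], {0: [0.5]}),
    2: ([4.0, 4.0], [[0.0, 1.0]], {1: [-1.0]}),
}
cube = reconstruct(labels, models, {1: 1, 2: 1}, n_channels=2)
only_first = reconstruct(labels, models, {1: 1}, n_channels=2)
```

Choosing only cluster 1 leaves the pixels of cluster 2 dark as well.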

7 VIEW ROTATED AND CLUSTERED RESULTS

The view results tool can be used to display loadings, scores and scatterplots for your rotated or clustered data.

• Open the view results tool by clicking temDM : MSA : view results.

• Choose the cluster index. Index 0 implies NO clustering.

• Choose the rotation method in the drop-down menu.

• Use the appropriate buttons to view the desired results.

You can also check the variance or kurtosis of the obtained rotated components. Get iteration will show you the available information about how your rotated components were obtained. This information depends on the method: the free rotation has no iterations at all, while the iteration in varimax is done for all components at once.
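For orientation, variance and kurtosis of a set of scores can be computed as in the sketch below. It uses the population variance and the excess kurtosis (zero for a Gaussian distribution); which exact definitions the view results tool applies is not stated here, so treat these formulas as an assumption.

```python
def variance(values):
    """Population variance of a list of scores."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def excess_kurtosis(values):
    """Fourth central moment divided by the squared variance, minus 3."""
    n = len(values)
    mean = sum(values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / variance(values) ** 2 - 3.0

flat_scores = [-1.0, 1.0, -1.0, 1.0]             # two-level distribution
peaked_scores = [0.0] * 8 + [3.0, -3.0]          # mostly zero, two outliers

k_flat = excess_kurtosis(flat_scores)
k_peaked = excess_kurtosis(peaked_scores)
```

A component whose scores are concentrated in a few pixels has a high kurtosis, while a component spread evenly over two levels has a negative one.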

We have now included extra methods in our treatment: clustering and rotation (further methods, such as the alternating least squares (ALS) fit, are explored later). This means we are no longer within Principal Component Analysis (PCA) but have stepped into a more general area, Multivariate Statistical Analysis (MSA).

8 ENDMEMBERS

You see that the MSA treatment can produce a lot of results to view and analyze. Maybe too many...

Reviewing such output demands strong abstract thinking: the loadings are differential, i.e. counted from some average spectrum; the scores may be both positive and negative, etc.

Can we come to a simpler, intuitively clear description of the results? Yes.

Try to recast them in terms of EndMembers. In such a description you deal with a few well-interpretable spectra and a few maps showing the contribution of these spectra across the object. Use again the example EELScube MSA, from which you have extracted at least 5 PCA components.

• Open the End Members tool by clicking temDM : MSA : end members.

• Click tune new while having EELScube MSA.dm3 or EELScube COMP.dm3 in front.

A couple of new windows will appear. Find the 1-2 scatterplot. The red and green markers are located in the center of the scatterplot. They denote the positions of two EndMember spectra in the factor space, while two other windows display the actual spectra. Try to drag the EndMember markers within the scatterplot and see how the spectra change.
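The link between a marker position and the displayed spectrum can be sketched as follows, assuming (as described above) that the loadings are differential spectra counted from the average spectrum. The function endmember_spectrum and all numbers are hypothetical and serve only as an illustration.

```python
def endmember_spectrum(mean_spectrum, loadings, position):
    """Spectrum at a point of the factor space:
    average spectrum plus coordinate times loading for each component."""
    spec = list(mean_spectrum)
    for coord, loading in zip(position, loadings):
        spec = [s + coord * p for s, p in zip(spec, loading)]
    return spec

mean_spectrum = [5.0, 5.0, 5.0]                     # three energy channels
loadings = [[1.0, 0.0, -1.0], [0.0, 2.0, 0.0]]      # components 1 and 2

# Marker in the center of the scatterplot: the average spectrum itself.
center = endmember_spectrum(mean_spectrum, loadings, (0.0, 0.0))

# Marker dragged to (2.0, -2.5): the signal in the middle channel vanishes.
moved = endmember_spectrum(mean_spectrum, loadings, (2.0, -2.5))
```

Dragging the marker along component 2 changes only the middle channel in this toy case, which is the kind of behaviour used below to make a peak disappear.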

For your convenience, you may move the displayed spectra somewhere to the side of the DigitalMicrograph working space and zoom the intensity scale and the range of the displayed energy channels using the usual tricks of DigitalMicrograph (the scale is shifted by dragging the mouse with the left button down; hold the CTRL key to zoom the scale).

From the very name of the method you might guess that reasonable EndMember positions are usually found at the periphery of your scatterplot. How do you arrive at the right positions?

By playing around you might discover a kind of “saddle point” where the spectrum looks particularly simple. Deviating from that point results in an unphysical profile or in a more complicated spectrum. In the given example, place the red marker such that the peak at 530 eV disappears.

Similarly, move the green marker a bit up and down and find the position where the peak at 1310 eV is NOT observed in the corresponding spectrum.

Have you noticed that a significant portion of the scatterplot remains uncovered? This is because this object requires at least 3 EndMembers for an adequate description.

• Click add. The third (blue) EndMember marker will appear with the corresponding spectrum.

As before, try to find an optimal “saddle point” for its position. In this case it would be the condition of the complete disappearance of the peak at 1530 eV.

You might note that during the tuning of EndMembers a special image, “PCA fractions”, appears. It shows the content of each EndMember in terms of the original PCA components. Do not close it until you have finished tuning everything: this mixture matrix is important for the tuning process, and tuning stays active as long as it is open.

Up to now you have explored only one scatterplot. This is OK for this particular example but usually insufficient for real-life objects. You must look at your dataset from the perspective of SEVERAL scatterplots.

• Click plot X-Y. The fields at the right denote the components to be plotted. A new scatterplot will appear with your tuned EndMembers already in place.

Now you can have fun dragging the EndMember markers in different scatterplots. They will move synchronously in both displays. Not a bad exercise in n-dimensional thinking, is it?

The 3-2 scatterplot is not very useful for the given example. The theoretically reasonable EndMembers should all sit on the line coinciding with the axis of the 2nd PCA component, while the contribution along the 3rd one must be zero.

This scatterplot is, however, useful for learning another option of the temDM MSA program: the Alternating Least Squares (ALS) fit.

Even with this simplified example, it is not possible to check each and every scatterplot. If you followed the present guide, you have already extracted at least 5 PCA components, which means a 5-dimensional space! In real systems, 5-10 dimensions are quite typical.

To handle this, the temDM MSA package combines the manual movement of EndMembers in the most important dimensions with the automatic ALS fit along the other ones.

Just to play:

• Displace one of the EndMembers in the 3-2 scatterplot away from the axis of the 2nd component.

• Click fit ALS. You see that the EndMember markers arrange themselves along one line. Click fit ALS several times (ALS is an iterative method) and you will notice that this line approaches the axis of the 2nd component.

Finally, plot the maps showing the fractions of all EndMember spectra in the dataset.

• Click show maps. Three maps, each corresponding to a specific EndMember, will appear. All the EndMember spectra will be displayed overlaid in a separate window.
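How a fraction map can follow from the EndMember positions may be illustrated geometrically. The sketch below is restricted to one 2D scatterplot with three EndMembers and assumes that the fractions sum to one, so that they become the barycentric coordinates of a pixel with respect to the triangle of markers. This is a simplification for illustration, not the procedure of the package, which works in the full factor space and uses the ALS fit; the function name fractions and the coordinates are hypothetical.

```python
def fractions(point, e1, e2, e3):
    """Barycentric coordinates of `point` with respect to the triangle
    e1, e2, e3 in a 2D scatterplot. The three fractions sum to 1."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = point, e1, e2, e3
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("the three EndMembers lie on one line")
    f1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    f2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return f1, f2, 1.0 - f1 - f2

# Hypothetical marker positions (red, green, blue) in the 1-2 scatterplot.
red, green, blue = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)

pure_red = fractions((0.0, 0.0), red, green, blue)
mixed = fractions((1.0, 1.0), red, green, blue)

# A "map" is the list of one fraction over all pixels.
pixel_scores = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
red_map = [fractions(p, red, green, blue)[0] for p in pixel_scores]
```

A pixel lying outside the triangle gets a negative fraction for at least one EndMember, which is one way to see that the markers should sit at the periphery of the scatterplot.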

You have probably guessed that the 1st EndMember spectrum corresponds to the EELS spectrum of pure Al, the 2nd to that of AlO and the 3rd to that of MgO. The maps nicely show the distribution of these compounds. You might combine them into one colored map by using the standard DigitalMicrograph tool Color Mix.

9 SETTINGS

Some tools have a spanner icon giving you access to the processing and display parameters. Here is a short description of the tunable parameters. Note that several tools may have access to the same parameter.

import tool:

show HDF5: The tag structure of the Velox HDF5 format may be shown. This helps to retrieve the standard names for data pieces if they differ from the default ones.

default scale: Default scale for spatial calibration.

default spatial origin: Default origin pixel in spatial calibration.

default scale unit: Default units for spatial calibration.

default dispersion: Default dispersion for energy calibration.

default energy origin: Default origin channel in energy calibration.

default dispersion unit: Default units for energy calibration.

filtering tool:

default binning: Default binning value.

default spatial sigma: Default standard deviation sigma of the Gaussian smearing among pixels.

default energy sigma: Default standard deviation sigma of the Gaussian smearing among energy channels.

clip after spatial filtering: Filtering is not perfect at the border of the spectrum image because there are fewer neighbors to average. You may choose to clip the border pixels by checking this box.

clip after energy filtering: The same for energy filtering: you may clip the border energy channels.

proceed PCA tool:

PCA convergence: Convergence of the NIPALS algorithm of PCA can be estimated either from the angular difference between two subsequently found loading vectors (loading spectra are vectors in the n-dimensional energy space) or from the difference in their eigenvalues.

PCA max deviation: Criterion for stopping iterations in the NIPALS algorithm: the maximal allowed difference between two subsequent iterations. Here: 0.02 rad (if angular convergence is chosen) or 0.02% (in the case of value convergence).

PCA max iterations: Maximal number of iterations in the NIPALS algorithm of PCA. If it is reached, the procedure will be stopped even if the stopping criterion is still not satisfied. The warning message “Precision ?” will be issued. This usually happens when exploring the pure noise components.
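The interplay of the three parameters can be seen in a textbook NIPALS iteration for one component with the angular stopping criterion. This is a generic sketch in pure Python, not the code behind the PCA button; the function name nipals_component, its defaults and the toy matrix are assumptions made for illustration.

```python
import math

def nipals_component(X, max_deviation=0.02, max_iterations=50):
    """Extract one PCA component from a centered data matrix X
    (rows = pixels, columns = energy channels).
    Returns loading, scores, eigenvalue, iterations used, converged flag."""
    n_cols = len(X[0])
    # Start from the channel with the largest sum of squares.
    start = max(range(n_cols), key=lambda j: sum(row[j] ** 2 for row in X))
    t = [row[start] for row in X]
    p_old = None
    converged = False
    iterations = 0
    while iterations < max_iterations and not converged:
        iterations += 1
        # Loading: project the data onto the scores, then normalize.
        p = [sum(row[j] * ti for row, ti in zip(X, t)) for j in range(n_cols)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        # Scores: project the data onto the loading.
        t = [sum(a * b for a, b in zip(row, p)) for row in X]
        if p_old is not None:
            # Angle between two subsequent loading vectors (sign ignored).
            cos_angle = abs(sum(a * b for a, b in zip(p, p_old)))
            converged = math.acos(min(1.0, cos_angle)) <= max_deviation
        p_old = p
    eigenvalue = sum(v * v for v in t)
    return p, t, eigenvalue, iterations, converged

# Centered toy matrix: four pixels, two channels, one trend of variation.
X = [[2.0, 1.0], [-2.0, -1.0], [4.0, 2.0], [-4.0, -2.0]]
p, t, eigenvalue, n_iter, converged = nipals_component(X)
if not converged:
    print("Precision ?")   # iteration limit reached before the criterion
```

For pure noise components two subsequent loadings keep changing direction, so the angle rarely drops below the limit and the iteration cap takes over.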

view results tool:

sparsity in X-Y plot: The sparsity parameter defines the mesh used to fill an X-Y scatterplot. In a plot that is too sparse, the tiny black pixels may be hardly visible; a plot that is too dense looks rough. Here the number of empty (white) pixels will be 10 times the number of occupied ones.

max displayed loadings: Maximal number of displayed loadings (too many slices might make the display hard to handle).

scores cubed: Scores can be displayed as many individual images or as one datacube.

rotation tool:

ICA:max deviation: ICA convergence criterion: maximal angular difference between two subsequently iterated loadings, here: 0.02 rad.

ICA:max iterations: Maximal number of iterations in the fast ICA algorithm.

ICA:contrast function: The fast ICA algorithm requires a certain “contrast” function for iterations. temDM MSA can use cubic polynomial (p3), hyperbolic tangent (tanh) or exponential (exp) contrast functions.

VRM:max deviation: Varimax convergence criterion: maximal angular difference between two subsequently iterated loadings, here: 0.02 rad.

VRM:max iterations: Maximal number of iterations in the Varimax algorithm.
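The three contrast options can be related to the nonlinearities commonly used in the fast ICA literature. The sketch below assumes that p3, tanh and exp stand for these standard choices; this is an assumption, and the dictionary CONTRAST and the function fastica_step are hypothetical names that are not part of temDM MSA. One fixed-point update of a single unmixing vector is shown.

```python
import math

# name: (nonlinearity g, its derivative g')
CONTRAST = {
    "p3": (lambda u: u ** 3,
           lambda u: 3.0 * u ** 2),
    "tanh": (lambda u: math.tanh(u),
             lambda u: 1.0 - math.tanh(u) ** 2),
    "exp": (lambda u: u * math.exp(-u * u / 2.0),
            lambda u: (1.0 - u * u) * math.exp(-u * u / 2.0)),
}

def fastica_step(w, samples, name):
    """One fixed-point update w <- E[x g(w.x)] - E[g'(w.x)] w,
    followed by normalization to unit length."""
    g, g_prime = CONTRAST[name]
    n = len(samples)
    proj = [sum(wi * xi for wi, xi in zip(w, x)) for x in samples]
    mean_gp = sum(g_prime(u) for u in proj) / n
    new_w = [sum(x[d] * g(u) for x, u in zip(samples, proj)) / n - mean_gp * w[d]
             for d in range(len(w))]
    norm = math.sqrt(sum(v * v for v in new_w))
    return [v / norm for v in new_w]

# Toy 2D samples (made-up numbers; real data would be whitened first).
samples = [(1.0, 0.5), (-1.0, 0.2), (0.3, -1.2), (-0.4, 0.9)]
w_new = fastica_step([1.0, 0.0], samples, "tanh")
```

Iterating such steps until the angle between subsequent vectors drops below ICA:max deviation, or until ICA:max iterations is reached, mirrors the stopping logic described for PCA.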

end members tool:

sparsity in X-Y plot: The sparsity parameter defines the mesh used to fill an X-Y scatterplot. In a plot that is too sparse, the tiny black pixels may be hardly visible; a plot that is too dense looks rough. Here the number of empty (white) pixels will be 10 times the number of occupied ones.

member pointer size: The size of the marker pointing to the position of a loading in the X-Y scatterplot.

lambda: Regularization parameter in the Alternating Least Squares (ALS) fit.

maps cubed: Maps can be displayed as many individual images or as one datacube.

clustering tool:

sparsity in X-Y plot: The sparsity parameter defines the mesh used to fill an X-Y scatterplot. In a plot that is too sparse, the tiny black pixels may be hardly visible; a plot that is too dense looks rough. Here the number of empty (white) pixels will be 10 times the number of occupied ones.

ellipse points: The ellipse for capturing clusters may be plotted with a different number of points (here 10 points). A larger number of points makes the shape smoother but works slower.

ellipse gap: To make the ellipse easier to move, its axes extend a few pixels outside the ellipse itself (here 3 pixels outside).

mask cubed: Cluster masks can be displayed as many individual images or as one datacube.

10 SCRIPTING SUPPORT

You might design your own automatic treatment flow if you know the basics of DigitalMicrograph scripting. The package temDM MSA extends the library of the DigitalMicrograph scripting language by adding specific MSA functions. The file temDM MSA scripting.chm included in the distribution package contains descriptions of the available MSA commands.

There are a couple of scripting examples in the distribution package. Note that the temDM MSA commands contain “MSA112_” in their names. The remaining commands are part of the standard DigitalMicrograph scripting interface.

The following example simulates an “EFTEM series” from a STEM datacube.

//This script simulates an "EFTEM series" from a STEM datacube

//Energy limits for EFTEM series
number E_Start_eV = 1230
number E_End_eV = 1380
number E_Step_eV = 50
number E_Window_eV = 40

//get DataCube
image DataCube := getFrontImage()

//read its dimensions
number width,height,depth
DataCube.get3Dsize(width,height,depth)

//read energy calibration
number dispersion = DataCube.ImageGetDimensionScale(2)
number origin = DataCube.ImageGetDimensionOrigin(2)

result("\ndispersion "+dispersion+" origin "+origin)

//iterate through energy and integrate within an energy window
for (number E_eV = E_Start_eV; E_eV <= E_End_eV; E_eV += E_Step_eV)
{
    //convert from eV to channels
    number Start = MSA112_eVtoCh(E_eV-E_Window_eV/2,origin,dispersion)
    number End = MSA112_eVtoCh(E_eV+E_Window_eV/2,origin,dispersion)

    result("\n"+E_eV)

    //select a section from the DataCube
    image CubeSection = DataCube[0,0,Start,width,height,End]

    //render along energy axis
    image Eftem = MSA112_Render3DtoFront(CubeSection)

    //set name and display
    Eftem.setName("EFTEM "+E_eV+"eV")
    Eftem.showimage()
}

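The arithmetic of the script above can be followed in a small stand-alone sketch. Python is used here only for illustration. The conversion in ev_to_channel assumes the calibration convention energy = (channel - origin) * dispersion; the actual behaviour of MSA112_eVtoCh is described in the scripting help file and may differ. All function names and the toy datacube are hypothetical.

```python
def ev_to_channel(energy_ev, origin, dispersion):
    """Assumed convention: energy = (channel - origin) * dispersion."""
    return int(round(energy_ev / dispersion + origin))

def eftem_series(cube, origin, dispersion, start_ev, end_ev, step_ev, window_ev):
    """cube[y][x] is a spectrum (list of counts).
    Returns {center energy: image of counts summed within the window}."""
    series = {}
    center = start_ev
    while center <= end_ev:
        lo = ev_to_channel(center - window_ev / 2, origin, dispersion)
        hi = ev_to_channel(center + window_ev / 2, origin, dispersion)
        # Sum the channels lo .. hi-1 of every spectrum.
        series[center] = [[sum(spec[lo:hi]) for spec in row] for row in cube]
        center += step_ev
    return series

# Toy datacube: 1 x 1 pixel, 50 channels of one count each.
# With dispersion 10 eV/channel and origin -100, channel 0 is at 1000 eV.
cube = [[[1] * 50]]
series = eftem_series(cube, origin=-100, dispersion=10,
                      start_ev=1230, end_ev=1380, step_ev=50, window_ev=40)
```

With these numbers each 40 eV window spans four channels, and the series contains four images centered at 1230, 1280, 1330 and 1380 eV.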
The next script investigates the dependence of the eigenvalues on the number of pixels in the dataset.

//This function generates a random mask
image randomCluster(number N, number fract)
{
    image Cluster := realImage("cluster",4,1,N)
    //mask image of total N pixels

    number range = 1/(1-fract)

    Cluster = random()*range
    Cluster = tert(cluster>1,1,0)
    //fract*N of pixels are 1, (1-fract)*N pixels are 0
    //choice is random

    return cluster
}

//parameters for convergence
number deviaSpec = 0.02
number maxItt = 50

//get an MSA image where the dataMatrix is kept as an attachment
image MSAimage := getFrontImage()
image DataMatrix = MSA112_GetMSAMatrix(MSAimage)

//get the number of pixels
number m = DataMatrix.ImageGetDimensionSize(1)

//define range and steps for fractioning the dataMatrix
number fractIni = 0.1
number fractFin = 1
number fractStep = 0.1

//fractioning
for(number fract = fractIni; fract <= fractFin; fract = fract + fractStep)
{
    //make a random mask to select a certain fraction from DataMatrix
    image Cluster := integerImage("cluster",2,0,1,m)
    Cluster = randomCluster(m,fract)

    number actualFract = mean(Cluster)
    //actual fraction can slightly differ from the nominal one
    result("\nfraction "+actualFract)

    //apply the mask to DataMatrix
    image DataMatrixSmaller = MSA112_TakeCluster(dataMatrix,1,cluster,1)
    //1: do centering; 0: do not
    //mask with index 1 is chosen

    //declare values and images
    number EigenValue //eigenvalue of a component
    image Pvec,Tvec //loading and score of a component
    number Nitt //number of iterations required for convergence

    //extract 1st PCA component
    Nitt = MSA112_FindNIPALS(DataMatrixSmaller,Pvec,Tvec,EigenValue,0,deviaSpec,maxItt)

    EigenValue /= (m*actualFract)
    //now the eigenvalue is the variance within a component
    result(" eigenValue1 "+EigenValue)

    //reduce DataMatrix by the 1st PCA component
    DataMatrixSmaller = MSA112_ReduceDataMatrix(DataMatrixSmaller,Pvec,Tvec)

    //extract 2nd PCA component
    Nitt = MSA112_FindNIPALS(DataMatrixSmaller,Pvec,Tvec,EigenValue,0,deviaSpec,maxItt)

    EigenValue /= (m*actualFract)
    //now the eigenvalue is the variance within a component
    result(" eigenValue2 "+EigenValue)
}
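The masking trick used in randomCluster can be checked outside DigitalMicrograph. The Python sketch below is only an illustration with a hypothetical function name: a uniform random number is scaled by 1/(1-fract) and compared with 1, so a pixel is selected with probability fract. The case fract = 1 is handled separately here to avoid a division by zero.

```python
import random

def random_cluster(n, fract, rng):
    """Mask of n pixels in which each pixel is 1 with probability fract."""
    if fract >= 1.0:
        return [1] * n
    scale = 1.0 / (1.0 - fract)
    # rng.random() * scale > 1 is equivalent to rng.random() > 1 - fract
    return [1 if rng.random() * scale > 1.0 else 0 for _ in range(n)]

rng = random.Random(0)              # fixed seed for a repeatable run
mask = random_cluster(10000, 0.3, rng)
actual_fract = sum(mask) / len(mask)
print("nominal 0.3, actual", actual_fract)
```

As in the script, the actual fraction differs slightly from the nominal one, which is why the eigenvalue is normalized by the actual number of selected pixels.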

11 TROUBLESHOOT

An error with the message “ambiguous function” is generated.

Most probably you have loaded temDM MSA.gtk twice. Locate all PlugIns folders with the script find plugins folders.s and check them. There must be only one temDM MSA.gtk plugin.

You cannot load a temDM MSA tool from the menu. There is a failure message or the tool frame simply does not appear.

The tool did not shut down correctly last time. Open the tool again while holding the “shift” key. The tool will be refreshed.

You played too much with the temDM MSA settings. The program now fails or runs too long. You have no idea what you have changed.

Choose New Script in the DigitalMicrograph menu. A new text window appears. Type MSA112_delete() there and press Execute. The default settings will be restored.
