{"id":553287,"date":"2022-07-05T02:34:17","date_gmt":"2022-07-05T06:34:17","guid":{"rendered":"https:\/\/gijn.org\/?p=553287"},"modified":"2023-06-25T07:31:09","modified_gmt":"2023-06-25T11:31:09","slug":"kodlama-becerisi-gerektirmeyen-ucretsiz-veri-cekme-araclari","status":"publish","type":"post","link":"https:\/\/gijn.org\/tr\/kaynak\/kodlama-becerisi-gerektirmeyen-ucretsiz-veri-cekme-araclari\/","title":{"rendered":"Free Data Extraction Tools That Require No Coding Skills"},"content":{"rendered":"<p><a href=\"https:\/\/gijn.org\/wp-content\/uploads\/2022\/07\/shutterstock_2158660431-1.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-543103\" src=\"https:\/\/gijn.org\/wp-content\/uploads\/2022\/07\/shutterstock_2158660431-1.jpg\" alt=\"\" width=\"771\" height=\"514\" srcset=\"https:\/\/gijn.org\/wp-content\/uploads\/2022\/07\/shutterstock_2158660431-1.jpg 1000w, https:\/\/gijn.org\/wp-content\/uploads\/2022\/07\/shutterstock_2158660431-1-336x224.jpg 336w, https:\/\/gijn.org\/wp-content\/uploads\/2022\/07\/shutterstock_2158660431-1-771x514.jpg 771w, https:\/\/gijn.org\/wp-content\/uploads\/2022\/07\/shutterstock_2158660431-1-768x512.jpg 768w\" sizes=\"auto, (max-width: 771px) 100vw, 771px\" \/><\/a><\/p>\n<p>Welcome back to the <a href=\"https:\/\/gijn.org\/series\/the-toolbox\/\">GIJN Toolbox<\/a>, where we survey the latest tips and tools for investigative journalists. In this edition, we explore three free tools, and some relatively easy workarounds, that reporters can use to scrape data from documents. 
These techniques were presented at the <a rel=\"noopener\" target=\"_blank\" href=\"https:\/\/www.ire.org\/training\/conferences\/ire-2022\/\">2022 Investigative Reporters and Editors conference (IRE22)<\/a>, where they drew strong interest from journalists. When reporters finally obtain the data they need for an investigation, they often face a second problem: how to select and extract that data so it can be moved into spreadsheets and put to use. For many small newsrooms, manual entry, advanced coding, or costly commercial OCR (optical character recognition) services may not be realistic scraping options.<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-539206 alignright\" src=\"https:\/\/gijn.org\/wp-content\/uploads\/2022\/06\/IRE22-logo.png\" alt=\"IRE22 logo\" width=\"183\" height=\"170\" \/>What&#8217;s more, several veteran watchdog journalists at IRE22 noted not only that they have seen an increase in the volume of public documents released in unstructured or &#8220;dead&#8221; formats, such as scanned documents or &#8220;flat&#8221; PDFs, but also that some government agencies use these formats deliberately. 
These formats place a burden on the reporting process.<\/p>\n<p>As a final challenge, many agencies around the world publish data on web pages in ways that force reporters to copy and paste individual cells into spreadsheets, and to click manually through numerous tabs or pages to reach the end of the full dataset.<\/p>\n<p class=\"hnews item post-542068 post type-post status-publish format-standard has-post-thumbnail hentry category-africa category-english category-resources category-tips-tools tag-data-mining tag-data-scraping tag-documental tag-freedom-of-information-2 tag-investigative-journalism tag-investigative-reporting tag-open-source-tools tag-pdf-extraction tag-spreadsheets tag-state-secrecy prominence-top-story series-the-toolbox featured-media featured-media-image\"><span class=\"\">&#8220;I file a ton of public records requests, and I find it is now extremely rare to get the documents or data I want in the format I want,&#8221; said\u00a0<\/span><a rel=\"noopener\" target=\"_blank\" href=\"https:\/\/twitter.com\/kennyjacoby\"><span class=\"\">Kenny Jacoby<\/span><\/a><span class=\"\">, an investigative reporter at USA Today.<\/span><span class=\"\">\u00a0<\/span>&#8220;Sometimes it seems like the agency giving you the document deliberately wants to make your life harder: they strip the text out of a PDF, or scan it before sending it, or the data arrives in a columnless, unstructured format. 
These obstacles can really slow us down, so it&#8217;s important to have tools to deal with them.&#8221;<\/p>\n<h4><strong>Google Pinpoint and Its New Features for Conquering PDFs<\/strong><\/h4>\n<p>In 2020, GIJN was among the first to <a href=\"https:\/\/gijn.org\/2020\/10\/26\/tools-for-campaign-sources-disinfo-spying-ai-search-and-election-day-scenarios-from-gijnelectionwatchdog\/\">announce the rollout<\/a> of a new AI-powered document-parsing tool from Google Journalist Studio, now branded as &#8220;<a rel=\"noopener\" target=\"_blank\" href=\"https:\/\/journaliststudio.google.com\/pinpoint\/collections\">Pinpoint<\/a>.&#8221; We described the newly launched tool as a &#8220;turbo-charged Ctrl-F&#8221; function with advanced OCR that can rapidly search through large numbers of documents and images. At a data session at IRE22, Jacoby said Pinpoint has since evolved into a free digital staple that is easy for professional journalists to access, thanks in part to the input its developers received from investigative journalists. Jacoby showed that Pinpoint&#8217;s data features now include:<\/p>\n<ul>\n<li>If you search for a single keyword such as &#8220;faculty,&#8221; it not only finds that word in the research files you have uploaded, but also highlights related words such as &#8220;teacher,&#8221; &#8220;campus,&#8221; or &#8220;professor.&#8221; 
It also aggressively hunts for variations of the search term; supports seven languages, including Portuguese, Spanish, French, and Polish; and can exclude unwanted terms with a minus sign.<\/li>\n<li>Upload batches of scanned documents or PDFs, even pages of handwritten scrawl, and it quickly converts them into &#8220;live,&#8221; searchable, copyable text documents. It even reads words that run in directions other than horizontal.<\/li>\n<\/ul>\n<ul>\n<li>The tool not only recognizes signs or graffiti in images and converts them to text, it can also reproduce long passages of small text that it spots on plaques or notice boards in the background of images. (During the Pinpoint demo, there was an audible gasp from the journalists attending when it was able to read and process the small print on a dense, angled biographical plaque in a single photograph. One NBC Telemundo reporter, <a rel=\"noopener\" target=\"_blank\" href=\"https:\/\/twitter.com\/ValezkaGil\">Valezka Gil<\/a>, said: &#8220;Oh my God! You just changed my life. This will save me so much time.&#8221;)<\/li>\n<li>Jacoby says the audio and video transcription feature is now so advanced that he uses the free Pinpoint tool to create searchable transcripts of his audio interviews, rather than dedicated transcription services with small subscription fees, <a href=\"https:\/\/gijn.org\/2022\/01\/27\/how-secure-are-journalists-favorite-transcription-tools\/\">such as Trint or Otter<\/a>. &#8220;This one feature is similar to those tools, but it&#8217;s free,&#8221; he said. &#8220;One thing it doesn&#8217;t do, which Trint and Otter do, is identify who is speaking and assign each person a label, such as &#8216;Speaker 2.&#8217; But it does pick up logical breaks in the conversation and inflection points in voices. You can click on a point in the text transcript and playback will start from that point.&#8221;<\/li>\n<\/ul>\n<p>Jacoby said free access to Pinpoint&#8217;s features is now very easy, and that extra storage for large projects can be requested from its technicians. &#8220;You need to get approved to use it, but my wife and I are both journalists, and we were approved almost instantly when we signed up,&#8221; he said. &#8220;You may need a work email address, but it&#8217;s not hard to get in, and the team there is very responsive.&#8221; The downside? Pinpoint is an entirely online service. 
&#8220;That means you need an internet connection, and that you are uploading your documents to a server somewhere, and if Google were subpoenaed, say, it&#8217;s possible your documents could be handed over,&#8221; he warned. &#8220;Also, it doesn&#8217;t let you download a copy of the OCR&#8217;d document; it lives in Pinpoint, so you have to copy and paste the text. But it probably has the best OCR in the business.&#8221;<\/p>\n<div id=\"attachment_541329\" class=\"wp-caption module image alignnone\">\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-541329 size-large\" src=\"https:\/\/gijn.org\/wp-content\/uploads\/2022\/07\/Pinpoint-Plaque-771x393.png\" alt=\"\" width=\"771\" height=\"393\" srcset=\"https:\/\/gijn.org\/wp-content\/uploads\/2022\/07\/Pinpoint-Plaque-771x393.png 771w, https:\/\/gijn.org\/wp-content\/uploads\/2022\/07\/Pinpoint-Plaque-336x171.png 336w, https:\/\/gijn.org\/wp-content\/uploads\/2022\/07\/Pinpoint-Plaque-768x391.png 768w, https:\/\/gijn.org\/wp-content\/uploads\/2022\/07\/Pinpoint-Plaque-1170x596.png 1170w, https:\/\/gijn.org\/wp-content\/uploads\/2022\/07\/Pinpoint-Plaque.png 1385w\" sizes=\"auto, (max-width: 771px) 100vw, 771px\" \/>Journalists at IRE22 were surprised to learn that the optical character recognition (OCR) feature of the free Google Pinpoint tool is powerful enough to read and copy text as small as the writing on the blue biographical plaque in this photograph.\u00a0Image: Kenny Jacoby<\/p>\n<\/div>\n<h4>The ImportHTML\/XML Hack for Data on Websites<\/h4>\n<p>As ProPublica&#8217;s Craig Silverman <a href=\"https:\/\/gijn.org\/2022\/04\/04\/elections-guide-for-investigative-reporters-chapter-1-new-election-digging-tools\/\">recently demonstrated for GIJN<\/a>, the source code behind any website offers investigative journalists a wealth of digging tools and, despite its intimidating appearance to non-coders, requires no skill beyond &#8220;Control-F&#8221; or &#8220;Command-F&#8221; to navigate. At a session on Google Sheets at IRE22, freelance journalist <a rel=\"noopener\" target=\"_blank\" href=\"https:\/\/twitter.com\/SamanthaSunne\">Samantha Sunne<\/a> showed how this code can be used to easily grab long tables or specific data elements from websites and, within seconds, populate all of that data in the format you need, in a spreadsheet, with no need to copy and paste hundreds of cells into your file one by one. The technique involves typing a formula into the first, top-left cell of a Google Sheet that instructs it to extract the source code element you need from a web page (for example, the code that generates a data table you like on the visible page). In fact, you don&#8217;t need to look at any code at all to extract a well-formatted data table from a site. 
Just follow these steps:<\/p>\n<blockquote><p>To import a single data table from a web page, no matter how long, simply type the following formula into Google Sheets: =IMPORTHTML(&quot;URL&quot;, &quot;table&quot;) If the data is formatted as a list, try &quot;list&quot; instead of &quot;table&quot;; and if you want, for example, the second list on a page, try adding a comma, a space, and the number 2: =IMPORTHTML(&quot;URL&quot;, &quot;list&quot;, 2)<\/p><\/blockquote>\n<p>When GIJN tried this hack to import <a rel=\"noopener\" target=\"_blank\" href=\"https:\/\/www.fdic.gov\/resources\/resolutions\/bank-failures\/failed-bank-list\/\">a table of 564 failed banks from the website of the US Federal Deposit Insurance Corp.<\/a>, the whole process, from copying the FDIC URL to opening Google Sheets to having the entire list of banks perfectly formatted in columns, took less than 15 seconds. However, it is important to use the exact punctuation the formula requires, including the comma after the URL and the straight quotation marks around both elements inside the parentheses. 
Remarkably, updates made to the live website data will also appear automatically in the Google Sheet, so, unless you disable this update function, you don&#8217;t need to keep checking the page during your investigation.<\/p>\n<p>Still, Sunne said it is important for reporters to gain at least some familiarity with html elements: to understand how computers package the data we see on the visible page, to make it easier to handle badly formatted information, and to dig deeper with more advanced formulas.<\/p>\n<p>To find the code that builds any page, right-click on any blank or white space on the site and click &#8220;view page source&#8221; or &#8220;show page source.&#8221; In general, she said, the key point to remember is that all the words you see on the human-facing web page should also appear in the computer-facing source code, so you can simply &#8220;Ctrl-F&#8221; to find any data term.<\/p>\n<p>Look at the code to see which element tags are used to enclose that data, and experiment with those tags in the formula. 
&#8220;Useful as it is, the ImportHTML formula can only grab tables and lists, but another formula, ImportXML, can grab any html element,&#8221; Sunne explained. &#8220;It looks very similar (equals sign, formula name, URL) but you can be much more specific.&#8221; Here is how to do it:<\/p>\n<blockquote><p>To import specific data elements from a web page, such as individual table rows, or only bold text or headings, try a formula like this one (for the headings example): =IMPORTXML(&quot;URL&quot;, &quot;\/\/h2&quot;) or this one (for table rows): =IMPORTXML(&quot;URL&quot;, &quot;\/\/table\/tr&quot;)<\/p><\/blockquote>\n<p>There are many commonly used html elements, such as &#8220;\/\/h2&#8221; (heading) and &#8220;\/tr&#8221; (table row), which reporters can find <a rel=\"noopener\" target=\"_blank\" href=\"https:\/\/www.codecademy.com\/article\/glossary-html\">in html glossaries<\/a>, but Sunne suggests that journalists simply note the elements that surround the data they need, and learn to recognize the basic tags that can help refine their next data import. 
To practice, try using <a rel=\"noopener\" target=\"_blank\" href=\"https:\/\/www.youtube.com\/watch?v=7B4tPczv-H8\">these two data scraping techniques<\/a> on large Wikipedia pages, which often contain several data lists and tables.<\/p>\n<h4>Tesseract with ImageMagick for Securely Extracting Data Offline<\/h4>\n<p>USA Today&#8217;s Kenny Jacoby said an open source OCR engine called <a rel=\"noopener\" target=\"_blank\" href=\"https:\/\/github.com\/tesseract-ocr\/tesseract\">Tesseract<\/a> offers a great data extraction solution for sensitive documents and large data archives, provided the quality of the input data is good enough. Remarkably, its latest version recognizes <a rel=\"noopener\" target=\"_blank\" href=\"https:\/\/en.wikipedia.org\/wiki\/Tesseract_(software)\">more than 100 languages<\/a>, as well as text written right to left, as in Hebrew or Arabic.<\/p>\n<p>Tesseract converts images that lack a text layer into selectable, searchable PDFs, and Jacoby said it is especially strong at converting large batches of &#8220;flat&#8221; documents into live, copyable text. 
He said this generally means reporters must first convert PDF documents into high-resolution images, ideally using the open source <a rel=\"noopener\" target=\"_blank\" href=\"https:\/\/imagemagick.org\/index.php\">ImageMagick tool<\/a>, and then feed those images to Tesseract to get the scraped data.<\/p>\n<p>&#8220;Its OCR isn&#8217;t as good as Pinpoint&#8217;s, but it&#8217;s pretty good,&#8221; Jacoby said. &#8220;A big advantage, though, is that it&#8217;s offline: you can do everything locally, in your terminal, so it&#8217;s good for sensitive work. It&#8217;s really good for batch conversions. If you have 1,000 documents, you can OCR every one of them.&#8221;<\/p>\n<p>&#8220;You may need to improve the quality or contrast of the image, but you can do that with ImageMagick,&#8221; he added. Jacoby also <a rel=\"noopener\" target=\"_blank\" href=\"https:\/\/github.com\/chadday\/nicar_ocr\">recommended<\/a> a detailed guide to the Tesseract and ImageMagick tools by Wall Street Journal investigative reporter Chad Day, which can be found on GitHub.<\/p>\n<p>Although the Tesseract solution requires some &#8220;intermediate&#8221; coding skills, Jacoby said it can be a one-time setup: a person with command-line skills can install the program in a single visit and give the reporter two short lines that they can then paste in for every future data extraction. 
For extracting tables locked in PDF formats, Jacoby recommended <a rel=\"noopener\" target=\"_blank\" href=\"https:\/\/tabula.technology\/\">the Tabula app<\/a>, a better-known open source tool created by journalists from OpenNews and ProPublica.<\/p>\n<p>&#8220;It basically liberates data tables from PDFs and dumps them into spreadsheets,&#8221; he explained. Tabula lets reporters simply draw a box around a table on their computer screen to extract the data they want, and it can also detect tables automatically, including borderless ones.<\/p>\n<p>While Tabula requires &#8220;live&#8221; or OCR&#8217;d documents, he said the tool works well with text files generated by Tesseract. &#8220;It&#8217;s also offline, so it&#8217;s very private,&#8221; he said.<\/p>\n<h4>Additional Resources<\/h4>\n<p><em><a href=\"https:\/\/gijn.org\/2021\/07\/28\/digging-up-hidden-data-with-the-web-inspector\/\">Digging Up Hidden Data with the Web Inspector<\/a><br \/>\n<\/em><em><a href=\"https:\/\/gijn.org\/2020\/12\/17\/why-web-scraping-is-vital-to-democracy\/\">Why Web Scraping Is Vital to Democracy<\/a><br \/>\n<\/em><em><a href=\"https:\/\/gijn.org\/2021\/07\/13\/tips-for-building-a-database-for-investigations\/\">Tips for Building a Database for Investigations<\/a><\/em><\/p>\n<hr \/>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-309506 alignleft\" src=\"https:\/\/gijn.org\/wp-content\/uploads\/2021\/02\/Rowan-Philp-140x140-1.png\" alt=\"Rowan-Philp-140x140\" width=\"93\" height=\"93\" srcset=\"https:\/\/gijn.org\/wp-content\/uploads\/2021\/02\/Rowan-Philp-140x140-1.png 140w, https:\/\/gijn.org\/wp-content\/uploads\/2021\/02\/Rowan-Philp-140x140-1-60x60.png 60w\" sizes=\"auto, (max-width: 93px) 100vw, 93px\" \/><em><strong><a href=\"https:\/\/gijn.org\/about\/staff-member\/rowan-philp\/\">Rowan Philp<\/a><\/strong> is a reporter for GIJN. He was formerly chief reporter for South Africa&#8217;s <a rel=\"noopener\" target=\"_blank\" href=\"https:\/\/www.timeslive.co.za\/sunday-times\/\">Sunday Times<\/a>. As a foreign correspondent, he has reported on news, politics, corruption, and conflict from more than two dozen countries around the world.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this edition of the GIJN Toolbox, we look at the latest developments from the IRE22 conference on data extraction and optical character recognition (OCR) tools for turning unwieldy documents into searchable spreadsheets.<\/p>\n","protected":false},"author":3031167,"featured_media":1140317,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_price":"","_stock":"","_tribe_ticket_header":"","_tribe_default_ticket_provider":"","_tribe_ticket_capacity":"0","_ticket_start_date":"","_ticket_end_date":"","_tribe_ticket_show_description":"","_tribe_ticket_show_not_going":false,"_tribe_ticket_use_global_stock":"","_tribe_ticket_global_stock_level":"","_global_stock_mode":"","_global_stock_cap":"","_tribe_rsvp_for_event":"","_tribe_ticket_going_count":"","_tribe_ticket_not_going_count":"","_tribe_tickets_list":"[]","_tribe_ticket_has_attendee_info_fields":false,"republication-tracker-tool-hide-widget":false,"footnotes":"","_tec_slr_enabled":"","_tec_slr_layout":""},"categories":[23204,23200],"tags":[14457,13831,14466,14484,14485,13961],"gijn_topic":[18933,18925],"series":[],"gijn_language":[17789],"gijn_region":[18919],"class_list":["post-553287","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ipucu-sayfasi","category-kaynak","tag-arac-ve-teknik","tag-arastirmaci-gazetecilik","tag-gijn-arac-kutusu","tag-veri-cekme","tag-veri-cikarmak","tag-veri-kazima","gijn_topic-arastirma-ipuclari-ve-araclar","gijn_topic-veri-gazeteciligi","gijn_language-tr-tr","gijn_region-afrika-tr"],"acf":[],"ticketed":false,"_links":{"self":[{"href":"https:\/\/gijn.org\/tr\/wp-json\/wp\/v2\/posts\/553287","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gijn.org\/tr\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gijn.org\/tr\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gijn.org\/tr\/wp-json\/wp\/v2\/users\/3031167"}],"replies":[{"embeddable":true,"href":"https:\/\/gijn.org\/tr\/wp-json\/wp\/v2\/comments?post=553287"}],"version-history":[{"count":0,"href":"https:\/\/gijn.org\/tr\/wp-json\/wp\/v2\/posts\/553287\/revisions"}],"wp:f
eaturedmedia":[{"embeddable":true,"href":"https:\/\/gijn.org\/tr\/wp-json\/wp\/v2\/media\/1140317"}],"wp:attachment":[{"href":"https:\/\/gijn.org\/tr\/wp-json\/wp\/v2\/media?parent=553287"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gijn.org\/tr\/wp-json\/wp\/v2\/categories?post=553287"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gijn.org\/tr\/wp-json\/wp\/v2\/tags?post=553287"},{"taxonomy":"gijn_topic","embeddable":true,"href":"https:\/\/gijn.org\/tr\/wp-json\/wp\/v2\/gijn_topic?post=553287"},{"taxonomy":"series","embeddable":true,"href":"https:\/\/gijn.org\/tr\/wp-json\/wp\/v2\/series?post=553287"},{"taxonomy":"gijn_language","embeddable":true,"href":"https:\/\/gijn.org\/tr\/wp-json\/wp\/v2\/gijn_language?post=553287"},{"taxonomy":"gijn_region","embeddable":true,"href":"https:\/\/gijn.org\/tr\/wp-json\/wp\/v2\/gijn_region?post=553287"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}